Robust End-To-End SPEAKER recognition based on deep learning and attention models

Project description

An optimised technology for automatic speaker recognition

Speech recognition is at the heart of a wide range of applications. The growing development of data exploitation and analysis techniques offers solutions for continuous improvement in the speech processing sector. The EU-funded ETE SPEAKER project aims to develop an innovative tool based on automatic speaker recognition (SID) that isolates the information needed to determine the identity of the speaker in a speech recording. ETE SPEAKER will study and fully exploit the potential of deep neural networks to disentangle speaker-specific information from the remaining nuisance variability. Its main objective is to introduce an end-to-end SID system that meets the latest speaker recognition evaluation standards.

Objective

This project focuses on automatic speaker recognition (SID), the task of determining the identity of the speaker in a speech recording. Disentangling speaker-specific information from the remaining nuisance variability requires complex models. Deep neural networks (DNNs) have recently shown their potential for this, for example with the popular x-vector embeddings learnt by a DNN.
Here, we aim for end-to-end SID, where the system is optimized as a whole for the target task. Despite several attempts in this line of research, many aspects remain unexplored or only partially explored.
We also propose to explore recurrent approaches, which are suitable for dealing with temporal signals, as well as different pooling methods to obtain a fixed-length representation from a variable-length input sequence of speech features.
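To illustrate what pooling means here, the following is a minimal sketch of one common choice, statistics pooling, which concatenates the per-dimension mean and standard deviation over time. It is only an illustration under stated assumptions, not the project's method: the function name `stats_pooling` and the toy frame values are hypothetical, and a real system would pool the outputs of DNN layers rather than raw lists.

```python
import math

def stats_pooling(frames):
    """Map a variable-length list of equal-sized feature vectors to one
    fixed-length vector: per-dimension means followed by standard deviations.
    (Hypothetical helper for illustration only.)"""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    # Mean of each feature dimension over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each feature dimension.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds

# Two toy "utterances" of different lengths, with 2-dimensional frames.
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]

# Both are mapped to vectors of the same length (2 * feature dimension).
print(len(stats_pooling(short_utt)), len(stats_pooling(long_utt)))
```

Whatever the number of input frames, the output length depends only on the feature dimension, which is what makes it possible to compare utterances of different durations.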
Next, we want to explore different flavors of attention mechanisms, which make the DNN focus on relevant parts of the input. These provide a way to quantify how much evidence has been collected about the speaker identity and the uncertainty of the obtained representation, which is a critical issue when making (Bayesian) decisions in SID.
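As a rough illustration of how attention can weight the input, the sketch below scores each frame against a query vector, turns the scores into weights with a softmax, and returns the weighted average of the frames. This is one simple flavor among many and is not taken from the project: the name `attentive_pooling` and the fixed `query` are hypothetical, and in a trained system the scoring function would be learned together with the rest of the network.

```python
import math

def attentive_pooling(frames, query):
    """Weighted average of frames, with weights given by a softmax over
    dot-product scores between each frame and a query vector.
    (Hypothetical helper; in practice the query would be learned.)"""
    if not frames:
        raise ValueError("need at least one frame")
    # One relevance score per frame.
    scores = [sum(q * x for q, x in zip(query, f)) for f in frames]
    # Numerically stable softmax: subtract the maximum before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Fixed-length output: weighted sum over time for each dimension.
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

frames = [[1.0, 2.0], [3.0, 4.0]]
pooled, weights = attentive_pooling(frames, query=[1.0, 0.0])
# The second frame scores higher against this query, so it gets more weight.
print(weights[1] > weights[0])
```

With an all-zero query every frame receives the same weight and the result reduces to plain mean pooling, so the attention weights can be read as a per-frame measure of how much each part of the input contributes.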
Finally, other approaches, such as using the raw signal (instead of features) or other advances that might arise, will also be explored for SID and related tasks.
To achieve our goals, we will start from theory, implement the proposed approaches and test them on public SID benchmarks such as the NIST SREs. The outcomes are intended to benefit both the scientific community and the speech processing industry.
The applicant, Dr. Alicia Lozano-Diez, is an excellent female researcher who completed her Ph.D. at Audias (Universidad Autonoma de Madrid, Spain), a respected research lab. The host group, Speech@FIT at Brno University of Technology (Czechia), has a top-class track record in speech processing research. We therefore expect the combination of researcher and host to boost the researcher's career and to benefit the host group (and its European industrial partners).

Field of science

Programme(s)

Topic(s)

MSCA-IF-2018 - Individual Fellowships

Call for proposal

H2020-MSCA-IF-2018

Funding scheme

MSCA-IF-EF-ST - Standard EF

Coordinator

VYSOKE UCENI TECHNICKE V BRNE

Net EU contribution

€ 120,817.20

Address

ANTONINSKA 548/1
601 90 Brno Stred
Czechia

Region

Česko Jihovýchod Jihomoravský kraj

Activity type

Higher or Secondary Education Establishments

Total cost