The ESPERANTO project pursues five main goals to develop speech processing applications and support the community.
- In WP2, the project addresses the specific challenges that limited resources pose to the development of reliable systems.
Seven corpora have been collected and have been publicly released or are in the process of being released:
* speech corpus for individuals with vocal disabilities
* corpus dedicated to Human Assisted Lifelong Learning Speaker Diarization
* corpus of Malay dialect speech with associated transcriptions
* collection of 13 hours of conversational speech in Sarawak Malay
* multi-dialectal Arabic Speech Corpus
* database of non-native English speech, curated primarily for developing pronunciation scoring systems
Additionally, the ESPERANTO consortium has been actively developing and publishing new approaches for under-resourced tasks and languages, in applications as diverse as speech-to-speech translation, speaker diarization, speaker recognition, voice pathology monitoring, automatic voice disorder detection, pronunciation scoring, spoken language understanding, emotion recognition and multi-channel distant speech processing.
- In WP3, ESPERANTO aims to develop automatic systems that integrate human-assisted learning.
A human interface for correcting speaker diarization output has been developed and publicly released.
A special day on data collection and annotation has been organised by LMU, involving LNE, UNIZAR, USFD, BUT, OMILIA, JHU, USM, UNIMAS, Elyadata, Phonexia, CONICET, CENATAV, UY1.
- In WP4, ESPERANTO partners have been addressing explainability and interpretability in speech applications.
The consortium has focused mainly on four applications: speaker verification, diarization, emotion recognition and speech-to-speech translation.
A first approach focused on extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations.
For speaker diarization, predicting speaker segments alone is not sufficient: additional paralinguistic information must also be included. Several partners have therefore worked on converting existing automatic outputs into interpretable cues that explain the automatic diarization.
Several articles on this work have been submitted to conferences taking place after the date of this report.
- In WP5, ongoing work is developing metrics, protocols and scenarios to evaluate different aspects of intelligent systems. More specifically, it aims to develop and assess protocols that evaluate systems involving a human in the loop, the ability of systems to cope with limited resources when transferring knowledge from a well-studied language to a low-resource one, and the level of explainability of systems.
To enable fair and reproducible benchmarking of human-in-the-loop speaker diarization, a simulation of a human expert has been implemented. This work has been published and publicly released under an open-source licence.
Four challenges have been organized, and databases, scripts and evaluation protocols have been released for them.
- WP6 aims to produce training material that fosters a new generation of speech scientists and engineers, and to support and coordinate the production of teaching material, tutorials and documentation for the software frameworks.
Four workshops and four two-week summer schools training young speech processing researchers (from master's students to postdoctoral researchers) have been organized, gathering more than 80 and more than 110 researchers from academia and industry, respectively.
The workshops have been great opportunities for ESPERANTO partners to collaborate with each other, as well as with institutions outside the consortium, and to establish promising new collaborations.
Software frameworks, tools and corpora have been supported or created (SpeechBrain, Kaldi, Hyperion, ATCO2, Lhotse, DiaPer, GTensorFSTs, CalibrationTutorial...).
More than 30 videos of lectures and presentations are available online.
Partners have carried out extensive dissemination and exploitation actions, including participation in European Researchers' Night events, presentations to young audiences (secondary and high school students), and numerous press releases in mainstream and specialized media.