Stream Learning for Multilingual Knowledge Transfer

Project Information

SELMA

Grant agreement ID: 957017

DOI

10.3030/957017

Project closed

EC signature date 3 July 2020

Start date 1 January 2021

End date 31 March 2024

Funded under

INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT)

Total cost

€ 3 452 506,25

EU contribution

€ 3 452 506,25

Coordinated by

DEUTSCHE WELLE
Germany

Deliverables

Initial progress report on continuous massive stream learning

Report describing the technologies developed so far as well as experimental results showing the quality of the respective methods

Intermediate progress report on speech and natural language processing

Report describing the improvements for the technologies, with experimental results on the respective benchmarks and comparison to third-party commercial ASR solutions

Final Prototype Report

Final report on prototyping, including the SELMA prototypes

Final Impact Report

Report describing the consortium’s dissemination, exploitation and communication activities during the entire project and focussing on exploitation

Final progress report on continuous massive stream learning

Report describing both the research and development of the technologies as well as the interface to other components

Initial Prototype Report

Initial report on the two use-case prototypes, UI, wireframes, requirements and user stories

Quality Assurance and Risk Assessment Plan

Report that incorporates details on the quality assurance processes adopted within SELMA

Intermediate Prototype Report

Report on the two use-case prototypes, UI, wireframes, requirements and user stories

Final Evaluation Report

Report describing all evaluation procedures of T5.1 and T5.2, demonstrating the progress that the individual components and the platform have made

Use Case Description and Requirements

Complete description of the primary SELMA use cases, personas, user stories and requirements

Final report on speech and natural language processing

Report describing advances in NLP technologies exploiting multilingual and user feedback transfer learning, and distributed learning

Interim Evaluation Report

Report describing the technical and end-user evaluations and further testing plans for the last project year

Intermediate progress report on continuous massive stream learning

Report describing the improvements of the technologies, with experimental results on the respective benchmarks

Initial progress report on speech and natural language processing

Report describing the language processing technologies developed in the work package and experimental results, including first experiments on multilingual and user feedback transfer learning and distributed learning

Impact Plan

Report describing the consortium's intentions regarding dissemination, exploitation and communication

Final Periodic Progress Report

Report covering all administrative and financial details for the final 18 months

Interim Impact Report

Report describing the consortium’s activities with regard to dissemination, exploitation and communication and outlining the plans for the last project year

Platform architecture and API documentation

Document describing the extended platform architecture for integrating the natural language processing components, collecting and storing editorial corrections, and integrating the continuous massive stream learning functionality

Evaluation Plan

Report describing the technical as well as the end-user testing scenarios to be carried out

Interim Periodic Progress Report

Report covering all administrative and financial details for the first 18 months

Initial release of post-editing and user feedback capabilities

Software modules documented and published in the project repositories for automatic post-editing and user-feedback

Final release of speech and natural language processing tools

Final release of the software components for integration in the SELMA prototype

Intermediate release of segmentation, summarization and news classification capabilities

Second release of software packages

Final release of continuous massive stream learning tools

Final release of the software components

Initial release of transcription, punctuation, translation and voice synthesis capabilities

Software modules documented and published in the project repositories

Initial release of segmentation, summarization and news classification capabilities

Software modules documented and published in the project repositories

Initial release of stream learning and entity linking capabilities

Software modules documented and published in the project repositories for stream learning and entity linking

Final platform release with full continuous massive stream learning capabilities

Software that includes all SELMA capabilities from D2.8 and D3.8 and the complete prototype interface from T1.3 and T1.4 will be published to the project repository and deployed for user evaluation

Demonstrator for use case one

Software that demonstrates the multilingual media monitoring use case, published to the project repository and deployed for user evaluation

Intermediate platform with continuous massive stream learning NLP capabilities

Software that extends the baseline platform with the functionality from D2.5, D2.6, D3.5 and D3.6, together with prototype interfaces that expose this functionality, will be published to the project repository and deployed for user evaluation

Demonstrator for use case two

Software that demonstrates the multilingual news production use case, published to the project repository and deployed for user evaluation

Intermediate release of stream learning and entity linking capabilities

Second release of the software packages for stream learning and entity linking

Intermediate release of post-editing and user feedback capabilities

Second release of the software packages for post-editing and user feedback

Initial platform release with the primary NLP pipeline

Software that integrates NLP modules from D2.2, D2.3, D3.2 and D3.3 and prototype interfaces that expose preliminary functionality will be published to the project repository and also deployed for user evaluation

Intermediate release of transcription, punctuation, translation and voice synthesis capabilities

Second release of software packages

Interim Data Management Plan

Update of D6.1

Final Data Management Plan

Final Report on data management, protection and IPR issues

Initial Data Management Plan

Report explaining how data in the platform is managed, also addressing data protection and access rights

Publications

ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks

Author(s): Marcely Zanon Boito, John Ortega, Hugo Riguidel, Antoine Laurent, Loïc Barrault, Fethi Bougares, Firas Chaabani, Ha Nguyen, Florentin Barbier, Souhir Gahbiche, Yannick Estève
Published in: IWSLT 2022, 2022
Publisher: IWSLT 2022

Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark

Author(s): Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Published in: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021, Page(s) 10ff
Publisher: NeurIPS

The Spoken Language Understanding Media Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools

Author(s): Gaëlle Laperrière, Valentin Pelloin, Antoine Caubrière, Salima Mdhaffar, Sahar Ghannay, Bassam Jabaian, Nathalie Camelin, Yannick Estève
Published in: 13th Language Resources and Evaluation Conference (LREC), 2022
Publisher: LREC

LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Author(s): Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Published in: Proc. Interspeech 2021, 1439-1443, doi: 10.21437/Interspeech.2021-556, 2021
Publisher: Interspeech 2021

Modèles neuronaux pré-appris par auto-supervision sur des enregistrements de parole en français (Neural models pre-trained by self-supervision on French speech recordings)

Author(s): Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Published in: Journées d'Études sur la Parole - JEP2022, 2022
Publisher: JEP 2022

Speech Resources in the Tamasheq Language

Author(s): Marcely Zanon Boito, Fethi Bougares, Florentin Barbier, Souhir Gahbiche, Loïc Barrault, Mickael Rouvier, Yannick Estève
Published in: 13th Language Resources and Evaluation Conference (LREC), 2022
Publisher: LREC

Priberam Labs at the 3rd Shared Task on SlavNER

Author(s): Pedro Ferreira, Rúben Cardoso, Afonso Mendes
Published in: Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing, 2021, Page(s) 86-92
Publisher: 8th Workshop on Balto-Slavic Natural Language Processing

LeBenchmark, un référentiel d’évaluation pour le français oral (LeBenchmark, an evaluation benchmark for spoken French)

Author(s): Hang Le, Sina Alisamir, Marco Dinarelli, Fabien Ringeval, Solène Evain, Ha Nguyen, Marcely Zanon Boito, Salima Mdhaffar, Ziyi Tong, Natalia Tomashenko, Titouan Parcollet, Alexandre Allauzen, Yannick Estève, Benjamin Lecouteux, François Portet, Solange Rossato, Didier Schwab, Laurent Besacier
Published in: Journées d'Études sur la Parole - JEP2022, 2022
Publisher: JEP 2022

Impact Analysis of the Use of Speech and Language Models Pretrained by Self-Supervision for Spoken Language Understanding

Author(s): Salima Mdhaffar, Valentin Pelloin, Antoine Caubrière, Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian, Nathalie Camelin, Yannick Estève
Published in: 13th Language Resources and Evaluation Conference (LREC), 2022
Publisher: LREC

Le benchmark MEDIA revisité : données, outils et évaluation dans un contexte d’apprentissage profond (The MEDIA benchmark revisited: data, tools and evaluation in a deep learning context)

Author(s): Gaëlle Laperrière, Valentin Pelloin, Antoine Caubrière, Salima Mdhaffar, Nathalie Camelin, Sahar Ghannay, Bassam Jabaian, Yannick Estève
Published in: Journées d'Études sur la Parole - JEP2022, 2022
Publisher: JEP

Where are we in semantic concept extraction for Spoken Language Understanding?

Author(s): Sahar Ghannay, Antoine Caubrière, Salima Mdhaffar, Gaëlle Laperrière, Bassam Jabaian, Yannick Estève
Published in: Speech and Computer: 23rd International Conference, SPECOM 2021, St. Petersburg, Russia, September 27–30, 2021, Proceedings, 2021
Publisher: SPECOM 2021

Metamodel Specialisation based Tool Extension

Author(s): Paulis Barzdins, Audris Kalnins, Edgars Celms, Janis Barzdins, Arturs Sprogis, Mikus Grasmanis, Sergejs Rikacovs, Guntis Barzdins
Published in: Baltic Journal of Modern Computing, Issue Vol.10, No. 1, 2022, Page(s) 17-35, ISSN 2255-8950
Publisher: University of Latvia
