CORDIS - EU research results

Stream Learning for Multilingual Knowledge Transfer

Periodic Reporting for period 1 - SELMA (Stream Learning for Multilingual Knowledge Transfer)

Reporting period: 2021-01-01 to 2022-06-30

Large amounts of multilingual data are all around us, and the volume is growing rapidly. Still, the potential to exploit these digital content streams with machine learning remains largely untapped. SELMA addresses this potential from two sides: by advancing language technologies from a research perspective, and by integrating concrete technological improvements into a platform that will, to a large extent, be available as open source to the public and the (media) industry. The overall aim is to build a deep learning platform using extreme analytics, transfer learning and advanced natural language processing technologies.
In the first period of the project, SELMA has already made good progress. SELMA OSS, the open-source platform, has been developed and is capable of processing large volumes of data (stress and scalability tests will be carried out in the second period). For the multilingual models, significant methodological improvements have been achieved in downstream tasks such as entity recognition and linking, topic labelling, clustering and summarization, and these have been integrated into the two main use cases (Media Monitoring and Media Production). For the Knowledge and Language Transfer objective, very good progress has been made in self-supervised learning. To collect user feedback for improving the language models, an end-to-end model for named entity recognition (NER) has been added to the framework.
With the development of the publicly available SELMA OSS platform, many “beyond state of the art” NLP achievements are already accessible to the public (https://selma-project.github.io/). In addition, many SELMA developments and outcomes have been integrated into the two use case platforms for news media monitoring and media production. The new models are expected to significantly improve the use of large data volumes through advanced analytics and NLP technology.
Figure: Screenshot of SELMA OSS, available at https://selma-project.github.io/