My Personal AI Mediator for Virtual MEETtings BetWEEN People

Periodic Reporting for period 1 - Meetween (My Personal AI Mediator for Virtual MEETtings BetWEEN People)

Reporting period: 2024-01-01 to 2025-06-30

The Meetween project was launched in response to a fundamental need: to make digital communication, and in particular virtual meetings, more inclusive, effective, and trustworthy. In recent years, online interaction has become an indispensable part of work, education, research, and public life. Yet, despite the ubiquity of videoconferencing tools, major barriers remain: linguistic diversity is often poorly supported, accessibility features are fragmented, and participants face persistent concerns around privacy, data protection, and the lack of transparency of the AI systems embedded in these platforms.

Against this backdrop, Meetween seeks to create an AI-powered mediation system that goes beyond conventional videoconferencing support. Its aim is to enable meetings where technology does not just passively transmit speech but actively mediates interaction in a way that is multilingual, multimodal, and ethically robust. The project combines the development of large multilingual models, the design of downstream applications such as real-time translation and summarisation, and the creation of intelligent assistants – Agentar and Butler – that can actively support meeting participants. At the same time, the project has placed equal emphasis on governance, transparency, and ethical safeguards, ensuring that innovation is developed in line with the principles of trustworthiness and accountability.

The overall objective is not merely technical excellence but impact: to provide Europe with a new generation of communication technologies that reflect its values of inclusivity, fairness, and respect for privacy. By demonstrating how AI can serve human needs while complying with evolving EU frameworks such as the AI Act and the Digital Decade strategy, Meetween aims to become a flagship example of how research can be translated into trustworthy innovation for society.

In its first reporting period, Meetween has made significant progress across its scientific and technical work. On the modelling side, the consortium released SpeechLMM and Mumospee as the project’s two cornerstone models. These are supported by the creation of the Mumospee V1 dataset, consisting of more than half a million hours of multilingual speech audio, and by transparent training pipelines that make the development process reproducible. The models have already been fine-tuned on eight downstream tasks, including spoken question answering, lip reading, multilingual summarisation, and translation, with results showing clear improvements over baseline systems.

Alongside model development, the project has invested heavily in benchmarking and evaluation. The SPEECHM platform was created to provide a centralised, transparent infrastructure for evaluating performance across multiple tasks and languages, while the MCIF benchmark was introduced as a resource for instruction-following evaluation in multilingual and multimodal contexts. These resources not only strengthen the project’s own results but also offer tools for the wider research community.
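
To illustrate the kind of task-level scoring such an evaluation infrastructure automates (the actual SPEECHM interface and MCIF protocol are not shown here), the following minimal sketch uses the Hugging Face evaluate library; the metric choices and toy inputs are assumptions made purely for illustration:

    # Illustrative only: WER for speech recognition and BLEU for translation
    # are example metrics, not SPEECHM's actual API or configuration.
    import evaluate

    wer = evaluate.load("wer")          # word error rate for transcription quality
    bleu = evaluate.load("sacrebleu")   # BLEU for translation quality

    wer_score = wer.compute(
        predictions=["the meeting starts at ten"],
        references=["the meeting starts at ten o'clock"],
    )
    bleu_score = bleu.compute(
        predictions=["Das Treffen beginnt um zehn Uhr."],
        references=[["Das Treffen beginnt um zehn Uhr."]],
    )
    print(f"WER: {wer_score:.2f}  BLEU: {bleu_score['score']:.1f}")

A centralised platform essentially runs this kind of scoring consistently across many tasks, languages, and model versions, so that results remain comparable and transparent.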

A second major area of achievement lies in human factors and ethics. Model cards were produced for all released models, documenting strengths, limitations, and risks. Human factor scenarios were developed to test how users experience issues such as bias, hallucination, or overconfidence in model output. Novel privacy frameworks such as confidentiality circles were designed, and technical advances were made in expressive speech technologies, lip-synchronised avatars, and multi-speaker recognition. These efforts ensure that scientific progress is not detached from social responsibility but aligned with Europe’s ethical and legal frameworks.

Finally, progress has been made on the application layer, where Agentar and Butler are being developed as meeting-native AI assistants. Structured interviews with potential users have provided detailed requirements, and early integration work has begun with open-source videoconferencing platforms. These prototypes, once fully developed, will serve as demonstrators of how the project’s research can be translated into tangible tools that enhance real-world virtual meetings.

Meetween’s results to date already extend significantly beyond what was previously available in the field of AI for human communication. The project has produced SpeechLMM, a multilingual and multimodal model capable of handling tasks such as speech recognition, translation, summarisation, and question answering across multiple languages and modalities. Alongside this, Mumospee has been developed to address a dimension that is often overlooked: the ability to capture and reproduce emotional tone in speech. Together, these models represent a step change from conventional large language models or single-modality speech models, offering capabilities that are richer, more adaptable, and better suited to real-world meeting environments.

Importantly, these technical advances are accompanied by innovations in how models are trained and deployed. The use of adapter modules and parameter-efficient fine-tuning techniques such as LoRA allows for faster, less resource-intensive adaptation of models to new tasks and languages, which opens the door to scalability in contexts where computational power and data are limited. The project has also placed a strong emphasis on open science, releasing training data, benchmarks, and evaluation frameworks through platforms such as Hugging Face so that results are not only reproducible but also available for wider uptake by research and industry.
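
As a minimal sketch of what parameter-efficient adaptation with LoRA looks like in practice (the base checkpoint, target modules, and hyperparameters below are illustrative placeholders, not the project’s actual SpeechLMM or Mumospee training recipe), low-rank adapters can be attached to a pretrained multilingual model with the Hugging Face peft library:

    # Hypothetical example: the base model and LoRA settings are assumptions
    # chosen for illustration, not the Meetween configuration.
    from transformers import AutoModelForSeq2SeqLM
    from peft import LoraConfig, TaskType, get_peft_model

    base = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

    lora_config = LoraConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        r=16,                                  # rank of the low-rank update matrices
        lora_alpha=32,                         # scaling applied to the adapter output
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    )

    model = get_peft_model(base, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of weights are trainable

Because only the small adapter matrices are updated during fine-tuning, adapting a model to a new task or language is markedly cheaper than full retraining, which is what makes this approach attractive where compute and data are limited.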

Beyond the models themselves, Meetween has broken new ground in the design of privacy-aware and human-centred AI systems. The concept of confidentiality circles, for instance, introduces a new way to protect sensitive organisational information during meetings, while advances in expressive text-to-speech, lip synchronisation, and avatar-based representation push the frontier of how AI agents can support human interaction authentically. Through these achievements, the project is not just adding incremental improvements to the state of the art but is setting new standards for how trustworthy multimodal AI should be designed and deployed in Europe and beyond.