CORDIS - EU research results

Democratize Trustworthy and Efficient Large Language Model Technology for Europe

Periodic Reporting for period 1 - TrustLLM (Democratize Trustworthy and Efficient Large Language Model Technology for Europe)

Reporting period: 2023-11-01 to 2025-04-30

The TrustLLM project, funded by the European Union, will develop trustworthy European large language models (LLMs) that will cover a range of underrepresented languages. The main objective for TrustLLM is the development of an open, trustworthy and factual LLM, initially targeting the Germanic languages. This will create the foundation for an advanced open ecosystem for next generation modular and extensible European trustworthy, sustainable, and democratised LLMs. The focus on Germanic languages can serve as a blueprint for future activities in other families of languages. The TrustLLM project and the surrounding ecosystem will enable, support, and improve context-aware human-machine interaction in a wide range of applications.

The consortium combines unique expertise and practical experience in building LLMs with leading NLP researchers and with organizations that work on transferring the technology to companies and end users.

Data Infrastructure & Processing
TrustLLM built a robust infrastructure for Germanic language models, creating multilingual datasets (Danish, Dutch, Faroese, German, Icelandic, Norwegian, Swedish) with strong privacy safeguards. The custom “TrustLLM Trove” framework enhanced open-source tools with quality filters and language ID for Germanic languages. Efficient text filtering and a TDM-compliant HTML pipeline ensured legal, high-quality data extraction.

Model Development & Training
A 7.8B parameter multilingual model was trained on 2.3T tokens across 17 Germanic variants, outperforming models like Llama-2 7B. LoRA-based adapters enabled efficient tuning for under-resourced languages like Icelandic and Faroese.
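As a rough illustration of why LoRA-style adapters are cheap to tune, the sketch below adds a low-rank update to a frozen weight matrix. It is a minimal, generic sketch and not the project's training code: the dimensions, the scaling factor `alpha`, and the helper names `matmul`, `lora_forward`, and `count_params` are all assumptions made for this example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha, rank):
    """Compute y = x W + (alpha / rank) * (x A) B.

    W is the frozen base weight; only the small matrices A and B are trained.
    """
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [[p + scale * q for p, q in zip(br, ur)] for br, ur in zip(base, update)]


def count_params(d_in, d_out, rank):
    """Return (frozen base parameters, trainable adapter parameters)."""
    return d_in * d_out, rank * (d_in + d_out)


# Hypothetical toy sizes, chosen only for illustration.
d_in, d_out, rank, alpha = 8, 8, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]    # frozen
a = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]  # trainable
b = [[0.0] * d_out for _ in range(rank)]                              # trainable, zero-initialised
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

y = lora_forward(x, w, a, b, alpha, rank)
frozen, trainable = count_params(d_in, d_out, rank)
```

Because `b` starts at zero, the adapted layer initially reproduces the base model's output, and the number of trainable parameters grows with `rank * (d_in + d_out)` rather than `d_in * d_out`. That gap widens sharply at the layer sizes of a multi-billion-parameter model.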

Evaluation & Applications
The EuroEval platform standardized evaluation across 7 tasks and 8 languages. Bias datasets for Danish and Dutch were introduced. Real-world tools include BijsluiterBot (medical info), BookBot (youth reading), Svarkur (Icelandic Q&A), and apps for automotive and accessibility.

Technical Innovation & Trust
Innovations include RAG-Ex for explainability, retrieval-enhanced transformer insights, tool-augmented reasoning, and dynamic tokenization for morphologically rich languages. These advances support transparent, reliable AI aligned with European values.
Results beyond the state of the art are packaged in the form of publications, namely peer-reviewed conference papers and journal articles. At the moment, the project has produced no further results beyond the state of the art.