Democratize Trustworthy and Efficient Large Language Model Technology for Europe

Project Information

TrustLLM

Grant agreement ID: 101135671

DOI

10.3030/101135671

EC signature date: 20 October 2023

Start date: 1 November 2023

End date: 31 October 2026

Funded under

Digital, Industry and Space

Total cost

€ 6 929 702,50

EU contribution

€ 6 929 701,00


Coordinated by

LINKOPINGS UNIVERSITET
Sweden

Periodic Reporting for period 1 - TrustLLM (Democratize Trustworthy and Efficient Large Language Model Technology for Europe)

Reporting period: 2023-11-01 to 2025-04-30

The TrustLLM project, funded by the European Union, will develop trustworthy European large language models (LLMs) that cover a range of underrepresented languages. The main objective of TrustLLM is to develop an open, trustworthy and factual LLM, initially targeting the Germanic languages. This will lay the foundation for an advanced open ecosystem of next-generation, modular and extensible European LLMs that are trustworthy, sustainable, and democratised. The focus on Germanic languages can serve as a blueprint for future activities in other language families. The TrustLLM project and its surrounding ecosystem will enable, support, and improve context-aware human-machine interaction in a wide range of applications.

The consortium combines unique expertise and practical experience in building LLMs with leading NLP researchers and with organizations that work on transferring the technology to companies and end-users.

Data Infrastructure & Processing
TrustLLM built a robust data infrastructure for Germanic language models, creating multilingual datasets (Danish, Dutch, Faroese, German, Icelandic, Norwegian, Swedish) with strong privacy safeguards. The custom “TrustLLM Trove” framework extends open-source tools with quality filters and language identification for Germanic languages. Efficient text filtering and an HTML extraction pipeline compliant with text and data mining (TDM) rules ensure that the extracted data is both legally usable and of high quality.
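As an illustration of what a heuristic quality filter can look like, here is a minimal sketch. It is not the TrustLLM Trove implementation: the function names, the thresholds (`min_words`, `min_alpha`) and the sample documents are all hypothetical and chosen only to show the general idea of dropping very short or mostly non-alphabetic text.

```python
def alpha_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def keep_document(text: str, min_words: int = 5, min_alpha: float = 0.7) -> bool:
    """Hypothetical quality gate: enough words and mostly alphabetic content."""
    return len(text.split()) >= min_words and alpha_ratio(text) >= min_alpha


# Made-up sample documents, for illustration only.
docs = [
    "Det här är ett kort exempel på svensk text som borde klara filtret.",
    "$$$ 404 >>> ### 12345 ||| ---",  # enough tokens, but no letters
    "Too short",                      # too few words
]
kept = [d for d in docs if keep_document(d)]
print(len(kept), "of", len(docs), "documents kept")
```

A production pipeline would typically combine several such signals with language identification and deduplication; the thresholds here are placeholders rather than tuned values.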

Model Development & Training
A 7.8B-parameter multilingual model was trained on 2.3T tokens across 17 Germanic language variants, outperforming models such as Llama-2 7B. LoRA-based adapters enabled efficient fine-tuning for under-resourced languages such as Icelandic and Faroese.
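To see why LoRA-style adapters are cheap to tune, recall that LoRA freezes a weight matrix of shape d_out x d_in and learns a low-rank update B·A, where B is d_out x r and A is r x d_in. The sketch below only does the parameter arithmetic; the layer size (4096 x 4096) and rank (8) are assumed example values, not figures reported by the project.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Assumed example dimensions, not taken from the TrustLLM model.
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)
adapter = lora_trainable_params(d_out, d_in, rank)
fraction = adapter / full
print(f"full: {full}, adapter: {adapter}, fraction: {fraction:.4%}")
```

Under these assumed dimensions the adapter trains well under one percent of the parameters of the layer it modifies, which is what makes per-language adapters practical for smaller languages.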

Evaluation & Applications
The EuroEval platform standardized evaluation across 7 tasks and 8 languages. Bias datasets for Danish and Dutch were introduced. Real-world tools include BijsluiterBot (medical information), BookBot (youth reading), Svarkur (Icelandic question answering), and applications for the automotive and accessibility domains.

Technical Innovation & Trust
Innovations include RAG-Ex for explainability, insights into retrieval-enhanced transformers, tool-augmented reasoning, and dynamic tokenization for morphologically rich languages. These advances support transparent, reliable AI aligned with European values.
