CORDIS - EU research results

Democratize Trustworthy and Efficient Large Language Model Technology for Europe

CORDIS provides links to public deliverables and publications of HORIZON projects.

Links to deliverables and publications of FP7 projects, as well as links to some specific result types such as datasets and software, are dynamically retrieved from OpenAIRE.

Deliverables

Language-specific adapters for multilingual LLMs

Pre-trained adapters for Germanic languages, to be used with existing LLMs and the TrustLLM models.
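
As a toy illustration of the general adapter idea behind this deliverable (a small trainable bottleneck added residually to a frozen model's hidden states), the sketch below uses invented dimensions and weights; it is an assumption about the technique, not the released adapters:

```python
def relu(xs):
    """Elementwise ReLU on a plain list."""
    return [max(0.0, x) for x in xs]

def matvec(W, x):
    """Matrix-vector product over nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter(hidden, W_down, W_up):
    """hidden + W_up @ relu(W_down @ hidden): the residual bottleneck."""
    delta = matvec(W_up, relu(matvec(W_down, hidden)))
    return [h + d for h, d in zip(hidden, delta)]

hidden = [1.0, 2.0]    # frozen model's hidden state (dim 2)
W_down = [[0.5, 0.5]]  # down-projection to a 1-dim bottleneck
W_up = [[1.0], [0.0]]  # up-projection back to dim 2
print(adapter(hidden, W_down, W_up))  # [2.5, 2.0]
```

Only the small `W_down`/`W_up` matrices are trained per language; the base model stays frozen.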

Data formatting pipeline

Pipeline for formatting data in preparation for LLM training.

Initial training code

First version of the parallel training code for training LLMs on European HPC systems.

Multi-dimensional evaluation metric for text generation

An evaluation process, usable in an online environment, that tests generated texts for reliability, accuracy, fluency, and other qualities.
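
A multi-dimensional metric of this kind can be sketched as a weighted aggregate over per-dimension scores. The dimensions and weights below are invented for illustration, not the project's actual metric definition:

```python
# Illustrative weights; a real metric would calibrate these empirically.
WEIGHTS = {"reliability": 0.4, "accuracy": 0.4, "fluency": 0.2}

def aggregate(scores):
    """Weighted mean over evaluation dimensions (weights sum to 1)."""
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

# One generated text, scored per dimension in [0, 1].
print(aggregate({"reliability": 0.9, "accuracy": 0.8, "fluency": 1.0}))
```

Keeping the per-dimension scores alongside the aggregate makes failures interpretable: a fluent but inaccurate output scores differently from a disfluent but faithful one.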

Methods for factual correctness based on retriever modelling

Report and software framework (version 1) for improving LLM factual correctness based on retriever modelling.
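
The core idea of retrieval-based grounding can be shown with a toy stand-in: select the support passage with the highest word overlap with a claim. Everything below (tokenization, data, scoring) is illustrative, not the deliverable's method:

```python
def tokens(text):
    """Lowercased word set, with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(claim, passages):
    """Return the passage sharing the most words with the claim."""
    return max(passages, key=lambda p: len(tokens(claim) & tokens(p)))

passages = ["Paris is the capital of France.", "Berlin is a city in Germany."]
print(retrieve("What is the capital of France?", passages))
```

A real system would replace word overlap with a learned retriever and feed the retrieved evidence to the LLM, so its answer can be checked against a source.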

Quality filtering and deduplication pipeline

Pipeline for quality filtering and deduplication of data in preparation for LLM training.
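
A minimal sketch of the deduplication step, assuming hash-based exact matching after whitespace/case normalization; the released pipeline may use more sophisticated near-duplicate detection:

```python
import hashlib

def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return " ".join(text.lower().split())

def deduplicate(docs):
    """Drop exact duplicates (after normalization), keeping first occurrences."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = ["Hello  World", "hello world", "Goodbye"]
print(deduplicate(corpus))  # ['Hello  World', 'Goodbye']
```

Hashing keeps memory proportional to the number of unique documents rather than total text size, which matters at pre-training scale.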

Alignment data

Multilingual datasets for instruction fine-tuning.

Bias Dataset

An evaluation dataset quantifying the models’ potential biases toward minority groups.

Benchmarking Platform

Open-source software package allowing for automatic benchmarking using the evaluation datasets developed in the project.
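
A hypothetical sketch of what automatic benchmarking over labelled evaluation datasets can look like; the interface below is invented for illustration and is not the project's actual package API:

```python
def run_benchmark(model, datasets):
    """Run a model callable over (prompt, gold) examples; report accuracy per dataset."""
    results = {}
    for name, examples in datasets.items():
        correct = sum(model(prompt) == gold for prompt, gold in examples)
        results[name] = correct / len(examples)
    return results

# Stand-in "model": a trivial callable instead of an actual LLM.
uppercaser = lambda prompt: prompt.upper()
data = {"upper-case": [("abc", "ABC"), ("xy", "XY"), ("q", "z")]}
print(run_benchmark(uppercaser, data))
```

Separating the model callable from the datasets lets the same harness score any model on any of the project's evaluation sets.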

Germanic Language Modelling Evaluation Dataset

An evaluation dataset quantifying the models’ general Germanic linguistic capabilities.

Communication and dissemination toolkit

Printed and digital material for communication and dissemination, for example flyers, posters, social media posts, and videos.

Project Handbook

Outline of planned management procedures, tools, project roles and responsibilities.

IPR Management Plan

Detailed IPR management plan.

Strategic plan for communication and dissemination

Initial plan for strategic communication and dissemination, to be updated yearly (internally).

Design Five Use Cases

A report detailing the design of each of the use cases.

Data Management Plan, V2

Detailed data management plan, including the plans for open-source and open-access publishing, V2.

Data Management Plan

Detailed data management plan, including the plans for open-source and open-access publishing.

Publications

How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?

Authors: Ehsan Doostmohammadi, Oskar Holmström, Marco Kuhlmann
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.FINDINGS-EMNLP.367

Ice and Fire: Dataset on Sentiment, Emotions, Toxicity, Sarcasm, Hate speech, Sympathy and More in Icelandic Blog Comments

Authors: Steinunn Rut Friðriksdóttir, Annika Simonsen, Atli Snær Ásmundsson, Guðrún Lilja Friðjónsdóttir, Anton Karl Ingason, Vésteinn Snæbjarnarson, Hafsteinn Einarsson
Publisher: ELRA and ICCL

A Human Perspective on GPT-4 Translations: Analysing Faroese to English News and Blog Text Translations

Authors: Annika Simonsen, Hafsteinn Einarsson
Publisher: European Association for Machine Translation (EAMT)

FoQA: A Faroese Question-Answering Dataset

Authors: Annika Simonsen, Dan Saattrup Nielsen, Hafsteinn Einarsson
Published in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), 2025

Tokenizer Choice For LLM Training: Negligible or Crucial?

Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Buschhoff, Charvi Jain, Alexander Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte
Published in: Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.FINDINGS-NAACL.247

Rethinking Low-Resource MT: The Surprising Effectiveness of Fine-Tuned Multilingual Models in the LLM Age

Authors: Barbara Scalvini, Iben Nyholm Debess, Annika Simonsen, Hafsteinn Einarsson
Publisher: University of Tartu Library

Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?

Authors: Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas Flores-Herr, Jens Lehmann, Michael Fromm, Mehdi Ali
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.EMNLP-MAIN.1159

From text to knowledge graph: Comparing relation extraction methods in a practical context

Authors: Bakker, R. M., & Di Scala, D. L.
Published in: First International Workshop on Generative Neuro-Symbolic AI, co-located with ESWC, Vol. 4, p. 7, 2024
Publisher: CEUR-WS

Memory and Bandwidth are All You Need for Fully Sharded Data Parallel

Authors: Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim
Published in: ICML'24 Workshop on Advancing Neural Network Training (WANT), 2025
Publisher: arXiv
DOI: 10.48550/ARXIV.2504.03655

WikiQA-IS: Assisted Benchmark Generation and Automated Evaluation of Icelandic Cultural Knowledge in LLMs

Authors: Þórunn Arnardóttir, Elías Bjartur Einarsson, Garðar Ingvarsson Juto, Þorvaldur Páll Helgason, Hafsteinn Einarsson
Published in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell

Authors: Barbara Scalvini, Annika Simonsen, Iben Nyholm Debess, Hafsteinn Einarsson
Publisher: University of Tartu Library

How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters

Authors: Romina Oji, Jenny Kunz
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025
Publisher: University of Tartu Library

Do Multilingual Large Language Models Mitigate Stereotype Bias?

Authors: Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, Lucie Flek
Published in: 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.C3NLP-1.6

Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT

Authors: Jenny Kunz
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025
Publisher: University of Tartu Library

Hotter and Colder: A New Approach to Annotating Sentiment, Emotions, and Bias in Icelandic Blog Comments

Authors: Steinunn Rut Friðriksdóttir, Dan Saattrup Nielsen, Hafsteinn Einarsson
Publisher: University of Tartu Library

The Impact of Language Adapters in Cross-Lingual Transfer for NLU

Authors: Jenny Kunz, Oskar Holmström
Published in: Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024), 2024
Publisher: Association for Computational Linguistics

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks

Authors: Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025

Ontology Learning from Text: an Analysis on LLM Performance

Authors: Bakker, R. M., Di Scala, D. L., & de Boer, M. H. T.
Published in: Proceedings of the 3rd NLP4KGC International Workshop on Natural Language Processing for Knowledge Graph Creation, co-located with SEMANTiCS, pp. 17-19, 2024
Publisher: CEUR-WS
