CORDIS - EU research results

Democratize Trustworthy and Efficient Large Language Model Technology for Europe

CORDIS provides links to the public deliverables and publications of HORIZON framework programme projects.

Links to deliverables and publications of Seventh Framework Programme projects, as well as links to some specific result types such as datasets and software, are retrieved dynamically from OpenAIRE.

Deliverables

Language-specific adapters for multilingual LLMs

Pre-trained adapters for Germanic languages, to be used with existing LLMs and the TrustLLM models.

Data formatting pipeline

Pipeline for formatting data as a preparation for LLM training.
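The deliverable's actual schema is not specified here, but a common convention such a formatting step might follow is JSON Lines: one JSON object per document, with the text plus minimal metadata. A minimal sketch, with illustrative field names only:

```python
# Minimal sketch: formatting raw documents as JSON Lines (one JSON
# object per line), a common input format for LLM training pipelines.
# The field names ("id", "text", "source") are assumptions for
# illustration, not the project's actual schema.
import json

def to_jsonl(docs: list[dict]) -> str:
    """Serialize documents to JSON Lines, one record per line."""
    lines = []
    for i, doc in enumerate(docs):
        record = {
            "id": i,
            "text": doc["text"].strip(),          # trim stray whitespace
            "source": doc.get("source", "unknown"),
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

sample = [{"text": "  Ett exempel.  ", "source": "web"},
          {"text": "Another document."}]
print(to_jsonl(sample))
```

JSON Lines keeps each record independently parseable, which makes the output easy to shard and stream during large-scale training.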

Initial training code

First version of the parallel training code for training LLMs on European HPC systems.

Multi-dimensional evaluation metric for text generation

An evaluation process, usable in an online environment, that tests generated texts for reliability, accuracy, fluency, and other dimensions.

Methods for factual correctness based on retriever modelling

Report and software framework (Version 1) for improving LLM factual correctness through retriever-based modelling.

Quality filtering and deduplication pipeline

Pipeline for quality filtering and deduplication of data in preparation for LLM training.
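The pipeline's internals are not detailed here, but the simplest deduplication stage most corpus pipelines include is exact-duplicate removal by hashing normalized text (real pipelines typically add a fuzzy stage such as MinHash on top). A minimal sketch of the exact stage only:

```python
# Minimal sketch of exact-duplicate removal for a text corpus:
# hash a normalized form of each document and keep the first
# occurrence of each hash. Fuzzy/near-duplicate detection (e.g.
# MinHash) is out of scope for this illustration.
import hashlib

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(text.lower().split())

def dedup_exact(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

corpus = ["Hello  World", "hello world", "Goodbye"]
print(dedup_exact(corpus))  # the two near-identical greetings collapse to one
```

Hashing the normalized text rather than storing it keeps the seen-set memory footprint constant per document, which matters at web-corpus scale.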

Alignment data

Multilingual datasets for instruction fine-tuning.

Bias Dataset

An evaluation dataset quantifying the models’ potential biases toward minority groups.

Benchmarking Platform

Open-source software package allowing for automatic benchmarking using the evaluation datasets developed in the project.

Germanic Language Modelling Evaluation Dataset

An evaluation dataset quantifying the models’ general Germanic linguistic capabilities.

Communication and dissemination toolkit

Printed and digital material for communication and dissemination, for example flyers, posters, social media posts, and videos.

Project Handbook

Outline of planned management procedures, tools, project roles and responsibilities.

IPR Management Plan

Detailed IPR management plan

Strategic plan for communication and dissemination

Initial plan for strategic communication and dissemination, to be updated yearly (internally).

Design Five Use Cases

A report detailing the design for each of the use cases.

Data Management Plan, V2

Detailed data management plan, including the plans for open source and open access publishing, V2

Data Management Plan

Detailed data management plan, including the plans for open source and open access publishing

Publications

How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs?

Authors: Ehsan Doostmohammadi, Oskar Holmström, Marco Kuhlmann
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.FINDINGS-EMNLP.367

FoQA: A Faroese Question-Answering Dataset

Authors: Annika Simonsen, Dan Saattrup Nielsen, Hafsteinn Einarsson
Published in: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), 2025

Tokenizer Choice For LLM Training: Negligible or Crucial?

Authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Buschhoff, Charvi Jain, Alexander Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte
Published in: Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.FINDINGS-NAACL.247

Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?

Authors: Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas Flores-Herr, Jens Lehmann, Michael Fromm, Mehdi Ali
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.EMNLP-MAIN.1159

From text to knowledge graph: Comparing relation extraction methods in a practical context

Authors: Bakker, R. M., & Di Scala, D. L.
Published in: First International Workshop on Generative Neuro-Symbolic AI, co-located with ESWC, Vol. 4, p. 7, 2024
Publisher: CEUR-WS

Memory and Bandwidth are All You Need for Fully Sharded Data Parallel

Authors: Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim
Published in: ICML'24 Workshop on Advancing Neural Network Training (WANT), 2025
Publisher: arXiv
DOI: 10.48550/ARXIV.2504.03655

How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters

Authors: Romina Oji, Jenny Kunz
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025
Publisher: University of Tartu Library

Do Multilingual Large Language Models Mitigate Stereotype Bias?

Authors: Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, Lucie Flek
Published in: 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.C3NLP-1.6

Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT

Authors: Jenny Kunz
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025
Publisher: University of Tartu Library

The Impact of Language Adapters in Cross-Lingual Transfer for NLU

Authors: Jenny Kunz, Oskar Holmström
Published in: Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024), 2024
Publisher: Association for Computational Linguistics

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks

Authors: Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp
Published in: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), 2025

Ontology Learning from Text: an Analysis on LLM Performance

Authors: Bakker, R. M., Di Scala, D. L., & de Boer, M. H. T.
Published in: Proceedings of the 3rd NLP4KGC International Workshop on Natural Language Processing for Knowledge Graph Creation, co-located with Semantics, pp. 17-19, 2024
Publisher: CEUR-WS
