Objective
Large Language Models (LLMs) are increasingly leveraged as evaluators of machine-generated text, a paradigm known as "LLMs-as-judges." While this approach offers flexibility and typically strong performance, its reliability remains inconsistent and poorly understood. Strong generative performance does not guarantee reliable evaluation, and the mechanisms linking these two capabilities remain opaque. Without systematic validation, current evaluation practices risk being blind to misleading and factually incorrect content and misrepresenting system capabilities.
GenEval addresses this challenge by investigating the fundamental relationship between generation and evaluation in LLMs, developing novel representation-based metrics, and predicting LLMs' evaluation reliability across tasks and models. We will analyze LLMs at a mechanistic level, identifying circuits and representations that underlie generative and evaluative behaviors. This understanding will enable the design of new evaluation metrics that directly exploit LLMs' internal representations, providing interpretable, efficient, and robust alternatives to existing approaches. Finally, GenEval will develop predictive tools to estimate when an LLM is likely to be a reliable evaluator even in the absence of human judgment data, supporting informed model selection and human-in-the-loop evaluation.
By integrating mechanistic insights with practical evaluation methods, GenEval will deliver both theoretical advances and applied tools.
This action is made possible by combining the scientific expertise of Prof. Horacio Saggion, an internationally recognized expert in Natural Language Generation, with that of the researcher, who has a strong background in evaluation, Natural Language Processing, and Machine Learning. The action will develop impactful technology and provide the researcher with the training needed to become independent and strengthen her academic profile.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.
- natural sciences > computer and information sciences > data science > natural language processing
- natural sciences > computer and information sciences > artificial intelligence > machine learning
Keywords
Project’s keywords as indicated by the project coordinator. Not to be confused with the EuroSciVoc taxonomy (Fields of science)
Programme(s)
Multi-annual funding programmes that define the EU’s priorities for research and innovation.
- HORIZON.1.2 - Marie Skłodowska-Curie Actions (MSCA)
MAIN PROGRAMME
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Funding Scheme
Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.
HORIZON-TMA-MSCA-PF-EF - HORIZON TMA MSCA Postdoctoral Fellowships - European Fellowships
Call for proposal
Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.
HORIZON-MSCA-2025-PF
Coordinator
Net EU financial contribution. The sum of money that the participant receives, minus the EU contribution to its linked third party. It takes into account how the EU financial contribution is distributed between the project's direct beneficiaries and other types of participants, such as third-party participants.
08002 Barcelona
Spain
The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.