Periodic Reporting for period 3 - AVeriTeC (Automated Verification of Textual Claims)
Reporting period: 2024-01-01 to 2025-06-30
Research in automated verification of textual claims is at an early stage. Existing methods either assess the truthfulness of a claim without considering evidence, or handle very simple claims such as “UK has 3.2 million EU immigrants” that require the retrieval of a single factoid from a knowledge base. While useful, these methods fall short: claims are often more complex, and taking evidence into account is necessary for the verdicts to be credible.
AVeriTeC will transform automated verification by enabling the verification of more complex claims than previously attempted, such as “the United Kingdom has ten times Italy’s number of immigrants”, which require multiple pieces of evidence. We will achieve this by developing methods able to generate multiple questions per claim, retrieve answers from both knowledge bases and textual sources, and combine them into verdicts. As these tasks are interdependent, we will develop novel machine learning approaches able to handle them jointly, so that the verdicts are accompanied by suitable justifications in the form of questions and answers. Since the latter are formulated in natural language, the process followed by the models is explainable to users, and the evidence itself can be useful even if the overall verdict is incorrect.
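The question-answer-based justifications described above can be pictured as a simple record structure. The sketch below is illustrative only: the class names, fields and verdict labels are assumptions made for this example, not the project's released data format.

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch only: the class names, fields and verdict labels below are
# assumptions made for this example, not the project's released data format.

VERDICTS = ("Supported", "Refuted", "Not Enough Evidence", "Conflicting Evidence")

@dataclass
class QAPair:
    """One question generated for a claim, with an answer retrieved as evidence."""
    question: str
    answer: str
    source: str  # where the answer came from (knowledge base entry or web page)

@dataclass
class ClaimVerdict:
    """A claim, its question-answer justification, and the resulting verdict."""
    claim: str
    evidence: List[QAPair]
    verdict: str  # one of VERDICTS

# The complex claim from the text above would need at least two answers before a
# verdict can be reached (figures and verdict left as placeholders, not real data).
example = ClaimVerdict(
    claim="the United Kingdom has ten times Italy's number of immigrants",
    evidence=[
        QAPair("How many immigrants does the United Kingdom have?", "...", "..."),
        QAPair("How many immigrants does Italy have?", "...", "..."),
    ],
    verdict="...",  # placeholder: filled in once the evidence has been combined
)
```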
Beyond developing novel methods and creating publicly available evaluation resources, AVeriTeC will establish verification of textual claims as a real-world challenge to stimulate progress in natural language processing, machine learning and related fields.
- We have published the first large-scale dataset for automated verification of claims involving both tabular and textual information, which we named FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information. This dataset consists of 87,026 claims verified against evidence from Wikipedia, and we developed a baseline approach for it. Using this dataset, we organized a shared task in which participants improved upon the baseline we had presented, and the dataset has spurred further research on the topic.
- We developed a novel approach for automated verification of textual claims against evidence that constructs proofs in order to predict whether the claim is supported or refuted. Our approach, which we refer to as ProoFVer, uses natural logic, a paradigm that combines ideas from computer science (finite state automata) with linguistic operators, enabling the comparison of two pieces of text (a simplified sketch of this idea appears after this list). Apart from achieving state-of-the-art results in terms of accuracy, ProoFVer exhibits superior robustness in the face of spurious evidence, as well as improved explainability of its predictions, as evaluated by human subjects. We extended this idea further in the context of evidence retrieval, where we adapted the approach of ProoFVer to assess whether the retrieved evidence is sufficient.
- We published an up-to-date survey of the field of automated verification that covers the most recent developments at the time of writing; it has gathered 134 citations in 18 months.
- We developed a novel method that improves the robustness of models judging whether one piece of text entails another. This new method, referred to as minimax training, relies on automatically identifying the most important examples to train on and focusing the model on them (see the sketch after this list).
- The ProoFVer method achieved results beyond the state of the art in terms of robustness and explainability on fact-checking benchmarks.
- The minimax training method improved upon state-of-the-art methods for robustness in textual entailment, and we also demonstrated its applicability to other tasks.
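To illustrate the natural logic idea behind ProoFVer mentioned above, the sketch below executes a sequence of natural logic operators on a small finite state automaton whose final state is the verdict. The operator set follows MacCartney-style natural logic, but the transition table is a simplified assumption made for illustration, not the exact automaton used in ProoFVer.

```python
# Simplified, illustrative execution of a natural logic proof on a finite state
# automaton. The transition table is a toy assumption, not ProoFVer's actual one.

OPERATORS = ("EQUIVALENCE", "FORWARD_ENTAILMENT", "REVERSE_ENTAILMENT",
             "NEGATION", "ALTERNATION", "INDEPENDENCE")

# Toy transition table: current state x operator -> next state.
TRANSITIONS = {
    "SUPPORTED": {
        "EQUIVALENCE": "SUPPORTED",
        "FORWARD_ENTAILMENT": "SUPPORTED",
        "REVERSE_ENTAILMENT": "NOT_ENOUGH_INFO",
        "NEGATION": "REFUTED",
        "ALTERNATION": "REFUTED",
        "INDEPENDENCE": "NOT_ENOUGH_INFO",
    },
    "REFUTED": {
        "EQUIVALENCE": "REFUTED",
        "FORWARD_ENTAILMENT": "REFUTED",
        "REVERSE_ENTAILMENT": "NOT_ENOUGH_INFO",
        "NEGATION": "SUPPORTED",
        "ALTERNATION": "REFUTED",
        "INDEPENDENCE": "NOT_ENOUGH_INFO",
    },
    # Once there is not enough information, no operator can recover a verdict.
    "NOT_ENOUGH_INFO": {op: "NOT_ENOUGH_INFO" for op in OPERATORS},
}

def run_proof(operators):
    """Execute a sequence of natural logic operators; the final state is the verdict."""
    state = "SUPPORTED"  # start by assuming the claim is supported by the evidence
    for op in operators:
        state = TRANSITIONS[state][op]
    return state

# A proof aligning a claimed figure with a mutually exclusive figure in the evidence
# (ALTERNATION) ends in a refutation.
print(run_proof(["EQUIVALENCE", "ALTERNATION"]))  # -> REFUTED
```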
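Similarly, the minimax training idea mentioned above can be sketched as an inner maximization that reweights training examples towards the ones the model currently finds hardest, followed by a model update that minimizes the weighted loss. The PyTorch function below is a hypothetical sketch under that assumption; the names, the softmax-based weighting and the hyperparameters are illustrative, not the project's released implementation.

```python
import torch

def minimax_step(model, optimizer, loss_fn, inputs, labels, temperature=1.0):
    """One illustrative training step that focuses the model on its hardest examples."""
    optimizer.zero_grad()
    logits = model(inputs)
    per_example_loss = loss_fn(logits, labels)           # shape: (batch,)
    # Inner "max": a softmax over per-example losses concentrates weight on the
    # examples the model currently gets most wrong (detached so weights are fixed).
    weights = torch.softmax(per_example_loss.detach() / temperature, dim=0)
    # Outer "min": update the model to reduce the weighted loss.
    weighted_loss = (weights * per_example_loss).sum()
    weighted_loss.backward()
    optimizer.step()
    return weighted_loss.item()

# Usage sketch: a tiny entailment classifier over pre-computed sentence-pair features.
model = torch.nn.Linear(768, 3)                          # entail / contradict / neutral
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss(reduction="none")    # keep per-example losses
features = torch.randn(16, 768)                          # random stand-in features
labels = torch.randint(0, 3, (16,))
minimax_step(model, optimizer, loss_fn, features, labels)
```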
In the next phase of the project we expect:
- A new dataset consisting of real-world claims verified against evidence retrieved from the web. We are planning a shared task around it in order to promote it in the community.
- Novel methods for proof-based verification that require less training data and are applicable to tabular data
- Novel methods for handling temporal claims
- Novel methods for finding evidence on the world wide web, as opposed to Wikipedia, which is the source currently used.