
vera.ai: VERification Assisted by Artificial Intelligence

Periodic Reporting for period 1 - vera.ai (vera.ai: VERification Assisted by Artificial Intelligence)

Reporting period: 2022-09-15 to 2023-09-14

vera.ai focuses on disinformation analysis and AI-supported verification tools and services. It seeks to develop and build trustworthy Artificial Intelligence (AI) solutions in the fight against disinformation, co-created with the technology experts and end users brought together in the vera.ai consortium. vera.ai delivers solutions that media professionals can use and lays the foundations for future research on AI against disinformation. The resulting solutions are open, accessible to and usable by trusted partners.
vera.ai pursues the following specific objectives:
SO1: AI methods for content analysis, enhancement, and evidence retrieval
SO2: AI tools for the detection of synthetic media (including deepfakes) and manipulated content
SO3: Discovery, tracking, and impact measurement of disinformation narratives and campaigns across social platforms, modalities, and languages
SO4: Intelligent verification and debunk authoring assistant, based on chatbot NLP technology
SO5: Fact-checker-in-the-loop approach to gather new feedback as a side effect of verification
SO6: Adoption and sustainability of the new AI tools in real-world applications through integration in leading verification tools and platforms with established communities.
During its first year, vera.ai achieved significant progress towards its Specific Objectives.

D2.1 reports on the methodology, the co-creation activities, and the resulting use cases and requirements, and serves as a reference for the research and development of the WP3, WP4 and WP5 methods that address user requirements.
Details of the research conducted are presented in the report Scientific Advances in AI Methods for Detecting and Mitigating Disinformation.

Towards SO1, partners made progress on methods for extracting credibility indicators and trustworthy evidence, audio-visual content analysis and enhancement, extraction of verification clues from visual content, and cross-modal detection of decontextualised content. Additionally, they built on the existing Near Duplicate Detection service for image and video retrieval, and on cross-lingual and multimodal search, to retrieve already debunked narratives, videos, or images from the project's Database of Known Fakes (DBKF) and present them to users as authoritative evidence.
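Conceptually, this retrieval step is a nearest-neighbour search over content embeddings. The sketch below is a minimal illustration under that assumption; the function names, threshold, and random embeddings are hypothetical placeholders, not the actual DBKF service.

```python
import numpy as np

def retrieve_debunks(query_emb, debunk_embs, debunk_ids, threshold=0.85, top_k=5):
    """Return the closest already-debunked items above a similarity threshold.
    Embeddings are assumed L2-normalised, so a dot product is cosine similarity."""
    sims = debunk_embs @ query_emb
    order = np.argsort(-sims)[:top_k]
    return [(debunk_ids[i], float(sims[i])) for i in order if sims[i] >= threshold]

def normalise(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy example: three stored debunk embeddings, one query close to the first.
rng = np.random.default_rng(0)
db = normalise(rng.normal(size=(3, 128)))
query = normalise(db[0] + 0.05 * rng.normal(size=128))
print(retrieve_debunks(query, db, ["debunk-001", "debunk-002", "debunk-003"]))
```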

Towards SO2, the main limitations of prior image and video deepfake detection methods were investigated, focusing on the poor generalisation of deepfake detection models to unseen and novel synthesis methods. Progress was also made on the challenge of detecting AI-generated false news and narratives, through a study of the capability of state-of-the-art language models to generate misleading content and trustworthy-looking arguments in favour of disinformation narratives. vera.ai also contributed to creating a dataset of synthetic speech, on which synthetic speech detection methods were evaluated with encouraging first results.

Towards SO3, research aimed to support professionals in uncovering coordinated inauthentic behaviour and other disinformation campaigns and to measure their impact and spread within the target communities. A workflow for periodic monitoring of a known list of coordinated social media accounts across different platforms was designed and implemented to study the 2022 Italian general election. Finally, an approach was developed to evaluate the impact of social media posts, based on an implementation of the Misinformation Amplification Factor method introduced by the Integrity Institute.
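At its core, the Misinformation Amplification Factor is a ratio of a post's observed engagement to the engagement an account of that size would be expected to receive. The sketch below illustrates that ratio; the follower-based baseline is a deliberately crude placeholder, whereas the Integrity Institute derives its baseline from platform-wide engagement data.

```python
def amplification_factor(post_engagement: float, expected_engagement: float) -> float:
    """Ratio of observed engagement to the engagement an average post
    from a comparable account would be expected to receive."""
    if expected_engagement <= 0:
        raise ValueError("expected engagement must be positive")
    return post_engagement / expected_engagement

def expected_from_followers(followers: int, avg_engagement_rate: float = 0.02) -> float:
    # Illustrative baseline only: a fixed fraction of the follower count.
    return followers * avg_engagement_rate

post = {"followers": 50_000, "likes": 4_000, "shares": 1_500}
engagement = post["likes"] + post["shares"]
maf = amplification_factor(engagement, expected_from_followers(post["followers"]))
print(f"MAF = {maf:.1f}x")  # values above 1 indicate above-baseline amplification
```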

No progress towards SO4 yet (related tasks start in M13).

Towards SO5, a fact-checker-in-the-loop approach was implemented to seamlessly gather new feedback as a side effect of verification workflows; this feedback is used by the project's AI methods to continuously adapt to evolving disinformation targets, narratives, and types.
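One plausible shape for such implicit feedback, sketched below, is an append-only log of verification actions that can later be replayed as weak labels for model adaptation. The schema and field names are assumptions for illustration, not the project's actual data model.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class VerificationFeedback:
    """One implicit feedback record emitted as a side effect of verification.
    All field names are illustrative, not the project's actual schema."""
    item_id: str
    tool: str            # which analysis tool the fact-checker used
    action: str          # e.g. "confirmed_fake", "confirmed_genuine", "dismissed"
    model_score: float   # the AI model's original prediction for the item
    timestamp: str

def log_feedback(record: VerificationFeedback, path: str = "feedback.jsonl") -> None:
    # Append-only log; records can later be replayed as weak labels
    # to re-train or recalibrate the detection models.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_feedback(VerificationFeedback(
    item_id="img-42", tool="forgery-detector",
    action="confirmed_fake", model_score=0.91,
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```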

Towards SO6, a list of existing and planned services was created, and integration planning has started to ensure the adoption and sustainability of the new AI tools in real-world applications through their integration into leading verification tools (Truly Media and the InVID-WeVerify verification plugin). D5.1 presents a common data annotation model that represents the inputs and outputs of all tools developed within the project, as well as improvements to the Database of Known Fakes-specific services, workflows and functionalities.
Methods for extracting credibility indicators and trustworthy evidence were developed and evaluated as the best-performing solutions in SemEval Task 3. For image forgery detection and localization, a solution called TruFor was proposed, designed to be applicable across a wide spectrum of image manipulation techniques while reducing false alarms in pristine areas. Additionally, we developed a novel architecture for fusing the outputs of multiple image forensics algorithms into a single, robust output.

Moreover, to investigate generalisation, we studied the detection of synthetic images across different concept classes, generating a large variety of synthetic images with recent generators as well as a dataset for the IEEE VIP Cup on synthetic image detection at ICIP 2023. For deepfake detection in videos, we worked on an identity-based multimodal approach that derives audio-visual features characterising a person's identity and uses them to build a person-of-interest deepfake detector. The method can detect both single-modality (audio- or video-only) and multimodal (audio-video) attacks and is robust to low-quality videos. Towards the challenge of detecting synthetic and manipulated audio content, we created an open dataset of synthetic speech suitable for this purpose, collected in compliance with GDPR requirements.
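As a rough illustration of the fusion idea, the sketch below combines per-pixel tampering heatmaps from several forensic detectors with a fixed weighted average. The actual architecture learns the fusion; the weights and toy maps here are hypothetical.

```python
import numpy as np

def fuse_heatmaps(heatmaps: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Fuse per-pixel tampering heatmaps from several forensic detectors
    into one map. A learned fusion model would replace the fixed weights;
    this weighted average only illustrates the idea."""
    stacked = np.stack(heatmaps)              # shape: (n_detectors, H, W)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalise the weights
    return np.tensordot(w, stacked, axes=1)   # shape: (H, W), values in [0, 1]

# Toy example: two detectors disagree on a 4x4 image.
h1 = np.zeros((4, 4)); h1[1:3, 1:3] = 0.9    # detector A flags the centre
h2 = np.zeros((4, 4)); h2[0, :] = 0.8        # detector B flags the top row
fused = fuse_heatmaps([h1, h2], weights=[0.6, 0.4])
print((fused > 0.5).astype(int))             # binary tampering mask
```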

For the detection of decontextualised content, we identified a key challenge: evaluations often rely on test sets generated by Synthetic Misinformers rather than on real-world multimodal misinformation. To this end, we conducted an extensive comparative study in which we trained a Transformer-based model and compared its performance on the COSMOS benchmark, which encompasses real-world multimodal misinformation. We found that the COSMOS evaluation benchmark enables text-side unimodal biases on an inherently multimodal task.
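One simple way to surface such a bias is to compare a text-only classifier against a multimodal one on the same benchmark: if the text-only model comes close, the task can largely be solved without looking at the image. The sketch below illustrates this check with stub models; it is not the study's actual experimental setup.

```python
from typing import Callable, Sequence

def unimodal_bias_gap(
    examples: Sequence[dict],
    text_only_model: Callable[[str], int],
    multimodal_model: Callable[[str, object], int],
) -> float:
    """Accuracy gap between a multimodal and a text-only model on a
    supposedly multimodal benchmark; a small gap signals text-side bias."""
    def accuracy(preds):
        return sum(p == ex["label"] for p, ex in zip(preds, examples)) / len(examples)
    acc_text = accuracy([text_only_model(ex["caption"]) for ex in examples])
    acc_multi = accuracy([multimodal_model(ex["caption"], ex["image"]) for ex in examples])
    return acc_multi - acc_text

# Toy run with stub models (real ones would be Transformer-based classifiers).
data = [{"caption": "c1", "image": None, "label": 1},
        {"caption": "c2", "image": None, "label": 0}]
print(unimodal_bias_gap(data, lambda t: 1, lambda t, i: 1))  # gap of 0.0
```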

To enable the discovery, tracking, and impact measurement of disinformation narratives and campaigns across social platforms, modalities, and languages, we designed a workflow for the periodic monitoring of a known list of coordinated social media accounts across different platforms. The workflow has multiple objectives: to automatically update the account list, to keep a quasi-real-time record of these accounts' top-performing content and narratives, and to generalise the detection logic for coordinated sharing by considering multimodal near-duplicates. An early version of this workflow was implemented to study the 2022 Italian general election. To improve near-duplicate detection capabilities, we developed and published a novel approach capable of addressing multiple video retrieval and detection tasks at once, with no requirement for labelled data.
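As an illustration of label-free near-duplicate matching, the sketch below scores two videos by a chamfer-style similarity over L2-normalised frame embeddings, a common building block in video retrieval. It is a simplified stand-in for, not a description of, the published approach.

```python
import numpy as np

def video_similarity(frames_a: np.ndarray, frames_b: np.ndarray) -> float:
    """Similarity between two videos given as L2-normalised frame-embedding
    matrices of shape (n_frames, dim). Each frame of video A is matched to
    its most similar frame in video B, and the matches are averaged
    (a chamfer-style score); no labelled data is required."""
    sims = frames_a @ frames_b.T             # pairwise frame cosine similarities
    return float(sims.max(axis=1).mean())    # best match per query frame, averaged

def normalise(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy example: a video, a noisy re-encoded copy, and an unrelated video.
rng = np.random.default_rng(1)
video = normalise(rng.normal(size=(30, 64)))
near_dup = normalise(video + 0.05 * rng.normal(size=(30, 64)))
unrelated = normalise(rng.normal(size=(30, 64)))
print(video_similarity(video, near_dup))     # close to 1.0
print(video_similarity(video, unrelated))    # much lower
```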