FARE requires the establishment of three main datasets and two frameworks (one technical and the other theoretical).
The first dataset is the fake/real news database (yellow, in Figure 1). Information is selected for fact-checking by independent organizations that use different criteria and classification metrics. Therefore, the team has selected more than 300 fact-checkers, collected over 60,000 fact-checked pieces in almost 30 languages, from 2012 to 2023, and standardized their classification as “False”, “True” or “In-between”. This dataset is now being classified by topic and we expect to make it freely available by the end of 2023.
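As an illustration of the standardization step described above, the following minimal sketch maps heterogeneous fact-checker verdicts onto the three classes “False”, “True” and “In-between”. The verdict strings, the `STANDARD_LABELS` table and the `standardize_verdict` helper are hypothetical assumptions made for this example; they are not the project's actual mapping, which would need to cover many more rating schemes and languages.

```python
from typing import Optional

# Hypothetical lookup table: the raw verdicts listed here are illustrative
# assumptions, not the rating vocabulary actually used by the project.
STANDARD_LABELS = {
    "false": "False",
    "fake": "False",
    "pants on fire": "False",
    "true": "True",
    "correct": "True",
    "half true": "In-between",
    "misleading": "In-between",
    "mostly false": "In-between",
}


def standardize_verdict(raw: str) -> Optional[str]:
    """Return the standardized class, or None when the verdict is unknown."""
    # Normalize case, hyphens and repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().replace("-", " ").split())
    return STANDARD_LABELS.get(key)
```

Returning `None` for unrecognized verdicts, rather than guessing a class, leaves those items available for manual review.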
The second dataset tracks the classified fake/real news on a social network, to estimate its spread and identify spreaders. From close to 40,000 reviewed items, we have identified more than 10,000 shares, corresponding to close to 3 million different user profiles. We are now selecting samples and collecting relevant information to complete this second dataset (grey, in Figure 1).
For the third dataset, and to explicitly test our hypothesis, we must identify individual or contextual specificities that may serve as good predictors of belief in misinformation. We are currently testing small pilot surveys and preparing to deploy a full-scale questionnaire, in an experimental context, by the end of the year (blue, in Figure 1).
Regarding the technical framework, we have designed a schema for a proof-of-concept system that analyses the data without cross-linking the datasets, to minimize ethical risks and protect the privacy of social network users (purple, in Figure 1). This is now being implemented and will be deployed in 2025.
Finally, we are using some of the epidemiological models that we developed in the past (and during our efforts to fight the COVID-19 pandemic) to study how misinformation spreads online (red, in Figure 1). This corresponds to one of our last aims and requires that the previous tasks be completed before we can expect results.
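To give a sense of what an epidemiological view of misinformation can look like, the sketch below simulates a generic discrete-time SIR (susceptible, infected, recovered) compartmental model, reading “infected” as users actively sharing a false item. This is a textbook model chosen for illustration, not necessarily one of the models developed by the team; the function name `simulate_sir` and the parameters `beta` (sharing rate) and `gamma` (rate at which users stop sharing) are assumptions of this example.

```python
def simulate_sir(population, initial_spreaders, beta, gamma, steps, dt=1.0):
    """Simulate a generic SIR model with forward-Euler steps.

    Returns a list of (susceptible, spreading, stopped) tuples, one per step,
    including the initial state. All parameter values are illustrative.
    """
    s = float(population - initial_spreaders)  # users not yet exposed
    i = float(initial_spreaders)               # users currently sharing the item
    r = 0.0                                    # users who have stopped sharing
    history = [(s, i, r)]
    for _ in range(steps):
        # New spreaders arise from contacts between susceptible and spreading users.
        new_spreaders = beta * s * i / population * dt
        # A fraction of current spreaders stops sharing at each step.
        new_stopped = gamma * i * dt
        s -= new_spreaders
        i += new_spreaders - new_stopped
        r += new_stopped
        history.append((s, i, r))
    return history


# Example run with made-up parameters: 1,000 users, 10 initial spreaders.
history = simulate_sir(population=1000, initial_spreaders=10,
                       beta=0.5, gamma=0.1, steps=50)
peak_spreaders = max(i for _, i, _ in history)
```

In such a model the ratio `beta / gamma` plays the role of a basic reproduction number: an item spreads widely only when that ratio exceeds one.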