Vaccine Media Analytics

Project information

VACMA

Grant agreement ID: 797876

DOI

10.3030/797876

Closed project

EC signature date: 28 March 2018

Start date: 20 February 2019

End date: 21 June 2022

Funded under

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions

Total cost

€ 170 121,60

EU contribution

€ 170 121,60

Coordinated by

FUNDACION PARA EL FOMENTO DE LA INVESTIGACION SANITARIA Y BIOMEDICA DE LA COMUNITAT VALENCIANA
Spain

Periodic Reporting for period 1 - VACMA (Vaccine Media Analytics)

Reporting period: 2019-02-20 to 2021-02-19

The development of vaccines is one of medical science’s greatest achievements. Even though availability is improving, prices are dropping, evidence of their effects is increasing and side effects are fewer, a growing number of people are voluntarily choosing not to accept vaccines. This project does not focus on the highly visible group of active vaccine opponents, but on the less visible group for whom vaccination is just one of many issues in daily life. From a public health communication point of view this group is particularly interesting, since its members are still deciding whether to vaccinate. The group has been hard to study because it is difficult to reach with traditional surveys and interviews, or with traditional media analytics alone. Understanding the sentiments and stance of this group requires analysing large amounts of text, a task that is simply impossible to do manually.

The project focuses on machine learning techniques and semantic analysis. The choice of methodology, together with hosting at FISABIO and the University of Valencia and secondments at the Vaccine Confidence Project at the London School of Hygiene & Tropical Medicine and at the SME Salumedia, is the starting point for the research.

The main research objective of this project was to gain a deeper understanding of what matters to people who are still making up their minds about vaccines, and to develop the techniques necessary to gain this understanding.

The project description suggested that the then state-of-the-art methods, particularly probabilistic latent semantic analysis and long short-term memory (LSTM) networks, were expected to give the best results. Between the writing of the project description and the start of the project in February 2019, the field of natural language processing changed significantly. Most notably, in late 2018 Jacob Devlin and colleagues at Google introduced BERT (Bidirectional Encoder Representations from Transformers). The researcher was among the first who were able to use these improved methods.

The main scientific achievement of the project has been the application of transformer-based language models within the field of vaccine confidence. After the data were collected and coded (WP1), a model was trained to analyse vaccine stance, sentiments and categories (WP2). Finally, the data were analysed (WP3). The method has improved the accuracy of sentiment and stance analysis, and has also made it possible to identify more complex and fine-grained sentiments regarding vaccine stance.
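
As a rough illustration of how the accuracy of a stance classifier such as the one trained in WP2 can be measured, the sketch below computes accuracy and macro-averaged F1 in plain Python. The label names and the example data are hypothetical and are not taken from the project, and the choice of these two metrics is an assumption rather than a description of the project's evaluation.

```python
def accuracy(gold, pred):
    """Fraction of items where the predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stance labels, invented for this example only.
gold = ["promotional", "neutral", "discouraging", "neutral", "promotional", "discouraging"]
pred = ["promotional", "neutral", "neutral", "neutral", "promotional", "discouraging"]

print(f"accuracy: {accuracy(gold, pred):.3f}")
print(f"macro F1: {macro_f1(gold, pred):.3f}")
```

Macro-averaging gives every stance class equal weight, which can matter when one class (for example neutral posts) dominates the data.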

As the article “Categorizing vaccine confidence with a transformer-based machine learning model: analysis of nuances of vaccine sentiment in Twitter discourse” (https://medinform.jmir.org/2021/10/e29584/) shows, the project has been able to apply machine learning methods to achieve an accuracy in vaccine sentiment and stance analysis beyond what was previously possible. The project also took part in releasing the first transformer model pre-trained on Covid-19 social media content. This is described in the article “Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter”. The resulting CT-BERT model has contributed to several other research projects. The provided list under 1.1 includes many examples of how it has contributed
out interesting areas in implementing advanced machine learning services in commercial products, and it also provided useful knowledge for the tasks T4.1-2.

In the later stages of the project, the focus was on applying and testing vaccine sentiment analysis. Many experiments were also carried out to compare large zero-shot models with encoder-based models. The most important work here was the publication of the article “Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model” in Frontiers in Public Health. The article is a baseline for further research.

The first article, “Categorizing vaccine confidence with a transformer-based machine learning model: analysis of nuances of vaccine sentiment in Twitter discourse”, showed that it was possible to achieve close to human accuracy by using transformer-based methods on large-scale social media data.
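
One common way to make a claim such as "close to human accuracy" concrete is to compare how well a model agrees with a human annotator against how well two human annotators agree with each other. The sketch below uses Cohen's kappa for this. Whether the article used this particular statistic is an assumption, and all annotations shown are made up for illustration.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of the same eight posts (invented data).
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "neu", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neu", "neu", "neu", "pos", "neg"]
model = ["pos", "neu", "neg", "neg", "neu", "neu", "pos", "neg"]

human_kappa = cohen_kappa(annotator_1, annotator_2)
model_kappa = cohen_kappa(annotator_1, model)

print(f"human vs human kappa: {human_kappa:.3f}")
print(f"model vs human kappa: {model_kappa:.3f}")
```

If the model-versus-human value is close to the human-versus-human value, the model disagrees with an annotator about as often as a second annotator would.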

The largest impact of the project is probably the publication of Covid-Twitter-BERT (CT-BERT), a transformer model pretrained specifically on social media data. It was the first model of its kind, and has proved useful even for analysing tweets that are not related to Covid-19. The article describing the training of the model was immediately published on arXiv and has now been referenced more than 200 times. Most interestingly, in around half of these articles the model is actually used to perform analysis. The model has also been downloaded more than 100,000 times from HuggingFace.

In the later part of the study, the methods were used for multilingual research, and an article analysing Italian tweets has been published.
