Periodic Reporting for period 1 - VACMA (Vaccine Media Analytics)
Reporting period: 2019-02-20 to 2021-02-19
The project focuses on machine learning techniques and semantic analysis. This choice of methodology, together with the hosting at FISABIO and the University of Valencia and the secondments at the Vaccine Confidence Project at the London School of Hygiene & Tropical Medicine and at the SME Salumedia, forms the starting point for the research.
The main research objective of this project was to gain a deeper understanding of what matters to people who are still making up their minds about vaccines, and to develop the techniques needed to gain this understanding.
The main scientific achievement of the project has been the application of transformer-based language models to the field of vaccine confidence. After the data were collected and coded (WP1), a model was trained to analyse vaccine stance, sentiment and categories (WP2). Finally, the data were analysed (WP3). The method improved the accuracy of sentiment and stance analysis and also made it possible to identify more complex, fine-grained sentiments regarding vaccine stance.
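To make the claim about improved accuracy more concrete, the sketch below shows one common way a three-way stance classifier could be scored: plain accuracy alongside macro-averaged F1. It is an illustrative sketch only, not the project's evaluation code. The label set (`negative`, `neutral`, `positive`), the helper names `accuracy` and `macro_f1`, and the toy gold/predicted labels are all hypothetical assumptions; the project's actual coding scheme and metrics may differ.

```python
from collections import Counter

# Hypothetical stance label set (assumption); the project's actual scheme may differ.
LABELS = ("negative", "neutral", "positive")


def accuracy(gold, pred):
    """Fraction of tweets whose predicted stance matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1, so each stance class counts equally."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class receives a false positive
            fn[g] += 1  # true class receives a false negative
    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy, made-up annotations for six tweets (not project data).
gold = ["negative", "neutral", "positive", "positive", "neutral", "negative"]
pred = ["negative", "positive", "positive", "positive", "neutral", "neutral"]
print(round(accuracy(gold, pred), 3))  # 0.667
print(round(macro_f1(gold, pred), 3))  # 0.656
```

Reporting macro-F1 next to accuracy is a design choice worth considering for stance data, because accuracy can look high simply by favouring the majority class, whereas macro-F1 weights a smaller class (for example, hesitant or negative tweets) as heavily as the dominant one.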
As the article “Categorizing vaccine confidence with a transformer-based machine learning model: analysis of nuances of vaccine sentiment in Twitter discourse.” (https://medinform.jmir.org/2021/10/e29584/) shows, the project has been able to apply machine learning methods to achieve an accuracy in vaccine sentiment and stance analysis beyond what was previously possible. The project has also taken part in releasing the first transformer model pre-trained on COVID-19 social media content, described in the article “Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter.”. The resulting CT-BERT model has contributed to several other research projects; the list provided under 1.1 includes many examples of these contributions.
out interesting areas in implementing advanced machine learning services in commercial products, and it also provided useful knowledge for the tasks T4.1-2.
In the latter stages of the project, the focus was on applying and testing vaccine sentiment analysis. Numerous experiments were also carried out comparing large zero-shot models with encoder-based models. The most important output of this work was the article “Understanding the vaccine stance of Italian tweets and addressing language changes through the COVID-19 pandemic: Development and validation of a machine learning model”, published in Frontiers in Public Health, which provides a baseline for further research.
The largest impact of the project is probably the publication of Covid-Twitter-BERT (CT-BERT), a transformer model pretrained specifically on social media data. It was the first model of its kind and has proved useful even for analysing tweets unrelated to COVID-19. The article describing the training of the model was published immediately on arXiv and has now been cited more than 200 times. Most notably, in around half of these citing articles the model is actually used to perform analysis. The model has also been downloaded more than 100,000 times from HuggingFace.
In the latter part of the study, the methods were applied to multilingual research, resulting in a published article analysing Italian tweets.