Periodic Reporting for period 1 - CounteR (Privacy-First Situational Awareness Platform for Violent Terrorism and Crime Prediction, Counter Radicalisation and Citizen Protection)
Reporting period: 2021-05-01 to 2022-10-31
To achieve this main objective, the CounteR system incorporates state-of-the-art NLP technologies combined with expert knowledge of the psychology of radicalisation processes to provide a complete solution for LEAs to understand factors of radicalisation in the community. This is designed to help combat propaganda, fundraising, recruitment and mobilization, networking, information sharing, planning, data manipulation, and misinformation. The information gained by the system will also allow LEAs and other community stakeholders to implement prevention programs and employ counternarratives rather than rely solely on surveillance.
The CounteR solution will cover a wide range of information sources, both dynamic (e.g. social media) and offline (e.g. open data sources). The CounteR solution will allow LEAs to take coordinated action in real time while also preserving the privacy of citizens, as the system will target “hotspots” of radicalisation rather than individuals.
In parallel, social and psychological factors of radicalisation were thoroughly examined, in view of social changes and their interconnections with personal elements, as part of WP2. Furthermore, within WP3 the data ingestion architecture was created, and data collection tools were designed and implemented for social media sources, blogs, forums, and open web, as well as the dark web.
In WP4, the collected data is analysed and information is extracted from it using an NLP module, an image analysis module and a social media analysis module. A radicalisation classifier was trained on the data received from the sub-contractor for Arabic, French and English. Partners built a transfer learning architecture based on large multilingual language models and developed a thorough evaluation protocol using a so-called zero-shot scenario. Going further, in conjunction with WP5, partners performed a series of experiments to contextualise the analysis and provide indications of the radicalisation level of an individual given their social network dynamics. WP5 aims to create models from the data collected and processed in WP3 and WP4. A semantic reasoning and insight correlation engine has been developed to allow for better learning and more extensive labelling of network nodes, links or groups. Furthermore, within WP5, partners created a Deep Reinforcement Learning module for crawling webpages, blogs, and social media sources.
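The report does not spell out the zero-shot evaluation protocol. One common reading for multilingual models is that the classifier is trained on some languages and scored on a language it never saw during training. The sketch below illustrates that reading only; the function names, the (text, language, label) tuple layout and the `train_fn` callable are hypothetical and are not taken from the project.

```python
def zero_shot_split(examples, held_out_lang):
    """Split (text, lang, label) tuples so the held-out language never appears in training."""
    train = [ex for ex in examples if ex[1] != held_out_lang]
    test = [ex for ex in examples if ex[1] == held_out_lang]
    return train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


def evaluate_zero_shot(examples, languages, train_fn):
    """For each language: train without it, then score on it.

    train_fn is a hypothetical callable that takes training examples and
    returns a predict(text) -> label function.
    """
    results = {}
    for lang in languages:
        train, test = zero_shot_split(examples, lang)
        predict = train_fn(train)
        y_true = [label for _, _, label in test]
        y_pred = [predict(text) for text, _, _ in test]
        results[lang] = macro_f1(y_true, y_pred)
    return results
```

Reporting one score per held-out language, rather than a single pooled figure, makes it visible when transfer works well for one language and poorly for another.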
The CounteR R&D activities are performed within a normative framework, where ethical principles and legal requirements are defined in WP7 Data Privacy and Ethics Requirements to ensure the implementation of data minimization and anonymization principles and to establish the Legitimate Interest Assessment of CounteR data acquisition for training the algorithms. In close connection with this WP, WP11 was dedicated to developing a non-discrimination strategy, which includes the use of techniques to embed non-discriminatory capabilities into the CounteR system and the definition of diversity, non-discrimination, and fairness as applied to CounteR.
WP9 Dissemination, ecosystem development & exploitation has ensured proper dissemination of the project, its objectives and current results. Non-sensitive pieces of research and findings stemming from the CounteR project have been shared with the scientific and research communities to increase the impact of the results and encourage further research initiatives. All project partners initiated key steps towards designing and delivering a joint CounteR exploitation plan.
CounteR has set out to develop large-scale data acquisition tools to collect data from a plethora of sources (social media data, blogs, forums, websites, public groups, dark web), and for a vast number of use cases (extreme right, extreme left, racist groups, hate speech communities, conspiracy theories, jihadism). Each Collector module is associated with specific pipelines focused on transforming data obtained by the collector into a format suitable for the CounteR platform. The outcomes of the ingestion pipelines are raw data in a coherent and usable format, and the data is sanitized and pseudo-anonymized. The collected data is analysed using natural language processing methods (NLP: morpho-syntactic analysis, named-entity recognition, sentiment analysis), image analysis and social network analysis. Based on large multilingual language models, the consortium developed a transfer learning architecture that constitutes the cornerstone of the radicalisation detection classifier. The semantic reasoning and insight correlation engine extracts relevant insights on radicalisation risk within the communities as part of a dynamic network analysis: main actors, information flow, evolution of relationships. The reinforcement learning module enhances and scales up the input from natural language pre-processing and combines the results with the output from imaging and network analyses to suggest further possible information sources. The social network analysis component reveals indirect, seemingly hidden correlations by applying different network metrics, non-linear embeddings and community detection methods.
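The ingestion step described above (collector output transformed into a common format, then sanitized and pseudo-anonymized) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record fields, the common schema, the key handling and the choice of keyed hashing (HMAC-SHA256) are not taken from the report, which does not describe the actual pipeline.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice it would be stored separately from the data.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# Matches @handles inside message text.
MENTION_RE = re.compile(r"@\w+")


def pseudonymize(identifier, key=PSEUDONYM_KEY):
    """Map an identifier to a stable keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def normalize_record(raw, source, key=PSEUDONYM_KEY):
    """Convert a collector-specific record into a hypothetical common schema."""
    text = raw.get("text", "")
    # Replace @mentions so user handles do not leak through the text body.
    text = MENTION_RE.sub(lambda m: "@" + pseudonymize(m.group()[1:], key), text)
    return {
        "source": source,
        "author": pseudonymize(raw["author"], key),
        # Collapse runs of whitespace left over from scraping.
        "text": " ".join(text.split()),
        "timestamp": raw.get("timestamp"),
    }
```

Because the same handle always maps to the same token under a given key, network analysis over authors and mentions remains possible without storing the handles themselves. Note that keyed hashing is pseudonymization rather than anonymization: anyone holding the key can recompute the mapping for a known handle.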