Periodic Reporting for period 2 - CounteR (Privacy-First Situational Awareness Platform for Violent Terrorism and Crime Prediction, Counter Radicalisation and Citizen Protection)
Reporting period: 2022-11-01 to 2024-04-30
Additionally, ISPs can use the NLP capabilities to moderate content and prevent the publication of radicalised comments. The CounteR solution covers diverse information sources, both dynamic (e.g. social media) and offline (e.g. open data). It enables LEAs to take real-time coordinated action while preserving privacy by targeting radicalisation "hotspots" rather than individuals.
WP7 focused on ensuring compliance with data privacy and protection regulations and the Commission’s ethical requirements. This included a continuous assessment to align the CounteR system with citizen rights, policy concerns, and security objectives.
The project proposed to develop large-scale data acquisition tools under WP3 to collect unlabelled data from a wide range of sources (social media, blogs, forums, websites, deep and dark web) and for a vast number of use cases (extreme right, racist groups, hate speech communities, jihadism, etc.). The Social Media Collection Engine supports four different collectors that use official social media APIs for data retrieval.
The data collected in WP3 was processed in WP4, involving extracting features from text content using natural language processing (NLP) methods and extracting features from other non-textual content. The image analysis module assesses images to determine the presence of radical or terrorist content. Following data collection related to radicalisation, deep neural networks were developed, trained, and optimised to classify and detect radicalised elements. These deep learning models were further validated on real-world datasets produced in WP8. A flexible architecture was designed for domain coverage in all languages, using transfer learning and a zero-shot approach. The NLP-based radicalisation classifier pipeline aimed to produce the core radicalisation classification text-based models that are used jointly with other multimodal detection models (network, images). Each target language in the training data is included in the core models, and each can be seen as an independent product.
WP5 used the data collected and pre-processed within the two previous WPs to generate the models. The data modelling ecosystem has been further developed and enhanced to serve as an advanced analysis tool that provides insights into the risks of radicalisation within social groups and communities through the intricate, seemingly hidden correlations derived from the data. (1) SRICE (Semantic Reasoning and Insights Correlation Engine) takes machine learning results as input and returns a correlation graph that is used by other algorithms, such as network analysis. (2) The Deep Reinforcement Learning (DRL) module enhances and scales up the input from natural language pre-processing and combines the results with the output from image and network analysis to suggest further possible information sources. (3) Social Network Analysis (SNA) provides several methods for discovering structures in complex relationships: embedding methods for networks were implemented, and prediction methods were developed, implemented, and tested.
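As a rough illustration of what "discovering structures in complex relationships" can mean in practice, the following minimal sketch computes degree centrality and connected components on a toy interaction graph. It is not the CounteR SNA component: the function names, the toy edge list, and the choice of these two basic measures are assumptions made for illustration only, and the project's embedding and prediction methods are not reproduced here.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def connected_components(graph):
    """Group nodes into components using breadth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical toy data: pseudonymous accounts and their interactions.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("x", "y")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
communities = connected_components(graph)
```

In this toy graph, the most central node and the component it belongs to describe a group structure rather than a person, which is in the spirit of analysing "hotspots" instead of individuals.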
The components developed were integrated into a single software solution in WP6. The final version of the CounteR solution was deployed, integrating the feedback collected from the piloting and testing sessions carried out as part of WP8. To ensure easy deployment, the team upgraded and optimised the CI/CD pipelines to fit the ongoing upgrades of CounteR components and on-premises deployments. To ensure that it meets established industry requirements and standards, the platform has undergone a series of functional, performance and security tests. The CounteR solution constitutes the core end-product and exploitable asset, both as a complete software application (cloud and on-premises) and as its backend and frontend alone, without attached components.
WP8 was dedicated to establishing a robust testing framework for validating and assessing the solution. This ensured that end-users (LEAs and ISPs) could proficiently navigate the platform, comprehend the underlying technologies, and integrate the insights from the research in WP2 into their professional activities. WP8 also encompassed training of all involved end-users.
Under WP9, CounteR improved its visibility by maximising its presence at international events, using these opportunities to reach relevant stakeholders and connect with sister projects. The research findings have been published in 21 scientific papers. Information about the CounteR project was disseminated at a series of international events, such as tech fairs and security conferences.
CounteR has set out to develop large-scale data acquisition tools to collect data from various sources (social media, blogs, forums, open, deep and dark web) and for a vast number of use cases (extreme right, racist groups, hate speech communities, jihadism, etc.). Each Collector module is associated with specific pipelines that transform the data obtained by the collector into a format suitable for the platform. The ingestion pipelines output raw data in a coherent and usable format, sanitised and pseudonymised. The collected data is analysed using NLP methods (morpho-syntactic analysis, named-entity recognition, sentiment analysis), image analysis and SNA. Based on large multilingual language models, the consortium developed a transfer learning architecture that constitutes the cornerstone of the radicalisation detection classifier. The semantic reasoning and insight correlation engine extracts relevant insights on radicalisation risk within communities as part of dynamic network analysis, which includes main actors, information flow, and the evolution of relationships. The DRL module enhances and scales up the input from natural language pre-processing and combines the results with the output from image and network analyses to suggest further possible information sources. The SNA component reveals indirect, seemingly hidden correlations by applying different network metrics, non-linear embeddings and community detection methods.
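The sanitisation and pseudonymisation step of an ingestion pipeline could look roughly like the sketch below. This is an assumption-laden illustration, not the CounteR implementation: the record fields (`source`, `author`, `text`), the function names, and the use of keyed HMAC-SHA256 hashing for pseudonyms are all hypothetical choices. The secret key shown is a placeholder and would need to be stored separately from the data for the pseudonymisation to be meaningful.

```python
import hashlib
import hmac
import unicodedata


def sanitise_text(text):
    """Normalise Unicode, drop control/format characters, collapse whitespace."""
    normalised = unicodedata.normalize("NFKC", text)
    cleaned = "".join(
        ch
        for ch in normalised
        # Keep newlines and tabs for now; split() below collapses them to spaces.
        if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    return " ".join(cleaned.split())


def pseudonymise(identifier, key):
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:16]


def ingest(record, key):
    """Turn one collector record into a sanitised, pseudonymised record."""
    return {
        "source": record["source"],
        "author": pseudonymise(record["author"], key),
        "text": sanitise_text(record["text"]),
    }


# Hypothetical example record and placeholder key (assumptions for illustration).
SECRET_KEY = b"replace-with-a-securely-stored-key"
raw = {"source": "forum", "author": "some_handle", "text": "Hello\x00   world\n"}
clean = ingest(raw, SECRET_KEY)
```

Because the hash is keyed and deterministic, the same account always maps to the same pseudonym, so downstream network analysis can still link posts by one author without seeing the original handle.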