Periodic Reporting for period 2 - INSIKT (Novel Social Data Mining Platform to Detect and Defeat Violent Online Radicalization)
Reporting period: 2018-10-01 to 2020-03-31
Extremist and terrorist groups use the Internet for a myriad of purposes including psychological warfare, propaganda, fundraising, recruitment and mobilization, networking, information sharing, planning/coordination, data manipulation and misinformation. All active terrorist groups have established at least one form of presence on the Internet and most of them are using several formats of online platforms. Therefore, online content monitoring and analysis is a critical part of almost every national security investigation.
Importance for Society
Since 2015, more than 20 terrorist attacks have occurred in the EU28, all of them carried out by individuals radicalized by terrorist propaganda. This new breed of terrorists was recruited via social media and the Internet. To prevent such events from happening in the future and to fight radicalization, it is crucial to detect cyberpropaganda early. However, social media providers (e.g. Twitter) admit that at the moment there is no adequate tool to identify terrorist-related content on the Internet. As a result, they are forced to rely on proprietary spam-fighting tools, user reports and human analysis to track down radicalized accounts that promote terrorism.
Overall Objective
INSIKT provides a novel solution for LEA analysts to detect terrorist propaganda on all social media: it identifies radical content, suspicious messages and covert radicalization processes with the help of sophisticated text mining algorithms. INSIKT relies on deep learning to automatically develop new models, which can also be used to detect other criminal activity. It is fit to become an important new tool in LEAs’ investigation and evidence gathering arsenal, giving them highly accurate, multilingual and real-time detection capabilities.
Since the start of the project we have worked on adding key new functionalities to the existing TRL7 solution, specifically by adding new data sources.
This task has been achieved by developing 2 components:
• API Data acquisition: The module that acquires data from the social media sources. It is the first step of the data flow, so when the input is already a JSON file, as is typical for social media APIs, some of its format is preserved throughout the process. In this module we have added new sources to the solution: Facebook, Instagram and YouTube.
• Uploaded Data acquisition: The module that acquires data supplied by the user. It is also a first step of the data flow, and the data format depends on the user's data. It is necessary because many sources do not allow data to be extracted from them: DarkNet data, SnapChat, JustPasteIt, WhatsApp, Telegram, blogs and forums, among others.
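The two acquisition paths can be illustrated with a minimal sketch. It is not the project's implementation: the `Record` type, the function names, the `text` field and the choice of CSV as the upload format are all assumptions made for illustration. It only shows the idea that JSON input from an API keeps its original fields, while uploaded data is parsed according to whatever format the user provides.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    """Common shape for one acquired item (hypothetical, not from the project)."""
    source: str  # e.g. "api:youtube" or "upload:forum"
    text: str    # message body passed on to later analysis steps
    raw: dict[str, Any] = field(default_factory=dict)  # original fields, kept as-is


def acquire_from_api(source: str, payload: str, text_key: str = "text") -> list[Record]:
    """Parse a JSON array such as a social media API might return.

    Each original JSON object is preserved in `raw`, so its format
    survives along the rest of the data flow.
    """
    items = json.loads(payload)
    return [Record(f"api:{source}", item.get(text_key, ""), item) for item in items]


def acquire_from_upload(source: str, content: str, text_column: str = "text") -> list[Record]:
    """Parse a user-supplied CSV export (assumed format; real uploads may differ)."""
    reader = csv.DictReader(io.StringIO(content))
    return [Record(f"upload:{source}", row.get(text_column, ""), dict(row)) for row in reader]


# Example: both paths end in the same Record shape.
api_records = acquire_from_api("youtube", '[{"id": 1, "text": "hello"}]')
upload_records = acquire_from_upload("forum", "author,text\nanna,hi there\n")
```

Keeping the untouched input in `raw` is one possible way to let downstream modules access source-specific fields without the acquisition step having to know about them in advance.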
Detection of radical content and users in a highly dynamic, big data-driven environment such as social media is highly challenging on both a scientific and a technical level. We have already gone beyond the state of the art in comparison to existing solutions by offering complex analysis modules that use natural language processing to understand the way social media “speaks”, and that are capable of detecting threats in real time by drawing on comprehensive terrorist domain knowledge.
Expected results
INSIKT will extend the capabilities of our TRL7 solution with functionality that will allow analysts to:
detect radicalisation processes early, before they become potential threats,
detect radicalizers and vulnerable individuals or groups that have the potential to be radicalised,
detect communities of radicalisers,
detect radical content and predict its spread for LEAs to intervene before it’s too late,
better understand online radicalisation processes and the methods and patterns of radicalisers.
Potential impacts
To detect the propagation of terrorist beliefs, intelligence analysts need an intuitive solution to pinpoint radical content and potential radicalisation processes in huge, live and fast-moving social data of disparate media types and sources. This need is relatively new, and it has yet to be met by existing solutions. LEAs all over the world are beginning to take social data seriously, and have started to incorporate intelligence gathered from social sources into their investigations as well as their surveillance and prevention operations.