Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition

Periodic Reporting for period 2 - TENSOR (Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition)

Reporting period: 2018-03-01 to 2019-11-30

Law Enforcement Agencies (LEAs) across Europe currently face significant challenges in identifying and gathering terrorist-generated content online. The Dark Web presents additional challenges because it is difficult to access, and material that goes undetected there can contribute to the advancement of terrorist violence and radicalisation. LEAs must also extract and summarise meaningful, relevant content from huge amounts of online data to inform their resource deployment and investigations. Even where technology is deployed more widely to gather and extract information, the work depends on the parameters set by the investigating team and applied by the available technology, which are often restrictive. To address these issues, LEAs require technologies that can penetrate the Internet to gather (hidden) terrorist-generated content online, provide unified access to multilingual and multimedia content, and support its spatio-temporal semantic interpretation and summarisation. Although several research tools and studies target these areas individually for intelligence and forensic investigation, LEAs currently lack access to intelligent, holistic tools. To bridge this gap, TENSOR brings together LEAs, industry and academia with the aim of developing a platform that gives LEAs the tools they need to handle huge amounts of online content for the early detection of organised terrorist activities, radicalisation and recruitment online.
To achieve the overall project objectives and measure its success, the project separates its objectives into three distinct areas: Innovation Objectives and Actions; User-Orientated Objectives and Activities; and Impact-Making Objectives and Impact-Making Activities.
WP1 covers the management and coordination of the project. This has included project meetings, liaison with the EC, financial management, and the establishment of external advisory boards, the Security Advisory Board and the Ethics Committee. WP1 has also established the processes needed to deliver the project and monitors progress to ensure the project is meeting its objectives.
WP2 has focussed on developing the initial user requirements and scenarios to inform the detailed user requirements. The Detailed User Requirements, including security requirements, were then developed to guide the technical development of the TENSOR Platform. WP2 prepared Web Search Entry Points so that the project understands the current state of knowledge on terrorism online. WP2 also considered the impact of misinformation and disinformation, attempting to identify prioritisation indicators for terrorist-related threats.
WP3 considered the legal, ethical and data protection requirements. It focussed on assessing the legal frameworks at national, EU and global level that could affect the deployment of the TENSOR platform. It also considered the ethical and societal aspects that must be taken into account when developing tools for this purpose, producing a framework for use during the project, and provided guidance for project partners on data protection.
WP4 deals with automated, large-scale Web crawling, scraping, machine learning, automatic bots, and text and multimedia mining techniques in order to detect and monitor terrorist-related content on the Internet more efficiently. Work so far includes an initial version of domain-specific search using a Web search engine, an initial version of domain-specific data discovery using focused crawling based on adaptive hyperlink selection methods, and the set-up of an online web tool for presenting the outcomes of the domain-specific search and discovery framework. A concept extractor has also been developed and integrated into the pipeline.
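The idea behind focused crawling with adaptive hyperlink selection can be illustrated with a small best-first crawler. This is a minimal sketch under stated assumptions, not TENSOR's crawler: it runs over a hypothetical in-memory dictionary standing in for the Web (no network access), scores pages and anchor texts with a simple bag-of-words relevance function, and "adapts" only by reinforcing the weights of topic terms found on relevant pages. The names `Web`, `relevance`, `focused_crawl` and the parameters `budget`, `threshold` and `lr` are all illustrative.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory corpus standing in for the Web:
# URL -> (page text, [(anchor text, target URL), ...]). No network access is made.
Web = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def relevance(text: str, terms: Dict[str, float]) -> float:
    """Average topic-term weight per token (a simple bag-of-words score)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(terms.get(t, 0.0) for t in tokens) / len(tokens)


def focused_crawl(web: Web, seeds: List[str], terms: Dict[str, float],
                  budget: int = 10, threshold: float = 0.1,
                  lr: float = 0.5) -> List[str]:
    """Best-first crawl that follows the most promising hyperlinks first."""
    terms = dict(terms)  # work on a copy so the caller's weights are untouched
    frontier: List[Tuple[float, str]] = [(-1.0, s) for s in seeds]
    heapq.heapify(frontier)  # min-heap on negated priority = highest priority first
    visited: Set[str] = set()
    relevant: List[str] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        text, links = web[url]
        page_score = relevance(text, terms)
        if page_score >= threshold:
            relevant.append(url)
            # Adaptive step: reinforce topic terms that occur on relevant pages.
            for t in set(text.lower().split()):
                if t in terms:
                    terms[t] += lr * page_score
        for anchor, target in links:
            if target not in visited:
                # Link priority blends anchor-text relevance with the parent page's score.
                priority = 0.5 * relevance(anchor, terms) + 0.5 * page_score
                heapq.heappush(frontier, (-priority, target))
    return relevant
```

Blending anchor-text relevance with the parent page's score is one common heuristic for ranking unvisited links; a real system would typically replace the keyword weights with a trained classifier.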
WP5 has looked at filtering techniques to reduce noise, the definition and development of classifiers, and clustering and linking frameworks to support the correlation of information. WP5 has produced an initial version of a social network analysis framework, an online web tool for presenting the outcomes of the key actor and community detection framework, and a component for semantic reasoning. WP5 has also developed tools to identify multimedia content and to analyse images for tampering or manipulation.
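Key actor and community detection in a social network can be illustrated with two classic graph techniques. The report does not say which methods TENSOR uses, so the following sketch swaps in stand-ins: Brandes' algorithm for betweenness centrality, treating high-betweenness nodes as "key actors" (brokers between groups), and a single Girvan-Newman step, cutting the highest-betweenness edges until the graph splits into communities. The graph representation and all function names are hypothetical; it assumes an unweighted, undirected graph.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, Set[str]]  # hypothetical undirected adjacency sets, e.g. account -> contacts


def betweenness(graph: Graph) -> Tuple[Dict[str, float], Dict[FrozenSet[str], float]]:
    """Brandes' algorithm: node and edge betweenness for an unweighted, undirected graph."""
    node_bc = {v: 0.0 for v in graph}
    edge_bc = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        order: List[str] = []                      # vertices in non-decreasing distance from s
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)            # number of shortest s-v paths
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)          # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge_bc[frozenset((v, w))] += share
                delta[v] += share
            if w != s:
                node_bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint, so halve the totals.
    return ({v: b / 2 for v, b in node_bc.items()},
            {e: b / 2 for e, b in edge_bc.items()})


def components(graph: Graph) -> List[Set[str]]:
    """Connected components via breadth-first search, in sorted start-node order."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(graph: Graph) -> List[Set[str]]:
    """One Girvan-Newman step: cut highest-betweenness edges until the graph splits."""
    g = {v: set(nbrs) for v, nbrs in graph.items()}  # copy; the input is not modified
    start = len(components(g))
    while len(components(g)) == start and any(g.values()):
        _, edge_bc = betweenness(g)
        u, v = tuple(max(edge_bc, key=edge_bc.get))
        g[u].discard(v)
        g[v].discard(u)
    return components(g)
```

On two tightly knit groups joined by a single link, the two endpoints of that link receive the highest node betweenness, and removing the link separates the groups. Girvan-Newman recomputes betweenness after every cut, which is costly on large graphs; modularity-based methods are the usual alternative at scale.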
WP6 has developed the summarization infrastructure, dividing the pipeline into three main parts: Language Analysis, Text Planning and Natural Language Generation. The work package has also developed semantic ranking methods and implemented a visual concept-based summarizer. For Machine Translation, the WP has delivered a component that is a fully integrated part of a hybrid multilingual text analysis pipeline, as well as a stand-alone translation system that is still being developed.
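The three-stage division of the summarization pipeline can be sketched as three functions passing data forward. This is a deliberately toy version under clear assumptions: "language analysis" is only sentence splitting and stop-word filtering, "text planning" ranks sentences by average term frequency (a simple extractive stand-in for the semantic ranking methods mentioned above, not TENSOR's method), and "generation" wraps the selected sentences in a template. The stop-word list and every function name are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

# Illustrative stop-word list (an assumption, not TENSOR's resources).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for", "was", "were"}


@dataclass
class AnalysedSentence:
    position: int      # index of the sentence in the source document
    text: str
    terms: List[str]   # lower-cased content words


def language_analysis(document: str) -> List[AnalysedSentence]:
    """Stage 1: split into sentences and extract content-word terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [
        AnalysedSentence(i, s, [t for t in re.findall(r"[a-z]+", s.lower()) if t not in STOPWORDS])
        for i, s in enumerate(sentences)
    ]


def text_planning(analysed: List[AnalysedSentence], max_sentences: int = 2) -> List[AnalysedSentence]:
    """Stage 2: rank sentences by average term frequency, keep the top ones in source order."""
    freq = Counter(t for s in analysed for t in s.terms)

    def score(s: AnalysedSentence) -> float:
        return sum(freq[t] for t in s.terms) / (len(s.terms) or 1)

    chosen = sorted(analysed, key=lambda s: (-score(s), s.position))[:max_sentences]
    return sorted(chosen, key=lambda s: s.position)


def generate(plan: List[AnalysedSentence], topic: str) -> str:
    """Stage 3: realise the plan as text (here just a template around the selected sentences)."""
    return f"Summary ({topic}): " + " ".join(s.text for s in plan)


def summarise(document: str, topic: str, max_sentences: int = 2) -> str:
    return generate(text_planning(language_analysis(document), max_sentences), topic)
```

Keeping the stages as separate functions with an explicit intermediate type (`AnalysedSentence`) means any one stage, for example the generator, can be replaced by a more capable component without touching the others.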
WP7 is planning the technological roadmap and architecture design, based on the user requirements, to integrate the modules from the previous WPs. WP7 has developed the first version of the TENSOR Platform Server. It has also analysed social media content structures, classifying the data entities, edges and fields so that they can serve as a frame for generating similar data content for use in a testing environment; analysed frameworks for interpreting existing datasets from social media platforms, with the potential to anonymise sensitive fields; and researched anonymization tools and techniques capable of ingesting datasets and providing a visual representation of the content for anonymization and validation.
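Anonymising sensitive fields in a social media dataset while keeping it useful for testing can be illustrated with keyed pseudonymisation. This is a minimal sketch, not the tooling WP7 evaluated: it assumes a flat record with hypothetical field names (`user_id`, `screen_name`, `text`), replaces identifiers with HMAC-SHA256 pseudonyms so the same account always maps to the same token, redacts e-mail addresses in free text, and rewrites @mentions with the same pseudonyms so that links between records survive. Note that keyed pseudonymisation is not full anonymisation; under the GDPR, pseudonymised data is generally still treated as personal data.

```python
import hashlib
import hmac
import re
from typing import Any, Dict

# Hypothetical field names for a social media post record.
ID_FIELDS = {"user_id", "screen_name"}
FREE_TEXT_FIELDS = {"text"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MENTION_RE = re.compile(r"@(\w+)")


def pseudonym(value: str, key: bytes) -> str:
    """Keyed, deterministic pseudonym: the same input always maps to the same token."""
    return "anon_" + hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def anonymise_record(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return a copy of the record with identifiers pseudonymised and e-mails redacted."""
    out: Dict[str, Any] = {}
    for field, value in record.items():
        if field in ID_FIELDS:
            out[field] = pseudonym(str(value), key)
        elif field in FREE_TEXT_FIELDS and isinstance(value, str):
            text = EMAIL_RE.sub("[EMAIL]", value)  # redact e-mail addresses first
            # Replace @mentions with the same pseudonym used for screen_name, so links
            # between records survive while the original handles do not.
            out[field] = MENTION_RE.sub(lambda m: "@" + pseudonym(m.group(1), key), text)
        else:
            out[field] = value  # non-sensitive fields pass through unchanged
    return out
```

Using a secret key rather than a plain hash matters: an unkeyed hash of a public handle can be reversed simply by hashing candidate handles and comparing.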
WP9 focusses on the Dissemination and Exploitation of the TENSOR project. In the first phase, work has concentrated on disseminating the project, with limited work on Exploitation other than Market Analysis. Dissemination has included the establishment of the Website, Twitter
TENSOR has been working to identify the current state of the art within the LEA community and to take the development of the platform beyond it. This includes developing novel query approaches to improve domain-specific search and web crawling capabilities, as well as improving the classifiers. These tools will improve capabilities on both the open and the dark web, and the work on dark web searching is expected to have a significant effect for LEAs. The proposed social media crawlers and multimedia analysis are also expected to advance the state of the art, especially in detecting misinformation, in enriched machine learning and translation to support the classifiers, and in identifying key influencers and communities.
Through the work being undertaken in TENSOR, the technical solutions will close capability gaps and provide a platform that enables LEAs to improve their ability to produce accurate, actionable threat intelligence. It is hoped that such intelligence will help LEAs support earlier intervention measures, reducing the potential for individuals to be radicalised or to progress to violent extremism. In addition, the work in Work Package 3 has provided a framework for the legal and ethical considerations that must be addressed to support the use of such tools. This helps to protect fundamental human rights such as freedom of expression and privacy through the built-in inclusion of data protection and the anonymisation of legitimate personal data.