Projects working in the area of fighting crime and terrorism (FCT) have regularly raised the lack of realistic, up-to-date and sufficient training and testing data for research purposes, in those cases where such data are necessary instead of dummy and synthetic data. The accuracy of tools, notably (but not only) digital ones, depends heavily on the quantity and quality of the training and testing data, including the quality of their structure and labelling, and on how well these data represent the problem to be tackled.
This issue is present in any research area, but it is more pronounced in domains such as security, health or defence, owing to the special categories of data involved and the sensitivity of the domain, which impose additional requirements for access to real datasets or for the creation of representative datasets at national level.
In EU-funded projects in the area of FCT, the problem of obtaining a scientifically satisfactory amount of up-to-date, high-quality and realistic data needed to develop reliable tools (digital and non-digital, e.g. for the detection and/or qualification of explosives, drugs or DNA traces) in support of Police Authorities becomes even more complex. Training and testing data sets considered legal and used in one Member State have to be shared and accepted in other Member States, while simultaneously observing fundamental rights and substantial or procedural safeguards.
In addition, with continuous and fast technological improvements, including but not limited to the Internet of Things, new data formats and mechanisms for data transfer, storage and security are being and will be developed. Moreover, data formats are often not harmonised among similar research projects, which hampers potential interoperability.
Another problem that is often encountered is a lack of trust between researchers and practitioners/end-users, as well as between different projects, when it comes to data sharing. To this end, it is important to break down barriers between projects and to keep conveying the message that projects should not be competing to outperform each other, but working together to provide the EU with the best possible solutions. As a prerequisite for all of the above, a common research data repository is needed.
The aim of this topic is to tackle this multi-layered issue and lay the basis for such a common data repository by creating a roadmap consisting of a clear set of rules, conditions and characteristics that such a repository should have. These should cover, among others, the variety of the data as a function of the type and of the problem at hand, legal issues, avoidance of any bias, accessibility levels related to the sensitivity of various data sets, harmonisation of data formats, and solutions for annotation as well as for the ageing of the data.
As an integral part of the proposed activities, and in addition to the above sets of requirements, technical solutions should be developed that help research activities comply with privacy and data protection requirements when handling data, while still allowing information to be extracted if needed. As learnt from previous research activities, standard pseudonymisation and anonymisation methods are not satisfactory in this domain because, for example, they either break the links between different pieces of evidence or take a lot of time and effort. Thus, new and/or improved anonymisation and pseudonymisation technologies, together with other security measures such as masking and unmasking technologies, should be developed to facilitate data management, ensuring full access to the data actually needed (in line with the principles of necessity and proportionality), in full respect of fundamental rights and applicable legislation.
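To illustrate the concern about pseudonymisation breaking links between pieces of evidence, the following is a minimal sketch, not a proposed solution. It assumes a keyed hash (HMAC-SHA256), so that the same identifier always maps to the same pseudonym and records stay linkable, and an in-memory reverse table to stand in for controlled unmasking. The class `Pseudonymiser`, its methods, the `authorised` flag and the sample records are all hypothetical; a real system would need key management, access control and a legal assessment well beyond this example.

```python
import hmac
import hashlib


class Pseudonymiser:
    """Hypothetical helper: deterministic keyed pseudonyms plus a reverse table."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        # pseudonym -> original value; stands in for a protected unmasking store
        self._reverse = {}

    def mask(self, identifier: str) -> str:
        # Same key + same input gives the same pseudonym, so links are preserved.
        digest = hmac.new(
            self._key, identifier.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        pseudonym = "P-" + digest[:16]
        self._reverse[pseudonym] = identifier
        return pseudonym

    def unmask(self, pseudonym: str, authorised: bool) -> str:
        # The boolean flag is a placeholder for a real authorisation check.
        if not authorised:
            raise PermissionError("unmasking requires authorisation")
        return self._reverse[pseudonym]


# Illustrative, made-up records: cases A and B share the same phone number.
records = [
    {"case": "A", "phone": "+000-555-0100"},
    {"case": "B", "phone": "+000-555-0100"},
    {"case": "C", "phone": "+000-555-0199"},
]

pseudonymiser = Pseudonymiser(b"demo-key-not-for-real-use")
masked = [
    {"case": r["case"], "phone": pseudonymiser.mask(r["phone"])} for r in records
]

# Cases A and B can still be linked through the shared pseudonym,
# without the researcher seeing the underlying number.
linked = masked[0]["phone"] == masked[1]["phone"]
```

In this sketch the link between cases A and B survives masking, while the original value is only recoverable through `unmask` with the placeholder authorisation flag set.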
Although the proposed activities should focus on research data for fighting crime and terrorism within the remit of the Horizon Europe Regulation (including ethics), proposals should take into account the possible application of the identified solutions in other security research domains, such as infrastructure resilience, border management or disaster resilience.
Coordination with the successful proposals from topic SU-AI02-2020 (on AI research datasets) and future successful proposals in HORIZON-CL3-FCT-2021-01-01 (on travel intelligence training and testing data for research purposes as well as on pseudonymisation techniques), HORIZON-CL3-FCT-2022-01-05 and HORIZON-CL3-FCT-2022-01-01 (on ground-truth data sets for conventional forensics) as well as HORIZON-CL3-FCT-2022-01-02 (on common data formats) should be envisaged so as to avoid duplication and to exploit complementarities as well as opportunities for increased impact. Possibilities of coordination with related activities in the Digital Europe Programme or European Open Science Cloud should be analysed too.
In this topic the integration of the gender dimension (sex and gender analysis) in research and innovation content is not a mandatory requirement.
The duration of the proposed activities should not exceed 24 months.