CORDIS - EU research results

LESSEN DATA ACCESS AND GOVERNANCE OBSTACLES

Periodic Reporting for period 1 - LAGO (LESSEN DATA ACCESS AND GOVERNANCE OBSTACLES)

Reporting period: 2022-11-01 to 2023-10-31

The Fight against Crime and Terrorism (FCT) research community needs a trusted and safe infrastructure to share and co-produce large, high-quality datasets that are realistic and domain-specific enough to drive FCT research and innovation forward. LAGO addresses “The Data Issue” in the FCT research landscape by proposing the foundation for a Research Data Ecosystem (RDE) that fosters access to research data in the FCT domain. The reference architecture will define the agreed and validated technological, procedural, SELP and governance building blocks that an EU FCT RDE should comprise to become a safe, secure, trusted and sustainable means for data-oriented research collaboration among law enforcement agencies (LEAs), security practitioners, relevant EU agencies, academic and industry researchers, policy makers and regulators.
LAGO comprises nine work packages (WPs). WP1 coordinates all project activities to ensure that the project's objectives are achieved.
WP2 addresses the legal, ethical and societal aspects and provides support to the consortium in multiple cases.
WP3 assesses the current research data landscape in the FCT domain, analysing practices, data sharing procedures, potential barriers and enablers to the adoption of an RDE, and recommendations for enabling access to research data. The first version of the requirements has been prepared and prioritized. A Reference Model for the RDE has been proposed, focusing at a high level on the actors and processes that enable access to research data in a trusted and secure way. The main functionality areas identified so far relate to the setup of the RDE, the onboarding of new participants, data creation procedures, dataset publishing, search, request and exchange, and model training and testing. The RDE Reference Architecture derives from the Reference Model and focuses more on the logical view and technical aspects: it divides the solution into multiple technical and software components, which together form the envisioned ecosystem, and details the interactions among them in terms of services and exchanged messages.
WP4 delivers methodologies and tools for data creation, annotation, anonymization, synthesis and watermarking. During the first period, multiple tools were built to guide end users in creating meaningful datasets, following the principles of high-quality data and meaningful annotations and taking into account methods for preserving security and privacy.
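As an illustration of the kind of privacy-preserving step such tools might apply before a dataset is shared, the sketch below replaces direct identifiers with keyed hashes. This is pseudonymization, a weaker measure than full anonymization, and it is not taken from the LAGO tools: the function name, the field names and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(record, sensitive_fields, secret_key):
    """Return a copy of `record` with the listed fields replaced by keyed hashes.

    Assumption: a keyed hash (HMAC-SHA256) is an acceptable pseudonym here.
    The same input and key always give the same pseudonym, so records can
    still be linked without exposing the original value.
    """
    out = dict(record)  # shallow copy, the caller's record is left untouched
    for field in sensitive_fields:
        if out.get(field) is not None:
            digest = hmac.new(
                secret_key, str(out[field]).encode("utf-8"), hashlib.sha256
            )
            out[field] = digest.hexdigest()[:16]  # truncated for readability
    return out


# Hypothetical record and key, for illustration only.
record = {"name": "Jane Doe", "ip": "192.0.2.10", "event": "login_failed"}
key = b"hypothetical-secret-key"
safe = pseudonymize(record, ["name", "ip"], key)
```

Because the pseudonyms are deterministic, whoever holds the key can re-identify values by recomputing the hashes, so in practice the key would need to stay with the data provider.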
In WP5, the first version of the Data Quality Assessment tool has been built, with the goal of providing users with indicators of the quality of the data being shared. The first version of the Risk Assessment tool has also been developed. It aims to make users aware of the risks of sharing data with specific characteristics (data types, usage purposes, presence of personal information, FCT domain, etc.) and proposes mitigation measures to reduce the risks of sharing those data for research purposes. To trace events occurring in the RDE in a secure way, an Ethereum-based prototype has been built, and custom smart contracts have been defined to support the decentralised authentication mechanisms envisioned in WP6. Research data usage also includes procedures for training and testing models with data received from a provider. A sandbox environment is under development to allow end users to test trained models without disclosing data outside their premises. The sandbox environment is based on containerization technologies to ensure the portability of the solutions. For cases in which access to data is not possible, a Federated Learning (FL) approach is under development as a complementary strategy for model training.
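To make the Federated Learning idea concrete, here is a minimal sketch of one aggregation round using federated averaging (FedAvg), a common FL aggregation rule. The report does not say which algorithm LAGO uses, so this is an assumption, and the function name and example numbers are hypothetical. Each participant trains locally and sends only model weights and a sample count; the raw data never leaves its premises.

```python
def federated_average(client_updates):
    """Combine local model weights into global weights (FedAvg-style).

    `client_updates` is a list of (weights, n_samples) pairs, where `weights`
    is a flat list of floats produced by local training. Each client's
    contribution is weighted by the number of samples it trained on.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical participants: one with 100 local samples, one with 300.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(updates)
```

In this example the second participant holds three times as much data, so the global weights land three quarters of the way towards its local weights.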
WP6 is responsible for defining a governance model for the FCT Research Data Ecosystem. To this end, a trust establishment mechanism based on the Verifiable Credentials standard has been adopted to accredit RDE participants in a trusted way, with defined roles and responsibilities. To enable interoperability, semantic harmonization is ongoing, with the aim of defining a LAGO vocabulary to be used as a reference for modelling concepts and metadata related to the RDE processes. Relevant existing ontologies have been identified, and their concepts will be incorporated into the LAGO vocabulary to foster the reuse of open standards. In addition, the proposed governance framework allows participants to define their own licenses, so that data providers can set the conditions under which their data will be used and enforce a usage agreement between the parties before the data is transferred.
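The accreditation and licensing rules described above can be pictured as a gate that runs before any data transfer. The sketch below is only illustrative: the credential layout loosely follows the W3C Verifiable Credentials Data Model, but the `role` claim, the issuer identifier, the license identifiers and both function names are hypothetical, and the cryptographic proof verification that a real deployment requires is deliberately omitted.

```python
from datetime import datetime, timezone

# Hypothetical identifier of the authority that accredits RDE participants.
TRUSTED_ISSUERS = {"did:example:rde-authority"}


def is_accredited(credential, required_role, now):
    """Check issuer, expiry and role of an already-verified credential.

    Assumption: the signature/proof was verified elsewhere; this sketch only
    inspects the claims.
    """
    if credential.get("issuer") not in TRUSTED_ISSUERS:
        return False
    valid_until = credential.get("validUntil")
    if valid_until is not None and datetime.fromisoformat(valid_until) <= now:
        return False
    subject = credential.get("credentialSubject", {})
    return subject.get("role") == required_role


def may_transfer(dataset, credential, accepted_licenses, now):
    """Allow transfer only to accredited researchers who accepted the license."""
    return (
        is_accredited(credential, "researcher", now)
        and dataset["license_id"] in accepted_licenses
    )


# Hypothetical credential and dataset, for illustration only.
credential = {
    "issuer": "did:example:rde-authority",
    "validUntil": "2030-01-01T00:00:00+00:00",
    "credentialSubject": {"id": "did:example:alice", "role": "researcher"},
}
dataset = {"name": "botnet-traffic-sample", "license_id": "provider-license-1"}
check_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

allowed = may_transfer(dataset, credential, {"provider-license-1"}, check_time)
refused = may_transfer(dataset, credential, set(), check_time)
```

Keeping the license check separate from the accreditation check mirrors the text: accreditation establishes who the participant is, while the usage agreement is specific to each dataset and provider.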
WP7 addressed the definition of demonstration scenarios and the planning of demonstration rounds. Test scenarios have been derived from use cases and divided into unit test scenarios and system scenarios.
WP8 deals with the planning and implementation of dissemination and communication activities, the preparation of an exploitation plan, community building, and training activities.
Finally, WP9 activities focused on the fulfilment of the four Ethics Requirements laid out by the European Commission.
LAGO is producing a range of solutions, each addressing a key challenge related to data access in the FCT domain. One innovation improves Human Attribute Segmentation through adaptive techniques that perform well across varied contexts, improving data processing efficiency in the limited-access scenarios typical of FCT. In cybersecurity, LAGO introduces data generation methodologies to combat botnets, leveraging generative models to create realistic, high-quality samples for training machine learning models under data constraints. This approach is complemented by another technique that uses generative methods to embed and detect malware in digital images, strengthening cyber defence mechanisms against digital terrorism. In computer vision, the project tackles catastrophic forgetting in incremental learning, employing federated learning and advanced knowledge transfer methods to ensure robust performance in distributed and data-sensitive environments. Furthermore, the project addresses the efficient training of visual models under data scarcity and distribution challenges, combining self-supervised and supervised learning within a federated framework.