
Machine learning to augment shared knowledge in federated privacy-preserving scenarios

Periodic Reporting for period 2 - MUSKETEER (Machine learning to augment shared knowledge in federated privacy-preserving scenarios)

Reporting period: 2020-06-01 to 2021-11-30

The massive increase in data collected and stored worldwide calls for new ways to preserve privacy while still allowing data to be shared among multiple data owners in a way that respects their sovereignty.

Today, the lack of usable, trusted and secure environments for data sharing inhibits the data economy, while concerns about legality, privacy, trustworthiness, data value and confidentiality hamper the free flow of data.

MUSKETEER aims to create a validated, federated, privacy-preserving machine learning Industrial Data Platform (IDP) that is inter-operable, scalable and efficient enough to be deployed in real use cases.

MUSKETEER aims to alleviate data sharing barriers by providing secure, scalable and privacy-preserving analytics over decentralized datasets, using machine learning and building on IDSA concepts (architecture model and components). An initial set of privacy-preserving machine learning algorithms for regression, classification and clustering problems will be provided, and the platform will be flexible enough to accept new algorithmic implementations.

Data can continue to be stored in different locations with different privacy constraints, but shared securely. The MUSKETEER cross-domain platform will validate progress in the industrial scenarios of smart manufacturing and health.
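
As an illustration only, the idea of learning from data that stays with its owners can be sketched with a toy federated averaging loop. This is a generic textbook-style sketch and not MUSKETEER code: the function names, the one-parameter linear model and the sample data are all assumptions made for the example, and no privacy protection beyond keeping raw data local is shown.

```python
# Hypothetical sketch of federated averaging; not taken from the MUSKETEER libraries.

def local_update(weight, data, lr=0.1):
    """Run one pass of gradient descent for the model y = weight * x on local data."""
    w = weight
    for x, y in data:
        grad = 2 * (w * x - y) * x  # derivative of the squared error (w*x - y)^2
        w -= lr * grad
    return w


def federated_average(local_weights, sample_counts):
    """Combine local weights, weighting each owner by its number of samples."""
    total = sum(sample_counts)
    return sum(w * n for w, n in zip(local_weights, sample_counts)) / total


def train(owners, rounds=50):
    """Only model weights travel to the aggregator; raw data stays with each owner."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(w, data) for data in owners]
        w = federated_average(updates, [len(data) for data in owners])
    return w


# Two hypothetical data owners whose private samples all follow y = 2 * x.
owners = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]
global_w = train(owners)
print(round(global_w, 6))
```

In this sketch the aggregator never sees the `(x, y)` pairs, only the locally updated weight from each owner, which is the basic property that federated approaches rely on.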



MUSKETEER pursues five objectives:

1. Machine learning across a wide variety of privacy-preserving scenarios.

2. Providing robustness against external and internal threats.

3. Enhancement of the Data Economy.

4. Providing a standardized and extensible architecture.

5. Industrial demonstration of the technology advances in operational environments.

The work carried out during the reporting period has been consistent with the schedule. During RP1 we had two milestones:

MS1 Industrial, technical and legal requirements for the MUSKETEER platform.
MS2 Architecture final design of the MUSKETEER platform.

The achievement of these two milestones contributed to the development of a first prototype of the MUSKETEER platform, which is able to host a wide variety of machine learning algorithms across a range of privacy-preserving scenarios (from POM1 to POM6). This initial prototype was implemented after a detailed analysis of the design needed to cater for the different technical and end-user requirements. A careful analysis of compliance with the legal and confidentiality restrictions of most industrial scenarios has also been conducted.

Work during RP2 was likewise consistent with the schedule. During RP2 we had five milestones:

MS3 MUSKETEER Platform development
MS4 Final machine learning algorithms
MS5 MUSKETEER Platform assessment
MS6 Pilot implementation, execution and validation
MS7 Exploitation and commercialization plan

The achievement of the remaining milestones drove the completion of the final prototype of the MUSKETEER platform, which is able to host a wide variety of machine learning algorithms across a range of privacy-preserving scenarios (from POM1 to POM8). The implementation of this final prototype enabled us to proceed with a complete assessment of the platform and with the implementation, execution and validation of the pilots. Finally, the second period enabled us to complete the exploitation and commercialization plan for the project.


MUSKETEER project publications are presented in the following paragraphs: first the scientific publications (JCR journals, conference publications, book chapters), then more technical publications in the form of open repositories on GitHub, and finally external presentations in the media (newspapers, fairs or blog platforms). The list covers the publications available at the end of the project.

Scientific publications. Eight scientific publications have already been accepted, three more are under review, and further papers collecting recent results are being prepared. We are confident that more will follow and contribute to the project’s impact. MUSKETEER’s results have been submitted to and presented at major conferences (International Conference on Machine Learning - ICML, International Conference on Learning Representations - ICLR, European Symposium on Artificial Neural Networks - ESANN) and journals (Journal of Neurocomputing, ACM Transactions on Intelligent Systems and Technology, IEEE Transactions on Parallel and Distributed Systems, Computerrecht). In addition, MUSKETEER’s results were accepted in three books published by Springer and led by the Big Data Value Association: “Data Spaces: Design, Deployments, and Future Directions”, “The Elements of Big Data Value” and “Technology and Application for Big Data Value”.

GitHub repositories. Several GitHub repositories have been set up to host the project’s results. Not all repositories are public yet, but the partners have made plans to open them. Those already available include the pycloud messenger from IBM, the client connector from Engineering, and some public libraries related to the MUSKETEER machine learning libraries (MMLL) by Charles III University of Madrid (UC3M) and Tree Technology.

Pycloud messenger from IBM: https://github.com/IBM/pycloudmessenger
Client connector from Engineering: https://github.com/Engineering-Research-and-Development/musketeer-client-connector-backend
MUSKETEER machine learning libraries: https://github.com/orgs/Musketeer-H2020/repositories

Media. The MUSKETEER project appeared in a number of media outlets during the project. The most significant activities are presented below. It was mentioned in a national newspaper in Spain, where it was seen as a potential solution for the rising COVID crisis in 2020. MUSKETEER was also featured at large conferences where partners had the opportunity to present the project extensively, such as the Big Things Data & AI Conference in Madrid. We also created a dedicated blog on Medium, a platform well known for its stories about entrepreneurship and technology, to present MUSKETEER results. Some of our blog posts were featured in larger publications, which increased awareness of the project.

MUSKETEER presentation at the Big Things Data & AI 2019 Conference in Madrid: https://www.youtube.com/watch?v=Pjjd53MwLGA
MUSKETEER as a potential solution for the rising COVID crisis in 2020: https://www.elmundo.es/papel/futuro/2020/07/26/5f19b6edfdddffa0b78b4601.html
MUSKETEER blog: https://h2020musketeer.medium.com/

The first prototype of the MUSKETEER platform offers state-of-the-art integration of privacy-preserving techniques, enabling participants in a data sharing ecosystem to exchange data in a secure way. The project also showed promising results in the domain of data poisoning: the robust aggregation technique developed in some of our activities to defend against poisoning attacks and faulty clients gave promising results when compared with other state-of-the-art robust aggregation methods. The new period will see more scientific achievements as the project progresses, for example from the current work of some partners on the applicability of Federated Learning to some GDPR challenges. This could have an important impact on how this technology helps with the very contemporary issue of privacy.

MUSKETEER topology - M18
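
To give a rough idea of what robust aggregation means in the poisoning context mentioned above, the sketch below contrasts a plain average of client updates with a coordinate-wise median. The coordinate-wise median is a well-known generic robust aggregator chosen here purely for illustration; it is not claimed to be the technique developed in the project, and the function names and sample updates are hypothetical.

```python
# Hypothetical illustration of robust aggregation; not the project's own method.
from statistics import mean, median


def aggregate_mean(updates):
    """Plain averaging: a single extreme update can shift every coordinate."""
    return [mean(column) for column in zip(*updates)]


def aggregate_median(updates):
    """Coordinate-wise median: extreme values in a minority of updates are ignored."""
    return [median(column) for column in zip(*updates)]


# Three honest clients send similar updates; one poisoned client sends an outlier.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
poisoned = [[100.0, -100.0]]
updates = honest + poisoned

mean_result = aggregate_mean(updates)
median_result = aggregate_median(updates)
print("mean:  ", mean_result)
print("median:", median_result)
```

With these sample values the averaged update is pulled far away from the honest clients, while the median stays close to them. Real defences typically have to cope with more subtle attacks than a single obvious outlier.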