Scalable Oblivious Data Analytics

Periodic Reporting for period 2 - SODA (Scalable Oblivious Data Analytics)

Reporting period: 2018-07-01 to 2019-12-31

More and more data is being generated, driving knowledge and value creation across society. Unlocking this potential requires organizations to share data, but data subjects and data controllers are often unwilling to do so. Hence, techniques that protect personal information during data processing and analysis are needed. To address this, the SODA project enabled practical privacy-preserving analytics of information from multiple data assets using multi-party computation (MPC) techniques. With MPC, the data does not need to be shared, only made available for encrypted processing. The main technological challenge is making MPC scale to big data, and here we achieved substantial performance improvements. We also embedded MPC into a comprehensive privacy approach.
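The idea that data can be processed without being shared can be illustrated with additive secret sharing, one of the basic building blocks of MPC. The sketch below is not the SODA or FRESCO protocol; it is a minimal single-process simulation, and the modulus `PRIME` and the function names `share`, `reconstruct`, and `secure_sum` are assumptions made for this example. Each data owner splits its private value into random-looking shares, each party adds the shares it holds, and only the total is reconstructed.

```python
import secrets

# Illustrative field modulus (an assumption for this sketch, not a SODA parameter).
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine additive shares into the underlying value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Simulate n parties summing their inputs without revealing them."""
    n = len(private_inputs)
    # Each data owner splits its input; party j receives the j-th share.
    all_shares = [share(x, n) for x in private_inputs]
    # Each party locally adds the shares it holds (one column per party).
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    # Only the combined partial sums are opened, revealing just the total.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([12, 30, 7]))
```

Any single share is uniformly random on its own, so a party learns nothing about another party's input from the shares it receives. Real frameworks add secure multiplication, comparison, and protection against misbehaving parties on top of this idea.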
Our first objective was to enable MPC for big data applications. We followed a use-case-driven approach, combining expertise from the domains of MPC and data analytics. Our second objective was to combine these improvements with a multidisciplinary approach towards privacy. Enabling differential privacy in the MPC setting ensures that aggregated results do not leak individual personal data. Legal analysis ensured improved compliance with EU data privacy regulation. User studies made data subjects more confident to have their data processed with our techniques. Finally, we validated our approach in a medical demonstrator and in a use case arising from the ICT-14.b data experimentation incubators. The technical innovations were released as open-source improvements to the FRESCO MPC framework.
The project achieved its ambition of enabling practical privacy-preserving analytics on big data and bringing scalable oblivious data analytics closer to reality with secure multi-party computation.
SODA created 6 demonstrators, integrating many of the results from the project. These demonstrate realistic use cases in MPC-based data analytics in healthcare and beyond. They also demonstrate the progress made on MPC performance, MPC-based machine learning algorithms, user studies and legal analysis.
SODA achieved the targets of the project objectives: enable MPC on big volumes / velocities / varieties of data, prevent leakage of undesired sensitive information, improve compliance with the EU GDPR, improve confidence of stakeholders in processing of personal information, and develop technical solutions that solve problems with real-world impact.
Dissemination events highlighted aspects of MPC, the demonstrators, and the legal and user studies. These events successfully reached audiences that are hard to reach via traditional channels.
Exploitation shows a number of highlights and opportunities. The improved FRESCO framework will be the basis for further business creation by Alexandra Institute. A TUE postdoc is launching an MPC start-up. Finally, Philips is prepared to apply MPC in its innovations, research studies and applications.
WP1 “Cryptographic Protocols” produced work of outstanding quality, which is recognized by the international research community: many of the results of this WP have been accepted for publication at top-tier conferences in the field of cryptography.
WP2 “MPC-based analytics on Big Data” made contributions on methods and algorithms. Of these, the work on secure linear algebra, secure comparison, and Conclave represents more fundamental contributions, whereas the work on DNA matching and neural networks is more applied. Significant results include work on the Moore-Penrose pseudo-inverse, machine learning algorithms, and first results on streaming algorithms.
WP3 “Privacy: technical, legal, and user experience” delivered its main results. Progress on technical aspects relates to the state-of-the-art research performed and initial research in combining MPC with differential privacy. D3.1 focuses on the techniques being developed and provides a comprehensive overview of the most relevant legal challenges. Deliverable D3.5 provides an analysis of use-case-specific legal aspects and focuses on multi-party computation as a privacy-preserving tool. D3.4 on leakage control and differential privacy considers leakage of data from computations and how MPC and differential privacy can be combined to ensure privacy, and in some cases even improve performance. D3.2 provides the plan for the user studies and explains the idea behind the methodology. D3.3 presents the results of the user studies on data subjects and data scientists and how to approach aspects like technology, trust and consent.
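To illustrate the differential privacy side of this combination, the following is a minimal sketch of the standard Laplace mechanism applied to a count. It is not taken from D3.4: the function names `laplace_noise` and `dp_count` and the default `sensitivity` of 1 are assumptions for this example, and the noise is added in the clear here, whereas in an MPC deployment it would be added inside the secure computation so that no party sees the exact aggregate.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    # expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng=None, sensitivity=1.0):
    """Release a count under epsilon-differential privacy (Laplace mechanism).

    `sensitivity` is how much one individual can change the count; 1 is the
    usual value for a simple counting query (an assumption of this sketch).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.SystemRandom()
    # Noise scale grows as the privacy budget epsilon shrinks.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Smaller epsilon means stronger privacy and a noisier released value.
    print(dp_count(100, epsilon=0.5))
```

The sketch uses ordinary floating-point sampling, which is adequate for illustration but not hardened against the precision-based attacks that production differential privacy libraries guard against.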
WP4 “Demonstration” completed its challenge, proof-of-concept implementations and demonstrators. New releases of FRESCO incorporate many new MPC techniques and functionality, have improved maturity and performance, and are much easier for developers to use. The first release of MPyC is another significant result that brings MPC closer to non-experts. A number of independent proof-of-concept and prototype implementations were created. Several real-world demonstrators show what has become possible with MPC-based data analytics today.
WP1 has been very successful in improving the state of the art in the field of MPC protocols, which is demonstrated by many scientific publications at top-tier conferences and workshops.
WP2 delivered novel algorithms enabling MPC-based big data analytics, including MPC-based machine learning, a big-data MPC query processing framework, the Moore-Penrose pseudo-inverse, and first results on streaming algorithms.
WP3 delivered a number of first-of-a-kind results. It provided an analysis of use-case-specific legal aspects, focusing on multi-party computation as a privacy-preserving tool that could result in anonymized data. It analyzed leakage of data from computations and how MPC and differential privacy can be combined to ensure privacy, and in some cases even improve performance as well. It also delivered user studies on data subjects and data scientists, covering how to approach aspects like technology, trust and consent.
WP4 provided frameworks, prototypes and demonstrators. New releases of the FRESCO MPC framework include support for novel methods and protocols like SPDZ2k. The new MPyC framework eases MPC development, which is very beneficial for education and prototyping. Proof-of-concept implementations push the boundary of MPC-based analytics, and a number of them are available as open source. Real-world demonstrators show a broad range of MPC applications: KPIs and benchmarking in a medical setting, Kaplan-Meier survival analysis, logistic regression for chronic heart failure risk, a random forest to predict emergency transport risk in a personal emergency response system, outsourced MPC on obliviously queried databases, and predictive models on HR data using federated learning. These demonstrated the feasibility and applicability of the MPC technology developed in SODA in real-world, privacy-sensitive data analytics situations.
WP5 ensured socio-economic impact through dissemination and exploitation. Dissemination reached novel non-expert audiences and stakeholders through targeted publications, position papers and events. Exploitation highlights include positioning the enterprise to use MPC for AI privacy challenges, a concrete continuation of one of the demonstrators with a commercial company, and a postdoc pursuing an MPC start-up.