
PlAtform for PrivAcY preserving data Analytics

Periodic Reporting for period 2 - PAPAYA (PlAtform for PrivAcY preserving data Analytics)

Reporting period: 2019-09-01 to 2021-07-31

Big Data technologies allow businesses to share and analyze large amounts of data in order to improve their services. However, the growing use of such data raises serious privacy concerns. While the General Data Protection Regulation (GDPR) helps protect individuals' privacy rights, complying with it has created serious challenges for businesses. Unfortunately, traditional privacy solutions are not compatible with the underlying data analytics technology.
The main goal of PAPAYA is to develop a platform for privacy-preserving (pp) analytics. The project pursues the following objectives:
- Design efficient pp data analytics techniques;
- Explore different settings involving one or more data sources and third-party queriers;
- Enable risk management and user control of data disclosure;
- Design and develop an integrated platform;
- Lead an end-to-end analysis for different use cases;
- Disseminate and exploit PAPAYA results to maximize the visibility and sustainability of the project outcomes.
The project, which concluded on July 31st, 2021, developed the PAPAYA framework. The framework executes data analytics operations without disclosing the underlying data and, whenever possible, gives data subjects some control over these operations. The platform brings together the following services and tools:
- 4 pp data analytics modules: neural networks (classification and training), trajectory clustering, counting and basic statistics. These modules leverage cryptographic and privacy-enhancing techniques such as homomorphic encryption, secure multi-party computation, differential privacy or functional encryption;
- security and transparency services, including the identity and access management (IAM) service, auditing services and the key management service;
- the Platform and Agent dashboards for configuration, monitoring and visualization;
- data subject tools that (i) present risk management artefacts, (ii) illustrate the PAPAYA pp analytics, and (iii) enable data subjects to express their privacy preferences and exercise their rights.
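The counting and basic statistics modules rely on techniques such as secure multi-party computation, and the platform targets settings with several data sources. As a generic illustration only (this summary does not describe PAPAYA's actual protocol), the sketch below shows additive secret sharing: each data source splits its local count into random shares modulo a public modulus, so that no single server learns an individual count, while the servers' combined partial sums reveal the total. The modulus, the number of servers and the names `share_count` and `aggregate` are hypothetical.

```python
import secrets

# Hypothetical public modulus; it must exceed the largest possible total count.
MODULUS = 2**61 - 1


def share_count(count: int, n_servers: int) -> list[int]:
    """Split a local count into n_servers additive shares modulo MODULUS.

    Any n_servers - 1 of the shares are uniformly random and reveal nothing
    about the count; only all shares together sum to it.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    last = (count - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(per_source_shares: list[list[int]]) -> int:
    """Each server sums the shares it received; combining the partial sums gives the total."""
    n_servers = len(per_source_shares[0])
    # Server j only ever sees column j: one random-looking share per data source.
    partial_sums = [
        sum(shares[j] for shares in per_source_shares) % MODULUS
        for j in range(n_servers)
    ]
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    local_counts = [12, 7, 30]  # hypothetical counts held by three data sources
    all_shares = [share_count(c, n_servers=3) for c in local_counts]
    print(aggregate(all_shares))  # prints 49
```

Only the total is reconstructed. A deployment would also need authenticated channels and protection against colluding servers, which this sketch does not model.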
The platform was demonstrated through 5 use cases (UCs) grouped into 2 families (healthcare and telecom UCs):
- The arrhythmia detection UC uses the platform to run pp neural network classification over a patient's ECG data and determine the patient's type of arrhythmia;
- The stress detection UC implements pp collaborative training in which each data source keeps its own private health dataset, and workers' stress conditions are detected automatically;
- The mobility analytics UC allows stakeholders running the PAPAYA platform to measure audiences in given areas or extract mobility patterns in a pp manner;
- The mobile usage analytics UC extracts statistics on individuals' mobile phone usage through pp counting;
- The threat detection UC executes pp neural networks to detect system threats originating from several sources.
The project identified 12 exploitable assets that can be grouped into three categories:
- The platform for pp data analytics;
- The individual modules: 4 pp analytics and 2 GDPR compliance modules;
- The 5 UCs of the project.
One patent related to pp mobility analytics is under submission.
The project produced 21 publications and participated in various events where its results were presented and demonstrated.
The work carried out in this project is summarized as follows:
• All pp analytics modules, transparency tools and data subject tools have been developed and integrated into the platform. Their specifications can be found in the project deliverables. The consortium has also produced a guide for the platform.
• The project's 5 UCs (see previous paragraph) were validated through practical demonstrations (videos of the demonstrations can be found on PAPAYA's website). These UCs are also identified as innovation assets. 2 additional UCs related to the COVID-19 pandemic were identified:
- pp contact tracing: comparing the MAC addresses that connect to the hotspot of a public place, such as a restaurant, with those of individuals registered as COVID-19 positive may help notify the relevant individuals that they need to be tested. This UC could take advantage of PAPAYA's privacy-preserving counting module;
- telemonitoring of patients at home: the goal of this UC is to monitor non-critical patients at home in a pp manner. Since it has been shown that COVID-19 may aggravate arrhythmia, we believe that PAPAYA's pp arrhythmia detection tool could be suitable. In addition, integrating components of the H2020 PoSeID-on project could help verify that proper consent has been obtained from the data subject.
• The consortium identified the following 12 assets:
- Platform for pp analytics
- Pp analytics modules
- Pp arrhythmia classifier
- Pp collaborative training
- Pp analytics
- Compliance toolbox
- Privacy engine
- Arrhythmia detection tool
- Stress management tool
- Pp mobility analytics service
- We-Stat - Pp mobile usage statistics
- Threat detection for sensitive data tool
These derive either from individual pp modules or from the project UCs, which combine several components. In addition, a patent from Orange related to the pp mobility analytics UC is currently under submission. The project has also identified potential compliance activities with respect to existing standards (such as ISO12485) for the stress management, arrhythmia detection and threat detection tools.
• The project's website gathers information on the project results, including all public deliverables and the UC demonstrations. The project produced 21 publications, 2 of which received an award. A PAPAYA business workshop was organized on March 15th, 2021, with participation from various stakeholders, including PAPAYA's industrial partners. PAPAYA members also actively participated in relevant events, including the EU GDPR cluster, and collaborated with other EU projects such as PoSeID-on, DEFEND and PROMETHEUS.
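The pp contact tracing UC listed above amounts to counting how many MAC addresses seen at a venue also appear among those registered as COVID-19 positive, without either party disclosing its full list. One generic way to do this, not necessarily the approach of PAPAYA's counting module, is Diffie-Hellman-style private set intersection cardinality: each party raises hashed identifiers to a secret exponent, and because exponentiation commutes, doubly blinded values coincide exactly when the underlying identifiers match. The prime, the hash-to-group mapping and all function names below are hypothetical toy choices that are not secure for real use, and the function simulates both parties in a single process.

```python
import hashlib
import math
import secrets

# Toy public prime modulus (the Mersenne prime 2^127 - 1); NOT a secure choice.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an identifier (e.g. a MAC address) to a non-zero element of Z_P^*."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def random_exponent() -> int:
    """Pick a secret exponent coprime to P - 1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values: list[int], k: int) -> list[int]:
    """Raise every value to the secret exponent k modulo P."""
    return [pow(v, k, P) for v in values]


def intersection_size(venue_macs: list[str], positive_macs: list[str]) -> int:
    """Simulate both parties and return only the size of the intersection."""
    a = random_exponent()  # venue's secret exponent
    b = random_exponent()  # health authority's secret exponent
    # Venue sends H(x)^a; the authority raises each value to b.
    venue_double = set(blind(blind([hash_to_group(m) for m in venue_macs], a), b))
    # Authority sends H(y)^b; the venue raises each value to a.
    positive_double = set(blind(blind([hash_to_group(m) for m in positive_macs], b), a))
    # (H^a)^b == (H^b)^a mod P, so matching identifiers give equal values.
    return len(venue_double & positive_double)


if __name__ == "__main__":
    venue = ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03"]
    positives = ["aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03", "de:ad:be:ef:00:00"]
    print(intersection_size(venue, positives))  # prints 2
```

In a real protocol, the blinded lists would also be shuffled before being returned and a standardized elliptic-curve group would replace the toy prime. The sketch only demonstrates the commutative-blinding idea behind counting matches.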
In PAPAYA, we developed innovative pp analytics modules that are tailored to the underlying application and therefore address the trade-off between privacy, accuracy and performance. These solutions use advanced cryptographic techniques and cover several settings (single data source, multiple data sources, third-party queriers).

PAPAYA has also developed data subject tools to increase transparency for data subjects. These consist of user interfaces (i) presenting risk management artefacts for assessing the privacy risks of pp data analytics, and (ii) illustrating the PAPAYA pp data analytics modules. The platform also includes the Privacy Engine, which helps data subjects gain more control over their privacy through the Privacy Preference Manager and the Data Subjects' Right Manager.

We envision that the PAPAYA platform and the individual pp data analytics modules will help businesses process data for decision making and prediction while applying appropriate safeguards to protect users' privacy. Thanks to this technology, European companies will be equipped with advanced privacy solutions. The validation of the 5 UCs demonstrated that integration with the PAPAYA framework and technologies is accessible and easy to perform, even with pre-existing. Furthermore, these UCs show that PAPAYA solutions allow companies to enrich their offer with pp solutions in a way that (i) requires low effort and (ii) guarantees a high impact on the perceived level of data protection as well as high usability.
Figures: PAPAYA privacy-preserving analytics and Use Cases; Overview of PAPAYA assets; PAPAYA platform architecture.