PlAtform for PrivAcY preserving data Analytics

Periodic Reporting for period 1 - PAPAYA (PlAtform for PrivAcY preserving data Analytics)

Reporting period: 2018-05-01 to 2019-08-31

Big Data technology allows large amounts of data to be shared and quickly analysed, helping businesses better understand their customers and improve their services. The growing use of such data, which usually contain sensitive information, raises serious privacy concerns. While the General Data Protection Regulation (GDPR) helps protect and ensure the privacy rights of individuals, complying with it has created serious challenges for businesses. Traditional privacy solutions such as standardized encryption techniques are unfortunately not compatible with the underlying data analytics technology. Thus, there is an urgent need for novel privacy preserving data analytics techniques that enable companies to operate on protected data so as to ensure their clients’ privacy while keeping data meaningful.
Privacy and data protection are fundamental rights enshrined in the Charter of Fundamental Rights of the European Union, and are a cornerstone of a functioning society. Data analytics promises to provide better insights as the technologies develop, but also threatens the privacy of those whose personal data are subject to analysis. There is a strong need to enforce the principle of Privacy by Design and by Default and to achieve compliance with the GDPR. Transparency is also a prerequisite for data subjects to control their personal spheres and thus to exercise their right to informational self-determination.
The main objective of PAPAYA is to develop a platform of privacy preserving analytics modules that operate while ensuring data privacy. Data analytics operations range from simple statistics (e.g. counting) to more sophisticated techniques (e.g. neural networks). The PAPAYA platform also makes it easy for end users to control the use of their data and to exercise their rights as data subjects.
In more detail, the project pursues the following objectives:
- Design efficient privacy-preserving data analytics techniques;
- Explore different privacy and restricted access settings involving one or more data sources and third-party queriers;
- Enable risk management and user control of data disclosure;
- Design and develop an integrated platform;
- Lead an end-to-end analysis for different use cases;
- Disseminate and exploit PAPAYA results to ensure and maximize the visibility and sustainability of the project outcomes.
The work carried out during the first 16 months can be summarized as follows:
• The project has identified 5 use cases illustrating different analytics operations and privacy settings. They are grouped under 2 umbrellas: analytics for healthcare and analytics for telecom (see deliverable D2.1). Based on these use cases, legal, end-user and platform requirements are reported in deliverable D2.2.
• Initial privacy preserving analytics solutions and transparency tools have been designed. Both the literature review and the description of new privacy preserving analytics solutions are reported in deliverable D3.1. Among these solutions, 4 privacy preserving neural network classification primitives are proposed. Each of them is based on a different cryptographic tool (such as homomorphic encryption or secure two-party computation). Moreover, a privacy preserving collaborative training solution based on differential privacy has been developed. The problems of privacy preserving counting and clustering are also being investigated.
• The main components of the architecture of the PAPAYA platform have been defined (see deliverable D4.1). Some run in the cloud environment (such as the privacy preserving data analytics or auditing services), and others on the client side (e.g. the data subject toolbox). The PAPAYA dashboards consist of 2 independent dashboard components and a data subject toolbox. The independent dashboards target platform administrators and users of the platform (data controllers and/or data processors), providing, e.g., auditing views and basic management operations. The data subject toolbox is used by the user of the platform in her data subject facing applications to provide functionality that enables data subjects to exercise control over their personal data.
• Regarding activities related to innovation management, the innovation strategy of the project is reported in deliverable D1.2. The first study conducted on marketability aspects already allowed for identifying key features and components in the PAPAYA ecosystem. PAPAYA's relevant assets have been defined through the marketability questionnaires. The consortium identified 10 assets. The interdependencies between these and the marketability strategy for each of them have been defined.
• The dissemination activities consist of the website (see deliverable D6.1), the set-up and maintenance of social media accounts, and the publication of flyers (see deliverables D6.2 and D6.3). The project also produced scientific publications, organized different workshops, actively participated in relevant events including the EU GDPR cluster, and collaborated with other EU projects such as PoseID-on or DEFEND.
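To make the notions of differential privacy and private counting mentioned above more concrete, the following is a minimal, generic sketch of a differentially private count using the Laplace mechanism. It is a textbook illustration only and is not taken from the PAPAYA deliverables; the function name `dp_count`, the sample records and the choice of epsilon are assumptions made for this example.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so Laplace noise of scale 1/epsilon gives
    epsilon-differential privacy. Hypothetical helper, for illustration.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent exponential samples with rate
    # epsilon follows a Laplace distribution with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Made-up heart-rate records, used only to exercise the function.
    records = [{"hr": 50}, {"hr": 120}, {"hr": 130}, {"hr": 140}]
    print(dp_count(records, lambda r: r["hr"] > 100, epsilon=0.5))
```

A smaller epsilon adds more noise and therefore gives stronger privacy at the cost of accuracy, which is one instance of the privacy/accuracy trade-off that such techniques have to balance.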
We have conducted an extensive literature review covering the data analytics techniques targeted by PAPAYA. For example, existing privacy preserving neural network solutions either are computationally prohibitive or decrease the accuracy of the model. In PAPAYA, we aim at developing solutions that are customized for the underlying application and therefore address the trade-off between privacy, accuracy and performance. Moreover, while existing privacy solutions mostly consider the case with a single data source, PAPAYA also considers the case with multiple independent data sources and is developing collaborative training techniques based on differential privacy. Finally, while existing solutions use either homomorphic encryption (HE) or secure multi-party computation (MPC), PAPAYA investigates additional cryptographic techniques such as functional encryption or encrypted Bloom filters.
PAPAYA also aims at developing supporting technologies that make the privacy preserving data analytics transparent to data subjects. We plan to improve the usability of the CNIL’s PIA tool and some of its artefacts, in particular for explaining the assessed risks associated with the use of personal data. Regarding explanations of how PAPAYA modules work, the literature review we performed found only limited examples and little concrete work in the area. The technical design of the mobile user interface (UI) focuses on creating simple, standalone UI views that can easily be layered and composed as part of integration into existing apps. Also, the Privacy Engine (PE) will ensure that data subjects’ preferences are adhered to.
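As a rough illustration of the multi-source setting and of the general idea behind MPC, the sketch below lets several independent data sources compute the sum of their private values using additive secret sharing. It is a generic textbook construction, not the protocol designed in PAPAYA; the names `share` and `secure_sum`, the modulus and the simulation of all parties inside one process are assumptions made for this example.

```python
import secrets

# Assumed public modulus; all arithmetic on shares is done modulo this value.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate n data sources jointly computing the sum of their inputs.

    Each source sends one share to every party, so no single party sees
    another source's input, only uniformly random-looking shares.
    """
    n = len(private_values)
    all_shares = [share(value, n) for value in private_values]
    # Party j adds up the j-th share received from every source.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    # Only the partial sums are published and combined into the result.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Made-up per-source counts, e.g. local numbers of matching records.
    print(secure_sum([12, 30, 7]))
```

In a real deployment the shares would travel over secure channels between separate machines, and the result is only correct as long as the true sum stays below the modulus.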
We envision that the PAPAYA platform and the individual privacy preserving data analytics modules will help businesses process data for their business decision making and predictions while applying the appropriate safeguards to protect users’ privacy. Thanks to this technology, European companies will be equipped with advanced privacy solutions. At this stage, the following 10 innovation assets have been identified and their development will continuously increase the impact of PAPAYA:
- Privacy-preserving analytics platform
- Compliance tools
- Privacy engine
- Privacy preserving analytics primitives
- Privacy preserving arrhythmia classifier
- Arrhythmia detection tool
- Stress management tool
- Mobile usage statistics tool
- Mobile patterns analytics tool
- Threat detection for sensitive data tool
Overview of PAPAYA assets