Data breaches have made consumers increasingly wary of personal data safety on cloud servers. With the General Data Protection Regulation (GDPR) now in place, the PAPAYA (PlAtform for PrivAcY preserving data Analytics) project aims to strike a delicate balance between privacy and valuable data analytics. Its technology is being tested in five real use cases, ranging from heart arrhythmia detection to mobile phone usage analytics. Melek Önen, associate professor at EURECOM’s Digital Security Department and PAPAYA coordinator, discusses the project’s ambitions and achievements so far.
What gaps in data privacy do you aim to close with this project?
Melek Önen: The PAPAYA project aims to address the data privacy concerns that arise when data analytics are outsourced to powerful but untrusted cloud servers. Data analytics can help stakeholders leverage collected data to derive relevant information and make better decisions. For example, a healthcare agency can use data analytics to predict or detect the risk of pandemics. Data analytics can also help marketing or commercial companies in their decision-making. But there is a key issue. Despite all their value for the entities collecting them, data sets also contain highly sensitive information on the individuals from whom the data is collected. Data confidentiality and data subjects’ privacy really are in jeopardy. By adopting a privacy-by-design approach, our project aims to devise and develop a platform of privacy-preserving modules that protects the privacy of users on an end-to-end basis without sacrificing data analytics functionalities.
How do you explain the lack of existing measures that strike such a balance?
Society is facing ever-increasing data breaches causing serious damage. Many individuals have lost confidence in organisations’ data security solutions and are more and more concerned about the safety of their personal information. The European General Data Protection Regulation (GDPR) can reverse this trend, but it also means that companies are now looking for secure data handling practices. There is, now more than ever, a need for privacy-preserving data analytics that enable companies to operate on protected data, ensure their clients’ privacy and keep that data meaningful and useful. The usual data protection techniques (namely, standard encryption techniques such as AES) are unfortunately not suitable for this new context as they prevent third-party servers from operating over the encrypted data. Data owners would instead need to first download the encrypted data, decrypt it and execute operations on the cleartext data. This is not possible when the data owner does not have the computational resources to perform operations on such a high volume of data, or when the algorithm to be executed is owned by the third-party server. One solution would be to provide the third-party server with the key to decrypt the data, but then confidentiality could no longer be ensured.
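To make the contrast with standard encryption concrete, the sketch below shows the general idea of computing on encrypted data using a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is an illustration only and is not taken from PAPAYA's code: the function names, the sample readings and the small primes are assumptions chosen for readability, and key sizes this small offer no real security. It requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1  # random value in [1, n - 1]
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def add_encrypted(n, c1, c2):
    """Server-side step: multiplying ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, c):
    """Recover the plaintext using the private key (lam, mu)."""
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


# Hypothetical scenario: a client encrypts three readings, an untrusted
# server adds them up without the key, and the client decrypts the total.
public_n, private_key = keygen(2**31 - 1, 2**61 - 1)  # toy-sized primes
readings = [72, 65, 80]
ciphertexts = [encrypt(public_n, m) for m in readings]

encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(public_n, encrypted_total, c)

print(decrypt(public_n, private_key, encrypted_total))  # 217
```

The server only ever handles `public_n` and ciphertexts, so it never needs the decryption key. That is the property that a block cipher such as AES does not provide.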
How does your approach help overcome all of these problems?
PAPAYA develops privacy-enhancing technologies enabling protected data analytics. These analytics range from simple statistical operations to more sophisticated machine learning techniques such as neural networks. They provide significant protections to stakeholders whose data is being processed, while preserving the utility of the data for data holders and data controllers. Our solution is in line with the data protection by design required under the GDPR. The project also develops specific tools that ease legal compliance with the GDPR and related privacy and data protection legislation for organisations using privacy-preserving analytics. The tools focus on the rights of people whose personal data is being processed – referred to as ‘data subjects’ in the GDPR.
How does your platform work exactly?
The PAPAYA framework revolves around two main groups of components: the platform-side components, which run on the untrusted cloud server, and the client-side components, which run in a trusted client environment (such as a smartphone). The platform brings together privacy-preserving analytics modules for the following operations: neural network classification, collaborative neural network training, trajectory clustering, and basic statistics. At a high level, platform clients – namely stakeholders – send queries asking for the requested analytics to be performed in a privacy-preserving manner and receive the corresponding output without any privacy-sensitive information being leaked. The framework also includes a data subject toolbox. It provides versatile tools that help platform clients implement data protection by design for the data subjects whose personal data is processed in their services. For example, data subjects can receive more information on the underlying privacy-preserving analytics service or on the disclosure of their data.
Could you provide some concrete examples of use cases?
PAPAYA defines five use cases, each of them targeting a different setting. One use case targeting healthcare applications (led by MediaClinics Italia, an Italian SME) consists of heart arrhythmia detection in a privacy-preserving manner. Under this use case, sensitive health data in the form of electrocardiograms (ECGs) is collected from a patient. The PAPAYA platform detects arrhythmia by using neural networks, without having access to these ECGs. Another use case targeting telecom operators (led by Orange, the French telecommunications company) helps stakeholders extract mobility patterns using trajectory clustering algorithms, without identifying any individual trajectory.
What would you say are the project’s most important achievements so far?
The project has developed privacy-preserving variants of a group of four analytics, namely neural networks (classification, collaborative training), trajectory clustering, counting and basic statistics. These modules use different advanced cryptographic tools such as homomorphic encryption, differential privacy or functional encryption. Additionally, various user interfaces (UIs) have been developed to enhance transparency for data subjects and other stakeholders. These include an extension of the CNIL’s Privacy Impact Assessment (PIA) tool, which helps PAPAYA stakeholders assess the impact of privacy-preserving analytics on privacy and security goals. The tool is also much more transparent for data subjects. Our UIs explain how PAPAYA privacy-preserving analytics work, and our privacy engine tool takes data subjects’ privacy preferences and rights into account.
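As an illustration of one of the tools named above, the sketch below applies differential privacy to a simple counting query using the standard Laplace mechanism. It is a minimal example under stated assumptions and not PAPAYA's implementation: the `private_count` function, the sample records and the choice of `epsilon` are all hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a differentially private count of records matching predicate.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two exponential samples with rate epsilon follows
    # a Laplace distribution with mean 0 and scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical data: daily phone usage in minutes for six users.
records = [{"minutes": m} for m in [10, 200, 35, 400, 90, 150]]


def heavy_user(record):
    return record["minutes"] > 120


# The exact answer is 3; the released value is 3 plus random noise.
print(private_count(records, heavy_user, epsilon=1.0))
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy, which is the same trade-off between protection and utility discussed throughout this interview.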
What do you still need to achieve?
The project is now in its validation phase. Our goal is to set up prototypes demonstrating the five use cases, and to produce a platform guide that will help users operate the platform easily.
Keywords: PAPAYA, data analytics, cloud, GDPR, arrhythmia, telecom