CORDIS - EU research results
Content archived on 2024-06-18

BIOSEC: Security and Privacy in Life Sciences Data

Final Report Summary - BIOSEC (BIOSEC: Security and Privacy in Life Sciences Data)

Project overview

There is an abundance of data associated with life sciences, and it is growing at an exponential rate. There are now over a thousand different life science databases with hundreds of gigabytes of content, ranging from gene-property data for different organisms to brain image data. Such databases are distributed over the web, are highly heterogeneous, and vary in quality. Typically, research in life sciences requires combining various sources of data and processing them with bioinformatics tools deployed on powerful and efficient platforms, so that patterns, similarities, and unusual occurrences can be observed. Sharing data among institutions and researchers is therefore essential for life sciences. However, this gives rise to many security and privacy issues that need to be addressed at the lowest level, that of the data itself. This constitutes the principal goal of BIOSEC.

BIOSEC is interdisciplinary in nature, as it lies at the intersection of two research realms: life sciences and data management. The common denominator is data, which we study from a security viewpoint. The objectives of the project can be summarised as follows:
1. Performing a critical review of state-of-the-art techniques for ensuring security and privacy in databases.
2. Studying the data management requirements for various life sciences applications.
3. Analysing security and privacy threats particular to data management in life sciences applications.
4. Introducing novel or adapting existing techniques for addressing the concerns raised.
5. Applying these techniques in real world applications.

Work performed and results achieved

All five phases of BIOSEC have concluded successfully. The first phase involved a thorough review of the state-of-the-art techniques for ensuring security and privacy in databases. The main goal was to attain an in-depth understanding of the techniques proposed in the literature and to analyse their strengths and weaknesses. In particular, the study proceeded according to two work packages (W). In (W1), privacy concerns in publicly released data were examined, while in (W2), security issues related to data integrity were studied. The methodology included reviewing the related literature from premier venues in the areas of data management and security theory. The first deliverable of BIOSEC (D1) summarises the work performed during this phase.

The second phase continued the literature review, focusing this time on the data management requirements of life sciences applications. The main goal was to acquire a deep understanding of the requirements of these applications and of the operations they perform at the data management level. The study advanced in two work packages. The first (W3) studied the dissemination of health sciences records, while the second (W4) revolved around data sharing in large biological repositories, such as nucleotide and protein databases. Both work packages studied the data cycles involved in the two life sciences applications. In particular, BIOSEC examined how data are (1) produced and collected by experiments, simulations, etc., (2) processed, cleaned, transformed, etc., (3) modelled, normalised and stored, (4) curated and annotated, (5) transferred among public repositories, (6) retrieved by query languages, and (7) analysed by complex and expensive mining computations. The second deliverable of BIOSEC (D2) includes an in-depth analysis of the data cycles in some common life sciences applications.

The third phase combined and extended the research results of the previous two phases. Specifically, it analysed security and privacy threats during data management in life sciences applications. The goal was to identify overlooked scenarios of malicious attacks. The investigation proceeded according to two work packages. The first (W5) examined threats in de-identifying medical records, while the second (W6) targeted attack scenarios that can extract sensitive information from biological databases. The third deliverable of BIOSEC (D3) is a technical report describing all security and privacy concerns at the data management and dissemination level for the examined life sciences applications.

The fourth phase is the most important to BIOSEC. It introduced novel techniques for addressing security and privacy issues at the data management and dissemination level in life sciences applications. The goal was to provide a framework that addresses the most important issues and threats revealed by the previous phases. Research in this phase was divided between two work packages. In the first (W7), privacy-related threats were addressed, while in the second (W8), secure exchange of data was investigated. To achieve these goals, a data model suitable for the life sciences applications studied was first selected. Then, following this abstraction, the BIOSEC framework was proposed, which builds upon existing work and introduces novel concepts. The fourth deliverable of BIOSEC (D4) is a technical report presenting the work undertaken during this phase, and essentially constitutes the basic foreground knowledge of the BIOSEC project.

The fifth phase was an ongoing phase that ran for almost the entire duration of the project. Its goal was to acquire expertise and obtain feedback from real-world applications, providing food for thought for the design of our approaches. It involved two work packages, (W9) and (W10), which concern close interaction with biologists in academia and industry. Our work during this phase played a significant role in understanding the life cycles of data in life sciences applications, as portrayed in deliverables (D2) and (D3). Furthermore, building upon this close collaboration, the methodology proposed in the fourth phase was evaluated, and new research directions emerged. These findings are discussed in the fifth deliverable of BIOSEC (D5).

Impact

The primary impact of BIOSEC is the proposal of methods for ensuring privacy and security that are custom-tailored to real-world life sciences applications. The fellow has acquired sufficient expertise to: (1) disseminate knowledge about security and privacy implications to life sciences researchers; and (2) provide real-world case studies and detailed security and privacy requirements to data management researchers.

The secondary impact of BIOSEC is the training of the fellow. The outgoing phase has offered great opportunities. In particular, the chance to work in a third country raised the fellow's international profile and provided the means to network and share experience with others in the same field. Participating in a multicultural research team abroad gave the fellow the chance to experience the benefits of cultural diversity. Apart from the research skills acquired from working in a research field that is new to him yet related to his background, the fellow has also greatly benefited from working independently of his advisor and the EU host group. Becoming part of a different group with high research standards was a unique opportunity to establish his scientific independence and gain professional maturity.

Furthermore, during the re-integration phase, the benefits of the cooperation with a third country host are propagated to the return host. The research experience acquired by the fellow is expected to have a long-standing positive effect on the research skills of many other young researchers, and thus to further strengthen the research culture of the return host.

Another important benefit of BIOSEC is the development of the skills the fellow needs to understand and cooperate with scientists outside the computer science world. The return host has also directly benefited from this close collaboration with researchers in health sciences. Furthermore, throughout the project, the fellow assumed the administrative responsibilities required for executing the project in a timely manner. Through the interactions with the research groups in both hosts, the fellow has developed significant experience in addressing real-world needs and providing solutions that have a direct impact on the workflow of institutions. Overall, the BIOSEC project is expected to be a valuable asset to the fellow's future academic career.

Links

Project website: http://www.web.imis.athena-innovation.gr/projects/BIOSEC

Host Institution: http://www.imis.athena-innovation.gr/index.php?lang=en