Periodic Reporting for period 1 - HEAP (Human Exposome Assessment Platform)
Reporting period: 2020-01-01 to 2021-06-30
Coordinated the first phase of EHEN
Formalized the different datasets from the selected cohorts
Improved the wearable sensor prototype to be manufactured and used in the pilot study
Implemented the data management software platform (Hopsworks) based on the DMP
Integrated the ICT components (computational resources and software platform)
Created the first version of the ethical and regulatory framework
Fine-tuned synergies between work packages to carry out the planned tasks
Consolidated the dissemination and education framework
Created synergies with EHEN
The MLC Foundation (MLCF) has developed standards for anonymization that will be implemented in HEAP. In parallel, a legal framework for the consumer data collection platform developed in WP4 has been finalized, and the consumer purchase data solution has been launched. Regarding the consumer cohort, in response to the Covid-19 pandemic, consumer purchase data may be used to analyse changes related to Covid-19 infection as part of a Danish initiative to study the late effects of Covid-19. Legal, ethical and data governance aspects of the innovative informatics platform being developed in HEAP have been discussed, and proposals for developing this governance have been refined. MLCF's contribution to the HEAP project and to EHEN opens new opportunities for MLCF to explore and generate new knowledge on ethical and regulatory issues in exposome research, which can be fed back into collaborations with other projects or enterprises working on exposome-related projects.
Selected pipelines for metagenomics analysis have been evaluated and compared in terms of result quality and performance (analysis time), and are being used to generate data for the machine learning analyses. The deep learning algorithms generated will improve the metagenomics pipelines. A collaboration with the University of Eastern Finland has been initiated to implement machine learning algorithms for finding relationships between cancer types and metagenomics profiling, using the TCGA database and pipelines developed in HEAP that will be deployed in customised Hopsworks.
A secure and standardised IaaS for the first version of the HEAP Information Commons has been developed, making it possible to test the storage and sharing of heterogeneous data from the cohorts. The HEAP software platform provides a flexible framework for deep learning, an advantage that will be demonstrated in the coming phases of the project. The HEAP software platform (Hopsworks) simplifies the implementation of machine learning, which can be very attractive to other projects in EHEN once the application of ML to the analysis of heterogeneous data sources has been demonstrated.
In summary, the progress of the HEAP software platform, integrated with the Information Commons (IaaS), points to the possibility of implementing HEAP instances on various infrastructures and of integrating and managing sensitive data securely. At the same time, the generic concept applied to this platform allows new analysis pipelines to be created and analysis tools created by other researchers, institutions, and projects to be reused.