Periodic Reporting for period 3 - HEAP (Human Exposome Assessment Platform)
Reporting period: 2023-01-01 to 2023-12-31
1. Expanding and improving HEAP cohorts data and producing results and publications.
2. Final stage of the wearable sensor pilot study.
3. A new version of the HEAP software platform (Hopsworks) with relevant improvements in functionalities and UI based on HEAP research requirements is deployed at CSC.
4. Population of the Information Commons with cohort data and testing of analysis pipelines.
5. Work on the ethical and regulatory framework produced the first HEAP governance document.
6. Collaboration between WPs to share analysis pipelines and data.
7. Relevant work in the dissemination and education framework for HEAP and EHEN.
8. HEAP - EHEN collaboration in Ethics and Dissemination.
9. Work towards HEAP impact in exposome research and sustainability.
The MLC Foundation (MLCF) has developed standards for anonymization that are being implemented in HEAP. In parallel, a legal framework around the consumer data collection platform developed in WP4 was finalized, and the consumer purchase data solution has been launched. Regarding the consumer cohort, in response to the Covid-19 pandemic, consumer purchase data may be used to analyse changes in relation to Covid-19 infection as part of a Danish initiative to study late effects of Covid-19. Legal, ethical and data governance aspects of the innovative HEAP informatics platform have been discussed, and proposals for developing this governance have been refined. The contribution of MLCF to the HEAP project and to the EHEN opens new opportunities for MLCF to explore and generate new knowledge on ethical and regulatory issues in exposome research, which can feed back into collaborations with other projects or enterprises working on exposome-related projects.
Metagenomics analysis pipelines developed in HEAP are generating data for the machine learning analyses, which aim to improve classifications and predictions based on metagenomics data. A machine learning algorithm for classification of HPV infection is being improved to find relationships between cancer types and metagenomics profiling (WP9). The metagenomics pipelines and the machine learning model are deployed and tested in the testbed of the HEAP Hopsworks platform at CSC. A set of analysis tools and pipelines from the HEAP cohorts has been defined and formalized for deployment into the platform, which now provides support for multi-tenant RStudio.
A secure and standardised IaaS for the first version of the HEAP Information Commons was developed, enabling the storing and sharing of heterogeneous data from the cohorts. The HEAP software platform provides a flexible framework for deep learning, an advantage of the system that will be demonstrated in the coming phases of the project. The HEAP software platform (Hopsworks) simplifies the implementation of machine learning, which can be very attractive to other projects in EHEN once the application of ML to the analysis of heterogeneous data sources is demonstrated.
The HEAP Hopsworks platform is a horizontally scalable data science and analytics platform that can store and manage massive amounts of sensitive data, including unstructured data such as sequences, IoT data and images, and structured data such as electronic health data. Researchers can implement and deploy their own analysis tools and pipelines, install existing ones, and create machine learning models to make predictions across the heterogeneous datasets managed by the platform. The HEAP Hopsworks platform is integrated with the Information Commons (IaaS) provided by CSC and, as a proof of concept for reproducibility, in a computer cluster at KI.