CORDIS - EU research results

Deep-Learning and HPC to Boost Biomedical Applications for Health

Periodic Reporting for period 2 - DeepHealth (Deep-Learning and HPC to Boost Biomedical Applications for Health)

Reporting period: 2020-07-01 to 2022-06-30

EU health systems are generating large biomedical datasets that constitute a vast, largely unexploited knowledge base. The aim of DeepHealth is to put HPC and cloud computing power at the service of biomedical applications and to apply Deep Learning (DL) and Computer Vision (CV) techniques to biomedical datasets in order to support new and more efficient ways of diagnosing and treating diseases.
Three main objectives are derived from different perspectives:
- AI: Increase the productivity of IT professionals in training AI models, without the need to combine numerous tools.
- HPC: Offer a unified framework adapted to exploit underlying heterogeneous HPC and cloud infrastructures in support of state-of-the-art and next-generation DL and CV algorithms.
- Reaching industry and society: Work towards reducing the gap between the availability of cutting-edge technologies and their extensive use for medical imaging, and enhance European-based medical software platforms.

DeepHealth has achieved these goals, which translate into three main areas of results. The first two are the DeepHealth toolkit and the HPC and cloud infrastructure support. Together they allow IT staff to train models and run the training algorithms on hybrid HPC and cloud architectures without profound knowledge of Deep Learning, HPC or cloud, and they increase productivity by reducing the time required to do so. The third is the set of enhanced biomedical applications, which leverage the libraries and the infrastructure support for training and inference operations and bring the benefits to health professionals. These components have been validated through 15 use cases covering three medical areas.
Work and results in DeepHealth from January 1st, 2019 to June 30th, 2022 (M1-M42) have been achieved in the following areas:

* DeepHealth Requirements and Specifications: All partners cooperated to detail the specifications for the entire project, including 14 use cases (UCs), 7 application platforms, the APIs of the EDDL and ECVL libraries and a toolkit for them, the HPC and cloud infrastructure, the validation procedure, and GDPR and data privacy aspects.
* Design and development of the EDDL library: A stable version of EDDL has been published, ready to run in either sequential or distributed mode. It covers all the DL needs of the UCs. The distributed version follows two strategies: the use of orchestrators such as COMPSs and StreamFlow, and an ad-hoc C++ version. EDDL has been adapted to FPGAs and to HPC and cloud environments. A Python wrapper (PyEDDL) has also been implemented.
* Design and development of the ECVL library: The ECVL is complete and stable, and its functionalities cover all the UCs. The ECVL includes a Hardware Abstraction Layer that allows further development on different hardware platforms. A Python version has also been developed. The use of the ECVL on cloud infrastructures has been facilitated through Docker container images and by porting the entire toolkit to Kubernetes with a Helm package. Finally, a web-based front end (composed of a GUI and a backend) has been created.
* Integration of libraries and UCs in application platforms: The EDDL and ECVL libraries have been integrated into seven European biomedical platforms within the project. In addition, the platforms have adjusted their architecture, and extended and adapted their functionalities, to fit the requirements of the assigned UCs. In parallel, platform owners and UC leaders have prepared the datasets and the related specific topologies for pilot testing.
* HPC infrastructure adaptation: The DeepHealth HPC infrastructure allows different parallelisation strategies to be implemented. It consists of: a task-based model (COMPSs) to describe the parallelism of training operations in a way that is agnostic to the underlying platform; a workflow orchestrator (StreamFlow); a hybrid cloud solution to combine private and public cloud resources; the use of advanced parallel programming models to exploit different granularity levels of parallelism; and the DeepHealth PCI Express board optimised for DL operations.
* Testing and validation: All SW components have been tested and validated, with technical testing of EDDL, ECVL, the front-end application and the HPC infrastructure support. 23 UC-platform pairs have been tested and validated, measuring the targeted KPIs for each platform and UC, and the whole project concept has been validated.
* Dissemination and exploitation: DeepHealth defined a visual identity, key messages, a strategy and plans. Communication channels and the Open Access repository in Zenodo have been populated to disseminate the project activities, and communication material has been created. Partners have widely presented DeepHealth at events (conferences, workshops, etc.) and have actively cooperated with relevant EU associations and projects. Scientific papers have also been published, along with 5 open-access (OA) datasets. In addition, a Winter School and an Anonymization Hackathon were organized. Concerning exploitation, Key Exploitable Results (KERs) were identified, and a complete business and sustainability plan has been developed for the DeepHealth toolkit, in addition to individual exploitation plans for all the KERs identified.
DeepHealth results contribute to the following impacts:
- Contributing to European digital sovereignty: The tools developed in DeepHealth (EDDL, ECVL, StreamFlow, COMPSs) provide developers with a European toolkit that minimizes dependence on large non-EU providers.
- Impact on the developer community and IT experts: Pilots have demonstrated that the toolkit increases the productivity of IT staff working in the health sector. It allows them to design, train and test many more predictive models in the same period of time, by reducing the time needed to train models and by letting them perform all operations with a single toolkit. It has also been shown that DeepHealth provides HPC and cloud infrastructure support, with a focus on usability and portability, so that the training of predictive models can be efficiently distributed over hybrid and heterogeneous HPC, Big Data and cloud architectures in a transparent manner. Thus, the DeepHealth framework allows data scientists and IT experts to train models on these architectures without profound knowledge of the underlying technologies. Additionally, DeepHealth widens the use of advanced HPC, Big Data and cloud infrastructures and facilitates access to them for any company or institution.
- Impact in the health application domain: Validation of the use cases demonstrates the usefulness of the applications and models developed. It shows that DeepHealth can improve well-being by increasing early diagnosis and improving treatments, increasing knowledge of diseases and pathologies, and saving direct and indirect healthcare costs.
- Impact beyond the health sector: The project outcomes are directly useful for other sectors and applications.
DeepHealth results also help to reduce the bottlenecks that prevent AI from becoming an enabling technology for science. The toolkit contributes towards offering AI+HPC as a service, which is key to reaching industries that have only temporary needs for high-performance computing resources, and makes it easier for them to exploit HPC capabilities for an increasing number of applications.