Deep-Learning and HPC to Boost Biomedical Applications for Health

Periodic Reporting for period 1 - DeepHealth (Deep-Learning and HPC to Boost Biomedical Applications for Health)

Reporting period: 2019-01-01 to 2020-06-30

Healthcare is one of the key sectors of the global economy. European public health systems are generating large datasets of biomedical data that constitute a large unexploited knowledge database, since most of its value comes from the interpretations of the experts, done manually in most cases. In this scenario, the main goal of the DeepHealth project is to put HPC computing power at the service of biomedical applications and, through an interdisciplinary approach, to apply Deep Learning (DL) and Computer Vision (CV) techniques to large and complex biomedical datasets in order to support new and more efficient ways of diagnosing, monitoring and treating diseases. To this end, DeepHealth proposes a unified framework completely adapted to exploit underlying heterogeneous HPC and Big Data, supporting state-of-the-art and next-generation DL and CV algorithms to enhance European-based medical software platforms.
The technical objectives of the DeepHealth project are:
- Developing the DeepHealth toolkit for training and testing predictive models based on Deep Neural Networks (DNNs), composed of two libraries and a web-based front-end to ease their use.
- Developing the European Distributed Deep Learning Library (EDDLL), which will be general-purpose but, in this project, will be applied to medical imaging.
- Developing the European Computer Vision Library (ECVL), which will act as a wrapper for other existing libraries for transforming/processing images, and will include specific image processing algorithms for biomedical applications.
- Integrating both libraries into existing software platforms in order to use predictive models designed and trained by using the DeepHealth toolkit.
From January 1st, 2019 to June 30th, 2020 (M1-M18), progress has been made in the following areas:

* DeepHealth Requirements and Specifications: All partners cooperated to detail the specifications for the entire project, including 14 use cases (UCs), 7 application platforms, the APIs of the EDDL and ECVL libraries and a toolkit for them, the HPC and cloud infrastructure, the validation procedure, and GDPR and data privacy aspects.
* Design and development of the EDDLL library: The EDDL API was released during the first months of the project. The non-distributed version of EDDL is stable and complete with respect to the DL needs of the UCs. Work on the distributed version of the library is ongoing and follows two strategies: the use of orchestrators such as COMPSs and StreamFlow, and an ad-hoc C++ version. Work has progressed on adapting EDDL to HW accelerators and HPC architectures as well as to cloud environments. A Python wrapper (PyEDDL) has also been implemented. Finally, a first version of the GUI front-end has been released.
* Design and development of the ECVL library: The ECVL API was released together with the EDDLL API. The ECVL has reached a stable and usable version, and its functionalities cover most of the project UCs. The ECVL includes a Hardware Abstraction Layer that allows further development on different hardware platforms. A Python API has also been developed. The use of the ECVL on cloud infrastructures has been facilitated through Docker container images and by porting the entire toolkit to Kubernetes and a Helm package. Finally, in conjunction with EDDL development, a web-based front-end (composed of a GUI and a back-end) has been created.
* Integration of libraries and UCs in application platforms: All the functionalities needed for each platform to pursue its associated UCs were identified, as well as the strategy for integrating the libraries into each platform. Partners have worked on developing utilities to host the libraries in the platforms and on extending platform functionalities to fit the assigned UCs. All platforms integrate a ready-to-use EDDL/ECVL environment, and the training/inference phase has already been tested on some project UCs. In parallel, platform owners and UC leaders have prepared the datasets and the related specific topologies for several testbeds.
* HPC infrastructure adaptation: The HPC infrastructure is composed of 3 main layers: (1) an API exposed to ECVL and EDDLL developers; (2) the software architecture, composed of a set of run-time frameworks for managing the parallel execution; and (3) a set of hardware acceleration computing devices. Work has focused on integrating the identified SW components into a common development framework and on the developments needed to use different HW acceleration technologies, with special focus on GPUs and FPGAs.
* Testing and validation: These activities have started recently. The first steps comprise the development of several complete pipelines for training and inference using different UCs, and their presentation at three internal workshops to receive feedback from partners.
* Dissemination and exploitation: DeepHealth defined a visual identity, key messages, strategy and plans, etc. Communication channels (website and SM accounts) and the Open Access repository in Zenodo were set up and populated to disseminate the project activities. Communication material was created (flyer, presentation, poster). Partners have presented DeepHealth at 27 events, including conferences and workshops, and have actively cooperated with relevant EU associations and projects. Scientific publications have also been published. Concerning exploitation, Key Results have been identified and a preliminary business plan has been developed for the DeepHealth toolkit, including the realization of a benchmarking study.
DeepHealth will develop an EU framework based on two new libraries, the EDDLL and the ECVL, that will take advantage of the current and coming development of HPC systems and will provide transparent use of heterogeneous hardware accelerators to optimize the training of predictive models. The framework also includes a front-end (web-based GUI plus back-end) to simplify the use of the libraries. DeepHealth will also develop HPC infrastructure support for efficient execution of the libraries, focusing on usability and portability.
The libraries will be integrated into 7 existing commercial and research software platforms. The platforms will be used for validating the framework in 14 pilot test-beds in three areas: neurological diseases; tumor detection and early cancer prediction; and digital pathology and automated image annotation. The pilots will allow performance to be evaluated in terms of the time needed to pre-process images, the time needed to train models, and the time needed to put models into production.
Main expected impacts are:
- Facilitate the daily work of computer scientists working in the health sector, allowing them to design and train DNNs efficiently without profound knowledge of DL, HPC, Big Data, or Cloud computing.
- Increase the productivity of IT staff, reducing the time to design and develop end-user applications deployed in software platforms.
- Help reduce the gap between the availability of cutting-edge technologies and their extensive use for medical imaging.
DeepHealth foresees impacts in the health application domain: improving well-being, increasing early diagnosis and improving treatments, increasing knowledge of diseases and pathologies, and reducing direct and indirect healthcare costs.
Finally, the project is expected to have an impact beyond the health sector, since its outcomes are directly useful for other sectors and applications.