CORDIS - EU research results

CLoud ARtificial Intelligence For pathologY

Periodic Reporting for period 1 - CLARIFY (CLoud ARtificial Intelligence For pathologY)

Reporting period: 2019-11-01 to 2021-10-31

Digital pathology is an image-based information environment that enables the management and interpretation of pathological information generated from the digitalization of a tissue sample. Digital pathology has seen incredible growth in recent years as the quality of microscopy scanners has improved, data transfer speeds have increased and computational software and hardware have become more powerful. Digital pathology offers a number of potential benefits, as it facilitates sharing cases between multiple pathologists, enables automatic image interpretation and increases diagnostic efficiency.

The main goal of CLARIFY is to develop a robust, automated digital diagnostic environment based on artificial intelligence (AI) and cloud-oriented data algorithms. This environment is intended to enable automatic histological image interpretation and diagnosis anywhere, a paradigm shift in the pathology field that aims to maximize the benefits of digital pathology and aid pathologists in their daily work.

Specific and challenging cancer types have been selected to test the methods developed in the project, reflecting the existing variability in cancer diagnosis: triple negative breast cancer (TNBC), high-risk non-muscle invasive bladder cancer (HR-NMIBC) and Spitzoid melanocytic lesions (SML). Managing these diseases is currently a challenge for the pathology community, which is likely to benefit from the advantages of digital pathology.

CLARIFY trains young scientists, addressing scientific, educational and training aspects to support better-informed decisions in pathology. The advanced image processing techniques and artificial intelligence methods will benefit the pathology community through better stratification of patients for two purposes: diagnosis and image retrieval. The novel cloud-oriented data infrastructure and algorithms will make it possible to securely store, retrieve and share a publicly available database while ensuring data interoperability and portability. The innovative and user-friendly software will improve workflow efficiency, stimulate collaboration and increase diagnostic confidence at pathology labs regardless of their location.
Approval of the CLARIFY research by ethics committees.

Preparation and creation of the CLARIFY histological image database. The CLARIFY database is composed of ~1500 slides (~350 for TNBC, ~950 for HR-NMIBC and ~200 for SML).

Collaboration among clinical and technical partners to develop specific annotation protocols depending on the tumor type.

Development of a user-friendly web-based application for histological image navigation and annotation.

Investigation of metadata standards, data format, and storage in the medical domain.

Presentation of main findings and recommendations for whole slide image (WSI) preprocessing and standardization.

Publication of eight research papers in international conferences (5) and journals (3). One book chapter has been accepted and is pending publication. Six submitted papers are under review. Four research papers are in preparation.

Other scientific activities related to WP2-WP4 have started, are still ongoing and will be concluded in the final period:
- Investigation of technical choices for managing distributed workflows in the cloud and of different search algorithms for research assets.
- Characterisation of the performance of a centralized and decentralized management framework.
- Development of a privacy-protected medical federated learning platform and a blockchain-based consortium.
- Development of artifact detection and whole-slide-image preprocessing methods.
- Development of weakly-supervised Multiple Instance Learning applied to the bladder cancer dataset.
- Development of a fully supervised deep learning model for mitosis detection on TNBC images.
- Development of a fully supervised method to segment skin regions of interest and a weakly-supervised method to identify whether they are malignant.
- Development of a residual convolutional autoencoder for relevant histological feature extraction in a content-based image retrieval context.
- Development of probabilistic deep learning and Multiple Instance Learning approaches for computer-aided diagnosis.
- Annotation of histological slides with both high and low details.
- Updating the CLARIFY database with histopathological, molecular, genetic and immunohistochemical information (if available) as well as the clinical outcome of the patients.
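
As an illustration of the weakly-supervised Multiple Instance Learning item in the list above, the following is a minimal sketch of how tile-level scores could be pooled into a single slide-level score when only slide-level labels exist. It is not the project's model: the function name `bag_probability`, the pooling options and the example scores are assumptions made for illustration only.

```python
def bag_probability(instance_probs, pooling="max"):
    """Aggregate per-tile (instance) probabilities into one slide (bag) score.

    Hypothetical helper: under the standard MIL assumption a bag is positive
    if at least one instance is positive, which max pooling reflects.
    Mean pooling is a common, smoother alternative.
    """
    if not instance_probs:
        raise ValueError("a bag must contain at least one instance")
    if pooling == "max":
        return max(instance_probs)
    if pooling == "mean":
        return sum(instance_probs) / len(instance_probs)
    raise ValueError(f"unknown pooling: {pooling}")


# Made-up tile scores for one slide; only the slide label would be known in training.
tile_scores = [0.1, 0.9, 0.2]
slide_score_max = bag_probability(tile_scores, pooling="max")
slide_score_mean = bag_probability(tile_scores, pooling="mean")
```

With max pooling a single highly suspicious tile drives the slide-level score, whereas mean pooling dilutes it across all tiles; which behaviour is preferable depends on the disease and the annotation available.
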
CLARIFY will significantly advance the state of the art in a variety of aspects:


CLARIFY will investigate and build an advanced data management architecture for WSIs and related metadata, linking the necessary analysis tools in a cloud architecture able to handle large and concentrated data streams. CLARIFY will also give access to an open, wide and quality-controlled database of WSIs and useful metadata. CLARIFY proposes a better application of semantic technologies, especially regarding terminology, to enhance the clarity and comparability of datasets, as well as a seamless data fabric to remove many of the barriers to data access while managing restrictions due to privacy and trust concerns. CLARIFY will develop ranking and search for the machine learning components, workflows, and data sets to support the efficient discovery, composition, and sharing of decentralized AI-centric medical workflows. CLARIFY will also execute and optimize decentralized workflows on distributed programmable virtual infrastructures. CLARIFY will integrate and build on advances in cloud computing and blockchain to address the needs of modern information architectures.


CLARIFY will develop novel and robust methods for WSI interpretation for the three selected diseases. The potential impact is high since those diseases are relatively unexplored from an AI perspective. CLARIFY will explore new and innovative ways of detecting blur, cauterized areas, folding and other artifacts. CLARIFY will use data-driven features for diagnostic and prognostic classification as well as for retrieval purposes for TNBC, HR-NMIBC and SML. CLARIFY will explore different strategies depending on the availability of annotated data, i.e. fully supervised, weakly-supervised, semi-supervised and unsupervised learning. CLARIFY will even combine different learning paradigms. CLARIFY will address active learning and large-scale crowdsourcing problems capable of dealing with uncertainty. CLARIFY will also look for the most efficient and effective way to do feature matching for histological images in a content-based image retrieval context. In addition, CLARIFY will address an important challenge: providing new image-based diagnostic criteria, which may open new conceptual and innovative ways to understand and confront these challenging diseases. The clinical significance of the newly identified patterns will be analysed in depth.
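
To make the blur-detection goal mentioned above concrete, here is a minimal sketch of one classical baseline, the variance of the Laplacian, applied to a grayscale tile. It is offered only as an illustration and is not the method developed in CLARIFY; the function names, the threshold of 100.0 and the synthetic tiles are assumptions.

```python
from statistics import pvariance


def laplacian_variance(tile):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `tile` is a grayscale image given as a list of rows (at least 3x3).
    Sharp tiles have strong edges and therefore a high variance;
    blurred or empty tiles give a low variance.
    """
    height, width = len(tile), len(tile[0])
    responses = [
        tile[y - 1][x] + tile[y + 1][x] + tile[y][x - 1] + tile[y][x + 1]
        - 4 * tile[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def is_blurred(tile, threshold=100.0):
    """Flag a tile as blurred; the threshold is an assumed, data-dependent value."""
    return laplacian_variance(tile) < threshold


# Synthetic tiles: a checkerboard (many edges) and a uniform gray patch (no edges).
sharp_tile = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
flat_tile = [[128] * 8 for _ in range(8)]
```

In practice the threshold would need to be tuned per scanner and stain, and a uniform background tile is flagged in the same way as a blurred one, so tissue detection would normally run first.
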


CLARIFY will provide insights into the field of digital pathology to develop tailored technical solutions. Due to the complex nature of the selected diseases, CLARIFY will give helpful clues on how to achieve better AI-based classification by combining imaging with molecular, genetic, immunohistochemical and/or histopathological information. Those models can serve as a basis for application to other diseases with similar diagnostic difficulty.
ESRs’ training session on biopsy analysis during a secondment
Possible pipelines for WSI preprocessing
Whole Slide Images belonging to different cancer types: 1 and 2) HR-NMIBC; 3) SML; 4) TNBC
Interface of the annotation tool developed in CLARIFY
Generic dataflow of whole slide images and clinical data in WSI-based research
Detailed annotation of a WSI