Periodic Reporting for period 2 - EXA MODE (EXtreme-scale Analytics via Multimodal Ontology Discovery & Enhancement)
Reporting period: 2020-07-01 to 2021-12-31
Summary of the context and overall objectives of the project
Healthcare produces exabytes of multilingual text, images and biosignals, creating commercial opportunities for improving healthcare. The difficulty of obtaining large datasets annotated by experts and the heterogeneity of clinical data limit the exploitation of such huge resources.
ExaMode helps address these challenges by linking salient concepts included in clinical reports to visual content without human interaction. ExaMode targets histopathology, the gold standard for the diagnosis of several diseases, including cancer.
Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far
ExaMode advanced fast during its first 36 months, achieving the planned results despite Covid-19, which strongly affected the entire consortium and particularly the hospitals.
The ExaMode consortium is composed of seven partners that collaborate tightly to reach the project objectives, helped by frequent online meetings. The academic partners (University of Applied Sciences Western Switzerland - CH, University of Padova - IT, and Radboud University Medical Center - NL) work in close collaboration on developing tools that deal with data heterogeneity in order to extract multimodal knowledge from clinical reports and digital pathology images. The industrial partners (MicroscopeIT - PL, and SIRMA AI - BG) focus on new products that match the requirements of hospitals and histopathology departments, and create prototypes based on the research work performed by the academic partners.
The provider of High-Performance Computing (HPC) competences and resources (SURF - NL) gives the consortium access to cloud, storage and supercomputing facilities and actively collaborates on the development of the research tools and product prototypes. The two clinical partners (Radboud University Medical Center - NL and Azienda Ospedaliera per le Emergenze Cannizzaro - IT) focus first on dealing with ethics and privacy requirements, second on providing clinical guidance to research and product development, and third on the provision of clinical data and annotations. Covid-19 created problems that affected ExaMode: the hospitals and pathologists involved in the project were heavily engaged in fighting the pandemic. The project was also affected by a partner change, but a reorganization addressed the resulting challenges.
The results obtained by ExaMode during the first 36 months are striking. Despite the complex global situation, all deliverables and milestones have been completed, over 70 scientific articles have been accepted or published, and several open source software libraries and prototypes have been released and are showcased on the project website.
The main technical achievements obtained by ExaMode include: 1) datasets, 2) tools aiming at the extraction of knowledge from heterogeneous clinical data, and 3) product prototypes. Datasets currently include over 20,000 high-resolution microscopy images from clinical practice that are associated with anonymized clinical reports, and data from publicly available sources, including the scientific biomedical literature.
Tools aiming at the extraction of knowledge from heterogeneous clinical data include many libraries for: color heterogeneity management; multi-scale learning; neural compression; detection, segmentation and compression of digital pathology images; separation of compound figures; extracting concepts, visualizing RDF graphs and annotating reports; tagging biomedical concepts and extracting concepts from medical reports; creating ontologies; safely and automatically transferring data from hospitals to data centers; weakly supervised and multiple instance learning; and multimodal learning from clinical images and reports. The code is in most cases publicly available and is showcased in a dedicated section on the project website (https://www.examode.eu/software/). Finally, software resources include ontologies and knowledge graphs targeting the ExaMode use cases. Several product prototypes were developed by the companies involved in ExaMode. MicroscopeIT developed VIRTUM PP1 (cloud-based software to manage, store, and annotate histopathological slide collections), PP2 (which integrates a colon segmentation model developed by RUMC) and PP3 (which aims to aid researchers by interconnecting the PubMed knowledge base with visual context references). SIRMA AI developed HistoGrapher and SNOMEDICO. HistoGrapher is a software platform that supports histopathologists in making more informed decisions based on a larger amount of data (judging from similarity to other cases in their clinical practice or from the likelihood identified in the scientific literature). SNOMEDICO is a REST API service for structuring diagnosis information based on the SNOMED CT ontology. The ExaMode research results are of high quality and value to the scientific community. Over 70 scientific articles were published, several of them in top journals (category Q1), and they have already gathered over 600 citations in total.
Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)
Multimodal healthcare data are produced every day in hospitals worldwide (over 2,000 exabytes/year in 2020). Current machine learning techniques can exploit such data resources only partly, as they still require manual annotations by specialists to achieve performance compatible with healthcare needs; moreover, current machine learning methods are only partially capable of dealing with data heterogeneity, making it difficult to create algorithms that generalize.
ExaMode contributes to solving these challenges by developing systems that extract and link multimodal information from highly heterogeneous and unstructured data (such as diagnostic reports and images) without human interaction. The consortium is ideating, developing and releasing open source libraries for natural language and image processing in the biomedical domain, with a focus on digital pathology. Expected results include multimodal representations of digital pathology knowledge and tools that learn multimodal diagnostic models directly from the clinical activity of pathologists, without human interaction, in order to aid them and reduce their workload. Decision support provides quantitative outcomes, leading to more solid diagnosis and therapy planning. Semantic multimodal medical knowledge management can infer relationships and identify the most effective treatment strategies. The methods developed in ExaMode can also be helpful in other domains, as they allow deep neural networks to be trained faster from heterogeneous data. They lead to the creation of systems that handle extreme scales of multimodal data with less effort, increasing the speed of data throughput and data access.
ExaMode aims at the adoption of its results in industry, leading to a positive impact on society (starting from the application of the prediction tools in decision making by the clinical partners). The machine learning methodologies developed in ExaMode allow the EU to reach a leadership position in digital pathology and other diagnostic imaging domains that are valued at hundreds of billions of euros.