Methods for Integrated analysis of Multiple Omics datasets

Final Report Summary - MIMOMICS (Methods for Integrated analysis of Multiple Omics datasets)

Executive Summary:
The MIMOmics project developed methods for the integrated analysis of multiple omics datasets. Experts in data science (machine learning, bioinformatics, biostatistics) collaborated with experts in epidemiology and in high-throughput omics measurements. Their input was essential: our methods acknowledge the measurement error structure and answer relevant epidemiological questions. A movie on our website explains our project to the general public. Our multidisciplinary approach, from experiments via statistical analysis to interpretation, also provided a unique training environment for our young researchers, who will contribute significantly to the analysis of future datasets in relation to human disease and health.
Within MIMOmics we had access to unique metabolomics and Glycomics datasets. For Glycomics, technical improvements are still ongoing. Appropriate handling of omics data is essential for efficiency and interpretability when analyzing the data with regard to outcome variables. Borrowing methods from other omics fields, MIMOmics developed standard operating procedures and guidelines for batch correction and normalization. The presence of different technical platforms (LCMS, UPLC) poses challenges when comparing and combining results across studies. MIMOmics developed mapping functions between technical platforms based on biological knowledge and on the estimation of common latent factors. Imputation algorithms were developed to estimate abundances of omics variables that are missing on one platform from measurements on another platform.
The focus in MIMOmics was on understanding the biological mechanisms underlying Metabolic Health and related traits and their omics-based predictions. Other outcomes (cancer, rheumatoid arthritis, inflammatory bowel disease) were considered as well. Our methods simultaneously analyze multiple omics datasets to match the complexity of the traits. Specifically, multi-level networks with variables (genes, metabolites, glycans) as nodes translate various high-throughput datasets into a visualized structure that facilitates a better biological understanding. A second important goal was to build prediction models that can be used to stratify patients according to future outcomes and risks. MIMOmics developed methods to assess the added predictive value of one omics dataset on top of another. Furthermore, to improve the interpretation of a prediction model, we used the structures of the omics datasets obtained from network methods when building the model. In epidemiology, meta-analyses that combine results across studies are performed to increase efficiency and for validation. Our methods address heterogeneity across studies in design (multiple case families versus a random subsample from the population), in high-throughput platforms, and in measured omics datasets. Finally, our consortium datasets came from observational studies, hence identified associations are not necessarily causal. Our Mendelian randomization methods with multiple instrumental variables can assess whether an observed relationship is causal. Identification of mechanistic interactions is the next step. Using our consortium datasets, we identified a causal effect of body mass index (BMI) on phenylalanine and a mechanistic interaction between the effects of BMI and sex.
Parallel implementations were developed for computationally intensive methods. Several of our methods were implemented in user-friendly R packages. For the rest, source code was made available so that the methods can be applied to other datasets and outcomes and the methodology can be developed further.
To conclude, the MIMOmics methods will lead to biological insight and better prediction models. Improved omics marker profiles of Metabolic Health are potentially better indicators of future cardiovascular disease development than body mass index or obesity. Ultimately, patients will benefit from better prognosis and predictions based on their multi-omics profiles.

Project Context and Objectives:
MIMOmics developed statistical methods for the integrated analysis of metabolomics, proteomics, glycomics and genomic datasets in large studies. Our project was based on our involvement in studies participating in EU funded projects, i.e. GEHA, IDEAL, Mark-age, ENGAGE and EuroSpan. In these consortia the primary goal was to identify molecular profiles that monitor and explain complex traits with novel findings so far. Support for methodological development was missing. The state-of-the-art methodology fell far short of the complexity of the biological problem. Complex data were being analysed in a rather simple way, which missed the opportunity to uncover combinations of predictive profiles among the omics data.
The objectives of MIMOmics were:
• to develop a statistical framework of methods for all analysis steps needed for identifying and interpreting omics-based biomarkers
• to integrate data derived from multiple omics platforms across several study designs and populations

The following recurrent epidemiological questions were answered:
1. which molecular profiles are associated?
2. can we identify subgroups of patients based on molecular profiles?
3. which biological pathways play a role?
4. which informative marker profiles within each population can be transferred across populations?
5. are the identified molecular marker profiles causal?

The methodological tasks were divided over seven work packages with the following work package specific objectives:

WP1 Statistical data cleaning
objective 1: Development of protocols and data processing methods for Glycomics
objective 2: Development of methods for efficient signal extraction by using biological information via a Bayesian framework
objective 3: Development of harmonization methods for glycomics and for metabolomics datasets
objective 4: Development of quality scores for detected glycans and metabolites
objective 5: Development of imputation algorithms of missing omics phenotypes using reference samples
objective 6: Evaluation of across population heterogeneity

WP 2: Systems approach to modelling pathways and structures of biological networks
objective 1: Development of supervised and unsupervised methods for pathways identification and reconstruction based on multilevel omics datasets
objective 2: Extension of network-based pathways analysis method to multilevel omics
objective 3: Development of methods for learning dependency structures of omics data allowing for unknown clusters and confounders
objective 4: Extension of pathway methods for complex study designs that integrate genetic, glycomic and metabolomic data

WP 3: Prediction, classification and clustering
objective 1: Development of methods for discovery of predictive combinatorial omics biomarkers and identification of low-dimensional omic-interactions in diagnosis and prognosis within a single omic data source.
objective 2: Development of methods to allow predictive combination of distinct omics measurement types within and across patients for enhanced prediction
objective 3: Development of methods to detect heterogeneity in patient response (diagnostic, prognostic or treatment response) that allow for patient risk stratification based on identified biomarkers
objective 4: Development of methods to assess the relative added value of distinct omics sources (e.g. glycomic or proteomic relative to genomic level measures) and/or of novel omics measures and biomarkers in addition to established classical clinical risk scores

WP 4: Meta-analysis
objective 1: Development of methods to combine single omics variable associations across studies adjusting for population and platform heterogeneity
objective 2: Development of methods and tools to obtain parameters with a marginal interpretation while using random effects models to model the relationship between biomarkers and outcomes (diseases, healthy aging). These parameters can be pooled with parameters from case-control or cohort datasets using standard meta-analysis approaches
objective 3: Development of a meta-analysis approach for pathway analysis which takes into account the structure of the single studies and is able to identify genes which are members of different pathways
objective 4: Development of methods for combining multilevel biomarkers across studies (Super-Meta)

WP 5: Causal inference
objective 1: Development of methods to elucidate pathogenetic mechanisms on the basis of complexly structured multilevel omic data
objective 2: Development of methods for selecting, from a potentially huge space of causal models, a subset of promising models pointing at a causal explanation of the data
objective 3: Development of methods to validate causal hypotheses generated from data analysis or more in general during the research process.

WP 6: Data integration and distributed computing
objective 1: Provide database and computational environment for the project participants
objective 2: Develop general framework for data-level parallelization of omics data analyses
objective 3: Develop high-throughput implementation of omics analysis algorithms, both existing and developed in this project
objective 4: Develop a web-based scientific portal to the MIMOmics computational tools

WP 7: Proof of principle (metabolic health)
objective 1: Identify interesting end-phenotypes and corresponding existing omics datasets for further analysis
objective 2: Proof-of-principle analysis to evaluate the clinical significance of novel methods (phenotype: metabolic health)
objective 3: Further evaluate the methods developed in WP2-WP5 in appropriate clinical specimens/datasets and plan new experiments
objective 4: Interpret the output of the methods applied to the datasets of this consortium
objective 5: Coordinate with leaders of WPs 2-5 to interpret findings in a clinical context and publish in high impact biomedical journals

WP 8: Dissemination: Health care, Industry and Academia
objective 1: Create awareness of the methods and tools developed outside MIMOmics
objective 2: Facilitate the use and application of the tools developed
objective 3: Organize constant improvement by gathering user feedback

Project Results:
Preparation of MIMOmics datasets
We agreed that the most promising traits to study are molecular-level metabolic traits obtained through high-throughput analysis of human plasma: metabolomics and Glycomics traits. In the first year datasets from two studies were made available for the whole consortium: DILGOM (SNPs, Gene Expression and Metabolomics) and Croatian populations (SNPs and Glycosylation). In the fourth year data from a third study, the Leiden Longevity Study (SNPs, Metabolomics, Glycosylation) as well as further follow up data from the DILGOM study were made available to the consortium. We prepared legal documents and metadata for datasets, and arranged a secure computational environment that consists of several virtual machines at the location of partner CNR-ITB. A web portal provided secure access to this environment for all registered MIMOmics users. A new tool for data import that uses the BCPlatforms API was introduced. For analysis, OmicStudio was made available. Parallel implementation of a part of our methods enabled efficient analysis of omics datasets available within our consortium.

Case Study
To facilitate and enhance collaborations and discussions among MIMOmics members, we formulated a case study related to the proof of principle with specific questions. This case study was motivated by Lu et al. (2014), who showed that the effect of Body Mass Index (BMI) on coronary heart disease is mediated by blood pressure, cholesterol and glucose. The overall goal of the case study was to identify further factors associated with BMI from omics or NGS datasets by using network analysis, prediction models and cluster analysis. With regard to networks and pathways, we applied multiplex networks and Graphical Lasso and identified metabolites and Glycans that were associated with BMI. With regard to predictions, we identified metabolites and Glycans that improved the prediction. We identified causal effects of BMI on phenylalanine both in females and in males.

Glycans: Data cleaning and harmonization
Glycomics datasets were available from various technological platforms with different measurement error structures and methodological biases. We developed standard operating procedures (SOPs) and analysis plans for GWAS. We developed a data analytic approach for the automatic construction of a "dictionary" of reference glyd, which is less prone to inducing spurious correlations between expression peaks while preserving the homogeneous scale of the measurements. We studied the traditionally used total area (percentage) normalization and compared it, in extensive simulation studies, with various normalization methods borrowed from other omics fields. Benchmark datasets were developed to explore the measurement error structure and to improve batch correction and normalization. These efforts resulted in the software package glycanr. A chromatogram extraction tool was developed to generate labelled data; it uses "active learning" based on the clustering of a small number of chromatographic time series to improve the prediction of peak locations from time series data.
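As an illustration of the total area (percentage) normalization mentioned above, the following minimal sketch expresses each peak as a percentage of the summed peak area of one sample. The function name and the peak areas are hypothetical and are not taken from glycanr. Because the normalized values of a sample always sum to 100, an increase in one peak forces a decrease in the others, which is one way such a normalization can induce dependence between peaks.

```python
def total_area_normalize(peak_areas):
    """Express each peak as a percentage of the sample's total peak area."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return [100.0 * area / total for area in peak_areas]


# Hypothetical peak areas for a single sample (illustrative values only).
sample = [120.0, 60.0, 20.0]
normalized = total_area_normalize(sample)
print(normalized)
```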

Metabolomics: Data Cleaning and relationship with genetics and epigenetics
We documented and implemented SOPs and analysis plans that have been used for genome-wide association studies and an epigenome-wide association study with metabolites. Our findings provided new insights into the role of inherited variation in blood metabolic diversity and identified potential new opportunities for drug development and for understanding disease.

Metabolomics and Glycomics: Assessment of correspondence between different platforms and Imputation
For Glycomics we composed the lists of matching glycans measured in different platforms based on the known biological functions. We computed correlation coefficients between peaks which are supposed to measure the same glycan. For Metabolomics, we measured the concordance between measurements of three different metabolomic platforms by simple Pearson correlations.
To measure the similarity of 'biologically' matching datasets the RV coefficient (Escoufier, 1973) - a multivariate generalization of the squared Pearson correlation coefficient - was computed. We also considered a more flexible method, the O2PLS (Trygg, 2003) method, that decomposes two data matrices into three parts: a joint, a platform-specific and a residual part. The explained variance by the joint part compared to the total variance (R2) was used to assess concordance of two platforms.
The advantage of this method is that components spanning a joint space are identified. These components can be used to impute missing -omics data in studies where only one of the omics datasets is available.
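The RV coefficient used above can be sketched in a few lines. The version below assumes two data matrices measured on the same samples (rows) and column-centers them before computing RV = tr(S_xy S_yx) / sqrt(tr(S_xx^2) tr(S_yy^2)). It is a plain-Python illustration with hypothetical helper names, not the consortium's implementation.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product(a, b):
    """Return A^T B for two matrices with the same number of rows."""
    return [[sum(x * y for x, y in zip(col_a, col_b)) for col_b in zip(*b)]
            for col_a in zip(*a)]


def sum_of_squares(matrix):
    """Sum of squared entries; equals tr(M M^T)."""
    return sum(v * v for row in matrix for v in row)


def rv_coefficient(x, y):
    """RV coefficient between two matrices sharing the same samples (rows)."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = sum_of_squares(cross_product(xc, yc))
    denominator = math.sqrt(sum_of_squares(cross_product(xc, xc)) *
                            sum_of_squares(cross_product(yc, yc)))
    return numerator / denominator


# With one variable per platform, RV reduces to the squared Pearson correlation.
platform_a = [[1.0], [2.0], [3.0], [4.0]]   # hypothetical measurements
platform_b = [[1.0], [3.0], [2.0], [4.0]]   # hypothetical measurements
print(rv_coefficient(platform_a, platform_b))
```

A value of 1 indicates that the two centered matrices carry the same sample configuration up to rotation and scaling, and a value near 0 indicates no shared linear structure.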

NMR: Efficient signal extraction by using biological information
We focused the research on signal-extraction methods for NMR and studied the performance of the existing method BATMAN (Bayesian Automated Metabolite Analyser for NMR; Astle et al. 2011). The goal was to investigate whether the relative concentrations extracted with BATMAN improve on binning-based features (currently used in the analysis of NMR spectra) in their ability to discriminate between lung cancer and healthy profiles. For the moderate-resolution NMR platform, and irrespective of the classification rule used, BATMAN estimates achieved the same classification accuracy as binning estimates. However, classification models based on BATMAN estimates were biologically more plausible and interpretable, as the BATMAN features can be linked to particular metabolites.

Development of network and pathway methods
Network and pathway methods are used to learn about the underlying biological structure in an omics dataset as well as for visualization of the structure. A network consists of nodes and edges, where the edges represent the similarity between two nodes. Various measures for similarity can be used. We considered Pearson's correlation and partial correlations, and developed a method for network construction regarding specific parts of variables. In such a network variables that have a similar relationship with a specific covariate are close to each other. The metabolite networks estimated from the DILGOM data appeared to be more homogeneous and showed a higher density.
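The difference between Pearson's correlation and partial correlation as edge weights can be illustrated with three variables. In the sketch below the data are constructed by hand (hypothetical values) so that x and y are both driven by z. They are strongly correlated marginally, but the first-order partial correlation given z is zero, so a partial-correlation network would draw no direct edge between them.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Constructed example: x and y equal z plus mutually orthogonal noise.
z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
x = [1.5, 2.5, 3.5, 4.5, 0.5, 1.5, 2.5, 3.5]
y = [1.5, 1.5, 2.5, 4.5, 1.5, 1.5, 2.5, 4.5]
print(pearson(x, y), partial_correlation(x, y, z))
```

With many variables, the partial correlations given all other variables are usually obtained from the inverse covariance matrix rather than from this three-variable formula.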
Often domain experts wish to split a network into communities and/or modules. We considered network diffusion-based methods in order to calculate a global measure of network proximity. The methods were applied to gene expression, metabolomics and Glycomics datasets with the variables as nodes. Subjects were also considered as nodes; in that case the edges represent the similarity of subjects with regard to the omics datasets. Unsupervised methods of pathway identification based on network diffusion and supervised methods based on "a priori biological knowledge" were considered.
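The report does not state which diffusion scheme was used. As one common example of a diffusion-based proximity measure, the sketch below implements a random walk with restart on a small undirected network. The adjacency matrix, the seed node and the restart probability are hypothetical choices made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, n_iter=200):
    """Proximity of every node to the seed set via a random walk with restart.

    adjacency: symmetric matrix (list of lists) of non-negative edge weights.
    seeds: set of node indices where the walker restarts.
    """
    n = len(adjacency)
    degree = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(n_iter):
        # One diffusion step: move along edges, then mix with the restart vector.
        p = [(1 - restart) * sum(adjacency[i][j] / degree[j] * p[j]
                                 for j in range(n) if degree[j] > 0)
             + restart * p0[i]
             for i in range(n)]
    return p


# Hypothetical path network 0 - 1 - 2 - 3 with node 0 as the seed.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, seeds={0})
print(scores)
```

The resulting scores reflect all paths between a node and the seeds rather than only direct edges, which is what makes diffusion a global proximity measure.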
We generalized these methods to analyze multi-omics datasets. A method for modelling the correlation among genes as a function of metabolites was developed. Multi-omic models (metabolic networks, codon usage and transcriptomics) based on flux balance analysis showed that the most important pathways characterizing differences among hundreds or thousands of experimental conditions could be identified. Duplex and multiplex networks were considered in which the nodes are either subjects, with the different omics datasets as layers, or metabolites, with layers based on an outcome variable (obese and normal). Entropy-based network representation was used to extract relevant pathways and subnetworks. Sparse Gaussian Graphical Models (sparse GGMs) were also considered for inference of pathway topologies. We extended the models by adding information from genotypic data and/or results of genome-wide association studies and by accounting for confounders and cluster-specific network variations. A new computationally efficient way to incorporate prior knowledge about the network topologies and to infer distributions over network structure was developed.
A method based on network eigenvalue perturbation was developed to control pathway resilience. The information flow between molecules was represented by a chemical master equation on a network modelling the physical and functional interactions among molecules. We mapped the molecular alterations onto the network and analysed the collective behaviour of such alterations on specific target networks. We analyzed gender-specific differences in the metabolome using a network-based approach: Gaussian graphical models were generated on blood metabolomics data, depicting the statistical relationships between measured metabolites. This network was then clustered and associated with gender to find clusters of metabolites in the pathway that are either up- or down-regulated in males or females. A novel approach to infer combined metabolomics and transcriptomics networks from high-throughput measurements was developed. Application of this method to a population cohort resulted in a medium-scale association network. To obtain more insight, bioinformatics analyses were performed, ranging from pathway enrichment and transcription-factor binding site analysis (to identify patterns of coregulation) to the investigation of phenotypic outcomes. A network diffusion-based score was computed to obtain a network-based genome-wide gene ranking. This method was applied to the DILGOM datasets, and we identified metabolic pathways involved in the production of metabolites correlated with obesity.

Prediction, Classification and clustering
Several methods for prediction of metabolic health based on omics profiles were applied and compared. For the case study we used BMI as outcome variable. In addition also binary outcomes were considered, mainly cancer. Metabolomics, genomics, transcriptomics, and Glycomics were used as predictors. The work focused on regularized large-p small-n methods, non-parametric methods, and network-enriched classifiers used in combinations with dimensionality reduction techniques. A range of generative and discriminative methods for high-dimensional classification and regression were tested and compared. A prediction combination method based on stacked generalization was considered.
In addition, we extended methods for prediction to the situation of multiple omics datasets and sources. A method was developed that combines summary results from meta-analysis studies with individual-level data, or aggregated signals on different functional levels by using cellular functions or pathway annotations. One of the developed methods was tested in the operational environment and has been deployed as part of a service offering to improve the provision of cleaner glycomics measurements. We also developed a novel approach for studying the effects of metabolomics on gene co-expression, which provides information for identifying a non-redundant combination of the two types of data for predictions. We applied ensemble methods for combining genomics and Glycomics in the MIMOmics case study, and validated the findings in a larger cohort.
A well-known discussion point in prediction models is whether the models should be interpretable from a biological viewpoint. We developed an approach that combines network-based classification with dimensionality reduction and clustering, and relates the identified clusters to phenotypes. Application of the method to transcriptomics and metabolomics data from DILGOM showed that the obtained prediction models were more stable. We developed a new approach that identifies subnetworks varying across the clusters of individuals in the presence of environmental confounders. We developed and started to apply the Sparse Mixtures-of-Experts method, aimed at identifying patient clusters and disease subtypes by maximizing the quality of predictions.
We also developed several methods where one -omics dataset is predicted from a second -omics data set. For example we integrated metabolomics into transcriptomics to investigate the metabolite-mediated co-expression dynamics of a gene module. In a second method Sparse supervised Gaussian graphical models were used where -omics measurements at the outer layer were modelled as a mixture of Gaussians with sparse inverse covariance matrices.
For quantitative and binary outcome variables, we developed a new sequential method based on double cross-validation for establishing the added predictive value of omic sources on top of a pre-established predictive source (a low-dimensional clinical model or an alternative omic source) in the prediction of unidimensional continuous health traits. We developed a novel approach for assessing the quality of different omics data sources without accessing individual-level datasets, simply by using summary-level statistics and training likelihoods. We developed an empirical approach for evaluating the effects of multiple sets of predictors on the quality of predictions for regression and classification.
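The idea of assessing added predictive value sequentially can be sketched as follows. This is a strongly simplified illustration and not the method developed in the project. It assumes a single predictor per source, ordinary least squares instead of regularized high-dimensional models, and hypothetical data. Out-of-fold predictions from the established source are computed first, and the second source is then cross-validated on the residuals, so that each sample is only ever predicted by models that did not see it.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def cv_predict(x, y, k=3):
    """Out-of-fold predictions: each sample is predicted by a model fitted without it."""
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(x), k):
            preds[i] = intercept + slope * x[i]
    return preds


def added_value(established, new_source, y, k=3):
    """Cross-validated error of the established source alone and with the new source added."""
    base = cv_predict(established, y, k)
    residuals = [yi - pi for yi, pi in zip(y, base)]
    extra = cv_predict(new_source, residuals, k)
    n = len(y)
    mse_base = sum((yi - pi) ** 2 for yi, pi in zip(y, base)) / n
    mse_aug = sum((yi - pi - ei) ** 2 for yi, pi, ei in zip(y, base, extra)) / n
    return mse_base, mse_aug


# Hypothetical data: the outcome depends on a clinical score and an omic score.
clinical = [float(i) for i in range(12)]
omic = [1.0 if i % 2 == 0 else -1.0 for i in range(12)]
outcome = [2.0 * c + 3.0 * o for c, o in zip(clinical, omic)]
mse_base, mse_aug = added_value(clinical, omic, outcome)
print(mse_base, mse_aug)
```

A clear drop from the first to the second error suggests that the new source carries predictive information beyond the established one.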

Meta analysis
Meta-analysis approaches are used to combine results across studies. The challenge is how to deal with heterogeneity due to the use of different technology platforms and differences in study design. When different study designs are used, the parameters across populations might have different interpretations and cannot be combined. One example is the multiple case family design, where random-effects models are used to properly model the within-family genetic relationships. The work on study design was motivated by the Leiden Longevity Study, where data are available from families with at least two nonagenarian siblings.
Random effects models are often used to model family data. Here the random effects represent the unobserved shared effects. However, the parameters might have an interpretation conditional on the random effects, in which case only inferences for family-specific effects can be derived. It is often more relevant to estimate effects that have a marginal interpretation, such that inferences can be generalized to the whole population. Methods are available which provide these estimates; however, these methods cannot deal with outcome-dependent sampling. Therefore we extended the random effects approach to provide parameters with an interpretation for the whole population.
A second challenge was the analysis of secondary phenotypes in multiple case or proband family designs, for example the analysis of the relationship between secondary phenotypes such as BMI, Glycomics or Metabolomics and genetic markers. Such an analysis is not interpretable when the study design is not taken into account. Methods are available for case-control studies but not for the family design. We extended current methods to be applicable to the analysis of secondary phenotypes measured in ascertained families. For the proband family design we compared our method to the software SOLAR, which is often used for statistical analysis in proband family designs. It appeared that SOLAR might provide incorrect estimates, while our method performed well. Finally, methods are lacking for survival analysis in families subject to delayed entry. For example, in the Leiden Longevity Study the selection is based on at least two nonagenarians and the correlation between family members has to be accounted for. We developed such a method.
To enable combining results across studies with omics data, we considered Partial Least Squares (PLS) methods. These methods identify a joint space of two omics datasets. The disadvantage of PLS is that the components are not uniquely identifiable and hence results from different studies cannot be compared. Therefore we developed Probabilistic Partial Least Squares. By maximizing the likelihood under some constraints, unique components are obtained. These components can then be combined across studies. The method was successfully applied to the consortium glycan datasets. A further extension is to combine results from studies which use different but related omics datasets. For example, a UPLC glycan profile is identified in one study, while other studies have LCMS measurements instead of UPLC measurements. By mapping LCMS on UPLC via the joint space obtained from PLS applied to UPLC and LCMS datasets, a profile in the UPLC subspace can be obtained. Since this profile can be computed in all cohorts, a meta-analysis can be performed on the effect estimate of this profile on the outcome across studies. We successfully applied the method to UPLC and LCMS datasets in three cohorts to estimate a glycan profile which predicts age.
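Once the same profile has been computed in every cohort, pooling its effect estimates is a standard step. The report does not specify the pooling model. The sketch below assumes a fixed-effect inverse-variance meta-analysis and adds Cochran's Q as a simple heterogeneity statistic. The three effect estimates and standard errors are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate, its standard error, and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Q sums the weighted squared deviations of study estimates from the pooled one.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q


# Hypothetical per-cohort effect estimates of one profile on the outcome.
cohort_estimates = [0.10, 0.20, 0.15]
cohort_std_errors = [0.05, 0.10, 0.05]
pooled, pooled_se, q = inverse_variance_pool(cohort_estimates, cohort_std_errors)
print(pooled, pooled_se, q)
```

Under homogeneity, Q approximately follows a chi-square distribution with (number of studies - 1) degrees of freedom. A large Q would argue for a random-effects model instead.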

Causal inference
To obtain insight into whether an omics association might be causal, we considered and developed methods for causal inference. We developed a method that uses sparse regularization methods to combine summary results from public GWAS catalogues with individual-level SNP data for predicting intermediate biomarkers from genotypes. The method produces new polygenic scores of intermediate traits, which may provide new genetic instruments for the instrumental variable approach to causality. We investigated causal path models, with an application to the identification of metabolites which can be used as biomarkers for BMI, taking into account the C-reactive protein concentration.
We used structural equation models (SEMs) for mediation analysis in the "omics" context, and extended the models to mediation models with many potential mediators (e.g. genes). A method was developed which exploits gene-expression data and network biology information to build a mediation-analysis model for the evaluation of the effect of treatment on the disease at the molecular level.
In many situations, an important question is whether some phenotype is causal with respect to another one. Therefore we explored Mendelian randomization approaches to assess cause-effect relationships in the presence of confounders. A Bayesian approach was extended to account for multiple, pleiotropic instruments. The method was successfully applied to investigate causal relationships between BMI and metabolites in the DILGOM study. Finally, we developed and applied methods to identify causative mechanistic interactions. Such an interaction appeared to be significant for the effect of sex and BMI on phenylalanine.
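The logic of Mendelian randomization with several instruments can be illustrated with the widely used inverse-variance weighted (IVW) estimator on summary statistics. This is a simpler stand-in for the Bayesian approach described above. Unlike that approach, plain IVW assumes that the instruments have no directional pleiotropic effects. All SNP effect sizes below are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted causal effect estimate and its standard error.

    Each instrument contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical SNP associations with the exposure (e.g. BMI) and an outcome metabolite.
snp_on_exposure = [0.1, 0.2, 0.3]
snp_on_outcome = [0.05, 0.10, 0.15]
snp_outcome_se = [0.01, 0.01, 0.01]
effect, effect_se = ivw_estimate(snp_on_exposure, snp_on_outcome, snp_outcome_se)
print(effect, effect_se)
```

In this constructed example every instrument implies the same ratio, so the pooled estimate equals that ratio. Instrument-specific ratios that disagree strongly are a warning sign of pleiotropy.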

Further applications and follow up
We applied the approaches developed in MIMOmics for clustering and predictive calibration using distinct omics datasets to a colorectal cancer (CRC) glycomics dataset. Multiple high-dimensional multi-task classifiers were fitted to patients clustered on clinical markers of colorectal cancer survival. Comparing models with clinical phenotypes only against models including both clinical phenotypes and IgG glycans, this resulted in moderately improved prediction of patients with poorer prognosis in stage 4 of cancer. To uncover molecular profiles related to a change from healthy to unhealthy aging, we worked on differential network analysis of metabolite networks related to obesity, comparing differences between normal and obese people at one time point as well as between two time points.
We tested predictive approaches for single omics developed on a group of metabolic health related phenotypes (longevity, metabolic health, type 2 diabetes). Sparse Supervised Gaussian Graphical Models were used for CRC classification and differential network analysis.
We performed new experiments that validated data-driven hypotheses, suggested by the network modelling, of novel enzymatic steps in the IgG glycosylation pathway.

Dissemination
MIMOmics has pages on LinkedIn and Facebook.
Software packages and code for most of the developed methods are available on GitHub (https://github.com/mimomics), open to the whole scientific community. A list of code is also available on mimomics.eu.
We collaborated with the HighGlycan consortium, which developed new high-throughput techniques for glycomics; methods for data cleaning, peak picking etc were implemented by our wet lab partners and disseminated via our partners to the HighGlycan consortium. We organized a course for HighGlycan on statistical tools such as study design, transformations, handling missing data. We had a joint meeting with the InterOmics consortium, an Italian consortium with a similar goal: development of methods for integrated omics research. We organized a course about network methods for both consortia.
We were heavily involved in the Summer school for statistical omics (RSSSO), which was organized three times (Split, August 2014, 2015 and 2016). We organized one Summer school in Cambridge by ourselves. Within the four years the school was running, about 65 students (third year Bachelor, Master and PhD students), 30 staff members (among these 15 members from MIMOmics) and 30 guest speakers were engaged in the project. The geography of participants was wide: while most of them were coming from Europe, we also had participants/staff from Russia, China, Australia, and New Zealand.
We created an animation video for the general public, in which the properties of various omics datasets were explained and the added value of integrating multiple omics datasets was demonstrated. We organized two workshops at the end of the project to disseminate our work to the statistical and machine learning community (June 2017, Leeds) and to epidemiologists and bioinformaticians (August 2017, Cambridge).





Potential Impact:
Methodology for data analysis needs to match the complexity of the biological problem. These methods should integrate multilevel omics data to bring biological understanding to the next level. MIMOmics was designed to make these improvements to the currently available methods. Excellent scientists in methods development and data integration teamed up to improve the current situation. The ultimate validation of our methods was realized by proof-of-principle studies performed jointly with our outstanding partners in biology and epidemiology. The population studies of these partners comprise cutting-edge high-throughput omics datasets. These datasets differ in scale, structure, bias and coverage. To obtain interpretable results from combined omics datasets, the input of our partners in high-throughput measurement was required.
The MIMOmics program developed methods for integrated analysis of multiple omics datasets which can be used to answer the following recurrent epidemiological questions:

1. Which molecular profiles are associated with clinical endpoints, intermediate phenotypes and disease?
2. Can we identify subgroups of patients with respect to progression and complications based on molecular profiles?
3. Which biological pathways determine the transition from health to disease for common diseases?
4. Which informative marker profiles within each population can be transferred across populations?
5. Are the identified molecular marker profiles causal?

The main goal of MIMOmics was to answer these questions for the outcome Metabolic Health. Around the world, conditions such as obesity, hypertension, and diabetes are highly prevalent and rising. In some cases metabolic disturbances are clearly associated with disease outcomes, but accumulating evidence suggests they can harm well-being, health, and longevity even in the asymptomatic stage. Improved marker profiles of metabolic health are potentially better indicators of future cardiovascular disease (CVD) development than body mass index (BMI) or obesity. In addition, we showed that the methods are valuable for other diseases as well, e.g. cancer. A large proportion of our partners are experts in methods for data analysis with various backgrounds (statistics, mathematics, physics). MIMOmics contributed to a leap forward in omics data analysis, namely taking the essential step from single dataset analysis to integrated analysis of multilevel omics datasets and developing methods that are applicable to heterogeneous datasets (different platforms). Although we focused on currently available omics datasets (genomics, metabolomics and glycomics), our methodological framework is generic, and the extensions needed to deal with future omics datasets are expected to fit within it. MIMOmics also pioneered new directions for research in data science. Another potential extension of our work is the integration of heterogeneous dynamic datasets. Our collaboration promoted exchanges in methodological approaches among our SME and academic partners. Our consortium also trained young researchers in a multidisciplinary environment: from experiments, via statistical analysis to interpretation.

To be successful in future multidisciplinary environments in academia or industry, researchers need to bridge several diverse disciplines. The career opportunities of our young researchers benefit greatly from our consortium's training facilities. These future researchers will significantly contribute to the analysis of novel omics datasets in relationship to human disease and health and to knowledge translation. Indeed, at least three of our fellows already hold post-doc positions. At this moment there is still a shortage of such researchers in data analytics, and it is acknowledged that statistical knowledge in biomedical research is still insufficient. The MIMOmics program had a significant impact on the project's participants and on the life sciences (regarding both methods and newly trained researchers); ultimately, patients will benefit from better prognoses and predictions based on their multi-omics profiles.


List of Websites:
www.mimomics.eu