Periodic Reporting for period 2 - INFERNET (New algorithms for inference and optimization from large-scale biological data)
Reporting period: 2019-03-01 to 2022-12-31
The perimeter of the research activity has been divided into:
A. Research Themes, characterized by the toolbox and methods developed:
(i) the inference of interaction networks from data
(ii) the analysis of static and dynamical processes on networks
B. Application Domains, divided into four main areas:
(i) the inference and modeling of multi-scale biological networks
(ii) the rational design of biological molecules
(iii) the quantitative study of cell energetics in proliferative regimes
(iv) the characterization of functional states of large-scale regulatory networks
Relevance for society
INFERNET application domains and research themes lie at the heart of current trends and emerging paradigms in science and technology. The main activity has been the development of new conceptual tools to organize vast amounts of heterogeneous biological data and to unveil its hidden large-scale relational order.
Overall objectives
The focus is on finding an optimal strategy for learning from noisy data sets. To do so, we aim at:
(i) Setting up a coherent and effective theoretical framework for the statistical mechanics of inverse problems
(ii) Exploiting this framework for the development of efficient and distributed algorithms
(iii) Analysing the intrinsic limits of the techniques developed in terms of theoretical bounds on the statistical relevance of the inferred results as a function of both quality and quantity of available data
During the first six months (M1-M6), the administrative organization of the project was set up: kick-off meeting (M2), project web portal (M3), and Data Management Plan [DMP] (M6). The first public outreach activity of the project was an International School held in Bardonecchia (Torino) on January 22-26, 2018 (M16). A second International School/Workshop was held at the University of Havana on M24, and a third in Puerto Madryn (Argentina) on M36. On M42 we decided to suspend the project due to the COVID-19 pandemic. The project was restarted in September 2022, and the consortium decided to ask for an extension of 12 months. During the project suspension, (i) Alexey Vazquez, Principal Investigator of the UK beneficiary node BEATSON Institute, left the institution, and (ii) Alejandro Fendrik, node coordinator of the Argentinian Third Country node Universidad General Sarmiento [UNGS], passed away due to a severe COVID infection. The consortium identified École Polytechnique Fédérale de Lausanne and Paolo De Los Rios as the new partner beneficiary and node coordinator. The new coordinator of the UNGS node is Lilia Romanelli. On M59 we held the project's final workshop, CAMBI (Computational Aspects and Modeling of Biological Information), at Bocconi University in Milan on 12-14 December 2022.
WORK PACKAGE ACTIVITY
WP1 [Algorithms]
The main objectives achieved are: (i) the in silico prediction of functional protein variants, (ii) algorithms that specifically take into account the phylogenetic correlation of sequence alignments, (iii) algorithms for aligning sequences that include long-range correlations in the sequence, (iv) maximum-entropy driven prediction of fluxes from large-scale metabolic reconstructions including partial experimental measurements, (v) generative algorithms specific for analyzing panning experiments.
WP2 [Multi-scale biological networks]
Many proteins do not work alone: their action depends on specific interactions with other proteins, on the formation of multi-protein complexes, or on interactions with other molecules (e.g. RNA, DNA, and many small molecules). Since the original task to be delivered here, i.e. a genome-wide map of bacterial protein-protein interactions, has been solved by another group, we have concentrated our effort on the computational prediction of specific protein-protein interactions in amplified protein families, which contain multiple paralogs.
WP3 [Design of Biological Molecules]
The WP activity was divided into three pillars:
1 Development of inference algorithms to analyze Deep Mutational Scanning and Directed Evolution experiments
2 Development of generative techniques to predict artificial target sequences
3 In silico Prediction and Validation of the methods
WP4 [Proliferative Metabolism]
High-throughput annotation of genome-scale metabolic networks achieved in the past two decades has fueled a massive surge in the use of quantitative methods for metabolic engineering. Predicting cellular responses to different types of perturbations is an open challenge. We have focused on the following six problems: (i) Using data to infer metabolic fluxes within cellular populations at single-cell resolution; (ii) Using data to infer how cellular crosstalk shapes the cellular microenvironment; (iii) Understanding the role of the inoculum density in determining the growth capability of cancer cell cultures; (iv) Maxent chemostat; (v) Metabolic Heterogeneity; (vi) Optimization of continuous cultures.
WP5 [Regulatory Networks]
Non-coding RNAs play a key role in the post-transcriptional regulation of mRNA translation and turnover in eukaryotes. microRNAs (miRNAs), in particular, interact with their target RNAs through protein-mediated, sequence-specific binding, giving rise to extended and highly heterogeneous miRNA–RNA interaction networks.
We studied how microRNAs (miRNAs) organize fluctuations across gene networks and orchestrate distinct expression states. In collaboration with Garg's group, we studied the specific case of pluripotent embryonic stem cells (ESCs). In collaboration with Ventura's group, we worked on the dynamical aspects of miRNA-target interactions, focusing on the response of a system of miRNAs and their targets to oscillating input signals.
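As an illustration of what "response to oscillating input signals" can mean, the following is a minimal sketch of a generic two-species titration model, in which a target mRNA and a miRNA are degraded individually and removed jointly upon binding. It is not the model used in the project: the equations, the function names (`simulate`, `target_amplitude`) and every parameter value are assumptions chosen for illustration only. The target transcription rate is modulated sinusoidally, and the peak-to-peak amplitude of the target level is measured after a transient.

```python
import math

def simulate(omega, t_end=400.0, dt=0.01, b0=1.0, amp=0.5,
             beta=1.0, d_m=0.1, d_s=0.1, k=1.0):
    """Forward-Euler integration of a toy miRNA-target titration model.

    m: target mRNA level, s: miRNA level (arbitrary units).
    The target transcription rate oscillates as b0 * (1 + amp * sin(omega * t)).
    All parameter values are hypothetical.
    """
    m, s = 0.0, 0.0
    trace = []
    for i in range(int(t_end / dt)):
        t = i * dt
        b = b0 * (1.0 + amp * math.sin(omega * t))
        dm = b - d_m * m - k * m * s      # transcription, decay, binding
        ds = beta - d_s * s - k * m * s   # miRNA production, decay, binding
        m += dt * dm
        s += dt * ds
        trace.append((t, m, s))
    return trace

def target_amplitude(trace, t_start=150.0):
    """Peak-to-peak amplitude of the target level after the transient."""
    values = [m for t, m, _ in trace if t >= t_start]
    return max(values) - min(values)

slow = simulate(omega=0.05)   # input period much longer than 1/d_m
fast = simulate(omega=5.0)    # input period much shorter than 1/d_m
amp_slow = target_amplitude(slow)
amp_fast = target_amplitude(fast)
print(f"slow input: {amp_slow:.3f}, fast input: {amp_fast:.3f}")
```

In this toy setting the target follows slow inputs but strongly attenuates fast ones, which is the kind of frequency-dependent response such an analysis would quantify. A fixed-step Euler scheme is used only to keep the sketch dependency-free.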
Dissemination and outreach
The dissemination and outreach activities of the project have been organized around three main pillars:
1) Workshops/Schools: We originally scheduled four main venues for the dissemination of the project themes, and we organized four events: (i) School in Bardonecchia (22-26 January 2018), (ii) International School in Havana (4-13 February 2019) + International Workshop in Havana (14-15 February 2019), (iii) International Workshop in Puerto Madryn (2-5 February 2020), (iv) Final Workshop in Milan (12-14 December 2022).
2) The project web page (www.infernet.eu)
3) The GitHub pages where the code developed during the project has been uploaded (https://github.com/infernet-h2020)
1. Predicting super-binding Antibodies from Repertoire Sequencing Data
We developed an integrated protocol to (i) analyze existing RepSeq data to tune the algorithm in general cases of interest, (ii) use the inferred model to design new super-binding antibodies, (iii) test the designed sequences in wet lab experiments.
2. Protein Design
We improved direct coupling analysis along a number of lines: (i) Highly precise inference methods going beyond mean-field and pseudo-likelihood inference are needed to advance from the aforementioned topological network description to quantitative generative statistical models. (ii) Phylogenetic biases in the sequence data need to be accounted for by novel models of epistatic protein evolution. (iii) Integrative Bayesian inference allows for incorporating prior structural and functional information about the protein of interest.
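To make point (ii) concrete, the sketch below shows the simple similarity-based sequence reweighting that is commonly used as a baseline correction for phylogenetic bias in direct coupling analysis. It is not the novel evolutionary model referred to above, only the standard heuristic such models aim to improve on. The function names, the 0.8 identity threshold and the toy alignment are assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def sequence_weights(msa, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences at least
    `threshold` identical to it, the sequence itself included)."""
    weights = []
    for s in msa:
        n_similar = sum(1 for t in msa if identity(s, t) >= threshold)
        weights.append(1.0 / n_similar)
    return weights

def site_frequencies(msa, weights, site):
    """Reweighted frequency of each symbol at one alignment column."""
    total = sum(weights)
    freqs = {}
    for seq, w in zip(msa, weights):
        freqs[seq[site]] = freqs.get(seq[site], 0.0) + w / total
    return freqs

# Hypothetical toy alignment: three near-identical sequences and one outlier.
toy_msa = [
    "ACDEFGHIKL",
    "ACDEFGHIKL",
    "ACDEFGHIKY",
    "MNPQRSTVWY",
]
weights = sequence_weights(toy_msa)
m_eff = sum(weights)                      # effective number of sequences
freqs_site0 = site_frequencies(toy_msa, weights, site=0)
print(weights, m_eff, freqs_site0)
```

The three closely related sequences share a single unit of weight, so the outlier contributes as much to the column statistics as the whole cluster. The reweighted one- and two-site frequencies are what mean-field or pseudo-likelihood inference would then take as input.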