CORDIS - EU research results

Optimization and historical contingency in living systems: a biophysical approach

Periodic Reporting for period 2 - OptimHist (Optimization and historical contingency in living systems: a biophysical approach)

Reporting period: 2021-09-01 to 2023-02-28

Populations of living organisms are pushed to optimality by evolution, but may also be shaped by the contingency of their evolutionary history. The recent explosion of sequence data gives us access to the outcomes of molecular evolution, and controlled microbial evolution experiments allow us to analyze the predictability of evolution. In this exciting context, the OptimHist project explores quantitatively the importance of optimization and contingency both at the molecular scale and at the scale of populations of microorganisms, using a theoretical biophysical approach.

At the molecular scale, we are focusing on how functional optimization and evolutionary history, i.e. phylogeny, shape protein sequences. We are showing that correlations arising from phylogeny are a double-edged sword, often confounding signal from functional optimization, but sometimes providing useful complementary information. We are improving sequence-based predictions for protein-protein interactions by exploiting information both from phylogeny and from the required complementarity of interacting residues. We are showing how various statistical models, including some based on natural language processing, encode phylogeny and constraints, and how they can generate new protein sequences. We are proposing methods to disentangle correlations in protein sequences due to optimization from those due to phylogeny, and investigating the importance of functional sectors of collectively correlated amino acids as an organizing principle of proteins. These contributions are improving our understanding of the sequence-function relationship of proteins.

At the scale of populations, we are analyzing the impact of optimization and contingency on the evolution of microbial populations. Natural microbial populations are not homogeneous and well-mixed, but possess complex spatial structures. We are working on building a general model of structured populations. We are also focusing on microorganisms with a rugged fitness landscape presenting several optima. In these realistic cases, populations tend to remain trapped in local optima. However, some spatial structures may help these populations to explore their fitness landscapes more efficiently than others. We are studying these effects quantitatively, to better understand the impact of spatial structure on evolution. We are also studying applications to the evolution of antimicrobial resistance, which is a major public health concern.
At the molecular scale, we showed, using controlled synthetic data, that phylogenetic correlations impair the inference of structural contacts from sequences, but this issue is much stronger for local methods than for global statistical models, which rationalizes the success of the latter. By contrast, we demonstrated that correlations from structure and phylogeny combine constructively to allow the inference of protein partners among paralogs using just sequences. We also showed that pairs of amino acids that are not in contact in the structure have a major impact on partner inference, in a natural data set and in realistic synthetic ones. These findings help to understand the success of methods based on pairwise maximum-entropy models or on information theory at predicting protein partners from sequences among paralogs. We also contributed to the development of a phylogenetic correction to structural contact prediction and sector determination methods based on statistical inference from protein sequence data. This method, which is called Nested Coevolution, reveals some hidden functional signal in protein sequences. We also showed that protein language models trained on multiple sequence alignments encode phylogenetic relationships and are able to disentangle correlations from phylogeny and from contacts even better than more traditional global statistical models.
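
As an illustration of the information-theoretic scores mentioned above, the following minimal sketch computes the mutual information between two columns of a multiple sequence alignment. It is not the project's code: the function name `column_mutual_information` and the toy alignment are hypothetical, and the score is the plain, uncorrected estimate. Raw scores of this kind pick up correlations from phylogeny as well as from functional constraints, which is the confounding discussed in the text.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one character per sequence, in the same
    sequence order. Frequencies are plain empirical counts, with no
    pseudocounts, sequence reweighting, or phylogenetic correction.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding
        ratio = p_ab * n * n / (count_a[a] * count_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Toy alignment (hypothetical data): columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
toy_columns = list(zip(*toy_msa))

for j in range(1, len(toy_columns)):
    score = column_mutual_information(toy_columns[0], toy_columns[j])
    print(f"MI(column 0, column {j}) = {score:.3f} bits")
```

In this toy case only the covarying pair of columns gets a nonzero score. On real alignments, sequences that share recent ancestry also inflate such scores, which is why the phylogenetic corrections described above matter.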

At the scale of microbial populations, we introduced a model for structured populations on graphs that generalizes previous ones by making migration events independent of birth and death. We demonstrated that by tuning migration asymmetry, some graphs transition from amplifying to suppressing natural selection. Our results do not hinge on a modeling choice of microscopic dynamics or update rules. Instead, they depend on migration asymmetry, which can be experimentally tuned and measured. We also quantified the exploration of rugged fitness landscapes by finite populations, which is a starting point toward studying subdivided populations. We investigated how finite populations explore model and experimental fitness landscapes, both with stochastic simulations and with analytical calculations based on Markov chains. We found that the height of the first fitness peak reached by a population, which characterizes early adaptation, is affected by strong finite-size effects. Furthermore, we rationalized these results by considering the accessibility of fitness peaks.
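
The dependence of the first fitness peak on population size can be illustrated with a small simulation. The sketch below is not the project's code and rests on several simplifying assumptions: mutations are rare, so the population is monomorphic between fixation events; the population is well mixed and follows a Moran process; and the landscape is an uncorrelated random one on binary genotypes, with fitness values drawn uniformly between 1 and 2. All function names are hypothetical.

```python
import random


def moran_fixation_probability(f_wt, f_mut, pop_size):
    """Fixation probability of one mutant in a well-mixed Moran population."""
    if f_mut == f_wt:
        return 1.0 / pop_size
    r = f_wt / f_mut
    try:
        return (1.0 - r) / (1.0 - r ** pop_size)
    except OverflowError:
        # Strongly deleterious mutant in a large population: effectively zero.
        return 0.0


def random_landscape(n_sites, rng):
    """Uncorrelated landscape: one random fitness per binary genotype."""
    return [1.0 + rng.random() for _ in range(2 ** n_sites)]


def neighbors(genotype, n_sites):
    """Genotypes that differ from `genotype` by a single site (bit flip)."""
    return [genotype ^ (1 << i) for i in range(n_sites)]


def is_peak(genotype, fitness, n_sites):
    """A peak is fitter than all of its single-mutant neighbors."""
    return all(fitness[h] < fitness[genotype] for h in neighbors(genotype, n_sites))


def first_peak_height(fitness, n_sites, pop_size, rng, start=0):
    """Fitness of the first peak reached by a walk starting at `start`.

    Each step moves to a neighbor chosen with probability proportional
    to the fixation probability of the corresponding mutant.
    """
    genotype = start
    while not is_peak(genotype, fitness, n_sites):
        nbrs = neighbors(genotype, n_sites)
        weights = [
            moran_fixation_probability(fitness[genotype], fitness[h], pop_size)
            for h in nbrs
        ]
        genotype = rng.choices(nbrs, weights=weights)[0]
    return fitness[genotype]


rng = random.Random(0)
n_sites = 6
for pop_size in (1, 10, 100):
    heights = [
        first_peak_height(random_landscape(n_sites, rng), n_sites, pop_size, rng)
        for _ in range(500)
    ]
    print(pop_size, sum(heights) / len(heights))
```

With a population of size 1, every mutation fixes and the walk is purely random. As the population size grows, deleterious steps become increasingly unlikely and the walk approaches a purely adaptive one. Comparing the mean heights across population sizes gives a rough picture of the finite-size effects described above.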
At the molecular scale, we are characterizing the contributions of phylogeny and optimization to the higher-order correlations observed in multiple sequence alignments of proteins. We are also investigating how these correlations are captured by statistical models and protein language models. In addition, we are working on two new methods to predict protein-protein interactions from sequence data. The first explicitly combines phylogenetic information and residue coevolution, using neighbor graph alignment to produce a robust training set for coevolution methods. The second is based on a deep learning model. Both methods are promising and are expected to improve predictions in cases that are difficult to address with existing coevolution methods. In particular, we aim to address small protein families and families with many copies in each genome. The largest impact would be to make predictions for eukaryotic protein families. We are also investigating the impact of phylogeny on sector analysis, focusing on how robust sector analysis methods are to phylogenetic correlations and on how these correlations can further help.

At the scale of microbial populations, we are further generalizing our model for structured populations on graphs by going beyond the limit of rare migrations. With this more general model, we aim to bridge two lines of research, one from traditional population genetics and the other from applied mathematics, and to reconcile their results. We also expect to obtain general results about the impact of population structure on mutant fate, specifically on mutant fixation probability and fixation time. Furthermore, we are building on our study of finite-size effects in the exploration of rugged fitness landscapes to address how spatial structure affects this exploration. This line of work combines our models of spatially structured populations with our study of rugged fitness landscapes. Finally, we are starting to study the impact of spatial structure on the evolution of antimicrobial resistance.
Summary figure