Sequence similarity networks: a promising complement to the phylogenetic framework to study evolutionary biology

Final Report Summary - EVOLUNET (Sequence similarity networks: a promising complement to the phylogenetic framework to study evolutionary biology)

The EVOLUNET project aimed to develop the use of similarity networks adapted to questions in evolutionary biology. Such networks, inspired by studies of social and regulatory networks, allow fast, inclusive comparative analyses of both highly divergent and conserved molecular sequences, whether fully or partly similar, and, more generally, of similarities between samples or historical phenomena. Accordingly, the EVOLUNET team implemented novel network methods and approaches and applied them to a variety of questions and datasets. A first type of result is that cells, organisms and mobile genetic elements undergo a much more complex evolution than tree-like models assume, because many aspects of their evolution are affected by reticulate processes, which produce composite entities. Applied to the analysis of reads, i.e. short sequences produced by next-generation sequencing, network approaches helped to diagnose genome heterogeneity within the cytoplasm of arbuscular mycorrhizal fungi, making an original contribution to an ongoing debate about genetic diversity within single fungal organisms. At the subgenic level, network approaches uncovered extensive gene remodeling in viruses, as well as novel symbiogenetic genes resulting from plastid endosymbiosis in photosynthetic eukaryotes and from mitochondrial endosymbiosis in eukaryotes. Composite and chimeric genes (genes with segments of different phylogenetic origins) were also detected in Haloarchaea and shown to play major biological roles, indicating that gene remodeling is a major process that remains underappreciated in most evolutionary studies.
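To make the idea of a sequence similarity network concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the EVOLUNET software. Sequences are nodes, and two sequences are linked when a pairwise comparison reports a similarity above a threshold. Connected components then approximate gene families, and a gene whose separate regions match two otherwise unrelated sequences is flagged as a candidate composite. The hit tuple layout, the function names, the 30% identity threshold and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical hit format: (query, subject, percent_identity, q_start, q_end),
# where q_start..q_end is the matched region on the query sequence.
HITS = [
    ("geneA", "geneB", 80.0, 1, 120),
    ("geneC", "geneD", 75.0, 1, 200),
    ("geneX", "geneA", 60.0, 1, 110),
    ("geneX", "geneC", 55.0, 130, 300),
]


def build_network(hits, min_identity=30.0):
    """Return an undirected adjacency map keeping hits at or above the threshold."""
    adj = defaultdict(set)
    for query, subject, identity, _start, _end in hits:
        if query != subject and identity >= min_identity:
            adj[query].add(subject)
            adj[subject].add(query)
    return adj


def connected_components(adj):
    """Group nodes into connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def candidate_composites(hits, adj, min_identity=30.0):
    """Flag queries whose disjoint regions match two subjects that are not linked."""
    by_query = defaultdict(list)
    for query, subject, identity, start, end in hits:
        if query != subject and identity >= min_identity:
            by_query[query].append((subject, start, end))
    composites = set()
    for query, matches in by_query.items():
        for (s1, a1, b1), (s2, a2, b2) in combinations(matches, 2):
            disjoint = b1 < a2 or b2 < a1  # matched regions do not overlap
            unrelated = s2 not in adj.get(s1, set())  # subjects share no edge
            if s1 != s2 and disjoint and unrelated:
                composites.add(query)
    return composites


network = build_network(HITS)
families = connected_components(network)
composites = candidate_composites(HITS, network)
```

In this toy dataset, `geneX` matches `geneA` over one region and `geneC` over a separate region, while `geneA` and `geneC` are not linked to each other, so `geneX` is the only candidate composite. A real analysis would also need to account for alignment coverage, statistical significance and low-complexity regions, which this sketch ignores.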
At the gene level, our analyses showed that networks can be used to detect imprints of major diversification events in the evolution of microbial life, in particular changes that affected the sequences of prokaryotic core gene families during transitions between anaerobic and aerobic lifestyles, and the process of gene externalization (the duplication of chromosomal genes on mobile genetic elements). Likewise, bipartite graph analyses revealed lateral gene transfers between archaea and very recently discovered ultra-small bacteria. A second type of finding concerned environmental data. In particular, EVOLUNET used sequence similarity networks to test ecological theories and to establish that marine ciliates exhibit geographic dispersal patterns similar to those of multicellular organisms. EVOLUNET also used these networks to analyze the ‘microbial dark matter’, the large body of taxonomically and functionally unassigned or divergent molecular sequences from the environment. This approach detected highly divergent, ancient gene families in the environment, which is compatible with the existence of additional, and possibly deep, divisions of life. It also revealed new lineages of protists and identified genes coding for autotrophic carbon fixation pathways in ultra-small prokaryotes from the oceans. A final type of finding was that network analyses of similarity can be fruitfully generalized beyond comparisons of molecular sequences to analyze various kinds of evolved data. EVOLUNET therefore started disseminating its network methods in other historical sciences, in particular linguistics, and to a range of colleagues through two summer schools introducing network methods.
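The bipartite graph analyses mentioned above can be illustrated with a small sketch, again as an assumption-laden example and not the project's actual pipeline. A bipartite graph here has two kinds of nodes, genomes and gene families, with an edge whenever a genome contains a member of a family. Families connected to genomes from more than one domain are candidates for closer inspection. The function name, genome labels and family labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical genome-to-family edges of a bipartite graph.
MEMBERSHIPS = [
    ("archaeon_1", "fam1"),
    ("archaeon_2", "fam1"),
    ("bacterium_1", "fam2"),
    ("archaeon_1", "fam3"),
    ("bacterium_1", "fam3"),
]

# Hypothetical taxonomic assignment of each genome.
DOMAIN_OF = {
    "archaeon_1": "Archaea",
    "archaeon_2": "Archaea",
    "bacterium_1": "Bacteria",
}


def families_spanning_domains(memberships, domain_of):
    """Return families whose member genomes belong to more than one domain."""
    genomes_by_family = defaultdict(set)
    for genome, family in memberships:
        genomes_by_family[family].add(genome)
    spanning = {}
    for family, genomes in genomes_by_family.items():
        domains = {domain_of[genome] for genome in genomes}
        if len(domains) > 1:
            spanning[family] = sorted(genomes)
    return spanning


shared = families_spanning_domains(MEMBERSHIPS, DOMAIN_OF)
```

A family shared across domains is only a candidate for lateral gene transfer. It could equally reflect vertical inheritance from a common ancestor, so a result like this would normally be followed by phylogenetic or sequence-level checks.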