CORDIS - EU research results

Comparative genomics of non-model invertebrates

Periodic Reporting for period 2 - IGNITE (Comparative genomics of non-model invertebrates)

Reporting period: 2020-01-01 to 2022-06-30

Genomes provide a record of an organism’s evolutionary history that can be used to address key biological questions, such as the relationships among organisms, their key functional traits, and the evolution of biodiversity. Furthermore, genomes can help in understanding normal versus aberrant development of animals (including humans), regeneration, and the origin of diseases. Genomes also enable the study of pathways involved in the production of biochemically active compounds that are potential drug candidates of benefit to human health. Although more than 95% of the Earth’s animal biodiversity consists of invertebrates, the current tally of sequenced genomes does not reflect this diversity, and many traits and potential bioproducts of human benefit remain unknown.

The first objective of IGNITE is to increase the availability of invertebrate genomes from as yet underrepresented lineages, establishing resources to explore the likely vast potential encoded in the genomes of spineless animals. The analysis of genome data involves several computationally intensive steps, such as genome assembly from raw data, gene prediction, and comparative gene/genome and phylogenetic analyses. The lack of international standards for computational code and the limited availability of analytic tools currently restrict our ability to comprehensively analyse and understand the massive amount of information contained in invertebrate genomes. Consequently, IGNITE’s second objective is to develop novel tools that standardise genome data analyses, make them more computationally efficient, and enable the continued rapid generation of high-quality genomes.
The fields of interest in IGNITE are as manifold as the targeted study systems. Four main research areas, however, unite all subprojects:
(1) production of high-quality genome and transcriptome resources for various underrepresented non-model invertebrates
(2) testing and adjusting existing software, and implementing new software, to produce and analyse high-quality genome assemblies, including the development of novel methods for the publication, harvesting, and re-use of biodiversity and genomic data
(3) establishing robust relationships among the main animal lineages to provide a reliable backbone of the animal tree of life
(4) identification and exploitation of bioactive compounds with potential for biomedical application

Multiple draft genome assemblies have been generated from diverse taxa, including five sponges (Porifera), one mollusc (Mollusca), one acoel worm (Xenacoelomorpha), one acorn worm (Hemichordata) and one insect (Hemiptera). Additional genomes are currently being sequenced from sponges, molluscs, corals (Cnidaria), arrow worms (Chaetognatha), and wheel animals (Rotifera). In addition, multiple transcriptomes are being sequenced to increase taxon diversity.

To improve the quality of genomes, IGNITE adopted and further developed chromosome conformation capture (Hi-C) protocols for problematic invertebrate tissues. The aim was to provide a Hi-C protocol that works in a broad diversity of invertebrates. The establishment of new lab protocols, as well as the testing and adjusting of Hi-C scaffolding software, helped to enhance the genome assembly quality of very distantly related invertebrate lineages. Besides improving lab protocols and scaffolding, IGNITE is developing new software to fill gaps in scaffolded assemblies and to phase genomes to yield chromosome-level and gap-free nuclear genome assemblies.
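One common way to quantify the kind of contiguity gain that scaffolding aims for is the N50 statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The report does not state which metrics IGNITE used, so the sketch below is only an illustration; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative only: the metric and the name `n50` are assumptions,
    not something the IGNITE report specifies.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy numbers (hypothetical): the same 1000 bases before and after scaffolding.
contigs = [100, 200, 300, 400]
scaffolds = [300, 700]
print(n50(contigs), n50(scaffolds))
```

Joining contigs into longer scaffolds leaves the total assembly size unchanged but raises N50, which is why the statistic is often reported alongside completeness measures rather than on its own.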

To produce robust yet energy-efficient sequence-based phylogenetic analyses, IGNITE has been working on phylogenetic likelihood implementations, with the overall objective of developing efficient open-source bioinformatics tools for analysing large molecular data sets under complex evolutionary models. The specific aim was to implement new algorithms that are more energy-efficient than existing software solutions. Such software tools are routinely used in biological and medical research around the globe to analyse bacterial samples and viral outbreaks, and to study the evolutionary history of life.

During this reporting period, a production-level library has been implemented and made publicly available. The newly developed kernel version has now been integrated into several production-level tools for phylogenetic inference. The final software is now faster and also provides options for reducing and monitoring energy consumption. Furthermore, complex models that account for heterotachy, as well as so-called non-time-reversible models of nucleotide substitution, were implemented. An open-source tool to determine the root of a phylogenetic tree will be finished shortly.
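To give a sense of what a phylogenetic likelihood kernel computes, here is a minimal sketch of Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model. It is not the IGNITE library: the report names neither the library nor its API, and the tree encoding, function names, taxon labels and sequences below are all hypothetical choices made for illustration. Production kernels evaluate the same recursion over very many sites with vectorised, numerically scaled code.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of state i changing to state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial(node, site):
    """Conditional likelihoods of the data below `node` for each of the 4 states.

    Hypothetical encoding: a leaf is a taxon name (str); an inner node is a
    tuple (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        observed = site[node]
        return [1.0 if s == observed else 0.0 for s in NUCLEOTIDES]
    left, t_left, right, t_right = node
    p_left = partial(left, site)
    p_right = partial(right, site)
    result = []
    for i in range(4):
        down_left = sum(jc69_prob(i, j, t_left) * p_left[j] for j in range(4))
        down_right = sum(jc69_prob(i, j, t_right) * p_right[j] for j in range(4))
        result.append(down_left * down_right)
    return result


def site_likelihood(tree, site):
    """Likelihood of one alignment column, with uniform base frequencies at the root."""
    return sum(0.25 * p for p in partial(tree, site))


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods; `alignment` maps taxon name to sequence."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        total += math.log(site_likelihood(tree, site))
    return total


# Toy data (hypothetical taxa, branch lengths and sequences).
tree = (("sponge", 0.1, "coral", 0.1), 0.05, "mollusc", 0.2)
alignment = {"sponge": "ACGT", "coral": "ACGA", "mollusc": "ACTT"}
print(log_likelihood(tree, alignment))
```

JC69 is time-reversible, so the likelihood computed here does not depend on where the root is placed. Non-time-reversible models lack this property, which is one reason they are relevant when inferring the root of a tree.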

The genomic information gathered by sequencing highly under-sampled invertebrate groups promises to provide new insights into biochemical compounds produced either by the host or by its associated microbial organisms. In IGNITE, we have focused in particular on completing and analysing sponge holobionts (the host and its associated symbionts), as they are known to harbour a rich diversity of metabolites with high potential for applications in human health, including cancer therapies and novel classes of antibiotics. Our initial analyses have already yielded one candidate for a novel antibiotic compound. As more sequenced genomes become available in IGNITE, we aim to identify a larger number of metabolites.
IGNITE has gone beyond the state of the art in various ways. It is the first-ever international training program that focuses specifically on the biodiversity genomics of the vast majority of animal life, the invertebrates. Invertebrate genomics is a relatively new field that offers many job prospects in companies involved in medicine, biotechnology, bioproduct development, agriculture, livestock and pest management, as well as in research institutions, universities, and, increasingly, natural history collections and research museums. Through IGNITE's multifaceted top-level training, the career prospects and employability of its graduates in both academia and industry are significantly enhanced.

Through its innovative training-by-research program, IGNITE has so far fully achieved its objectives. The consortium has generated high-quality reference genomes of undersampled animal lineages, developed new analytical tools and software, and provided novel insights into genomes in the context of an organism's function, environment and ecology, as well as how invertebrate genomes were shaped throughout the Earth's history.

IGNITE's broader socio-economic impact and relevance stem mainly from the de novo sequenced and analysed high-quality genomes and the related resources produced, which form the foundation for educating a new generation of well-trained genomicists.