Final Report Summary - NOVABREED (Novel variation in plant breeding and the plant pan-genomes)
The aim of the Novabreed project was to identify novel variation in plants and to study the composition of the plant pan-genome. Because of the high levels of structural variation observed in plants, a single genome sequence might not reflect the entire genomic complement of a species. This prompted us to introduce the concept of the plant pan-genome, which includes core genomic features common to all individuals and a Dispensable Genome (DG) composed of partially shared and/or non-shared DNA sequence elements. It is therefore of great importance to characterize structural variation, which consists both of smaller insertions/deletions, mostly due to the recent movement of transposable elements, and of larger insertion/deletion events similar to those termed Copy Number Variants (CNVs) in humans. Uncovering the nature of the dispensable genome, i.e. its composition, origin and function, is a step towards understanding the processes that generate genetic diversity and phenotypic variation.
In the course of the Novabreed project we defined the composition of the pan-genome of grapevine and maize by implementing state-of-the-art approaches for de novo genome assembly. Transposable elements (TEs) are a major contributor to genetic variation in plants because of the recent and massive transposition activity observed in most angiosperm species studied so far. We therefore tested methods for identifying TE movements in genomes and improved existing methods for structural variant (SV) detection in both species. We identified more than 50,000 polymorphic TE insertions in grapevine and showed that, rather than behaving merely as junk DNA, they frequently affect the expression of flanking genes, either directly or by modifying epigenetic marks, and also modify the local chromatin environment. Different TE families show extreme insertional specificity, favoring genomic regions that are defined both structurally and epigenetically.
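As a concrete illustration of the pan-genome concept introduced above, the split between core and dispensable elements can be sketched as a simple set operation over presence/absence data. This is only a minimal sketch, not the method used in the project: the function name `partition_pan_genome`, the individual names and the element identifiers are all hypothetical, and it assumes that sequence elements have already been identified and matched across individuals.

```python
from collections import Counter


def partition_pan_genome(presence):
    """Split sequence elements into core and dispensable sets.

    presence: dict mapping an individual's name to the set of element IDs
    detected in that individual (hypothetical input format).
    """
    n_individuals = len(presence)
    # Count in how many individuals each element is present.
    counts = Counter(elem for elems in presence.values() for elem in elems)
    # Core: present in every individual.
    core = {elem for elem, c in counts.items() if c == n_individuals}
    # Dispensable: present in some individuals but not all.
    dispensable = set(counts) - core
    return core, dispensable


# Toy presence/absence data (invented for illustration only).
presence = {
    "individual_A": {"gene1", "gene2", "TE_ins1"},
    "individual_B": {"gene1", "gene2", "TE_ins2"},
    "individual_C": {"gene1", "TE_ins1"},
}

core, dispensable = partition_pan_genome(presence)
print(sorted(core))         # ['gene1']
print(sorted(dispensable))  # ['TE_ins1', 'TE_ins2', 'gene2']
```

In this toy example only gene1 is shared by all three individuals, so it is the only core element; everything else falls into the dispensable genome, whether it is partially shared (gene2, TE_ins1) or private to one individual (TE_ins2).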
Specific effort went into developing methods to directly compare gene expression and epigenetic status between the two homologous chromosomes present in each variety (maternally and paternally derived). To do so, we performed allele-specific analyses of gene expression, epigenetic modifications and chromatin structure, and correlated differences in these genomic features with the presence or absence of specific transposable elements. Novel variation can arise as somatic or germline variants, depending on the cells that harbor it. We developed a software package (X-scan) for the detection of novel somatic structural variants, and designed experiments and analytical protocols for the identification of novel germline SVs.
Finally, we devoted some effort to extending our approaches to additional species. We observed that in species such as poplar (Populus nigra) and cherry (Prunus avium) the frequency of structural variation, and thus the complexity of the dispensable genome, is not as dramatic as in the two species we initially selected, grapevine and maize. We hope that the abundance of results obtained by our project will support novel approaches to genetics-assisted breeding by incorporating SV information into the predictive and analytical models already developed for single nucleotide variation. The methods we developed have already been used in the field of regenerative medicine, where we contributed to detecting the insertion sites of a transgene used in gene therapy to treat a child affected by a rare genetic disease.