
Origin, fate and function of wheat genes noncollinear with the other cereal genomes

Final Report Summary - NONCOLLINEARGENES (Origin, fate and function of wheat genes noncollinear with the other cereal genomes)


Executive Summary:

Wheat is one of the most important crops worldwide, but because of its polyploid nature and very large genome (17 Gb, about five times the size of the human genome), wheat genomics has lagged behind that of other agronomically important crops such as rice, maize and sorghum. In the framework of the 3BSEQ project (funded by the French National Research Agency and France-Agrimer), our group produced, for the first time, a high-quality reference sequence for the largest wheat chromosome, 3B. This chromosome spans about 900 Mb, and we were able to assemble a single pseudomolecule of 774 Mb in total to study sequence content, organization, gene expression, and molecular evolution at a scale never before reached in such a complex genome.

In the NonCollinearGene project, we studied evolutionary aspects, specifically the level and impact of gene duplications and movements on recent wheat genome evolution. A previous study (Choulet et al. 2010) suggested that gene rearrangements, especially interchromosomal duplications leading to the insertion of nonsyntenic genes, occur at an unexpectedly high level in wheat compared to related species. To investigate these findings further, the NonCollinearGene project aimed to explore the origin, function, and fate of wheat genes that are noncollinear with the related grass genomes. Because more data became available than anticipated at the time of the proposal (RNA-seq transcriptome data, a whole-genome shotgun sequence, and a completed 3B pseudomolecule), we modified the research methodology to take advantage of this new information. Instead of choosing 30 candidate genes to study in wet-lab experiments, we expanded the project to study all of the 3B genes using bioinformatics approaches. We performed the following analyses on a chromosome-wide scale:

• Compared the 7264 gene sequences that were annotated on wheat (Ta) chromosome 3B with those of its three closest relatives for which a complete genome sequence was available: Oryza sativa (Os), Brachypodium distachyon (Bd), and Sorghum bicolor (Sb).
• Filtered the gene sets by discarding genes with no homology in at least one of the other species studied. This avoids bias introduced by lineage-specific genes or by differences in gene annotation methods.
• Calculated the percentage of syntenic vs. nonsyntenic genes (as well as collinear vs. noncollinear genes) for each species.
• Determined the spatial distribution of syntenic and nonsyntenic genes along the 3B pseudomolecule.
• Determined the gene expression patterns by analyzing high-throughput sequencing data from the wheat transcriptome (RNA-seq) in 15 different experimental conditions.
• Used the 18 non-homoeologous bread wheat chromosomes (short-read based shotgun sequences produced by the International Wheat Genome Sequencing Consortium; IWGSC) to search for parent copies of recently duplicated genes found on chromosome 3B.
• Performed a functional analysis by identifying enriched Gene Ontology (GO) terms in nonsyntenic vs. syntenic genes.
• Identified an unambiguous and completely assembled ancestral gene for a subset of 152 nonsyntenic genes.
• Systematically analyzed the composition of the regions surrounding the syntenic and nonsyntenic genes on chromosome 3B.
• Investigated the time since divergence for 94 nonsyntenic/ancestral gene pairs by analyzing synonymous nucleotide substitution rates (Ks).
• Investigated the amount and type of intra- and interchromosomal duplicates of 3B compared to other grass species.
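
The filtering and synteny-classification steps in the list above can be illustrated with a minimal Python sketch. It is not the pipeline used in the project. It assumes a simplified rule in which a 3B gene counts as syntenic when at least one of its homologs lies on the chromosome the report compares with Ta3B (Os1, Bd2 or Sb3), and it reads the filter as keeping genes with a homolog in at least one comparison species. Gene order (collinearity) is not modeled. The gene identifiers, the input layout and the function names are hypothetical.

```python
from typing import Dict, Optional

# Chromosomes the report compares with Ta3B: Os1, Bd2 and Sb3.
ORTHOLOGOUS_CHROMOSOME = {"Os": "1", "Bd": "2", "Sb": "3"}

# Hypothetical input: gene ID -> chromosome of its best homolog per species
# (None when no homolog was found). IDs and values are invented toy data.
TOY_HOMOLOGS: Dict[str, Dict[str, Optional[str]]] = {
    "gene_a": {"Os": "1", "Bd": "2", "Sb": "3"},
    "gene_b": {"Os": "5", "Bd": None, "Sb": None},
    "gene_c": {"Os": None, "Bd": None, "Sb": None},
    "gene_d": {"Os": None, "Bd": "2", "Sb": "7"},
    "gene_e": {"Os": "11", "Bd": "4", "Sb": "5"},
}


def classify(homolog_chrom: Dict[str, Optional[str]]) -> Optional[str]:
    """Return 'syntenic', 'nonsyntenic', or None if the gene is filtered out."""
    hits = {sp: chrom for sp, chrom in homolog_chrom.items() if chrom is not None}
    if not hits:
        # Assumed filter: genes without any homolog are discarded.
        return None
    if any(ORTHOLOGOUS_CHROMOSOME[sp] == chrom for sp, chrom in hits.items()):
        return "syntenic"
    return "nonsyntenic"


def nonsyntenic_percentage(genes: Dict[str, Dict[str, Optional[str]]]) -> float:
    """Percentage of nonsyntenic genes among the genes that pass the filter."""
    labels = [classify(homologs) for homologs in genes.values()]
    kept = [label for label in labels if label is not None]
    if not kept:
        return 0.0
    return 100.0 * kept.count("nonsyntenic") / len(kept)


if __name__ == "__main__":
    for gene_id, homologs in TOY_HOMOLOGS.items():
        print(gene_id, classify(homologs))
    print("nonsyntenic %:", nonsyntenic_percentage(TOY_HOMOLOGS))
```

Computing the percentage only over genes that pass the filter mirrors the intent stated in the list: lineage-specific genes and annotation differences should not inflate the nonsyntenic fraction.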

Major results:

• About 38% of the wheat genes are nonsyntenic with related species, compared to only 5% for rice, Brachypodium, and sorghum. This demonstrates that the gene content of wheat has evolved at an accelerated pace in its recent evolutionary history (<35 million years ago; Figure 1).
• When looking at strict gene order (i.e. collinearity), collinear genes represented 42 to 68% of the genes present on Os1, Bd2 and Sb3, while they represented less than 30% of the Ta3B genes.
• We observed a clear increase of nonsyntenic gene density at the distal regions of chromosome 3B compared to the centromeric and peri-centromeric regions.
• Expression analysis revealed that the majority of nonsyntenic genes (64.2%, vs. 85.5% of syntenic genes) are expressed in at least one of the tested experimental conditions, and only 30% of the nonsyntenic genes (vs. 17% of syntenic genes) were annotated as pseudogenes or gene fragments. In addition, a majority (57%) of the genes expressed in a single condition are nonsyntenic, whereas 88% of the genes expressed in all 15 conditions are syntenic.
• GO term enrichment analysis of nonsyntenic genes revealed an enrichment of terms related to adaptive functions, such as response to stimulus and cell death.
• Wheat chromosome 3B had a higher percentage of interchromosomally duplicated genes (34%) than the comparison species (23-24%). About 48% of the 3B nonsyntenic genes (vs. 25% of syntenic genes) were classified as interchromosomal duplicates.
• After identifying ancestral loci of 152 nonsyntenic genes, we observed no clear bias regarding the chromosomal origin of the interchromosomally duplicated genes.
• Three Class II transposon superfamilies (CACTA, hAT, and unclassified transposons) were more frequently associated with nonsyntenic genes; their presence in the 40 kb flanking nonsyntenic genes was at least 10% higher than the average for all genes.
• Ks analysis revealed that a vast majority (82%) of the genes were duplicated less than 40 million years ago. About 37% of the duplications are less than 10 million years old and are likely wheat specific, whereas the other 63% are expected to be Triticeae specific.
• Investigation of the intrachromosomal duplication rate on 3B found a total of 809 families with 2 or more copies, comprising 2216 genes. The rate of intrachromosomal duplication on chromosome 3B (~37%) is more than twice that of the orthologous rice, Brachypodium, and sorghum chromosomes (~15-18%).
• About 46% of the duplicated genes of chromosome 3B were found in tandem whereas 56% were found as dispersed duplicates.
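
The Ks-based dating reported in the list above can be illustrated with the standard molecular-clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below is an illustration under stated assumptions, not the calculation used in the project: the report does not state which rate was applied, so the value of 6.5e-9 (a figure often quoted for grass genes) is an assumption, as are the function names. Only the 10 and 40 million year thresholds come from the results.

```python
# Assumed synonymous substitution rate (substitutions per site per year).
# A value often quoted for grass genes; it is NOT taken from the report.
ASSUMED_RATE = 6.5e-9


def duplication_age_myr(ks: float, rate: float = ASSUMED_RATE) -> float:
    """Estimate the age of a duplication in millions of years from Ks.

    Uses T = Ks / (2 * rate); the factor 2 accounts for substitutions
    accumulating independently in both gene copies.
    """
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate must be positive")
    return ks / (2.0 * rate) / 1e6


def age_class(ks: float, rate: float = ASSUMED_RATE) -> str:
    """Bin a duplication using the 10 and 40 Myr thresholds from the results."""
    age = duplication_age_myr(ks, rate)
    if age < 10.0:
        return "under 10 Myr"
    if age < 40.0:
        return "10 to 40 Myr"
    return "40 Myr or older"


if __name__ == "__main__":
    # Hypothetical Ks values, chosen only to exercise each bin.
    for ks in (0.05, 0.26, 0.65):
        print(ks, round(duplication_age_myr(ks), 1), age_class(ks))
```

Because the estimated age scales inversely with the assumed rate, the bin a gene pair falls into depends directly on that choice; a different rate would shift every age proportionally.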

Our results confirm and refine previous hypotheses of accelerated evolution in the wheat lineage compared to other grasses, with many nonsyntenic genes inserted into the ancestral grass genome backbone via gene duplications and translocations, preferentially in the distal regions of the chromosome. In contrast to previous hypotheses, we show that most nonsyntenic genes are not pseudogenes and that they can contribute significantly to the expression of the wheat genome. In fact, our data suggest that nonsyntenic genes provide functional diversity and a potential for adaptation. Upon investigating the mechanisms of gene movement, we found that up to half of the nonsyntenic gene movement onto chromosome 3B may have originated through cut-and-paste mechanisms, possibly mediated by DNA transposons. The remaining nonsyntenic genes may be the result of ectopic recombination during double-strand break (DSB) repair. In addition, our comparison with the 3B pseudomolecule suggests either that recently duplicated genes have been collapsed in the IWGSC assemblies or that the high fragmentation of the IWGSC assemblies limits the ability to identify all duplicated genes.

The results of this project have shed light on the evolutionary mechanisms that have shaped the wheat genome. The plasticity of the wheat genome has generated variation through gene deletions, duplications and insertions of repetitive elements into coding and regulatory regions. Because wheat is such an important agronomic crop, a better understanding of the structure and function of its genome compared to other grass species will help in developing better tools for researchers and breeders. By understanding the duplicated and dynamic nature of the genome, we can learn how the plant has adapted to its environment over millions of years and apply this knowledge to improving specific traits in wheat, such as yield and disease resistance.

final1-venn-diagram-marie-curie-report.docx