Final Report Summary - GENOME OBESITY (Causes and consequences of mechanisms underlying genome size obesity)

Genome size is determined by the amount of DNA in each cell of an individual of a species. Large-scale comparative analyses of plant genome sizes have shown that plants with large genomes are at greater risk of extinction, are less adaptable to living in polluted soils, and are less able to tolerate extreme environmental conditions. These findings indicate that genome size has ecological consequences that shape the distribution and persistence of biodiversity. There are compelling reasons to believe that many underlying genomic, cellular, developmental and ecological processes are genome-size dependent.

Until recently, the sheer scale of the task of understanding genome obesity was too daunting to address. That impediment has now largely been overcome thanks to advances in next generation sequencing (NGS), whose major advantage is that it produces large amounts of sequence information at reasonable cost. This project exploited NGS, along with complementary methodologies that survey entire genomes, to provide insights into the evolutionary dynamics of genome obesity. The major goal of the research was to understand the epigenetic mechanisms operating in Fritillaria to control the different types of repetitive DNA in their obese genomes. There is increasing knowledge of how repetitive DNA sequences (especially transposable elements) are epigenetically regulated in species with small to medium-sized genomes (e.g. through DNA methylation, histone acetylation, siRNA and transcriptional silencing). This raised the question of whether similar processes also operate in species with obese genomes, only less efficiently (leading to genome size growth), or whether epigenetic regulation in these species involves novel mechanisms unique to obese genomes.
We conducted a global analysis of repetitive sequences at the DNA, RNA and small RNA levels using next generation sequencing (NGS) to address the evolutionary dynamics of genome expansion, which in extreme cases can produce huge, or “obese”, genomes. Species of the angiosperm (flowering plant) genus Fritillaria have amongst the largest diploid eukaryotic genomes known. The genomes of Fritillaria species are at least 15-fold larger than the human genome and at least 60 times larger than that of common rice. They are, therefore, good model taxa for studies of large genomes. We have also examined the genomes of gymnosperms (seed plants including, for example, pines and larches), which have large genomes compared with those typically found in angiosperms.

As planned, Lu Ma (MC fellow) isolated DNA and total RNA for NGS (Illumina platform) from three species and five different tissues in Fritillaria (one more than planned). Twenty-six sequence datasets were produced, comprising genomic DNA-seq, sodium bisulphite-treated genomic DNA-seq (BS-seq), RNA-seq and small RNA-seq from the three species and five tissues. All the sequencing data will be submitted to the Sequence Read Archive (SRA) database and shared with others upon publication of the fully synthesized data.

We identified an endogenous pararetrovirus repeat localised at the centromere of most, or all, chromosomes. Endogenous pararetroviral sequences are the most commonly found virus sequences integrated into angiosperm genomes. The repeat (FriEPRV) was identified from Illumina reads using RepeatExplorer, a bioinformatic pipeline that is well suited to identifying and characterizing repeats in species with large genomes. FriEPRV shows sequence similarity to members of the Caulimoviridae pararetrovirus family, with phylogenetic analysis indicating a close relationship to Petuvirus.
Analysis of single nucleotide polymorphisms revealed elevated levels of C to T and G to A transitions, consistent with deamination of methylated cytosine. Bisulphite sequencing revealed high levels of methylation at CG and CHG motifs (up to 100%), and 15–20% methylation, on average, at CHH motifs. FriEPRV’s centromeric location may suggest targeted insertion, perhaps associated with meiotic drive. We also observed an abundance of 24 nt small RNAs that specifically target FriEPRV, potentially providing a signature of RNA-dependent DNA methylation. Such signatures of epigenetic regulation suggest that the huge genome of F. imperialis has not arisen as a consequence of a catastrophic breakdown in the regulation of repeat amplification (Becher et al. 2014, Plant Journal 80: 823–833).

At the start of the project, we hypothesized that the epigenetic machinery which regulates the amplification of repeats in angiosperms, the RNA-dependent DNA methylation (RdDM) pathway, may be aberrant in Fritillaria, resulting in the enlargement of the genome to the huge sizes encountered. The presence of 24 nt small RNAs and high levels of DNA methylation suggested that this hypothesis was wrong. To test the hypothesis further, we used our transcriptome data to search for genes in the RdDM pathway. We assembled the transcriptomes of F. imperialis and F. persica and built a custom database of all genes known to belong to the RdDM pathway in other land plants. We then used BLAST to compare the Fritillaria transcriptomes against this database and found that all the genes we expected to find in the RdDM pathway were present. We have also analysed the distribution of modified histones in Fritillaria and found that the expected modified histones, typically associated with euchromatin and heterochromatin, are present, but in an unusual configuration compared with Arabidopsis.
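The methylation contexts reported above from bisulphite sequencing (CG, CHG and CHH, where H stands for A, C or T) can be illustrated with a minimal Python sketch. It is not the pipeline used in the project: the function names, the toy sequence and the read counts are all hypothetical, and only cytosines on the forward strand are considered (cytosines on the reverse strand would need the reverse complement).

```python
from collections import defaultdict

H_BASES = {"A", "C", "T"}  # IUPAC code H: any base except G


def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not a cytosine or the context cannot be
    determined (sequence end, ambiguous base such as N).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in H_BASES or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    if seq[i + 2] in H_BASES:
        return "CHH"
    return None


def methylation_by_context(seq, calls):
    """Pool per-cytosine read counts into one methylation level per context.

    calls: iterable of (position, methylated_reads, total_reads) tuples
    (hypothetical input format). The level is methylated / total reads.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for pos, m, t in calls:
        ctx = cytosine_context(seq, pos)
        if ctx is None or t == 0:
            continue  # skip non-cytosines, unresolved contexts, uncovered sites
        methylated[ctx] += m
        total[ctx] += t
    return {ctx: methylated[ctx] / total[ctx] for ctx in total}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_seq = "ACGTCAGTCTTACCG"
    toy_calls = [(1, 9, 10), (4, 8, 10), (8, 2, 10), (12, 10, 10), (13, 10, 10)]
    for context, level in sorted(methylation_by_context(toy_seq, toy_calls).items()):
        print(f"{context}: {level:.0%}")
```

Pooling read counts across sites, as done here, weights each cytosine by its coverage; averaging per-site levels instead is an equally common choice and can give different values when coverage is uneven.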
Overall, however, our hypothesis that the breakdown of RdDM is responsible for runaway genome expansion in the genus is likely to be wrong (Ma et al, 2015a, in preparation). Instead we propose that the genome has expanded in response to reduced recombination-based removal of DNA, perhaps because of overly efficient RdDM, as proposed by Fedoroff (2012, Science 338: 758-767). We are testing this hypothesis in a follow-on project in collaboration with the Royal Botanic Gardens, Kew, UK.

Unlike many angiosperm species, most gymnosperm species have large genomes. To test the hypothesis that their large genome sizes result from the breakdown of the RdDM pathway, we analyzed the transcriptomes of multiple gymnosperm species using publicly available data. We found that gymnosperms, unlike Fritillaria, lack key genes of this pathway.

These results, our outputs and the exposure we have given them have contributed to European excellence and competitiveness through:
(1) developing the subject of plant genomics research so that it realizes its potential role in characterising biodiversity and what shapes biodiversity;
(2) building our understanding of the establishment of species diversity through genome size enlargement;
(3) enabling predictions of the long-term consequences of genome size enlargement;
(4) improving our understanding of speciation processes through genome size changes, a process that will be elevated by anthropogenic activity (e.g. climate change), which will alter habitats, patterns of species interactions and selection pressures;
(5) exchanging, storing and supplying DNA probes and plant materials;
(6) improving agricultural companies’ understanding of genomics for crop improvement;
(7) informing policy-making on improved conservation strategies and the conservation of genetic resources.

There is no bespoke project website, although details of staff and students, the giant genome work and available databases can be found at https://evolve.sbcs.qmul.ac.uk/leitch/.