Periodic Reporting for period 1 - FraxiFam (Reconstructing gene family evolution in the ash genus (Fraxinus))
Reporting period: 2015-07-01 to 2017-06-30
The Fellow also performed a synteny analysis. It revealed regions of multiple synteny between ash and monkey flower. However, the low contiguity of the Fraxinus excelsior genome assembly made this approach less useful than expected for detecting past whole genome duplications shared by the two lineages.
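The report does not describe how synteny was detected. The sketch below shows one simple way to chain homologous gene pairs ("anchors") into collinear blocks and then count how many distinct monkey flower sequences each ash scaffold is collinear with. That count is a rough proxy for "multiple synteny". The functions `collinear_blocks` and `synteny_depth`, the `max_gap` and `min_anchors` parameters, and all sequence names are hypothetical illustrations, not the tools used in FraxiFam.

```python
from typing import Dict, List, Set, Tuple

# A gene location: (sequence id, index of the gene in that sequence's gene order).
Location = Tuple[str, int]
Anchor = Tuple[Location, Location]  # (location in genome A, location in genome B)


def collinear_blocks(
    anchors: List[Anchor], max_gap: int = 2, min_anchors: int = 3
) -> List[List[Anchor]]:
    """Chain homologous gene pairs into same-orientation collinear blocks.

    Consecutive anchors join a block when they lie on the same pair of
    sequences and advance by 1..max_gap genes in both genomes. Chains with
    fewer than min_anchors anchors are discarded. (Hypothetical parameters.)
    """
    ordered = sorted(anchors, key=lambda p: (p[0][0], p[1][0], p[0][1], p[1][1]))
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    for a, b in ordered:
        if current:
            prev_a, prev_b = current[-1]
            same_seqs = prev_a[0] == a[0] and prev_b[0] == b[0]
            da, db = a[1] - prev_a[1], b[1] - prev_b[1]
            if same_seqs and 0 < da <= max_gap and 0 < db <= max_gap:
                current.append((a, b))
                continue
            # Chain broken: keep it only if it is long enough.
            if len(current) >= min_anchors:
                blocks.append(current)
            current = []
        current.append((a, b))
    if len(current) >= min_anchors:
        blocks.append(current)
    return blocks


def synteny_depth(blocks: List[List[Anchor]]) -> Dict[str, int]:
    """Count distinct genome-B sequences collinear with each genome-A sequence."""
    partners: Dict[str, Set[str]] = {}
    for block in blocks:
        (a_seq, _), (b_seq, _) = block[0]
        partners.setdefault(a_seq, set()).add(b_seq)
    return {a_seq: len(b_seqs) for a_seq, b_seqs in partners.items()}


# Hypothetical anchors: one ash scaffold collinear with two monkey flower
# chromosomes, plus a short ash scaffold carrying only two anchors.
anchors = (
    [(("ash_scf1", i), ("mim_chr1", 10 + i)) for i in range(5)]
    + [(("ash_scf1", i), ("mim_chr4", 40 + i)) for i in range(4)]
    + [(("ash_scf9", 0), ("mim_chr2", 3)), (("ash_scf9", 1), ("mim_chr2", 4))]
)
print(synteny_depth(collinear_blocks(anchors)))  # {'ash_scf1': 2}
```

The short scaffold `ash_scf9` yields no block. A chain can never be longer than the number of genes on a scaffold, so a fragmented assembly loses many genuine but short syntenic regions to any minimum-block-size filter. This is consistent with the limitation described above.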
Much of the Fellow's work was dedicated to producing high-contiguity genome assemblies for 27 other species and sub-species of Fraxinus, to allow analyses of gene family evolution in the genus. Assembling these genomes proved difficult because of their relatively high levels of heterozygosity (up to 5.13%). The Fellow tested a range of state-of-the-art assembly approaches (including ABySS, SOAPdenovo2, Redundans, and Platanus), but none of them delivered high-quality assemblies. Typically, the assemblies were highly discontiguous and greatly exceeded the expected genome size because the assemblers failed to handle heterozygous regions properly. The best assemblies were generated with the CLC assembler by Dr Laura Kelly at QMUL's School of Biological and Chemical Sciences, who has provided her expertise in this area to the Fellow to support FraxiFam's research activities. The Fellow finished these CLC assemblies by scaffolding them with SSPACE and filling scaffold gaps with GapCloser. De novo genome assemblies for 28 accessions of Fraxinus were built and made publicly available on the project website: http://www.ashgenome.org/worldwide.
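An assembly that "greatly exceeds the expected genome size" can be flagged by dividing its total span by the expected haploid genome size. When the two haplotypes of heterozygous regions are assembled as separate contigs, this ratio rises above 1 and approaches 2 in the extreme. A second useful check is the fraction of N bases, which should fall after gap filling. The minimal sketch below computes both from FASTA text. The function names, the toy scaffolds and the expected genome size of 10 bp are hypothetical and are not taken from the project.

```python
from typing import Iterable, List, Optional


def sequence_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def inflation_ratio(lengths: List[int], expected_genome_size: int) -> float:
    """Total assembly span divided by the expected haploid genome size."""
    return sum(lengths) / expected_genome_size


def gap_fraction(fasta_lines: Iterable[str]) -> float:
    """Fraction of sequence bases that are N (unfilled scaffold gaps)."""
    total = gaps = 0
    for raw in fasta_lines:
        line = raw.strip()
        if line and not line.startswith(">"):
            total += len(line)
            gaps += line.upper().count("N")
    return gaps / total if total else 0.0


# Hypothetical toy scaffolds and a hypothetical expected genome size.
toy = [">scaffold_1", "ACGTNNNNAC", "GT", ">scaffold_2", "ACGTACGT"]
lengths = sequence_lengths(toy)        # [12, 8]
print(inflation_ratio(lengths, 10))    # 2.0
print(gap_fraction(toy))               # 0.2
```

For real assemblies the file would be streamed line by line (for example with `open(path)`). The expected genome size would come from an independent estimate rather than from the assembly itself.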
To further improve the contiguity of a subset of these genomes representing the phylogenetic diversity within Fraxinus, the Fellow extracted high molecular weight DNA from four Fraxinus species, one from each of the major clades of the Fraxinus phylogeny. DNA from these four species, together with DNA from F. pennsylvanica provided by collaborators in the USA, was sequenced as long mate-pair libraries on the Illumina HiSeq platform. These data were used to further improve the genome assemblies of these five species. The final assemblies had scaffold N50 values ranging from 18.5 kbp to 50.5 kbp and were therefore suitable for de novo gene annotation.
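Scaffold N50 is the largest length L such that scaffolds of length L or longer together cover at least half of the total assembly span. Long mate-pair reads raise it by linking contigs that lie far apart in the genome into longer scaffolds. The short sketch below computes N50 from a list of sequence lengths. The before/after lengths are hypothetical and ignore the length of the gaps that scaffolding introduces.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L together
    cover at least half of the total assembly span."""
    if not lengths:
        raise ValueError("no sequences")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-negative")


# Hypothetical example: mate pairs join four 10 kbp contigs into two scaffolds.
print(n50([10, 10, 10, 10]))  # 10
print(n50([20, 20]))          # 20
```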
Robust gene annotations for these five genomes required RNA-Seq data. The Fellow also extracted RNA from various tissues of four of these five species (RNA-Seq data for F. pennsylvanica were already available from collaborators in the USA) and sequenced it for use in improved annotations. The resulting transcriptomic datasets were assembled with Trinity, yielding between 100K and 133K putative transcripts per species.
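Trinity reports isoforms ("transcripts") grouped into components it calls "genes". A count of putative transcripts is therefore larger than the number of Trinity genes. The sketch below counts both from a Trinity FASTA file. It assumes the Trinity 2.x header style (for example `>TRINITY_DN1000_c115_g5_i1 len=247 ...`), in which the identifier without its `_i<number>` suffix names the gene. The function name and the toy records are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed Trinity 2.x identifier style: <gene id>_i<isoform number>.
ISOFORM_SUFFIX = re.compile(r"^(.+)_i\d+$")


def count_transcripts_and_genes(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of transcript records, number of distinct Trinity genes)."""
    transcripts = 0
    genes: Set[str] = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue
        transcripts += 1
        match = ISOFORM_SUFFIX.match(fields[0])
        # Fall back to the full identifier if the header is not Trinity-style.
        genes.add(match.group(1) if match else fields[0])
    return transcripts, len(genes)


# Hypothetical toy Trinity output: two isoforms of one gene plus one other gene.
toy = [
    ">TRINITY_DN1_c0_g1_i1 len=300", "ACGT",
    ">TRINITY_DN1_c0_g1_i2 len=280", "ACG",
    ">TRINITY_DN2_c0_g1_i1 len=500", "A",
]
print(count_transcripts_and_genes(toy))  # (3, 2)
```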
To analyse gene families among the 28 species and sub-species of Fraxinus, genes were annotated in each genome based on the F. excelsior annotation and placed into orthologous groups. This provides a database of gene families in the genus Fraxinus. Mapping these gene families onto the evolutionary history of Fraxinus requires an accurate phylogeny, and until now the only available phylogenies had been based on small numbers of genes. To build one, genes were selected that are present in all of the species and in three outgroup species, and that show enough variation to be phylogenetically informative. This yielded over 250 genes, which were used to build a new phylogeny for Fraxinus.
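The gene-selection step can be sketched in two parts. The first keeps orthogroups that contain every ingroup and outgroup taxon. The second scores each gene's alignment by its number of parsimony-informative sites, meaning columns with at least two character states that each occur in at least two taxa. The report does not state the exact criteria used. Requiring exactly one copy per taxon (`single_copy=True`) and using the parsimony-informative site count as the measure of "suitable variation" are both assumptions here. The function names and taxon labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def complete_orthogroups(
    groups: Dict[str, List[str]], taxa: Sequence[str], single_copy: bool = True
) -> List[str]:
    """Return ids of orthogroups containing every taxon in `taxa`.

    `groups` maps an orthogroup id to the taxon label of each member gene.
    single_copy=True (an assumption) also requires exactly one gene per taxon.
    """
    required = set(taxa)
    selected = []
    for og_id, members in groups.items():
        counts = Counter(members)
        if not required.issubset(counts):
            continue
        if single_copy and any(counts[t] != 1 for t in required):
            continue
        selected.append(og_id)
    return selected


def parsimony_informative_sites(alignment: Dict[str, str]) -> int:
    """Count columns with at least two states that each occur in >= 2 taxa.

    Gaps ('-'), missing data ('?') and ambiguous 'N' are ignored.
    """
    informative = 0
    for column in zip(*alignment.values()):
        states = Counter(c.upper() for c in column if c.upper() not in "-?N")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1
    return informative


# Hypothetical orthogroups and a toy alignment.
groups = {
    "OG1": ["F_excelsior", "F_pennsylvanica", "outgroup_1"],
    "OG2": ["F_excelsior", "F_excelsior", "F_pennsylvanica", "outgroup_1"],
    "OG3": ["F_excelsior", "outgroup_1"],
}
taxa = ["F_excelsior", "F_pennsylvanica", "outgroup_1"]
print(complete_orthogroups(groups, taxa))  # ['OG1']
alignment = {"t1": "AAGT", "t2": "AAGT", "t3": "ACTT", "t4": "ACTA"}
print(parsimony_informative_sites(alignment))  # 2
```

A minimum count of informative sites per gene would be one way to put "suitable variation" into practice. The threshold would have to be chosen for the dataset at hand.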