CORDIS - EU research results
Content archived on 2024-05-28

DNA extraction from ancient and modern maize samples and biochemical characterization of genes with key roles in domestication

Final Report Summary - MAIZEKEY (DNA extraction from ancient and modern maize samples and biochemical characterization of genes with key roles in domestication)

With the MaizeKey project I set out to study some of the key molecular switches behind the stepwise improvement of modern maize from its ancestral varieties, and to contribute data directly applicable to modern crop science. In particular, I aimed to recover lost genetic diversity of biotechnological importance for nutrition, for resistance to pathogens or stress, or for metabolic pathways that produce metabolites of commercial interest. To find the diversity that was lost in the past we need to go back in time, and the only way to do that is to analyse ancient samples. In this work we analysed maize samples from two time points (2000 and 750 years ago) from Tularosa Cave in New Mexico. The cobs from the two periods differ markedly in size and shape: the older ones have fewer kernels per cob and an overall pineapple shape, while the more recent ones resemble the cobs of most current maize, albeit at a much smaller size. By looking at what appears to be a time series of samples (the 750-year-old population is expected to descend from the population that included the 2000-year-old samples), I expect to detect the genetic variation responsible not only for the observed morphological changes but also for other characteristics valued by humans, namely resistance to disease and to climatic stress.

The first step consisted of choosing the regions of interest in the genome. Archaeological samples have a very low content of endogenous DNA, and many of our samples contain as little as 1% maize DNA, with the rest corresponding to environmental contaminants. The cost of producing the amount of full-genome data needed for the downstream analysis is therefore prohibitive. We instead opted to increase the depth of coverage around specific regions of the genome using a capture approach. I chose the targets according to the following criteria: i) a GO category relevant to disease resistance, stressful weather conditions, or nutrient content; ii) sequence identity with sorghum between 70% and 95% (if too similar, I expected the sequences to be invariable within maize; if too different, the comparative analysis would be impossible); iii) no hypothetical genes or genes without a description; iv) protein-coding genes only. Around 1 Mb of sequence was captured using MYselect target enrichment kits. Sequencing was done on an Illumina HiSeq, and I designed and tested a pipeline for filtering and mapping the raw data. Cutadapt was used for adapter removal, PRINSEQ for quality trimming, and bwa for mapping reads to the B73 RefGen_v2 reference genome. Only reads mapping to regions with a mappability of 1 (calculated using gem-mappability) were used in the downstream analysis.
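The four selection criteria above can be read as a simple filter over an annotated gene list. The following Python sketch is only an illustration of that logic, not the procedure actually used in the project: the `Gene` record, its field names, the category labels and the example genes are all hypothetical, and the report does not say how the annotations were stored or how sorghum identity was computed.

```python
from dataclasses import dataclass, field

# Hypothetical labels standing in for the GO categories of interest.
RELEVANT_CATEGORIES = {"disease_resistance", "abiotic_stress", "nutrient_content"}


@dataclass
class Gene:
    """Hypothetical annotation record for one candidate gene."""
    gene_id: str
    categories: set = field(default_factory=set)
    sorghum_identity: float = 0.0   # percent identity to the sorghum sequence
    description: str = ""
    biotype: str = ""


def is_target(gene, min_identity=70.0, max_identity=95.0):
    """Return True if the gene passes all four criteria described in the text."""
    desc = gene.description.strip().lower()
    return (
        bool(gene.categories & RELEVANT_CATEGORIES)                    # i) relevant category
        and min_identity <= gene.sorghum_identity <= max_identity      # ii) identity window
        and desc != "" and "hypothetical" not in desc                  # iii) described gene
        and gene.biotype == "protein_coding"                           # iv) protein coding
    )


# Made-up example genes, one failing each criterion in turn.
candidates = [
    Gene("g1", {"disease_resistance"}, 88.0, "NBS-LRR family protein", "protein_coding"),
    Gene("g2", {"disease_resistance"}, 98.5, "chitinase", "protein_coding"),
    Gene("g3", {"abiotic_stress"}, 80.0, "hypothetical protein", "protein_coding"),
    Gene("g4", {"photosynthesis"}, 85.0, "rubisco small subunit", "protein_coding"),
    Gene("g5", {"nutrient_content"}, 75.0, "starch synthase", "pseudogene"),
]
targets = [g.gene_id for g in candidates if is_target(g)]
print(targets)
```

Keeping the identity bounds as parameters makes it easy to see how many candidates fall out when the 70-95% window is widened or narrowed.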

Although the enrichment for the target regions was significant (from an average of 1X to 10X), the overall depth was still too low for confident SNP calls. For this reason, I decided to use a new set of methods that take genotype uncertainty into account instead of basing the analysis on called genotypes, which is especially useful for low- and medium-depth data. Most of these methods have been implemented in the software ANGSD (http://popgen.dk/wiki/index.php/ANGSD) and in ngsAdmix (http://www.popgen.dk/software/index.php/NgsAdmix).
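To see why called genotypes are unreliable at these depths, it helps to compute genotype likelihoods for a single site by hand. The sketch below uses a deliberately simplified textbook error model (each read is an independent draw from one of the two alleles, and a sequencing error turns a base into any of the other three with equal probability). It is an assumption made for illustration and is not claimed to be the model ANGSD applied to these data; the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def genotype_weights(reads):
    """Normalised likelihoods of the 10 diploid genotypes at one site.

    `reads` is a list of (base, phred_quality) tuples. Under a flat prior
    the returned values can be read as posterior probabilities.
    """
    log_lik = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base, q in reads:
            e = phred_to_error(q)
            p1 = 1 - e if base == a1 else e / 3   # read drawn from allele 1
            p2 = 1 - e if base == a2 else e / 3   # read drawn from allele 2
            total += math.log(0.5 * p1 + 0.5 * p2)
        log_lik[a1 + a2] = total
    # Normalise in log space to avoid underflow at higher depth.
    top = max(log_lik.values())
    weights = {g: math.exp(v - top) for g, v in log_lik.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


depth_1 = genotype_weights([("A", 30)])
depth_10 = genotype_weights([("A", 30)] * 10)
print(f"AA at 1X:  {depth_1['AA']:.3f}")
print(f"AA at 10X: {depth_10['AA']:.3f}")
```

With a single read the homozygous genotype receives less than half of the total weight, because the three heterozygotes carrying the observed base remain plausible. With ten concordant reads almost all of the weight moves to the homozygote. Methods that carry these weights forward, rather than forcing a call, make use of the partial information available at intermediate depths.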

Given that population structure can inflate the false positive rate in selection analyses, I started by estimating admixture in the Tularosa samples, using the maize HapMap2 data (http://www.panzea.org) as a comparison in ngsAdmix. I then performed various population genetic analyses to characterize variation within and between the two populations (e.g. Tajima's D and Fst) and to detect genes with outlier behavior that could indicate specific evolutionary constraints associated with domestication. The results of this analysis are being evaluated against an appropriate demographic model that provides the values expected under a neutral scenario.
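As a reminder of what Tajima's D measures, the following sketch computes the classical statistic (Tajima, 1989) from a small set of aligned, fully called haplotypes. This is not the estimator used in the project, which worked from genotype likelihoods rather than called sequences. The function name and the toy alignments are hypothetical and serve only to show the sign of D under two contrasting frequency spectra.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Classical Tajima's D for equal-length aligned sequences.

    Returns None when there are no segregating sites. Requires at least
    four sequences, since the variance term is zero for smaller samples.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for col in zip(*sequences) if len(set(col)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Toy alignments: only singletons versus two balanced haplotypes.
rare_variants = ["TAAA", "ATAA", "AATA", "AAAT"]
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
print(tajimas_d(rare_variants))   # negative: excess of rare variants
print(tajimas_d(balanced))        # positive: excess of intermediate frequencies
```

Negative values indicate an excess of rare variants and positive values an excess of intermediate-frequency variants. Because demography alone can shift D in either direction, outlier genes are more safely interpreted against a neutral demographic expectation than against zero.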

Maize is one of the three principal crops that feed the world. I presented the preliminary results of this work at the “Maize meeting” in Illinois, USA, where around 600 maize genetics researchers from both academia and industry gather to discuss the latest developments in the field. The final results of this work are of great interest to the maize community, as they will shed light on how the earliest steps in maize domestication shaped the maize genome. Furthermore, our data analysis pipeline includes a highly innovative approach to the analysis of next-generation sequencing data. The community showed particular interest in applying this approach to modern samples, since it makes better use of resources by allowing more samples to be analyzed for the same amount of money.