
Development of ryegrass allele-specific (grasp) markers for sustainable grassland improvement

Deliverables

In this project we have established a methodology for validating the effects of candidate genes in L. perenne populations. The methodology can easily be extended to any other cross-pollinating crop species. The approach is based on detecting shifts in allele/haplotype frequencies in populations that have been subjected to divergent phenotypic selection during several cycles of recombination. The methodology involves the following steps:
- Selection of a (reduced) number of genotypes with different origins and with contrasting characteristics for the trait under investigation.
- Creation of a panmictic population based on the founder genotypes. Special attention should be given to ensuring that all genotypes contribute equally to fertilization and seed production.
- Starting from this panmictic population, initiation of three lines of multiplication: (i) subject the population to positive selection for the trait of interest during 2-3 generations; (ii) subject the population to negative selection for the trait of interest during an equal number of generations (2-3); (iii) subject the population to cycles of multiplication without any intended selection.
- Comparison of the allelic frequencies of candidate genes in the initial population, the two selected populations and the unselected population.
Significant changes in allelic frequency between the positive and negative selections with respect to the initial population can be taken as an indication that the gene studied is involved in the determination of the trait. Given that L. perenne is an obligate out-crosser, changes in allelic frequencies between the different populations can be established using standard software for the analysis of genic and genetic differentiation between populations, such as Genepop or Fstat.
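The comparison of allelic frequencies between selected populations can be illustrated with a minimal sketch. It computes a plain Pearson chi-square statistic for homogeneity of allele counts across populations, which is simpler than the tests offered by Genepop or Fstat and is shown only to make the idea concrete. The function names, the allele labels and the counts are hypothetical and are not taken from the project data.

```python
from collections import Counter

def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {a: n / total for a, n in sorted(counts.items())}

def chi_square_homogeneity(count_table):
    """Pearson chi-square for a populations x alleles table of counts.

    count_table: dict population -> dict allele -> count.
    Returns (statistic, degrees_of_freedom).
    """
    pops = list(count_table)
    alleles = sorted({a for c in count_table.values() for a in c})
    row_tot = {p: sum(count_table[p].get(a, 0) for a in alleles) for p in pops}
    col_tot = {a: sum(count_table[p].get(a, 0) for p in pops) for a in alleles}
    grand = sum(row_tot.values())
    stat = 0.0
    for p in pops:
        for a in alleles:
            expected = row_tot[p] * col_tot[a] / grand
            if expected == 0:
                continue  # allele never observed; contributes nothing
            observed = count_table[p].get(a, 0)
            stat += (observed - expected) ** 2 / expected
    df = (len(pops) - 1) * (len(alleles) - 1)
    return stat, df

# Hypothetical allele counts at one candidate gene after divergent selection.
counts = {
    "C2+": {"A": 60, "B": 30, "C": 10},
    "C2-": {"A": 20, "B": 30, "C": 50},
}
stat, df = chi_square_homogeneity(counts)
print(f"chi-square = {stat:.2f}, df = {df}")
```

With 2 degrees of freedom the 5% critical value of the chi-square distribution is 5.991, so a statistic well above that value would point to a frequency shift between the two selection lines. For small samples or rare alleles an exact test would be the safer choice.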
The main advantage of this approach with respect to 'typical' QTL analyses in mapping populations is that the effects of a larger number of alleles (depending on the number of founder genotypes and their degree of genetic divergence) can be analysed simultaneously. It can be anticipated that the exploitation of knowledge generated in this kind of populations will be easier to extrapolate to unrelated genotypes than knowledge generated in mapping populations derived from two genotypes.
Breeding strategy using SNP markers (Partner 02)
The SNP markers developed in the GRASP project can be used in plant breeding in three ways:
1. As a simple, randomly distributed marker system in which to find usable associations, like any other marker system.
2. As a tool in the development of markers where the function of the gene is unknown. If the SNP is located in the gene that affects the trait of interest, it can be used as a specific marker for identifying the allele from the parental plant responsible for the desired phenotype.
3. As a functional marker. When the function of the gene is fully known, and it has been clarified which sequences of the gene are responsible for the desired phenotypes, a real functional marker system can be designed using the SNP markers.
As demonstrated in the GRASP project, the real breakthrough for SNP markers in plant breeding comes when breeders have access to real functional SNP markers. In the GRASP project we have covered a good part of the route towards connecting SNP markers with many agronomically important candidate genes, and the genotypes used in the GRASP project can be utilised directly in breeding projects where the gene of interest can be monitored using a few SNPs in the gene. However, the functionality of the genes still needs to be investigated before real functional SNP markers can be developed.
A major prerequisite for the development of a SNP-marker-based breeding strategy is the evaluation of different SNP detection methods in grasses with respect to throughput capacity, robustness/reproducibility, information content and cost efficiency. In total, 9 techniques were established and evaluated by the different partners of the GRASP project; however, 3 of the partners found Sequenom's MassARRAY system most useful and sub-contracted the final SNP genotyping to CIGENE (www.cigene.no) at UMB (Partner 9). The Sequenom MassARRAY system uses MALDI-TOF mass spectrometry for genotyping. The system is based on an allele-specific primer extension reaction in which short primers are extended according to the base composition of the template sequence and then separated by mass. The differences in mass between the two extension products are large compared to the resolution of the mass spectrometer, which allows completely automated genotype calling. This system is a powerful, flexible, high-throughput genotyping technology for identifying genetic variation, providing a combination of accuracy, throughput, simplicity and economy. Since genotypes are separated by mass and all assays are computer-designed, it is possible to multiplex several independent SNP sites in a single typing assay without any fluorescent labelling. With the iPLEX technology, CIGENE routinely runs 35-plex assays with good and confident results; theoretically, 40-plex assays can be run.
Allele sequencing and detection of SNPs and haplotypes within at least 1000 bp of the candidate genes were conducted in the 20 Lolium genotypes (LTS) by two methods: (i) cloning and sequencing of at least 5 PCR fragments for each genotype x candidate gene combination (allele) (Partners 1, 3, 4, and 5), and (ii) direct haplotype-specific sequencing, developed in the project by partner 9 (Rudi et al. 2006).
This method utilizes the fact that DNA sequencing polymerases are sensitive to 3' end mismatches in the sequencing primer. By using two sequencing primers whose 3' ends correspond to the two alleles at a given SNP locus, allele-specific DNA sequences can be obtained for both alleles.
Proposed protocol for SNP detection:
1) obtain sequences from relevant genotypes;
2) check the quality of the DNA sequencing trace files using the phred software (http://www.phrap.org/), which reads the trace files, calls bases, and assigns a quality value to each base;
3) align sequences using ClustalW (ftp://ftp.ebi.ac.uk/pub/software/);
4) detect real SNPs in aligned sequences using PolyBayes (http://www.genome.wustl.edu/), which uses Bayesian statistics to calculate the likelihood that a sequence variation within a set of sequences is due to a SNP rather than a sequencing error;
5) determine the number of different alleles (haplotypes) present in the assortment using the software PolyM (Uschi Frei et al., in preparation), developed in the project by partner 1;
6) find the best combination of polymorphisms (SNPs, InDels) for the differentiation of given alleles using the software BEST (http://www.genomethods.org/best/); and
7) design single or multiplexed SNP assays using the MassARRAY Assay Design software, based on a text file containing the SNPs, SNP-ids and the sequences flanking the informative SNPs.
A high level of multiplexing is crucial for economical genotyping in practical situations. This depends on the possibility of designing PCR primers and extension primers that can be used simultaneously in the same amplification reaction. Genotyping SNPs located in conserved regions of genes that are members of gene families (common in plants) needs careful attention, since the assays might pick up non-allelic sequences from other family members and create false SNPs.
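The haplotype-specific sequencing idea described above, two sequencing primers that differ only in their 3' terminal base, can be sketched as follows. This is a toy illustration under simple assumptions: it builds forward primers only, ignores melting temperature and secondary structure, and the function name, the template sequence and the primer length are all hypothetical rather than taken from Rudi et al. 2006.

```python
def allele_specific_primers(template, snp_index, alleles, length=20):
    """Build one forward primer per allele, each ending at the SNP position.

    template:  5'->3' sequence around the SNP (any allele at the SNP site)
    snp_index: 0-based position of the SNP in the template
    alleles:   the bases observed at the SNP, e.g. ("A", "G")
    length:    total primer length, including the 3' allele-specific base
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    shared = template[start:snp_index]  # identical in all primers
    return {allele: shared + allele for allele in alleles}

# Hypothetical template with a SNP (A/G) at index 20.
template = "ACGTACGTACGTACGTACGTAACC"
primers = allele_specific_primers(template, 20, ("A", "G"), length=10)
for allele, primer in primers.items():
    print(allele, primer)
```

Each primer matches only one allele at its 3' end, so in a heterozygous plant each sequencing reaction would be expected to read predominantly one haplotype.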
Efficiency of the MassArray system at CIGENE: The number of datapoints produced per day per technician is 55000 (multiplexing of 35) at a price of 0.12-1.20 depending on the multiplexing level. This is not constrained by the equipment so higher throughput can be achieved.
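As a rough illustration of what these figures imply, the short sketch below converts datapoints into the number of multiplexed reactions. The helper names and the example project size (384 samples, 70 SNPs) are hypothetical; only the values 55000 and 35 come from the text.

```python
import math

def reactions_per_day(datapoints_per_day, plex):
    """Number of multiplexed reactions behind a given daily datapoint count."""
    return datapoints_per_day / plex

def wells_needed(n_samples, n_snps, plex):
    """Reactions needed to type n_snps in n_samples at a given multiplex level."""
    return n_samples * math.ceil(n_snps / plex)

# Figures from the text: 55000 datapoints per day at a multiplexing level of 35.
print(round(reactions_per_day(55000, 35)))   # about 1571 reactions per day

# Hypothetical project: 384 plants typed for 70 SNPs at 35-plex.
print(wells_needed(384, 70, 35))             # 768 reactions
```

Because the number of reactions grows in steps of the multiplex level, adding a single SNP beyond a full multiplex (71 instead of 70 in this example) adds a whole extra reaction per sample, which is why the level of multiplexing has such a strong effect on cost.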
The BAC library generated for Lolium perenne genotype LTS18 comprises approximately 100,000 clones picked into 96-well microtitre plates, with an average insert size of 100-125 kb. DNA pools aliquoted into microtitre plates were prepared and distributed to other GRASP partners for screening by PCR. Following first-round PCR screening of the DNA pools by GRASP partners, the relevant pool numbers were sent to IGER, and the relevant individual plates were sent to partners for second-round screening to identify individual clones. The library is also available on filters for screening by hybridisation by interested partners, for example when studying gene families. This work has been written up as a paper and published (Farrar et al. 2006, Molecular Breeding, online first: http://springerlink.metapress.com/content/85434493804951g2/fulltext.pdf).
BAC clones have been identified for 23 genes by screening the Lolium perenne BAC library constructed at IGER as part of GRASP and by direct sequencing. The entire gene and promoter have been sequenced in most instances. Of these, the 20 most promising genes have been targeted for SNP discovery in the 20 Lolium perenne genotypes. From the BAC sequence information, primers have been designed to span approximately 1000 bp of gene sequence, incorporating both coding and non-coding sequence to increase the frequency of available SNPs. These primers have been used in PCR reactions to amplify genomic DNA from the 20 GRASP genotypes for comparison and identification of SNPs. Alleles have been identified in the LTS genotypes for 16 of these genes.
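The library size quoted above (about 100,000 clones with 100-125 kb inserts) can be turned into an approximate genome coverage and a probability of recovering any given gene, using the standard Clarke and Carbon relation P = 1 - (1 - I/G)^N. The sketch below is illustrative only: the haploid genome size of 2.6 Gb is an assumption that does not appear in the source, and the function names are hypothetical.

```python
def genome_coverage(n_clones, insert_bp, genome_bp):
    """Number of genome equivalents represented in the library."""
    return n_clones * insert_bp / genome_bp

def probability_of_recovery(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon probability that a given sequence is in at least one clone."""
    return 1 - (1 - insert_bp / genome_bp) ** n_clones

ASSUMED_GENOME_BP = 2.6e9   # assumption, not stated in the report
N_CLONES = 100_000          # from the text (approximate)

for insert_bp in (100_000, 125_000):   # insert size range from the text
    cov = genome_coverage(N_CLONES, insert_bp, ASSUMED_GENOME_BP)
    p = probability_of_recovery(N_CLONES, insert_bp, ASSUMED_GENOME_BP)
    print(f"insert {insert_bp // 1000} kb: coverage {cov:.1f}x, P(recovery) {p:.3f}")
```

Under this assumed genome size the library would represent roughly four to five genome equivalents. A different genome size estimate changes the numbers proportionally.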
The GRASP database contains, or will contain, the outputs of the GRASP project, including DNA sequences, maps, micro-array data, and SNPs, together with relevant publications and contact details for project partners. It also provides data analysis and visualisation tools, including a BLAST DNA sequence homology search capability and a comparative map viewer. Data from GRASP that are not confidential to the project have been made publicly available via the Forage Grass Genome Database (FoggDB), maintained by partner 3. At present the GRASP database is accessible only to project partners, via password and IP address authentication. It will be maintained for as long as any project partner requires it, and its contents will be regularly archived to safeguard the data. Once the industrial partner in GRASP has secured the required IP, and on publication of refereed papers arising from GRASP, all data will be made publicly available through FoggDB, to ensure the widest possible dissemination throughout Europe and beyond. Where appropriate, the data will also contribute to international comparative genomics initiatives such as Gramene. The data are expected to be used by researchers studying forage grasses and related monocots; by plant breeders; and for educational purposes, for example in courses on applied genetics and plant breeding taught by project participants.
A set of ca 4500 partially sequenced genes (ESTs) from Lolium perenne has been generated and used within GRASP. At the start of the project there were no ryegrass ESTs in the public domain. The collection was derived mainly from cDNAs from a wide variety of sources, although it is not a random sample of all Lolium genes. It is thought to contain a wealth of genes relevant to plant production, yield stability, forage quality and the mating system of perennial ryegrass. cDNAs from genes showing differential expression under contrasting environmental conditions were included. The sequence data are available in the GraspDB site in FoggDB. The wide origin of the collection makes it an interesting starting point for selecting candidate genes that may influence the genetic variation for specific (combinations of) agronomic traits. The results of expression profiling studies using the GRASP cDNA micro-array show that the collection contains many genes that deserve attention. It is proposed to use the available sequence and expression data as a starting point for the discovery of allelic variation, as done in GRASP. The BAC libraries have been shown to be effective for finding the full-length sequence of such genes. The isolated genes can be converted into molecular markers for marker-assisted selection procedures, or used for genetic modification; both are useful for plant breeders. Moreover, the gene collection spotted on a micro-array can be used in expression profiling experiments to discover agronomically relevant genes. The resources generated may benefit plant breeders, researchers, or companies dealing with micro-arrays.
RESULTS SUMMARY OF PLANT POPULATIONS
Starting from the 20 genotypes of the Lolium test set described in WP1, three subsets of genotypes were formed according to their compatibility in flowering period. Within each subset, pair-crosses were carried out following a half-diallel scheme. Per pair-cross, a number of seedlings were taken at random and combined to form the Syn0 population. These plants were allowed to fertilize each other freely, and the seed collected in bulk constituted the Syn1 (=C0) population. In this way, a C0 population was produced for each subset of genotypes without any conscious selection. The genotypes were multiplied by Partner 08, and the C0 populations were produced by Partner 02 in close collaboration with other partners. The C0 populations were used by the project partners (one partner - one population) for 2 generations of positive and negative selection, producing the C2+ and C2- generations, with a selection pressure of 10%.
Populations with a significant difference between C2+ and C2- were successfully produced for the following traits:
- Crown rust (Puccinia coronata) - Partner 1
- Vernalisation requirement - Partner 2
- Water-soluble carbohydrates - Partner 3
- Root development and cold tolerance - Partner 7
- Temperature and light sensitivity of leaf-area expansion - Partner 8
- Frost tolerance - Partner 9
Two partners will finish the selection work during the summer 2007-03-12:
- Nitrogen efficiency - Partner 4
- Seed production and self-incompatibility - Partner 7
These plant populations with contrasting traits are a unique resource for future research on many economically important ryegrass traits.
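The crossing and selection scheme described above can be made concrete with a small sketch. A half-diallel among n genotypes gives n(n-1)/2 pair-crosses (each pair once, no selfing, no reciprocals), and a selection pressure of 10% keeps the best and the worst tenth of the plants for the positive and negative lines. The subset size of 7 genotypes, the plant names and the phenotype scores are hypothetical, as are the function names.

```python
from itertools import combinations

def half_diallel(genotypes):
    """All pair-crosses among the genotypes: each pair once, no selfing."""
    return list(combinations(genotypes, 2))

def divergent_selection(phenotypes, fraction=0.10):
    """Select the highest and lowest scoring fraction of plants.

    phenotypes: dict plant -> trait score.
    Returns (positive_selection, negative_selection) as lists of plants.
    """
    n_selected = max(1, round(len(phenotypes) * fraction))
    ranked = sorted(phenotypes, key=phenotypes.get, reverse=True)
    return ranked[:n_selected], ranked[-n_selected:]

# Hypothetical subset of 7 genotypes with compatible flowering periods.
crosses = half_diallel(["G1", "G2", "G3", "G4", "G5", "G6", "G7"])
print(len(crosses))  # 21 pair-crosses, i.e. 7 * 6 / 2

# Hypothetical C0 population of 50 plants with made-up trait scores.
scores = {f"plant{i}": float(i) for i in range(50)}
plus, minus = divergent_selection(scores, fraction=0.10)
print(plus)   # the 5 highest scoring plants, parents of the C1+ generation
print(minus)  # the 5 lowest scoring plants, parents of the C1- generation
```

Repeating the selection step on the offspring of each group for a second generation would give the C2+ and C2- populations referred to in the text.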
The overall objective of GRASP was to establish and validate a framework for the development of allele-specific, gene-derived single nucleotide polymorphism (SNP) markers associated with agronomically important traits in Lolium relevant to environmental and nutritive value needs. This project fills the gap that exists between knowledge of plant gene functions and their commercial application. Bridging this gap with the allelic selection tools developed within this project leads to efficient plant breeding towards highly improved, non-GMO crop plants. The SNP markers developed in GRASP will be made available to commercial breeding companies and public institutions on a Europe-wide basis for "direct selection" strategies at the DNA level, to explore germplasm collections, and to describe varieties for intellectual property right protection. The established techniques and approaches can easily be extended to additional candidate genes. Allele sequencing has been completed for 91 genes expressed in Lolium perenne, with putative functions relating to agronomic traits. For each of the 91 genes, allele sequences have been obtained from 20 heterozygous ryegrass plants. The allele sequences have been used to derive informative SNP markers that discriminate allelic variation at the 91 loci with a minimum number of SNP markers. This information has subsequently been used for selection experiments. All allele sequences have been or will be analysed in detail with regard to SNP density, heterozygosity, linkage disequilibrium, etc. A first publication on resistance gene candidates has been submitted. So far, no allele sequences from ryegrass genes have been released by other research groups. The allele sequences generated in GRASP form the basis for the development of multiplexed, high-throughput SNP assays, which benefit plant breeders and researchers and might be exploited by service providers for marker assays.
In any case, these activities contribute to the competitiveness of the respective institutions or companies, thereby securing or generating jobs.
We developed a software tool, PolyM, that combines all steps in the selection of an optimal set of polymorphisms for genotype differentiation in offspring in a single interface. PolyM has been developed in C++. The input file is an alignment (CLUSTAL-W) of the candidate gene sequences from a set of genotypes, as generated, e.g., by http://clustalw.genome.jp/. In a first step, the program finds the different alleles present in a set of genotypes and their frequencies, based on sequence polymorphisms (insertions/deletions, SNPs). Subsequently, a minimum set of polymorphisms necessary to differentiate all alleles is generated. This includes the removal of redundant polymorphisms, i.e. polymorphisms that result in the same grouping of alleles. Polymorphisms for allele differentiation are then selected based on their PIC value. The minimum set of polymorphisms for allele differentiation is returned; the user has the option to exclude or include insertions/deletions, or to restrict the selection of redundant polymorphisms. Finally, the program calculates the polymorphisms necessary for the differentiation of genotypes in the offspring, based on the whole dataset or on a user-made selection of the parental genotypes/alleles. A final table gives the expected marker phenotype for all possible genotypes in the offspring and can thus be used as a basis for the analysis of SNP detection results in the offspring. For the processing of allele sequence data and the selection of minimum sets of SNPs, various freely available programs exist: for alignment, for finding polymorphisms, and for selecting informative polymorphisms for allele differentiation. One drawback of these freely available programs is their insufficient interoperability, which limits their use because the raw data need to be reformatted for each purpose. PolyM combines all the above-mentioned steps in one software package.
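The allele-differentiation steps described above can be sketched in a few lines. This is not PolyM itself, which is written in C++ and whose internals are not given here. It is a simplified illustration with hypothetical function names and made-up sequences. It finds the distinct alleles, drops polymorphisms that group the alleles identically, ranks the remainder by PIC (Botstein et al. 1980) and picks columns greedily, a heuristic that is not guaranteed to return the smallest possible set.

```python
from collections import Counter
from itertools import combinations

def polymorphic_columns(alleles):
    """Indices of alignment columns where the alleles differ."""
    return [i for i in range(len(alleles[0])) if len({a[i] for a in alleles}) > 1]

def grouping(states):
    """Partition of the alleles induced by one column, in canonical form."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in states)

def pic(states, weights):
    """Polymorphism information content of one column."""
    freq = Counter()
    for state, weight in zip(states, weights):
        freq[state] += weight
    p = list(freq.values())
    homozygosity = sum(x * x for x in p)
    cross_term = sum(2 * x * x * y * y for x, y in combinations(p, 2))
    return 1 - homozygosity - cross_term

def minimal_marker_set(sequences):
    """Return (distinct alleles, chosen column indices) for aligned sequences."""
    counts = Counter(sequences)              # distinct alleles with their counts
    alleles = sorted(counts)
    total = sum(counts.values())
    weights = [counts[a] / total for a in alleles]
    best = {}                                # one column per distinct grouping
    for col in polymorphic_columns(alleles):
        states = [a[col] for a in alleles]
        key = grouping(states)
        if key not in best:                  # later columns with same grouping are redundant
            best[key] = (pic(states, weights), col)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    unresolved = set(combinations(range(len(alleles)), 2))
    chosen = []
    for key, (_, col) in ranked:             # highest PIC first
        split = {(i, j) for (i, j) in unresolved if key[i] != key[j]}
        if split:
            chosen.append(col)
            unresolved -= split
        if not unresolved:
            break
    return alleles, sorted(chosen)

# Made-up aligned allele sequences from a small set of genotypes.
seqs = ["ACGTA", "ACGTA", "ACTTA", "GCTTC", "GCTTC", "ACGTC"]
alleles, columns = minimal_marker_set(seqs)
print(alleles)   # four distinct alleles
print(columns)   # columns 2 and 4 are enough to tell all four apart
```

In this toy example column 0 is polymorphic too, but it is not needed once columns 2 and 4 have been chosen. Insertions/deletions could be handled in the same way by treating the gap character as an additional state.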
The major application of this software will be in plant (potentially animal) breeding and respective research, to facilitate marker-assisted selection. With increasing available information on allelic variation for genes controlling agronomic traits, it will become more and more important to identify the most relevant polymorphisms to trace and select for valuable alleles in the development of breeding populations while minimizing the work and cost input. In consequence, this software will contribute to the competitiveness of plant breeders and respective researchers, and thereby support maintenance or generation of jobs in these areas. The software will be published; a publication is in preparation. In the longer run, further features might be added to this software package.
The mapping work in GRASP has resulted in a small number of maps of Lolium perenne with mapped candidate genes. These maps have been aligned mainly with the help of polymorphic SSR markers. The integration of the maps is still a weak point because they do not yet have sufficient markers in common. To this end, most partners in GRASP are involved in the development of a set of polymorphic EST-SSR markers to improve the alignment of the Lolium maps. However, this is an activity outside GRASP. Another way to improve the alignment of the maps is to use the allelic variation in the genes studied in detail in GRASP and to map these genes in all relevant maps. This would require the development of a small set of SNP markers for each gene. Parents of mapping and breeding populations could then be screened for polymorphisms. To make better use of the integrated mapping data from GRASP, it is proposed to develop a core set of polymorphic genes (well spread over the genome) with SNP markers. Map-based approaches will be of central importance for the identification of genes or genome segments controlling quantitatively inherited characters, such as yield and quality traits. Genetic maps will facilitate the identification of relevant genome regions, which will be of interest to both plant breeders and researchers. Information generated in any grass species can be exchanged more easily with other grasses, including, e.g., cereals, once close synteny relations have been established.
A major goal of this project was to develop trait-specific multiplexed sets of SNP markers and to transfer the newly developed techniques and resources to breeding companies and public institutions. A broad range of relevant traits in ryegrass was addressed by recurrently selecting individuals showing diametrically opposed responses to the different selection regimes applied. A set of carefully chosen markers and genes was monitored for allelic differentiation in response to the applied selection regimes. A set of SSR markers evenly distributed throughout the genome was used for genotype profiling. In addition, a range of different SNP detection methods was evaluated by the project partners to identify trait-specific SNPs in genes with a putative association with the respective trait. In total, 9 different SNP detection methods (SNuPE, TaqMan, EcoTILLING, MassARRAY, SSCR, CSCE, SNaPshot, conventional and pyro-sequencing of alleles) were established and evaluated with respect to reproducibility, multiplexing capacity, costs per SNP and speed. It was concluded that no general recommendation can be given; the choice of a SNP detection method largely depends on the purpose of the experiment. EcoTILLING, for example, is a cheap technique that rapidly generates many data points. However, with an increasing number of SNPs the resulting data become difficult to interpret. Moreover, the presence of INDELs makes assigning the location of a SNP very complicated unless all allele sequences are known. Furthermore, EcoTILLING turned out to be very sensitive to parameter changes. Therefore, the use of EcoTILLING would be preferable in cases where the uniformity of a population is to be determined, including the identification of foreign alleles, or where individuals containing the same allele combination are to be identified. In contrast, techniques such as the TaqMan assay or SNuPE appear very robust and reproducible.
However, they are relatively costly and, most importantly, the information content per assay is comparatively low. Nonetheless, these techniques could be useful for the identification of particular point mutations in known genes functionally associated with a certain phenotype (as used in human gene diagnostics). The MassARRAY assay also appears to be a robust solution, especially when multiplexing is intended. However, because the work based on this technique was subcontracted, it cannot be estimated how many datapoints can routinely be produced per day and person. Nonetheless, in cases where a high degree of multiplexing can be achieved, this technique can be run at moderate cost. In terms of information content, allele sequencing/pyrosequencing is certainly the ultimate technique, giving instant information about sequence and haplotype. Unfortunately, allele sequencing is expensive and relatively time consuming and, like EcoTILLING, will be complicated by the presence of INDELs. Techniques detecting conformational changes caused by a SNP (SSCR and CSCE) proved to be robust, though not the cheapest option. Moreover, the possible level of multiplexing is extremely limited. SNaPshot, though again only a moderately cheap technique, might be useful as a routine method, particularly because of its robustness, its capacity for automation (in contrast to most gel-based techniques) and a reasonable possibility for multiplexing. For every trait, a set of putative key genes was isolated by the different partners. Following SNP detection using one or more of the techniques described above, the allelic distribution among the selected and the control populations was assessed statistically. As a result, a collection of associations between specific marker alleles and different traits was established and made available to all partners of the project, together with a collection of protocols for their laboratory application.
Allele-specific marker/trait associations could be found for all traits tested: rust resistance, vernalisation requirement, water-soluble sugar content, nitrogen use efficiency, seed yield, heat and cold/frost tolerance, shading tolerance and tillering. The use of this toolkit in the breeding programs of public institutions and private companies is ensured through the development of new marker-assisted breeding strategies described elsewhere in this report. This will result in the more rapid development of new fodder and amenity varieties of L. perenne, with both economic and social benefits for a broad community of farmers and customers in the private and professional sectors. Not least, the newly developed knowledge and techniques will provide the project partners with unique competitive advantages in the international field of grass breeding.