CORDIS - EU research results

A NOVEL BREEDING STRATEGY USING MULTIPLEX GENOME EDITING IN MAIZE

Periodic Reporting for period 3 - BREEDIT (A NOVEL BREEDING STRATEGY USING MULTIPLEX GENOME EDITING IN MAIZE)

Reporting period: 2022-09-01 to 2024-02-29

With an ever-growing world population and increasing resource scarcity driven by climate change, agriculture is at the forefront of the challenges of the 21st century. Plant breeding programs should therefore develop varieties with increased yield that are also resilient to extreme climate events such as drought, heat waves, flooding or late frosts. In that perspective, the BREEDIT project focuses on yield improvement and drought tolerance in maize, one of the most widely consumed crops worldwide, used as a cereal, as silage or as raw material for bio-based products. Conventional breeding usually relies on a forward approach to identify promising genes, working from phenotypes to causative genes. This approach aims to identify valuable traits in diverse sources of plant material, from wild species to landraces. Breeders then introduce these traits into commercial varieties to improve them further. Plants with contrasting phenotypes for the trait are usually crossed, and the resulting segregating progeny is screened for known markers that are ideally evenly distributed across the plant genome. By analyzing which parental version of each marker is present in the plants exhibiting the desired phenotype, breeders can pinpoint markers likely to be associated with the causal genomic region. In subsequent generations, plants carrying the associated markers are retained and the unfavorable genetic background is gradually eliminated through several rounds of backcrossing with elite material. The whole forward genetics pipeline is long and can sometimes take several decades from the identification of the best gene combinations to the release of commercial varieties on the market. Nonetheless, forward genetics remains a valuable way to identify causative genes without any prior knowledge of the underlying gene mechanisms.
In contrast, reverse genetics leverages the work done in plant molecular biology over the past decades to understand how gene regulatory networks operate. Genes often work in concert and interact to up- or down-regulate each other’s activity. The idea behind reverse genetics is that valuable phenotypes can be obtained directly by modifying genes whose role in a specific trait is known. Growth-related traits that contribute to final plant yield are often governed by (semi-)conserved gene families or pathways across plants, especially among closely related taxa (e.g. Poaceae). Observations of gene mechanisms in one species can potentially be translated to a closely related organism, which supports the use of reverse genetics in major crops, many of which belong to the same plant families (wheat, maize, rice: Poaceae; apple, pear, cherry: Rosaceae; cucumber, (water)melon, squash and pumpkin: Cucurbitaceae). Another promise of reverse genetics is the substantial amount of time saved by translating research from one species to a close relative, because the long and tedious steps of gene identification become unnecessary. Lines with desired mutations in targeted genes can be obtained by randomly modifying the genome with mutagenic treatments (e.g. X-rays, ethyl methanesulfonate), though with no certainty about the genetic outcome. Alternatively, DNA nucleases can be used to create mutations in specific regions of the genome, but this technique is most often applied at small scale, generating single, double or triple gene mutants, whereas more genes from the same regulatory network probably have to be modified to overcome functional gene redundancy and to see a promising phenotype emerge at the plant level.


The BREEDIT project aims to revolutionize the reverse genetics approach by generating mutations in up to 12 genes at the same time and stacking even more gene edits in one plant using directed crossing schemes. Mutations are generated using the cutting-edge CRISPR/Cas9 technology in multiplex. The resulting segregating mutants are phenotyped at the seedling stage for growth-related traits, and phenotypes are statistically associated with (combinations of) mutations detected by amplicon sequencing. Mutations are introduced in the maize inbred line B104, which has good overall agronomic performance. Promising B104 mutant lines can later be crossed with elite material, thus circumventing the long characterization and introgression of favorable alleles from more distant material such as landraces. The scale of gene editing in BREEDIT makes it possible to knock out more members of gene regulatory networks than experiments that target only a few genes. In the end, the diversity of genetic profiles obtained throughout the project represents a valuable resource for gaining insight into gene interactions at an unprecedented scale. The BREEDIT project can improve scientific knowledge of the molecular pathways involved in increasing plant yield under drought. The technical and scientific outcomes of BREEDIT could ultimately help breeding companies develop innovative varieties that are well adapted to climate change more quickly. Furthermore, the BREEDIT project will deliver the technological basis for deconvoluting other complex networks involved in plant traits such as disease resistance, root development and nutrient use efficiency.
Creation of the EDITOR lines
The EDITOR lines were first created by transforming the maize inbred line B104 to stably express the Cas9 protein. The EDITOR lines then serve as input material for a second transformation (super-transformation) with the SCRIPT constructs, each containing 12 gRNAs that guide the Cas9 nuclease to its target sites. We generated five EDITOR lines and assessed their quality by measuring the number of Cas9 copies. Two promising EDITOR lines, EDITOR 1 and EDITOR 3, were super-transformed with SCRIPT 1 to assess the efficiency of the EDITOR background. We concluded that EDITOR 1 outperformed EDITOR 3, generating twice as many homozygous mutations in the set of genes targeted by SCRIPT 1. We therefore selected EDITOR 1 as the background for the transformations with the remaining SCRIPTs.



Identification of target sites, gRNA and primer pair designs

Many genes are involved in plant growth and drought tolerance. Based on the literature and unpublished results from my lab, we selected 60 genes that, when inactivated by Cas9, could positively affect plant growth and stress tolerance. Protospacer adjacent motifs (PAMs), which the Cas9 protein requires to cleave a target site, were searched in the coding regions of each gene along with the associated 20-nucleotide gRNA sequence. We favored gRNA+PAM sites located in the middle of the coding DNA sequence (CDS) to ensure maximal inactivation at the protein level. For each gene, we retained the two best gRNAs according to their specificity and predicted efficiency. Two specific primer pairs suited to multiplex PCR were designed to flank the two selected cutting sites. Primer pairs were assessed in pilot amplicon sequencing runs on B104 wild-type material. The primer pair that generated the most sequencing reads with the least variability was considered the best, and the corresponding gRNA was retained for downstream cloning.
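The site search described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's design tool: it assumes the canonical NGG PAM and the cut position 3 bp upstream of the PAM that are typical of the widely used Streptococcus pyogenes Cas9 (the report does not state which Cas9 variant or design software was used), and the function names `find_grna_sites` and `reverse_complement` are hypothetical. Candidates are ranked only by how close the predicted cut lies to the middle of the CDS; specificity and predicted efficiency, which the project also used, are not modeled.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_grna_sites(cds, protospacer_len=20):
    """List candidate gRNA sites (protospacer followed by an NGG PAM) on both
    strands of a CDS, sorted by distance of the cut site to the CDS midpoint.

    Assumption: NGG PAM and a cut 3 bp upstream of the PAM.
    """
    cds = cds.upper()
    n = len(cds)
    # The lookahead lets finditer report overlapping candidate sites.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, seq in (("+", cds), ("-", reverse_complement(cds))):
        for match in pattern.finditer(seq):
            cut = match.start() + protospacer_len - 3
            sites.append({
                "strand": strand,
                "protospacer": match.group(1),
                "pam": match.group(2),
                # Express the cut as a boundary position on the "+" strand.
                "cut": cut if strand == "+" else n - cut,
            })
    sites.sort(key=lambda site: abs(site["cut"] - n / 2))
    return sites


# Toy sequence for illustration only (not a BREEDIT target gene).
toy_cds = "A" * 10 + "ACGT" * 5 + "TGG" + "A" * 10
for site in find_grna_sites(toy_cds):
    print(site["strand"], site["protospacer"], site["pam"], site["cut"])
```

In practice the ranked list would be filtered further for off-target matches elsewhere in the genome before the two best candidates per gene are kept.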



Creation of super-transformed T0 material
We cloned five SCRIPT constructs, each with 12 gRNAs, together targeting the full set of 60 genes. We performed Agrobacterium-mediated transformation on immature embryos of the EDITOR line to obtain T0 material. After the super-transformation, the Cas9 protein and the gRNAs present in the SCRIPT are constitutively expressed in the plant and interact to generate double-strand breaks in the DNA at the target sites. DNA repair pathways then restore the nucleotide chain and sometimes introduce errors by inserting or removing a few nucleotides. These small changes can have broader consequences, such as preventing the gene from producing a valid transcript and thereby altering or even abolishing protein activity. Such inactivation is referred to as a gene knockout.


Implementation of a genotyping-by-sequencing pipeline

We monitored the mutations generated by the CRISPR/Cas9 machinery using amplicon sequencing. PCRs were first run in multiplex, i.e. each DNA sample receives a mix of primer pairs and reagents and is subjected to thermocycling. Next, the resulting amplicons are prepared for sequencing by attaching sequencing adaptors and barcodes that uniquely identify each sample. Amplicons are then pooled and sequenced at high depth using Illumina short-read sequencing (2 × 150 bp). The service provider first demultiplexed the sequencing reads by sample. We further demultiplexed reads by locus using read mapping. We used the SMAP pipeline to identify haplotypes and their frequencies in the set of sequencing reads per sample per locus. The haplotypes show different kinds of mutations. SNPs do not change the haplotype length compared to the reference haplotype. In-frame insertions or deletions (indels) have a length divisible by three, whereas out-of-frame indels do not. Out-of-frame indels shift the open reading frame, which consists of a precise chain of 3-nucleotide codons, and are likely to generate premature stop codons or random amino acid chains that disrupt protein activity. We therefore focused on out-of-frame indels for downstream analyses. One plant can contain multiple out-of-frame indels, even more than its ploidy level, due to tissue mosaicism arising from ongoing Cas9 activity. We therefore summed the frequencies of haplotypes with out-of-frame indels to calculate the fraction of the plant genome knocked out at the locus. We observed that this continuous index follows a tri-modal distribution representing the three classical genotypes commonly used in genotyping: homozygous wild-type, heterozygous, and homozygous mutant. Using empirical data, we defined cutoffs to discretize the continuous aggregated fraction of out-of-frame indels into three genotypic classes that we named null, hemi, and full gene knockout.
We used this nomenclature to reflect that several haplotypes may underlie the aggregated fraction of out-of-frame indels, so that the genetic profile of the plant no longer fits the definitions of homozygous and heterozygous. Dosage effects of gene knockouts are also better reflected by this nomenclature.
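The aggregation and discretization steps can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the project's code: the input is assumed to be a list of (haplotype length, relative frequency) pairs per sample and locus, a haplotype counts as out-of-frame when its net length change relative to the reference is not a multiple of three, and the cutoffs 0.25 and 0.75 are placeholders, since the empirically derived cutoffs are not reported here. The function names are hypothetical and do not belong to SMAP.

```python
def knockout_fraction(haplotypes, reference_length):
    """Sum the relative frequencies of haplotypes carrying an out-of-frame indel.

    haplotypes: iterable of (haplotype_length_bp, relative_frequency) pairs
    for one sample at one locus.
    """
    return sum(
        freq
        for length, freq in haplotypes
        # Out-of-frame: net length change is not a multiple of 3.
        if (length - reference_length) % 3 != 0
    )


def knockout_class(fraction, low=0.25, high=0.75):
    """Discretize the aggregated fraction into null / hemi / full.

    The default cutoffs are placeholders (assumption); the project derived
    its cutoffs empirically from the tri-modal distribution.
    """
    if fraction < low:
        return "null"
    if fraction < high:
        return "hemi"
    return "full"


# Hypothetical mosaic plant: 100 bp reference amplicon, four haplotypes.
sample = [(100, 0.10), (99, 0.45), (102, 0.40), (97, 0.05)]
fraction = knockout_fraction(sample, reference_length=100)
print(round(fraction, 2), knockout_class(fraction))
```

In the example, the 3 bp deletion (97 bp) is in-frame and is not counted, while the 1 bp deletion and the 2 bp insertion both contribute to the knockout fraction, even though together they represent more alleles than a diploid plant would carry without mosaicism.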


Crossing schemes
When the transgenic material reached the reproductive stage, we implemented several crossing schemes. To stack gene edits in the progeny, we crossed material within SCRIPTs (intra-script crosses) and between SCRIPTs (inter-script crosses). In addition, we preserved some mutations by backcrossing to non-edited material (B104, EDITOR 1), producing heterozygous progenies that could later be selfed to obtain segregating populations for phenotyping. Selfings were also performed to obtain more full gene knockouts. When the Cas9 and SCRIPT constructs were still expressed, we observed that on average around 7% of the progeny displayed extra gene edits that were not observed in the parental lines.


Phenotyping trials
We phenotyped T1 and T2 material resulting from the crossing schemes for leaf dimensions (length and width) and for integrative traits such as seedling fresh and dry weight. We conducted 14 different experiments, analyzing around 3000 plantlets in growth chambers under controlled conditions. Experiments were performed under well-watered and water-deficit conditions. We observed plants with broader leaves and increased weight under drought conditions compared to our EDITOR 1 line.

Phenotype to genotype association
The number of possible gene edit combinations was too large relative to our capacity to process plants. As a result, we were unable to obtain sufficient replicates of each specific combination to perform statistical comparisons. We therefore analyzed our datasets at the single-gene level, on the assumption that favorable single-gene edits observed in the presence of additional mutations would still be detected provided that they caused major changes in plant phenotype. We recorded the number of times single-gene associations were observed in different populations and across different experiments. Following this approach, we identified a subset of genes from the original gene space whose knockout led to enhanced trait performance. We therefore proposed continuing the project with this subset of genes.
While creating gene edit diversity, we realized that the BREEDIT pipeline would best be coupled with a way to generate lines with fixed full gene knockouts in order to reduce the combinatorial space of gene edits. We therefore developed a parallel approach that creates segregating heterozygous material and subjects it to haploid induction. After diploidy is restored, the resulting doubled-haploid lines will serve to study the effect of selected gene combinations using replicates.
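A back-of-the-envelope count illustrates why replication was out of reach. The sketch below assumes, as a simplification, that each of the 12 genes targeted by one SCRIPT independently falls into one of the three classes (null, hemi, full) and ignores the identity of individual alleles; the function name `genotype_space` is hypothetical.

```python
def genotype_space(n_genes, classes_per_gene=3):
    """Number of distinct multi-gene profiles when every gene takes one of
    `classes_per_gene` classes independently (simplifying assumption)."""
    return classes_per_gene ** n_genes


one_script = genotype_space(12)   # 3**12 profiles for a single SCRIPT
plants_phenotyped = 3000          # approximate total across the 14 experiments

print(one_script)
print(one_script / plants_phenotyped)
```

Under these assumptions a single SCRIPT already allows 531,441 profiles, far more than the roughly 3000 plantlets phenotyped, and inter-script crosses enlarge the space further.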

After reducing the gene space initially planned for the BREEDIT project, we will develop the next work packages along two axes:
1. Deconvoluting the combinatorial space for specific gene subsets for which we have so far identified promising effects without knowing which genes, or combinations of genes, are responsible for the phenotype. To do so, we will use haploid induction and crossing schemes to obtain lines segregating for a subset of genes. Phenotyping trials will then be performed following the protocols we have developed so far to further validate the effect of certain gene combinations. Lines with promising gene profiles could then be assessed using our in-house high-throughput phenotyping platform (Phenovision) to better characterize the effect of gene knockout combinations. In the end, this work will provide detailed insights into gene interactions and clarify parts of the gene regulatory network that controls growth under drought.

2. Increasing the gene edit diversity by performing further inter-script crosses. Large populations of individuals segregating at as many genes as possible will be phenotyped, and statistical models from genomic prediction will be fitted to capture gene interactions and go beyond single-gene analyses. These models could then be used to predict the phenotypic values of combinations that were never observed. Promising predicted gene combinations could be generated in vivo using validation constructs that combine gRNAs targeting the most interesting subset of genes. Machine learning methods could also be implemented to go beyond additive statistical models and capture more complex patterns of gene interactions.
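As a hedged illustration of the second axis, the sketch below fits a purely additive model with ridge regression, a common baseline in genomic prediction, and uses it to score gene knockout combinations that were not observed. It is not the model the project will use. The coding of knockout classes as dosages 0, 1 and 2 (null, hemi, full), the ridge penalty, the function names and all data are assumptions; the data are simulated and are not BREEDIT results. Capturing gene interactions would require extra terms, for example products of dosage columns, or a non-linear learner.

```python
import itertools

import numpy as np


def fit_additive_model(dosages, phenotypes, ridge=1.0):
    """Ridge regression of a phenotype on per-gene knockout dosages.

    Returns (intercept, effects). The intercept is left unpenalized by
    centering the data before solving the penalized normal equations.
    """
    X = np.asarray(dosages, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    effects = np.linalg.solve(
        Xc.T @ Xc + ridge * np.eye(X.shape[1]), Xc.T @ (y - y_mean)
    )
    intercept = y_mean - x_mean @ effects
    return intercept, effects


def predict(dosages, intercept, effects):
    """Predicted phenotype for one profile or a matrix of profiles."""
    return intercept + np.asarray(dosages, dtype=float) @ effects


# Simulated toy data (assumption): 4 genes, 60 plants, additive effects + noise.
rng = np.random.default_rng(0)
true_effects = np.array([0.5, 0.0, -0.3, 0.2])
X_obs = rng.integers(0, 3, size=(60, 4))
y_obs = 2.0 + X_obs @ true_effects + rng.normal(0.0, 0.1, size=60)

intercept, effects = fit_additive_model(X_obs, y_obs, ridge=0.1)

# Rank the profiles that never occurred among the observed plants.
observed = set(map(tuple, X_obs.tolist()))
unseen = [c for c in itertools.product(range(3), repeat=4) if c not in observed]
best = max(unseen, key=lambda c: predict(c, intercept, effects))
print(best, float(predict(best, intercept, effects)))
```

The highest-ranked unseen profiles are the kind of candidates that the validation constructs mentioned above could then generate in vivo.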
Observed phenotypes in segregating populations of SCRIPT 1. One plant (S1*A) performs similarly to t
Allele overview in T0 material. Each cell of the array is a horizontal bar that summarizes the allel
The multiplex gene editing strategy of BREEDIT. A. Selection of growth related genes (GRG) is based