Large-scale identification of secondary metabolites, metabolic pathways and their genes in the model tree poplar

Project Information

POPMET

Grant agreement ID: 834923

DOI

10.3030/834923

Project closed

EC signature date 11 April 2019

Start date 1 July 2019

End date 30 June 2025

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 2 499 251,25

EU contribution

€ 2 499 251,00

Coordinated by

VIB VZW
Belgium

Periodic Reporting for period 4 - POPMET (Large-scale identification of secondary metabolites, metabolic pathways and their genes in the model tree poplar)

Reporting period: 2024-01-01 to 2025-06-30

Poplar is an important woody biomass crop and, at the same time, the model of choice for molecular research in trees. Although there is steady progress in resolving the functions of unknown genes, the identities of most secondary metabolites in poplar remain unknown. This lack of metabolite identities is a true gap in information content, and it impedes (i) gaining deep insight into the complex biology of living systems and (ii) valorizing these metabolites. The main reason for this gap is that metabolites are difficult to purify because of their low abundance, which hinders their structural characterization and the discovery of their biosynthetic pathways. In this project, we use CSPP, an innovative method developed in my lab that combines metabolomics and informatics, to systematically predict the structures of poplar metabolites along with their biosynthetic pathways. Next, the CSPP tool is combined with two complementary genetic approaches, based on resequencing data from 749 poplar trees, to identify the genes encoding the enzymes in the predicted pathways. First, genome-wide association studies (GWAS) are conducted to identify SNPs in the genes involved in the metabolic conversions. Second, rare defective alleles of these genes are identified in the sequenced population. Genes identified by both approaches are then studied further, either by crossing natural poplars that are heterozygous for the defective alleles or by CRISPR/Cas9-based gene editing in poplar. The functional studies are further underpinned by enzyme assays. Given our scarce knowledge of the structures of most secondary metabolites and their metabolic pathways in poplar, this large-scale identification effort lays the foundation for systems biology research in this species and will shape opportunities to further develop poplar as an industrial wood-producing crop.
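To make the second genetic approach more concrete, here is a minimal Python sketch of screening a resequenced population for rare defective alleles, plus the Mendelian arithmetic behind crossing two heterozygous carriers. The `Variant` representation, the consequence labels in `DEFECTIVE` and the 1% frequency cutoff are hypothetical choices for illustration, not the project's actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical consequence labels treated as "defective" (loss-of-function) here.
DEFECTIVE = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}


@dataclass
class Variant:
    gene: str
    consequence: str  # predicted effect, assumed to be annotated upstream
    genotypes: dict   # tree ID -> alternative-allele dosage (0, 1 or 2)


def alt_allele_frequency(v: Variant) -> float:
    """Alternative-allele frequency in a diploid population."""
    dosages = list(v.genotypes.values())
    return sum(dosages) / (2 * len(dosages))


def heterozygous_carriers(variants, max_freq=0.01):
    """Map each gene to the sorted IDs of trees heterozygous for a rare defective allele."""
    carriers = {}
    for v in variants:
        if v.consequence not in DEFECTIVE:
            continue
        if not 0 < alt_allele_frequency(v) <= max_freq:
            continue  # absent or too common to count as "rare" under this cutoff
        hets = {tree for tree, dosage in v.genotypes.items() if dosage == 1}
        if hets:
            carriers.setdefault(v.gene, set()).update(hets)
    return {gene: sorted(trees) for gene, trees in carriers.items()}


def expected_homozygous_fraction():
    """Aa x Aa segregates 1/4 AA : 1/2 Aa : 1/4 aa (Mendelian inheritance)."""
    # Each heterozygous parent transmits the defective allele with probability 1/2.
    return 0.5 * 0.5
```

Under these assumptions, crossing two carriers identified by `heterozygous_carriers` is expected to yield about one quarter of progeny homozygous for the defective allele, which is what makes natural heterozygotes usable as an alternative to gene editing.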

We first optimized a protocol for extracting high molecular weight (HMW) DNA from poplar that is suitable for long-read genome sequencing with Oxford Nanopore Technologies (ONT). Subsequently, woody cuttings from 750 poplar genotypes were grown in triplicate in a greenhouse, and pure HMW DNA was prepared. We then made a de novo assembly of a reference Populus nigra genome (accession ‘BDG’) from ONT long reads at 350x coverage, Illumina short reads and Hi-C data. The total genome size is 388 Mb over 125 fragments (19 chromosomes and 106 scaffolds), with an N50 length of 20.3 Mb. The assembly is highly contiguous, with only 20 gaps remaining in the 19 chromosomes, and is estimated to be 96.2% complete according to a BUSCO analysis. Next, we generated long-read sequences of the 749 individuals composing the wild P. nigra population at an average depth of 23x. We identified over 9 million biallelic SNPs by retaining SNPs identified by three independent software packages. Our variant calling analysis also revealed ~128,000 structural variants per genotype, including insertions, deletions, inversions and translocations. We also identified large hemizygous regions (up to 1.2 Mb long) that could not be detected by any of the three software packages.
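Two of the figures above rest on simple definitions that can be sketched in Python. N50 is the length L such that sequences of length at least L together cover at least half of the assembly. For the SNP set, the sketch assumes that "retaining SNPs identified by three independent software packages" means keeping the intersection of the three call sets; the `(chrom, pos, ref, alt)` tuple representation is also an assumption.

```python
from collections import Counter


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


def consensus_biallelic_snps(*callsets):
    """Keep SNPs reported by every caller, at sites with a single shared alt allele.

    Each callset is a set of (chrom, pos, ref, alt) tuples (an assumed representation).
    """
    shared = set.intersection(*map(set, callsets))
    # Single-nucleotide ref and alt alleles only (drops indels).
    snvs = {v for v in shared if len(v[2]) == 1 and len(v[3]) == 1}
    # Count alt alleles per site; more than one means the site is multi-allelic.
    alts_per_site = Counter((chrom, pos) for chrom, pos, _, _ in snvs)
    return {v for v in snvs if alts_per_site[(v[0], v[1])] == 1}
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is exactly half of the 54-unit total.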

The second objective was to establish the optimal harvesting stage, tissue and extraction method for metabolite profiling. To this end, metabolite profiles of leaves at three developmental stages from 10 genotypes were generated by LC-MS. The first fully mature leaf, at leaf plastochron index 5, yielded the most informative metabolite spectrum. Furthermore, metabolites extracted from leaf material of one poplar genotype were used for purification and structural identification by the VIB Metabolomics Core facility. A high-throughput metabolite profiling method was established and used to profile the leaf samples. UPLC-MS profiling of the 749 genotypes in triplicate resulted in ~28,000 metabolite features.
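Before association testing, the triplicate profiles have to be reduced to one value per genotype and feature. The report does not say how this was done; the sketch below assumes a common choice (log2 transform with a pseudocount, then the mean over replicates), and the function name `genotype_means` is hypothetical.

```python
import numpy as np


def genotype_means(intensities, genotype_ids, pseudocount=1.0):
    """Average log2 feature intensities over replicate samples of each genotype.

    intensities: (n_samples, n_features) array of raw peak intensities.
    genotype_ids: length-n_samples sequence naming the genotype of each sample.
    Returns (sorted unique genotype IDs, (n_genotypes, n_features) array of means).
    """
    logged = np.log2(np.asarray(intensities, dtype=float) + pseudocount)
    unique, inverse = np.unique(np.asarray(genotype_ids), return_inverse=True)
    sums = np.zeros((len(unique), logged.shape[1]))
    np.add.at(sums, inverse, logged)  # accumulate each sample into its genotype row
    counts = np.bincount(inverse, minlength=len(unique))
    return unique, sums / counts[:, None]
```

The log transform keeps a few very intense peaks from dominating the replicate mean; a median or a mixed model over replicates would be reasonable alternatives.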

We conducted four types of genome-wide association studies (GWAS) on the ~28,000 features. Across all analyses, we detected 691,708 significant trait-variant associations at the genome-wide threshold (Bonferroni-adjusted P < 0.001), encompassing 15,645 metabolic features and 11,292 genes, with between 1 and 6,140 variants per feature. Of these genes, 3,423 (30.3%) are predicted to encode enzymes. The four GWAS approaches revealed both shared and unique gene associations, highlighting complementary aspects of the genetic architecture underlying the traits. Several genomic regions were repeatedly associated across multiple metabolite phenotypes and variant types, suggesting the presence of pleiotropic loci.
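A Bonferroni-adjusted P < 0.001 corresponds to a raw per-test cutoff of 0.001 / m, where m is the number of tests. The sketch below applies that cutoff and tallies features, genes and enzyme-encoding genes. It assumes that each association is a `(feature, variant, gene, p)` tuple and that `n_tests` is the number of variant tests; how the project counted m is not stated, so treat that as an assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def summarize_hits(associations, alpha, n_tests, enzyme_genes):
    """Summarize (feature, variant, gene, p) associations passing the Bonferroni cutoff.

    `gene` may be None for variants outside annotated genes.
    """
    cutoff = bonferroni_threshold(alpha, n_tests)
    hits = [a for a in associations if a[3] < cutoff]
    genes = {gene for _, _, gene, _ in hits if gene is not None}
    enzymes = genes & set(enzyme_genes)
    return {
        "associations": len(hits),
        "features": len({feature for feature, _, _, _ in hits}),
        "genes": len(genes),
        "enzyme_genes": len(enzymes),
        "enzyme_share": len(enzymes) / len(genes) if genes else 0.0,
    }
```

Applied to the totals reported above, the enzyme share is 3,423 / 11,292 ≈ 0.303, i.e. 30.3%.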

As a proof of concept, our analysis focused on 89 structurally characterized poplar compounds. Using the EMMAX algorithm, we identified a total of 3,259 significant trait-variant associations at the genome-wide threshold (Bonferroni-adjusted P < 0.001), spanning 171 genes. Among these genes, we selected 53 enzyme-encoding genes, 14 of which have been expressed in E. coli or yeast. For 3 genes, we already have proof of function based on enzyme assays. For two candidate genes, we have made vectors to knock out the corresponding genes by CRISPR/Cas9 in Populus tremula × P. alba.
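EMMAX speeds up mixed-model GWAS by estimating the variance components once under the null model and then reusing them for every SNP. The sketch below shows only that per-SNP step, as generalized least squares with a fixed covariance. It assumes the kinship matrix `kinship` and the variance ratio `delta` are already available. The function name is hypothetical, and this is not the EMMAX software itself.

```python
import numpy as np
from scipy import stats
from scipy.linalg import cholesky, solve_triangular


def emmax_like_scan(y, genotypes, kinship, delta):
    """Test each SNP by generalized least squares under a fixed covariance.

    Model: y = b0 + b*g + u + e, with Var(y) proportional to K + delta*I, where
    delta = sigma_e^2 / sigma_g^2 is assumed to be estimated beforehand under the
    null model. Returns (effect sizes, two-sided P-values), one per SNP column.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(genotypes, dtype=float)
    n, m = G.shape
    V = np.asarray(kinship, dtype=float) + delta * np.eye(n)
    L = cholesky(V, lower=True)
    # Whitening with L^{-1} turns the GLS problem into ordinary least squares.
    yt = solve_triangular(L, y, lower=True)
    ones_t = solve_triangular(L, np.ones(n), lower=True)
    Gt = solve_triangular(L, G, lower=True)
    betas = np.full(m, np.nan)
    pvals = np.full(m, np.nan)
    df = n - 2  # intercept + SNP effect
    for j in range(m):
        if np.ptp(G[:, j]) == 0:
            continue  # monomorphic SNP: effect not estimable
        X = np.column_stack([ones_t, Gt[:, j]])
        coef, *_ = np.linalg.lstsq(X, yt, rcond=None)
        resid = yt - X @ coef
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        betas[j] = coef[1]
        pvals[j] = 2 * stats.t.sf(abs(coef[1] / se), df)
    return betas, pvals
```

When `kinship` is the identity matrix, the whitening only rescales all variables by a constant, so the results reduce to ordinary linear regression. This gives a convenient sanity check.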

The project has generated genome sequences of 749 wild poplar accessions. Because these trees are planted in a common garden at INRAE in France, other researchers with diverse expertise can analyze them for many traits beyond those studied in the ERC project. This will enable the identification of genes for a variety of traits to improve the quality and yield of poplar wood by breeding or new genomic techniques. The metabolites identified may be screened for bioactivity (e.g. anti-herbivore, anti-inflammatory or anticancer activity) and, if positive, developed into agrochemicals or medicines.

We published a new GWAS method, called QT-GWAS, that is based on qualitative rather than only quantitative traits (Brouckaert et al. 2023), and showed that it delivers both supporting and complementary data compared with classical GWAS.
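One simple way to treat a metabolite feature as a qualitative trait is to code it as detected or not detected in each tree and to test each SNP with a 2×2 contingency test. The sketch below does this with Fisher's exact test on carrier status. It is an illustrative assumption only, not a reproduction of QT-GWAS as published by Brouckaert et al. (2023), and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def presence_absence_scan(present, genotypes):
    """P-values for association between feature presence and SNP carrier status.

    present: (n,) booleans, True where the metabolite feature was detected.
    genotypes: (n, m) alternative-allele dosages (0, 1 or 2).
    """
    present = np.asarray(present, dtype=bool)
    G = np.asarray(genotypes)
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        carrier = G[:, j] >= 1  # at least one copy of the alternative allele
        table = [
            [np.sum(carrier & present), np.sum(carrier & ~present)],
            [np.sum(~carrier & present), np.sum(~carrier & ~present)],
        ]
        _, pvals[j] = fisher_exact(table)  # two-sided by default
    return pvals
```

Collapsing dosages to carrier versus non-carrier keeps the table at 2×2. A 2×3 genotype table with a chi-square test would be an alternative when homozygotes are common.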