Rational design of plant systems for sustainable generation of value-added industrial products

Final Report Summary - SMARTCELL (Rational design of plant systems for sustainable generation of value-added industrial products)


Executive Summary:

The SmartCell project was a top-down approach to develop novel strategies for the rational engineering of valuable secondary metabolites, focusing on the terpenoid indole alkaloid (TIA) pathway. These molecules are generally too complex for complete chemical synthesis and they are produced naturally in such low quantities that extraction from natural sources is uneconomical. SmartCell focused on metabolic engineering using plants and plant cell-based systems to produce key terpenoids. In turn this required the in-depth analysis of the corresponding pathway in terms of the genes, enzymes and intermediates, and the complex regulatory processes including transcriptional and translational control, the regulation of mRNA and protein turnover, the regulation of enzyme activity, and the compartmentalization of enzymes and intermediates. The main goal of the project was therefore to develop fundamental knowledge and enabling technologies to facilitate the rational engineering of plants and plant cells and the production of target secondary metabolites. The engineering strategy was divided into two phases: a fast track leading to 8-hydroxygeraniol and an advanced track leading to secologanin. Deep sequencing of the Catharanthus roseus transcriptome along with comparative proteomics and metabolomics was used for the identification and functional analysis of candidate genes. This resulted in the identification of all remaining unknown enzyme-encoding genes up to secologanin as well as several regulatory and transporter genes with roles in the terpenoid biosynthesis pathway and genes encoding terpenoid-decorating enzymes. A set of 15 selected enzymes (out of more than 60 candidates screened) was expressed in yeast and E. coli and transiently in Nicotiana benthamiana and screened for activity against putative substrates, validating the complete pathway.
Multigene transformation methods were developed to introduce the corresponding genes stably into tobacco in order to reconstruct the early part of the terpenoid pathway in plants, hairy roots and cell lines. Several analytical platforms were developed to profile the terpenoid content of these platforms using a combination of chromatography (UPLC-MS, GC-MS), NMR spectroscopy and flux analysis. The data formats and data processing methods were standardized for consistent handling of the large datasets and an experimental data repository was established. New bioinformatics tools were developed so that spectral differences between groups of tobacco and C. roseus samples with different genotypes and growth conditions could be measured, characterised and validated. SmartCell also made the initial steps towards the large-scale production of plants in the greenhouse and cultured plant cells and hairy roots in bioreactors by evaluating novel disposable systems and by developing cryopreservation techniques for the most important cell lines. We determined that hairy root and cell suspension cultures provided the highest yields of geraniol (the proof-of-concept target compound in the demonstration activities) and established that the optimised systems and procedures could be used to manufacture ~1 g of geraniol in 41 days. The results from the SmartCell project will help to facilitate further research related to secondary metabolism, reduce the costs associated with the large-scale production of valuable pharmaceutical and industrial molecules and help to ensure that pharmaceuticals produced using plant-based systems meet the highest possible regulatory standards.

Project Context and Objectives:

Context of the project

Plants synthesise a vast number of diverse, low-molecular-weight molecules (known as secondary metabolites), many of which exhibit important biological properties and have industrial applications. Among more than 400,000 plant species, only 8% have been characterised for their chemical diversity. Higher plants, therefore, offer enormous potential for the discovery of new lead compounds with applications in medicine, agriculture and industry. The total chemical synthesis of secondary metabolites tends to be uneconomical because these molecules have complex structures and multiple stereogenic centres. They must therefore be isolated from plant sources even if the species are rare, endangered or difficult to cultivate. Genetic engineering can be used to recreate secondary metabolic pathways in more amenable species such as tobacco, but there have been no major commercial breakthroughs thus far, mostly because of low production levels linked to inherent biochemical constraints. A major factor constraining the effective modulation of plant secondary metabolism is our poor understanding of the biochemistry and molecular biology of complete metabolic pathways and their regulation at the systems level. This in turn has hampered the efficient and sustainable use of plants as green factories for the production of valuable industrial and pharmaceutical compounds. Single-step engineering may occasionally be successful, but in complex pathways the modulation of intermediate steps does not usually influence the accumulation of the desired end products. Therefore, in order to achieve high production levels, the rational engineering of complete pathways must be implemented and optimised in plants and in plant cell cultures.

Project goals and objectives

The main goal of the SmartCell project was to develop fundamental knowledge and enabling technologies to allow the rational engineering of plants and plant cells and the production of target secondary metabolic products. This would allow the production of compounds that are too complex for total chemical synthesis and too scarce to extract from natural sources. SmartCell focused on terpenoids, the largest group of secondary metabolites. These compounds have diverse biological activities making them useful as pharmaceuticals, flavours, fragrances and chemical building blocks for more complex structures. We focused on Catharanthus roseus, the Madagascar periwinkle, which produces important terpenoid indole alkaloids that can be used as anti-cancer drugs. We aimed to investigate the metabolic pathways leading to these molecules at the systems level (i.e. taking into consideration gene regulation, transcription, protein synthesis, enzyme activity, the compartmentalization and shuttling of enzymes and intermediates, and the feedback regulation that controls different steps in the pathway). We also aimed to engineer the pathway in an attempt to increase the production of key terpenoids in C. roseus and in a heterologous system, namely the laboratory-friendly and widely-grown non-food/feed crop tobacco. In both cases, we also explored the use of cell and organ cultures as production platforms for these molecules.

The specific scientific objectives of the project were:

• To understand and exploit the extraordinary diversity of plant biochemical capacity to produce valuable molecules
• To functionally characterise the genes of interest and systematically explore factors that control the formation, regulation, transport, stability and storage of secondary metabolites
• To combine knowledge of the molecular regulation in vitro and in planta in order to develop a holistic view of secondary metabolic pathways and their products at the systems level
• To develop and optimise methods for large-scale cultivation and downstream processing of secondary metabolites

The three main deliverables of the project were envisaged as follows:

• An integrated framework for the efficient engineering and production of secondary metabolites in plants and plant cells as green factories
• An interactive database of plant metabolic pathways integrated with sequence, transcriptome, proteome, metabolome and functional data, and in addition a repository of pathway-related genes ready for expression in diverse systems
• A validated industrial production system for the manufacture of valuable plant compounds

Project Results:

Overview

The SmartCell Consortium initially comprised 14 academic beneficiaries from nine European countries, two SMEs and three large end-user companies. One large company withdrew after nine months having altered their research focus, and one SME withdrew after 36 months due to financial difficulties. The project was divided into ten work packages as listed below:

• WP1: Characterisation and prioritisation of bottleneck genes in terpenoid biosynthesis
• WP2: Development and validation of multigene transfer systems
• WP3: Development of analytical platforms for secondary metabolites
• WP4: Data acquisition, integration and bioinformatics
• WP5: Metabolic pathway characterisation at systems level
• WP6: Optimising the plants as green factories
• WP7: Demonstration of plants as green factories
• WP8: Storage of data and resources beyond the span of the project
• WP9: Technology transfer and dissemination
• WP10: Project management

The first 42 months of the project focused on R&D, with the demonstration work package (WP7) deferred until suitable enabling technologies were in place. The terpenoid pathway engineering approach was divided into two phases: a fast track leading to 8-hydroxygeraniol and an advanced track leading to secologanin (see Fig. 1 as an appendix). By the end of the first reporting period the set of genes leading to 8-hydroxygeraniol had been identified (WP1). Those genes were introduced into tobacco plants (by multigene direct DNA transfer, WP2) and tobacco hairy root cultures (by Agrobacterium rhizogenes) followed by functional analysis (WP3, WP5). Efforts to discover all the missing genes from 8-hydroxygeraniol to secologanin were initiated by the deep sequencing of the Catharanthus roseus transcriptome, resulting in the identification of putative enzyme-encoding genes as well as regulatory and transporter genes. Genes encoding terpenoid-decorating enzymes were also discovered and evaluated.

Several analytical platforms were developed to profile the terpenoid content of different transgenic plants, hairy roots and cell cultures (WP3). These included chromatography (UPLC-MS, GC-MS), NMR spectroscopy and flux analysis. The data formats and data processing methods were standardized to allow consistent handling of the anticipated large datasets and an experimental data repository was established (WP4, WP8). The development of an IPR strategy and aspects of regulatory assessment for transgenic material were initiated (WP9).

The Consortium made the first steps towards the large-scale production of plants and cultivated plant cells and organs in bioreactors (WP6) by evaluating novel disposable systems (e.g. wave-type bioreactors) and conventional bioreactors. Plant cell cryopreservation techniques were also assessed.

During the second reporting period, the deep sequencing of the C. roseus transcriptome yielded a number of additional candidate genes, and the combination of de novo sequence assembly and transcript counting provided additional data to support proteomics analysis (WP5). Suitable fast-track genes leading to 8-hydroxygeraniol were identified, validated and expressed in tobacco plants and hairy roots. The consortium also synthesized predicted intermediate compounds in the pathway for functional validation assays and analytical experiments. Novel bioreactor technologies were evaluated and exchanged between partners, and further efforts were made to achieve the cryopreservation of plant cells, which was more challenging than expected.

During the third reporting period, the project focused on validating the sequencing results, thus increasing the amount of useful data. RNA sequences from elicited/transgenic cell cultures and plant tissues were assembled in the database, which was searched to identify genes co-expressed with those already known to be active in the early iridoid pathway. This information was integrated with leaf epidermis and mesophyll proteomic data, revealing a set of promising candidates for the different predicted steps in the secologanin pathway. More than 60 candidate genes were selected for cloning and expression in Escherichia coli (soluble proteins) and yeast (membrane-anchored cytochrome P450 enzymes). A set of 15 selected enzymes was expressed transiently in Nicotiana benthamiana and screened for activity against predicted substrates, validating the complete pathway. Initial attempts were made to reconstruct the pathway transiently in Nicotiana benthamiana, ahead of the creation of stable transgenic tobacco and C. roseus plants. RNA sequencing and co-expression analysis also revealed a set of 20 candidate transcription factors and 28 transporters likely to be involved in terpenoid biosynthesis.
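
The co-expression search described above can be illustrated with a minimal sketch. The report does not state which similarity measure or threshold was used, so the Pearson correlation, the 0.9 cut-off, the function names and the expression values below are all assumptions chosen only for illustration; they are not project data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        # A flat profile carries no co-expression information.
        return 0.0
    return cov / (sd_x * sd_y)


def rank_coexpressed(bait, candidates, threshold=0.9):
    """Return (name, r) pairs with r >= threshold, strongest first.

    `bait` is the profile of a gene already known to act in the pathway;
    `candidates` maps hypothetical gene names to their profiles.
    """
    scored = ((name, pearson(bait, profile)) for name, profile in candidates.items())
    hits = [(name, r) for name, r in scored if r >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical transcript counts across four conditions (not project data).
bait_profile = [1.0, 2.0, 4.0, 8.0]
candidate_profiles = {
    "candidate_A": [2.0, 4.0, 8.0, 16.0],  # rises with the bait
    "candidate_B": [8.0, 4.0, 2.0, 1.0],   # falls as the bait rises
    "candidate_C": [5.0, 5.0, 5.0, 5.0],   # constant expression
}

hits = rank_coexpressed(bait_profile, candidate_profiles)
for name, r in hits:
    print(f"{name}: r = {r:.2f}")
```

In practice such a ranking would be only a first filter; as the text notes, the candidates were then cross-checked against tissue-specific proteomic data before cloning.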

The large number of resulting transformants carrying diverse combinations of genes was characterised using a combination of proteomics and metabolomics (LC-MS, GC-MS and NMR) to determine the roles of individual genes in the pathway. New bioinformatics tools were developed using several linear and non-linear methods so that spectral differences between groups of tobacco and C. roseus samples with different genotypes and growth conditions could be detected, characterised and visualised. A cloud-based application called LOCIDA was developed at VTT for inspecting the project-related NMR data, combining the bioinformatics methods with easy-to-use web-based interfaces (http://github.com/avirkki/locida).

Different tobacco-based production platforms including whole plants grown in greenhouses and in vitro cultures as well as hairy roots and cell suspension cultures were evaluated in experiments to optimise production of geraniol, the proof-of-concept molecule to validate the fast track genes and demonstrate the feasibility of the developed processes for the production of target secondary metabolites in our chosen platforms. We established that hairy root and cell suspension cultures resulted in the highest product yields in the shortest times. Furthermore, hairy roots and cell suspensions could be cultured under controlled conditions in bioreactors enabling the reproducible and homogenous production of geraniol. We optimised biomass accumulation and product yields in suspension cells further by improving the nutritional and physical culture conditions in factorial design experiments. Fermentation was also carried out using disposable bioreactors to allow larger-scale production. Our initial purification strategies produced geraniol with purity in excess of 99%.

We demonstrated that the optimised plant-based systems and procedures could be used to manufacture approximately 1 g of geraniol. Tobacco hairy roots and cell suspension cultures were chosen as potential manufacturing platforms. We found that stirred and orbitally-shaken bioreactors were optimal for suspension cells, whereas wave-mixed bioreactors were more appropriate for hairy roots. After testing and validating different parameters, we achieved 1.8 μg/g fresh weight per day geraniol in suspension cells and 0.4 μg/g/day in hairy roots. The productivities were 5.5 mg/L and 1.3 mg/L geraniol in suspension cells and hairy roots, respectively.

By the end of the project, all the pathway enzymes were characterized and the complete pathway was validated experimentally (see Fig. 1 as an appendix). We also investigated the subcellular localization of the secologanin pathway enzymes and identified transporter genes involved in loganin and secologanin transport. Our major achievement was the discovery of the last four unknown steps in the (seco)iridoid biosynthesis pathway leading to secologanin and strictosidine. From the set of 20 candidate transcription factors identified by RNA sequencing, three showed promising results in functional testing.

For biological activity tests, comprehensive solvent extraction including three temperatures and three pressures was optimised to obtain a wide range of metabolites. NMR spectra were obtained for all samples, and the results were subjected to multivariate data analysis. There were clear differences in the metabolic profiles among the samples. A total of 162 tobacco hairy root extracts were evaluated at three different concentrations on a colorectal cancer cell line to determine potential cytotoxicity, and the proliferation of the cells was measured. C. roseus hairy root lines carrying different gene constructs were also assessed for their biological activity, and these experiments will continue beyond the end of the project.

Our final aim was to demonstrate that our optimised plant systems and procedures can be used to manufacture geraniol on an industrial scale. This experiment was successfully conducted using cell suspension cultures expressing geraniol synthase (VoGes) in 20-L CultiBag and ATMI integrity WandMixer bioreactors. A maximum fresh weight of up to 275 g/L was achieved after 16 days of cultivation in the CultiBag, producing up to 12 μg/g fresh weight of geraniol. This yield is moderate: the whole 20-L batch, corresponding to 5.5 kg of cell biomass, yielded only 66 mg of geraniol. A bioreactor run of approximately 300 L would therefore have been needed to obtain 1 g of geraniol.
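
The scale-up arithmetic in the paragraph above can be reproduced with a short calculation. This is a sketch only: the function and variable names are illustrative, and it assumes, as the report's own figures appear to, that the peak fresh weight and the peak geraniol content occur in the same batch and scale linearly with volume.

```python
# Figures quoted in the report for the 20-L CultiBag run.
BATCH_VOLUME_L = 20.0
FRESH_WEIGHT_G_PER_L = 275.0   # maximum fresh weight after 16 days
GERANIOL_UG_PER_G_FW = 12.0    # maximum geraniol content per g fresh weight


def geraniol_per_batch_mg(volume_l, fw_g_per_l, content_ug_per_g):
    """Geraniol obtained from one batch, in mg."""
    biomass_g = volume_l * fw_g_per_l
    return biomass_g * content_ug_per_g / 1000.0  # ug -> mg


def volume_for_target_l(target_mg, fw_g_per_l, content_ug_per_g):
    """Culture volume needed to reach a target mass, assuming linear scaling."""
    mg_per_litre = fw_g_per_l * content_ug_per_g / 1000.0
    return target_mg / mg_per_litre


biomass_kg = BATCH_VOLUME_L * FRESH_WEIGHT_G_PER_L / 1000.0
batch_mg = geraniol_per_batch_mg(
    BATCH_VOLUME_L, FRESH_WEIGHT_G_PER_L, GERANIOL_UG_PER_G_FW
)
required_l = volume_for_target_l(
    1000.0, FRESH_WEIGHT_G_PER_L, GERANIOL_UG_PER_G_FW
)

print(f"Biomass per batch:  {biomass_kg:.1f} kg")
print(f"Geraniol per batch: {batch_mg:.0f} mg")
print(f"Volume for 1 g:     {required_l:.0f} L")
```

The calculation gives 5.5 kg of biomass and 66 mg of geraniol per 20-L batch, and a required volume of about 303 L for 1 g, consistent with the rounded figure of 300 L given in the text.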

Progress in the individual work packages

WP1. Characterisation and prioritisation of bottleneck genes involved in terpenoid biosynthesis

Objectives:

• To elucidate the metabolic pathway leading to the biosynthesis of secologanin, the upstream terpenoid segment of the TIA pathway
• To isolate and validate the as yet unknown genes encoding the enzymes involved in this pathway, as well as regulators and transporters of metabolic intermediates in C. roseus
• To develop an extended toolbox including genes encoding transcription factors, enzymes and transporters from other plant species for the metabolic engineering of terpenoid/TIA metabolism
• To develop tools for terpenoid/TIA pathway engineering and reconstruction in plants, organs and cells.

Progress:

The identification of candidate genes/proteins and the elucidation of the pathway required convergent information from comparative transcriptomics and proteomics, based on the analysis of cell cultures (in the presence or absence of jasmonate) and epidermis/mesophyll-derived protoplasts. A major effort was the deep sequencing of C. roseus, which led to the establishment of a comprehensive metabolic pathway databank (www.cathacyc.org). Our strategy was highly effective and led to the identification of a set of potential regulators of the upstream TIA pathways, including two bHLH factors with strong potential to be master regulators of the jasmonate response and tools for metabolic engineering. Their activity was validated in planta thanks to the design and implementation of a suitable reporter system. Other potential regulators involved in the jasmonate response are still under investigation.

When focusing on appropriate families of enzymes, our strategy also pointed to a set of candidate genes likely to contribute to the sequential conversion of geranyl pyrophosphate into secologanin. The candidate genes were isolated and expressed in microorganisms (E. coli and yeast) for functional screening and validation. The candidate genes with functions demonstrated in microorganisms included three P450 enzymes:

• A geraniol 8-oxidase, G8O or CYP76B6, catalysing geraniol 8-hydroxylation and further oxidation of 8-hydroxygeraniol into 8-oxogeraniol
• An iridoid oxidase, IO or CYP76A26, converting cis-trans-nepetalactol into 7-deoxyloganetic acid, and
• A 7-deoxyloganic acid hydroxylase, 7-DLH or CYP72A224, catalysing the hydroxylation of 7-deoxyloganic acid into loganic acid

Furthermore, two oxidoreductases and one glycosyltransferase were involved:

• An 8-hydroxygeraniol oxidoreductase, 8-HGO, catalysing stepwise and reversible conversion of 8-hydroxygeraniol into 8-oxogeranial
• An iridoid synthase, IS, converting 8-oxogeranial into cis-trans-iridodial, and
• A 7-deoxyloganetic acid glycosyltransferase, 7-DLGT, catalysing the glycosylation of 7-deoxyloganetic acid into 7-deoxyloganic acid.

This set of enzymes was sufficient for the formation of secologanin and strictosidine in planta as confirmed by the sequential reconstruction of the pathway in N. benthamiana. This stepwise reconstruction validated the pathway sequence and enzyme products. One of the oxidoreductases (the iridoid synthase; IS) was recently also reported by others [Geu-Flores et al. (2012): Nature 492:138-142]. We also initiated further investigations showing that the reported IS gene is a member of a small gene family, the members of which all encode enzymes with IS activity, albeit with different tissue-specific expression profiles.

The tissue-specific expression of each gene of the secologanin pathway was then determined by in situ hybridization. This identified loganic acid as the metabolic intermediate transported from phloem-associated cells (where early pathway genes are expressed, leading to iridoids) to the epidermis (where late genes are expressed, leading to secologanin and strictosidine). This enzyme localization led us to re-examine our transcriptomic data to identify candidate transporters likely to be involved in the inter-tissue transfer of loganic acid. A set of eight NRT/PTR family candidates was thus selected that were homologous to an Arabidopsis thaliana transporter identified in a Xenopus oocyte-based pilot screen for secologanin importers. All eight C. roseus transporters were expressed in Xenopus oocytes for functional characterization and three were shown to be efficient loganin, secologanin and 7-deoxyloganic acid transporters. The transporter with the highest activity also co-expressed mostly with the pathway enzymes, and exhibited the highest expression overall. Loganic acid was also tested, and transported, but compared to the other three compounds it did not appear to be the preferred substrate in vitro, although this does not rule out that it might be the true substrate in planta. In addition, proteomic analysis of C. roseus leaf tissues and cell suspensions led to the identification of several transporters that are good candidates for transporting (other) intermediates of the TIA pathway.

A full tool box of C. roseus genes was thus obtained and is now available for engineering the (seco)iridoid and TIA pathways in other plants or microorganisms. We extended the investigations to other plant species using this toolbox of genes. In the model plant A. thaliana, in silico gene co-expression analysis led to the identification and functional characterization of a set of two terpene synthases (TPS10 and TPS14) and two cytochrome P450 enzymes (CYP76C3 and CYP75B31) involved in a complex floral linalool oxidative metabolism. The function of the CYP76 family of P450 enzymes in A. thaliana was then more systematically investigated, revealing the activity of three CYP76 enzymes as monoterpene alcohol oxidases, with different expression patterns and substrate specificities. The ecological significance of this monoterpene alcohol metabolism in A. thaliana is now under investigation. A collection of 290 A. thaliana transporters expressed in Xenopus oocytes was then screened for the transport of available intermediates of the secologanin and TIA pathways, leading to the identification of two loganin transport activities, one of them belonging to the NRT/PTR family. In addition, three vinblastine transporter activities were identified. Characterisation of transporters was then extended to Nicotiana tabacum. Several ABC transporters belonging to the pleiotropic drug resistance family were expressed in N. tabacum suspension cells for transport analysis leading to the identification of diterpene transporters. In total 64 new genes were functionally characterized. These encoded transcription factors, terpene synthases, cytochrome P450 enzymes, oxidoreductases, glycosyltransferases and transporters.

WP2. Development and validation of multigene transformation systems

Objectives:

• To develop practical methods for the functional analysis of the prioritised genes
• To generate transgenic cell lines and plants for in depth analysis

Progress:

Three different approaches were explored to determine the most cost-effective and reliable multigene transfer method, specifically for the early part of the terpenoid pathway:

• Co-transformation of independent genes in individual expression vectors (one gene per vector) using direct DNA transfer
• The A2 oligopeptide for co-expression of multiple proteins encoded by the corresponding genes in a single transformation step
• New Gateway-compatible vectors combining three genes, each under the control of a different constitutive promoter and with different 3' ends

All three methodologies were developed and can now be made available to the scientific community, but the co-transformation of independent genes was selected as the best approach for the main objectives of the SmartCell project. As part of the development of a useful toolbox for pathway engineering, specific promoters were used to make a series of constructs for metabolic engineering in specific tissues. Targeting sequences for specific subcellular localisation were also incorporated into the constructs. We were accordingly able to introduce multiple genes into plants and plant cells and recover transgenic material expressing these genes. Two different series of experiments were carried out.

In the fast track, we introduced four genes believed at the start of the project to be required for the synthesis of 8-hydroxygeraniol (AtGPPS, VoGS, CrCPR and CrCYP76B6). Later, we found that only the first two genes are needed for the synthesis of 8-hydroxygeraniol. An unexpected result of the fast track experiments was that certain transgene combinations caused toxicity in the recipient plant tissues, and the basis of this phenomenon remains unknown.

At a fundamental science level we investigated the impact of importing the early monoterpene secoiridoid pathway into tobacco by the comparative metabolic profiling of wild-type and transgenic plants (and hairy roots derived from them) using high-field NMR spectroscopy. Unexpectedly this revealed the complete suppression of the terpenoid pathway in the transgenic plants, as well as important interactions with non-target pathways in both transgenic plants and hairy roots. In order to identify changes in the expression and regulation of nicotine and phenylpropanoid pathway genes, and to develop a better understanding of the biological significance of the different metabolite profiles in transgenic and wild-type tobacco plants, we used qRT-PCR to measure the expression of five endogenous genes from the nicotine pathway as well as phenylalanine ammonia lyase. Transcripts for all the genes were detected in all samples, but the transgenic plants accumulated lower levels of the qprt, odc, pmt and A622 transcripts and significantly higher levels of mpo mRNA. These results confirm that the nicotine pathway genes are differentially regulated in transgenic and wild-type plants, whereas the abundance of pal1 mRNA suggests that the gene is upregulated in all transgenic plants.

Importantly our results established that metabolic engineering can be used to reconstruct a complete pathway in a heterologous organism, but that the function of the pathway may be affected by complex regulatory networks and the compartmentalization of enzymes, substrates and intermediates at the subcellular level. Regulation of the terpenoid pathway is complex, and the control mechanisms are poorly understood. The engineering of any component of the terpenoid pathway may have a direct impact on other branches of the pathway. The metabolic analysis of transgenic tobacco plants and hairy roots expressing AtGpps and VoGes provided insight into this complex interaction and highlighted the importance of understanding the multilevel regulation of endogenous metabolic pathways. The unexpected impact on non-target metabolites suggests that the expression of AtGpps and VoGes in transgenic tobacco plants and hairy roots triggers a stress response, suppressing terpenoid biosynthesis and inducing the accumulation of protective metabolites such as nicotine and/or betaine, and the depletion of others such as choline and inositol. The analysis of transgenic hairy roots showed that the metabolic profile in this tissue was distinct from that in the transgenic plants from which the roots were derived. These data suggest that the regulation of secondary metabolism, and the consequences of metabolic engineering on the production of target and non-target metabolites is not only dependent on the compartmentalization of the pathway but also on the state of differentiation of the host tissue. Therefore multiple levels of regulation must be considered simultaneously to achieve the efficient synthesis of valuable terpenoid natural products in plants/plant cells by metabolic engineering. Thus our results provide further insight into challenges that must be addressed when developing an effective metabolic engineering strategy in plants for the modulation of secondary metabolism.

In the advanced track, genes cloned during the course of the project were combined and incorporated in multigene transformation experiments. The many different candidate genes made it necessary to carry out multiple overlapping rounds of transformations to capture the combinations of candidate genes believed at a given point in time to be the genes involved in the pathway to secologanin. In terms of the multigene transfer component of the program, we were able to recover plants containing all the different input transgenes. However, their expression varied, and the candidate genes were validated only at the end of the project. Consequently the most appropriate gene combination could only be used in transient expression experiments (see WP1).

Another important conclusion was that results from transient expression experiments did not mirror the situation in stably transformed plants containing the same transgenes. A number of explanations could account for this fact, including the massive levels of transgene expression and concomitant enzyme accumulation in transiently expressing cells versus the much lower levels of expression in stably transformed plants.

WP3. Development of analytical platforms for secondary metabolites

Objectives:

• To evaluate and harmonise extraction procedures and available analytical systems in order to broaden metabolite coverage
• To integrate the existing different platforms for metabolomics available from the partners (GC-MS, LC-MS, NMR)
• To develop methods allowing metabolite identification
• To establish protocols for the targeted analysis of specific compounds
• To establish protocols for flux analysis

Progress:

Several methods were routinely employed during the SmartCell project:

• Targeted analysis of TIAs and precursors in plant and cell culture extracts by HPLC-PDA and/or LC-MS
• Non-targeted analysis of metabolites in plant and cell culture extracts by NMR spectroscopy
• Non-targeted analysis of volatile and semi-volatile metabolites in plant and cell culture extracts by GC-MS
• Targeted and non-targeted analysis of non-volatile metabolites in plant and cell culture extracts by (UP)LC-(QTOF)-MS

In order to set up standard protocols for extraction and analysis, four types of plant material were selected: tobacco (Nicotiana tabacum Petit Havana SR1) leaves, tobacco (Nicotiana tabacum BX) hairy roots, C. roseus hairy roots and C. roseus cells. Extraction and analysis protocols were established and standardised across the Consortium, allowing the reliability and reproducibility of the methods to be confirmed and providing a test dataset against which other partners could compare their results.

1H-NMR spectra from C. roseus hairy root extracts revealed signals for secologanin, serpentine, ajmalicine, loganic acid, glucose (low levels) and sucrose (high levels) and some aliphatic amino acids/organic acids (beta-alanine, aspartic acid, malic acid, glutamic acid/glutamine and alanine). Multiple signals were found in the aromatic region of the spectra, but these would have required further analysis by 2D NMR and/or compound separation/isolation. A comparison with TIA standards showed that TIAs were not present in the spectra of C. roseus cells. The signals in the cell culture extract were also much lower than those of the hairy root extract. Glucose and sucrose were found in higher amounts, and there was also more malic acid but less valine compared with C. roseus hairy roots. In tobacco, we identified the presence of nicotine as a major compound in both hairy roots and leaves, and smaller aromatic peaks in the leaf samples which did not match any standards. Two caffeoyl quinic acids were also identified in tobacco leaves but not in hairy roots.

GC-MS analysis of the C. roseus samples also revealed the presence of TIAs. PCA analysis showed that the technical replicates clustered closely together (as anticipated) and that the four different plant materials clustered separately. GC-MS analysis using the standardised procedures we developed yielded reproducible results appropriate for statistical analysis, making this a suitable tool for the analysis of engineered plants, hairy roots and cells.

The targeted and non-targeted analysis of non-volatile metabolites by UPLC-MS revealed the presence of secologanin and loganin in C. roseus hairy roots. Several amino acids (serine, arginine, glycine, ethanolamine, aspartic acid, glutamine, threonine, alanine, γ-aminobutyric acid, proline, tyrosine, lysine, methionine, valine, leucine, isoleucine and phenylalanine) were also detected and quantified in all four samples by UPLC-UV with AccqTAG derivatisation. Further unknown compounds detected either by AccqTAG derivatisation (compounds with primary or secondary amino groups) or by the UPLC-LTQ-Orbitrap MS method were not fully identified in these samples.

The full characterisation of metabolic changes requires the use of flux analysis methods. An isotope-labelling strategy was applied in tobacco plants and C. roseus cells and hairy root material, followed by GC-TOF-MS analysis for primary metabolites and LC-MS analysis for secondary metabolites. After the cells were elicited with methyl jasmonate, the 13C-labelled substrate was introduced and the samples were analysed by GC-TOF-MS and UPLC-FT-MS. The 13C enrichment was calculated by measuring the changing abundance of isotopic spectral fragments of the molecular ion of each metabolite, with natural abundance corrected by reference to an unlabelled control. The percentage of label present in each metabolite was calculated from the total abundance of 12C and 13C ions in a particular metabolite pool.
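
The enrichment calculation described above can be sketched as follows. This is a minimal illustration only: it assumes that the isotopologue intensities (M+0, M+1, ...) of a fragment are available as a list, and it approximates the natural abundance correction by subtracting the enrichment measured in the unlabelled control. The function names and this simplified correction are assumptions made for illustration and are not taken from the project's processing pipeline.

```python
def fractional_enrichment(intensities):
    """Mean fraction of 13C per carbon position.

    intensities[i] is the abundance of the M+i isotopologue of a fragment
    that contains len(intensities) - 1 carbon atoms.
    """
    n_carbons = len(intensities) - 1
    total = sum(intensities)
    if n_carbons < 1 or total <= 0:
        raise ValueError("need at least M+0 and M+1 with a positive total")
    # Each M+i isotopologue carries i labelled carbons.
    labelled = sum(i * a for i, a in enumerate(intensities))
    return labelled / (n_carbons * total)


def percent_label(sample, unlabelled_control):
    """Percent 13C label after subtracting the unlabelled control (assumed correction)."""
    excess = fractional_enrichment(sample) - fractional_enrichment(unlabelled_control)
    return 100.0 * max(excess, 0.0)
```

For a three-carbon fragment in which half of the pool is fully labelled and the control shows only M+0, `percent_label([50, 0, 0, 50], [100, 0, 0, 0])` gives 50 percent. A rigorous workflow would typically replace the simple subtraction with a full isotopologue correction matrix.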

WP4. Data acquisition, integration and bioinformatics

Objectives:

• Standardisation of data formats and data processing
• Establishment of an experimental repository
• Establishment of a database of secondary metabolites
• New methods for the reconstruction of secondary metabolic pathways

Progress:

The standardisation of data formats and data processing involved the provision of recommendations and methods for the standardisation of all data generated in the project, following established community standards, and the preparation of a guideline report for data generation, processing and analysis. As a result of this work, standard data templates and guidelines for data storage were developed. The data format was developed on the basis of the data matrix presentation in standard multivariate statistics, but augmented with extra metadata in the file headers to facilitate data integration in the later phases of the project. The format was developed to be compatible with Microsoft Excel, but a binary data repository was also established initially to allow free-format data interchange and to provide secure storage.

The establishment of the repository involved the development of standard forms and schemes for data entry, including database functionality, and a data access, retrieval and mining interface. The standardised data formats described above were integrated into a relational data model, and the schema definition was stored in a single spreadsheet-readable file that was used to generate both the human-readable documentation and the database schema definition automatically. This structure was designed to avoid the submission of redundant information. The data repository was integrated with a relational database architecture and the R language and environment for statistical computing. These choices allowed efficient back-ups, direct web integration, secure data access, and mining and visualisation with both ready-made and custom algorithms in R.
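
The idea of deriving both the documentation and the database schema from one spreadsheet-readable file can be illustrated with the short sketch below. The column layout of the schema file (table, column, type, description), the example tables and all function names are hypothetical. The project itself used R and its own templates, so this Python version only shows the principle.

```python
import csv
import io
import sqlite3
from collections import OrderedDict

# Hypothetical schema file: one row per column of the relational model.
SCHEMA_CSV = """table,column,type,description
sample,sample_id,INTEGER PRIMARY KEY,Unique sample identifier
sample,species,TEXT,Plant species of the sample
measurement,sample_id,INTEGER,Sample the value belongs to
measurement,metabolite,TEXT,Metabolite name
measurement,value,REAL,Measured intensity or concentration
"""


def read_schema(text):
    """Group the schema rows by table, preserving file order."""
    tables = OrderedDict()
    for row in csv.DictReader(io.StringIO(text)):
        tables.setdefault(row["table"], []).append(row)
    return tables


def to_sql(tables):
    """Render CREATE TABLE statements from the grouped schema rows."""
    statements = []
    for name, cols in tables.items():
        body = ",\n".join(f"  {c['column']} {c['type']}" for c in cols)
        statements.append(f"CREATE TABLE {name} (\n{body}\n);")
    return "\n".join(statements)


def to_docs(tables):
    """Render plain-text documentation from the same rows."""
    lines = []
    for name, cols in tables.items():
        lines.append(f"Table {name}")
        lines.extend(f"  {c['column']} ({c['type']}): {c['description']}" for c in cols)
    return "\n".join(lines)


def create_database(sql):
    """Create an in-memory SQLite database from the generated schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql)
    return conn
```

Because the documentation and the schema are generated from the same rows, the two cannot drift apart, which is the benefit the single-file approach is meant to deliver.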

The establishment of a database of secondary metabolites involved the development and testing of the existing database and repository using experimental data. This work included a complete revision of the middle tier logic between the standardised data templates and the data model, and systematising the functionality as a package for the R language instead of having it in a collection of separate scripts. The improved procedure enabled two-step data curation before committing data to the database, and enabled convenient queries through the abstract database interface in the R language.

The new methods for the reconstruction of secondary metabolic pathways were experimental and interdisciplinary, and incorporated elements of data science, signal processing and statistics in addition to the database and processing techniques. This led to the development of a novel local intensity difference analysis tool (Locida) to distinguish and to explore visually the biochemical distances between groups of samples. The method was developed to explore the project NMR data, but after testing it was generalised to all kinds of NMR measurements producing one-dimensional spectra. The algorithm is based on the observation that keeping data un-scaled simplifies its interpretation, whereas a carefully chosen non-linear detection function can still reliably detect the statistically sound differences between groups of differently treated samples. The method simplifies the analysis of data by allowing the analyst to omit the selection of different scaling and pre-processing methods mandatory for the use of the classic dimension reduction algorithms (such as PLS or PCA). The virtue of the new detection method in Locida is that it behaves approximately linearly with small differences, but limits the contribution of large differences such that they cannot mask smaller, but relevant changes. The visualization component together with its source code is already available for broader use at http://github.com/avirkki/Locida.
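
The behaviour described for the detection function (approximately linear for small differences, bounded for large ones) can be illustrated with a generic saturating function. The sketch below uses a scaled hyperbolic tangent purely as a stand-in. The source does not state which function Locida uses, so the choice of tanh, the `scale` parameter and the function names are assumptions, and the statistical testing step of the method is not reproduced here.

```python
import math


def saturating_difference(diff, scale):
    """Roughly linear for |diff| much smaller than scale, bounded by +/- scale otherwise."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return scale * math.tanh(diff / scale)


def group_difference_profile(group_a, group_b, scale):
    """Bounded per-point difference between the mean spectra of two groups.

    group_a and group_b are lists of equally long, un-scaled 1D spectra.
    """
    n_points = len(group_a[0])
    mean_a = [sum(s[i] for s in group_a) / len(group_a) for i in range(n_points)]
    mean_b = [sum(s[i] for s in group_b) / len(group_b) for i in range(n_points)]
    return [saturating_difference(a - b, scale) for a, b in zip(mean_a, mean_b)]
```

With such a function, a very large intensity difference at one chemical shift contributes at most `scale` to the profile, so it cannot swamp smaller differences elsewhere in the spectrum.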

WP5. Metabolic pathway characterisation at the systems level

Objectives:

• Systems analysis of plants and plant cells (including expression analysis, comprehensive metabolite analysis and fluxomics)
• Assessment of biological activity

Progress:

The development of platforms for transcriptome, proteome and metabolome analysis was the key to the identification of all remaining unknown steps in the iridoid pathway. Both known and missing genes could be identified by comparing the total transcriptome under different conditions with the proteome and metabolome data, thus eliminating all genes not involved in the TIA pathway. These platforms will also serve as an excellent basis for future gene discovery work.

Transcriptome analysis was applied to the fast track and advanced track transgenic tobacco, N. benthamiana and C. roseus samples. The tobacco and N. benthamiana materials were analysed to confirm the expression of the transgenes. The C. roseus materials were analysed to identify the genes expressed in alkaloid-producing cells, and to select genes involved with iridoid and/or alkaloid production from the large set of candidate genes. Two C. roseus cell culture lines were analysed at the proteome level, allowing the identification of proteins related to iridoid and/or alkaloid production. Because 2D-DiGE only identifies a limited number of proteins, a gel-free approach was applied involving orthogonal LC separations of tryptic peptides followed by mass spectrometry. This identified 1663 proteins, which allowed us to reduce the number of candidate genes from the transcriptome analysis and confirmed the presence of proteins. The transcriptomic and proteomic data facilitated the functional analysis of additional TIA pathways and the isolation of the as yet unknown corresponding genes.

The GC-MS, LC-MS and NMR metabolomics platforms were developed and validated in WP3. The samples used above for transcriptome analysis were also subjected to one or more of the metabolomics analysis methods. The data were stored in the database developed in WP4. Although many known metabolites can be identified and quantified using these metabolomics methods, the majority of the signals were unidentified. However, by focusing on signals correlating with iridoid/alkaloid biosynthesis it was possible to identify many relevant compounds. To facilitate the identification of the iridoid pathway intermediates, a library of synthetic intermediates was procured to help to identify intermediates in the fast track and advanced track samples. A number of geraniol glycosides were tentatively identified. The intermediates were also important as substrates for the functional analysis of enzymes expressed in yeast, and demonstrated that other pathways were affected, e.g. the phenylpropanoid and nicotine pathways. The metabolomics databases now contain qualitative and quantitative data for the major primary and secondary metabolites in all samples.

A system for growing plant cells and hairy roots in medium containing 13C-labeled early precursors for flux measurements was developed and used to monitor the flux from primary to secondary metabolism. Transgenic tobacco roots expressing GES and GPPS were fed with unlabelled, 99%[1-13C], 99%[2-13C] or 20%[13C6] glucose (parallel feeding). GC-MS was then used to establish the incorporation rates allowing the fluxes to be determined. In C. roseus cell suspension cultures, the effect of jasmonate on metabolic fluxes was investigated by feeding the cells fully-labelled 13C-pyruvate followed by UPLC-FT-MS analysis, revealing that loganic acid accumulated in the elicited cells. This demonstrates that:

• Production of the key intermediate loganic acid does not appear to be constrained by the supply of precursors from primary metabolism; and
• Combined metabolite profiling strategies coupled with stable isotope labelling are an effective way to monitor carbon flow from primary to secondary metabolism.

In a further study the channelling of C5 units over the various groups of terpenoids was estimated by analysing each of the groups with targeted methods. The mevalonate pathway has a considerably lower output (steroids) than the MEP pathway (TIA and carotenoids). The carotenoid pathway may offer the potential to channel more C5 into the TIA pathway. These experiments showed that fluxomics is feasible in plants and plant cells, particularly for primary metabolism. For secondary metabolism targeted analytical tools are required to deal with the low quantities of terpenoids or their intermediates.

The advanced track transgenic plants and hairy roots became available at the very end of the project so the original plan to compare transgenic cell lines expressing different combinations of input transgenes was not feasible. Instead a novel approach was developed involving comprehensive gradient extraction to yield ~30 fractions, NMR-based metabolomic analysis followed by the application of multivariate statistics to rapidly pick up signals correlating with the activities found in the fractions. A total of 162 tobacco hairy root extracts were evaluated in three different concentrations on a colorectal cancer cell line to determine potential cytotoxicity and the proliferation of the cells was measured. C. roseus hairy root lines carrying different gene constructs were also assessed for their biological activity, and these experiments will continue beyond the end of the project.

WP6. Optimising the plants as green factories

Objectives:

• To evaluate and optimise the production platforms for tobacco plants, hairy roots and cell suspension cultures as well as C. roseus cell suspension cultures
• To establish scale up systems for plants, hairy roots and plant cells
• To establish extraction and purification processes for plants, hairy roots and plant cells

Progress:

Different plant-based production platforms were evaluated and optimised for the production of terpenoids including the scale-up of upstream production and downstream processing for systems based on plants, hairy roots and cell suspension cultures. Because transgenic plant material containing the improved metabolic pathways was not available during the project, the process steps were developed and optimised using plant systems expressing the geraniol synthase gene isolated from Valeriana officinalis (VoGes), resulting in the accumulation of the monoterpenoid product geraniol. To identify the optimal production platform, we analysed different systems including intact tobacco plants grown in hydroponic solution or soil in the greenhouse or as sterile in vitro cultures, as well as tobacco hairy roots and cell cultures in suspension. Finally we optimised a Nicotiana benthamiana transient expression system based on the vacuum infiltration of recombinant bacteria carrying the VoGes expression cassette into leaves.

The growth of plant material and geraniol production was maximised by testing different nutritional and physical cultivation parameters (e.g. nutrient supply, elicitors, media composition, medium additives, temperatures, light regime/intensity and the application of mild stress conditions). We also included statistical experimental techniques such as factorial designs and response surface methodology to screen a large number of medium compositions and physical factors such as light intensity. The optimisation experiments improved the geraniol productivity significantly, reflecting a combination of enhanced biomass production and/or geraniol accumulation in the plant material. For example, geraniol productivity in tobacco cell suspension cultures is mainly determined by light intensity, which affects geraniol synthesis, and the concentration of sugar in the medium, which boosts biomass growth, resulting in an 18-fold increase in geraniol yields compared to the original VoGes line.
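
As an illustration of the factorial screening step mentioned above, the sketch below enumerates a two-level full factorial design and estimates the main effect of each factor. The factor names, the levels and the response values are placeholders invented to exercise the code. They are not project data, and the response surface fitting that would normally follow the screen is not shown.

```python
from itertools import product


def full_factorial(levels):
    """All combinations of factor levels; levels maps factor name to a list of levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def main_effect(runs, responses, factor, low, high):
    """Mean response at the high level minus mean response at the low level."""
    hi = [y for run, y in zip(runs, responses) if run[factor] == high]
    lo = [y for run, y in zip(runs, responses) if run[factor] == low]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


# Hypothetical two-level screen; factor names and levels are placeholders.
levels = {"light": [50, 150], "sucrose": [20, 40], "temperature": [23, 27]}
runs = full_factorial(levels)

# Made-up responses used only to exercise the code (not measured values).
responses = [10 + 0.05 * r["light"] + 0.2 * r["sucrose"] for r in runs]

effects = {
    name: main_effect(runs, responses, name, lv[0], lv[1])
    for name, lv in levels.items()
}
```

In this toy response only light and sucrose have non-zero main effects, which mirrors the kind of conclusion a screen is meant to deliver: it identifies the few factors worth carrying forward into a more detailed response surface experiment.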

Following the optimisation steps we compared the ability of the different plant-based expression platforms described above to produce geraniol. The highest yield of geraniol was produced by intact transgenic plants cultured in vitro (48 μg/g fw), followed by the transient expression system (27 μg/g fw), transgenic plants under hydroponic conditions in the greenhouse and cell suspension cultures (16 μg/g fw), and finally hairy root cultures (9 μg/g fw). Differences in biomass production and the duration of cultivation resulted in a spectrum of geraniol productivities. Tobacco cell suspension cultures achieved a geraniol production rate of 1.8 μg per gram fresh biomass per day, whereas transient expression produced 5.9 μg per gram fresh biomass per day (if cultivation prior to agroinfiltration was ignored) or 0.5 μg per gram fresh biomass per day (if cultivation prior to agroinfiltration was included). The superior productivity, strict process control, and simple handling procedures available for transgenic cell suspension cultures, and their compliance with GMP guidelines, suggest that cells cultivated under defined and controlled conditions in bioreactors are the most promising system for optimisation and ultimately the scaled-up production of geraniol.

Geraniol production using intact plants in the greenhouse was easily scaled up by increasing the number of plants growing in parallel. We demonstrated the scalability of geraniol-producing tobacco plants grown in soil or hydroponic systems such as the nutrient film technology. After cultivation, the product was extracted from harvested leaves. An aeroponic system was also established using spraying to provide the culture medium. The major advantage of this system was that the product can be extracted by physical or chemical permeabilisation without destroying the roots. This so-called “milking” procedure allows the semi-continuous harvesting of target compounds. However, it was not possible to extract significant amounts of geraniol from aeroponically grown tobacco plants because geraniol accumulated at low levels in the roots of the transgenic plants.

Hairy roots and cell suspension cultures were scaled-up using different bioreactor systems, including stirred-tank steel bioreactors as well as disposable wave-mixed and orbitally-shaken bioreactors. Importantly, it was possible to scale up from small volumes (e.g. Erlenmeyer flasks) to 20-L bioreactors without a significant reduction in biomass and geraniol productivity. Tobacco cell suspension cultures produced 160 g cell biomass in 2-L wave-mixed disposable bag reactors, and 26 µg geraniol per gram fresh weight. The cells were cultivated in batch mode for 16 days. Similarly, there was no significant difference in biomass growth when tobacco hairy roots were cultivated in batch mode in 2-L wave-mixed or orbitally-shaken bags. The final total hairy root biomass was 29 g fw (1.6 g dw) in the wave-mixed bioreactor and 25 g fw (1.4 g dw) in the orbitally-shaken bag. The total geraniol yield per cultivation bag was 160–170 μg/g dw in both bioreactors. When the roots were cultivated in feeding mode, there was a positive influence on biomass propagation. The repeated addition of fresh medium resulted in 56% more dry biomass compared to batch cultivation and delivered an average total geraniol yield of 165 μg/g dw per bag. This was similar to batch mode, but because more biomass was generated in feeding mode the total geraniol yield was 1.6 times higher. Scale-up experiments achieved the successful transfer of the procedure established at the 2-L scale. Cultivation in wave-mixed 20-L bags resulted in a final total biomass of 185.3 g fw (10.7 g dw). The total geraniol concentration reached 204 μg/g dw and therefore milligram quantities of geraniol can be produced in one cultivation bag.
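
The totals implied by these figures can be checked with a few lines of arithmetic. The sketch below only restates numbers from the paragraph above. The helper names are illustrative, and the assumption that the geraniol content per gram dry weight is the same in batch and feeding mode follows the statement that the two were similar.

```python
def total_geraniol_mg(dry_biomass_g, content_ug_per_g_dw):
    """Total geraniol per bag in mg from dry biomass (g) and content (ug/g dw)."""
    return dry_biomass_g * content_ug_per_g_dw / 1000.0


def relative_total_yield(biomass_gain_fraction, content_ratio=1.0):
    """Total yield relative to batch mode for a given gain in dry biomass."""
    return (1.0 + biomass_gain_fraction) * content_ratio


# 20-L wave-mixed bag: 10.7 g dw at 204 ug/g dw.
bag_20l_mg = total_geraniol_mg(10.7, 204)

# Feeding mode: 56% more dry biomass at a similar geraniol content.
fed_vs_batch = relative_total_yield(0.56)
```

The 20-L bag works out to roughly 2.2 mg of geraniol, consistent with the statement that milligram quantities are obtained per bag, and the feeding-mode ratio of about 1.56 matches the reported 1.6-fold increase.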

A pilot scale downstream process was established for geraniol produced in tobacco cell suspension cultures (5 kg cell material) based on steam distillation and subsequent FLASH chromatography. Cells were disrupted by ultrasonication or by using a Polytron homogenizer, and enzymatic deglycosylation was performed in a rotary evaporator with a large-scale (20 liters) evaporation flask. The distilled fractions containing geraniol were extracted in a separation funnel using pentane and the concentrated organic fraction was fractionated on a FLASH column. After all purification steps the recovery of geraniol was ~80% with a purity of >99%.

WP7. Demonstration of plants as green factories

Objectives:

• To demonstrate the ability and efficiency of optimised plant-based systems and procedures to manufacture sufficient amounts of selected terpenoids

Progress:

The principal aim in WP7 was to demonstrate the suitability of the optimised plant cell-based system established in WP6 for the economically feasible production of selected terpenoids. This was achieved by determining the most cost-effective system for the production of 1 g geraniol.

Based on the work carried out earlier in the project, we focused on the three most promising systems:

• Tobacco (Nicotiana tabacum cv. Petit Havana SR1) hairy roots stably transformed with the construct p-GES 150210/25 (hereafter “tobacco hairy roots”)
• Tobacco (Nicotiana tabacum cv. Samsun NN) cells stably transformed with the construct VoGES (hereafter “VoGES suspension cells”)
• Taxus spp. grown under aeroponic conditions

Tobacco hairy roots delivering up to 165 µg/g dw of geraniol were grown in three different single-use bioreactors (orbitally-shaken 2-D and 3-D bags, and the wave-mixed BIOSTAT CultiBag RM basic). The most promising results at the bench-top scale were achieved using 2-L and 10-L CultiBags. Similar biomass growth and geraniol productivity was observed in 28-day fed-batch cultivations. However, because of the root morphology, transfer to pilot-scale production (20-L and 50-L CultiBags) required the development of a “bag-to-bag inoculation strategy”. The 50-L CultiBag produced ~1.2 kg of fresh biomass and 9.4 mg of geraniol over an 8-week cultivation period. Thus 117 kg of fresh biomass would be required to produce 1 g of geraniol in this system. The orbitally-shaken single-use systems (20-L CultiBag RM basic / Infors Multitron and the SB50-X from Adolf Kühner AG) had lower productivity. They produced 49% of the biomass and 46% of the geraniol compared to the wave-mixed BIOSTAT CultiBag RM.

VoGES suspension cells delivering up to 212 µg/g dw geraniol were grown in the BIOSTAT RM (prototype for light cultures) with a CultiBag RM 20 L basic, and the ATMI Integrity Wandmixer 20 L (tumbling system from ATMI Life Sciences). Illumination (day-night cycle) was ensured by internal LEDs (BIOSTAT CultiBag RM prototype) or external fluorescent tubes (Wandmixer). The wave-mixed 20-L CultiBag achieved a final biomass fresh weight of 274 g/L and 33 mg of geraniol after 16 days, whereas the tumbling ATMI Integrity Wandmixer achieved 183 g/L (19 mg geraniol). Thus, in the optimal system, a 600-L bioreactor would be needed to produce 1 g of geraniol.
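
The 600-L estimate can be reproduced with a simple proportional calculation. The sketch below assumes that the geraniol output scales linearly with the nominal bag volume, which is an assumption of this illustration rather than something the measurements above demonstrate. The function name is also illustrative.

```python
def volume_needed_l(bag_volume_l, geraniol_mg, target_mg=1000.0):
    """Bioreactor volume needed for target_mg, assuming linear scaling with volume."""
    if geraniol_mg <= 0:
        raise ValueError("geraniol_mg must be positive")
    return bag_volume_l * target_mg / geraniol_mg


# Wave-mixed 20-L CultiBag: 33 mg geraniol per run.
wave_mixed_l = volume_needed_l(20, 33)

# Tumbling 20-L Wandmixer: 19 mg geraniol per run.
wandmixer_l = volume_needed_l(20, 19)
```

Under this assumption the wave-mixed system needs a little over 600 L for 1 g of geraniol, in line with the figure quoted above, whereas the tumbling system would need more than 1,000 L.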

It was not possible to establish aeroponic cultures from geraniol-producing tobacco plants during the project so we initiated the same process for Taxus spp. plants producing the complex terpenoid paclitaxel. First, different Taxus species were screened for their production profile and different culture media were evaluated in order to optimise paclitaxel production in the roots. Multiplication through cuttings yielded more than 1000 Taxus spp. plants which were grown under aeroponic conditions over an area of 16 m². The roots were harvested twice a year, with a total fresh biomass productivity of 0.98 kg/m²/year. The paclitaxel content in the dried roots was dependent on the harvest season (0.03% in October and 0.17% in April). The aeroponic culture of Taxus spp. plants resulted in a paclitaxel productivity of ~150 mg/m²/year.

The VoGES cell suspension culture was clearly superior to the other platforms in terms of scalability and production efficiency. Therefore, a generic process for the manufacture of 1 g of geraniol was developed for the VoGES suspension cell line as follows:

• Initial inoculum production in shake flasks
• Second inoculum production in a wave-mixed BIOSTAT CultiBag RM 20/50 (CultiBag RM 20 L) with LEDs
• Fed-batch geraniol production in a BIOSTAT CultiBag RM 600 (CultiBag RM 600 L) with LEDs

The production of 1 g of geraniol would be feasible within 41 days (89 kg fresh cell suspension biomass with a geraniol content of 12 mg/kg fw), taking into account all three of the production steps listed above.

We have successfully established a proof-of-concept for large scale production of geraniol. The industrial commercialisation of a specific product made in plant cells is dependent on the value of the product, and therefore cannot be generalised. With commercialisation as a target, the sale value of the product determines the medium and long term needs in terms of optimising the production process and competitive positioning in the market. All the tools we have developed for pilot-scale production of geraniol can now be applied with suitable modifications to more high-value plant-based products.

WP8. Storage of data and resources

Objectives:

• To ensure the outputs of the project have a long-lasting impact and facilitate future metabolic engineering projects
• To create data and gene banks, a compound library and a culture collection for the most important cell lines generated during the project

Progress:

To ensure the long-lasting impact of SmartCell, it is essential to preserve and facilitate access to all the valuable resources developed in the course of the project. We therefore focused on the structuring, storage and management of all the information generated during the course of the project. This included designing and establishing the following four resources:

• An ‘omics’ data bank
• A physical gene bank
• A compound library, and
• A culture collection

The ‘omics’ data storage system was developed to store the transcript sequences, transcriptome and proteome data, and metabolomic profiles generated during the project. This can support the generation of metabolic pathway maps by providing the experimental data in a suitable format for further mining, thus offering a sound technological basis for future experiments. A ‘Data Storage and Curation’ procedure and a detailed technical guideline, an ‘Outline for Pathway Reconstruction’, and a tentative web interface design were generated and distributed among the partners.

The pre-processed data were generally stored as spreadsheet (xls) files with fixed attribute names. The data were further divided into separate folders corresponding to different participating institutions. This folder structure, along with an automated back-up system, prevented the accidental deletion of data, whereas storing the data in two formats, in the original spreadsheet and in an SQL database, provided maximal flexibility. The network drive can store any data, including raw source input and free-format documentation, whereas a rigorous database structure was required for the practical combination and mining of experimental data.

One of the major achievements of the project was the establishment and release of an expression and sequence databank containing the SmartCell deep sequencing data (www.cathacyc.org). These data needed to be available to all partners in the first instance (during the project) and have now been released publicly. The data are based on the RNA-Seq analysis of C. roseus suspension cells and seedlings treated with methyl jasmonate plus appropriate controls, resulting in ~58 × 10⁹ bases of sequence data. By combining and joining all these sequences, an exhaustive reference set of 31,450 unigenes was generated, mainly comprising full-length transcripts representing the entire C. roseus transcriptome. The comparison of our sequence set with de novo assemblies from a subset of the publicly available Medicinal Plant Genomics Resource (MPGR) (http://medicinalplantgenomics.msu.edu), comprising RNA-Seq data from more than 20 different tissues of C. roseus plants grown under standard conditions, did not yield any longer transcripts.

The high-quality unigene set was used to predict open reading frames, although the presence of indels in the assemblies of some transcripts occasionally resulted in more than one assignment per transcript. We assigned a functional description to each transcript thus transferring associations from plant reference genomes to the query C. roseus transcriptome.

We also built ‘fake chromosomes’ to facilitate storage and analysis in the ORCAE database by concatenating the set of 31,450 transcripts, joined by spacers of 2,000 N characters. This resulted in seven chromosomes, the first six of which contained 5,000 transcripts each. This platform, accessible at http://bioinformatics.psb.ugent.be/orca, allows the predicted open reading frames to be edited, curated, annotated and associated with further analysis such as protein domains, BLAST alignments and expression data, depicted as bar diagrams. As well as displaying gene-related data, the database offers BLAST and text search options. We also built the C. roseus RNA-Seq atlas in ORCAE, to visualise the expression profiles of C. roseus genes as bar diagrams. The current public version of the C. roseus RNA-Seq atlas holds the expression data from the MPGR consortium as well as the SmartCell RNA-Seq expression data.
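
The construction of such pseudo-chromosomes can be sketched as follows. The chunk size of 5,000 transcripts and the spacer of 2,000 N characters come from the description above, whereas the function name, the input format (a list of identifier and sequence pairs) and the coordinate convention are assumptions made for this illustration.

```python
def build_pseudo_chromosomes(transcripts, per_chromosome=5000, spacer_len=2000):
    """Concatenate transcripts into pseudo-chromosomes joined by runs of N.

    transcripts is a list of (transcript_id, sequence) pairs. Returns a list of
    (sequence, features) pairs, where features holds (transcript_id, start, end)
    coordinates (0-based, end-exclusive) within the pseudo-chromosome.
    """
    spacer = "N" * spacer_len
    chromosomes = []
    for first in range(0, len(transcripts), per_chromosome):
        chunk = transcripts[first:first + per_chromosome]
        features, offset = [], 0
        for name, seq in chunk:
            features.append((name, offset, offset + len(seq)))
            # The next transcript starts after this one plus one spacer.
            offset += len(seq) + spacer_len
        chromosomes.append((spacer.join(seq for _, seq in chunk), features))
    return chromosomes
```

With 31,450 transcripts and 5,000 per chunk, this yields six full pseudo-chromosomes and a seventh holding the remaining 1,450 transcripts. The recorded coordinates allow each transcript to be located again as a feature on its pseudo-chromosome.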

A detailed metabolic pathway database (CathaCyc) was also built, based on the SmartCell and public C. roseus RNA-Seq data sets. CathaCyc (version 1.0) contains 390 pathways with 1347 assigned enzymes and spans primary and secondary metabolism. The pathways are linked with the synthesis of monoterpenoid indole alkaloids and triterpenoids, their primary metabolic precursors and their elicitors, the jasmonate hormones. CathaCyc offers a range of tools for the visualisation and analysis of C. roseus sequences, metabolic networks and ‘omics’ data to the broad scientific community.

One of the original tasks was the construction of a gene library to store the gene sequences identified and cloned in the SmartCell project. To install such a library and ensure its efficient management, the legal structure and the storage system for the gene bank were successfully drafted before the project commenced. The idea of the legal structure was that the collection of constructs generated during the project would be maintained by VIB to distribute the vectors for non-commercial research purposes. As a legal framework, we will work with two separate material transfer agreements, one giving VIB the right to distribute the constructs and another executed between VIB and the party requesting the constructs.

A database has been created on the SmartCell staff website, initially comprising a single table containing information on SmartCell constructs with external links to further resources on the genes and vectors that would in the longer term become publicly accessible, and eventually expanded to comprise multiple pages for ease of navigation. However, a public gene bank with physically stored and publicly available clones has not yet been constructed because none of the genes cloned and analysed during the project has been published or patented, but this will be addressed in the future.

A compound library of 12 metabolites was built and served as a source of standards. The terpenoid standard compound library was created because many of the intermediate compounds in the terpenoid indole alkaloid pathway are not commercially available. The compounds were, however, important for the functional analysis of the candidate genes as well as in the metabolomics experiments for identification of the gene products. All compounds from the putative secoiridoid pathway have been gathered in the compound library. Loganic acid and loganin were obtained from Extrasynthese whereas geraniol and secologanin were obtained from Sigma-Aldrich. We prepared 8-hydroxygeraniol, 8-hydroxygeranial, 8-oxogeraniol and 8-oxogeranial from commercially available geranyl acetate (Sigma-Aldrich) according to a protocol developed by Mark Overhand from the Organic Synthesis department at the Leiden Institute of Chemistry. Iridodial, 9-hydroxyiridodial, iridotrial and 7-deoxyloganetic acid were synthesised as the chemically more stable glucosides from the natural compound aucubin prepared from extracts of Aucuba japonica leaves. The deglucosylated pathway intermediates were prepared prior to experiments by deglycosylation of the glucosides with commercial almond glucosidase (Sigma-Aldrich). All compounds are stored and distributed by Leiden University.

A number of cell, tissue culture and in vitro plant lines were generated during the project, many of which are unique because they carry specific genes identified during our screening experiments. The establishment of cell cultures is laborious and time-consuming and only a few methods have been described for the long-term storage of such resources. Our aim was to create a set of culture collection and cryopreservation standard operating procedures for the most important transgenic cell lines that were generated during the project.

The maintenance of cell culture banks using solid or liquid media is laborious and time-consuming, because the large numbers of cultures generated in the project have unique growth requirements, culture media and sub-culturing intervals. Long-term maintenance also results in ‘drifting’ caused by genetic/epigenetic modifications and somaclonal variation, resulting in changes to the original culture properties. However, storing cells in ultra-low temperature environments (cryopreservation) ensures they maintain their original properties and are ready to be used in subsequent projects. Many of the cell and tissue cultures generated in SmartCell are relevant to future industry partners. Therefore, in order to avoid the risks involved in routine maintenance, cryopreservation methods were developed and a database for plant collection was created at the VTT Biological Resource Centre (http://culturecollection.vtt.fi).

Successful cryopreservation methods were established for tobacco BY-2 cell suspension cultures and SR1 Petit Havana hairy root cultures, but not for C. roseus cell suspension cultures and hairy roots. The cryopreservation of plant cells still relies on fully empirical testing, which makes the development and optimisation of methods laborious and time-consuming. Therefore more studies are required to establish reproducible cryopreservation methods for C. roseus resources.

WP9. Technology transfer and dissemination

Objectives:

• Ascertain compliance with regulations governing the use of the production platforms developed in SmartCell and develop standard operating procedures for the use of transgenic plants or plant cells for production of secondary metabolites
• Intellectual property protection and exploitation of research results
• Dissemination to the scientific community, the public, and policy makers
• Training of the next generation of scientists specialising in metabolic engineering

Progress:

The transfer of our optimised metabolic engineering platforms to industry requires the development of standard operating procedures for all production and processing steps. Although standard operating procedures can only be formalised when a final product or process has been established, we were able to develop SOPs based on the available plant material (tobacco plants, hairy roots and cells producing geraniol) and therefore delivered appropriate procedures for their upstream cultivation as well as downstream processing for the extraction and purification of metabolites such as geraniol. We also assessed compliance with regulatory aspects that could affect the industrial application of SmartCell technologies. All the processes we developed were carried out in containment (greenhouses or bioreactors) so we concluded that no additional regulatory issues were pertinent.

The training program developed during the project involved the on-site training of several young researchers (MSc, PhD students and postdoctoral fellows). The management committee also organised numerous training courses and workshops covering different aspects of the SmartCell project, including metabolomics, single-use bioreactor technologies, statistical design of experiments, cell expansion and protein expression, and the requirements for compliance with good manufacturing practices.

The dissemination activities under WP9 are discussed in detail in the section "Main dissemination activities".

The exploitation activities under WP9 are discussed in detail in the section "Exploitation of results".

Potential Impact:

Impact

The SmartCell project was designed as a top-down approach to develop novel strategies for the rational engineering of valuable secondary metabolites, focusing on the terpenoid pathway. These compounds are generally too complex for complete chemical synthesis yet they are produced naturally in such low quantities that extraction from natural sources is also, in most cases, economically unfeasible. SmartCell therefore focused on metabolic engineering using plants and plant cell culture-based systems to achieve the production of key terpenoids, and in turn this required the in-depth analysis of the corresponding pathway in terms of the genes, enzymes and intermediates, and the complex regulatory processes including transcriptional and translational control, the regulation of mRNA and protein turnover, the regulation of enzyme activity, and the compartmentalization of enzymes and intermediates.

The major scientific and technological impacts of the project have therefore been to identify additional key steps in the terpenoid pathway through the deep sequencing and systematic screening of gene libraries to identify sequences corresponding to specific enzyme activities, and to introduce these steps into heterologous plants and plant-based systems that are scalable and easy to cultivate. One of the major breakthroughs was the discovery of the four missing genes in the early terpenoid indole alkaloid pathway up to secologanin, an important intermediate towards the more complex dimeric alkaloids with clinical use. In the course of the project we developed several platforms that produce the major terpenoids geraniol and 8-hydroxygeraniol, which are important flavour/aroma products in their own right and also precursors for more complex downstream terpenoids that could be produced by semisynthesis in lieu of more complex metabolic engineering concepts. The project also resulted in major innovations that will have an important impact on future R&D programs because the methods developed can be used to streamline other metabolic engineering strategies. These include our combined transcriptomics/metabolomics platform for the rapid identification of enzyme activities based on metabolic profiles, novel bioinformatics methods for sequence analysis and structure/function screening, and novel methods for multigene transformation, which is required for the introduction of complex pathways into heterologous plants.

In addition to these direct impacts it is important to recognize the potential wider impact of the project on industry, health and the environment. We focused on the transfer of metabolic capabilities to highly scalable platforms such as transgenic plants, hairy roots and cell suspension cultures and have provided proof-of-concept data showing that such platforms can be scaled up without losing productivity, providing an effective new source of compounds such as geraniol. By extending this process to other terpenoids, and indeed to other metabolic pathways, it will be possible to produce such compounds in large amounts at minimal cost once a productive expression platform has been developed. In turn this will reduce the use of environmentally-harmful processes such as total chemical synthesis or extraction from rare and endangered plant species. SmartCell therefore offers a defined path for the conversion of basic scientific information (the components of metabolic pathways) into applications of real value to industry, medicine and agriculture, thus validating the concepts, tools, tangible materials, resources, intellectual property and regulatory/biosafety aspects that influence the commercial manufacture of value-added products from plants.

The project will also have a major impact on the metabolic engineering research community because the gene bank, metabolomics/pathway database, compound library and cell culture collection developed during the project will be made available to the wider community of academic and industrial stakeholders. The direct involvement of companies in SmartCell will support the competitiveness of European industries, specifically those focusing on the industrial application of new technologies, processes and products related to secondary metabolism in plants.

It is reasonable to expect that SmartCell will be considered a pioneering project in the context of the application of synthetic biology to plants and plant-based systems. The toolbox of genes, enzymes and other molecular components, in conjunction with the multigene engineering systems developed during the project, constitutes an essential first step in the application of synthetic biology to plants. Thus a further impact of the project is that scientists can begin contemplating whole-system/organism engineering in plants in the future.

In terms of European competitiveness, the project served as a forum that brought together leading EU laboratories in diverse fields, thus creating a critical mass of people, expertise and resources to put in place a Consortium that is not only internationally competitive in the field of plant metabolic engineering but has also gained international prominence and recognition in the field.

It is thus abundantly clear that the SmartCell project exceeded all expectations in terms of its impact at many different levels. A real threat to realizing these impacts, however, is the draconian and anti-competitive European regulatory system for transgenic crops which regrettably operates in direct opposition to European economic growth policies, competitiveness and progress. Consequently a substantial number of publications and oral presentations at meetings by SmartCell scientists focused on this issue. We urge the Commission to also do its part to help SmartCell and other similar projects by doing its utmost to remedy these anti-competitive practices and thus actually implement its own rules and regulations in an effort to stop undermining the efforts of EU-funded R&D projects such as SmartCell and therefore restore international competitiveness and economic growth in Europe, consistent with stated EU policy for economic growth and industrial development for the betterment of EU society and preservation of the environment.

The role of the Scientific Advisory Group (SAG)

The SAG actively followed the project's achievements and provided guidance and constructive criticism throughout the entire project lifetime. The SAG attended the project meetings at least once a year. Professor Kazuki Saito (Chiba University and RIKEN institute, Japan) summarized the project after our final meeting in Crete (June 2013) in the following way:

“This SmartCell project sets a highly-challenging target on engineering of terpenoid indole alkaloid production from a variety of aspects. The approach taken in the project is a multi-disciplinary integration from fundamental plant biochemistry and molecular biology, multi-omics focusing on deep-transcriptomics and metabolomics, data basing and bioinformatics, metabolic engineering, commercialization, and outreach dissemination activities.

In the last 4.5 years, remarkable progresses have been made for identifying candidate genes involved in the pathway. From a large scale RNA seq analysis, the SmartCell Consortium could successfully mine a number of genes, which are presumably involved in the pathway as those encoding pathway enzymes, transporters and transcription factors. The function of some of them has been experimentally characterized by the sophisticated assay by pathway reconstruction in vitro and in vivo transient systems. This success of gene identification led to the proposal of unambiguous biosynthetic pathway of secologanin, which is one of the key compounds of the pathway expanding towards a variety of high-value plant products. Multigene stable transformation in planta is also nicely going on for the reconstruction of the pathway. These achievements should be of the great promises not only for scientific community in the world but also for the direct application in industry, which may produce the high-value plant products for our future lives based on the Consortium’s findings.

Regarding the development of human resources, the members of this Consortium should definitely take the top-world-wide-initiative of the progress of this field in the future. I would also encourage younger colleagues of the Consortium to develop their career based on an excellent experience they have taken through this project. Such an excellent experience is not always the case for all young students and post-docs. Those young colleagues should be the leaders of next generation in plant metabolic biology, which takes the great promise in the future of our society.

In the last, I have a little concern on following up the project – this Consortium now possesses enormously valuable research resources, i.e. chemical materials, vectors, recombinant proteins, transgenic plants and cells, transcriptome/proteome/metabolome datasets and bioinformatics tools/databases, which can potentially enhance the progress of the research of this field in academia and industry, if those could be freely distributed. However, the question is how they can sustain and extend these valuable resources to advancement of the community not only inside the Consortium but also the wide-range of researchers in the world. This point should be considered by an effort getting an additional grant or a resource center. The European Committee should also seriously consider this issue to follow-up the project, which has been successfully terminated as this case.

As a conclusion, the members' jobs have been finely adjusted by an admirable effort of the project leader, Dr. Kirsi-Marja Oksman-Caldentey, who exhibited an extremely strong leadership. The project has been excellently stayed on its target and hence has made a remarkable progress with a strong enthusiasm of the all participating members. A great advancements in this field is now made with a strong impact to the international scientific community at the end of the project after the last 4.5 years.”

Main dissemination activities

The principal dissemination activities carried out during the project include the publication of peer-reviewed original research papers, reviews and book chapters, the maintenance of the project website, the publication of popular articles and the presentation of data at meetings and congresses. The project has so far yielded 89 published peer-reviewed articles with six more submitted, as well as 14 book chapters. Members of the SmartCell Consortium have been invited to give 131 oral presentations and have displayed 33 project-related posters. The project has thus far resulted in six PhD theses and several Master's and Bachelor's theses. The Consortium participants have also published articles in popular magazines aimed at the wider public and policy makers and have participated in TV and radio interviews as well as other media events. Two leaflets have been published, one outlining the aims and objectives of the project (first year) and the other summarising its achievements at the end of the project (delayed to accommodate work carried out during the no-cost extension period).

The SmartCell project website http://www.smart-cell.org/ has been used as a major tool for the archiving and sharing of project resources and documentation (staff portal, password protected) and for the dissemination of information about the project to the scientific community and general public. The website has been regularly updated with important project information, including publications, standard operating procedures, presentations from project meetings and project reports (full reports available on the restricted-access part of the site, and publishable summaries on the home page). The SmartCell website will continue to run for at least two years after the project has ended and will eventually be archived and integrated with upcoming projects.

Exploitation of results

An Exploitation Committee representing the academic and industry partners met regularly (at least every 6 months) to discuss exploitable results from the project. Several inventions were considered as potential foreground IP, including the candidate genes representing enzymes, transcription factors and transporters in the terpenoid pathway. We decided that these genes were not suitable for patent applications due to the limited protection that would be secured with the available data. However, the project generated valuable data, know-how and enabling technologies for metabolic engineering in plants, including the databases, bioinformatics algorithms and analytical platforms discussed above. The project also yielded a searchable database of published patents and patent applications in the field of secondary metabolic engineering in plant cells and whole plants. This database was compiled from information in the publicly accessible EPO, USPTO and JPO patent databases as well as the CAMBIA IP database. The project database is accessible to all partners via the staff website and is a valuable starting point for freedom-to-operate analysis.

List of Websites:

The URL of the project website is http://www.smart-cell.org/

Principal contact for scientific issues:
Dr. Kirsi-Marja Oksman-Caldentey
Head of Industrial Biotechnology
PO Box 1000
FI-02044 VTT
Finland
Tel. +358 20 722 4459
Fax +358 20 722 7071
kirsi-marja.oksman@vtt.fi

Principal contact for administrative issues:
Ms. Riitta Kervinen
EU project administration, Biotechnology
PO Box 1000
FI-02044 VTT
Finland
Tel. +358 20 722 5206
Fax +358 20 722 7071
riitta.kervinen@vtt.fi