CORDIS - EU research results

Systematic screening for novel hydrolases from hot environments

Final Report Summary - HOTZYME (Systematic screening for novel hydrolases from hot environments)

Executive Summary:
4.1.1. An executive summary (not exceeding 1 page)
Thermostable hydrolases are needed in a wide range of industrial applications including paper/pulp processing, bioethanol production and textile processing. However, only a very limited number of these enzymes are currently used in industry because their performance is generally sub-optimal. The HotZyme project (systematic screening for novel hydrolases from hot environments), funded by the EU FP7 program, aimed to identify and characterize a new generation of thermostable hydrolases (glycoside hydrolases, lipases and proteases) from hot springs worldwide.
The HotZyme consortium is composed of 12 partners from Europe and 1 from the USA, including 8 universities or public research institutions, 3 SMEs and 2 large industrial partners. The industrial partners, Novozymes (NZ) and Sigma-Aldrich (SIAL), played a very important role in the development of the HotZyme project, as exemplified by the following achievements. A novel endo-glucanase was discovered by NZ from a new thermophilic Planctomycete organism isolated by partner INMI. This cellulase is very distantly related to the glycoside hydrolase family 5 (GH5) (www. and the distance is large enough to create a GH family of its own! The cellulase exhibits exceptionally high thermostability and strong enzyme activity. Based on its novelty and its features of interest for industrial applications, NZ filed a patent application. A novel thermostable protease was cloned from a new isolate of the archaeal genus Thermococcus (isolated by partner INMI and sequenced by partner UCPH). Expression optimization (NZ) allowed purification of the protease at sufficiently high levels for industrial trials. Its interesting characteristics have prompted the drafting of a patent application. SIAL synthesized and commercialized a number of substrates for enzyme screening and characterization during the HotZyme project.
With major contributions from the academic partners and the SMEs, the HotZyme team obtained hundreds of environmental samples from hot springs worldwide; made thousands of enrichment cultures on a set of polymeric substrates of industrial interest; obtained a few dozen isolates capable of degrading polymeric substrates; sequenced 15 hot spring metagenomes, about 10 isolate genomes and a few transcriptomes; developed new bioinformatic platforms for handling the large volume of sequence data; and identified hundreds of potentially new hydrolases using in silico and/or functional screening methods, more than a dozen of which were selected for detailed biochemical and structural analyses. This resulted in 8 high-resolution crystal structures, with more being finished. Based on the results, the HotZyme team has published about 10 papers, and many more are being prepared and submitted.
The HotZyme team has been very active in disseminating the work. During the project, 16 newsletters were released, and two training sessions and one symposium were held. Moreover, the partners presented the project at conferences, symposia, workshops and other public events, and in popular scientific journals.
In conclusion, the HotZyme project performed very well and proved successful.

Project Context and Objectives:
4.1.2. A summary description of project context and objectives (not exceeding 4 pages).
Enzymes are used in a wide range of applications and industries, including detergents, food applications, agriculture/feed, textile processing, paper/pulp processing, etc. Many of these processes run at high temperatures, where thermostable hydrolases are needed. However, only a very limited number of these enzymes are used in industry, mainly because of the sub-optimal performance of the currently available enzymes.
Without the availability of highly thermostable hydrolases, some industrial processes need repeated heating and cooling. In some of the processes mentioned below, the materials and water are heated to near boiling temperatures followed by enzymatic treatment. In order to maintain the activity of the relevant enzymes, the materials and water need to be cooled down before adding the enzymes. This cooling and reheating process may need repeating when multiple steps and different treatment processes are involved. This leads to enormous waste of time and energy and the associated cost is huge.
Thermostable enzymes are used in a number of different industrial processes. In paper and pulp bleaching, there is a need to run enzymatic bleaching at ever higher temperatures, and the limiting factor is the lack of heat-stable xylanases capable of working above 80 °C or even at 90 to 100 °C. Lipases or cutinases are used for pitch (sap) removal during paper manufacture, and the present product is stable at 85-90 °C. New thermostable enzymes that are active and stable at 100 °C or higher would enable much more energy-efficient industrial processes.
In off-shore oil drilling, there is a great need for enzymes capable of in situ viscosity reduction in fracturing fluids and for filter cake removal. Presently, fracturing fluids typically contain either galactomannan or carboxymethylcellulose (CMC). There is a clear need for both mannanases and cellulases that are active well above 80 °C, the temperature at which present enzyme products are active. Enzymes active at 100 to 150 °C and at pressures of over 10 atm will be needed for off-shore oil applications.
High-temperature starch-degrading enzymes will enable simultaneous liquefaction and saccharification. Such an approach would result in substantial cost and time savings for the starch industry. More importantly, the application of thermostable hydrolases in some critical biotechnological fields (e.g. production of biofuel from biomass) is still at an early stage of development.
As can be seen from the examples above, new thermostable hydrolases with effective performance and/or novel functionalities would provide huge savings in time, money and energy.
Although thermostable hydrolases have been known for many years, the related research and applications have been limited to cultivated thermophilic microorganisms. Since most microorganisms (>99%) cannot easily be cultivated, many potentially active enzymes have never been characterized. This is particularly true for thermostable enzymes, since the number of isolated and characterized (hyper)thermophiles is very small. Therefore, the diversity of thermophiles and their encoded enzymes remains largely unexplored.
Metagenomics has great potential for assessing biodiversity and for enzyme discovery. This technology has been applied mainly to soil and marine water samples, where it has revealed an enormous biological and molecular diversity. To date, however, very little work has been done on hot terrestrial environments, mainly because such environments are difficult to access and have a relatively low concentration of biomass. Another untapped rich source of potential novel enzymes is the (hyper)thermophilic viruses, which differ significantly from viruses of Bacteria and Eukarya in genome content and in viral morphology. Most open reading frames encoded in the genomes of hyperthermophilic viruses have unique sequences which share no homology with any gene in public databases (Prangishvili et al., 2005). Application of metagenomics to (hyper)thermophilic viral communities is therefore expected to reveal enormous diversity and provide a rich gene repertoire for novel enzymes. For example, it is likely that some of the virus-encoded enzymes are capable of changing or solubilising the unique archaeal membranes or cell walls during virus life cycles.
The HotZyme project aimed to address the aforementioned problems and to investigate global biodiversity by systematically screening for a new generation of (hyper)thermostable hydrolases from hot terrestrial environments. The major objectives of the HotZyme project included:
• Identification of novel genes or enzymatic functionalities in hot environments on earth. Hot springs in China, USA, Russia, Italy, Norway and Iceland were the primary targets of the HotZyme project.
• Developing innovative bioinformatic techniques for metagenomic data analysis and handling and for high-throughput prediction of protein functions.
• Establishing a “hot” metagenomic database of putative thermostable enzymes predicted by the newly developed bioinformatics tools. The database will be accessible to the public after the project is finished.
• Screening for and obtaining novel thermostable hydrolases from metagenomes of (hyper)thermophiles and their viruses. Enzyme targets include thermostable glycosidases, lipases, starch degrading enzymes and proteases. Novel enzymes include those with:
i) enzyme activities showing properties atypical of the proteins they are closely related to;
ii) less than 50% amino acid sequence identity to other proteins in the public database;
iii) 3D structures which are substantially different to those already described.

The HotZyme project directly addressed all aspects of the call KBBE-2010-3-5-04. The selected ecosystems were hot terrestrial environments. Because the reported success rate of activity-based metagenomic screening is generally very low, both functional and sequence-based screening approaches were considered. Advanced bioinformatic tools were to be developed and applied to the metagenomic data to assess the biodiversity in such ecosystems and to perform high-throughput enzyme prediction. A combination of innovative high-throughput activity screening and sequence-based screening was to be performed to obtain a new generation of thermostable hydrolases.
A frequently encountered problem during recombinant protein expression is the insolubility of the protein. The consortium employed a large and highly diverse set of expression systems for recombinant protein expression, covering all three domains of life and including Escherichia coli, Bacillus, the hyperthermophilic archaeon Sulfolobus, and various eukaryotic expression systems. This greatly facilitated enzyme characterization after screening.

The HotZyme partners were very carefully selected to compose a highly competitive and interdisciplinary consortium. Strong expertise in Microbiology, Molecular Biology, Biochemistry, Biophysics, Geochemistry, Nanotechnology and Bioinformatics was ensured in the consortium to fulfil all the proposed tasks. In total, 13 partners were present in the consortium, including 12 European partners and one American partner. To complement the basic research activities of the universities, the HotZyme consortium included 3 SMEs and 2 large industrial partners (Novozymes and Sigma-Aldrich). This ensured that the industrial interests of the project were represented and executed.

Project Results:
4.1.1. A description of the main S&T results/foregrounds (not exceeding 25 pages)

Environmental sampling of hot spring biodiversity

More than three hundred samples were obtained from natural thermal environments located worldwide: terrestrial hot springs of Iceland, Italy, China, Yellowstone National Park (USA), Kamchatka, Kuril Islands and the Lake Baikal area (Russia), and deep subsurface biosphere (Western Siberia, Norwegian Sea, Troll, Barents Sea, Spitzbergen). The temperature of the samples ranged from 40 to 151 °C, while pH varied from 2.0 to 10.5. The samples comprised water, sediments and microbial biofilms and were used for DNA isolation and for enrichment and isolation of thermophilic microorganisms with hydrolytic activities.

In situ enrichment of thermophilic microorganisms with hydrolytic activities
In situ enrichments with diverse biopolymeric substrates of interest were set up in the hot springs of Kamchatka, the Kurils and Iceland. Substrates were cellulose (MCC, CMC, leaves of corn and bamboo), xylan, starch, alpha- and beta-keratins, xanthan gum, polyester and PVA. The composition of the microbial communities developing in primary in situ enrichments was studied by PCR-DGGE of 16S rRNA genes. Those showing visible degradation of insoluble substrates and/or containing new phylogenetic groups of thermophilic microorganisms were used for further characterization and isolation work.

High-throughput enrichment and isolation of new thermophilic microorganisms with hydrolytic activities
Hundreds of crude environmental samples and in situ enrichments were cultured in the lab with media containing different polymeric substrates of industrial interest. The substrates included cellulose, xylan, lignin, starch, chitin, bamboo leaves, alpha- and beta-keratins, xanthan gum, lichenan, agarose, polyester, and PVA. Enrichment conditions were either aerobic or anaerobic, the latter without an electron acceptor or with ferric iron, sulfur, sulfate or arsenate as the electron acceptor, and the temperature and pH were adjusted to be close to those found at the sampling sites. The combination of different environmental samples, substrates and incubation conditions resulted in a huge number of enrichment cultures, which required an enormous workforce (Master and PhD students, post-docs and lab technicians). Although more than one thousand enrichments were set up, fewer than 10% of these survived three transfers, and only these were subjected to DNA isolation, 16S rRNA gene sequencing and further isolation of pure strains.
From the enrichments that survived three or more transfers, we isolated pure strains, which are listed in Table 1.
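The enrichment set-up described above is essentially a cross product of samples, substrates and incubation conditions. The following Python sketch is only an illustration of how quickly that product grows: the substrate and condition lists are taken from the text, but the sample count `N_SAMPLES` is a hypothetical value, and the report does not state that every combination was actually set up.

```python
from itertools import product

# Substrates and incubation regimes as listed in the text.
SUBSTRATES = [
    "cellulose", "xylan", "lignin", "starch", "chitin", "bamboo leaves",
    "alpha-keratin", "beta-keratin", "xanthan gum", "lichenan", "agarose",
    "polyester", "PVA",
]
CONDITIONS = [
    "aerobic",
    "anaerobic, no acceptor",
    "anaerobic, ferric iron",
    "anaerobic, sulfur",
    "anaerobic, sulfate",
    "anaerobic, arsenate",
]

# Hypothetical number of environmental samples taken into the lab
# (an assumption for illustration, not a figure from the report).
N_SAMPLES = 15

# Every substrate paired with every incubation regime for one sample.
per_sample = list(product(SUBSTRATES, CONDITIONS))
total_cultures = N_SAMPLES * len(per_sample)

print(len(per_sample), "substrate/condition combinations per sample")
print(total_cultures, "cultures for", N_SAMPLES, "samples")
```

With these lists a single sample already yields 78 combinations, so a full factorial design over as few as 15 samples would exceed one thousand cultures.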
The most significant discovery of WP1 is the isolation of a few Planctomycete strains which represent a novel genus, family and order (Fig. 1). Very interesting enzymes have been cloned from these organisms, and a patent application was filed for one of them (WP5).

DNA sequencing
We sequenced 15 cellular communities distributed across four continents and one deep-sea marine sediment (Table 2), two viral communities, and a number of isolates (Table 3). To identify a potentially novel pathway involved in xanthan gum degradation, we sequenced the transcriptomes of the Planctomycete Thermogutta grown in two different media, each in triplicate.

The most significant result is the creation of a DNA sequence database of high-temperature microorganisms and their viruses, which can be screened for targeted novel hydrolase activities.

Bioinformatics method development

We tested the most widely used assemblers for 454 and Illumina HTS data on our metagenomic and isolate/enrichment sequencing data and established appropriate protocols for the assembly of different types of sequencing data. In this way, assemblies of all metagenomic and enrichment/isolate sequencing data have been completed. For Velvet assemblies, we chose k-mer sizes between 31 and 71 for individual assemblies, which were then merged into meta-assemblies using Minimus2. 454-sequenced samples were assembled with MIRA, Celera and Newbler and merged into meta-assemblies using the same approach.
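The report does not say which criteria were used to compare the assemblers. One commonly used summary statistic is N50, the contig length at which the contigs of that length or longer contain at least half of the assembled bases. The Python sketch below shows the calculation on hypothetical contig lengths; treating N50 as the comparison metric is an assumption made here for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum past half of the
        # total assembly size defines N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (nt) from two assemblies of the same sample.
assembly_a = [5000, 3000, 2000, 500, 300, 200]
assembly_b = [2500, 2500, 2000, 2000, 1000, 1000]

print("assembly A N50:", n50(assembly_a))
print("assembly B N50:", n50(assembly_b))
```

Both example assemblies contain the same number of bases, so the difference in N50 reflects contiguity alone. In practice N50 would be read alongside other checks, since a merged meta-assembly can raise N50 while also introducing misassemblies.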

We developed the de novo gene prediction platform ANASTASIA (Automated Nucleotide Aminoacid Sequences Translational plAtform for Systemic Interpretation and Analysis), which integrates the three programs MetaGeneMark, Prodigal and MGA. Using the pipeline, de novo gene prediction was performed on all assemblies. The predicted proteins were then compared to protein databases using BLASTP and HMMER and subjected to functional classification (WP4). Predicted genes were annotated using the ANASTASIA protein sequence classification tools based on EC numbers and homology to known hydrolase domains. De novo gene prediction was done on the meta-assemblies only, which have a minimum contig length of 200 nt. Singleton reads longer than 200 nt were also included as part of the meta-assemblies, so that protein sequences from those reads would not be missed.
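The length rule in the paragraph above can be sketched in a few lines of Python. This is not the ANASTASIA code; the function names and the example records are hypothetical, and only the 200 nt cutoff comes from the text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


MIN_LENGTH_NT = 200  # cutoff quoted in the text


def select_for_gene_prediction(contigs, singletons, min_length=MIN_LENGTH_NT):
    """Keep contigs of at least min_length and singletons longer than min_length."""
    kept = [(h, s) for h, s in contigs if len(s) >= min_length]
    kept += [(h, s) for h, s in singletons if len(s) > min_length]
    return kept


# Hypothetical input: one 250 nt contig, one 120 nt contig, two singleton reads.
example_contigs = list(
    read_fasta([">contig1", "A" * 150, "C" * 100, ">contig2", "G" * 120])
)
example_singletons = [("read1", "T" * 200), ("read2", "T" * 201)]

selected = select_for_gene_prediction(example_contigs, example_singletons)
print([header for header, _ in selected])
```

Only `contig1` and `read2` pass in this example: the contig threshold is inclusive (a minimum length of 200 nt), while singleton reads must be strictly longer than 200 nt, following the wording of the text.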

SignalP and Phobius were used to detect signal peptide sequences in predicted proteins. The hidden Markov models used by SignalP show sufficient sensitivity on archaeal protein sequences, and where positive predictions are missed, homology-based sequence annotation complements the assignment of signal peptide status for a given protein sequence.

Global biodiversity of hot environments
To facilitate storage and analysis of the sequencing data generated during the HotZyme project, partner UCPH established the server “Helios” with 36 CPU cores, 256 GB RAM and 36 TB of storage capacity. The server is accessible to all partners through the ANASTASIA analytical platform under a restricted-user access model. The infrastructure has been customized to meet the computational needs of the HotZyme consortium: the appropriate analytical tools have been installed, and each group has been allocated adequate capacity for storing raw sequencing data and the results of the corresponding analyses.

The taxonomic content of the hot spring metagenomes (Table 2) was analyzed using MEGAN and compared between samples. The results were summarized and published in Menzel et al. 2015 (see publication list in Section 4.2.A). Further, the viral sequences were retrieved from the metagenomes and analyzed in more detail. A manuscript based on the results has been prepared and is nearly ready for submission.

In silico identification of novel hydrolases
Prediction of protein function was carried out using homology-based and machine learning methodologies. The homology-based analysis was performed with the BLAST and HMMER programs integrated in the ANASTASIA platform. BLAST analysis was performed against two databases: NCBI-nr and a custom database built by partner NTUA from all the annotated hydrolases in the UniProt databases. The NCBI-nr database was used for comparison in the case of hits that showed low homology to the hydrolase database. HMMER analysis was also performed against two databases: Pfam-A and a customized database built by NTUA containing Hidden Markov Models of the representative sequences of all the hydrolases in the UniProt database. The machine learning methodologies were implemented with the help of a hydrolase classifier built by NTUA, which can assign putative EC numbers to unknown sequences, and of the open source software EFICAz. Both tools were integrated in the ANASTASIA platform and became modules of the annotation workflows that were used to analyse the metagenomic sequences.
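One of the project's novelty criteria is less than 50% amino acid sequence identity to proteins in the public database. A minimal Python sketch of such a filter is given below, assuming BLAST results in the standard 12-column tabular format (`-outfmt 6`) and taking the highest-bitscore hit per query as the best hit. The function names and example rows are hypothetical, and this is not the ANASTASIA implementation.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_tsv):
    """Return the highest-bitscore hit per query from BLAST tabular text."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        query = row["qseqid"]
        score = float(row["bitscore"])
        if query not in best or score > float(best[query]["bitscore"]):
            best[query] = row
    return best


def novel_candidates(blast_tsv, max_identity=50.0):
    """List queries whose best hit is below max_identity percent identity."""
    return sorted(
        query
        for query, hit in best_hits(blast_tsv).items()
        if float(hit["pident"]) < max_identity
    )


# Hypothetical BLASTP rows for two predicted proteins.
example = "\n".join([
    "orf1\thydA\t42.5\t300\t170\t3\t1\t300\t5\t304\t1e-60\t210",
    "orf1\thydB\t38.0\t280\t170\t4\t1\t280\t9\t288\t1e-40\t150",
    "orf2\thydC\t91.0\t250\t22\t0\t1\t250\t1\t250\t1e-120\t480",
])

print(novel_candidates(example))
```

Here only `orf1` is flagged, because its best hit has 42.5% identity. Queries with no hit at all do not appear in tabular output, so they would have to be collected separately by comparing the result against the list of input sequences.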

Priority class 1 and priority class 2 enzymes were defined by the consortium, and in silico searches for the genes encoding these enzymes in the new sequences generated by WP2 were performed through ANASTASIA. This generated a long list of potentially interesting genes, which was distributed to WP5 and WP6 partners for further selection and cloning.

Development of new high throughput enzyme screening methods
As efficient and generally applicable screening methods for finding new enzyme activities are scarce, the HotZyme partners invested effort in the development of new screening methods, including the microcolony-based screening system and in vivo reporter systems.

The Microdish culture chip (MDCC) is a highly compartmentalized, porous ceramic-based cultivation system for the high-density cultivation of microorganisms on a solid surface. The most commonly used MDCC180.10 contains 3300 circular wells of 180 µm diameter in a footprint of only 2.88 cm². The colony density is thus almost two orders of magnitude higher than with direct inoculation on an agar-based medium. We found that direct functional screening of E. coli expression libraries for enzyme activity is challenging because of diffusion problems and because ideal screening conditions (defined buffer system, temperature of 60-80 °C) are not compatible with subsequent recovery of viable clones. Therefore, a replica-plating procedure termed “microcolony-lift” was developed during the course of the project (Fig. 2).
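The density figure follows directly from the chip dimensions quoted above. The short Python calculation below uses only the three numbers given in the text; the variable names are chosen here for illustration.

```python
import math

WELLS = 3300            # wells on an MDCC180.10 chip (from the text)
WELL_DIAMETER_UM = 180  # well diameter in micrometres (from the text)
FOOTPRINT_CM2 = 2.88    # chip footprint in cm^2 (from the text)

# Well density over the chip footprint.
wells_per_cm2 = WELLS / FOOTPRINT_CM2

# Share of the footprint taken up by the circular wells themselves.
well_radius_cm = (WELL_DIAMETER_UM / 2) * 1e-4  # 1 micrometre = 1e-4 cm
well_area_cm2 = math.pi * well_radius_cm ** 2
fraction_covered = WELLS * well_area_cm2 / FOOTPRINT_CM2

print(f"{wells_per_cm2:.0f} wells per cm^2")
print(f"{fraction_covered:.0%} of the footprint is well area")
```

This gives roughly 1,150 wells per cm². Read against the statement that the density is almost two orders of magnitude higher than on agar, that would correspond to a baseline somewhat above ten colonies per cm² for direct plating; the baseline is an inference from the text, not a reported figure.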

We also attempted to develop general reporter assays that can be used to rapidly screen expression libraries. In this regard, two different systems were investigated, the transcription regulator based selection system and the riboswitch-based selection/screening system. While the latter is still at the proof of concept stage, significant progress has been made for the former.

Different versions of the selection and screening system, varying the selection reporter (kanamycin resistance marker KmR or leucine auxotrophy complementation with LeuB), the copy number (low or medium) and the transcriptional regulator (AraC or LacI), have been developed and characterized. The two best-performing versions in terms of dynamic range, leakiness and sensitivity were the medium-copy version with AraC as regulator and KmR as selection reporter, and the low-copy version with LacI as regulator and LeuB as selection reporter (Figure 1.5). If changing the inducer specificity proves successful, the system will be further adapted to make it applicable to finding novel biocatalysts.

Custom synthesis of new enzyme substrates
The industrial partner SIAL provided continuous support to WP5 and WP6 by supplying custom-synthesized substrates for biochemical analyses of novel enzymes (Table 4). Some of these have already been commercialized or are in the process of commercialization, and they were crucial for a joint publication led by partner UDE (Kallnik et al., J. Biotechnol. 2014; for details see HotZyme dissemination record).

Construction of expression libraries
A total of 8 E. coli expression libraries have been constructed for screening purposes (Table 5). The libraries have undergone various levels of quality control, including a complementation test with a purine auxotrophic E. coli strain. Libraries were distributed to the partners involved in screening activities.

Enzyme screening
Hundreds of hydrolases of industrial relevance have been identified during the course of the HotZyme project by different and complementary approaches, including over 200 glycosyl hydrolases (GHs), dozens of lipolytic enzymes and a few proteolytic enzymes. Approximately 100 of these have been selected for cloning and expression in a recombinant host, providing a rich resource from which to select candidates for further biochemical and structural analysis in WP6.

Of the cloned genes, 36 GHs, 10 lipolytic enzymes and one protease were successfully expressed at various levels in different hosts: E. coli, Aspergillus oryzae and/or Sulfolobus acidocaldarius. Further biochemical and structural analyses were performed for many of these.
Two of the enzymes will form the basis of patent applications. One GH will be the founding member of a new CAZy glycoside hydrolase family.

Biochemical and structural analyses of selected hydrolases
About 15 enzymes were selected for detailed analyses, including esterases, lactonases, epoxide hydrolases, cellulases and proteases. The selected hydrolases were purified in large amounts and characterized with respect to their biochemical and enzymatic properties.
The enzymatic characterization focused on substrate specificity, kinetic parameters (Vmax, Km and Ki values) and stability at different pH values and temperatures. In addition, the effect of added organic solvents and the activity in organic solvents, ionic liquids, supercritical carbon dioxide and micro-emulsion systems were determined.
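For readers less familiar with the kinetic parameters named above, the Python sketch below shows how Vmax, Km and Ki enter the Michaelis-Menten rate equation. It assumes simple competitive inhibition, in which the inhibitor raises the apparent Km; the report does not state which kinetic or inhibition models were fitted, and all numerical values are hypothetical.

```python
def michaelis_menten_rate(substrate, vmax, km, inhibitor=0.0, ki=None):
    """Initial rate under Michaelis-Menten kinetics.

    With an inhibitor concentration and a Ki, a competitive inhibition model
    is applied (assumed here for illustration): Km is scaled by (1 + [I]/Ki).
    """
    if ki is None or inhibitor == 0:
        apparent_km = km
    else:
        apparent_km = km * (1 + inhibitor / ki)
    return vmax * substrate / (apparent_km + substrate)


# Hypothetical parameters: Vmax in U/mg, concentrations in mM.
VMAX, KM, KI = 100.0, 2.0, 5.0

# At [S] = Km the uninhibited rate is half of Vmax.
print(michaelis_menten_rate(2.0, VMAX, KM))

# With [I] = Ki the apparent Km doubles, so half of Vmax is reached at [S] = 2 * Km.
print(michaelis_menten_rate(4.0, VMAX, KM, inhibitor=5.0, ki=KI))
```

Fitting such a function to rates measured at several substrate and inhibitor concentrations is one common way to estimate the three parameters.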

Crystal structures of eight of the enzymes were successfully resolved (Fig. 3), and more are on the way.

Potential Impact:
The potential impact (including the socio-economic impact and the wider societal implications of the project so far) and the main dissemination activities and exploitation of results (not exceeding 10 pages).
The goal of the project was to investigate the global biodiversity of hot environments and to identify a new generation of thermostable hydrolases with better performance than current enzymes in different industrial processes such as paper pulp bleaching, oil drilling, and textile and food processing. The expected final results include the patenting of novel cellulases, xylanases and xanthan gum degrading enzymes, and the discovery of novel pathways in PVA degradation and of novel organisms with interesting polymer degradation characteristics. Moreover, the huge biodiversity obtained in the consortium constitutes a rich reservoir for further exploitation. Furthermore, the sequence information generated in the project will be made accessible to the public at some point after the project and will therefore benefit scientific research worldwide.

An important objective of FP7 was to build a European knowledge-based bio-economy (KBBE) by bringing together science, industry and other stakeholders to exploit new and emerging research opportunities that address social, environmental and economic challenges. Our project has demonstrably supported these objectives.
During the four-year period, the HotZyme consortium orchestrated a multifaceted approach to co-ordinate and focus the efforts of 14 outstanding research groups, thereby promoting the exchange and sharing of expertise between internationally acclaimed institutions in 9 European countries and the USA. In doing so, our network attained the momentum required to strengthen and expand research on microbial diversity and metagenomic mining for biotechnological innovation. Moreover, by systematically studying various hot environments worldwide, our activities facilitated a deeper understanding of microbial diversity and the evolution of life on earth. Furthermore, by promoting the joining of forces and the pooling of complementary competencies, our team generated the structure and capacity required to achieve a common goal that has a significant impact on European and worldwide biotechnology.
By disseminating the HotZyme project to young scientists through training sessions, newsletters and a symposium, we believe that our endeavour and its significant scientific outcome will attract bright young investigators and provide them with ample motivation to stay within European borders and engage in aspects of the follow-up research. During the course of our programme, we offered unique opportunities to students and researchers early in their careers to receive specialised training in the field. We incorporated training sessions and exchanges of scientists between partners, where we educated researchers on the use of the tools and resources that we generated, thus maximising the impact of our initiative.
The targeted hyperthermostable enzymes in the proposed project all have direct and immediate relevance to economic and environmental objectives.
• The novel cellulase for which a patent application was filed (Novozymes) has the potential to be exploited in biomass-based bioethanol production and in the paper and pulp industry. This can offer significant environmental benefits by substituting fossil fuels, thereby reducing the increase in atmospheric greenhouse gases, particularly CO2, and improving air quality. The application of some of our targeted heat-stable enzymes in industry will simplify the processes and thus substantially save costs and time.
• The hyperthermostable protease identified from the novel Thermococcus genome (INMI, UCPH, NZ) will have high potential in wash powder development and in the processing of animal food.
• A range of new substrates for enzyme screening has been produced and introduced to the market by HotZyme partner SIAL. This will facilitate and accelerate enzyme screening in general, thus strengthening the economic and environmental impacts of the project in a broader sense.

During the HotZyme project, we produced and distributed 16 newsletters and arranged 2 training sessions, each of which involved careful organization, advertisement and post-training evaluation. At the end of the project, a symposium for young scientists was held, focusing on the discovery of novel extremophilic enzymes with applications in industrial biotechnology. Apart from publishing in peer-reviewed scientific journals, the partners also made efforts to publish in popular science magazines. Further, numerous posters and oral presentations related to the HotZyme project were delivered by the partners (see details in Section 4.2).

List of Websites:
The address of the project public website, if applicable as well as relevant contact details.
The project website is: The website will be maintained for at least three years post project, and this period may be extended if necessary.

Contact details of the project coordinator:

Dr. Xu Peng
University of Copenhagen
Tel: +45 35322018