
Metagenomics of the Human Intestinal Tract

Final Report Summary - METAHIT (Metagenomics of the Human Intestinal Tract)

Executive summary:

Catalog of genes of intestinal microbes. We established a broad catalogue of microbial genes from the intestinal tract, using cutting-edge DNA sequencing technology. We sequenced 540 Gb of DNA from stool samples of 124 individuals and found 3.3 million different genes, 150-fold more than in our own genome. The results were published as a cover article in Nature.

Information organization and analysis. The catalog contains 19000 functions, 5000 of which had never been found before. Some 6000 of them (the "minimal metagenome") are present in every individual and may be required for the proper function of the human gut microbiota. Of these, 1200 (the "minimal gut genome") may be required for any bacterium to thrive in the human gut.

Microbial genes in different individuals. We use high-throughput DNA sequencing and DNA arrays to detect the genes present in each individual. Sequencing generates short sequence "tags" that we map onto the catalogue genes and thus count the genes. The arrays allow us to measure both gene frequency and gene expression.

Enterotypes. Using tools developed within METAHIT, we established that, as with blood groups, each of us can be characterized by a particular enterotype. One of the major results of the METAHIT project is the discovery that the population is distributed into 3 distinct enterotypes, each characterized by a dominant bacterial genus. At first glance, enterotypes do not seem to be associated with the geography, diet, genetic variability, age or sex of individuals; they therefore appear to be a fundamental characteristic. They were originally defined by studying a population of 200 individuals. Today, after the study of nearly 800 people, the three enterotypes remain, but, as research continues, sub-groups, or even other enterotypes, may be found.

Patient cohorts and microbial profiling. Inflammatory bowel disease (IBD) includes two different pathologies, ulcerative colitis (UC) and Crohn's disease (CD). We compare patients in remission with healthy individuals, 60 for each disease. The clinical part of the UC study has been completed, and the bioinformatic analyses indicate that some bacterial species differ between the two groups. This may well lead to a breakthrough in our understanding of the disease. The CD study will reach the same stage before the end of the year. The prevalence of obesity is increasing worldwide. We compared 120 obese and 60 lean individuals. The clinical part has been completed, and the first analyses point to differences in bacterial species between the two groups.

Bacteria-host interactions. We developed procedures to monitor the response of human cell lines to genes from gut bacteria and found bacterial clones that induced a significant cellular response. Five of these clones were further tested on dendritic cells (DC), either directly or indirectly via intestinal epithelial cells (IEC). Some regulated gene expression in IECs and thereby conditioned the DC response; one clone directly affected the DCs.

Technology transfer. In a study involving a METAHIT industrial partner, we tested the effect of probiotics on the stability of the microbiota in UC patients. As expected, stability was high among healthy individuals and lower among patients taking the placebo. Patients who consumed the probiotics appear to have improved stability.

Outreach. We organized the International Human Microbiome Congress (19-21 March 2012, Paris, France) under the patronage of UNESCO. It gathered over 600 participants from 36 countries on 5 continents. Information for the general public was published via our website (http://www.metahit.eu), Facebook, YouTube (http://www.youtube.com/user/microbiome) and Twitter (http://www.twitter.com/metahit). A dashboard was opened on Netvibes (http://www.netvibes.com/metahit) as a first effort to centralize information related to the human microbiome on the web. Communication with newspapers, magazines, radio and television has been pursued (http://www.metahit.eu/index.php?id=205).

Project Context and Objectives:

When METAHIT was conceived in 2007, it was already clear that a detailed understanding of human biology would require knowledge not only of the human genome but also of the human microbial metagenome, as humans live in constant association with microbes present on the surfaces and in the cavities of the human body, and even within our cells. It was known that our microbial companions outnumber the cells of our own body by at least ten-fold, and it was predicted that the number of unique genes they encode was at least 100-fold greater than the number of genes in our own genome. It was also known that this complex and dynamic microbiota has a profound influence on human physiology, nutrition and immunity, and that disruption of these human-associated microbial communities, or alteration of the intimate cross-talk between these microbes and human cells, may be a significant factor in many diseases. Understanding the dynamic and variable nature of human microbial communities appeared to be a critical challenge, and defining this dynamic diversity was perceived as the next frontier of genomics. To progress towards this ambitious goal, we decided to focus on the microbiota of the intestine, which plays a particularly important role in human health and well-being and was believed to be the most complex of the microbial communities associated with humans.

In this context, the overall concept of METAHIT was to implement and integrate the following activities:
(i) creation of a reference set of genes and genomes of intestinal microbes;
(ii) creation of generic tools to study the variation of human intestinal microbiota, based on the reference set;
(iii) use of these tools to search for correlations between the presence of specific genes in the intestinal metagenome and health and disease states;
(iv) study of the function of the microbial genes correlated with the disease, with the focus on host-microbe interactions;
(v) creation of a database to store and organize the heterogeneous information generated within the project and enriched by information from outside of the project;
(vi) creation of the bioinformatics tools to carry out the meta-analysis of the information.

We chose to focus on two pathologies, inflammatory bowel diseases (IBD) and obesity, disorders of increasing social importance in Europe. The incidence of IBD has been growing steadily during the past 5 decades in Western Europe and is also expanding dramatically in Eastern Europe. It was widely recognized that the disease is caused by excessive immunologic responses to the intestinal microbiota, due to changes in the intestinal ecosystem that are not yet understood. We expected that the identification of microbial signatures specifically correlated with disease phenotype and disease activity would provide essential information for the future prevention and control of IBD and would be of considerable clinical relevance.

The global obesity epidemic poses a huge and rapidly growing challenge for public health services. Obesity appeared to be clearly correlated with an altered composition of the intestinal microbiota, presumably through its impact on energy harvesting, and an inflammatory component was also well established, albeit imperfectly understood. Identification of metagenomic signatures specifically correlated with obesity would open novel avenues to understand and combat this condition, as such signatures could form the basis for prognostic and diagnostic tools of clinical relevance for obesity-associated morbidity.

In view of the novelty and the potential impact of our project on health and disease, it was clear that the project had to be integrated with related efforts worldwide. For this purpose, we decided to work toward the establishment of the International Human Microbiome Consortium (IHMC) and to participate actively in it. In advance of the start of the project, two meetings were organized in 2007 between the coordinator and representatives of the National Institutes of Health (NIH), on 28 February and 1 March in Bethesda, Maryland, USA, and on 19 March in Paris, France. These meetings resulted in a strong expression of intent to promote information exchange, common meetings and joint decisions between actors in the field. Furthermore, we planned to transfer technology to industry and to help present information about the project to the general public.

Project Results:

Catalog of genes of intestinal microbes - our other genome

The first challenge we had to meet was to establish a broad catalogue of microbial genes from the intestinal tract. This was necessary, first, to get a global view of the genetic potential of the gut microbial community and thus of its possible impact on our health and well-being. Second, the catalog would enable us to assess differences in the genetic content of microbes in different individuals, and thus get a handle on the association of genes, species and even communities with the diseases we were targeting.

The catalog was established using cutting-edge sequencing technology, which can generate tens and even hundreds of millions of short sequences in parallel from a single DNA sample. We determined a total of some 540 Gb of DNA sequence from stool samples, an amount approaching that of 200 human genomes. Never before had the human gut metagenome been characterized to such a depth. In a breakthrough effort, we rose to the challenge of assembling these short snippets of information into much longer DNA stretches, in which we could identify the genes present in the intestinal microbes.

We analyzed samples from 124 individuals who participated in our studies. They were of Danish and Spanish origin; some were healthy and some were suffering from IBD or obesity. In this way, we expected to identify the largest possible number of genes, without missing those that might be less frequent, or even absent, in any given group of individuals that we were committed to study. An extensive bioinformatic analysis showed that there is a staggering number of some 3.3 million different genes among the individuals we analyzed, 150-fold more than in our own genome! In lay terms, the ratio of our genes to those of our gut microbes is roughly that of the height of an average human being to that of the Eiffel Tower. An appropriate statistical analysis indicated that we have identified at least 85% of all the frequent genes that the 124 individuals carry. Some 99% of the genes are of bacterial origin, in keeping with the predominance of bacteria among the intestinal microbes. From the gene number we deduce that there are at least 1000 frequent bacterial species in our gut.

How many of the 1000 bacterial species are present in each individual? We found that a person carries, on average, 540 000 genes, a value that corresponds to some 160 species. Inevitably, different individuals have many bacterial species in common: there are no more than about 1000 to go around, and each person carries some 160 on average. We found that some 60 previously known species with sequenced genomes are present in over 90% of the individuals of the cohort. These species represent a 'core' set, the part of the microbial community that is common to all of us. However, the abundance of these species varies considerably between individuals, from 10-fold for the species that varies least to more than 1000-fold for the one that varies most. This high variance explains the view that was dominant prior to our study, namely that human gut communities are exceedingly different in different individuals and that we have very little in common. The main reason was that prior analyses were not deep enough to detect species present at relatively low abundance. Our study thus established a new paradigm: we are all rather similar, albeit by no means identical, in keeping with the finding that about 40% of the genes of each individual were present in at least 50% of the other individuals of our cohort. Remarkably, a great majority of the genes of the catalog, about 85%, belonged to unknown species. The assignment of these genes to novel species by a method we developed is outlined below.
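The 'core' and 'abundance range' notions above can be made concrete with a minimal sketch. It assumes a hypothetical table mapping each species to its abundance in each individual; the strict "over 90%" prevalence cut-off follows the text, while the detection limit of zero, the toy data and the function names are illustrative assumptions, not the METAHIT pipeline.

```python
from typing import Dict, List

def prevalence(abundances: List[float], detection_limit: float = 0.0) -> float:
    """Fraction of individuals in which the species is detected above the limit."""
    return sum(a > detection_limit for a in abundances) / len(abundances)

def core_species(table: Dict[str, List[float]], min_prevalence: float = 0.9) -> List[str]:
    """Species detected in MORE than `min_prevalence` of individuals."""
    return sorted(s for s, ab in table.items() if prevalence(ab) > min_prevalence)

def fold_range(abundances: List[float]) -> float:
    """Highest over lowest non-zero abundance, i.e. how many-fold the species varies."""
    detected = [a for a in abundances if a > 0]
    return max(detected) / min(detected)

# Toy data (hypothetical): 10 individuals, 3 species.
table = {
    "species_A": [1, 2, 10, 5, 3, 4, 6, 7, 8, 9],   # found in all 10
    "species_B": [0, 3, 3, 3, 3, 3, 3, 3, 3, 3],    # found in 9 of 10, not over 90%
    "species_C": [0, 0, 0, 0, 0, 0, 0, 0, 1, 2],    # rare
}
print(core_species(table))              # ['species_A']
print(fold_range(table["species_A"]))   # 10.0
```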

Sequence information must be interpreted (annotated) in terms of genes, proteins and the functions they perform. We developed an automated sequence annotation pipeline and used it to analyze sequence data from the human intestinal metagenome. We identified over 19000 different functions in our gene catalog. The statistical analysis indicated that we captured essentially all of the functions present in our 124 samples and thus obtained an exhaustive view of the genetic potential of the bacteria of the human gut. A large proportion of the functions, more than 5000, had never been found before, illustrating the tremendous novelty revealed by our analysis.

Beyond the minimal metagenome (the some 6000 functions present in every individual), we defined a set of 1200 functions required for any bacterium to thrive in the human gut, and suggest that they represent the 'minimal gut genome'. About half of them are present in most bacteria with sequenced genomes and are necessary for bacterial life. A large number, however, are found only rarely among bacteria with sequenced genomes and may well be specific to gut bacteria. Their study should lead to a much better understanding of our microbial companions than we presently have.
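Both the minimal metagenome and the minimal gut genome are, in essence, sets of functions shared across many function profiles. The sketch below assumes each individual (or each bacterial genome) is represented by a hypothetical set of function identifiers; with `min_fraction=1.0` it returns the functions present in every set, and a lower value relaxes this to "most" sets. The report does not state the exact cut-offs used, so the thresholds here are assumptions.

```python
from collections import Counter
from typing import Iterable, Set

def shared_functions(function_sets: Iterable[Set[str]], min_fraction: float = 1.0) -> Set[str]:
    """Functions present in at least `min_fraction` of the given sets."""
    sets = list(function_sets)
    counts = Counter(f for s in sets for f in s)   # number of sets containing each function
    needed = min_fraction * len(sets)
    return {f for f, c in counts.items() if c >= needed}

# Hypothetical function profiles of three individuals.
individuals = [{"f1", "f2", "f3"}, {"f1", "f2"}, {"f1", "f2", "f4"}]
print(sorted(shared_functions(individuals)))   # ['f1', 'f2']
```

The same function can be applied to a list of per-genome function sets to approximate a "minimal gut genome" under whatever prevalence threshold one chooses.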

Last but not least, this study has attracted considerable attention, both in academic circles, as witnessed by more than 600 citations in scientific journals in the 2.5 years since its publication in Nature, and in the general media, which commented on it worldwide. The description of our other genome clearly merited such attention.

Quantitative metagenomics

Our next challenge was to establish a procedure to assess the presence and abundance of the catalog genes in every individual. Quantitative metagenomics, as we term the approach, consists of two main elements. One is an extensive gene catalogue, such as the one described above. The other is very high-throughput sequencing of the total DNA prepared from a stool sample. The resulting short sequence reads are mapped onto the catalogue genes and counted, yielding a gene abundance profile for each individual.
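The counting step of quantitative metagenomics can be sketched as follows. The sketch assumes reads have already been mapped by some aligner (not shown), so that each read is labelled with the catalogue gene it hits, or `None` if unmapped. Normalizing counts by gene length before converting them to relative abundances is a common convention assumed here, not necessarily METAHIT's exact procedure.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

def gene_abundance(mapped_reads: Iterable[Optional[str]],
                   gene_lengths: Dict[str, int]) -> Dict[str, float]:
    """Convert per-read gene assignments into length-normalized relative abundances."""
    counts = Counter(g for g in mapped_reads if g is not None)  # drop unmapped reads
    # Longer genes attract more reads by chance, so count reads per base pair.
    per_base = {g: n / gene_lengths[g] for g, n in counts.items()}
    total = sum(per_base.values())
    if total == 0:
        return {}
    return {g: v / total for g, v in per_base.items()}

reads = ["g1", "g1", "g2", None, "g2", "g2", "g2"]   # hypothetical mapping output
lengths = {"g1": 1000, "g2": 2000, "g3": 1500}
print(gene_abundance(reads, lengths))   # {'g1': 0.5, 'g2': 0.5}
```

Genes with no mapped reads (here `g3`) are simply absent from the profile, i.e. not detected in that individual.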

A number of features of the gut ecosystem can be deduced from the gene abundance profiles, such as overall metabolism or taxonomic composition, but most importantly, in the context of the METAHIT project, microbial composition can be correlated with health and disease.

Enterotypes of the Human Gut Microbiome

Using quantitative metagenomics, we established that humans can be clustered into three groups by the composition of their gut bacterial communities. This was shown with three different data sets. The first derives from a relatively small number of individuals, 33, but from three different continents: America and Asia (in studies preceding METAHIT) and Europe (within the METAHIT project). The sequencing technique was a 'classical' one, denoted Sanger, which is no longer used for such studies because of its low throughput and high cost, but was the only one available before the onset of METAHIT. The second was the METAHIT analysis by quantitative metagenomics of 85 Danish individuals, whereas the third was from an independent study of 154 US individuals, in which only the taxonomic composition was determined, by sequencing the gene encoding 16S ribosomal RNA.
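Clustering individuals into enterotypes can be sketched as below. The sketch assumes genus-level relative abundance profiles, a Jensen-Shannon distance between profiles, and a simplified k-medoids (alternating assignment and medoid update, with deterministic farthest-point seeding) in place of a full partitioning-around-medoids search. These choices, the toy data and the function names are assumptions for illustration, not a description of the METAHIT code.

```python
import math
from typing import List, Sequence

def normalize(profile: Sequence[float]) -> List[float]:
    total = sum(profile)
    return [x / total for x in profile]

def jsd_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the Jensen-Shannon divergence (natural log) of two profiles."""
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a: List[float]) -> float:
        # Terms with a_i = 0 contribute nothing; m_i > 0 whenever a_i > 0.
        return sum(x * math.log(x / y) for x, y in zip(a, m) if x > 0)
    return math.sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))

def k_medoids(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label in range(k) for each item of a distance matrix."""
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:   # farthest-point seeding
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # The new medoid minimizes the summed distance to its cluster members.
            new_medoids.append(min(members, key=lambda j: sum(dist[j][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels

profiles = [  # hypothetical genus profiles: [Bacteroides, Prevotella, Ruminococcus]
    [0.80, 0.10, 0.10], [0.75, 0.15, 0.10],
    [0.10, 0.80, 0.10], [0.10, 0.85, 0.05],
    [0.10, 0.10, 0.80], [0.05, 0.10, 0.85],
]
dist = [[jsd_distance(a, b) for b in profiles] for a in profiles]
print(k_medoids(dist, k=3))
```

With these toy profiles, each pair of individuals dominated by the same genus should end up sharing a label. Choosing the number of clusters (here fixed at three) is a separate question that the sketch does not address.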

The definition of enterotypes does not explain where these differences come from, but the enterotypes appear to be very robust clusters. Today, after the study of 663 METAHIT individuals, the three enterotypes are still observed, but, as research continues, sub-groups, or even other enterotypes, may be found.


The Nature publication describing the enterotypes was hailed by the journal Science as one of the 10 breakthroughs of 2011 and was commented upon by media worldwide, bringing this discovery to the attention of the general public. Its impact in the academic community was considerable, as the work has already been cited over 380 times. As with the description of our other genome, this METAHIT study had a very high impact.

Metagenomic species and the associated metagenomic units

Natural microbial communities can comprise a large number of organisms, viruses and other chromosomal and extra-chromosomal genetic elements. The microbiome of the human distal gut is among the most complex studied, with an estimated number of approximately one thousand different species across humanity. While next-generation sequencing technology promises to uncover the metagenomes of such communities, the resulting genetic complexity is staggering, with genes counted in the millions. This complexity represents a significant challenge for meaningful interpretation and understanding of these communities.

Current analysis of metagenomic data largely relies on comparison to reference sequences from cultivated microbes or reference sequences from single cells. Comparisons of bacterial genomes from different isolates of the same species, however, usually show considerable differences in the genetic makeup. These observed variations pose difficulties for structuring metagenomic data based on a limited set of reference genome sequences obtained from isolated strains.

Using a co-abundance approach, we grouped genes whose abundances co-vary across samples into metagenomic units (MGU). Interestingly, the size distribution of the MGU, in terms of genes contained, is bimodal. The majority of the binned genes (82%) were assigned to the 741 MGU with 700 or more genes. To distinguish this subset of gene-rich MGU, whose gene counts are similar to those of bacterial reference genomes, we call these large MGU 'MetaGenomic Species' (MGS). The lower peak, on the other hand, corresponds approximately to the expected size of extra-chromosomal material such as phages, genomic islands and plasmids.
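The idea behind MGU, grouping genes whose abundances rise and fall together across samples, can be illustrated with a greedy sketch loosely inspired by canopy clustering. It assumes a hypothetical gene-by-sample abundance table, uses Pearson correlation with an assumed threshold of 0.9, and seeds each cluster with the most abundant remaining gene; the actual METAHIT canopy procedure is more elaborate. The 700-gene cut-off separating MGS from smaller MGU comes from the text above.

```python
import math
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def coabundance_units(profiles: Dict[str, Sequence[float]], min_r: float = 0.9) -> List[List[str]]:
    """Greedily group genes whose abundance profiles correlate with a seed gene."""
    remaining = sorted(profiles, key=lambda g: -sum(profiles[g]))  # most abundant first
    units: List[List[str]] = []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        unit = [seed] + [g for g in rest if pearson(profiles[seed], profiles[g]) >= min_r]
        units.append(unit)
        taken = set(unit)
        remaining = [g for g in remaining if g not in taken]
    return units

def label_units(units: List[List[str]], min_mgs_genes: int = 700) -> List[str]:
    """Large co-abundance units are called MGS; the rest stay small MGU."""
    return ["MGS" if len(u) >= min_mgs_genes else "small MGU" for u in units]

profiles = {  # hypothetical abundances of five genes in four samples
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2],
    "g5": [1, 0, 1, 0],
}
units = coabundance_units(profiles)
print(units)                                 # [['g2', 'g1'], ['g4', 'g3'], ['g5']]
print(label_units(units, min_mgs_genes=2))   # ['MGS', 'MGS', 'small MGU']
```

The toy call lowers `min_mgs_genes` to 2 only so that the tiny example produces both labels.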

The larger, gene-rich MGU appear to describe the cores of microbial genomes, as they contain the genes essential for bacterial life. The specificity of the clustering procedure is high, as the MGS are highly consistent in gene-wise taxonomic annotation. In contrast to the large MGS, many of the smaller units, containing fewer than 700 genes, are very significantly enriched for genes found on extra-chromosomal elements such as plasmids, bacteriophages and genomic islands, including genes of CRISPR elements and restriction-modification systems. This suggests that smaller MGU might be described as accessories, clonal variations or parasites of the MGS (see below). The most common type of small MGU, 923 units in all, is enriched for genes with strong similarity to known bacteriophages.

De novo assembly of metagenomic data is typically complicated by similar sequences from the multitude of organisms present in an ecosystem. Selectively mapping sequence reads to the contigs that carry genes assigned to an MGS greatly simplifies the assembly. We assembled the reads mapped to each of the 741 MGS; 247 of the assemblies were high-quality draft genomes. 44 of these have closely related reference genomes and on average show 98.4% identity to them, which illustrates the specificity of our assembly approach. This effort all but doubled the number of gut microbial species with sequenced genomes and has opened avenues for a more detailed understanding of the gut microbial communities.

Clearly, MGS correspond to autonomously living bacterial species, but the small MGU cannot conceivably exist on their own and must therefore depend on the autonomous MGS. Such dependence can be detected as the absence of a dependent small MGU from individuals who lack the MGS that supports it, as illustrated by MGU2350, which is found only in samples that contain MGS135. We systematically identified significant dependency relations; the resulting directional network contained 882 dependencies between 1,205 MGU and was dominated by sub-networks, most of which were centered on an MGS. In many cases (413), genes from the MGS and the dependent MGU were found on the same contigs, showing that the two are physically linked.
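The dependency criterion (a small MGU seen only where its supporting MGS is present) can be sketched as a subset test plus a simple significance estimate. The sketch assumes presence/absence sets per unit and scores each candidate by the hypergeometric probability that all of the MGU's samples would fall among the MGS's samples by chance. This test, the 0.05 cut-off and the minimum of 3 samples are assumptions, since the report does not specify how significance was assessed.

```python
from math import comb
from typing import Dict, List, Set, Tuple

def dependencies(presence: Dict[str, Set[str]], mgs_ids: Set[str],
                 n_samples: int, max_p: float = 0.05,
                 min_samples: int = 3) -> List[Tuple[str, str, float]]:
    """Return (dependent_unit, supporting_mgs, p) for units seen only where the MGS is seen."""
    hits = []
    for unit, seen in presence.items():
        if unit in mgs_ids or len(seen) < min_samples:
            continue
        k = len(seen)
        for mgs in sorted(mgs_ids):
            host = presence[mgs]
            if seen <= host:
                # Chance that k random samples all fall among the MGS-positive samples.
                p = comb(len(host), k) / comb(n_samples, k)
                if p <= max_p:
                    hits.append((unit, mgs, p))
    return hits

samples = [f"S{i}" for i in range(1, 11)]   # toy data: 10 samples
presence = {
    "MGS135": set(samples[:5]),
    "MGS7": set(samples),                   # present everywhere: uninformative
    "MGU2350": set(samples[:4]),            # only seen where MGS135 is seen
    "MGU9": {"S1", "S6", "S7"},
}
print(dependencies(presence, {"MGS135", "MGS7"}, n_samples=10))
```

An MGS present in every sample, such as `MGS7` above, yields p = 1 and therefore never supports a dependency, which is the desired behaviour.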

Dependency relations allow MGU to be connected into sub-networks, which guide the exploration and understanding of their parts. For example, the dependency sub-network centered on MGS:135 contains eight dependencies on MGS:135. In this sub-network, MGU:3731 is strongly enriched for phage genes and MGU:4011 for CRISPR-associated genes; MGU:4011 also carries a CRISPR repeat region. Interestingly, the sample-wise detection of the CRISPR and phage MGU is anti-correlated. Consistent with this, one of the CRISPR spacers of MGU:4011, which guide sequence-specific protection against foreign DNA, has a 15 bp sequence match to MGU:3731, suggesting a causal mechanism for the observed anti-correlation.
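The 15 bp spacer match mentioned above is an exact shared substring between a spacer and a target sequence. A minimal sketch of such a check, assuming plain upper- or lower-case DNA strings and testing both strands of the target, might look like this; the sequences in the example are made up.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def shared_kmer(spacer: str, target: str, k: int = 15) -> Optional[str]:
    """First k-mer of the spacer found in the target on either strand, else None."""
    spacer, target = spacer.upper(), target.upper()
    strands = (target, revcomp(target))
    for i in range(len(spacer) - k + 1):
        kmer = spacer[i:i + k]
        if any(kmer in s for s in strands):
            return kmer
    return None

spacer = "AAAAACCCCCGGGGGTTTTT"               # hypothetical 20 bp spacer
phage = "GGATCC" + spacer[3:18] + "GGATCC"    # carries 15 bp of the spacer
print(shared_kmer(spacer, phage) is not None)  # True
print(shared_kmer(spacer, "ACGT" * 10))        # None
```

A real search would scan whole contigs with an alignment tool and allow for mismatches; the exact-match version is only meant to show what a "15 bp match" means.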

The co-abundance approach is entirely novel: it offers an unsupervised way of deciphering the structure of microbial communities directly from metagenomic data, and it allowed us to organize 40% of a non-redundant catalogue of 3.9M genes into 7,381 MGU. This reduces the complexity of the data very significantly while elucidating details important for rationalizing the content of the gut microbial community, and should enable biomarker detection and association analysis. Furthermore, organizing the microbial genes into MGS and their dependent MGU enables higher resolution when associating genetic elements with diseases, for example. Small MGU capture strain variations as independent entities that may be associated with disease. The 2011 European E. coli food poisoning outbreak serves as an example: a few critical virulence factors, including the Shiga toxin 2-encoding prophage, an additional high-pathogenicity island and a set of virulence genes, discriminate this food poisoning O104:H4 strain from commensal and other less pathogenic E. coli strains [23]. Beyond association with disease, the structuring of the catalog we achieved provides an unprecedentedly detailed view of the highly dynamic genome structure of gut microbial species, composed of a well-organized core and variable elements.

The MGS approach promises to identify both known and uncharacterized abundance-coherent species. In the present human gut microbiome data, about 600 MGS had little or no species-level sequence similarity to previously sequenced species. Phylogenetic analysis of a set of single-copy proteins suggests that 27 of the MGS that could be assembled into high-quality draft genomes form a distinct and deep monophyletic group within the Mollicutes. Interestingly, this clade, like its sister branch of previously sequenced intercellular-parasitic Acholeplasmataceae, has a very low GC content, fewer cell wall-related genes and a low gene count. Future work should clarify the physiological consequences of the presence of these bacteria for their hosts.
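GC content, one of the features used above to characterize the new Mollicutes-like clade, is the fraction of G and C among the bases of a genome. The sketch below assumes an assembly is available as a list of contig strings and ignores ambiguous bases such as N when computing the fraction; the input is a made-up example.

```python
from typing import Iterable

def gc_content(contigs: Iterable[str]) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) across all contigs."""
    gc = acgt = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(b) for b in "ACGT")
    return gc / acgt if acgt else 0.0

print(gc_content(["ATATGC", "aatt", "NNGC"]))   # 0.3333333333333333
```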

While co-abundance clustering extends the percentage of genes that can be clustered, from approximately 10% using reference genomes up to the 40% captured by the MGU, a significant fraction of the 3.9M genes remains unclustered. 18% of the genes were considered too rare to form reliable clusters and were therefore excluded from the high-confidence canopy clustering presented here. The remaining 42% are on average less abundant than the clustered genes. Enlarging the gene catalogue and analyzing more individuals is likely to allow many of these to be grouped into novel and interesting MGU. Interestingly, genes involved in antibiotic resistance had distinct single-gene abundance profiles (except some vancomycin resistance genes), despite being highly prevalent in the samples. This is in line with the fact that most antibiotic resistance genes, except vancomycin resistance genes, are known to confer antibiotic resistance single-handedly, and suggests that some genes may be highly dynamic and perhaps best understood non-contextually, at the single-gene level.

Richness of human gut microbial communities correlates with metabolic markers

Modern living has resulted in an epidemic of metabolic disorders characterized by a core of excessive body fat accumulation. Some individuals seem to be more susceptible than others to the 'obesogenic' environment of modern living, suggesting an important inherited component; this is supported by several twin, family and adoption studies, with heritability estimates ranging from 40-70% [3-5]. Studies of variation in the human genome have so far identified 31 validated genome-wide significant loci associated with measures of overall adiposity and 17 loci associated with visceral fat accumulation. Yet the proportion of the genetic variance of body mass index (BMI) explained by these loci remains low, only a few percent. Emerging evidence suggests, however, that variation in our other genome may play an even greater role than human genome variation in the pathogenesis of obesity. To examine this hypothesis, we undertook a quantitative metagenomic analysis of a study population of 292 Danish individuals.

Surprisingly, comparison of gene counts across individuals showed a bimodal distribution, with 23% of individuals having fewer than 480 000 genes. We term these 'low gene count' (LGC) individuals and the others 'high gene count' (HGC) individuals. They had, on average, 380 000 and 640 000 genes, respectively, a difference of some 40%. LGC individuals thus have a less rich microbiota and HGC individuals a richer one.
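The LGC/HGC split is a simple threshold on per-individual gene counts. The sketch below assumes a hypothetical mapping from individual to number of detected genes and uses the 480 000 cut-off given above. Because the number of genes detected grows with sequencing depth, the counts are assumed to have been computed at equal depth (for instance after downsampling), which the sketch does not do itself. The relative difference is taken with respect to the HGC mean, which reproduces the "some 40%" quoted for means of 380 000 and 640 000.

```python
from statistics import mean
from typing import Dict, Tuple

def split_by_gene_count(gene_counts: Dict[str, int],
                        threshold: int = 480_000) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split individuals into low (below threshold) and high gene count groups."""
    lgc = {s: n for s, n in gene_counts.items() if n < threshold}
    hgc = {s: n for s, n in gene_counts.items() if n >= threshold}
    return lgc, hgc

def summarize(lgc: Dict[str, int], hgc: Dict[str, int]) -> Dict[str, float]:
    """Group fraction, group means and the relative difference of the means."""
    m_low, m_high = mean(lgc.values()), mean(hgc.values())
    return {
        "lgc_fraction": len(lgc) / (len(lgc) + len(hgc)),
        "mean_lgc": m_low,
        "mean_hgc": m_high,
        "relative_difference": (m_high - m_low) / m_high,
    }

counts = {"p1": 380_000, "p2": 640_000, "p3": 640_000, "p4": 640_000}  # hypothetical
print(summarize(*split_by_gene_count(counts)))
```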

We determined the enterotype of the individuals in our cohort and found that enterotype distribution greatly varies with the gene count. Strikingly, 81.3% of the LGC individuals belonged to the Bacteroides-driven enterotype 1 while 63.4% of the HGC individuals belonged to enterotype 3 in which Ruminococcus was shown to be over-represented but which correlates even better with Methanobrevibacter in the present, larger, dataset.

Both the difference in gene number and the stratification by enterotypes indicate that the LGC and HGC individuals harbor different microbial communities. In order to assess the difference in phylogenetic composition between the two, we combined reference genome mapping with gene abundance data at phylum, genus and species level.

We first examined the general phylogenetic composition at higher taxonomic levels, based on sample-wise rarefied read abundances that were mapped onto publicly available reference genomes and binned at the genus and phylum level. 51 genera differed significantly in abundance between the HGC and LGC individuals. While Bacteroides, Parabacteroides, Ruminococcus (specifically R. torques and R. gnavus), Campylobacter, Dialister, Porphyromonas, Staphylococcus and Anaerostipes were more dominant in LGC individuals, 41 genera, including Faecalibacterium, Bifidobacterium, Lactobacillus, Butyrivibrio, Alistipes, Akkermansia, Coprococcus and Methanobrevibacter, were significantly associated with HGC individuals.
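Rarefaction, as used above to make read abundances comparable between samples sequenced to different depths, means randomly subsampling every sample to the same number of reads. A minimal sketch is shown below, assuming a hypothetical per-sample table of read counts per genus and a fixed random seed for reproducibility. Expanding every read into a list is fine for illustration but would be too memory-hungry at real sequencing depths.

```python
import random
from collections import Counter
from typing import Dict

def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Randomly subsample `depth` reads without replacement from a table of read counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]  # one entry per read
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))

sample = {"Bacteroides": 600, "Faecalibacterium": 300, "Akkermansia": 100}  # hypothetical
print(rarefy(sample, depth=500))
```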

Next, we studied the specific species that were differentially abundant between LGC and HGC individuals. To this end, we used a novel, gene-centric approach that enables the visualization of individual-based patterns and avoids artifacts from incomplete genome coverage. In this approach, we identified the genes that differed significantly between LGC and HGC individuals by the Wilcoxon rank-sum test, repeating the comparison 30 times on 204 (70% of the total) randomly chosen individuals. 120,723 genes were found in all 60 tests at p < 0.0001 and were analyzed further.
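The repeated-subsampling screen can be sketched with the Python standard library alone. The sketch implements the Wilcoxon rank-sum test with a normal approximation and tie correction (no continuity correction), draws 70% of the individuals in each round, and keeps only genes significant at p < 0.0001 in every round. The data layout, the seed and the number of rounds are assumptions; a real analysis would more likely rely on an established statistics library.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u = sum(_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:   # all values tied: no evidence of a difference
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

def stable_genes(abundance: Dict[str, List[float]], is_lgc: List[bool],
                 rounds: int = 30, fraction: float = 0.7,
                 alpha: float = 1e-4, seed: int = 0) -> List[str]:
    """Genes with p < alpha in every random subsample of the individuals."""
    rng = random.Random(seed)
    n = len(is_lgc)
    subsets = [rng.sample(range(n), round(fraction * n)) for _ in range(rounds)]
    def significant(values: List[float], idx: List[int]) -> bool:
        low = [values[i] for i in idx if is_lgc[i]]
        high = [values[i] for i in idx if not is_lgc[i]]
        return bool(low) and bool(high) and rank_sum_p(low, high) < alpha
    return [g for g, v in abundance.items() if all(significant(v, s) for s in subsets)]

is_lgc = [True] * 30 + [False] * 30   # hypothetical cohort of 60 individuals
abundance = {
    "separating": [float(i) for i in range(30)] + [float(i) for i in range(100, 130)],
    "flat": [5.0] * 60,
}
print(stable_genes(abundance, is_lgc))   # ['separating']
```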

We searched for genes that could belong to the same species by comparing them to all sequenced genomes. At a threshold of 95% identity (the species-level cut-off) over at least 90% of the gene length, 10,225 genes (8.5%) were assigned to a total of 97 genomes representing some 73 species. However, a vast majority of these (93.4%) belonged to only 9 species, which were all Firmicutes with the single exception of the main human methanogen, M. smithii. These species varied significantly in abundance between the LGC and HGC individuals, as visualized by the presence and abundance of 50 arbitrarily chosen genes from each of the 9 species across the individuals of the cohort.
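The assignment rule above (at least 95% identity over at least 90% of the gene length) amounts to a filter on alignment hits. The sketch assumes a hypothetical `Hit` record produced by some sequence-similarity search (not shown) and keeps, for each gene, the passing hit with the highest identity; this tie-breaking rule is our assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class Hit:
    gene: str          # catalogue gene identifier
    genome: str        # reference genome identifier
    identity: float    # percent identity of the alignment
    aligned_len: int   # aligned length on the gene, in bp
    gene_len: int      # full length of the catalogue gene, in bp

def assign_genes(hits: Iterable[Hit], min_identity: float = 95.0,
                 min_coverage: float = 0.90) -> Dict[str, str]:
    """Map each gene to the genome of its best hit passing both thresholds."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.identity < min_identity or h.aligned_len / h.gene_len < min_coverage:
            continue
        if h.gene not in best or h.identity > best[h.gene].identity:
            best[h.gene] = h
    return {gene: h.genome for gene, h in best.items()}

hits = [  # hypothetical search results
    Hit("g1", "M_smithii", 98.0, 950, 1000),
    Hit("g1", "other_genome", 96.0, 1000, 1000),
    Hit("g2", "F_prausnitzii", 94.0, 1000, 1000),   # identity too low
    Hit("g3", "R_gnavus", 99.0, 800, 1000),         # covers only 80% of the gene
]
print(assign_genes(hits))   # {'g1': 'M_smithii'}
```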

Taken together, our analyses highlight the contrast between anti-inflammatory species such as Faecalibacterium prausnitzii and Roseburia inulinivorans [32,33], which are more prevalent in HGC individuals, and potentially pro-inflammatory species such as Bacteroides and R. gnavus, which have been associated with IBD and were found to be more frequent in LGC individuals.

However, the vast majority (more than 90%) of the 120,723 genes with significantly different abundances in LGC and HGC individuals could not be assigned to a sequenced bacterial genome. These genes must also belong to bacterial species that are present at different abundances in the two types of individuals. We therefore attempted to cluster the genes from the same species by a gene abundance-based approach (this approach, applied systematically to the entire gene catalog, is documented in the preceding section). 57% of the genes were grouped into only 58 clusters containing more than 75 genes each. These included 6 of the 9 taxonomically characterized species; the other clusters contained genes from previously unknown species. The genes of a cluster can be used as tracers of a species, and these species show a clearly biased distribution between HGC and LGC individuals.

LGC and HGC individuals can be accurately distinguished by the bacterial species they harbor, as shown by a receiver operating characteristic (ROC) analysis. First, we estimated the abundance of the 58 species that were significantly different between LGC and HGC individuals. For each individual, we used these values to compute a score, named the Decisive-Bacterial-Abundance (DBA) score, equal to the sum of the abundances of the species more frequent in HGC individuals minus the sum of the abundances of the species more frequent in LGC individuals. DBA scores were calculated exhaustively for all combinations of up to 23 species. The area under the curve (AUC) reached 0.98 for the best combinations of 4 or more species, indicating an almost perfect differentiation of LGC and HGC individuals.
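The DBA score and its evaluation can be sketched as follows. The score follows the definition above; the AUC is computed as the fraction of (HGC, LGC) pairs in which the HGC individual scores higher, counting ties as one half, which equals the area under the ROC curve. The data layout, the species names and the small `max_size` are assumptions for illustration; the number of combinations grows very quickly with the number of species.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

def dba_score(abund: Dict[str, float], hgc_species: Set[str], lgc_species: Set[str]) -> float:
    """Sum of HGC-associated abundances minus sum of LGC-associated abundances."""
    return (sum(abund.get(s, 0.0) for s in hgc_species)
            - sum(abund.get(s, 0.0) for s in lgc_species))

def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_combination(samples: List[Dict[str, float]], is_hgc: List[bool],
                     hgc_species: Set[str], lgc_species: Set[str],
                     max_size: int) -> Tuple[float, Tuple[str, ...]]:
    """Exhaustively search species subsets up to max_size for the highest AUC."""
    candidates = sorted(hgc_species | lgc_species)
    best: Tuple[float, Tuple[str, ...]] = (0.0, ())
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            chosen = set(combo)
            scores = [dba_score(s, chosen & hgc_species, chosen & lgc_species) for s in samples]
            pos = [sc for sc, h in zip(scores, is_hgc) if h]
            neg = [sc for sc, h in zip(scores, is_hgc) if not h]
            value = auc(pos, neg)
            if value > best[0]:
                best = (value, combo)
    return best

samples = [  # hypothetical species abundances for four individuals
    {"F_prausnitzii": 5, "R_gnavus": 1}, {"F_prausnitzii": 4, "R_gnavus": 2},
    {"F_prausnitzii": 1, "R_gnavus": 5}, {"F_prausnitzii": 2, "R_gnavus": 4},
]
is_hgc = [True, True, False, False]
print(best_combination(samples, is_hgc, {"F_prausnitzii"}, {"R_gnavus"}, max_size=2))
```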

49 gut metabolic pathway modules differed significantly in abundance between LGC and HGC individuals. LGC individuals appeared to have an increased capacity to handle exposure to oxygen and oxidative stress and to produce metabolites with possible deleterious effects on the host. In contrast, HGC individuals were characterized by a potentially increased production of organic acids, including lactate, propionate and butyrate, required by the enterocytes; a shift from a methanogenic/acetogenic ecosystem in HGC individuals toward a sulphate-reducing one in LGC individuals might also take place.

The LGC individuals, who represented 23% of the total study population, included a significantly higher proportion of obese participants and, as a group, were characterized by more marked adiposity. They had elevated serum leptin, decreased serum adiponectin, insulin resistance, hyperinsulinaemia, elevated levels of triglycerides and free fatty acids, decreased HDL-cholesterol and a more marked inflammatory phenotype than the HGC individuals. These analyses suggest that LGC individuals are characterized by metabolic disturbances known to put them at increased risk of prediabetes, type 2 diabetes and ischaemic cardiovascular disorders.

Based on these results, we hypothesize that an imbalance between potentially pro- and anti-inflammatory bacterial species triggers low-grade inflammation and insulin resistance. In parallel, we suggest that the altered gut microbiota of LGC individuals induces the noted increase in serum levels of fasting-induced adipose factor (FIAF), eliciting an elevated release of triglycerides and free fatty acids, as suggested by studies in rodent models.

Interestingly, obese individuals in the LGC group had gained significantly more weight during the past 9 years than those in the HGC group; the BMI change was significant both without and with linear adjustment for baseline BMI and age. No significant difference was observed for lean individuals. We found 8 species significantly associated with change in BMI. In all cases, the average weight gain of individuals with the lowest or undetectable levels of a species was greater than that of their counterparts with the highest levels of that species; all 8 species were more abundant in HGC than in LGC individuals, consistent with the overall association between BMI change and gene count. These 8 species may therefore protect against weight gain.

We also assessed the differences in bacterial species between lean (BMI less than 25 kg/m², n=96) and obese (BMI greater than 30 kg/m², n=169) individuals. Eighteen species were distributed significantly differently between the two groups: 14 were more frequent among lean individuals and 4 among obese individuals. The best AUC in ROC analysis, 0.78, was obtained with 9 species. This accuracy is lower than that for the separation of LGC and HGC individuals. It is nevertheless substantially better than the AUC of 0.58 achieved by a ROC analysis of 32 human genome loci associated with adiposity measures. We therefore suggest that the obesity-associated signal in the human gut microbiome may be much stronger than that presently known in the human genome.
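
An AUC such as 0.78 or 0.58 summarizes a ROC curve as a probability. It is the chance that a randomly chosen obese individual receives a higher score than a randomly chosen lean one, with ties counting one half. This equals the Mann-Whitney U statistic divided by the number of obese-lean pairs. The report does not describe how the 9-species score was built, so the sketch below takes any hypothetical per-individual score.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]) gives 8/9 ≈ 0.89, because one of the nine obese-lean pairs is ranked the wrong way round. The pairwise loop is quadratic, which is unproblematic for cohorts of a few hundred individuals.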

The differences in microbial composition linked to enterotypes appeared to partly mask those linked to obesity, which hampered the differentiation of lean and obese individuals. A ROC analysis of enterotype-stratified individuals gave substantially higher AUC values than those obtained without stratification, reaching 0.98 for the Prevotella enterotype. Stratification by enterotype therefore appears to substantially increase our ability to differentiate lean from obese individuals by the gut microbial species they harbor. A similar stratification is likely to be important in the analysis of other pathologies.
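
The masking effect can be illustrated with a toy example. Suppose enterotypes shift the baseline of a score. Pooling all individuals then blurs a separation that is clear within each enterotype. The sketch below computes the AUC separately within each enterotype. The record layout and the scores are hypothetical and serve only to show the mechanics of stratification.

```python
from collections import defaultdict


def roc_auc(pos, neg):
    """Probability that a positive outscores a negative, ties counting one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_auc(records):
    """records: iterable of (enterotype, is_obese, score); returns {enterotype: AUC}."""
    groups = defaultdict(lambda: ([], []))  # enterotype -> (obese scores, lean scores)
    for enterotype, is_obese, score in records:
        obese, lean = groups[enterotype]
        (obese if is_obese else lean).append(score)
    # Strata lacking either obese or lean individuals cannot yield an AUC and are skipped.
    return {e: roc_auc(obese, lean) for e, (obese, lean) in groups.items() if obese and lean}
```

Take hypothetical scores where obese individuals score 5 and 6 versus 3 and 4 for lean individuals in one enterotype, and 2 and 3 versus 0 and 1 in another. Each stratum gives an AUC of 1.0, while the pooled AUC drops to 0.78125.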

We compared the Bacteroides-enriched enterotype individuals (29% of the examined population) with the Ruminococcus- and Methanobrevibacter-enriched enterotype individuals (51%). The former had increased insulin resistance, white blood cell count and hsCRP. These features of aggravated dysmetabolism and inflammation in Bacteroides-enriched individuals are in several respects similar to those of LGC individuals. Bacteroides-enriched individuals also showed evidence of fatty liver, reflected in increased circulating alanine aminotransferase levels, which puts them at higher risk of non-alcoholic liver disorders. Stratifying gut ecosystems by enterotype may therefore be an important assessment tool for identifying those at higher risk of health threats.

Contemporary lifestyle is associated with a tide of metabolic abnormalities with excessive body fat accumulation at their core. However, obesity is not a single entity. Some obese individuals appear to have a benign prognosis, whereas others progress to co-morbidities such as type 2 diabetes, ischaemic cardio- and cerebrovascular disorders and non-alcoholic liver disorders. It is also recognized that human obesity is a heterogeneous condition in terms of pathogenesis, pathophysiology and therapeutic responsiveness. Our research provides evidence that studying alterations in 'our other genome', the microbial gut metagenome, may define subsets of individuals with different metabolic risk profiles. It may thereby help resolve some of the heterogeneity of adiposity-related phenotypes.

We demonstrate that an almost perfect stratification of LGC and HGC individuals can be achieved with very few bacterial species. This suggests that simple molecular diagnostic tests, based on our other genome, could be developed to identify individuals at risk of common morbidities. In some respects, our other genome appears to be more informative than our own. Focusing on it may therefore spearhead the development of stratified approaches to the treatment and prevention of widespread chronic disorders.

Beyond metabolic dysfunction, low-grade inflammation, as seen in LGC individuals with and without obesity, is associated with a plethora of other chronic diseases whose incidence is steadily rising. Low gut bacterial richness has already been reported for IBD. Whether it is common to many or even all of these diseases could be revealed by exploring the gut microbiota at a deep metagenomic level across a broad variety of these conditions.

A manuscript describing these studies is presently under revision.

Low bacterial diversity aggravates Ulcerative Colitis

Inflammatory bowel diseases (IBD), namely ulcerative colitis (UC) and Crohn's disease (CD), are not rare. Their incidence has increased sharply since 1950 in Western Europe and North America, where UC and CD currently affect 0.5% of the population. The same rising trend is now being reported in Eastern Europe and Asia. Predisposition to UC or CD is linked to a number of recently identified genetic traits; however, only environmental factors can explain the rapid changes observed over the past few decades. In UC and CD, intestinal lesions are produced by aberrant immuno-inflammatory responses against resident bacteria. To assess microbial differences between healthy individuals and UC patients, we first studied 26 patients in remission and 32 healthy family-matched controls by quantitative metagenomics.

As expected from previous studies, gene richness was significantly lower in the patients, despite their remission. Unexpectedly, richness at the time of analysis was linked to past disease activity. Patients with the lowest richness (less than 360 000 genes) had relapsed much more frequently in their prior history than patients whose richness was comparable to that of healthy individuals (greater than 600 000 genes): 2.5 vs less than 1 relapse per year (p less than 0.03). Disease severity, as assessed by relapse frequency, is thus higher in low-richness than in high-richness individuals.
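
Gene richness thresholds such as 360 000 and 600 000 genes are only comparable between samples analysed at the same sequencing depth. A common practice, assumed here rather than taken from the report, is to downsample each sample's mapped reads to a fixed number before counting the catalogue genes detected. The sketch below does this and then applies the thresholds quoted in the text. The rarefaction depth, the function names and the "intermediate" label for samples between the two thresholds are illustrative assumptions.

```python
import bisect
import random
from itertools import accumulate


def rarefied_richness(gene_counts, depth, seed=0):
    """Distinct genes hit when `depth` mapped reads are drawn without replacement."""
    genes = list(gene_counts)
    cumulative = list(accumulate(gene_counts[g] for g in genes))
    total = cumulative[-1] if cumulative else 0
    if depth > total:
        raise ValueError("sample has fewer mapped reads than the requested depth")
    rng = random.Random(seed)
    hit = set()
    # Read indices 0..cumulative[0]-1 belong to genes[0], the next block to genes[1], etc.
    for read_index in rng.sample(range(total), depth):
        hit.add(genes[bisect.bisect_right(cumulative, read_index)])
    return len(hit)


def richness_class(n_genes, low=360_000, high=600_000):
    """Classify a rarefied gene count using the thresholds quoted in the text."""
    if n_genes < low:
        return "low"
    if n_genes > high:
        return "high"
    return "intermediate"
```

Sampling indices from a range, rather than expanding every read into a list, keeps memory proportional to the number of genes rather than the number of reads.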

To further explore the association between low gene richness and UC severity, we followed the patients over a one-year period, examining their clinical condition and their microbiome every 3 months. The 15 patients who relapsed during this period showed a severe decrease in gene richness, by about a third (p less than 0.0004). In contrast, the gene richness of patients who did not relapse did not change significantly. A downward trend was nevertheless observed, suggesting that they may be losing richness and may be on the way to relapse. In any case, this longitudinal study supports the association between low richness and disease severity.

We found 12 species whose abundance differed significantly between healthy individuals and patients; most of them had no species-level taxonomic assignment. As expected, Faecalibacterium prausnitzii was more abundant in healthy individuals, but so was Akkermansia muciniphila. We confirmed this difference for A. muciniphila in several hundred individuals, using sensitive quantitative PCR (Q-PCR). This observation opens avenues for exploring the potentially protective role of this bacterium.

In a ROC analysis, the differentiating metaspecies were highly discriminant between UC patients and healthy individuals: a model based on 6 species reached an AUC of 0.85. Five species differed in abundance between patients relapsing at a high frequency (more than one relapse per year) and at a low frequency (fewer than one per year). These 5 species could distinguish the two types of patients with an accuracy of about 93%. This raises the possibility of predicting relapse by quantitative metagenomic analysis of the gut microbiota.

Loss of richness at relapse indicates a change in gut microbial communities. The number of individuals that could be included in the study was too small to detect significantly different microbial populations at relapse. Nevertheless, one hypothesis is that stabilizing the fluctuating gut communities of UC patients may reduce the number of relapses. The microbiota of UC patients is known to be less stable than that of healthy individuals; if its stability could be improved, the severity of the disease might be attenuated. However, there are currently no treatments specifically directed towards stabilizing the gut microbiota. We therefore examined the effect on microbiota stability of a probiotic, delivered to patients as a fermented milk product (FMP) over a 12-week period.

Twenty-four patients in remission were treated with an FMP and another 24 with a placebo, with 12 healthy individuals as controls, and the microbiota was compared at entry and exit. For this purpose, the abundance of more than 700 MGS (metagenomic species) was assessed by quantitative metagenomics. A standard Spearman correlation coefficient was used as a measure of population similarity.

First, microbiota stability in patients taking the placebo was significantly lower than in healthy individuals (p=0.01), confirming previous observations of greater fluctuations in patients. However, the difference in stability between placebo and test patients was not significant (p=0.06), although a trend towards stabilization was observed. Importantly, among patients with high gene richness, the difference between the placebo and test groups was significant (p=0.01). The test patients in this subgroup reached a stability indistinguishable from that of healthy individuals. In contrast, the FMP had no effect on the stability of patients with low gene richness.

We conclude that in a milder form of the disease, a probiotic treatment can improve microbiota stability. Whether this improvement correlates with a better clinical outcome remains to be determined. The FMP treatment had no effect in the more severe form of the disease. These results indicate that microbiota analysis has great potential for stratifying responders and non-responders to a treatment, a prerequisite for the development of personalized medicine.
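
Stability here is the Spearman correlation between a patient's MGS abundance profiles at entry and exit, which is a Pearson correlation computed on ranks. The minimal sketch below assumes profiles stored as dictionaries mapping (hypothetical) MGS identifiers to abundances. An MGS missing at one time point is treated as zero abundance, which is an assumption rather than a detail given in the report. Tied abundances, common among undetected MGS, receive average ranks. A value close to 1 indicates a stable microbiota.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def microbiota_stability(entry, exit_):
    """Spearman correlation of MGS abundances at entry and exit (absent MGS count as 0)."""
    mgs = sorted(set(entry) | set(exit_))
    x = [entry.get(m, 0.0) for m in mgs]
    y = [exit_.get(m, 0.0) for m in mgs]
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, a patient whose MGS keep the same order of abundance scores 1.0 even if the absolute abundances change. This makes the measure robust to differences in overall sequencing depth between time points.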

Bacteria-host interactions

We focused on the genes that may be involved in the interaction between bacteria and ourselves. To identify these genes, we developed procedures for monitoring the response of human cell lines brought into contact with genes from gut bacteria. A two-pronged approach was followed:

- We constructed a large collection of genes from intestinal bacteria in a standard E. coli cloning host. The collection comprises over 200 000 clones, carrying a total of over 8 million genes from intestinal bacteria.
- We established 16 different screens based on human cell lines carrying various reporter genes.

We validated our approach in a pilot study with 5000 bacterial clones, in which a high-throughput robot was used to carry out 25 000 individual tests. About a dozen clones were found to induce a significant cellular response.

We further tested, in culture, the effect of 5 clones on dendritic cells (DCs), either directly or indirectly via intestinal epithelial cells (IECs). DCs are thought to be among the central elements of the immune response. The metagenomic clones differentially regulated gene expression in IECs, and their impact on IECs clearly conditioned the DC response. In addition, one clone directly affected the DCs. Analysis of these clones may lead to a new understanding of the interactions between our gut bacteria and ourselves.

Meeting and surpassing the objectives

In retrospect, all of the initial objectives of METAHIT were reached:

- We created a reference set of genes and genomes of intestinal microbes and, based on this reference set, generic tools to study the variation of the human intestinal microbiota.
- We used these tools to search for correlations between the presence of specific genes in the intestinal metagenome and health and disease states, and found such correlations.
- We created a database to store and organize the heterogeneous information generated within the project, and enriched it with information from outside the project, notably from the HMP. We also developed bioinformatics tools to carry out meta-analyses of this information.
- We studied the function of microbial genes correlated with disease, focusing on host-microbe interactions, and found interesting candidate genes.

We also went beyond the initial objectives. We discovered enterotypes, deemed to be among the ten scientific breakthroughs of last year. We also succeeded in organizing the catalogue into meaningful metagenomic units, centered on large metagenomic species, most of which have no close, well-characterized relatives.

Potential Impact:

The impact of the project bears on three domains: the scientific community, industry and society.

- For the scientific community, the impact can be analysed through the use of the foreground generated in further activities, and through necessary accompanying actions such as the standards used by the community worldwide.
- For industry, our activities translated into a measurable increase in industrial partners' interest in using the foreground (knowledge) generated. They also translated into interest in applying the technological developments (know-how) to health and nutrition questions of foremost importance today.
- For society, the impact of our communication in general, and towards the general public in particular, was reflected in the interest shown by traditional media, including the general non-specialist press and the national press.

The scientific community

At project onset, metagenomics was still a relatively new field. Some research characterising gut microbial communities by molecular techniques had already been published and was much discussed in the scientific community. However, it mostly covered relatively small numbers of patients and/or healthy volunteers, and relied mainly on 16S rRNA gene sequencing.

The approach chosen by the METAHIT consortium differed substantially. The technique we proposed covered the species present in the human gut microbiota more completely, lessening the potential bias introduced by 16S sequencing. It also went beyond the presence or absence of species, a simple yes-no signal, by producing counts that give an image of relative species abundance.

Furthermore, new sequencing technologies (next-generation sequencing, NGS) brought a notable decrease in sequencing costs. This allowed us to analyse larger numbers of patients and matching healthy volunteers in the studies undertaken within METAHIT. Cohort size is an important criterion for assessing the statistical value of the results obtained; in other words, numbers count. The METAHIT approach opened new avenues for biomarkers of health and disease and thus represents a major step towards personalised medicine.

Our first results were published in Nature (A human gut microbial gene catalogue established by metagenomic sequencing, 2010; Enterotypes of the human gut microbiome, 2011). These publications brought the METAHIT project a level of recognition at least on a par with the studies financed by the National Institutes of Health in the United States through the Human Microbiome Project (HMP), highlighting the role of the European scientific community. The discovery of enterotypes reinforced this visibility. We anticipate that our leading position will be further strengthened by publication of the new foreground generated within the METAHIT project, which Consortium members are still jointly preparing. This work concerns the obesity and IBD studies, as well as work of more general interest stemming from our approach (metagenomic species), and is described in the previous section of this report.

To take advantage of the present European position, we need to develop three aspects further in the coming years:

- The generated foreground could be strengthened through further exchanges with clinicians, potentially from additional geographical regions. It would also be beneficial to make clinicians in other fields aware of the potential of the technology, and to address other pathologies.
- These actions need to be supported by outreach to the general public and patients, to raise their awareness of the role of the gut microbiota, of the studies and of their potential uses in both health and disease.
- The contacts initiated with industry, in both the pharmaceutical and the nutrition domains, should be further developed to open new avenues for exploiting the know-how gained by the consortium members.

The consortium's activities during the project have already initiated developments along these three lines, as highlighted below.

Exchanges and, when possible, coordination are an important aspect of research within the international community. In this respect, METAHIT was actively involved in the construction and activities of the International Human Microbiome Consortium (IHMC). The IHMC was officially launched during the METAHIT consortium's second meeting, held in Heidelberg in September 2008. Partners of the METAHIT consortium were present in each of the committees set up (patient data release, whole genome sequencing, data repository). Furthermore, the European Commission and the METAHIT coordinator co-chaired the IHMC in its first and third years.

Two examples illustrate the IHMC's activities. Whole genome sequencing was regularly discussed and the list of sequenced strains regularly updated, to avoid as far as possible duplicated work financed through public sources. Data repositories were discussed to help the community access data generated in different contexts and to ensure that comparisons would be possible. Such endeavours are not unique, but they are rare enough to be worth citing. Avoiding duplication of effort should be a shared aim, especially in a field like human metagenomics, where resources are significant but still insufficient to tackle all the potential research to be developed worldwide. The work done in this context by consortium members was important and helps the development of the field.

This work is being continued through the IHMS project, whose aim is to compare current protocols and develop standard operating procedures (SOPs) to be made available to the scientific community. One of its first aims was to compare DNA preparation protocols used in the HMP projects, in METAHIT and in other smaller endeavours, including existing commercial products.
Although that project lies outside the scope of METAHIT and is not reported on here, we can mention that differences between protocols were noted. Further exchanges on the subject should ensure better integration of data gathered worldwide.

Dissemination activities

The dissemination activities described in the grant agreement included two conferences, aimed at the scientific community, and a web site, aimed at the general public, as is now usual for European projects. The third activity, communication with industry, is covered in the final section of this report. The consortium as a whole participated in both activities, which were treated as a priority.

Conferences

The project included the organisation of two international conferences. These were held under the banner of the International Human Microbiome Consortium (IHMC) and discussed with its executive committee.

- First conference: organised by the METAHIT consortium in Shenzhen, China, in March 2010, and dedicated to the human microbiome. It gathered some 220 participants from 27 different countries, although most were delegates from Europe and the United States.
- Second conference: organised by the NIH in Vancouver (Canada) in March 2011, also under the IHMC banner. METAHIT investigators participated widely, both on the organizing committee and as invited speakers. The conference gathered some 400 participants, reflecting the growing importance of the field anticipated by the community.
- Third conference: organised again by METAHIT, in Paris in March 2012. More than 600 participants registered; the total number was capped by the regulations applying to the conference venue, and many potential participants had to be turned down. While a majority were understandably European, all continents were represented.

Details on participation are presented in the IHMC newsletter, accessible from its web site (see http://www.human-microbiome.org online). More than 12% of the participants were delegates from companies, which is both a good level of participation and a noticeable increase compared to previous occasions. Our economic partners' interest was also shown by their strong financial support, which covered nearly a third of the budget for conference-specific expenses. Beyond the necessary exchanges between actors in the field, the conference was also an occasion for various efforts to communicate with the general public, described in the following paragraphs.

From web to web 2.0

The impact of such actions is hard to evaluate, but a short description of what the consortium did, together with some figures, may help assess it. The web site, opened in April 2008, received enough visitors to attract our attention. Visitor numbers peaked at each publicised event, such as the two Nature publications and the two international conferences. We used web analytics tools to monitor connections, but these cannot separate visitors from the scientific community from visitors from the general public. Nevertheless, we received some 20 letters and emails from patients requesting help, advice or explanations. Patients also asked whether the analyses performed in our project were available through their regular health services.

Communication through video

Two films were made during the first year of the project, and one of them was presented on a French television channel. To build on this, an English version was prepared, and both films were made available on a YouTube channel created for the project. METAHIT was also selected as the subject of a communication project financed by the Commission, and the resulting film was added to the same YouTube channel.

Further material was produced, including interviews with work package leaders in their respective countries. A major effort was also made during the 2012 conference, with recordings of the opening talks, short interviews with scientists and institutional representatives, and a film of the concluding round-table discussion. In total, the visual material was viewed more than 7000 times. We believe this is a fair number for a subject that was unknown to the general public a few years ago and is still at an early stage of development.

Communication through traditional media

The coordinator publicised five events, and consortium participants translated the messages into their respective languages for the local press. These events were:

- the kick-off meeting (2008);
- the launch of the International Human Microbiome Consortium (2008);
- the two Nature publications (2010, 2011);
- the conference organised in Paris in March 2012.

The first conference coincided with the first Nature paper, and both events were cited in the same communication. In all cases, the communication material was transmitted to the press, and national and local newspapers took up the content to varying degrees. For example, the project was covered in Le Figaro, Le Monde, Les Echos, Le Temps, The Independent, Business Week, El Mundo and La Repubblica, among others. Coverage in the national press also enabled us to reach more specialised outlets: popular science publications and press for specific audiences such as the biotechnology industry or clinicians.

The effectiveness of press coverage may also be measured by its repetition with each new event, which was generally the case from the first Nature paper onwards. Radio and TV channels also showed interest (Arte and, more recently, a Swiss television channel, to cite just two examples). Our aim is now to maintain the momentum and to continue these actions when the foreground generated during the project, not yet published, is released.

To complete this overview, one final action should be noted: communication with the volunteers enrolled by the clinicians for our studies. This communication is necessary and is part of physicians' work with patients and their families. It takes place during patient-physician appointments, but can also take the form of a general presentation. HUVH organised such a presentation this year, and it was very successful. The hospital conference room normally used for such occasions was crowded, which supports our view that such events are necessary in the framework of health projects. The variety of questions also reflected interest in the approach.

Exploitation of the results

One of the first steps towards exploiting the results is transferring knowledge to companies. In a field as novel as human metagenomics, this must be preceded by transmitting general information on the technique and its use. To facilitate this, METAHIT organised two meetings with a stakeholders' platform:

- The meetings lasted a single day, to make it easier for industry representatives to attend.
- They included presentations prepared specifically by consortium members, ranging from the latest results to general concepts, pitched so that non-specialists could follow them.
- The second meeting also included a form of speed-dating. This allowed less formal exchanges between consortium scientists and industry participants, where companies could raise questions revealing their interests in a less public setting.

The first meeting was held in Barcelona in 2009, the second in Milan in 2011. Participation was limited to between 15 and 20 guests, but included companies that are important stakeholders in the European food, biotechnology and pharmaceutical industries. Contacts were developed, and several bilateral projects were first discussed on these occasions. Given the usual lag between research and industrial development, such events should be seen as contributing steps towards the exploitation of foreground.

The next critical step consists in defining units that can be transferred to industry. In this respect, the consortium's activities are reflected in the four patent applications filed. Two of them are described in detail in the report; the other two are still confidential, and no data on them will be released until the applications are published. The applications matter in two ways. They demonstrate the ability to generate novel foreground in a form recognisable by the community, and they identify elements that can serve as a starting point for industrial development.

For human health, the products of the project fall into several categories, including diagnostic and prognostic biomarkers and functional food. In total, these fields were valued in 2012 at slightly more than EUR 150 billion, with annual growth between 5 and 15% depending on the specific target. The development of biomarkers, and of the technology developed in the field, opens the way to their use in personalised medicine. Valued at EUR 180 billion in 2012, personalised medicine represents the most important opportunity. This market is strongly affected by regulatory changes and by the growing need to analyse its components and their effects. Our present aim is to build activities towards the exploitation of our patent applications.

Industry's interest was already shown in the specific construction of the project. Our industrial partners did not receive Commission funding but, on the contrary, partly funded the contracted research. This interest was further confirmed when a third European company joined the consortium, through an amendment, under similar conditions. One outcome of the project is the use of the foreground generated by the partners in bilateral agreements with industry. Such projects cannot be reported on in a public report, but the individual partners have advertised their existence on various occasions. Each of these agreements is based on a specific application of the know-how gained through the project. Each addresses a specific interest of an industrial partner and applies a new approach to concerns that are both societal and industrial, in the fields of nutrition and health.

Two more examples of exploitation of the results can be briefly mentioned.

The first, announced during the 2012 conference, is the creation of a start-up company, Enterome, in connection with one of the project partners. The company builds on the know-how generated in the project. The EUR 7.5 million it raised in its Series A round highlights the value of the knowledge the approach can generate. The company specialises in developing new knowledge on the intestinal microbiota in specific diseases.

The second example is the creation of a platform dedicated to human metagenomics, and more specifically to the analysis of the intestinal microbiota and its interaction with the human host. The platform, named Metagenopolis, was launched in August 2012 with EUR 19 million of public funding (Investissement d'Avenir, French government). Its mission is to collaborate with the scientific community and industrial partners, with particular emphasis on the latter. Under the financial rules of the grant, the public funding should be matched by equivalent private funding, including through bilateral research projects. The platform is too new to report on, but the international experts who evaluated it judged the field sufficiently mature, and the know-how developed promising enough, to assess its chances of success positively.

Project website:

http://www.metahit.eu
Twitter: @metahit https://twitter.com/yowino
Facebook page: https://www.facebook.com/pages/metahit/195087858830
YouTube channel: http://www.youtube.com/user/microbiome
Netvibes dashboard: http://www.netvibes.com/metahit