Functional annotation of microbial communities in human health and in the environment

Final Report Summary - SIFTERMETAGENOMICS (Functional annotation of microbial communities in human health and in the environment)

Statistical inference of function through evolutionary relationships (SIFTER) is a tool for automated, phylogeny-based protein function prediction. Starting from a reconciled evolutionary tree decorated with known functions, SIFTER infers molecular functions for all proteins in a protein family, using Bayesian inference and a simple model of protein function evolution. In this model, every molecular function can evolve into every other function seen in the family, and the likelihood of change depends on the estimated mutability of the pair of molecular functions, the tree distance, and whether the proteins are related by speciation or duplication events. SIFTER thus uses all members of a protein family and their detailed evolutionary relationships to provide a statistical estimate of function for each protein in the family. To test the functionality of the programme, we participated in the 'Critical assessment of function prediction' (CAFA) experiment in 2011. SIFTER was among the top ten performing algorithms, as shown in the CAFA results published in 'Nature Methods' in 2013. The SIFTER project provided the researcher with the planned training opportunities to study computer science and learn to program, and allowed her to deepen her knowledge of phylogenetic analysis and molecular evolution in general.
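The function-evolution model described above can be illustrated with a minimal sketch. It is not SIFTER's actual implementation or parameterisation: the function names, the tree, the branch lengths and the rates in `RATE` are all hypothetical, and every pair of functions is assumed to be equally mutable, whereas SIFTER estimates a mutability per pair. The sketch only shows the general idea that a function is more likely to change over long branches and after duplication events, and that the posterior for an unannotated protein can be obtained by summing over the unknown functions of the ancestors.

```python
import math

# Hypothetical molecular functions observed in one protein family.
FUNCTIONS = ["kinase", "phosphatase"]

# Hypothetical rates of functional change per unit branch length.
# Assumption: function changes faster after duplication than after speciation.
RATE = {"speciation": 0.1, "duplication": 1.0}


def transition(parent_fn, child_fn, length, event):
    """P(child has child_fn | parent has parent_fn) for one branch.

    Equal-rates Markov model (an assumption of this sketch): all pairs of
    functions are treated as equally mutable.
    """
    k = len(FUNCTIONS)
    decay = math.exp(-RATE[event] * length)
    if parent_fn == child_fn:
        return 1.0 / k + (1.0 - 1.0 / k) * decay
    return (1.0 - decay) / k


def partial_likelihood(node, tree, observed):
    """Return {fn: P(annotations below node | node has fn)} (pruning pass)."""
    children = tree.get(node, [])
    if not children:  # leaf protein
        if node in observed:
            return {f: 1.0 if f == observed[node] else 0.0 for f in FUNCTIONS}
        return {f: 1.0 for f in FUNCTIONS}  # unannotated leaf: no evidence
    child_info = [
        (partial_likelihood(child, tree, observed), length, event)
        for child, length, event in children
    ]
    result = {}
    for f in FUNCTIONS:
        p = 1.0
        for like, length, event in child_info:
            p *= sum(transition(f, g, length, event) * like[g] for g in FUNCTIONS)
        result[f] = p
    return result


def posterior(query, root, tree, observed):
    """Posterior over functions for one unannotated leaf, uniform root prior."""
    scores = {}
    for f in FUNCTIONS:
        clamped = dict(observed)
        clamped[query] = f  # joint probability of the evidence and query = f
        like = partial_likelihood(root, tree, clamped)
        scores[f] = sum(like[g] for g in FUNCTIONS) / len(FUNCTIONS)
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()}


# Toy reconciled tree: node -> [(child, branch length, event at this node)].
TREE = {
    "root": [("n1", 1.0, "duplication"), ("n2", 1.0, "duplication")],
    "n1": [("A", 0.5, "speciation"), ("Q", 0.5, "speciation")],
    "n2": [("B", 0.5, "speciation"), ("C", 0.5, "speciation")],
}
OBSERVED = {"A": "kinase", "B": "phosphatase", "C": "phosphatase"}

if __name__ == "__main__":
    print(posterior("Q", "root", TREE, OBSERVED))
```

In this toy family the unannotated protein `Q` is related to the annotated kinase `A` by speciation and to the two phosphatases only through a duplication, so the posterior favours the kinase function even though phosphatases are in the majority.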

Crohn's disease is a chronic inflammatory disease of the digestive tract, and its incidence has increased over the past decades. It is now widely accepted that Crohn's disease involves an imbalance in the intestinal microbial ecosystem, and the commensal microbiota appear to play a key role in disease onset and progression. Some Crohn's disease patients require surgical resection of the terminal ileum to remove evident inflammation; however, there is a risk of recurrence after surgery. To investigate whether remission or recurrence of inflammation is related to differences in the composition of the gut microbiota, we profiled the gut microbiota of Crohn's disease patients and healthy individuals. We performed 16S ribosomal profiling of 6 samples from Crohn's patients undergoing resection of diseased terminal ileum, and looked for changes associated with remission or recurrence of inflammation. We additionally characterised the mucosal composition of 40 Crohn's and control patients undergoing surgery or colonoscopy. The profiling shows that, at the time of surgery, the microbiota of Crohn's patients who remained in remission were more similar to those of the controls than were the microbiota of Crohn's patients who subsequently suffered a recurrence of the disease. Further, we observed that patients who remained in remission exhibited greater stability of the microbiota over time. Based on these observations, we suggest that profiling the gut microbiota may be useful in planning the treatment of Crohn's disease patients who undergo surgery.
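One common way to quantify how similar a patient's microbiota is to that of controls is a community dissimilarity measure computed on relative abundances. The report does not state which measure was used, so the sketch below uses Bray-Curtis dissimilarity purely as an assumption; the taxon labels and counts are invented for illustration and are not data from the study.

```python
def relative_abundance(counts):
    """Convert raw read counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    0 means identical profiles, 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    total = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return diff / total if total else 0.0


def mean_distance_to_controls(sample, controls):
    """Average Bray-Curtis dissimilarity from one sample to all controls."""
    return sum(bray_curtis(sample, c) for c in controls) / len(controls)


# Invented 16S read counts (illustration only, not study data).
controls = [
    relative_abundance({"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}),
    relative_abundance({"taxon_A": 40, "taxon_B": 40, "taxon_C": 20}),
]
patient_x = relative_abundance({"taxon_A": 45, "taxon_B": 35, "taxon_C": 20})
patient_y = relative_abundance({"taxon_A": 10, "taxon_B": 10, "taxon_C": 80})

if __name__ == "__main__":
    print(mean_distance_to_controls(patient_x, controls))
    print(mean_distance_to_controls(patient_y, controls))
```

In this toy example `patient_x` lies much closer to the controls than `patient_y` does. The same function applied to two samples taken from one patient at different time points would give a simple measure of stability over time.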

The researcher developed and organised the first two instances of the Critical assessment of genome interpretation (CAGI) experiment in 2010 and 2011. CAGI is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. In this assessment, participants are given genetic variants and predict the resulting phenotypes. These predictions are evaluated against experimental characterisations by independent assessors. The CAGI experiment culminates in a community workshop and publications to disseminate results, assess our collective ability to make accurate and meaningful phenotypic predictions, and better understand progress in the field. A long-term goal for CAGI is to improve the accuracy of phenotype and disease predictions in clinical settings. The CAGI website with experiment pages can be found at

The results of CAGI 2010 and 2011 show that the most widely used methods of genome interpretation do not necessarily provide the best predictions, while newer methods offer advantages that should be taken into account when applying genome interpretation methods in basic and clinical research. CAGI also explored genome-scale data, showing unexpected successes in predicting Crohn's disease from exomes, as well as disappointing failures in using genome and transcriptome data to distinguish discordant monozygotic twins with asthma. Predictors showed promising complementary approaches in predicting the distinct responses of breast cancer cell lines to a panel of drugs. Predictors also made measurable progress in predicting a diversity of phenotypes present in the Personal Genome Project participants. In all, the results from CAGI will help the broader community, such as clinicians and genetic counsellors, understand the appropriate level of confidence they should have in variant prediction methods, and which classes of approaches are most suitable for a particular application. The manuscript on the CAGI results is currently in preparation.

With regard to the training objectives of the fellowship, the CAGI project taught the researcher essential skills. She gained invaluable training in project coordination, communication and collaboration with international research groups, and the international research contacts made while organising CAGI will be very valuable for her future career.