CORDIS - EU research results
Content archived on 2024-06-18

New methods to evaluate the impact of single point protein mutation on human health

Final Report Summary - MUT2DIS (New methods to evaluate the impact of single point protein mutation on human health)

Project objectives

The main aims of our proposal are:
-i. Study and characterisation of the rate of evolution of single nucleotide polymorphisms (SNPs) and their effect on human disease;
-ii. Study and characterisation of the structural determinants of human disease;
-iii. Development of new general machine learning methods for disease prediction;
-iv. Development of disease-specific predictors;
-v. Development of a World Wide Web server for predicting the likelihood that a SNP variant is associated with human disease.

These five aims correspond to six different tasks to be accomplished in 36 months, all of which were achieved.

Work performed

During the first 24 months of the project (outgoing phase), Dr Emidio Capriotti selected a set of annotated missense single nucleotide variants (mSNVs) from the SwissVar database. The dataset used in this work was downloaded at the end of October 2009. To select the subset of mSNVs for which the three-dimensional structure of the protein is known, the fellow implemented programmes that automatically compare the sequences of the mutated proteins with the sequences of the proteins collected in the Protein Data Bank (PDB).
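
The sequence comparison step can be illustrated with a minimal Python sketch. It is not the fellow's programme: the function name `map_position` and the toy sequences are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for a proper sequence alignment tool. The idea is to find where a PDB chain matches the full protein sequence and to translate the variant position into the chain numbering.

```python
from difflib import SequenceMatcher


def map_position(protein_seq, chain_seq, pos):
    """Map a 1-based position in protein_seq to a 1-based index in chain_seq.

    Returns None when the residue is not covered by any identical block
    shared by the two sequences (hypothetical helper, for illustration).
    """
    matcher = SequenceMatcher(None, protein_seq, chain_seq, autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a / block.b are 0-based starts in protein_seq / chain_seq
        if block.a <= pos - 1 < block.a + block.size:
            return block.b + (pos - 1 - block.a) + 1
    return None


# Toy data (made up): the chain covers residues 4-16 of the protein.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
chain = protein[3:16]

print(map_position(protein, chain, 10))  # residue covered by the structure
print(map_position(protein, chain, 2))   # residue outside the structure
```

A real pipeline would also need to tolerate mismatches and gaps between the database sequence and the crystallised construct, which an exact-block matcher like this one does not handle.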

In the second phase, the fellow performed an evolutionary analysis, calculating the selective pressure acting at the codon level from alignments between human DNA sequences and their homologs in other mammalian species.
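
The report does not state which estimator of selective pressure was used. As a rough illustration only, the Python sketch below counts, for each codon of a human coding sequence, how many aligned homologs differ by a synonymous or a nonsynonymous change. The function names and the toy sequences are assumptions, and raw counts of this kind are only a crude proxy: a proper dN/dS estimate normalises by the number of synonymous and nonsynonymous sites and usually relies on a phylogenetic model.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def count_changes(human_cds, homolog_cds_list):
    """Per-codon counts of synonymous and nonsynonymous differences from human.

    All sequences are assumed to be codon-aligned to the human sequence.
    """
    human = codons(human_cds)
    counts = [{"syn": 0, "nonsyn": 0} for _ in human]
    for cds in homolog_cds_list:
        for i, (h, o) in enumerate(zip(human, codons(cds))):
            if h == o or h not in CODON_TABLE or o not in CODON_TABLE:
                continue  # identical, gapped or ambiguous codon
            kind = "syn" if CODON_TABLE[h] == CODON_TABLE[o] else "nonsyn"
            counts[i][kind] += 1
    return counts


# Toy alignment (made up): Met-Ala-Lys in human, three homologs.
human = "ATGGCTAAA"
homologs = ["ATGGCCAAA", "ATGGCTAGA", "ATGGCAAAG"]
for index, c in enumerate(count_changes(human, homologs), start=1):
    print(index, c)
```

Codons with many synonymous but few nonsynonymous differences are candidates for positions under purifying selection, which is the kind of signal a disease predictor can exploit.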

In successive steps, Dr Capriotti built a machine-learning-based approach to predict the impact of mSNVs and evaluated the discriminative power of different features. These algorithms included features from sequence analysis, such as evolutionary and functional information, together with protein structure information. In the last period, a disease-specific method was developed to predict cancer-causing mSNVs.

Project results

During the last 12 months of the project (returning phase), Dr Emidio Capriotti developed a new method for the prediction of disease-specific mutations, focusing on cancer. In addition, he implemented two different web servers for the prediction of disease-related variants. In detail, he selected a manually curated set of cancer driver mSNVs, previously used to train another method (Carter et al., Cancer Research 2009), and analysed it by performing a sequence analysis of the mutated proteins. For each protein, the sequence profile was calculated from similar proteins retrieved with the BLAST algorithm. Using all the sequence information previously calculated, the researcher developed a machine-learning approach to discriminate between cancer-causing and neutral variants. For this particular task only sequence information was used, because the number of cancer mutations for which a protein three-dimensional structure was available was too small to train a machine-learning method. Finally, the fellow implemented two web servers: the first, more general, for the prediction of disease-related mSNVs, and the second, more specific, for the detection of cancer-causing mSNVs.
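
To make the notion of a sequence profile concrete, the following Python sketch derives two simple features for a variant from a set of aligned homologous sequences: the frequency of the wild-type residue and the frequency of the mutant residue at the mutated position. This is an assumed, simplified feature set; the function names and the toy alignment are hypothetical, and the report does not list the exact features given to the predictor.

```python
from collections import Counter


def column_profile(aligned_seqs, pos):
    """Relative residue frequencies at 1-based alignment column pos, gaps ignored."""
    column = [s[pos - 1] for s in aligned_seqs if s[pos - 1] != "-"]
    total = len(column)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(column).items()}


def variant_features(aligned_seqs, pos, wild, mutant):
    """Profile-derived features for the substitution wild -> mutant at pos."""
    profile = column_profile(aligned_seqs, pos)
    return {
        "freq_wild": profile.get(wild, 0.0),
        "freq_mutant": profile.get(mutant, 0.0),
    }


# Toy alignment (made up); in practice the rows would come from BLAST hits
# aligned to the query protein.
alignment = ["MKTA", "MKSA", "MRTA", "M-TA"]
print(variant_features(alignment, 3, "T", "S"))  # mutant seen in homologs
print(variant_features(alignment, 3, "T", "W"))  # mutant never observed
```

A substitution to a residue that never occurs at a conserved position is, intuitively, more likely to be damaging than one to a residue already tolerated in homologs.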

During the outgoing phase the fellow achieved the first three aims of the project. In particular, the main discriminative features derived from the protein sequence profile and the protein structure were defined. This enabled the development of a new machine-learning-based method for the prediction of deleterious variants, which takes as input information from the protein sequence profile, protein function and protein structure. The improvement in prediction accuracy resulting from the use of structural information was quantified by comparing the structure-based method with a similar sequence-based tool.
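
A comparison of this kind is typically made by scoring both predictors on the same annotated variants. The Python sketch below computes overall accuracy and the Matthews correlation coefficient for two sets of binary predictions; the choice of these two metrics, the function name `evaluate` and the labels are assumptions for illustration and do not reproduce the project's results.

```python
from math import sqrt


def evaluate(y_true, y_pred):
    """Return (accuracy, Matthews correlation) for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Made-up labels and predictions, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
sequence_based = [1, 1, 0, 0, 0, 1]
structure_based = [1, 1, 1, 0, 0, 1]

for name, preds in [("sequence", sequence_based), ("structure", structure_based)]:
    acc, mcc = evaluate(labels, preds)
    print(f"{name}: accuracy={acc:.2f} mcc={mcc:.2f}")
```

The correlation coefficient is reported alongside accuracy because it remains informative when disease and neutral variants are unevenly represented in the test set.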

The research activity performed during the returning phase achieved all the aims described in our proposal. It demonstrated that, for diseases for which a sufficient number of annotated mutations are available, it is possible to build disease-specific predictors. We tested this hypothesis in the case of cancer-causing mSNVs, showing that the disease-specific method performs better than the general method. In addition, we implemented a web-available version of the method that the scientific community can use to evaluate possibly deleterious mutations in humans.

Expected final results and potential impact

By the end of the project, we had developed a user-friendly web server interface for the prediction of the effect of mSNVs. The web tools implemented include a general method for the detection of disease-related variants, which uses both protein sequence and structure information, and a cancer-specific algorithm that takes only sequence information into account. A web server implementation of the cancer-specific method has been made available. In conclusion, we demonstrated that structural information is important for improving the prediction of deleterious variants. When structural information is not available but a good set of mutations has been annotated, functional information is important for improving the performance of predictors on a specific class of diseases. We believe that in the near future, when more mSNV data are available, disease-specific methods will be key to developing more accurate algorithms and to understanding disease mechanisms. More details about the project are available at http://snps.uib.es/mut2dis/