Standardising phenotype vocabulary

Scientific advances, particularly in molecular biology, have generated vast amounts of data described using differing phenotype terms, leaving biological findings fragmented. EU funding supported efforts to harmonise this vocabulary.

Health

Many efforts to build standard phenotype vocabularies are hampered by the vast quantity and complexity of data in the primary literature. Semantically aligning clinical and biomedical data resources on heritable diseases would benefit researchers by making data integration easier. To address this need, the PHENOMINER (Semantic mining of phenotype associations from the biomedical literature) project set out to combine state-of-the-art text processing with existing ontological resources. The mined data could then be integrated into a machine-understandable semantic representation and made accessible through a public database.

PHENOMINER successfully mined phenotype descriptions from the scientific literature stored in Europe PubMed Central and, using data-mining techniques, identified statistical associations between these phenotypes and Mendelian diseases. The Online Mendelian Inheritance in Man (OMIM) database and the Human Phenotype Ontology were among the resources used for benchmarking. In total, 4 898 phenotypes and 28 155 phenotype-disorder associations were mined, and their quality was found to be on par with these gold standards. The team generated a semantic database of the automatically mined phenotypes and phenotype-disorder associations, available in two public open access repositories: GitHub and Zenodo. The PHENOMINER techniques made it possible to determine commonly used phenotype forms and to identify novel associations with OMIM disorders.

The project produced 13 publications in journals and conference proceedings, and several knowledge transfer activities were also undertaken. The PHENOMINER approach and database are relevant to life scientists and clinicians involved in translational studies, as well as to bioinformaticians and database curators. Standardised phenotype vocabularies could prove instrumental in discovering new therapies for diseases such as Alzheimer's and multiple sclerosis.
This hybrid approach could also prove useful in human language technologies, e-science and information retrieval.

Project Information

PHENOMINER

Grant agreement ID: 301806

Project closed

Start date 1 November 2012

End date 30 December 2014

Funded under

Specific programme "People" implementing the Seventh Framework Programme of the European Community for research, technological development and demonstration activities (2007 to 2013)

Total cost

€ 278 807,40

EU contribution

€ 278 807,40

Coordinated by

EUROPEAN MOLECULAR BIOLOGY LABORATORY
Germany