Standardising phenotype vocabulary

Scientific advances, particularly in molecular biology, have generated colossal amounts of data that are described using different phenotype forms, leaving biological results fragmented. EU funding supported efforts to harmonise this vocabulary.


Many efforts to construct standard phenotype vocabularies are constrained by the vast quantity and complexity of data in the primary literature. Semantic alignment of clinical and biomedical data resources on heritable diseases would benefit researchers by facilitating data integration. To address this need, the PHENOMINER (Semantic mining of phenotype associations from the biomedical literature) project aimed to combine state-of-the-art text processing solutions with existing ontological resources. The mined data could then be integrated into a machine-understandable semantic representation and accessed via a public database.

PHENOMINER successfully mined phenotype descriptions from scientific literature stored in Europe PubMed Central and used data-mining technology to find statistical associations with Mendelian diseases. The Online Mendelian Inheritance in Man (OMIM) database and the Human Phenotype Ontology were among the resources used for benchmarking. Overall, the project identified 4 898 phenotypes and 28 155 phenotype-disorder associations, which were found to be on par with these gold standards in terms of quality.

Team members generated a semantic database of automatically mined phenotypes and phenotype-disorder associations, available in two public open access repositories: GitHub and Zenodo. The PHENOMINER techniques make it possible to determine commonly used phenotype forms and novel associations with OMIM disorders. The project resulted in 13 publications in journals and conference proceedings, and several knowledge transfer activities were also undertaken.

The PHENOMINER approach and database are relevant to life scientists and clinicians involved in translational studies, as well as to bioinformaticians and database curators. Standardised phenotypic vocabularies could prove instrumental in discovering new therapies for diseases such as Alzheimer's and multiple sclerosis. Moreover, this hybrid approach could also prove useful in human language technologies, e-science and information retrieval.
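
The article does not say which statistical measure PHENOMINER used to link phenotypes to disorders. As a rough illustration of the general idea only, the sketch below scores phenotype-disorder pairs by pointwise mutual information (PMI) over document co-occurrence. The function name `pmi_associations`, the `min_count` threshold and the toy labels are all hypothetical and are not taken from the project.

```python
import math
from collections import Counter
from itertools import product


def pmi_associations(docs, min_count=2):
    """Score phenotype-disorder pairs by pointwise mutual information.

    docs: list of (phenotypes, disorders) tuples, each a set of labels
    mentioned in one document. This is an illustrative measure, not the
    method documented for PHENOMINER.
    """
    n_docs = len(docs)
    pheno_counts = Counter()
    disorder_counts = Counter()
    pair_counts = Counter()

    for phenotypes, disorders in docs:
        # Each label is counted at most once per document.
        pheno_counts.update(phenotypes)
        disorder_counts.update(disorders)
        pair_counts.update(product(phenotypes, disorders))

    scores = {}
    for (pheno, disorder), together in pair_counts.items():
        if together < min_count:
            continue  # ignore pairs seen too rarely to be informative
        expected = pheno_counts[pheno] * disorder_counts[disorder] / n_docs
        scores[(pheno, disorder)] = math.log2(together / expected)
    return scores


# Toy, made-up corpus: labels are placeholders, not real mined data.
docs = [
    ({"seizure"}, {"disorder A"}),
    ({"seizure"}, {"disorder A"}),
    ({"seizure", "short stature"}, {"disorder B"}),
    ({"short stature"}, {"disorder B"}),
]
scores = pmi_associations(docs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the pair co-occurs more often than chance would predict from the individual frequencies. A real pipeline would also need entity recognition, ontology mapping and significance testing, none of which this sketch covers.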


Phenotype, phenotype vocabulary, biomedical, semantic mining, ontological resources
