Contextual Text Mining from the Biomedical Scientific Literature

Final Report Summary - BIOLITCONTEXTMINING (Contextual Text Mining from the Biomedical Scientific Literature)

The objectives of the BioLitContextMining project were threefold: to design methods that extract relationships among biomedical entities from text, together with their local (e.g. interaction type) and non-local (e.g. experimental method) context information; to build web-based systems that make the extracted information accessible to biomedical scientists; and to design knowledge discovery methods that utilize contextual information.

Novel methods that advance the state of the art in relation extraction, local and non-local context information extraction, and knowledge discovery have been developed. The contributions can be summarized as follows:
(i) An Interaction Network Ontology (INO), which collects and classifies over 800 interaction keywords, has been developed and extended to cover complex interaction types.
(ii) A literature-mining approach using INO has been developed to identify and characterize the interactions among host and Brucella genes.
(iii) A novel relation and local context information extraction method has been introduced to identify the relations among brain regions.
(iv) Novel methods have been developed for identifying non-local context, specifically the experimental methods used to detect protein-protein interactions, in full-text articles.
(v) New ontology-centered methods based on linguistic analysis of the text have been developed for extracting bacteria context information, in particular the habitats in which the bacteria live.
(vi) The project contributed to the development of two web-based systems, IGNET (http://ignet.hegroup.org) and PHISTO (http://www.phisto.org), which provide easy access to the extracted information.
(vii) A knowledge discovery approach based on centrality and ontology-based network discovery using literature data (CONDL) has been developed, integrated with IGNET, and used in a case study to identify fever- and vaccine-associated gene interaction networks.
(viii) A modified Fisher's exact test has been established for knowledge discovery, to identify gene-gene interaction types that are significantly over- or under-represented within a specific area.
(ix) New methods for drug-target interaction prediction based on ligand similarity and network analysis have been developed.
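The idea behind contribution (viii) can be illustrated with a small sketch. The code below implements the standard Fisher's exact test on a 2x2 contingency table, not the modified test developed in the project, whose details are not given in this summary. The function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def _table_prob(x, row1, col1, n):
    """Hypergeometric probability of seeing x in the top-left cell."""
    return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)


def fisher_tails(a, b, c, d):
    """One-sided p-values for the 2x2 table [[a, b], [c, d]].

    a: interactions of the given type inside the area of interest
    b: interactions of other types inside the area
    c: interactions of the given type outside the area
    d: interactions of other types outside the area
    Returns (p_over, p_under): the probability of a count at least as
    large as a, and of a count at most as large as a, under independence.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_over = sum(_table_prob(x, row1, col1, n) for x in range(a, hi + 1))
    p_under = sum(_table_prob(x, row1, col1, n) for x in range(lo, a + 1))
    return p_over, p_under


def fisher_two_sided(a, b, c, d):
    """Two-sided p-value: sum of all tables no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = _table_prob(a, row1, col1, n)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (_table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: an interaction type seen in 30 of 100 interactions
# inside the area of interest and in 200 of 2000 interactions outside it.
p_over, p_under = fisher_tails(30, 70, 200, 1800)
print(f"over-represented p = {p_over:.3g}, under-represented p = {p_under:.3g}")
```

A small p_over suggests the interaction type is over-represented in the area, and a small p_under suggests it is under-represented. In practice, testing many interaction types at once would also call for a multiple-testing correction, which this sketch omits.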

Long-lasting collaborations with researchers in Europe and the USA have been established. The novel methods developed during the project and their results have been published in eight peer-reviewed journal papers (Bioinformatics, BMC Bioinformatics (two papers), Biomedical Semantics (two papers), Frontiers in Microbiology (two papers), PLoS ONE), as well as six peer-reviewed conference and workshop papers. Three further journal papers are under review. In addition, the researcher and her team (BOUN) participated in community-wide shared tasks, including the Bacteria Biotopes sub-task of the BioNLP Shared Task in 2013 (http://2013.bionlp-st.org/tasks/bacteria-biotopes) and 2016 (http://2016.bionlp-st.org/tasks/bb2/bb3-evaluation), as well as the BioC Task of the BioCreative V challenge in 2015 (http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/).

The researcher has been an assistant professor in the Department of Computer Engineering of Bogazici University since December 2011. She has fulfilled the requirements and submitted her application for promotion to associate professor. She has designed and taught three new graduate-level courses: Bioinformatics, Natural Language Processing, and Information Retrieval. She co-founded the Text Analytics and Bioinformatics (TABI) Research Lab at Bogazici University (http://tabilab.cmpe.boun.edu.tr). Currently, she advises five PhD and five MS students. She received a 2016 Science Academy Young Scientist Award (BAGEP 2016, Turkey, http://en.bilimakademisi.org/bagep-2016/).

Project web page: http://www.cmpe.boun.edu.tr/~ozgur/projects/biocontext.html