CORDIS - EU research results

Big Data of Small Molecules

Periodic Reporting for period 1 - BDoSM (Big Data of Small Molecules)

Reporting period: 2015-05-01 to 2015-10-31

HighChem develops mass spectral databases and software systems for compound identification by mass spectrometry, and provides informatics solutions for a wide range of life science industries. Compound identification remains one of the largest technical challenges in chemistry for both industrial and academic research. In response, HighChem has formulated an innovative technological concept for the identification and classification of unknown compounds that integrates software tools and databases not currently available on the market. The main objectives of the feasibility study were defined as follows:

1) Analysis of market segmentation
2) Freedom-to-operate analysis
3) Existing competitive solutions
4) Realisation assessment analysis

Based on the feasibility study, we concluded that the Big Data Platform of Small Molecules (BDoSM) concept is fully realisable technically, does not overlap with the registered intellectual property of others, is sufficiently distinct from competitive products on the market, and will directly fulfil the needs of the targeted customers.
The feasibility study ran from May to October 2015. To define clearly the required functionalities of the proposed Big Data Platform, the main activities included an analysis of market segmentation in the biomedical, pharmaceutical, biotechnology and life science industries, and the identification of the most critical problems associated with compound identification using mass spectrometry. Based on the functionality analysis, an extensive freedom-to-operate study, including a patent search and an IPR protection strategy, was conducted to ensure unrestricted development of the platform. The most relevant competitive solutions on the market were identified and screened for functional similarities. Finally, the technological concept was formulated and a small-scale prototype of the platform was launched to demonstrate the expected functionalities and to attract the attention of the relevant market segment.
The market analysis revealed that no comparable commercial solution exists with the scope and level of sophistication we intend. There is therefore a substantial need for a product that integrates a “big data” platform with modern front-end applications and responds directly to the needs of the biomedical, pharmaceutical, biotechnology and life science industries, which constantly detect numerous unknown compounds and search for new biomarkers and disease-specific agents. The features and functions of a compound identification platform requested by customers were defined as follows:

• Comprehensiveness of the reference spectral databases
• Intelligence of the employed search methods and algorithms
• Simplicity of operation
• Flexibility and scalability
• Data sharing potential

To demonstrate the viability of our concept for compound identification and the capability to manage big data in the cloud, we built a scaled-down web-based prototype platform, mzCloud (www.mzcloud.org), which has been tested by end users in a ‘real world’ environment. More than 5,000 spectral trees comprising in excess of 275,000 high-resolution mass spectra were curated, processed and uploaded to the prototype cloud platform, which was accessed by more than 55,000 visitors over a 12-month period. A prototype API (Application Programming Interface) was successfully integrated into the alpha version of our commercial desktop software, HighChem Mass Frontier™, and is now being implemented in the Compound Discoverer™ software developed by Thermo Fisher Scientific. The prototype has reliably identified significantly more endogenous metabolites in LC/MSn data than traditional techniques.
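
For readers unfamiliar with the term, a spectral tree can be pictured as a set of MSn spectra nested by fragmentation stage, where each deeper spectrum is obtained by fragmenting a precursor ion selected from its parent. The Python sketch below illustrates that idea only. The class name, fields and peak values are hypothetical and chosen for illustration; they are not the mzCloud schema, its API, or measured data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SpectrumNode:
    """One spectrum in a hypothetical spectral tree (not the mzCloud schema)."""
    ms_level: int                      # 1 for MS1, 2 for MS2, and so on
    precursor_mz: Optional[float]      # None for the MS1 root
    peaks: List[Tuple[float, float]]   # (m/z, relative intensity) pairs
    children: List["SpectrumNode"] = field(default_factory=list)

    def add_child(self, precursor_mz: float,
                  peaks: List[Tuple[float, float]]) -> "SpectrumNode":
        """Attach the spectrum obtained by fragmenting one precursor ion."""
        child = SpectrumNode(self.ms_level + 1, precursor_mz, peaks)
        self.children.append(child)
        return child

    def count_spectra(self) -> int:
        """Number of spectra in this subtree, including this one."""
        return 1 + sum(c.count_spectra() for c in self.children)

    def max_level(self) -> int:
        """Deepest MS stage reached anywhere in this subtree."""
        return max((c.max_level() for c in self.children),
                   default=self.ms_level)


def build_example_tree() -> SpectrumNode:
    """Build a three-spectrum tree from made-up, purely illustrative values."""
    root = SpectrumNode(1, None, [(200.0, 100.0)])
    ms2 = root.add_child(200.0, [(150.0, 100.0), (100.0, 40.0)])
    ms2.add_child(150.0, [(100.0, 100.0)])
    return root


tree = build_example_tree()
print(tree.count_spectra(), tree.max_level())
```

A recursive structure of this kind keeps every fragment spectrum attached to the precursor it came from, which is one plausible reason a reference library would store trees rather than a flat list of spectra.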

Socio-public impact:
Our objective is to create a platform that will not only considerably simplify compound identification and allow efficient management of large volumes of experimental data, but will also open up a whole new realm of revenue-generating opportunities for customers engaged in biomarker, drug, natural product and disease target discovery. Novel molecular biomarkers have the potential not only to enhance the lives of patients, but also to lower overall healthcare costs through early detection and personalised medicine. Compound identification is also crucial for reducing the high attrition rate and enhancing productivity across the various stages of the drug discovery and development process. The identification of leachables and extractables helps to ensure that the medicines we use every day are safe and free of potentially harmful contaminants. As natural products remain rich sources of novel drugs and are often used as the starting point for drug discovery, compound identification is a key element in pharmacological studies. Metabolomics is emerging as a promising method for disease target discovery, and it too relies on efficient and confident small molecule identification.

Economic impact:

The economic benefit for a prospective customer is a reduction in the costs of acquiring and operating the hardware and software infrastructure needed for mass spectrometry analysis, as well as savings on expensive human resources. Every analysis in mass spectrometry consists of two main steps: data acquisition and interpretation. Using the Big Data Platform, customers essentially outsource the interpretation stage by uploading the raw experimental data, selecting the desired identification method, and finally reviewing the results from the Big Data Platform applications. Moreover, the customer finds all the necessary structural and spectral databases in one place, together with a wide selection of software tools for their management, processing, analysis and visualisation, all in one integrated package.