
Integrated Computational Techniques for Non-Target Screening of Environmental Contaminants using High Resolution Mass Spectrometry

Final Report Summary - CONTAMINANTID (Integrated Computational Techniques for Non-Target Screening of Environmental Contaminants using High Resolution Mass Spectrometry)

Summary: Project context and Objectives
Water is essential for life, but many of the millions of chemicals produced end up, at some stage and in some form, in the water cycle. The general public is becoming increasingly aware of the presence of organic contaminants in the water cycle, but current water monitoring strategies cover only a very limited selection of “priority pollutants”, which is insufficient for public and environmental protection. Recent advances in high resolution LC-MS-based technologies make it possible to detect previously undetectable organic contamination in water samples, owing to increasing mass accuracy and mass resolution as well as decreasing detection limits. Thus, traditional target screening for a subset of defined substances can be complemented with “suspect screening”, which uses prior knowledge, and “non-target screening”, where no prior knowledge is available but which is also necessary for a full characterisation of the risks posed by contamination. The aim of this project was to improve the identification of organic contaminants in water matrices from monitoring programs by (i) developing an automated procedure for the identification of unknowns (non-target screening) using compound database searching and (ii) establishing the use of structure generation as an alternative to compound database searching for the (many) cases where database searching is insufficient. Although many large compound databases containing several million structures are available, many compounds of environmental relevance, such as transformation products of common contaminants, are not yet in these databases, necessitating the second strategy.

Description of Work and Results
The outcomes of the project can be broken down into four main areas, in line with the major publications that arose from this work. Firstly, spectral pre-processing methods were investigated to improve the quality of measurement data for identification. A software package, RMassBank, was developed for the automatic annotation and recalibration of mass spectra, including annotation of compound information and measurement data to enable upload to the public mass spectral database MassBank, available at www.massbank.jp and www.massbank.eu/MassBank/. During the project, experiments were also carried out on the storage of tentative and unknown spectra. The outcomes of this part of the project were published in the Journal of Mass Spectrometry (M.A. Stravs, E.L. Schymanski, H.P. Singer and J. Hollender, 2013, Automatic Recalibration and Processing of Tandem Mass Spectra Using Formula Annotation, 48 (1), 89-99. DOI: 10.1002/jms.3131) and the R package is openly available at the Bioconductor repository http://www.bioconductor.org/packages/devel/bioc/html/RMassBank.html. Furthermore, over 7000 spectra processed with RMassBank have now been added to MassBank, forming approximately 17 % of all MassBank spectra. Additionally, tentative and unknown spectra are stored on the European server, where they can now serve as supporting information for publications and thereby enhance the exchange of spectral information between institutes. The growing availability of spectra dramatically improves the potential for identification.
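RMassBank's recalibration relies on formula annotation: once fragment peaks have been assigned molecular formulas, the deviation between measured and calculated m/z can be modelled and removed. The sketch below illustrates that idea in Python under simplifying assumptions: a straight-line least-squares fit of the ppm error against m/z, and hypothetical function names (`ppm_error`, `fit_line`, `recalibrate`). RMassBank itself is an R package with its own fitting functions, so this is not its implementation.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; needs >= 2 distinct x values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recalibrate(peaks_mz: list[float],
                annotated: list[tuple[float, float]]) -> list[float]:
    """Correct m/z values using a ppm-error line fitted to annotated peaks.

    annotated holds (observed_mz, theoretical_mz) pairs, where theoretical_mz
    is the exact mass of the formula assigned to that peak (assumed input).
    """
    xs = [obs for obs, _ in annotated]
    ys = [ppm_error(obs, theo) for obs, theo in annotated]
    intercept, slope = fit_line(xs, ys)
    corrected = []
    for mz in peaks_mz:
        ppm = intercept + slope * mz  # predicted error at this m/z
        # Invert observed = theoretical * (1 + ppm * 1e-6).
        corrected.append(mz / (1 + ppm * 1e-6))
    return corrected
```

A straight line is only the simplest choice here; in real spectra the mass error can vary non-linearly with m/z, in which case a smoother curve fitted to many annotated peaks would be more appropriate.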

Secondly, progress was made towards the prioritisation of peaks for non-target identification, to improve the chances of successful identification. Prioritisation was performed over 10 samples, selecting masses for identification by sorting by cumulative intensity, i.e. treating the most intense peaks not yet identified as the most relevant. This led to the development of a very successful screening strategy that was simple to apply and effective in identifying many peaks as surfactants, which are ubiquitous and present in relatively high concentrations but are seldom included as target compounds. It is therefore important to identify these “known” compounds before performing a full, time-consuming non-target identification. This investigation also yielded a confirmed identification of a non-target compound. The prioritisation and characterisation efforts were published in the leading environmental journal Environmental Science and Technology: E.L. Schymanski, H.P. Singer, P. Longrée, M. Loos, M. Ruff, M.A. Stravs, C. Ripollés Vidal, J. Hollender, 2014, “Strategies to characterize polar organic contamination in wastewater: Exploring the capability of high resolution mass spectrometry”, 48 (3), 1811-1819, DOI: 10.1021/es4044374.
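The cumulative-intensity prioritisation described above can be sketched as follows. This is a minimal illustration, assuming peaks have already been detected and aligned across samples into features with a shared identifier; the function `prioritise` and its input layout are hypothetical and not the workflow used in the published study.

```python
from collections import defaultdict


def prioritise(features, identified, top_n=None):
    """Rank unidentified features by intensity summed over all samples.

    features: iterable of (feature_id, sample_id, intensity) tuples.
    identified: set of feature_ids already identified (e.g. targets, surfactants).
    Returns (feature_id, cumulative_intensity) pairs, most intense first.
    """
    totals = defaultdict(float)
    for feature_id, _sample_id, intensity in features:
        totals[feature_id] += intensity
    candidates = [(fid, total) for fid, total in totals.items()
                  if fid not in identified]
    # Sort by cumulative intensity, descending; ties broken by feature id.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates if top_n is None else candidates[:top_n]
```

Excluding already-identified features before ranking reflects the point made above: abundant "known" compounds should be cleared first so that identification effort goes to genuinely unknown peaks.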

Thirdly, the quest for improved identification and comparability of identification methods led to the founding of CASMI, the Critical Assessment of Small Molecule Identification. The inaugural CASMI took place in 2012 and was organised by Steffen Neumann and Emma Schymanski. The idea behind CASMI was to start an open contest on the identification of small molecules from mass spectrometry data, to facilitate the exchange of ideas between metabolomics, environmental sciences and other fields and to allow a systematic comparative evaluation of alternative strategies using a consistent set of data provided by the organisers. All contest details are available on the CASMI website (http://casmi-contest.org). The 2012 contest was published in an open access special issue of Metabolites (http://www.mdpi.com/journal/metabolites/special_issues/CASMI), which included articles summarising the whole contest (Schymanski & Neumann, 2013, CASMI: And the winner is…, 3 (2), 412-439. DOI: 10.3390/metabo3020412) as well as a submission using structure generation for the identification of unknowns (Meringer & Schymanski, 2013, Small Molecule Identification with MOLGEN and Mass Spectrometry, Metabolites, 3 (2), 440-462. DOI: 10.3390/metabo3020440). The 2013 contest was organised by Prof. Takaaki Nishioka and included a submission using compound database query strategies to identify unknowns (see http://casmi-contest.org/2013/results-cat2.shtml), with good results for queries to large compound databases. A corresponding publication has been submitted.

Finally, progress was made towards improving the identification of unknowns using structure generation, for compounds that are not in any public database. A book was published by De Gruyter in 2013, entitled Mathematical Chemistry and Chemoinformatics: Structure Generation, Elucidation and Quantitative Structure-Property Relationships, by A. Kerber, R. Laue, M. Meringer, C. Rücker, and E. Schymanski. Structure generation techniques were also used to identify the transformation products of benzotriazoles, which are among the contaminants present at the highest concentrations in wastewater effluents. This investigation was likewise published in Environmental Science and Technology (Huntscha, S., Hofstetter, T.B., Schymanski, E.L., Spahr, S., Hollender, J. (2014): “Biotransformation of Benzotriazoles: Insights from Transformation Product Identification and Compound-Specific Isotope Analysis”, Environmental Science and Technology, 48 (8), 4435–4443. DOI: 10.1021/es405694z). Lastly, the difficulty of communicating tentative identification results in both spectral libraries and the literature led to the viewpoint article “Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence” (Schymanski et al. 2014, Environmental Science and Technology, DOI: 10.1021/es5002105) and to an invitation to participate in a similar article for the metabolomics field (Creek et al. 2014, Metabolite Identification: are you sure? And how do your peers gauge your confidence?, Metabolomics, DOI: 10.1007/s11306-014-0656-8).

Expected Final Results and Potential Impact
This is the final reporting period for the project. In addition to the results mentioned above, three further publications based on results from this project are in preparation: one as first author and two as co-author. The CASMI contest, started during the project, will also run in 2014, and the interest this contest has generated shows the cross-disciplinary need for improved identification methods in high-throughput analysis. Interest in non-target screening in the environmental community is currently very high, with the NORMAN Association of Environmental Monitoring Laboratories rating it as the highest priority activity for 2014. The NORMAN Collaborative Non-target Screening trial grew out of the CASMI contest idea, and a workshop discussing the results of this trial will be held at Eawag in September 2014. Thus, the impact of the research in this project will continue to develop beyond the end of the project. Furthermore, development of the methods will continue within the framework of the EU FP7 project “SOLUTIONS for Present and Future Emerging Pollutants in Land and Water Resources Management” (www.solutions-project.eu).
The spectra associated with these investigations are available online on MassBank, along with the spectral processing software RMassBank. The suspect lists associated with the screening work are also publicly available as supporting information to the corresponding paper. The whole CASMI contest, including the data, proceedings and website, is also openly accessible.

Website and Contact Details
Contact details and information about the research covered in this project can be found at the department website: http://www.eawag.ch/forschung/uchem/index_EN or at the direct link http://www.eawag.ch/forschung/uchem/schwerpunkte/projektuebersicht/projekt64/index_EN