CORDIS - EU research results

DeepNOE: Leveraging deep learning for protein structure solving at ultra-high resolution on the basis of NMR measurements with exact nuclear Overhauser enhancement

Periodic Reporting for period 1 - DeepNOE (DeepNOE: Leveraging deep learning for protein structure solving at ultra-high resolution on the basis of NMR measurements with exact nuclear Overhauser enhancement)

Reporting period: 2020-03-01 to 2022-02-28

The development of new techniques that facilitate protein structure studies is one of the major undertakings in molecular biology. All key techniques, namely X-ray crystallography (X-ray), Electron Microscopy (EM) and Nuclear Magnetic Resonance (NMR) spectroscopy, have contributed extensively to our fundamental understanding of the biological processes that govern the organization of living organisms at the molecular level. Although these methods have been used successfully in research for decades, major open problems remain, which offer an opportunity for further improvement.

In this research project, we placed full emphasis on NMR spectroscopy. This technique has certain characteristics that make it indispensable in molecular biology research. It allows protein structures to be solved in nearly physiological conditions, and provides insight into dynamics and interactions at the atomic level. Additionally, NMR spectroscopy features high-precision interatomic distance measurements (up to 0.1 Å), making it possible to reveal multiple simultaneously populated conformational states of a protein.

One of the main limitations of NMR spectroscopy is the tedious data analysis process. It takes weeks or months of a trained expert’s work to transform the set of measured spectra into a protein structure model. This bottleneck not only reduces the throughput of the experimental work, but also makes certain studies prohibitively expensive. Automation of this process is an open problem that was formulated in the field more than 30 years ago. Its solution could emerge as a powerful tool for the elucidation of protein structure and dynamics, opening new avenues in structural biology and structure-based drug discovery.

The primary objective of the DeepNOE project was to address this long-standing challenge by combining deep learning with methods implemented in the existing software package CYANA. Over the course of the project, we proposed the first comprehensive solution to this problem. Our new method analyses NMR spectra strictly without human intervention, making it possible to obtain results within hours after the measurement has been finished.

The availability of a large-scale standardized dataset that is representative of the problem of interest is a precondition for supervised machine learning projects. Before this project, this requirement had not been satisfied in the context of protein structure determination with NMR spectroscopy. Therefore, the elimination of this obstacle was proposed as the first major endeavor within this Marie Skłodowska-Curie action. We established an annotated dataset of NMR measurements composed of 1329 spectra, which allow 100 protein structures to be reproduced from the original data.

In the second part of the project, we focused on automated visual analysis of the spectra. At first, we used over 600,000 manually annotated cross-peak examples to establish a deep residual neural network (ResNet) that automatically detects true signals in the recorded spectrum, distinguishing them from impurities and artifacts. Subsequently, we implemented a generator of synthetic NMR spectra fragments, which was used to train a second instance of the ResNet model, addressing the problem of deconvolution of highly overlapping signals. Finally, chemical shifts deposited in the BMRB database were used to establish a kernel density estimator of cross-peak positions, making it possible to unfold recorded signals. Used sequentially, the three models automatically extract signal frequencies from NMR spectra.
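
The unfolding step can be illustrated with a minimal sketch. It is not the ARTINA implementation: it assumes a one-dimensional Gaussian kernel density estimate, a folded signal that appears shifted from its true position by an integer multiple of the spectral width, and a handful of made-up reference shifts in place of BMRB data. The names `gaussian_kde_1d` and `unfold`, the bandwidth and all numeric values are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a 1D Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # Standardized distance of every query point to every sample.
        z = (x[:, None] - samples[None, :]) / bandwidth
        kernel = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
        return kernel.mean(axis=1)

    return density


def unfold(observed_ppm, spectral_width_ppm, density, max_folds=2):
    """Pick the candidate observed + n * SW that has the highest density."""
    folds = np.arange(-max_folds, max_folds + 1)
    candidates = observed_ppm + folds * spectral_width_ppm
    return float(candidates[np.argmax(density(candidates))])


# Hypothetical reference shifts (ppm), standing in for database statistics.
reference_shifts = np.array([118.2, 119.5, 120.1, 121.7, 122.4, 123.0])
density = gaussian_kde_1d(reference_shifts, bandwidth=1.5)

# A signal observed at 90.3 ppm in a spectrum with an assumed 30 ppm width.
unfolded = unfold(observed_ppm=90.3, spectral_width_ppm=30.0, density=density)
print(f"unfolded position: {unfolded:.1f} ppm")
```

In this toy case the candidate one spectral width above the observed position falls inside the densely populated region of the reference shifts, so it is selected. A realistic estimator would presumably be conditioned on atom and residue type and cover all spectral dimensions.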

In the third part of the project, we integrated our approach to visual spectra analysis with methods implemented in CYANA. One of the key steps in this process was the development of a graph neural network, which captures dependencies between chemical shift values and supports interpretation of ambiguous signals. As a result, we obtained an end-to-end approach (ARTINA) that fulfills the main aim of the DeepNOE project by automating protein structure determination with NMR spectroscopy. Our method runs strictly without any human intervention, taking as input only the protein sequence and a set of NMR spectra. The method returns the intermediate spectra annotations, making it possible for a researcher to manually verify the automatically determined structure.

In the quantitative evaluation, ARTINA automatically determined 100 protein structures, which were compared with the corresponding PDB depositions, yielding high agreement, with a median backbone RMSD of 1.44 Å to the reference. In this experiment, ARTINA correctly assigned 90.39% of the chemical shifts, as compared with BMRB depositions. The method handled protein backbone chemical shifts more accurately (96.03% mean accuracy) than side-chain chemical shifts (86.50%), mainly because of difficulties in aromatic ring assignments (76.87%).
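
For readers unfamiliar with the metric, the sketch below shows one common way to compute a backbone RMSD between two structures: optimal rigid-body superposition with the Kabsch algorithm, followed by the root-mean-square deviation of the matched atoms. The report does not state how the RMSD was computed, so this is an assumption. The function name `kabsch_rmsd` and the coordinates are hypothetical, and matched atom ordering is assumed.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove the translation by centering both sets on their centroids.
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt

    diff = a @ rotation - b
    return float(np.sqrt((diff**2).sum() / len(a)))


# Hypothetical backbone atom coordinates in angstroms.
reference = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [1.5, 1.5, 0.0],
    [0.0, 1.5, 1.0],
    [2.0, 2.0, 2.0],
])

# The same structure, rotated by 30 degrees about z and translated.
theta = np.pi / 6
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
moved = reference @ rot_z.T + np.array([5.0, -3.0, 2.0])
rmsd_same = kabsch_rmsd(moved, reference)

# A copy with one atom displaced by 1 angstrom.
perturbed = reference.copy()
perturbed[-1] += np.array([0.0, 0.0, 1.0])
rmsd_perturbed = kabsch_rmsd(perturbed, reference)

print(f"rigidly moved copy: {rmsd_same:.3e} A")
print(f"perturbed copy: {rmsd_perturbed:.3f} A")
```

A median over a benchmark, like the figure reported above, would then be `np.median` applied to the per-protein values.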

ARTINA is publicly available online as software as a service (SaaS) within the NMRtist system (https://nmrtist.org). The method has been presented at the EUROMAR conference (2021) and an Emerging MR Webinar. During practical sessions of the “Biomolecular NMR: Advanced Tools” workshop at Gothenburg University and a graduate course at Goethe University in Frankfurt, over 50 participants tested ARTINA in practice by automatically solving protein structures and assigning chemical shifts.

Over the course of the project, we proposed the first comprehensive solution to a long-standing open problem that has existed in computational NMR spectroscopy since the early 1990s. Our results go beyond the state of the art on both scientific and technical grounds. We believe that the most impactful results of our work are: (a) the deep learning-based approach to fully automated protein structure determination, (b) the large-scale NMR benchmark dataset, and (c) practical tools for NMR practitioners that are ready to use in academic and commercial research laboratories.

From the scientific perspective, the results obtained in the DeepNOE project open interesting opportunities for further research. As protein structure determination (at least for well-behaved targets) no longer involves human work, we could aim at the development of autonomous measurement devices. The only role of the researcher would be to insert the protein sample into such an instrument, and the machine would perform the measurements and solve the structure automatically. This would lead to a paradigm shift in structural biology research by NMR. At the moment, the researcher carries out the measurements in the hope of obtaining the answer to a biological question. With the proposed autonomous device, the researcher defines the biological question (protein structure model) and the instrument answers it directly by combining autonomous measurements with data analysis.

Another interesting direction is the extension of ARTINA to new macromolecular systems, as their data analysis follows the same logic as that of proteins. In principle, the current system could be adapted to solve protein-ligand complexes, which are of particular interest in the pharmaceutical industry. Similarly, it is conceivable that, after moderate modifications, ARTINA could be used to solve structures of RNA and RNA-ligand complexes.