
Accounting for correlated errors with maximum likelihood in crystallography and cryo-EM

Periodic Reporting for period 1 - LikelyStructures (Accounting for correlated errors with maximum likelihood in crystallography and cryo-EM)

Reporting period: 2018-06-01 to 2020-05-31

Macromolecular crystallography is a gold-standard experimental technique for determining the structures of proteins and other macromolecules. Because the experiment measures only partial information directly, computational methods are usually needed to derive the missing information. Molecular Replacement (MR) bootstraps the final structure from the initial partial information provided by other molecules that share some degree of similarity with the unknown structure. In particular, the Phaser MR software uses a statistical approach based on maximum likelihood that can exploit even the weak signal from such remote models, and it is today the most widely used software worldwide for this task. Over the years the mathematical foundations of the software have been strengthened and its algorithms have become more sophisticated, but limitations remain because of uncertainties in the quality of the models and in the assumptions made about the data themselves. In this project, we set out to address some of those uncertainties by building an automatic, easy-to-use graphical pipeline that prepares both the data and the models before, and specifically for, the molecular replacement task. The findings can also be applied to emerging techniques such as Cryo-EM and EM-Tomography. Our main goal is to build a unified framework that tackles structural biology problems from different points of view, exploiting prior information combined with machine learning approaches.
The results of the project are all integrated in the software Voyager, which will soon be released, initially to the crystallographic community and later extended to the Cryo-EM and EM-Tomography communities. Voyager combines maximum likelihood approaches to analyse and correct experimental data and then to perform molecular replacement. The majority of macromolecular structures are solved by molecular replacement, though it can be difficult for non-expert users to apply, especially for complex molecular structures for which only poor homology models are available. Voyager automates and rationalises the preparation of both experimental and model data, improving the chances of success and providing clear and reproducible results that will push the boundaries of current molecular replacement and yield better-optimised initial phases for model building and refinement.
Voyager will contribute significantly to structure determination in structural biology, simplifying operations, pushing the boundaries toward even larger complexes with large domain movements, and coping better with low-resolution data that present pathologies. The software will benefit not only academic research but also the pharmaceutical industry, where the quality of solved structures and a precise understanding of their quaternary conformations are fundamental to planning and delivering high-quality drugs.