Periodic Reporting for period 4 - CRYOMATH (Cryo-electron microscopy: mathematical foundations and algorithms)
Reporting period: 2021-09-01 to 2023-02-28
Because extracting information from cryo-EM experiments relies entirely on mathematical algorithms, the deep mathematical challenges that have emerged must be solved for cryo-EM to realize its tremendous potential. These challenges center on integrating information from huge sets of extremely noisy images (up to millions of images per data set) reliably and efficiently.
This project addresses the three key open challenges of cryo-EM data processing – a) deriving reliable and robust reconstruction algorithms from cryo-EM data, b) developing tools to process heterogeneous cryo-EM data sets, and c) devising validation and quality measures for structures determined from cryo-EM data. The fourth goal of the project, which ties all goals together and promotes the broad interdisciplinary impact of the project, is to merge all our algorithms into a software platform for state-of-the-art processing of cryo-EM data.
Thus far in the project, toward objective (a), we have derived an improved algorithm for reconstructing molecules without symmetry, algorithms for molecules with cyclic symmetry (published in Inverse Problems), an algorithm for molecules with D2 (dihedral) symmetry (published in SIAM Journal on Imaging Sciences), and an algorithm for molecules with tetrahedral (T) and octahedral (O) symmetry (in its second round of review at SIAM Journal on Imaging Sciences). We are in the final stages of developing an algorithm for molecules with Dn symmetry. We did not succeed in solving the case of icosahedral (I) symmetry.
We also made progress with objective (b) “developing tools to process heterogeneous cryo-EM data sets”. A heterogeneous data set is one that contains images of different molecules, or of a single molecule in different states (known as conformations). We developed an algorithm that separates a heterogeneous data set into homogeneous subsets by casting the separation as the problem of partitioning the nodes of a graph into two “consistent” groups. We proved accuracy and stability bounds for our algorithm, and demonstrated it on simulated as well as experimental data sets. A paper describing this work has been published in the Journal of Mathematical Analysis and Applications. We also analyzed mathematical models for heterogeneity, resulting in two papers, one in Information and Inference and one in Statistics and Computing.
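To illustrate the general idea of splitting graph nodes into two consistent groups, the following is a minimal sketch and not the algorithm from the paper. It assumes a hypothetical symmetric matrix W of signed pairwise scores (positive when two images appear to belong to the same class, negative otherwise) and uses a standard spectral heuristic: the signs of the leading eigenvector, computed here by shifted power iteration, define the two groups. The function name, the scoring convention, and all parameters are assumptions made for illustration.

```python
import math
import random


def partition_two_groups(W, iters=500, seed=0):
    """Split nodes into two groups from a signed score matrix W (list of lists).

    Illustrative assumption: W[i][j] > 0 suggests nodes i and j are consistent,
    W[i][j] < 0 suggests they are not. Returns a list of 0/1 labels, with
    node 0 always assigned to group 0.
    """
    n = len(W)
    # Symmetrize so that the matrix has real eigenvalues and eigenvectors.
    A = [[(W[i][j] + W[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    # Gershgorin shift: A + shift * I has no negative eigenvalues, so power
    # iteration converges toward the eigenvector of the largest eigenvalue of A.
    shift = max(sum(abs(x) for x in row) for row in A)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero matrix: nothing to partition
        v = [x / norm for x in w]
    # The sign pattern of the leading eigenvector defines the two groups.
    labels = [0 if x >= 0 else 1 for x in v]
    if labels[0] == 1:
        # An eigenvector is defined only up to sign; fix node 0 in group 0.
        labels = [1 - g for g in labels]
    return labels


def signed_matrix(groups):
    """Build an ideal score matrix: +1 within a group, -1 across, 0 on the diagonal."""
    n = len(groups)
    return [[0.0 if i == j else (1.0 if groups[i] == groups[j] else -1.0)
             for j in range(n)] for i in range(n)]
```

For example, calling partition_two_groups(signed_matrix([0, 0, 0, 1, 1])) recovers the two groups, and it still does so if one pairwise score is flipped to simulate an inconsistent measurement. The spectral relaxation is chosen here only because it is short and easy to check; it carries none of the accuracy or stability guarantees proved for the published algorithm.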
For objective (c), we developed a particle picking algorithm that is fundamentally different from other approaches to the problem. Existing methods rely either on manual labeling of a rather large number of particles, or on templates provided by the user. Thus, the particle picking step in the current cryo-EM data processing pipeline is labor-intensive, error-prone, and susceptible to model bias (towards the given templates). In our research, we have shown that it is possible to automatically estimate the optimal templates for particle detection given only the input micrographs. Based on this idea, we developed a particle picking algorithm that does not suffer from the above-mentioned shortcomings and, in particular, requires neither manual labeling nor parameter tuning. This work has been published in the Journal of Structural Biology, and the accompanying software is available as open source. We then extended this work to handle contaminations in micrographs; this extension was also published in the Journal of Structural Biology.
Much progress has also been made on objective (d) “developing a publicly available software toolbox implementing the proposed algorithms”. With a programmer hired using the project’s funds, we ported all our algorithms to Python, creating a standalone free software package for structural biologists and developers.
During the project, I have also established a collaboration with Prof. Natan Nelson from the Faculty of Life Sciences at Tel Aviv University. In this collaboration we use tools developed by my group to analyze cryo-EM data acquired by his lab. This joint work resulted in a major scientific advancement, with two papers published in Nature Plants and one paper in Biochimica et Biophysica Acta (BBA).
In terms of disseminating the outcomes of the research, from the beginning of the project and until today, 18 papers have been published in leading journals, and all resulting algorithms have been made publicly available.