A Bayesian Framework for Cellular Structural Biology

Final Report Summary - BAYCELLS (A Bayesian Framework for Cellular Structural Biology)

The functioning of a single cell or organism is governed by the laws of chemistry and physics. In fine detail, the bridge from biology to chemistry and physics is provided by structural biology: to understand the functioning of a cell, it is necessary to know the atomic structure of macromolecular assemblies, which may contain hundreds of components. However, knowing the static structure is not enough; it is also necessary to investigate and understand the dynamic interplay of these hundreds of components. This understanding requires bridging from the atomic-scale resolution of structural biology to the much longer length scales that are the realm of cellular biology. A particular feature of molecular systems is that these length scales are not always well separated: a small perturbation, such as the binding of a few small molecules, can have a very large effect. A detailed understanding of the molecular systems of life forms the basis of innovative therapeutic strategies, by identifying new drug targets and new ways of interfering with pathogenic processes.
Investigating and understanding dynamic multi-component molecular systems and their spatio-temporal complexity is the biggest challenge for structural biology today. Traditionally, the field has been dominated by the study of static structures of isolated molecules with a single technique, most prominently X-ray crystallography. For the formation and evolution of transient complexes within a living cell, this type of high-resolution structure determination by a single technique will likely remain the exception; instead, data must be acquired with multiple biophysical techniques at multiple scales. These data must then be integrated into one consistent, dynamic picture that relies both on emerging experimental technologies and on molecular modelling and numerical simulations, in a truly integrative structural biology approach.
Bayesian approaches have decisive advantages and are increasingly being used in the wider context of structural biology. We pioneered Inferential Structure Determination, an approach which we first developed for NMR data only, and which goes a step further by treating structure determination itself as a Bayesian data analysis problem. Bayesian approaches make it possible to determine all unknown “nuisance” parameters during the structure calculation, and to obtain rigorous error estimates for the coordinates and all unknown parameters. This is of particular importance in integrative structural biology, where data come from different sources, each with its own approximate forward model relating structures to data, its own nuisance parameters, and its own level of molecular description. The disadvantage of Bayesian methods is their increased computational complexity compared to standard structure calculation approaches.
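To illustrate how a nuisance parameter can be estimated together with a structural quantity, the following is a minimal sketch on a toy problem, not the project's software. It assumes a single unknown distance `x`, repeated noisy measurements of it, and an unknown noise level `sigma` as the nuisance parameter. The Gaussian likelihood, the flat priors, the random-walk Metropolis sampler, and all function names are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(x, log_sigma, data):
    """Unnormalised log posterior of a distance x and noise level sigma.

    Assumed model: each measurement ~ Normal(x, sigma); flat prior on x
    within (0, 50); flat prior on log(sigma).
    """
    if not 0.0 < x < 50.0:
        return -math.inf
    sigma = math.exp(log_sigma)
    squared_error = sum((d - x) ** 2 for d in data)
    return -len(data) * log_sigma - 0.5 * squared_error / sigma**2


def sample_posterior(data, n_steps=20000, step=0.3, seed=0):
    """Random-walk Metropolis over (x, log sigma); returns (x, sigma) samples."""
    rng = random.Random(seed)
    x, log_sigma = sum(data) / len(data), 0.0
    current = log_posterior(x, log_sigma, data)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)
        ls_new = log_sigma + rng.gauss(0.0, step)
        proposed = log_posterior(x_new, ls_new, data)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposed - current:
            x, log_sigma, current = x_new, ls_new, proposed
        samples.append((x, math.exp(log_sigma)))
    return samples


def summarise(samples, burn_in=2000):
    """Posterior mean and standard deviation of x, and posterior mean of sigma."""
    kept = samples[burn_in:]
    xs = [s[0] for s in kept]
    mean_x = sum(xs) / len(xs)
    std_x = math.sqrt(sum((v - mean_x) ** 2 for v in xs) / len(xs))
    mean_sigma = sum(s[1] for s in kept) / len(kept)
    return mean_x, std_x, mean_sigma


# Hypothetical measurements of one distance, in arbitrary units.
measurements = [10.2, 9.7, 10.5, 9.9, 10.1, 9.6, 10.4, 10.0]
mean_x, std_x, mean_sigma = summarise(sample_posterior(measurements))
```

In this sketch the spread of the sampled `x` values plays the role of the error estimate on the coordinate, and the noise level is never fixed by hand: it is sampled alongside `x`. The cost is the many posterior evaluations needed, which reflects the increased computational complexity mentioned above.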
Within the project, we developed a Bayesian framework for integrative structural biology. This includes the development of forward models for all relevant data used in integrative approaches. Notably, we developed forward models for data from chemical cross-linking and mass spectrometry, including a fast method to verify whether a cross-link traverses a protein in a given model; a Bayesian method to merge small angle scattering (SAS) curves from multiple experiments, and a forward model for SAS data from X-ray or neutron scattering; a new fast and scalable representation of molecules and electron microscopy data; and a Bayesian approach to interpret chromosome capture experiments, in order to characterise genome organisation. We also developed approaches to use evolutionary information in isolation or in combination with other experimental data (NMR, cryo-electron microscopy). These approaches take into account the ensemble nature of experimental data, even for sparse data, either in an implicit way (for SAS data) or in an explicit way during the modelling (for chemical cross-linking or NMR data).
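As a rough illustration of what merging curves from several experiments involves, the sketch below combines intensities measured at the same scattering vector `q` by inverse-variance weighting. This is a deliberately simplified stand-in, not the Bayesian merging method developed in the project: it assumes Gaussian errors, a flat prior on the true intensity, and curves that are already on a common scale and a common `q` grid. The data layout and the function name `merge_curves` are hypothetical.

```python
import math


def merge_curves(curves):
    """Merge SAS curves point by point using inverse-variance weights.

    curves: list of dicts mapping q -> (intensity, sigma).
    Returns a dict mapping q -> (merged intensity, merged sigma).
    Under Gaussian errors and a flat prior, these are the posterior mean
    and standard deviation of the true intensity at each q.
    """
    merged = {}
    for q in sorted({q for curve in curves for q in curve}):
        weight_sum = 0.0
        weighted_intensity = 0.0
        for curve in curves:
            if q in curve:
                intensity, sigma = curve[q]
                weight = 1.0 / sigma**2
                weight_sum += weight
                weighted_intensity += weight * intensity
        merged[q] = (weighted_intensity / weight_sum,
                     math.sqrt(1.0 / weight_sum))
    return merged


# Two hypothetical curves that overlap at q = 0.01 only.
curve_a = {0.01: (100.0, 2.0), 0.02: (80.0, 3.0)}
curve_b = {0.01: (104.0, 2.0)}
merged = merge_curves([curve_a, curve_b])
```

Where both curves contribute, the merged uncertainty is smaller than either input uncertainty; where only one curve has data, its value passes through unchanged. A realistic treatment would also have to estimate scale factors, offsets and the reliability of each curve, which are the kind of nuisance parameters a Bayesian approach handles.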
The Bayesian analysis is based on sampling a very large number of conformations and nuisance parameters. We developed new algorithms to sample molecular conformations compatible with experimental data, based on recent advances in theoretical statistical mechanics and on a new coarse-grained description, as well as a method to analyse very large numbers of conformations in order to classify and cluster them. This analysis method can be used to analyse "big data" in general. All methods were applied to a number of challenging and biologically interesting systems, with data obtained either from the literature or from collaborating groups: pili from a type II secretion system; Pol II and Pol III RNA polymerases; complement factor; chromosome organisation.