
Tessellation-based analysis of dynamic protein structures and their complexes - MoleculAR MOTions meet TEssellations (MARMOTTE).

Periodic Reporting for period 1 - MARMOTTE (Tessellation-based analysis of dynamic protein structures and their complexes - MoleculAR MOTions meet TEssellations)

Reporting period: 2023-09-01 to 2025-08-31

Structural biology is being revolutionized by deep learning-based methods that produce high-quality protein structure models. However, current state-of-the-art prediction methods have a key limitation: they generate only static conformational models and do not predict structural heterogeneity. Different regions of a protein can exhibit different propensities to change conformation, for various reasons. Some may be partially disordered or participate in structurally ambiguous (fuzzy) interactions essential for biological function. Another limitation is that accurate prediction of protein-protein complexes typically requires large multiple sequence alignments across homologous chains, which are often unavailable for complexes with unclear evolutionary relationships, such as hetero-oligomeric, transient, or ad hoc assemblies. Thus, reliably modeling protein complexes remains challenging. Overall, while recent breakthroughs mark a transformative advance, they do not provide insight into protein dynamics or interactions, which are crucial for understanding biological function. Protein functions arise through intermolecular interactions that almost always involve conformational changes of the interacting partners.

The MARMOTTE project aimed to address these challenges by improving computational analysis of protein structures and their complexes through the exploration and prediction of structural heterogeneity of interatomic contacts. The project focused on a Voronoi tessellation-based approach to interaction analysis, which has been shown to be more descriptive than traditional pairwise distance-based methods. Specifically, the project had three scientific objectives: to develop methods that (1) efficiently compute tessellation-derived contact areas in dynamic structures represented as sets of conformational snapshots, (2) predict how these contact areas change upon motion, and (3) use the predicted statistical properties of contact areas to estimate protein-protein binding energy scores suitable for ranking and selecting complex models.
The researcher developed Voronota-LT, a new algorithm for rapid computation of Voronoi tessellation-based interatomic contact areas. Voronota-LT supports both additively weighted and radical (Laguerre) tessellations and can compute selected subsets of contacts (e.g. inter-subunit interfaces) without constructing the full tessellation. It is robust, parallelizable, and applicable to any type of molecular structural data. In tests it was 16 to 105 times faster, depending on the regime, than the state-of-the-art method, and benchmarks showed that its parallelization is efficient. The algorithm was extended for molecular simulation workflows, including support for periodic boundary conditions and incremental updates upon atomic coordinate changes.
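
To make the notion of a tessellation-derived contact area concrete, the following minimal Python sketch handles only the simplest possible case: two isolated atoms with no neighbors. It assumes that the radical plane is defined on the probe-expanded spheres, so that the contact is the flat disk bounded by the intersection circle of the two expanded spheres. The function name, the default probe radius of 1.4 Å, and the example radii are illustrative assumptions; this is not the Voronota-LT implementation, which additionally cuts each contact by all neighboring atoms.

```python
import math


def isolated_pair_contact_area(c1, r1, c2, r2, probe=1.4):
    """Area of the radical-plane disk between two isolated atoms.

    Simplifying assumptions (not taken from the report): only two atoms
    exist, and the radical plane is defined on probe-expanded spheres.
    """
    big_r1, big_r2 = r1 + probe, r2 + probe
    d = math.dist(c1, c2)
    # No contact if the centers coincide or the expanded spheres do not overlap.
    if d == 0.0 or d >= big_r1 + big_r2:
        return 0.0
    # Signed distance from c1 to the radical plane along the c1 -> c2 axis.
    h = (d * d + big_r1 * big_r1 - big_r2 * big_r2) / (2.0 * d)
    # Squared radius of the intersection circle of the two expanded spheres.
    rho_sq = big_r1 * big_r1 - h * h
    if rho_sq <= 0.0:
        # One sphere lies entirely inside the other: no intersection circle.
        return 0.0
    return math.pi * rho_sq


if __name__ == "__main__":
    # Hypothetical carbon-like and oxygen-like radii, 3.8 Å apart.
    area = isolated_pair_contact_area((0.0, 0.0, 0.0), 1.7, (3.8, 0.0, 0.0), 1.52)
    print(f"isolated-pair contact area: {area:.2f} square angstroms")
```

In a real structure each such disk would be clipped by the radical planes of surrounding atoms, which is the part that a dedicated tessellation algorithm has to do efficiently.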

The researcher developed an automated workflow for collecting and quantifying contact area heterogeneity from sequence-clustered ensembles in the Protein Data Bank (PDB). From these data, area persistence values were derived for each unique contact, defining ground truth for classifying contacts as stable or unstable. Data were divided into training, validation, and test sets. A Voronota-LT-based subarea calculation algorithm was created to divide atom-atom contact areas into layers (by distance from the solvent boundary) and sectors (by atomic directions), generating fine-grained contact descriptors for machine learning. Using the training data, the researcher estimated contact-type probabilities of occurrence and persistence and discovered that these probabilities are not highly correlated. Their combined use could therefore potentially benefit protein structure assessment tasks. The researcher derived heterogeneity-informed statistical pseudo-energy coefficients and used them to compute pseudo-energy values serving as classifier input features. The researcher introduced the Voronoi Contacts Block (VCBlock) descriptor summarizing an inter-residue contact and its neighbors in a permutation-invariant vector form, enabling neural network training on contact-level properties. A VCBlock-based neural network classifier was trained to predict whether a contact area in a protein structure is stable or unstable within an ensemble. Tested on unseen PDB data, it achieved 0.78 accuracy. A standalone software tool, VoroMarmotte, was developed to apply this classifier.
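
The report does not give the exact definition of area persistence, so the sketch below uses one plausible stand-in: the fraction of ensemble members in which a contact has a non-negligible area. The function names, the `min_area` cutoff, and the 0.5 labeling threshold are hypothetical choices made only to illustrate how per-contact ground-truth labels could be derived from an ensemble.

```python
from collections import defaultdict


def contact_persistence(ensemble, min_area=0.0):
    """Fraction of conformations in which each contact is present.

    `ensemble` is a list of dicts, one per conformation, mapping a contact
    identifier to its area. This definition is an assumption, not the
    project's published formula.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformation")
    counts = defaultdict(int)
    for conformation in ensemble:
        for contact_id, area in conformation.items():
            # Register every contact ever seen; count it only when present.
            counts[contact_id] += 1 if area > min_area else 0
    n = len(ensemble)
    return {contact_id: count / n for contact_id, count in counts.items()}


def label_contacts(persistence, threshold=0.5):
    """Binary stable/unstable labels from persistence (hypothetical threshold)."""
    return {
        contact_id: "stable" if value >= threshold else "unstable"
        for contact_id, value in persistence.items()
    }


if __name__ == "__main__":
    # Toy ensemble of four conformations with made-up contact areas.
    toy_ensemble = [
        {"A:10-B:25": 12.0, "A:11-B:30": 3.0},
        {"A:10-B:25": 10.5},
        {"A:10-B:25": 11.0, "A:11-B:30": 0.0},
        {"A:10-B:25": 9.0, "A:12-B:31": 1.0},
    ]
    persistence = contact_persistence(toy_ensemble)
    print(persistence)
    print(label_contacts(persistence))
```

Labels produced this way could then serve as targets for a classifier that sees only a single static conformation.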

The researcher demonstrated that the VoroMarmotte method for predicting contact stability can be used to assess protein-protein complex predictions by aggregating contact-level outputs into global interface scores. Native and high-quality models consistently showed higher predicted persistent interface areas. Further work defined a contact area persistence-based pseudo-energy score and applied it to the data from the EGFR Protein Design Competition, showing that it can be instrumental in distinguishing binders from non-binders among designed proteins. A computational binder optimization pipeline was also built to propose mutations improving interface stability based on the new pseudo-energy scoring.
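
One simple way to aggregate contact-level outputs into a global interface score is to weight each inter-chain contact area by its predicted probability of being stable. The sketch below assumes exactly that; the function name, the input layout, and the toy numbers are illustrative, and the actual VoroMarmotte aggregation may differ.

```python
def interface_scores(contacts):
    """Aggregate per-contact predictions into global interface scores.

    `contacts` is a list of (area, p_stable) pairs for inter-chain contacts,
    where p_stable is a predicted stability probability in [0, 1].
    The weighting scheme is an assumption made for illustration.
    """
    total_area = sum(area for area, _ in contacts)
    persistent_area = sum(area * p_stable for area, p_stable in contacts)
    fraction = persistent_area / total_area if total_area > 0.0 else 0.0
    return {
        "total_area": total_area,
        "persistent_area": persistent_area,
        "persistent_fraction": fraction,
    }


if __name__ == "__main__":
    # Two hypothetical models of the same complex with made-up predictions.
    models = {
        "model_1": [(25.0, 0.9), (10.0, 0.8), (5.0, 0.4)],
        "model_2": [(30.0, 0.3), (12.0, 0.2)],
    }
    ranked = sorted(
        models,
        key=lambda name: interface_scores(models[name])["persistent_area"],
        reverse=True,
    )
    print(ranked)
```

Ranking by the expected persistent area rewards interfaces that are both large and predicted to be stable, which matches the observation that native and high-quality models show higher predicted persistent interface areas.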

Additionally, the researcher developed a contact area-based statistical potential method, VoroChipmunk, that directly uses observed contact area occurrence and persistence probabilities to score protein-protein interface predictions. The researcher also developed VoroIF-GNN-v2, a graph attention neural network for predicting interface quality at the level of residue-residue contacts. It works on tessellation-derived protein-protein interface graphs annotated with VoroChipmunk-like descriptors. During CASP16-CAPRI in 2024, the researcher's scoring group "Olechnovic" employed VoroIF-GNN-v2 and was ranked first in the CAPRI scoring category.
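
As a rough illustration of how a contact area-based statistical potential can be built, the sketch below uses a generic log-odds (inverse Boltzmann-style) coefficient per contact type and an area-weighted sum as the interface score. This is a textbook construction under stated assumptions, not the VoroChipmunk formula: the function names, the contact-type labels, and the probabilities are all hypothetical.

```python
import math


def log_odds_coefficient(p_observed, p_reference):
    """Generic pseudo-energy coefficient: negative log of observed/reference."""
    if p_observed <= 0.0 or p_reference <= 0.0:
        raise ValueError("probabilities must be positive")
    return -math.log(p_observed / p_reference)


def interface_pseudo_energy(contacts, coefficients, default=0.0):
    """Area-weighted sum of coefficients over interface contacts.

    `contacts` is a list of (contact_type, area) pairs; contact types that
    are missing from `coefficients` contribute `default` per unit area.
    """
    return sum(area * coefficients.get(ctype, default) for ctype, area in contacts)


if __name__ == "__main__":
    # Made-up observed and reference probabilities for two contact types.
    coefficients = {
        "LEU:CD1-ILE:CG2": log_odds_coefficient(0.020, 0.010),  # favorable
        "LYS:NZ-ARG:NH1": log_odds_coefficient(0.002, 0.008),   # unfavorable
    }
    interface = [("LEU:CD1-ILE:CG2", 14.0), ("LYS:NZ-ARG:NH1", 3.5)]
    print(f"{interface_pseudo_energy(interface, coefficients):.3f}")
```

Separate coefficient sets could be derived in the same way from occurrence probabilities and from persistence probabilities, and then combined, since the report notes that the two are not highly correlated.
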
The main results of the MARMOTTE project are new computational methods for structural bioinformatics. The researcher developed: (1) a highly efficient method, described in a publication, for calculating Voronoi tessellation-based contact areas in macromolecular structural models; (2) a methodology to define and quantify tessellation-based contact area persistence (stability) across multiple conformational ensembles of protein structures; (3) a method of subdividing tessellation-based contact areas for enhanced machine learning feature resolution; (4) a machine learning methodology to predict protein contact area stability from static input conformations and to apply it in assessing protein complex models.

The researcher implemented all the developed methods as open-source software and made them openly and freely available under a permissive license. The core software, Voronota-LT, was made accessible in multiple ways: as libraries in different programming languages, a command-line application, and a web application. Also, thanks to established collaborations, the researcher implemented a Voronota-LT plugin that was included in the Faunus framework for Monte Carlo molecular simulations.

The researcher initiated steps to increase the uptake and success of the developed methods: exploring use cases in the field of protein design, preparing method-related publications, and integrating the developed software into established protein structure analysis and modeling frameworks.
Main results of the MARMOTTE project