The researcher developed Voronota-LT, a new algorithm for rapid computation of Voronoi tessellation-based interatomic contact areas. Voronota-LT supports both additively weighted and radical (Laguerre) tessellations and can compute selected subsets of contacts (e.g. inter-subunit interfaces) without constructing the full tessellation. It is robust, parallelizable, and applicable to any type of molecular structural data. In tests, it was 16 to 105 times faster than the state-of-the-art method, depending on the regime. Parallelization was benchmarked and proved efficient. The algorithm was extended for molecular simulation workflows, including support for periodic boundary conditions and incremental updates upon atomic coordinate changes.
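The difference between the two supported tessellation types can be illustrated with a minimal Python sketch. It is not Voronota-LT code: the function names are hypothetical, atoms are modeled as plain balls (center, radius), and only the standard textbook distance definitions are used. A point belongs to the cell of the ball that minimizes the chosen weighted distance.

```python
import math
from typing import Sequence, Tuple

# A ball is (center, radius); this representation is an assumption of the sketch.
Ball = Tuple[Tuple[float, float, float], float]


def additively_weighted_distance(point: Sequence[float], ball: Ball) -> float:
    """Distance from the point to the ball surface: |p - c| - r."""
    center, radius = ball
    return math.dist(point, center) - radius


def radical_distance(point: Sequence[float], ball: Ball) -> float:
    """Power (Laguerre) distance: |p - c|^2 - r^2."""
    center, radius = ball
    return math.dist(point, center) ** 2 - radius ** 2


def owning_ball(point: Sequence[float], balls: Sequence[Ball], mode: str) -> int:
    """Return the index of the ball whose cell contains the point."""
    if mode == "additively_weighted":
        dist = additively_weighted_distance
    elif mode == "radical":
        dist = radical_distance
    else:
        raise ValueError(f"unknown tessellation mode: {mode}")
    return min(range(len(balls)), key=lambda i: dist(point, balls[i]))


# Two balls of different radii on the x axis.
balls = [((0.0, 0.0, 0.0), 2.0), ((5.0, 0.0, 0.0), 1.0)]
probe = (2.9, 0.0, 0.0)
aw_owner = owning_ball(probe, balls, "additively_weighted")
radical_owner = owning_ball(probe, balls, "radical")
```

For balls of unequal radii the two definitions place the cell boundary differently: here the additively weighted boundary crosses the x axis at x = 3.0 and the radical plane at x = 2.8, so the probe point at x = 2.9 is assigned to different balls in the two modes.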
The researcher developed an automated workflow for collecting and quantifying contact area heterogeneity from sequence-clustered ensembles in the Protein Data Bank (PDB). From these data, area persistence values were derived for each unique contact, defining ground truth for classifying contacts as stable or unstable. Data were divided into training, validation, and test sets. A Voronota-LT-based subarea calculation algorithm was created to divide atom-atom contact areas into layers (by distance from the solvent boundary) and sectors (by atomic directions), generating fine-grained contact descriptors for machine learning. Using the training data, the researcher estimated contact-type probabilities of occurrence and persistence and discovered that these probabilities are not highly correlated. Their combined use could therefore potentially benefit protein structure assessment tasks. The researcher derived heterogeneity-informed statistical pseudo-energy coefficients and used them to compute pseudo-energy values serving as classifier input features. The researcher introduced the Voronoi Contacts Block (VCBlock) descriptor summarizing an inter-residue contact and its neighbors in a permutation-invariant vector form, enabling neural network training on contact-level properties. A VCBlock-based neural network classifier was trained to predict whether a contact area in a protein structure is stable or unstable within an ensemble. Tested on unseen PDB data, it achieved 0.78 accuracy. A standalone software tool, VoroMarmotte, was developed to apply this classifier.
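The idea of a permutation-invariant contact descriptor can be sketched in Python as follows. The actual VCBlock layout is not specified here, so this is only a generic illustration under stated assumptions: each contact is a fixed-length feature list, neighbors share that length, and the neighbor set is summarized by order-independent pooling (sum, maximum, count). The function name `pooled_descriptor` and the feature values are hypothetical.

```python
from typing import List, Sequence


def pooled_descriptor(
    central: Sequence[float], neighbors: List[Sequence[float]]
) -> List[float]:
    """Concatenate central-contact features with order-independent
    summaries (per-feature sum and maximum, plus count) of the neighbors."""
    n_features = len(central)
    if neighbors:
        columns = list(zip(*neighbors))  # one tuple per feature
        sums = [sum(column) for column in columns]
        maxima = [max(column) for column in columns]
    else:
        # No neighbors: fall back to zeros so the vector length stays fixed.
        sums = [0.0] * n_features
        maxima = [0.0] * n_features
    return list(central) + sums + maxima + [float(len(neighbors))]


# Hypothetical features per contact: (area, fraction of area in the outer layer).
central_contact = [12.0, 0.5]
neighbor_contacts = [[3.0, 0.25], [1.5, 0.75], [4.0, 0.5]]

descriptor = pooled_descriptor(central_contact, neighbor_contacts)
shuffled = pooled_descriptor(central_contact, list(reversed(neighbor_contacts)))
```

Because sum, maximum, and count do not depend on the order of their inputs (up to floating-point rounding in the sum), the resulting vector has a fixed length and is the same for any ordering of the neighbors, which is what allows it to be fed to an ordinary feed-forward network.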
The researcher demonstrated that the VoroMarmotte method for predicting contact stability can be used to assess protein-protein complex predictions by aggregating contact-level outputs into global interface scores. Native and high-quality models consistently showed higher predicted persistent interface areas. Further work defined a contact area persistence-based pseudo-energy score and applied it to the data from the EGFR Protein Design Competition, showing that it can be instrumental in distinguishing binders from non-binders among designed proteins. A computational binder optimization pipeline was also built to propose mutations improving interface stability based on the new pseudo-energy scoring.
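Aggregating contact-level outputs into a global interface score can be sketched in Python as follows. The exact aggregation used by VoroMarmotte is not described here, so the sketch shows two plausible variants under the assumption that the classifier yields, for every interface contact, an area and a predicted probability of being stable. All names and numbers are hypothetical.

```python
from typing import Iterable, List, Tuple

# (contact area, predicted probability that the contact is stable)
Contact = Tuple[float, float]


def weighted_persistent_area(contacts: Iterable[Contact]) -> float:
    """Sum of contact areas, each weighted by its predicted stability."""
    return sum(area * p_stable for area, p_stable in contacts)


def thresholded_persistent_area(
    contacts: Iterable[Contact], threshold: float = 0.5
) -> float:
    """Sum of areas of contacts predicted stable at the given threshold."""
    return sum(area for area, p_stable in contacts if p_stable >= threshold)


# Hypothetical interface with three inter-chain contacts.
interface: List[Contact] = [(10.0, 0.75), (20.0, 0.25), (4.0, 1.0)]

soft_score = weighted_persistent_area(interface)
hard_score = thresholded_persistent_area(interface)
```

Either score grows with the amount of interface area predicted to persist, so models can be ranked by it; the weighted variant avoids a hard cutoff, while the thresholded variant is easier to interpret as an area.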
Additionally, the researcher developed VoroChipmunk, a contact area-based statistical potential method that directly uses observed contact area occurrence and persistence probabilities to score protein-protein interface predictions. The researcher also developed VoroIF-GNN-v2, a graph attention neural network that predicts interface quality at the level of residue-residue contacts. It operates on tessellation-derived protein-protein interface graphs annotated with VoroChipmunk-like descriptors. During CASP16-CAPRI in 2024, the researcher's scoring group "Olechnovic" employed VoroIF-GNN-v2 and ranked first in the CAPRI scoring category.
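How a contact area-based statistical potential turns probabilities into a score can be sketched in Python as follows. This uses the generic knowledge-based (inverse Boltzmann-style) form, a negative log ratio of observed to reference probability per contact type, multiplied by contact area. Whether VoroChipmunk uses exactly this form is an assumption, and the contact type labels and probability values are invented for illustration.

```python
import math
from typing import Dict, Iterable, Tuple


def pseudo_energy_coefficients(
    observed: Dict[str, float], reference: Dict[str, float]
) -> Dict[str, float]:
    """Per-type coefficient -ln(P_observed / P_reference).
    Types seen more often than expected get negative (favorable) values."""
    return {
        contact_type: -math.log(observed[contact_type] / reference[contact_type])
        for contact_type in observed
    }


def interface_pseudo_energy(
    contacts: Iterable[Tuple[str, float]], coefficients: Dict[str, float]
) -> float:
    """Area-weighted sum of coefficients over all interface contacts."""
    return sum(area * coefficients[contact_type] for contact_type, area in contacts)


# Hypothetical probabilities for two contact types.
observed = {"LEU-LEU": 0.2, "LYS-LYS": 0.05}
reference = {"LEU-LEU": 0.1, "LYS-LYS": 0.1}
coefficients = pseudo_energy_coefficients(observed, reference)

# Hypothetical interface: (contact type, contact area).
score = interface_pseudo_energy([("LEU-LEU", 10.0), ("LYS-LYS", 5.0)], coefficients)
```

A lower total indicates an interface dominated by contact types that are over-represented relative to the reference; the same construction could in principle be applied separately to occurrence and persistence probabilities.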