The project effectively commenced on November 1st, 2018. Throughout the project, we obtained proof-of-concept data and established novel, state-of-the-art omics workflows and methodologies.
First, we published an optimized proteogenomic workflow integrating bacterial ribo-seq and proteomic data, using Salmonella as a test case. To comprehensively detect novel, previously unannotated genomic elements, we profiled the total translatome (ribo-seq) and translation initiation sites (retapamulin-assisted ribo-seq), and performed shotgun proteomics. Based on previously reported transcriptomic studies, these analyses covered a cross-section of infection stages with complementary gene expression patterns. To delineate translated ORFs from the ribo-seq data, we developed and applied a new gene discovery pipeline. Additionally, we searched matching proteomics datasets for protein-level evidence of the novel genes, strengthening confidence that they are genuine protein-coding genes.
A key insight emerging from these riboproteogenomic data was the pervasiveness of protein variants that originate from the same gene but differ at their N-terminus due to alternative translation initiation (i.e., N-terminal proteoforms), together with the wealth of unannotated small proteins being expressed. These findings further shift how we understand the relationship between a gene and its resulting protein(s). The workflow also showed great promise for experimentally based bacterial genome annotation.
Given the discovery of a large number of new and underexplored bacterial genomic elements encoding sORF-encoded polypeptides (SEPs), N-terminal proteoforms and virulence factors (effector proteins), we also set out to validate and functionally characterize a selection of these. Besides (real-time) expression and localization analyses, we developed a novel ribo-seq method that specifically enables the study of SEPs containing a short transmembrane domain, a category frequently found among (uncharacterized) SEPs. Additionally, by applying an optimized proximity labelling strategy, implemented in plants and, for the first time, in bacteria, we studied protein-protein interactions at both the pathogen-plant host interface (i.e., between Agrobacterium effectors and the plant host) and the bacterial predator-prey interface (preprint; DOI 10.1101/2023.11.29.569176).
Further, we compared protein extraction methods for their efficacy in extracting SEPs and enabling their subsequent proteomic detection, and optimized an experimental protocol that specifically enriches for SEPs. Building on this knowledge, we investigated whether computational analysis and state-of-the-art riboproteogenomic approaches can shed light on the challenges faced in identifying SEPs. Given the diverse biological functions that SEPs have been shown to exert, this work provides an accessible protocol and analysis pipeline for the proteomic exploration of this fascinating class of small proteins.
Finally, we enabled dual-proteome profiling by developing an optimized hybrid spectral library generation workflow for data-independent acquisition (DIA) mass spectrometry, which combines spectral libraries derived from data-dependent acquisition (DDA) with in silico predicted spectral libraries. This strategy significantly improved peptide detection, which is particularly relevant when profiling host-pathogen interactions.