To optimize marker detection, we have developed a novel biomarker detector for longitudinal proteomics data that enables robust and reproducible detection of markers. To characterize longitudinal protein features and their dynamics, we have tested state-of-the-art methods for longitudinal omics data and developed novel approaches that take into account the interplay between multiple proteins. To ensure high-quality quantitative data for modelling, we have comprehensively evaluated popular proteomics software workflows for label-free proteome quantification, imputation, and normalization. To assess the methods in well-defined samples, we generated surrogate longitudinal data using mass spectrometry-based shotgun proteomics.
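As a generic illustration of the normalization and imputation steps mentioned above, the following minimal sketch applies log2 transformation, per-sample median normalization, and per-protein minimum-value imputation. It does not reproduce any of the evaluated software workflows; the function names, the nested-dictionary data layout, and the choice of these particular techniques are assumptions made for illustration only.

```python
import math
from statistics import median


def log2_median_normalize(samples):
    """Log2-transform intensities and align all sample medians.

    `samples` maps sample name -> {protein name -> raw intensity or None}.
    This layout is hypothetical, chosen only to keep the sketch self-contained.
    Each sample is assumed to contain at least one observed (non-missing) value.
    """
    logged = {
        s: {p: (math.log2(v) if v is not None and v > 0 else None)
            for p, v in prots.items()}
        for s, prots in samples.items()
    }
    # Median of the observed log2 intensities in each sample.
    sample_medians = {
        s: median(v for v in prots.values() if v is not None)
        for s, prots in logged.items()
    }
    # Shift every sample so that its median equals the median of sample medians.
    target = median(sample_medians.values())
    return {
        s: {p: (None if v is None else v - sample_medians[s] + target)
            for p, v in prots.items()}
        for s, prots in logged.items()
    }


def impute_min(normalized):
    """Replace missing values with the lowest observed value of that protein.

    A simple left-censored imputation, assuming values are missing because
    they fell below the detection limit. Proteins never observed stay None.
    """
    proteins = {p for prots in normalized.values() for p in prots}
    minima = {}
    for p in proteins:
        observed = [prots[p] for prots in normalized.values()
                    if prots.get(p) is not None]
        if observed:
            minima[p] = min(observed)
    return {
        s: {p: (v if v is not None else minima.get(p))
            for p, v in prots.items()}
        for s, prots in normalized.items()
    }
```

Minimum-value imputation is only one of several common choices, and which combination of normalization and imputation performs best depends on the data, which is what a workflow evaluation of this kind is meant to establish.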
To develop innovative strategies for dynamic, individualized disease risk prediction, we have introduced new statistical and machine learning techniques for longitudinal data. These include new methods for binary stratification of individuals over time as well as for time-to-event prediction. Additionally, we have introduced a robust feature selection method that significantly reduces the number of proteins needed for prediction without reducing prediction accuracy. The methods have been carefully validated computationally on multiple real and simulated datasets. Further experimental validations have been performed to support selected key findings.
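To illustrate the general idea of robust feature selection, the sketch below keeps only those proteins that rank among the top features in most stratified bootstrap resamples. It is not the feature selection method developed in this work, whose details are not given here. The function name `stability_select`, its parameters, and the scoring rule (absolute difference in class means, which assumes features on a comparable scale, for example z-scored per-individual summaries of longitudinal profiles) are all assumptions.

```python
import random
from statistics import mean


def stability_select(X, y, names, top_k=1, n_boot=200, min_freq=0.8, seed=0):
    """Return the features selected in at least `min_freq` of bootstrap resamples.

    X     : list of rows, one per individual, one column per protein feature
    y     : list of 0/1 labels (1 = case, 0 = control)
    names : feature names, in column order
    Assumes both classes are present and features share a comparable scale.
    """
    rng = random.Random(seed)
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    counts = {name: 0 for name in names}

    for _ in range(n_boot):
        # Stratified bootstrap: resample with replacement within each class,
        # so every resample contains both cases and controls.
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]

        # Score each feature by the absolute difference in class means.
        scores = []
        for j, name in enumerate(names):
            diff = (mean(X[i][j] for i in boot_cases)
                    - mean(X[i][j] for i in boot_controls))
            scores.append((abs(diff), name))

        # Count the top_k highest-scoring features of this resample.
        scores.sort(reverse=True)
        for _, name in scores[:top_k]:
            counts[name] += 1

    return sorted(name for name, c in counts.items() if c / n_boot >= min_freq)
```

Requiring a feature to be selected consistently across resamples discards proteins whose apparent signal depends on a few individuals, which is one common way to shrink a marker panel while keeping it reproducible.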
Finally, the developed computational methods have been applied to identify novel candidate markers and models for predicting early type 1 diabetes and its progression. Detecting the disease before clinical symptoms appear is crucial for developing future therapeutic and preventive strategies. In addition to proteome-level data, other molecular omics layers have also been considered.