A profound understanding of the relationship between molecular structure and biological activity is an essential prerequisite for rational drug design. Because these relationships are inherently non-linear, ordinary regression is often inadequate, and black-box machine learning approaches offer limited, if any, interpretability. As a solution, we propose to use symbolic regression, in conjunction with a unique approach to feature selection, to establish quantitative structure-activity relationships (QSAR) that are succinct, non-linear, analytical, and interpretable. Symbolic regression is a stochastic optimization technique, based on principles of evolution, that searches the space of analytical expressions for equations that describe the investigated data. In other words, symbolic regression fits not only the coefficients of an equation but also the form of the equation itself. Our particular concept has not been used in QSAR before. We will combine theoretical investigations of the method with practical applications. Large combinatorial libraries will be analyzed to obtain validated QSAR models that are immediately and intuitively interpretable. Taking trypsin inhibition as an example, we will design, synthesize, and test new inhibitors suggested by our models. Our interdisciplinary project will contribute to European excellence in basic and applied pharmaceutical and medicinal chemistry, in line with health as a top priority of the Seventh Framework Programme.
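To make the idea of searching over equation forms concrete, the following is a minimal sketch of symbolic regression on a single input variable. It is not the project's method: the operator set, the mutation-only evolution scheme, the parsimony penalty, and all function names (`random_tree`, `evaluate`, `fitness`, `mutate`, `evolve`) are illustrative assumptions, and the feature-selection step mentioned above is not modelled at all.

```python
import math
import random

# Hypothetical expression encoding: ("x",), ("const", c), or (op, left, right).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}

def random_tree(rng, depth):
    """Grow a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x",)
        return ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate the expression tree at the input value x."""
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def size(tree):
    """Count the nodes of the tree (a crude measure of succinctness)."""
    if tree[0] in ("x", "const"):
        return 1
    return 1 + size(tree[1]) + size(tree[2])

def fitness(tree, data, parsimony=0.01):
    """Mean squared error plus a size penalty; lower is better."""
    total = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        total += diff * diff
    score = total / len(data) + parsimony * size(tree)
    return score if math.isfinite(score) else float("inf")

def mutate(tree, rng, depth=2):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if tree[0] in ("x", "const") or rng.random() < 0.2:
        return random_tree(rng, depth)
    if rng.random() < 0.5:
        return (tree[0], mutate(tree[1], rng, depth), tree[2])
    return (tree[0], tree[1], mutate(tree[2], rng, depth))

def to_string(tree):
    """Render the tree as a human-readable formula."""
    if tree[0] == "x":
        return "x"
    if tree[0] == "const":
        return f"{tree[1]:.3g}"
    return f"({to_string(tree[1])} {SYMBOLS[tree[0]]} {to_string(tree[2])})"

def evolve(data, generations=60, pop_size=200, seed=0):
    """Keep the best quarter each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data))
        survivors = population[: pop_size // 4]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: fitness(t, data))

if __name__ == "__main__":
    # Synthetic data from y = x*x + x; the search is not told this form.
    points = [(k / 2.0, (k / 2.0) * (k / 2.0) + k / 2.0) for k in range(-4, 5)]
    best = evolve(points)
    print(to_string(best), fitness(best, points))
```

The size penalty in `fitness` is one simple way to favour succinct expressions; practical symbolic regression systems typically add crossover between expressions and more careful constant fitting, which this sketch omits.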