
Using Deep Learning to understand RNA Binding Protein binding characteristics

Periodic Reporting for period 1 - DEEPLEARNRBP (Using Deep Learning to understand RNA Binding Protein binding characteristics)

Reporting period: 2019-07-01 to 2021-06-30

RNA Binding Proteins (RBPs) are important biological molecules that bind RNAs and regulate their function. Experimental identification of RBP binding sites is expensive and limited to probing specific tissues or cell lines. For this reason, bioinformatic approaches have been used to model the binding affinities of RBPs to their target RNAs. Machine learning can be used to learn from experimental data what makes RBPs select one target over another.
In the past few years, the machine learning field has been revolutionized by Deep Neural Networks, a family of machine learning techniques that stack artificial neurons in several layers to learn increasingly complex representations of data. Commonly used types of such networks are Convolutional Neural Networks and Recurrent Neural Networks.
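As a rough illustration of how a convolutional layer can read an RNA sequence, the sketch below one-hot encodes a sequence and slides a single filter along it. It is a toy example and not the project's architecture: the function names, the example sequence, and the hand-set filter are all assumptions made for illustration, whereas a trained network would learn many such filters from data.

```python
import numpy as np

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a (length, 4) matrix; unknown bases stay all-zero."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq.upper().replace("T", "U")):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def conv1d_scan(encoded, kernel):
    """Slide a (width, 4) filter along the sequence (no padding), then apply ReLU."""
    width = kernel.shape[0]
    n_windows = encoded.shape[0] - width + 1
    scores = np.array(
        [np.sum(encoded[i:i + width] * kernel) for i in range(n_windows)]
    )
    return np.maximum(scores, 0.0)

# Hypothetical filter set by hand to match one 6-mer exactly (illustration only).
kernel = one_hot("UGCAUG")
activations = conv1d_scan(one_hot("AAUGCAUGCC"), kernel)
best_position = int(np.argmax(activations))  # start of the best-matching window
```

Each activation counts how many positions of the window agree with the filter, so the highest value marks the window that best matches the pattern the filter encodes.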
The aim of this project was the development of such machine learning methods for modelling experimental binding data of several RBPs, the interpretation of the resulting models to understand how RBPs bind, and the dissemination of the trained models via easy-to-access standalone tools and web-servers.
The action was concluded successfully, with the main objectives of the grant proposal implemented and published.
The first main objective of the project was the production of RBP binding site datasets that can be used for training Machine Learning models. We used the widely used RBP-24 and RBP-31 datasets, and also produced a large dataset from ENCODE CLIP-Seq data. In total we have produced datasets for over 100 RBPs, including millions of RBP binding sites. These datasets are freely available to the community for training and testing their models.
We proceeded with the development of deep neural network models of RBP binding. We have not only produced models that outperform the state of the art on the RBP-24 and RBP-31 datasets, but have also produced and published ENNGENE, a method equipped with a Graphical User Interface that allows any researcher to easily train such a model on the dataset of their choice. Paired with our ever-increasing collection of datasets and trained models, we expect this to become an invaluable resource to RBP researchers.
The second objective of the project was the interpretation of the machine learning models in order to understand what combination of sequence, secondary structure, and evolutionary conservation patterns they learned. We have, for the first time, implemented the Integrated Gradients technique on multi-branch convolutional neural networks, and interpreted the importance of each nucleotide across all three input modalities. We are finalizing a method that can extract binding motifs from such trained models of RBP binding.
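To make the Integrated Gradients idea concrete, the sketch below applies it to a toy model. It is not the project's multi-branch network: it assumes a single logistic model over a one-hot sequence input, an all-zero baseline, and hypothetical random weights, so that the gradient can be written analytically. The attribution for each input is the difference from the baseline multiplied by the gradient averaged along the straight path from baseline to input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy stand-in for a trained network: logistic score of a one-hot input."""
    return sigmoid(np.sum(w * x) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its input."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for alpha in alphas:
        total += model_grad(baseline + alpha * (x - baseline), w, b)
    return (x - baseline) * total / steps

# Hypothetical input: a 5-nucleotide one-hot sequence and random weights.
rng = np.random.default_rng(0)
x = np.zeros((5, 4))
x[np.arange(5), [0, 3, 2, 1, 3]] = 1.0
baseline = np.zeros_like(x)  # assumed all-zero reference input
w = rng.normal(size=x.shape)
b = -0.5

attributions = integrated_gradients(x, baseline, w, b)
per_nucleotide = attributions.sum(axis=1)  # one importance score per position
# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(
    attributions.sum() - (model(x, w, b) - model(baseline, w, b))
)
```

The completeness check is a useful sanity test in practice: if the attributions do not add up to the change in model output between baseline and input, the number of integration steps is likely too small.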
Finally, we had the objective of disseminating our methods via standalone programs and web-servers. The standalone program part was achieved with ENNGENE, not only by publishing our trained models, but also by empowering researchers to train their own models using our GUI. We are in the process of making public our web-server, which includes all our collected experimental datasets as well as binding site predictions for all our trained models.
In this project we explored the use of Deep Neural Networks for modelling RNA Binding Protein binding sites. We progressed beyond the state of the art in the creation of new training datasets, the exploration of sequence bias in datasets, and the development of new architectures trained to accuracy above the state of the art. Finally, we have produced a highly accessible tool based on a Graphical User Interface which allows anyone to train high quality models even if they have no programming or machine learning expertise.