Periodic Reporting for period 1 - DEEPLEARNRBP (Using Deep Learning to understand RNA Binding Protein binding characteristics)
Reporting period: 2019-07-01 to 2021-06-30
In the past few years, the machine learning field has been revolutionized by Deep Neural Networks, a family of machine learning techniques that stack several layers of artificial neurons to learn increasingly complex representations of data. Commonly used types of such networks are Convolutional Neural Networks and Recurrent Neural Networks.
The aim of this project was threefold: to develop such machine learning methods for modelling experimental binding data of several RNA Binding Proteins (RBPs), to interpret the resulting models in order to understand how RBPs bind, and to disseminate the trained models via easy-to-access standalone tools and web-servers.
The action concluded successfully, with the main objectives of the grant proposal implemented and published.
The first objective was the development of deep neural network models of RBP binding. We have not only produced models that outperform the state of the art on the RBP-24 and RBP-31 datasets, but have also produced and published ENNGENE, a tool with a Graphical User Interface (GUI) that allows any researcher to easily train such a model on a dataset of their choice. Paired with our ever-growing collection of datasets and trained models, this will become a valuable resource for RBP researchers.
The second objective of the project was the interpretation of the machine learning models in order to understand what combination of sequence, secondary structure, and evolutionary conservation patterns they learned. We have, for the first time, implemented the Integrated Gradients technique on multi-branch convolutional neural networks, and interpreted the importance of each nucleotide in each of the three input modalities. We are finalizing a method that can extract binding motifs from such trained models of RBP binding.
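To illustrate the idea behind Integrated Gradients, the following is a minimal sketch and not the project's implementation. It assumes a toy single-branch logistic model over a one-hot encoded RNA sequence in place of the multi-branch convolutional networks described above; the weights, bias, sequence, all-zero baseline, and function names are hypothetical and chosen only for illustration. The attribution of each input is its difference from the baseline multiplied by the model gradient averaged along the straight path from the baseline to the input.

```python
import math

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element one-hot rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]

def predict(x, weights, bias):
    """Toy stand-in model: sigmoid of a position-specific linear score."""
    score = bias + sum(w * v
                       for w_row, x_row in zip(weights, x)
                       for w, v in zip(w_row, x_row))
    return 1.0 / (1.0 + math.exp(-score))

def gradient(x, weights, bias):
    """Analytic gradient of the toy model output with respect to each input."""
    p = predict(x, weights, bias)
    return [[p * (1.0 - p) * w for w in w_row] for w_row in weights]

def integrated_gradients(x, baseline, weights, bias, steps=50):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    total = [[0.0] * len(row) for row in x]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Interpolate between the baseline and the actual input.
        point = [[b + alpha * (v - b) for v, b in zip(x_row, b_row)]
                 for x_row, b_row in zip(x, baseline)]
        grad = gradient(point, weights, bias)
        for i, g_row in enumerate(grad):
            for j, g in enumerate(g_row):
                total[i][j] += g
    # Scale the averaged gradient by the input's distance from the baseline.
    return [[(v - b) * t / steps for v, b, t in zip(x_row, b_row, t_row)]
            for x_row, b_row, t_row in zip(x, baseline, total)]

# Hypothetical toy parameters: one weight per position and nucleotide (A, C, G, U).
WEIGHTS = [
    [0.2, -0.1, 0.0, 0.3],
    [-0.3, 0.1, 0.2, 1.0],
    [0.9, -0.2, 0.1, 0.0],
    [0.0, 0.8, -0.4, 0.1],
    [0.1, 0.0, 0.7, -0.2],
]
BIAS = -1.0
SEQ = "GUACG"

x = one_hot(SEQ)
baseline = [[0.0] * len(ALPHABET) for _ in SEQ]  # all-zero reference input
attributions = integrated_gradients(x, baseline, WEIGHTS, BIAS)

# Summing over the four channels gives one importance score per nucleotide.
per_nucleotide = [sum(row) for row in attributions]
for base, score in zip(SEQ, per_nucleotide):
    print(base, round(score, 4))
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output for the input and for the baseline. For a multi-branch model, the same procedure would be applied to each input branch with its own baseline, which is an assumption about the general approach rather than a description of the published method.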
Finally, the third objective was the dissemination of our methods via standalone programs and web-servers. The standalone part was achieved with ENNGENE, both by publishing our trained models and by enabling researchers to train their own models using our GUI. We are in the process of making public our web-server, which includes all our collected experimental datasets as well as binding site predictions from all our trained models.