Periodic Reporting for period 1 - CONTESSA (COuNt data TimE SerieS Analysis: significance tests and sequencing data application)
Reporting period: 2015-08-01 to 2017-07-31
To disseminate intermediate results, I took part in many research meetings and workshops. I also attended summer schools, organized scientific events myself, and had weekly meetings with my supervisors, Neil Lawrence and Eleni Vasilaki.
In Sheffield I collaborated with Marta Milo and Guillaume Hautbergue on Amyotrophic Lateral Sclerosis (ALS) data. I joined an important project in which a new experimental technology was tested. This allowed me to handle new RNA-seq data and to develop a custom pipeline covering alignment, differential expression analysis and protein-protein interaction network estimation. The new technology is called GRASPS (Genome-wide RNA Analysis of Stalled Protein Synthesis), a novel translatome technology to identify functional consequences of widespread RNA dysregulation in neurodegeneration. This collaborative work was presented at the Sheffield neuroscience conference and is currently being revised for journal submission.
In the meantime I started my secondment at the University of Manchester, where I worked in the group of Professor Magnus Rattray. There I interacted with a computational biology team and started a project on finding co-oscillating genes in a given set of RNA-seq data (bulk or single-cell). This study led us to develop a method and software called PyScope: Detecting oscillatory gene networks. We propose a full analysis pipeline on the resulting graph to identify communities of significantly co-oscillating genes. This collaborative work was presented at the data science 2017 meeting in Manchester and at the ISMB 2017 meeting in Prague.
I also focused on network community extraction methods and their validation, which is extremely important when dealing with real biological or social networks. A common way of summarising a network is through its communities, that is, the main groups of nodes (elements) that are strongly connected to each other. It is therefore crucial to be able to rely on robust community extraction methods. This led me to develop, in collaboration with Annamaria Carissimo and Italia Defeis, a method for validating community robustness in networks. We show the results obtained with the proposed technique on simulated and real datasets. This work is currently in its second round of revision at a top statistical journal and was presented at the machine learning conference NIPS 2016 in Barcelona.
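To make the notion of a community as a strongly connected group of nodes concrete, the following minimal sketch scores a given partition of a small toy graph with Newman's modularity, a standard quality measure for communities. It is an illustration only and is not the robustness validation method developed in this work; the function name, the toy graph and the two partitions are all hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once,
           with no self-loops.
    community_of: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra_edges = defaultdict(int)  # edges with both ends in the same community
    degree_sum = defaultdict(int)   # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    # Q = sum over communities of (fraction of intra-community edges)
    #     minus (expected fraction under random rewiring with fixed degrees).
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}  # one community per triangle
bad = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}   # arbitrary alternating split

q_good = modularity(edges, good)
q_bad = modularity(edges, bad)
print(q_good > q_bad)
```

The partition that follows the two triangles scores higher than the arbitrary split, which is the sense in which communities summarise the densely connected parts of a network.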
While discussing ideas with the ML group in Sheffield, I was introduced to the team of Professor Ernst Wit, who leads a COST action on networks called COSTNET (COST Action CA15109). I took part in the first COSTNET meeting in 2016. There I exchanged ideas on network validation with Mirko Signorelli, which led to a fruitful collaboration on network validation techniques. We developed an inferential procedure for community structure validation in networks. This work is currently being revised for submission to a statistical journal.
I had the opportunity to take part in the launch of the single cell facility at BMS, where I was invited to give a talk. Within the facility I started a project on addressing the Fluidigm C1 doublet problem and on detecting the developmental stage of a single cell before sequencing. This work is in collaboration with Max Zwießele, Paul J Gokhale, Marcelo Rivolta and Marta Milo. The Fluidigm C1 is a single-cell analysis system that uses simplified single-cell isolation and cell processing based on Integrated Fluidic Circuits (IFCs). Our approach has the advantage of characterising cells before the RNA-seq assay, which strengthens the interpretation of the subsequent RNA-seq data analysis. Future improvements of this approach will focus on optimising prior selection and the data features extracted from the IFC images. This work was presented at the ISMB 2017 meeting in Prague.
Together with Professor Neil Lawrence I developed a method to estimate, from the same dataset, both the graph between cells and the graph between genes. We rely on previous work by Lawrence and Kalaitzis in which a bigraphical lasso approach was implemented. When dealing with single-cell data we are simultaneously interested in estimating the interrelations among cells and among genes. The bigraphical lasso is a model for matrix-variate data that preserves their column/row structure and simultaneously learns two graphs, one over the rows and one over the columns of the matrix samples. This model has the time complexity of two GLasso problems, O(n + p), preserves the matrix structure by using a Kronecker sum (KS) structure for the precision matrix, and also enhances the sparsity of the graph.
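The Kronecker sum structure mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of the Kronecker sum, Psi (+) Theta = Psi (x) I_p + I_n (x) Theta, with Psi taken as an n x n precision matrix over rows (cells) and Theta as a p x p precision matrix over columns (genes). The matrices below are hypothetical toy values, and the code only builds the combined precision matrix and counts its nonzero entries; it does not implement the estimation procedure itself. Plain Python lists are used to keep the example dependency-free.

```python
def identity(k):
    """k x k identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_sum(psi, theta):
    """Kronecker sum: psi (x) I_p + I_n (x) theta, for square psi and theta."""
    n, p = len(psi), len(theta)
    left = kron(psi, identity(p))
    right = kron(identity(n), theta)
    return [[x + y for x, y in zip(rl, rr)] for rl, rr in zip(left, right)]


def count_nonzero(matrix):
    """Number of nonzero entries, i.e. diagonal terms plus graph edges (twice)."""
    return sum(1 for row in matrix for x in row if x != 0.0)


# Hypothetical toy precision matrices (assumed values, for illustration only).
psi = [[2.0, -1.0, 0.0],      # 3 "cells" connected in a chain
       [-1.0, 2.0, -1.0],
       [0.0, -1.0, 2.0]]
theta = [[1.0, -0.5],         # 2 "genes" connected to each other
         [-0.5, 1.0]]

omega_sum = kron_sum(psi, theta)   # Kronecker sum structure
omega_prod = kron(psi, theta)      # Kronecker product structure, for comparison

print(count_nonzero(omega_sum), count_nonzero(omega_prod))
```

In this toy case the Kronecker sum has fewer nonzero entries than the Kronecker product of the same two matrices, which is one way to see how the KS structure favours a sparser overall graph.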
Alongside the main research lines related to my project, I also started a dissemination and communication project with the Research Software Engineer Team in Sheffield, led by Michael Croucher. The aim of this project was to make the Bioinformatics Awareness Days, which are days devoted to bioinformatics, accessible to a worldwide audience. Together with Tania Allard and Mike Croucher at the DCS, I decided to make this material publicly available to the whole interested scientific community. The sessions are self-contained and a full run should last at most 2 hours. All of the session material is now also available on a website based on Jupyter notebooks: https://github.com/trallard/BAD_days/. This also involved the use of the new Microsoft Azure Notebooks.