Periodic Reporting for period 2 - MuG (Multi-Scale Complex Genomics)
Reporting period: 2017-05-01 to 2018-10-31
The study of how cells package their DNA is called ‘3D/4D genomics’: 3D because it is about the three-dimensional shape of the DNA, 4D because we have to add the dimension of time - the way the DNA is folded in a cell can change from day to day. One of the big problems in this area is that it is a ‘multiscale’ type of science. To understand it properly one needs to look both at the behaviour of whole cells and at that of individual molecules in the cell (which are a million times smaller).
This is a young science, expanding rapidly. All the time new groups discover new ways to study it at the large, medium, or small scale. Enormous amounts of very complicated data are being generated, and we now urgently need a good way to help scientists bring this all together and make sense of it by seeing how it fits into the ‘big picture’. Computer simulations are a powerful tool to help us do this. They allow us to turn complicated experimental data into pictures of how DNA is packed and folded, from the molecular scale all the way up to the cellular scale. But until now the ways in which these visualisations are produced have not been standardized, adapted to all the new sorts of data that are becoming available, or made simple for non-specialists to use.
The MuG Virtual Research Environment (VRE) is now a reality, available to the scientific community as a sort of specialised web browser where scientists can:
*Upload, share, find and check all types of 3D/4D genomics data generated by experimentalists anywhere in the world
*Perform data analysis and integration tasks, some of which need a lot of computer power
*Perform computer simulations that turn this data into visualisations of how the DNA is packed into a cell, and how this can change
*See how all this relates to how cells change their behaviour, and so affects growth, development, disease and ageing
This has only been possible through a tight collaboration of a unique multidisciplinary team of experts in experimental 3D/4D genomics, in molecular studies of DNA, and in computer and data science.
The MuG website currently has an average traffic of 170 new users/month and the MuG VRE has over 150 registered users actively using VRE tools. A functional version of the multi-resolution genome browser TADkit has been installed and running since 2017 (WP3). In 2018 efforts focused on enhancing the browser to fulfill the requirements of the MuG pilot projects, which act as lead users representing the needs of the 3D/4D genomics community.
To facilitate the sustainability of the MuG VRE and its capacity to keep up to date with community demands, MuG has developed a tool-wrapping API that makes it easier for third-party developers to integrate their tools. The pilot projects (WP7) have contributed to defining the VRE tool offering and have successfully tested the tools integrated in the VRE. As leaders in the field, the pilot projects have had a key role in end-user engagement, providing real use cases for MuG training activities and acting as VRE lead users. Datasets generated by the pilot projects will be made available to the community for re-use following publication.
MuG is already generating impact on the scientific community, with 31 published papers, including a position paper in Nature Genetics on 3D/4D data and processing standards, co-authored with leading research groups worldwide. The MuG team has also been actively involved in the organization of scientific gatherings. Training has been a key tool for MuG to engage with its end-users and is identified as a key service for sustainability.
MuG brings advanced and powerful computing closer to this new community, becoming the natural interface between experimental biologists doing research with chromatin (DNA in a cell), physicists developing methods to simulate it, and computer scientists aiming to improve analysis and simulation tools, and how data is stored, integrated and shared. MuG is positioning itself as a reference to structure the community, define standards in software and data and provide a sustainable, computationally powerful infrastructure that will reduce the gap between experimental scientists and the high performance computing (HPC) world.
Through MuG, biologists, methods developers and computational scientists join forces to find solutions for a field that is expected to make a huge impact on the bio-world, from basic cell biology to personalized medicine. As an EU-funded infrastructure, the main objectives set for the MuG VRE in the mid-term are to speed up research in the emerging field of 3D/4D genomics and to contribute to making Europe a preferred place for scientists to conduct research and innovation. To this end, any benefits are set to be reinvested in further development.
Genomics is also an attractive field for industry. MuG can contribute to processing output data from high-throughput sequencing equipment, which makes it of interest to sequencing instrument vendors. The pharma industry is another potential long-term beneficiary of the public information made available through MuG, which may contain clues on the use of DNA-interacting proteins as potential drug targets. According to a recent report by the European Federation of Pharmaceutical Industries and Associations (EFPIA), the research-based pharma industry invested €31.5 billion in R&D in Europe alone in 2015 and employed 725,000 people directly.