Community Research and Development Information Service - CORDIS

Final Report Summary - MR.SYMBIOMATH (High Performance, Cloud and Symbolic Computing in Big-Data problems applied to mathematical modeling of Comparative Genomics)

The Mr.SymBioMath project (www.mrsymbiomath.eu) linked different research domains into a coordinated, multi-disciplinary approach to developing tools for Big-Data and computationally intensive scientific applications. Generic solutions for Big-Data storage, management, distribution, processing and analysis were developed. These solutions targeted a broad range of scientific applications such as full genome comparison, visualization and genome-wide studies. In particular, as a proof of concept, we created new solutions for Comparative Genomics in bioinformatics and for phenotype-genotype associations in the biomedical domains (BIBM).
Following a bottom-up perspective, in a first layer the project installed and managed a Cloud infrastructure providing location- and platform-independent access, data storage and processing power. On top of this infrastructure, in a second layer, a set of libraries was built to facilitate mathematical modelling and software development of BIBM applications. Next, in a third layer, the different applications were developed. Finally, software clients, including novel mobile applications, were deployed for discovering, composing, launching and monitoring applications, providing uniform and universal access to software and data.

As a project framed within the Marie Curie programme, people were at the forefront of its objectives, which were expressed in terms of long-term cooperation with a high potential for increasing mutual understanding between different scientific backgrounds, and also between the different cultural settings and skills of the industrial and academic sectors. To this end we also created a network of PhD students and post-docs whose tasks were mapped onto the described infrastructure according to their own expertise and the skills required by each partner. The consortium combined the application domain of bioinformatics (UMA, JKU, ITG) with the technical research fields of computer architecture and HPC (UMA, RISC, LRZ) and the direct medical application (IMABIS, ITG).

Collaborators from the Johannes Kepler University (JKU, Linz-Austria) were hosted by the Research Institute for Symbolic Computation (RISC, Hagenberg-Austria) to learn about specific modelling techniques and to provide RISC with know-how on machine learning techniques. People from RISC spent one year at the Leibniz Rechenzentrum München (LRZ, Munich-Germany) to learn visualization, with a specific focus on Virtual Reality and related techniques. The spin-off Integromics (ITG, Spain) received people from the University of Malaga (UMA, Malaga-Spain) and from JKU to strengthen the industrial packaging of the final applications. Students and staff from UMA were hosted by JKU and RISC to provide high performance computing know-how and to acquire project management skills. The IMABIS Institute (associated with the Andalusian Health Service, Malaga-Spain), which conducts several projects on drug allergy and on the mechanisms that produce allergic diseases of the respiratory airways, benefited from entering the BIBM world. More importantly, the network forged during the project remains open and active, with increased collaboration among the different partners. The project consortium is very proud of the student exchanges among the different institutions, the synergies built over these four years, and the new skills, not only scientific or technical but also social, that secondees and new recruits gained. They have now returned to their home institutions, which stand to benefit from these new skills.

The proposed roadmap was fully and successfully completed. In the first year, a preliminary analysis was conducted to fine-tune the specifications for the systems, including the computational infrastructure, data collections, validation tests, etc. In short, the deliverables of the first year described the roadmap for research, development and validation, involving all partners in the consortium with their specific tasks. The main work packages started in the second part of the first year, covering the design and implementation of the system based on the specifications produced in the preliminary analysis.

During the second year of the project, we focused on completing the work on software libraries and applications and on starting the software integration stage. Given the importance of visual representation within the Comparative Genomics field, we incorporated a new sub-task for the visual analysis of multiple-genome comparison. During the third year of the project, following the recommendations raised during the Mid-Term review, we re-scheduled some tasks for the last year, including tasks and personnel exchanges. In simple terms, in the first three years we completed the technical part of the project (design of algorithms and code development) and, in the fourth year, we focused on the evaluation and dissemination of results.

We developed new algorithms for better and more efficient task scheduling in the cloud and for controlled, authenticated access, as well as user-friendly clients. In the field of comparative genomics in particular, new procedures and algorithms break through the barriers that limited large genome comparison algorithms, and new ways to visualize, browse and navigate these large collections of data are supported by elegant software. All applications were tested under computationally intensive conditions (i.e. very large mammalian and plant genomes). On the other side of this kind of research, we carried out a substantial and consistent set of dissemination activities and public-oriented press releases as part of the outreach activities.

In short, the Mr.SymBioMath project deployed a research Cloud and High Performance Computing environment to run software applications, solving most of the theoretical problems related to genome-genome comparisons (memory- and CPU-intensive applications), inter-genome distances and the identification of evolutionary events, all with a clear impact on state-of-the-art solutions. New software was developed to identify genes correlated with phenotypic features in patients. All pieces of software were integrated into prototypes and were tested and successfully validated in the Integration and Validation work package.

The described advancements will enable research experiments in comparative genomics that would not otherwise be possible. In particular, the project has contributed strongly to the discovery and identification of new relationships between species, genes and evolutionary events, which have a direct impact on several fields, from agriculture to human health, with all the economic benefits that follow. Furthermore, as a result of the research carried out during the project, we have extended our research lines in Precision Medicine, particularly in intelligent pathology classification, and in Comparative Genomics, through hardware-assisted acceleration for the comparison of extremely large genomes.

Reported by

UNIVERSIDAD DE MALAGA
Spain

Subjects

Life Sciences