CORDIS - EU research results

The first MPEG-G compliant software tools for efficient compression, storage, transport and analysis of genomic data enabling systems interoperability

Periodic Reporting for period 1 - GenCoder (The first MPEG-G compliant software tools for efficient compression, storage, transport and analysis of genomic data enabling systems interoperability)

Reporting period: 2018-07-01 to 2018-12-31

The large and fast-growing genomics market currently lacks a standard file format. Various products, both commercial and open-source, are available, but each has critical limitations in terms of information management and interoperability. Moreover, the huge volume of genomic data (forecast to reach exabytes within a few years) imposes significant storage costs on public and private repositories (hospitals, data banks, cloud services, etc.). The GenCoder is the first tool to implement the new format developed by the MPEG-G group, which aims to unify the different existing formats and become a standard. The GenCoder will facilitate the sharing of information within the community and reduce costs for both research institutions and software houses involved in this market.
Genomic analysis is revolutionizing the future of medicine and human healthcare. Information coded in our DNA can be used to identify predispositions to diseases, prevent pathologies like cystic fibrosis or Type 2 diabetes, support diagnosis and drive the targeted treatment of lethal diseases such as cancer. The development of the GenCoder will improve interaction among the many actors in the research field (bioinformaticians, chemists, biologists, statisticians, physicians) and the exchange of information among them, thus supporting genomic research. Moreover, it will lower the associated overall costs by reducing both the storage space occupied by data and the energy spent on processing.
The objective of the SME project is to develop a software suite, the GenCoder, that makes the MPEG-G standard format accessible to the whole community, using the most advanced technical features currently available. The product is forecast to be ready in 20 months.
During the feasibility study, GenomSys reached several important technical objectives:
New architecture design, compliant with the new MPEG-G specifications
Achievement of a compression factor > 5
Achievement of a latency < 10 s
Achievement of real-time data streaming
Installation into existing pipelines

Business objectives reached:
Consolidation of the GenomSys business plan
In-depth market analysis, interviewing more than 30 important stakeholders (8 research centers, 3 biotech or pharma companies, 4 large data repositories, 3 NGS producers, 15 software houses), which helped the company refine its Cost Benefit Analysis, Sensitivity Analysis and Risk Assessment
Refinement of the overall pricing strategy of the GenCoder
Definition of a marketing strategy to best spread awareness of the product

During the study, GenomSys participated in the following events to present its current results:
ISMB 2018 - Intelligent Systems for Molecular Biology, the annual conference of the International Society for Computational Biology
Europe Biobank Week
Toolpoint / PwC Event “Connected Lab of the future”
122nd MPEG meeting held in San Diego
123rd MPEG meeting held in Ljubljana
124th MPEG meeting held in Shenzhen
The GenCoder is currently the only software tool with codecs to and from MPEG-G. The system already reaches a very high compression factor and compression speed, thanks to algorithms designed around the structure of genomic data.
Thanks to the underlying technology, the system can be integrated into cloud services and allows data streaming, in a way similar to what is currently done with audio and video formats.
During the next stages of the project, GenomSys expects to add new codecs and features to its system, such as the definition of tags and security rules over genomic intervals. Moreover, GenomSys is working to make the system suitable for the whole genomic data pipeline (from sequencing to tertiary analysis).
The project's economic impact is substantial: even in a slightly conservative scenario, the total addressable market (TAM) is expected to exceed 600 M€ in 2023. The technology has already been validated by a large part of the community working within the open MPEG-G group.

Genomic data is expected to be the major generator of big data (ahead of astronomy and huge Internet databases like YouTube), with somewhere between 100 million and 2 billion human genomes stored by 2025. Considering that the size of a whole human genome for clinical use can reach up to 3 TB, and that highly efficient data storage for industrial applications costs about 850 €/TB per year, it is easy to see the huge impact of a high-performance compression solution. Moreover, the lack of interoperability limits the sharing of information among the different research actors, thus slowing down genomics, one of the most promising fields of research in medicine. The GenCoder will provide a professional product and ISO-compliant solutions, granting certified quality to medical and research institutions.
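The storage-cost argument above can be made concrete with a short calculation. The sketch below is illustrative only: it takes the 3 TB upper bound and the 850 €/TB per year price from the paragraph above, and it assumes the compression factor of 5 that the feasibility study reports as a lower bound. The function name and the simple linear cost model are assumptions made for this example, not part of the GenCoder.

```python
def annual_storage_cost_eur(size_tb, price_eur_per_tb_year, compression_factor=1.0):
    """Yearly cost of storing one data set under a simple linear cost model.

    Hypothetical helper: the stored size is the raw size divided by the
    compression factor, and cost scales linearly with stored size.
    """
    if compression_factor <= 0:
        raise ValueError("compression_factor must be positive")
    return size_tb * price_eur_per_tb_year / compression_factor


# Figures quoted in the report (upper bound for one clinical whole genome).
GENOME_SIZE_TB = 3.0
PRICE_EUR_PER_TB_YEAR = 850.0
# Assumption: the ">5" compression factor from the feasibility study, taken as exactly 5.
COMPRESSION_FACTOR = 5.0

uncompressed = annual_storage_cost_eur(GENOME_SIZE_TB, PRICE_EUR_PER_TB_YEAR)
compressed = annual_storage_cost_eur(
    GENOME_SIZE_TB, PRICE_EUR_PER_TB_YEAR, COMPRESSION_FACTOR
)
saving = uncompressed - compressed

print(f"Uncompressed: {uncompressed:.0f} EUR per genome per year")
print(f"Compressed:   {compressed:.0f} EUR per genome per year")
print(f"Saving:       {saving:.0f} EUR per genome per year")
```

Under these inputs, one genome at the 3 TB upper bound costs 2550 € per year to store uncompressed and 510 € per year at a compression factor of 5, a saving of 2040 € per genome per year. Actual savings depend on the real genome size, storage price and achieved compression factor.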
Genomic Data Pipeline