Project description
Shrinking the storage size of Big genomic Data and enhancing analysis and interoperability
The era of Big Data has put the power of large numbers in our hands for more detailed pictures of phenomena in areas from the housing loan market to epidemiology and climate change. However, storing and analysing all these data – and ensuring shareability to speed insight and innovation – is a significant challenge. The Swiss health software company GenomSys has developed the GenCoder software tool to streamline compression and analysis of genomic data and ensure interoperability amongst formats used by stakeholders including clinics, research institutes, biobanks and biotech companies. The EU-funded GenCoder project is helping the team optimise the technology and pave the way to market.
Objective
Next-Generation Sequencing (NGS) devices have had an enormous impact on genomic analysis and the life sciences: the price of sequencing a human genome has dropped drastically (to 1,000 USD) and data processing has become faster, resulting in the exponential accumulation of genomic data. However, the vast amount of genomic data produced and stored in clinics, research institutes, biobanks and biotech companies brings cumbersome informatics challenges. First, the huge amount of data to be stored implies massive storage costs (850 €/TB per year, with the average size of a whole human genome in the range of 0.4-3 TB), partly because of the use of ineffective data formats not specifically designed for genomic data. Second, the lack of interoperability and of standardized software and protocols prevents genome analysis centres from implementing reliable, scalable and widely accepted applications for cross-correlating and comparing genomic data. GenCoder by GenomSys is a software tool specifically designed for efficient genomic information representation, compression and transport. Its main features are: (1) high compression rates in lossless mode (up to 90% with respect to the BAM standard); (2) selective access to specific blocks of data and metadata, which significantly speeds up data analysis; (3) interoperability amongst available data formats, enabled by compliance with the ISO standard MPEG-G (to be released in 2019), which is being developed by a joint working group with GenomSys as the main contributor.
The Phase 1 project will have two main outcomes: the optimization and validation of GenCoder's performance through tests on very large datasets at selected sequencing and data storage facilities, and a business plan for the product that consolidates the business model and pricing, the marketing strategy and the financial plan.
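As a rough illustration of the storage figures quoted in the objective, the Python sketch below computes the annual storage cost per genome before and after compression. It assumes that "up to 90%" means a 90% reduction in stored size and that the 0.4-3 TB range describes data held in a BAM-like format; neither assumption is confirmed by the project text, and the constant and function names are purely illustrative.

```python
# Figures taken from the objective text.
COST_EUR_PER_TB_YEAR = 850.0      # quoted storage cost
GENOME_SIZE_TB = (0.4, 3.0)       # quoted size range of a whole human genome

# Assumption: "up to 90% with respect to BAM" read as a 90% size reduction.
MAX_REDUCTION = 0.90


def annual_cost(size_tb: float, reduction: float = 0.0) -> float:
    """Yearly storage cost in EUR for one genome of `size_tb` terabytes,
    after shrinking it by the fraction `reduction` (0.0 = uncompressed)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return size_tb * (1.0 - reduction) * COST_EUR_PER_TB_YEAR


if __name__ == "__main__":
    for size in GENOME_SIZE_TB:
        before = annual_cost(size)
        after = annual_cost(size, MAX_REDUCTION)
        print(f"{size} TB genome: {before:.0f} EUR/year -> {after:.0f} EUR/year")
```

Under these assumptions the cost falls from roughly 340-2,550 € to roughly 34-255 € per genome per year. The 90% figure is an upper bound, so actual savings would depend on the data.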
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques.
- natural sciences > computer and information sciences > software
- natural sciences > biological sciences > genetics > DNA
- natural sciences > computer and information sciences > data science > big data
- natural sciences > biological sciences > genetics > genomes
- natural sciences > computer and information sciences > data science > data processing
Programme(s)
Funding Scheme
SME-1 - SME instrument phase 1
Coordinator
1015 LAUSANNE
Switzerland
The organization defined itself as SME (small and medium-sized enterprise) at the time the Grant Agreement was signed.