CORDIS - EU research results

Processing-in-memory architectures and programming libraries for bioinformatics algorithms

Periodic Reporting for period 1 - BioPIM (Processing-in-memory architectures and programming libraries for bioinformatics algorithms)

Reporting period: 2022-05-01 to 2023-04-30

Low-cost, high-throughput DNA and RNA sequencing (HTS) data is now the main workhorse for various biological applications. HTS technologies have already started to impact a broad range of research and clinical uses in the life sciences. These include, but are not limited to, large-scale sequencing studies for population genomics and the discovery of disease-causing mutations (including in cancer), metagenomics, comparative genomics, transcriptome profiling, and outbreak detection and tracking, including for COVID-19, Ebola, and Zika. HTS also impacts the whole health care system in several directions. Although there is still much room for improvement, sequencing of personal genomes is becoming a part of preventive and personalized medicine, as HTS technologies make it possible to identify genetic mutations that enable rare disease diagnosis, determine cancer subtypes and thereby guide treatment options, and characterize infections and antibiotic resistance.


Currently, all biological data are analyzed using general-purpose computation platforms, i.e. platforms that aim to solve a wide range of problems. This means that all current compute grids, servers, and cloud computing platforms are designed to provide solutions for all “computable” problems with amortized efficiency. Analyzing massive amounts of biological data in large clusters and cloud platforms poses two problems. First, transferring the data from where it is generated (hospitals, clinics, or even small villages in the case of virus tracking) to these computing centers consumes both time and energy and requires a stable and fast internet connection. Second, these computing platforms are themselves energy-hungry: a considerable amount of energy is spent as data moves between the processing unit and the memory within the same computer system.


The BioPIM project aims to develop algorithms and specialized hardware together to improve the speed and cost of various bioinformatics analyses. The project focuses on two algorithm design techniques: combinatorial algorithms such as alignment, pattern matching, genome assembly, and other uses of graphs; and methods based on deep learning, machine learning, and AI, such as genomic variation discovery. To achieve energy-efficient, cost-efficient, and ultra-fast bioinformatics analysis, the BioPIM project leverages emerging processing-in-memory (PIM) architectures that couple processing capability with memory and storage devices, thereby minimizing the time and energy spent on data transfer. We will also design our hardware to perform some of these analyses on mobile devices, thereby enabling edge computing. BioPIM addresses the current inability to perform genome analysis on the go, helping the timely investigation of both clinical and research data, including viral and bacterial typing in remote locations with little or no access to conventional large-scale computing platforms.


BioPIM’s proposed research is flexible, as it aims to develop PIM acceleration for various algorithms. Although the methods the project focuses on are within the bioinformatics domain, most of these algorithms originated decades ago and are also used in non-bioinformatics applications such as:


● String search and pattern matching (e.g. in natural language processing, data mining)
● Graph theory (e.g. data analytics, web indexing)
● General machine learning
● Specifically, neuromorphic computing (e.g. many applications of deep learning and artificial intelligence)


Additionally, most of our PIM developments will also benefit data centers in terms of performance and energy efficiency, so the project’s impact is expected to extend far beyond our major aims.
In the first year of the project, we evaluated the performance and behavior of several tools and data structures commonly used in bioinformatics. The aim of this analysis was to understand the computational requirements of these tools and how to improve them using PIM architectures. We then selected several algorithms implemented in these tools as the targets for our new hardware/algorithm co-design. In the remainder of the project, we will optimize these algorithms for PIM architectures.
We have characterized the memory utilization, memory-boundedness, and computational requirements of several algorithms and tools commonly used in bioinformatics. This characterization will guide WP3 and WP4 in designing novel algorithms and hardware architectures.