Periodic Reporting for period 1 - BioPIM (Processing-in-memory architectures and programming libraries for bioinformatics algorithms)
Reporting period: 2022-05-01 to 2023-04-30
Currently, biological data are analyzed using general-purpose computing platforms, i.e. platforms designed to solve a wide range of problems. This means that current compute grids, servers, and cloud computing platforms are all designed to solve any “computable” problem with amortized efficiency. Analyzing massive amounts of biological data on large clusters and cloud platforms poses two problems. First, transferring the data from where it is generated (hospitals, clinics, or even small villages in the case of virus tracking) to these computing centers is both time- and energy-consuming and requires a stable, fast internet connection. Second, these platforms are themselves energy-hungry: a considerable amount of energy is spent moving data between the processing unit and the memory within the same computer system.
The BioPIM project aims to co-develop algorithms and specialized hardware to improve the speed and reduce the cost of various bioinformatics analyses. The project focuses on two classes of algorithm design techniques: combinatorial algorithms, such as alignment, pattern matching, genome assembly, and other graph-based methods; and methods based on deep learning, machine learning, and AI, such as genomic variation discovery. To achieve energy-efficient, cost-efficient, and ultra-fast bioinformatics analysis, BioPIM leverages emerging processing-in-memory (PIM) architectures, which couple processing capability with memory and storage devices, thereby minimizing the time and energy spent on data transfer. We will also design our hardware so that some of these analyses can be performed on mobile devices, thereby enabling edge computing. BioPIM addresses the current inability to perform genome analysis on the go, supporting the timely investigation of both clinical and research data, including viral and bacterial typing in remote locations with little or no access to conventional large-scale computing platforms.
BioPIM’s proposed research is flexible, as it aims to develop PIM acceleration for a variety of algorithms. Although the methods the project focuses on lie within the bioinformatics domain, most of these algorithms originated decades ago and are also used in non-bioinformatics applications, such as:
● String search and pattern matching (e.g. in natural language processing, data mining)
● Graph theory (e.g. data analytics, web indexing)
● General machine learning
● Neuromorphic computing in particular (e.g. many applications of deep learning and artificial intelligence)
Additionally, most of our PIM developments will also benefit data centers through improved performance and energy efficiency; the project’s impact is therefore expected to extend far beyond its primary aims.