"Complete genomes" are becoming available to the world community. It is clear that a huge amount of information is hidden in these sequences. This increasing mass of data poses new challenges: traditional computing tools should be rewritten to keep the pace of the data growth. There is a clear consensus to develop computer systems with performances at least 1000 times the ones of today. The goal of this Project is to tailor common software tools onto a parallel computer built using Beowulf architecture. Then this technology can be transferred to a real case building a system for Comparative Genomics. We want to exploit existing software by integrating it in a modular way on a parallel platform with custom databank, as to catch up with the most advanced institutions and reaching the highest speed of sequence analysis achievable. We want also to test this modular hardware and software, adding a component used to study molecular dynamics and docking programs.
The key point of the BIOWULF Project is to tailor common software tools (e.g. Blast, Blitz, Fasta) to a parallel computer built on the proven Beowulf architecture. This technology can then be transferred to a real case by building a system for Comparative Genomics.
BIOWULF will be built using a modular hardware and software architecture, making it easier to add new components or adapt it to new goals. For example, the system could be used to study interactions between molecules. To this end, we will also include molecular dynamics and docking programs as components of the modular package we want to build.
The BIOWULF Project is also porting a core set of DNA/RNA/protein sequence search tools onto a parallel computer system.
The objective is to build a system whose computing performance is comparable to that of low-end supercomputers.
The proposed solution is based on a well-known computing architecture: the Beowulf cluster (http://www.beowulf.org). The key point is to use a well-tested and stable parallel architecture, so that we can focus on porting sequence search applications and designing databanks that exploit the parallel capabilities of a Beowulf cluster.
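One common way to let a sequence databank exploit a cluster is to split it into shards, one per node, search each shard independently and then merge the ranked hits. The sketch below illustrates that idea only. The project text does not state how the BIOWULF databank is partitioned, so the greedy length-balancing strategy and the names `parse_fasta`, `partition_databank` and `merge_hits` are assumptions made for illustration, not the project's design.

```python
import heapq
from typing import List, Tuple

Record = Tuple[str, str]   # (header, sequence)
Hit = Tuple[float, str]    # (score, header); higher score is better


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) records."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition_databank(records: List[Record], n_nodes: int) -> List[List[Record]]:
    """Assign records to n_nodes shards, balancing total residues per shard.

    Hypothetical strategy: take the longest sequences first and always give
    the next one to the shard with the smallest load so far.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    shards: List[List[Record]] = [[] for _ in range(n_nodes)]
    heap = [(0, i) for i in range(n_nodes)]  # (current load, shard index)
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, idx = heapq.heappop(heap)
        shards[idx].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), idx))
    return shards


def merge_hits(per_node_hits: List[List[Hit]], top_k: int) -> List[Hit]:
    """Combine the hit lists returned by each node and keep the top_k best."""
    all_hits = [hit for hits in per_node_hits for hit in hits]
    return heapq.nlargest(top_k, all_hits)
```

In such a scheme every node would run an unmodified search tool against its own shard, which is why balancing shard sizes matters: the overall search time is set by the slowest node.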
The BIOWULF Project has two phases: model construction and BIOWULF exploitation.
The prototype, built during model construction, will be used only for testing and tuning the hardware, system software and application solutions. The BIOWULF exploitation phase aims to build a production system for testing this solution on real research projects and to demonstrate the effectiveness of this kind of architecture.
The BIOWULF Project has four main expected results:
1. Build a prototype high-performance parallel computer using commonly available hardware and a standard operating system software architecture;
2. Develop parallelised versions of standard sequence search tools (such as Blast, Fasta, etc.) and "in house" databanks to exploit parallel computing power;
3. Build and deploy a real-world case to be used in industrial projects for Comparative Genomics, side by side with more traditional architectures. The objective is to demonstrate the effectiveness of such a computing architecture compared with the dedicated massively parallel processors that are still widely used;
4. Port one important program for molecular interaction studies (such as Gromacs or Autodock).
The BIOWULF Project intends to build a production system for testing this solution on real research projects and to demonstrate the effectiveness of this kind of architecture.
WP1: Real-world application at PM16.
WP2: Assembly of the BIOWULF prototype (hardware and communication) at PM3.
WP3: Porting of Blast (Fasta) at PM10.
WP4: Larger-scale porting of the results obtained on the BIOWULF model and development of a custom databank at PM11.
WP5: Creation of a very high-performance and stable supercomputer system based on currently available modular software components at PM16.
Funding Scheme: ACM - Preparatory, accompanying and support measures