Translating large volumes of plant genome data
The completion of the Arabidopsis genome sequence and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics research. However, these data are used inefficiently: molecular biology data collections are distributed and heterogeneous, and efforts towards their comprehensive integration are lagging behind. The ultimate aim of the PLANET project was to overcome the limitations of individual efforts and independent collections of molecular biology data. Unlike many other data storage approaches, the PLANET project partners did not propose a data warehouse solution in which all data on plant genome mapping are collated in a single database. Instead, they developed web technologies to access up-to-date datasets that remain distributed with the specialists annotating them.

YAdumper was developed by project partners with the Spanish National Research Council as a purpose-built tool to facilitate structured information downloads from distributed databases. This Java application makes no assumptions about the database capabilities. As input it requires a set of global variables, the formatting functions and the name of the output file, which can be in XML or another flat format. The template is the only file kept permanently in memory. However, reduced memory requirements are not YAdumper's only benefit. By allowing queries to be sent to multiple databases and supporting correlated queries, it facilitates the retrieval of thousands of related rows from molecular biology databases. YAdumper is just one of the web technologies developed by the PLANET project and is expected to contribute to the systematic exploration of Arabidopsis and other plant genomes.
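
The idea of correlated queries sent to more than one database can be illustrated with a minimal sketch. It is not YAdumper's code or template syntax (YAdumper is a Java application). It is a Python illustration of the general technique under stated assumptions: the two in-memory SQLite databases, the `gene` and `annotation` tables, their rows and the `dump_genes` function are all hypothetical. An outer query runs against one database, and for each row an inner query, parameterised by that row, runs against a second database. Rows are written to the output as they are read rather than collected first.

```python
import io
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def make_demo_databases():
    """Build two independent in-memory databases with made-up rows."""
    genes = sqlite3.connect(":memory:")
    genes.execute("CREATE TABLE gene (id TEXT PRIMARY KEY, name TEXT)")
    genes.executemany(
        "INSERT INTO gene VALUES (?, ?)",
        [("g1", "hypothetical gene A"), ("g2", "hypothetical gene B")],
    )
    annotations = sqlite3.connect(":memory:")
    annotations.execute("CREATE TABLE annotation (gene_id TEXT, note TEXT)")
    annotations.executemany(
        "INSERT INTO annotation VALUES (?, ?)",
        [("g1", "note 1"), ("g1", "note 2 <draft>"), ("g2", "note 3")],
    )
    return genes, annotations


def dump_genes(genes, annotations, out):
    """Stream genes and their annotations to `out` as XML, row by row."""
    out.write("<genes>\n")
    # Outer query: iterate over the cursor instead of calling fetchall(),
    # so rows are pulled one at a time.
    for gene_id, name in genes.execute("SELECT id, name FROM gene ORDER BY id"):
        out.write(f"  <gene id={quoteattr(gene_id)} name={quoteattr(name)}>\n")
        # Correlated query: parameterised by the current outer row and
        # sent to a different database.
        inner = annotations.execute(
            "SELECT note FROM annotation WHERE gene_id = ? ORDER BY rowid",
            (gene_id,),
        )
        for (note,) in inner:
            out.write(f"    <annotation>{escape(note)}</annotation>\n")
        out.write("  </gene>\n")
    out.write("</genes>\n")


genes_db, annotations_db = make_demo_databases()
buffer = io.StringIO()
dump_genes(genes_db, annotations_db, buffer)
xml_text = buffer.getvalue()
print(xml_text)
```

In this sketch the output goes to an in-memory buffer for convenience. Passing an open file object to `dump_genes` instead would keep only the current rows in memory at any time.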