The core objective of this project was the provision of the EMBL Nucleotide Sequence Database in collaboration with the appropriate international partners. Related tasks included:
- the provision of protein sequences via the SWISS-PROT database in a way which allowed them to be viewed through the same interface as the nucleotide sequences.
- support (with collaborators) of specialist databases to be used with the Nucleotide Sequence Database.
- collaboration with the EMBnet project.
All of the main objectives were met during a phase complicated by the move of the entire Data Library operation from EMBL's Heidelberg headquarters to its Outstation on the Wellcome Trust Genome Campus at Hinxton in the UK. In addition to these logistical developments, advances in genome sequencing caused a huge surge in the scale of the task. The original proposal predicted 350 megabases at the end of the project; the actual total, as can be seen from figure 1, was nearer 700 megabases.
The EMBL Data Library has responded to these changes with major technical developments in software and database methodology, and with hardware enhancements. In particular, following the transition to the UK, the Data Library remodelled its team and developed its services to exploit modern network access methods, such as the World Wide Web, both for data acquisition and for data distribution and query.
The Nucleotide Sequence Database has been delivered on schedule. The only disappointments have been problems with the SWISS-PROT pipeline, which caused late delivery of releases, and the postponement of a planned user survey due to changes around the transition to the UK. Although the results of this survey were largely positive, the response rate was so low that it is hard to give much credence to its findings. Aside from the overall growth of the nucleotide collection, it is worth noting that the fraction of the nucleotide data which are human has risen from 20% to over 40% (by base pairs) in the course of the project.
The scaling up of operations at the Sanger Centre, the EBI's neighbour on the Genome Campus, has made it the biggest submitter of data in the world. Close cooperation with the Sanger Centre resulted in the development of the 'Syncron' system for the automatic inclusion of data from genome sequencing projects into the database.
The EBI continued to work in close cooperation with its global partners in the USA and Japan, exchanging all data and updates via computer networks. Collaborative technical developments have included:
- an extension to the accession number format.
- an experimental scheme for representing very long sequences.
- the introduction of cross-references to external databases at the level of sequence features, allowing, among other things, more robust links to SWISS-PROT.
- implementation of a common taxonomy developed by NCBI.
- simplified procedures for processing and exchanging data from the patent literature.