CORDIS - EU research results

Content archived on 2023-03-01

International DNA and RNA databases reach 'gigantic' milestone

The three members of the International Nucleotide Sequence Database Collaboration (INSDC) have announced that their public repositories for DNA and RNA sequence information now hold over 55 million sequences, equivalent to 100 gigabases, or 100,000,000,000 bases - the molecular components of DNA that encode genetic information. The three members - EMBL-Bank (based at the European Molecular Biology Laboratory's European Bioinformatics Institute in Hinxton, UK), GenBank in the US and the DNA Data Bank of Japan - have reached this milestone together, thanks to their data exchange policy. All three organisations share their sequence data through a global exchange of biological information, so that every nucleotide sequence in the public domain is made freely available to the scientific community as quickly as possible.

Four bases - adenine (A), thymine (T), guanine (G) and cytosine (C) - link together in pairs to form the long chain that makes up the now familiar double helix of deoxyribonucleic acid (DNA). The links between base pairs - A pairs with T and C with G via hydrogen bonds - can be broken to 'unzip' the two strands of the double helix. Genetic information is encoded in DNA by the order in which the bases occur. Conventionally, a sequence is described simply by listing the order of single bases (or nucleotides) on one of the two strands (e.g. CCAAATATGGATT), and this, along with annotations identifying source species and function, is the type of information held in the INSDC databases.

'This is an important milestone in the history of the nucleotide sequence databases,' said Graham Cameron, Associate Director of EMBL's European Bioinformatics Institute.
'From the first EMBL Data Library entry made available in 1982 to today's provision of over 55 million sequence entries from at least 200,000 different organisms, these resources have anticipated the needs of molecular biologists and addressed them - often in the face of a serious lack of resources.'

The INSDC was formalised in February 1987, and all three databases have their roots in the 1980s: EMBL-Bank, now at the EBI in the UK, was launched as the EMBL Data Library in Heidelberg, Germany; the US GenBank was launched soon after at the Los Alamos National Laboratory, before moving to the National Center for Biotechnology Information in Bethesda, US; and the DNA Data Bank of Japan was launched at the National Institute of Genetics in Mishima in 1986.

David Lipman, Director of the National Center for Biotechnology Information, explained further: 'Today's nucleotide sequence databases allow researchers to share completed genomes, the genetic make-up of entire ecosystems, and sequences associated with patents.'

Initially, data was distributed on magnetic tape and entered by hand or on floppy disk. This has now been supplanted by pipelines from genome sequencing projects and the European Patent Office, ensuring that all sequences in the public domain are available as rapidly as possible. Researchers can also submit data directly to any one of the organisations and, because the three databases use harmonised data models, the sequences are exchanged automatically overnight so that the data is available via all three. Initially the sequences were entered by hand from scientific journals, but this process has also evolved over the years, so that direct submission of nucleotide sequences to the databases has become part of the publication process. This principle has also been extended to other areas, including proteomics and models of biological processes.
'The INSDC has laid the foundations for the exchange of many types of biological information,' said Takashi Gojobori, Director of the Center for Information Biology and DNA Data Bank of Japan. 'As we enter the era of systems biology and researchers begin to exchange complex types of information, such as the results of experiments that measure the activities of thousands of genes, or computational models of entire processes, it is important to celebrate the achievements of the three databases that pioneered the open exchange of biological information.'
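
As a small illustration of the base-pairing rule described above (A pairs with T, C with G), the following Python sketch derives the partner strand from a sequence listed on one strand, using the article's example CCAAATATGGATT. The function names `complement` and `reverse_complement` are hypothetical and chosen only for this illustration; they are not part of any INSDC tool or data format.

```python
# Base pairing as described in the article: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(strand)[::-1]


example = "CCAAATATGGATT"  # example sequence quoted in the article
print(complement(example))          # GGTTTATACCTAA
print(reverse_complement(example))  # AATCCATATTTGG
```

Because each base determines its partner, listing one strand is enough to recover the other, which is why a single string of letters can stand for a double-stranded molecule.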