CBOL has reached an agreement with GenBank® to create an open archive of standardized DNA sequences derived from voucher specimens held in reference collections. GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. As of February 2004, it held approximately 37,893,844,733 bases in 32,549,400 sequence records.
The new "Barcode Section of GenBank" was launched at the February 2005 barcode conference in London. It now contains several thousand data records. Each record in the Barcode Section contains the short DNA barcode sequence, along with links to a voucher specimen, a species name, and the literature citation where the sequence was published. In addition to the standard GenBank data elements, the Barcode Section has data fields recording when and where the specimen was collected and which laboratory protocols were used to obtain the barcode sequence.
The Barcode Section of GenBank also contains “trace files”, which are the actual raw data that come from the DNA sequencing machine. Although we generally think of the DNA sequence as the raw data, the sequence is in fact an interpretation of these trace files. A single DNA sequence can draw on multiple trace files, as when it is sequenced from both the light and heavy strands and the data are combined to produce a more robust, corroborated result. Interpreting a trace file yields a probability score for each base call, and these scores are archived with the trace files for each barcode sequence. Together, the trace files (observations) and the quality scores (analyses) help to generate the result (the COI sequence, or DNA barcode) for each specimen examined. This approach is superior to the highly subjective interpretation behind many of the DNA sequences submitted to GenBank, which lack trace files and computed quality scores and often do not even cite an actual reference specimen.
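The role of per-base quality scores when two strands are combined can be illustrated with a minimal sketch. It assumes the scores follow the widely used Phred convention, Q = -10 log10(P), where P is the estimated probability that a base call is wrong; the text above does not say which scale the archive uses. The function names and the merging rule are hypothetical simplifications for illustration, not the procedure used by GenBank, BOLD, or any particular base-calling software.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-style quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred-style quality score."""
    return -10 * math.log10(p)


def merge_calls(forward, reverse_rc):
    """Merge two aligned reads into one consensus list of (base, quality) pairs.

    Both inputs are lists of (base, quality) pairs covering the same positions;
    the reverse read is assumed to be already reverse-complemented and aligned.
    Simplified heuristic (an assumption, not a standard):
      - agreeing calls: add the qualities (multiplies the error probabilities,
        treating the two reads as independent);
      - conflicting calls: keep the higher-quality base, reduced by the other;
      - conflicting calls of equal quality: report an ambiguous 'N'.
    """
    consensus = []
    for (b1, q1), (b2, q2) in zip(forward, reverse_rc):
        if b1 == b2:
            consensus.append((b1, q1 + q2))
        elif q1 > q2:
            consensus.append((b1, q1 - q2))
        elif q2 > q1:
            consensus.append((b2, q2 - q1))
        else:
            consensus.append(("N", 0))
    return consensus


# Hypothetical three-base example: one agreement, one resolved conflict, one tie.
forward = [("A", 30), ("C", 20), ("G", 10)]
reverse_rc = [("A", 25), ("T", 35), ("T", 10)]
merged = merge_calls(forward, reverse_rc)
```

In this sketch, two reads that agree at a position corroborate each other and the combined call gains confidence, while a disagreement lowers it. That is the sense in which sequencing both strands gives a more robust result, and archiving the scores lets a later reader see how well supported each base is.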
NCBI is now beta-testing the public submission tool for the Barcode Section of GenBank. Researchers who would like to submit their DNA barcode data directly to GenBank (as opposed to submitting data through BOLD) should contact Scott Federhen (federhen@ncbi.nlm.nih.gov).
For more information please refer to the CBOL-GenBank Partnership Press Release from the CBOL Conference in February 2005.
