NAME - Massage NCBI chromosome annotation into GFF-format suitable for Bio::DB::GFF


 $RCSfile:,v $
 $Revision: 1.1 $
 $Author: lstein $
 $Date: 2008-10-16 17:01:27 $


   perl [options] /path/to/gzipped/datafile(s)


This script massages the chromosome annotation files located at

into the GFF format recognized by Bio::DB::GFF. If the resulting GFF files are loaded into a Bio::DB::GFF database using the utilities described below, the annotation can be viewed in the Generic Genome Browser and accessed through the Bio::DB::GFF libraries. (NB: according to their READMEs, these NCBI data files are dumps from NCBI's own mapviewer database backend.)

To produce the GFF files, download all the chr*sequence.gz files from the FTP directory above. While in that same directory, run the following example command (see also the help text printed when the script is run with no arguments): --locuslink [path to LL.out_hs.gz] chr*sequence.gz

This will unzip all the files on the fly and open an output file named chrom[$chrom]_ncbiannotation.gff for each one. The script reads the LocusLink records into an in-memory hash, then reads through the NCBI feature lines, looks up each 'locus' feature in the LocusLink hash for details, and prints the result to the proper GFF file. LL.out_hs.gz is accessible here at the time of writing:
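The flow described in this section can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the record fields (type, name, chrom, start, stop, strand), the "Chr" prefix, the 'NCBI' source column, the attribute layout, the helper names open_gz and feature_to_gff, and the sample LocusLink entry are all assumptions made for the example. Piping through gzip is likewise only one plausible way to unzip on the fly.

```perl
use strict;
use warnings;

# Open a gzipped file for reading by piping it through gzip
# (assumed approach; requires a gzip binary on the PATH).
sub open_gz {
    my ($path) = @_;
    open(my $fh, '-|', 'gzip', '-dc', $path)
        or die "Cannot unzip $path: $!";
    return $fh;
}

# Hypothetical in-memory LocusLink hash: locus name => description.
# The entry below is made up for illustration.
my %locuslink = ( GENE1 => 'made-up example gene' );

# Turn one already-parsed feature record into a nine-column GFF line.
# Only 'locus' features are looked up in the LocusLink hash.
sub feature_to_gff {
    my ($f, $ll) = @_;
    my $type = lc($f->{type});
    my @attr = (qq($type "$f->{name}"));
    if ($type eq 'locus') {
        my $desc = $ll->{ $f->{name} };
        push @attr, qq(Note "$desc") if defined $desc;
    }
    return join("\t",
        "Chr$f->{chrom}", 'NCBI', $type,
        $f->{start}, $f->{stop}, '.', $f->{strand}, '.',
        join(' ; ', @attr));
}

my $line = feature_to_gff(
    { type => 'LOCUS', name => 'GENE1', chrom => 1,
      start => 100, stop => 200, strand => '+' },
    \%locuslink,
);
print "$line\n";
```

Keeping the LocusLink data in a plain hash makes each 'locus' lookup a constant-time operation, which matters when streaming through whole-chromosome feature files.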

Note that several of the NCBI features are skipped during reformatting, either because their nature is not fully known at this time (TAG, GS_TRAN) or because their sheer volume keeps them from being accessible in Bio::DB::GFF at this time (EST similarities). You can change this by modifying the $SKIP variable to add or remove features, but if you enable additional features you will also have to add handling for them.
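As an illustration of how a skip list like this might work, here is a small sketch. It assumes $SKIP holds a single case-insensitive pattern of feature type names; the actual form of $SKIP in the script may differ. The type name EST used for EST similarities and the helper keep_feature are hypothetical.

```perl
use strict;
use warnings;

# Assumption: skipped feature types are kept as one alternation pattern.
# TAG and GS_TRAN come from the text above; EST is a placeholder name.
my $SKIP = qr/^(?:TAG|GS_TRAN|EST)$/i;

# Return true for feature types that should be reformatted into GFF.
sub keep_feature {
    my ($type) = @_;
    return $type !~ $SKIP;
}

my @kept = grep { keep_feature($_) } qw(GENE TAG LOCUS EST);
print "@kept\n";    # prints "GENE LOCUS"
```

Removing a name from the pattern lets that feature type through, at which point the conversion code needs a branch that knows how to handle it.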

To bulk-import the GFF-files into a Bio::DB::GFF database, use the utility provided with Bio::DB::GFF


Gudmundur Arni Thorisson <>

Copyright (c) 2002 Cold Spring Harbor Laboratory

       This code is free software; you can redistribute it
       and/or modify it under the same terms as Perl itself.