######## Bio::ToolBox revision history #############

	- Add new option of smart coverage to script bam2wig that
	smartly handles paired-end alignments with gaps (introns)
	- Add capability to collect from multiple datasets at once 
	for scripts get_binned_data and get_relative_data. Summary 
	files can now handle multiple datasets.
	- Allow specific number of up and down windows in 
	script get_relative_data.
	- Add option to provide list of specific feature IDs to 
	script get_features.
	- Write shift correlation region data from bam2wig.
	- Improve GTF export.
	- Add utility function to simplify dataset names, used in 
	data collection scripts. Strips path and everything after 
	first period from dataset file names. 
	- Improve sort function in manipulate_datasets by taking a 
	range of columns and sorting by their mean. Also, the addname 
	function will now overwrite a feature name if present.
	- Adjust logic for setting a file extension when none is 
	- Lots of additional minor fixes and changes
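
	The dataset name simplification described above (strip the path and 
	everything after the first period) can be sketched as follows. This 
	is a minimal illustration in Python, not the library's Perl code; 
	the function name simplify_dataset_name is hypothetical, and the 
	handling of names that begin with a period is an assumption.

```python
import os


def simplify_dataset_name(dataset: str) -> str:
    """Strip the directory path and everything after the first period.

    Hypothetical helper for illustration only; not the Bio::ToolBox API.
    """
    base = os.path.basename(dataset)  # drop any leading directory path
    # Keep only the text before the first period. If that leaves nothing
    # (a name starting with a period), fall back to the full basename.
    # The fallback is an assumption; the changelog does not cover it.
    return base.split(".", 1)[0] or base


if __name__ == "__main__":
    for name in ("/data/chip/sample1.rep2.bw", "input.bam", "plain"):
        print(name, "->", simplify_dataset_name(name))
```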

	- Optimize data2wig fast mode, about 3 times faster
	- Summary files now use a cleaned-up column name. Fix 
	bugs with summary file generation.
	- Bam2wig now properly reports alignment counts for each 
	strand when provided with multiple input bam files 
	(previously reported the same number). 
	- Fix bug where the Big adapter would crash when the search 
	coordinate was out of bounds, unlike UCSC, HTS, and Sam.
	- Improve GTF export with correct formatting and no longer 
	export transcript lines.
	- Improve GTF parsing where both transcripts and genes are 
	inferred but coordinates were not updated correctly.

	- Add function to read directly from bigWig files, and add 
	support for bigWig files to script manipulate_wig
	- Added options for filtering transcript Gencode or biotype 
	in script get_gene_regions.
	- Added option to discard low count features from script 
	- Add option to explicitly set number of columns of output 
	bed file in script data2bed
	- Update script get_feature_info to work with annotation files
	- Optimize data2wig to handle fast option in more scenarios
	- Coordinate string generation in manipulate_datasets takes 
	start values as is
	- Bug fixes in Bio::ToolBox, get_relative_data, 
	manipulate_datasets, more

	- Added support for Encode gappedPeak files. Also support for 
	gleaning file formats from bed track lines. This should make 
	additional file formats easier to support in the future.
	- Fix critical bug with skipping duplicate features from GTF 
	files, particularly from Ensembl where exons share the same exon ID.
	- Fix double-counting of stranded alignments in bam2wig script. 
	Also correctly set minimum paired-end size.
	- Fix bug to correctly count FPKM and TPM over length-adjusted 
	features in script get_datasets.
	- Fix bug with filtering transcripts in script get_features.
	- Reset and clarify behavior regarding stop codons when parsing 
	and exporting transcript features for various annotation formats.
	- Add single-letter option support to script get_gene_regions.

	- Added minimal Cram file support through the HTS adapter. 
	Currently only supports the reference fasta listed in the Cram 
	file header. 
	- Added fast paired-end option and paired-end start point options 
	to script bam2wig. Temporary files now written to a temporary 
	subdirectory, which can be specified. Extreme depth can now be 
	handled properly by using 32 bit integers instead of 16. Splice 
	segments can now be fractionally counted. 
	- Brought back and updated old script correlate_position_data to 
	identify positional shifts in nucleosome or ChIP signal peaks.
	- Added new SeqFeature methods to duplicate objects and delete 
	- Added option to format result numbers in script get_datasets.
	- Fix numerous small bugs in scripts data2gff, data2fasta, 
	get_intersecting_features, get_relative_data, and more

	- Added Bed parser with support for bed3-12, bedgraph, narrowPeak,
	and broadPeak files. Data collection files will now parse bed 
	files and write a table with ID and name only, instead of 
	appending data columns to the original file structure. Parsing 
	can be turned off if you prefer the old way.
	- Added support for writing bed12 transcript models to GeneTools 
	library and get_features script.
	- Bam file alignment counting now automatically excludes all 
	secondary, duplicate, and supplementary marked alignments.
	- Add new method to manipulate_datasets to name features, useful 
	for naming bed3 files.
	- Added TPM option to get_datasets script.
	- Fix bugs with parsing gff and gtf files at same time
	- Fix bugs with detecting null and/or empty values, especially when 
	converting data formats
	- other miscellaneous bug fixes
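
	For reference, the TPM (transcripts per million) measure mentioned 
	above is conventionally computed by dividing each count by the 
	feature length in kilobases and then scaling those rates so they sum 
	to one million. The sketch below shows that standard formula in 
	Python; the function name tpm is hypothetical, and this is not 
	necessarily how get_datasets implements it.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[int]) -> List[float]:
    """Standard TPM: length-normalize each count, then scale to one million.

    Illustrative only; the name and signature are assumptions.
    """
    # Reads per kilobase for every feature.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)  # nothing counted, so nothing to scale
    return [r / total * 1_000_000 for r in rates]


if __name__ == "__main__":
    print(tpm([10, 20, 30], [1000, 2000, 3000]))
```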

	- Added genomic sort and bgzip file compression support when 
	writing files for tabix compatibility with several scripts, 
	including those that write gene tables.
	- Tables generated from parsed gene annotation files (GFF, etc) 
	no longer write a Type column.
	- Simplified dataset column names in script get_datasets.
	- Fix transcript filtering bugs in script get_features.
	- Add helper methods for setting bam and big adapters to db_helper.
	- Optimized run time by loading db_helper only on demand. 
	- Fix numerous POD bugs.

	- Major update to using Bio::DB::Big module for bigWig and 
	bigBed file support. This should be much easier to install and 
	support than the old UCSC library adapter modules from GMOD. 
	The old UCSC adapter is still supported, however. Also 
	included a wrapper for working with BigWigSet databases, which 
	are too useful to deprecate. 
	- Use File::Which to always locate helper applications. 
	- Add support for pigz when writing gzip compressed files.
	- Add support for fetching genomic sequence from subfeatures
	- Add single letter command line options to all scripts. This 
	was implicitly supported before if the option was unique, 
	but now single letters (case sensitive) for common options are 
	explicit, and bundling is available. 
	- Add simple menu descriptions and option grouping to the Synopsis 
	section of every script POD documentation (about time!).
	- Add new script manipulate_wig.pl.
	- Add chromosome-specific normalization to bam2wig.

	- Fix bugs in bam2wig script when using negative shift values;
	thanks to Piotr for reporting. Also fix bug regarding forking 
	in coverage mode; thanks to Naoki for reporting.

	- Update config module to stop writing unnecessary config files.
	Config file will only be written when updating database or 
	application paths. Removed outdated validation, exclude tags, feature 
	classes, and default window values used by old db_helper methods.
	- Complete rewrite of get_features script to handle annotation 
	files such as GFF3/GTF/UCSC formats in addition to SeqFeature::Lite 
	databases. Includes additional feature filters based on tags.
	- Add additional transcript filter methods to GeneTools library, 
	including GENCODE basic tags and transcript_biotype.
	- Update Data parse_table API, now allows for chromosome skip regex, 
	control simplify option, and explicitly search for mRNAs. 
	- Allow SeqFeature transcript collapsing and length determination 
	to work with features from a database. 
	- Tolerate weird transcript types when collecting subfeatures in 
	various GeneTools functions.
	- Removed unnecessary primary_tag gene checks when collecting scores.
	- Record extra ensemblSource data as transcript biotype when 
	parsing UCSC files
	- Add chromosome skip regex to db_helper and big_helper methods
	- Add no header options to data convertor scripts
	- Long overdue update of POD and Readme.

	- Significantly streamlined GTF and GFF3 parsing to improve 
	loading times. By default, no subfeatures are parsed and must 
	be explicitly turned on as needed.
	- Improved parsing gene tables (GTF, UCSC, etc) as an input 
	file to scripts. Now supports defining both the feature and 
	subfeature types to process. One more reason not to use an 
	annotation database.
	- Fixed critical bug with collecting data across subfeatures, 
	e.g. get_binned_data. Subfeatures were not being properly 
	parsed and coordinates weren't converted to relative positions 
	correctly. Thanks to Zhizhou for reporting.
	- New methods in Data objects for collapsing gene transcripts 
	and calculating transcript lengths.
	- Fix bug with paired-end center span recording in bam2wig.
	Thanks to Yixuan for reporting.
	- Summary files now report bin midpoints based on 1000 bp length.
	- Script pull_features now allows multiple groups in a list file, 
	and can write only summary files if desired.
	- Bug fix in collecting sequence. Thanks to Patrick.
	- Add support for collecting cds Start and Stop in script 
	- Numerous small bug fixes

	- Added binning option to wig files in script bam2wig. Default 
	is to write wig files in 10 bp bins with significant decreases 
	in runtime and memory usage while not appreciably diminishing 
	- Add support to calculate shift values without doing wig 
	conversion in script bam2wig
	- Add support for mRNA transcript subfeatures, including CDS, 
	5 prime UTR, and 3 prime UTRs, in data collection scripts 
	get_datasets and get_binned_data.
	- Add new UTR methods to GeneTools library
	- Changed behavior of reporting common and alternate exons and 
	introns in GeneTools. Genes with single transcripts now report 
	all exons and introns as common for simplicity.
	- Add option to search at the 5 prime, middle, or 3 prime end 
	of features in script get_intersecting_features
	- Fix bug in specifying which database feature to collect 
	regions from in script get_gene_regions
	- Fix bug where tables with coordinates could not be used in 
	database lookups in script get_feature_info

	- Changed how bam alignments are recorded for indexed position 
	data hashes. Alignments are now recorded at their 5' position 
	instead of midpoint, which wreaked havoc with large gaps and pairs.
	- Reporting indexed bam alignment names (ncount method) now returns 
	the actual names rather than count. The db_helper calculate_score 
	method can properly count these. This avoids double-counting across 
	exons, etc. 
	- Fix major bug in script bam2wig that prevented paired-end 
	alignments from working. Thanks to Mengyao for pointing this out.
	- Add additional checks when loading malformed files that have a 
	missing column header or extraneous hidden columns (extra tabs)
	- Add format checks for numeric columns in some file formats
	- Miscellaneous code improvements here and there
	- Major upgrade of the data collection libraries to simplify data 
	collection and improve efficiency. The value type is no longer 
	specified, being rolled into the specified collection method. Low 
	level optimizations have been added to improve speed. Speed 
	increases of 30% to over 300% have been measured, depending on 
	the collection method and adapter.
	- Rewrite of data collection scripts to work with the improved libraries
	- Added support for the modern Bio::DB::HTS module for Bam files, 
	while keeping support for the older Bio::DB::Sam module. 
	- Added more agnostic support for multiple different fasta indexing 
	- Script bam2wig is completely rewritten to handle multiple bam 
	files for merging, independent bam scaling, improved alignment 
	filtering, customizable output, improved cross-strand correlation 
	for peak shifting, improved speed and memory management, and lots 
	more features.
	- Updated script data2fasta
	- Numerous other features and changes too small to mention
	- Relaxed requirements for external modules, namely BioPerl, so 
	that scripts and functions that don't absolutely require them can 
	still be used. All database functions will require it though.

	- Fix endless loop bug with opening files with metadata but no data,
	e.g. empty VCF files
	- Revert support for opening bedGraphToBigWig file handles

	- Added new function to GeneTools for exporting to GTF format.
	- Added new function to filter transcript subfeatures in a gene 
	SeqFeature object by available Ensembl Transcript Support Level tags.
	- Fixed critical bug with collapsing multiple transcripts in 
	GeneTools function that resulted in too many overlapping exons.
	- Fixed bug in exporting non-coding gene models to UCSC refFlat format.
	- Other minor bug fixes.

	- Fix bug with unique option in script get_gene_regions where 
	too many regions were being discarded. Thanks to Mengyao.
	- Fix bug with generating bigWig files in script bam2wig, and 
	restore option to prefer bedGraphToBigWig if so desired
	- Add option to ignore extraneous attribute tags when parsing
	GFF and GTF files to reduce memory (simplify). Enable this 
	option by default when parsing annotation files when loading a 
	table in Bio::ToolBox::Data. 

	- Changed bigWig convertor method to use primarily the wigToBigWig 
	utility for simplicity
	- Introduced new method to open a wigToBigWig utility filehandle to 
	"print" wig files directly to a bigWig
	- Updated bam2wig and data2wig scripts to write directly to the 
	bigWig utility and skip writing temporary intermediate wig file
	- Added functionality to bam2wig to record stranded shifted counts
	- Fixed a critical bug in script get_gene_regions where transcripts 
	weren't being filtered
	- Improved file format taste testing to avoid GFF false positives
	- Improved UCSC gene table parser behavior

	- Added no header option when loading text files missing a 
	column header row. Updated script manipulate_datasets to take 
	advantage of the feature.
	- Added option to combine multiple score columns into a single score 
	when converting a file to a wig file in script data2wig
	- Added option to split gff or vcf data files by an attribute tag 
	in script split_data_file
	- Improve handling of writing vcf files
	- Fix critical errors with calculating cdsStart and cdsEnd in the 
	GeneTools library
	- Fix bugs in gff parser to continue when encountering errors in 
	parsing and interpret transcript biotype gtf attributes
	- Fix bug in properly handling start coordinates in script data2wig

	- Major update introduces new SeqFeature object Bio::ToolBox::SeqFeature 
	that is a little faster and more compact than equivalent BioPerl objects. 
	This is the default object used in gene table parsers.
	- New Module Bio::ToolBox::GeneTools for working with SeqFeature objects 
	representing traditional nested feature gene, transcript, exon models.
	The script get_gene_regions now uses this module, as do other scripts.
	- Expunged many scripts that are no longer considered part of the primary 
	mission of the BioToolBox distribution. These are now available in a 
	separate repository located at https://github.com/tjparnell/HCI-Scripts.
	- Bio::ToolBox::Data objects can now parse all gene tables into memory 
	and store the features in the object. This allows gene tables to be 
	used without requiring a database to be setup.
	- Added a file tasting method to determine whether a file looks like a 
	specific file format, e.g. gff, UCSC gene table, etc.
	- Added numerous little methods and method aliases here and there to 
	improve functionality
	- Added attribute rewrite functions for both GFF and VCF files
	- Improved file format testing
	- Numerous little optimizations in loading files

v1.36 (git 44b9dea)
	- added new option to script get_relative_data to allow user to specify 
	what feature types to avoid
	- fix bugs in scripts manipulate_datasets when exporting log2 treeview 
	files and defining x axes in graph_profile
	- fix annoying bug where manipulate_datasets will not re-show column list
	- improve data file summarization
	- some library method optimizations

v1.35 (git e489d52)
	- Add new options for setting dimensions and linear regression lines in 
	script graph_data.
	- Restored unique option in script data2gff.
	- New convenience methods for Feature objects.
	- Fixed bug with smoothing interpolation in get_relative_data
	- Numerous other bug fixes regarding bed files, column names, 
	file support, warnings.

v1.34 (git 5d4803c)
	- Changed the behavior of automatically converting interbase coordinates 
	to base coordinates upon loading a file, and converting back as necessary 
	when writing. This had the side effect of effectively changing coordinates 
	when writing out nonstandard text files. Conversion is now done on the fly 
	when using the start method of row Features. Start interbase coordinates 
	are now recognized by appending a 0 to the column name. Output files should 
	now look like the input files.
	- Strand values are not automatically converted upon loading; They are 
	converted as necessary on the fly using the row Feature strand method.
	- Null values are not automatically converted to internal '.' null values.
	They are converted as necessary using the row Feature value method to 
	maintain backward compatibility.
	- Scripts data2bed and data2wig go back to using a Stream input to avoid 
	high memory usage. 
	- Script data2wig now has a fast option to skip lots of checks on values 
	and intervals. This speeds up conversion considerably at the risk of 
	making improper wig files if the source file has issues.
	- Script join_data_file is considerably faster by simply concatenating 
	data lines without processing or checking.
	- Script bam2wig has new recording option, mid extend, to record the 
	middle portion of alignments or proper paired-end alignments. Credit to 
	Ohad for recommending.
	- Add explicit interbase support to scripts data2gff and data2fasta.
	- Fix critical bug where extensions were not scored properly for coordinate 
	features in script get_binned_data. Thanks to Mengyao.
	- Fix bam2wig alignment illustrations in POD. Thanks to Ohad.
	- Bug fixes regarding bed file integrity checking that were introduced in 
	the previous release.
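
	The on-the-fly interbase conversion described in this release can be 
	sketched as below: the stored value is left untouched, and the start 
	accessor adds 1 only when the column name carries the 0 suffix. This 
	is a Python illustration; the Row class and the exact column name 
	"Start0" are assumptions, not the Bio::ToolBox API.

```python
from typing import List, Optional


class Row:
    """Hypothetical stand-in for a row Feature; illustration only."""

    def __init__(self, columns: List[str], values: List[str]) -> None:
        self.data = dict(zip(columns, values))

    def start(self) -> Optional[int]:
        """Return a 1-based start, converting interbase values on the fly."""
        for name, value in self.data.items():
            lowered = name.lower()
            if lowered == "start0":  # interbase (0-based) start column
                return int(value) + 1
            if lowered == "start":  # already base (1-based)
                return int(value)
        return None  # no start column present


if __name__ == "__main__":
    row = Row(["Chromosome", "Start0", "End"], ["chr1", "99", "200"])
    print(row.start(), row.data["Start0"])
```

	Because the stored value is never rewritten, a file written back out 
	keeps the coordinates it was loaded with.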

v.1.33 (git ba1a70e)
	- Removed legacy_helper module. All scripts now properly updated to  
	use Bio::ToolBox::Data and related objects. This was the last step of 
	a long process to modernize all of the scripts to use the new libraries.
	- All data collection modules are now chromosome naming-scheme agnostic, 
	meaning that "chr1" and "1" for chromosome can be used equally, regardless 
	of what the annotation or big data file uses.
	- Minimal VCF file support is added, including the ability to parse INFO 
	and SAMPLE attributes, and verify some file format integrity. 
	- Significantly improve GTF file parsing. 
	- Improve file format verification, including printing error messages. 
	This should alleviate cryptic reasons for automatic file extension changes.
	- Tons of bug fixes. See GitHub for a full change log. 
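
	One way to make lookups chromosome naming-scheme agnostic, as 
	described above, is to try the requested name first and then its 
	alternate with the "chr" prefix added or removed. The Python sketch 
	below illustrates the idea; both function names are hypothetical and 
	the library's actual approach may differ.

```python
from typing import Iterable, Optional


def alternate_chromosome_name(name: str) -> str:
    """Toggle the 'chr' prefix: 'chr1' becomes '1', and '1' becomes 'chr1'."""
    return name[3:] if name.startswith("chr") else "chr" + name


def resolve_chromosome(name: str, available: Iterable[str]) -> Optional[str]:
    """Return the name as used by the data source, or None if absent.

    Hypothetical helper; not part of Bio::ToolBox.
    """
    known = set(available)
    if name in known:
        return name
    alternate = alternate_chromosome_name(name)
    return alternate if alternate in known else None


if __name__ == "__main__":
    print(resolve_chromosome("1", ["chr1", "chr2"]))
    print(resolve_chromosome("chr2", ["1", "2"]))
```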

v.1.32 (git 67749a7)
	- Fix bug with adding a new column to Data object, particularly 
	when selected from a database. 
	- Fix bugs related to adding, deleting, or modifying columns for 
	a specific file format, such as BED or GFF
	- Introduce additional Data structure verification tests, including 
	proper strand information, to verify correct file formatting, such 
	as BED and GFF
	- Fix bugs when writing data files that incorrectly maintained 
	file extensions for a given format even when the structure was no 
	longer valid.
	- Add support for .bigwig and .bigbed file extensions.
	- Fix bug with opening fai fasta index and forked databases in script 

v.1.31 (git 9a4e122)
	- Major addition of parsers for GFF and UCSC gene table formats.
	This replaces the old gff3_parser and now supports GFF, GTF, and GFF3.
	Also moved UCSC gene table parsing out of ucsc_table2gff3 and into 
	own parser module, available for all. This supports refFlat, genePred,
	and knownGene tables. Tests for these parsers are included.
	- Updated script get_gene_regions to use parsers.
	- Greatly optimized bedGraph writing from script bam2wig to reduce 
	memory usage. Also ensure that bedGraph is written over entire chromosome.
	- Fix bugs when sorting and performing math with null, NA, and inf 
	values, especially with script manipulate_datasets.
	- Fix bug where coverage shifts by 1 bp after each write to fixedStep wig 
	in script bam2wig. Thanks to Magda for reporting.

v.1.30 (git 9ab9ff4)
	- Major upgrade of the Bio::ToolBox::Data library internals. 
	Old data_helper and file_helper modules are gone, and a 
	legacy_helper module added for those programs that still haven't 
	been upgraded yet. Numerous improvements and bug fixes to Data and 
	Stream objects, structure verification, standard file format metadata, 
	file writing, and more. Several new methods have been added too.
	- Added support for ncount, or name count, of bam files. By 
	counting unique alignment names, we can avoid double-counting 
	of reads in adjacent search areas. Also works for counting 
	paired-end reads. Supported by get_datasets script.
	- Updated pull_features script to use new Data objects.
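
	The idea behind ncount can be illustrated with a short Python 
	sketch: alignment names are pooled across adjacent search areas and 
	counted once each, so a read spanning two regions (or both mates of 
	a pair, which share a name) is not double-counted. The function name 
	and input layout here are assumptions for illustration.

```python
from typing import Iterable, Set


def ncount(regions: Iterable[Iterable[str]]) -> int:
    """Count unique alignment names across several search regions.

    Illustration only; not the db_helper implementation.
    """
    seen: Set[str] = set()
    for names in regions:
        seen.update(names)  # duplicates across regions collapse here
    return len(seen)


if __name__ == "__main__":
    exon1 = ["read1", "read2", "read3"]
    exon2 = ["read3", "read4"]  # read3 spans both exons
    print("naive sum:", len(exon1) + len(exon2))
    print("ncount:", ncount([exon1, exon2]))
```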

v.1.26 (git 21c800b)
	- Removed Extras folder and outdated library functions. These 
	are available as a separate GitHub project, biotoolbox-extra.
	- Improved GFF3 parser to handle orphans more gracefully, and 
	simplify parsing by adding a next_top_feature function. It is 
	moved out of the db_helper hierarchy, where it never really belonged.
	- Changed license to exclusively Artistic License 2.0.
	- Fixed bug when using input files with coordinate information in 
	script get_datasets. Thanks to Mengyao for reporting.
	- Fixed bug when opening a new Data::Stream not based on a file or 
	data list.

v.1.25 (svn 955)
	- Added a new option to manually specify the extension length 
	and allow new ways to record read coverage in the script bam2wig.pl.
	A text graphic is included in the documentation to illustrate 
	different methods.
	- Broke out database and fasta functionality from 
	Bio::ToolBox::db_helper into a separate sub module, which should 
	limit the number of modules loaded at compile time.
	- Allow main Data feature_type to be specified by command line 
	option, useful when your input file has names of database features 
	but not a type column, for scripts get_feature_info.pl, 
	get_datasets.pl, get_binned_data.pl, and get_relative_data.pl. 
	- Added BED and GFF string export to Bio::ToolBox::Data::Feature 
	- Changed library version reporting for default new Data files.
	- Fix bugs with setting and removing AUTO metadata properly 
	when opening and writing Data files.
	- Fix bugs regarding deleting metadata, which had a side effect 
	of adding unwanted metadata to files written by manipulate_datasets.
	- Added more name possibilities when looking for possible name 
	- Fix bug where a database may sometimes not be opened properly 
	after forking into children in data collection scripts.
	- Fix bug that prevented statistics from being recovered from 
	child processes in script graph_data.pl.

v.1.24001 (svn 940)
	- Updated tests to catch possible sources of error, including 
	recent UCSC BigFile libraries that power Bio::DB::BigWig 
	adaptors, DB_File required for GFF3 loading into memory database,
	and path verification in Data metadata.

v.1.24 (svn 936)
	- Added new module Bio::ToolBox::Data::Stream for working with 
	data files line by line instead of loading them into memory. 
	Moved lots of shared methods into Bio::ToolBox::Data::common.
	- Added explicit file support for UCSC-style refSeq and genePred
	file formats, as well as Encode narrowPeak and broadPeak files.
	- Added new value type, pcount, in data collection scripts and 
	library score methods. Features, such as Bam alignments, must 
	be entirely contained within the search region, and not just 
	overlapping as with the count value.
	- Added improved method for reloading forked children files 
	back into Data objects without having to call external 
	join_data_file script.
	- Improved forking in data collection scripts, including a 
	delay in the parent after forking to prevent race conditions 
	on fast servers with high fork numbers.
	- Removed all vanity names to data_helper and file_helper 
	subroutines. All scripts updated to reflect changes.
	- Improved identification of overlapping features when avoiding 
	neighboring features when collecting relative data.
	- Optimized Bam score data collection methods.
	- Disabling bins when writing coverage in bam2wig.
	- Fix bugs with writing CDT files in manipulate_datasets.
	- Improved ToolBox::Data::Feature methods to handle internal nulls.
	- Improved retrieval of sequence list, particularly for 
	SeqFeature::Store databases.
	- Updated and improved library testing for Data and Stream objects 
	and database interaction.
	- Fixed bug where negative coordinates would not be accepted 
	when collecting relative coordinates.
	- Fixed bug where Bam and BigBed databases may not be opened 
	properly in some instances, such as precounting features for RPM 
	- Fix bug where in some cases all database features could be 
	returned with the method get_feature().
	- Fix bug so that the type option is now properly implemented in script 
	- Fix bug limiting to chromosome length in script 
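
	The difference between the count and pcount value types described in 
	this release can be sketched as two predicates: count accepts any 
	feature overlapping the search region, while pcount requires the 
	feature to lie entirely within it. This Python sketch uses 1-based 
	inclusive (start, end) tuples, which is an assumption for 
	illustration.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 1-based inclusive (start, end); an assumption


def count(features: Iterable[Interval], start: int, end: int) -> int:
    """Count features that overlap the search region at all."""
    return sum(1 for s, e in features if s <= end and e >= start)


def pcount(features: Iterable[Interval], start: int, end: int) -> int:
    """Count only features entirely contained within the search region."""
    return sum(1 for s, e in features if s >= start and e <= end)


if __name__ == "__main__":
    alignments = [(90, 110), (100, 150), (180, 220)]
    print("count:", count(alignments, 100, 200))
    print("pcount:", pcount(alignments, 100, 200))
```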

v.1.23 (svn 915)
	- Improved script get_gene_regions to recognize non_coding exons; 
	prompt for region, feature, and RNA type; specify for more than 
	one feature type at a time; and avoid mixing RNA sub types from 
	the same gene. Thanks to Mengyao for troubleshooting.
	- Fixed bugs pertaining to collecting relative windows that may 
	extend beyond the beginning of the chromosome. Thanks to Nate 
	for reporting.
	- Fixed bugs sorting by genomic coordinate, especially when 
	only Position is provided and not Start.
	- Made Bio::ToolBox::Features return smart coordinates only, no
	funny values.

v.1.22 (svn 906)
	- Added new export options of alternate, common, or all exons 
	to script get_gene_regions.
	- Changed behavior of Bio::ToolBox::Data::Feature such that 
	database features must now be explicitly retrieved rather 
	than automatically retrieved, which could lead to runaway 
	execution if it could not be found.
	- Improved how name columns are recognized and used when 
	retrieving database features.
	- Improved writing of strand information in proper format 
	for Bed and GFF files.
	- Fixed numerous bugs that prevented proper execution in 
	several scripts, including manipulate_datasets, get_feature_info, 
	graphing scripts. Thanks to Mengyao and Yixuan for reporting.
	- Standardize data file loading message among several scripts.

v.1.21 (svn 896)
	- Fixed critical bug that prevented upstream windows from 
	collecting data in script get_relative_data.
	- Fixed critical bug that prevented some bigBed files from 
	being opened.
	- Fixed critical bugs that prevented scripts data2fasta and 
	get_intersecting_features from working properly.
	- Fixed bugs where strand may be inappropriately assigned or 
	sometimes ignored when collecting regional positioned scores.
	- Fix minor bugs in output of scripts ucsc_table2gff3 and 
	- Include checks in data collection scripts to exit gracefully if 
	datasets can't be verified.
	- Interactive list of values to keep or toss is now sorted 
	alphanumerically in script manipulate_datasets.

v.1.20 (svn 884)
	- Refactored db_helper so that all database adaptors are loaded 
	dynamically only as needed during runtime, rather than loading 
	everything all at once regardless of need. This results in 
	faster load times and reduced memory footprint.
	- Added new methods to Bio::ToolBox::Data objects, including 
	sorting, genomic sorting, and feature_type.
	- Split out metadata-related methods and Feature objects as 
	separate modules in Bio::ToolBox::Data. Feature objects will 
	now automatically retrieve represented database features as 
	necessary to collect attributes.
	- Rewrote many, many scripts to use Bio::ToolBox::Data objects.
	Simplify, unify, and improve all Data functions.
	- Moved many specialized, outdated, or esoteric scripts to an 
	optional extras folder that will no longer be distributed via 
	CPAN but will be available through SVN.
	- Added new functions to script manipulate_datasets.pl, including 
	processing rows with specific values, split and concatenate columns, 
	view table contents, and add additional manipulations prior to 
	writing CDT files. Also, several old functions were removed.
	- Added support for converting refFlat and simple genePred 
	file formats to GFF3 in script ucsc_table2gff3.pl.
	- Add better warnings for reading files with DOS or MAC line endings.
	- Removed file extension manipulation in join_data_file script.
	- Replaced fatal errors with warnings in merge_datasets script.
	- Fix critical error where midpoints were not calculated correctly 
	for features in script get_relative_data.pl, preventing data 
	collection around a feature midpoint.
	- Fix bug to properly collect extended bins at the 3' end and avoid 
	undefined start errors in average_gene.pl; also write a summary 
	file when executing with forks.
	- Fix bugs with collecting features from a database.
	- Fix bug with renaming M to UCSC-style chrMT in 
	- Numerous other small fixes scattered about.

v.1.19 (svn 843)
	- Implemented subfeature sharing and multiple parentage when 
	exporting UCSC tables as GFF3. For example, exons can now be 
	shared between multiple transcripts of the same gene. This 
	leads to considerable reduction in file size at the expense 
	of increased complexity. Naming of subfeatures is now optional.
	- Renamed script print_feature_types.pl to simply db_types.pl.
	Known databases in the configuration file can now be 
	interactively chosen from a list.
	- Added support for multiple parentage in the gff3 parser 
	library and script gff3_to_ucsc_table.pl.
	- Added a verbose option and improved path detection in script 
	- Script filter_bam.pl now works on unsorted and non-indexed 
	bam files, making it more useful than before.
	- Bam files opened using db_helper::bam may now be sorted as 
	necessary before indexing.
	- Increase default buffer value in script bam2wig.pl.
	- Fixed bug where firstExon features were misnamed as lastExon 
	in script get_gene_regions.pl.

v.1.18 (svn 826)
	- Fixed critical bug when calculating RPM and RPKM values in 
	data collection scripts. This is a long-standing bug that 
	produced erroneous values. The bug does not affect bam2wig.pl 
	rpm reporting.
	- Improved methods for collecting from subfeatures such as 
	exons of genes or transcripts in script get_datasets.pl.
	- Added option to specify which UCSC table(s) to use when 
	setting up a new database in script db_setup.pl.
	- Added new options to extend and concatenate sequences in 
	script data2fasta.pl.
	- Added ability to use the samtools fasta index when available 
	in scripts data2fasta.pl and CpG_calculator.pl. This index is 
	about 10-20% faster than the BioPerl fasta index.
	- Fixed bug to avoid illegal characters in filenames when 
	splitting data files, and added an option to use a custom 
	file prefix in script split_data_file.pl.
	- Fixed bug where ensembl gene names may not be properly 
	recorded in the output GFF3 file in script ucsc_table2gff3.pl. 

v.1.17 (svn 808)
	- Added six new method functions to Bio::ToolBox::Data for 
	working with columns and metadata.
	- Updated script correlate_position_data.pl with parallel 
	execution plus an ANOVA statistical analysis between data.
	- Fixed bug where the --bwapp option was not being used in 
	script bam2wig.pl. Thanks to Michael D. for reporting.
	- Removed extraneous BioPerl warnings when opening a fasta file 
	or directory fails, and replaced with some suggestions.
	- Fixed bug with RPM option that led to warnings in db_helper.
	- Simplified warning for duplicate lookup values in script 
	- Reorganized the POD summary and provided examples of usage 
	for main data collection scripts, plus provide default values 
	in POD summaries for a number of scripts. Thanks to Christian 
	for the recommendation.

v.1.16 (svn 794)
	- Fixed critical bug that prevented the forward strand from 
	being written when generating stranded coverage in script 
	bam2wig.pl. Thanks to Michael D. for reporting.
	- Fixed critical bug that prevented the script get_bam_seq_stats.pl 
	from compiling properly.
	- Fixed bug that prevented filtering more than one length at 
	a time in script filter_bam.pl. Thanks to Yixuan for reporting.
	- Fixed again the handling of a negative or zero start value 
	passed to data collection methods in db_helper; it now issues 
	a warning and resets the value to 1.

v.1.15 (svn 786)
	- Added Bio::ToolBox::Data method to delete column metadata
	and improved adding new metadata.
	- Added back cached database objects for data collection, 
	which brings back speed lost in the previous version. 
	- Original strand format is now maintained when rewriting data 
	files. For example, + and - from Bed and GFF files as opposed 
	to 1 and -1.
	- Passing a negative or zero start value to data collection 
	methods in db_helper now issues a friendly warning and resets 
	the value to 1.
	- Opening a BigWigSet directory of bigWig files can now infer 
	strand based on filename and set the metadata appropriately. 
	For example, files whose basename ends in f, forward, or plus 
	will be interpreted as strand 1. 
	- Script gff3_to_ucsc_table.pl was significantly updated to 
	address critical flaws and change the output format to refFlat.
	- Script manipulate_datasets.pl no longer writes metadata for 
	simple file formats when using certain functions that do not 
	change data content.
	- Script bam2wig.pl now includes a --flip strand option.
	- Fixed errors and made improvements regarding fonts and sizes 
	in scripts graph_data.pl and graph_profile.pl.
	- Various other small bug fixes and checks for optional Perl 
	module installs.
	- Updated shebang lines to use universal /usr/bin/perl
	- Updated script POD documentation to make common options more 
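	Note on strand formats: the two conventions mentioned above 
	(+ and - in Bed and GFF files, 1 and -1 otherwise) can be related 
	as in the following Python sketch. It is illustrative only and not 
	the Bio::ToolBox code; the helper names are hypothetical, and 
	mapping "." to 0 for unstranded features is an assumption.

```python
# Hypothetical lookup tables relating the two strand conventions.
SYMBOL_TO_NUMERIC = {"+": 1, "-": -1, ".": 0}  # "." -> 0 is an assumption
NUMERIC_TO_SYMBOL = {value: key for key, value in SYMBOL_TO_NUMERIC.items()}


def to_numeric(symbol):
    """Convert a Bed/GFF strand symbol to a numeric strand."""
    return SYMBOL_TO_NUMERIC[symbol]


def to_symbol(number):
    """Convert a numeric strand back to its Bed/GFF symbol."""
    return NUMERIC_TO_SYMBOL[number]


print(to_numeric("+"))  # 1
print(to_symbol(-1))    # -
```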

v.1.14.1 (svn 763)
	- Changed the method of caching database objects introduced 
	in version 1.14, which wreaked havoc with forked child 
	processes. All database connections are cached by default 
	and returned if subsequently re-opened, unless explicitly 
	told to not use the cached connection. Multiple scripts 
	were updated to reflect the new connection caching. 
	- Bio::ToolBox::Data now automatically re-clones existing 
	database connections if you splice the data table.
	- Bam file index files are now explicitly generated prior 
	to opening the bam file database connection. Additionally, 
	existing .bai files are copied as .bam.bai in preference 
	to creating a new .bam.bai file. Thanks to Yixuan for 
	- Fixed POD errors in script bar2wig.pl and updated method 
	for finding the java executable file. Thanks to Guillaume 
	for reporting.
	- Removed debugging warn statements in script 
	- Added POD documentation to Bio::ToolBox::db_helper::useq.

v.1.14 (svn 737)
	- Massive reorganization of the entire package into a proper 
	Perl module distribution that is installed using standard 
	Module::Build methods. This will install the libraries into 
	site-specific Perl library directories as Bio::ToolBox::*. 
	Scripts will install into a standard bin directory. All 
	scripts have been updated to reflect these changes.
	- Added new module Bio::ToolBox::Data, which provides an easy 
	object-oriented interface to working with data files and the 
	rest of the Bio::ToolBox functions. 
	- Added new script db_setup.pl to ease generating an annotation 
	database with UCSC data
	- Added Build tests for all major library functions, including 
	score collections from all binary database adaptors.
	- Added capability to properly collect value types, including 
	score, count, and length, from useq and wiggle database adaptors
	- Loosened restriction for counting Bam alignments where the 
	midpoint had to be within the query region; now any alignment 
	that overlaps the region will be counted.
	- Reworked the interpolation algorithm to interpolate as many 
	datapoints as possible in script get_relative_data.pl.
	- Removed cryptic error messages when opening databases, and 
	added database handle caching to avoid repeated openings
	- Newly generated feature lists no longer append all aliases to 
	the feature name
	- Added additional attributes to the list of available ones to 
	retrieve from the database in script get_feature_info.pl. Also 
	added a --type command line option to set a feature type to 
	named features.
	- Improved data table checking to include a count of columns 
	for every row.
	- Added max_count option to script bam2wig.pl to control for 
	high Bam coverage
	- Fixed bug where the summary file was not created for 
	script get_relative_data.pl

v.1.13 (svn 691)
	- Updated to include native support for USeq archive files 
	with data collection scripts. USeq files may be used in 
	the same manner as BigWig, BigBed, or Bam files for data 
	collection. USeq files may be generated using tools from 
	the USeq package (useq.sourceforge.net). The 
	Bio::DB::USeq adaptor is available via CPAN. 
	- Added new script filter_bam.pl, which can filter alignments 
	based on various criteria and write a new Bam file. Filters 
	are one or more boolean tests, including attributes, scores, 
	lengths, sequence, etc.
	- Added new script get_bam_seq_stats.pl, which collects 
	information about the read sequences themselves and summarizes 
	the sequence composition and nucleotide frequencies, suitable 
	for generating sequence logos.
	- Updated script manipulate_datasets.pl to allow any integer 
	to be used when formatting decimal values.
	- Restored ability to write a new data file without collecting 
	data from script get_datasets.pl.
	- Changed the log conversion step to avoid having to increase 
	read count by 1 to avoid log of 0 errors in script bam2wig.pl.
	- Use the command line --log argument in preference over 
	metadata in script manipulate_datasets.pl.
	- Method sum now writes 0 instead of null in script 
	- Fixed issue where joining data files may not maintain gzip 
	status. This caused problems when combining files from forked 
	child processes.
	- Fixed bug where a provided, indexed data source file 
	(e.g. BigWig) could not be used as a database in script 

v.1.12.6 (svn 680)
	- Updated the script novo_wrapper.pl to use Parallel::ForkManager 
	instead of GNU Parallel. This should make it more stable, 
	particularly under nohup.
	- Consolidated the standard out results when functions were 
	applied to multiple columns in script manipulate_datasets.pl. 
	This will make the script much less chatty.
	- Fixed bug with naming temporary files from forked child processes.
	- Fixed bugs with the generation of summary files.
	- Fixed bug with the automatic identification of the X axis in 
	script graph_profile.pl.
	- Fixed bug where features not found in a database could crash 
	the script get_feature_info.pl.

v.1.12.5 (svn 667)
	- Improved the shift value determination to make it more robust 
	against outliers in script bam2wig.pl. Additionally, the model 
	data that is written is now centered over the shift peak to 
	make evaluations more interpretable. 
	- Fixed a bug where 0 or negative coordinates may be written 
	to varStep wig files in script bam2wig.pl.

v.1.12.4 (svn 662)
	- Improved the efficiency of scanning for high coverage regions 
	and calculating 3 prime shift values in script bam2wig.pl. Each 
	reference sequence is now scanned in parallel. Also added a new 
	option to write the shift profile model and correlation data. 
	The efficiency of writing bedGraph files was improved, giving 
	up to a 2X increase in performance. The default maximum duplicate 
	value is now unlimited. Warnings about coverage beyond the ends 
	of chromosomes are now silenced unless verbose is turned on.
	- The script graph_data.pl can now execute in parallel to improve 
	efficiency when a list of datasets is provided in advance. A 
	list may now be provided in conjunction with the --all option. 
	- Improved recognition of the X-axis column in script 
	- Fixed critical error when writing extended position bedGraph 
	files from script bam2wig.pl where reverse reads were not 
	extended appropriately in the 3 prime direction.

v.1.12.3 (svn 651)
	- Added user options to control the size of the memory buffer 
	when writing bedGraph files and the disk write frequency in 
	script bam2wig.pl.
	- Added option to control the output order of the features from 
	script pull_features.pl. The order may match either the input 
	list or input data file. Also improved automatic column 
	identification and avoided empty output files.
	- Script data2wig.pl will now write bedGraph files.
	- Fixed bug leading to excessive memory usage when writing a 
	fixedStep wig file from script bam2wig.pl. Thanks to Jeff for 
	- Fixed bug where strand values for GFF or BED files may not be 
	written correctly.
	- Fixed bug leading to errors loading input files with comment or 
	empty lines in the middle of data lines.
	- Fixed bug to avoid log of 0 errors in script bam2wig.pl.

v.1.12.2 (svn 642)
	- Scripts find_enriched_regions.pl and CpG_calculator.pl are now 
	multi-threaded. The find_enriched_regions.pl script also has additional 
	optimizations to reduce memory usage.
	- The script merge_datasets.pl now has the option to use a coordinate 
	string as a unique identifier when looking up features. This is 
	particularly helpful with BED, GFF, and other files with genomic 
	coordinates that do not have unique name identifiers.
	- A coordinate string in the format chromo:start-stop may now be 
	generated from coordinate values in data files using a new function 
	in the script manipulate_datasets.pl.
	- Fixed a bug regarding changing file extensions in script 
	join_data_file.pl, which gave odd output file names with scripts that 
	executed in parallel.
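	Note on coordinate strings: the chromo:start-stop format described 
	above can be built and parsed as in the following Python sketch. 
	It is illustrative only and not the Bio::ToolBox code; the 
	function names and the example chromosome name are hypothetical.

```python
def coordinate_string(chromo, start, stop):
    """Build a chromo:start-stop identifier from coordinate values."""
    return f"{chromo}:{start}-{stop}"


def parse_coordinate_string(text):
    """Split a chromo:start-stop identifier back into its parts."""
    # Split on the last colon so chromosome names containing ":" survive.
    chromo, span = text.rsplit(":", 1)
    start, stop = span.split("-", 1)
    return chromo, int(start), int(stop)


key = coordinate_string("chrI", 1000, 2000)
print(key)                           # chrI:1000-2000
print(parse_coordinate_string(key))  # ('chrI', 1000, 2000)
```

	Such a string can serve as a lookup key for files whose features 
	have coordinates but no unique names.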

v.1.12.1 (svn 635)
	- Fixed bugs where gzip status and file extensions may be inappropriately 
	inherited. This may cause problems when joining children files from 
	parallel process forks.
	- Fixed bug where the interactive menu would exit upon an empty value
	in script manipulate_datasets.pl. A "q" must now be provided to exit.
	- Minor optimization when calculating shift values in script bam2wig.pl.

v.1.12 (svn 619)
	- Major improvements to performance of some data collection scripts by 
	adding multi-threaded options. These include get_datasets.pl, 
	get_relative_data.pl, average_gene.pl, and bam2wig.pl. The number of 
	CPU forks may be specified with the --cpu option (default 2). This option 
	requires the installation of Parallel::ForkManager, available through 
	CPAN. Run the check_dependencies.pl script to install it.
	- All gzip compression reads and writes are now forked through an 
	external gzip utility for a considerable boost in performance (2-5X). 
	The gzip executable must be in your path for this to work (it usually 
	is on most Unix-like environments).
	- Added --long option when collecting data from long features in script 
	- Improved efficiency when collecting data from very large windows in 
	both get_relative_data.pl and average_gene.pl.
	- Summing the total number of read alignments in Bam files is now 
	multi-threaded. Summing the total number of intervals in a BigBed file 
	is also improved.
	- Fixed a critical error where not all windows had data collected when 
	using the script get_relative_data.pl

v.1.11 (svn 603)
	- Major revision of how features are now retrieved from the database 
	using primary_IDs rather than relying on unique names in the database. 
	Generating lists of features will now return Primary_ID, Name, and Type. 
	The Primary_ID is unique to a database and is usually non-portable. 
	Current feature lists with only Name and Type will still work, and are 
	subject to limitations of non-unique Names in the database. This affects 
	all scripts that work with database features, including get_features.pl, 
	get_feature_info.pl, get_datasets.pl, get_relative_data.pl, 
	average_gene.pl, get_intersecting_features.pl, and correlate_position_data.pl.
	- GFF3 annotation scripts get_ensembl_annotation.pl and ucsc_table2gff3.pl 
	now produce GFF3 files that better match the GFF3 specification. Names 
	are no longer made unique (which broke ties with the originating data), 
	proper Dbxref tags are attributed when external sources could be 
	identified, and chromosomes are now sorted by name. Other minor 
	improvements were also made.
	- Fixed critical bug that prevented spliced alignments from being 
	counted in script bam2wig.pl. Thanks to Pinal K. for reporting.

v.1.10.3 (svn 597)
	- Unified column names and improved their recognition in scripts 
	get_feature_info.pl and the graphing scripts graph_data.pl, 
	graph_histogram.pl, and graph_profile.pl.
	- Graphing scripts now write the output graph directory in the input 
	file parent directory instead of the current directory.

v.1.10.2 (svn 591)
	- Added a new option of position when adjusting coordinates of retrieved
	features using the script get_features.pl. Coordinates may be adjusted
	at the 5 prime, 3 prime, or both ends of stranded features. This also 
	fixes bugs where collected features on the reverse strand with adjusted
	coordinates were not reported properly.
	- Improved automatic recognition of the name, score, and other columns
	in the convertor scripts data2bed.pl, data2gff.pl, and data2wig.pl. 
	- Improved the Cluster and Treeview export function in script
	manipulate_datasets.pl. The CDT files generated now include separate ID
	and NAME columns per the specification, and new manipulations are
	included prior to exporting, including percentile rank and log2.
	- The convert null function now also converts zero values if requested
	in script manipulate_datasets.pl.
	- Added new option of a minimum size when trimming windows in the script
	- Increased the radius from 35 bp to 50 bp when verifying a putative
	mapped nucleosome in script map_nucleosomes.pl, leading to fewer
	overlapping or offset nucleosomes.
	- Added new option to re-center offset nucleosomes in script
	verify_nucleosome_mapping.pl. Also improved report formatting.
	- Added checks and warnings when writing file names longer than 256
	characters. Some scripts automatically generate file names that may
	exceed this limit, preventing writing. File names are now truncated.
	Thanks to Adam F. for reporting.
	- Added new methods and code improvements to the gff3 parsing library.
	- Fixed a bug in script merge_datasets.pl where the column index for a
	second file may not be properly validated leading to premature
	- Fixed a bug where multiple datasets combined with an ampersand for
	merging were not properly verified. 
	- Fixed a bug where a user may not be prompted to select a dataset from
	a database if none was supplied from the command line.
	- Fixed a bug where files containing trailing nulls do not load
	- Fixed a bug related to finding specific data columns by name.
	- Fixed a bug with writing summary files.

v.1.10.1 (svn 568)
	- Added support for Bio::DB::Fasta in the main BioToolBox library, and
	added the support to scripts data2fasta.pl and CpG_calculator.pl. Any
	BioToolBox program that requires chromosome information or sequence can
	now use a genomic multi-fasta or directory of fasta files in the --db
	- Fixed critical error in data2gff.pl that prevented files from being
	converted to GFF format.
	- Fixed critical error in merge_datasets.pl that prevented column headers
	from being written to the output file.
	- Made the warning about unavailable files on the UCSC FTP server less
	scary in the script ucsc_table2gff3.pl.
	- Updated and clarified some script documentation.

v.1.10 (svn 559)
	- Significantly improved performance when collecting data from Bam files
	by using a low level API. Improvements of at least 2X may be realized.
	- Significantly improved the performance of the bam2wig.pl script by at
	least 2X. Added a new option of recording extended regions across the
	predicted fragment based on empirically determined shift values.
	Sampling to determine shift values has been increased. BedGraph files
	are now written more efficiently. A maximum number of identical reads
	is now enforced.
	- Significantly improved the performance of the split_bam_by_isize.pl
	script to increase speed by at least 2X. Added an option to skip
	checking of mates. Improved reporting of results.
	- Added a filter option to remove overlapping nucleosomes in script
	verify_nucleosome_mapping.pl; also fixed bugs in reporting offset
	distances and improved output reporting.
	- Removed confusing separate scan and tag datasets required for script
	map_nucleosomes.pl. Cleaned up and organized code. Fixed bugs that
	prevented datasets from being validated.
	- Fixed critical bug where data was not collected for the final row in
	script get_datasets.pl.
	- Fixed bugs with parsing unusual input files, for example commented
	header lines in bed files or inconsistent column numbers.
	- Fixed bug in script get_intersecting_features.pl where a strand column
	was expected even if it was not present.
	- Changed all tim library calls to use arrays instead of anonymous
	hashes for a cleaner API. 
	- Changed shebang lines to use /usr/bin/env to improve portability on
	systems with different Perl versions installed. 
	- Cleaned up and made POD documentation more consistent.
	- Added warnings about database users and passwords in configuration file. 

v.1.9.7 (svn 539)
	- Fixed critical bug where an exon containing all three 5'UTR, CDS, and
	3'UTR was not properly parsed in the script get_ensembl_annotation.pl.
	New command line options to include or exclude CDS, UTR, and start/stop
	codons were added. Significant changes to improve and organize the code
	were also made.
	- Changed the method of assigning the GFF type for chromosomes and
	scaffolds based on their name in the script ucsc_table2gff3.pl. Also
	made the inclusion of start and stop codons enabled by default.
	- Removed annoying automatic column assignment for input GFF files in
	script data2bed.pl. GFF files are still handled properly if no columns
	are specified on the command line.

v.1.9.6 (svn 533)
	- Fixed critical bug in script ucsc_table2gff3.pl where single exons
	containing all three 5'UTR, CDS, and 3'UTR subfeatures were not properly
	parsed into GFF3. This had resulted in an extended CDS longer than
	expected. Thanks to H. Stovall for reporting.
	- Added warnings when a sequence could not be generated to avoid
	division by 0 errors, and a slight correction to fraction calculations,
	in script CpG_calculator.pl.

v.1.9.5 (svn 525)
	- Changed the non-intuitive --except option to a more intuitive --zero
	option in script manipulate_datasets.pl; this is now a boolean option to
	include or exclude zero values when calculating statistics. The printed
	statistics output has also been cleaned up and no longer includes
	decimal formatting. The export function will automatically generate a
	name when executed automatically.
	- Added capability to use a column of source values rather than a static
	text string for the GFF source tag in script data2gff.pl. Also made
	improvements to the interactive ask session.
	- Added the capability to use a big file dataset as the database for
	chromosome information in script find_enriched_regions.pl.
	- Added an option to automatically convert the output file to a BED file
	in script get_gene_regions.pl, and included a description of the --in
	option in the POD documentation.

v.1.9.4 (svn 519)
	- Fixed first critical bug in script get_datasets.pl where strand
	information in input files with genomic coordinates (e.g. BED files) was
	not considered when adjusting coordinates (start, stop, or fractional). 
	- Fixed second critical bug in script get_datasets.pl where collecting
	fractional data for named database features resulted in data collection
	over the entire feature.
	- Improved interpretation of input file features as genomic regions or
	named features in script get_datasets.pl.
	- Changed the --set_strand option to --force_strand in multiple data
	collection scripts. This should make the function a little more obvious
	as to its purpose. Documentation changed as appropriate.

v.1.9.3 (svn 516)
	- Fixed bug where wig definition lines may not be written when no
	alignments exist in the first 2 Mb of a chromosome when converting a bam
	file to a wig file in script bam2wig.pl. Definition lines are now always
	written. Thanks to Matt J. for reporting.
	- Fixed bug where the format_with_commas sub was not properly imported
	into the tim_db_helper library
	- Fixed bug where the bed output from script get_features.pl did not
	properly report strand information. 

v.1.9.2 (svn 510)
	- Fixed critical bug where codon changes were not reported correctly for
	minus strand genes in script locate_SNPs.pl. Thanks to Craig K. for

v.1.9.1 (svn 507)
	- Added critical code to interpret strand information from input files
	such as Bed and GFF into BioPerl standards. Essential for collecting
	stranded data. Also properly writes back strand information for valid
	Bed and GFF files
	- Updated and unified internal library methods for validating and
	requesting database feature types. By default, all database features are
	presented to the user as a list when selecting database features to
	collect data. The source_exclude parameter in the biotoolbox.cfg
	configuration file is now deprecated.
	- Upgraded script get_intersecting_features.pl to automatically
	recognize input file columns and search for more than 1 feature type
	- Fixed bug in script get_datasets.pl where the program will not
	continue when only a data database was provided
	- Fixed bug of requesting index when using a .kgg file as a gene list in
	script pull_features.pl
	- Fixed bug in generating file name for Treeview export function in
	script manipulate_datasets.pl
	- Fixed behavior when reading files to prevent adding the current
	program name to the metadata when the input file does not have this
	- Minor updates to script novo_wrapper.pl

v.1.9.0 (svn 493)
	- Added new script get_features.pl which generates a list of features
	for one or more feature types from a database. Information about the
	features may be returned, including name, type, and coordinates. Sub
	features may be included. The data may be written as a BioToolBox
	formatted text file, GFF or BED.
	- Added new script correlate_position_data.pl that calculates a Pearson
	correlation between the score values at identical positions along a
	feature between two datasets. This helps in identifying changes in
	spatial distribution of values. An option for calculating shifts is also
	- Improved Big File generation such that Bio::DB::BigWig or
	Bio::DB::BigBed is no longer required just to generate the big file, as
	conversion uses external utilities anyway.
	- Fixed generation of bin values when calculating distribution
	frequencies in scripts data2frequency.pl and graph_histogram.pl

v.1.8.7 (svn 487)
	- Added new command line options to script merge_datasets.pl to control
	the program's behavior. The "--lookupname" option allows you to specify
	the name of the lookup column, while "--manual" turns off all automatic
	guessing of columns. Also improved handling of original_file metadata.
	- Added a new option to collect data from long features (such as genomic
	annotations) instead of point data (microarray or sequence data) in
	script get_relative_data.pl.
	- Added option to convert to and from Roman numerals in chromosome names
	and support for wig files in script change_chr_prefix.pl
	- Added option to change the IP port number when connecting to a remote
	MySQL database host in script get_ensembl_annotation.pl
	- Fixed bug to properly close opened files in script split_data_file.pl
	and avoid unnecessary error messages.
	- Modified statements and warnings regarding step and span values in
	script data2wig.pl

v.1.8.6 (svn 477)
	- Added numerous enhancements and bug fixes to script data2wig.pl,
	including automatically assigning the span parameter in the wig file,
	identifying coordinate columns, adding command line options for
	coordinate columns, and updating the POD documentation
	- Improved the treeview export function in script manipulate_datasets.pl
	to include different manipulations, including median center of genes or
	datasets, converting to Z-scores, and converting null values. Also
	changed the default output name to <basename>.cdt.
	- Added advanced option to script merge_datasets.pl to specify the
	column order on the command line instead of interactively. Also
	increased the number of columns that can be specified as letters.
	- Added the "value" command line option to specify the type of data to
	collect to the script find_enriched_regions.pl. Also added the sum
	method plus some improvements for identifying depleted regions.
	- Updated the script run_cluster.pl to accept any file name as input,
	and added basic file format validation checks prior to running the
	cluster algorithm, among a few other minor improvements
	- Improved handling of error messages when attempting to open databases
	that do not exist or cannot otherwise be opened.
	- Added more support for reading bedgraph files, dealing with track
	lines and possibly empty lines
	- Data from bigWig files that use spanned features (span > 1 bp)
	are now collected at every base rather than just the start position
	- Fixed bug where more than two files were not properly merged using
	lookup in script merge_datasets.pl
	- Fixed bug to allow data to be collected for Bed files from indexed
	data files without specifying a database in script get_datasets.pl

v.1.8.5 (svn 461)
	- Fixed critical bug where all knownGene feature strands are reversed in
	script ucsc_table2gff3.pl
	- Fixed critical bug where the sign is flipped when generating Z-scores
	with script manipulate_datasets.pl
	- Added new functions "convert null values" and "absolute value" to
	script manipulate_datasets.pl
	- Added additional file format checks when writing formatted files
	including GFF, BED, and SGR. File extensions may automatically change to
	default txt if the format does not match.
	- Better handling of input Bed files and generating appropriate default
	file names in script data2gff.pl
	- Improved merging of datasets by lookup, and loosened restrictions on
	metadata checking, issuing warnings instead, in script merge_datasets.pl
	- Loosened restrictions on metadata differences and failures in script
	- Included fix for finding column indices when name is prefixed with #
	- Added another check to avoid returning undefined values from BigWig
	data collection
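	Note on Z-scores: the sign matters because a Z-score is the 
	signed distance of a value from the mean in units of standard 
	deviation. The following Python sketch is illustrative only and 
	not the manipulate_datasets.pl code; the function name is 
	hypothetical, and the use of the sample standard deviation is an 
	assumption.

```python
import statistics


def z_scores(values):
    """Return signed Z-scores; values below the mean come out negative."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(value - mean) / sd for value in values]


print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```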

v.1.8.4 (svn 448)
	- Changed shift value determination to use trimmed mean to avoid
	outliers, and added new option to control the minimum acceptable R^2
	value in script bam2wig.pl
	- Improved script merge_datasets.pl to identify appropriate lookup
	columns automatically and successfully merge more than two files using
	- Changed my implementation of Z-score generation so that signed values
	are properly reported instead of absolute values in script
	- Fixed critical bug where output files were prematurely closed when
	splitting a data file in script split_data_file.pl
	- Reduced some unnecessary error reporting when opening databases that
	do not exist
	- Updated list of column names to avoid in script graph_data.pl
	- Updated interactive prompts in script manipulate_datasets.pl
	- Fixed bug where the --pos option in script get_datasets.pl did not accept
	the 'm' argument
	- Fixed bug where strand was reported as '.' instead of '0' in script
	- Fixed bug regarding writing headers, especially with new BED files
	- Fixed bug when providing an index of 0 on the command line with script

v.1.8.3 (svn 431)
	- Improved mapping efficiency, made tag dataset optional, added direct
	support of BigWig and BigWigSet datasources, and updated documentation
	for script map_nucleosomes.pl.
	- Updated script verify_nucleosome_mapping.pl to accommodate changes in
	map_nucleosomes.pl output, added support for generic input files, added
	option for other datasources, and added direct support for BigWig and
	BigWigSet datasources.
	- Added multiply and add methods to script manipulate_datasets.pl.
	- Added firstIntron and lastIntron to list of regions to collect in
	script get_gene_regions.pl
	- Fixed critical bug when collecting data about GFF features from a
	database that caused a crash when no features were found.
	- Fixed bug in get_gene_regions.pl when collecting introns where the
	last intron was skipped and reverse strand coordinates were flipped
	- Fixed bugs in manipulate_datasets.pl where a list of invalid index
	numbers could still evaluate to index 0, and the start column may not be
	recognized when performing a genomic sort.
	- Fixed bug where text files with DOS/Windows line endings (CRLF) were
	not loaded properly
	- Fixed bug in data2wig.pl to skip positions less than or equal to 0
	- Improved null value reporting when collecting data

v.1.8.2 (svn r411)
	- Added new script CpG_calculator.pl to count observed and expected CpG
	dinucleotides across a genome sequence or defined regions.
	- Added R61 SacCer2 to R64 SacCer3 conversion to script
	convert_yeast_genome_version.pl. Also improved chromosome name
	recognition and identification of columns in custom file structures.
	- Fixed and improved bin generation and output in scripts
	data2frequency.pl and graph_histogram.pl. Values outside of the
	requested range are now ignored. Script data2frequency.pl also has
	considerable code cleanup and reorganization.
	- Added a sum method and made minor enhancements to wig data collection
	in script bin_genomic_data.pl, along with considerable code cleanup.
	- Added automatic capability to script merge_datasets.pl. All unique
	columns are automatically merged without manual interaction. This is now
	useful for automated shell scripts.
	- Enforced no compression when generating bigWig files, and improved
	column recognition in script data2wig.pl
	- Changed 'primary_tag' to 'type' in the generated metadata and subtrack
	selection for BigWigSet database output in script big_file2gff3.pl. Also
	improved conf stanza renaming scheme for BigWigSets.
	- Fixed bug in script bar2wig.pl that prevented the USeq App Bar2Gr from
	being used.

v.1.8.1 (svn r392)
	- Updated script find_enriched_regions.pl to handle separate feature and
	data databases if desired, and add capability to restrict searches to
	specific strands.
	- Updated script map_transcripts to handle chromosome names without
	integers
	- Brought script convert_yeast_genome.pl back out of retirement and
	updated with R63 to R64 convertor
	- Added chromosome and sequence sorting to GFF3 output from script
	get_ensembl_annotation.pl. Also included Ensembl API version reporting.
	- Updated script check_dependencies.pl to report the installed Ensembl
	API version number
	- Improved GFF3 parsing and minor improvements to script
	- Fixed bugs when working with BigWigSet databases, where a trailing
	slash in the directory name may lead to different behaviors, and
	unexpected results when collecting data from BigWigSet databases using
	two different methods in the same program
	- Fixed bug with null values in tab-delimited text files; they are
	now internally converted to the null character (.)
	- Fixed sorting issues in script split_bam_by_isize.pl
	- Fixed bugs in script novo_wrapper.pl that prevented an uncompressed
	Fastq input file from being split properly, split input files from being
	removed after aligning, and a single unsorted Bam file is not further

v.1.8.0 (svn r378)
	- Moved script novo_wrapper.pl out of retirement (due to popular demand)
	and significantly updated it to handle parallel execution
	- Retired old script merge_SNPs and replaced it with new
	intersect_SNPs.pl script, which is an improved version that uses the VCF
	- Updated script locate_SNPs.pl to work with multiple alternate
	sequences, multiple features, and importantly with the VCF format
	- Added .vcf and .bdg extensions as properly recognized file format
	extensions. Changed default bedgraph extension to use .bdg in script
	- Stripped all code and mention of binary tim_data_formatted files based
	on Storable. Not really a prominent feature and never lived up to its
	hype anyway, so removing it

v.1.7.4 (svn r363) (not released)
	- Fixed critical bug that prevents local Bam files from opening for data
	- Added warnings if a chromosome segment failed to be found in a

v.1.7.3 (svn r355)
	- Fixed a bug in script bam2wig.pl that prevented it from finding its
	libraries and compiling properly, and another bug that prevented
	stranded start positions from being recorded properly

v.1.7.2 (svn r351)
	- Fixed bug in script ucsc_table2gff3.pl where the output file name may
	not be properly generated, leading to an overwrite of the input file.
	- Fixed bug in script bam2wig.pl where the recorded position is off by 1
	- Added recommended settings in the POD for bam2wig.pl

v.1.7.1 (svn r346)
	- Fixed critical bug in data collection library that allowed too many
	datapoints to be collected by ignoring the stop position. This could
	affect scripts get_datasets.pl, get_relative_data.pl, average_gene.pl,
	find_enriched_regions.pl, and others.
	- Major overhaul of script pull_features.pl to include better automatic
	identification of identifier columns, the capability to match multiple
	features, and to simultaneously write all groups from a .kgg list
	- Updated script get_datasets.pl so that it would rewrite the output
	file after each round of data collection.
	- Minor bug fixes in script find_enriched_regions.pl
	- Retired outdated script convert_yeast_genome_version.pl. Users should
	use the liftOver program from UCSC and chain files from SGD.

v.1.7.0 (svn r340)
	- Added new program get_gene_regions.pl which helps in retrieving
	regions not explicitly annotated in a database, including start and stop
	sites of transcription and introns.
	- Added new program data2fasta.pl which generates a multi-Fasta file
	from a tab-delimited text file of coordinates or a list of sequences,
	such as microarray probes.
	- Added new program compare_subfeature_scores.pl which compares a list
	of features and subfeatures and finds the subfeatures with the minimum
	and maximum scores.
	- Major update to the data collection scripts to improve memory
	consumption and efficiency, and a significant boost in speed when
	working with BigWig data sources (I have seen up to a 10-fold increase,
	depending on collection methods).
	- Improvements when working with BigWigSet directories, including
	working with impromptu directories of BigWig files that do not have a
	defined metadata file.
	- Added the option of using separate annotation and data databases when
	using the data collection scripts. This greatly simplifies things when
	you have, for example, an annotation SeqFeature::Store database and a
	BigWigSet database of data.
	- Added the rpkm method to work with any segment, not just genes with
	exons, in data collection scripts get_datasets.pl and average_gene.pl
	- Fixed bugs in script ucsc_table2gff3.pl, data2wig.pl,
	find_enriched_regions.pl, and bar2wig.pl

v.1.6.4 (svn r314)
	- Major update to script bam2wig.pl to reduce memory consumption by
	writing incremental portions. The strand option is now a boolean option,
	and when enabled, automatically writes both strands simultaneously. The
	binning of read counts into windows of user-selected size is now
	possible. The optimal shift value for ChIP-Seq data can now be empirically
	determined from the reads using a statistical method.
	- Added additional support for UCSC ensGene tables by including
	ensemblToGeneName and ensemblSource supplemental tables in script
	ucsc_table2gff3.pl. The common gene name is now included in the output
	GFF3 file.
	- Added rna_count function to script get_feature_info.pl
	- Added minimum and maximum value functions to script
	- Included a range option when generating a summary file in script
	- Improved the regular expression matching of the chromosome name when
	sorting by genomic coordinates in the script manipulate_datasets.pl
	- Increased the number of available letters when requesting indices from
	the second file in script merge_datasets.pl
	- Updated script check_dependencies.pl to handle missing dependencies
	more gracefully
	- Updated error handling of missing Perl module dependencies, including
	- Fixed bug where the default chromosome exclusion list in
	biotoolbox.cfg wasn't being used when generating a new genome interval
	- Fixed bug where a script might ignore the --nogz option when the
	original file was gzipped
	- Fixed bug in script split_data_file.pl where a filename may get out of
	sync with what was requested and what is written

v.1.6.3 (svn r293)
	- Added knownGene as a source in script ucsc_table2gff3.pl
	- Improved handling of the chromosome exclusion list in library
	- Fixed bug where an exception could occur if multiple genomic regions
	on different chromosomes are returned from a database query. Included
	logic to help identify the appropriate intended chromosome.
	- Fixed bug where an exception and crash could occur if the query
	chromosome is not present in a bigWig, bigBed, or Bam file when
	collecting data. Chromosome names are now checked prior to query.
	- Fixed bug in script get_datasets.pl where a null value is returned
	instead of 0 when using the method of sum.
	- Removed several minor bugs that could generate non-fatal Perl warnings

v.1.6.2 (svn r282)
	- Fixed bugs in script data2bed.pl that prevented a bigBed file from
	being generated. Also improved autodetection of data columns and allowed
	for dummy data to be inserted in lower column data when writing higher
	column data. Also added ability to use either the GFF Name or ID
	attribute as the Bed feature name.
	- Added span option to script data2wig.pl when making wig files.
	- Renamed script process_agilent.pl to process_microarray.pl. Completely
	restructured internal data to accommodate multi-slide arrays and other
	file formats, including NimbleGen and GenePix.
	- Removed annoying verbose output from script split_data_file.pl and
	improved efficiency.
	- Stopped writing index keys in the metadata of tim data file formats.
	Index is now automatically calculated and retained internally. Also
	avoids writing metadata automatically if it wasn't present in the first
	- Added summary export function to script manipulate_datasets.pl. This
	replicates the summary option from script get_relative_data.pl.
	- Added multi-column support to the subtract and division functions in
	script manipulate_datasets.pl.
	- Minor bug fixes and improvements to script map_oligo_data2gff.pl.
	- Improved script gff3_to_ucsc_table.pl to handle gzip files and make
	the UCSC bin column optional.
	- Added character escaping when generating GFF3 files.
	- Improved handling of BigWigSet directories in script big_file2gff3.pl
	where the set name is used as the final subdirectory in the target path.
	Also improved name handling.
	- Fixed bug in writing Sam files in script change_chr_prefix.pl. Also
	added increased support for pragmas and fasta sequences in GFF3 files,
	and support for non-standard text files.
	- Changed the score column name to the more meaningful outfile basename
	when writing summary files.
	- Fixed data collection from Bed files in script bin_genomic_data.pl.
	- Renamed script map_relative_data.pl to get_relative_data.pl; updated
	the POD to be more helpful.

v.1.6.1 (svn r258)
	- updated the inline documentation for all perl scripts to include the
	version option

v1.6.0 (svn r253)
	- added version numbers and reporting to all perl scripts and modules
	- retired a number of outdated scripts
	- renamed script map_data.pl to map_relative_data.pl

v1.5.9 (svn r247)
	- updated script big_file2gff3.pl to generate BigWigSet conf stanzas
	with subtracks, also more thorough conf stanzas
	- added additional axis formatting options to script graph_profile.pl
	- fixed critical error in library tim_db_helper where relative
	coordinates were not correctly reported in function
	- improved handling of opening a bigwigset database in library
	- major overhaul of script average_gene.pl to work with bed files, add
	new methods including rpm support, and general much-needed
	- improved error messaging in biotoolbox libraries by using confess
	instead of croak
	- reorganize the order of checking for the biotoolbox configuration in

v1.5.8 (svn r240) (not released)
	- fix some bugs with script graph_histogram.pl concerning the bins and
	their labels
	- updated script gff3_to_ucsc_table.pl to work with gene models without
	transcripts and fix bugs handling comments and pragmas
	- fixed bug with trimming windows in script find_enriched_regions.pl by
	including absolute option to get_region_dataset_hash() function in
	library tim_db_helper
	- added option to randomly assign strand for paired-end features to
	script bam2gff_bed.pl
	- fix chromosome regex issue with non-standard chromosome names in
	script bar2wig.pl
	- updated methods to get chromosome sizes in libraries
	tim_db_helper::bigwig and tim_db_helper::bigbed
	- added new parameter chromosome_exclude in configuration file
	biotoolbox.cfg, which allows specific chromosomes to be excluded when
	generating new feature or genomic interval lists
	- removed all references to key reference_sequence_type from config file
	biotoolbox.cfg and associated scripts
	- updated chromosome reference, and added logic to automatically
	identify column indices in script data2bed.pl
	- updated several scripts to use seq_ids to retrieve chromosome lists
	- fixed bug in script get_feature_info.pl where short feature lists
	would cause a failure when generating a list of possible attributes from
	sample features

v1.5.7 (svn r227) (not released)
	- major overhaul of script get_datasets.pl
	- removed subs get_feature_dataset() and get_genome_dataset() from
	library tim_db_helper, functionality moved to script get_datasets.pl
	- added data color options to script graph_profile.pl
	- completely updated script map_data.pl to work with chromosome segments
	rather than named features, and added rpm support
	- added new sub to check datasets for rpm support in library
	- fixed bug when specifying no datasets in script get_datasets.pl
	- improved support for BigWigSet databases in library tim_db_helper and
	script print_feature_types.pl

v1.5.6 (svn r223) (not released)
	- added rpm method to score functions in library tim_db_helper
	- minor bug fixes and adjustments to help rpm method in tim_db_helper
	bigwig, bigbed, and bam libraries
	- minor bug fix in script find_enriched_regions.pl
	- fixed export bug in library tim_db_helper::bigbed
	- fixed bug in library tim_db_helper sub process_and_verify_dataset()
	where new datasets would never be prompted
	- corrected the method for counting bed features in library
	- fixed alignment collection to only take alignments with midpoint
	positions within the requested region in library tim_db_helper::bam

v1.5.5 (svn r219) (not released)
	- added new avoid option to method get_region_dataset_hash() in library
	- updated script map_data.pl to use get_region_dataset_hash()
	- fixed bug in method validate_dataset_list() in library tim_db_helper
	- fixed bug in script merge_datasets.pl where table headers may not be
	written properly
	- fixed bug in tim_db_helper::get_genome_dataset() if more than one
	segment was found
	- made numerous improvements in opening db connections in library
	- made changes to assigning feature type when opening certain files in
	library tim_file_helper
	- fixed bug in library tim_db_helper where bed file coordinates were not
	written out in interbase
	- moved the sum_total_alignments() subroutine from the script bam2wig.pl
	to the library tim_db_helper::bam
	- added support for stranded paired-end RNA-Seq bam files aligned with
	TopHat which use the XS attribute to record strand information in
	scripts bam2wig.pl and bam2gff_bed.pl
	- disabled splices on paired-end bam files in script bam2wig.pl

v1.5.4 (svn r209) (not released)
	- added more explicit support for bed files in the tim_file_helper and
	tim_data_helper libraries, including data structure verification,
	interbase to base conversion, and metadata handling
	- generalized bam and bigfile database handling to tim_db_helper
	- simplified generating genomic windows in tim_db_helper
	- improved handling of collecting data from bigfile databases in
	tim_db_helper libraries
	- added chromosome feature output to script big_file2gff3.pl
	- updated numerous scripts to reflect tim_db_helper changes; general
	code cleanup
	- further simplification and code cleanup of library tim_db_helper,
	including database and dataset list verification, and removing redundant
	code in collecting dataset values
	- added new subroutine process_and_verify_dataset() to library
	- updated scripts average_gene.pl, find_enriched_regions.pl, and
	map_data.pl to use the new sub process_and_verify_dataset()

v1.5.3 (svn r205)
	- Fixed bug in script bam2wig.pl that prevented spliced alignments from
	being properly checked and recorded. 
	- Fixed numerous bugs in script ucsc_table2gff3.pl, including a bug
	where the gene start coordinate may not be updated from interbase to
	base, and not accurately converting the CDS phase
	- Added new features to the script ucsc_table2gff3.pl, including
	automatic table retrieval through FTP from UCSC to greatly simplify
	conversion, adding support for knownGene and xenoRefGene tables,
	customizing the type of features to output, properly handling features
	with duplicate names by creating unique IDs, and optionally including
	chromosome information in the output GFF3 file
	- Deleted the now redundant script ucsc_chrom2gff3.pl

v1.5.2 (svn r200)
	- Updated several scripts and libraries to fix bugs in handling GFF
	version numbers and pragmas.
	- Added unique IDs to the gff3 output from bam2gff_bed.pl
	- Added option to deal with multiple values at identical positions in
	the script data2wig.pl
	- Added support for log2 values when combining multiple values at
	identical positions in scripts data2wig.pl, bar2wig.pl, and
	- Retired the outdated script just_blast_oligos.pl.

v1.5.1 (svn r193)
	- Fixed critical bug in script bar2wig.pl where values from multiple
	positions were not combined properly. Also fixed bug with processing a
	single bar file.
	- Removed required dependencies of bioperl for scripts bar2wig.pl and
	- Fixed small bug in tim_db_helper::bigbed library to ensure positions
	were within the region of interest
	- Added mapping quality filter and other improvements to script
	- Changed score reporting to record mapping quality in script

v1.5 (svn r184)
	- Added script useq2bigfile.pl for converting USeq archives
	- Added script check_dependencies.pl for assisting in checking for Perl
	module dependencies. It will help install the latest versions through
	- Changed the biotoolbox configuration file from lib/tim_db_helper.cfg
	to biotoolbox.cfg in the root directory.
	- Moved the biotoolbox configuration loader into a separate module as
	lib/tim_db_helper/config.pm. This avoids requiring installing BioPerl
	and loading all of tim_db_helper.pm when it may not be necessary.
	- Updated numerous scripts to reflect changes with the biotoolbox
	configuration loader.
	- added axes labeling options to scripts graph_data.pl and
	- fixed bug in handling bed files in library tim_file_helper
	- minor fixes in script data2wig.pl
	- improved working with bigfile conversions
	- fixed minor bug in script big_file2gff3.pl when leaving files in the
	current directory

v1.4.4 (svn r162)
	- Added reads per million option to script bam2wig.pl
	- Added parent, exon, and transcript_length attributes to script
	- Updated scripts find_enriched_regions.pl and map_transcripts.pl to
	work with standalone data files (BigWig, BigBed, Bam)
	- Added configuration, description, and capabilities for working with
	SQLite database files in tim_db_helper
	- Added midpoint as acceptable coordinate in script data2wig.pl
	- Bug fixes to script locate_SNPs.pl and bam2wig.pl; library

v1.4.3 (svn r144)
	- Changed script bar2wig.pl to require method for combining values and
	removed interbase option
	- Updated peak identification in script map_nucleosomes.pl to use the
	tag dataset and not the scan dataset
	- Updated script big_file2gff3.pl to produce more useful conf files with
	- Added overlap data column to output of script
	get_intersecting_features.pl and added --set_strand option to enforce
	- Added three new functions to script manipulate_datasets.pl, including
	new column, strandsign, and mergestrand
	- Fixed script wig2data.pl so it works now
	- Updated script get_feature_info.pl to parse an attribute list from the
	command line
	- Improved handling of metadata when opening tim data files

v1.4.2 (svn r129)
	- Added fast low level coverage function to the script bam2wig.pl
	- Fixed script pull_features.pl to keep the order of features in the
	list file.
	- Fixed script bar2wig.pl to correctly identify the chromosome name.
	- Various bug fixes to the database library helper tim_db_helper.pm.

v1.4.1 (svn r119)
	- Fixed bug with get_ensembl_annotation.pl where a protein_coding gene
	encoding a transcript lacking a CDS will write inappropriate
	coordinates. These transcripts will not write start_codon, stop_codon,
	or CDS subfeatures.
	- Fixed bug with script get_intersecting_features.pl where regions
	specified with start and stop modifiers were not being selected properly.
	- Fixed bug with tim_db_helper modules that prevented working with
	source data files specified in a database feature
	- Added log transformation of count in script bam2wig.pl

v1.4 (svn r111)
	- Added script bam2wig.pl for enumerating alignments and writing a wig
	file of the counts. 
	- Added script change_chr_prefix.pl for adding or stripping chromosome
	prefixes from data and annotation files.
	- Bug fixes to ucsc_table2gff3.pl.

v1.3 (svn r104)
	- Added ability to restrict data collection to exon subfeatures to
	script get_datasets.pl. Useful for RNA-seq analysis.
	- Added exon count as attribute to script get_feature_info.pl.
	- Bug fixes to get_datasets.pl.

v1.2 (svn r98)
	- Added support for bam files as a data source.
	- Updated data collection scripts to allow direct referencing of data
	source files, including bigWig, bigBed, and Bam files, on the command
	line, without having to reference the files from within the database.

v1.1 (svn r92)
	- Updated script ucsc_table2gff3.pl to use Bio::SeqFeature::Lite. Now
	outputs exon and codon features.
	- Updated script get_ensembl_annotation.pl to collect RNA features from
	Ensembl as well as generate exon and codon features.
	- Added script gff3_to_ucsc_table.pl to generate UCSC style refSeq
	tables from GFF3 formatted data.

v1.0.2 (svn r91)
	- Bug fixes to libs tim_file_helper and tim_db_helper
	- Bug fixes to scripts print_feature_types.pl,
	get_intersecting_features.pl, big_file2gff3.pl, graph_data.pl,
	graph_histogram.pl, graph_profile.pl

v1.0 (svn r68)
	- Initial public release of an archive. Previous versions were only
	available through SVN.