NAME
Bio::ToolBox::utility - common utility functions for Bio::ToolBox
DESCRIPTION
These are general subroutines that don't fit in with the other modules.
REGULAR SUBROUTINES
The following subroutines are automatically exported when you use this module.
- parse_list
my $index_request = '1,2,5-7';
my @indices = parse_list($index_request); # returns (1,2,5,6,7)
This subroutine parses a scalar value into a list of values. The scalar is a text string of numbers (usually column or dataset indices) delimited by commas and/or including a range. For example, the string "1,2,5-7" becomes the list (1,2,5,6,7).
Pass the subroutine the scalar string.
It will return the list of numbers.
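The expansion described above can be sketched in a few lines of plain Perl. This is only an illustration of the idea, not the module's own code: the name expand_index_list is hypothetical, and the sketch assumes well-formed input of non-negative integers.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not the module's own code.
sub expand_index_list {
    my $string = shift;
    my @list;
    foreach my $item ( split /,/, $string ) {
        if ( $item =~ /^(\d+)-(\d+)$/ ) {
            # a range such as 5-7 expands to 5, 6, 7
            push @list, $1 .. $2;
        }
        else {
            # a single value is kept as is
            push @list, $item;
        }
    }
    return @list;
}

my @indices = expand_index_list('1,2,5-7');
print join( ',', @indices ), "\n";    # prints 1,2,5,6,7
```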
- format_with_commas
my $count = '4327908475';
printf " The final count was %s\n", format_with_commas($count);
This subroutine processes a large number (e.g. 4327908475) into a human-friendly version with commas delimiting the thousands (4,327,908,475).
Pass the subroutine a scalar string with a number value.
It will return a scalar value containing the formatted number.
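For readers curious how this kind of formatting can be done, the following sketch uses a common Perl substitution idiom. The name add_thousands_commas is hypothetical, the sketch assumes an integer string, and the module itself may take a different approach.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not the module's own code.
sub add_thousands_commas {
    my $number = shift;

    # Repeatedly split the last three digits off the leading run of
    # digits until fewer than four digits remain in that run.
    1 while $number =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $number;
}

printf "The final count was %s\n", add_thousands_commas('4327908475');
# prints: The final count was 4,327,908,475
```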
- ask_user_for_index
my @answers = ask_user_for_index($Data, 'Please enter 2 or more columns ');
This subroutine presents the list of column names from a Bio::ToolBox::Data structure, along with their numeric indexes, to the user and prompts for one or more to be selected and entered. The function is smart enough to print the list only once (if it hasn't changed), so as not to annoy the user with repeated lists of header names when used more than once. A text prompt should be provided; otherwise a generic one is used. The list of indices is validated, and a warning is printed for invalid responses. The responses are then returned as a single value or an array, depending on context.
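The context-dependent return can be illustrated with Perl's wantarray. The sketch below is not the module's code: check_indices is a hypothetical helper, it takes the response as an argument instead of prompting, and it assumes that any whole number below the column count is a valid index.

```perl
use strict;
use warnings;

# Hypothetical helper: validate a comma-delimited response against the
# number of columns and return according to the calling context.
sub check_indices {
    my ( $number_of_columns, $response ) = @_;
    my @valid;
    foreach my $i ( split /,/, $response ) {
        if ( $i =~ /^\d+$/ and $i < $number_of_columns ) {
            push @valid, $i;
        }
        else {
            # assumption: invalid entries are reported and skipped
            warn "  unknown index '$i' ignored\n";
        }
    }

    # list context receives every index, scalar context only the first
    return wantarray ? @valid : $valid[0];
}

my @several = check_indices( 6, '1,3,9' );    # (1, 3), with a warning about 9
my $single  = check_indices( 6, '4' );        # 4
```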
- simplify_dataset_name
my $simple_name = simplify_dataset_name($dataset);
This subroutine will take a dataset name and simplify it. Dataset names may often be file names of data files, such as Bam and bigWig files. These may include a file:, http:, or ftp: prefix, one or more directory paths, and one or more file name extensions. Additionally, more than one dataset may be combined, for example two stranded bigWig files, with an ampersand. This function will safely remove the prefix, directories, and everything after the first period.
- sane_chromo_sort
my @chromo = $db->seq_ids;
my @sorted = sane_chromo_sort(@chromo);
This subroutine will take a list of chromosome or sequence identifiers and sort them into a reasonably sane order: standard numeric identifiers first (numeric order), sex chromosomes (alphabetical), mitochondrial, names with text and numbers (text first alphabetically, then numbers numerically) for contigs and such, and finally anything else (asciibetically). Any 'chr' prefix is ignored. Roman numerals are properly handled numerically. Chromosome arms with an L, R, p, or q suffix, such as with Drosophila chromosomes, are handled appropriately. Any chromosome name consisting entirely of more than 2 digits is considered a scaffold.
The provided list may be a list of SCALAR values (chromosome names) or ARRAY references, with the first element assumed to be the name, e.g. [$name, $length]. Length is not considered here.
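The general ordering strategy can be sketched as follows. This is a deliberately simplified illustration under stated assumptions, not the module's implementation: simple_chromo_sort is a hypothetical name, and the sketch ignores Roman numerals, chromosome arms, the scaffold rule for long all-digit names, and the special ordering of mixed text-and-number contig names.

```perl
use strict;
use warnings;

# Hypothetical, simplified sorter for illustration only.
sub simple_chromo_sort {
    my @items = @_;
    my ( @numeric, @sex, @mito, @other );
    foreach my $item (@items) {

        # accept either a plain name or an [$name, $length] array reference
        my $name = ref($item) eq 'ARRAY' ? $item->[0] : $item;
        ( my $bare = $name ) =~ s/^chr//i;    # ignore any chr prefix
        if ( $bare =~ /^\d+$/ ) {
            push @numeric, [ $bare, $item ];
        }
        elsif ( $bare =~ /^[WXYZ]$/i ) {
            push @sex, [ uc $bare, $item ];
        }
        elsif ( $bare =~ /^(?:M|MT|Mito)$/i ) {
            push @mito, [ $bare, $item ];
        }
        else {
            push @other, [ $name, $item ];
        }
    }

    # sort each bin by its key, then return the original items in bin order
    return map { $_->[1] } (
        ( sort { $a->[0] <=> $b->[0] } @numeric ),
        ( sort { $a->[0] cmp $b->[0] } @sex ),
        ( sort { $a->[0] cmp $b->[0] } @mito ),
        ( sort { $a->[0] cmp $b->[0] } @other ),
    );
}

my @sorted = simple_chromo_sort(qw(chrM chr10 chrX chr2 scaffold_1 chr1 chrY));
print join( ' ', @sorted ), "\n";
# prints: chr1 chr2 chr10 chrX chrY chrM scaffold_1
```

Because each item is carried alongside its sort key, array references such as [$name, $length] come back unchanged and in the same order as plain names would.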
AUTHOR
Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.