NAME

BioUtil::Util - Utilities for operating on data and files

Great modules such as BioPerl provide many robust solutions, but they are not easy to install on some platforms. For simple task scripts, a lightweight module may be a better choice. So I reinvented some wheels and added some useful utilities to this module, hoping they will be helpful.

VERSION

Version 2015.0105

EXPORT

getopt

file_list_from_argv
get_file_list

delete_string_elements_by_indexes
delete_array_elements_by_indexes

extract_parameters_from_string
get_parameters_from_file

get_list_from_file
get_column_data

read_json_file
write_json_file

run
readable_second

check_positive_integer

filename_prefix
check_all_files_exist
check_in_out_dir 
rm_and_mkdir

run_time

SYNOPSIS

use BioUtil::Util;

SUBROUTINES/METHODS

getopt

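A simple command-line option parser, written for the author's own use.

Example: for the arguments

-a b -c t tt -d bb -dbtype asdfafd -test

the parsed options are: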

-a: b
-c: ARRAY(0xee25e8)
-d: bb
-dbtype: asdfafd
-infmt: fasta
-test: 1
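
The parsing rule suggested by this output can be sketched in core Perl as below. This is only an illustration under assumptions, not the module's implementation: parse_simple_opts is a hypothetical name, and the rule (no value gives 1, one value gives a scalar, several values give an array ref) is inferred from the example. The -infmt entry above is not among the arguments, so it is presumably a default and is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical sketch, not BioUtil::Util's code: collect the values that
# follow each -name token.
sub parse_simple_opts {
    my @args = @_;
    my ( %opt, $key );
    for my $arg (@args) {
        if ( $arg =~ /^-/ ) {          # a new option name
            $key = $arg;
            $opt{$key} = [];
        }
        elsif ( defined $key ) {       # a value for the current option
            push @{ $opt{$key} }, $arg;
        }
    }

    # Collapse: no value -> 1 (flag), one value -> scalar, several -> array ref
    for my $k ( keys %opt ) {
        my @v = @{ $opt{$k} };
        $opt{$k} = @v == 0 ? 1 : @v == 1 ? $v[0] : \@v;
    }
    return \%opt;
}

my $opt = parse_simple_opts(qw(-a b -c t tt -d bb -dbtype asdfafd -test));
for my $k ( sort keys %$opt ) {
    my $v = $opt->{$k};
    print "$k: ", ( ref($v) ? "[@$v]" : $v ), "\n";
}
```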

file_list_from_argv

Get the file list from @ARGV. Call this after parsing options.

When no arguments are given, 'STDIN' is added to the list, which can then be used by, e.g., FastaReader.
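
The fallback described above amounts to the following, shown as a minimal sketch. files_or_stdin is a hypothetical stand-in, not the module's function.

```perl
use strict;
use warnings;

# Hypothetical stand-in: return the remaining arguments, or the
# placeholder 'STDIN' when nothing is left after option parsing.
sub files_or_stdin {
    my @args = @_;
    return @args ? @args : ('STDIN');
}

print "$_\n" for files_or_stdin(@ARGV);
```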

get_file_list

Find files/directories with a custom filter. The maximum search depth can be specified.

Example (searching for Perl scripts):

my $dir   = "~";
my $depth = 2;

my $list = get_file_list(
    $dir,
    sub {
        if ( -d or /^\./ ) {  # ignore directories and hidden (dot) files
            return 0;
        }
        if (/\.(?:pm|pl)$/i) {  # keep Perl modules and scripts
            return 1;
        }
        return 0;
    },
    $depth
);
print "$_\n" for @$list;

delete_string_elements_by_indexes

Delete elements (characters) of a string by their indexes. It is built on delete_array_elements_by_indexes.

delete_array_elements_by_indexes

Delete array elements by given indexes.

Example:

my @list  = qw(a b c d e f);
my @idx   = (1, 2, 4);
my $list2 = delete_array_elements_by_indexes(\@list, \@idx);
print "@$list2\n"; # prints: a d f
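
One way such a deletion could be done in core Perl is sketched below. remove_by_indexes is a hypothetical helper, not the module's code, and the string variant at the end (split into characters, delete, join) is only an assumption about how the string function reuses the array one.

```perl
use strict;
use warnings;

# Hypothetical sketch: return a new array ref without the listed positions.
sub remove_by_indexes {
    my ( $list, $indexes ) = @_;
    my %drop;
    $drop{$_} = 1 for @$indexes;                   # positions to skip
    my @keep = grep { !$drop{$_} } 0 .. $#$list;   # positions to keep
    return [ @{$list}[@keep] ];
}

my @list = qw(a b c d e f);
my $kept = remove_by_indexes( \@list, [ 1, 2, 4 ] );
print "@$kept\n";    # prints: a d f

# Assumed string variant: treat the string as an array of characters.
my $str = join '', @{ remove_by_indexes( [ split //, 'abcdef' ], [ 0, 5 ] ) };
print "$str\n";      # prints: bcde
```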

extract_parameters_from_string

Extract parameters from string.

The regular expression is

/([\w\d\_\-\.]+)\s*=\s*([^\=;]*)[\s;]*/

Example:

# bad format, but could also be parsed
# my $s = " s = b; a=test; b_c=12 3; a.b =; b
# = asdf
# sd; ads-f = 12313";

# recommended
my $s = "key1=abcde; key2=123; conf.a=file; conf.b=12; ";

my $pa = extract_parameters_from_string($s);
print "=$_:$$pa{$_}=\n" for sort keys %$pa;
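
To show how the regular expression above picks out key/value pairs, here is a minimal sketch that applies it repeatedly with the /g modifier. parse_params is a hypothetical name, and trimming trailing whitespace from values is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: apply the documented pattern repeatedly with /g.
sub parse_params {
    my ($s) = @_;
    my %param;
    while ( $s =~ /([\w\d\_\-\.]+)\s*=\s*([^\=;]*)[\s;]*/g ) {
        my ( $key, $value ) = ( $1, $2 );
        $value =~ s/\s+$//;    # assumption: drop trailing whitespace
        $param{$key} = $value;
    }
    return \%param;
}

my $params = parse_params("key1=abcde; key2=123; conf.a=file; conf.b=12; ");
print "$_ => $params->{$_}\n" for sort keys %$params;
```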

get_parameters_from_file

Get parameters from a file. Comments starting with # are allowed in the file.

Example:

my $pa = get_parameters_from_file("d.txt");
print "$_: $$pa{$_}\n" for sort keys %$pa;

For a file with content:

# cell phone 
apple = 1 # note

nokia = 2 #

output is:

apple: 1
nokia: 2
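
The behaviour shown above (skip comments and blank lines, split on =) can be sketched as follows. read_params is a hypothetical helper that reads from an already opened filehandle; an in-memory file stands in for d.txt so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical sketch: read "key = value" pairs, ignoring # comments
# and lines without a key.
sub read_params {
    my ($fh) = @_;
    my %param;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;    # drop a comment, if any
        next unless $line =~ /^\s*([^\s=]+)\s*=\s*(.*?)\s*$/;
        $param{$1} = $2;
    }
    return \%param;
}

my $text = "# cell phone\napple = 1 # note\n\nnokia = 2 #\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $pa = read_params($fh);
print "$_: $pa->{$_}\n" for sort keys %$pa;    # apple: 1, then nokia: 2
```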

get_list_from_file

Get a list from a file. Comments starting with # are allowed in the file.

Example:

my $list = get_list_from_file("d.txt");
print "$_\n" for @$list;

For a file with content:

# cell phone 
apple # note

nokia

output is:

apple
nokia
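
A minimal sketch of the same idea for plain lists: strip comments, trim whitespace and keep the non-empty lines. read_list is a hypothetical helper, and the in-memory file stands in for d.txt.

```perl
use strict;
use warnings;

# Hypothetical sketch: one list item per line, # starts a comment.
sub read_list {
    my ($fh) = @_;
    my @list;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;           # drop a comment, if any
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @list, $line if length $line;
    }
    return \@list;
}

my $text = "# cell phone \napple # note\n\nnokia\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $list = read_list($fh);
print "$_\n" for @$list;    # apple, then nokia
```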

get_column_data

Get one column of a file.

Example:

my $list = get_column_data("d.txt", 2);
print "$_\n" for @$list;
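
The delimiter and the column numbering are not stated above. The sketch below assumes tab-separated fields and 1-based column numbers; both are assumptions, and read_column is a hypothetical helper rather than the module's function.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect one column from tab-separated lines.
sub read_column {
    my ( $fh, $column ) = @_;    # $column is 1-based (assumption)
    my @data;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;            # skip blank lines
        my @fields = split /\t/, $line;      # tab delimiter (assumption)
        push @data, $fields[ $column - 1 ] if $column <= @fields;
    }
    return \@data;
}

my $text = "apple\t1\nnokia\t2\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $col = read_column( $fh, 2 );
print "$_\n" for @$col;    # 1, then 2
```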

read_json_file

Read a JSON file and decode it into a hash ref.

Example:

my $hashref = read_json_file($file);

write_json_file

Encode a hash ref as JSON and write it to a file.

Example:

my $hashref = { "a" => 1, "b" => 2 };
write_json_file($hashref, $file);
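
A write/read round trip can be sketched with the core JSON::PP module. Which JSON backend BioUtil::Util actually uses is not stated here, so save_json and load_json are hypothetical helpers, not the module's code.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw(tempfile);

# Hypothetical sketch: encode a data structure and write it to a file.
sub save_json {
    my ( $data, $file ) = @_;
    open my $out, '>', $file or die "cannot write $file: $!";
    print {$out} JSON::PP->new->utf8->canonical->pretty->encode($data);
    close $out or die "cannot close $file: $!";
    return;
}

# Hypothetical sketch: slurp a file and decode its JSON content.
sub load_json {
    my ($file) = @_;
    open my $in, '<', $file or die "cannot read $file: $!";
    local $/;    # read the whole file at once
    my $text = <$in>;
    close $in;
    return JSON::PP->new->utf8->decode($text);
}

my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );
close $tmp_fh;
save_json( { a => 1, b => 2 }, $file );
my $hashref = load_json($file);
print "$_ => $hashref->{$_}\n" for sort keys %$hashref;
```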

run

Run a command. The return value is true if the command failed.

Example:

my $fail = run($cmd);
die "failed to run: $cmd\n" if $fail;
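
The "true on failure" convention used in this example can be sketched with the built-in system and $?. run_cmd is a hypothetical helper; how the module itself runs the command is not shown here.

```perl
use strict;
use warnings;

# Hypothetical sketch: run a shell command, return 1 on failure, 0 on success.
sub run_cmd {
    my ($cmd) = @_;
    system($cmd);
    return $? == 0 ? 0 : 1;    # $? is non-zero if the command failed or could not start
}

my $perl = $^X;                        # path of the running perl
my $cmd  = qq{"$perl" -e "exit 3"};    # a command that fails on purpose
my $fail = run_cmd($cmd);
print "failed to run: $cmd\n" if $fail;
```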

readable_second

Convert a number of seconds into a human-readable string.

Example:

print readable_second(11312314),"\n"; # 130 day 22 hour 18 min 34 sec
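
The conversion is repeated integer division by the unit sizes. The sketch below reproduces the output format of the example. seconds_to_text is a hypothetical helper, and omitting zero-valued units is an assumption, since the example does not show that case.

```perl
use strict;
use warnings;

# Hypothetical sketch: break seconds into day/hour/min/sec parts.
sub seconds_to_text {
    my ($sec) = @_;
    my @units = ( [ day => 86400 ], [ hour => 3600 ], [ min => 60 ], [ sec => 1 ] );
    my @parts;
    for my $unit (@units) {
        my ( $name, $size ) = @$unit;
        my $n = int( $sec / $size );
        $sec -= $n * $size;

        # Assumption: skip zero units, but always print something for 0 seconds.
        push @parts, "$n $name" if $n > 0 || ( $name eq 'sec' && !@parts );
    }
    return join ' ', @parts;
}

print seconds_to_text(11312314), "\n";    # 130 day 22 hour 18 min 34 sec
```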

check_positive_integer

Check whether the argument is a positive integer.

Example:

check_positive_integer(1);
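
What the check does on failure (die or return false) is not stated above. As an illustration only, a predicate form could look like this. is_positive_integer is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: 1 for strings like "1" or "42", 0 otherwise.
sub is_positive_integer {
    my ($n) = @_;
    return ( defined $n && $n =~ /^[1-9][0-9]*\z/ ) ? 1 : 0;
}

for my $value ( 1, 0, -3, '2.5', 'abc' ) {
    printf "%-4s %s\n", $value, is_positive_integer($value) ? 'ok' : 'not a positive integer';
}
```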

filename_prefix

Get the filename prefix, i.e. the file name without its extension.

Example:

filename_prefix("test.fa"); # "test"
filename_prefix("tmp");     # "tmp"
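
A minimal sketch that gives the same results for the two cases above, by stripping the last extension. prefix_of is a hypothetical helper, and its handling of other inputs (paths, multiple extensions, dot files) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: remove the final ".ext" unless the name starts with the dot.
sub prefix_of {
    my ($file) = @_;
    ( my $prefix = $file ) =~ s/(?<=[^\/\\])\.[^.\/\\]+\z//;
    return $prefix;
}

print prefix_of("test.fa"), "\n";    # test
print prefix_of("tmp"),     "\n";    # tmp
```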

check_all_files_exist

Check whether all the given files exist.

check_in_out_dir

Check the input and output directories.

Example:

check_in_out_dir("~/dir", "~/dir.out");

rm_and_mkdir

Make a directory, removing it first if it already exists.

Example:

rm_and_mkdir("out");
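
The remove-then-create step can be sketched with the core File::Path module. recreate_dir is a hypothetical helper, not the module's code, and the example works inside a temporary directory so nothing of value is deleted.

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Temp qw(tempdir);

# Hypothetical sketch: delete $dir with all its content, then create it again.
sub recreate_dir {
    my ($dir) = @_;
    remove_tree($dir) if -e $dir;
    make_path($dir);
    return -d $dir ? 1 : 0;
}

my $base = tempdir( CLEANUP => 1 );
my $out  = "$base/out";
recreate_dir($out);
open my $fh, '>', "$out/old.txt" or die "cannot write: $!";
close $fh;
recreate_dir($out);    # old.txt is removed together with the old directory
my $state = -e "$out/old.txt" ? 'still there' : 'clean';
print "$state\n";
```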

run_time

Run a subroutine with the given arguments N times, and return the mean and standard deviation of the run time.

Example:

my $read_by_record = sub {
    my ($file) = @_;
    my $next_seq = FastaReader($file);
    while ( my $fa = &$next_seq() ) {
        my ( $header, $seq ) = @$fa;
        # print ">$header\n$seq\n";
    }
};

my ($mean, $stdev) = run_time( 8, $read_by_record, $file );
printf STDERR "\n## Compute time: %0.03f ± %0.03f s\n\n", $mean, $stdev;
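
The measurement itself can be sketched with the core Time::HiRes module. time_repeated is a hypothetical helper, and the use of the sample standard deviation (dividing by N-1) is an assumption, since the formula used by run_time is not stated here.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical sketch: time $code->(@args) $n times, return (mean, stdev) in seconds.
sub time_repeated {
    my ( $n, $code, @args ) = @_;
    my @elapsed;
    for my $i ( 1 .. $n ) {
        my $start = Time::HiRes::time();
        $code->(@args);
        push @elapsed, Time::HiRes::time() - $start;
    }
    my $mean = 0;
    $mean += $_ / $n for @elapsed;
    my $sum_sq = 0;
    $sum_sq += ( $_ - $mean )**2 for @elapsed;

    # Assumption: sample standard deviation; 0 when there is a single run.
    my $stdev = $n > 1 ? sqrt( $sum_sq / ( $n - 1 ) ) : 0;
    return ( $mean, $stdev );
}

my $work = sub { my ($limit) = @_; my $x = 0; $x += $_ for 1 .. $limit; return $x };
my ( $mean, $stdev ) = time_repeated( 5, $work, 100_000 );
printf "mean %.6f s, stdev %.6f s\n", $mean, $stdev;
```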
