NAME
inst-abbr-ids.pl - Abbreviate seq ids in FASTA files (optimized)
VERSION
version 0.242020
USAGE
inst-abbr-ids.pl <infiles> --id-regex=<str> [optional arguments]
REQUIRED ARGUMENTS
- <infiles>
-
Path to input FASTA files [repeatable argument].
- --id-regex=<str>
-
Regular expression for capturing the original seq id.
The argument value can be either a predefined regex or a custom regex given on the command line (in the latter case, remember to escape special chars). The following predefined regexes are available (each assumes a leading '>'):
- :DEF (first stretch of non-whitespace chars)
- :GI (number nnn in gi|nnn|...)
- :GNL (string xxx in gnl|yyy|xxx)
- :JGI (number nnn in jgi|xxx|nnn or jgi|xxx|nnn|yyy)
- :PAC (number nnn in xxx|PACid:nnn)
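To make the predefined regexes more concrete, here is a minimal Python sketch of patterns that match the descriptions above. The patterns, the PREDEFINED table and the capture_id helper are assumptions written for illustration; the regexes actually used by inst-abbr-ids.pl may differ in detail.

```python
import re

# Hypothetical approximations of the predefined regexes, reconstructed from
# their descriptions only; the tool's real patterns may differ.
PREDEFINED = {
    ":DEF": r"^>(\S+)",                  # first stretch of non-whitespace chars
    ":GI":  r"^>gi\|(\d+)\|",            # number nnn in gi|nnn|...
    ":GNL": r"^>gnl\|[^|\s]+\|(\S+)",    # string xxx in gnl|yyy|xxx
    ":JGI": r"^>jgi\|[^|\s]+\|(\d+)",    # number nnn in jgi|xxx|nnn[|yyy]
    ":PAC": r"^>\S+\|PACid:(\d+)",       # number nnn in xxx|PACid:nnn
}


def capture_id(defline, regex_name):
    """Return the seq id captured from a FASTA defline, or None if no match."""
    match = re.search(PREDEFINED[regex_name], defline)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up deflines, used only to exercise the patterns.
    print(capture_id(">gi|15674171|ref|NP_268346.1| hypothetical protein", ":GI"))
    print(capture_id(">jgi|Phatr2|12345|estExt_fgenesh1", ":JGI"))
    print(capture_id(">AT1G01010.1|PACid:19656964", ":PAC"))
```

In each pattern the single capture group holds the original seq id, which matches the stated purpose of --id-regex. A custom regex given on the command line would presumably need a capture group of the same kind.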
OPTIONAL ARGUMENTS
- --outdir=<dir>
-
Optional output dir that will contain the abbreviated FASTA files (will be created if needed) [default: none]. If not specified, output files are written to the same directory as the input files.
- --id-prefix-mapper=<file>
-
Path to an optional IDM file explicitly listing the infile => prefix pairs. Useful when processing multiple input files. This argument and the next one (--id-prefix) can be specified together. In such a case, however, a single pipe char is appended to the combined prefix.
- --id-prefix=<str>
-
String to use as the seq id prefix (e.g., NCBI taxon id, 4-letter code) [default: none].
- --store-id-mapper
-
Store the IDM file corresponding to each output file [default: no].
- --version
- --usage
- --help
- --man
-
Print the usual program information.
AUTHOR
Denis BAURAIN <denis.baurain@uliege.be>
CONTRIBUTOR
Valerian LUPO <valerian.lupo@doct.uliege.be>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.