NAME

physeter.pl - Taxonomic parser for BLAST reports

VERSION

version 0.213470

USAGE

                                             |||||\\\\\\\\\\\\\\\\\|||
                                        ||\\\\||||                |||\\\||
                                   ||\\\|||                            |\\\
                                |\\\|        ||||||              ||\\\\\\@@@\|       |||||
      @                      |\\\|        |\\\\\\\\\\\||       \\\\@@@@@@@@@@@\\  |\\||||\\|
     \@@|               \|||\\|    \|   |\\\|        |\\\\    \\@@@@@@@@@@@@@@@@\\\       |\\
      \@@@\|            \\||\      |\| \\\             |\\\\\\|\@@@@@@@@@@@@@@@@@@|         \\|
      \@@@@@\|         |\||\|        \\\\     ||\\\\\\\\\\@@@\\@@@@@@@@@@@@@@@@@@@\    |\\\\\\\\\\
     \@@@@@@@@\       \\||           |\\\|   \\\|      @@@@@\\@@@@@@@@@@@@@@@@@@@@@|  |\\   |\\
    \@@@@@@@@@@\|                    \\||\\\\\|   @@@@@@\\\\\@@@@@@@@@@@@@@@@@@@@@\   \\     \\|
    \\@@@@@@@@@@@@@@\\               \\|      @@@@@@@@\\\@@@@@@@@@@@@@@@@@@@@@@@@\\  |\\      \\
 \\@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\\@@@@@@@@@@@@@\\@@@@@@@@@@@@@@@@@@@@@@@@\\||\\\\     |\\|
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|@@@@@@@@@@@@\\@@@@@@@@@@@@@| |@@@@@\\\\|||\\\\      |\\|
|@@@@@@@@\\| |\\@@@@@@@@@@@@@@@@@@@@\\@@@@@@@@@@@@\@@@@@@@@@|@@@@@@@@\\||||@@@@@@@\     \\
\@@@@@\\|       \\@@@@@@@@@@@@@@@@@\\@@@@@@@@@@@@@@@@@@@@@@@\|@@@@@@\\\\\\\@@@@@\      \\
\@@@\\|            \\@@@@@@@@@@@@@@\\@@@@@@@@@@@@@@@@@@@@@@@@\\@@@@@\||\||||\|          \\
                    \\@@@@@@@@@@@\\@@@@@@@@@@@@@@@@@@@@@@@@@@\\\@\\\|\\|||\             \
                      |\@@@@@@@\\\@@@@@@@@@@@@@@@@@@@@@@@@@@@\\|| |\|\|||\|
                         |@@@\\\@@@@@@@@@@@@@@@@@@@@@@@@\\\\      |\|\||\|
                      |\\\\\\\||@@@@@@@@@@@@@@@@@@@@@\|  |\|       \|\||\
                    |\\\\||                    \@@@\|   |\|        \|\|\
                                               |@@\   |\\        |\\\\\\|
                                               \@     \|         \||\\||\
                                                     |\          |\|\||\
                                                                  |\\\|
                                                                   \\

   physeter.pl <infiles> --outfile=<file> --taxdir=<dir> --taxon-list=<file> \
       [optional arguments]

REQUIRED ARGUMENTS

<infiles>

Path to input BLAST report files [repeatable argument].

Report files must be named <assembly_accession>.blastx unless the organism does not have an assembly accession. In the latter case, use the --exp-tax option to provide the expected taxonomy of the organism.
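The naming rule can be illustrated with a minimal Python sketch that recovers the accession from a report filename. The accession pattern (GCA_/GCF_ prefix, nine digits, a version suffix) and the function name accession_from_report are assumptions made for illustration; they are not taken from physeter.pl itself.

```python
import re
from pathlib import Path

# Assumed accession shape: GCA_/GCF_ + 9 digits + ".version".
# This pattern is an illustration, not something the manual specifies.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")

def accession_from_report(path):
    """Return the assembly accession encoded in a report filename, or None."""
    name = Path(path).name
    if not name.endswith(".blastx"):
        return None
    stem = name[: -len(".blastx")]
    return stem if ACCESSION_RE.match(stem) else None
```

A file for which this returns None would be a candidate for the --exp-tax option.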

--outfile=<file>

Path to output file.

--taxdir=<dir>

Path to local mirror of the NCBI Taxonomy database.

--taxon-list=<file>

List of taxa to consider when looking for foreign sequences. This labeler file is used throughout the program to truncate LCA lineages at specific taxonomic levels (which can vary from one lineage to another).
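As a rough illustration of lineage truncation, the Python sketch below cuts a root-to-leaf lineage at its deepest taxon that appears in the list. Both the choice of the deepest match and the name truncate_lineage are assumptions; the actual labeler logic lives in the Bio::MUST modules and may differ.

```python
def truncate_lineage(lineage, labels):
    """Cut a root-to-leaf lineage at its deepest taxon present in `labels`.

    Assumption: the deepest listed taxon wins; the real labeler may
    resolve matches differently.
    """
    for depth in range(len(lineage) - 1, -1, -1):
        if lineage[depth] in labels:
            return lineage[: depth + 1]
    return []  # no listed taxon occurs in this lineage
```

Because the cut point depends on which listed taxon a lineage contains, two lineages can be truncated at different taxonomic levels.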

OPTIONS

--threads=<n>

Number of threads to run in parallel [default: n.default]. Parallelization is achieved by processing several BLAST files in parallel using an internal queue. Therefore, the specified number of threads should not be larger than the number of input BLAST files.
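The relationship between thread count and file count can be sketched in Python as follows. This is only an illustration of per-file parallelism: ThreadPoolExecutor stands in for the program's internal queue, and effective_workers, process_all and parse_report are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def effective_workers(threads, infiles):
    """Workers beyond the number of files would stay idle, so cap the count."""
    return max(1, min(threads, len(infiles)))

def process_all(infiles, threads, parse_report):
    """Apply `parse_report` (a hypothetical per-file parser) to each file."""
    workers = effective_workers(threads, infiles)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per BLAST file; results come back in input order.
        return list(pool.map(parse_report, infiles))
```

With two input files and --threads=8, only two workers would ever have work to do.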

--fasta-dir=<dir>

Path to the directory holding the FASTA query files [default: dir.default]. FASTA files must have the same basenames as BLAST infiles.

--exp[ected]-tax[on]=<string>

Organism taxon [default: automatic]. Use this option when the organism does not have an assembly accession. The specified taxon must be in the file provided with the --taxon-list option.

--auto-detect

Determine organism taxon based on BLAST infile [default: no].

--greedy-taxa

Enable greedy behavior when interpreting the ambiguous taxa provided in the required argument --taxon-list [default: no].

--tax-min-ident=<n>

Minimum identity percentage to consider a hit when computing an LCA [default: n.default].

--tax-min-len=<n>

Minimum alignment length to consider a hit when computing an LCA [default: n.default].

--tax-min-score=<n>

Minimum bit score to consider a hit when computing an LCA [default: n.default].
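The three --tax-min-* thresholds above act as independent filters on each hit. A minimal Python sketch, assuming hits are dictionaries with hypothetical keys identity, length and bit_score, and assuming the thresholds are inclusive:

```python
def passes_thresholds(hit, min_ident, min_len, min_score):
    """True if a hit meets all three minimums (inclusive, by assumption)."""
    return (
        hit["identity"] >= min_ident
        and hit["length"] >= min_len
        and hit["bit_score"] >= min_score
    )

def filter_hits(hits, min_ident, min_len, min_score):
    """Keep only the hits eligible for LCA computation."""
    return [h for h in hits if passes_thresholds(h, min_ident, min_len, min_score)]
```

A hit failing any single threshold is discarded before LCA inference.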

--tax-score-mul=<n>

Bit score reduction allowed when accumulating hits for LCA inference (MEGAN-like algorithm) [default: n.default].

--tax-min-hits=<n>

Minimum number of hits to use when computing LCAs [default: n.default]. Must be lower than or equal to the next optional argument (--tax-max-hits).

--tax-max-hits=<n>

Maximum number of hits to use when computing LCAs [default: n.default]. Must be greater than or equal to the previous optional argument (--tax-min-hits).
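The interplay of --tax-score-mul, --tax-min-hits and --tax-max-hits can be sketched in Python under explicit assumptions: the score multiplier is taken as a factor applied to the best bit score (hits scoring at least best * score_mul are retained), the list is capped at the maximum, and no hits are returned when fewer than the minimum survive. These semantics and the name accumulate_hits are illustrative guesses, not a description of the actual implementation.

```python
def accumulate_hits(bit_scores, score_mul, min_hits, max_hits):
    """MEGAN-like selection of hits for LCA inference (assumed semantics)."""
    if min_hits > max_hits:
        raise ValueError("min_hits must be lower than or equal to max_hits")
    ranked = sorted(bit_scores, reverse=True)
    if not ranked:
        return []
    # Assumption: the cutoff is a fraction of the best bit score.
    threshold = ranked[0] * score_mul
    kept = [s for s in ranked if s >= threshold][:max_hits]
    # Assumption: too few hits means no LCA is attempted.
    return kept if len(kept) >= min_hits else []
```

Under this reading, a multiplier close to 1 keeps only near-best hits, while a smaller value lets weaker hits contribute to the LCA.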

--tax-min-lca-freq=<n>

Minimum frequency for the common taxon when computing an LCA [default: n.default]. When specified and lower than 1.0, the LCA inference algorithm returns the lowest taxon that is found in at least this fraction of lineages (instead of returning the lowest taxon found in all lineages).
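The frequency-relaxed LCA described here can be illustrated with a short Python sketch. It assumes lineages are root-to-leaf lists of taxon names and walks down one rank at a time, following the most frequent taxon while it is still supported by the required fraction of all lineages. The function name lca_with_min_freq is hypothetical, and the tie handling (first most frequent taxon wins) is an arbitrary choice.

```python
from collections import Counter

def lca_with_min_freq(lineages, min_freq=1.0):
    """Lowest taxon present in at least `min_freq` of the input lineages."""
    if not lineages:
        return None
    needed = min_freq * len(lineages)  # support required at every depth
    best, depth = None, 0
    while True:
        counts = Counter(l[depth] for l in lineages if len(l) > depth)
        if not counts:
            return best  # ran past the end of every remaining lineage
        taxon, n = counts.most_common(1)[0]
        if n < needed:
            return best  # no taxon at this depth has enough support
        best = taxon
        # Only lineages that follow the chosen path can support deeper taxa.
        lineages = [l for l in lineages if len(l) > depth and l[depth] == taxon]
        depth += 1
```

With min_freq set to 1.0 this reduces to the strict LCA; lower values tolerate a minority of discordant lineages.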

--kraken

Write KRAKEN-like report file [default: no].

--anvio

Write ANVIO-like report file [default: no].

--krona

Write KRONA-compatible report file [default: no].

--lca

Write LCA report file (including the lineage of each query) [default: no].

--kfold=<file>

Enable the k-fold mode [default: no]. The provided file must contain the list of the NCBI GCA/GCF accessions of all the genomes composing the complete reference database (one accession per line).

--kfold-seed=<n>

Seed for the random number generator [default: none]. Use this to obtain predictable subsets of the database in k-fold mode.
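The effect of a fixed seed can be sketched in Python: shuffling the accession list with a seeded generator yields the same subsets on every run. The number of folds k, the round-robin split and the name kfold_subsets are assumptions for illustration; the manual does not describe how physeter.pl actually partitions the database.

```python
import random

def kfold_subsets(accessions, k, seed=None):
    """Split accessions into k subsets; a fixed seed makes them reproducible."""
    rng = random.Random(seed)      # private generator, no global state
    shuffled = sorted(accessions)  # sort first so input order is irrelevant
    rng.shuffle(shuffled)
    # Round-robin assignment keeps subset sizes within one of each other.
    return [shuffled[i::k] for i in range(k)]
```

Calling the function twice with the same seed returns identical subsets, whereas seed=None gives a different partition on each run.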

--version
--usage
--help
--man

Print the usual program information.

COLOPHON

physeter.pl is based on the script originally developed by Luc CORNET and Denis BAURAIN for the article Cornet, L., et al. (2018). "Consensus assessment of the contamination level of publicly available cyanobacterial genomes." PLoS One 13(7): e0200323. The code was first completely rewritten by Valerian LUPO to use the Bio::MUST modules and then further reviewed and refactored by D. BAURAIN. Mick VAN VLIERBERGHE greatly contributed to the taxonomic methods offered by Bio::MUST modules.

AUTHOR

Denis BAURAIN <denis.baurain@uliege.be>

CONTRIBUTORS

  • Valerian LUPO <valerian.lupo@doct.uliege.be>

  • Mick VAN VLIERBERGHE <mvanvlierberghe@doct.uliege.be>

  • Luc CORNET <luc.cornet@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2020 by University of Liege / Unit of Eukaryotic Phylogenomics / Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.