NAME
export_bgc_sql_tables.pl - Exports SQL tables of BGC data (Palantir and antiSMASH annotations)
VERSION
version 0.192540
DESCRIPTION
This tool exports SQL tables that structure the biosynthetic gene cluster (BGC) data extracted from antiSMASH reports and annotated with Palantir.
USAGE
export_bgc_sql_tables.pl [options] --infiles [=] <report_paths>...
export_bgc_sql_tables.pl [options] --file-table [=] <tsv_file>
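REQUIRED ARGUMENTS
Either --infiles or --file-table must be given to specify the input reports (see USAGE). Both are described under OPTIONAL ARGUMENTS.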
OPTIONAL ARGUMENTS
- --infiles [=] <report_paths>...
Paths to biosynML.xml (antiSMASH 3-4) or regions.js (antiSMASH 5) files. This option accepts multiple values.
- --file-table [=] <tsv_file>
TSV (tab-separated values) file that unambiguously gives the paths to the XML reports, proteomes and QUAST files, one set per line, in this column order: XML report (1st column), proteome (2nd column) and QUAST file (3rd column). If you only want to parse XML and QUAST reports, use undef as a placeholder in the proteome column, following this format: "biosynML.xml undef quast.tsv" (columns separated by tabs).
- --types [=] <str>...
Restrict the export to clusters of one or more specific types.
Types allowed: acyl_amino_acids, amglyccycl, arylpolyene, bacteriocin, butyrolactone, cyanobactin, ectoine, hserlactone, indole, ladderane, lantipeptide, lassopeptide, microviridin, nrps, nucleoside, oligosaccharide, otherks, phenazine, phosphonate, proteusin, PUFA, resorcinol, siderophore, t1pks, t2pks, t3pks, terpene.
Any combination of these types, such as nrps-t1pks or t1pks-nrps, is also allowed. The argument is repeatable.
- --taxdir [=] <dir>
Path to a local mirror of the NCBI Taxonomy database.
- --idm[-file] [=] <file>
Path to an ID mapper file used to retrieve the assembly accession numbers. The file must be in tabular format, with the accession numbers in the second column.
- --proteomes
Use the organism proteomes to predict domains with external profile HMMs (pHMMs) and include them in the SQL database.
- --quast
Create an additional table "Assemblies" with QUAST statistics. This option requires the transposed_report.tsv output of QUAST, renamed with the basename of the corresponding report file. For example, if you use my_org.xml, name your QUAST file my_org.tsv.
- --contam-file [=] <file>
Add an SQL table with CheckM contamination results (tabular file). This option was designed for the interface database.
- --new-db
Remove the existing SQL tables and rebuild the database from scratch.
- --db-name [=] <str>
Name of your database [default: bgc_db].
- --gap-filling [=] <bool>
Try to find missing domains when clusters contain gaps [default: 1].
- --undef-cleaning [=] <bool>
Remove undefined (undef) domains from the antiSMASH output when they cannot be recovered [default: 1].
- --undef-recov [=] <bool>
Try to recover the values of undefined (undef) antiSMASH domains [default: 0].
- --evalue-threshold [=] <n>
E-value threshold to apply in HMMER searches [default: 1e-4].
- --cpu [=] <n>
Number of threads/cpus to use [default: 1].
- --version
- --usage
- --help
- --man
Print the usual program information (version, usage summary, help or full manual).
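The --file-table layout can be illustrated with a short Python sketch. This is not the tool's own Perl code. It reads each line into a report, proteome and QUAST path, treating the literal undef as a missing value. The names ReportFiles and parse_file_table are hypothetical. Padding short rows with undef is an assumption, not documented behaviour.

```python
from typing import Iterable, List, NamedTuple, Optional


class ReportFiles(NamedTuple):
    """One row of the file table: XML report, proteome and QUAST file."""
    report: str
    proteome: Optional[str]
    quast: Optional[str]


def _value(field: str) -> Optional[str]:
    """Map the 'undef' placeholder (or an empty cell) to None."""
    field = field.strip()
    return None if field in ("", "undef") else field


def parse_file_table(lines: Iterable[str]) -> List[ReportFiles]:
    """Parse tab-separated lines into ReportFiles, skipping blank lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        # Assumption: missing trailing columns behave like 'undef'
        fields = (line.split("\t") + ["undef", "undef"])[:3]
        rows.append(ReportFiles(fields[0].strip(), _value(fields[1]), _value(fields[2])))
    return rows
```

For example, the line "biosynML.xml<TAB>undef<TAB>quast.tsv" yields a row with no proteome.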
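For --types, the following hypothetical Python helper sketches how a value could be checked against the allowed types. It assumes that a combination is written as allowed types joined by hyphens, as in nrps-t1pks. No allowed type contains a hyphen, so splitting on "-" recovers the components. Case-sensitive matching is an assumption.

```python
ALLOWED_TYPES = frozenset({
    "acyl_amino_acids", "amglyccycl", "arylpolyene", "bacteriocin",
    "butyrolactone", "cyanobactin", "ectoine", "hserlactone", "indole",
    "ladderane", "lantipeptide", "lassopeptide", "microviridin", "nrps",
    "nucleoside", "oligosaccharide", "otherks", "phenazine", "phosphonate",
    "proteusin", "PUFA", "resorcinol", "siderophore", "t1pks", "t2pks",
    "t3pks", "terpene",
})


def is_valid_type(value: str) -> bool:
    """Return True if value is an allowed type or a hyphen-joined combination.

    Assumption: matching is case-sensitive, so 'pufa' is rejected.
    """
    parts = value.split("-")
    # Every component (including empty ones from stray hyphens) must be allowed
    return all(part in ALLOWED_TYPES for part in parts)
```

Because --types is repeatable, a caller would run this check once per value supplied.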
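The --idm file can be pictured as a lookup table. The hypothetical Python sketch below assumes tab-separated columns and a first column that holds the identifier being looked up. The documentation only states that the accession numbers are in the second column.

```python
from typing import Dict, Iterable


def load_id_mapper(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first column (assumed key) to the accession in the second column."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or incomplete lines
        mapping[fields[0]] = fields[1]
    return mapping
```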
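The QUAST naming rule for --quast (my_org.xml pairs with my_org.tsv) amounts to replacing the report's extension with .tsv. Below is a minimal Python sketch with a hypothetical function name. It assumes that only the last extension is replaced, and it returns a bare file name without any directory.

```python
from pathlib import PurePath


def expected_quast_name(report_path: str) -> str:
    """Return the QUAST file name for a report: same basename, .tsv suffix.

    Assumption: only the final extension is swapped and the directory is dropped.
    """
    return PurePath(report_path).with_suffix(".tsv").name
```

For instance, reports/my_org.xml maps to my_org.tsv, the name under which QUAST's transposed_report.tsv would be saved.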
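The --evalue-threshold option sets the cutoff applied to HMMER hits. As a hedged illustration, this Python sketch keeps (domain, E-value) pairs at or below the default of 1e-4. Whether the tool keeps hits exactly equal to the threshold, and how it represents hits internally, are both assumptions.

```python
from typing import Iterable, List, Tuple

DEFAULT_EVALUE = 1e-4  # documented default of --evalue-threshold


def filter_hits(
    hits: Iterable[Tuple[str, float]], threshold: float = DEFAULT_EVALUE
) -> List[Tuple[str, float]]:
    """Keep (domain, e-value) hits whose E-value is <= threshold (assumed inclusive)."""
    return [(name, evalue) for name, evalue in hits if evalue <= threshold]
```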
AUTHOR
Loic MEUNIER <lmeunier@uliege.be>
CONTRIBUTOR
Denis BAURAIN <denis.baurain@uliege.be>
COPYRIGHT AND LICENSE
This software is copyright (c) 2019 by University of Liege / Unit of Eukaryotic Phylogenomics / Loic MEUNIER and Denis BAURAIN.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.