NAME

Bio::Palantir - core classes and utilities for Bio::Palantir

VERSION

version 0.211420

SYNOPSIS

use Bio::Palantir;

# open and parse biosynML.xml or regions.js antiSMASH report
my $infile = 'biosynML.xml'; 
my $report = Bio::Palantir::Parser->new( file => $infile );

# get main container
my $root = $report->root;

# explore Biosynthetic Gene Clusters (BGCs) content

# Bio::Palantir::Parser
for my $cluster ($root->all_clusters) {     # returns all clusters say
    $cluster->type;                         # returns the cluster type (e.g., nrps)
    
    for my $gene ($cluster->all_genes) {        # returns all genes say
        $gene->name;                            # for instance, returns the gene name say $gene->genomic_coordinates;     # returns DNA gene coordinates (relative to the genome) 
        say $gene->coordinates;                 # returns protein gene coordinates (also relative to the genome) 
        say $gene->protein_sequence;              # returns the gene protein sequence 

        # if the BGC possess domains (i.e., NRPS/PKS)
        for my $domain ($gene->all_domains) {   # returns all domains
        
            say $domain->rank;                  # for instance, returns the domain in the gene 
            say $domain->function;              # returns the domain function (e.g., condensation) 
            say join '-', $domain->coordinates; # returns the coordinates (which are relative to the gene ones)
            say $domain->protein_sequence;      # returns the domain protein sequence

            # lowest level is Motifs (for antiSMASH 3 and 4)
            for my $motif ($domain->all_motifs) {
                #...
            } 
        }

    # same way for looping into Module objects 
    for my $module ($cluster->all_modules) {
        # ...
    }
}


# Bio::Palantir::Refiner
use aliased 'Bio::Palantir::Refiner';
use aliased 'Bio::Palantir::Refiner::ClusterPus';

# it is possible to create Bio::Palantir::Refiner objects from already existing Bio::Palantir::Parser ones
my @cluster_plus;

for my $cluster ($root->all_clusters) { 
    push @cluster_plus, ClusterPlus->new( _cluster => $cluster ); 
}

# but if you intend to use the Refiner part, it is more convenient to create the Refiner object directly from a file
my $report = Refiner->new( file => biosynML.xml);

for my $cluster_plus ($report->all_clusters) {
    
    say $cluster_plus->type;

    for my $gene_plus ($cluster_plus->all_genes) {

        say $gene_plus->name;

        for my $domain_plus ($gene_plus->all_domains) {
            
            say 'Palantir version:'; 
            say $domain_plus->function; 
            say $domain_plus->coordinates; 
            say $domain_plus->evalue;
            
            # compare with antiSMASH results
            say 'antiSMASH version:'; say $domain_plus->_domain->function;
            say $domain_plus->_domain->coordinates;
            # say $domain_plus->evalue; # only available for Palantir part

        } 

    }

}


# Bio::Palantir::Explorer
use aliased 'Bio::Palantir::Explorer::ClusterFasta';

# from a Bio::Palantir::Refiner object
for my $cluster_plus ($report->all_clusters) {
    
    for my $gene_plus ($report->all_genes) {

        for my $domain_exp ($gene_plus->all_exp_domains) {

            say $domain_exp->function; 
            say $domain_exp->coordinates; 
            say $domain_exp->evalue;

        }

    }

}

# from a FASTA file (containing ONLY one BGC, each sequence being interpreted as a gene from the cluster)
my $cluster_exp = ClusterFasta->new( fasta => nrps_bgc.fasta );

for my $gene_exp ($cluster_exp->all_genes) {

    for my $domain_exp ($gene_exp->all_domains) {
            
            say $domain_exp->function; 
            say $domain_exp->coordinates; 
            say $domain_exp->evalue;

    }

}

DESCRIPTION

This distribution is the base of the Bio::Palantir module collection designed as a toolbox for handling the post-processing of antiSMASH report data (https://antismash.secondarymetabolites.org) and improving in some aspects its annotation of NRPS/PKS Biosynthetic Gene Clusters (BGCs), aiming then to support small and large-scale genome mining projects.

The Palantir libraries are organized as follows:

Bio::Palantir::Parser contains classes for hierarchically storing the information of antiSMASH gene clusters.

Bio::Palantir::Refiner consists in classes (parallel to Parser) dedicated to the improvement of NRPS/PKS gene clusters parallel classes to Bio::Palantir::Parser.

Bio::Palantir::Explorer contains classes (also parallel to Parser) giving access to an exploratory version of detected domains

More information on their internal structure can be found in their respective file.

Here is the list of functionalities offered by Palantir libraries and bins:

Refinement of NRPS/PKS BGC annotations

- Dynamic elongation of the coordinates of core domains: enrich the information contained in the sequences (application examples: improved similarity searches and evolutionary approaches)

- Filling the gaps in BGC annotation: retrieve missed domains from exceptions in the rules detection (application example: resolution of ambiguous or incoherent BGC annotation)

- Module delimitation: apply biological rules to group domains in modules (application example: analyses at module scale)

- BGC visualization: visualize and compare antismash and Palantir annotations [bin/draw_clusters.pl]

- Exploratory mode visualization: visualize and design the domain architecture consensus from a raw view of all detected signatures (application example: manual curation of the domaine architecture consensus)

BGC data manipulation

- Generation of PDF/Word reports: export customizable reports of refined BGC data (application example: manual reading of numerous (filtered) BGC data)

- Extraction of sequences: export Fasta files from BGC data at different scales: cluster, gene, module, domain (application example: data formatting for downstream analyses)

- Generation of SQL tables: export SQL tables containing BGC data details (application example: large-scale queries and statistics)

AUTHOR

Loic MEUNIER <lmeunier@uliege.be>

CONTRIBUTOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by University of Liege / Unit of Eukaryotic Phylogenomics / Loic MEUNIER and Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.