NAME

Chemistry::OpenSMILES - OpenSMILES format reader

SYNOPSIS

use Chemistry::OpenSMILES::Parser;

my $parser = Chemistry::OpenSMILES::Parser->new;
my @moieties = $parser->parse( 'C#C.c1ccccc1' );

$\ = "\n";
for my $moiety (@moieties) {
    #  $moiety is a Graph::Undirected object
    print scalar $moiety->vertices;
    print scalar $moiety->edges;
}

DESCRIPTION

Chemistry::OpenSMILES provides support for SMILES chemical identifiers conforming to the OpenSMILES v1.0 specification (<http://opensmiles.org/opensmiles.html>).

Chemistry::OpenSMILES::Parser reads SMILES strings and returns them parsed into arrays of Graph::Undirected objects. Each atom is represented by a hash. The parser applies no chemical inference heuristics: it returns only the properties stated in the SMILES descriptor. Deriving the number of implicit hydrogens and a standard aromaticity representation is therefore left to the user.
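Since implicit hydrogen counts are not computed by the parser, the following is a minimal sketch of how a user might derive them. It assumes the "normal valence" rule of the OpenSMILES specification for atoms of the organic subset written without brackets, and it ignores aromatic atoms. The helper implicit_hcount() and its second argument (the sum of the bond orders of the atom) are hypothetical and are not part of Chemistry::OpenSMILES.

```perl
use strict;
use warnings;

# Normal valences of the organic subset, as listed in the OpenSMILES
# specification.
my %NORMAL_VALENCES = (
    B  => [ 3 ],
    C  => [ 4 ],
    N  => [ 3, 5 ],
    O  => [ 2 ],
    P  => [ 3, 5 ],
    S  => [ 2, 4, 6 ],
    F  => [ 1 ],
    Cl => [ 1 ],
    Br => [ 1 ],
    I  => [ 1 ],
);

# Hypothetical helper, not provided by the module.
# $bond_order_sum is the sum of the orders of all bonds of the atom.
sub implicit_hcount {
    my( $symbol, $bond_order_sum ) = @_;

    my $valences = $NORMAL_VALENCES{$symbol};
    return 0 unless $valences;   # not in the organic subset

    # Pick the smallest normal valence that accommodates the bonds
    for my $valence (@$valences) {
        return $valence - $bond_order_sum if $valence >= $bond_order_sum;
    }
    return 0;                    # bonds exceed every normal valence
}

print implicit_hcount( 'C', 3 ), "\n"; # a carbon of C#C: prints 1
print implicit_hcount( 'N', 4 ), "\n"; # prints 1
print implicit_hcount( 'S', 6 ), "\n"; # prints 0
```

Atoms written in square brackets state their hydrogen count explicitly, so this rule should not be applied to them.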

Molecular graph

Disconnected parts of a compound are represented as separate Graph::Undirected objects. Atoms are represented as vertices, and bonds are represented as edges.

Atoms

Atoms, or vertices of a molecular graph, are represented as hash references:

{
    "symbol"    => "C",
    "isotope"   => 13,
    "chirality" => "@@",
    "hcount"    => 3,
    "charge"    => "+",
    "class"     => 0,
    "number"    => 0,
}

Except for symbol, class and number, all keys of the hash are optional. Per the OpenSMILES specification, the default value for class is 0.
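To illustrate how the keys map onto SMILES syntax, here is a small sketch that writes an atom hash back as a bracket atom. The helper atom_to_bracket() is hypothetical and is not provided by the module. It assumes that optional keys are simply absent when not given, and that charge is stored as the string found in the descriptor, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::OpenSMILES.
# Field order follows the bracket atom grammar of OpenSMILES:
# isotope, symbol, chirality, hydrogen count, charge, class.
sub atom_to_bracket {
    my( $atom ) = @_;

    my $out = '[';
    $out .= $atom->{isotope}   if defined $atom->{isotope};
    $out .= $atom->{symbol};
    $out .= $atom->{chirality} if defined $atom->{chirality};
    if( $atom->{hcount} ) {
        $out .= 'H';
        $out .= $atom->{hcount} if $atom->{hcount} > 1;
    }
    $out .= $atom->{charge}      if defined $atom->{charge};
    $out .= ':' . $atom->{class} if $atom->{class}; # 0 is the default
    return $out . ']';
}

my $atom = {
    symbol    => 'C',
    isotope   => 13,
    chirality => '@@',
    hcount    => 3,
    charge    => '+',
    class     => 0,
    number    => 0,
};

print atom_to_bracket( $atom ), "\n"; # prints [13C@@H3+]
```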

Bonds

Bonds, or edges of a molecular graph, rely completely on the internal representation of Graph::Undirected. Bond orders other than single (-, which is also the default) are represented as values of the edge attribute bond. They correspond to the symbols used in the OpenSMILES specification.
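Because the bond attribute holds a symbol rather than a number, a user who needs numeric bond orders has to translate it. The sketch below shows one possible mapping. The helper bond_order() is hypothetical, the value 1.5 for aromatic bonds is only a common convention, and a missing attribute is treated as a single bond as described above.

```perl
use strict;
use warnings;

# Bond symbols of the OpenSMILES specification and assumed numeric orders
my %BOND_ORDER = (
    '-'  => 1,
    '='  => 2,
    '#'  => 3,
    '$'  => 4,
    ':'  => 1.5,   # aromatic; 1.5 is a convention, not a module value
    '/'  => 1,     # directional single bond
    '\\' => 1,     # directional single bond
);

# Hypothetical helper, not part of Chemistry::OpenSMILES.
# $symbol is the value of the 'bond' edge attribute, or undef if unset.
sub bond_order {
    my( $symbol ) = @_;
    return 1 unless defined $symbol;
    die "unknown bond symbol '$symbol'\n" unless exists $BOND_ORDER{$symbol};
    return $BOND_ORDER{$symbol};
}

print bond_order( '#' ), "\n";   # prints 3
print bond_order( undef ), "\n"; # prints 1
```

For a parsed moiety, the symbol of a bond between vertices $u and $v could presumably be read with the get_edge_attribute( $u, $v, 'bond' ) method of Graph and passed to such a helper.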

CAVEATS

Element symbols in square brackets are not limited to the ones known to chemistry. Currently any one- or two-letter symbol is allowed.

Deprecated charge notations (-- and ++) are supported.
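If charges are needed as integers, both the deprecated and the current notations can be normalised by the user. The following sketch assumes that charge is stored as the string written in the descriptor (for example "+", "--" or "+2"). The helper charge_to_int() is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::OpenSMILES.
sub charge_to_int {
    my( $charge ) = @_;
    return 0 unless defined $charge && length $charge;

    # Sign followed by digits, e.g. "+2" or "-3"
    if( $charge =~ /^([+-])(\d+)$/ ) {
        return $1 eq '+' ? 0 + $2 : 0 - $2;
    }

    # Single sign, or the deprecated doubled notation "++" and "--"
    if( $charge =~ /^(\+\+?|--?)$/ ) {
        my $magnitude = length $charge;
        return $charge =~ /^\+/ ? $magnitude : -$magnitude;
    }

    die "unrecognised charge '$charge'\n";
}

print charge_to_int( '+' ), "\n";  # prints 1
print charge_to_int( '--' ), "\n"; # prints -2
print charge_to_int( '+2' ), "\n"; # prints 2
```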

The OpenSMILES specification mandates a strict order of ring bonds and branches:

branched_atom ::= atom ringbond* branch*

Chemistry::OpenSMILES::Parser supports both the mandated structure and the inverted one, where ring bonds follow branch descriptions.

Whitespace is not supported yet. SMILES descriptors must be stripped of it before they are passed to Chemistry::OpenSMILES::Parser.
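A minimal way to clean a descriptor is a single substitution, sketched below. The helper strip_whitespace() is hypothetical. It assumes the string holds nothing but the SMILES descriptor: if a name or comment follows the descriptor on the same line, it has to be split off first, otherwise it would be glued onto the descriptor.

```perl
use strict;
use warnings;

# Hypothetical helper: removes all whitespace, including newlines
sub strip_whitespace {
    my( $smiles ) = @_;
    $smiles =~ s/\s+//g;
    return $smiles;
}

print strip_whitespace( " C#C . c1ccccc1\n" ), "\n"; # prints C#C.c1ccccc1
```

The returned string can then be handed to the parse() method shown in the synopsis.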

SEE ALSO

perl(1)

AUTHORS

Andrius Merkys, <merkys@cpan.org>