NAME

Chemistry::FormulaPattern - Match molecule by formula

SYNOPSIS

use Chemistry::FormulaPattern;

# somehow get a bunch of molecules...
use Chemistry::File::SDF;
my @mols = Chemistry::Mol->read("file.sdf");

# we want molecules with six carbons and eight or more hydrogens
my $patt = Chemistry::FormulaPattern->new("C6H8-");

for my $mol (@mols) {
    if ($patt->match($mol)) {
        print $mol->name, " has a nice formula!\n";
    }
}

# a concise way of selecting molecules with grep
my @matches = grep { $patt->match($_) } @mols;

DESCRIPTION

This module implements a simple language for describing a range of molecular formulas and tests whether a molecule matches such a specification. It can be used to search for molecules by formula, much like the NIST WebBook formula search (http://webbook.nist.gov/chemistry/form-ser.html). Note, however, that the language used by this module is different from the one used by the WebBook!

Chemistry::FormulaPattern has the same interface as Chemistry::Pattern. To perform a pattern matching operation on a molecule, follow these steps.

1) Create a pattern object by parsing a string. Let's assume that the pattern object is stored in $patt and that the molecule is $mol.

2) Execute the pattern on the molecule by calling $patt->match($mol).

If $patt->match returns true, there was a match. If $patt->match is called repeatedly with the same molecule, it returns true on the first call (if there is a match), false on the second, true again on the third, and so on. This is because the Chemistry::Pattern interface is designed to allow multiple matches for a given molecule and returns false once there are no further matches; a formula pattern has only one possible match.

$patt->match($mol); # may return true
$patt->match($mol); # always false
$patt->match($mol); # may return true
$patt->match($mol); # always false
# ...

This allows one to use the standard while loop for all kinds of patterns without having to worry about endless loops:

# $patt might be a Chemistry::Pattern, Chemistry::FormulaPattern,
# or Chemistry::MidasPattern object
while ($patt->match($mol)) {
    # do something
}
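
The alternation can be illustrated without the module itself. The following is a minimal sketch, assuming a hypothetical ToyPattern class and a molecule represented as a plain hash of element counts; it is not the code of Chemistry::FormulaPattern, only a model of the documented behavior of match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a formula pattern; NOT the module's code.
# It only mimics the documented true/false alternation of match().
package ToyPattern;

sub new {
    my ($class, $test) = @_;
    return bless { test => $test, exhausted => 0 }, $class;
}

sub match {
    my ($self, $mol) = @_;
    if ($self->{exhausted}) {      # the single match was already reported
        $self->{exhausted} = 0;    # reset so the next call starts over
        return 0;
    }
    my $ok = $self->{test}->($mol) ? 1 : 0;
    $self->{exhausted} = $ok;      # only a successful match needs a reset
    return $ok;
}

package main;

# A "molecule" here is just a hash of element counts (an assumption).
my $patt = ToyPattern->new(sub { $_[0]{C} == 6 });
my $mol  = { C => 6, H => 6 };

my $iterations = 0;
while ($patt->match($mol)) {
    $iterations++;                 # the body runs once, then the loop ends
}
print "loop body ran $iterations time(s)\n";
```

Because the call after a successful match always returns false, the loop body runs exactly once for a matching molecule and not at all for a non-matching one.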

Also note that formula patterns don't really have the concept of an atom map, so $patt->atom_map and $patt->bond_map always return the empty list.

FORMULA PATTERN LANGUAGE

In the simplest case, a formula pattern may be just a regular formula, as used by the Chemistry::File::Formula module. For example, the pattern "C6H6" will only match molecules with six carbons, six hydrogens, and no other atoms.

The count for an element may also be given as a range, written as two hyphen-separated numbers. "C6H8-10" will match molecules with six carbons and eight to ten hydrogens.

Ranges may also be open, by omitting the upper bound. "C6H0-" will match molecules with six carbons and any number of hydrogens (i.e., zero or more).

A formula pattern may also allow for unspecified elements by means of the asterisk special character, which can be placed anywhere in the formula pattern. For example, "C2H6*" (or "C2*H6", etc.) will match C2H6, and also C2H6O, C2H6S, C2H6SO, etc.
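
The rules described so far (exact counts, closed ranges, open ranges, and the asterisk) can be modeled in a few lines of plain Perl. This is only a sketch of the semantics, not the module's parser: parse_flat_pattern and counts_match are hypothetical helpers, molecules are represented as hashes of element counts, and the sketch assumes that each element appears at most once in the pattern and that there are no parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Chemistry::FormulaPattern.
# Parses a flat pattern (no parentheses, each element listed once) into
# per-element [min, max] ranges; max is undef for an open range.
# Also reports whether an asterisk was present.
sub parse_flat_pattern {
    my ($pattern) = @_;
    my $allow_others = ($pattern =~ s/\*//g) ? 1 : 0;
    my %range;
    while ($pattern =~ /\G([A-Z][a-z]?)(?:(\d+)(?:(-)(\d*))?)?/gc) {
        my ($elem, $low, $dash, $high) = ($1, $2, $3, $4);
        $low = 1 unless defined $low;        # "C" alone means one carbon
        my $max = !defined $dash ? $low      # exact count
                : length $high   ? $high     # closed range, e.g. 8-10
                :                  undef;    # open range, e.g. 8-
        $range{$elem} = [ $low, $max ];
    }
    return (\%range, $allow_others);
}

# Returns 1 if the element counts satisfy the pattern, 0 otherwise.
sub counts_match {
    my ($pattern, $counts) = @_;
    my ($range, $allow_others) = parse_flat_pattern($pattern);
    for my $elem (keys %$range) {
        my ($min, $max) = @{ $range->{$elem} };
        my $n = $counts->{$elem} // 0;       # absent element counts as zero
        return 0 if $n < $min;
        return 0 if defined $max && $n > $max;
    }
    unless ($allow_others) {
        # Without an asterisk, elements not named in the pattern are forbidden.
        for my $elem (keys %$counts) {
            return 0 if $counts->{$elem} > 0 && !exists $range->{$elem};
        }
    }
    return 1;
}

my @checks = (
    [ "C6H6",    { C => 6, H => 6 } ],
    [ "C6H6",    { C => 6, H => 6, O => 1 } ],
    [ "C6H8-10", { C => 6, H => 9 } ],
    [ "C6H8-10", { C => 6, H => 11 } ],
    [ "C6H0-",   { C => 6 } ],
    [ "C2*H6",   { C => 2, H => 6, O => 1, S => 1 } ],
);
for my $check (@checks) {
    my ($pattern, $counts) = @$check;
    my $formula = join "", map { "$_$counts->{$_}" } sort keys %$counts;
    printf "%-8s vs %-10s: %s\n", $pattern, $formula,
        counts_match($pattern, $counts) ? "match" : "no match";
}
```

In this model the second and fourth checks fail (an extra oxygen without an asterisk, and eleven hydrogens outside the range 8-10), while the others succeed.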

Ranges can also be used after a subformula in parentheses: "(CH2)1-2" will match molecules with one or two carbons and two to four hydrogens. Note, however, that the "structure" of the parenthesized part of the formula is forgotten, i.e., the range is applied to each element independently, so the effective multiplier may differ between elements and does not have to be an integer. That is, the above pattern will match CH2, CH3, CH4, C2H2, C2H3, and C2H4.
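
The effect of a range on a parenthesized group can be checked by hand. The sketch below, which uses a hypothetical expand_group helper and is not taken from the module, multiplies each element count in the group by the two ends of the range and then enumerates every combination that falls inside the resulting per-element ranges.

```perl
use strict;
use warnings;

# Hypothetical illustration, not the module's code: a group such as
# (CH2)1-2 is treated as independent per-element ranges, obtained by
# multiplying each element count by the two ends of the multiplier range.
sub expand_group {
    my ($group_counts, $mult_min, $mult_max) = @_;
    my %range;
    for my $elem (keys %$group_counts) {
        my $n = $group_counts->{$elem};
        $range{$elem} = [ $n * $mult_min, $n * $mult_max ];
    }
    return \%range;
}

# (CH2)1-2 becomes C: 1 to 2, H: 2 to 4
my $range = expand_group({ C => 1, H => 2 }, 1, 2);

# Enumerate every C/H combination inside those ranges.
my @matches;
for my $c ($range->{C}[0] .. $range->{C}[1]) {
    for my $h ($range->{H}[0] .. $range->{H}[1]) {
        push @matches, ($c > 1 ? "C$c" : "C") . "H$h";
    }
}
print join(", ", @matches), "\n";
# prints: CH2, CH3, CH4, C2H2, C2H3, C2H4
```

Formulas such as CH3 and C2H3 appear in the list even though they are not whole multiples of CH2, because the carbon and hydrogen counts are checked separately.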

VERSION

0.10

SEE ALSO

Chemistry::Pattern

The PerlMol website http://www.perlmol.org/

AUTHOR

Ivan Tubert-Brohman <itub@cpan.org>

COPYRIGHT

Copyright (c) 2004 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.