NAME

Image::Libpuzzle - Perl interface to libpuzzle.

SYNOPSIS

use Image::Libpuzzle; 

my $pic1 = q{pics/luxmarket_tshirt01.jpg};
my $pic2 = q{pics/luxmarket_tshirt01_sal.jpg};

my $p1 = Image::Libpuzzle->new;
my $p2 = Image::Libpuzzle->new;

my $sig1 = $p1->fill_cvec_from_file($pic1);
my $sig2 = $p2->fill_cvec_from_file($pic2);

# contrived example to show the setting of some parameters that affect the signature

foreach my $i ( 11, 9, 7, 5 ) {
  foreach my $j ( 2.0, 1.0, 0.5 ) {
    print "Lambda: $i, p ratio: $j\n";

    # set some params for sig1
    $p1->set_lambdas($i);
    $p1->set_p_ratio($j);

    # get signature for pic1
    $sig1 = $p1->fill_cvec_from_file($pic1);

    # set same params for sig2
    $p2->set_lambdas($i);
    $p2->set_p_ratio($j);

    # get signature for pic2
    $sig2 = $p2->fill_cvec_from_file($pic2);

    # stringify sig1
    my $string1 = $p1->signature_as_string;
    print qq{$string1\n};

    # stringify sig2
    my $string2 = $p2->signature_as_string;
    print qq{$string2\n};

    # generate a "document" of ngrams from sig1
    my $words1_ref = $p1->signature_as_ngrams; # defaults to $ngram size of $Image::Libpuzzle::DEFAULT_NGRAM_SIZE
    print join ' ', @$words1_ref;

    # generate a "document" of ngrams from sig2
    my $words2_ref = $p2->signature_as_ngrams(6); # example overriding $Image::Libpuzzle::DEFAULT_NGRAM_SIZE 
    print join ' ', @$words2_ref;

    # print Euclidean length of sig1
    printf("\nEuclidean length: %f",$p1->vector_euclidean_length);

    # print normalized distance between sig1 and sig2
    printf("\nDiff with \$p2: %f", $p1->vector_normalized_distance($p2));

    # compare images with a helper method
    printf("\nCompare 1: Is %s",($p1->is_most_similar($p2)) ? q{most similar} : q{not most similar});
    print "\n";

    # compare images directly
    printf("\nCompare 2: Is %s",( $p1->vector_normalized_distance($p2) < $Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD ) ? q{most similar} : q{not most similar});
    print "\n";
    print "\n\n";
  }
}

DESCRIPTION

This XS module provides access to the most common functionality offered by Libpuzzle, http://www.pureftpd.org/project/libpuzzle.

It also includes some pure Perl helper methods users of Libpuzzle might find helpful when creating applications based on it.

This module is in its very early form. It may change without notice. If a feature is missing, please request it at https://github.com/estrabd/p5-puzzle-xs/issues.

NOTES ON USING LIBPUZZLE

Below are some brief notes on how to use this module in order to get the most out of the underlying Libpuzzle library.

Comparing Images

Libpuzzle presents a robust, fuzzy way to compare the similarity of images. Read more about the technique in the paper that describes it,

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.2585&rep=rep1&type=pdf

Working With Signatures

Signatures are typically not printable data, so one may work with them through the native Libpuzzle methods, such as vector_euclidean_length and vector_normalized_distance.

Image::Libpuzzle also provides two methods that return a signature in printable form: signature_as_string and signature_as_ngrams. See below for more details.

Comparing Millions of Images

This Stack Overflow question seems to be the best resource on the subject:

http://stackoverflow.com/questions/9703762/libpuzzle-indexing-millions-of-pictures

The Image::Libpuzzle::signature_as_ngrams method may be used to generate ngrams (words of size N) for use with the oft-suggested approach to searching for similar images in a database of signatures.
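As a rough illustration of that approach, the sketch below builds an in-memory inverted index from ngram words to image ids and ranks candidates by how many distinct words they share with a query. The helper names build_index and candidates, and the image ids, are hypothetical and not part of this module; a real deployment would more likely keep the words in a database or full-text index and confirm the shortlisted candidates with vector_normalized_distance.

```perl
use strict;
use warnings;

# Build an inverted index: word => { image id => 1 }.
# $docs maps a (hypothetical) image id to an ARRAY ref of ngram words,
# such as the one returned by signature_as_ngrams.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id ( keys %$docs ) {
        $index{$_}{$id} = 1 for @{ $docs->{$id} };
    }
    return \%index;
}

# Rank indexed images by the number of distinct words shared with the query.
# Ties are broken by image id so the order is deterministic.
sub candidates {
    my ( $index, $query_words ) = @_;
    my ( %seen, %score );
    for my $word ( grep { !$seen{$_}++ } @$query_words ) {
        $score{$_}++ for keys %{ $index->{$word} || {} };
    }
    return [ sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score ];
}
```

Only images that share at least one word with the query are returned, which is what keeps the expensive pairwise distance computation limited to a small candidate set.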

Working With Compressed Signatures

Working with compressed signatures is not currently supported in this module, but may be added in the future if there is demand.

XS METHODS AND SUBROUTINES

new

Constructor; returns an Image::Libpuzzle reference.

get_cvec

Returns an Image::Libpuzzle::Cvec reference. Currently one can't do much with this.

fill_cvec_from_file(q{./path/to/image})

Generates the signature for the given file.

get_signature

Returns the signature in an unprintable form.

set_lambdas($integer)

Wrapper around Libpuzzle's function. Sets the number of samples taken for each image.

The default, set in puzzle.h, is 9; i.e., by default, pictures are divided into 9 x 9 blocks.

puzzle_set(3) says,

For large databases, for complex images, for images with a lot of text or for sets of near-similar images, it might be better to raise that value to 11 or even 13

However, raising that value obviously means that vectors will require more storage space.

The lambdas value should remain the same in order to get comparable vectors. So if you pick 11 (for instance), you should always use that value for all pictures you will compute a digest for.

puzzle_set_p_ratio()

The average intensity of each block is based upon a small centered zone.

The "p ratio" determines the size of that zone. The default is 2.0, and that ratio mimics the behavior that is described in the reference algorithm.

For very specific cases (complex images) or if you get too many false positives, as an alternative to increasing lambdas, you can try to lower that value, for instance to 1.5.

The lowest acceptable value is 1.0.
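Because vectors are only comparable when they were computed with the same parameters, it can help to keep the settings in one place and apply them to every instance. The sketch below is one way to do that; apply_settings is a hypothetical helper (not part of this module), and the values 11 and 1.5 are merely the examples mentioned in the quoted man page text.

```perl
use strict;
use warnings;

# Apply one shared set of parameters to a puzzle object.
# Keys are setter names documented in this module; values are their arguments.
sub apply_settings {
    my ( $puzzle, $settings ) = @_;
    for my $setter ( sort keys %$settings ) {
        $puzzle->$setter( $settings->{$setter} );
    }
    return $puzzle;
}

# Illustrative values only; pick whatever suits your image set.
my %settings = ( set_lambdas => 11, set_p_ratio => 1.5 );

# Intended use (requires Image::Libpuzzle and real image files):
#   my $p1 = apply_settings( Image::Libpuzzle->new, \%settings );
#   my $p2 = apply_settings( Image::Libpuzzle->new, \%settings );
#   $p1->fill_cvec_from_file($pic1);
#   $p2->fill_cvec_from_file($pic2);
```

Applying the settings before calling fill_cvec_from_file mirrors the order used in the SYNOPSIS.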

set_p_ratio($double)

Wrapper around Libpuzzle's function. Sets the size of the samples. Used in conjunction with set_lambdas to get more or less precise signatures.

puzzle_set(3) says,

The "p ratio" determines the size of that zone. The default is 2.0, and that ratio mimics the behavior that is described in the reference algorithm.

set_max_width($integer)

Wrapper around Libpuzzle's function.

puzzle_set(3) says,

In order to avoid CPU starvation, pictures won't be processed if their width or height is larger than 3000 pixels.

set_max_height($integer)

Wrapper around Libpuzzle's function.

See set_max_width.

set_noise_cutoff($integer)

Wrapper around Libpuzzle's function.

puzzle_set(3) says,

The noise cutoff defaults to 2. If you raise that value, more zones with little difference of intensity will be considered as similar.

Unless you have very specialized sets of pictures, you probably don't want to change this.

set_autocrop(1|0)

Wrapper around Libpuzzle's function.

puzzle_set(3) says,

By default, featureless borders of the original image are ignored. The size of each border depends on the sum of absolute values of differences between adjacent pixels, relative to the total sum.

That feature can be disabled with puzzle_set_autocrop(0). Any other value will enable it.

puzzle_set_contrast_barrier_for_cropping() changes the tolerance. The default value is 5. Less shaves less, more shaves more.

puzzle_set_max_cropping_ratio() is a safeguard against unwanted excessive auto-cropping.

The default (0.25) means that no more than 25% of the total width (or height) will ever be shaved.

set_contrast_barrier_for_cropping($integer)

Wrapper around Libpuzzle's function.

See set_autocrop for details.

set_max_cropping_ratio($double)

Wrapper around Libpuzzle's function.

See set_autocrop for details.

vector_euclidean_length()

Wrapper around Libpuzzle's function. Returns a length value for the signature, used when computing distances between two images in vector_normalized_distance.

vector_normalized_distance(Image::Libpuzzle $instance2)

Returns the computed distance between two Image::Libpuzzle instances.

my $distance = $instance1->vector_normalized_distance($instance2);
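For intuition only, here is a pure Perl sketch of a normalized distance between two plain numeric vectors. It assumes the form given in the paper referenced above as I understand it, namely the length of the difference divided by the sum of the two lengths; whether Libpuzzle computes exactly this internally is an assumption, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Euclidean length (L2 norm) of a vector given as an ARRAY ref.
sub euclidean_length {
    my ($v) = @_;
    return sqrt( sum( 0, map { $_ * $_ } @$v ) );
}

# Assumed form: ||u - v|| / ( ||u|| + ||v|| ).
# Returns 0 when both vectors have zero length, to avoid dividing by zero.
sub normalized_distance {
    my ( $u, $v ) = @_;
    die "vectors differ in length\n" unless @$u == @$v;
    my @diff  = map { $u->[$_] - $v->[$_] } 0 .. $#$u;
    my $denom = euclidean_length($u) + euclidean_length($v);
    return $denom == 0 ? 0 : euclidean_length( \@diff ) / $denom;
}
```

Under this definition identical vectors are at distance 0, and the result never exceeds 1.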

is_similar(Image::Libpuzzle $instance2)

Convenience method; compares images using PUZZLE_CVEC_SIMILARITY_THRESHOLD.

is_very_similar(Image::Libpuzzle $instance2)

Convenience method; compares images using PUZZLE_CVEC_SIMILARITY_LOW_THRESHOLD.

is_most_similar(Image::Libpuzzle $instance2)

Convenience method; compares images using PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD.
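The three helpers can be thought of as testing one distance against progressively stricter cutoffs. The sketch below maps a distance to the strictest label it satisfies, using the same "less than" comparison as the SYNOPSIS. The numeric cutoffs are assumptions recalled from puzzle.h, and classify_distance is a hypothetical helper; in real code, prefer the constants this module provides.

```perl
use strict;
use warnings;

# Assumed cutoffs, strictest first; verify against the module's constants,
# e.g. Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_THRESHOLD().
my @levels = (
    [ most_similar => 0.2 ],    # PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD
    [ very_similar => 0.3 ],    # PUZZLE_CVEC_SIMILARITY_LOW_THRESHOLD
    [ similar      => 0.6 ],    # PUZZLE_CVEC_SIMILARITY_THRESHOLD
);

# Return the label of the first (strictest) cutoff the distance falls below.
sub classify_distance {
    my ($distance) = @_;
    for my $level (@levels) {
        my ( $label, $cutoff ) = @$level;
        return $label if $distance < $cutoff;
    }
    return 'not_similar';
}
```

A distance obtained from vector_normalized_distance could be passed straight to such a helper.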

PUZZLE_VERSION_MAJOR()

Returns constant defining major version.

PUZZLE_VERSION_MINOR()

Returns constant defining minor version.

PUZZLE_CVEC_SIMILARITY_THRESHOLD()

Returns constant defining the average normalized distance cutoff for considering two images as similar. Used by is_similar.

PUZZLE_CVEC_SIMILARITY_HIGH_THRESHOLD()

Returns constant defining the upper limit normalized distance cutoff for considering two images as similar. Must be used directly.

PUZZLE_CVEC_SIMILARITY_LOW_THRESHOLD()

Returns constant defining more precise normalized distance cutoff for considering two images as similar. Used by is_very_similar.

PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD()

Returns constant defining the most precise normalized distance cutoff for considering two images as similar. Used by is_most_similar.

PURE PERL METHODS AND SUBROUTINES

signature_as_string()

Returns a stringified version of the signature. The signature is unpack'ed into an array of unsigned character codes (C*). Before the codes are joined into a string, each one is zero-padded to three digits. For example, 1 turns into 001; 25 turns into 025; 211 remains the same.
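The transformation described above can be sketched in a couple of lines of plain Perl. The function name bytes_as_string is hypothetical and this is not the module's actual source, only an illustration of the unpack-and-pad step.

```perl
use strict;
use warnings;

# Turn a raw byte string into a printable string of zero-padded,
# three-digit character codes.
sub bytes_as_string {
    my ($raw) = @_;
    return join q{}, map { sprintf '%03d', $_ } unpack 'C*', $raw;
}
```

Each input byte always yields exactly three characters, so the printable string is three times as long as the raw signature.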

signature_as_ngrams()

Takes the output of signature_as_string and returns an ARRAY ref of words of size $ngram_size. The default, $DEFAULT_NGRAM_SIZE, is 10. An optional argument may be passed to override this default.
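To make the idea of "words of size N" concrete, the sketch below cuts a string into overlapping windows, one starting at every character position. That the module uses overlapping windows is an assumption made for illustration, and string_as_ngrams is a hypothetical name; only the default size of 10 comes from the text above.

```perl
use strict;
use warnings;

# Split a string into overlapping words of $size characters.
# Returns an ARRAY ref; it is empty when the string is shorter than $size.
sub string_as_ngrams {
    my ( $string, $size ) = @_;
    $size = 10 unless defined $size;
    return [] if $size < 1 || length($string) < $size;
    return [ map { substr( $string, $_, $size ) } 0 .. length($string) - $size ];
}
```

For example, the string abcdef with a size of 3 yields the words abc, bcd, cde and def.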

The paragraph of ngrams is constructed in a manner consistent with the one described at the following link:

http://stackoverflow.com/questions/9703762/libpuzzle-indexing-millions-of-pictures

ENVIRONMENT

This module assumes that libpuzzle is installed and that puzzle.h can be found in a default LIBRARY path.

Libpuzzle is available via most Ports/package repos. It also builds easily, though it requires libgd.so.

See also http://www.pureftpd.org/project/libpuzzle.

Package Variables

There also exist corresponding methods to return these. Changing these package variables affects nothing at this time.

Image::Libpuzzle::PUZZLE_VERSION_MAJOR
Image::Libpuzzle::PUZZLE_VERSION_MINOR
Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_THRESHOLD
Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_HIGH_THRESHOLD
Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_LOW_THRESHOLD
Image::Libpuzzle::PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD

BUGS AND LIMITATIONS

Please report them via https://github.com/estrabd/p5-puzzle-xs/issues.

AUTHOR

Brett Estrade <estrabd@gmail.com>

THANK YOU

My good and ridiculously smart friend, Xan Tronix (~xan), helped me patiently as I was working through n00b XS bits during the writing of this module.

COPYRIGHT AND LICENSE

Copyright (C) 2015 by B. Estrade

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.4 or, at your option, any later version of Perl 5 you may have available.