NAME

File::Locate::Harder - for when you're determined to use a locate db

SYNOPSIS

use File::Locate::Harder;

my $flh = File::Locate::Harder->new();
my $results_aref = $flh->locate( $search_term );


# using a defined db location, plus some locate options
my $flh = File::Locate::Harder->new( db => $db_file );
my $results_aref = $flh->locate( $search_pattern,
                                 { case_insensitive => 1,
                                   regexp           => 1,
                                 } );

# creating your own locate db, (in this example for doing tests)
use Test::More;
SKIP:
 {
   my $flh = File::Locate::Harder->new( db => $db_file );
   $flh->create_database( $path_to_tree_to_index );

   unless( $flh->check_locate ) {
      my $reason = "Can't get File::Locate::Harder to work";
      skip $reason, $test_count;
   }
   my $results_aref = $flh->locate( $search_term );
   is_deeply( $results_aref, $expected_aref, "Created expected files");
 }

# introspection (is it reading db directly, or shelling out to locate?)
my $report = $flh->how_works;
print "This is how File::Locate::Harder is doing locates: $report\n";

DESCRIPTION

File::Locate::Harder provides a generalized "locate" method to access the file system indexes used by the "locate" command-line utility. It is intended to be a relatively portable way for perl code to quickly ascertain what files are present on the current system.

This code is essentially a wrapper around multiple different techniques of accessing a locate database: it makes an effort to use the fastest method it can find that works.

The "locate" command is a well-established utility to find files quickly by using a special index database (typically updated via a cron-job). This module is an attempt at providing a perl front-end to "locate" which should be portable across most unix-like systems.

Behind the scenes, File::Locate::Harder silently tries many ways of doing the requested "locate" operation. If it can't establish contact with the file system's locate database, it will error out; otherwise you can be reasonably sure that a "locate" will return a valid result (including an empty set if the search turns up empty).

If possible, File::Locate::Harder will use the perl/XS module File::Locate to access the locate db directly; otherwise, it will attempt to shell out to a command-line version of "locate".

If not told explicitly what locate db file to use, this module will try to find the file system's standard locate db using a number of reasonable guesses. If those all fail, as a last ditch effort, it will try shelling out to the command line "locate" without specifying a db for it (because it usually knows where to look).

Efficiency may be improved in some circumstances if you help File::Locate::Harder find the locate database, either by explicitly saying where it is (using the "db" attribute), or by setting the LOCATE_PATH environment variable. Also see the "introspection_results" method.

METHODS

new

Creates a new File::Locate::Harder object.

With no arguments, most attributes of the newly created object are undefined. All may be set later using accessors named according to the "set_*" convention.

Inputs:

An optional hashref, with named fields identical to the names of the object attributes. The attributes, in order of likely utility:

Settings for ways to run "locate"

case_insensitive

Like the usual command-line "-i".

regexp

The search term will be interpreted as a POSIX regexp.

posix_extended

The search term is a regexp with the standard POSIX extensions.

Overall settings (for "locate", "create_database", etc)

db

Locate database file, with full path. Use this to work with a non-standard location (e.g. to generate your own db via "create_database").

For internal use, testing, and so on:

The following items are lists used in the probing process which determines what works on the current system. These lists are defined with hardcoded defaults that will normally remain untouched, though are sometimes over-ridden for testing purposes.

locate_db_location_candidates

Likely places for a locate db.

test_search_terms

Common terms on perl/unix systems

The following are status fields where the results of system probing are stored. The user will normally be uninterested in these, though see "introspection_results" for a hint about performance improvements in repeated runs.

system_db_not_found

Could not find where the standard locate db is.

use_shell_locate

Shell out to locate and forget about using File::Locate

shell_locate_failed

So don't try probe_db_via_shell_locate again

shell_locate_cmd_idx

Integer: controls the choice of syntax of the locate shell cmd

init

Method that initializes object attributes and then locks them down to prevent accidental creation of new ones.
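The "locking down" idea can be sketched with the core Hash::Util module. This is a minimal illustration under assumptions, not this module's actual code: the class name My::Locked and its list of default attributes are hypothetical.

```perl
use strict;
use warnings;

{
    package My::Locked;    # hypothetical class, for illustration only
    use Hash::Util qw( lock_keys );

    sub new {
        my ( $class, $args ) = @_;
        my $self = bless {}, $class;
        return $self->init( $args );
    }

    sub init {
        my ( $self, $args ) = @_;
        # assumed defaults; a real class would list all of its attributes
        my %defaults = ( db => undef, case_insensitive => 0, regexp => 0 );
        %{ $self } = ( %defaults, %{ $args || {} } );
        lock_keys( %{ $self } );    # no new keys may be created from here on
        return $self;
    }
}

my $obj = My::Locked->new( { db => '/tmp/example.db' } );
$obj->{ regexp } = 1;                        # fine: the key already exists
my $ok = eval { $obj->{ regxep } = 1; 1 };   # misspelled key: this dies
print "typo was caught: ", ( $ok ? "no" : "yes" ), "\n";
```

Values of existing keys stay writable; only the creation of new keys is refused, which is what catches a misspelled attribute name.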

Not of interest to client coders, though inheriting code should have an init of its own that calls this one.

locate

Simple interface that performs the actual "locate" operation in a robust, reliable way. Uses the locate db file indicated by the object's "db" attribute (which is set automatically if not manually overridden).

Input:

A term to search for in the file name or path.

Return:

An array reference of matching files with full paths.

create_database

Tries to create the locate database file indicated in the object data, indexing the tree indicated by a path given as an argument. An optional second argument allows specifying a db file to override the object's setting.

Returns false (0) on failure.

introspection

check_locate

Returns true (1) if this module's 'locate' method is capable of working.

how_works

Returns a report on how this module has been doing "locate" operations (e.g. via the shell or the File::Locate module, and using which db).

introspection_results

Returns a hashref of the results of File::Locate::Harder's probing of the system's "locate" setup, so that it can be easily used again without re-doing that work.

Example:

my $settings_href = $flh1->introspection_results;

# save    $settings_href somehow (e.g. dump to yaml file)
# restore $settings_href somehow

my $flh2 = File::Locate::Harder->new( $settings_href );
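
The "save somehow" and "restore somehow" steps could look like the following sketch. It uses JSON via the core JSON::PP module rather than YAML, purely because JSON::PP ships with perl; the hashref contents are made-up stand-ins for what introspection_results might return.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw( tempfile );

# Stand-in for the hashref returned by introspection_results;
# these values are invented for illustration.
my $settings_href = {
    db                   => '/var/cache/locate/locatedb',
    system_db_not_found  => 0,
    use_shell_locate     => 1,
    shell_locate_failed  => 0,
    shell_locate_cmd_idx => 2,
};

my $json = JSON::PP->new->canonical->pretty;

# save
my ( $fh, $file ) = tempfile( SUFFIX => '.json', UNLINK => 1 );
print {$fh} $json->encode( $settings_href );
close $fh or die "close failed: $!";

# restore
open my $in, '<', $file or die "can't read $file: $!";
my $restored_href = $json->decode( do { local $/; <$in> } );
close $in;

# $restored_href could now be handed to File::Locate::Harder->new
print "restored db: $restored_href->{db}\n";
```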

shell_locate_version

Tries to determine the version of the shell's "locate" command.

This will work only with the GNU locate and Secure Locate variants, not the FreeBSD one.

Returns the version string on success, otherwise 0.

special purpose methods (usually, though not exclusively, for internal use)

locate_via_module

Uses the perl/XS module File::Locate to perform a locate operation on the given search term, using the db file indicated by the object's db attribute.

An optional second argument allows passing in a coderef, an anonymous routine that operates on each match (the match value is set to $_): this makes it possible to work with a large result without storing the entire set in memory.

Uses the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") to control the way locate is performed.

locate_via_shell

Given a search term returns an array reference of matches found from a "locate" search.

An optional second argument containing the locate command's "options string" (e.g. "-i", "-r", "-re", etc) may be passed in (otherwise it is generated from object data).

This method uses object data settings: "db", "shell_locate_cmd_idx"

And indirectly (via "build_opts_for_locate_via_shell"): "case_insensitive", "regexp", "posix_extended"

methods largely for internal use

determine_system_db

Internally used routine (called only by the "db" accessor): looks for a usable system-wide locate db.

probe_db_via_module_locate

Looks to see if it can find anything in the given db by using the File::Locate module.

probe_db_via_shell_locate

Tries the series of standard test searches by shelling out to the command-line form of locate to make sure that it can be used.

Tries to use the locate db file indicated by the object's "db" attribute, but this can be overridden with an optional argument.

Under some circumstances, the db may remain undefined, but this method will return "1" for success if it appears that command-line locate works in any case.

As a side-effect, saves the "shell_locate_cmd_idx" that indicates a form of the locate command that has been observed to work.

Returns: undef for failure, and for success either the db or 1 (because locate can work even if this code can't figure out what db file it's using).

generate_locate_cmd

Given an ordered list of parameters, returns a form of the locate command which can (in theory) be fed to the shell. In practice these different forms are expected to fail (some harder than others) on various different platforms, so some experimentation may be needed to find a form that works.

Special case:

with no arguments (actually, with $cmd_idx undefined) returns the count of available command forms minus 1 ($#cmd_forms);

Inputs:

$cmd_idx: integer index (beginning with 0) that chooses the
          form of a command to return.

$search_term: string (or possibly regexp) to search for.

$db: full path to the locate db to search.

$opt_str: options string, defaults to values generated by
          build_opts_for_locate_via_shell

Example usage:

for (my $i=0; $i<=$self->generate_locate_cmd; $i++) {
   my $locate_cmd =
     $self->generate_locate_cmd( $i, $search_term, $db, $opt_str );
   my @result = `$locate_cmd 2>/dev/null`;
   if ( scalar(@result) > 0 ) {
     return $i;
   }
}

Note: the various forms of locate are discussed below in "locate shell command"

build_opts_for_locate_via_shell

Converts the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") into the command-line options string for locate.

build_opts_for_locate_via_module

Converts the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") into the form that File::Locate::locate requires: returns an array.

basic setters and getters

db

Getter for object attribute system_db

This is a magic getter that does initialization of the value if it's not defined already.
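
The general pattern of such a lazily initializing getter can be sketched as follows. The class My::Lazy and its find_db method are hypothetical stand-ins; in this module the expensive step would be the system probing described elsewhere in this document.

```perl
use strict;
use warnings;

{
    package My::Lazy;    # hypothetical class, not part of this module

    sub new { return bless { db => undef, lookups => 0 }, shift }

    # Stand-in for the expensive probing step.
    sub find_db {
        my $self = shift;
        $self->{ lookups }++;
        return '/tmp/example/locatedb';
    }

    # "Magic" getter: computes the value on first use, then reuses it.
    sub db {
        my $self = shift;
        $self->{ db } = $self->find_db unless defined $self->{ db };
        return $self->{ db };
    }
}

my $lazy   = My::Lazy->new;
my $first  = $lazy->db;
my $second = $lazy->db;
print "db: $second (looked up $lazy->{lookups} time(s))\n";
```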

automatic accessor generation

AUTOLOAD
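
For readers unfamiliar with the technique, here is a generic sketch of AUTOLOAD-based accessors following the "set_*" naming convention mentioned under "new". It is an illustration only: the class My::Auto and its attributes are hypothetical, and this version dispatches on every call rather than installing generated subs.

```perl
use strict;
use warnings;

{
    package My::Auto;    # hypothetical class for illustration
    our $AUTOLOAD;

    sub new { return bless { db => undef, regexp => 0 }, shift }

    sub AUTOLOAD {
        my $self = shift;
        ( my $name = $AUTOLOAD ) =~ s/.*:://;    # strip the package prefix
        return if $name eq 'DESTROY';

        if ( $name =~ /^set_(\w+)$/ and exists $self->{ $1 } ) {
            $self->{ $1 } = shift;               # setter: set_<attribute>
            return $self->{ $1 };
        }
        elsif ( exists $self->{ $name } ) {
            return $self->{ $name };             # getter: <attribute>
        }
        die "No such method: $AUTOLOAD";
    }
}

my $auto = My::Auto->new;
$auto->set_regexp( 1 );
print "regexp is now ", $auto->regexp, "\n";
```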

Platforms

It's likely that this package will work on any unix-like system (including cygwin), though on some there might be a need for additional installation and setup (e.g. a "findutils" package).

Development was done on two varieties of linux (aka GNU/linux): Knoppix (32bit) on a Turion and Kubuntu on an Opteron machine. This covered two major varieties of the "locate" command: GNU locate and Secure Locate.

A serious attempt was made to support BSD locate on FreeBSD, but the testing has not been completed.

Note: at present the File::Locate module appears to fail silently on 64bit platforms, so there the command-line shell locate will always be used.

MOTIVATION

This module uses File::Locate, which is a perl XS interface to read locate (or slocate) dbs without shelling out to the command-line "locate" program.

File::Locate has one great limitation: it must be told which locate db to use (by explicit parameter, or by environment variable); it has no notion of a default location. Further, as of this writing, it appears to be limited to 32bit systems.

This module then is a wrapper around File::Locate that tries a number of common locations for the locate database, and instead of just giving up, it also tries the command-line locate, which has its own ways of knowing where the database can be (configuration file, compiled-in default, or command-line parameter).

The intention here is to make this module as portable as possible... it might, for example, be useful to use in portable CPAN modules that need to look for things in the filesystem.

(As a case in point: the job of File::Locate::Harder would be a lot easier if it could use "locate" to find the locate db...).

Additional Examples

forcing locate via File::Locate module or via shell command

my $flh = File::Locate::Harder->new();
$result_via_module = $flh->locate_via_module( $term );
$result_via_shell  = $flh->locate_via_shell(  $term );

using the coderef feature of the File::Locate module

my $count = 0;
$flh->locate_via_module( $term, sub { $count++ } );
print "There are $count matches of $term\n";


$count = 0;   # reset before counting again
$flh->locate_via_module( $term,
        sub { $count++ if $_ =~ m{ ^ /home }x } );
print "There are $count matches of $term located in /home\n";

speeding up multiple searches if you know you're using shell locate

This reduces the number of calls to build_opts_for_locate_via_shell:

my @searches = qw( .bashrc .bash_profile .emacs default.el );
my $flh = File::Locate::Harder->new();
my $opt_str = $flh->build_opts_for_locate_via_shell;
foreach my $term (@searches) {
  $result_via_shell  = $flh->locate_via_shell( $term, $opt_str );
}

SEE ALSO

File::Locate

Manual pages: locate, slocate, and/or updatedb.

NOTES

architecture

The general philosophy in use here is to just try things that are likely to work and then just try something else if they fail. This is probably better than attempting to guess which form of locate to use based on the current platform, because (a) no one (to my knowledge) has a capabilities database that specifies which locate is found on which platform (b) different variants may be installed at the whim of a sysadmin (c) there may after all be variants of locate I've never encountered.

So checking $^O is of limited utility, and similarly, some of the existing forms of locate lack introspection features (e.g. you can't get FreeBSD's locate to tell you what version it is).

details

The object creation process ("new" and "init") determines how to do system-wide locates, and saves its conclusions for use by future calls of the locate method on this object.

Some of this elaborate initialization process can be short-circuited if it's told which db file to use: that's convenient for cases where you want to use this module to create a locate db of your own (there's no point in scoping for a system-wide db if we're going to use a specialized one).

If the db location is not known, the search process begins by making guesses about likely locations where it might be found. It goes through this list:

/var/lib/slocate/slocate.db  -- Secure Locate under Kubuntu
/var/cache/locate/locatedb   -- GNU locate, under Knoppix
/var/db/locate.database      -- BSD locate, under FreeBSD
/usr/var/locatedb            -- mentioned: File::Locate docs and cygwin lists
/var/lib/locatedb            -- mentioned on insecure.org
/usr/local/var/locatedb      -- Solaris with findutils installed
/var/lib/locate/locatedb     -- mentioned on a Debian list in 2000
/var/spool/locate/locatedb   -- speculative mention on a cygwin list

So that's three names, in eight locations. It also tries other permutations on speculation:

/var/cache/locate/slocate.db
/var/db/slocate.db
/usr/var/slocate.db
/usr/local/var/slocate.db
/var/lib/locate/slocate.db
/var/spool/locate/slocate.db

/var/lib/slocate/locate.database
/var/cache/locate/locate.database
/usr/var/locate.database
/usr/local/var/locate.database
/var/lib/locate/locate.database
/var/spool/locate/locate.database

/var/lib/slocate/locatedb
/var/db/locatedb

Each of these possibilities is checked for simple file existence, and then checked to see if it works. (See "checking if a form of locate works" below.)
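
The names-times-locations idea can be sketched as below. This is an illustration under assumptions, not the module's code: it uses only a subset of the directories listed above, and it does not reproduce the module's ordering (known locations first, speculative permutations afterwards).

```perl
use strict;
use warnings;

# A subset of the directories and the three file names listed above.
my @dirs  = qw( /var/lib/slocate /var/cache/locate /var/db /usr/var );
my @names = qw( slocate.db locatedb locate.database );

# Every name in every directory.
sub candidate_paths {
    my ( $dirs, $names ) = @_;
    my @paths;
    for my $dir ( @{ $dirs } ) {
        push @paths, map { "$dir/$_" } @{ $names };
    }
    return @paths;
}

my @candidates = candidate_paths( \@dirs, \@names );

# First filter: simple file existence. Whether a surviving candidate
# actually works as a locate db still has to be checked separately.
my @existing = grep { -f $_ } @candidates;

printf "%d candidates, %d exist on this system\n",
    scalar @candidates, scalar @existing;
```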

locate shell command

If attempts at using File::Locate fail, the system falls back to shelling out to the locate command (it really should already know how to find the system-wide db, either from a compiled-in default or a config file setting).

But the locate shell command has its own problems. There are at least three variants, with some slight differences between GNU locate, slocate and FreeBSD locate.

The current architecture of locate_via_shell tries all of them in a certain order, and remembers the one that worked last time.

Briefly, here are the variations we need to account for:

-d or --database

-d is essentially more general, because FreeBSD has it but does not have --database. So, we try "-d" first, but also try "--database" just in case.

-q for quiet

As of this writing, with slocate, if you tell it explicitly which db to use, that works, but you also get an ignorable error about how you don't have permissions to mess with the system-wide database. You can get this warning to go away with the "-q" option, but neither GNU locate nor FreeBSD locate has it, and using it with them is a fatal error. So here we try to use "-q" first, and if that dies, we run without it.

And still other variations exist in requesting version information. The FreeBSD form does not understand "--version", and in fact doesn't seem to have any sort of version option.
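
The variations above multiply out into a small set of command forms. The following sketch builds them as argument lists; the sub name locate_cmd_forms and the ordering are assumptions for illustration, and the module's own generate_locate_cmd may order or phrase its forms differently.

```perl
use strict;
use warnings;

# Build candidate command lines (as argument lists) for one search:
# "-d" before "--database", and with "-q" before without it.
sub locate_cmd_forms {
    my ( $term, $db ) = @_;
    my @forms;
    for my $db_opt ( '-d', '--database' ) {
        for my $quiet ( 1, 0 ) {
            my @cmd = ( 'locate' );
            push @cmd, '-q' if $quiet;
            push @cmd, $db_opt, $db if defined $db;
            push @cmd, $term;
            push @forms, \@cmd;
        }
    }
    return @forms;
}

my @forms = locate_cmd_forms( 'README', '/tmp/example/locatedb' );
print join( ' ', @{ $_ } ), "\n" for @forms;
```

Keeping each form as a list of arguments rather than one string sidesteps shell quoting of the search term when a form is eventually run.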

(Ah, cross-platform programming is such a joy.)

checking if a form of locate works

In order to check that a system-wide locate is working, we probe for files we know (or strongly suspect) will be there on the system. This module tries a series of guesses of decreasing specificity (there's no point in getting a huge number of hits if they're not needed), then bails out on the list as soon as a result is received.

The list in use here begins with files in the standard perl library (which should accompany almost any installation of perl, unless they were removed for some reason):

MakeMaker
SelfStubber
DynaLoader

It then begins looking for strings that should be relatively common on most systems:

README
tmp
bin
the
htm
txt
home

The presumption is that if there are no hits on those searches on a system-wide database, something is very wrong, and that particular form of "locate" just isn't working.
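
The probing loop can be sketched like this. The search itself is passed in as a coderef so the example runs without any locate installed; the sub name first_working_term and the fake in-memory "database" are hypothetical stand-ins.

```perl
use strict;
use warnings;

# The test terms listed above, most specific first.
my @test_terms = qw( MakeMaker SelfStubber DynaLoader
                     README tmp bin the htm txt home );

# $search is any coderef that takes a term and returns a list of hits.
# Returns the first term that produced a hit, or nothing if none did.
sub first_working_term {
    my ( $search, @terms ) = @_;
    for my $term ( @terms ) {
        my @hits = eval { $search->( $term ) };   # a dying search counts as no hits
        return $term if @hits;
    }
    return;
}

# Fake search over a tiny in-memory list, standing in for a real locate.
my @fake_db     = qw( /usr/bin/perl /home/user/notes.txt /tmp/scratch );
my $fake_search = sub { my $t = shift; grep { index( $_, $t ) >= 0 } @fake_db };

my $term = first_working_term( $fake_search, @test_terms );
print defined $term ? "works (matched '$term')\n" : "no hits: not working\n";
```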

File::Locate

Because File::Locate is loaded with an empty import list, (), to suppress import, we need to call 'locate' like so:

File::Locate::locate

which makes it easy for us to define a new 'locate' method of our own.

The procedural syntax of File::Locate::locate has its ugly aspects, but the documentation is usually clear:

my @mp3s = File::Locate::locate "mp3", "/usr/var/locatedb";

# do regex search
@hits = File::Locate::locate "^/usr", -rex => 1, "/usr/var/locatedb";

@hits = File::Locate::locate "^/usr", -rexopt => 'ie', "/usr/var/locatedb";
# i - case insensitive
# e - POSIX extended regexps (say what?)

Note: it isn't abundantly clear from the documentation if -rexopt has to be used with -rex, but it appears that this is the case. (And there is a syntax diagram that indicates this).

Another oddity, though: there doesn't seem to be a way to do a case-insensitive search without using regexps. (Oddly enough, none of the tests use the "-rexopt" feature.)

A very cool touch is that you can hand it a coderef, and avoid building up a big result set:

File::Locate::locate "*.mp3", sub { print "MP3 found: $_\n" };

Note: the order of arguments to File::Locate::locate is supposed to be irrelevant.

system status fields

The system status fields (the ones that can be saved or inspected via introspection_results) no doubt seem redundant:

db
system_db_not_found
use_shell_locate
shell_locate_failed
shell_locate_cmd_idx

It's likely that they *are* somewhat redundant: they were invented on-the-fly during development on an ad hoc basis.

However, despite the way it looks, this set is resistant to being reduced in size. I believe this is because two-valued logic has its limitations: for our immediate purpose, there have to be ways to distinguish between "I don't know what this value is, and you should try to find out" and "I don't know what this value is, and it isn't worth trying to find it."
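
The distinction can be made concrete with two of the fields listed above. In this sketch the sub name should_probe_for_db is hypothetical and the logic is an assumed reading of how "db" and "system_db_not_found" might combine, not the module's actual code.

```perl
use strict;
use warnings;

# An undefined "db" alone is ambiguous; the second field says
# whether looking for it is still worth the effort.
sub should_probe_for_db {
    my $status = shift;
    return 0 if defined $status->{ db };            # already known
    return 0 if $status->{ system_db_not_found };   # looked before, gave up
    return 1;                                       # unknown: go find out
}

my %fresh  = ( db => undef,          system_db_not_found => 0 );
my %failed = ( db => undef,          system_db_not_found => 1 );
my %known  = ( db => '/tmp/x/locdb', system_db_not_found => 0 );

for my $case ( [ fresh => \%fresh ], [ failed => \%failed ], [ known => \%known ] ) {
    my ( $label, $status ) = @{ $case };
    printf "%-6s => %s\n", $label,
        should_probe_for_db( $status ) ? 'probe' : "don't probe";
}
```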

AUTHOR

Joseph Brenner, <doom@kzsu.stanford.edu>, 29 May 2007

COPYRIGHT AND LICENSE

Copyright (C) 2007 by Joseph Brenner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

BUGS

None reported... yet.