NAME

Trav::Dir - Traverse directories

SYNOPSIS

use FindBin '$Bin';
use Trav::Dir;
my $o = Trav::Dir->new (
    # Don't traverse these directories
    no_trav => qr!/(\.git|xt|blib)$!,
    # Reject these files
    rejfile => qr!~$|MYMETA|\.tar\.gz!,
);
my @files;
chdir "$Bin/..";
$o->find_files (".", \@files);
for (@files) {
    if (-f $_) {
        print "$_\n";
    }
}

produces output

./lib/Trav/Dir.pm
./lib/Trav/Dir.pod.tmpl
./lib/Trav/Dir.pod
./Makefile.PL
./t/trav-dir.t
./build.pl
./make-pod.pl
./examples/synopsis-out.txt
./examples/synopsis.pl
./MANIFEST.SKIP
./Changes
./.gitignore

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents version 0.01 of Trav-Dir corresponding to git commit 6bdce48c0304b89f802b6327347b280abe28c661 released on Fri Feb 19 12:34:50 2021 +0900.

DESCRIPTION

Traverse directories and make a list of files. Replacement for "File::Find".

METHODS

find_files

$o->find_files ($dir, \@files);

Traverse $dir and its subdirectories, and add every file found to @files. File names are fully qualified; in other words, each name in @files includes the file's directory.

You can omit the second argument and use a "callback" instead.

my $o = Trav::Dir->new (callback => \& my_function);
$o->find_files ($dir);

See "callback".

The list of files @files is deliberately not made a return value so that you can run find_files over a list of directories.

for my $dir (@dirs) {
    $o->find_files ($dir, \@files);
}

New files are added to the end of @files using push.

If both the callback and @files are omitted, find_files prints a warning and returns, since there is nothing to do. You might see this warning if you accidentally omit the backslash before @files, so that an empty list is passed instead of an array reference, like this:

use Trav::Dir;
my $o = Trav::Dir->new ();
my @files;
$o->find_files (".", @files);

produces output

No file list and no callback at /usr/home/ben/projects/trav-dir/examples/forgot-slash.pl line 7.

(This example is included as forgot-slash.pl in the distribution.)

Symbolic links (files which return a true value with the -l test) are not traversed.

"find_files" was originally the name of a subroutine in the script which became Trav::Dir.

new

my $o = Trav::Dir->new (%options);

Create a new Trav::Dir object. There are no mandatory options. The options are as follows:

callback
Trav::Dir->new (callback => \& call_me);

A subroutine to call back when each file is found, similar to the wanted routine of "File::Find". It is called like this

&callback ($data, $file);

Here $file is the full path of the file and $data is the data you pass with "data".

Directories are also sent to your callback. If you don't want directories, use the "no_dir" option.
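As an illustration of the calling convention above, here is a minimal sketch which counts files and sums their sizes. The subroutine name tally, the hash %stats, and the temporary directory made with File::Temp are all assumptions invented for this example; only the "callback", "data" and "no_dir" options come from this documentation.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';
use Trav::Dir;

# Build a small throwaway directory so that the example is self-contained.
my $dir = tempdir (CLEANUP => 1);
for my $name (qw!a.txt b.txt!) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print $out "hello\n";
    close $out or die "Cannot close $dir/$name: $!";
}

# Hypothetical accumulator, handed to the callback via "data".
my %stats = (count => 0, bytes => 0);

my $o = Trav::Dir->new (
    callback => \& tally,
    data => \%stats,
    # Send only files, not directories, to the callback.
    no_dir => 1,
);
$o->find_files ($dir);
print "$stats{count} files, $stats{bytes} bytes\n";

# Called as tally ($data, $file), where $file includes its directory.
sub tally
{
    my ($data, $file) = @_;
    $data->{count}++;
    $data->{bytes} += -s $file;
}
```

Because the accumulator arrives as the first argument, tally needs neither a closure nor a global variable.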

data
Trav::Dir->new (data => \%my_structure);

A data item to pass to callbacks. See "callback" and "preprocess" for the calling conventions.

maxsize
Trav::Dir->new (maxsize => 100_000_000);

Maximum file size to consider. If left undefined, no maximum is applied. If defined, any file bigger than this is skipped. This test is not applied to directories.

This option was implemented to assist file search indexing by rejecting very large files.

minsize
Trav::Dir->new (minsize => 100_000_000);

Minimum file size to consider. If left undefined, no minimum is applied. If defined, any file smaller than this is skipped. This test is not applied to directories.

This option was implemented to assist a file search indexing program by rejecting very small files.
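The two size limits can be combined to keep only files within a range. The following is a sketch under stated assumptions: the file names, the sizes, and the temporary directory are made up for the illustration, and the limits are chosen well away from the file sizes so that the example does not depend on how exact boundary values are treated.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';
use Trav::Dir;

# Create three files of known sizes in a throwaway directory.
my $dir = tempdir (CLEANUP => 1);
my %sizes = ('tiny.txt' => 1, 'medium.txt' => 50, 'large.txt' => 500);
for my $name (keys %sizes) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    binmode $out;
    print $out 'x' x $sizes{$name};
    close $out or die "Cannot close $dir/$name: $!";
}

# Skip files smaller than 10 bytes or bigger than 100 bytes.
my $o = Trav::Dir->new (minsize => 10, maxsize => 100);
my @files;
$o->find_files ($dir, \@files);
print "$_\n" for @files;
```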

no_dir
Trav::Dir->new (no_dir => 1);

Don't include directories in the results, whether these are sent to "callback" or added to @files.

This option was implemented to assist a file search indexing program by rejecting directories.

no_trav
Trav::Dir->new (no_trav => qr!\.git\b!);

Regex to reject directories to traverse. If a directory matches this regex, it is not traversed at all, and its subdirectories are not traversed or even seen.

This option was implemented for two programs. One was an incremental backup system, which used it to avoid going into directories containing files which didn't need to be backed up. The other was a file search indexing system, which used it to avoid going into directories containing files which don't need to be indexed, such as computer-generated HTML files or old web server log files.

only
Trav::Dir->new (only => qr!.*\.html$!);

Regex to accept only files which match it.

This option was implemented to assist searching for certain types of file.

For example, the following script finds files whose names match hanzierrorlog under the directory /mount/backup/incremental/2019, and removes each one as it is found, using a callback named found:

my $td = Trav::Dir->new (
    only => qr!hanzierrorlog!,
    callback => \& found,
);
$td->find_files ('/mount/backup/incremental/2019');
sub found
{
    my (undef, $file) = @_;
    unlink $file or warn "Failed to unlink $file: $!";
}

preprocess
Trav::Dir->new (preprocess => \& my_function);

A function which preprocesses the list of files of a directory. It is called in the form

preprocess ($data, $dir, \@files);

where $data is what is specified with "data", $dir is the directory of the files, and @files is the list of files in that directory.

Trav::Dir does not call chdir, and the file names in @files are not qualified; that is, they do not contain the directory of the file, $dir. To test or open a file from within your preprocess function, use "$dir/$file".

To alter what files are processed, alter the array which the reference you get points to. For example, to stop processing of the directory, use

@$files = ();

This may change in a future version of the module.

This option was implemented as a substitute for the preprocess method of "File::Find" when I replaced its use by use of Trav::Dir, for an incremental backup system, to prevent the backup system going into directories flagged not to be backed up.
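The following sketch shows one way a preprocess function might keep the traversal out of flagged directories. The marker file name .nobackup, the subroutine name skip_flagged, and the temporary directory layout are assumptions made up for this illustration; they are not part of Trav::Dir.

```perl
use strict;
use warnings;
use File::Temp 'tempdir';
use Trav::Dir;

# Build a throwaway tree in which "skip" is flagged by a marker file.
my $top = tempdir (CLEANUP => 1);
for my $sub (qw!keep skip!) {
    mkdir "$top/$sub" or die "Cannot make $top/$sub: $!";
}
for my $name ('keep/a.txt', 'skip/b.txt', 'skip/.nobackup') {
    open my $out, '>', "$top/$name" or die "Cannot write $top/$name: $!";
    print $out "x\n";
    close $out or die "Cannot close $top/$name: $!";
}

my @files;
my $o = Trav::Dir->new (
    preprocess => \& skip_flagged,
    no_dir => 1,
);
$o->find_files ($top, \@files);
print "$_\n" for @files;

# Called as skip_flagged ($data, $dir, \@files). The names in @$files
# are unqualified, so the marker is compared as a bare file name.
sub skip_flagged
{
    my ($data, $dir, $files) = @_;
    if (grep { $_ eq '.nobackup' } @$files) {
        # Empty the list so that nothing in $dir is processed.
        @$files = ();
    }
}
```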

rejfile
Trav::Dir->new (rejfile => qr!~$!);

Regex for rejecting files. If a file matches this regex it is never sent to the callback specified with "callback".

This was implemented for things such as the above example, where ~ is the character used by Emacs editor backups, to prevent old editor backup files from being indexed by a search system.

verbose
Trav::Dir->new (verbose => 1);

Print a lot of messages about how the files are being processed.

SEE ALSO

CPAN

There are a number of other CPAN modules for going into directories and making a list of files.

File::Find

This is a Perl version of the Unix "find" utility. It is part of the Perl core so is installed with Perl by default.

Alternatives to File::Find

These modules offer alternatives to File::Find but are not based on it.

File::chdir::WalkDir

File::Find::Declare

Moose-based

File::Find::Node

"Object oriented directory tree traverser"

File::Find::Object

"An object oriented File::Find replacement"

File::Next

Path::Class::Iterator

"walk a directory structure"

Path::Class::Rule

"Iterative, recursive file finder with Path::Class"

Path::Iterator::Rule

File::Find extensions

These extend "File::Find" in various ways.

File::Find::CaseCollide

"find collisions in filenames, differing only in case"

File::Find::Duplicates

File::Find::Rex

"Combines simpler File::Find interface with support for regular expression search criteria."

File::Find::Rule

"Alternative interface to File::Find"

It features very comprehensive tests for different kinds of files which you can chain together to get lists of files.

File::Find::utf8

"Fully UTF-8 aware File::Find"

It forces the file names from bytes to characters.

File::Find assistants

These help you to use "File::Find".

File::Find::Closures

"functions you can use with File::Find"

File::Finder

"nice wrapper for File::Find ala find"

It writes wanted subroutines for File::Find.

File::Find::Wanted

"More obvious wrapper around File::Find"

Other

File::Find::Match

"Perform different actions on files based on file name."

File::Find::Random

The documentation doesn't make it very clear what this does.

HISTORY

Trav::Dir was created as an alternative to "File::Find" with the following merits:

No need for closures or global variables

Trav::Dir eliminates the need for closures or global variables by allowing the user to supply a "data" argument to "new".

No pseudo-global variables

File::Find communicates with the user routine it calls wanted using various pseudo-global variables like $File::Find::name. Trav::Dir uses standard Perl subroutine arguments in callbacks. See "callback" and "preprocess".

Pattern-matching

File::Find has no facility to match directories or files against patterns. Instead each and every directory and file must be handled by a user callback, and the user callback must interact with File::Find using lengthy fully-qualified variables like $File::Find::prune.

Trav::Dir greatly simplifies the selection of files by allowing regex arguments like "no_trav", "only" and "rejfile" to sort through directories and file names.
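To make the comparison concrete, here is a sketch which lists the same files in both styles, skipping a .git directory and editor backup files ending in ~. The directory layout and file names are assumptions made up for the example; the File::Find half uses only its documented find function and its $File::Find::prune and $File::Find::name variables.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp 'tempdir';
use Trav::Dir;

# Build a throwaway tree containing one wanted file, one editor
# backup, and one file inside a .git directory.
my $top = tempdir (CLEANUP => 1);
mkdir "$top/.git" or die "Cannot make $top/.git: $!";
for my $name ('keep.txt', 'old.txt~', '.git/config') {
    open my $out, '>', "$top/$name" or die "Cannot write $top/$name: $!";
    print $out "x\n";
    close $out or die "Cannot close $top/$name: $!";
}

# File::Find: all of the selection logic lives in the wanted routine.
my @ff;
find (sub {
    if (-d $_ && $_ eq '.git') {
        # Don't descend into this directory.
        $File::Find::prune = 1;
        return;
    }
    return if /~$/;
    push @ff, $File::Find::name if -f $_;
}, $top);

# Trav::Dir: the same selection expressed as options to "new".
my @td;
my $o = Trav::Dir->new (
    no_trav => qr!/\.git$!,
    rejfile => qr!~$!,
    no_dir => 1,
);
$o->find_files ($top, \@td);

print "File::Find: @ff\n";
print "Trav::Dir: @td\n";
```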

Documentation

File::Find has problems with its documentation. These include wrong statements, undocumented variables like $File::Find::prune, and oddities such as calling the user callback wanted and then writing that the subroutine is misnamed, or having a CAVEAT section containing two caveats followed by a BUGS AND CAVEATS section containing only one caveat and no bugs.

I've reported some of them to the Perl bug list. I've also submitted a pull request to correct some of the problems. Please see there for details if you would like to contribute.

Does not call chdir

Trav::Dir does not call chdir. All returned file names are fully-qualified.

Prior to creating this module I was regularly using "File::Find" and I had also used "File::Find::Rule", as well as using code such as

my $pm = `find . -name '*.pm'`;
my @pm = split /\n/, $pm;

The bulk of Trav::Dir's code was taken from scripts written as an alternative to either File::Find and friends or backtick calls like the above. The scripts had been in use for several years in various places. The random-looking names of the options to "new" are just the names from the old scripts.

Since starting this module in February 2021, I've been able to replace all uses of File::Find, backticks, and the other scripts, with Trav::Dir.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2021 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.