NAME
File::Repl - Perl module that provides file replication utilities
SYNOPSIS
use File::Repl;
%con = (
    dira    => 'C:/perl',
    dirb    => 'M:/perl',
    verbose => '1',
    age     => '10',
);
$ref=File::Repl->New(\%con);
$r1 = $ref->Update('\.p(l|m)','a<>b',1);
$r2 = $ref->Update('\.t.*','\.tmp$','a<>b',1);
DESCRIPTION
File::Repl provides simple file replication and management utilities. Its main functions are:
- File Replication
-
Allowing two directory structures to be maintained, ensuring files that meet selection logic criteria are mirrored and otherwise synchronized.
- Bulk Renaming
-
Allowing files in a directory structure to be renamed according to the selection logic.
- Compressing
-
Allowing files in a directory structure to be compressed according to a given logic.
- Process
-
Run a common perl process against files in a directory structure according to selection logic.
- Deletion
-
Allowing files in a directory structure to be deleted according to the selection logic.
METHODS
- New(%con)
-
The New method constructs a new File::Repl object. Options are passed as a hash reference, \%con, which defines the directories to be operated on and other parameters. The directories are scanned and each file is stat'ed. The hash keys have the following definitions:
- dira
-
This identifies the first directory to be scanned (required).
- dirb
-
This identifies the second directory to be scanned (required). If only single-directory methods are to be called on the object, dirb can be set to the same value as dira. This minimizes the directory structure to be searched.
- verbose
-
The verbose flag has several valid values:
- verbose = 0
-
No verbosity (default mode).
- verbose = 1
-
All file copies and deletes are printed.
- verbose = 2
-
Tombstone file truncations and any timestamp changes made are printed. Any file copies or deletes that would have been made but failed the age limit criteria are also printed.
- verbose = 3
-
Configuration settings (from %con) and Files meeting the match criteria are printed.
- verbose = 4
-
Files identified in each directory that match the regex requirements (from the Update method) are printed.
- age
-
This specifies the maximum age of a file in days. Files older than this will be ignored by Update, Rename, Compress and Delete methods.
If the age is specified as a negative number files newer than this age will be ignored.
A default value of zero causes no age limit to be tested - all files are accepted on age limits.
- recurse
-
When set to FALSE, only files at the top level of dira and dirb are scanned. Default value is TRUE.
- ttl
-
This is the time to live (ttl) for any tombstoned file, in days. Default value is 31.
- nocase
-
Switches for case sensitivity - default is TRUE (case insensitive).
- mkdirs
-
If set TRUE and either directory dira or dirb does not exist, an attempt is made to create it. Default value is FALSE.
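The age rule described above can be sketched in a few lines of Perl. This is an illustration only, not the module's code: the helper name passes_age is hypothetical, and treating a file exactly at the limit as accepted is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): decide whether a file passes
# the 'age' option, given its mtime and the current time in epoch seconds.
sub passes_age {
    my ($mtime, $age, $now) = @_;
    return 1 if $age == 0;                      # zero: no age limit is tested
    my $age_days = ($now - $mtime) / 86_400;    # file age in days
    return $age > 0
        ? ($age_days <= $age  ? 1 : 0)          # positive: older files are ignored
        : ($age_days >= -$age ? 1 : 0);         # negative: newer files are ignored
}

my $now = time;
# A 5-day-old file checked against age => 10 and age => -10.
printf "age  10: %d\n", passes_age($now - 5 * 86_400,  10, $now);
printf "age -10: %d\n", passes_age($now - 5 * 86_400, -10, $now);
```

With age => 10 the 5-day-old file is accepted; with age => -10 it is ignored because it is newer than the limit.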
- Update(regex, [noregex,] action, commit)
-
The Update method makes the file updates requested, as determined by the %con hash (from the New method) and up to four arguments (noregex is optional).
This method also allows files to be tombstoned (ie removed from the replicated file sets). A file is tombstoned by appending .remove to the file name. The first Update will cause the file to be set to zero size, and any replica files to be renamed (so that the original file does not return). The next update after the ttl has expired will cause deletion of all file replicas.
If a directory is tombstoned (by adding .remove to its name) the directory and contents are removed and a file with the directory name and the .remove suffix replaces it. The file is removed as a normally tombstoned file. Note that tombstoning ignores the Update action qualifiers.
The Update method returns a reference to data structures evaluated during the method call. This is based on the method arguments, and allows arrays and hashes of the file structure meeting the selection criteria to be returned. See "EXAMPLES". Note that the aonly, bonly, amatch and bmatch array references, and the common hash reference, all refer to the file structure state BEFORE the Update method makes any changes.
- regex
-
A regular expression, used to match all file names that are to be maintained.
- noregex
-
An optional regular expression used to match all files not to be maintained (ie excluded from the operation).
- action
-
Defines the action to be performed. Note that tombstoning activities ignore the action and assume the A<>B directive for those files and directories being tombstoned.
- a>b
-
Files in the 'a' directory are to be replicated to the 'b' directory if a replica exists in 'b' directory and the timestamp is older than that of the file in the 'a' directory.
- a<b
-
Files in the 'b' directory are to be replicated to the 'a' directory if a replica exists in 'a' directory and the timestamp is older than that of the file in the 'b' directory.
- a<>b
-
Files in the 'a' directory are to be replicated to the 'b' directory if a replica exists in 'b' directory and the timestamp is older than that of the file in the 'a' directory. Files in the 'b' directory are to be replicated to the 'a' directory if a replica exists in 'a' directory and the timestamp is older than that of the file in the 'b' directory.
- A>B
-
Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified.
- A>B!
-
Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified. Orphan files in the 'b' directory are deleted.
- A<B
-
Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified.
- A<B!
-
Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified. Orphan files in the 'a' directory are deleted.
- A<>B
-
Files in the 'a' directory are to be replicated to the 'b' directory - even if no replica exists in 'b' directory. If a replica already exists in the 'b' directory with a timestamp that is newer than that of the file in the 'a' directory it is not modified. Files in the 'b' directory are to be replicated to the 'a' directory - even if no replica exists in 'a' directory. If a replica already exists in the 'a' directory with a timestamp that is newer than that of the file in the 'b' directory it is not modified.
- commit
-
When set TRUE, the required changes are made. Set FALSE to only show potential changes (which are printed to STDOUT).
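The difference between the lower-case and upper-case actions can be sketched as follows. This is a simplified illustration rather than the module's implementation: plan_a_to_b is a hypothetical helper, the two hashes stand in for scanned directories (file name mapped to mtime), and age limits, tombstones and the orphan deletion of the "!" variants are ignored.

```perl
use strict;
use warnings;

# Hypothetical sketch: $dira and $dirb map relative file names to mtimes.
# Returns the names that would be copied from a to b.
#   $create_missing = 0  behaves like 'a>b' (a replica must already exist in b)
#   $create_missing = 1  behaves like 'A>B' (missing replicas are created)
sub plan_a_to_b {
    my ($dira, $dirb, $create_missing) = @_;
    my @copy;
    for my $name (sort keys %$dira) {
        if (exists $dirb->{$name}) {
            # Copy only when the replica in b is older than the file in a.
            push @copy, $name if $dira->{$name} > $dirb->{$name};
        }
        elsif ($create_missing) {
            push @copy, $name;
        }
    }
    return @copy;
}

my %dira = ('foo.pl' => 200, 'bar.pl' => 100, 'new.pm' => 300);
my %dirb = ('foo.pl' => 100, 'bar.pl' => 150);

print "a>b: @{[ plan_a_to_b(\%dira, \%dirb, 0) ]}\n";   # a>b: foo.pl
print "A>B: @{[ plan_a_to_b(\%dira, \%dirb, 1) ]}\n";   # A>B: foo.pl new.pm
```

Under these assumptions, the a<b and A<B cases are the same call with the two hashes swapped, and a<>b or A<>B is both directions combined.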
- Rename(regex, [noregex], namesub, commit)
-
The Rename method is used to rename files in the dira directory structure in the object specified in the New method.
- regex
-
A regular expression, used to match all file names that are to be renamed.
- noregex
-
An optional regular expression used to match all files not to be renamed (ie excluded from the operation).
- namesub
-
The argument to a perl substitution command, which is applied to the file name to create the file's new name.
e.g. /\.pl$/\.perl/
This example will rename all files (that meet the regex and noregex criteria) from .pl to .perl
- commit
-
When set TRUE, the required renames are made. Set FALSE to only show potential changes (which are printed to STDOUT).
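The effect of a namesub argument on a single file name can be sketched as below. The helper apply_namesub is hypothetical, and splitting the argument on unescaped slashes is an assumption about its form. The module may apply the substitution differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): apply a namesub string of the
# assumed form "/pattern/replacement/" to a file name.
sub apply_namesub {
    my ($name, $namesub) = @_;
    # Split on slashes that are not escaped with a backslash.
    my (undef, $pattern, $replacement) = split m{(?<!\\)/}, $namesub, -1;
    return $name unless defined $pattern && defined $replacement;
    $replacement =~ s/\\(.)/$1/g;               # drop escaping backslashes
    (my $new = $name) =~ s/$pattern/$replacement/;
    return $new;
}

print apply_namesub('bar.pl', '/\.pl$/\.perl/'), "\n";   # bar.perl
print apply_namesub('foo.pm', '/\.pl$/\.perl/'), "\n";   # foo.pm (no match)
```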
- Process
-
Not yet implemented
- Compress
-
Not yet implemented
- Delete(regex, [noregex], commit)
-
The Delete method removes files from the dira directory structure in the object specified in the New method.
- regex
-
A regular expression, used to match all file names that are to be deleted.
- noregex
-
An optional regular expression used to match all files not to be deleted (ie excluded from the operation).
- commit
-
When set TRUE, the required deletions are made. Set FALSE to only show potential changes (which are printed to STDOUT).
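The way regex, noregex and the nocase option combine to select files can be sketched as follows. The helper select_files is hypothetical and works on a plain list of names. It only prints what a dry run (commit FALSE) might report and deletes nothing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Repl): keep names that match $regex
# and do not match the optional $noregex; $nocase makes both case insensitive.
sub select_files {
    my ($names, $regex, $noregex, $nocase) = @_;
    my $match = $nocase ? qr/$regex/i : qr/$regex/;
    my $skip  = defined $noregex
              ? ($nocase ? qr/$noregex/i : qr/$noregex/)
              : undef;
    return grep { $_ =~ $match && !(defined $skip && $_ =~ $skip) } @$names;
}

my @names = ('notes.txt', 'scratch.tmp', 'Test.T', 'main.pl');

# Same patterns as the second Update call in the SYNOPSIS.
print "would delete: $_\n" for select_files(\@names, '\.t.*', '\.tmp$', 1);
```

Here notes.txt and Test.T are selected, scratch.tmp is excluded by noregex, and main.pl never matches.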
- Version
-
The Version method returns the File::Repl module version. No calling argument is necessary.
REQUIRED MODULES
File::Find;
File::Copy;
File::Basename;
Win32::API (Win32 platforms only)
TIMEZONE AND FILESYSTEMS
On FAT filesystems, mtime resolution is 2 seconds, so a fudge of 2 seconds is used for synching FAT with other filesystems. Note that FAT filesystems save the local time in UTC (GMT).
On FAT filesystems, "stat" adds TZ_BIAS to the actual file times (atime, ctime and mtime) and conversely "utime" subtracts TZ_BIAS from the supplied parameters before setting file times. To maintain FAT at UTC time, we need to do the opposite.
If we don't maintain FAT filesystems at UTC time and the repl is between FAT and NON-FAT systems, then all files will get replicated whenever the TZ or Daylight Savings Time changes.
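The 2 second fudge amounts to a tolerant timestamp comparison, sketched below. The names FAT_FUDGE and is_newer are hypothetical, and the exact comparison the module performs is not shown in this document, so treat this as an assumption.

```perl
use strict;
use warnings;

use constant FAT_FUDGE => 2;    # seconds of tolerance (assumed from the text above)

# Hypothetical helper: file a counts as newer than file b only when its mtime
# exceeds b's by more than the fudge, so rounding differences between
# filesystems do not trigger a copy.
sub is_newer {
    my ($mtime_a, $mtime_b, $fudge) = @_;
    $fudge = FAT_FUDGE unless defined $fudge;
    return ($mtime_a - $mtime_b) > $fudge ? 1 : 0;
}

print is_newer(1002, 1000), "\n";   # 0: within the fudge, treated as the same
print is_newer(1003, 1000), "\n";   # 1: genuinely newer
```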
EXAMPLES
A simple example that retrieves and prints the working variables from the Update method
$ref = File::Repl->New(\%hash);
$my  = $ref->Update('.*', 'A>B', 1);
$sub = sub { # simple sub that determines the reference type and prints the associated values
    my ($ref) = $_[0];
    if ( ref($ref) eq "SCALAR" ) {
        print " SCALAR $$ref\n";    # print the referenced value
    } elsif ( ref($ref) eq "ARRAY" ) {
        print " ARRAY";
        foreach (@$ref) {
            print "\t$_\n";
        }
    } elsif ( ref($ref) eq "HASH" ) {
        print " HASH ";
        foreach (keys %$ref) {
            print "\t$_ => $$ref{$_}\n";
        }
    } elsif ( ref($ref) eq "REF" ) {
        &$sub($$ref);               # reference to a reference: recurse
    } else {
        print " VALUE\t$ref\n";
    }
    print "\n";
};
foreach my $key (sort keys %$my) {
    print "$key:\n";
    &$sub($$my{$key});
}
and a sample output
References and values of $my
amatch:
ARRAY /a/b/c/d/e/dummy.c
/a/b
/a/b/c/d/e/bar.pl
/a/b/c/d/e/ABCDE.XYZ
/a
/a/b/c/d/e/foo.tst
/a/b/c/d
/a/b/c/d/e
/a/b/c
aonly:
ARRAY /a/b/c/d/e/foo.tst
/a/b/c/d/e/dummy.c
/a/b/c/d/e/ABCDE.XYZ
bmatch:
ARRAY /a/b
/a/b/c/d/e/bar.pl
/a
/a/b/c/d
/a/b/c/d/e
/a/b/c
bonly:
ARRAY
common:
HASH /a/b => /a/b
/a/b/c/d/e/bar.pl => /a/b/c/d/e/bar.pl
/a => /a
/a/b/c/d => /a/b/c/d
/a/b/c/d/e => /a/b/c/d/e
/a/b/c => /a/b/c
The amatch and bmatch array references are those files and directories in the dira and dirb structures that met the regex and noregex regular expression criteria. The aonly and bonly array references give those files and directories that exist only in that directory structure.
The common hash reference identifies those files and directories that exist in both the dira and dirb directory structures. The key is for dira, and the value for dirb. Note that, depending on the nocase value, the key and value may show differences in case on FAT and NTFS file systems.
A similar approach could be used to determine the referenced data from $ref. This would give access to
- alist (blist)
-
A hash of file names (the key) and values (mtime) of all files in the dira (or dirb) structure.
- atype (btype)
-
A hash of file names (the key) and values (file mode - from a stat operation) of all files in the dira (or dirb) structure.
In addition the scalar values of various settings determined when the New method is called can be determined.
The following script can be called from a Windows Explorer prompt as an alternative to the Windows delete function (a Windows delete might be reversed by a later replication). This will only work for files and directories that are regularly synchronised using this module.
use strict;
use warnings;
my (@files) = @ARGV;
END { print "\nDONE -- PRESS ENTER\n"; <STDIN> };
print << "End_Of_Header";
================================================================================
Executing $0\n
This will mark files and/or directories for removal by the File::Repl file
synchronisation utility.
Files will be set to zero size when first processed by the File::Repl module,
and finally removed after the tombstone period is expired.
To reverse this process simply remove the added .remove file extension immediately
================================================================================
End_Of_Header
foreach my $file (@files) {
    print "$file\n";
    if ($file =~ m/\.remove$/) {
        print "\t-is already marked for removal\n";
    } elsif (-f $file) {
        unless (rename "$file", "$file.remove") {
            print "Unable to rename $file to $file.remove\n";
        } else {
            print "\t-marked for removal\n";
        }
    } elsif (-d $file) {
        unless (rename "$file", "$file.remove") {
            print "Unable to rename $file to $file.remove\n";
        } else {
            print "\t-marked for removal\n";
        }
    } else {
        print "File $file not found !!\n";
    }
}
AUTHOR
Dave Roberts <droberts@cpan.org>
ACKNOWLEDGMENTS
Thanks to Nigel Hodgson for his many contributions in developing this utility and helping to understand file system specifics.
SUPPORT
You can send bug reports and suggestions for improvements on this module to me at droberts@cpan.org. However, I can't promise to offer any other support for this package.
COPYRIGHT
This module is Copyright © 2000 to 2010 Dave Roberts. All rights reserved.
This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself. This script is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. The copyright holder of this script can not be held liable for any general, special, incidental or consequential damages arising out of the use of the script.
CHANGE HISTORY
$Log: Repl.pm $ Revision 2.3 2015/11/03 18:07:45 Dave.Roberts interim release - corrections to help manage situations when a file and directory are compared
Revision 2.2 2015/11/01 22:36:38 Dave.Roberts removed Win32::Admin requirement from documentation
Revision 2.1 2015/07/15 20:51:29 Dave.Roberts added timestamp info for reading directories
Revision 2.0 2015/07/15 20:00:30 dave New major version, now with Win32::AdminMisc dependency removed as this module becomes more difficult to acquire and build for recent Perl releases
Revision 1.1 2015/07/15 19:58:06 Dave.Roberts Initial revision
Revision 1.31 2014/01/25 21:27:59 Dave.Roberts as advised from CPAN testing modified to include =encoding utf8 and escape the < and > characters in the pod with E<lt> and E<gt> respectively.
Revision 1.29 2010/05/04 15:02:05 Dave.Roberts corrected documentation - layout near Update method was incorrect
Revision 1.28 2010/04/27 14:55:00 Dave.Roberts minor code improvements in output messages for the Delete method
Revision 1.27 2010/04/13 08:36:52 Dave.Roberts added functionality for testing negative ages. This allows files older than the age specified to be selected (excluding all files younger)
Revision 1.26 2010/04/12 16:29:57 Dave.Roberts added Version method to return the File::Repl version corrected silly mistake in documentation - in definition of %con hash
Revision 1.25 2010/04/12 16:04:54 Dave.Roberts added example script for tombstoning removed windows linefeed characters from file
Revision 1.24 2010/04/07 02:00:11 Dave.Roberts modified code to remove the use of a hash as a reference - this was generating warnings as this use of a hash has been deprecated.
Revision 1.21 2002/02/07 10:37:39 Dave.Roberts corrected mode identified for Update method (the check used previously was invalid), and also synopsis for use of Update method (args incorrectly ordered)
Revision 1.20 2002/01/09 12:51:17 Dave.Roberts corrected errors in tombstoning of directories - subs $del and $deltree in particular
Revision 1.19 2001/11/21 21:28:19 Dave.Roberts resolved error in determining file age, especially when the 'a' file is missing evaluated the current time at start (set $runtime), and then removed many "time" calls
Revision 1.18 2001/08/22 07:10:41 Dave.Roberts logic change so that we don't use the Win32::API on win9x machines
Revision 1.17 2001/08/03 09:38:29 Dave.Roberts corrected code error (lines 572/3) where $$ was incorrectly used in truncation code
Revision 1.16 2001/08/02 22:09:02 Dave.Roberts corrected code for the Rename routine
Revision 1.15 2001/07/17 21:05:43 Dave.Roberts small changes to _arraysort - simplifying code
Revision 1.14 2001/07/12 21:51:50 jj768 additional documentation - and minor code changes
Revision 1.13 2001/07/12 15:18:43 Dave.Roberts code tidy up and reorganisation fixed logic errors (A>B! mode in Update method was not copying new files from A to B), also for A<B! removed several local variables and used referred object directly
Revision 1.12 2001/07/11 10:30:16 Dave.Roberts resolved various errors introduced in 1.11 - mainly associated with reference errors rehacked fc subroutine - to give more logical messages still in need of more documentation - esp of object reference returned and associated variables
Revision 1.11 2001/07/06 14:52:53 jj768 double referencing of blessed object removed (from New method) and subsequent methods updated. Requires Testing. Update and other methods now return reference to data arrays and hashs evaluated during method call
Revision 1.10 2001/07/06 08:23:48 Dave.Roberts code changes to allow the volume info to be detected correctly using Win32::AdminMisc when a drive letter is specified (was only working with UNC names)
Revision 1.9 2001/06/27 13:35:53 Dave.Roberts minor presentation changes
Revision 1.8 2001/06/27 12:59:22 jj768 logic to prevent "Use of uninitialized value in pattern match (m//)" errors on use of $vol{FileSystemName}
Revision 1.6 2001/06/21 12:32:15 jj768
*** empty log message ***
Revision 1.5 2001/06/20 20:39:21 Dave.Roberts minor header changes
Revision 1.4 2001/06/20 19:55:21 jj768 re-built module source files as per perlmodnew manpage
Revision 1.1 2001/06/20 19:53:03 Dave.Roberts Initial revision
Revision 1.3.5.0 2001/06/19 10:34:11 jj768 Revised calling of the New method to use a hash reference, rather than a hash directly
Revision 1.3.4.0 2001/06/19 09:48:38 jj768 intermediate development revision. Introduced Delete method and the _generic subroutine (used for all methods except New) this is preparatory to the hash being passed as a reference
Revision 1.3.3.0 2001/06/14 15:42:48 jj768 minor code changes in constructing hash and improvement in documentation -still need more docs on Timezones.