NAME

Mpp::BuildCache -- subroutines for handling the makepp build cache

SYNOPSIS

$bc = new Mpp::BuildCache("/path/to/build_cache", $create_flags_hash);
$bc->cache_file($file_info_of_file_to_archive, $file_key);
$bc_entry = $bc->lookup_file($file_key);

$build_info = $bc_entry->build_info;
$bc_entry->copy_from_cache($output_finfo);

The Mpp::BuildCache package

Mpp::BuildCache is the cache that makepp uses to store the results of compilation so that they can be reused later. If a file with the same input signature is needed again, it can be fetched from the cache immediately instead of being rebuilt. This can cut down compilation time significantly in a number of cases. For example:

  • Suppose you compile all files in your program for optimization. Then you find a bug and you recompile for debug. Then you fix the bug and you want to recompile for optimization. Most of the source files haven't changed, but you just wiped out all the .o files when you turned off optimization, so without a build cache you'd have to recompile everything. With the build cache, an extra copy of the file was made and stored in the cache, so it can be fetched again, instead of recompiling.

  • Suppose you have checked out several copies of your sources into several different directory trees, and have made small modifications to each tree. Now most of the files are the same across the directory trees, so when you compile another directory tree, it can fetch most of the compiled files from the build cache created when you built the first directory tree.

  • Suppose you have 5 developers all working on approximately the same set of sources. Once again, most of their files will be identical. If one person compiles a file, the remaining developers can fetch the file from the build cache rather than compiling it for themselves.

Cache format

The cache is actually a directory hierarchy where the filename of each file is derived from the build cache key. For example, if the build cache key of a file is 0123456789abcdef, the actual file name might be 01/234/56789abcdef_xyz.o. On some file systems, performance suffers if there are too many files per directory, so Mpp::BuildCache can automatically break the keys up into directories as shown.
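
As a rough illustration of the layout described above, the following sketch splits a key into directory levels. The function name key_to_path and the level widths (2 and 3 characters) are assumptions chosen to match the example; they are not taken from Mpp::BuildCache itself, and the sketch ignores the _xyz.o style suffix.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mpp::BuildCache: split a cache key into
# nested directory names so that no single directory collects too many files.
# @widths gives the number of key characters consumed by each directory level.
sub key_to_path {
  my ($key, @widths) = @_;
  my @parts;
  for my $width (@widths) {
    last if length($key) <= $width;            # keep something for the file name
    push @parts, substr($key, 0, $width, '');  # 4-arg substr removes the prefix
  }
  return join '/', @parts, $key;
}

print key_to_path('0123456789abcdef', 2, 3), "\n";   # prints 01/234/56789abcdef
```

Keys shorter than the requested widths simply produce fewer directory levels.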

Mpp::BuildCache remembers the key that it was given, which is presumably some sort of hash of all the inputs that went into building the file. It also remembers the build info structure for the file. This is intended to help in the very rare case where there is a collision in the key, and several files have the same key. Mpp::BuildCache cannot store multiple files with the same key, but by storing the build information it is at least possible to determine that the file found in the cache is the wrong one.
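
The collision check can be pictured roughly as follows. This is only a sketch: build_info_matches is a hypothetical name, the build info is modelled as a plain hash, and the field names in the usage lines are illustrative rather than the fields makepp actually compares.

```perl
use strict;
use warnings;

# Hypothetical check, not the Mpp::BuildCache implementation: a cached file is
# accepted only if every field we expect is present in the stored build info
# with the same value. A mismatch suggests a key collision.
sub build_info_matches {
  my ($stored, $expected) = @_;
  for my $field (keys %$expected) {
    return 0 unless defined $stored->{$field}
      && $stored->{$field} eq $expected->{$field};
  }
  return 1;
}

# Illustrative field names only.
my %stored   = (COMMAND => 'cc -O2 -c a.c', DEPS => 'a.c a.h');
my %expected = (COMMAND => 'cc -g -c a.c',  DEPS => 'a.c a.h');
print build_info_matches(\%stored, \%expected) ? "use cached file\n" : "wrong file\n";
```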

Use of Mpp::File

We do not use the Mpp::File class to store information about the files in the build cache. The reason is that we don't want to waste the memory storing all the results. Typically things are looked up once in the build cache and never examined again, so it's a waste of memory to build up the Mpp::File structures for them. For this reason, for any files in the build cache directories, we do the stat and other operations directly instead of calling the Mpp::File subroutines.

We do use the Mpp::File subroutines for files stored elsewhere, however.

new Mpp::BuildCache("/path/to/cache");

Opens an existing build cache.

cache_file

$build_cache->cache_file($file_info, $file_key, $build_info);

Copies or links the file into the build cache with the given file key. Also the build information is stored alongside the file so that when it is retrieved we can verify that in fact it is exactly what we want.

Returns a true value if the operation succeeded, false if any part failed. If anything failed in updating the build cache, the cache is cleaned up and left in a consistent state.
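
One common way to get this kind of all-or-nothing behaviour is to copy to a temporary name in the destination directory and then rename it into place. The sketch below shows that general technique; it is an assumption for illustration, not a description of what cache_file actually does, and install_atomically and the temporary name pattern are hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: install $src as $dest so that $dest either appears
# complete or not at all. Returns 1 on success, 0 on failure with $! set.
sub install_atomically {
  my ($src, $dest) = @_;
  my $tmp = "$dest.tmp.$$";   # same directory, so rename stays on one file system
  unless (copy($src, $tmp) && rename($tmp, $dest)) {
    my $err = $!;             # remember the error before cleaning up
    unlink $tmp;              # remove any partial copy
    $! = $err;
    return 0;
  }
  return 1;
}
```

On failure nothing is left behind under the destination name, so a reader never sees a half-written file.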

lookup_file

$bc_entry = $bc->lookup_file($file_key);

Looks up a file by its cache key. Returns undef if the file does not exist in the cache. Returns a Mpp::BuildCache::Entry structure if it does exist. You can query the Mpp::BuildCache::Entry structure to see what the build info is, or to copy the file into the current directory.

copy_check_md5

my $md5;
my $result = copy_check_md5("in", "out", \$md5, $setmode);

Assuming that the input file is atomically generated and removed, copy_check_md5 will either copy the file as-is or return an empty list with $! set, even if the input file is unlinked and/or re-created concurrently, even over NFS. Mode bits are copied as well if $setmode is true. copy_check_md5 will instead die if it detects that the input file is not being written atomically, or if it detects something that it can't explain.

If a Digest object is provided as a third argument, then the file's content is added to it. It may be modified even if the copy fails. See Digest(3pm).

A successful copy will return a 2-element array consisting of the size and modification time of the input file.

If the return value is an empty array, then $! is set as follows:

ENOENT

The input file was removed while it was being read.

ESTALE

The output file was removed while it was being written, or the directory containing the input file was removed.

Others

Many other errors are possible, such as EACCES, EINTR, EIO, EISDIR, ENFILE, EMFILE, EFBIG, ENOSPC, EROFS, EPIPE, ENAMETOOLONG, ENOSTR. In most cases, these are non-transient conditions that require manual intervention, and should therefore cause the program to terminate.
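
A caller might separate the two errors that can result from concurrent access from everything else along these lines. The function name copy_error_is_transient is hypothetical, and treating exactly ENOENT and ESTALE as retryable is an interpretation of the list above, not something copy_check_md5 enforces. The sketch also assumes a platform whose Errno module defines ESTALE.

```perl
use strict;
use warnings;
use Errno qw(ENOENT ESTALE);

# Hypothetical classifier: true for errors that can be caused by another
# process removing the file during the copy, false for everything else.
sub copy_error_is_transient {
  my ($errno) = @_;
  return ($errno == ENOENT || $errno == ESTALE) ? 1 : 0;
}

# Typical use after a failed copy (copy_check_md5 itself is not called here):
#   my @result = copy_check_md5($in, $out, \$md5, $setmode);
#   die "copy failed: $!" unless @result || copy_error_is_transient($! + 0);
```

Passing $! + 0 forces the numeric error code rather than the message string.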

The Mpp::BuildCache::Entry package

A Mpp::BuildCache::Entry is an object returned by Mpp::BuildCache::lookup_file. You can do the following with the object:

absolute_filename

$bc_entry->absolute_filename

Returns the name of the file in the build cache.

copy_from_cache

$bc_entry->copy_from_cache($output_finfo, $rule, \$reason);

Replaces the file in $output_finfo with the file from the cache, and updates all the Mpp::File data structures to reflect this change. The build info signature is checked against the target file in the cache, and if $md5check is set, then the MD5 checksum is also verified.

Returns true if the file was successfully restored from the cache, false if not. (I think the only reason it wouldn't be successfully restored is that someone deleted the file from cache between the time it was returned from lookup_file and the time copy_from_cache is invoked.) If it returns false, then $reason is set to a string that explains why. If $reason ends with '(OK)', then the failure could have been due to legitimate concurrent access of the build cache. If it fails, then the output target is unlinked.
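
Putting lookup_file and copy_from_cache together, a caller might look roughly like this. try_fetch_from_cache is a hypothetical wrapper, and the warning policy is an assumption; only the two method calls and the '(OK)' convention come from the descriptions above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns 1 if the target was restored from the cache,
# 0 if the caller has to build it. $bc is any object with a lookup_file
# method whose result has a copy_from_cache method.
sub try_fetch_from_cache {
  my ($bc, $file_key, $output_finfo, $rule) = @_;
  my $bc_entry = $bc->lookup_file($file_key)
    or return 0;                                  # not in the cache
  my $reason;
  return 1 if $bc_entry->copy_from_cache($output_finfo, $rule, \$reason);
  $reason = 'unknown reason' unless defined $reason;
  # A reason ending in '(OK)' may just mean another process touched the cache.
  warn "build cache: $reason\n" unless $reason =~ /\(OK\)$/;
  return 0;
}
```

When the wrapper returns 0, the output target has already been unlinked, so the caller can simply run the normal build.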