NAME
Git::Repository::Plugin::GitHooks - A Git::Repository plugin with some goodies for hook developers
VERSION
version 4.0.0
SYNOPSIS
# load the plugin
use Git::Repository 'GitHooks';
my $git = Git::Repository->new();
my $config = $git->get_config();
my $branch = $git->get_current_branch();
my @commits = $git->get_commits($oldcommit, $newcommit);
my @files_modified_by_commit = $git->filter_files_in_index('AM');
my @files_modified_by_push = $git->filter_files_in_range('AM', $oldcommit, $newcommit);
DESCRIPTION
This module adds several methods useful to implement Git hooks to Git::Repository.
In particular, it is used by the standard hooks implemented by the Git::Hooks
framework.
CONFIGURATION VARIABLES
CONFIG_ENCODING
Git configuration files usually contain just ASCII characters, but values and sub-section names may contain any characters, except newline. If your config files have non-ASCII characters you should ensure that they are properly decoded by specifying their encoding like this:
$Git::Repository::Plugin::GitHooks::CONFIG_ENCODING = 'UTF-8';
The acceptable values for this variable are all the encodings supported by the Encode module.
METHODS FOR THE GIT::HOOKS FRAMEWORK
The following methods are used by the Git::Hooks framework and are not intended to be useful for hook developers. They're described here for completeness.
prepare_hook NAME, ARGS
This is used by Git::Hooks::run_hooks to prepare the environment for specific Git hooks before invoking the associated plugins. It's invoked with the arguments passed by Git to the hook script. NAME is the script name (usually the variable $0) and ARGS is a reference to an array containing the script positional arguments.
load_plugins
This loads every plugin configured in the githooks.plugin option.
invoke_external_hooks ARGS...
This is used by Git::Hooks::run_hooks to invoke external hooks.
post_hooks
Returns the list of post hook functions registered with the post_hook method below.
METHODS FOR HOOK DEVELOPERS
The following methods are intended to be useful for hook developers.
post_hook SUB
Plugin developers may be interested in performing some action depending on the overall result of every check made by every other hook. As an example, Gerrit's patchset-created hook is invoked asynchronously, meaning that the hook's exit code doesn't affect the action that triggered the hook. The proper way to signal the hook result for Gerrit is to invoke its API to make a review. But we want to perform the review once, at the end of the hook execution, based on the overall result of all enabled checks.
To do that, plugin developers can use this routine to register callbacks that are invoked at the end of run_hooks. The callbacks are called with the following arguments:
HOOK_NAME
The basename of the invoked hook.
GIT
The Git::Repository object that was passed to the plugin hooks.
ARGS...
The remaining arguments that were passed to the plugin hooks.
The callbacks may see if there were any errors signaled by the plugin hook by invoking the get_errors method on the GIT object. They may be used to signal the hook result in any way they want, but they should not die or they will prevent other post hooks from running.
cache SECTION
This may be used by plugin developers to cache information in the context of a Git::Repository object. SECTION is any string which becomes associated with a hash-ref. The method simply returns the hash-ref, which can be used by the caller to store any kind of information. Plugin developers are encouraged to use the plugin name as the SECTION string to avoid clashes.
get_config [SECTION [VARIABLE]]
This groks the configuration options for the repository by invoking git config --list. The configuration is cached in the Git::Repository object during the first invocation. So, if the configuration is changed afterwards, the method won't notice it. This is usually ok for hooks, though, which are short-lived.
With no arguments, the options are returned as a hash-ref pointing to a two-level hash. For example, if the config options are these:
section1.a=1
section1.b=2
section1.b=3
section2.x.a=A
section2.x.b=B
section2.x.b=C
Then, it'll return this hash:
{
'section1' => {
'a' => [1],
'b' => [2, 3],
},
'section2.x' => {
'a' => ['A'],
'b' => ['B', 'C'],
},
}
The first level keys are the part of the option names before the last dot. The second level keys are everything after the last dot in the option names. You won't get more levels than two. In the example above, you can see that the option "section2.x.a" is split in two: "section2.x" in the first level and "a" in the second.
The values are always array-refs, even if there is only one value for a specific option. For some options, it makes sense to have a list of values attached to them. But even if you expect a single value for an option, you may have it defined in the global scope and redefined in the local scope. In this case, it will appear as a two-element array, the last element being the local value.
So, if you want to treat an option as single-valued, you should fetch it like this:
$h->{section1}{a}[-1]
$h->{'section2.x'}{a}[-1]
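The two-level split described above can be illustrated with a small, self-contained sketch. The helper name parse_config_lines is hypothetical and this is not the plugin's actual implementation; it assumes one name=value pair per line and ignores complications such as multi-line values or equals signs inside sub-section names.

```perl
use strict;
use warnings;

# Hypothetical helper: build the two-level hash from "name=value" lines,
# such as those printed by `git config --list`.
sub parse_config_lines {
    my @lines = @_;
    my %config;
    for my $line (@lines) {
        # Split the name from the value at the first equals sign. An option
        # with no equals sign has no value and is stored as 'true'.
        my ($name, $value) = split /=/, $line, 2;
        $value = 'true' unless defined $value;

        # Everything before the last dot is the first-level key; the rest
        # is the second-level key. Lines without a dot are skipped.
        my ($section, $variable) = $name =~ /\A(.+)\.([^.]+)\z/
            or next;
        push @{ $config{$section}{$variable} }, $value;
    }
    return \%config;
}

my $h = parse_config_lines(qw(
    section1.a=1
    section1.b=2
    section1.b=3
    section2.x.a=A
    section2.x.b=B
    section2.x.b=C
));

# Treat the options as single-valued by taking the last value.
print $h->{section1}{b}[-1], "\n";
print $h->{'section2.x'}{a}[-1], "\n";
```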
If the SECTION argument is passed, the method returns the second-level hash for it. So, following the example above:
$git->get_config('section1');
This call would return this hash:
{
'a' => [1],
'b' => [2, 3],
}
If the section doesn't exist an empty hash is returned. Any key/value added to the returned hash will be available in subsequent invocations of get_config.
If the VARIABLE argument is also passed, the method returns the value(s) of the configuration option SECTION.VARIABLE. In list context the method returns the list of all values or the empty list, if the variable isn't defined. In scalar context, the method returns the variable's last value or undef, if it's not defined.
As a special case, options without values (i.e., with no equals sign after their names in the configuration file) are set to the string 'true' so that Perl recognizes them as true Booleans.
The string undef may be used to reset the list of values. Only values after the last occurrence of undef are considered, in both list and scalar context. This is a general way to cancel higher-level configurations (e.g., system or global) in lower-level ones (e.g., local). It works for every configuration option.
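As an illustration of the reset rule, here is a minimal sketch. The helper name effective_values is hypothetical; it only shows the "keep what follows the last undef" logic on a plain list of values.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the values after the last 'undef' marker.
sub effective_values {
    my @values = @_;
    my $last_reset = -1;
    for my $i (0 .. $#values) {
        $last_reset = $i if $values[$i] eq 'undef';
    }
    # An empty range yields an empty list when 'undef' is the last value.
    return @values[ $last_reset + 1 .. $#values ];
}

# Two values set globally, then reset and redefined locally.
my @values = effective_values('global-a', 'global-b', 'undef', 'local-c');
print "@values\n";
```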
get_config_boolean SECTION VARIABLE
Git configuration variables may be grokked as Booleans. (See git help config.) There are specific values meaning true (viz. yes, on, true, 1, and the absence of a value) and specific values meaning false (viz. no, off, false, 0, and the empty string).
This method checks the variable's value and returns 1 or 0 representing Boolean values in Perl. If the variable's value isn't recognized as a Git Boolean the method croaks. If the variable isn't defined the method returns undef.
In the Git::Hooks documentation, all configuration variables mentioning a BOOL value are grokked with this method.
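The rules above can be sketched as a stand-alone function. The name git_boolean is hypothetical, and the case-insensitive matching is an assumption based on how Git itself treats Boolean literals; the "absence of a value" case is covered because get_config stores such options as the string 'true'.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: $value is the raw option value, or undef if the
# variable isn't defined.
sub git_boolean {
    my ($value) = @_;
    return undef unless defined $value;
    # Assumption: literals are matched case-insensitively, as Git does.
    return 1 if $value =~ /\A(?:yes|on|true|1)\z/i;
    return 0 if $value =~ /\A(?:no|off|false|0|)\z/i;
    croak "'$value' is not a valid Git Boolean";
}

print git_boolean('on'), "\n";
print git_boolean('false'), "\n";
```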
get_config_integer SECTION VARIABLE
Git configuration variables may be grokked as integers. (See git help config.) They may start with an optional sign (+ or -), followed by one or more decimal digits, and end with an optional scaling factor letter, viz. k (1024), m (1024*1024), or g (1024*1024*1024). The scaling factor may be in lower or upper-case.
This method checks the variable's value format and returns the corresponding Perl integer. If the variable's value isn't recognized as a Git integer the method croaks. If the variable isn't defined the method returns undef.
In the Git::Hooks documentation, all configuration variables mentioning an INT value are grokked with this method.
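The integer format above can be sketched in a few lines. The name git_integer is hypothetical and this is only an illustration of the parsing rule, not the method's actual code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: $value is the raw option value, or undef if the
# variable isn't defined.
sub git_integer {
    my ($value) = @_;
    return undef unless defined $value;
    # Optional sign, decimal digits, optional scaling letter.
    my ($number, $scale) = $value =~ /\A([+-]?\d+)([kmg]?)\z/i
        or croak "'$value' is not a valid Git integer";
    my %factor = ('' => 1, k => 1024, m => 1024**2, g => 1024**3);
    return $number * $factor{ lc $scale };
}

print git_integer('10k'), "\n";
print git_integer('-2M'), "\n";
```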
check_timeout
If the configuration option githooks.timeout is set to a positive number, this method aborts the hook if more than that amount of time (in seconds) has passed since the start of the run. It's called in many places by Git::Hooks itself and by some of its plugins to try to stop runaway checks.
fault MESSAGE INFO
This method should be used by plugins to record consistent error or warning messages. It gets one or two arguments. MESSAGE is a multi-line string explaining the error. INFO is an optional hash-ref which may contain additional information about the message, which will be used to complement it.
A "complete" fault is formatted like this:
[PREFIX: CONTEXT]
MESSAGE
DETAILS
PREFIX gives contextual information about the message. It can be set via the prefix INFO hash key. If not, the package name of the function which called fault is used, which usually happens to be the name of the plugin which detected the error.
CONTEXT is additional contextual information, such as a reference name, a commit SHA-1, and a violated configuration option.
MESSAGE is the multi-line error message.
DETAILS is a multi-line string giving more details about the error. Usually showing error output from an external command.
Besides the MESSAGE, which is required, and the PREFIX, which has a default value, all other items must be supplied via the INFO hash-ref with the following keys:
prefix
A string giving broad contextual information about the error message. When absent, the prefix used is the package name of the function which called fault, which is usually a Git::Hooks plugin name.
commit
The SHA-1 or a Git::Repository::Log object representing a commit. It is shown in the CONTEXT area like this (as a short SHA-1):
[PREFIX: commit SHA-1]
ref
The name of a Git reference (usually a branch). It is shown in the CONTEXT area like this:
[PREFIX: on ref REF]
option
The name of a configuration option related to the error message. It is shown in the CONTEXT area like this:
[PREFIX: violates option 'OPTION']
details
A string containing details about the error message. If present, it is appended to the MESSAGE, separated by an empty line, and with its lines prefixed by two spaces.
The method simply records the formatted error message and returns. It doesn't die.
The messages can be colorized if they go to a terminal. This can be configured by the configuration options githooks.color and githooks.color.<slot>, which are explained in the section "CONFIGURATION" in Git::Hooks documentation.
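To make the fault layout more concrete, here is a rough sketch of a formatter. Everything in it is an assumption made for illustration: the name format_fault, joining several CONTEXT items with a single space, omitting the colon when there is no CONTEXT, and abbreviating the SHA-1 to seven characters. It handles a commit given as a string only, does no colorizing, and is not the plugin's actual code.

```perl
use strict;
use warnings;

# Hypothetical formatter for the "[PREFIX: CONTEXT] MESSAGE DETAILS" layout.
sub format_fault {
    my ($message, $info) = @_;
    $info ||= {};

    # Default PREFIX: the package name of the caller.
    my $prefix = defined $info->{prefix} ? $info->{prefix} : scalar caller;

    # Assumption: context items are joined with single spaces.
    my @context;
    push @context, 'commit ' . substr($info->{commit}, 0, 7)
        if defined $info->{commit};
    push @context, "on ref $info->{ref}" if defined $info->{ref};
    push @context, "violates option '$info->{option}'"
        if defined $info->{option};

    my $header = '[' . join(': ', $prefix, @context ? "@context" : ()) . ']';

    my $text = "$header\n$message\n";
    if (defined $info->{details}) {
        # Indent every line of the details by two spaces and separate
        # them from the message by an empty line.
        (my $details = $info->{details}) =~ s/^/  /mg;
        $details =~ s/\n*\z/\n/;
        $text .= "\n$details";
    }
    return $text;
}

# 'Example' and 'githooks.example.limit' are made-up names.
my $fault = format_fault(
    'Too long.',
    {
        prefix  => 'Example',
        ref     => 'refs/heads/main',
        option  => 'githooks.example.limit',
        details => "line one\nline two",
    },
);
print $fault;
```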
get_faults
This method returns a string specially formatted with all error messages recorded with the fault method, a header, and a footer, if requested by configuration.
fail_on_faults [WARN_ONLY]
By default (or if WARN_ONLY is false) if there are any faults registered so far by the fault method, this method logs the fault messages in the ERROR level and aborts by croaking.
If WARN_ONLY is true, it logs the fault messages in the WARN level.
undef_commit
The undefined commit is a special SHA-1 used by Git in the update and pre-receive hooks to signify that a reference either was just created (as the old commit) or has been just deleted (as the new commit). It consists of 40 zeroes.
empty_tree
The empty tree represents an empty directory for Git.
get_commit COMMIT
Returns a Git::Repository::Log object representing COMMIT.
get_commits OLDCOMMIT NEWCOMMIT [OPTIONS [PATHS]]
Returns a list of Git::Repository::Log objects representing every commit reachable from NEWCOMMIT but not from OLDCOMMIT.
There are two special cases, though:
If NEWCOMMIT is the undefined commit, i.e., '0000000000000000000000000000000000000000', this means that a branch, pointing to OLDCOMMIT, has been removed. In this case the method returns an empty list, meaning that no new commit has been created.
If OLDCOMMIT is the undefined commit, this means that a new branch pointing to NEWCOMMIT is being created. In this case we want all commits reachable from NEWCOMMIT but not reachable from any other branch. The syntax for this is "NEWCOMMIT ^B1 ^B2 ... ^Bn", i.e., NEWCOMMIT followed by every other branch name prefixed by carets. We can get at their names using the technique described in, e.g., this discussion.
The Git::Repository::Log objects are constructed ultimately by invoking the git log command like this:
git log [<options>] <revision range> [-- <paths>]
The revision range is usually just OLDCOMMIT..NEWCOMMIT, but there are some special cases which require some calculating as discussed above.
The OPTIONS optional argument is an array-ref pointing to an array of strings, which will be passed as options to the git-log command. It may be useful to grok some extra information about each commit (e.g., using --name-status).
The PATHS optional argument is an array-ref pointing to an array of strings, which will be passed as pathspecs to the git-log command. It may be useful to filter the list of commits, grokking only those affecting specific paths in the repository.
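The choice of revision range, including the two special cases, can be sketched as follows. The helper name revision_range is hypothetical, and the list of other branch names is simply passed in by the caller; how the method actually discovers those names is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the revision arguments for `git log`, or an
# empty list when there is nothing to log.
sub revision_range {
    my ($old, $new, @other_branches) = @_;
    my $undef_commit = '0' x 40;

    # Branch removed: no new commits.
    return () if $new eq $undef_commit;

    # Branch created: NEWCOMMIT minus everything reachable elsewhere.
    return ($new, map { "^$_" } @other_branches) if $old eq $undef_commit;

    # Usual case.
    return ("$old..$new");
}

print join(' ', revision_range('abc123', 'def456')), "\n";
print join(' ', revision_range('0' x 40, 'def456', 'main', 'dev')), "\n";
```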
read_commit_msg_file FILENAME
Returns the relevant contents of the commit message file called FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.
The file is read using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.
Some non-relevant contents are stripped off the file. Specifically:
diff data
Sometimes, the commit message file contains the diff data for the commit. This data begins with a line starting with the fixed string diff --git a/. Everything from such a line on is stripped off the file.
comment lines
Every line beginning with a # character is stripped off the file.
trailing spaces
Any trailing space is stripped off from all lines in the file.
trailing empty lines
Any empty line at the end is stripped off from the file, making sure it ends in a single newline.
All this cleanup is performed to make it easier for different plugins to analyze the commit message using a canonical base.
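The cleanup steps listed above can be approximated with four substitutions. This is a sketch under stated assumptions: the function name clean_commit_msg is hypothetical, it works on a string already read and decoded, and "trailing space" is taken to mean spaces and tabs.

```perl
use strict;
use warnings;

# Hypothetical helper applying the four cleanup steps to a message string.
sub clean_commit_msg {
    my ($msg) = @_;
    $msg =~ s{^diff --git a/.*}{}ms;   # diff data and everything after it
    $msg =~ s/^#.*\n?//mg;             # comment lines
    $msg =~ s/[ \t]+$//mg;             # trailing spaces on every line
    $msg =~ s/\n*\z/\n/;               # exactly one newline at the end
    return $msg;
}

my $raw = "Fix bug   \n"
        . "\n"
        . "Details here.\n"
        . "# Please enter the commit message\n"
        . "\n"
        . "\n"
        . "diff --git a/foo b/foo\n"
        . "+new line\n";

print clean_commit_msg($raw);
```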
write_commit_msg_file FILENAME, MSG, ...
Writes the list of strings MSG to FILENAME. It's useful during the commit-msg and the prepare-commit-msg hooks.
The file is written using the character encoding defined by the i18n.commitencoding configuration option or utf-8 if not defined.
An empty line (\n\n) is inserted between every pair of MSG arguments, if there is more than one, of course.
get_affected_refs
Returns the list of names of the references affected by the current push command. It's useful in the update and the pre-receive hooks.
get_affected_ref_range REF
Returns the two-element list of commit ids representing the OLDCOMMIT and the NEWCOMMIT of the affected REF.
get_affected_ref_commits REF [OPTIONS [PATHS]]
Returns the list of commits leading from the affected REF's NEWCOMMIT to OLDCOMMIT. The commits are represented by Git::Repository::Log objects, as returned by the get_commits method.
The optional arguments OPTIONS and PATHS are passed to the get_commits method.
filter_name_status_in_index FILTER
Returns a hash with information about files changed in the index (aka stage area or cache) compared to HEAD. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-index --name-status command. It's useful in the pre-commit hook when you want to know which files are being modified in the upcoming commit.
FILTER specifies which kinds of changes you're interested in. It's passed as the argument to the --diff-filter option of git diff-index, which is documented like this:
--diff-filter=[(A|C|D|M|R|T|U|X|B)...[*]]
Select only files that are Added (A), Copied (C), Deleted (D), Modified
(M), Renamed (R), have their type (i.e. regular file, symlink,
submodule, ...) changed (T), are Unmerged (U), are Unknown (X), or have
had their pairing Broken (B). Any combination of the filter characters
(including none) can be used. When * (All-or-none) is added to the
combination, all paths are selected if there is any file that matches
other criteria in the comparison; if there is no file that matches other
criteria, nothing is selected.
filter_name_status_in_range FILTER FROM TO [OPTIONS [PATHS]]
Returns a hash with information about files that are changed between commits FROM and TO. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the update and the pre-receive hooks when you want to know which files are being modified in the commits being received by a git push command.
FILTER specifies which kinds of changes you're interested in. Please read about it in the filter_name_status_in_index method above.
FROM and TO are revision parameters (see git help revisions) specifying two commits. They're passed as arguments to the git diff-tree command in order to compare them and grok the files that differ between them.
A special case occurs when FROM is the undefined commit, which happens when we're calculating the commit range in a pre-receive or update hook and a new branch or tag has been pushed. In this case we pass FROM and TO to the get_commits method to find the list of new commits being pushed and calculate the difference between the first commit's parent and TO. When the first commit has no parent (in case it's a root commit) we return an empty list.
The optional arguments OPTIONS and PATHS are passed to the get_commits method.
filter_name_status_in_commit FILTER, COMMIT
Returns a hash with information about files that are changed in COMMIT. The hash maps file names to their respective statuses, which are uppercase letters, as returned by the git diff-tree --name-status command. It's useful in the patchset-created and the draft-published hooks when you want to know which files are being modified in the single commit being received by a git push command.
FILTER specifies which kinds of changes you're interested in. Please read about it in the filter_name_status_in_index method above.
COMMIT is a revision parameter (see git help revisions) specifying the commit. It's passed as an argument to git diff-tree in order to compare it to its parents and grok the files that changed in it.
Merge commits are treated specially. Only files that are changed in COMMIT with respect to all of its parents are returned. The reasoning behind this is that if a file isn't changed with respect to one or more of COMMIT's parents, then it must have been checked already in those commits and we don't need to check it again. In this case, since the files may have been changed differently in each branch (added, modified, deleted, etc.), the hash values are strings of letters, one for each branch.
filter_files_in_index FILTER
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_index method if invoked with the same arguments.
filter_files_in_range FILTER FROM TO [OPTIONS [PATHS]]
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_range method if invoked with the same arguments.
filter_files_in_commit FILTER, COMMIT
Returns the sorted keys of the hash that would be returned by the filter_name_status_in_commit method if invoked with the same arguments.
authenticated_user
Returns the username of the authenticated user performing the Git action. It groks it from the githooks.userenv configuration variable specification, which is described in the Git::Hooks documentation. It's useful for most access control check plugins.
If githooks.userenv isn't configured, it tries to grok the username from environment variables set by Gerrit, Bitbucket Server, and GitLab before trying the USER environment variable as a last resort. If it can't find it, it returns undef.
repository_name
Returns the repository name as a string. Currently it knows how to grok the name from Gerrit, Bitbucket, and GitLab servers. Otherwise it tries to grok it from the GIT_DIR environment variable, which holds the path to the Git repository.
get_current_branch
Returns the repository's current branch name, as indicated by the git symbolic-ref HEAD command.
If the repository is in a detached head state, i.e., if HEAD points to a commit instead of to a branch, the method returns undef.
get_sha1 REV
Returns the SHA1 of the commit represented by REV, using the command
git rev-parse --verify REV
It's useful, for instance, to grok the HEAD's SHA1 so that you can pass it to the get_commit method.
get_head_or_empty_tree
Returns the string "HEAD" if the repository already has commits. Otherwise, if it is a brand new repository, it returns the SHA1 representing the empty tree. It's useful to come up with the correct argument for, e.g., git diff during a pre-commit hook. (See the default pre-commit.sample script which comes with Git to understand how this is used.)
blob REV, FILE, ARGS...
Returns the name of a temporary file into which the contents of the file FILE in revision REV has been copied.
It's useful for hooks that need to read the contents of changed files in order to check anything in them.
These objects are cached so that if more than one hook needs to get at them they're created only once.
By default, all temporary files are removed when the Git::Repository object is destroyed.
Any remaining ARGS are passed as arguments to File::Temp::newdir so that you can have more control over the temporary file creation.
If REV:FILE does not exist or if there is any other error while trying to fetch its contents the method dies.
file_size REV FILE
Returns the size (in bytes) of FILE (a path relative to the repository root) in revision REV.
file_mode REV FILE
Returns the mode (as a number) of FILE (a path relative to the repository root) in revision REV.
is_reference_enabled REF
This method should be invoked by hooks to see if REF is enabled according to the githooks.ref and githooks.noref options. Please read about these options in the Git::Hooks documentation.
REF must be a complete reference name or undef. Local hooks should pass the current branch, and server hooks should pass the references affected by the push command. If REF is undef, the method returns true.
The method decides if a reference is enabled using the following algorithm:
If REF matches any REFSPEC in githooks.ref then it is enabled.
Else, if REF matches any REFSPEC in githooks.noref then it is disabled.
Else, it is enabled.
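The three-step decision above can be sketched as follows. The function names are hypothetical, the two REFSPEC lists are passed in directly instead of being read from the configuration, and the sketch assumes that a REFSPEC is either a complete reference name or, when it starts with a caret, a regular expression.

```perl
use strict;
use warnings;

# Assumption: a spec starting with '^' is a regex, otherwise a literal name.
sub matches_spec {
    my ($name, $spec) = @_;
    return ($spec =~ /\A\^/ ? $name =~ /$spec/ : $name eq $spec) ? 1 : 0;
}

# Hypothetical stand-in for the decision; $refs and $norefs are array-refs
# holding the values of githooks.ref and githooks.noref.
sub reference_enabled {
    my ($ref, $refs, $norefs) = @_;
    return 1 unless defined $ref;
    return 1 if grep { matches_spec($ref, $_) } @$refs;
    return 0 if grep { matches_spec($ref, $_) } @$norefs;
    return 1;
}

my @refs   = ('refs/heads/release');
my @norefs = ('^refs/heads/');

for my $ref (qw(refs/heads/release refs/heads/topic refs/tags/v1)) {
    printf "%s: %s\n", $ref,
        reference_enabled($ref, \@refs, \@norefs) ? 'enabled' : 'disabled';
}
```

In this example, refs/heads/release is enabled because githooks.ref is consulted first, even though it also matches the pattern in githooks.noref.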
match_user SPEC
Checks if the authenticated user (as returned by the authenticated_user method above) matches the specification, which may be given in one of the three different forms acceptable for the githooks.admin configuration option, i.e., as a username, as a @group, or as a ^regex.
im_admin
Checks if the authenticated user (again, as returned by the authenticated_user method) matches the specifications given by the githooks.admin configuration variable. This is useful to exempt "administrators" from the restrictions imposed by the hooks.
grok_acls CFG ACTIONS
This method returns a list of ACLs (Access Control Lists) grokked from the CFG.acl options, where CFG is a configuration section like githooks.checkfile.
The CFG.acl is a multi-valued option specifying rules allowing or denying specific users to perform specific actions on specific "things". (Common such things are references and files.) By default any user can perform any action on any thing. So, the rules are used to impose restrictions.
When a hook is invoked it groks all things that were affected in any way by the commits involved and tries to match each of them to a RULE to see if the action performed on it is allowed or denied.
A RULE takes three or four parts, like this:
(allow|deny) [ACTIONS]+ <spec> (by <userspec>)?
(allow|deny)
The first part tells if the rule allows or denies an action.
[ACTIONS]+
The second part specifies which actions are being considered by a combination of letters. The ACTIONS argument is a string containing all valid letters for the corresponding ACLs.
See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for two examples of this.
<spec>
The third part specifies which things are being considered. In its simplest form, a spec is taken as a literal string matching the thing exactly by name.
If the spec starts with a caret (^) it's interpreted as a Perl regular expression, the caret being kept as part of the regexp. These specs match potentially many things.
Before being interpreted as a string or as a regexp, any sub-string of it in the form {VAR} is replaced by $ENV{VAR}. This is useful, for example, to interpolate the committer's username in the spec, in order to create personal name spaces for users.
(See the documentation of the acl option in the Git::Hooks::CheckFile and the Git::Hooks::CheckReference plugins for examples of things as files and references, respectively.)
by <userspec>
The fourth part is optional. It specifies which users are being considered. It can be the name of a single user (e.g. james) or the name of a group (e.g. @devs).
If not specified, the RULE matches any user.
The RULEs are matched in the reverse order of their appearance in the result of the command git config CFG.acl, so that later rules take precedence. This way you can have general rules in the global context and more specific rules in the repository context, naturally.
So, the last RULE matching the action, the file, and the user, tells if the operation is allowed or denied.
If no RULE matches the operation, it is allowed by default.
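The rule syntax and the "last matching rule wins" evaluation can be sketched like this. All names (parse_rule, is_allowed) are hypothetical and this is not the plugin's implementation. The sketch assumes that an action matches a rule when its letter appears in the rule's letters, that {VAR} names are made of word characters, and it ignores the user part when evaluating, so @group specs are not resolved.

```perl
use strict;
use warnings;

# Hypothetical parser for one RULE; $valid_actions lists the valid letters.
sub parse_rule {
    my ($rule, $valid_actions) = @_;
    my ($verb, $actions, $spec, $who) = $rule =~ m{
        \A \s* (allow|deny)
        \s+ ([\Q$valid_actions\E]+)
        \s+ (\S+)
        (?: \s+ by \s+ (\S+) )?
        \s* \z
    }x or return;

    # Replace every {VAR} by the value of the environment variable VAR.
    $spec =~ s/\{(\w+)\}/defined $ENV{$1} ? $ENV{$1} : ''/ge;

    return {
        acl    => $rule,
        allow  => $verb eq 'allow' ? 1 : 0,
        action => $actions,
        spec   => $spec =~ /\A\^/ ? qr/$spec/ : $spec,
        who    => $who,
    };
}

# Walk the rules from last to first; the first match decides.
sub is_allowed {
    my ($acls, $action, $thing) = @_;
    for my $acl (reverse @$acls) {
        next unless index($acl->{action}, $action) >= 0;
        my $spec = $acl->{spec};
        next unless (ref($spec) ? $thing =~ $spec : $thing eq $spec);
        return $acl->{allow};
    }
    return 1;    # no rule matched: allowed by default
}

$ENV{USER} = 'james';    # set here only to make the example reproducible
my @acls = map { parse_rule($_, 'CRUD') } (
    'deny CRUD ^refs/heads/',
    'allow CRUD ^refs/heads/user/{USER}/',
    'allow U refs/heads/main by james',
);

print is_allowed(\@acls, 'C', 'refs/heads/user/james/topic') ? "allowed\n" : "denied\n";
print is_allowed(\@acls, 'D', 'refs/heads/main')             ? "allowed\n" : "denied\n";
```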
In the returned list, each ACL is represented by a hash with the following keys:
acl
Contains the original representation of the ACL, which is useful in producing error messages.
allow
A Boolean telling if the ACL is an "allow".
action
The string representation of the action (e.g. 'AMD' or 'CRUD').
spec
The spec, which can be either a string or a pre-compiled regex object.
who
The name of a user or of a group.
As an optimization, only ACLs matching the current user, either explicitly or by not having a WHO part, are returned in the list.
SEE ALSO
Git::Repository::Plugin, Git::Hooks.
Writing hook scripts in Bitbucket Server.
Git server hooks in GitLab.
Supported hooks in Gerrit.
AUTHOR
Gustavo L. de M. Chaves <gnustavo@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by CPQD <www.cpqd.com.br>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.