NAME

find-secret-leakage-in-git-diff.pl - Find secrets leakage in a Git diff

SYNOPSIS

find-secret-leakage-in-git-diff.pl [FILE]

DESCRIPTION

This script reads the output of a git-diff command containing a patch, either from FILE or from STDIN, and tries to detect secrets in the lines being added. It's intended to be invoked by the Git::Hooks::CheckDiff plugin, which feeds it the output of either git-diff-index or git-diff-tree with the following options:

git diff* -p -U0 --no-color --diff-filter=AM --no-prefix

A "secret" is an API key, an authorization token, or a private key, which shouldn't be leaked by being saved in a versioned file. This script is therefore meant to be used in a pre-commit hook, to alert programmers when they are about to commit one.

When it finds a secret in the git-diff output, it prints a line like this:

<path>:<lineno>: Secret Leakage: <secret type> '<secret>'

Meaning:

  • <path>

    The path of the file to which the secret is being added.

  • <lineno>

    The line number in the file where the secret is being added.

  • <secret type>

    The type of the secret found.

  • <secret>

    The specific secret found.
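
The path and line number in that report can be recovered from the patch itself. The following is a minimal sketch, not the script's actual code: the helper names added_lines and report_leaks, the single AWS-style pattern, and the "AWS Access Key ID" label are all assumptions made for illustration. It relies only on the documented options: --no-prefix leaves bare paths in the "+++" header, and -U0 means a hunk contains no context lines.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the script: collect [path, lineno, text]
# for every added line in a patch produced with -p -U0 --no-prefix.
sub added_lines {
    my ($diff) = @_;
    my ($path, $lineno, @added);
    for my $line (split /\n/, $diff) {
        if ($line =~ /^diff --git /) {
            # A new file section starts: forget the previous file's state.
            ($path, $lineno) = (undef, undef);
        } elsif (!defined $lineno && $line =~ /^\+\+\+ (.+)/) {
            # With --no-prefix the header carries the bare path (no "b/").
            $path = $1;
        } elsif ($line =~ /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/) {
            # Hunk header: $1 is the first line number on the new side.
            $lineno = $1;
        } elsif (defined $path && defined $lineno && $line =~ /^\+(.*)/) {
            # With -U0 there are no context lines, so each "+" line advances by one.
            push @added, [$path, $lineno++, $1];
        }
    }
    return @added;
}

# Assumed pattern for one secret type; the script's real patterns are not shown here.
my $aws_key_re = qr/\b(AKIA[0-9A-Z]{16})\b/;

# Build reports in the documented "<path>:<lineno>: Secret Leakage: ..." shape.
sub report_leaks {
    my ($diff) = @_;
    my @reports;
    for my $added (added_lines($diff)) {
        my ($path, $lineno, $text) = @$added;
        if (my ($key) = $text =~ $aws_key_re) {
            push @reports, sprintf "%s:%d: Secret Leakage: %s '%s'",
                $path, $lineno, 'AWS Access Key ID', $key;
        }
    }
    return @reports;
}
```

Calling report_leaks on a patch string returns one report line per added line that matches the assumed pattern; removed lines and file headers are ignored.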

Sometimes you need to have a pseudo-secret in a file. Perhaps it's a credential used only in your test environment or as an example. You can mark these secrets so that this script disregards them. If you can, add the following mark on the same line as your pseudo-secret, like this:

my $aws_access_key = 'AKIA1234567890ABCDEF'; ## not a secret leak

The mark is the string ## not a secret leak. The two hashes are part of it!
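
One way to honor the mark is to test each added line for the literal string before applying any secret pattern. The sketch below assumes a plain substring test, a hypothetical function name (unmarked_aws_key), and a single AWS-style pattern; the script itself may match the mark differently.

```perl
use strict;
use warnings;

# The mark as documented above, both hashes included.
my $MARK = '## not a secret leak';

# Hypothetical helper: return the AWS-style key found on a line, or undef
# if there is none or if the line carries the mark.
sub unmarked_aws_key {
    my ($line) = @_;
    return undef if index($line, $MARK) >= 0;    # marked: disregard the line
    # Assumed pattern; the script's real patterns are not shown here.
    return $line =~ /\b(AKIA[0-9A-Z]{16})\b/ ? $1 : undef;
}
```

Because the test looks for the full string, a comment with a single hash does not count as a mark in this sketch, and the key on that line would still be returned.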

Sometimes you can't put the mark on the same line. The lines that begin a private key, for example, have no room for anything else. In these cases you can skip a whole block by marking its beginning and end, like this:

## not a secret leak begin
my $rsa_private_key = <<EOS;
-----BEGIN RSA PRIVATE KEY-----
izfrNTmQLnfsLzi2Wb9xPz2Qj9fQYGgeug3N2MkDuVHwpPcgkhHkJgCQuuvT+qZI
MbS2U6wTS24SZk5RunJIUkitRKeWWMS28SLGfkDs1bBYlSPa5smAd3/q1OePi4ae
<...>
8S86b6zEmkser+SDYgGketS2DZ4hB+vh2ujSXmS8Gkwrn+BfHMzkbtio8lWbGw0l
eM1tfdFZ6wMTLkxRhBkBK4JiMiUMvpERyPib6a2L6iXTfH+3RUDS6A==
-----END RSA PRIVATE KEY-----
EOS
## not a secret leak end

None of the lines inside the block will be reported as leaks.
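
Block skipping amounts to a small state machine over the added lines. The following is a sketch under assumptions, with a hypothetical function name (lines_outside_skip_blocks); it treats the two marker lines themselves as part of the skipped block, which the text above does not spell out.

```perl
use strict;
use warnings;

# Hypothetical helper: given the added lines in order, return those that lie
# outside any "## not a secret leak begin" ... "## not a secret leak end" block.
sub lines_outside_skip_blocks {
    my @lines = @_;
    my $skipping = 0;
    my @kept;
    for my $line (@lines) {
        if    ($line =~ /## not a secret leak begin/) { $skipping = 1 }
        elsif ($line =~ /## not a secret leak end/)   { $skipping = 0 }
        elsif (!$skipping)                            { push @kept, $line }
    }
    return @kept;
}
```

A real checker would still need to count the skipped lines so that the line numbers reported for later leaks stay correct.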

EXIT CODES

The script exits with the number of secrets found. So, it succeeds if no secret is found and fails if it finds at least one.
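
A caller written in Perl can read that exit code from the wait status left in $? after system(). The helper below is a hypothetical sketch (exit_code_from_status is not part of the script), and patch.diff is a placeholder file name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status (the value of $? after system())
# into the exit code, or undef if the program did not exit normally.
sub exit_code_from_status {
    my ($status) = @_;
    return undef if $status == -1;    # the program could not be started
    return undef if $status & 127;    # the program was killed by a signal
    return $status >> 8;              # the high byte holds the exit code
}

# Intended use (patch.diff is a placeholder file name):
#   system('find-secret-leakage-in-git-diff.pl', 'patch.diff');
#   my $leaks = exit_code_from_status($?);
#   die "commit rejected\n" if !defined $leaks || $leaks > 0;
```

An exit status holds only eight bits on POSIX systems, so a caller is on safer ground testing for zero versus non-zero than relying on the exact count when it is large.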

SEE ALSO

  • How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories

    This blog post summarizes a paper of the same name, which studies how secrets such as API keys, authorization tokens, and private keys are commonly leaked by being inadvertently pushed to GitHub repositories. The study found that this is much more common than one would think and identifies which kinds of secrets are most commonly leaked this way. Moreover, it gives specific regular expressions which can be used to detect such secrets in text. This is the main source of inspiration for this script.