NAME

Data::Password::Entropy - Calculate password strength

SYNOPSIS

use Data::Password::Entropy;

print "Entropy is ", password_entropy("pass123"), " bits.";   # prints "Entropy is 31 bits."

if (password_entropy("mypass") < password_entropy("Ha20&09_X!t")) {
    print "mypass is weaker. Unexpected, isn't it?";
}

DESCRIPTION

Information entropy, also known as password quality or password strength in the context of information security, is a measure of how well a password resists brute-force attacks.

There are many different ways to determine a password's entropy. We use a simple, empirical algorithm. First, all characters of the string are split into several classes, such as digits, lower-case letters, upper-case letters and so on. All characters within one class are assumed to be equally likely to appear in the password. Mixing characters from different classes extends the number of possible symbols (the symbol base) in the password and thereby increases its entropy. Then we calculate the effective length of the password so that the following rules hold:

  • orderliness decreases total entropy, so '1234' is a weaker password than '1342',

  • repeated sequences decrease total entropy, so 'a' x 100 is only slightly stronger than 'a' x 4 (arguably, the difference is too slight).
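To illustrate the symbol-base idea only, here is a minimal sketch of a naive estimate: the sizes of all classes present in the string are summed and the entropy is taken as length times log2(base). The class table, the class sizes and the name naive_entropy are assumptions made for this example and are not taken from the module. The sketch does not implement the effective-length rules above, so it rates '1234' and '1342' identically.

```perl
use strict;
use warnings;

# Hypothetical class table: [ name, pattern, size ].
# The sizes are assumptions for this sketch; together they cover all 256 byte values.
my @CLASSES = (
    [ control => qr/[\x00-\x1F\x7F]/,     33  ],
    [ space   => qr/ /,                   1   ],
    [ digits  => qr/[0-9]/,               10  ],
    [ lower   => qr/[a-z]/,               26  ],
    [ upper   => qr/[A-Z]/,               26  ],
    [ punct   => qr/[!-\/:-@\[-`{-~]/,    32  ],
    [ high    => qr/[\x80-\xFF]/,         128 ],
);

# Naive estimate: length * log2(symbol base). Expects a byte string.
sub naive_entropy {
    my ($data) = @_;
    return 0 unless defined $data && length $data;

    # The symbol base is the sum of the sizes of all classes that occur.
    my $base = 0;
    for my $class (@CLASSES) {
        my ($name, $re, $size) = @$class;
        $base += $size if $data =~ $re;
    }
    return 0 unless $base;    # nothing matched (e.g. wide characters)

    return length($data) * log($base) / log(2);
}

printf "%-8s %.1f bits\n", $_, naive_entropy($_) for qw(1234 1342 pass123);
```

Because such an estimate ignores ordering and repetition, it will generally be higher than what an algorithm with effective-length corrections reports.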

Do not expect too much: the algorithm does not check the password against a dictionary (see Data::Password). It also cannot detect obfuscation like 'p@ssw0rd', sequences from a keyboard row, or personally related information.

The probability of a character occurring depends only on the size of its character class. Perhaps the actual prevalence of each class should also be taken into account: it is very unlikely to find a control character in a password. But common password policies do not allow control characters, spaces or extended characters in passwords, so they should not occur in practice.

Similarly, there is no well-defined approach to processing national characters. For example, the Greek letter blocks in the Unicode Character Database contain about 400 symbols, but not all of them are used equally often. An intruder who knows that a password may contain Greek letters will not try α (Greek small letter Alpha) with the same probability as ἆ (Greek small letter Alpha with psili and perispomeni), so it might be incorrect to use a whole UCD block or script as the base for calculating probabilities.

Therefore, the data is treated as a byte string, not a wide-character string, and all bytes with codes higher than 127 form one class.
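The following sketch shows what the byte-string view means for a password with a national character. It assumes the caller encodes the string as UTF-8 before measuring it, which is a choice made for this example and not something the module prescribes. Only the core Encode module is used.

```perl
use strict;
use warnings;
use utf8;                           # the literal below contains a non-ASCII character
use Encode qw(encode_utf8);

my $chars = "pαss";                 # 4 characters
my $bytes = encode_utf8($chars);    # 5 bytes: α becomes 0xCE 0xB1

# Count the bytes that would fall into the single "above 127" class.
my $high = () = $bytes =~ /[\x80-\xFF]/g;

printf "%d characters, %d bytes, %d of them above 127\n",
    length($chars), length($bytes), $high;
# prints "4 characters, 5 bytes, 2 of them above 127"
```

One non-ASCII character thus contributes several bytes, all of which belong to the same class.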

The character classes are based on the ASCII encoding. If your data uses something else, e.g. EBCDIC, you can convert it first with a module such as Encode or Convert::EBCDIC.
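A minimal sketch of such a conversion with Encode might look like this. The code page cp1047 is an assumption made for the example (your EBCDIC data may use a different one), and the EBCDIC input is simulated by encoding an ASCII literal.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate input that arrives as EBCDIC bytes (cp1047 is assumed here).
my $ebcdic = encode('cp1047', 'pass123');

# Decode to characters, then re-encode as ASCII bytes before measuring entropy.
my $ascii = encode('ascii', decode('cp1047', $ebcdic));

print "converted: $ascii\n";
```

The resulting $ascii value is what would then be passed to password_entropy.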

FUNCTIONS

There's only one function in this package and it is exported by default.

password_entropy($data)

Returns the entropy of $data, in bits.

SEE ALSO

Data::Password, Data::Password::Manager, Data::Password::BasicCheck.

http://en.wikipedia.org/wiki/Password_strength

"A Conceptual Framework for Assessing Password Quality" by Wanli Ma, John Campbell, Dat Tran, and Dale Kleeman [PDF] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.3266&rep=rep1&type=pdf

COPYRIGHT

Copyright (c) 2010 Oleg Alistratov. All rights reserved.

This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR

Oleg Alistratov <zero@cpan.org>