Thursday, March 3, 2016

Calculate the entropy of a string (i.e. a password) with PowerShell

As with my other PowerShell stuff, this was made for fun and might make its way into something later. The details of what the script is for, and the many assumptions it makes, are in the code. Short story: this function gives you the bits of entropy in a provided string. That number is usually interesting when trying to gauge the "strength" of a password. The larger the number, the stronger the password, because more entropy means a larger space that has to be explored to find the password via brute force. The code includes a link that discusses the topic in MUCH greater detail.
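To make the arithmetic concrete, here is a minimal sketch of the calculation done by hand for one string: bits = length * log2(size of the symbol space). The variable names are just for illustration, and the symbol space of 36 assumes the attacker has to consider every upper case letter and every digit.

```powershell
# Worked example: bits of entropy = length * log2(symbol space size).
# The sample string uses upper case letters and digits only.
$sample      = 'JBSWY3DPEHPK3PXP'
$symbolSpace = 26 + 10                        # assumed pool: A-Z plus 0-9
$bitsPerChar = [Math]::Log($symbolSpace, 2)   # roughly 5.17 bits per character
$totalBits   = $bitsPerChar * $sample.Length  # 16 characters in the sample
"{0:N1} bits" -f $totalBits
```

For this 16 character sample that works out to roughly 82.7 bits. Adding a single lower case letter would bump the assumed pool to 62 symbols and raise the per-character figure for the whole string.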


<#
.SYNOPSIS
    Calculate the entropy (in bits) of the provided string
.DESCRIPTION
    Based primarily upon discussion here, https://technet.microsoft.com/en-us/library/cc512609.aspx
    this function will calculate the entropy of the provided string, returning a floating-point result
    indicating bits of entropy. Numerous assumptions MUST be made in the calculation of this
    number. This function takes the easiest approach, which you can also read as "lazy" at best
    or misleading at worst.

    We need to figure out the "size" of the space from which the symbols in the string are drawn. After
    all, the value we're calculating is not absolute in any way; it's relative to some max/min values. We
    make the following assumptions in this function:
    --if there is a lower case letter in the provided string, assume it's possible any lower case letter could
      have been used. Assume the same for upper, numeric, and special chars.
    --by "special characters" we mean the following: ~`!@#$%^&*()_-+={}[]|\:;"'<,>.?/
    --by "letters", we mean just the letters on a U.S. keyboard.
    --no rules regarding which symbols can appear where, e.g. can't start with a number.
    --no rules disallowing runs, e.g. sequential numbers, sequential characters, etc.
    --no rules considering non-uniform distribution of symbols, e.g. "e" just as likely to appear as "#"
    The net impact of these assumptions is we are over-calculating the entropy so the best use of this
    function is probably for comparison between strings, not as some arbiter of absolute entropy.
.PARAMETER s
    The string for which to calculate entropy.
.EXAMPLE
    get-StringEntropy -s "JBSWY3DPEHPK3PXP"
.NOTES
    FileName: get-StringEntropy
    Author: nelsondev1
#>

function get-StringEntropy {
[CmdletBinding()]
Param(
    [String]$s
)

    # Literal (single-quoted) here-string, so that ` and $ are taken verbatim
    $specialChars = @'
~`!@#$%^&*()_-+={}[]|\:;"'<,>.?/
'@

    $symbolCount = 0 # running count of our symbol space

    if ($s -cmatch "[a-z]+") {
        $symbolCount += 26
        Write-Verbose "$s contains at least one lower case character. Symbol space now $symbolCount"
    }
    if ($s -cmatch "[A-Z]+") {
        $symbolCount += 26
        Write-Verbose "$s contains at least one upper case character. Symbol space now $symbolCount"
    }
    if ($s -cmatch "[0-9]+") {
        $symbolCount += 10
        Write-Verbose "$s contains at least one numeric character. Symbol space now $symbolCount"
    }

    # In the particular use, I found trying to regex/match...challenging. Instead, just going
    # to iterate and look for containment.
    $hasSpecialChars = $false
    foreach ($c in $specialChars.ToCharArray())
    {
        if ($s.Contains($c))
        {
            $hasSpecialChars = $true
        }
    } 
    if ($hasSpecialChars) {
        $symbolCount += $specialChars.Length
        Write-Verbose "$s contains at least one special character. Symbol space now $symbolCount"
    }

    # in a batch mode, we might want to pre-calculate the possible values since log is slow-ish.
    # there wouldn't be many unique options (eg 26, 26+26, 26+10, 26+32, 26+26+10, etc.)
    # ...though in comparison to performing the above regex matches it may not be a big deal.
    # anyway...

    # If nothing matched (empty string, or only characters outside the sets above, such as
    # spaces), there is no symbol space to measure. Return 0 rather than taking the log of 0.
    if ($symbolCount -eq 0) {
        Write-Verbose "No recognized symbols found. Returning 0"
        return 0.0
    }

    # Entropy-per-symbol is the base 2 log of the symbol space size
    $entropyPerSymbol = [Math]::Log($symbolCount) / [Math]::Log(2)
    Write-Verbose "Bits of entropy per symbol calculated to be $entropyPerSymbol"

    $passwordEntropy = $entropyPerSymbol * $s.Length

    Write-Verbose "Returning value of $passwordEntropy"
    return $passwordEntropy # this is the bits of entropy in the starting string
}
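
The code comments mention pre-calculating the per-symbol values for batch use. Here is a minimal sketch of that idea, assuming the four class sizes used above (26, 26, 10, and the 32 listed special characters). The names $classSizes and $bitsPerSymbol are made up for illustration and are not part of the function.

```powershell
# Sketch only: pre-compute log2 for every symbol space size the function can produce.
# Class sizes assumed from the function above: lower, upper, digits, special characters.
$classSizes = 26, 26, 10, 32

# Keyed by symbol space size, value is bits of entropy per symbol.
$bitsPerSymbol = @{}

# Masks 1..15 cover every non-empty combination of the four classes.
for ($mask = 1; $mask -lt 16; $mask++) {
    $total = 0
    for ($i = 0; $i -lt $classSizes.Count; $i++) {
        if ($mask -band (1 -shl $i)) { $total += $classSizes[$i] }
    }
    $bitsPerSymbol[$total] = [Math]::Log($total, 2)
}

# Example lookup: a 12 character string drawn from upper, lower, and digits (62 symbols).
$bitsPerSymbol[62] * 12
```

Several combinations add up to the same size (lower plus digits and upper plus digits are both 36), so the table ends up with 11 entries rather than 15. Whether the lookup is actually faster than calling [Math]::Log each time is something you would have to measure.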