Benford’s Law in JavaScript

Benford's Law describes the distribution of the first digits of many, if not most, sets of numeric data and in this post I will implement a demonstration of the law in JavaScript.

Benford's Law centres on the perhaps surprising fact that in numeric data such as financial transactions, populations, sizes of geographical features etc. the frequencies of first digits follow a roughly logarithmic pattern, with smaller digits appearing more often than larger ones.

This is shown in the following table, the relative frequencies being calculated using the formula in the heading of the right hand column.

n    log10(n + 1) - log10(n), as a percentage
1    30.1
2    17.6
3    12.5
4    9.7
5    7.9
6    6.7
7    5.8
8    5.1
9    4.6
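
The table can be reproduced in a few lines of JavaScript. This is just a minimal sketch: benfordProbability is a name I have made up for illustration and it is not part of the project files.

```javascript
// Probability that the first digit is n (1-9) under Benford's Law.
// log10(n + 1) - log10(n) is the same quantity as log10(1 + 1/n).
function benfordProbability(n)
{
    return Math.log10(n + 1) - Math.log10(n);
}

const benfordProbabilities = [];

for(let n = 1; n <= 9; n++)
{
    const p = benfordProbability(n);

    benfordProbabilities.push(p);

    // Print as a percentage to one decimal place, as in the table
    console.log(`${n}    ${(p * 100).toFixed(1)}`);
}
```

Because the formula telescopes, the nine probabilities add up to log10(10) - log10(1), which is exactly 1.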

If the first digits followed a uniform distribution, which you might intuitively expect, each digit would appear about 11% of the time. However, in the Benford Distribution the digit 1 occurs about 30% of the time, 2 about 18% of the time, and so on. This is clearer when shown in a graph.

The best known use of Benford's Law is in fraud detection. If someone makes up false data it is unlikely to follow the Benford Distribution you would expect from genuine data. If the made-up numbers are purely random, the first digits would probably fit a uniform distribution, i.e. about 11% each as described above.

I have mentioned first digits several times and you may be wondering about subsequent digits. Do they also fit the Benford Distribution? The answer is yes, but to a decreasing extent. As you move along the digits the distributions become less Benfordian and more uniform.
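
To see how quickly the effect flattens out, the second-digit probabilities can be calculated by summing over every possible first digit. This is only a sketch and is not used elsewhere in the project; secondDigitProbability is a hypothetical name.

```javascript
// Probability that the SECOND digit is d (0-9) under Benford's Law.
// For each possible first digit k (1-9) the leading pair "kd" has
// probability log10(1 + 1 / (10k + d)); adding these up gives the
// overall probability of d appearing in second place.
function secondDigitProbability(d)
{
    let p = 0;

    for(let k = 1; k <= 9; k++)
    {
        p += Math.log10(1 + 1 / (10 * k + d));
    }

    return p;
}

const secondDigitProbabilities = [];

for(let d = 0; d <= 9; d++)
{
    secondDigitProbabilities.push(secondDigitProbability(d));
    console.log(`${d}    ${(secondDigitProbabilities[d] * 100).toFixed(1)}`);
}
```

The second digit 0 comes out at about 12% and 9 at about 8.5%, a much narrower spread than the 30.1% to 4.6% range of the first digits.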

For this post I'll write a function to take a list of numbers and create a table of values showing how closely it fits the Benford Distribution.

I'll then try it out with two sets of data, a fraudulent-looking one which roughly fits the uniform distribution and a genuine-looking one which roughly fits the Benford Distribution. I will also create a graph to show the results.

This project consists of the following JavaScript files as well as an HTML page and a few ancillary files. You can download them as a zip or clone/download from GitHub if you prefer.

• benfordslaw.js

• benfordslawpage.js

• benfordtestdata.js

Let's first look at benfordslaw.js.

benfordslaw.js

function calculateBenford(data)
{
    /*
    Calculates a set of values from the numeric list
    input data showing how closely the first digits
    fit the Benford Distribution.
    Results are returned as an array of objects.
    */

    //                               1      2      3      4      5      6      7      8      9
    const BenfordPercentages = [0, 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046];

    let results = [];

    // The first character of each number's string form is its first digit
    const firstDigits = data.map(function(item)
    {
        return item.toString()[0];
    });

    const firstDigitFrequencies = getDigitsFrequencies(firstDigits);

    let dataFrequency;
    let dataFrequencyPercent;
    let BenfordFrequency;
    let BenfordFrequencyPercent;
    let differenceFrequency;
    let differenceFrequencyPercent;

    for(let n = 1; n <= 9; n++)
    {
        dataFrequency = firstDigitFrequencies[n];
        dataFrequencyPercent = dataFrequency / data.length;
        BenfordFrequency = data.length * BenfordPercentages[n];
        BenfordFrequencyPercent = BenfordPercentages[n];
        differenceFrequency = dataFrequency - BenfordFrequency;
        differenceFrequencyPercent = dataFrequencyPercent - BenfordFrequencyPercent;

        results.push({"n": n,
                      "dataFrequency":              dataFrequency,
                      "dataFrequencyPercent":       dataFrequencyPercent,
                      "BenfordFrequency":           BenfordFrequency,
                      "BenfordFrequencyPercent":    BenfordFrequencyPercent,
                      "differenceFrequency":        differenceFrequency,
                      "differenceFrequencyPercent": differenceFrequencyPercent});
    }

    return results;
}

function getDigitsFrequencies(firstDigits)
{
    // Index 0 is never reported; indexes 1-9 hold the counts of each first digit
    const digitCounts = Array(10).fill(0);

    for(let n of firstDigits)
    {
        digitCounts[n]++;
    }

    return digitCounts;
}

function printAsTable(BenfordTable)
{
    const width = 59;

    writeToConsole("-".repeat(width) + "<br/>", "console");
    writeToConsole("|   |      Data       |    Benford      |    Difference   |<br/>", "console");
    writeToConsole("| n |  Freq     Pct   |  Freq     Pct   |  Freq     Pct   |<br/>", "console");
    writeToConsole("-".repeat(width) + "<br/>", "console");

    for(let item of BenfordTable)
    {
        writeToConsole(`| ${item["n"]} `, "console");
        writeToConsole(`| ${item["dataFrequency"].toString().padStart(6, " ")} `, "console");
        writeToConsole(`| ${(item["dataFrequencyPercent"] * 100).toFixed(2).padStart(6, " ")} `, "console");
        writeToConsole(`| ${item["BenfordFrequency"].toFixed(0).padStart(6, " ")} `, "console");
        writeToConsole(`| ${(item["BenfordFrequencyPercent"] * 100).toFixed(2).padStart(6, " ")} `, "console");
        writeToConsole(`| ${item["differenceFrequency"].toFixed(0).padStart(6, " ")} `, "console");
        writeToConsole(`| ${(item["differenceFrequencyPercent"] * 100).toFixed(2).padStart(6, " ")} `, "console");
        writeToConsole("|<br/>", "console");
    }

    writeToConsole("-".repeat(width) + "<br/>", "console");
}

function printAsGraph(BenfordTable)
{
    writeToConsole("<br/>  <span class='greenbg'>Benford's Distribution</span><br/>", "console");
    writeToConsole("  <span class='redbg'>Data                  </span><br/><br/>", "console");

    writeToConsole("  0%       10%       20%       30%       40%       50%<br/>", "console");
    writeToConsole("  |         |         |         |         |         |<br/>", "console");

    // One character of bar width per percentage point
    for(let item of BenfordTable)
    {
        writeToConsole(` ${item["n"]} <span class="greenbg">${" ".repeat(item["BenfordFrequencyPercent"] * 100)}</span><br/>  <span class="redbg">${" ".repeat(item["dataFrequencyPercent"] * 100)}</span><br/>`, "console");
    }
}

calculateBenford

This function is at the core of the whole project. First we need a constant array of Benfordian probabilities, an empty array to store the results, and an array of first digits obtained using map. The first digits are then passed to the getDigitsFrequencies function, which I will get to later.

After declaring a set of variables for use within a loop we iterate from 1 to 9, setting various variable values for the current digit. These are then pushed onto the results array as an object. (I first wrote this with the calculations within the object-creation code but it looked a mess so I split them out purely to make the code neater.)

If it still isn't clear exactly what this function is creating, don't panic: further down we'll see the results printed out in a grid.
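
Taking the first character of item.toString() works for the positive test data used in this post, but it would give "-" for a negative number and "0" for a value below 1. If your own data contains such values, something along these lines might be more robust. This is only a sketch: firstDigit is a hypothetical helper which is not in the project files.

```javascript
// Returns the first significant digit (1-9) of a number,
// or null if there isn't one (zero, NaN, Infinity).
function firstDigit(value)
{
    const magnitude = Math.abs(Number(value));

    if(!Number.isFinite(magnitude) || magnitude === 0)
    {
        return null;
    }

    // Exponential notation always starts with the leading significant
    // digit, e.g. 0.0042 becomes "4.2e-3" and 1234 becomes "1.234e+3".
    return Number(magnitude.toExponential()[0]);
}

console.log(firstDigit(1234));   // 1
console.log(firstDigit(-250));   // 2
console.log(firstDigit(0.0042)); // 4
console.log(firstDigit(0));      // null
```

Values for which the helper returns null would need to be filtered out before counting.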

getDigitsFrequencies

A simple function which iterates the firstDigits array and keeps running totals of the counts of each digit 1-9, using these digits to index the digitCounts array. (Note that 0 is unused.)
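
Note that the first digits are strings rather than numbers, but JavaScript array indexes are property keys so "1" and 1 refer to the same element. A tiny stand-alone illustration of the counting technique, using made-up sample digits:

```javascript
// First digits as they come out of toString()[0], i.e. as strings
const sampleFirstDigits = ["1", "1", "2", "9"];

// One slot per digit 0-9, all starting at zero
const sampleCounts = Array(10).fill(0);

for(const digit of sampleFirstDigits)
{
    // The string "1" indexes the same element as the number 1
    sampleCounts[digit]++;
}

console.log(sampleCounts); // [0, 2, 1, 0, 0, 0, 0, 0, 0, 1]
```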

printAsTable

Here we print out the data generated in calculateBenford in a grid with one row for each of the digits 1-9.

printAsGraph

This function prints the same data but in a graph format with two bars per digit 1-9. The top bar in green represents the Benfordian Distribution and the lower bar in red represents the actual data.
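
Each bar is one character wide per percentage point, and String.prototype.repeat discards any fractional part of the count it is given, so 30.1% becomes a bar of 30 characters. The following sketch shows the same idea using "#" characters and the ordinary browser or Node console instead of coloured spans. The names barWidth and textBar are my own and are purely illustrative.

```javascript
// Number of characters used to draw a fraction such as 0.301,
// at one character per percentage point.
function barWidth(fraction)
{
    return Math.floor(fraction * 100);
}

// Builds a plain-text bar from the given symbol.
function textBar(fraction, symbol)
{
    return symbol.repeat(barWidth(fraction));
}

const sampleBenfordFractions = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046];

sampleBenfordFractions.forEach(function(fraction, index)
{
    console.log(`${index + 1} ${textBar(fraction, "#")}`);
});
```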

Now we need a couple of functions to generate test data.

benfordtestdata.js

function getRandomData()
{
    // Returns a list of 1000 numbers approximately
    // following the uniform distribution NOT the
    // Benford Distribution.

    const randomData = new Array(1000);

    for(let i = 0; i < 1000; i++)
    {
        // random integer from 0 to 999
        randomData[i] = Math.floor(Math.random() * 1000);
    }

    return randomData;
}

function getBenfordData()
{
    // Returns a list of approximately 1000 numbers
    // approximately following the Benford Distribution.

    const BENFORD_PERCENTAGES = [0, 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046];

    let BenfordData = [];

    let randomfactor;
    let start;
    let max;

    for(let firstdigit = 1; firstdigit <= 9; firstdigit++)
    {
        // get a random number between 0.8 and 1.2
        randomfactor = (Math.random() * 0.4) + 0.8;

        // how many numbers to generate with this first digit
        max = Math.floor(1000 * BENFORD_PERCENTAGES[firstdigit] * randomfactor);

        // every value from start up to (but not including) start + 1000
        // begins with firstdigit
        start = firstdigit * 1000;

        for(let numcount = 0; numcount < max; numcount++)
        {
            BenfordData.push(randBetween(start, start + 1000));
        }
    }

    return BenfordData;
}

function randBetween(min, max)
{
    // Returns a random (non-integer) number from min up to but not including max
    const range = max - min;

    const n = (Math.random() * range) + min;

    return n;
}

getRandomData

This is a very straightforward function which generates an array of 1000 random integers from 0 to 999. Every first digit from 1 to 9 is equally likely here, so the first digits will almost certainly follow a uniform distribution rather than a Benfordian one.

getBenfordData

This is more complex. For first digits 1 to 9 we generate a number of values, that number being in accordance with the Benfordian Distribution subject to slight random variations.
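
As an aside, there is a different way of producing Benford-like data which does not need the table of percentages: raise 10 to a uniformly random power. Numbers whose logarithms are evenly spread follow Benford's Law. This is a swapped-in technique rather than the approach used in benfordtestdata.js, and getLogUniformData is a hypothetical name.

```javascript
// Returns count numbers from 1 up to (but not including) 1000
// whose base-10 logarithms are uniformly distributed.
function getLogUniformData(count)
{
    const data = [];

    for(let i = 0; i < count; i++)
    {
        // exponent is uniform between 0 and 3
        const exponent = Math.random() * 3;

        data.push(Math.pow(10, exponent));
    }

    return data;
}

const logUniformData = getLogUniformData(10000);

// Proportion of values starting with 1; this should be near 0.301
const startingWithOne = logUniformData.filter(function(x)
{
    return x.toString()[0] === "1";
}).length;

console.log((startingWithOne / logUniformData.length).toFixed(3));
```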

Finally a short function to try out our code.

benfordslawpage.js

window.onload = function()
{
    const data = getRandomData();
    // const data = getBenfordData();

    const BenfordTable = calculateBenford(data);

    printAsTable(BenfordTable);
    printAsGraph(BenfordTable);
}

If you open benfordslaw.htm in your browser you will see something like this.

A quick glance at the graph shows that the data (the red bars) is fairly evenly distributed, with no relationship to the Benford distribution (green) we would expect from genuine data.

The columns in the grid are:

• n - the first digits

• Data Freq and Pct - actual frequencies and percentages of each digit in the data

• Benford Freq and Pct - frequencies and percentages of a perfect Benfordian Distribution

• Difference Freq and Pct - differences between the actual data and the perfect Benfordian Distribution
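
If you would rather have a single number than nine differences, one common option is Pearson's chi-squared statistic: the sum over the nine digits of (observed - expected) squared divided by expected. This is not part of the project, just a sketch of one possible extension, and chiSquaredAgainstBenford is a hypothetical name. With nine categories there are 8 degrees of freedom, for which the 5% critical value is about 15.51, so a result well above that suggests the data does not fit.

```javascript
// observedCounts is an array of nine counts: index 0 holds the
// number of values starting with 1, index 8 those starting with 9.
function chiSquaredAgainstBenford(observedCounts)
{
    const total = observedCounts.reduce(function(sum, count)
    {
        return sum + count;
    }, 0);

    let chiSquared = 0;

    for(let n = 1; n <= 9; n++)
    {
        // expected count for digit n under Benford's Law
        const expected = total * Math.log10(1 + 1 / n);
        const difference = observedCounts[n - 1] - expected;

        chiSquared += (difference * difference) / expected;
    }

    return chiSquared;
}

// Made-up counts: a perfectly even spread of 999 values
const evenCounts = [111, 111, 111, 111, 111, 111, 111, 111, 111];

console.log(chiSquaredAgainstBenford(evenCounts) > 15.51); // true
```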

In window.onload comment out the call to getRandomData, uncomment the call to getBenfordData, and refresh the page. You'll see something like this.

Here the data is a much closer fit to the Benford distribution. If we didn't know better we would think it was genuine.