Fun with mathematics: Benford’s Law

If you gather a large collection of numbers from a naturally occurring source (for example, the surface areas of rivers, or typical sales figures for a shop), what is the probability of any given number in the collection starting with the digit “1”?

Because numbers don’t start with the digit “0”, there are nine possible digits for a number to start with: 1-9. So you’d expect the probability of the first digit being “1” to be one in nine, or about 11%, right?

Wrong. Because of a curious statistical phenomenon known as “Benford’s Law”, the probability is actually about 30%. The probability of the first digit being “2” is about 18%, and the figures keep decreasing down to 4.6% for a “9”.
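Those percentages come from a short formula. In its standard statement, Benford’s Law says the probability of a leading digit d is log10(1 + 1/d). The sketch below simply evaluates that formula for each digit; the function name is my own choice for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that a number's leading digit is `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the full table of leading-digit probabilities.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine probabilities add up to exactly 1, because the logarithms telescope to log10(10).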

Simon Newcomb, an astronomer, first pointed out this phenomenon way back in 1881, but it got little attention at the time. It wasn’t until the physicist Frank Benford did a much larger study of numbers from dozens of different sources in 1938, and found that the rule applied pretty much everywhere, that it attracted wider notice. And it still isn’t as well known as the birthday paradox (where the odds of at least two people sharing a birthday are about 50% with as few as 23 folk in a room), so you can easily use it as an amazing fact for impressing chicks at parties.

But Benford’s Law can be used for much more than just courtship. Because it applies across most naturally occurring number distributions, it can also be used to detect fraud in financial accounts, and to spot faked results in clinical trials: invented figures tend not to have the right spread of leading digits. In recent years, Professor Theodore Hill of the Georgia Institute of Technology has written several papers discussing the difficulty of faking data. His 1996 paper, “A Statistical Derivation of the Significant-Digit Law”, also provides a solid explanation for just why Benford’s Law works the way it does.
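To give a feel for how such a check might work, here is a minimal sketch that compares the leading digits of a list of positive integers against Benford’s Law using a chi-square statistic. This is my own illustration, not the method from Hill’s papers or from any auditing package. The function names, the sample data and the 5% threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """First decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def chi_square_vs_benford(values) -> float:
    """Chi-square statistic of the observed leading digits against Benford's Law."""
    values = [v for v in values if v != 0]  # zero has no leading digit
    if not values:
        raise ValueError("need at least one non-zero value")
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# Chi-square critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5_PERCENT = 15.51

# Illustrative data only: powers of two are known to follow Benford's Law,
# while the integers 100..999 have perfectly uniform leading digits.
benford_like = [2 ** k for k in range(1, 1001)]
suspicious = list(range(100, 1000))

for name, data in [("powers of two", benford_like), ("100..999", suspicious)]:
    stat = chi_square_vs_benford(data)
    verdict = "consistent with Benford" if stat < CRITICAL_5_PERCENT else "looks off"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

A large statistic is only a hint that something is odd, not proof of fraud: plenty of honest data sets (prices capped at a limit, phone numbers, heights of adults) don’t follow Benford’s Law in the first place.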


One Reply to “Fun with mathematics: Benford’s Law”

  1. Took a look at the paper and some related stuff. Did you notice that a corollary of Benford’s Law is that the digits aren’t independent? The probability distribution of each subsequent digit depends on the previous ones, with the dependency getting weaker the further apart a pair of digits is.

    That’s Mad!
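The dependency mentioned in the reply can be checked with the general form of the law, under which the probability that a number starts with the two digits d1 d2 is log10(1 + 1/(10·d1 + d2)). The sketch below uses that formula to work out the chance of a given second digit once the first digit is known; the function names are hypothetical, chosen for this example.

```python
import math


def joint_probability(first: int, second: int) -> float:
    """Probability that a number starts with the digits `first` then `second`."""
    return math.log10(1 + 1 / (10 * first + second))


def second_digit_given_first(second: int, first: int) -> float:
    """Conditional probability of the second digit, given the first digit."""
    return joint_probability(first, second) / math.log10(1 + 1 / first)


# If the digits were independent, these two values would be identical.
print(f"P(second = 0 | first = 1) = {second_digit_given_first(0, 1):.4f}")
print(f"P(second = 0 | first = 9) = {second_digit_given_first(0, 9):.4f}")
```

A second digit of “0” is noticeably more likely after a leading “1” than after a leading “9”, so the second digit really does depend on the first.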
