NWR Benford's Law and Iran Election

Frank Deis · Jun 19, 2009

I got addicted to reading http://www.fivethirtyeight.com during the Obama campaign and the election last fall. Nate Silver showed up a lot on various political talk shows, looking a lot like a bespectacled math geek with Asperger's Syndrome. At any rate he is clearly a smart guy and comes up with some interesting insights. A couple of days ago there were two articles on the site discussing Benford's Law, which I had never heard of. Basically, in a random collection of real-world numbers, "one" shows up as the first digit far more often than any other digit, and the frequency falls off steadily from there down to "nine." It doesn't much matter whether you are talking about populations of small towns, or prices in the 7-11, and it certainly should apply to vote tallies. The curve looks like this.
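For anyone who wants numbers instead of a picture: the usual statement of Benford's Law is that the leading digit d turns up with probability log10(1 + 1/d). Here is a minimal Python sketch that prints that expected curve as a text bar chart. The function name is my own invention for illustration, not anything from the fivethirtyeight articles.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)

# Print one row per leading digit: the percentage plus a crude bar.
for d in range(1, 10):
    p = benford_probability(d)
    print(f"{d}: {p:6.1%} {'#' * round(p * 100)}")
```

The digit 1 should come out at roughly 30 percent and 9 at under 5 percent.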

If you look at the vote totals for Karroubi, one of the minor candidates in the Iranian election, it looks instead like this

At any rate, read the article. One surprising side note is that it really looks like someone was jiggering the results in the Al Franken - Norm Coleman senatorial election in Minnesota.

ARTICLE

Frank

Ian Fitzsimmons · Jun 19, 2009

Thanks, Frank, for putting this interesting site up on the board. Interesting article, too.

Frank Deis · Jun 19, 2009

Glad you liked it Ian. It is kind of maddeningly inconclusive but a good thing to chew on.

For what it's worth, obviously this doesn't apply to true random numbers, i.e. computer selected. A truly random set would give you a horizontal graph with all digits represented equally in the first position.
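A rough way to see the difference in Python: draw uniformly from 1 to 999 and the first digits come out flat, but draw numbers spread evenly across several orders of magnitude (closer to how town populations or prices behave) and the Benford curve shows up. This is only a sketch; the seed, sample size, ranges, and the helper names `first_digit` and `first_digit_shares` are all arbitrary choices of mine for illustration.

```python
import random
from collections import Counter

def first_digit(n: float) -> int:
    """Leading nonzero digit of a positive number."""
    while n >= 10:
        n /= 10
    while n < 1:
        n *= 10
    return int(n)

def first_digit_shares(values):
    """Fraction of values starting with each digit 1 through 9."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

rng = random.Random(2009)  # fixed seed so the run is repeatable

# Uniform draws over 1..999: every leading digit is equally likely.
uniform_shares = first_digit_shares(rng.randint(1, 999) for _ in range(100_000))

# Draws spread evenly over six orders of magnitude (uniform in the exponent).
spread_shares = first_digit_shares(10 ** rng.uniform(0, 6) for _ in range(100_000))

for d in range(1, 10):
    print(f"{d}: uniform {uniform_shares[d]:.3f}   spread {spread_shares[d]:.3f}")
```

The uniform column should hover around 0.111 for every digit, while the spread column should start near 0.30 for the digit 1 and fall away from there.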

When the University stopped using social security numbers, I started having the students choose three digit numbers as their personal ID. I tell them to use their birthday or their house address or whatever they want. My guess is that the numbers they choose would fit the Benford curve. There are always way more numbers between 100 (the minimum number for this purpose) and 150 than between, say, 800 and 850.

I suppose I would also like to know more about the Franken Coleman numbers. Both sides had motive. Which side had the opportunity? Hard to figure.

Ian Fitzsimmons · Jun 20, 2009

Apropos Franken-Coleman, Nate Silver offers a plausible explanation of the observed variation that derives from the voting administration system, as opposed to fraud, so it's hard to say that his analysis indicates impropriety. I wonder if he could get a handle on that question by running the same analysis on a series of historical Minnesota state-wide elections; if the particular variation pattern he notes is caused by the administrative system, you would expect to see it repeated from election to election, no?

Also, it's hard to compare the significance of an analysis like this made on a US election with one made on an Iranian election. There's a good discussion of this point in one of Nate's other articles, in which he compares the vote-counting procedures in US elections with those in Iran. There aren't a lot of cross-checks and safeguards in the Iranian system, it appears, and the votes are counted once by a single, central government ministry. Large-scale vote switching would therefore be relatively easy to achieve and difficult to challenge. The decentralized vote-counting systems in the US, where counting activity is distributed among precincts all across the state, make such large-scale fraud inherently more difficult here.

Unrelated, it would be interesting to see a series of historical analyses done for the state of Florida (and perhaps Ohio) in recent presidential elections.

It's great to see math used this way, in any event, and the site conveys a pleasing sense of tough-minded intellectual integrity. Thanks again.

Frank Deis · Jun 20, 2009

Just by chance -- at our last department meeting Gyan Bhanot was talking about the same sort of approach to scientific research. He uses software which can make matrices with a million dimensions and kind of rotate them until clusters pop into view. This allows detailed comparison of, say, gene sequences or other complex data. One thing he was applying it to was the various sorts of breast cancer cells -- some drugs affect only one or two of the cell types.

It's kind of cool that MATH exists, out there, in its own frame of reference but then suddenly the most remotely theoretical math can turn out to have a fundamental significance for explaining phenomena in daily life.

I wonder if anyone here has seen "Dark Matter" -- about Chinese grad students doing physics Ph.D. work in the U.S. A beautiful movie but shocking.

Steve Guattery · Jun 20, 2009

originally posted by Frank Deis:

For what it's worth, obviously this doesn't apply to true random numbers, i.e. computer selected.

[Pedantry mode on]

If by "computer selected" you mean computer generated, then you are not talking about true random numbers, but pseudorandom numbers. Good pseudorandom generators pass all sorts of statistical tests that show predictable patterns do not occur in their outputs. However, if you know the current state and the algorithm used by the generator, the output sequence is deterministic.
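A small Python illustration of that last point, using the standard library's `random` module (a Mersenne Twister generator). The seed value and variable names are arbitrary choices for the sketch.

```python
import random

# Two generators given the same seed produce the same "random" sequence.
gen_a = random.Random(12345)
gen_b = random.Random(12345)
seq_a = [gen_a.randint(0, 99) for _ in range(5)]
seq_b = [gen_b.randint(0, 99) for _ in range(5)]

# Capturing the internal state lets you replay whatever comes next.
state = gen_a.getstate()
next_three = [gen_a.random() for _ in range(3)]
gen_a.setstate(state)
replayed = [gen_a.random() for _ in range(3)]

print(seq_a == seq_b, next_three == replayed)
```

Both comparisons come out equal, which is the sense in which the output is deterministic even though it looks random.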

[Pedantry mode off]

MLipton · Jun 20, 2009

originally posted by Steve Guattery:

originally posted by Frank Deis:

For what it's worth, obviously this doesn't apply to true random numbers, i.e. computer selected.

[Pedantry mode on]

If by "computer selected" you mean computer generated, then you are not talking about true random numbers, but pseudorandom numbers. Good pseudorandom generators pass all sorts of statistical tests that show predictable patterns do not occur in their outputs. However, if you know the current state and the algorithm used by the generator, the output sequence is deterministic.

[Pedantry mode off]

True in most cases, Teri, but there are pseudorandom processes that use physical data to introduce a nondeterministic element, such as HotBits and the ever-amusing LavaRnd.

Mark Lipton

Steve Guattery · Jun 20, 2009

originally posted by MLipton:

True in most cases, Jeff, but there are pseudorandom processes that use physical data to introduce a nondeterministic element, such as HotBits and the ever-amusing LavaRnd.

Note that Jeff isn't to blame for the pedantry, I am. Using measurements of random physical processes can yield true random numbers. Both of the generators you mention use physical data in the process, though LavaRnd does a fair bit of processing of the data. It looks like a lot of work to determine what the processing does in terms of randomness, though a quick look at their website suggests the system runs some random bits through a deterministic algorithm (possibly) to amplify some statistical properties. I thought the following statement, which discusses LavaRnd in terms of pseudorandom number generators, was interesting:

"A cryptographically sound random number generator is a very high quality random number generator that is almost certainly a cryptographically strong random number generator. While the generator lacks a formal proof, there should exist a solid and well reasoned argument for its cryptographic strength.

"A cryptographically sound random number generator will pass statistical tests with a very high level of confidence. In fact, sound generators will pass tests at the same confidence level as strong generators. In particular, any standard battery of statistical tests for randomness, when given a statistically significant amount of data, will not be able to distinguish the random number generator from a true random source."

The bit about the formal proof is certainly amusing.

Frank Deis · Jun 20, 2009

Is "pedantry mode" ever "off" in this place????

Signed

Jeff

MLipton · Jun 21, 2009

originally posted by Steve Guattery:

originally posted by MLipton:

True in most cases, Jeff, but there are pseudorandom processes that use physical data to introduce a nondeterministic element, such as HotBits and the ever-amusing LavaRnd.

Note that Jeff isn't to blame for the pedantry, I am.

Whoops! FiX0red in the original.

Using measurements of random physical processes can yield true random numbers. Both of the generators you mention use physical data in the process, though LavaRnd does a fair bit of processing of the data.

Not to mention the Lava Lamps *snicker*

"A cryptographically sound random number generator is a very high quality random number generator that is almost certainly a cryptographically strong random number generator. While the generator lacks a formal proof, there should exist a solid and well reasoned argument for its cryptographic strength.

"A cryptographically sound random number generator will pass statistical tests with a very high level of confidence. In fact, sound generators will pass tests at the same confidence level as strong generators. In particular, any standard battery of statistical tests for randomness, when given a statistically significant amount of data, will not be able to distinguish the random number generator from a true random source."

The bit about the formal proof is certainly amusing.

I predict that that whole disclaimer is in there to forestall Bruce Schneier's criticism, thereby bringing Jeff into this discussion kicking and screaming.

Mark Lipton

Frank Deis · Jun 21, 2009

by the way I hope some of you guys Google Gyan Bhanot...

F

MLipton · Jun 21, 2009

originally posted by Frank Deis:
by the way I hope some of you guys Google Gyan Bhanot...

F

But of course we did. I'd sort of written him off as an anagram of "hang botany." Go figger. (sorry, just recently finished Anathem)

Mark Lipton

Jeff Grossman · Jun 21, 2009

Did someone call?

Frank Deis · Jun 23, 2009

Nate Silver is looking at the Iran data from a different angle

Another Iranian Oddity
