The first thing you learned about probability is wrong*

*or dangerously incomplete.

I’ve just started reading Against the Gods: The Remarkable Story of Risk, a book by Peter Bernstein that’s been high on my “To Read” list for a while. I suspect it will be quite interesting, though it’s clearly targeted at a general audience with no technical background. In Chapter 1 Bernstein makes the distinction between games that require some skill and games of pure chance. Of the latter, Bernstein notes:

“The last sequence of throws of the dice conveys absolutely no information about what the next throw will bring. Cards, coins, dice, and roulette wheels have no memory.”

This is often the very first lesson presented in a book or a lecture on probability theory. And, so far as theory goes, it’s correct. For that celestially perfect fair coin, the odds of getting heads remain forever fixed at 1 to 1, toss after platonic toss. The coin has no memory of its past history. As a general rule, however, to say that the last sequence tells you nothing about what the next throw will bring is dangerously inaccurate.

In the real world, there’s no such thing as a perfectly fair coin, die, or computer-generated random number. Ok, I see you growling at your computer screen. Yes, that’s a very obvious point to make. Yes, yes, we all know that our models aren’t perfect, but they are very close approximations and that’s good enough, right? Perhaps, but good enough is still wrong, and assuming that your theory will always match up with reality in a “good enough” way puts you on the express train to ruin, despair and sleepless nights.

Let’s make this a little more concrete. Suppose you have just tossed a coin 10 times, and 6 of the 10 times it came up heads. What is the probability you will get heads on the very next toss? If you had to guess, using just this information, you might guess 1/2, despite the empirical evidence that heads is more likely to come up.

Now suppose you flipped that same coin 10,000 times and it came up heads exactly 6,000 times. All of a sudden you have a lot more information, and that information tells you a much different story than the one about the coin being perfectly fair. Unless you are completely certain of your prior belief that the coin is perfectly fair, this new evidence should be strong enough to convince you that the coin is biased towards heads.

Of course, that doesn’t mean that the coin itself has memory! It’s simply that the more often you flip it, the more information you get. Let me rephrase that: every coin toss or dice roll tells you more about what’s likely to come up on the next toss. Even if the tosses converge to one-half heads and one-half tails, you now know with a high degree of certainty what before you had only assumed: the coin is fair.
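Here’s a minimal sketch of this idea in Python, using Bayesian updating with a uniform Beta(1, 1) prior (the function names and the choice of prior are mine, not from the book):

```python
def posterior_mean(heads, tails, prior_heads=1, prior_tails=1):
    """Posterior mean of P(heads) under a Beta prior, after observing flips."""
    a = prior_heads + heads
    b = prior_tails + tails
    return a / (a + b)

def posterior_sd(heads, tails, prior_heads=1, prior_tails=1):
    """Posterior standard deviation: how uncertain we still are about the bias."""
    a = prior_heads + heads
    b = prior_tails + tails
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

# 6 heads in 10 flips: estimate near 0.58, but very uncertain.
print(posterior_mean(6, 4))        # ≈ 0.5833
print(posterior_sd(6, 4))          # ≈ 0.137

# 6,000 heads in 10,000 flips: same ratio, vastly more certainty.
print(posterior_mean(6000, 4000))  # ≈ 0.59998
print(posterior_sd(6000, 4000))    # ≈ 0.0049
```

The point of the two standard deviations is exactly the argument above: the estimate barely moves, but after 10,000 flips the uncertainty around it has shrunk by a factor of nearly 30, which is why the second data set should change your mind when the first one shouldn’t.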

The more you flip, the more you know! Go back up and reread Bernstein’s quote. If that’s the first thing you learned about probability theory, then instead of knowledge you were given a very nasty set of blinders. Astronomers spent century after long century trying to figure out how to fit their data with the incontrovertible fact that the earth was the center of the universe and all orbits were perfectly circular. If you have a prior belief that’s one-hundred-percent certain, be it about fair coins or the orbits of the planets, then no new data will change your opinion. Theory has blinded you to information. You’ve left the edifice of science and are now floating in the ether of faith.



  1. In short, the coin doesn’t have memory, but it MAY have bias. Because it’s assumed to be a fair coin, it’s assumed to be perfectly unbiased – and such a coin only really exists in pure theory.

    There is no way to determine the bias or the true probability of ANY event without performing the experiment or event enough times to get the sample size up high enough to narrow the confidence interval.

    But once you’ve established a good estimate of the true probability of the biased coin (60% heads, for example) then it IS true that the coin has no memory. You might flip a coin 10 times and get heads 10 times, and on flip 11 you will STILL have a 60% chance of getting heads. This is no different than the theoretical case where you have an unbiased coin, flip it 10 times, and get heads 10 times. The only difference is that your odds of the next head are 50% instead of 60%.

    Basically, it should be rephrased that any random number generator has no memory of its past history. BUT every throw gets you closer to knowing what the true probability of an event is. Everything only breaks down when you assume that the theoretical concept of a perfect random number generator actually exists in the real world.

  2. @Jeff
    Well put. Though I wonder if it ever makes sense to say that a coin has a “true” probability of landing on heads, as in some immutable property of the coin itself. It seems like the information we gain lets us make limited inferences about that coin tossed in a particular way over a particular period of time (obviously a coin will change slowly).

  3. Interesting. I’ve been teaching probability differently these last two years, since finishing my MS in statistics. I actually teach experimental probability first as opposed to second and discuss how important its implications are. I actually had answers on my last test that described how experimental probability is a more accurate way to describe a solution than describing all possible outcomes.

  4. Didn’t really like the Against the Gods book: too much weight on theory and not enough about real life. Prefer Taleb’s books (The Black Swan, Fooled by Randomness).

  5. Leaving aside the possibility that the coin flipper can sufficiently control the flip to bias the outcome (see the mathematician/statistician/magician Persi Diaconis’ work) and making the weaker assumption that individual flips are independent of each other with probability P(H) = p not equal to zero or one, you can use John von Neumann’s algorithm/trick to construct a virtual “honest” coin: 1) Flip the coin twice, 2) if outcome is HT, with P(HT) = p*(1-p), the virtual coin is H, 3) if outcome is TH, with P(TH) = (1-p)*p, the virtual coin is T, 4) if outcome is HH or TT, go to step 1.

    Similar constructions can be used to build honest virtual dice, etc.

    I rather liked Against the Gods.
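    Von Neumann’s trick from the comment above can be sketched in a few lines of Python (the 60% bias is illustrative, and `biased_flip` is my own stand-in for a physical coin):

    ```python
    import random

    def biased_flip(p=0.6):
        """A coin biased toward heads; here p = 0.6 is an assumed bias."""
        return 'H' if random.random() < p else 'T'

    def von_neumann_flip(flip=biased_flip):
        """Flip twice: HT -> heads, TH -> tails, HH or TT -> try again.
        Since P(HT) = p*(1-p) = P(TH), the virtual coin is fair for any 0 < p < 1."""
        while True:
            a, b = flip(), flip()
            if a != b:
                return a

    random.seed(1)
    flips = [von_neumann_flip() for _ in range(10_000)]
    print(flips.count('H') / len(flips))  # close to 0.5, despite the 60% bias
    ```

    The cost of the trick is wasted flips: the more biased the coin, the more HH/TT rounds get discarded before the virtual coin lands.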

  6. Michael Anderson

    Computer-generated random numbers can be worse! With that old standby, the linear congruential generator, you’re assured of never getting the same number two or more times in succession.
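    That property is easy to check empirically. A quick sketch with glibc-style constants (the specific constants are an assumption; the point holds for any LCG whose recurrence has no fixed point):

    ```python
    def lcg(seed, a=1103515245, c=12345, m=2**31):
        """Linear congruential generator: x -> (a*x + c) mod m."""
        x = seed
        while True:
            x = (a * x + c) % m
            yield x

    gen = lcg(42)
    prev = next(gen)
    for _ in range(100_000):
        cur = next(gen)
        # An immediate repeat would require (a-1)*x + c ≡ 0 (mod m),
        # which has no solution for these constants, so this never fires.
        assert cur != prev
        prev = cur
    print("no immediate repeats in 100,000 draws")
    ```

    A true random source, of course, repeats its last value with probability 1/m on every draw, which is exactly the tell that lets you distinguish the two.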

  7. Excellent article, but I can’t believe the name of Thomas Bayes appears nowhere in it.

  8. Fascinating! Most excellent thinking. I teach research statistics to both undergraduate and graduate students, and have always ass-u-me(d) that a coin toss was always 50-50! But, you are correct. Once you have completed an adequate statistical sample, and the results show that the potential is more like 55-45, you have established a new reality for that coin (or?). I love it, thank you for thinking… And I think The Black Swan is a much more statistically meaningful codex on probability, just sayin…