Continuous distributions like the Normal aren’t memoryless, since your conditional probability changes as you go out into the tail. To illustrate, suppose someone samples from the Normal distribution, and you start to ask, “Is it more than one standard deviation from the mean?”, “Is it more than 2?”, “More than 3?”, and so on. The chance that it will be past the fourth, given that it is already past the third, is quite small. The chance that it’s beyond 5, given that it’s beyond 4, is even smaller. Meanwhile, with the exponential, those conditional probabilities stay the same each time. This gives rise to the strange feeling, as your waiting time gets longer and longer, that it really should be “due”, like you say, but the conditional expected wait time stays the same. IMO understanding this, and how it conflicts with our intuition, is extremely important. I suspect that a lot of “throwing good money after bad” problems are the result of thinking that conditional probabilities have moved in your favor, when it’s quite possible that they have remained the same or even moved against you. Does that make sense?
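To make that concrete, here’s a quick numerical sketch (the function names are just for illustration): the Normal’s conditional tail probabilities shrink as you go further out, while the rate-1 exponential’s stay fixed at e⁻¹.

```python
import math

def normal_tail(z):
    # P(Z > z) for a standard Normal, via the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2))

def exp_tail(x):
    # P(X > x) for a rate-1 exponential
    return math.exp(-x)

# Normal: P(Z > k+1 | Z > k) keeps shrinking as k grows
for k in range(1, 5):
    cond = normal_tail(k + 1) / normal_tail(k)
    print(f"Normal: P(Z > {k + 1} | Z > {k}) = {cond:.5f}")

# Exponential: P(X > s+1 | X > s) = e^(-1) for every s -- memoryless
for s in range(1, 5):
    cond = exp_tail(s + 1) / exp_tail(s)
    print(f"Exponential: P(X > {s + 1} | X > {s}) = {cond:.5f}")
```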

For those who don’t know, the Laplace is two exponentials bolted together back to back to make it symmetric (http://www.statisticsblog.com/?s=laplace&x=0&y=0). If you know that you are on the positive or negative side, then the conditional probabilities within that half should have the “memoryless” quality, even though the full distribution isn’t memoryless.
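A quick numerical check of that claim, assuming a standard Laplace with density ½·e^(−|x|) (so its tail for x ≥ 0 is ½·e^(−x)): conditioned on already being past s ≥ 0, the chance of going one unit further is always e⁻¹, exactly as for the exponential.

```python
import math

def laplace_tail(x):
    # P(X > x) for a standard Laplace (density 0.5 * exp(-|x|)), valid for x >= 0
    return 0.5 * math.exp(-x)

# Within the positive half, P(X > s+1 | X > s) is constant in s,
# matching the rate-1 exponential's e^(-1)
for s in [0.0, 1.0, 2.5, 5.0]:
    cond = laplace_tail(s + 1) / laplace_tail(s)
    print(f"s = {s}: P(X > s+1 | X > s) = {cond:.6f}")
```

The factor of ½ cancels in the ratio, which is why the conditional probabilities within one half behave exactly like the exponential’s.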

i wonder if you can shed some light on the exponential being the only memoryless continuous distribution, though? aren’t most of the familiar continuous distributions memoryless? (i.e. what makes the exponential memoryless when compared to the Laplace? what am i missing in this context?)

cheerio,

alexis

If you look at the formula for the standard deviation, you’ll find it is already ‘normalized’ by dividing by n−1 (or n).
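A minimal sketch of that point (function name is just for illustration): because the sum of squared deviations is divided by n (or n−1) before taking the root, the standard deviation doesn’t grow with sample size, so it’s already comparable across data sets of different sizes — dividing it by N again would only distort the comparison.

```python
import math

def sample_sd(xs, ddof=0):
    # standard deviation: sqrt(sum((x - mean)^2) / (n - ddof));
    # ddof=0 divides by n, ddof=1 divides by n-1
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - ddof))

# The same data repeated 100 times has 100x the sample size,
# but exactly the same standard deviation (with the /n divisor)
small = [1, 2, 3]
large = [1, 2, 3] * 100
print(sample_sd(small), sample_sd(large))
```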

We are using the standard deviation as a comparison measurement among several data sets. Problem is, the data sets vary widely in size.

Is there any sense in defining a metric that is sdev/N, where N is the number of samples in each data set, and using it as a normalized metric for comparison?

Thanks in advance for your comments.

Best regards.