for (i in iterations:1 ) {

if (itemHistory[i] != chosen) {

break

}

}

The resulting distribution of moments of divergence for 200 trials (100th iterations each) looks like this:

http://gis-lab.info/images/screenshots/20100903-872-37kb.jpg

]]>Thanks for the reply! I’ve used

items = c(5, 10, 20, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800)

which is about as far as my machine will let me go in a reasonable amount of time. On log-log paper the plot of divergence point vs number of items is linear until about 600 items, where it appears to change slope. In normal coordinates it looks exponential as I modeled but seems to be linear > 600. Do you see something similar? I’ve seen it in two simulations so far, but don’t have an intuitive sense for why that would be.

Thanks again!

]]>I don’t think there’s any non-arbitrary way to measure the moment of divergence. What you did makes sense. You’re formula is good for certain ranges, but I suggest you do a log-log plot for a wide range of numbItems. Try this:

1. Generate a whole bunch of different numbers to test going up to a fairly high amount. For example, I used this code to generate the ones to test:

numbItemsVector = floor(c(1.15^(23:45)))

2. Keep track of when each diverges. Use your method, or, for a quick test, after normalizing the weights see if the Chosen item now has weight 999 times more than all the others combined:

if(itemWeights[chosen] > .999) { # Store results and break

3. Plot log(iterationsToDivergence) vs log(numbItems).

How does that look?

Cheers!

]]>I really enjoy your blog, and have been using it to help develop my R skills. I am probably your least statistically knowledgable reader but I do enjoy working through your examples and trying to extend them. Thanks for another fun post!

I replicated your results at a variety of item counts and was surprised to see such a low variability between trials (~8.4% for 200 balls, ~2.6% for 500, n = 20) and a pretty clear relationship between iterations to divergence (y) and number of items (x):

y = 1.942 * x^2

I’m curious how you would measure the moment of divergence. I looked for the iteration where the last thousand iterations had the same result, then subtracted 1000 from it. Any suggestions for a better way?

Thanks!

]]>If you don’t re-normalize the weights each time than the Chosen One is eventually assigned a weight of “Inf” (ie a number too big for R to handle), which causes an error. This way the weight of the Chosen One becomes 1 while the others become ever smaller.

@xi’an

Good call. It is like Polya’s with fixed number of balls, unlimited colors, and fractional weightings. I’m not sure why “numbItems fixed is not compatible with your description of a 5% weight increase” though perhaps I should have made clear that a 5% increase in weight is *not* exactly the same as a 5% increase in the probability a ball will be picked. I didn’t want to go into that nuance in the post, but here’s an example of how the math works:

If you start with 10 balls, each with p=0.1 of being picked, then when you increase the weight of the first winner its new p=0.1089, an increase of 8.9% (because new p=1.1/sum(itemWeights), where we have yet to re-normalize the weights). Weight always rises by a fixed 10%, but the closer a ball gets to dominating the others, the slower its probability of being picked increases. If p grew by a (fixed) 10% each time it would soon surpass 1, which as we all know would bring on Probabilistic Armageddon because such a thing can’t happen in our universe. Maybe in some other universe (there’s a thesis idea for you, build a mathematical universe where p can exceed 1), but not in this one.

]]>itemWeights[chosen] = itemWeights[chosen] + itemWeights[chosen] /numbItems

is [almost] equivalent to replacing the chosen ball with tw balls of the same color. (I write [almost] because you do not update the weight as

itemWeights[chosen] = itemWeights[chosen] + 1/numbItems

and the number of balls as

numbItems into numbItems+1

after each replacement, which would be the case for the Polya Urn process. I also think that keeping numbItems fixed is not compatible with your description of a 5% weight increase.) The limiting behaviour of the Polya Urn process is known to be concentrated on (0,1) for the probability of drawing ball i. Hence the same for your process because the weights on the chosen balls are increasing with the iterations… ]]>