

14
Feb 13

Population simulation leads to Valentine’s Day a[R]t

Working on a quick-and-dirty simulation of people wandering around until they find neighbors, then settling down. After playing with the coloring a bit I arrived at the above image, which I quite like. Code below:

# Code by Matt Asher for statisticsblog.com
# Feel free to modify and redistribute, but please keep this notice 

maxSettlers = 150000

# Size of the area
areaW = 300
areaH = 300

# How many small movements will they make to find a neighbor
maxSteps = 200

# Homesteaders, they don't care about finding a neighbor
numbHomesteaders = 10

areaMatrix = matrix(0, nrow=areaW, ncol=areaH)

# For the walk part
adjacents = array(c(1,0,1,1,0,1,-1,1,-1,0,-1,-1,0,-1,1,-1), dim=c(2,8))

# Is an adjacent cell occupied?
hasNeighbor <- function(m,n,theMatrix) {
	toReturn = FALSE
	for(k in 1:8) {
		yCheck = m + adjacents[,k][1]
		xCheck = n + adjacents[,k][2]
		if( !((xCheck > areaW) | (xCheck < 1) | (yCheck > areaH) | (yCheck < 1)) ) {
			if(theMatrix[yCheck,xCheck]>0) {
				toReturn = TRUE
			}
		}
	}
	return(toReturn)
}


# Main loop
for(i in 1:maxSettlers) {
	steps = 1
	xPos = sample(1:areaW, 1)
	yPos = sample(1:areaH, 1)
	
	if(i <= numbHomesteaders) {
		# Seed it with homesteaders
		areaMatrix[xPos,yPos] = 1
	} else {
		if(areaMatrix[xPos,yPos]==0 & hasNeighbor(xPos,yPos,areaMatrix)) {
			areaMatrix[xPos,yPos] = 1
		} else {
			spotFound = FALSE
			outOfBounds = FALSE
			
			while(!spotFound & !outOfBounds & (steps < maxSteps)) {
				steps = steps + 1
				
				# Take one small step in a random adjacent direction
				movement = adjacents[, sample(1:8, 1)]
				xPos = xPos + movement[1]
				yPos = yPos + movement[2]
				
				if( (xPos > areaW) | (xPos < 1) | (yPos > areaH) | (yPos < 1) ) {
					outOfBounds = TRUE
				} else if(hasNeighbor(xPos,yPos,areaMatrix) ) {
					areaMatrix[xPos,yPos] = steps
					spotFound = TRUE
				}
			}
		}
		
	}

}
	
image(areaMatrix, col=rev(rgb(seq(0.01,1,0.01),seq(0.01,1,0.01),seq(0.01,1,0.01))))

# I think this version looks nicer!
# areaMatrix[areaMatrix !=0] = 1
# image(areaMatrix, col=rev(rgb(.5,0,seq(0.2,1,0.2))))

4
Feb 13

Landmine detection revisited; the inverse unicorn problem

A couple weeks ago I wrote about an interesting idea to clear landmines using the power of the wind. A reader asked me to comment more on the value of using these wind-powered “Kafons” to do an initial assay of a suspected minefield, an idea I mentioned at the end of my video on the subject. In particular, how good would the devices be at detecting the existence of mines if they were very sparse in an area? In a sense, this is the inverse of the unicorn problem; instead of trying to find every last mine, we’re concerned with finding the very first one, if indeed land mines are there. Put another way: How hard do we have to look before we can safely conclude that unicorns don’t exist?

Download the code for this simulation.

The animated plot shown at the top of this post represents a small sample of data from the simulation I ran. Each blue dot shows the progress of testing a location to see if that field has mines. I've cut off the testing at $30,000, which is 600 kafons based on their estimated cost (feel free to go into the code and change the cutoff to whatever you want). The dots at the top, with numbers above them, represent testing that used all 600 kafons without finding any mines. Does this mean no mines exist? Sometimes, but not always. The number above the dot shows the true number of mines in that field during that particular simulation. As you can see, it's very possible for the field to have several mines, yet still not have any detected, even after trying with hundreds of kafons. In the entire simulation, there were 283 trials with a single mine in the field. On 36 of those occasions, the mine was detected (and, in the simulation, detonated). The other 87% of the time, we spent a (virtual) $30,000 and failed to detect its presence.

I’ve shown the results as an animation so that you can put yourself into the position of someone trying to decide whether to continue testing a field for mines, or move on to another location. Each new test costs additional funds, but when do you stop? My $30,000 cutoff is arbitrary. It represents a best guess on my part as to when it would be better to use other methods to test for landmines. These kinds of optimal stopping points can be extremely difficult to determine, especially when, as in this case, those testing for landmines will have to deal with the problem of sunk costs: imagine you’ve just spent $30,000 testing for mines in a field you suspect is dangerous, but you haven’t found anything. The very next kafon, at a cost of just $50, could be the one to find a mine. Do you continue? In my simulation, with this particular distribution of probabilities, once no mine was found in the first 300 kafons, it was very unlikely one would ever be found (although, as mentioned, even when no mine was detected after 600 kafons the field was still way more likely than not to have mines).

Based on the results of the simulation, using kafons to detect mines is cheap when the probability of finding a mine is very high, but in that case I would imagine there's already strong evidence that landmines exist. In the case where landmines are more sparse, testing with kafons is expensive and the question of when to stop testing is difficult. Note that in real-world use, we don't know the underlying probability of a mine in the field; we could be anywhere along the x-axis of the plot shown at top. All we see, in real time, is a rising cost and no mine found.

If we know how much (new) area is covered by each additional kafon, and if we assume that coverage and placement of landmines is randomly distributed (at least from our position of ignorance), then we can come up with some probability estimates for the chances that a field has an undetected landmine after each additional kafon is given a chance to detect mines. The biggest challenge is that, unless I'm missing something, the question of this exact probability is unanswerable unless we assume a prior distribution on the number of landmines in our field.
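
To make the idea concrete, here's a minimal sketch in R of that kind of updating, under assumptions of my own choosing: a Poisson prior on the number of mines, and a fixed fraction of new ground swept by each kafon. The prior mean and coverage fraction below are made-up illustration values, not estimates from my simulation.

# Minimal sketch, not part of the original simulation. Assumes a Poisson(lambda)
# prior on the number of mines and that each kafon sweeps a fraction f of the
# field independently of the others.
pUndetected = function(k, f = 0.005, lambda = 3) {
	# Fraction of the field covered by at least one of the first k kafons
	covered = 1 - (1 - f)^k
	
	# Posterior chance that at least one mine remains, given no detonations so far
	1 - exp(-lambda * (1 - covered))
}

# Posterior chance of a hidden mine after 0, 100, ..., 600 kafons with no hits
round(pUndetected(seq(0, 600, by = 100)), 3)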

We wanted to let the data speak for itself in terms of whether we have hidden mines or not, but in the end our final beliefs will depend as much on our prior hunch as on the data itself. Which is, in effect, exactly what we were hoping to avoid by sending out kafons to detect mines.

Shown below is a full plot of all 2800 trials; each dot at the top might represent dozens of failed attempts to find a mine.


23
Jan 13

Fake text generation the wrong way, and a contest

As part of a bigger project, I needed to simulate a text string based on a source document, but at the character level. Just in case people find the code useful, I’ve uploaded it to MCMCtext.r.

In my simulated text, each character is chosen based on the transition probabilities in the source text from one character to the next. The result is (nearly complete) gibberish, of little interest to anyone except perhaps those looking for a replacement for the standard Lorem Ipsum dummy text. More interesting fake text could be generated by using two-character (or longer) transition probabilities, or by working at the level of words.
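
For anyone curious about the general approach, here's a stripped-down sketch of character-level transition sampling. This is not the MCMCtext.r code itself, just the idea; the toy sourceText and the variable names are mine.

# Stripped-down sketch of character-level transition sampling (not MCMCtext.r)
sourceText = "the quick brown fox jumps over the lazy dog and the lazy dog sleeps"
chars = strsplit(sourceText, "")[[1]]
alphabet = sort(unique(chars))

# Tally transitions from each character to the one that follows it
transitions = table(factor(head(chars, -1), levels = alphabet),
                    factor(tail(chars, -1), levels = alphabet))

# Generate 200 fake characters, one at a time, from the conditional counts
current = sample(alphabet, 1)
fake = current
for(i in 1:200) {
	counts = transitions[current, ]
	if(sum(counts) == 0) counts = rep(1, length(alphabet)) # dead end: pick uniformly
	current = sample(alphabet, 1, prob = counts)
	fake = paste0(fake, current)
}
fake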

Before moving on, I thought it might be interesting to see if anyone can “reverse engineer” my fake text output to figure out which original text was used as a source to generate it. Got that? The source text comes from Project Gutenberg. Hint: some features of the (fake) text could help you narrow the field of candidates.

First person to post a correct guess in the comments gets a copy of my comic and an unlimited supply of Hotpockets*. Limit one guess per person please.

* Hotpockets offer only valid if you are currently saving the planet from destruction.


10
Jan 13

Simulation of landmine clearing with Massoud Hassani’s Mine Kafon

Code used: MineClearingSimulationWithKafons.r

TRANSCRIPT OF VIDEO:
Hello, I’m Matt Asher with StatisticsBlog.com. This video is about my attempt to simulate a landmine clearing device built by Massoud Hassani called the Mine Kafon. I’ve put a link to his webpage at StatisticsBlog.com, I highly recommend checking out the video. Hassani’s device looks like this:

It’s a cheap, easy-to-build mine clearer that travels under the power of the wind. When I first saw the video, I was awestruck by what Hassani had done. It seemed like an awesome achievement: clever, creative, a fantastic idea for making the world better. A device that used the power of nature to clean up after man.

The more I thought about it, though, the more I wondered what might happen if hundreds of Kafons were sent out onto a mine field. So to examine that question, I built a simulation. I’ll run it now.

I’ve slowed it down so you can see what’s happening. Each blue line represents the path a Kafon might take across the minefield. The red circles represent exploded mines, and the gray parts are places where paths have overlapped.

Based on Hassani’s video, I’ve assumed there’s a prevailing wind that sweeps the devices right out into the field, and that the Kafons are released at equal intervals at the edge of the minefield. The movements up and down represent turbulence, uneven ground, or the natural tendency of the Kafons themselves to move with a wobble.

I’ve posted the code for this simulation on my website, as I do for all my blog posts. It’s written in R, a free and open source programming language. You can go into my code and easily change the wind speed and other parameters, then re-run the simulation. For example, I’ve set the number of mines that each Kafon can absorb before it stops working to 4; that’s based on what Hassani estimates, but you can change that up or down.

Paths that stop at a red circle before they traverse the whole minefield are Kafons that have “plated out” after hitting 4 mines.

Just looking at the simulation in this way, it seems to be working very well. Most of the Kafons are finding land mines, and a lot of land mines are getting cleared.

The biggest problem I see with Hassani’s approach has to do with efficiency, especially as you try to get more and more of the mines detected. The more Kafons you send into the field, the more overlapping you get, and the lower your efficiency becomes, and the harder it gets to detect remaining mines. I ran the simulation with different numbers of Kafons, always spaced at equal intervals, which gives them the best chance to clear as many mines as possible.

Here’s a graph showing the percentage of unexploded mines still left versus the number of Kafons that were released into the minefield. Each point on the graph represents a new simulation with that many Kafons. As you can see, at first adding more Kafons gives an almost linear decrease in the percentage of mines left, but the closer you get to clearing all the mines, the more elusive that goal becomes. Even after 2000 Kafons have been released, which if moving perfectly straight could have covered the area of our simulation four times over, there are still some mines left unexploded.

If you look on my blog you’ll see a post I did about something I call The Unicorn Problem, related to finding all of the known species in an environment. The problem there, as with this approach, is that the marginal rate of detection goes down as the number of attempts goes up. What’s happening is that the more Kafons you use, the more they overlap territory.
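
To see why the returns fall off, here's a toy calculation of my own, separate from the path-based simulation: if each Kafon independently cleared a random fraction f of the field, the expected share of mines left after k Kafons would be (1 - f)^k, so each additional device removes a smaller and smaller slice.

# Toy model of diminishing returns, not the path-based simulation itself.
# Assume each Kafon independently clears a fraction f of the field.
f = 0.01
k = 0:2000
plot(k, (1 - f)^k, type = "l", col = "blue",
	xlab = "Kafons released", ylab = "Expected fraction of mines remaining")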

Here you can see the amount of territory that’s been traversed more than once by a Kafon. The overall result is that the cost per mine destroyed gets higher and higher as you get closer and closer to eliminating all of them. Here’s a plot of the cost per destroyed mine versus the number of Kafons used.

The end result is that the cost per mine detected keeps increasing. Hassani estimates a cost of 40 euros to build each device, or about $50.

I wish Hassani the best of luck with his project, hopefully these issues can be addressed. Whatever the faults with this approach, this is a very important thing he’s doing. I noticed that in a more recent video he mentioned tracking the Kafon’s motion with GPS. I don’t know whether his initial price estimate included the cost of GPS. At any rate this would help to keep track of which areas have been covered and which haven’t, but unless he’s using a GPS accurate to within a foot, I wouldn’t want to try and walk in the exact path cleared by the device. In my simulation I’m assuming that every single time the Kafon is in the area of a mine it explodes it. It’s not clear that would be the case. It would be easy enough to add a probability of failure to the simulation.
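
For what it's worth, here's roughly what such a tweak might look like. This is a hypothetical fragment, not code from MineClearingSimulationWithKafons.r, and pDetonate is a made-up parameter name.

# Hypothetical tweak, not from MineClearingSimulationWithKafons.r:
# let each mine a Kafon passes over explode only with probability pDetonate.
pDetonate = 0.8
detonates = function(nMinesHit, p = pDetonate) runif(nMinesHit) < p

detonates(5)   # e.g. a Kafon rolls over 5 mines; which ones actually go off?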

There are adjustments you can make to the simulation or its parameters which would result in more of the Kafons being effective, though any design that relies on wind patterns is going to suffer from the same diminishing returns and the same unicorn problem. Even if the wind is wildly turbulent and increases the probability that all of the plates will get used, there’s still the problem of overlap, and perhaps even worse performance if the Kafons get stuck in one area or are quickly blown out of the minefield. Overall there are lots of reasons not to want to walk out into a minefield, no matter how many Kafons have been through it.

Even if the Mine Kafon isn’t the best option for clearing an entire region of mines, it might still be an effective way to test for the presence of mines in an area: to sample the area and see how likely it is to contain land mines, and if so, roughly how many and which regions might have higher concentrations.


14
Dec 12

Let it snow!

A couple days ago I noticed a fun piece of R code by Allan Roberts, which lets you create a digital snowflake by cutting out virtual triangles. Go give it a try. Roberts inspired me to create a whole night sky of snowflakes. I tried to make the snowfall look as organic as possible. There are lots of options to adjust. Here’s the code, have fun and Happy Holidays!

# Code by Matt Asher for statisticsblog.com
# Feel free to modify and redistribute 

# How many flakes do you want to fall?
flakes = 100

# Width and height of your space
width = 800
height = 600

# Initial wind
wind = 0

# Setup the background of the plot and margins
par(bg = "black")
par(oma=c(0,0,0,0))
par(mar=c(0,0,0,0))
plot(0, 0, col="black", pch=".", xlim=c(0,width), ylim=c(0,height), axes=F)

for(i in 1:flakes) {
    startY = height
    startX = runif(1,1,width)

    xPos = startX
    yPos = startY

    for(j in 1:height) {

		# Optional wind drift (not applied below; add wind to the mean of the
		# x-step if you want the flakes to blow sideways)
		wind = wind + rcauchy(1,0,.05)

		# Update snowflake position
        xPos = xPos + rnorm(1,.1,1.5)
        yPos = yPos - runif(1,4,20)

        # Are we still inside the plotting area? If so, draw the flake
        if(xPos > 0 && xPos <= width && yPos > 0 && yPos <= height) {
            points(round(xPos), round(yPos), col="white", pch=8)

            # System delay, slows down the flakes
            Sys.sleep(0.1)
        }
    }
}

3
Dec 12

The surprisingly weak case for global warming

I welcome your thoughts on this post, but please read through to the end before commenting. Also, you’ll find the related code (in R) at the end. For those new to this blog, you may be taken aback (though hopefully not bored or shocked!) by how I expose my full process and reasoning. This is intentional and, I strongly believe, much more honest than presenting results without reference to how many different approaches were taken, or how many models were fit, before everything got tidied up into one neat, definitive finding.

Fast summaries

TL;DR (scientific version): Based solely on year-over-year changes in surface temperatures, the net increase since 1881 is fully explainable as a non-independent random walk with no trend.
TL;DR (simple version): Statistician does a test, fails to find evidence of global warming.

Introduction and definitions

As so often happens to terms which have entered the political debate, “global warming” has become infused with additional meanings and implications that go well beyond the literal statement: “the earth is getting warmer.” Anytime someone begins a discussion of global warming (henceforth GW) without a precise definition of what they mean, you should assume their thinking is muddled or their goal is to bamboozle. Here’s my own breakdown of GW into nine related claims:

  1. The earth has been getting warmer.
  2. This warming is part of a long term (secular) trend.
  3. Warming will be extreme enough to radically change the earth’s environment.
  4. The changes will be, on balance, highly negative.
  5. The most significant cause of this change is carbon emissions from human beings.
  6. Human beings have the ability to significantly reverse this trend.
  7. Massive, multilateral cuts to emissions are a realistic possibility.
  8. Such massive cuts are unlikely to cause unintended consequences more severe than the warming itself.
  9. Emissions cuts are better than alternative strategies, including technological fixes (e.g. iron fertilization), or waiting until scientific advances make better technological fixes likely.

Note that not all proponents of GW believe all nine of these assertions.

The data and the test (for GW1)

The only claims I’m going to evaluate are GW1 and GW2. For data, I’m using surface temperature information from NASA. I’m only considering the yearly average temperature, computed by finding the average of four seasons as listed in the data. The first full year of (seasonal) data is 1881, the last year is 2011 (for this data, years begin in December and end in November).

According to NASA’s data, in 1881 the average yearly surface temperature was 13.76°C. Last year the same average was 14.52°C, or 0.76°C higher (standard deviation on the yearly changes is 0.11°C). None of the most recent ten years have been colder than any of the first ten years. Taking the data at face value (i.e. ignoring claims that it hasn’t been properly adjusted for urban heat islands or that it has been manipulated), the evidence for GW1 is indisputable: The earth has been getting warmer.

Usually, though, what people mean by GW is more than just GW1; they mean GW2 as well, since without GW2 none of the other claims are tenable, and the entire discussion might be reduced to a conversation like this:

“I looked up the temperature record this afternoon, and noticed that the earth is now three quarters of a degree warmer than it was in the time of my great great great grandfather.”
“Why, I do believe you are correct, and wasn’t he the one who assassinated James A. Garfield?”
“No, no, no. He’s the one who forced Sitting Bull to surrender in Saskatchewan.”

Testing GW2

Do the data compel us to view GW as part of a trend and not just background noise? To evaluate this claim, I’ll be taking a standard hypothesis testing approach, starting with the null hypothesis that year-over-year (YoY) temperature changes represent an undirected random walk. Under this hypothesis, the YoY changes are modeled as independent draws from a distribution with mean zero. The final temperature represents the sum of 130 of these YoY changes. To obtain my sampling distribution, I’ve calculated the 130 YoY changes in the data, then subtracted the mean from each one. This way, I’m left with a distribution with the same variance as in the original data. YoY jumps in temperature will be just as spread out as before, but with the whole distribution shifted over until its expected value becomes zero. Note that I’m not assuming a theoretical distributional form (e.g. Normality); all of the data I’m working with is empirical.

My test will be to see if, by sampling 130 times (with replacement!) from this distribution of mean zero, we can nonetheless replicate a net change in global temperatures that’s just as extreme as the one in the original data. Specifically, our p-value will be the fraction of times our Monte Carlo simulation yields a temperature change of greater than 0.76°C or less than -0.76°C. Note that mathematically, this is the same test as drawing from the original data, unaltered, then checking how often the sum of changes resulted in a net temperature change of less than 0 or more than 1.52°C.

I have not set a “critical” p-value in advance for rejecting the null hypothesis, as I find this approach to be severely limiting and just as damaging to science as J-Lo is to film. Instead, I’ll comment on the implied strength of the evidence in qualitative terms.

Initial results

The initial results are shown graphically at the beginning of this post (I’ll wait while you scroll back up). As you can see, a large percentage of the samples gave a more extreme temperature change than what was actually observed (shown in red). During the 1000 trials visualized, 56% of the time the results were more extreme than the original data after 130 years worth of changes. I ran the simulation again with millions of trials (turn off plotting if you’re going to try this!); the true p-value for this experiment is approximately 0.55.

For those unfamiliar with how p-values work, this means that, assuming temperature changes are randomly plucked out of a bundle of numbers centered at zero (i.e. no trend exists), we would still see changes at least as dramatic as the observed one 55% of the time. Under even the most generous interpretation of the p-value, we have no reason to reject the null hypothesis. In other words, this test finds zero evidence of a global warming trend.

Testing assumptions Part 1

But wait! We still haven’t tested our assumptions. First, are the YoY changes independent? Here’s a scatterplot showing the change in temperature one year versus the change in temperature the next year:

Looks like there’s a negative correlation. A quick linear regression gives a p-value of 0.00846; it’s highly unlikely that the correlation we see (-0.32) is mere chance. One more test worth running is the ACF, or the Autocorrelation function. Here’s the plot R gives us:

Evidence for a negative correlation between consecutive YoY changes is very strong, and there’s some evidence for a negative correlation between YoY changes which are 2 years apart as well.

Before I explain how to incorporate this information into a revised Monte Carlo simulation, what does a negative correlation mean in this context? It tells us that if the earth’s temperature rises by more than average in one year, it’s likely to fall (or rise less than average) the following year, and vice versa. The bigger the jump one way, the larger the jump the other way next year (note this is not a case of regression to the mean; these are changes in temperature, not absolute temperatures. Update: This interpretation depends on your assumptions. Specifically, if you begin by assuming a trend exists, you could see this as regression to the mean. Note, however, that if you start with noise, then draw a moving average, this will induce regression to the mean along your “trendline”). If anything, this is evidence that the earth has some kind of built in balancing mechanism for global temperature changes, but as a non-climatologist all I can say is that the data are compatible with such a mechanism; I have no idea if this makes sense physically.

Correcting for correlation

What effect will factoring in this negative correlation have on our simulation? My initial guess is that it will cause the total temperature change after 130 years to be much smaller than under the pure random walk model, since changes one year are likely to be balanced out by changes next year in the opposite direction. This would, in turn, suggest that the observed 0.76°C change over the past 130 years is much less likely to happen without a trend.

The most straightforward way to incorporate this correlation into our simulation is to sample YoY changes in 2-year increments. Instead of 130 individual changes, we take 65 changes from our set of centered changes, then for each sample we look at that year’s changes and the year that immediately follows it. Here’s what the plot looks like for 1000 trials.

After doing 100,000 trials with 2-year increments, we get a p-value of 0.48. Not much change, and still far from being significant. Sampling 3 years at a time brings our p-value down to 0.39. Note that as we grab longer and longer consecutive chains at once, the p-value has to approach 0 (asymptotically) because we are more and more likely to end up with the original 130-year sequence of (centered) changes, or a sequence which is very similar. For example, increasing our chain from one YoY change to three reduces the number of samplings from 130^130 to approximately 43^43 – still a huge number, but many orders of magnitude less (Fun problem: calculate exactly how many fewer orders of magnitude. Hint: If it takes you more than a few minutes, you’re doing it wrong).
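
If you want to check your answer to the fun problem, here's a one-liner, assuming the two counts are 130^130 and 43^43 as stated above.

# Compare orders of magnitude (base-10 logs) of 130^130 and 43^43
130 * log10(130) - 43 * log10(43)   # roughly 205 fewer orders of magnitude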

Correcting for correlation Part 2 (A better way?)

To be more certain of the results, I ran the simulation in a second way. First I sampled 130 of the changes at random, then I threw out any samplings where the correlation coefficient was greater than -0.32. This left me with the subset of random samplings whose coefficients were less than -0.32. I then tested these samplings to see the fraction that gave results as extreme as our original data.

Compared to the chained approach above, I consider this to be a more “honest” way to sample an empirical distribution, given the constraint of a (maximum) correlation threshold. I base this on E.T. Jaynes’ demonstration that, in the face of ignorance as to how a particular statistic was generated, the best approach is to maximize the (informational) entropy. The resulting solution is the most likely result you would get if you sampled from the full space (uniformly), then limited your results to those which match your criteria. Intuitively, this approach says: Of all the ways to arrive at a correlation of -0.32 or less, which are the most likely to occur?

For a more thorough discussion of maximum entropy approaches, see Chapter 11 of Jaynes’ book “Probability Theory” or his “Papers on Probability” (1979). Note that this is complicated, mind-blowing stuff (it was for me, anyway). I strongly recommend taking the time to understand it, but don’t bother unless you have at least an intermediate-level understanding of math and probability.

Here’s what the plot looks like subject to the correlation constraint:

If it looks similar to the other plots in terms of results, that’s because it is. Empirical p-value from 1000 trials? 0.55. Because generating samples with the required correlation coefficients took so long, these were the only trials I performed. However, the results after 1000 trials are very similar to those for 100,000 or a million trials, and with a p-value this high there’s no realistic chance of getting a statistically significant result with more trials (though feel free to try for yourself using the R code and your cluster of computers running Hadoop). In sum, the maximum entropy approach, just like the naive random walk simulation and the consecutive-year simulations, gives us no reason to doubt our default explanation of GW2 – that it is the result of random, undirected changes over time.

One more assumption to test

Another assumption in our model is that the YoY changes have constant variance over time (homoscedasticity). Here’s the plot of the (raw, uncentered) YoY changes:

It appears that the variance might be increasing over time, but just looking at the plot isn’t conclusive. To be sure, I took the absolute value of the changes and ran a simple regression on them. The result? Variance is increasing (p-value 0.00267), though at a rate that’s barely perceptible; the estimated absolute increase in magnitude of the YoY changes is 0.046. That figure is in hundredths of a degree Celsius, so our linear model gives a rate of increase in variability of just 4.6 ten-thousandths of a degree per year. Over the course of 130 years, that equates to an increase of six hundredths of a degree Celsius (margin of error of 3.9 hundredths at two standard deviations). This strikes me as a minuscule amount, though relative to the size of the YoY changes themselves it’s non-trivial.

Does this increase in volatility invalidate our simulation? I don’t think so. Any model which took into account this increase in volatility (while still being centered) would be more likely to produce extreme results under the null hypothesis of undirected change. In other words, the bigger the yearly temperature changes, the more likely a random sampling of those changes will lead us far away from our 13.8°C starting point in 1881, with most of the variation coming towards the end. If we look at the data, this is exactly what happens. During the first 63 years of data the temperature increases by 42 hundredths of a degree, then drops 40 hundredths in just 12 years, then rises 80 hundredths within 25 years of that; the temperature roller coaster is becoming more extreme over time, as variability increases.

Beyond falsifiability

Philosopher Karl Popper insisted that for a theory to be scientific, it must be falsifiable. That is, there must exist the possibility of evidence to refute the theory, if the theory is incorrect. But falsifiability, by itself, is too low a bar for a theory to gain acceptance. Popper argued that there were gradations and that “the amount of empirical information conveyed by a theory, or its empirical content, increases with its degree of falsifiability” (emphasis in original).

Put in my words, the easier it is to disprove a theory, the more valuable the theory. (Incorrect) theories are easy to disprove if they give narrow prediction bands, are testable in a reasonable amount of time using current technology and measurement tools, and if they predict something novel or unexpected (given our existing theories).

Perhaps you have already begun to evaluate the GW claims in terms of these criteria. I won’t do a full assay of how the GW theories measure up, but I will note that we’ve had several long periods (10 years or more) with no increase in global temperatures, so any theory of GW3 or GW5 will have to be broad enough to encompass decades of non-warming, which in turn makes the theory much harder to disprove. We are in one of those sideways periods right now. That may be ending, but if it doesn’t, how many more years of non-warming would we need for scientists to abandon the theory?

I should point out that a poor or a weak theory isn’t the same as an incorrect theory. It’s conceivable that the earth is in a long-term warming trend (GW2) and that this warming has a man-made component (GW5), but that this will be a slow process with plenty of backsliding, visible only over hundreds or thousands of years. The problem we face is that GW3 and beyond are extreme claims, often made to bolster support for extreme changes in how we live. Does it make sense to base extreme claims on difficult to falsify theories backed up by evidence as weak as the global temperature data?

Invoking Pascal’s Wager

Many of the arguments in favor of radical changes to how we live go like this: Even if the case for extreme man-made temperature change is weak, the consequences could be catastrophic. Therefore, it’s worth spending a huge amount of money to head off a potential disaster. In this form, the argument reminds me of Pascal’s Wager, named after Blaise Pascal, a 17th century mathematician and co-founder of modern probability theory. Pascal argued that you should “wager” in favor of the existence of God and live life accordingly: If you are right, the outcome is infinitely good, whereas if you are wrong and there is no God, the most you will have lost is a lifetime of pleasure.

Before writing this post, I Googled to see if others had made this same connection. I found many discussions of the similarities, including this excellent article by Jim Manzi at The American Scene. Manzi points out problems with applying Pascal’s Wager, including the difficulty in defining a stopping point for spending resources to prevent the event. If a 20°C increase in temperature is possible, and given that such an increase would be devastating to billions of people, then we should be willing to spend a nearly unlimited amount to avert even a tiny chance of such an increase. The math works like this: Amount we should be willing to spend = probability of 20°C increase (say 0.00001) * harm such an increase would do (a godzilla dollars). The end result is bigger than the GDP of the planet.

Of course, catastrophic GW isn’t the only potential threat that can have Pascal’s Wager applied to it. We also face annihilation from asteroids, nuclear war, and new diseases. Which of these holds the trump card to claim all of our resources? Obviously we need some other approach besides throwing all our money at the problem with the scariest Black Swan potential.

There’s another problem with using Pascal’s Wager style arguments, one I rarely see discussed: proponents fail to consider the possibility that, in radically altering how we live, we might invite some other Black Swan to the table. In his original argument, Pascal the Jansenist (a sub-sect of Christianity) doesn’t take into account the possibility that God is a Muslim and would be more upset by Pascal’s professed Christianity than He would be with someone who led a secular lifestyle. Note that these two probabilities – that God is a Muslim who hates Christians more than atheists, or that God is a Christian who hates atheists – are incommensurable! There’s no rational way to weigh them and pick the safer bet.

What possible Black Swans do we invite by forcing people to live at the same per-capita energy-consumption level as our forefathers in the time of James A. Garfield?

Before moving on, I should make clear that humans should, in general, be very wary of inviting Black Swans to visit. This goes for all experimentation we do at the sub-atomic level, including work done at the LHC (sorry!), and for our attempts to contact aliens (as Stephen Hawking has pointed out, there’s no certainty that the creatures we attract will have our best interests in mind). So, unless we can point to strong, clear, tangible benefits from these activities, they should be stopped immediately.

Beware the anthropic principle

Strictly speaking, the anthropic principle states that no matter how low the odds are that any given planet will house complex organisms, one can’t conclude that the existence of life on our planet is a miracle. Essentially, if we didn’t exist, we wouldn’t be around to “notice” the lack of life. The chance that we should happen to live on a planet with complex organisms is 1, because it has to be.

More broadly, the anthropic principle is related to our tendency to notice extreme results, then assume these extremes must indicate something more than the noise inherent in random variation. For example, if we gathered together 1000 monkeys to predict coin tosses, it’s likely that one of them will predict the first 10 flips correctly. Is this one a genius, a psychic, an uber-monkey? No. We just noticed that one monkey because its record stood out.
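
A quick check of the monkey claim, using my own arithmetic rather than an actual simulation:

# Chance that at least one of 1000 monkeys calls 10 fair coin flips correctly
1 - (1 - 0.5^10)^1000   # about 0.62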

Here’s another, potentially lucrative, most likely illegal, definitely immoral use of the anthropic principle. Send out a million email messages. In half of them, predict that a particular stock will go up the next day, in the other half predict it will go down. The next day, send another round of predictions to just those emails that got the correct prediction the first time. Continue sending predictions to only those recipients who receive the correct guesses. After a dozen days, you’ll have a list of people who’ve seen you make 12 straight correct predictions. Tell these people to buy a stock you want to pump and dump. Chances are good they’ll bite, since from their perspective you look like a stock-picking genius.
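
The arithmetic behind the scheme, with illustrative numbers only:

# Start with a million recipients and halve the list after each daily prediction
recipients = 10^6
days = 12
recipients / 2^days   # about 244 people who have seen 12 straight correct calls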

What does this have to do with GW? It means that we have to disentangle our natural tendency to latch on to apparent patterns from the possibility that this particular pattern is real, and not just an artifact of our bias towards noticing unlikely events under null hypotheses.

Biases, ignorance, and the brief life, death, and afterlife of a pet theory

While the increase in volatility seen in the temperature data complicates our analysis of the data, it gives me hope for a pet theory about climate change which I’d buried last year (where does one bury a pet theory?). The theory (for which I share credit with my wife and several glasses of  wine) is that the true change in our climate should best be described as Distributed Season Shifting, or DSS. In short, DSS states that we are now more likely to have unseasonably warm days during the colder months, and unseasonably cold days during the warmer months. Our seasons are shifting, but in a chaotic, distributed way. We built this theory after noticing a “weirdening” of our weather here in Toronto. Unfortunately (for the theory), no matter how badly I tortured the local temperature data, I couldn’t get it to confess to DSS.

However, maybe I was looking at too small a sample of data. The observed increase in volatility of global YoY changes might also be reflected in higher volatility within the year, but the effects may be so small that no single town’s data is enough to overcome the high level of “normal” volatility within seasonal weather patterns.

My tendency to look for confirmation of DSS in weather data is a bias. Do I have any other biases when it comes to GW? If anything, as the owner of a recreational property located north of our northern city, I have a vested interest in a warmer earth. Both personally (hotter weather = more swimming) and financially, GW2 and 3 would be beneficial. In a Machiavellian sense, this might give me an incentive to downplay GW2 and beyond, with the hope that our failure to act now will make GW3 inevitable. On the other hand, I also have an incentive to increase the perception of GW2, since I will someday be selling my place to a buyer who will base her bid on how many months of summer fun she expects to have in years to come.

Whatever impact my property ownership and failed theory have on this data analysis, I am blissfully free of one biasing factor shared by all working climatologists: the pressure to conform to peer consensus. Don’t underestimate the power of this force! It affects everything from what gets published to who gets tenure. While in the long run scientific evidence wins out, the short run isn’t always so short: For several decades the medical establishment pushed the health benefits of a low fat, high carb diet. Alternative views are only now getting attention, despite hundreds of millions of dollars spent on research which failed to back up the consensus claims.

Is the overall evidence for GW2 – 9 as weak as the evidence used to promote high carb diets? I have no idea. Beyond the global data I’m examining here, and my failed attempt to “discover” DSS in Toronto’s temperature data, I’m coming from a position of nearly complete ignorance: I haven’t read the journal articles, I don’t understand the chemistry, and I’ve never seen Al Gore’s movie.

Final analysis and caveats

Chances are, if you already had strong opinions about the nine faces of GW before reading this article, you won’t have changed your opinion much. In particular, if a deep understanding of the science has convinced you that GW is a long term, man-made trend, you can point out that I haven’t disproven your view. You could also argue the limitations of testing the data using the data, though I find this more defensible than testing the data with a model created to fit the data.

Regardless of your prior thinking, I hope you recognize that my analysis shows that YoY temperature data, by itself, provides no evidence for GW2 and beyond. Also, because of the relatively long periods of non-warming within the context of an overall rise in global temperature, any correct theory of GW must include backsliding within its confidence intervals for predictions, making it a weaker theory.

What did my analysis show for sure? Clearly, temperatures have risen since the 1880s. Also, volatility in temperature changes has increased. That, by itself, has huge implications for our lives, and tempts me to do more research on DSS (what do you call a pet theory that’s risen from the dead?). I’ve also become intrigued with the idea that our climate (at large) has mechanisms to balance out changes in temperature. In terms of GW2 itself, my analysis has not convinced me that it’s all a myth. If we label random variation “noise” and call trend a “signal,” I’ve shown that yearly temperature changes are compatible with an explanation of pure noise. I haven’t shown that no signal exists.

Thanks for reading all the way through! Here’s the code:

Code in R

theData = read.table("/path/to/theData/FromNASA/cleanedForR.txt", header=T) 

# There has to be a more elegant way to do this
theData$means = rowMeans(aggregate(theData[,c("DJF","MAM","JJA","SON")], by=list(theData$Year), FUN="mean")[,2:5])

# Get a single vector of Year over Year changes
rawChanges = diff(theData$means, 1)

# SD on yearly changes
sd(rawChanges)

# Subtract off the mean, so that the distribution now has an expectation of zero
changes = rawChanges - mean(rawChanges)

# Find the total range, 1881 to 2011
(theData$means[131] - theData$means[1])/100

# Year 1 average, year 131 average, difference between them in hundredths
y1a = theData$means[1]/100 + 14
y131a = theData$means[131]/100 + 14
netChange = (y131a - y1a)*100 

# First simulation, with plotting
plot.ts(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3, xlab="Year", ylab="Temperature anomaly in hundredths of a degree Celsius")

trials = 1000
finalResults = rep(0,trials)

for(i in 1:trials) {
	jumps = sample(changes, 130, replace=T)

	# Add lines to plot for this, note the "alpha" term for transparency
	lines(cumsum(c(0,jumps)), col=rgb(0, 0, 1, alpha = .1))

	finalResults[i] = sum(jumps)

}

# Re-plot red line again on top, so it's visible again
lines(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3) 

# Find the fraction of trials that were more extreme than the original data
( length(finalResults[finalResults>netChange]) + length(finalResults[finalResults<(-netChange)]) ) / trials

# Many more simulations, minus plotting
trials = 10^6
finalResults = rep(0,trials)

for(i in 1:trials) {
	jumps = sample(changes, 130, replace=T)
	
	finalResults[i] = sum(jumps)
}

# Find the fraction of trials that were more extreme than the original data
( length(finalResults[finalResults>netChange]) + length(finalResults[finalResults<(-netChange)]) ) / trials

# Looking at the correlation between YoY changes
x = changes[seq(1,129,2)]
y = changes[seq(2,130,2)]
plot(x,y,col="blue", pch=20, xlab="YoY change in year i (hundredths of a degree)", ylab="YoY change in year i+1 (hundredths of a degree)")
summary(lm(x~y))
cor(x,y)
acf(changes)

# Try sampling in 2-year increments
plot.ts(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3, xlab="Year", ylab="Temperature anomaly in hundredths of a degree Celsius")

trials = 1000
finalResults = rep(0,trials)

for(i in 1:trials) {
	indexes = sample(1:129,65,replace=T)
	
	# Interlace consecutive years, to maintain the order of the jumps
	jumps = as.vector(rbind(changes[indexes],changes[(indexes+1)]))
	
	lines(cumsum(c(0,jumps)), col=rgb(0, 0, 1, alpha = .1))
	
	finalResults[i] = sum(jumps)
}

# Re-plot red line again on top, so it's visible again
lines(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3)

# Find the fraction of trials that were more extreme than the original data
( length(finalResults[finalResults>netChange]) + length(finalResults[finalResults<(-netChange)]) ) / trials

# Try sampling in 3-year increments
trials = 100000
finalResults = rep(0,trials)

for(i in 1:trials) {
	indexes = sample(1:128,43,replace=T)
	
	# Interlace consecutive years, to maintain the order of the jumps
	jumps = as.vector(rbind(changes[indexes],changes[(indexes+1)],changes[(indexes+2)]))
	
	# Grab one final YoY change to fill out the 130
	jumps = c(jumps, sample(changes, 1))
	
	finalResults[i] = sum(jumps)
}

# Find the fraction of trials that were more extreme than the original data
( length(finalResults[finalResults>netChange]) + length(finalResults[finalResults<(-netChange)]) ) / trials

# The maxEnt method for conditional sampling
lines(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3)

trials = 1000
finalResults = rep(0,trials)

for(i in 1:trials) {
	theCor = 0
	while(theCor > -.32) {
		jumps = sample(changes, 130, replace=T)
		theCor = cor(jumps[1:129],jumps[2:130])
	}

	# Add lines to plot for this
	lines(cumsum(jumps), col=rgb(0, 0, 1, alpha = .1))

	finalResults[i] = sum(jumps)

}

# Re-plot red line again on top, so it's visible again
lines(cumsum(c(0,rawChanges)), col="red", ylim=c(-300,300), lwd=3) 

( length(finalResults[finalResults>74]) + length(finalResults[finalResults<(-74)]) ) / trials

# Plot of YoY changes over time
plot(rawChanges,pch=20,col="blue", xlab="Year", ylab="YoY change (in hundredths of a degree)")

# Is there a trend?
absRawChanges = abs(rawChanges)
pts = 1:130
summary(lm(absRawChanges~pts))

25
Oct 12

How fat are your tails?

Lately I’ve been thinking about how to measure the fatness of the tails of a distribution. After some searching, I came across the Pareto Tail Index method. This seems to be used mostly in economics. It works by finding the decay rate of the tail. It’s complicated, both in formula and in its R implementation (I couldn’t get “awstindex” to run, which supposedly can be used to calculate it). The Index also has the disadvantage of being a “curve fitting” approach, where you start by assuming a particular distribution, then see which parameter gives the best fit. If this doesn’t seem morally abhorrent to you, perhaps you have a future as a high-paid econometrician.

In the past I’ve looked at how to visualize the impact of the tails on expectation, but what I really wanted was a single number to measure fatness. Poking around the interwebs, I found a more promising approach. The Mean Absolute Deviation (or MAD, not to be confused with the Median Absolute Deviation, also MAD) measures the average absolute distance between a random variable and its mean. Unlike the Standard Deviation (SD), the MAD contains no squared terms, which makes it less sensitive to outliers.

As a result, we can use the MAD/SD ratio as a gauge of fat-tailedness. The closer the number is to zero, the fatter the tails. The closer the number is to 1 (it can never exceed 1!), the thinner the tails. For example, the normal distribution has a MAD/SD ratio of 0.7970, which happens to be the square root of 2 over pi (not a coincidence, try proving this if you rock at solving integrals).
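
If you want to convince yourself of that normal-distribution figure, here's a quick sanity check of my own, separate from the code at the end of the post:

# MAD/SD ratio for a standard normal should land near sqrt(2/pi)
x = rnorm(10^6)
mean(abs(x - mean(x))) / sd(x)
sqrt(2 / pi)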

The graph at the beginning of this post shows a Monte Carlo estimation of the MAD/SD ratio for the Student T distribution as it goes from very high Degrees of Freedom (1024) to very low (1/4). You may know that the T distro converges to the Normal at high degrees of freedom (hence the result of nearly .8 for high DF), but did you know that the T distro on 1 Degree of Freedom is the same as the infamously fat-tailed Cauchy? And why stop at 1? We can keep going into fractional DFs. I’ve plotted the ratio all the way down to 1/4. As always, code in R is at the end of the post.

One more thing: there is at least one continuous distribution for which the MAD/SD ratio reaches its maximum possible value of one. First person to guess this maximally thin-tailed distribution gets a free copy of the comic I worked on.

# Start with a Normal, move to a Cauchy
dfs = 2^(10:-2)
results = c()
for(i in dfs) {
	x = rt(1000000,i)
	results = c(results, mean(abs(x))/sd(x))
}

# Note the wonky x-axis limit and order
plot(rev(-2:10), results, col="blue", pch=20, xlim=rev(range(-2:10)), xlab="Degrees of Freedom (binary log scale)", ylab="MAD/SD ratio")

13
Oct 12

The unicorn problem

Let’s say your goal is to observe all known species in a particular biological category. Once a week you go out and collect specimens to identify, or maybe you just bring your binoculars to do some spotting. How long will it take you to cross off every species on your list?

I’ve been wondering this lately since I’ve begun to hang out on the Mushrooms of Québec flickr group. So far they have over 2200 species included in the photos. At least one of the contributors has written a book on the subject, which got me wondering how long it might take him to gather his own photos and field observations on every single known species.

My crude model (see R code at the end of this post) assumes that every time Yves goes out, he has a fixed chance of encountering every given species. In other words, if there were 1000 species to find, and he averages 50 species per hunt, then every species is assigned a 1/20 chance of being found per trip. Thus the total found on any trip would have a Poisson distribution with parameter 50.

This model is unrealistic for lots of reasons (can you spot all the bad assumptions?), but it does show one of the daunting problems with this task: the closer you get to the end, the harder it becomes to find the last few species. In particular, you can be stuck at 1 for a depressingly long time. Run the simulation with different options and look at the graphs you get. I’m calling this “The unicorn problem,” after Nic Cage’s impossible-to-rob car in the movie Gone in 60 Seconds.

Do you have a real-life unicorn of one sort or another?


species = 1000
findP = 1/20
trials = 100
triesToFindAll = rep(0, trials)



for(i in 1:trials) {
	triesNeeded = 0
	
	leftToFind = rep(1, species)
	leftNow = species
	numberLeft = c()
	
	while (leftNow > 0) {
	
		found = sample( c(0,1), species, replace=TRUE, prob = c((1-findP),findP))
		
		leftToFind = leftToFind - found
		
		leftNow = length(leftToFind[leftToFind > 0])
		
		numberLeft = c(numberLeft, leftNow)
		
		triesNeeded = triesNeeded + 1
	
	}
	
	if(i == 1) {
		plot.ts(numberLeft, xlim=c(0, 2*triesNeeded), col="blue", ylab="Species left to find", xlab="Attempts")
	} else {
		lines(numberLeft, col="blue")
	}
	
	triesToFindAll[i] = triesNeeded
}


4
Jan 12

Iowa: Was the fix in? (a statistical analysis of the results)

Summary/TL;DR
Either the first precincts to report were widely unrepresentative of Iowa as a whole, or something screwy happened.

Background
Yesterday was the first primary for the 2012 U.S. presidential elections. When I logged off the internet last night, the results in Iowa showed a dead heat between Ron Paul, Mitt Romney, and Rick Santorum. When I woke up this morning and checked the results from my phone, they were very different. So before connecting to the internet, I took a screen shot of what I saw before going to bed. Here it is:

Then I connected to the internet and refreshed that page:

It seemed strange to me that the results should change so dramatically after 25% of the votes had already been recorded. As a statistician, my next question was: how unusual is this? That’s a question that can be tested. In particular, I can test how often you might have a split of voters like the one shown in the first screen shot, if the final split is like the one shown in the other screen shot, given that the first precincts to report were similar to later ones in voter composition.

That’s a lot to digest all at once, so I’m going to repeat and clarify exactly what I’m assuming, and what I’m going to test.

The assumptions
First, I assume the following:
1. That CNN was showing the correct partial results as they became available. Similarly, I am assuming that the amount shown with 99% of votes reported (second screen shot) is the true final tally, give or take some insignificant amount.

2. That the precincts to report their vote totals first were a random sampling of the precincts overall. Given how spread out these appear to be in the first screen shot, this seems like a good assumption. But that might not be the case. See the end of this post for more about that possibility.

3. No fraud, manipulation, or other shenanigans occurred in terms of counting up the votes and reporting them.

The test
Given these three assumptions, I’m going to come up with a numeric value for the following:
1. What is the probability that the split, at 25% of the vote tallied, would show Ron Paul, Mitt Romney, and Rick Santorum all above 6,200 votes?

It’s possible to come up with a theoretical value for this probability using a formal statistical test. If you decide to work this out, make sure to take into account the fact that your initial sample size (25%) is large compared to the total population. You’ll also need to factor in all of the candidates. Could get messy.
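
If you do want a rough theoretical answer for Ron Paul alone, here's a back-of-the-envelope sketch of my own (separate from the simulation below), using a normal approximation with the finite population correction:

# Back-of-the-envelope check for Ron Paul alone, under assumption 2 above:
# the split is treated as a simple random sample of 27,009 votes drawn
# without replacement from the 121,972 final votes.
N = 121972   # total votes in the final tally
K = 26219    # Ron Paul's final total
n = 27009    # votes counted at the time of the split

p = K / N
expected = n * p
sdRP = sqrt(n * p * (1 - p) * (N - n) / (N - 1))

expected                   # about 5,800 expected RP votes at the split
(6240 - expected) / sdRP   # the observed 6,240 sits roughly 7 SDs above that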

For my analysis, I used the tool I trust: Monte Carlo simulation. I created a simulated population of 121,972 votes, with 26,219 who favor Ron Paul, 30,015 who favor Mitt Romney, and so on. Then I sampled 27,009 of them (the total votes tallied as of the first screen shot), looked at the simulated split as of that moment, and checked whether the three top finishers were all above 6,200 votes at that point. I also tracked how often Ron Paul alone cleared that mark.

I’ve coded my simulation using the programming language R, you can see my code at the end of this post.

The results
Out of 100,000 simulations, this result came up not even once! In all those trials, Ron Paul never broke 6,067 votes at the time of the split.

I ran this test a couple times, and each time the result was the same.

Conclusion
If my three assumptions are correct, the probability of observing partial results like we saw is extremely small. It’s much more likely that one of the assumptions is wrong. It could be that the early reports were wrong, though that seems unlikely: the other websites showed the same, or very similar, information, so it seems doubtful that an error occurred in passing along the results.

Was there something odd about the precincts that reported early? This is not something you can tell just by looking at the split versus the final data. The data clearly show that the later precincts disfavored Ron Paul, but that’s just what we want to know: did they really disfavor him, or was the data manipulated in some way? The question is, were any of the results faked, tweaked, massaged, Diebold-ed?

To answer that question, we’d need to know if these later precincts to report were expected, beforehand, to disfavor Ron Paul relative to the others. It would also help to look at entrance polling from all of the precincts, and compare the ones that were part of the early reporting versus those that were part of the later reports. At this point, I have to ask for help from you, citizen of the internet. Is this something we can figure out?

UPDATE
In case folks are interested, here’s a histogram of the 100,000 simulations. This shows the distribution of votes for Ron Paul as of the split, given the assumptions. As you can see it’s a nice bell curve, which it should be. Also note how far out on the curve 6,240 would be.

The code
Oh, one final possibility is that I messed up my code. You can check it below and see:

# Code for StatisticsBlog.com by Matt Asher

# Vote amounts
splits = list()
splits["MR"] = 6297
splits["RS"] = 6256
splits["RP"] = 6240
splits["NG"] = 3596
splits["JRP"] = 2833
splits["MB"] = 1608
splits["JH"] = 169
splits["HC"] = 10

finals = list()
finals["MR"] = 30015
finals["RS"] = 30007
finals["RP"] = 26219
finals["NG"] = 16251
finals["JRP"] = 12604
finals["MB"] = 6073
finals["JH"] = 745
finals["HC"] = 58

# Get an array with all voters:
population = c()
for (name in names(finals)) {
    population = c(population, rep(name, finals[[name]]))
}

# This was the initial split
initialSplit = c()
for (name in names(splits)) {
    initialSplit = c(initialSplit, rep(name, splits[[name]]))
}


# How many times to pull a sample
iters = 100000

# Sample size equal to the size at split
sampleSize = length(initialSplit)

successes = 0
justRPsuccesses = 0

# Track how many votes RP gets at the split
rpResults = rep(0, iters)

for(i in 1:iters) {
	ourSample = sample(population, sampleSize, replace=F)
	results = table(ourSample)
	
	rpResults[i] = results[["RP"]];
	
	if(results[["RP"]]>6200) {
		justRPsuccesses = justRPsuccesses + 1
		
		if(results[["MR"]]>6200 & results[["RS"]]>6200) {
			successes = successes + 1
		}
	}
}

cat(paste("Had a total of", successes, "out of", iters, "trials, for a proportion of", successes/iters, "\n"))
cat(paste("RP had a total of", justRPsuccesses, "out of", iters, "trials, for a proportion of", justRPsuccesses/iters, "\n"))

6
Dec 11

My oh my

Noted without comment, visit Biostatistics Ryan Gosling !!! for more gems like the one above.