Month: October 2012

Hurricane Sandy and the gambler's fallacy

There is presently no shortage of articles being written about Hurricane (now Post-Tropical Cyclone) Sandy and what role climate change has played in the formation of the "Frankenstorm" that has left over 7.5 million people without power. I came across one article, however, that should be addressed: Hurricane Sandy: The Worst-Case Scenario For New York City Is Unimaginable (by Mike Tidwell, at ThinkProgress.org).

Much of the article is a traditional "worst-case scenario" description found commonly in the media for seemingly all combinations of disasters and major cities. Of note, however, is the following paragraph (emphasis added):

Another major storm struck in 1892, then another in 1938 when the borderline Category 4 “Long Island Express” passed through the outskirts of greater New York, inflicting widespread death and destruction across New York state, New Jersey and much of New England. But that storm, 68 years ago, was the last major hurricane (Category 3 or above) to strike the New York Metropolitan region. It’s now a matter of when, not if, a big hurricane will strike again, according to meteorologists. And history says “when” is very soon.

The bolded statement is a classic example of the gambler's fallacy. It seems as if hurricanes can be reasonably modeled by a Poisson process, a useful probability model (e.g. this website, this paper (pdf), and this paper (pdf)), and this assumption will be used throughout this post.

One of the important properties of this model is memorylessness. Essentially, this means that in any given time period, the probability of an event occurring does not depend on the history up to that point. That is, it doesn't matter how long the system has been running: a 'success' is just as likely to occur in this time period whether it is the first, the 10th, or the 76th.
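As a rough numeric check of memorylessness: in a Poisson process the waiting time between events is exponentially distributed, so the chance of waiting at least some further number of years should not depend on how many quiet years have already passed. The sketch below assumes a made-up rate of one major hurricane per 50 years, chosen purely for illustration and not as an estimate for New York; the names p.wait, ahead, and elapsed are likewise just for this example.

```r
# Hypothetical rate: one major hurricane per 50 years (illustrative only)
rate <- 1 / 50

# P(T > years): probability that the wait for the next event exceeds `years`
p.wait <- function(years) pexp(years, rate = rate, lower.tail = FALSE)

ahead   <- 10             # how far ahead we look, in years
elapsed <- c(0, 10, 76)   # quiet years that have already passed

# P(T > elapsed + ahead | T > elapsed) = P(T > elapsed + ahead) / P(T > elapsed)
conditional   <- p.wait(elapsed + ahead) / p.wait(elapsed)
unconditional <- p.wait(ahead)

print(conditional)    # one value per entry of `elapsed`
print(unconditional)  # equals exp(-rate * ahead)
```

Under this model, every entry of conditional matches unconditional: a 76-year quiet spell leaves the next ten years looking exactly like any other ten years.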

For example, when playing roulette, the probability of spinning a 'black' is the same on every spin, irrespective of how many times you have previously spun the wheel. Even if there have been 999 'red' spins in a row, we are not 'due' for a black spin (assuming the wheel is fair, etc.).
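The roulette claim can be checked with a quick simulation. This is only a sketch under simplifying assumptions: the wheel is treated as a fair red/black coin flip (green pockets are ignored), and a streak of three reds stands in for the 999 above, since long streaks are too rare to simulate directly.

```r
set.seed(2012)
n.spins <- 1e6

# Assumption: fair wheel, red and black equally likely, no green pockets
black <- rbinom(n.spins, 1, 0.5) == 1

# Flag spins that come directly after three reds in a row
idx <- 4:n.spins
after.streak <- !black[idx - 1] & !black[idx - 2] & !black[idx - 3]

# Share of black over all spins vs. only the spins following a red streak
p.overall      <- mean(black)
p.after.streak <- mean(black[idx][after.streak])

print(c(overall = p.overall, after.streak = p.after.streak))
```

Both proportions should land close to one half, which is the point: the streak carries no information about the next spin.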

Just because New York City hasn't had a major hurricane for a while does not mean that it is due or overdue for one. Each hurricane season is a new cycle and does not remember the previous years' hurricanes.

I am not an expert in the fields of research associated with most disasters, but this makes sense in terms of hurricanes (both from various sources and my life as a Floridian). For other disasters (e.g. volcanoes and earthquakes), my understanding is that pressure builds over time and therefore reasonable models for those systems should not have the memorylessness property, so perhaps talking about 'overdue' for a major disaster is warranted. Maybe not, but I do want to emphasize that this is specific to hurricanes.

(It might also be worth checking out these comics related to the gambler's fallacy.)

Satellite image of Hurricane Sandy (25 October 2012)

A Terrible Pie Chart

I saw this graphic reblogged by NPR on Tumblr (originally posted by Luminous Enchiladas, though I can't be sure of the creator), and I must say that it is impressive.

Olympics vs Mars

There are some pretty substantial problems with this impressively bad graphic.

  • Pie charts should only be used when comparing parts to a whole. The $17.5 billion that went to the Olympics and the Curiosity Rover wasn't a priori some whole amount of money. Treating it as "the whole" implies that there was only $17.5 billion from wherever to be spent, and that it was spent only on the Olympics and Mars.
  • The pieces of the pie chart aren't labeled with the dollar amounts. Instead, each piece is labeled with its name, which does address one common complaint about pie charts (namely that the reader needs to continually look back and forth between the chart and the key). Because there are only two pieces, there is room to include the dollar figures in the chart area as well. With more complicated charts, this wouldn't be the case.
  • This chart uses an unnecessary "3D" effect which obscures the true areas being compared. A flat pie chart would be less misleading.

Additionally, there are some general problems with pie charts which make them inferior to other charts (specifically bar charts):

  • Comparing areas is difficult. Cleveland (1985) writes about how area comparisons are subject to bias, and Schmid (1983) specifically describes how, when comparing two circles (e.g. two pie charts of different size used to indicate change over time), the area of the larger circle is underestimated relative to the smaller.
  • Comparing angles is difficult. Cleveland (1985), drawing on earlier empirical research, states that ordering the sections of a pie chart by size is prone to error.
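For comparison, the same two figures could be drawn as a flat bar chart, where the reader compares lengths on a common scale rather than areas or angles. This is a minimal sketch: the individual dollar amounts are an assumed split, chosen only so that they sum to the $17.5 billion total, and the labels are placeholders.

```r
# Assumed split of the $17.5 billion total (illustrative values only)
spend <- c("Olympics" = 15, "Curiosity rover" = 2.5)

# Flat bars on a common scale, with no 3D effect
mids <- barplot(spend,
                ylab = "Cost (billions of US dollars)",
                ylim = c(0, 1.2 * max(spend)),
                col = "grey70")

# Label each bar directly with its dollar amount, so no key is needed
text(mids, spend, labels = paste0("$", spend, "B"), pos = 3)
```

Labeling the bars directly keeps the one thing the original graphic got right (no key to look back at) while also putting the dollar amounts in the chart area.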

References:

  • Cleveland, W. S. (1985). The elements of graphing data. Monterey, Calif: Wadsworth Advanced Books and Software.
  • Schmid, C. F. (1983). Statistical graphics: Design principles and practices. New York: Wiley.

Quadrant Count Ratio in R

The other day I was looking for a package that did the Quadrant Count Ratio (QCR) in R. I couldn't find one, so I whipped up some simple code to do what I needed to do.

qcr <- function(dat){
	n <- nrow(dat)
	m.x <- mean(dat[,1]); m.y <- mean(dat[,2])
	# in QCR we ignore points that are on the mean lines
	# number of points in Quadrants 1 and 3
	q13 <- sum(dat[,1] > m.x & dat[,2] > m.y)+sum(dat[,1] < m.x & dat[,2] < m.y)
	# number of points in Quadrants 2 and 4
	q24 <- sum(dat[,1] < m.x & dat[,2] > m.y)+sum(dat[,1] > m.x & dat[,2] < m.y)
	return((q13-q24)/n)
	}

The above assumes dat is an Nx2 array with column 1 serving as X and column 2 serving as Y. This can easily be changed. I also wrote a little function to plot the mean lines:

plot.qcr <- function(dat){
	value <- qcr(dat)
	plot(dat, main=paste("QCR =", value)) # scatterplot titled with the QCR
	abline(v=mean(dat[,1]),col="blue") # adds a line for x mean
	abline(h=mean(dat[,2]),col="red") # adds a line for y mean
	}

Both of these functions are simple, but I will likely extend and polish them (and then release them as a package). I'd also like to explore what would happen to the QCR if median lines were used instead of mean lines. (This new QCR* would no longer directly motivate Pearson's Product-Moment Correlation, but could have its own set of advantages.) Below is a quick example:

# QCR example
set.seed(1)
dat.x <- c(1:10)
dat.y <- rbinom(10,10,.5)
dat <- cbind(dat.x,dat.y)
qcr(dat)
# [1] 0.6
plot.qcr(dat)

This is the plot:

A plot showing a QCR of 0.6.
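The median-line variant mentioned above could be sketched as follows. This is a hypothetical helper, not part of the code above: the name qcr.median and the toy data are made up for illustration, and it counts quadrants through the sign of the product of the deviations from the medians.

```r
# Hypothetical helper: QCR computed around median lines instead of mean lines
qcr.median <- function(dat){
	n <- nrow(dat)
	d.x <- dat[,1] - median(dat[,1])
	d.y <- dat[,2] - median(dat[,2])
	# points on a median line give a product of 0 and are not counted
	q13 <- sum(d.x * d.y > 0) # Quadrants 1 and 3
	q24 <- sum(d.x * d.y < 0) # Quadrants 2 and 4
	(q13 - q24) / n
	}

# Made-up data: an increasing trend with one extreme y value at the end
toy <- cbind(x = 1:6, y = c(1, 2, 3, 4, 5, -100))
print(qcr.median(toy))
```

Because the median lines are not dragged toward the extreme point, most of the increasing trend still falls in Quadrants 1 and 3, which hints at the kind of robustness this variant might offer.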

For more information on the QCR check out this article: Holmes, Peter (2001). “Correlation: From Picture to Formula,” Teaching Statistics, 23(3):67–70.

[Updated on 2013-10-03 to add a link to Wikipedia.]

Grad school update

This semester I only have class two days per week. I knew going into this semester that I would be busy every day and not just the days I have class (this isn't my first time in grad school after all).

But wow — I've really stretched myself thin this semester. I just tried to list it all... and it didn't seem like much, but underestimating some of the time commitments is what is getting me.

A lot of the smaller projects I've wanted to work on have gotten pushed back, and keeping up with this blog has been harder than I anticipated. I don't quite have time for the Monday/Wednesday/Friday update schedule that I had planned, so I'll be switching to Monday/Thursday to see if I can stick to it.

Also, I took a trip to San Antonio in conjunction with LOCUS. That was fun and served to confirm that I'm on the right career path.

And now, a few quotable things I've heard this semester:

  • "We have the curse of multidimensionality... that'd be a good idea for a Halloween costume." -- Dr. Leite
  • "Better out than perfect." (on manuscripts and journal submissions) Said by Dr. Bondy, but I'm not sure of the original source. (Maybe a Dr. Johnson?)