
Tag: correlation

Another comic related to correlation and causation

Tree Lobsters has another statistics comic related to correlation, causation, and the misconception that they are the same thing. This comic really captures the need for greater statistical and scientific literacy and, more broadly, for better scientific communication. It is unreasonable to expect the public to be able to go to the literature to source claims and evaluate their reasonableness - becoming acquainted with the literature is part of what makes scientists, researchers, etc. specialists. Rather, we need to equip our students (all people, really, but we have access to them as students) with the ability to examine reports in the media with a critical lens.

Of course, combating pseudo-scientific thought and media hype is a lot easier said than done. Recently, ScienceBlogs ran a post set in the context of anti-vaccine sentiment ("An open letter to my dad on the occasion of his recent anti-vax Facebook postings") that examines the issue of familiarity with the literature and the need to avoid seeking out only those reports that confirm what one already believes. It is an engaging, personal read with some very useful information.

The Overreactington Municipal School Board has voted overwhelmingly to remove all the other thing from its educational facilities.
"#361 This, That & The Other Thing" - Copyright 2008-2012, Tree Lobsters
keywords: correlation; causation; media;


Statistics in the news (use and abuse)

In early October 2012, the media started running stories about how eating many servings of fruits and vegetables is linked with happiness. The study in question was an observational study of the eating habits of 80,000 Britons and did find that, controlling for other socio-economic variables, high levels of happiness were associated with eating 7-8 servings (2.8 oz each) of fruits and vegetables per day (Blanchflower, Oswald, & Stewart-Brown, 2012). The study made it clear (in both the abstract and text) that because of its observational nature causality could not be determined:

  • "Reverse causality and problems of confounding remain possible."
  • "This implies that, as in some other parts of the well-being literature, we cannot draw firm inferences about causality."
  • "... with caveats about the lack here of clinching causal evidence..."
  • "... it is sensible to emphasize, first, the need for extreme caution in the interpretation of this study’s findings..."

The authors repeatedly made appropriate statements about the interpretability of the study and the potential for future controlled studies to determine causality, which is exactly what one should do. The importance of this work is not diminished by its observational nature: it fills a gap in the well-being literature and suggests areas for future research.
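To see why the authors' caveat about confounding matters, here is a small simulated sketch. Everything in it is made up for illustration and has nothing to do with the study's actual data: z is a hypothetical unmeasured confounder, and veg and happy are toy variables that each depend on z but not on each other.

```r
# Toy simulation (all variables are hypothetical, not the study's data):
# an unmeasured confounder z drives both veg and happy,
# while veg has no causal effect on happy at all.
set.seed(42)
n <- 5000
z     <- rnorm(n)             # hypothetical confounder
veg   <- 0.6 * z + rnorm(n)   # toy "servings" score, depends on z only
happy <- 0.6 * z + rnorm(n)   # toy "well-being" score, depends on z only

# The raw association is clearly positive even though veg does nothing
raw <- cor(veg, happy)

# Adjusting for z shrinks the veg coefficient toward zero
adj <- unname(coef(lm(happy ~ veg + z))["veg"])

print(c(raw.correlation = raw, adjusted.slope = adj))
```

In an observational study the analogue of z may be unknown or unmeasured, so the adjustment in the last step is not always available. That is why the authors stop short of a causal claim.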

But then the media happened.

Of course, some sources got the story right:

References

  • Blanchflower, David G., Oswald, Andrew J., and Stewart-Brown, Sarah (2012). Is Psychological Well-Being Linked to the Consumption of Fruit and Vegetables? Social Indicators Research (in press).

Quadrant Count Ratio in R

The other day I was looking for a package that did the Quadrant Count Ratio (QCR) in R. I couldn't find one, so I whipped up some simple code to do what I needed to do.

qcr <- function(dat){
	n <- nrow(dat)
	x <- dat[,1]; y <- dat[,2]
	m.x <- mean(x); m.y <- mean(y)
	# in QCR we ignore points that are on the mean lines
	# (they count toward n but toward neither quadrant total)
	# number of points in Quadrants 1 and 3
	q13 <- sum(x > m.x & y > m.y) + sum(x < m.x & y < m.y)
	# number of points in Quadrants 2 and 4
	q24 <- sum(x < m.x & y > m.y) + sum(x > m.x & y < m.y)
	return((q13-q24)/n)
	}

The above assumes dat is an Nx2 array with column 1 serving as X and column 2 serving as Y. This can easily be changed. I also wrote a little function to plot the mean lines:

plot.qcr <- function(dat){
	value <- qcr(dat)
	plot(dat, main=paste("QCR =", round(value, 3))) # scatterplot titled with the QCR
	abline(v=mean(dat[,1]),col="blue") # adds a line for x mean
	abline(h=mean(dat[,2]),col="red") # adds a line for y mean
	}

Both of these functions are simple, but I will likely extend and polish them (and then release them as a package). I'd also like to explore what would happen to the QCR if median lines were used instead of mean lines. (This new QCR* would no longer directly motivate Pearson's Product-Moment Correlation, but could have its own set of advantages.) Below is a quick example:

# QCR example
set.seed(1)
dat.x <- c(1:10)
dat.y <- rbinom(10,10,.5)
dat <- cbind(dat.x,dat.y)
qcr(dat)
# [1] 0.6
plot.qcr(dat)

This is the plot:

A plot showing a QCR of 0.6.
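As a rough sketch of the median-line idea mentioned earlier, the function below takes the centering function as an argument, so the same code gives the usual mean-based QCR or a median-based variant. The name qcr_center and the toy data are my own assumptions for illustration. This is one possible way to set it up, not a polished implementation.

```r
# Hypothetical helper (not part of the functions above):
# QCR with a configurable centre, e.g. mean or median.
qcr_center <- function(dat, center = mean) {
  dx <- dat[, 1] - center(dat[, 1])
  dy <- dat[, 2] - center(dat[, 2])
  # +1 in Quadrants 1 and 3, -1 in Quadrants 2 and 4, 0 on a centre line
  s <- sign(dx) * sign(dy)
  (sum(s > 0) - sum(s < 0)) / nrow(dat)
}

# Made-up data: nine points on the line y = x plus one extreme outlier
toy <- cbind(c(1:9, 100), c(1:9, -100))

qcr_center(toy, mean)    # -1: the outlier drags both mean lines
qcr_center(toy, median)  # 0.6: the median lines stay near the bulk of the data
```

With mean lines, the single outlier pulls the centre so far that every point lands in Quadrant 2 or 4. With median lines, eight of the ten points fall in Quadrants 1 and 3. This hints at the kind of advantage a median-based version might have, at the cost of losing the direct connection to Pearson's correlation.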

For more information on the QCR check out this article: Holmes, Peter (2001). “Correlation: From Picture to Formula,” Teaching Statistics, 23(3):67–70.

[Updated on 2013-10-03 to add a link to Wikipedia.]
