The internal systems used by research and academia are rarely the subject of public discussion. They are, after all, somewhat tedious and removed from the lives of the vast majority of people. Because of this, when I was listening to NPR on Friday morning and heard the words "peer review" and "open access", I immediately turned up the volume.
NPR was interviewing John Bohannon about a study he conducted in which he submitted a deliberately flawed article to several hundred open access journals. Bohannon wrote about the study for Science as "Who's Afraid of Peer Review?" In the end, a majority of the journals he submitted to accepted the paper despite its fundamental flaws, flaws that were ostensibly obvious to anyone with a modicum of training in the field. NPR ran the story as "Some Online Journals Will Publish Fake Science, For A Fee" and described the study as a "sting". Bohannon is quoted as saying that the sting revealed "the contours of an emerging Wild West in academic publishing."
In early October 2012, the media started running stories about how eating many servings of fruits and vegetables is linked with happiness. The study in question was an observational study of the eating habits of 80,000 Britons, and it did find that, controlling for other socio-economic variables, high levels of happiness were associated with eating 7-8 servings (2.8 oz each) of fruits and vegetables per day (Blanchflower, Oswald, & Stewart-Brown, 2012). The study made it clear (in both the abstract and the text) that, because of its observational nature, causality could not be determined:
"Reverse causality and problems of confounding remain possible."
"This implies that, as in some other parts of the well-being literature, we cannot draw firm inferences about causality."
"... with caveats about the lack here of clinching causal evidence..."
"... it is sensible to emphasize, first, the need for extreme caution in the interpretation of this study’s findings..."
The authors repeatedly made appropriate statements about the interpretability of the study and the potential for future controlled studies to determine causality, which is exactly what one should do. The observational nature of this work does not diminish its importance; it fills a gap in the well-being literature and suggests areas for future research.
There is presently no shortage of articles being written about Hurricane (now Post-Tropical Cyclone) Sandy and what role climate change has played in the formation of the "Frankenstorm" that has left over 7.5 million people without power. I came across one article, however, that should be addressed: Hurricane Sandy: The Worst-Case Scenario For New York City Is Unimaginable (by Mike Tidwell, at ThinkProgress.org).
Much of the article is a traditional "worst-case scenario" description found commonly in the media for seemingly all combinations of disasters and major cities. Of note, however, is the following paragraph (emphasis added):
Another major storm struck in 1892, then another in 1938 when the borderline Category 4 “Long Island Express” passed through the outskirts of greater New York, inflicting widespread death and destruction across New York state, New Jersey and much of New England. But that storm, 68 years ago, was the last major hurricane (Category 3 or above) to strike the New York Metropolitan region. It’s now a matter of when, not if, a big hurricane will strike again, according to meteorologists. And history says “when” is very soon.
One of the important properties associated with this model is called memorylessness. Essentially, this means that at any given time period, the probability of an event occurring does not depend on the history up to that point. It doesn't matter how long a system has been running: a 'success' is just as likely to occur whether this is the first time period, the 10th, or the 76th.
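As a sketch of what memorylessness means formally, here is a small Python check using the geometric distribution (the usual model for "number of periods until the first success"). The per-period probability p below is a made-up illustration, not a number from any study:

```python
# Memorylessness of the geometric distribution: if X is the period of the
# first 'success' with per-period probability p, then
#   P(X > m + n | X > m) == P(X > n)
# i.e., having already waited m periods tells us nothing about the future.

p = 0.05  # hypothetical per-period probability of a 'success'

def tail(n, p):
    """P(X > n): the first n periods are all failures."""
    return (1 - p) ** n

def conditional_tail(m, n, p):
    """P(X > m + n | X > m)."""
    return tail(m + n, p) / tail(m, p)

# Having already waited 10 or 76 periods doesn't change the chance of
# surviving 5 more periods without a 'success':
for m in (0, 10, 76):
    print(m, conditional_tail(m, 5, p))  # same value every time
```

The conditional probability comes out identical regardless of how many periods have already elapsed, which is exactly the "no memory" property.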
For example, when playing roulette, the probability of spinning a 'black' is the same on every spin, irrespective of how many times you have previously spun the wheel. Even if there have been 999 'red' spins in a row, we are not 'due' for a black (assuming the wheel is fair, etc.).
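The roulette point can be checked by simulation. This is a sketch with an idealized 50/50 red/black wheel (ignoring the green zero, which a real wheel has) rather than any real casino data:

```python
import random

# Monte Carlo check of the gambler's fallacy on a simplified 'fair' wheel:
# assumed 50/50 red/black, ignoring the green zero for illustration.
random.seed(0)

def spin():
    return random.choice(["red", "black"])

spins = [spin() for _ in range(200_000)]

# Estimate P(black) overall...
overall = spins.count("black") / len(spins)

# ...and P(black | the previous 3 spins were all red).
after_red_run = [
    spins[i] for i in range(3, len(spins))
    if spins[i - 3:i] == ["red"] * 3
]
conditional = after_red_run.count("black") / len(after_red_run)

print(round(overall, 3), round(conditional, 3))  # both near 0.5
```

Within simulation noise, a run of reds has no effect on the next spin: both estimates sit at about 0.5.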
Just because New York City hasn't had a major hurricane for a while does not mean that it is due or overdue for one. Each hurricane season is a new cycle and does not remember the previous years' hurricanes.
I am not an expert in the research fields associated with most disasters, but memorylessness makes sense for hurricanes (based both on various sources and on my life as a Floridian). For other disasters (e.g. volcanoes and earthquakes), my understanding is that pressure builds over time, so reasonable models for those systems should not have the memorylessness property, and perhaps talking about being 'overdue' for a major disaster is warranted there. Maybe not, but I do want to emphasize that this point is specific to hurricanes.
(It might also be worth checking out these comics related to the gambler's fallacy.)
I've begun searching for news articles that either use statistics incorrectly (misunderstanding what a study means, making bogus claims, etc.), present data exceptionally poorly (misleading graphs), or other offenses against numeracy. I wasn't expecting to find much on Day 1. I was wrong.
The Alligator (UF's 'unofficial' school newspaper) has a daily online poll. Monday's question was "Have you eaten pizza this month?" with the results displayed as percentages and a bar chart. For example:
A few problems:
The responses are given numerically only as percentages, with no indication of sample size (either per category or in total). We are left to guess the sample size from the bar chart.
The bar chart doesn't start at 0 and goes so far as to not display the observations from the "no" respondents at all! Based on the percentages, and guessing that "yes" had 61 respondents (it is hard to tell), "no" should have about 11, which is certainly not what the graph appears to show.
It isn't visible in the above chart, but on different pages of The Alligator the poll results showed slightly different numbers (85%/15% (picture) and 84%/16% (picture)). I could click back and forth between the pages and the numbers didn't change.
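The second point above is just back-of-the-envelope arithmetic. Taking the 85%/15% split shown on one page and my guess of 61 "yes" respondents from the bar chart:

```python
# If 'yes' is 85% of responses and the bar suggests about 61 'yes' votes,
# how many 'no' votes should there be?
yes_pct, no_pct = 0.85, 0.15
yes_count = 61  # guessed from the bar chart

total = yes_count / yes_pct   # implied total responses, about 72
no_count = total * no_pct     # about 11 'no' responses

print(round(total, 1), round(no_count, 1))
```

So the graph should show roughly 11 "no" responses against 61 "yes" responses, which is not what it appears to display.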
Overall, the online poll that The Alligator hosts (apparently powered by amCharts.com) does a woefully inadequate job of displaying the results.
There are tons of examples of statistics being misused or misrepresented by the media and politicians (among others). I'm going to try and collect examples as I find them so that I don't have to search for them at the last minute.
I saw this image floating around the internet (not sure who originally took the screenshot):
This graph is misleading because it exaggerates the increase in the tax rate by not having the Y-axis start at zero. Displaying the graph only from 34% and up makes the tax rate after January 1, 2013 appear to be 5.6 times the current rate. In fact, the actual tax rate after January 1, 2013 is about 1.13 times the current rate if the Bush-era tax cuts are allowed to expire. This is what a more accurate graph would look like:
Disturbingly, when I went to make the above in LibreOffice Calc, the default scale was 32-40% (the same in Microsoft Excel). While there are times when displaying zero is not necessary, not including zero magnifies the relative differences among the categories (Lemon & Tyagi, 2009). Kozak (2011) gives some recommendations about when including and excluding zero is appropriate; essentially, if zero is meaningful in the context of the data, it should be included.
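The 5.6 vs. 1.13 comparison above works out as follows, taking the rates in the chart to be 35% now and 39.6% after the cuts expire (consistent with the 1.13 ratio quoted in the text):

```python
# How much a truncated Y-axis exaggerates a change in the top tax rate.
old_rate, new_rate = 35.0, 39.6  # rates assumed from the chart in question

# The actual relative change:
actual_ratio = new_rate / old_rate  # about 1.13

# With the axis starting at 34%, bar heights are measured from 34,
# so the visual ratio of the two bars is:
axis_floor = 34.0
visual_ratio = (new_rate - axis_floor) / (old_rate - axis_floor)  # 5.6

print(round(actual_ratio, 2), round(visual_ratio, 1))
```

A roughly 13% increase is rendered as a bar 5.6 times taller, purely as an artifact of where the axis starts.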
Kozak, M. (2011). When should zero be included on a scale showing magnitude? Teaching Statistics, 33(2), 53–58. (link to abstract)
Lemon, J. and Tyagi, A. (2009). The fan plot: A technique for displaying relative quantities and differences. Statistical Computing and Graphics Newsletter, 20(1), 8–10. (link to full article) [Note: I don't know if this article is peer-reviewed, but it is a publication of the ASA, so there is some weight to it.]