Month: April 2013

Another comic related to correlation and causation

Tree Lobsters has another statistics comic related to correlation, causation, and the misconception that they are the same thing. This comic really captures the need for greater statistical and scientific literacy and, more broadly, for better scientific communication. It is unreasonable to expect the public to be able to go to the literature to source claims and evaluate their reasonableness - becoming acquainted with the literature is part of what makes scientists, researchers, etc. specialists. Rather, we need to equip our students (all people, really, but we have access to them as students) with the ability to examine reports in the media with a critical lens.

Of course, combating pseudo-scientific thought and media hype is a lot easier said than done. Recently ScienceBlogs had a post using the context of anti-vaccine sentiment ("An open letter to my dad on the occasion of his recent anti-vax Facebook postings") which examines the issue of familiarity with the literature and the need to not seek out reports which confirm what one already believes. It is an engaging, personal read that has some very useful information.

The Overreactington Municipal School Board has voted overwhelmingly to remove all the other thing from its educational facilities.
"#361 This, That & The Other Thing" - Copyright 2008-2012, Tree Lobsters
On the perils of Express Scribe (software to aid transcription)

As part of the introductory qualitative methods course I am taking, each of us must conduct interviews and transcribe them as part of a larger class project. I recorded the interviews using Easy Voice Recorder Free (for Android), and it worked well for what I needed it to do. (Note to self: Put your cell phone on silent before conducting an interview. The recording device buzzing each time a text message is received is both unprofessional and distracting on the recording.)

As I am not (yet?) a qualitative researcher, I tried to complete the transcribing as inexpensively as possible, and free is the best kind of inexpensive. Rather than using specialty qualitative data analysis software (such as NVivo), I opted to transcribe and code in Microsoft Word. Simple, but effective enough for a project of this size. (Of course, there is no reason another program such as LibreOffice Writer could not be used to really get at the "free" goal.) To play back the audio, a colleague recommended Express Scribe, a program by NCH Software.

Express Scribe has a free version which allows one to play the audio and control basic functions (Stop, Rewind, Fast-Forward, Play (regular and slow), etc.) using the function (Fn) keys on one's keyboard in lieu of using a pedal, though it also supports pedals. The function keys work even when Express Scribe isn't in focus, allowing one to control the audio playback without leaving Word. Super convenient, and the entire transcribing process was relatively painless thanks, in large part, to Express Scribe.

But it isn't all roses.

When I downloaded the free version of Express Scribe, I didn't realize that wasn't all I was getting. Apparently, the free version of Express Scribe (and possibly the paid version?) includes 'extras.' Let's explore the situation.

The first thing that I noticed is that Express Zip had associated itself with nearly every type of archive (e.g. compressed files) on my system. Furthermore, it had given itself a context-menu (right-click menu) entry as "Extract with Express Zip". The picture below shows what I'm talking about.

Express Zip appears in both the context menu and as the icon for the archive files.
Express Zip has weaseled its way into my computer. (Note that the file type icon is very similar to the icon for Express Scribe.)

'Okay, so what?' you might be inclined to say. Surely this is benevolence from NCH Software - free software that might make our lives easier. Except, when one double-clicks a file that has been associated with Express Zip or chooses "Extract with Express Zip" from the context menu, this is what appears:

A pop-up window saying that Express Zip is an install-on-demand component.
Express Zip isn't even installed! All that is installed is an advertisement for Express Zip.

All "an install-on-demand component is required for this operation" means in this case is that Express Zip isn't even really installed yet - just an advertisement for Express Zip is installed! I was curious as to what all Express Scribe had done to my computer, and pulled up the Set Associations window. (The easiest way that I've found to get to it in Windows 7 is to search for "Set Associations" in the Control Panel window.)

24 file types associated with Express Scribe in the Set Associations window. 20 are boxed as being unreasonable.
The 24 file types associated with Express Scribe. Of these, the 20 boxed in red are unreasonable associations.

Now, of the file types that Express Scribe has oh-so-graciously associated itself with, I count four types that seem reasonable and twenty that are unreasonable (boxed in red above). In fact, Express Scribe (Zip?) doesn't even know what to do with some file types (e.g. .iso, a file type for disc images) and instead describes them as "Unhandled Extension Handler Finder". Oh, joy.
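For anyone who would rather check this from a script than scroll through the Set Associations window, here is a rough Python sketch. It assumes the system-wide extension-to-handler mapping stored under HKEY_CLASSES_ROOT in the Windows registry is a fair stand-in for what the window shows (per-user choices can be stored elsewhere, so the two may not match exactly). The function names and the search text are my own placeholders, not anything provided by NCH or Windows.

```python
def find_associations(associations, needle):
    """Return the extensions whose handler name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        ext for ext, handler in associations.items()
        if needle in handler.lower()
    )


def read_system_associations():
    """Read extension -> handler name pairs from HKEY_CLASSES_ROOT (Windows only)."""
    import winreg  # imported here so the rest of the sketch loads on any platform

    found = {}
    index = 0
    while True:
        try:
            name = winreg.EnumKey(winreg.HKEY_CLASSES_ROOT, index)
        except OSError:
            break  # no more subkeys to enumerate
        index += 1
        if name.startswith("."):
            try:
                # The unnamed (default) value of ".ext" names its handler
                found[name] = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, name)
            except OSError:
                pass  # extension without a readable default handler
    return found
```

On a Windows machine, something like find_associations(read_system_associations(), "Express") would list the extensions whose handler name mentions "Express". That search text is a guess on my part; the handler names on your system may be spelled differently.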

"Now, Doug," you might be tempted to begin saying, "Surely you assented to installing these 'features' when you installed Express Scribe!" My retort would be a resounding, "Not so!" While the inclusion of "extras" is a burgeoning trend in free software (e.g. Oracle's Java attempting to install the Ask Toolbar if the option is not unchecked), I carefully read each page of an install to make sure that shit like this doesn't happen. Excuse the language. But not really. These shenanigans are infuriating to me. In fact, I went back through the installer to see what actually transpired. Check out the next two images.

The License Agreement which gives only a hint about the "install-on-demand" components.
Every box corresponding to optional software that Express Scribe tries to install is unchecked.
Who has two thumbs and unchecked every single box for optional software to install? This guy.

As shown in the images above, even if all boxes for optional software are unchecked, there are still things installed besides Express Scribe. These "install-on-demand" components are only hinted at in the License Agreement, and one may reasonably assume (as I did) that the components referred to were the ones recommended on the following page. They weren't. Let's see what was actually installed.

NCH Software Suite in the Start Menu program list. I boxed Express Scribe in green because this was what I actually wanted to install.
The "NCH Software Suite" comprises no fewer than seventeen install-on-demand components. Keep in mind that none of these seventeen components are actually installed; rather, these are effectively advertisements for them.

So now we have a clear idea of the problem arising from installing Express Scribe. Even when a user is careful and chooses to not select any optional components for installation, Express Scribe infiltrates the system to associate itself with unrelated files to offer you advertisements using 'components' that you did not choose to install. This is the sort of behavior that malware undertakes and, if it walks like a duck and quacks like a duck...

I cannot recommend Express Scribe or any software created by NCH to colleagues. In fact, I will actively recommend against using it whenever possible. I am not presently aware of a free (open-source or otherwise) alternative solution, but I cannot imagine that one does not exist (or that one would be difficult to create). If you know of one, please leave a comment saying what it is and where to get it.

A dialog box confirming that the uninstall was completed.
A standard uninstall may do the trick for removing Express Scribe and the NCH Software Suite.

On my main computer (running Windows 7 Professional x64), uninstalling Express Scribe through Programs and Features in Control Panel seemed to remove the NCH Software Suite and the Express Zip context menu entry. I didn't have quite the same luck on another computer I use, and, if I can duplicate the problems, I will put up a guide for eliminating all traces of this software in the situation that a regular uninstall isn't sufficient.

A note to all software developers: I control what is installed on my computer, not you. Sneaking extra software onto my computer isn't cute or clever. Rather, this is the behavior of malicious software. If your software does this, as Express Scribe does, it is malicious - no matter how useful such software might be.

Update 2015-03-23: I'm working on an open-source alternative to Express Scribe called TranscribeSharp. An early preview release is available here.


On Elsevier buying Mendeley

[Update: Steve Dennis, a developer for Mendeley, posted a comment explaining a bit more about the data collection and privacy concerns some users have with Mendeley Desktop. It adds some pros and cons about the process outlined here.]

Two days ago (2013-04-08) Elsevier (the academic publishing company that is the subject of some controversy) bought Mendeley (the reference manager and a tool often mentioned when discussing 'open' research). The Mendeley Blog does a quick Q&A covering what changes will take place (they take the position that this will be better for everyone), while the folks over at The Chronicle examine the sale with a slightly more critical lens. Some accounts I follow on Twitter began using the hashtag #mendelete, taking an even more critical stance on the sale. (Someone has even made a guide to exporting data from and then deleting one's Mendeley account - useful, even if just for the exporting data portion.)

While it remains to be seen what changes will, in fact, take place, the simple fact remains that Mendeley is not open source and remains controlled by a company that does not have my (and your) best interests in mind. The most important thing for Elsevier is making money, and, for now, keeping Mendeley operating serves this goal. However, my work is too important to rely on a tool that somebody else controls. (I did a pretty thorough post on my views about this after the discontinuation of Google Reader.)

Now, Mendeley advertises itself as both a reference manager (think iTunes for PDFs) and a social network. This social network aspect has generated a lot of data, and many researchers seem to find it useful. Consequently, Mendeley has integrated their web services and their desktop client so that a single account is required to use both. Yes, an account is required to use the desktop software that would work perfectly well without an online account. Sure, it is enhanced by internet connectivity, but an internet connection is not required to organize my documents.

The login window that appears when Mendeley first launches.
When Mendeley first launches, there is no option to skip logging in.

But, an account is not really required. With a teeny bit of work, the Mendeley desktop software can be configured to work without a Mendeley account. This solution comes from the Mendeley support website, and is used to help people launch Mendeley when there are issues with the software and/or accounts. It is a 'feature' for support, but is certainly not something they advertise. The trick is to add --setting General_FirstRun:false (with two dashes, an underscore, and a colon) as an argument to the program when it launches. I'm doing this on Windows, but as Mendeley is cross-platform, it should ostensibly work on OS X and Linux, too. (Let me know in the comments if it does or doesn't work.)

To add this argument, right-click the shortcut for Mendeley (e.g. the one on your desktop) and select Properties. Then, add it to the end of the box labeled "Target", after the closing quotation mark and separated from it by a space. Check out the image below.

Mendeley Desktop Properties window with the Target field circled
Notice the argument is outside the quotation marks and uses two dashes (-- not -).

After clicking Apply, you may need to grant Administrator approval for the changes to be saved, depending on your UAC settings. Adding this to the launcher skips the initial window that asks for your account information, allowing you to use the offline features of Mendeley in peace. (You can add an account later by choosing "Tools" followed by "Options".)
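If you would rather not edit a shortcut, the same argument can be passed from a small script. This is only a sketch: the install path below is a placeholder I made up (copy the real one from your own shortcut's "Target" box), and the function names are mine. The only part that comes from Mendeley's support site is the --setting General_FirstRun:false argument itself.

```python
import subprocess
from pathlib import Path

# Assumption: placeholder install location; replace it with the path
# shown in the "Target" box of your own Mendeley Desktop shortcut.
MENDELEY_EXE = Path(r"C:\Program Files (x86)\Mendeley Desktop\MendeleyDesktop.exe")


def build_command(exe):
    """Return the argument list: the program path, then the flag and its value."""
    return [str(exe), "--setting", "General_FirstRun:false"]


def launch(exe=MENDELEY_EXE):
    """Start Mendeley Desktop with the first-run login window skipped."""
    # Popen returns immediately instead of waiting for Mendeley to exit
    return subprocess.Popen(build_command(exe))


# launch()  # uncomment once MENDELEY_EXE points at your installation
```

Passing the arguments as a list means there are no quotation marks to worry about, even though the path contains spaces.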

Now, this option works with at least Mendeley Desktop 1.8.4, but there are no guarantees about 1.8.5 retaining this same feature. (Though there are many uses for this ability in a support context and removing it would be silly.) I feel somewhat reassured knowing that I can use Mendeley on my computer whenever, wherever. Moreover, if I ever need to install Mendeley and their servers are unavailable, I can still use it.

I'm not Men-deleting my account. Not yet, at least. I'm still holding out hope that the program will be made open-source, assuaging even more of my concerns. There may still yet be hope according to something I saw from William Gunn, Mendeley's Head of Academic Outreach:

A conversation on Twitter between @TheDougW, @mrgunn, and @MendeleySupport.
Apparently open sourcing Mendeley is still being talked about.

So, open source is still being talked about, and the API is remaining open. These are promising signs. For the meantime, at least, I'm going to check out using Zotero in addition to Mendeley. I've been hearing some good things about Zotero, and it never hurts to have options. As the saying goes, "Two is one, one is none."


Who am I writing for?

I want to write.

I want to tell my story, help others, create knowledge, learn, grow, and everything else that one can do. I want to do it all - and writing is necessary for this. Therefore, I want to write.

The advice I have always been given (or, more accurately, have read) is to just write. Write anything that you want, but just keep writing and do so regularly. I'm not even going to find attributions for this because I think it may even be common knowledge by this point. Another bit of common advice is to know one's audience. (Kurt Vonnegut gives some advice for writing short stories, and many other authors have spoken or written on the subject.)

Well, who exactly is my audience? This website is a blog - my blog - and I write whatever I want. The name at the top isn't a cutesy title derived from a statistics or education term; it is my name. My audience is me. I'm writing to myself because I want to remember this time in my life. I want to remember the joy and pain, the triumphs and defeats of doctoral school (hopefully heavier on the joy/triumph than pain/defeat).

But I'm not solipsistic. I also have a 'real' audience in mind. It is fragmented, but it is still my intended reader. The groups I imagine comprising my audience are:

  • Graduate students (or soon-to-be graduate students) - We will undoubtedly share many similar experiences, and camaraderie (even if virtual) is a Good Thing.
  • Anyone at UF - I sometimes post things that are related to Gainesville and UF, and these might be good resources for anyone involved with UF (undergrads, grads, staff, faculty, etc.).
  • People interested in statistics education - Statistics education is what I'm studying, and I'll be posting things related to it pretty much for as long as this blog is around. This includes both researchers and teachers of statistics. Hopefully this audience will grow over time.
  • People interested in statistics/data - I attended statistics graduate school for two years because I love statistics - I just happen to love the educational aspects of it more. I still love data, graphs, R, analysis, visualization, etc. and will post on these things from time to time.
  • People searching the web for individual examples of statistical things - I get a good number of hits from people searching for misleading graphs in the news, and I'll try to keep posting things that people are looking for. This is an audience that I wasn't intending to have, but will try to be a good steward of.
  • People searching the web for specific computer issues - I know how frustrating it is when hardware or software goes awry, so whenever I have issues (or find what I consider a particularly good solution for a problem/task) I'll post it. The more quality explanations for problems the better.

So that's who I'm writing to. If you are a member of one of the above groups, what would you like to see more of? If you are not, do you still view yourself as my audience? Respond in the comments and we'll get a dialogue going!

A blog post to keep me in the habit of blogging

It's April! Let's go!

While the main text isn't in Comic Sans, it does appear on every page. =[
Reuben's Fall by Sheri Leafgren. 
This weekend I finished reading Sheri Leafgren's Reuben's Fall: A Rhizomatic Analysis of Disobedience in Kindergarten for the introductory qualitative methods course I'm taking this semester. This is certainly not the sort of book I would have picked up without some coaxing, but I'm glad I did. It attempts to illustrate the focus on obedience in the American school system (particularly in kindergarten) while illustrating that disobedient children are not necessarily "bad" and obedient children are not necessarily "good". The book began as a dissertation (for which I heard it won some sort of award), and that is the version I actually read. I found the dissertation online through the Kent State University website and, because the content is nearly identical to the printed book, I used it as an ebook proxy on my tablet. An added bonus is that the dissertation doesn't use Comic Sans anywhere. [Edit: The typeface used on the cover is Chalkboard, a font that ships on Apple computers. It isn't Comic Sans, but is close (for key differences notice the 'F' and the 'u'). I begrudgingly accept that it is an appropriate choice, though I still don't particularly like casual, graphic typefaces. The dissertation is typeset entirely in Times New Roman. This seems like a decent resource for more information about classifying typefaces.] All in all, it was a well-written book that unsettled my understanding of obedience in schools, and I'm glad to have read it.

Other recent developments include the election of new officers for the Statistics Club at UF (I didn't run for any positions and need to update my CV) and my return to active tweeting (@TheDougW). Nothing too exciting. I'm just keeping my head down, working on school and side-projects, trying to not stay still for too long. I've got a few ideas for blog posts with content (mostly about software that I use), so those are in the pipeline.

Also, apparently yesterday was World Backup Day (in addition to being Easter Sunday), so go back up your data if you haven't done so recently! Advice from the website:

"DON'T BE AN APRIL FOOL. Backup your files. Check your restores."

