# Tag: problems

Familiarity with statistical computing software - particularly programs as flexible and feature-filled as R and the packages on CRAN - has been a tremendous boon. However, this familiarity has sent me searching the web for a way to ask for particular output that is not printed by default. This expectation that the output I want from software is available with the right option or command has led me (more than once) to forget the possibility of simply computing the required output manually.

In particular, I recently needed to compute the RMSEA of the null model for confirmatory factor analysis (CFA). A few months ago, I chose to use Mplus for the CFA because I was familiar with it (more so than the lavaan R package, at least) and it had some estimation methods I needed that other software does not always have implemented (e.g. the WLSMV estimator is not available in JMP 10 with SAS PROC CALIS).

Mplus does not print the RMSEA for the null model (or baseline model, in Mplus parlance) in the standard output, nor does there seem to be a command to request it. Fortunately, this is not an insurmountable problem because the formula for RMSEA is straightforward:

$$\mathrm{RMSEA} = \sqrt{\frac{X^2 - df}{df\,(N - 1)}}$$

where $X^2$ is the observed Chi-Square test statistic, df is the associated degrees of freedom, and N is the sample size. In the Mplus output, look for "Chi-Square Test of Model Fit for the Baseline Model" for the $X^2$ and df values.

The reason for needing to check the null RMSEA is that incremental fit indices such as CFI and TLI may not be informative if the null RMSEA is less than 0.158 (Kenny, 2014). If you are using the lavaan package, it appears this can be calculated using the nullRMSEA function in the semTools package.
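For anyone who would rather not do the arithmetic by hand, the manual calculation can be sketched in a few lines of Python. This is a minimal sketch that assumes the commonly cited form of RMSEA, sqrt((X^2 - df) / (df (N - 1))), set to zero when X^2 is smaller than df. The function name `rmsea` and the example values are made up for illustration and do not come from any real Mplus output. Some programs use N rather than N - 1 in the denominator, so the result may differ slightly from what a given package would report.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA from a chi-square statistic, its degrees of freedom, and sample size.

    Assumes the form sqrt((X^2 - df) / (df * (N - 1))), set to 0 when X^2 < df.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical baseline-model values, not taken from any real output.
null_rmsea = rmsea(chi_square=1000.0, df=45, n=300)
print(f"Null RMSEA: {null_rmsea:.3f}")
print("Below the 0.158 threshold:", null_rmsea < 0.158)
```

Plugging in the baseline-model $X^2$ and df from the Mplus output, along with the sample size, gives the null RMSEA to compare against the 0.158 threshold.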

(As an aside, don't let the vintage '90s design fool you: David Kenny's website is a great resource for structural equation modeling. He has the credentials to back up the site, too: Kenny is a Distinguished Professor Emeritus at the University of Connecticut.)

References

Kenny, D. A. (2014). Measuring Model Fit. Retrieved from http://davidakenny.net/cm/fit.htm

Optical Character Recognition: Why?

Graduate school is marked by a tremendous amount of reading. The vast majority of this reading seems to be in the form of journal articles or book chapters which - thankfully - are often available electronically. (If they aren't, I often take the time to scan them myself.) I end up reading most of these on my tablet where I want to highlight text and otherwise annotate them. Sometimes, however, one comes across a PDF whose text cannot be selected - and therefore cannot have its text highlighted. The solution for this is to run optical character recognition (OCR) software on the file. While many modern scanners automatically perform OCR as part of the scanning process, I still come across enough scanned documents without selectable text to warrant this post (see Figure 1).

There is considerable variety among the OCR solutions available. MakeUseOf gives its recommendations for three free OCR solutions, but all of them result in the PDF's text being stored in a separate text document. This is useful if getting access to the raw text is the goal, but it is not sufficient for my purposes: I want the OCR'd text to be stored in the original PDF file in such a way that the text in the original file can be selected and highlighted. There are no doubt commercially available tools to accomplish this task, but I prefer free (and open source) tools whenever possible. Enter WatchOCR.

I need to use Carpenter, Franke, and Levi's Thinking Mathematically: Integrating Arithmetic and Algebra in Elementary School for a course. This textbook includes a supplementary CD with video examples of children displaying the mathematical thinking described in the text, and the authors emphasize that watching these videos is an integral part of reading the book. Unfortunately, the videos are referenced in the text by the section number with which they correspond, but are not labeled thusly on the CD. The CD contains a program for Windows that acts as a wrapper to display the appropriate videos. This program requires Apple's QuickTime software to display the videos within the program. Therefore, if someone does not have both Microsoft Windows and Apple QuickTime installed, there is no clear way to check the correspondence between the video files and sections in the textbook. I obtained access to an appropriate computer and made the following mapping of sections to video files:

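| Section | File (.mov) |
|---------|-------------|
| 2.1 | Kevin |
| 2.2 | David |
| 2.3 | Lillian |
| 3.1 | Emma |
| 3.3 | Kenzie |
| 4.1 | KF111400 |
| 4.2 | Megan |
| 5.1 | Allison16 |
| 5.2 | Cody |
| 7.1 | Allison |
| 7.2 | Susie |
| 8.1 | Mike |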

The .mov files themselves can be played by many media players (not just Quicktime), so with the above table the supplementary CD should work irrespective of the computer software one uses.
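If you would rather look files up programmatically than scan the table, the mapping can be written as a small Python dictionary. This is only a convenience sketch: the names `SECTION_TO_VIDEO` and `video_for_section` are made up for illustration, and it assumes each file on the CD is named exactly as listed with a `.mov` extension.

```python
# Section-to-file mapping copied from the table in this post.
SECTION_TO_VIDEO = {
    "2.1": "Kevin",
    "2.2": "David",
    "2.3": "Lillian",
    "3.1": "Emma",
    "3.3": "Kenzie",
    "4.1": "KF111400",
    "4.2": "Megan",
    "5.1": "Allison16",
    "5.2": "Cody",
    "7.1": "Allison",
    "7.2": "Susie",
    "8.1": "Mike",
}


def video_for_section(section: str) -> str:
    """Return the assumed .mov file name for a textbook section such as '4.1'."""
    try:
        return SECTION_TO_VIDEO[section] + ".mov"
    except KeyError:
        raise KeyError(f"No video listed for section {section}") from None


print(video_for_section("4.1"))
```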

As part of the introductory qualitative methods course I am taking, each of us must conduct interviews and transcribe them as part of a larger class project. I recorded the interviews using Easy Voice Recorder Free (for Android), and it worked well for what I needed it to do. (Note to self: Put your cell phone on silent before conducting an interview. The recording device buzzing each time a text message is received is both unprofessional and distracting on the recording.)

As I am not (yet?) a qualitative researcher, I tried to complete the transcribing as inexpensively as possible, and free is the best kind of inexpensive. Rather than using specialty qualitative data analysis software (such as NVivo), I've opted to go for transcribing and coding in Microsoft Word. Simple, but effective enough for a project of this size. (Of course, there is no reason another program such as LibreOffice Writer could not be used to really get at the "free" goal.) To play back the audio, a colleague recommended Express Scribe, a program by NCH Software.

Express Scribe has a free version which allows one to play the audio and control basic functions (Stop, Rewind, Fast-Forward, Play (regular and slow), etc.) using the function (Fn) keys on one's keyboard in lieu of using a pedal, though it also supports pedals. The function keys work even when Express Scribe isn't in focus, allowing one to control the audio playback without leaving Word. Super convenient, and the entire transcribing process was relatively painless thanks, in large part, to Express Scribe.

But it isn't all roses.

When I downloaded the free version of Express Scribe, I didn't realize that wasn't all I was getting. Apparently, the free version of Express Scribe (and possibly the paid version?) includes 'extras.' Let's explore the situation.

The first thing that I noticed is that Express Zip had associated itself with nearly every type of archive (e.g. compressed files) on my system. Furthermore, it had given itself a context-menu (right-click menu) entry as "Extract with Express Zip". The picture below shows what I'm talking about.

'Okay, so what?' you might be inclined to say. Surely this is benevolence from NCH Software - free software that might make our lives easier. Except, when one double-clicks a file that has been associated with Express Zip or chooses "Extract with Express Zip" from the context menu, this is what appears:

All "an install-on-demand component is required for this operation" means in this case is that Express Zip isn't even really installed yet - just an advertisement for Express Zip is installed! I was curious as to what all Express Scribe had done to my computer, and pulled up the Set Associations window. (The easiest way that I've found to get to it in Windows 7 is to search for "Set Associations" in the Control Panel window.)

Now, of the file types that Express Scribe has oh-so-graciously associated itself with, I count four types that seem reasonable and twenty that are unreasonable (boxed in red above). In fact, Express Scribe (Zip?) doesn't even know what to do with some file types (e.g. .iso, a file type for disc images) and instead describes them as "Unhandled Extension Handler Finder". Oh, joy.

"Now, Doug," you might be tempted to begin saying, "Surely you assented to installing these 'features' when you installed Express Scribe!" My retort would be a resounding, "Not so!" While the inclusion of "extras" is a burgeoning trend in free software (e.g. Oracle's Java attempting to install the Ask Toolbar if the option is not unchecked), I carefully read each page of an install to make sure that shit like this doesn't happen. Excuse the language. But not really. These shenanigans are infuriating to me. In fact, I went back through the installer to see what actually transpired. Check out the next two images.

As shown in the images above, even if all boxes for optional software are unchecked, there are still things installed besides Express Scribe. These "install-on-demand" components are only hinted at in the License Agreement, and one may reasonably assume (as I did) that the components referred to were the ones recommended on the following page. They weren't. Let's see what was actually installed.

The "NCH Software Suite" comprises no fewer than seventeen install-on-demand components. Keep in mind that none of these seventeen components are actually installed; rather, these are effectively advertisements for them.

So now we have a clear idea of the problem arising from installing Express Scribe. Even when a user is careful and chooses to not select any optional components for installation, Express Scribe infiltrates the system to associate itself with unrelated files to offer you advertisements using 'components' that you did not choose to install. This is the sort of behavior that malware undertakes and, if it walks like a duck and quacks like a duck...

I cannot recommend Express Scribe or any software created by NCH to colleagues. In fact, I will actively recommend against using it whenever possible. I am not presently aware of a free (open-source or otherwise) alternative solution, but I cannot imagine that one does not exist (or that one would be hard to create). If you know of one, please leave a comment saying what it is and where to get it.

On my main computer (running Windows 7 Professional x64), uninstalling Express Scribe through Programs and Features in Control Panel seemed to remove the NCH Software Suite and the Express Zip context menu entry. I didn't have quite the same luck on another computer I use, and, if I can duplicate the problems, I will put up a guide for eliminating all traces of this software in the situation that a regular uninstall isn't sufficient.

A note to all software developers: I control what is installed on my computer, not you. Sneaking extra software onto my computer isn't cute or clever. Rather, this is the behavior of malicious software. If your software does this, as Express Scribe does, it is malicious - no matter how useful such software might be.

Update 2015-03-23: I'm working on an open-source alternative to Express Scribe called TranscribeSharp. An early preview release is available here.

[Update: Steve Dennis, a developer for Mendeley, posted a comment explaining a bit more about the data collection and privacy concerns some users have with Mendeley Desktop. It adds some pros and cons about the process outlined here.]

Two days ago (2013-04-08) Elsevier (the academic publishing company that is the subject of some controversy) bought Mendeley (the reference manager and a tool often mentioned when discussing 'open' research). The Mendeley Blog does a quick Q&A covering what changes will take place (they take the position that this will be better for everyone), while the folks over at The Chronicle examine the sale with a slightly more critical lens. Some accounts I follow on Twitter began using the hashtag #mendelete, taking an even more critical stance on the sale. (Someone has even made a guide to exporting data from and then deleting one's Mendeley account - useful, even if just for the exporting data portion.)

While it remains to be seen what changes will, in fact, take place, the simple fact remains that Mendeley is not open source and remains controlled by a company that does not have my (and your) best interests in mind. The most important thing for Elsevier is making money, and, for now, keeping Mendeley operating serves this goal. However, my work is too important to rely on a tool that somebody else controls. (I did a pretty thorough post on my views about this after the discontinuation of Google Reader.)

Now, Mendeley advertises itself as both a reference manager (think iTunes for PDFs) and a social network. This social network aspect has generated a lot of data, and many researchers seem to find it useful. Consequently, Mendeley has integrated their web services and their desktop client so that a single account is required to use both. Yes, an account is required to use the desktop software that would work perfectly well without an online account. Sure, it is enhanced by internet connectivity, but an internet connection is not required to organize my documents.

But, an account is not really required. With a teeny bit of work, the Mendeley desktop software can be configured to work without a Mendeley account. This solution comes from the Mendeley support website, and is used to help people launch Mendeley when there are issues with the software and/or accounts. It is a 'feature' for support, but is certainly not something they advertise. The trick is to add --setting General_FirstRun:false (with two dashes, an underscore, and a colon) as an argument to the program when it launches. I'm doing this on Windows, but as Mendeley is cross-platform, it should ostensibly work on OS X and Linux, too. (Let me know in the comments if it does or doesn't work.)

To add this argument, right-click the shortcut for Mendeley (e.g. the one on your desktop) and select Properties. Then, add it to the end of the box labeled "Target", outside of the quotation marks. Check out the image below.

After clicking Apply, you may need to grant Administrator approval for the changes to be saved, depending on your UAC settings. Adding this to the launcher skips the initial window that asks for your account information, allowing you to use the offline features of Mendeley in peace. (You can add an account later by choosing "Tools" followed by "Options".)
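If editing a shortcut isn't an option (on OS X or Linux, say), the same argument can be passed from a small launcher script. The Python below is a rough sketch, not something from Mendeley's documentation: the install path in `MENDELEY_EXE` is a guess at a typical Windows location and will need adjusting, and the function names are made up for illustration. The only part taken from the support tip is the `--setting General_FirstRun:false` argument itself.

```python
import subprocess
from typing import List

# Assumed install location (hypothetical); change it to wherever
# Mendeley Desktop actually lives on your system.
MENDELEY_EXE = r"C:\Program Files (x86)\Mendeley Desktop\MendeleyDesktop.exe"


def mendeley_command(executable: str = MENDELEY_EXE) -> List[str]:
    """Build the command line that skips the first-run account prompt."""
    # The shell would split "--setting General_FirstRun:false" into two
    # arguments, so they are passed as two separate list items here.
    return [executable, "--setting", "General_FirstRun:false"]


def launch_mendeley(executable: str = MENDELEY_EXE) -> subprocess.Popen:
    """Start Mendeley Desktop with the argument and return without waiting."""
    return subprocess.Popen(mendeley_command(executable))


if __name__ == "__main__":
    print(" ".join(mendeley_command()))
```

Calling `launch_mendeley()` should have the same effect as double-clicking the modified shortcut, assuming the path is right.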

Now, this option works with at least Mendeley Desktop 1.8.4, but there are no guarantees about 1.8.5 retaining this same feature. (Though there are many uses for this ability in a support context and removing it would be silly.) I feel somewhat assuaged knowing that I can use Mendeley on my computer whenever, wherever. Moreover, if I ever need to install Mendeley and their servers are unavailable, I can still use it.

I'm not Men-deleting my account. Not yet, at least. I'm still holding out hope that the program will be made open-source, assuaging even more of my concerns. There may still yet be hope according to something I saw from William Gunn, Mendeley's Head of Academic Outreach:

So, open source is still being talked about, and the API is remaining open. These are promising signs. In the meantime, at least, I'm going to check out using Zotero in addition to Mendeley. I've been hearing some good things about Zotero, and it never hurts to have options. As the saying goes, "Two is one, one is none."