
Category: Technology

Posts related to technology, the internet, computing, etc.

Introducing TranscribeSharp

I've previously recounted how I was (am) unsatisfied with Express Scribe for controlling audio playback while transcribing interviews. Because the program's basic functionality is so straightforward (using global hotkeys to control audio playback), I was disappointed that there was no free, open-source alternative to Express Scribe. So I'm making one.

The TranscribeSharp UI
The TranscribeSharp UI is essentially identical to the PracticeSharp UI in this preview release. Transcription-focused changes to the UI are planned.

TranscribeSharp is a program that lets you control the playback of audio files using hotkeys while you transcribe them in another program (e.g. Microsoft Word or LibreOffice Writer). You can slow down, speed up, fast forward, rewind, pause, etc. the audio using your keyboard without ever leaving the program you are transcribing in.
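To make the idea concrete, here is a minimal Python sketch of the state a transcription player has to track and how individual key presses change it. The key names, step sizes, and the `PlaybackState` class are hypothetical illustrations, not TranscribeSharp's actual hotkeys or code (TranscribeSharp itself is written in C#). The sketch also leaves out the global keyboard hook, which is the part that lets the keys work while another program has focus.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    """Hypothetical model of a transcription player's state."""
    position: float = 0.0   # seconds into the file
    speed: float = 1.0      # 1.0 = normal speed
    playing: bool = False
    duration: float = 0.0   # total length of the file in seconds


SEEK_STEP = 5.0    # assumed seconds jumped by rewind/fast forward
SPEED_STEP = 0.1   # assumed change in playback rate per key press
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def handle_hotkey(state: PlaybackState, key: str) -> PlaybackState:
    """Apply one hotkey to the playback state (hypothetical key bindings)."""
    if key == "F9":            # toggle play/pause
        state.playing = not state.playing
    elif key == "F7":          # rewind, clamped at the start of the file
        state.position = max(0.0, state.position - SEEK_STEP)
    elif key == "F8":          # fast forward, clamped at the end of the file
        state.position = min(state.duration, state.position + SEEK_STEP)
    elif key == "F3":          # slow down, but not below MIN_SPEED
        state.speed = max(MIN_SPEED, round(state.speed - SPEED_STEP, 2))
    elif key == "F4":          # speed up, but not above MAX_SPEED
        state.speed = min(MAX_SPEED, round(state.speed + SPEED_STEP, 2))
    # Any other key leaves the state unchanged.
    return state


state = PlaybackState(duration=60.0)
for key in ["F9", "F8", "F8", "F7", "F3"]:
    handle_hotkey(state, key)
print(state)  # PlaybackState(position=5.0, speed=0.9, playing=True, duration=60.0)
```

Clamping the position and speed keeps repeated key presses from pushing the player into invalid states, which matters when you are hammering rewind without looking at the player.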

The Hotkey Settings menu
The Hotkey Settings menu can be used to change between two predefined sets of hotkeys.

TranscribeSharp is in large part based on PracticeSharp by Yuval Naveh, with the addition of the LowLevelHooks library by Curtis Rutland. I do not consider myself a skilled programmer, and TranscribeSharp is in some sense just these two pieces of software smashed together. Without PracticeSharp and LowLevelHooks, TranscribeSharp would not be possible.

Right now, this is just a preview release. There are a few bugs, but it is functional. I wrote it for transcribing interviews for my dissertation, but I figured that other people might be interested in using it as well. Please understand, though, that this is a very low-priority project for me. I do have a list of features that I would like to eventually add (e.g. customizable hotkeys, video playback, an installer, and a different UI), but I have no timeline for implementing them.

TranscribeSharp is written in C# using Visual Studio 2013 and licensed under the LGPL. The full source code is available at BitBucket. If anyone is interested in helping out with bug fixes or implementing new features, just get in contact with me; I am very interested in not working on this project alone. (This was also my first time really using Git, so in future releases I intend to restructure the way I handle the dependencies.)

To use the program, just unzip the file below and run TranscribeSharp.exe. You'll need the .NET Framework (at least version 4) installed to use it. The program should work on Windows 7 and 8 (maybe more). I hope you find this program useful.

Download TranscribeSharp (Preview) version (zip, 2.6 MB) here


PDF Reader for Windows with Highlighting

Previously, I had been annotating PDFs on my Nexus 7 (2012) using ezPDF Reader Pro (by Unidocs) with good results. The most useful feature was the ability to highlight text in different colors. (This allowed me to color code parts of articles, e.g. yellow for general claims, blue for details about the study/article, green for things germane to my work, etc.)

I've started using a Windows 8 tablet (an Asus Transformer T100TA) and wanted a similar program. Many PDF readers for Windows support highlighting, but many of them (such as the free Adobe Reader) only allow yellow highlighting. Changing the color of the highlighting should be trivial, and I don't think this feature is worth the $119 for Adobe Acrobat XI Pro.

Fortunately, this post on Super User led me to Okular. Part of the KDE suite, Okular is popular on Linux, but it also runs on Windows. The installation is a little larger than that of some other programs because KDE provides a platform for many different programs, but one doesn't need to install all of them to use Okular. The reviewing tools are accessed via F6 or the Tools menu, but the default set includes several tools that are useless (for me) and only a yellow highlighter. The aforementioned post, however, describes editing a tools.xml file to customize the offerings.

Okular showing customized reviewing tools and several highlighting colors in a document.

On Windows 8, the tools.xml file is located in C:\ProgramData\KDE\share\apps\okular, along with a folder named pics that holds the icons used. (The folder is probably in a different location on other versions of Windows, but searching for tools.xml should turn up the location.) Below, I've uploaded my tools.xml file along with some quick icons I made in Paint to match the highlighting colors I use; changing the colors/tools shouldn't be difficult if my exact setup doesn't work for you. (The color and other properties of individual annotations can be changed by right-clicking.)
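If you'd rather not hunt for the file in Explorer, a short script can do the search for you. This is a minimal sketch assuming Python 3 is installed; the default search root `C:\ProgramData` is taken from the Windows 8 location above and may be wrong on other systems. Because other programs may ship their own tools.xml, it only reports copies that sit in a folder named okular.

```python
import os
from pathlib import Path
from typing import List


def find_okular_tools_xml(root: str) -> List[Path]:
    """Return every tools.xml under `root` that lives in a folder named okular.

    os.walk silently skips directories it cannot read, so permission
    errors do not stop the search.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Require an "okular" parent folder to avoid other programs' tools.xml.
        if "tools.xml" in filenames and Path(dirpath).name.lower() == "okular":
            matches.append(Path(dirpath) / "tools.xml")
    return sorted(matches)


if __name__ == "__main__":
    # Assumed search root: the Windows 8 location above sits under C:\ProgramData.
    for path in find_okular_tools_xml(r"C:\ProgramData"):
        print(path)
```

Searching the whole drive (e.g. `C:\`) also works but takes considerably longer.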

When annotating PDFs in Okular, all of the changes are saved automatically, but not in the PDF file itself; the original PDF does not seem to be changed by Okular at all. This can be a little confusing: you open a file, see the highlighting, and then email it to a colleague, only for the highlighting to disappear. To save the highlighting in the PDF file itself, choose File -> Save As... The new file will display the highlighting in other PDF readers on other computers.

In short, with a tiny bit of work Okular is a great, free tool for reading and annotating PDFs on Windows 8 and is comfortable to use on a tablet.

Customized tools.xml and icons (70 KB; .zip) - unzip into the above folder to use


Updates and changes to my ThinkPad T420i

While I haven't been posting much recently, I have been doing behind-the-scenes work on this website. The main change is that I moved quite a few images from being hot-linked (where sites were okay with that) to being hosted on this site; I did this after realizing that some web filters were not blocking my website but were blocking many of the images hosted off-site.

I've also made some substantive changes to my laptop: I am no longer dual-booting Scientific Linux and Windows 7, so no more posts about Linux and this computer. Instead, I am running Windows 7 on my laptop and running other operating systems in virtual machines inside of it (using VirtualBox). Most of the time I use a Windows 7 guest (yes, on a Windows 7 host), but I also use WatchOCR and Xubuntu. I virtualize Windows 7 on Windows 7 so that recovering from a catastrophic incident (e.g. the computer being stolen or destroyed) is quicker: just move the Windows 7 VM, with all of my work on it, to another computer and continue working.
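The post doesn't say exactly how the VM gets moved. One way, assuming VirtualBox's `VBoxManage` command-line tool is on the PATH of both machines, is to export the VM to a single OVA appliance file and import that file on the replacement computer. The sketch below only builds and runs those two commands; the VM name and paths you pass in are your own.

```python
import subprocess
from typing import List


def export_cmd(vm_name: str, ova_path: str) -> List[str]:
    """Build a VBoxManage command that exports a VM to a single .ova file."""
    return ["VBoxManage", "export", vm_name, "-o", ova_path]


def import_cmd(ova_path: str) -> List[str]:
    """Build a VBoxManage command that imports an .ova file as a new VM."""
    return ["VBoxManage", "import", ova_path]


def run(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError if VBoxManage reports failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(export_cmd("Work-Win7", r"E:\backup\work-win7.ova"))` on the old machine, then `run(import_cmd(r"E:\backup\work-win7.ova"))` on the new one (the VM name and paths are hypothetical). Shut the VM down before exporting it. Copying the VM's folder and registering it on the new host is an alternative that skips the export step.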

Of course, running 3 VMs simultaneously is tough on a computer, so I upgraded to 16GiB of RAM. Specifically, I used 2x Centon 8GB DDR3-1333 (PC3-10666) 204-pin modules, part number R1333SO8192, pictured below. (RAM can be finicky, so I figure giving too many details about what worked for me is better than giving too few. Also, 8GB is what is on the package even though it should be 8GiB; I just don't want to be called out for inconsistency. </pedantic>)
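The pedantic point in numbers: memory modules come in powers of two, so a stick sold as "8GB" holds 8 GiB, which is a bit more than 8 decimal gigabytes. A quick Python check:

```python
GB = 10**9    # decimal gigabyte, the unit printed on the package
GiB = 2**30   # binary gibibyte, the unit RAM capacity actually comes in

stick_bytes = 8 * GiB
print(stick_bytes)                 # 8589934592
print(round(stick_bytes / GB, 2))  # 8.59
print(2 * stick_bytes // GiB)      # 16
```

So each "8GB" stick is about 8.59 decimal GB, and two of them make exactly 16 GiB.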

I used two of these Centon sticks in my computer, and they seem to work great.

I also replaced the hard drive with a Samsung 840 Pro 256GB SSD (MZ-7PD256BW). It works pretty well, and I use its full-disk encryption, though I don't really notice any benefits over my previous Intel SSD (other than the increased capacity).

Of course, it's not all good news: with all of these changes, my battery life has taken a substantial hit. When the laptop was new, I was comfortably getting 8 hours. Now, with battery capacity at 62% according to the ThinkVantage Toolbox, I'm getting about 2 hours. I'm not often away from a plug, but virtualizing OSes may not be such a great idea if you need long battery life, or simplicity in many other ways. Still, my complex system works well for me.
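A back-of-the-envelope check with the numbers above suggests the worn battery explains only part of the drop. This assumes runtime scales linearly with remaining capacity and that the original 8 hours and the current 2 hours are comparable measurements, which is a simplification:

```python
new_runtime_h = 8.0    # runtime when the laptop was new
capacity_left = 0.62   # remaining capacity per ThinkVantage Toolbox
observed_h = 2.0       # current runtime

expected_h = new_runtime_h * capacity_left  # runtime if power draw were unchanged
draw_ratio = expected_h / observed_h        # implied increase in average power draw

print(f"expected {expected_h:.1f} h, draw about {draw_ratio:.1f}x higher")
# expected 5.0 h, draw about 2.5x higher
```

Under those assumptions, the lost capacity alone would leave about 5 hours, so the rest of the drop plausibly comes from the heavier workload of running several VMs.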

(To help those using search engines, my ThinkPad T420i is model number 4177-CTO.)


Working with PDF files efficiently: WatchOCR

Optical Character Recognition: Why?

Graduate school is marked by a tremendous amount of reading. The vast majority of this reading seems to be in the form of journal articles or book chapters which, thankfully, are often available electronically. (If they aren't, I often take the time to scan them myself.) I end up reading most of these on my tablet, where I want to highlight text and otherwise annotate them. Sometimes, however, one comes across a PDF whose text cannot be selected, and therefore cannot be highlighted. The solution is to run optical character recognition (OCR) software on the file. While many modern scanners automatically perform OCR as part of the scanning process, I still come across enough scanned documents without selectable text to warrant this post (see Figure 1).

An example of selection in a document without OCR.
Figure 1. Come on, Adobe. You know that's not what I wanted.

There is considerable variety among the OCR solutions available. MakeUseOf recommends three free OCR solutions, but all of them store the PDF's text in a separate text document. This is useful if access to the raw text is the goal, but it is not sufficient for my purposes: I want the OCR'd text stored in the original PDF file in such a way that the text in the original file can be selected and highlighted. There are no doubt commercial tools that accomplish this, but I prefer free (and open-source) tools whenever possible. Enter WatchOCR.


On the perils of Express Scribe (software to aid transcription)

As part of the introductory qualitative methods course I am taking, each of us must conduct interviews and transcribe them as part of a larger class project. I recorded the interviews using Easy Voice Recorder Free (for Android), and it worked well for what I needed it to do. (Note to self: Put your cell phone on silent before conducting an interview. The recording device buzzing each time a text message is received is both unprofessional and distracting on the recording.)

As I am not (yet?) a qualitative researcher, I tried to complete the transcribing as inexpensively as possible, and free is the best kind of inexpensive. Rather than using specialty qualitative data analysis software (such as NVivo), I opted to transcribe and code in Microsoft Word. Simple, but effective enough for a project of this size. (Of course, there is no reason another program such as LibreOffice Writer could not be used to really get at the "free" goal.) To play back the audio, a colleague recommended Express Scribe, a program by NCH Software.

Express Scribe has a free version which lets one play audio and control basic functions (Stop, Rewind, Fast-Forward, Play (regular and slow), etc.) using the function keys (F1-F12) on one's keyboard in lieu of a pedal, though it also supports pedals. The function keys work even when Express Scribe isn't in focus, allowing one to control audio playback without leaving Word. Super convenient, and the entire transcribing process was relatively painless thanks, in large part, to Express Scribe.

But it isn't all roses.

When I downloaded the free version of Express Scribe, I didn't realize that wasn't all I was getting. Apparently, the free version of Express Scribe (and possibly the paid version?) includes 'extras.' Let's explore the situation.

The first thing I noticed was that Express Zip had associated itself with nearly every type of archive (i.e. compressed file) on my system. Furthermore, it had added a context-menu (right-click menu) entry, "Extract with Express Zip". The picture below shows what I'm talking about.

Express Zip appears in both the context menu and as the icon for the archive files.
Express Zip has weaseled its way into my computer. (Note that the file type icon is very similar to the icon for Express Scribe.)

'Okay, so what?' you might be inclined to say. Surely this is benevolence from NCH Software - free software that might make our lives easier. Except, when one double-clicks a file that has been associated with Express Zip or chooses "Extract with Express Zip" from the context menu, this is what appears:

A pop-up window saying that Express Zip is an install-on-demand component.
Express Zip isn't even installed! All that is installed is an advertisement for Express Zip.

All "an install-on-demand component is required for this operation" means in this case is that Express Zip isn't really installed yet; only an advertisement for Express Zip is! Curious about what else Express Scribe had done to my computer, I pulled up the Set Associations window. (The easiest way I've found to get to it in Windows 7 is to search for "Set Associations" in the Control Panel window.)

24 file types associated with Express Scribe in the Set Associations window. 20 are boxed as being unreasonable.
The 24 file types associated with Express Scribe. Of these, the 20 boxed in red are unreasonable associations.

Now, of the file types that Express Scribe has oh-so-graciously associated itself with, I count four types that seem reasonable and twenty that are unreasonable (boxed in red above). In fact, Express Scribe (Zip?) doesn't even know what to do with some file types (e.g. .iso, a file type for disc images) and instead describes them as "Unhandled Extension Handler Finder". Oh, joy.
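The Set Associations window is the easy way to see this, but the same information lives in the Windows registry, which makes it possible to audit it from a script. This is a minimal sketch assuming Windows and that an extension's default value under HKEY_CLASSES_ROOT names its handler (per-user overrides are not checked separately). The search string `"express"` is a guess at how the NCH entries are named, not something I've verified; check the names shown in Set Associations first.

```python
import sys
from typing import Dict, List

if sys.platform == "win32":
    import winreg
else:
    winreg = None  # the registry only exists on Windows


def read_associations() -> Dict[str, str]:
    """Map each registered file extension to its handler (ProgID), Windows only.

    Reads the default value of HKEY_CLASSES_ROOT\\.ext for every extension key.
    """
    if winreg is None:
        return {}
    result = {}
    index = 0
    while True:
        try:
            name = winreg.EnumKey(winreg.HKEY_CLASSES_ROOT, index)
        except OSError:  # raised when there are no more subkeys
            break
        index += 1
        if not name.startswith("."):
            continue
        try:
            progid = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, name)
        except OSError:
            continue
        if progid:
            result[name.lower()] = progid
    return result


def flag_associations(assoc: Dict[str, str], needle: str) -> List[str]:
    """Return extensions whose handler name contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(ext for ext, progid in assoc.items() if needle in progid.lower())


if __name__ == "__main__":
    for ext in flag_associations(read_associations(), "express"):
        print(ext)
```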

"Now, Doug," you might be tempted to begin saying, "Surely you assented to installing these 'features' when you installed Express Scribe!" My retort would be a resounding, "Not so!" While the inclusion of "extras" is a burgeoning trend in free software (e.g. Oracle's Java attempting to install the Ask Toolbar if the option is not unchecked), I carefully read each page of an install to make sure that shit like this doesn't happen. Excuse the language. But not really. These shenanigans are infuriating to me. In fact, I went back through the installer to see what actually transpired. Check out the next two images.

The License Agreement which gives only a hint about the "install-on-demand" components.
Every box corresponding to optional software that Express Scribe tries to install is unchecked.
Who has two thumbs and unchecked every single box for optional software to install? This guy.

As shown in the images above, even if all boxes for optional software are unchecked, things besides Express Scribe are still installed. These "install-on-demand" components are only hinted at in the License Agreement, and one may reasonably assume (as I did) that the components referred to were the ones recommended on the following page. They weren't. Let's see what was actually installed.

NCH Software Suite in the Start Menu program list. I boxed Express Scribe in green because this was what I actually wanted to install.

The "NCH Software Suite" comprises no fewer than seventeen install-on-demand components. Keep in mind that none of these seventeen components are actually installed; rather, these are effectively advertisements for them.

So now we have a clear idea of the problem arising from installing Express Scribe. Even when a user is careful and chooses to not select any optional components for installation, Express Scribe infiltrates the system to associate itself with unrelated files to offer you advertisements using 'components' that you did not choose to install. This is the sort of behavior that malware undertakes and, if it walks like a duck and quacks like a duck...

I cannot recommend Express Scribe or any software created by NCH to colleagues. In fact, I will actively recommend against using it whenever possible. I am not presently aware of a free (open-source or otherwise) alternative, but I find it hard to believe that one does not exist (or that one would be difficult to create). If you know of one, please leave a comment saying what it is and where to get it.

A dialog box confirming that the uninstall was completed.
A standard uninstall may do the trick for removing Express Scribe and the NCH Software Suite.

On my main computer (running Windows 7 Professional x64), uninstalling Express Scribe through Programs and Features in Control Panel seemed to remove the NCH Software Suite and the Express Zip context menu entry. I didn't have quite the same luck on another computer I use, and, if I can duplicate the problems, I will put up a guide to eliminating all traces of this software for situations where a regular uninstall isn't sufficient.

A note to all software developers: I control what is installed on my computer, not you. Sneaking extra software onto my computer isn't cute or clever. Rather, this is the behavior of malicious software. If your software does this, as Express Scribe does, it is malicious - no matter how useful such software might be.

Update 2015-03-23: I'm working on an open-source alternative to Express Scribe called TranscribeSharp. An early preview release is available here.