Tag: PDF

PDF Reader for Windows with Highlighting

Previously, I had been annotating PDFs on my Nexus 7 (2012) using ezPDF Reader Pro (by Unidocs) with good results. The most useful feature was the ability to highlight text in different colors. (This allowed me to color code parts of articles, e.g. yellow for general claims, blue for details about the study/article, green for things germane to my work, etc.)

I've started using a Windows 8 tablet (an Asus Transformer T100TA) and wanted a similar program. Plenty of PDF readers on Windows support highlighting, but many (such as the free Adobe Reader) only allow yellow. Changing the color of the highlighting should be trivial, and I don't think this feature is worth the $119 for Adobe Acrobat XI Pro.

Fortunately, this post on superuser led me to Okular. Part of the KDE suite, Okular is popular on Linux, but it does run on Windows. The installation is a little larger than some other programs because KDE provides a platform for many different programs, but one doesn't need to install all of them to use Okular. The tools for reviewing are accessed via F6 or the Tools menu, but the default set includes several useless (for me) tools and only a yellow highlighter. The aforementioned post, however, describes editing a tools.xml file to customize the offerings.

Okular showing customized reviewing tools and several highlighting colors in a document.

On Windows 8, the tools.xml file is located in C:\ProgramData\KDE\share\apps\okular along with a folder named pics that holds the icons used. (The folder is probably in a different location on other versions of Windows, but searching for tools.xml should turn up the location.) I've uploaded my tools.xml file, along with some quick icons I made in Paint to match the highlighting colors I use, below; changing the colors/tools shouldn't be difficult if my exact setup doesn't work for you. (The color/properties of individual annotations can be changed by right-clicking.)
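If the Windows search box doesn't turn it up, a few lines of Python can walk a folder tree looking for the file. This is only a rough sketch: the function name is my own, and C:\ProgramData as the starting point is an assumption based on where the file sits on my Windows 8 install.

```python
import os

def find_files(root, filename="tools.xml"):
    """Yield the full path of every file named `filename` under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Windows file names are case-insensitive, so compare in lowercase.
            if name.lower() == filename.lower():
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    # Assumed starting point; use a different root on other Windows versions.
    for path in find_files(r"C:\ProgramData"):
        print(path)
```

Any tools.xml it prints that sits in a folder named okular is the one to edit.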

When annotating PDFs in Okular, all of the changes are saved automatically, but not in the PDF file itself. It can be a little confusing to open a file, see the highlighting, and then email it to a colleague only to find that the highlighting has disappeared. Okular does not seem to change the original PDF file that was opened. To save the highlighting in the PDF file itself, choose File -> Save As... The new file will display the highlighting in other PDF readers on other computers.

In short, with a tiny bit of work Okular is a great, free tool for reading and annotating PDFs on Windows 8 and is comfortable to use on a tablet.

Customized tools.xml and icons (70 KB; .zip) - unzip into the above folder to use


Working with PDF files efficiently: WatchOCR

Optical Character Recognition: Why?

Graduate school is marked by a tremendous amount of reading. The vast majority of this reading seems to be in the form of journal articles or book chapters which - thankfully - are often available electronically. (If they aren't, I often take the time to scan them myself.) I end up reading most of these on my tablet, where I want to highlight text and otherwise annotate them. Sometimes, however, one comes across a PDF whose text cannot be selected and therefore cannot be highlighted. The solution is to run optical character recognition (OCR) software on the file. While many modern scanners automatically perform OCR as part of the scanning process, I still come across enough scanned documents without selectable text to warrant this post (see Figure 1).

An example of selection in a document without OCR.
Figure 1. Come on, Adobe. You know that's not what I wanted.

There is considerable variety among the OCR solutions available. MakeUseOf gives its recommendations for three free OCR solutions, but all of them result in the PDF's text being stored in a separate text document. This is useful if getting access to the raw text is the goal, but it is not sufficient for my purposes: I want the OCR'd text to be stored in the original PDF file in such a way that the text in the original file can be selected and highlighted. There are no doubt commercially available tools to accomplish this task, but I prefer free (and open source) tools whenever possible. Enter WatchOCR.


On Elsevier buying Mendeley

[Update: Steve Dennis, a developer for Mendeley, posted a comment explaining a bit more about the data collection and privacy concerns some users have with Mendeley Desktop. It adds some pros and cons about the process outlined here.]

Two days ago (2013-04-08) Elsevier (the academic publishing company that is the subject of some controversy) bought Mendeley (the reference manager and a tool often mentioned when discussing 'open' research). The Mendeley Blog does a quick Q&A covering what changes will take place (they take the position that this will be better for everyone), while the folks over at The Chronicle examine the sale with a slightly more critical lens. Some accounts I follow on Twitter began using the hashtag #mendelete, taking an even more critical stance on the sale. (Someone has even made a guide to exporting data from and then deleting one's Mendeley account - useful, even if just for the exporting data portion.)

While it remains to be seen what changes will, in fact, take place, the simple fact remains that Mendeley is not open source and remains controlled by a company that does not have my (and your) best interests in mind. The most important thing for Elsevier is making money, and, for now, keeping Mendeley operating serves this goal. However, my work is too important to rely on a tool that somebody else controls. (I did a pretty thorough post on my views about this after the discontinuation of Google Reader.)

Now, Mendeley advertises itself as both a reference manager (think iTunes for PDFs) and a social network. This social network aspect has generated a lot of data, and many researchers seem to find it useful. Consequently, Mendeley has integrated their web services and their desktop client so that a single account is required to use both. Yes, an account is required to use the desktop software that would work perfectly well without an online account. Sure, it is enhanced by internet connectivity, but an internet connection is not required to organize my documents.

The login window that appears when Mendeley first launches.
When Mendeley first launches, there is no option to skip logging in.

But an account is not really required. With a teeny bit of work, the Mendeley desktop software can be configured to work without a Mendeley account. This solution comes from the Mendeley support website, and is used to help people launch Mendeley when there are issues with the software and/or accounts. It is a 'feature' for support, but is certainly not something they advertise. The trick is to add --setting General_FirstRun:false (with two dashes, an underscore, and a colon) as an argument to the program when it launches. I'm doing this on Windows, but as Mendeley is cross-platform, it should presumably work on OS X and Linux, too. (Let me know in the comments if it does or doesn't work.)

To add this argument, right-click the shortcut for Mendeley (e.g. the one on your desktop) and select Properties. Then, add it to the end of the box labeled "Target", after a space and outside of the quotation marks. Check out the image below.

Mendeley Desktop Properties window with the Target field circled
Notice the argument is outside the quotation mark and uses two dashes (-- not -).
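For anyone who would rather launch from a script than edit the shortcut, the same thing can be sketched in Python. The install path below is a guess and the function names are mine; only the --setting General_FirstRun:false argument comes from the steps above.

```python
import subprocess

# Hypothetical install location: point this at wherever Mendeley Desktop lives on your machine.
MENDELEY_EXE = r"C:\Program Files (x86)\Mendeley Desktop\MendeleyDesktop.exe"

def build_command(exe_path):
    """Return the argument list that skips the first-run login window."""
    # The flag and its value are passed as two separate arguments,
    # just as they are separated by a space in the shortcut's Target box.
    return [exe_path, "--setting", "General_FirstRun:false"]

def launch(exe_path=MENDELEY_EXE):
    """Start Mendeley Desktop and return without waiting for it to exit."""
    return subprocess.Popen(build_command(exe_path))
```

Calling launch() should have the same effect as double-clicking the edited shortcut, assuming the path is right.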

After clicking Apply, you may need to grant Administrator approval for the changes to be saved, depending on your UAC settings. Adding this to the launcher skips the initial window that asks for your account information, allowing you to use the offline features of Mendeley in peace. (You can add an account later by choosing "Tools" followed by "Options".)

Now, this option works with at least Mendeley Desktop 1.8.4, but there are no guarantees about 1.8.5 retaining this same feature. (Though there are many uses for this ability in a support context and removing it would be silly.) I feel somewhat assuaged knowing that I can use Mendeley on my computer whenever, wherever. Moreover, if I ever need to install Mendeley and their servers are unavailable, I can still use it.

I'm not Men-deleting my account. Not yet, at least. I'm still holding out hope that the program will be made open source, which would assuage even more of my concerns. There may yet be hope, according to something I saw from William Gunn, Mendeley's Head of Academic Outreach:

A conversation on Twitter between @TheDougW, @mrgunn, and @MendeleySupport.
Apparently open sourcing Mendeley is still being talked about.

So, open source is still being talked about, and the API is remaining open. These are promising signs. For the meantime, at least, I'm going to check out using Zotero in addition to Mendeley. I've been hearing some good things about Zotero, and it never hurts to have options. As the saying goes, "Two is one, one is none."


On a return to blogging after a hiatus

With the winter holiday I returned to my lazy, non-blogging habits. A New Year's resolution did little to change the situation. I suppose one just jumps in, though. I'll try to keep up with things more this semester. Really.

Plans for this semester

I'm currently taking a seminar on statistics education and an introductory course on qualitative methods. While the former is clearly my area of interest, the latter is proving to be more enjoyable than I had anticipated. One of the books for the course is Crotty's The Foundations of Social Research: Meaning and Perspective in the Research Process, which is a bit more abstract than I was expecting, focusing on epistemologies and theoretical perspectives. It is a refreshing change, and I'm currently working my way through Feyerabend's Against Method after having my views on post-positivism challenged. (They seemed to be most aligned with Popper before this academic year.) Other plans include a trip to San Diego for LOCUS-related things and In-N-Out Burger, insha'Allah.

Dealing with Protected/Secured PDFs

Occasionally I'll come across a PDF that is Protected/Secured (it says 'SECURED' in the title bar of Adobe Reader), and these are rather annoying to deal with. I've been using Mendeley to organize the articles/books I've read, and I copy the abstract into the software so that it can be searched. Alas, one journal whose articles I often read secures every single PDF so that copying cannot be done. Really frustrating.

Thankfully, this "secured" state is not encrypted or password protected. From what I gather, the state is determined by setting a bit in the file to disable certain features, and Adobe, upon finding this information, respects the file's instructions. Not all software respects those instructions, and the programs that don't will allow copying without issue. Two such readers are Evince (part of GNOME) and Okular (part of KDE). Both are open source, and both at least have an option to ignore the DRM restrictions on the files. Both are also available on Windows as well as many other platforms (and are exceedingly common on Linux); if you're just looking for a quick download on Windows, Evince might be better. Either way, problem solved.
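To make the "setting a bit" idea a little more concrete: as I understand the PDF specification, a file can carry a permissions integer whose individual bits switch actions such as printing and copying on or off, and a well-behaved reader checks the relevant bit before allowing the action. Here is a minimal Python sketch of that check. The function name and the sample value are made up for illustration, and this is not how any particular reader implements it.

```python
# Permission flags, using the bit positions from the PDF specification
# (the spec numbers bits from 1, so "bit 5" is 1 << 4).
PRINT = 1 << 2     # bit 3: print the document
MODIFY = 1 << 3    # bit 4: modify the contents
COPY = 1 << 4      # bit 5: copy or extract text and graphics
ANNOTATE = 1 << 5  # bit 6: add or modify annotations

def is_allowed(permissions, flag):
    """Return True if `flag` is set in the signed 32-bit permissions value."""
    # Mask to 32 bits so negative values are read as their bit pattern.
    return bool((permissions & 0xFFFFFFFF) & flag)

# Hypothetical value: every bit set except the copy bit.
no_copy = -1 & ~COPY

print(is_allowed(no_copy, COPY))   # False
print(is_allowed(no_copy, PRINT))  # True
```

A reader that ignores the restrictions simply never makes this check, which is why copying works there.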
