Chances are, if you’re an academic in the 21st century, you read a lot of PDFs, whether they are articles downloaded from journals’ websites, works in progress from colleagues, or student papers. To avoid the costs of printing, and for your own convenience, you might be interested in reading them on a portable device, such as an e-book reader like the Sony Reader Digital Book, Barnes & Noble Nook or Amazon Kindle. Although larger models exist, the reasonably priced ones have screens the size of small paperbacks, roughly 9 × 12cm. (Of course, you might even want to use a smartphone with an even smaller screen!) This can pose a problem when it comes to PDFs designed to be printed on letter (8.5 × 11in) or even A4-sized paper. The device will typically shrink them to fit, resulting in an unreadably small font size.
But there are a number of things you can try to make those PDFs more readable. Here I focus my comments on free methods, rather than on those that rely on expensive commercial tools like Adobe Acrobat Professional (which, to be honest, doesn’t work that well for this anyway). Here’s a rundown of my experiences with a variety of methods:
- Working with the source document, if available
- Simple whitespace cropping
- Special cropping tools like SoPDF and BRISS
- “Reflowing” and/or converting to another format
Working with the source document, if available
This is by far the best method. The catch is obvious: you actually have to have access to the source document from which the PDF was created. This is great for your own works, and perhaps those of students or generous colleagues, but often it is not even a possibility. For a traditional word processor file, it usually requires changing the output “paper” size to the size of your portable, and minimizing the margins. Sony has put together excellent and detailed instructions for doing this with OpenOffice and/or Microsoft Word, which would mostly carry over to other portables.
Readers of this blog may be interested in doing it with LaTeX source files. I’ve had luck adding the following (or a slight modification) to the preamble:
\usepackage[papersize={90mm,120mm},margin=2mm]{geometry}
\usepackage[kerning=true]{microtype}
\usepackage[T1]{fontenc}
\usepackage[charter]{mathdesign}
\usepackage[normalmargins]{savetrees}
\sloppy
\pagestyle{empty}
The geometry package sets the page size and very minimal margins. You’ll want to change the first option to match your device’s screen, and adjust the second according to preference. The microtype package improves typography in cramped quarters, and allows certain punctuation to extend slightly into the margins in a way that is visually unnoticeable. The savetrees package adjusts the size and spacing of titles and lists, and generally squeezes more onto a page. (The normalmargins option is needed to avoid a clash with our geometry settings.) The \sloppy command makes LaTeX more tolerant of loose, underfull lines; without it, LaTeX will often prefer overfull lines, and you’ll get text running off the right side of the screen. The \pagestyle{empty} command suppresses headers, footers and page numbers. (Not that there’s room for them anyway.)
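If you only know your device’s advertised screen diagonal, the papersize values can be estimated from it. Here is a minimal sketch, assuming a 3:4 (width:height) screen, which you should check against your device’s specifications. The function name papersize_mm and its defaults are my own illustration, not part of the geometry package. It rounds down so that the page never ends up larger than the screen.

```shell
#!/bin/sh
# papersize_mm DIAGONAL_INCHES [ASPECT_W ASPECT_H]
# Prints a geometry "papersize" option in whole millimetres for a screen
# with the given diagonal. The 3:4 default aspect ratio is an assumption.
papersize_mm() {
    awk -v d="$1" -v aw="${2:-3}" -v ah="${3:-4}" 'BEGIN {
        diag = sqrt(aw * aw + ah * ah)   # diagonal measured in aspect units
        w = d * aw / diag * 25.4         # width in mm (25.4 mm per inch)
        h = d * ah / diag * 25.4         # height in mm
        printf "papersize={%dmm,%dmm}\n", int(w), int(h)   # round down
    }'
}

papersize_mm 6    # prints papersize={91mm,121mm}
```

For a six inch screen this lands very close to the 90mm × 120mm used in the preamble above; shaving off another millimetre or two does no harm.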
I like to use T1 font encoding with the Bitstream Charter font provided by the mathdesign package. I find this font to be very legible when read from a screen, but still professional looking. This is of course a matter of taste. Substitute your favorite font instead. Personally, I think the default Computer Modern LaTeX fonts are better suited for paper than screen reading. If you choose to use XeLaTeX for greater font selection, go ahead, but you’ll need to remove the microtype package, which only works with regular latex/pdflatex.
The result should be something quite nice and comfortable to read on a portable device. If you visit discussion communities dedicated to e-book readers like MobileRead or TeleRead, you’ll read a lot of bashing of the PDF format. In my opinion, PDF is both the best and the worst document format. It’s the best when the PDF was designed for the medium it’s viewed on. It’s the worst when it was not.
Simple whitespace cropping
One of the simplest changes you can make to a PDF, without changing any of the content or formatting, is to remove the margins so that the content of the pages takes up the whole screen. There is a lot of software, both free and commercial, that can be used to crop a PDF, including pdflatex itself (though it’s not the most efficient tool for the job). Unfortunately, it’s not always as straightforward as it may seem. The so-called “crop” feature of Adobe Acrobat Professional, for example, doesn’t actually crop PDFs, but instead inserts an instruction telling the viewing software to ignore the cropped areas, which often isn’t respected by portable devices.
Nonetheless, you probably already have free/open source software installed that can do the trick. If you have LaTeX installed, you almost certainly have ghostscript installed. If you own a digital e-book reader, there’s a good chance you have calibre, the free/open source ebook management software, installed. (And if not, you should, even if you don’t like it as a library manager.) The two of them together can be used to analyze and auto-crop a PDF with a single command, or even batch-crop an entire folder’s worth.
On linux/Unix (including, I think, Mac OS X, though I haven’t tested this), from the command line, you can type in:
for i in *.pdf ; do gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=bbox "$i" 2> bounding ; pdfmanipulate crop -o "${i%.pdf}-cropped.pdf" -b bounding "$i"; done
This will create a cropped version of every PDF in the active directory.
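If a crop comes out wrong, it can help to look at what ghostscript actually measured. The bbox device writes one %%BoundingBox line (plus a %%HiResBoundingBox line) per page, which is what ends up in the bounding file. The following is a minimal sketch of a helper that summarizes such a file; the name bbox_union is my own, and skipping zero-area boxes rests on the assumption that blank pages are reported that way.

```shell
#!/bin/sh
# bbox_union FILE
# Reads ghostscript bbox-device output and prints the smallest box that
# contains every page's content, as "left bottom right top" in
# PostScript points (1/72 inch).
bbox_union() {
    awk '/^%%BoundingBox:/ {
        x0 = $2 + 0; y0 = $3 + 0; x1 = $4 + 0; y1 = $5 + 0
        # Assumption: a page with no marks is reported as an empty box.
        if (x1 <= x0 || y1 <= y0) next
        if (n == 0 || x0 < l) l = x0
        if (n == 0 || y0 < b) b = y0
        if (n == 0 || x1 > r) r = x1
        if (n == 0 || y1 > t) t = y1
        n++
    }
    END { if (n > 0) print l, b, r, t }' "$1"
}

# Example with a hypothetical file name:
#   gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=bbox paper.pdf 2> bounding
#   bbox_union bounding
```

A box that is nearly as large as the full page usually means something (a scan border, a watermark, a stray header) is stretching the measured content area, and a simple crop will not gain you much.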
On Windows, you can do the same by creating a batch file. Using notepad or other text editor, create a file named pdfbatchcrop.bat (or anything else you’ll remember ending with .bat), and copy and paste the following into it:
for %%I in (*.pdf) do (
"C:\Program Files\gs\gs8.71\bin\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bbox "%%I" 2> bounding
"C:\Program Files\Calibre2\pdfmanipulate.exe" crop -o "%%~nI-cropped.pdf" -b bounding "%%I"
)
(Double check that the paths I have given to your calibre and ghostscript executables are right. They may not be if you’re using an older version of ghostscript, or a 64-bit version of Windows.) Save it in the same directory as the PDFs you want to crop, and then double click on the .bat file in “My Computer” or “Windows Explorer”, and it should crop away.
If a simple crop isn’t “cutting it” for you, whether it is because the header and footer are not removed, or the PDFs have multiple columns which need separating, read on for other suggestions.
Special cropping tools like SoPDF and BRISS
If you need a “smarter” crop, or if you’re one of those people to whom command line programs are anathema, there are some free tools out there that are especially helpful in cropping PDFs, or even dividing PDF pages into e-reader-screen-sized “chunks”. Here I focus on two of my favorites.
SoPDF
SoPDF is a command line program, written in C/C++, available for both Windows and Linux. (The source code is available, if anyone wants to try compiling it for Mac.) For those who hate the command line, there are also multiple GUI wrappers for it, one of which, available here, in full disclosure, was written by yours truly back when I still used Windows. (Another, in some ways better, cross-platform GUI can be found here.)
It not only auto-crops a PDF, but also (at least on default settings) divides each page in half and rotates it 90 degrees. This is meant to allow you to comfortably read a PDF, half a page at a time, by holding your reader sideways to take advantage of the high height/width proportion of these devices.
Despite the GUIs available, on linux, I often find myself using the command line version for batch processing. E.g., to do a directory of PDFs at once:
for file in *.pdf ; do sopdf -i "$file" ; done
SoPDF is lightning fast. Compared to the ghostscript/calibre method mentioned above, you won’t believe how fast this thing works. With its quick and effective method, it’s by far the most-often used tool in my toolbox for making PDFs readable on my Sony Reader.
The downsides are that the original developer has stopped working on it, so it’s stagnated a bit, and doesn’t work if you need to customize the crop amount (for multiple columns, or to remove large headers and footers). In these cases, I turn to …
BRISS
A new and very promising open source tool on the scene is BRISS. The developer likes to call it “a little snipping tool to make the most of your six inch.” (A reference to the six inch screen of these portable devices.) Yes, I apologize for repeating that.
This is a cross-platform Java tool with a nice GUI. It shows you a sort of fuzzy blend of all the pages at once, so that you can manually select crop regions for dividing a page into one, two or more chunks. It also does intelligent handling of documents with different margins on odd and even pages, so that you can treat them differently.
In fact, I can imagine this tool being useful in just about any context in which you might want to crop a PDF, not just for the purpose of reading PDFs on portables. One context that comes to mind is the possible need for extracting a single image or diagram from a PDF without converting to a rasterized format.
BRISS is a new project and under rapid development. I expect new features, such as page rotation, in the future.
There are some older tools with similar functions. The big downside with these is that they rasterize the PDF, i.e., convert it to a series of pixelized images, before doing other processing. This increases file size quite a lot, and also makes it impossible to use the search or dictionary features of your reader, if it has them. Nevertheless, especially if your PDF starts off as a series of images (such as a scan), these might be worth looking at:
- PDFRead: PDF reading solution available for Mac and Windows.
- PaperCrop: Specialized tool for cropping and dividing multi-column PDFs (Windows only?)
- PDFLRF: Very nice page division and even recombination algorithm, but outputs a rather outdated format, and is no longer being updated
“Reflowing” and/or converting to another format
Some portable readers advertise a feature to “reflow” a PDF, or take the words and sentences and rearrange them to match their display, and your font size preference. In my experience, at least with my (admittedly a few years old) Sony Reader PRS-505, this does not work well at all. Often the result is an unreadable mess, and parts of the document (especially vector graphics) are simply lost.
This isn’t all that surprising. The PDF format wasn’t designed to allow the end user to customize the look of one. Instead, it was designed precisely to preserve a uniform look across the board. In other words, it was designed purely as an output format. A PDF basically stores the exact location of each character or image and nothing else. It doesn’t even store information about what characters go together to make up a single word, much less where paragraphs begin and end, what constitutes a header or footnote as opposed to the body text. It requires a fairly high level of artificial intelligence for a computer to reconstruct such information, which of course is necessary to convert a PDF to a reflowable format.
It’s probably not worth even trying to convert a PDF to another format if it contains complex formatting such as diagrams, tables, logical and mathematical formulas or characters from non-Latin alphabets. But if it’s more or less straight Latin alphabet text arranged in normal paragraphs, it might be worth giving it a shot, especially if you can live with less-than-perfect results.
Again, I’m limiting my discussion to free tools. I believe all of the following rely on the open source poppler PDF libraries for their basic textual analysis, and hence, yield fairly similar results.
AbiWord
AbiWord is actually an open source cross-platform word processor. However, especially if you install the additional import and export plugins, it is very handy for converting between formats. (It can even be used from the command line to convert between any formats it knows, which by itself makes it an invaluable tool. It even has a LaTeX export plugin, so you can use it to convert, e.g., MS Word docs and similar to LaTeX, but that’s a topic for another occasion …)
AbiWord will open a PDF and attempt to convert it into a word processor file that can be edited as normal. The results are usually far from perfect, but this is still very handy in a variety of circumstances. You can then save the result as RTF or HTML or ODT. Some readers support such formats directly, but if yours doesn’t, these formats are much easier to convert to native e-reader formats like ePub or mobi.
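The command-line conversion mentioned above can be scripted for a whole folder. This is a minimal sketch, assuming AbiWord is on your path with its PDF import plugin installed; the function name pdfs_to and the ABIWORD override variable are my own inventions for illustration. It relies only on AbiWord’s --to option, which writes the converted file next to the original.

```shell
#!/bin/sh
# pdfs_to FORMAT [DIR]
# Converts every PDF in DIR (default: current directory) to FORMAT
# (e.g. html, rtf, odt) with AbiWord's command-line converter.
# Set ABIWORD to use a different executable (hypothetical convenience).
pdfs_to() {
    fmt=$1
    dir=${2:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue      # no PDFs found: the glob stays literal
        "${ABIWORD:-abiword}" --to="$fmt" "$f"
    done
}

# Example: pdfs_to html
```

Expect the same far-from-perfect results as when opening the PDF in the GUI; the script only saves you the clicking.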
(A newer plugin for OpenOffice that does more or less the same is also available, though I have little experience myself.)
Calibre
Mentioned above, calibre is an amazingly rich e-book software suite that can be used for library management, document viewing and conversion between many different e-book formats. It comes with a fairly user-friendly (if somewhat slow and uncustomizable) graphical user interface, but most of its functions can also be called from a command-line shell, which is, again, great for scripts.
Calibre recently added the option of converting directly from PDF to formats such as ePub (used by most e-book readers, not to mention the iPad’s iBooks application) or mobi (one of the few formats supported by the Kindle). It allows you to use Regular Expressions for header and footer detection and removal. Again, the results are usually far from perfect, but for a quick conversion of a relatively simple document, this is a very simple and elegant solution.
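From the command line, the conversion is done with calibre’s ebook-convert program, which takes an input file and an output file and picks the formats from their extensions. Below is a minimal sketch for converting a folder of PDFs to ePub; the function names epub_name and convert_all are my own, and I have left out the header and footer options, since those differ between calibre versions.

```shell
#!/bin/sh
# epub_name FILE.pdf
# Prints the matching output name, FILE.epub, in the same directory.
epub_name() {
    printf '%s\n' "${1%.pdf}.epub"
}

# convert_all [DIR]
# Runs ebook-convert on each PDF in DIR (default: current directory),
# skipping any file that already has an ePub alongside it.
convert_all() {
    for f in "${1:-.}"/*.pdf; do
        [ -e "$f" ] || continue      # no PDFs found: the glob stays literal
        out=$(epub_name "$f")
        [ -e "$out" ] && continue    # already converted
        ebook-convert "$f" "$out"
    done
}

# Example: convert_all ~/papers
```

Skipping existing output means the script can be rerun after adding new PDFs without reconverting the old ones.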
pdfreflow
Another relatively new project, pdfreflow is a command line program (written in Java and available for Windows, Mac and Linux) that can be used to “reflow” the HTML output of poppler’s pdftohtml conversion routine. (Really, it converts an XML file into a more traditional HTML file than pdftohtml generates by itself.) There are more detailed instructions for it given here.
While many people dislike command-line only programs such as this, it is good for scripting, and in general, I’ve found that its paragraph and formatting detection algorithms seem to be more reliable than calibre’s. Its recognition of things like footnotes could be improved, but perhaps will in future releases.
Since it generates HTML, you may need to use other software for converting the HTML into something your e-book reader can handle, but HTML is an ideal format for converting to others. Converting to ebook formats can be done fairly easily with calibre or a WYSIWYG ePub editor such as Sigil.
A final thing to consider is combining tools. For example, using BRISS to crop away headers and footers, and then converting the results with AbiWord, calibre or pdfreflow might yield better results than one of these tools alone.

What an excellent (wonderfully comprehensive) resource, thanks!
I’ve had some success at pdf cropping in pure LaTeX, using the pdfpages package. I’m sure some of the dedicated tools above are more convenient, but for a quick-and-dirty hack it’s not bad.
Yes, that’s precisely what I meant by doing it with pdflatex “if you know what you’re doing”. What makes it less convenient is that you have to set the trim amount by trial and error. Nothing wrong with it, though, if you’re comfortable doing it that way.
Thanks a lot for this overview!
With a Mac you don’t need any additional software: you can crop PDFs with “Preview” which has been part of OS X for years.
You just select the text area on one page, select ALL for the thumb pages, crop and save. Done. Not much work at all.
I too work with LaTeX and it just looks good.
Thanks for the comment.
While that is a nice feature, if it doesn’t do batch processing, and doesn’t allow you to select multiple regions on the same page of the original to divide into pages of the output, it hardly replaces the tools mentioned above.
Thanks for this post. I’ve been on the trail of a solution like this for weeks now and was subsequently getting close to finding one. However, I won’t complain that you laid it out into a neat package for me including some tools like BRISS that I would have remained completely unaware of. Both the ghostscript trick and BRISS work beautifully in OS X. Thanks!
Thanks for the nice set of open source tools. One more that you might mention is pdftohtml, a script that allows you to convert the PDF to HTML.
A final solution is docs.google.com which allows import of pdfs
Again none of these really works for OCR scanned pdfs or complex pdfs with math.
Thanks
If you’re on linux, the document viewer Okular (KDE) has a feature “Trim margins” which views the pdf without any margin whitespace. It doesn’t actually crop the document which is especially good for protected documents you can’t even crop or if you don’t like modifying your documents.
I’ve been using pdfshuffler (GTK) on linux for all my cropping needs. It also allows inserting, rearranging, deleting, etc. of pdf pages. The only commercial software I miss is Acrobat Pro for OCRing pdfs.
I actually think Google Docs will OCR scanned PDFs now, but I haven’t heard good things about the results.
This blog post here describes a method for creating “searchable image” scanned PDFs similar to what Acrobat Pro produces using only free tools on linux. I’ve been meaning to try it, but haven’t yet.
Thanks for the tip about PDF shuffler too. I should try that as well.
Awesome resource. Works great for my nook!!!
Thanks for the work.
Thanks for all this valuable information. I should mention upprint, which also automatically crops PDFs: http://www.mscs.dal.ca/~selinger/lpr-wrapper/ Thanks molecule-eye for mentioning pdfshuffler, which seems to work elegantly enough (although it, too, doesn’t allow for multiple regions).
Useful and beautiful!
Hi Kevin, I was trying to use your pdf cropping script on some of my files but I am unable to get it to work. I’m using Windows 7(x64) and running the latest versions of Ghostscript (gs9) and Calibre. I’ve placed the bat file in a folder along with the pdf I want to crop and run the file from within the folder. Whenever I run the bat file, it opens a ‘Ghostscript Image’ program. It stays open and does nothing. When I manually close both the gs image program and the command prompt, it just creates a file ‘bounding’. If I run the bat file as administrator, the command prompt opens and closes and doesn’t seem to be doing anything. Any suggestion would be helpful. Thanks, Joseph.
I can try to help, but since I don’t use Windows anymore (yuck!), it’s a bit hard to try to test. You must have changed the batch file so that the path to the Ghostscript executable puts it in the right folder. Can you post how you changed it? And are there any other executables in the same folder? I seem to recall the old version having both a gswin32.exe and a gswin32c.exe, and you needed to be sure to use one rather than the other.
for %%I in (*.pdf) do (
"C:\Program Files (x86)\gs\gs9.00\bin\gswin32c.exe" -dSAFER -dNOPAUSE -sDEVICE#bbox "%%I" 2> bounding
"C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "%%~nI-cropped.pdf" -b bounding "%%I"
)
This is the full code that I’ve put into the bat file. I did notice the 2 exe files in the gs\bin folder. I’ve tried both. Using gswin32.exe opens up ghostscript and it seems to want an input of some sort (the command line stays at ‘GS>’).
Pretty sure you want the one with the c at the end.
You do have line breaks in there, right? (I know they don’t show up here in the comments.)
Does it help if you add -dBATCH to the list of flags for ghostscript? (Right after -dSAFER and -dNOPAUSE.) I have that in the mix for linux; not sure why I didn’t include it for Windows.
OK, I think the missing -dBATCH might have been the problem. I just tried with ghostscript 9.00 and the newest version of calibre on my wife’s computer (Win XP), and it didn’t work quite right without -dBATCH, but did when I added that in. (Well, it does work even without -dBATCH, but you have to type “quit” at the gs> prompt after each file it processes, which is annoying.) I’ll change the post accordingly, thanks.
Let me know if that helps for you.
Hi Kevin, yes the missing -dBATCH was the issue and the code is working properly. Thanks for helping out.
Great post!
The link for sopdf includes a windows executable. Do you know where I can find a linux binary by any chance?
Cheers
The linux binary can be found on page 2 of the same forum thread. In particular, in this post. You can also get it bundled with a GUI here for 32 bit linux or here for 64 bit linux.
Great post! Very helpful. I have an alternative to SoPDF that rotates pages, divides them in half and crops them, but does so in a way that gives the user greater control over the crop. So in cases where a book chapter is photocopied unevenly/imperfectly — i.e., some pages are centered, others aren’t — the user can adjust the crop as necessary while still doing most of the work in a batch-processing way. This alternative involves using three tools, all of which are mentioned in this post and all of which run on GNU/Linux, MacOS X and Windows; PDF-Shuffler, Pdftk and Briss. I have a short write-up for anyone interested at: http://nuxtp.com/howto/chptr.html.
Sorry. Link came out wonky, b/c trailing period got included in url. Trying again: http://nuxtp.com/howto/chptr.html
Google took me to your page as I was searching for an alternative to the Ghostscript/Calibre solution.
So thanks a lot for presenting these different options. Hat off to you.
I recently started having problems with cropping. I crop single pages. No problem when they come in portrait format, but cropping all messed up when they are in landscape format. The same images that I used to crop without any problem now don’t crop so nicely.
I’m not sure what happened. I upgraded Calibre, could that be the reason?
This is what I usually do (see code below): but omitting the call to gswin64c makes no difference. Could it be a problem with it? I also simultaneously upgraded Ghostscript from 32 bits to 64 bits…
Could it be that the bounding box that Ghostscript is supposed to calculate is wrong?
The file “bounding” (no extension) created by the process contains this:
%%BoundingBox: 0 0 416 370
%%HiResBoundingBox: 0.324000 0.162000 415.475987 369.174153
The top dimensions are a good bounding box. The bottom ones are all wrong.
Anyone experienced the same problems?
I’m on Windows, I haven’t tried on Linux yet.
for %%I in (*.pdf) do (
"C:\Program Files\Ghostscript\gs9.02\bin\gswin64c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bbox "%%I" 2> bounding
"C:\Program Files (x86)\Calibre2\pdfmanipulate.exe" crop -o "%%~nICropped.pdf" -b bounding "%%I"
)
follow-up on my comment above.
Your blog post is a great source of inspiration because there is indeed some difficulty in applying the Ghostscript/Calibre cropping scheme.
There seem to be several problems. One problem is related to using calibre2 with gswin64c (latest version I tested is gs9.04) as opposed to gswin32c. It may not crop up (pun intended) on all machines, but it does on some. The upshot is, if I understand correctly, that calibre uses a modified/patched version of pypdf; if your system already has pypdf somewhere and calibre attempts to use it, then it won’t work. I don’t know how to fix the problem, except that calling the 32 bit version of Ghostscript before Calibre works. Strange.
Another problem has to do with using postscript files unwisely generated by software like Maple or Matlab which place the bounding box line: %%BoundingBox w x y z somewhere other than the second line of the file. One workaround is to use epstopdf instead of ps2pdf, say, to fix the bounding box. However, I couldn’t get epstool to do what it’s supposed to do (calculate the bounding box), it seems that these Matlab/Maple files are busted in more ways than one.
To sum up, using the 32 bit version of Ghostscript (on a 64 bit machine, no problem) works for decent postscript images, such as produced by pstricks for instance, but for Matlab/Maple postscript files, applying epstopdf helps to calculate the bounding box correctly; otherwise the cropping is done haphazardly.
Your alternative suggestions are therefore extremely valuable.
References:
Windows: http://www.mobileread.com/forums/archive/index.php/t-103097.html
Linux: https://bugs.launchpad.net/ubuntu/+source/calibre/+bug/800551
I got a kindle for Christmas and have just got round to trying some of these solutions out. One thing to say is that emailing a pdf to your “kindle” email address does a pretty good job of cropping the whitespace, for the most part. So I jumped straight in to BRISS. I like it a lot. But if you don’t get the cutting up just right, the last couple of lines of a page appear on their own on a separate screen, which is not ideal, and not really any better than kindle’s automatic system. What I do imagine this would be useful for is splitting up a two-column document into one that could be read easily.