Reading PDFs on portables

Chances are, if you’re an academic in the 21st century, you read a lot of PDFs, whether they are articles downloaded from journals’ websites, works in progress from colleagues, or student papers. To avoid the costs of printing, and for your own convenience, you might be interested in reading them on a portable device, such as an e-book reader like the Sony Reader Digital Book, Barnes & Noble Nook or Amazon Kindle. Although larger models exist, the reasonably priced ones have screens the size of small paperbacks, roughly 9 × 12cm. (Of course, you might even want to use a smart phone with an even smaller screen!) This can pose a problem when it comes to PDFs designed to be printed on letter (8.5 × 11in) or A4 sized paper. The device will typically shrink them to fit, resulting in an unreadably small font size.

But there are a number of things you can try to make those PDFs more readable. Here I focus my comments on free methods, rather than on those that rely on expensive commercial tools like Adobe Acrobat Professional (which, to be honest, doesn’t work that well for this anyway). Here’s a rundown of my experiences with a variety of methods:

  1. Work­ing with the source doc­u­ment, if available
  2. Sim­ple white­space cropping
  3. Spe­cial crop­ping tools like SoPDF and BRISS
  4. “Reflow­ing” and/or con­vert­ing to another format


Work­ing with the source doc­u­ment, if available

This is by far the best method. The catch is obvious: you actually have to have access to the source document from which the PDF was created. This is great for your own works, and perhaps those of students or generous colleagues, but often it is not even a possibility. For a traditional word processor file, it usually requires changing the output “paper” size to the size of your portable’s screen, and minimizing the margins. Sony has put together excellent and detailed instructions for doing this with OpenOffice and/or Microsoft Word, which would mostly carry over to other portables.

Read­ers of this blog may be inter­ested in doing it with LaTeX source files. I’ve had luck adding the fol­low­ing (or a slight mod­i­fi­ca­tion) to the preamble:

\usepackage[papersize={90mm,120mm},margin=2mm]{geometry}
\usepackage[kerning=true]{microtype}
\usepackage[T1]{fontenc}
\usepackage[charter]{mathdesign}
\usepackage[normalmargins]{savetrees}
\sloppy
\pagestyle{empty}

The geometry package sets the page size and very minimal margins. You’ll want to change the first option to match your device’s screen, and adjust the second according to preference. The microtype package improves typography in cramped quarters, and allows certain punctuation to extend slightly into the margins in a way that is barely noticeable. The savetrees package adjusts the size and spacing of titles and lists, and generally squeezes more onto a page. (The normalmargins option is needed to avoid a clash with our geometry settings.) The \sloppy command makes LaTeX more tolerant of loosely spaced (underfull) lines; without it, lines that cannot be broken neatly will run off the right side of the screen. The \pagestyle{empty} command suppresses headers, footers and page numbers. (Not that there’s room for them anyway.)
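If you only know your device’s screen diagonal, the papersize values can be estimated with a little arithmetic. The following is a rough sketch in Python, not part of the original recipe: the function names are made up for illustration, and the 3:4 aspect ratio is an assumption that you should check against your own device.

```python
import math

def screen_mm(diagonal_in, ratio_w=3, ratio_h=4):
    """Return (width_mm, height_mm) for a screen with the given diagonal in inches.

    The 3:4 default aspect ratio is an assumption, not a universal rule.
    """
    hyp = math.hypot(ratio_w, ratio_h)
    width_in = diagonal_in * ratio_w / hyp
    height_in = diagonal_in * ratio_h / hyp
    return round(width_in * 25.4, 1), round(height_in * 25.4, 1)

def geometry_line(diagonal_in, margin_mm=2):
    """Build a geometry line, rounding down so the page never exceeds the screen."""
    w, h = screen_mm(diagonal_in)
    return "\\usepackage[papersize={%dmm,%dmm},margin=%dmm]{geometry}" % (
        int(w), int(h), margin_mm)

print(geometry_line(6))
```

For a six inch screen this lands close to the 90mm × 120mm used in the preamble above; the usable area on a real device may be a little smaller, so treat the result as a starting point.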

I like to use T1 font encod­ing with the Bit­stream Char­ter font pro­vided by the math­de­sign pack­age. I find this font to be very leg­i­ble when read from a screen, but still pro­fes­sional look­ing. This is of course a mat­ter of taste. Sub­sti­tute your favorite font instead. Per­son­ally, I think the default Com­puter Mod­ern LaTeX fonts are bet­ter suited for paper than screen read­ing. If you choose to use XeLa­TeX for greater font selec­tion, go ahead, but you’ll need to remove the microtype pack­age, which only works with reg­u­lar latex/pdflatex.

The result should be some­thing quite nice and com­fort­able to read on a portable device. If you visit dis­cus­sion com­mu­ni­ties ded­i­cated to e-book read­ers like Mobil­eRead or Tel­eRead, you’ll read a lot of bash­ing of the PDF for­mat. In my opin­ion, PDF is both the best and the worst doc­u­ment for­mat. It’s the best when the PDF was designed for the medium it’s viewed on. It’s the worst when it was not.


Sim­ple white­space cropping

One of the sim­plest changes you can make to a PDF, with­out chang­ing any of the con­tent or for­mat­ting, is to remove the mar­gins so that the con­tent of the pages takes up the whole screen. There is a lot of soft­ware, both free and com­mer­cial, that can be used to crop a PDF, includ­ing pdfla­tex itself (though it’s not the most effi­cient tool for the job). Unfor­tu­nately, it’s not always as straight­for­ward as it may seem. The so-called “crop” fea­ture of Adobe Acro­bat Pro­fes­sional, for exam­ple, doesn’t actu­ally crop PDFs, but instead inserts an instruc­tion telling the view­ing soft­ware to ignore the cropped areas, which often isn’t respected by portable devices.

Nonetheless, you probably already have free/open source software installed that can do the trick. If you have LaTeX installed, you almost certainly have ghostscript installed. If you own an e-book reader, there’s a good chance you have calibre, the free/open source ebook management software, installed. (And if not, you should, even if you don’t like it as a library manager.) The two of them together can be used to analyze and auto-crop a PDF with a single command, or even batch-crop an entire folder’s worth.

On Linux/Unix (including, I think, Mac OS X, though I haven’t tested this), from the command line, you can type in:

for i in *.pdf ; do
    # gs prints the bounding boxes to stderr; save them
    gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=bbox "$i" 2> bounding
    # crop to the saved boxes
    pdfmanipulate crop -o "${i%.pdf}-cropped.pdf" -b bounding "$i"
done

This will cre­ate a cropped ver­sion of every PDF in the active directory.
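If you are curious about what ends up in the “bounding” file, it is plain text with a %%BoundingBox line (and a %%HiResBoundingBox line) for each page. Here is a minimal Python sketch for reading those lines and computing one box that covers every page. The function names are hypothetical, and skipping zero-area boxes rests on the assumption that they come from blank pages.

```python
import re

# Matches only the integer %%BoundingBox lines, not %%HiResBoundingBox.
BBOX_RE = re.compile(
    r"^%%BoundingBox:\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)", re.MULTILINE)

def parse_bboxes(text):
    """Return a list of (llx, lly, urx, ury) tuples, one per matched line."""
    return [tuple(int(n) for n in m.groups()) for m in BBOX_RE.finditer(text)]

def union_bbox(boxes):
    """Smallest box containing all non-empty boxes, or None if there are none."""
    # Zero-area boxes are assumed to be blank pages and are ignored.
    real = [b for b in boxes if b[2] > b[0] and b[3] > b[1]]
    if not real:
        return None
    return (min(b[0] for b in real), min(b[1] for b in real),
            max(b[2] for b in real), max(b[3] for b in real))

sample = (
    "%%BoundingBox: 0 0 416 370\n"
    "%%HiResBoundingBox: 0.324000 0.162000 415.475987 369.174153\n"
    "%%BoundingBox: 10 20 400 380\n"
)
print(union_bbox(parse_bboxes(sample)))
```

A single covering box like this gives every page the same crop, which keeps the text from jumping around between pages at the cost of a slightly looser fit.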

On Windows, you can do the same by creating a batch file. Using Notepad or another text editor, create a file named pdfbatchcrop.bat (or anything else you’ll remember ending with .bat), and copy and paste the following into it:

for %%I in (*.pdf) do (
"C:\Program Files\gs\gs8.71\bin\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#bbox "%%I" 2> bounding
"C:\Program Files\Calibre2\pdfmanipulate.exe" crop -o "%%~nI-cropped.pdf" -b bounding "%%I"
)

(Dou­ble check that the paths I have given to your cal­i­bre and ghost­script exe­cuta­bles are right. They may not be if you’re using an older ver­sion of ghost­script, or a 64-bit ver­sion of Win­dows.) Save it in the same direc­tory as the PDFs you want to crop, and then dou­ble click on the .bat file in “My Com­puter” or “Win­dows Explorer”, and it should crop away.

If a sim­ple crop isn’t “cut­ting it” for you, whether it is because the header and footer are not removed, or the PDFs have mul­ti­ple columns which need sep­a­rat­ing, read on for other suggestions.


Spe­cial crop­ping tools like SoPDF and BRISS

If you need a “smarter” crop, or if you’re one of those peo­ple to whom com­mand line pro­grams are anath­ema, there are some free tools out there that are espe­cially help­ful in crop­ping PDFs, or even divid­ing PDF pages into e-reader-screen-sized “chunks”. Here I focus on two of my favorites.

SoPDF

SoPDF is a com­mand line pro­gram, writ­ten in C/C++, avail­able for both Win­dows and Linux. (The source code is avail­able, if any­one wants to try com­pil­ing it for Mac.) For those who hate the com­mand line, there are also mul­ti­ple GUI wrap­pers for it, one of which, avail­able here, in full dis­clo­sure, was writ­ten by yours truly back when I still used Win­dows. (Another, in some ways bet­ter, cross-platform GUI can be found here.)

It not only auto-crops a PDF, but also (at least on default settings) divides each page in half and rotates it 90 degrees. This is meant to allow you to comfortably read a PDF, half a page at a time, by holding your reader sideways to take advantage of the high height/width proportion of these devices.
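To make the half-page idea concrete, here is a small Python sketch of the geometry involved. It is not SoPDF’s actual algorithm, just an illustration under simple assumptions: the box is given as (left, bottom, right, top) in PDF points, and the optional overlap parameter is my own addition so that a line cut at the midpoint shows up on both halves.

```python
def split_halves(box, overlap=0.0):
    """Split a (llx, lly, urx, ury) box into a top half and a bottom half.

    overlap is the total height (in points) shared by the two halves.
    The top half is returned first, since it is read first.
    """
    llx, lly, urx, ury = box
    mid = (lly + ury) / 2
    top = (llx, mid - overlap / 2, urx, ury)
    bottom = (llx, lly, urx, mid + overlap / 2)
    return [top, bottom]

print(split_halves((0, 0, 416, 370)))
print(split_halves((0, 0, 416, 370), overlap=20))
```

Each half would then be rotated 90 degrees and written out as its own page; that step needs a PDF library and is left out here.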

Despite the GUIs avail­able, on linux, I often find myself using the com­mand line ver­sion for batch pro­cess­ing. E.g., to do a direc­tory of PDFs at once:

for file in *.pdf ; do sopdf -i "$file" ; done

SoPDF is light­ning fast. Com­pared to the ghostscript/calibre method men­tioned above, you won’t believe how fast this thing works. With its quick and effec­tive method, it’s by far the most-often used tool in my tool­box for mak­ing PDFs read­able on my Sony Reader.

The down­sides are that the orig­i­nal devel­oper has stopped work­ing on it, so it’s stag­nated a bit, and doesn’t work if you need to cus­tomize the crop amount (for mul­ti­ple columns, or to remove large head­ers and foot­ers). In these cases, I turn to …

BRISS

A new and very promis­ing open source tool on the scene is BRISS. The devel­oper likes to call it “a lit­tle snip­ping tool to make the most of your six inch.” (A ref­er­ence to the six inch screen of these portable devices.) Yes, I apol­o­gize for repeat­ing that.

This is a cross-platform Java tool with a nice GUI. It shows you a sort of fuzzy blend of all the pages at once, so that you can man­u­ally select crop regions for divid­ing a page into one, two or more chunks. It also does intel­li­gent han­dling of doc­u­ments with dif­fer­ent mar­gins on odd and even pages, so that you can treat them differently.

Here’s a screenshot.

[Screenshot: BRISS in Action]

In fact, I can imag­ine this tool being use­ful in just about any con­text in which you might want to crop a PDF, not just for the pur­pose of read­ing PDFs on porta­bles. One con­text that comes to mind is the pos­si­ble need for extract­ing a sin­gle image or dia­gram from a PDF with­out con­vert­ing to a ras­ter­ized format.

BRISS is a new project and under rapid devel­op­ment. I expect new fea­tures, such as page rota­tion, in the future.

There are some older tools with similar functions. The big downside with these is that they rasterize the PDF, i.e., convert it to a series of pixelized images, before doing other processing. This increases file size quite a lot, and also makes it impossible to use the search or dictionary features of your reader, if it has them. Nevertheless, especially if your PDF starts off as a series of images (such as a scan), these might be worth looking at:

  • PDF­Read: PDF read­ing solu­tion avail­able for Mac and Windows.
  • Paper­Crop: Spe­cial­ized tool for crop­ping and divid­ing multi-column PDFs (Win­dows only?)
  • PDFLRF: Very nice page divi­sion and even recom­bi­na­tion algo­rithm, but out­puts a rather out­dated for­mat, and is no longer being updated

“Reflowing” and/or converting to another format

Some portable read­ers adver­tise a fea­ture to “reflow” a PDF, or take the words and sen­tences and rearrange them to match their dis­play, and your font size pref­er­ence. In my expe­ri­ence, at least with my (admit­tedly a few years old) Sony Reader PRS-505, this does not work well at all. Often the result is an unread­able mess, and parts of the doc­u­ment (espe­cially vec­tor graph­ics) are sim­ply lost.

This isn’t all that surprising. The PDF format wasn’t designed to allow the end user to customize a document’s look. Instead, it was designed precisely to preserve a uniform look across the board. In other words, it was designed purely as an output format. A PDF basically stores the exact location of each character or image and nothing else. It doesn’t even store information about what characters go together to make up a single word, much less where paragraphs begin and end, or what constitutes a header or footnote as opposed to the body text. It requires a fairly high level of artificial intelligence for a computer to reconstruct such information, which of course is necessary to convert a PDF to a reflowable format.
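To give a feel for the guesswork involved, here is a toy Python sketch that rebuilds words from positioned characters on a single line by looking at the gaps between them. It is only an illustration, not how poppler or any other real tool does it: the (character, x position, width) input format and the gap_factor threshold are assumptions made up for this example.

```python
def group_words(glyphs, gap_factor=0.3):
    """Group (char, x, width) glyphs from one text line into words.

    A new word starts whenever the gap to the previous glyph exceeds
    gap_factor times the current glyph's width (an arbitrary threshold).
    """
    words, current, prev_end = [], "", None
    # Glyphs may be stored in any order, so sort them left to right first.
    for ch, x, w in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_factor * w:
            words.append(current)
            current = ""
        current += ch
        prev_end = x + w
    if current:
        words.append(current)
    return words

line = [("f", 22, 6), ("P", 0, 6), ("i", 28, 6), ("D", 6, 6),
        ("l", 34, 6), ("F", 12, 6), ("e", 40, 6)]
print(group_words(line))
```

Even this simple case depends on a threshold that will be wrong for some fonts and justified lines, and it says nothing about columns, paragraphs or footnotes, which is where conversions usually go astray.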

It’s probably not worth even trying to convert a PDF to another format if it contains complex formatting such as diagrams, tables, logical and mathematical formulas or characters from non-Latin alphabets. But if it’s more or less straight Latin alphabet text arranged in normal paragraphs, it might be worth giving it a shot, especially if you can live with less-than-perfect results.

Again, I’m lim­it­ing my dis­cus­sion to free tools. I believe all of the fol­low­ing rely on the open source pop­pler PDF libraries for their basic tex­tual analy­sis, and hence, yield fairly sim­i­lar results.

Abi­Word

Abi­Word is actu­ally an open source cross-platform word proces­sor. How­ever, espe­cially if you install the addi­tional import and export plu­g­ins, it is very handy for con­vert­ing between for­mats. (It can even be used from the com­mand line to con­vert between any for­mats it knows, which by itself makes it an invalu­able tool. It even has a LaTeX export plu­gin, so you can use it to con­vert, e.g., MS Word docs and sim­i­lar to LaTeX, but that’s a topic for another occasion …)

AbiWord will open a PDF and attempt to convert it into a word processor file that can be edited as normal. The results are usually far from perfect, but this is still very handy in a variety of circumstances. You can then save the result as RTF or HTML or ODT. Some readers support such formats directly, but if yours doesn’t, these formats are much easier than PDF to convert to native e-reader formats like ePub or mobi.

(A newer plugin for OpenOffice that does more or less the same is also available, though I have little experience with it myself.)

Cal­i­bre

Men­tioned above, cal­i­bre is an amaz­ingly rich e-book soft­ware suite that can be used for library man­age­ment, doc­u­ment view­ing and con­ver­sion between many dif­fer­ent e-book for­mats. It comes with a fairly user-friendly (if some­what slow and uncus­tomiz­able) graph­i­cal user inter­face, but most of its func­tions can also be called from a command-line shell, which is, again, great for scripts.

Calibre recently added the option of converting directly from PDF to formats such as ePub (used by most e-book readers, not to mention the iPad’s iBooks application) or mobi (one of the few formats supported by the Kindle). It allows you to use regular expressions for header and footer detection and removal. Again, the results are usually far from perfect, but for a quick conversion of a relatively simple document, this is a very simple and elegant solution.
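As an illustration of the kind of regular expressions that help with headers and footers, here is a short Python sketch that drops lines consisting only of a page number, plus lines starting with a running header. It works on plain text lines, whereas calibre applies its patterns to its own intermediate output, so take it as a way to try out patterns rather than a description of calibre’s behavior. The journal name is invented for the example.

```python
import re

def strip_lines(text, patterns):
    """Remove every line of text that matches any of the given regex patterns."""
    regexes = [re.compile(p) for p in patterns]
    kept = [line for line in text.splitlines()
            if not any(r.search(line) for r in regexes)]
    return "\n".join(kept)

patterns = [
    r"^\s*\d+\s*$",            # a line containing nothing but a page number
    r"^Journal of Examples\b"  # hypothetical running header
]

page = (
    "Journal of Examples 12 (2010)\n"
    "First paragraph of text.\n"
    "17\n"
    "Second paragraph, which mentions 17 things.\n"
)
print(strip_lines(page, patterns))
```

Anchoring the patterns to the whole line (with ^ and $) matters: a looser pattern such as \d+ would also delete ordinary sentences that happen to contain a number.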

pdfre­flow

Another rel­a­tively new project, pdfre­flow is a com­mand line pro­gram (writ­ten in Java and avail­able for Win­dows, Mac and Linux) that can be used to “reflow” the HTML out­put of poppler’s pdfto­html con­ver­sion rou­tine. (Really, it con­verts an XML file into a more tra­di­tional HTML file than pdfto­html gen­er­ates by itself.) There are more detailed instruc­tions for it given here.

While many people dislike command-line only programs such as this, it is good for scripting, and in general, I’ve found that its paragraph and formatting detection algorithms seem to be more reliable than calibre’s. Its recognition of things like footnotes could be improved, and perhaps will be in future releases.

Since it gen­er­ates HTML, you may need to use other soft­ware for con­vert­ing the HTML into some­thing your e-book reader can han­dle, but HTML is an ideal for­mat for con­vert­ing to oth­ers. Con­vert­ing to ebook for­mats can be done fairly eas­ily with cal­i­bre or a WYSIWYG ePub edi­tor such as Sigil.

A final thing to con­sider is com­bin­ing tools. For exam­ple, using BRISS to crop away head­ers and foot­ers, and then con­vert­ing the results with Abi­Word, cal­i­bre or pdfre­flow might yield bet­ter results than one of these tools alone.

Posted Saturday, June 12th, 2010 under LaTeX, Uncategorized.

27 comments

  1. What an excel­lent (won­der­fully com­pre­hen­sive) resource, thanks!

    I’ve had some suc­cess at pdf crop­ping in pure LaTeX, using the pdf­pages pack­age. I’m sure some of the ded­i­cated tools above are more con­ve­nient, but for a quick-and-dirty hack it’s not bad.

  2. Yes, that’s precisely what I meant by doing it with pdflatex “if you know what you’re doing”. What makes it less convenient is that you have to set the trim amount by trial and error. Nothing wrong with it, though, if you’re comfortable doing it that way.

  3. Thanks a lot for this overview!

    With a Mac you don’t need any addi­tional soft­ware: you can crop PDFs with “Pre­view” which has been part of OS X for years.

    You just select the text area on one page, select ALL for the thumb pages, crop and save. Done. Not much work at all.

    I too work with LaTeX and it just looks good.

  4. Thanks for the comment.

    While that is a nice fea­ture, if it doesn’t do batch pro­cess­ing, and doesn’t allow you to select mul­ti­ple regions on the same page of the orig­i­nal to divide into pages of the out­put, it hardly replaces the tools men­tioned above.

  5. Thanks for this post. I’ve been on the trail of a solu­tion like this for weeks now and was sub­se­quently get­ting close to find­ing one. How­ever, I won’t com­plain that you laid it out into a neat pack­age for me includ­ing some tools like BRISS that I would have remained com­pletely unaware of. Both the ghost­script trick and BRISS work beau­ti­fully in OS X. Thanks!

  6. Anthony says:

    Thanks for the nice set of open­source tools. one more that you might men­tion is pdfto­html a script that allows you to con­vert the pdf to html.

    A final solution is docs.google.com, which allows import of PDFs.

    Again none of these really works for OCR scanned pdfs or com­plex pdfs with math.

    Thanks

  7. molecule-eye says:

    If you’re on linux, the doc­u­ment viewer Oku­lar (KDE) has a fea­ture “Trim mar­gins” which views the pdf with­out any mar­gin white­space. It doesn’t actu­ally crop the doc­u­ment which is espe­cially good for pro­tected doc­u­ments you can’t even crop or if you don’t like mod­i­fy­ing your documents.

    I’ve been using pdf­shuf­fler (GTK) on linux for all my crop­ping needs. It also allows insert­ing, rear­rang­ing, delet­ing, etc. of pdf pages. The only com­mer­cial soft­ware I miss is Acro­bat Pro for OCRing pdfs.

  8. I actu­ally think Google Docs will OCR scanned PDFs now, but I haven’t heard good things about the results.

    This blog post here describes a method for cre­at­ing “search­able image” scanned PDFs sim­i­lar to what Acro­bat Pro pro­duces using only free tools on linux. I’ve been mean­ing to try it, but haven’t yet.

    Thanks for the tip about PDF shuf­fler too. I should try that as well.

  9. Randy Pope says:

    Awe­some resource. Works great for my nook!!!

    Thanks for the work.

  10. Thanks for all this valuable information. I should mention upprint, which also automatically crops PDFs: http://www.mscs.dal.ca/~selinger/lpr-wrapper/ Thanks molecule-eye for mentioning pdfshuffler, which seems to work elegantly enough (although it, too, doesn’t allow for multiple regions).

  11. Use– and beautiful!

  12. Hi Kevin, I was try­ing to use your pdf crop­ping script on some of my files but I am unable to get it to work. I’m using Win­dows 7(x64) and run­ning the lat­est ver­sions of Ghost­script (gs9) and Cal­i­bre. I’ve placed the bat file in a folder along with the pdf I want to crop and run the file from within the folder. When­ever I run the bat file, it opens a ‘Ghost­script Image’ pro­gram. It stays open and does noth­ing. When I man­u­ally close both the gs image pro­gram and the com­mand prompt, it just cre­ates a file ‘bound­ing’. If I run the bat file as admin­is­tra­tor, the com­mand prompt opens and closes and doesn’t seem to be doing any­thing. Any sug­ges­tion would be help­ful. Thanks, Joseph.

  13. I can try to help, but since I don’t use Win­dows any­more (yuck!), it’s a bit hard to try to test. You must have changed the batch file so that the path to the Ghost­script exe­cutable puts it in the right folder. Can you post how you changed it? And are there any other exe­cuta­bles in the same folder? I seem to recall the old ver­sion hav­ing both a gswin32.exe and a gswin32c.exe, and you needed to be sure to use one rather than the other.

  14. for %%I in (*.pdf) do ( “C:\Program Files (x86)\gs\gs9.00\bin\gswin32c.exe” –dSAFER –dNOPAUSE –sDEVICE#bbox “%%I” 2> bound­ing “C:\Program Files (x86)\Calibre2\pdfmanipulate.exe” crop –o “%%~nI-cropped.pdf” –b bound­ing “%%I“ )

    This is the full code that I’ve put into the bat file. I did notice the 2 exe files in the gs\bin folder. I’ve tried both. Using gswin32.exe opens up ghost­script and it seems to want an input of some sort (the com­mand line stays at ‘GS>’).

  15. Pretty sure you want the one with the c at the end.

    You do have line breaks in there, right? (I know they don’t show up here in the comments.)

    Does it help if you add –dBATCH to the list of flags for ghost­script? (Right after –dSAFER and –dNOPAUSE.) I have that in the mix for linux; not sure why I didn’t include it for Windows.

  16. OK, I think the miss­ing –dBATCH might have been the prob­lem. I just tried with ghost­script 9.00 and the newest ver­sion of cal­i­bre on my wife’s com­puter (Win XP), and it didn’t work quite right with­out –dBATCH, but did when I added that in. (Well, it does work even with­out –dBATCH, but you have to type “quit” at the gs> prompt after each file it processes, which is annoy­ing.) I’ll change the post accord­ingly, thanks.

    Let me know if that helps for you.

  17. Hi Kevin, yes the miss­ing –dBATCH was the issue and the code is work­ing prop­erly. Thanks for help­ing out. :)

  18. Great post!

    The link for sopdf includes a win­dows exe­cutable. Do you know where I can find a linux binary by any chance?

    Cheers

  19. The linux binary can be found on page 2 of the same forum thread. In par­tic­u­lar, in this post. You can also get it bun­dled with a GUI here for 32 bit linux or here for 64 bit linux.

  20. Great post! Very helpful. I have an alternative to SoPDF that rotates pages, divides them in half and crops them, but does so in a way that gives the user greater control over the crop. So in cases where a book chapter is photocopied unevenly/imperfectly (i.e., some pages are centered, others aren’t), the user can adjust the crop as necessary while still doing most of the work in a batch-processing way. This alternative involves using three tools, all of which are mentioned in this post and all of which run on GNU/Linux, MacOS X and Windows: PDF-Shuffler, Pdftk and Briss. I have a short write-up for anyone interested at: http://nuxtp.com/howto/chptr.html.

  21. Sorry. Link came out wonky, b/c trailing period got included in url. Trying again: http://nuxtp.com/howto/chptr.html

  22. pat toche says:

    Google took me to your page as I was search­ing for an alter­na­tive to the Ghostscript/Calibre solution.

    So thanks a lot for pre­sent­ing these dif­fer­ent options. Hat off to you.

    I recently started hav­ing prob­lems with crop­ping. I crop sin­gle pages. No prob­lem when they come in por­trait for­mat, but crop­ping all messed up when they are in land­scape for­mat. The same images that I used to crop with­out any prob­lem now don’t crop so nicely.

    I’m not sure what hap­pened. I upgraded Cal­i­bre, could that be the reason?

    This is what I usu­ally do (see code below): but omit­ting the call to gswin64c makes no dif­fer­ence. Could it be a prob­lem with it? I also simul­ta­ne­ously upgraded Ghost­script from 32 bits to 64 bits…

    Could it be that the bound­ing box that Ghost­script is sup­posed to cal­cu­late is wrong?

    The file “bound­ing” (no exten­sion) cre­ated by the process con­tains this:

    %%BoundingBox: 0 0 416 370
    %%HiResBoundingBox: 0.324000 0.162000 415.475987 369.174153

    The top dimensions are a good bounding box. The bottom ones are all wrong.

    Any­one expe­ri­enced the same problems?

    I’m on Win­dows, I haven’t tried on Linux yet.

    for %%I in (*.pdf) do ( “C:\Program Files\Ghostscript\gs9.02\bin\gswin64c.exe” –dSAFER –dNOPAUSE –dBATCH –sDEVICE#bbox “%%I” 2> bound­ing “C:\Program Files (x86)\Calibre2\pdfmanipulate.exe” crop –o “%%~nICropped.pdf” –b bound­ing “%%I“ )

  23. pat toche says:

    follow-up on my com­ment above.

    Your blog post is a great source of inspi­ra­tion because there is indeed some dif­fi­culty in apply­ing the Ghostscript/Calibre crop­ping scheme.

    There seem to be several problems. One problem is related to using calibre2 with gswin64c (latest version I tested is gs9.04) as opposed to gswin32c. It may not crop up (pun intended) on all machines, but it does on some. The upshot is, if I understand correctly, that calibre uses a modified/patched version of pypdf; if your system already has pypdf somewhere and calibre attempts to use it then it won’t work. I don’t know how to fix the problem except that calling the 32 bit version of Ghostscript before Calibre works. Strange.

    Another prob­lem has to do with using post­script files unwisely gen­er­ated by soft­ware like Maple or Mat­lab which place the bound­ing box line: %%Bound­ing­Box w x y z some­where other than the sec­ond line of the file. One workaround is to use epstopdf instead of ps2pdf, say, to fix the bound­ing box. How­ever, I couldn’t get epstool to do what it’s sup­posed to do (cal­cu­late the bound­ing box), it seems that these Matlab/Maple files are busted in more ways than one.

    To sum up, using the 32 bit ver­sion of Ghost­script (on a 64 bit machine, no prob­lem) works for decent post­script images, such as pro­duced by pstricks for instance, but for Matlab/Maple post­script files, apply­ing epstopdf helps to cal­cu­late the bound­ing box cor­rectly; oth­er­wise the crop­ping is done haphazardly.

    Your alter­na­tive sug­ges­tions are there­fore extremely valuable.

    References: Windows: http://www.mobileread.com/forums/archive/index.php/t-103097.html Linux: https://bugs.launchpad.net/ubuntu/+source/calibre/+bug/800551

  24. I got a Kindle for Christmas and have just got round to trying some of these solutions out. One thing to say is that emailing a PDF to your “kindle” email address does a pretty good job of cropping the whitespace, for the most part. So I jumped straight in to BRISS. I like it a lot. But if you don’t get the cutting up just right, the last couple of lines of a page appear on their own on a separate screen. Not ideal. Not really any better than Kindle’s automatic system. What I do imagine this would be useful for is splitting up a two column document into one that could be read easily.
