Friday, March 5, 2010

Converting a PDF to a Word Doc with KWord

I was presented with a challenge yesterday and, fortunately, the challenge was cancelled. Let me explain why I say "fortunately". At my day job, my boss wanted me to convert a document produced in LaTeX to a Word document. I work with LaTeX in Kile, and it doesn't seem to offer that option. The native output of my little setup is PDF, but the PDF to Word doc conversion options didn't look promising either.

As I said, the challenge was cancelled and another solution was found, but the request could come up again, and I thought it would be nice to find an answer now while I have a bit of time on my hands. Long story short, I haven't found a way to convert LaTeX to a Word doc format, but there is a way to open a PDF and save it as a Word doc, using KWord.

I did a fair amount of searching and finally discovered an article at the EmbraceUbuntu.com blog. It's older information, almost three years old, but I thought I'd see if the solution still works, since it promises to be able to open a PDF, save it as an odt or doc, and preserve the formatting. This last part is important, because I really need tables in the PDF to still be tables in the doc.

I dutifully installed KWord on my Ubuntu machine and gave it a shot. While the latest incarnation of KWord does a more or less OK job of preserving format, it is far from perfect. Here are my examples. The first image is the sample PDF page I chose to work with. No, it doesn't have tables, but I'll get to that in a minute.

The second image is the same page opened in KWord. Not exactly a stunning likeness of the first image, but it is pretty good. That said, I tried it on the actual pages I had been asked to work with yesterday and the table formatting completely disappeared when imported into KWord.

I compared the process shown in the 2007 blog article with what I performed, and the steps and features seem identical. While it looks like KWord (as part of KOffice) is continuing to be developed and maintained, this particular feature doesn't appear to have changed much, if at all, in almost three years.

I guess I can't complain too much. This is the closest I've come to solving my little problem, but if converting a PDF to a Word doc is a task on someone's plate at KOffice, I humbly request that it get a little more attention. It would be a big help. Honest.

Afterword: I regularly use OpenOffice.org Writer to convert odt and doc files to PDF and it works just great. Too bad the abundant resources being fed into OOo development can't also be used to include reversing the process.

13 comments:

  1. Converting PDF to anything usable in terms of formatting is always a challenge. One option I found recently is Sun's PDF import extension for OpenOffice (http://extensions.services.openoffice.org/project/pdfimport). The thing is that it converts PDFs to OpenOffice Draw, not Writer. If you only need to make minor changes, it's definitely a good option, although according to the website it does not support processing the layout of LaTeX PDFs, so you would have to check with your files.
    One advantage of using this extension is that you can save your OOo documents in a PDF/ODF hybrid format, so that they can be opened both by PDF readers and OpenOffice (without losing the layout).
    Let's hope they improve this extension, or better yet, that they implement PDF>ODF conversion directly in OOo.

  2. An extension to OpenOffice to import pdf files is available here:
    http://extensions.services.openoffice.org/project/pdfimport

  3. Isn't there a PDF Import extension for OpenOffice that's been available for a couple years? Originally from Sun? The conversion isn't perfect, and it imports as an ODG rather than ODT. But it's not bad. Search "openoffice pdf".

  4. 1. Does the pdfimport extension not work for you? (http://extensions.services.openoffice.org/project/pdfimport)

    2. There is latex2rtf in the repositories that will give decent rtf output. Haven't tried it for anything but simple documents, so don't know how it will fare for you.

  5. You might give the Sun PDF import extension a try in OpenOffice:
    http://extensions.services.openoffice.org/en/project/pdfimport

    I haven't taken a close look at how it handles tables yet, but it may work for you.

  6. Never mind that last comment. I read a bit further on the Sun PDF import:

    Not supported:
    * Processing layout of LaTeX PDF
    * Conversion of tables

    Argghh, maybe they will get around to enhancing this plugin to support table conversion in the near future. I could really use this feature for converting wholesale pricelists to retail pricelists at my job, although I would want to go from PDF to Calc/Excel.

  7. http://wiki.services.openoffice.org/wiki/Pdf_Import_Extension

  8. If you have the original LaTeX, you may have good luck going from LaTeX to HTML or RTF and opening the file in a word processor.

  9. Doesn't LyX go from LaTeX to odt?

  10. The ironic thing is that a PDF of just the specific content from my larger document would have done just fine. No conversion required. It would have taken me five minutes. Oy.

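For anyone who still has the original LaTeX source, the latex2rtf suggestion from the comments is easy to script. What follows is only a minimal Python sketch, not something tried in the post: it assumes latex2rtf is installed and on the PATH, the function names `build_latex2rtf_command` and `convert` are made up for illustration, and there is no promise that tables in a complex document survive the trip. The resulting RTF file can then be opened in a word processor and saved as a Word doc.

```python
import shutil
import subprocess
from pathlib import Path


def build_latex2rtf_command(tex_path, rtf_path=None):
    """Return the argument list for a latex2rtf run (nothing is executed)."""
    tex = Path(tex_path)
    # Default output name: same file name with an .rtf extension.
    rtf = Path(rtf_path) if rtf_path else tex.with_suffix(".rtf")
    # -o names the output file; the input .tex file comes last.
    return ["latex2rtf", "-o", str(rtf), str(tex)]


def convert(tex_path, rtf_path=None):
    """Run latex2rtf if it is installed; return the RTF path, else None."""
    if shutil.which("latex2rtf") is None:
        return None  # assumption: the tool may not be installed
    cmd = build_latex2rtf_command(tex_path, rtf_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[2]
```

Calling `convert("report.tex")` (a hypothetical file name) would write `report.rtf` next to the source when the tool is present, and return `None` otherwise.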
