Edit: cross-reference to the Community Contributions thread
]]>You could mark this thread as [Solved] and point to a new thread in Community Contributions with the details of your app.
]]>I'd love it if some volunteers would try this out, beat it up, and see what breaks first so I can improve it. After a little polishing I might put it up in the AUR.
Get the code from my Dropbox here
The Ugly Patchwork Makefile and a very brief TODO list are also posted.
I'll put together a PKGBUILD once this is in better order for distribution and installation. I just got the darn thing to work; it's time to celebrate, not code more.
Note to Mods: as in my "report", please move this thread to Community Contributions.
]]>However, looking through the docs gives me hope that there would be a C API out there somewhere. Now to find it.
EDIT: I think I'm on to something, but if anyone more familiar comes along let me know if I'm wasting my time reading the poppler docs.
]]>I first struggled with finding a viewer that shows the annotations, and those that could (evince, okular) all have more dependencies than I'd want/need.
I decided there must be a lighter-weight way to extract the text of annotations from PDFs. I searched, read, and learned about PDF annotation formats, and I figured out how to extract Adobe annotations from a PDF. Only then did I realize that my colleague didn't use Adobe. The annotations were created in Mac OS X's Preview. Preview, it would seem, does not use the Adobe xfpdf format for annotations; it uses some other means to embed them.
I searched the PDF file in text and hex editors, but I couldn't find anything resembling the text I knew to be in some of the annotations. However Preview does it, the text must be encoded and unreadable in the "raw" file. This is in contrast, it would seem, with Adobe's annotations.
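One possible explanation, offered as a guess and not as a diagnosis of this particular file: PDF writers commonly store data in Flate-compressed streams, and text inside such a stream will not show up in a text or hex editor. Below is a minimal Python sketch, standard library only, that tries to inflate every stream in a file and reports whether a known phrase turns up. The function names `inflate_streams` and `phrase_in_pdf` are made up for illustration, the regular expression is a rough heuristic instead of a real PDF parser, and checking UTF-16BE is only an assumption about how note text might be stored.

```python
import re
import zlib

# Rough heuristic: grab the bytes between "stream" and "endstream" keywords.
# A real PDF parser would use the /Length entry instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Yield the inflated contents of every stream that zlib can decode."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate-compressed (or it uses another filter); skip it.
            continue


def phrase_in_pdf(pdf_bytes, phrase):
    """Return True if phrase occurs in the raw file or in any inflated stream."""
    # Assumption: the text is stored either as plain bytes or as UTF-16BE.
    needles = [phrase.encode("utf-8"), phrase.encode("utf-16-be")]
    chunks = [pdf_bytes, *inflate_streams(pdf_bytes)]
    return any(needle in chunk for chunk in chunks for needle in needles)
```

To try it, read the file with `open(path, "rb").read()` and pass the bytes along with a phrase known to be in one of the notes. A `False` result does not prove the text is absent, since it could sit behind a different filter or encoding.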
My question, then, is two-fold. First, are there any text-annotation extractor tools that can get notes created in OS X's Preview? If not, does anyone know of any documentation outlining how Preview embeds this information? I've been googling the latter question, but all I'm getting is "where-to-click" level tutorials on how to DO annotations in Preview. I can't find any documentation of how that information is embedded into the file.
Note: evince does get the text of the annotations, but I'd prefer not to keep that installed. Evince-gtk is only ever so slightly lighter on dependencies. Also, in evince I get flooded with "sticky notes" containing all the annotation text. It would take a while to move those around and close them one at a time to be able to actually read them. I'm hoping to be able to extract all annotation text and dump it into a text file.
]]>