Hi,
I need some advice. I have two versions of a book in PDF form, each about 570 pages long, and I need to be able to compare them and highlight the differences.
I haven't found an app with a GUI, and I'm not sure whether diff is really what I'm after. Any suggestions out there?
Many thanks in advance.
Last edited by toad (2010-02-07 15:13:37)
never trust a toad...
::Grateful ArchDonor::
::Grateful Wikipedia Donor::
I was going to suggest pdftotext && diff too.
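A minimal sketch of the pdftotext-plus-diff approach, assuming poppler's `pdftotext` and `diff` are installed. `pdf_diff` is a hypothetical helper name, and it writes the intermediate files `old.txt` and `new.txt` to the current directory:

```shell
# Hypothetical helper (not a standard tool): extract the text of two PDFs,
# then compare the extracted text line by line with diff.
# Assumes pdftotext (from poppler) and diff are on the PATH.
pdf_diff() {
    # -nopgbrk: don't write a form feed at each page break
    pdftotext -nopgbrk "$1" old.txt &&
    pdftotext -nopgbrk "$2" new.txt || return 2
    # -u: unified format, with a few lines of context around each change
    diff -u old.txt new.txt
}
```

Usage: `pdf_diff book-v1.pdf book-v2.pdf > book.diff`. Like diff itself, the function returns 1 when the texts differ, so a non-zero status does not necessarily mean an error.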
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
Thanks for the tip. My impression was that diff compares line by line, i.e. a single line added at line 2 of file B would make every subsequent line show up as different too. I hope that is a false assumption on my part...
toad wrote:
I hope that is a false assumption on my part...
It is.
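It would be useless for creating patches if that were the case. diff aligns the two files on their longest common subsequence of lines, so an inserted line shows up as a single addition and everything after it still matches. Just look at the diff man page to learn how to configure the output.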
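A small, self-contained check of that claim. The two sample files below are invented for illustration and differ only by one inserted line; diff reports a single addition rather than flagging every later line:

```shell
# Sample files (made up for illustration): b.txt = a.txt plus one inserted line.
printf 'alpha\nbeta\ngamma\ndelta\n' > a.txt
printf 'alpha\nNEW LINE\nbeta\ngamma\ndelta\n' > b.txt

# Normal format prints:
#   1a2
#   > NEW LINE
# i.e. "after line 1 of a.txt, add line 2 of b.txt"; beta..delta still match.
# diff exits 1 when the files differ, so accept status 1 as success here.
diff a.txt b.txt || [ $? -eq 1 ]

# Unified format without context lines (-U0) is often easier to scan.
diff -U0 a.txt b.txt || [ $? -eq 1 ]
```

Other GNU diff options that may help with this kind of comparison: `-w` (ignore all white space), `-B` (ignore changes whose lines are all blank) and `-I RE` (ignore changes whose lines all match the regular expression RE).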
Right, I finally got the second PDF file. It turns out that its pagination is different and that it has a header line on every single page (which wasn't in the original), plus a timestamp, page numbers, etc.
I found kdiff3, but it never gets to the point of matching up two paragraphs that start with the same words; instead it gets hung up on each of the differences above.
Sooooo, I suppose I have to configure pdftotext to ignore this stuff? Or use awk or sed to go through the resulting .txt file and remove exactly the phrases that get in the way?
Hm, totally lost at the mo. Got to do some more research...
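One way to do the sed route, as a sketch: the header text, the page-number layout and the timestamp format below are all assumptions, so the patterns need adjusting to what pdftotext actually produces for this book. `strip_noise` is a hypothetical helper name, and as written `HEADER` must not contain regex metacharacters or a `/`:

```shell
# Placeholder: replace with the exact running header printed on each page.
HEADER='My Book Title'

# Hypothetical filter: read pdftotext output on stdin, write cleaned text to stdout.
strip_noise() {
    # 1) delete lines that are just the running header (leading/trailing
    #    whitespace allowed)
    # 2) delete lines holding nothing but a number (assumed to be page numbers;
    #    note this also drops any genuine line that is only a number)
    # 3) delete lines starting with a YYYY-MM-DD HH:MM timestamp (assumed format)
    # 4) tr removes any form feeds pdftotext wrote at page breaks
    sed -e "/^[[:space:]]*$HEADER[[:space:]]*\$/d" \
        -e '/^[[:space:]]*[0-9][0-9]*[[:space:]]*$/d' \
        -e '/^[[:space:]]*[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}/d' |
        tr -d '\f'
}
```

Usage, with placeholder file names (`-` tells pdftotext to write to stdout): `pdftotext -nopgbrk old.pdf - | strip_noise > old.txt`, then the same for the new PDF, and compare the two cleaned files.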
I installed dwdiff, a nifty piece of software that compares files word by word rather than line by line (or byte by byte, or what have you). Just what I needed for two versions of a book.
In short:
packer -S dwdiff
dwdiff file1 file2 > file_showing_differences
Then open file_showing_differences in OOo Writer. Text removed from file1 is marked with [-...-] and text added in file2 is marked with {+...+}.
Just da ting fo me
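A sketch that chains the steps together and then counts the changes, assuming pdftotext and dwdiff are installed and that dwdiff's default markers `[-...-]` and `{+...+}` are in use. Both function names are hypothetical:

```shell
# Hypothetical helper: PDF -> plain text -> word-level diff on stdout.
book_wdiff() {
    # -nopgbrk: don't write a form feed at each page break
    pdftotext -nopgbrk "$1" old.txt &&
    pdftotext -nopgbrk "$2" new.txt || return 2
    dwdiff old.txt new.txt
}

# Hypothetical helper: count the deletion and insertion markers in a saved
# dwdiff output file (default markers assumed; grep -o prints one line per match).
count_changes() {
    printf 'deleted: %d\n' "$(( $(grep -o '\[-' "$1" | wc -l) ))"
    printf 'inserted: %d\n' "$(( $(grep -o '{+' "$1" | wc -l) ))"
}
```

Usage: `book_wdiff book-v1.pdf book-v2.pdf > changes.txt`, then `count_changes changes.txt`. A replaced word shows up as one deletion plus one insertion, so the two counts overlap. dwdiff's `--start-delete`/`--stop-delete` and `--start-insert`/`--stop-insert` options change the markers, in which case the grep patterns need to change as well.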