
#1 2010-01-23 15:49:47

toad
Member
From: if only I knew
Registered: 2008-12-22
Posts: 1,775
Website

[SOLVED] - compare two pdfs

Hi,

I need some advice. I have two versions of a book in PDF form, each some 570 pages long. I need to be able to compare the versions and highlight the differences.

I haven't found an app with a GUI and am not sure whether diff is really what I am after. Any suggestions out there?

Many thanks in advance.

Last edited by toad (2010-02-07 15:13:37)


never trust a toad...
::Grateful ArchDonor::
::Grateful Wikipedia Donor::


#2 2010-01-23 16:23:19

n0dix
Member
Registered: 2009-09-22
Posts: 956

Re: [SOLVED] - compare two pdfs


#3 2010-01-23 16:26:13

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: [SOLVED] - compare two pdfs

I was going to suggest pdftotext && diff too.


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone


#4 2010-01-23 17:10:31

toad
Member
From: if only I knew
Registered: 2008-12-22
Posts: 1,775
Website

Re: [SOLVED] - compare two pdfs

Thanks for the tip. My impression was that diff compares line by line, i.e. it only takes one added line at line 2 of text b for all subsequent lines to be reported as different as well. I hope that is a false assumption on my part...



#5 2010-01-23 17:12:46

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

Re: [SOLVED] - compare two pdfs

toad wrote:

I hope that is a false assumption on my part...

It is. :)

It would be useless for creating patches if that were the case. Just look at the diff man page to learn how to configure the output.
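A quick way to convince yourself is a throwaway test like the one below. This is only a minimal sketch: the file names and their contents are made up for illustration, and the second file simply has one extra line inserted near the top.

```shell
# Build two small sample files; b.txt has one extra line after "one".
tmp=$(mktemp -d)
printf '%s\n' one two three four > "$tmp/a.txt"
printf '%s\n' one inserted two three four > "$tmp/b.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# the script going if it is run with "set -e".
result=$(diff "$tmp/a.txt" "$tmp/b.txt" || true)
printf '%s\n' "$result"

rm -rf "$tmp"
```

diff reports only the added line ("1a2" followed by "> inserted") and then lines up the remaining lines again, so one insertion does not throw everything after it out of step.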



#6 2010-02-07 10:55:20

toad
Member
From: if only I knew
Registered: 2008-12-22
Posts: 1,775
Website

Re: [SOLVED] - compare two pdfs

Right, finally got the second pdf file. It turns out that the pagination is different and that it contains a headline on each and every page (which wasn't in the original), a time stamp, page numbers, etc.

I found kdiff3, but it does not seem able to match up two paras which start with the same words. Instead it gets hung up on each of the above.

Sooooo, I suppose I've got to configure pdftotext to ignore this stuff? Or use awk or sed to go through the resulting txt file and get rid of exactly those phrases that bother me?

Hm, totally lost at the mo. Got to do some more research...
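For what it is worth, the sed route could look roughly like the sketch below. Everything specific in it is an assumption: the function name strip_page_furniture, the headline text and the time stamp format are hypothetical placeholders and have to be adapted to whatever the second PDF actually prints on each page. It also drops the form feed characters that pdftotext puts between pages.

```shell
# Hypothetical patterns: replace them with what your PDF really contains.
HEADER_PATTERN='^Sample Book Title'                           # assumed running headline
STAMP_PATTERN='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}' # assumed time stamp format

strip_page_furniture() {
    # Reads text on stdin and writes the cleaned text to stdout.
    # 1. tr removes the form feeds that mark page breaks.
    # 2. sed deletes headline lines, time stamp lines, and lines that
    #    contain nothing but a number (taken to be page numbers).
    tr -d '\f' |
    sed -E \
        -e "/$HEADER_PATTERN/d" \
        -e "/$STAMP_PATTERN/d" \
        -e '/^[[:space:]]*[0-9]+[[:space:]]*$/d'
}
```

It could then sit between extraction and comparison, e.g. "pdftotext book_v2.pdf - | strip_page_furniture > book_v2.txt" (book_v2.pdf is a made-up name; the "-" tells pdftotext to write to stdout). Note that a line of body text consisting only of digits would be deleted too, so the patterns are worth checking against a few sample pages first.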



#7 2010-02-07 15:17:54

toad
Member
From: if only I knew
Registered: 2008-12-22
Posts: 1,775
Website

Re: [SOLVED] - compare two pdfs

I installed dwdiff - a nifty piece of software that concentrates on words rather than bytes or what have you. Just what I needed for two versions of a book.

In short:

packer -S dwdiff
dwdiff file1 file2 > file_showing_differences
Then open file_showing_differences with OOo Writer. Bits taken out of file1 are marked with [-...-] and things added in file2 are marked with {+...+}
Just da ting fo me :)
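For anyone starting from the PDFs rather than from text files, the whole thing might be wrapped up as below. This is a rough sketch and not exactly what was run above: the function name compare_pdfs and its argument order are invented for illustration, and it assumes that both pdftotext and dwdiff are installed.

```shell
compare_pdfs() {
    # Usage: compare_pdfs OLD.pdf NEW.pdf OUTPUT.txt
    if [ "$#" -ne 3 ]; then
        echo "usage: compare_pdfs OLD.pdf NEW.pdf OUTPUT.txt" >&2
        return 2
    fi

    cmp_old_txt=$(mktemp) || return 2
    cmp_new_txt=$(mktemp) || return 2

    # Extract plain text from each version, then compare word by word.
    # A non-zero status from dwdiff may simply mean "differences found",
    # so it is not treated as a failure here.
    pdftotext "$1" "$cmp_old_txt" &&
    pdftotext "$2" "$cmp_new_txt" &&
    { dwdiff "$cmp_old_txt" "$cmp_new_txt" > "$3" || true; }
    cmp_status=$?

    rm -f "$cmp_old_txt" "$cmp_new_txt"
    return "$cmp_status"
}
```

Called as "compare_pdfs old.pdf new.pdf file_showing_differences" (the PDF names are placeholders), it leaves the marked-up text in the third argument. If the newer PDF carries extra headlines or page numbers, those would still need to be stripped from the extracted text before the dwdiff step.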


