#1 2015-11-27 14:17:10

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Reduce size of a PDF document programmatically

Hi, I have a program that assembles a PDF from many PNG images. The problem is that the resulting PDF is quite big, so I am looking for some way to compress it. It should be feasible, since opening it with Master PDF Editor and saving it in an optimized format shrinks it considerably. So I am looking for a programmatic way to do that in my program: I use PoDoFo to create the PDF, but as far as I've been able to understand, PoDoFo has no compression utility/routine. Maybe I'm wrong and someone can give me a coding example? Or suggest a good third-party library? Thanks.

#2 2015-11-27 15:27:41

respiranto
Member
Registered: 2015-05-15
Posts: 479

Re: Reduce size of a PDF document programmatically

You could simply use ImageMagick; see convert(1) once you have it installed.

ImageMagick will convert the PDFs to raster graphics, but that shouldn't be a problem if they are based solely on PNG images.

#3 2015-11-27 15:33:42

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

Thanks respiranto. Funnily enough, what I'm actually doing is using ImageMagick to convert the single PNG files into single PDF pages and then assembling them into a multi-page PDF with PoDoFo. I previously tried converting the PNGs directly into a multi-page PDF with ImageMagick, but gave up due to ImageMagick's enormous memory consumption, which becomes manageable if I convert one PNG at a time (but then I need to assemble the single PDFs, hence PoDoFo).
Now you suggest going back to an image format with ImageMagick... a couple of questions:
1) Which raster format can handle multiple pages?
2) Can ImageMagick assemble many PNGs into a multiple-page image so that I can skip the PDF step entirely?
Thanks.

#4 2015-11-27 16:02:07

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449

Re: Reduce size of a PDF document programmatically

Whoa - that's definitely the wrong way to do that.

For each file:

convert -compress zip infile.png eps3:infile.eps

Then put all the images in a latex document:

\documentclass{article}
\usepackage[margin=0in]{geometry}
\usepackage{graphics}

\begin{document}
\includegraphics{infile1}
\clearpage
\includegraphics{infile2}
% ... more here
\clearpage
\includegraphics{infileN}
\end{document}

Then run pdflatex. You'll probably save at least an order of magnitude in size (I did better than that with 4 images tested here).

You may want to play with the margins/paper-size depending on what your end goal actually is.


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

#5 2015-11-27 16:08:36

respiranto
Member
Registered: 2015-05-15
Posts: 479

Re: Reduce size of a PDF document programmatically

snack wrote:

1) Which raster format can handle multiple pages?
2) Can ImageMagick assemble many PNGs into a multiple-page image so that I can skip the PDF step entirely?

1) Although the data is converted to a raster format during processing, the output format may well be PDF.

2) A quick web search revealed: yes.
You simply need to list several input files before the output file, e.g.:

convert [options] <in0> <in1> [...] <out.pdf>

Last edited by respiranto (2015-11-27 16:09:26)

#6 2015-11-27 16:23:19

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

@Trilby: I need to do everything inside a C++ program. I think I could run commands from my program, but I'd prefer to do everything programmatically using libraries. I think I can convert the PNGs into compressed EPSs with Magick++ (the C++ API for ImageMagick), but then how can I assemble them into a PDF? Can I convert the PNGs into compressed PDFs instead of EPSs and assemble them into a multi-page document without using LaTeX?

@respiranto: if I do as you suggest in 2), then I run into the memory consumption problem I mentioned. Maybe using an output format other than PDF can help, but then which format can I use to obtain a multi-page image?

#7 2015-11-27 16:25:04

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449

Re: Reduce size of a PDF document programmatically

Well yes, if you are writing your own program, use poppler.

#8 2015-11-27 16:28:15

frostschutz
Member
Registered: 2013-11-15
Posts: 1,409

Re: Reduce size of a PDF document programmatically

Compressing a PDF depends a lot on the content. For best results you have to do it manually.

I once scanned a document that was a mix of text, line art, black & white images and color images.

Text and line art can be reduced to 1-bit color depth: white is white, black is black. That saves a ton of data compared to the random color paper-structure noise a regular scan gives you.

Images should be cleaned (denoise the paper structure) and, for black & white images, reduced to pure gray levels. For color images it depends on whether it's a photo or just tabular data; in the latter case you can probably reduce the color depth as well.

Getting rid of random dirt, ugly borders and such also helps reduce file size.

For automatic optimization you're left with... reducing color depth, applying automatic filters and hoping for the best, reducing resolution, and using lossy compression instead of PNG. Apart from ImageMagick, Ghostscript itself also offers some options there.

#9 2015-11-27 16:30:35

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

frostschutz wrote:

Compressing a PDF depends a lot on the content. For best results you have to do it manually.

For automatic optimization you're left with... reducing color depth, applying automatic filters and hoping for the best, reducing resolution, and using lossy compression instead of PNG. Apart from ImageMagick, Ghostscript itself also offers some options there.

My pages are all colored plots on a white background, so they are mostly white pictures. I think I can safely assume a single size-optimization strategy. I'll have a look at Ghostscript, thanks.

#10 2015-11-27 16:31:11

respiranto
Member
Registered: 2015-05-15
Posts: 479
Website

Re: Reduce size of a PDF document programmatically

snack wrote:

@respiranto: if I do as you suggest in 2), then I run into the memory consumption problem I mentioned. Maybe using an output format other than PDF can help, but then which format can I use to obtain a multi-page image?

Sorry, I didn't read your post thoroughly.
And no, I do not know of any alternatives.

#11 2015-11-27 16:38:15

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

I tried to follow Trilby's suggestion and did this check: I used convert to compress the PDF produced by the current version of my program:

convert -compress zip input.pdf input_compressed.pdf

The result is roughly consistent with the order-of-magnitude improvement Trilby quoted: I get 3.2 MB from 21 MB (a factor of about 6.5). But I see that converting the full document again requires a lot of memory. I'll try to fit the compression into my current workflow, namely I'll compress every single PDF page produced by ImageMagick from the PNGs before assembling them with PoDoFo, hoping that this last step will not undo the compression.

#12 2015-11-27 16:42:37

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Reduce size of a PDF document programmatically

Eh ... that was not at all my recommendation.  But if it works for you, have fun.

#13 2015-11-27 16:47:33

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

Sorry Trilby, what I'm doing is:

  convert each PNG into a PDF, then assemble them into a multi-page PDF using PoDoFo

And what I'll try is:

  convert each PNG into a compressed PDF, then assemble them into a multi-page PDF using PoDoFo

If I understand correctly, you suggest:

  convert each PNG into a compressed EPS, then assemble them into a multi-page PDF using LaTeX

The procedures look quite similar to me, so I don't see the big difference. Maybe there is something wrong with merging PDFs that compiling EPSs into a PDF with LaTeX handles correctly? Thanks.

#14 2015-11-27 18:11:15

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Reduce size of a PDF document programmatically

I was suggesting converting each raster image into a compressed EPS once. Then you could use latex, poppler, or any number of other tools to assemble them into a PDF once.

Your version converts each raster image to PostScript and the PostScript to PDF; then the raster source is extracted again from the PDF, compressed, converted to EPS and then to PDF, and finally all the PDFs are assembled.

The results may be similar, but your approach needlessly converts between formats *many* times and will be much slower.

#15 2015-11-27 18:24:27

snack
Member
From: Italy
Registered: 2009-01-13
Posts: 861

Re: Reduce size of a PDF document programmatically

@Trilby: thanks, now I understand: the PNG->PDF conversion is not a single step. For now performance is not an issue for me; it's more a matter of the size of the final product, but I'll keep your suggestion in mind in case I need more speed. By the way, I think I found that the size problem is due to ImageMagick being compiled without zlib support. It's strange, because configure apparently does not find zlib:

checking for ZLIB... no
$ convert -compress zip uncompressed.dqm.pdf compressed.dqm.pdf
convert: delegate library support not built-in `compressed.dqm.pdf' (ZLIB) @ error/pdf.c/WritePDFImage/1365

while the resulting ImageMagick libraries do link libz:

$ ldd /wizard/17/CALET-software/install/IMAGEMAGICK_6.9.2-0/lib/libMagick++-6.Q16HDRI.so | grep libz
        libz.so.1 => /usr/lib/libz.so.1 (0x00276000)

On another system where there's no problem with zlib, I get much smaller PDFs with the same code. So now I'm digging into the compilation of ImageMagick, which I guess compresses the PDF automatically when compiled with zlib support. Thanks everybody for the suggestions.
