Convert HTML documentation into one single PDF

orschiro · 2012-03-13 15:48:37

Hello,

I'm looking for an elegant and simple way to convert an HTML documentation/book that consists of several separate HTML pages into one single PDF document.

The whole book can be found here: http://download2.galileo-press.de/openb … _linux.zip

How would you do that?

Preferably in Bash, I guess, but is it also possible in Python?

Regards

Xyne · 2012-03-13 17:54:05

I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.
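A minimal sketch of the Pandoc route, wrapped in a hypothetical helper (`html2pdf_pandoc` is an illustrative name) that takes the output file first and then the input pages in reading order. Pandoc concatenates multiple input files in the order given before converting them, and writing PDF requires a LaTeX engine (pdflatex by default) to be installed.

```bash
#!/bin/bash
# Hypothetical helper: join several HTML pages into one PDF with Pandoc.
# Inputs are concatenated in the order given, so pass them in reading order.
# PDF output goes through LaTeX, so a LaTeX engine must be installed.
html2pdf_pandoc() {
    if [ "$#" -lt 2 ]; then
        echo "usage: html2pdf_pandoc OUTPUT.pdf INPUT.html..." >&2
        return 1
    fi
    local out="$1"
    shift
    # -f html forces the HTML reader even for .htm files.
    pandoc -f html -o "$out" "$@"
}

# Example (hypothetical file names):
# html2pdf_pandoc book.pdf index.html chapter01.html chapter02.html
```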

xs · 2012-03-13 18:08:01

You can just use 'Print to File' and select PDF/PostScript as the output format, which will save your HTML doc as a PDF. I do it all the time for maps etc. I don't see why it wouldn't work for any HTML docs, really. I doubt it will have page navigation, but you could just use your PDF reader for that.

Xyne · 2012-03-13 18:21:59

If you do that, then the following script can be used to join the different PDF documents into a single PDF document:

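#!/bin/bash
# Usage: SCRIPT OUTPUT.pdf INPUT1.pdf [INPUT2.pdf ...]
if [ "$#" -lt 2 ]; then
    echo "usage: $0 OUTPUT.pdf INPUT.pdf..." >&2
    exit 1
fi
OUT="$1"
shift
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$OUT" "$@"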

The first argument is the output file name, followed by the input files.

Trilby · 2012-03-13 18:29:56

princeXML is a free (but not open source) tool in the AUR that may be worth a look.

firecat53 · 2012-03-13 18:34:49

I've been using wkhtmltopdf-static daily for a couple of years now to convert several html pages of orders into a single pdf. I prefer the statically linked version because it gives you some additional capabilities (like adding the header/footer). It can also be used on a server without an X server running as long as the X libs are installed. The command is something like:

wkhtmltopdf -O Landscape -B 0 -L 0 -R 0 -T .3cm -s Letter --header-center "Header Text" --header-font-size 7 <input_file1> <input_file2> <input_file3> <output_file>

Scott
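The wkhtmltopdf command above takes the input pages in the order they are listed, so the main chore for a whole book is getting that order right. A minimal sketch, assuming all pages sit in one directory and that natural ("version") sorting of their names matches the reading order (page2.html before page10.html); `book_to_pdf_wkhtml` is a hypothetical name, and GNU `sort -V` plus Bash 4 (`mapfile`) are assumed.

```bash
#!/bin/bash
# Hypothetical helper: feed every HTML page of a directory to wkhtmltopdf
# in natural order and write a single PDF.
# Assumes file names contain no newlines and that GNU sort -V yields the
# book's reading order.
book_to_pdf_wkhtml() {
    local dir="$1" out="$2"
    if [ ! -d "$dir" ] || [ -z "$out" ]; then
        echo "usage: book_to_pdf_wkhtml DIR OUTPUT.pdf" >&2
        return 1
    fi
    local pages=()
    # Matches both .htm and .html files directly inside DIR.
    mapfile -t pages < <(find "$dir" -maxdepth 1 -type f -name '*.htm*' | sort -V)
    if [ "${#pages[@]}" -eq 0 ]; then
        echo "no HTML files in $dir" >&2
        return 1
    fi
    # Page size option as in the post above; inputs first, output last.
    wkhtmltopdf -s A4 "${pages[@]}" "$out"
}

# Example (hypothetical paths):
# book_to_pdf_wkhtml ./openbook_linux book.pdf
```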

Stemp · 2012-03-13 19:10:43

webkit2pdf
Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them). It allows specifying paper size and output directory.

skanky · 2012-03-14 09:35:32

Xyne wrote:

I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.

There is a version of pandoc in the AUR that builds it and then removes Haskell again. https://aur.archlinux.org/packages.php?ID=32490
I haven't used it, though.

orschiro · 2012-03-14 11:49:51

Thank you all for your responses.

webkit2pdf looks the most promising.

I quickly created a PKGBUILD to test it, since I couldn't find one yet.

However, the build failed.

Here is the PKGBUILD:

# Maintainer: Robert Orzanna <orschiro@googlemail.com>
pkgname=webkit2pdf
pkgver=0.2
pkgrel=1
pkgdesc="Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them)."
arch=('x86_64' 'i686')
license=('GPL2')
url="http://webkit2pdf.sourceforge.net/"
depends=('libwebkit' 'gtk2' 'poppler-glib')
source=("http://sourceforge.net/projects/webkit2pdf/files/webkit2pdf/$pkgver/$pkgname-$pkgver.tar.gz")
md5sums=('81f069a1d998b9d4f0edef0ba280ede1')

build() {
  cd "$srcdir/$pkgname-$pkgver"
  ./configure --prefix=/usr
  make
}

package() {
  cd "$srcdir/$pkgname-$pkgver"
  make DESTDIR="$pkgdir" install
  install -d "$pkgdir/usr/share/$pkgname"
}

And the error message:

main.o: In function `load_done':
main.c:(.text+0xb69): undefined reference to `poppler_page_render_to_pixbuf'
collect2: ld returned 1 exit status
make[2]: *** [webkit2pdf] Error 1
make[2]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2'
make: *** [all] Error 2

Installed packages besides:

poppler-glib 0.18.3-2
libwebkit 1.6.3-1
gtk2 2.24.10-3

The website of webkit2pdf states that dev packages of poppler-glib have to be installed too. Are they included in the poppler-glib package?

Last edited by orschiro (2012-03-14 16:36:04)

N30N · 2012-03-14 18:32:15

poppler_page_render_to_pixbuf is deprecated, and there's a patch for it in the bug tracker, so I added webkit2pdf to the AUR (feel free to adopt it).

karol · 2012-03-15 14:27:46

weasyprint - Converts web documents (HTML, CSS, SVG, ...) to PDF.
https://aur.archlinux.org/packages.php?ID=57621

It just entered the AUR; I haven't tried it.
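The weasyprint command-line tool converts one document at a time (`weasyprint INPUT OUTPUT`), so for a multi-page book it can be combined with the Ghostscript merge approach from this thread. A minimal sketch under that assumption; `html2pdf_weasy` is a hypothetical helper that takes the output file first and then the pages in reading order, converts each page into a temporary PDF, and joins the parts with the same `gs` options Xyne posted.

```bash
#!/bin/bash
# Hypothetical helper: convert each page with weasyprint, then merge the
# per-page PDFs into one file with Ghostscript (pdfwrite device).
html2pdf_weasy() {
    if [ "$#" -lt 2 ]; then
        echo "usage: html2pdf_weasy OUTPUT.pdf INPUT.html..." >&2
        return 1
    fi
    local out="$1"
    shift
    local tmp
    tmp=$(mktemp -d) || return 1
    local parts=() i=0 page part status
    for page in "$@"; do
        i=$((i + 1))
        part="$tmp/part$i.pdf"
        # Stop and clean up as soon as one page fails to convert.
        if ! weasyprint "$page" "$part"; then
            rm -rf "$tmp"
            return 1
        fi
        parts+=("$part")
    done
    # The parts array preserves the input order for the merge.
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" "${parts[@]}"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```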

orschiro · 2012-03-15 18:07:21

@karol,

Judging by the sample, that one produces a nice visual result.

http://weasyprint.org/samples/CSS21-intro.pdf

However, it seems that it, like webkit2pdf, cannot deal with links, which is a real pity when you have internal anchors in documentation.

skottish · 2012-03-17 02:52:44

I've used htmldoc in the past. I can't remember if it's any good though.
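For reference, a minimal sketch of an HTMLDOC call, wrapped in a hypothetical helper (`html2pdf_htmldoc`) that takes the output file first and then the inputs in reading order. `--webpage` treats each input as an ordinary, unstructured web page; `--book` would instead treat the files as a structured book, building chapters and a table of contents from the headings.

```bash
#!/bin/bash
# Hypothetical helper: combine several HTML files into one PDF with HTMLDOC.
# -f names the output file; inputs are processed in the order given.
html2pdf_htmldoc() {
    if [ "$#" -lt 2 ]; then
        echo "usage: html2pdf_htmldoc OUTPUT.pdf INPUT.html..." >&2
        return 1
    fi
    local out="$1"
    shift
    htmldoc --webpage -f "$out" "$@"
}
```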

defears · 2012-03-17 11:42:05

LibreOffice Writer is the only program I've found that exports HTML to PDF while preserving links (there are settings for different kinds of links). Then just merge the PDFs together.

http://www.archlinux.org/packages/extra … pdfimport/

vuokkosetae · 2012-09-11 13:41:47

Sorry to revive a thread that is over six months old, but it is exactly what I'm looking for. There are good suggestions here, but I haven't been able to test all of them yet.

I have documentation with image maps (class diagrams) and lots of links across the documentation. The links are all relative, so is there a tool that generates a single PDF from hundreds of HTML files and still keeps track of the links? That's the kind of killer app I'm looking for. Any good suggestions?

nikigetchev · 2015-03-23 19:28:01

If the HTML files are online, the web-based converter http://websitetopdf.net/ should do the job.

Arch Linux
