#1 2012-03-13 15:48:37

orschiro
Member
Registered: 2009-06-04
Posts: 2,136

Convert HTML documentation into one single PDF

Hello,

I'm looking for an elegant and simple way to convert HTML documentation (a book) that consists of several separate HTML pages into one single PDF document.

The whole book can be found here: http://download2.galileo-press.de/openb … _linux.zip

How would you do that?

Preferably in Bash, I guess, but is it also possible in Python?

Regards

#2 2012-03-13 17:54:05

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965

Re: Convert HTML documentation into one single PDF

I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.
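
A minimal sketch of the Pandoc route, assuming pandoc is installed together with a LaTeX engine (which Pandoc needs for PDF output). The function name html_book_to_pdf and the file names are hypothetical, not something from the thread:

```shell
#!/bin/sh
# Hypothetical helper: convert several HTML pages into one PDF with Pandoc.
# Usage: html_book_to_pdf OUTPUT.pdf PAGE1.html [PAGE2.html ...]
html_book_to_pdf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: html_book_to_pdf OUTPUT.pdf PAGE.html..." >&2
    return 2
  fi
  out=$1
  shift
  # Pandoc concatenates multiple input files before converting them,
  # so the pages end up in the order given on the command line.
  pandoc --from=html --output="$out" "$@"
}
```

It could be called as html_book_to_pdf book.pdf index.html chapter01.html chapter02.html. How well the result looks will depend on how standard the HTML is, as noted above.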


My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone

#3 2012-03-13 18:08:01

xs
Member
From: San Jose, CA.
Registered: 2011-04-06
Posts: 92

Re: Convert HTML documentation into one single PDF

You can just use 'print to file' and select PDF/PostScript as the output, which will save your HTML document as a PDF. I do it all the time for maps etc., and I don't see why it wouldn't work for any HTML document. I doubt the result will have page navigation, but you could just use your PDF reader for that.


I like pie. Especially with a side of Arch.

#4 2012-03-13 18:21:59

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,965

Re: Convert HTML documentation into one single PDF

If you do that, then the following script can be used to join the different PDF documents into a single PDF document:

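#!/bin/bash
# Usage: <script> OUTPUT.pdf INPUT1.pdf [INPUT2.pdf ...]

# The first argument is the output file; the remaining ones are the inputs.
OUT="$1"
shift
# Concatenate the input PDFs, in the order given, into "$OUT".
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$OUT" "$@"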

The first argument is the output file name, followed by the input files.
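
As a usage sketch, the same gs invocation can be wrapped in a function that refuses to run when an input file is missing. The name merge_pdfs and the file names below are illustrative assumptions:

```shell
# Hypothetical wrapper around the gs call above.
# Usage: merge_pdfs OUTPUT.pdf INPUT1.pdf [INPUT2.pdf ...]
merge_pdfs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf..." >&2
    return 2
  fi
  out=$1
  shift
  # Check every input first so gs is not started with a bad file list.
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "merge_pdfs: cannot read '$f'" >&2
      return 1
    fi
  done
  gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
}
```

For example: merge_pdfs book.pdf chapter01.pdf chapter02.pdf. When a glob such as chapter*.pdf is used instead, the shell sorts the matches lexicographically, so zero-padded names keep the chapters in order.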



#5 2012-03-13 18:29:56

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330

Re: Convert HTML documentation into one single PDF

princeXML is a free (but not open source) tool in the AUR that may be worth a look.


"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

#6 2012-03-13 18:34:49

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542

Re: Convert HTML documentation into one single PDF

I've been using wkhtmltopdf-static daily for a couple of years now to convert several html pages of orders into a single pdf. I prefer the statically linked version because it gives you some additional capabilities (like adding the header/footer). It can also be used on a server without an X server running as long as the X libs are installed. The command is something like:

wkhtmltopdf -O Landscape -B 0 -L 0 -R 0 -T .3cm -s Letter --header-center "Header Text" --header-font-size 7 <input_file1> <input_file2> <input_file3> <output_file>

Scott
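
For a book with many pages, listing every input by hand gets tedious. The following is a rough sketch, assuming the pages sit in one directory and sort into the right order by file name; the function name html_dir_to_pdf is hypothetical:

```shell
# Hypothetical helper: render every *.html page in a directory into one PDF.
# Usage: html_dir_to_pdf DIRECTORY OUTPUT.pdf
html_dir_to_pdf() {
  dir=$1
  out=$2
  # Replace the positional parameters with the pages; glob results are sorted.
  set -- "$dir"/*.html
  # If nothing matched, the pattern is left unexpanded and does not exist.
  if [ ! -e "$1" ]; then
    echo "html_dir_to_pdf: no .html files in '$dir'" >&2
    return 1
  fi
  # All inputs come first; the last argument is the output file.
  wkhtmltopdf --page-size Letter --orientation Landscape "$@" "$out"
}
```

If the file names do not reflect the reading order (for example index.html plus named chapters), the list would have to be written out explicitly instead.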

#7 2012-03-13 19:10:43

Stemp
Member
From: Paris, Europe
Registered: 2011-04-26
Posts: 61

Re: Convert HTML documentation into one single PDF

webkit2pdf
Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them). It allows specifying paper size and output directory.


Sorry for my English - Home Page - «Violence never settles anything.» : Genghis Khan, 1162-1227

#8 2012-03-14 09:35:32

skanky
Member
From: WAIS
Registered: 2009-10-23
Posts: 1,847

Re: Convert HTML documentation into one single PDF

Xyne wrote:

I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.

There is a version of pandoc in the AUR that builds pandoc and then removes Haskell again: https://aur.archlinux.org/packages.php?ID=32490
I haven't used it, though.


"...one cannot be angry when one looks at a penguin."  - John Ruskin
"Life in general is a bit shit, and so too is the internet. And that's all there is." - scepticisle

#9 2012-03-14 11:49:51

orschiro
Member
Registered: 2009-06-04
Posts: 2,136

Re: Convert HTML documentation into one single PDF

Thank you all for your responses.

webkit2pdf looks the most promising.

I quickly created a PKGBUILD to test it, since I could not find an existing package.

However, the build failed.

Here is the PKGBUILD:

# Maintainer: Robert Orzanna <orschiro@googlemail.com>
pkgname=webkit2pdf
pkgver=0.2
pkgrel=1
pkgdesc="Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them)."
arch=('x86_64' 'i686')
license=('GPL2')
url="http://webkit2pdf.sourceforge.net/"
depends=('libwebkit' 'gtk2' 'poppler-glib')
source=("http://sourceforge.net/projects/webkit2pdf/files/webkit2pdf/$pkgver/$pkgname-$pkgver.tar.gz")
md5sums=('81f069a1d998b9d4f0edef0ba280ede1')

# Compile in the extracted source tree ($srcdir is set by makepkg).
build() {
  cd "$srcdir/$pkgname-$pkgver"
  ./configure --prefix=/usr
  make
}

# Install into the staging directory ($pkgdir) that makepkg packages up.
package() {
  cd "$srcdir/$pkgname-$pkgver"
  make DESTDIR="$pkgdir" install
  install -d "$pkgdir/usr/share/$pkgname"
}

And the error message:

main.o: In function `load_done':
main.c:(.text+0xb69): undefined reference to `poppler_page_render_to_pixbuf'
collect2: ld returned 1 exit status
make[2]: *** [webkit2pdf] Error 1
make[2]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2'
make: *** [all] Error 2

Relevant installed packages:

poppler-glib 0.18.3-2
libwebkit 1.6.3-1
gtk2 2.24.10-3

The website of webkit2pdf states that dev packages of poppler-glib have to be installed too. Are they included in the poppler-glib package?
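
One way to check this is to list the files that the installed package owns and look for C headers. Arch packages generally ship headers together with the library rather than in separate dev packages. A small sketch, with the function name package_has_headers being a made-up helper:

```shell
# Hypothetical helper: succeed if an installed package ships C header files.
# Usage: package_has_headers PACKAGE
package_has_headers() {
  # pacman -Qlq prints only the paths of the files owned by the package.
  pacman -Qlq "$1" | grep -q '\.h$'
}
```

For example: package_has_headers poppler-glib && echo "headers are included". If that succeeds, missing headers are not the cause of a link error like the one above.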

Last edited by orschiro (2012-03-14 16:36:04)

#10 2012-03-14 18:32:15

N30N
Member
Registered: 2007-04-08
Posts: 273

Re: Convert HTML documentation into one single PDF

poppler_page_render_to_pixbuf is deprecated, and judging by the undefined reference it is no longer provided by the installed poppler. There's a patch in the bug tracker, so I added webkit2pdf to the AUR (feel free to adopt it).

#11 2012-03-15 14:27:46

karol
Archivist
Registered: 2009-05-06
Posts: 25,440

Re: Convert HTML documentation into one single PDF

weasyprint - Converts web documents (HTML, CSS, SVG, ...) to PDF.
https://aur.archlinux.org/packages.php?ID=57621

It just entered the AUR; I haven't tried it.
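
A rough sketch of how weasyprint could be applied to a multi-page book, assuming its command line takes an input file followed by an output file. The helper name pages_to_pdfs is hypothetical, and the resulting per-page PDFs would still need to be merged afterwards (for instance with gs):

```shell
# Hypothetical helper: convert each given HTML page to a PDF next to it.
# Usage: pages_to_pdfs PAGE1.html [PAGE2.html ...]
pages_to_pdfs() {
  for page in "$@"; do
    # chapter01.html becomes chapter01.pdf; stop at the first failure.
    weasyprint "$page" "${page%.html}.pdf" || return 1
  done
}
```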

#12 2012-03-15 18:07:21

orschiro
Member
Registered: 2009-06-04
Posts: 2,136

Re: Convert HTML documentation into one single PDF

@karol,

Nice visual result for that one according to the example.

http://weasyprint.org/samples/CSS21-intro.pdf

However, it seems that it, like webkit2pdf, cannot deal with links, which is a real pity when the documentation has internal anchors.

#13 2012-03-17 02:52:44

skottish
Forum Fellow
From: Here
Registered: 2006-06-16
Posts: 7,942

Re: Convert HTML documentation into one single PDF

I've used htmldoc in the past. I can't remember if it's any good though.

#14 2012-03-17 11:42:05

defears
Member
Registered: 2010-07-26
Posts: 218

Re: Convert HTML documentation into one single PDF

LibreOffice Writer is the only program I've found that exports HTML to PDF while preserving links. (There are settings for different kinds of links.) Then just merge the PDFs together.

http://www.archlinux.org/packages/extra … pdfimport/
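
For many pages, the export could probably be scripted with LibreOffice's headless mode instead of the GUI. This is only a sketch: the helper name lo_pages_to_pdfs is hypothetical, and it is an assumption that the default export keeps links the same way the GUI's link settings do, so the output should be checked.

```shell
# Hypothetical helper: export each HTML page to PDF with headless LibreOffice.
# Usage: lo_pages_to_pdfs OUTDIR PAGE1.html [PAGE2.html ...]
lo_pages_to_pdfs() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  # One PDF per input page is written into the output directory.
  soffice --headless --convert-to pdf --outdir "$outdir" "$@"
}
```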

#15 2012-09-11 13:41:47

vuokkosetae
Member
Registered: 2009-03-12
Posts: 21

Re: Convert HTML documentation into one single PDF

Sorry to revive a thread that is over six months old, but it is exactly what I'm looking for. There are good suggestions here, but I haven't been able to test all of them yet.

I have documentation with image maps (class diagrams) and a lot of links across the documentation. The links are all relative, so is there a tool that generates a single PDF from hundreds of HTML files and still keeps the links working? That is the kind of killer app I'm looking for. Any good suggestions?

#16 2015-03-23 19:28:01

nikigetchev
Member
Registered: 2015-03-23
Posts: 1

Re: Convert HTML documentation into one single PDF

If the HTML files are online, the web-based converter http://websitetopdf.net/ should do the job.
