Hello,
I'm looking for an elegant and simple way to convert an HTML documentation/book that consists of several single HTML pages into one single PDF document.
The whole book can be found here: http://download2.galileo-press.de/openb … _linux.zip
How would you do that?
Preferably in Bash I guess but is it also possible in Python?
Regards
Offline
I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.
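Since the question also asks about Python, here is a minimal sketch of driving pandoc from a script. It assumes the page file names sort into reading order and that a LaTeX engine is installed, which pandoc needs for PDF output. The names `pandoc_command` and `convert` are made up for illustration, and how well pandoc copes with several complete HTML documents fed in at once would need testing on the actual book.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_dir, output_pdf):
    """Build a pandoc command line covering every .html file in html_dir."""
    # Assumption: sorting by file name gives the intended reading order.
    pages = sorted(str(p) for p in Path(html_dir).glob("*.html"))
    if not pages:
        raise ValueError(f"no .html files found in {html_dir}")
    # pandoc concatenates multiple input files before parsing them;
    # the .pdf extension given to -o selects PDF output.
    return ["pandoc", "-f", "html", *pages, "-o", str(output_pdf)]


def convert(html_dir, output_pdf):
    """Run pandoc; raises CalledProcessError if pandoc reports a failure."""
    subprocess.run(pandoc_command(html_dir, output_pdf), check=True)
```

Calling `convert("book", "book.pdf")` would then run pandoc over all pages in the (hypothetical) directory `book`.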
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
Offline
You can just use 'print to file' and select PDF/PostScript as the output, which will write your HTML doc to a PDF. I do it all the time for maps etc. I don't see why it wouldn't work for any HTML docs, really. I doubt it will have page navigation, but you could just use your PDF reader for that.
I like pie. Especially with a side of Arch.
Offline
If you do that, then the following script can be used to join the different PDF documents into a single PDF document:
#!/bin/bash
OUT="$1"
shift
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="$OUT" "$@"
The first argument is the output file name, followed by the input files.
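For anyone who prefers Python, the same Ghostscript call can be wrapped roughly as below. This is only a sketch: the function names `gs_merge_command` and `merge_pdfs` are invented here, and the gs options are exactly the ones from the script above.

```python
import subprocess


def gs_merge_command(output_pdf, input_pdfs):
    """Build the Ghostscript command that joins input_pdfs into output_pdf."""
    input_pdfs = list(input_pdfs)
    if not input_pdfs:
        raise ValueError("at least one input PDF is required")
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={output_pdf}",
        *input_pdfs,  # pages appear in the order the files are listed
    ]


def merge_pdfs(output_pdf, input_pdfs):
    """Run Ghostscript; raises CalledProcessError if gs exits non-zero."""
    subprocess.run(gs_merge_command(output_pdf, input_pdfs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.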
Offline
princeXML is a free (but not open source) tool in the AUR that may be worth a look.
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Offline
I've been using wkhtmltopdf-static daily for a couple of years now to convert several html pages of orders into a single pdf. I prefer the statically linked version because it gives you some additional capabilities (like adding the header/footer). It can also be used on a server without an X server running as long as the X libs are installed. The command is something like:
wkhtmltopdf -O Landscape -B 0 -L 0 -R 0 -T .3cm -s Letter --header-center "Header Text" --header-font-size 7 <input_file1> <input_file2> <input_file3> <output_file>
Scott
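For a book split over many pages, the input files have to be passed in reading order. A rough Python sketch of one way to get that order is shown below. It assumes the book has an index page that links to the chapters in order, which has not been checked against the actual archive. The helper names are hypothetical, and only wkhtmltopdf options that appear in the command above are used.

```python
from html.parser import HTMLParser
from pathlib import Path


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def pages_in_reading_order(index_file):
    """Return the index page followed by the local pages it links to."""
    index_file = Path(index_file)
    collector = _LinkCollector()
    collector.feed(index_file.read_text(encoding="utf-8", errors="replace"))
    pages = [str(index_file)]
    for href in collector.hrefs:
        target = href.split("#", 1)[0]  # drop the anchor part
        # Skip external links and anything that is not an HTML page.
        if "://" in target or not target.endswith((".html", ".htm")):
            continue
        path = str(index_file.parent / target)
        if path not in pages:  # keep only the first occurrence
            pages.append(path)
    return pages


def wkhtmltopdf_command(pages, output_pdf, header_text=None):
    """Build a wkhtmltopdf command: options, input pages, then the output."""
    cmd = ["wkhtmltopdf", "-s", "Letter"]
    if header_text is not None:
        # Header options; the post above says these need the static build.
        cmd += ["--header-center", header_text, "--header-font-size", "7"]
    return cmd + list(pages) + [str(output_pdf)]
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`.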
Offline
webkit2pdf
Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them). It allows specifying paper size and output directory.
Sorry for my English - Home Page - «Violence never settles anything.» : Genghis Khan, 1162-1227
Offline
I would try Pandoc. It should work as long as the HTML is standard.
Pandoc is available in the [haskell] repo (haskell-pandoc). Of course, if you don't have ghc installed, you may feel that ghc is too large a dependency.
There is a version of pandoc in the AUR that builds then removes haskell. https://aur.archlinux.org/packages.php?ID=32490
Not used it though.
"...one cannot be angry when one looks at a penguin." - John Ruskin
"Life in general is a bit shit, and so too is the internet. And that's all there is." - scepticisle
Offline
Thank you all for your responses.
webkit2pdf looks the most promising.
I quickly created a PKGBUILD to test it since I could not find an existing one.
However, the building process failed.
Here is the PKGBUILD:
# Maintainer: Robert Orzanna <orschiro@googlemail.com>
pkgname=webkit2pdf
pkgver=0.2
pkgrel=1
pkgdesc="Webkit2pdf is a little tool designed to fetch web pages and export them to numbered PDF files (or to print them)."
arch=('x86_64' 'i686')
license=('GPL2')
url="http://webkit2pdf.sourceforge.net/"
depends=('libwebkit' 'gtk2' 'poppler-glib')
source=(http://sourceforge.net/projects/webkit2pdf/files/webkit2pdf/0.2/$pkgname-$pkgver.tar.gz)
md5sums=('81f069a1d998b9d4f0edef0ba280ede1')
build() {
  cd $startdir/src/
  cd $pkgname-$pkgver
  ./configure --prefix=/usr
  make || return 1
  make DESTDIR=$startdir/pkg install || return 1
  install -d $startdir/pkg/usr/share/$pkgname
}
And the error message:
main.o: In function `load_done':
main.c:(.text+0xb69): undefined reference to `poppler_page_render_to_pixbuf'
collect2: ld returned 1 exit status
make[2]: *** [webkit2pdf] Error 1
make[2]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/user/webkit2pdf/src/webkit2pdf-0.2'
make: *** [all] Error 2
Relevant installed packages:
poppler-glib 0.18.3-2
libwebkit 1.6.3-1
gtk2 2.24.10-3
The website of webkit2pdf states that dev packages of poppler-glib have to be installed too. Are they included in the poppler-glib package?
Last edited by orschiro (2012-03-14 16:36:04)
Offline
poppler_page_render_to_pixbuf is deprecated. There's a patch in the bug tracker, so I added webkit2pdf to the AUR (feel free to adopt it).
Offline
weasyprint - Converts web documents (HTML, CSS, SVG, ...) to PDF.
https://aur.archlinux.org/packages.php?ID=57621
It just entered AUR, haven't tried it.
Offline
@karol,
Nice visual result for that one according to the example.
http://weasyprint.org/samples/CSS21-intro.pdf
However, it seems that it, like webkit2pdf, cannot deal with links, which is a real pity when you have internal anchors in documentation.
Offline
I've used htmldoc in the past. I can't remember if it's any good though.
Offline
LibreOffice Writer is the only program I've found that exports HTML to PDF while preserving links. (There are settings for different kinds of links.) Then just merge the PDFs together.
Offline
Sorry to revive a thread that is over 6 months old. However, the thread is exactly what I'm looking for. Good suggestions, but I haven't been able to test all of them yet.
I have documentation with image maps (class diagrams) and lots of links across the documentation. The links are all relative, so is there a tool that turns hundreds of HTML files into a single PDF and still keeps track of the links? That is the kind of killer app I'm looking for. Any good suggestions?
Offline
If the html files are online, the converter http://websitetopdf.net/ should do the job. This converter is web-based.
Offline