[0916 Update]
Added 2 more demo pages:
http://coolwanglu.github.com/pdf2htmlEX/demo/cheat.html
http://coolwanglu.github.com/pdf2htmlEX … eneve.html
* Completely removed Boost
* Relaxed the C++11 dependency; GCC 4.4.6 or later is supported
* Links are now supported (in-document jumps are accurate to the page)
* Fixed an encoding problem for some fonts.
Demo comes first:
http://coolwanglu.github.com/pdf2htmlEX/demo/demo.html
Another (with CJK):
http://coolwanglu.github.com/pdf2htmlEX/demo/chn.html
Home page:
https://github.com/coolwanglu/pdf2htmlEX
Special thanks to Arthur Titeica for the AUR package.
https://aur.archlinux.org/packages.php?ID=62426
There are basically 2 types of pdf-to-html converters:
One is roughly a pdf-to-text converter with a few pre-defined formats in HTML.
The other is a render-everything-as-images converter, which loses all text and generates huge files.
But pdf2htmlEX takes the advantages of both, retaining both text and styling.
Features:
1. Extract and embed fonts from PDF
2. Optimized for the web while keeping the rendering precise
3. Non-text objects are rendered as images
4. Single-file output mode -- I know you hate separated font/image files
To compile & install, you need:
a recent poppler (>=0.20.3); make sure '--enable-xpdf-headers' is passed to configure
the latest git version of fontforge (https://github.com/fontforge/fontforge), because I submitted a few features/bugs for pdf2htmlEX
the boost c++ library (see the project home page for the detailed list of required components)
cmake
a GCC that supports c++11
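A rough way to check the prerequisites listed above before building. This is only a sketch: it assumes poppler is registered with pkg-config under the module name `poppler`, and the helper names (`parse_version`, `version_at_least`, `poppler_version`) are made up for illustration. It does not verify that poppler was configured with '--enable-xpdf-headers'.

```python
import re
import shutil
import subprocess

MIN_POPPLER = "0.20.3"  # minimum version stated in the post


def parse_version(text):
    """Turn '0.20.3' into (0, 20, 3); non-numeric characters are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", text))


def version_at_least(found, required):
    """Compare dotted version strings numerically, not lexically."""
    return parse_version(found) >= parse_version(required)


def poppler_version():
    """Ask pkg-config for the installed poppler version, or return None."""
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    found = poppler_version()
    if found is None:
        print("poppler not found via pkg-config")
    elif version_at_least(found, MIN_POPPLER):
        print(f"poppler {found} is new enough")
    else:
        print(f"poppler {found} is older than {MIN_POPPLER}")
    # Only checks that the tools are on PATH, not their versions.
    for tool in ("cmake", "g++"):
        print(tool, "found" if shutil.which(tool) else "missing")
```

The numeric comparison matters because a plain string comparison would rank "0.20.10" below "0.20.3".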
Any suggestions, forks/stars on GitHub, or bug reports are appreciated.
Last edited by coolwanglu (2012-09-16 14:41:29)
Offline
I must admit, this is pretty impressive to me, could be a good starting point to get saner pdf->epub.
I know PDF is a pita to handle and parse, but here are some feature wishes:
a) automatically create working links for any valid URLs and mail addresses
b) try to find the table of contents and link it
c) link objects/images to open in a new window/tab, so I can look at them and read the surrounding text more easily.
Will try it over the weekend and give feedback, thanks so far, much appreciated.
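Wish (a) could be sketched roughly like this in Python. This is only an illustration of the idea, not how pdf2htmlEX works or plans to do it; the `linkify` function and the two regular expressions are assumptions and deliberately simplistic.

```python
import html
import re

# Deliberately simple patterns; real URL and mail address grammars are looser.
LINK_RE = re.compile(
    r"(?P<url>https?://[^\s<>\"]+)"
    r"|(?P<mail>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def linkify(text):
    """Escape text for HTML and wrap URLs and mail addresses in <a> tags."""
    parts = []
    pos = 0
    for m in LINK_RE.finditer(text):
        # Plain text between matches is escaped, never linked.
        parts.append(html.escape(text[pos:m.start()]))
        target = m.group(0)
        href = target if m.group("url") else "mailto:" + target
        parts.append('<a href="{}">{}</a>'.format(
            html.escape(href, quote=True), html.escape(target)))
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

One known weakness of this sketch: punctuation directly after a URL (for example a sentence-ending period) is swallowed into the link, so a real implementation would need to trim it.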
Offline
coolwanglu,
Welcome to the Arch Forums. Very nice application you have put together there.
Generally, we reserve these forums for Arch support. That you are using Ubuntu does not violate our rules as you are not asking for support and are not about to create confusion amongst Arch users.
We do, however, strongly encourage users to use our build system so that Pacman (our package manager) can keep abreast of the system files that are installed. I see you asked if anyone would like to help package this for Arch. We have a subforum for just that purpose; I am wondering if this thread should perhaps be moved to that subforum.
a) What are your thoughts on the move?
b) Why not make the move to Arch?
Offline
Hi.
I've made an initial AUR package.
While it's functional as it is with the stable fontforge from [extra], the HTML rendering is a bit odd in my tests. I plan to check with fontforge-git later.
Thanks for the cool program. Just the other day I was looking for something like that and couldn't find anything functional.
Edit: with fontforge-git everything looks beautiful
Last edited by roentgen (2012-09-01 12:09:09)
Offline
I must admit, this is pretty impressive to me, could be a good starting point to get saner pdf->epub.
I know PDF is a pita to handle and parse, but here are some feature wishes:
a) automatically create working links for any valid URLs and mail addresses
b) try to find the table of contents and link it
c) link objects/images to open in a new window/tab, so I can look at them and read the surrounding text more easily.
Will try it over the weekend and give feedback, thanks so far, much appreciated.
Thanks for your attention.
So far I've been working on handling all kinds of text/font stuff. In future versions I'm planning to support other objects (images, links, drawings, etc.) "natively" in HTML. So (a) and (b) are in the plan.
But not sure about (c), what do you expect to see in a new tab after clicking an image?
Offline
coolwanglu,
Welcome to the Arch Forums. Very nice application you have put together there.
Generally, we reserve these forums for Arch support. That you are using Ubuntu does not violate our rules as you are not asking for support and are not about to create confusion amongst Arch users.
We do, however, strongly encourage users to use our build system so that Pacman (our package manager) can keep abreast of the system files that are installed. I see you asked if anyone would like to help package this for Arch. We have a subforum for just that purpose; I am wondering if this thread should perhaps be moved to that subforum.
a) What are your thoughts on the move.
b) Why not make the move to Arch?
Hello,
sorry if I have posted in the wrong subforum.
I just saw the description "A place for true innovation. Share your own created utilities with the Arch community." and came in.
I didn't intend to advertise that Ubuntu PPA; I just put up a general description and wanted to spread the word about this tool.
As one user has already kindly made a package, I'll remove the PPA line and add a link to the package instead.
Would this be OK?
Offline
Hi.
I've made an initial AUR package.
While it's functional as it is with the stable fontforge from extra the html rendering is a bit odd in my tests. I plan later to check on fontforge-git.
Thanks for the cool program. Just the other day I was looking for something like that and couldn't find anything functional.
Thank you very much!
I'll put it into the git repo.
I've recently submitted a few features/bugs to fontforge for pdf2htmlEX, so the scripts may not work with earlier versions of fontforge.
I think the 'odd' rendering you saw was incorrect fonts, as no fonts were actually generated.
Please do check with the latest version and see if it works.
Offline
Hi.
I've made an initial AUR package.
While it's functional as it is with the stable fontforge from extra the html rendering is a bit odd in my tests. I plan later to check on fontforge-git.
Thanks for the cool program. Just the other day I was looking for something like that and couldn't find anything functional.
Edit: with fontforge-git everything looks beautiful
Glad to hear that
Offline
Hi.
I've made an initial AUR package.
While it's functional as it is with the stable fontforge from extra the html rendering is a bit odd in my tests. I plan later to check on fontforge-git.
Thanks for the cool program. Just the other day I was looking for something like that and couldn't find anything functional.
Edit: with fontforge-git everything looks beautiful
I'm not sure about linking your AUR package.
Shall I put the file into the git repo, or link to the file in aur.archlinux.org? Which one is better?
Offline
I'm not sure about linking your AUR package.
Shall I put the file into the git repo, or link to the file in aur.archlinux.org? Which one is better?
Just linking to the page is fine for archlinux users. Thanks.
Offline
coolwanglu wrote: I'm not sure about linking your AUR package.
Shall I put the file into the git repo, or link to the file in aur.archlinux.org? Which one is better?
Just linking to the page is fine for archlinux users. Thanks.
Done.
Offline
Hi.
I've made an initial AUR package.
While it's functional as it is with the stable fontforge from extra the html rendering is a bit odd in my tests. I plan later to check on fontforge-git.
Thanks for the cool program. Just the other day I was looking for something like that and couldn't find anything functional.
Edit: with fontforge-git everything looks beautiful
I've updated several things in the devv branch.
The good news is that fontforge-git is no longer required; a recent release should be enough. And fontforge is now linked directly instead of being called through scripts, so it should be faster.
The bad news is that fontforge.so is not officially supported, so I'm doing something heuristic in CMakeLists.txt.
Can you please check if it works in Arch Linux?
Offline
The good news is that fontforge-git is no longer required; a recent release should be enough. And fontforge is now linked directly instead of being called through scripts, so it should be faster.
The bad news is that fontforge.so is not officially supported, so I'm doing something heuristic in CMakeLists.txt.
Can you please check if it works in Arch Linux?
As far as I've tested (with fontforge 20120731_b-1) things go pretty bad and ugly. The page is mis-aligned, certain characters are missing (diacritics).
This is the output when converting a pdf.
Working: Warning: encoding confliction detected in font: f1
Warning: fontforge failed.
Warning: cannot read font info for f1
Warning: encoding confliction detected in font: f2
Warning: fontforge failed.
Warning: cannot read font info for f2
Warning: encoding confliction detected in font: f3
Warning: fontforge failed.
Warning: cannot read font info for f3
Warning: fontforge failed.
Warning: cannot read font info for f4
Warning: encoding confliction detected in font: f5
Warning: fontforge failed.
Warning: cannot read font info for f5
Warning: fontforge failed.
Warning: cannot read font info for f6
Warning: encoding confliction detected in font: f7
Warning: fontforge failed.
Warning: cannot read font info for f7
Warning: fontforge failed.
Warning: cannot read font info for f8
Warning: fontforge failed.
Warning: cannot read font info for f9
..Warning: encoding confliction detected in font: fa
Warning: fontforge failed.
Warning: cannot read font info for fa
....Warning: encoding confliction detected in font: fb
Warning: fontforge failed.
Warning: cannot read font info for fb
................................Warning: fontforge failed.
Warning: cannot read font info for fc
.....Warning: fontforge failed.
Warning: cannot read font info for fd
......
Any ideas?
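For anyone hitting similar output, a small throwaway script can group these warnings by font. A sketch, assuming the log lines have exactly the wording shown above; `summarize` is a made-up helper, not part of pdf2htmlEX.

```python
import re
from collections import defaultdict

# Wording copied from the converter output quoted in this post.
CONFLICT_RE = re.compile(r"encoding confliction detected in font: (\w+)")
UNREADABLE_RE = re.compile(r"cannot read font info for (\w+)")


def summarize(log_text):
    """Map each font id to the set of warning kinds reported for it."""
    fonts = defaultdict(set)
    for line in log_text.splitlines():
        # search() rather than match(): progress dots may precede "Warning:".
        m = CONFLICT_RE.search(line)
        if m:
            fonts[m.group(1)].add("encoding conflict")
        m = UNREADABLE_RE.search(line)
        if m:
            fonts[m.group(1)].add("unreadable font info")
    return dict(fonts)
```

The generic "fontforge failed." lines carry no font id, so the sketch skips them and relies on the "cannot read font info" line that follows each one.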
Offline
Any ideas?
Are you using the latest devv branch?
There should never be a 'fontforge failed' message; it should have been removed.
Offline
Indeed. I updated the AUR package to use the devv branch and now everything is fine with fontforge from [extra].
Thanks.
Offline
Two more beautiful demo pages have been included.
Please check out the github page.
Offline
Indeed. I updated the AUR package to use the devv branch and now everything is fine with fontforge from [extra].
Thanks.
I've removed the dependency on boost, and pushed everything into the master branch.
Could you please change the AUR package accordingly, and check if it still works on Arch?
Thanks!!
Offline
I've removed the dependency on boost, and pushed everything into the master branch.
Could you please change the AUR package accordingly, and check if it still works on Arch?
Thanks for the updates. Everything looks fine. I updated the AUR package.
Offline