[hocr] Convert hocr to pdf

Tristelune · 2015-01-21 22:12:49

Hello,

I scan a lot, and I wanted a way to get better accuracy after running optical character recognition (OCR) on a document.
After discovering the hOCR format, I used it to correct the misspelled words. Since I convert my scanned documents to PDF,
the problem was converting the hOCR document to PDF. I know about hocr2pdf from ExactImage, but its output was poor for me.
Then I discovered the Python script Hocrconverter. Because the maintainer didn't want to port it to Python 3, I did it myself.
I also ran into some problems with the conversion: sometimes the text from the hOCR file was not embedded in the resulting PDF.
I made some changes, and I hope it now works as expected.

I have made a package on the AUR: hocrconverter-git.

Perhaps it will be useful to somebody. Feel free to share any comments!
