Hello,
I scan a lot of documents, and I wanted a way to get better accuracy after running optical character recognition (OCR) on them.
After discovering the hOCR format, I used it to correct the misspelled words. I convert my scanned documents to PDF, so
the problem was converting the hOCR document to PDF. I know about hocr2pdf from ExactImage, but the results were poor for me.
So I found the Python script Hocrconverter. Because the maintainer didn't want to make the changes for Python 3, I did it myself.
I also had some problems with the conversion: sometimes the text from the hOCR file was not embedded in the resulting PDF. I made some
changes, and I hope it now works as expected.
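For anyone who has not looked inside an hOCR file: it is HTML in which each recognized word carries its bounding box in the title attribute, which is what makes it possible both to correct the text and to place it in a PDF later. Here is a minimal sketch of reading the words and applying corrections. It is not the code from Hocrconverter; the names parse_bbox, HocrWordParser, extract_words and apply_corrections are made up for this illustration, and it assumes the words are marked with class ocrx_word and a "bbox x0 y0 x1 y1" entry in the title, as in Tesseract's hOCR output.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect the text and bounding box of every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []      # list of (text, bbox) tuples
        self._depth = 0      # > 0 while inside a word element
        self._bbox = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested markup inside a word (e.g. <strong>): track nesting only.
            self._depth += 1
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attr.get("title") or "")
            self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._chunks).strip()
                self.words.append((text, self._bbox))


def extract_words(hocr_text):
    """Parse hOCR markup and return a list of (text, bbox) tuples."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def apply_corrections(words, corrections):
    """Replace misrecognized words using a {wrong: right} mapping."""
    return [(corrections.get(text, text), bbox) for text, bbox in words]


# Hand-written sample in the style of Tesseract's hOCR output.
SAMPLE = """<span class='ocr_line' title='bbox 10 10 200 40'>
<span class='ocrx_word' title='bbox 10 10 90 40; x_wconf 91'>Hello</span>
<span class='ocrx_word' title='bbox 100 10 200 40; x_wconf 62'>wor1d</span>
</span>"""

if __name__ == "__main__":
    for text, bbox in apply_corrections(extract_words(SAMPLE), {"wor1d": "world"}):
        print(text, bbox)
```

The bounding boxes are kept next to the corrected text because a converter needs them to put the invisible text layer at the right place over the scanned image.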
I have made a package on the AUR: hocrconverter-git.
Perhaps it will be useful to somebody. Feel free to leave any comments!