Hi,
I'm trying to use gscan2pdf with tesseract to do OCR (optical character recognition) in Portuguese. Unfortunately, the gscan2pdf GUI only offers English as a tesseract language, even though the Portuguese data files are installed, as you can see:
$ ls /usr/share/tessdata/por.* -lh
-rw-r--r-- 1 root root 970 mai 30 2008 /usr/share/tessdata/por.DangAmbigs
-rw-r--r-- 1 root root 3,0K mai 30 2008 /usr/share/tessdata/por.freq-dawg
-rw-r--r-- 1 root root 1,4M mai 30 2008 /usr/share/tessdata/por.inttemp
-rw-r--r-- 1 root root 58K mai 30 2008 /usr/share/tessdata/por.normproto
-rw-r--r-- 1 root root 1,1K mai 30 2008 /usr/share/tessdata/por.pffmtable
-rw-r--r-- 1 root root 843 mai 30 2008 /usr/share/tessdata/por.unicharset
-rw-r--r-- 1 root root 9 mai 30 2008 /usr/share/tessdata/por.user-words
-rw-r--r-- 1 root root 2,0M mai 30 2008 /usr/share/tessdata/por.word-dawg
One question and one suggestion:
1) How does gscan2pdf know which languages are available?
2) It would be nice to be able to save in a "dual mode": the TIFF together with the text file created by OCR.
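Regarding question 1, here is a small sketch of one way a front end could find the installed languages. This is only an assumption on my part: I have not checked how gscan2pdf actually does it. The sketch scans the tessdata directory for files named `<code>.unicharset` (as in the listing above) and treats the part before the dot as the language code. The function name `installed_tesseract_languages` is made up for illustration.

```python
from pathlib import Path


def installed_tesseract_languages(tessdata_dir="/usr/share/tessdata"):
    """Return the sorted language codes found in tessdata_dir.

    Assumption: every installed language ships a <code>.unicharset file.
    This is NOT taken from gscan2pdf's source; it is only a guess at how
    a front end might enumerate languages.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # No tessdata directory at this path: report no languages.
        return []
    # "por.unicharset" -> "por"
    return sorted(path.stem for path in directory.glob("*.unicharset"))


if __name__ == "__main__":
    print(installed_tesseract_languages())
```

If this prints a list that includes `por` on my machine but the GUI still only shows English, then gscan2pdf is presumably detecting languages some other way.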
Last edited by manouchk (2009-06-10 16:02:54)