#1 2009-06-10 15:48:58

manouchk
Member
Registered: 2008-07-29
Posts: 306

tesseract in gscan2pdf

Hi,

I'm trying to use gscan2pdf with tesseract to do OCR (optical character recognition) in Portuguese. Unfortunately, the gscan2pdf GUI only offers me the English option for tesseract, even though the Portuguese language data is installed, as you can see:

$ ls /usr/share/tessdata/por.* -lh
-rw-r--r-- 1 root root  970 mai   30  2008 /usr/share/tessdata/por.DangAmbigs
-rw-r--r-- 1 root root 3,0K mai   30  2008 /usr/share/tessdata/por.freq-dawg
-rw-r--r-- 1 root root 1,4M mai   30  2008 /usr/share/tessdata/por.inttemp
-rw-r--r-- 1 root root  58K mai   30  2008 /usr/share/tessdata/por.normproto
-rw-r--r-- 1 root root 1,1K mai   30  2008 /usr/share/tessdata/por.pffmtable
-rw-r--r-- 1 root root  843 mai   30  2008 /usr/share/tessdata/por.unicharset
-rw-r--r-- 1 root root    9 mai   30  2008 /usr/share/tessdata/por.user-words
-rw-r--r-- 1 root root 2,0M mai   30  2008 /usr/share/tessdata/por.word-dawg

One question and one suggestion:

1) How does gscan2pdf know which languages are available?

2) It would be nice to be able to save in a "dual mode": the TIFF image together with the text file created by OCR.
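
Regarding question 1, here is a minimal sketch of how a front end could detect the installed languages. It assumes that each language is identified by the prefix of a *.unicharset file in the tessdata directory, as in the listing above. I don't know whether gscan2pdf actually works this way; the function name list_tess_langs is just something I made up for illustration.

```shell
# Hypothetical helper: print one language code per line, taken from
# the prefixes of the *.unicharset files found in a tessdata directory.
# Assumption: every installed language ships a <code>.unicharset file.
list_tess_langs() {
    tessdata_dir="${1:-/usr/share/tessdata}"
    for f in "$tessdata_dir"/*.unicharset; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        basename "$f" .unicharset
    done
}

list_tess_langs /usr/share/tessdata
```

If "por" shows up in that output but not in the GUI, the language list is probably built some other way. In the meantime, running tesseract by hand, e.g. "tesseract scan.tif scan -l por" (scan.tif being a placeholder file name), writes the recognized text to scan.txt next to the TIFF.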

Last edited by manouchk (2009-06-10 16:02:54)
