tesseract in gscan2pdf

manouchk · 2009-06-10 15:48:58

Hi,

I'm trying to use gscan2pdf with tesseract to do OCR (optical character recognition) in Portuguese. Unfortunately, the gscan2pdf GUI only offers the English option for tesseract, even though the Portuguese data is installed, as you can see:

$ ls /usr/share/tessdata/por.* -lh
-rw-r--r-- 1 root root  970 mai   30  2008 /usr/share/tessdata/por.DangAmbigs
-rw-r--r-- 1 root root 3,0K mai   30  2008 /usr/share/tessdata/por.freq-dawg
-rw-r--r-- 1 root root 1,4M mai   30  2008 /usr/share/tessdata/por.inttemp
-rw-r--r-- 1 root root  58K mai   30  2008 /usr/share/tessdata/por.normproto
-rw-r--r-- 1 root root 1,1K mai   30  2008 /usr/share/tessdata/por.pffmtable
-rw-r--r-- 1 root root  843 mai   30  2008 /usr/share/tessdata/por.unicharset
-rw-r--r-- 1 root root    9 mai   30  2008 /usr/share/tessdata/por.user-words
-rw-r--r-- 1 root root 2,0M mai   30  2008 /usr/share/tessdata/por.word-dawg
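For reference, here is a minimal sketch of how the installed languages could be read from that directory. It assumes the tesseract 2.x layout shown above, where each language has its own <lang>.inttemp file. The function name list_tess_langs is hypothetical, and this is not necessarily how gscan2pdf itself detects languages.

```shell
#!/bin/sh
# Hypothetical helper: print one tesseract language code per line.
# Assumption: every installed language ships a <lang>.inttemp file,
# as por.inttemp does in the listing above (tesseract 2.x layout).
list_tess_langs() {
    dir=${1:-/usr/share/tessdata}    # default to the path used above
    for f in "$dir"/*.inttemp; do
        [ -e "$f" ] || continue          # glob matched nothing
        name=${f##*/}                    # strip the directory part
        printf '%s\n' "${name%.inttemp}" # strip the extension
    done
}
```

Running list_tess_langs /usr/share/tessdata on this machine should include por in its output.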

One question and one suggestion:

1) How does gscan2pdf know which languages are available?

2) It would be nice to be able to save in a "dual mode": the TIFF plus a text file with the OCR output.
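Until something like that exists in the GUI, a rough workaround is to run tesseract by hand on the saved TIFF. This is a minimal sketch, assuming the command-line form "tesseract <image> <outputbase> -l <lang>", which writes <outputbase>.txt. The function name ocr_keep_tiff and the TESSERACT override variable are hypothetical and not part of gscan2pdf.

```shell
#!/bin/sh
# Hypothetical helper: keep the scanned TIFF and write the OCR text
# next to it, e.g. page1.tif -> page1.txt.
# Assumption: tesseract <image> <outputbase> -l <lang> writes
# <outputbase>.txt. TESSERACT may point at a different binary.
ocr_keep_tiff() {
    tif=$1
    lang=${2:-eng}                   # fall back to English
    base=${tif%.*}                   # page1.tif -> page1
    "${TESSERACT:-tesseract}" "$tif" "$base" -l "$lang" || return 1
    [ -f "$base.txt" ]               # succeed only if the text file exists
}
```

For example, ocr_keep_tiff scan.tif por leaves scan.tif untouched and should create scan.txt beside it.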

Last edited by manouchk (2009-06-10 16:02:54)

Arch Linux
