#1 2019-10-05 19:45:32

yetanothernewbie
Member
Registered: 2019-07-23
Posts: 6

Vertical Japanese and Chinese data for tesseract OCR

The tesseract-data-jpn, tesseract-data-chi_sim, and tesseract-data-chi_tra packages in the official repositories don't contain OCR data for vertically written Japanese and Chinese. I spent some time looking around and eventually figured out that the tesseract devs split the horizontal and vertical data into separate files, so the vertical data needs to be installed separately as tesseract-data-jpn_vert, etc.

I built and installed the packages from the tesseract-data-git package in the AUR (by editing the _langs variable in the PKGBUILD to include jpn_vert, chi_sim_vert, and chi_tra_vert), but I suspect the official repositories omit those packages by oversight. I'm not sure where I should request them, since it's not technically a bug, so I'm asking here.
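
For anyone who wants to check whether the vertical data is actually present on their system, here is a rough Python sketch. It only looks for files named <lang>.traineddata in a tessdata directory. The function name missing_traineddata is made up for illustration, and the /usr/share/tessdata path is an assumption about where the data gets installed, so adjust it if your setup differs.

```python
from pathlib import Path

# Vertical-script models mentioned in this post.
VERTICAL_LANGS = ("jpn_vert", "chi_sim_vert", "chi_tra_vert")


def missing_traineddata(tessdata_dir, langs=VERTICAL_LANGS):
    """Return the entries of `langs` with no <lang>.traineddata file in tessdata_dir."""
    root = Path(tessdata_dir)
    return [lang for lang in langs if not (root / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumption: the data packages install into /usr/share/tessdata.
    missing = missing_traineddata("/usr/share/tessdata")
    if missing:
        print("Missing vertical data:", ", ".join(missing))
    else:
        print("All vertical data files found.")
```

An empty result means all three vertical files are in that directory. Anything listed still needs to be installed.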


#2 2019-10-05 20:00:20

loqs
Member
Registered: 2014-03-06
Posts: 17,195

Re: Vertical Japanese and Chinese data for tesseract OCR

Welcome to the Arch Linux forums, yetanothernewbie.
I would suggest opening a feature request on the Arch bug tracker.

