OCR and captioning images

AinmhiCoille · 2024-04-24 01:25:57

Hello, I have a big collection of meme images and I want to make them searchable by visual/semantic content and by text.

So far I've figured out that I can use e.g. tesseract or easyocr to detect text in the image, and I can use exiftool to write custom XMP tags with the output of each command (e.g. exiftool -overwrite_original -xmp-Generated:EasyOCR="${OCR_OUTPUT}" "$@" after establishing the tag format in exiftool.config). Since different tools produce better results on different types of images and I don't want to do anything by hand if I can help it, I'm adding a new custom tag for the output of each tool. I am hoping to do the same with captioning tools that can describe the visual content of the image as well.
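A minimal sketch of that tagging step, assuming the XMP-Generated tags are already defined in exiftool.config as described above. The function names and the EXIFTOOL override variable are made up for illustration, and the tesseract/easyocr invocations in the usage comment should be checked against the installed versions.

```shell
# Collapse OCR output to a single line so it fits one XMP string value.
flatten_ocr() {
    tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Read OCR text on stdin and store it in XMP-Generated:<tag> of <image>.
# EXIFTOOL (hypothetical variable) can point at a stub for a dry run.
write_ocr_tag() {
    local tag img text
    tag=$1
    img=$2
    text=$(flatten_ocr)
    [ -n "$text" ] || return 0    # nothing recognized: leave the file untouched
    "${EXIFTOOL:-exiftool}" -q -overwrite_original \
        "-xmp-Generated:${tag}=${text}" "$img"
}

# Usage, one custom tag per tool:
#   for img in *.png; do
#       tesseract "$img" stdout 2>/dev/null | write_ocr_tag Tesseract "$img"
#       easyocr -l en -f "$img" --detail 0  | write_ocr_tag EasyOCR "$img"
#   done
```

Keeping one tag per tool means a later search can target whichever tool did best on a given kind of image.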

However, not many tools actually search XMP metadata. exiftool does it quite well, but only on the command line, so it doesn't give me a comfy Gwenview window where I can browse e.g. images that contain the word 'dog'. I was kind of hoping that Baloo would index the metadata so I could search it in Dolphin, but it only supports a pre-defined set of metadata fields for each file type. I could try to shove a JSON blob into the Comment field, but that seems fragile and fiddly.

Maybe the best would be if there were an image viewer that could search/filter by arbitrary metadata. So far all the ones I've tried have the same problem as Dolphin: they use a hard-coded schema to define search parameters and don't search custom XMP metadata.
Second-best would be a viewer that can take output from exiftool, I guess.

If all else fails I'm going to have to build something from scratch.

Am I insane for even trying to do this? Is there a saner solution I'm overlooking?

$ exiftool -if '$xmp-Generated:EasyOCR =~ /dog/' -xmp-Generated:EasyOCR .
======== ./unknown - 2022-11-19T164412.544.png
Easy OCR                        : Hedgedog
======== ./unknown - 2022-09-24T142531.756.png
Easy OCR                        : My dog's love for me.All of my failures.My dog' ' S love for me.My dog's Ilove for me
======== ./unknown (45).png
Easy OCR                        : Get out 0f my workshop.need YOU to.invent some farm tools can use while holding all these dogs.02022 KATIE TIEDRICH.WWW.AWKWARDZOMBIE COM
[...]

Oh, in case this comes up, I'm also trying to avoid using a sidecar file for each image as these are harder to keep in sync.

Last edited by AinmhiCoille (2024-04-24 03:26:30)

seth · 2024-04-24 06:56:14

Picture browsers like e.g. feh can take an arbitrary list of files as arguments. geeqie will open them as a collection, and it also has an elaborate search function which can filter for "keywords", which I just boldly assume are XMP.

As a last resort you could create your own collection by creating a tmpdir in /tmp and symlinking the results of the exiftool search there and then browse that directory with whatever.
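The symlink approach could be sketched roughly like this. link_matches is a made-up helper name, the memesearch prefix is arbitrary, and it assumes GNU coreutils (realpath, mktemp) and file names without embedded newlines.

```shell
# Read image paths (one per line) on stdin, symlink each into a fresh
# temporary directory, and print that directory's name on stdout.
link_matches() {
    local dest path name n
    dest=$(mktemp -d "${TMPDIR:-/tmp}/memesearch.XXXXXX") || return 1
    n=0
    while IFS= read -r path; do
        [ -e "$path" ] || continue    # skip empty lines and stale paths
        n=$((n + 1))
        # A counter prefix keeps equal basenames from different folders apart.
        name=$(printf '%04d_%s' "$n" "$(basename -- "$path")")
        ln -s -- "$(realpath -- "$path")" "$dest/$name"
    done
    printf '%s\n' "$dest"
}

# Usage (gwenview is only an example; any thumbnail browser should do):
#   gwenview "$(exiftool -q -if '$xmp-Generated:EasyOCR =~ /dog/' \
#                 -p '$directory/$filename' . | link_matches)"
```

The links point at absolute paths, so the temporary directory can be browsed from anywhere and deleted afterwards without touching the originals.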

AinmhiCoille · 2024-04-25 00:54:40

I don't know what geeqie is searching with keywords either, but it isn't XMP data.
Piping exiftool searches into feh works decently for viewing the images at least.
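For reference, a rough sketch of that pipe. find_matches and the EXIFTOOL override are hypothetical names, the /i flag (case-insensitive matching) is an addition to the original search, and feh reading a file list from stdin via "--filelist -" is taken from its man page and worth double-checking locally.

```shell
# Print the path of every image in a directory whose EasyOCR tag matches
# a Perl regex, one path per line. The pattern is pasted into the regex
# as-is, so a literal "/" in it needs escaping.
find_matches() {
    local pattern dir
    pattern=$1
    dir=${2:-.}
    "${EXIFTOOL:-exiftool}" -q \
        -if "\$xmp-Generated:EasyOCR =~ /${pattern}/i" \
        -p '$directory/$filename' "$dir"
}

# Usage:
#   find_matches dog . | feh --filelist -
```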

The symlink idea might be the only thing that works for browsing the search results by thumbnail, which isn't exactly sane, but it does work. Thank you.
