#1 2009-08-20 12:54:46

zenlord
Member
From: Belgium
Registered: 2006-05-24
Posts: 1,223

first bash script: inotifywait-problem

Hi,

This is my first attempt at writing a bash script. I have some experience with PHP, but that's about it, so please don't laugh :)

I think this is very useful for my small office, and it didn't seem so hard, but it won't work. The script:

#!/bin/sh

inotifywait -m --format '%f' -e close_write /share/Kantoor/Scans/ | while read NAME
do
    NEWNAME=`date +%Y%m%d-%H%M%S_scan`
    echo "Beginning OCR"
    tesseract $NAME $NEWNAME -l nld
    mv $NAME.tif $NEWNAME.tif
    echo "OCR complete"
done

The problem: as soon as the (temporary) file is created in the watched directory, tesseract-ocr starts, but by the time it starts, the temporary file no longer exists and tesseract fails. Adding 'sleep 10' doesn't help either, because:
1. a bigger file might take longer than 10s to upload
2. I guess tesseract will look for the temporary file instead of the completely uploaded file.

How can I make sure that tesseract is fed the correct file? I read somewhere that this could be solved by mv'ing the file instead of cp'ing it, but the file is uploaded via FTP by my network scanner, so I guess there is no way of controlling that...

Any help?
THX!

#2 2009-08-20 17:57:24

zenlord
Member
From: Belgium
Registered: 2006-05-24
Posts: 1,223

Re: first bash script: inotifywait-problem

I guess it can be done more elegantly, but I got it working:

#!/bin/bash

inotifywait -m --format '%f' -e close_write /share/Kantoor/Scans/tmp/ | while read LINE
do
    for ((a=1; a <= 30 ; a++))
    do
        if [ -e "$(find /share/Kantoor/Scans/tmp/ -name BRN\*)" ] ; then
            NAME=$(find /share/Kantoor/Scans/tmp/ -name BRN\*)
            NEWNAME=`date +%Y%m%d-%H%M%S_scan`
            echo "Beginning OCR on $NEWNAME"
            tesseract $NAME /share/Kantoor/Scans/$NEWNAME -l nld
            mv $NAME /share/Kantoor/Scans/$NEWNAME.tif
            echo "OCR complete"
            a=31
        else
            echo "File upload not complete yet"
        fi
    sleep 1
    done
done

It has worked on all occasions so far - does anybody see a problem in this code? The only thing I can come up with is that if a file takes more than 30 seconds to upload, it will fail. But since the FTP-connection is internal, I don't think it'll be a problem...
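For reference, the bounded retry idea above can be sketched as a small function. This is only a sketch: the name wait_for_scan and its arguments are made up for illustration, and it assumes, as in the script, that finished uploads start with BRN.

```shell
# Hypothetical helper: wait up to $2 seconds (default 30) for a BRN* file
# to show up in directory $1, then print the first match.
wait_for_scan() {
    dir=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        for f in "$dir"/BRN*; do
            # If the glob matched nothing, $f is the literal pattern
            # and the -e test fails.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
        tries=$((tries - 1))
        sleep 1
    done
    return 1   # gave up: nothing arrived in time
}
```

It could be called as NAME=$(wait_for_scan /share/Kantoor/Scans/tmp 30) || echo "File upload not complete yet". Like the loop above, it still gives up when an upload takes longer than the timeout.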

Now let's see how I can make the script automatically start on startup.

#3 2009-08-20 18:52:14

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: first bash script: inotifywait-problem

I'm not too familiar with inotifywait and have only been able to test it with local file copies, but it looks like your first version should work, provided the script was started in /share/Kantoor/Scans. The %f format from inotifywait only returns the file name, not the full path, so that may have been why it failed. So:

#!/bin/sh

SCAN_PATH=/share/Kantoor/Scans
TMP_PATH=$SCAN_PATH/tmp

inotifywait -m --format '%f' -e close_write "$TMP_PATH" | while read -r NAME
do
    NEWNAME=`date +%Y%m%d-%H%M%S_scan`
    echo "Beginning OCR"
    tesseract "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME" -l nld
    # %f already includes the extension, and the upload lives in $TMP_PATH
    mv "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME.tif"
    echo "OCR complete"
done

However, this will call tesseract on ANY file that is written to /share/Kantoor/Scans/tmp, so you may want to check the file type or file name, just in case someone uploads something incorrectly.
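One possible way to check the file type, as a rough sketch: look at the first four bytes of the file and compare them with the two TIFF signatures (49 49 2a 00 for little-endian, 4d 4d 00 2a for big-endian). The function name is_tiff is made up for this example, and it assumes the scanner really delivers TIFF files.

```shell
# Hypothetical check: succeed only if the file starts with a TIFF signature.
is_tiff() {
    # od prints the first 4 bytes as hex; tr squeezes them into one word,
    # e.g. "49492a00". A missing or unreadable file gives an empty string.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    case "$magic" in
        49492a00|4d4d002a) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the loop this could be used as: if is_tiff "$TMP_PATH/$NAME"; then ... fi.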

Your second version may behave strangely when there is more than one matching file in /share/Kantoor/Scans/tmp, since the find will return all of them, so $NAME will contain a list of files, which will then be passed to tesseract.
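A sketch of one way around the multiple-file case: loop over the matches with a shell glob so that each file is handled on its own. The helper name for_each_scan is invented for this example, and the BRN prefix is taken from the script above.

```shell
# Hypothetical helper: run the given command once per BRN* file in $1.
for_each_scan() {
    dir=$1
    shift
    for f in "$dir"/BRN*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        "$@" "$f"                 # one call per file, name passed as last argument
    done
}
```

For example, for_each_scan /share/Kantoor/Scans/tmp echo prints each pending scan on its own line, and echo can be swapped for a function that runs tesseract on one file.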

#4 2009-08-21 14:39:48

zenlord
Member
From: Belgium
Registered: 2006-05-24
Posts: 1,223

Re: first bash script: inotifywait-problem

The first script:
I've done some more research on inotifywait, and the problem I was having is caused by a limitation of inotifywait. Apparently there is a serious difference in Linux between cp and mv: if you cp a file, the copy is written to the new destination straight away, piece by piece, while mv (at least within the same filesystem, where it is just a rename) makes the file appear in the destination folder only once it is complete.
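A small experiment that illustrates the mv side of this, assuming both names are on the same filesystem: the inode number stays the same across the mv, which suggests the data is not copied and only the name changes. The file names here are made up for the demonstration.

```shell
# Demonstration in a throwaway directory: mv within one filesystem is a rename.
tmp=$(mktemp -d)
echo "scan data" > "$tmp/upload.tmp"

# ls -i prints the inode number in front of the file name.
before=$(ls -i "$tmp/upload.tmp" | awk '{print $1}')
mv "$tmp/upload.tmp" "$tmp/BRN1.tif"
after=$(ls -i "$tmp/BRN1.tif" | awk '{print $1}')

if [ "$before" = "$after" ]; then
    echo "same inode: mv only renamed the file"
fi
rm -rf "$tmp"
```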

In my case, I'll be uploading files through FTP, so the file transfer will create a temporary file (like 'zeezretn.dblg684r4.tmp') until the upload is complete and at that point rename the file to 'BRN24512698.tif'. So: inotifywait will notice 'zeezretn.dblg684r4.tmp' and feed that filename to tesseract, but tesseract will not find that temporary file, because in the meantime it has already been renamed (unless it is a really big file, but then I guess the temporary file would not be accepted by tesseract as a supported format).

The second script is indeed limited to only 1 file in that folder, and that's why I mv all files out of the tmp dir once they are processed. So either I should make clear that no one saves a file in that folder, or I could set up permissions on that folder so that only the network scanner can write to it.

THX for your thoughts - I will be using more variables and maybe put this in community contributions as an exercise in scripting.

#5 2009-08-21 17:21:48

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: first bash script: inotifywait-problem

If that's the case then simply testing the name of the file returned by inotifywait should be sufficient.

e.g. wrap the OCR bit inside an if clause like:

inotifywait -m --format '%f' -e close_write "$TMP_PATH" | while read -r NAME
do
  # grep -q only sets the exit status, so names with spaces are safe too
  if echo "$NAME" | grep -q '^BRN' ; then
     : ### do tesseract ocr
  fi
done

Note that I've assumed that all the files you are interested in start with BRN but you could make this more sophisticated if you need to.

That way, while inotify will tell you about the temporary file, your script should be able to ignore it, provided you can get inotify to tell you about the final file, which may mean looking for other inotify events, e.g. 'moved_to' or 'create'.
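Putting those two ideas together, a rough sketch could look like the following. It assumes the FTP server renames the finished upload inside the watched directory (so a moved_to event carries the final BRN name), and the function names handle_name and watch_scans are invented for the example. The OCR step is only echoed here.

```shell
#!/bin/sh
# Sketch only: paths and the BRN prefix come from this thread,
# everything else is an assumption.
SCAN_PATH=/share/Kantoor/Scans
TMP_PATH=$SCAN_PATH/tmp

# Decide what to do with one file name reported by inotifywait.
handle_name() {
    if printf '%s\n' "$1" | grep -q '^BRN'; then
        echo "would OCR $TMP_PATH/$1"
    else
        echo "ignoring $1"
    fi
}

# Report files that were written in place or renamed into the directory.
watch_scans() {
    inotifywait -m --format '%f' -e close_write -e moved_to "$TMP_PATH" |
    while read -r NAME
    do
        handle_name "$NAME"
    done
}

# Start watching only when asked to, e.g.: ./watch.sh run
if [ "${1:-}" = "run" ]; then
    watch_scans
fi
```

With this layout the temporary upload name triggers close_write but is ignored, while the rename to BRN... triggers moved_to and is picked up.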

#6 2009-08-21 17:37:05

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,799

Re: first bash script: inotifywait-problem

might be better as a case:

  case $NAME in
    BRN*) do whatever ;;
    * ) : ;; # do nothing
  esac
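As a slightly fuller sketch of the same idea, the case can be wrapped in a function that also rejects the temporary upload names. The name is_scan and the assumption that finished scans always end in .tif (or .TIF) are mine, not something confirmed in the thread.

```shell
# Hypothetical filter: succeed only for names that look like finished scans.
is_scan() {
    case "$1" in
        *.tmp)             return 1 ;;  # FTP temporary upload
        BRN*.tif|BRN*.TIF) return 0 ;;  # finished scan from the scanner
        *)                 return 1 ;;  # anything else: do nothing
    esac
}
```

In the loop it would be used as: if is_scan "$NAME"; then ... fi.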

#7 2009-08-21 18:56:05

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: first bash script: inotifywait-problem

@brisbin33, you're right, I'd completely forgotten about the case statement.
