Hi,
This is my first attempt at writing a bash script. I have some experience with PHP, but that's about it, so please don't laugh.
I think this is very useful for my small office, and it didn't seem so hard, but it won't work. The script:
#!/bin/sh
inotifywait -m --format '%f' -e close_write /share/Kantoor/Scans/ | while read NAME
do
    NEWNAME=`date +%Y%m%d-%H%M%S_scan`
    echo "Beginning OCR"
    tesseract $NAME $NEWNAME -l nld
    mv $NAME.tif $NEWNAME.tif
    echo "OCR complete"
done
The problem: as soon as the (temporary) file is created in the watched directory, tesseract-ocr starts, but by the time it starts, the temporary file no longer exists, so tesseract fails. Adding 'sleep 10' doesn't help either, because:
1. a bigger file might take longer than 10s to upload
2. I guess tesseract will look for the temporary file instead of the completely uploaded file.
How can I make sure that tesseract is fed the correct file? I read somewhere that this could be solved by mv'ing the file instead of cp'ing it, but the file is uploaded via FTP by my network scanner, so I guess I have no way of controlling that...
Any help?
THX!
Offline
I guess it can be done more elegantly, but I got it working:
#!/bin/bash
# bash rather than sh: the for ((...)) loop below is a bash extension
inotifywait -m --format '%f' -e close_write /share/Kantoor/Scans/tmp/ | while read -r LINE
do
    # poll for up to 30 seconds until the renamed BRN* file shows up
    for ((a = 1; a <= 30; a++))
    do
        NAME=$(find /share/Kantoor/Scans/tmp/ -name 'BRN*')
        if [ -e "$NAME" ] ; then
            NEWNAME=$(date +%Y%m%d-%H%M%S_scan)
            echo "Beginning OCR on $NEWNAME"
            tesseract "$NAME" "/share/Kantoor/Scans/$NEWNAME" -l nld
            mv "$NAME" "/share/Kantoor/Scans/$NEWNAME.tif"
            echo "OCR complete"
            break
        else
            echo "File upload not complete yet"
        fi
        sleep 1
    done
done
It has worked on all occasions so far - does anybody see a problem in this code? The only thing I can come up with is that if a file takes more than 30 seconds to upload, it will fail. But since the FTP-connection is internal, I don't think it'll be a problem...
Now let's see how I can make the script automatically start on startup.
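On the 30-second limit mentioned above: one possible alternative to a fixed number of retries is to wait until the file's size has stopped changing. This is only a sketch, not something tested against the scanner. The function name wait_until_stable and its timeout/checks parameters are made up for illustration, and it assumes GNU stat (the -c %s option).

```shell
#!/bin/sh
# Hypothetical helper: return 0 once FILE's size has been unchanged for
# CHECKS consecutive one-second checks, or 1 after TIMEOUT seconds.
wait_until_stable() {
    local file="$1" timeout="${2:-300}" checks="${3:-3}"
    local last=-1 size stable=0 waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # stat -c %s prints the size in bytes (GNU coreutils);
        # a missing file is treated as size -1, i.e. never stable
        size=$(stat -c %s "$file" 2>/dev/null) || size=-1
        if [ "$size" -ge 0 ] && [ "$size" -eq "$last" ]; then
            stable=$((stable + 1))
            if [ "$stable" -ge "$checks" ]; then
                return 0
            fi
        else
            stable=0
        fi
        last=$size
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

It could be called as `wait_until_stable "$NAME" 300 3 && tesseract ...`, so a slow upload just delays the OCR instead of making it fail.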
Offline
I'm not too familiar with inotifywait and have only been able to test it with local file copies, but it looks like your first version should work, provided the script was started in /share/Kantoor/Scans. The %f format from inotifywait only returns the file name, not the full path, so that may have been why it failed. So:
#!/bin/sh
SCAN_PATH=/share/Kantoor/Scans
TMP_PATH=$SCAN_PATH/tmp
inotifywait -m --format '%f' -e close_write "$TMP_PATH" | while read -r NAME
do
    NEWNAME=$(date +%Y%m%d-%H%M%S_scan)
    echo "Beginning OCR"
    # %f is only the file name, so prefix the watched directory
    tesseract "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME" -l nld
    # move the processed scan out of tmp under its new name
    mv "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME.tif"
    echo "OCR complete"
done
However, this will call tesseract on ANY file that is written to /share/Kantoor/Scans/tmp, so you may want to check the file type or file name, just in case someone uploads something incorrectly.
Your second version may behave strangely when there is more than one file in /share/Kantoor/Scans/tmp, since find will return all matching files, so $NAME will contain a list of files, which will then be passed to tesseract.
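To illustrate the multiple-file point, here is a rough sketch that handles each BRN* file on its own. It swaps the find call for a plain shell glob, which keeps each name separate. The function names process_one and process_all are made up, and process_one only prints the name where the tesseract and mv steps would go.

```shell
#!/bin/sh
# Hypothetical placeholder for the tesseract + mv steps.
process_one() {
    echo "would OCR: $1"
}

# process_all DIR: call process_one once per BRN* file in DIR.
process_all() {
    local dir="$1" file
    for file in "$dir"/BRN*; do
        # if nothing matches, the pattern stays literal; skip it
        [ -e "$file" ] || continue
        process_one "$file"
    done
}
```

With two scans in the directory this calls process_one twice, once per file, rather than once with both names in a single variable.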
Offline
The first script:
I've done some more research on inotifywait, and the problem I was having is caused by a limitation of inotifywait. Apparently there is an important difference in Linux between cp and mv: if you cp a file, the copy appears in the destination right away and is filled in as it is written, while mv (at least within the same filesystem) is a rename, so the file only shows up in the destination folder once it is complete.
In my case, I'll be uploading files through FTP, and thus the file transfer will make a temporary file (like 'zeezretn.dblg684r4.tmp') until the upload is complete, and at that point rename the file to 'BRN24512698.tif'. So: inotifywait will notice 'zeezretn.dblg684r4.tmp' and feed that filename to tesseract, but tesseract will not find that temporary file, because in the meantime it has already been renamed (unless it is a really big file, but then I guess the temporary file would not be accepted by tesseract as a supported format).
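For testing without the scanner, the upload behaviour described above can be imitated locally. This is a sketch under assumptions: the function name simulate_upload and the temporary file name are made up, and the real FTP server may behave differently in its details.

```shell
#!/bin/sh
# simulate_upload DIR FINAL_NAME: write to a temporary name first, then
# rename to the final name once the data is complete.
simulate_upload() {
    local dir="$1" final="$2" tmp
    tmp="$dir/upload.$$.tmp"             # made-up temporary name
    # writing the data produces create/modify/close_write for the .tmp name
    printf 'fake scan data\n' > "$tmp"
    # the rename produces moved_from for the .tmp name and moved_to for the final name
    mv "$tmp" "$dir/$final"
}
```

Running `simulate_upload /share/Kantoor/Scans/tmp BRN0001.tif` while the watcher is active should show the same pattern: close_write reports only the .tmp name, never the BRN name.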
The second script is indeed limited to only one file in that folder, which is why I mv each file out of the tmp dir once it has been processed. So either I should make clear that no one else saves files in that folder, or I could set up permissions on it so that only the network scanner can write there.
THX for your thoughts - I will be using more variables and maybe put this in community contributions as an exercise in scripting.
Offline
If that's the case, then simply testing the name of the file returned by inotify should be sufficient.
e.g. wrap the OCR bit inside an if clause like:
inotifywait -m --format '%f' -e close_write "$TMP_PATH" | while read -r NAME
do
    # only act on names that start with BRN
    if echo "$NAME" | grep -q '^BRN' ; then
        ### do tesseract ocr
        :   # no-op so the empty if body is valid shell
    fi
done
Note that I've assumed that all the files you are interested in start with BRN but you could make this more sophisticated if you need to.
That way, while inotify will tell you about the temporary file, your script should be able to ignore it, provided you can get inotify to tell you about the final file, which may mean looking for other inotify events, e.g. 'moved_to' or 'create'.
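Putting the name test and the extra event together might look roughly like the sketch below. It is untested against the actual scanner. It assumes the final files are named BRN*.tif, uses a case pattern instead of grep, and the function names is_scan and watch_scans are made up. The watcher is wrapped in a function so nothing starts until it is called.

```shell
#!/bin/sh
SCAN_PATH=/share/Kantoor/Scans
TMP_PATH=$SCAN_PATH/tmp

# is_scan NAME: true only for names like BRN24512698.tif
is_scan() {
    case "$1" in
        BRN*.tif) return 0 ;;
        *)        return 1 ;;
    esac
}

watch_scans() {
    # close_write covers files written directly under their final name;
    # moved_to covers files renamed into place after the upload
    inotifywait -m --format '%f' -e close_write -e moved_to "$TMP_PATH" |
    while IFS= read -r NAME; do
        is_scan "$NAME" || continue      # ignore temporary upload names
        NEWNAME=$(date +%Y%m%d-%H%M%S_scan)
        echo "Beginning OCR on $NAME"
        # only move the scan out of tmp if tesseract succeeded
        tesseract "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME" -l nld &&
            mv "$TMP_PATH/$NAME" "$SCAN_PATH/$NEWNAME.tif"
        echo "OCR complete"
    done
}

# watch_scans    # uncomment to start watching
```

Because the temporary name never matches BRN*.tif, the close_write event for it is skipped and only the renamed file reaches tesseract.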
Offline
@brisbin33, you're right, I'd completely forgotten about the case statement
Offline