I'm looking for a search command or tool that does not use indexing and also searches inside binary files (such as OpenOffice documents, PDF files, or archives). I'd like to use it in Openbox/LXDE; I know that kfind works in KDE. There are many search tools out there, and the closest thing I've found is 'pcmanfm -f %F', which is not yet reliable. Thank you.
Last edited by no-daemon (2010-03-28 17:07:23)
less is a lot more!
Been there, tried that. I'll try it again. I have to read more about the '--binary-files=' option of grep/zgrep.
Edit: grep "almost" works, except for PDFs. /edit
I wouldn't mind using the CLI for this, except that sometimes, for productivity's sake, when I search through gigabytes of docs I need to click through the results quickly to see what's in them. Thanks.
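For reference, the '--binary-files=' option mentioned above can be tried roughly like this. It is only a minimal sketch, assuming GNU grep; search_raw is a made-up helper name, not something grep provides. The -a flag is the short form of --binary-files=text, so grep reports matches inside binary files instead of skipping them. It only sees the bytes as stored on disk, which is probably why PDFs are missed: their text streams are usually compressed.

```shell
#!/bin/bash
# search_raw (hypothetical helper): list files under a directory whose
# raw bytes contain a string. Assumes GNU grep.
search_raw() {
    local pattern=$1 dir=${2:-.}
    # -r recurse into the directory, -l print only the names of matching files,
    # -i ignore case, -a treat binary files as text (--binary-files=text),
    # -F take the pattern as a fixed string rather than a regular expression.
    grep -rliaF -e "$pattern" -- "$dir"
}
```

Usage would be something like: search_raw "invoice" ~/docs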
Last edited by no-daemon (2010-03-24 18:15:56)
Let me see if I understand: You are looking for a GUI application that allows you to find a string in a file, including "binary" formats like PDF and ODF, and "kfind" does what you want but you don't want to use it because you don't use KDE.
Is that correct?
That's exactly right, thank you, and sorry if I didn't make myself clear from the start. I'd prefer a GUI, though it's not a constraint.
OK, I could use some assistance here. Below is my unskilled script that attempts a search. Please have mercy, it's my first one ever.
#!/bin/bash
# bin-grep.sh: better if placed in a folder from $PATH, such as /usr/local/bin
# Lists files (under a specified folder) that contain at least one match of a given string.
# Requires pdftotext in $PATH (from poppler: http://poppler.freedesktop.org) to search PDFs.
# PDF files must have a .pdf or .PDF extension.
#
BADARGS=65
E_ROOT=67
ROOT_UID=0

if [ "$UID" -eq "$ROOT_UID" ]; then
    echo "Unsafe to be root when running this script."
    exit $E_ROOT
fi

if [ $# -eq 0 ]; then
    echo "Usage: $(basename "$0") search_string [path]"
    echo "* if search_string is a phrase, use \"quotes\" * also avoid special chars '&,\$,!,',\"....' inside strings *"
    exit $BADARGS
fi

fstring=$1
directory=${2:-$PWD}    # default to the current folder when no path is given
echo "Searching in: $directory"

have_pdftotext=yes
if ! command -v pdftotext > /dev/null 2>&1; then
    echo "poppler is not installed, or not in path * Can't search pdfs ... :(" >&2
    have_pdftotext=no
fi

# Succeeds if the text on stdin contains at least one match of the search string.
matches() {
    grep -qiaE -e "$fstring"
}

# NUL-separated names keep spaces, colons and newlines in file names intact.
find "$directory" -mount -type f -print0 | sort -zf |
while IFS= read -r -d '' file; do
    case $file in
        *.pdf | *.PDF)
            # "-" makes pdftotext write the extracted text to stdout, so no temp file is needed
            if [ "$have_pdftotext" = yes ] && pdftotext -q -nopgbrk "$file" - 2>/dev/null | matches; then
                echo "$file"
                continue # next one
            fi
            ;;
    esac
    # Everything else (and PDFs without a text match): search the printable strings
    if strings -a "$file" 2>/dev/null | matches; then
        echo "$file"
    fi
done
exit 0
EDIT: I kind of figured out the case/spaces issue with filenames. Now I have to figure out the grep part. /EDIT
//(It ignores files having both upper- and lowercase chars in filenames.)//
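On the filename issue: a common alternative to changing IFS is to have find separate names with NUL bytes and read them one at a time. The sketch below assumes bash plus GNU find, sort and grep; search_tree is a hypothetical name. It only greps the raw bytes of each file, and the loop body is where any per-type handling (PDF and so on) would go.

```shell
#!/bin/bash
# search_tree (hypothetical helper): print every regular file under a
# directory whose raw bytes contain a string, whatever the file is called.
search_tree() {
    local pattern=$1 dir=${2:-.} file
    # -print0 ends each name with a NUL byte, the only byte that cannot appear
    # in a path, so spaces, colons and mixed case in names need no special care.
    find "$dir" -mount -type f -print0 | sort -z |
    while IFS= read -r -d '' file; do
        # -q quiet, -i ignore case, -a treat binary as text, -F fixed string
        if grep -qiaF -e "$pattern" -- "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```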
What's worse is that it ignores *.odt and *.ods files, not to mention *.pdf (though it does find *.doc and *.xls). Must be some encryption issue I have no clue about.
PS. Still searching for a GUI alternative.
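A note on the *.odt / *.ods misses: OpenDocument files are ZIP archives, and the document text sits in a member called content.xml that is normally stored compressed, so strings and grep never see it in the raw file. A rough sketch of one way around that, assuming Info-ZIP's unzip is installed; odf_contains is a made-up helper name. XML entities such as &amp; are not decoded, so treat it as a starting point only.

```shell
#!/bin/bash
# odf_contains (hypothetical helper): succeed if the text of an
# OpenDocument file (.odt, .ods, ...) contains a string.
odf_contains() {
    local pattern=$1 file=$2
    # unzip -p writes the named archive member to stdout.
    # sed deletes the XML tags, so a word split by formatting markup still matches.
    # grep: -q quiet, -i ignore case, -F fixed string.
    unzip -p "$file" content.xml 2>/dev/null |
        sed -e 's/<[^>]*>//g' |
        grep -qiF -e "$pattern"
}
```

Something like `odf_contains "budget" report.odt && echo report.odt` would then print the name only when the text matches.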
Last edited by no-daemon (2010-03-28 17:06:48)
I changed the script above; it is sort of what I was looking for, so I'm marking this as fixed. I added an ugly hack for PDFs; it works for me but is not really tested, so use it carefully. There's much room for improvement. It takes about 15 minutes to search 20k files and about 3 minutes for one thousand. Regards,
Last edited by no-daemon (2010-03-28 17:07:59)