
#1 2010-03-24 17:28:35

no-daemon
Member
Registered: 2010-02-13
Posts: 15

[fixed] Searching inside files without indexing first

I'm looking for a search command or tool that does not rely on indexing and that also searches inside binary files (such as OpenOffice documents, PDF files, or archives). I'd like to use it in Openbox/LXDE; I know that kfind works in KDE. There are many search tools out there, but the closest thing I've found is 'pcmanfm -f %F', which is not yet reliable. Thank you.

Last edited by no-daemon (2010-03-28 17:07:23)


less is a lot more!


#2 2010-03-24 17:40:40

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244


grep.


#3 2010-03-24 17:54:36

no-daemon
Member
Registered: 2010-02-13
Posts: 15


Been there, tried that. I'll try it again. I have to read more about the '--binary-files=' option of grep/zgrep.
Edit: grep "almost" works, except for PDFs. /edit
I wouldn't mind using the CLI for this, except that on some occasions, for productivity's sake, when I search through gigabytes of docs, I need to click through the results quickly to see what's in them. Thanks.
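
For reference, a minimal sketch of the plain grep approach mentioned here, assuming GNU grep. The function name list_matches is made up for illustration; -a is the short form of --binary-files=text. It only helps where the text is stored uncompressed, which is one likely reason PDFs slip through.

```shell
#!/bin/bash
# list_matches PATTERN [DIR]: hypothetical helper, not part of any package.
# Prints the names of files under DIR (default: current folder) containing PATTERN.
list_matches() {
    local pattern=$1 dir=${2:-.}
    # -r recurse into folders, -l print only the names of matching files,
    # -i ignore case, -a treat binary files as text (--binary-files=text)
    grep -rlia -e "$pattern" -- "$dir"
}
```

Something like `list_matches "invoice" ~/docs` would then list candidate files to open by hand.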

Last edited by no-daemon (2010-03-24 18:15:56)



#4 2010-03-24 18:22:55

drcouzelis
Member
From: Connecticut, USA
Registered: 2009-11-09
Posts: 4,092


Let me see if I understand: You are looking for a GUI application that allows you to find a string in a file, including "binary" formats like PDF and ODF, and "kfind" does what you want but you don't want to use it because you don't use KDE.

Is that correct?


#5 2010-03-24 19:05:00

no-daemon
Member
Registered: 2010-02-13
Posts: 15


drcouzelis wrote:

Let me see if I understand: You are looking for a GUI application that allows you to find a string in a file, including "binary" formats like PDF and ODF, and "kfind" does what you want but you don't want to use it because you don't use KDE.

Is that correct?

That is exactly right, thank you, and sorry if I didn't make myself clear from the start. I'd prefer a GUI, but that's not a constraint.



#6 2010-03-26 19:17:59

no-daemon
Member
Registered: 2010-02-13
Posts: 15


OK, I could use some assistance here. Below is my unskilled script that attempts a search. Please have mercy, it's my first one ever :)

#!/bin/bash
# bin-grep.sh # better if placed in a folder from $PATH such as /usr/bin or /usr/local/bin #
# Finds at least one match of a given string in files or filenames (from a specified folder)
# requires pdftotext in $PATH from poppler: http://poppler.freedesktop.org
# *.pdf files must have .pdf or .PDF extensions
#
BADARGS=65
E_ROOT=67
ROOT_UID=0

if [ "$UID" -eq "$ROOT_UID" ] ; then
    echo "Unsafe to be root when running this script."
    exit $E_ROOT
fi

if [ $# -eq 0 ] ; then
    echo "Usage: $(basename "$0") search_string [path]"
    echo "* if search_string is a phrase, use \"quotes\" * also avoid special chars '&,\$,!,',\"....' inside strings *"
    exit $BADARGS
fi
fstring="$1"
directory="${2:-$PWD}"    # search the current folder unless a path is given
echo "Searching in: $directory"

# Use $TMP only if it is a writable folder, otherwise fall back to /tmp
if [ -d "$TMP" ] && [ -w "$TMP" ] ; then
    tmpdir="$TMP"
else
    tmpdir=/tmp
fi
echo "Using ${tmpdir} temporary folder"
tpdf="$tmpdir/tpdf.$$"
trap 'rm -f "$tpdf"' EXIT    # clean up the temporary file however the script ends

if command -v pdftotext &> /dev/null ; then
    have_pdftotext=yes
else
    have_pdftotext=no
    echo "poppler is not installed, or not in path * Can't search pdfs ... :("
fi

# Succeeds if stdin matches the search string (case-insensitive, extended regex)
matches() { grep -qiaE -e "$fstring" ; }

# NUL-separated names keep filenames with spaces, colons or newlines intact
find "$directory" -mount -type f -print0 | sort -zf |
while IFS= read -r -d '' file ; do
    case "$file" in
    *.pdf | *.PDF)
        if [ "$have_pdftotext" = yes ] ; then
            rm -f "$tpdf"    # never test a stale conversion
            pdftotext -q -nopgbrk "$file" "$tpdf" 2>/dev/null
            if [ -f "$tpdf" ] && matches < "$tpdf" ; then
                echo "$file"
                continue    # next one
            fi
        fi
        ;;
    esac
    # Test the filename and the printable strings inside the file
    if { printf '%s\n' "$file" ; strings -a "$file" 2>/dev/null ; } | matches ; then
        echo "$file"
    fi
done

EDIT: I more or less figured out the case/spaces issue with filenames. Now I have to figure out the grep part. /EDIT
//It ignores files having both upper- and lowercase chars in filenames.//
What's worse is that it ignores *.odt and *.ods files, not to mention *.pdf (it does find matches in *.doc and *.xls, though). Must be some encryption issue I have no clue about.
PS. Still searching for a GUI alternative :P
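
On the *.odt/*.ods problem: OpenDocument files are zip archives, and the document text sits in a compressed member called content.xml, so strings and grep never see it. That is compression rather than encryption. A minimal sketch of one workaround, assuming unzip is installed; the function name odf_contains is hypothetical, and replacing XML tags with spaces is a crude approximation (a phrase that is split by formatting tags will be missed).

```shell
#!/bin/bash
# odf_contains PATTERN FILE: hypothetical helper, for illustration only.
# Succeeds if the text inside an OpenDocument file (.odt, .ods, ...) matches PATTERN.
odf_contains() {
    local pattern=$1 file=$2
    # unzip -p writes the named archive member to stdout;
    # sed replaces every XML tag with a space so only the text is searched
    unzip -p "$file" content.xml 2>/dev/null |
        sed -e 's/<[^>]*>/ /g' |
        grep -qiE -e "$pattern"
}
```

In a loop over files this could be used as `odf_contains "$fstring" "$file" && echo "$file"` for names ending in .odt or .ods.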

Last edited by no-daemon (2010-03-28 17:06:48)



#7 2010-03-28 17:06:06

no-daemon
Member
Registered: 2010-02-13
Posts: 15


I changed the script above; it's sort of what I was looking for, so I'm marking this fixed. I added an ugly hack for PDFs; it works for me but isn't really tested, so use it carefully. There's much room for improvement. It takes about 15 minutes to search 20k files and ~3 minutes for one thousand. Regards,
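
One possible improvement to the PDF hack, as a sketch only: pdftotext writes to stdout when the output file is given as "-", so the temporary file can be skipped. This assumes poppler's pdftotext is in $PATH; the function name pdf_contains is hypothetical.

```shell
#!/bin/bash
# pdf_contains PATTERN FILE: hypothetical helper, for illustration only.
# Succeeds if the extracted text of a PDF matches PATTERN (case-insensitive).
pdf_contains() {
    local pattern=$1 file=$2
    # "-" as the output name sends the extracted text to stdout,
    # so grep can read it straight from the pipe
    pdftotext -q -nopgbrk "$file" - 2>/dev/null | grep -qiE -e "$pattern"
}
```

If pdftotext is missing or the file cannot be read, grep gets no input and the function simply reports no match.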

Last edited by no-daemon (2010-03-28 17:07:59)


