
#1 2008-10-19 23:52:44

rine
Member
From: Germany
Registered: 2008-03-04
Posts: 217

grep -i with regex

$ echo Batman | grep -E '[B]at'
Batman
$ echo Batman | grep -i -E '[B]at'

I don't get it. Why doesn't it match with -i?

edit: The brackets contain capital B's; the forum lowercased them when I posted.

Last edited by rine (2008-10-19 23:55:22)


#2 2008-10-20 00:58:08

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: grep -i with regex

I don't know why, but after reading this, I found out what caused it:
http://savannah.gnu.org/bugs/?18633
It seems to be LC_CTYPE:

$ LC_CTYPE=C grep -i -E '[b]at' heroes.txt
batman
Batman
$ LC_CTYPE=en_US.UTF8 grep -i -E '[b]at' heroes.txt
batman
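
A minimal sketch for reproducing the comparison above on another machine. The file name heroes.txt and its two lines are taken from the output above; the scratch directory and the locale name en_US.UTF-8 are assumptions (check `locale -a` for what is installed). LC_ALL is used instead of LC_CTYPE because it takes precedence over anything already exported. On a grep without the bug, both runs should print both lines.

```shell
#!/bin/sh
# Recreate the two-line test file from the post above in a scratch directory.
tmpdir=$(mktemp -d)
printf 'batman\nBatman\n' > "$tmpdir/heroes.txt"

# Run the same case-insensitive search under each locale.
# en_US.UTF-8 is an assumed locale name; see `locale -a` for installed ones.
for loc in C en_US.UTF-8; do
    echo "== LC_ALL=$loc"
    env LC_ALL="$loc" grep -i -E '[b]at' "$tmpdir/heroes.txt"
done

# Count the matches in the C locale as a quick sanity check.
c_matches=$(env LC_ALL=C grep -c -i -E '[b]at' "$tmpdir/heroes.txt")
echo "C locale matches: $c_matches"

rm -rf "$tmpdir"
```

If the second run prints fewer lines than the first, the installed grep is affected.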

Last edited by Procyon (2008-10-20 00:58:20)


#3 2008-10-20 16:13:14

gnud
Member
Registered: 2005-11-27
Posts: 182

Re: grep -i with regex

That seems like a pretty bad bug? If the text file is in utf-8, that is.


#4 2008-10-21 17:09:34

rine
Member
From: Germany
Registered: 2008-03-04
Posts: 217

Re: grep -i with regex

It really is the locale.

LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE=C
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=

This is mine. I tried the same on a server from work and it works as expected, the locale there is:

LANG=de_DE@euro
LC_CTYPE=de_DE@euro
LC_NUMERIC="de_DE@euro"
LC_TIME="de_DE@euro"
LC_COLLATE="de_DE@euro"
LC_MONETARY="de_DE@euro"
LC_MESSAGES="de_DE@euro"
LC_PAPER="de_DE@euro"
LC_NAME="de_DE@euro"
LC_ADDRESS="de_DE@euro"
LC_TELEPHONE="de_DE@euro"
LC_MEASUREMENT="de_DE@euro"
LC_IDENTIFICATION="de_DE@euro"
LC_ALL=

Guess I won't use -i with grep anymore D:
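
The double quotes in the `locale` output mark values that are implied rather than set explicitly, so in the first listing only LANG and LC_COLLATE were actually set. A minimal sketch of the lookup order for a category such as LC_CTYPE (LC_ALL first, then the category variable, then LANG). The function name effective_ctype is made up for illustration, and the final fallback to C is an assumption, since the real default is implementation-defined.

```shell
#!/bin/sh
# Print the locale name that governs character classification (LC_CTYPE).
# Precedence: LC_ALL, then LC_CTYPE, then LANG; empty values count as unset.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed fallback; the real default is implementation-defined.
        printf 'C\n'
    fi
}

effective_ctype
```

This is why setting LANG=C alone only helps as long as neither LC_ALL nor LC_CTYPE is set explicitly.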


#5 2008-10-21 17:17:18

.:B:.
Forum Fellow
Registered: 2006-11-26
Posts: 5,819

Re: grep -i with regex

I think egrep is meant for working with regexes, though I could be wrong (egrep is grep with some switch).



#6 2008-10-21 19:37:01

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: grep -i with regex

@rine: alias grep='LANG=C grep' in .bashrc will do the trick.

@B: Yep it's -E, and just -E.
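
A hedged alternative to the alias, for places where aliases are not expanded (scripts, for example): a shell function that forces the C locale for grep only. It uses LC_ALL rather than LANG on the assumption that LC_ALL or LC_CTYPE may already be set, in which case LANG alone would have no effect. Keep in mind that grep then treats all input as C-locale text, so non-ASCII letters are no longer case-folded.

```shell
# Wrapper for ~/.bashrc: run the real grep with the C locale.
# `command` bypasses this function so that it does not call itself.
grep() {
    LC_ALL=C command grep "$@"
}

# The example from the first post, which should now print the line
# regardless of the session locale.
echo Batman | grep -i -E '[B]at'
```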


#7 2008-10-21 21:09:06

rine
Member
From: Germany
Registered: 2008-03-04
Posts: 217

Re: grep -i with regex

Procyon wrote:

@rine: alias grep='LANG=C grep' in .bashrc will do the trick.

@B: Yep it's -E, and just -E.

Ok, that works. Thanks.
@B: What Procyon said, egrep is deprecated in favor of grep -E.

Last edited by rine (2008-10-23 23:37:05)


#8 2008-10-21 21:11:52

rine
Member
From: Germany
Registered: 2008-03-04
Posts: 217

Re: grep -i with regex

I have done it differently now: I just export LC_ALL=C in my .zshrc. I think that's cleaner. There is still one issue, though (with all (non-)solutions). I have grep aliased as grep --color=auto. Now when I do

echo Batman | grep -i -E '[b]at'

the output is colorized. But when I use a capital B in the brackets, it's not colorized. Any ideas?

Last edited by rine (2008-10-23 23:40:28)


#9 2008-10-26 10:18:32

gnud
Member
Registered: 2005-11-27
Posts: 182

Re: grep -i with regex

I don't think that running with a different locale than your own is a fix. It's a glaring bug that you can't grep through multi-byte characters.
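
A small sketch of why forcing the C locale only hides the problem: in the C locale, bytes outside ASCII have no upper/lower-case relationship, so -i stops folding them. The sample word Ärger is made up for illustration, and the script is assumed to be saved and run as UTF-8.

```shell
#!/bin/sh
# ASCII letters are still case-folded in the C locale...
if printf 'Batman\n' | LC_ALL=C grep -q -i 'batman'; then
    ascii_result=match
else
    ascii_result=no-match
fi

# ...but a multi-byte UTF-8 letter such as Ä/ä is not, so -i misses it.
if printf 'Ärger\n' | LC_ALL=C grep -q -i 'ärger'; then
    umlaut_result=match
else
    umlaut_result=no-match
fi

echo "ASCII: $ascii_result, umlaut: $umlaut_result"
```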

