
#1 2007-11-29 00:12:12

sullivanva
Member
From: Herndon, VA USA
Registered: 2005-07-21
Posts: 126

problem with grep

Broken:

boogie:~> pacman -Q | grep grep
grep 2.5.3-2
boogie:~> echo j | grep "[A-Z]"
j
boogie:~>

Not broken:

froggie:~> pacman -Q | grep grep
grep 2.5.1a-2
froggie:~> echo j | grep "[A-Z]"
froggie:~>

--HAPS


#2 2007-11-29 00:22:39

Allan
Pacman
From: Brisbane, AU
Registered: 2007-06-09
Posts: 11,399

Re: problem with grep

I cannot replicate this here...


#3 2007-11-29 00:24:50

Cerebral
Forum Fellow
From: Waterloo, ON, CA
Registered: 2005-04-08
Posts: 3,108

Re: problem with grep

$ pacman -Q grep
grep 2.5.3-2
$ echo j | grep "[A-Z]"
$

Also can't reproduce.

-edit-
Totally off-topic, but sed works fine too.

$ echo j | sed -n "/[A-Z]/ p"
$ echo A | sed -n "/[A-Z]/ p"
A

Last edited by Cerebral (2007-11-29 00:27:25)


#4 2007-11-29 00:47:38

nj
Member
Registered: 2007-04-06
Posts: 93

Re: problem with grep

Any chance that you have grep aliased to 'grep -i' to ignore case?


#5 2007-11-29 01:52:10

delphiki
Member
Registered: 2007-11-17
Posts: 66

Re: problem with grep

@sullivanva

$ pacman -Q grep
2.5.3-2
$ echo j | grep "[A-Z]"
$

I cannot reproduce your error either.

Cheers,


#6 2007-11-29 06:29:36

drakosha
Member
Registered: 2006-01-03
Posts: 253

Re: problem with grep

You might try "egrep", which is part of the grep package and understands extended regular expressions. Running "which grep" should show which grep you are using; it might be some other executable, not the /usr/bin/grep one.
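To make the alias and "which grep" suggestions above concrete, here is a minimal sketch of how one might check what the name grep resolves to. The variable name grep_kind is only illustrative. Note that `which` looks only at executables on PATH, while `command -v` also reports aliases and shell functions.

```shell
# Ask the shell how it resolves the name "grep".
# For an alias this yields the alias definition; otherwise the executable's path.
grep_kind=$(command -v grep)
echo "grep resolves to: $grep_kind"

# `command` bypasses aliases and shell functions for a single invocation,
# so an alias such as grep='grep -i' cannot influence this test.
echo J | command grep "[A-Z]"
```

If the first line prints an alias definition containing -i, that alone would explain a lowercase match.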


#7 2007-11-29 08:31:19

kishd
Member
Registered: 2006-06-14
Posts: 401

Re: problem with grep

[kishd@dozer ~]$ pacman -Q | grep grep
grep 2.5.3-2

[kishd@dozer ~]$ echo j | grep "[A-Z]"
j

same here


---for there is nothing either good or bad, but only thinking makes it so....
Hamlet, W Shakespeare


#8 2007-11-29 08:33:27

mucknert
Member
From: Berlin // Germany
Registered: 2006-06-27
Posts: 510

Re: problem with grep

Hilarious! I have to try that when I get home.


Today's mistakes are tomorrow's catastrophes.


#9 2007-11-29 09:33:03

Nihathrael
Member
From: Freising, Germany
Registered: 2007-10-21
Posts: 82

Re: problem with grep

[nihathrael@reaper ~]$ pacman -Q | grep grep
grep 2.5.3-2
ngrep 1.45-3
[nihathrael@reaper ~]$ echo j | grep "[A-Z]"
j

Unknown Horizons - Open source real-time strategy game with the comfy Anno 1602 feeling!


#10 2007-11-29 10:19:55

mucknert
Member
From: Berlin // Germany
Registered: 2006-06-27
Posts: 510

Re: problem with grep

Works as intended here. No problem at all.

[~] % pacman -Q | grep grep
grep 2.5.3-2
[~] % echo j | grep "[A-Z]" 
[~] %

Last edited by mucknert (2007-11-29 10:21:26)




#11 2007-11-29 12:30:24

Sekre
Member
From: The Rainy North
Registered: 2006-11-24
Posts: 116

Re: problem with grep

Tried as well, no problems here either

~  $  pacman -Q | grep grep
grep 2.5.3-2
~  $  echo j | grep "[A-Z]"
~  $


#12 2007-11-29 12:43:25

shining
Pacman Developer
Registered: 2006-05-10
Posts: 2,043

Re: problem with grep

sullivanva and kishd, could you paste the output of locale?
Also try running grep with LANG=C.


pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))


#13 2007-11-29 12:48:37

dolby
Member
From: 1992
Registered: 2006-08-08
Posts: 1,581

Re: problem with grep

I get a j too.

LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8

There shouldn't be any reason to learn more editor types than emacs or vi -- mg (1)
You learn that sarcasm does not often work well in international forums.  That is why we avoid it. -- ewaller (arch linux forum moderator)


#14 2007-11-29 12:54:28

shining
Pacman Developer
Registered: 2006-05-10
Posts: 2,043

Re: problem with grep

It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.

> echo $LC_ALL
en_US.utf8
> echo j | grep "[A-Z]"
j
> unset LC_ALL
> echo j | grep "[A-Z]"
<nothing>

Last edited by shining (2007-11-29 12:55:45)




#15 2007-11-29 14:00:21

MrWeatherbee
Member
Registered: 2007-08-01
Posts: 277

Re: problem with grep

shining wrote:

It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.

I don't think it's a bug based on this documentation:

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, `[a-d]' is equivalent to `[abcd]'. Many locales sort characters in dictionary order, and in these locales `[a-d]' is typically not equivalent to `[abcd]'; it might be equivalent to `[aBbCcDd]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C'.

LC_COLLATE, I believe, is the really critical value. So, for example, in Dolby's case, unsetting LC_ALL by itself won't do any good if he has LC_COLLATE set to 'en_US.utf8'. LC_COLLATE needs to be set to LC_COLLATE=C, for example, to get the behavior that everyone is expecting from the posted 'grep' statement. Of course, setting LC_ALL=C would do the trick as well, but it might not really be what you want.
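As an aside to the explanation above, a locale-independent way to express "uppercase letter" is a POSIX named character class. This is a minimal sketch rather than something posted in the thread: the class [:upper:] is defined by character type, not by collation order, so it should behave the same whatever LC_COLLATE is set to.

```shell
# A named class does not depend on how the locale sorts characters,
# so a lowercase j is not matched by [[:upper:]].
echo j | grep "[[:upper:]]" || echo "no match for j"

# An uppercase J is matched and printed.
echo J | grep "[[:upper:]]"
```

This avoids changing the locale at all, which can matter when the rest of the pipeline relies on the locale's sorting or message language.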

Edit:

Forgot to post link to above quote:

http://www.gnu.org/software/grep/doc/grep_8.html#IDX178

Last edited by MrWeatherbee (2007-11-29 14:46:11)


#16 2007-11-29 14:22:34

kishd
Member
Registered: 2006-06-14
Posts: 401

Re: problem with grep

[kishd@dozer ~]$ locale
LANG=en_ZA.UTF8
LC_CTYPE="en_ZA.UTF8"
LC_NUMERIC="en_ZA.UTF8"
LC_TIME="en_ZA.UTF8"
LC_COLLATE="en_ZA.UTF8"
LC_MONETARY="en_ZA.UTF8"
LC_MESSAGES="en_ZA.UTF8"
LC_PAPER="en_ZA.UTF8"
LC_NAME="en_ZA.UTF8"
LC_ADDRESS="en_ZA.UTF8"
LC_TELEPHONE="en_ZA.UTF8"
LC_MEASUREMENT="en_ZA.UTF8"
LC_IDENTIFICATION="en_ZA.UTF8"
LC_ALL=
[kishd@dozer ~]$

Unsetting LC_COLLATE or LC_ALL does not seem to help

Last edited by kishd (2007-11-29 14:25:25)




#17 2007-11-29 14:29:15

MrWeatherbee
Member
Registered: 2007-08-01
Posts: 277

Re: problem with grep

kishd wrote:

Unsetting LC_COLLATE or LC_ALL does not seem to help

Don't just unset LC_COLLATE. As you have noticed, that doesn't work.

From my first post, you need to set LC_COLLATE=C (or set it to some other locale that provides the same collation scheme). Alternatively, you can affect all the variables by setting LC_ALL=C.
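The reason unsetting does not help is the lookup order for locale categories: LC_ALL wins if it is set and non-empty, then the category variable (here LC_COLLATE), then LANG, and only then the C default. The helper below, effective_collate, is a hypothetical name used purely to illustrate that order; it is not a standard command.

```shell
# Hypothetical helper: print the locale that ends up governing collation.
# Precedence: LC_ALL, then LC_COLLATE, then LANG, then the "C" default.
# An empty value is treated the same as an unset one.
effective_collate() {
    echo "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collate
```

With LC_ALL and LC_COLLATE both empty and LANG=en_ZA.UTF8, this prints en_ZA.UTF8, which matches the locale output above where LC_COLLATE is shown in quotes because it is derived from LANG.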

Last edited by MrWeatherbee (2007-11-29 14:47:22)


#18 2007-11-29 14:46:50

mico
Member
From: Slovenia
Registered: 2004-02-08
Posts: 247

Re: problem with grep

MrWeatherbee wrote:
shining wrote:

It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.

I don't think it's a bug based on this documentation:

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, `[a-d]' is equivalent to `[abcd]'. Many locales sort characters in dictionary order, and in these locales `[a-d]' is typically not equivalent to `[abcd]'; it might be equivalent to `[aBbCcDd]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C'.

LC_COLLATE, I believe, is the really critical value. So, for example, in Dolby's case, unsetting LC_ALL by itself won't do any good because he has LC_COLLATE set to 'en_US.utf8'. LC_COLLATE needs to be set to LC_COLLATE=C, for example, to get the behavior that everyone is expecting from the posted 'grep' statement. Of course, setting LC_ALL=C would do the trick as well, but it might not really be what you want.

Edit:

Forgot to post link to above quote:

http://www.gnu.org/software/grep/doc/grep_8.html#IDX178

Despite this, I think there is a bug in grep:

$ pacman -Q grep
grep 2.5.3-2
$ echo J | grep --color=always [A-Z]
J
$ echo j | grep --color=always [A-Z]
j

The matched string is supposed to be colored (red, specifically) when grep prints matching lines. The uppercase J is printed in red, as it should be, but the lowercase j is printed in the normal terminal color, even though its line was selected as a match.


#19 2007-11-29 15:11:25

Sekre
Member
From: The Rainy North
Registered: 2006-11-24
Posts: 116

Re: problem with grep

About LC_COLLATE:
setting LC_COLLATE=C does indeed make this problem go away; unsetting it makes [A-Z] match lowercase as well.

mico: I can confirm your test as well; it did not color the lowercase 'j' when using the [A-Z] range.

I'm in deep water here, but I hope someone knows what's going on "behind the scenes". I could probably learn something good from this, I hope so at least.

$ echo J | grep --color=always [A-Z]
J (orange)
$ echo j | grep --color=always [A-Z]
j (white)
$ echo j | grep --color=always -i [A-Z]
j (orange)
$ echo j | grep --color=always [a-z]
j (orange)


#20 2007-11-29 15:19:22

kishd
Member
Registered: 2006-06-14
Posts: 401

Re: problem with grep

[kishd@dozer ~]$ locale
LANG=en_US.UTF8
LC_CTYPE="en_US.UTF8"
LC_NUMERIC="en_US.UTF8"
LC_TIME="en_US.UTF8"
LC_COLLATE="en_US.UTF8"
LC_MONETARY="en_US.UTF8"
LC_MESSAGES="en_US.UTF8"
LC_PAPER="en_US.UTF8"
LC_NAME="en_US.UTF8"
LC_ADDRESS="en_US.UTF8"
LC_TELEPHONE="en_US.UTF8"
LC_MEASUREMENT="en_US.UTF8"
LC_IDENTIFICATION="en_US.UTF8"
LC_ALL=
[kishd@dozer ~]$ echo j | grep "[A-Z]"
j
[kishd@dozer ~]$

LC_ALL= unset but still get the j

Last edited by kishd (2007-11-29 15:19:57)




#21 2007-11-29 15:21:51

shining
Pacman Developer
Registered: 2006-05-10
Posts: 2,043

Re: problem with grep

MrWeatherbee wrote:
shining wrote:

It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.

I don't think it's a bug based on this documentation:

You are right, my mistake. I actually already learned this in the past and then forgot.
It's indeed only LC_COLLATE which matters, and it's not a bug.




#22 2007-11-29 15:35:38

Cerebral
Forum Fellow
From: Waterloo, ON, CA
Registered: 2005-04-08
Posts: 3,108

Re: problem with grep

kishd wrote:

LC_ALL= unset but still get the j

You're not supposed to UNSET it.  You're supposed to set it to C.

LC_COLLATE=C
or
LC_ALL=C
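For completeness, a minimal sketch of applying either setting. Prefixing the assignment to the command changes the locale for that one grep invocation only, which is usually the least intrusive option; the commented-out export is an assumption about how one might make it persistent, not something recommended in the thread.

```shell
# Override the locale for this single grep invocation only.
# In the C locale, [A-Z] covers only the uppercase ASCII letters.
echo j | LC_ALL=C grep "[A-Z]" || echo "no match under LC_ALL=C"

# Uppercase input still matches.
echo J | LC_ALL=C grep "[A-Z]"

# To change collation for the rest of the session instead, one could use:
#   export LC_COLLATE=C
# This has no effect while LC_ALL is set to a non-empty value.
```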

Last edited by Cerebral (2007-11-29 15:36:53)


#23 2007-11-29 16:39:52

Bison
Member
From: Jacksonville, FL
Registered: 2006-04-12
Posts: 158

Re: problem with grep

I wasn't aware that [A-Z] was a valid grep string.  I believe egrep (or simply grep -E) handles POSIX extended regular expressions.


#24 2007-11-29 16:47:54

F
Member
Registered: 2006-10-09
Posts: 322

Re: problem with grep

Looks fine on my end.

[f|~]% pacman -Q | grep grep
grep 2.5.3-1
ngrep 1.45-3.1
[f|~]% echo j
j
[f|~]% echo j | grep "[A-Z]"
[f|~]% echo j | grep "[a-z]"
j
[f|~]%


#25 2007-11-29 16:48:59

byte
Member
From: Düsseldorf (DE)
Registered: 2006-05-01
Posts: 2,046

Re: problem with grep

So far I only use egrep when I need to match alternative strings (OR), like egrep 'this|that' file; for everything else, normal grep suffices.
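A short sketch of the alternation case mentioned above, using made-up sample input. grep -E is the same as egrep; the second form shows that plain grep can also express OR by repeating -e, with no extended syntax needed.

```shell
# Extended syntax: | separates alternatives (same as egrep 'this|that').
printf 'this\nother\nthat\n' | grep -E 'this|that'

# Basic syntax: give each alternative its own -e pattern.
printf 'this\nother\nthat\n' | grep -e 'this' -e 'that'
```

Both commands print the lines "this" and "that" and skip "other".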


1000


