Broken:
boogie:~> pacman -Q | grep grep
grep 2.5.3-2
boogie:~> echo j | grep "[A-Z]"
j
boogie:~>
Not broken:
froggie:~> pacman -Q | grep grep
grep 2.5.1a-2
froggie:~> echo j | grep "[A-Z]"
froggie:~>
--HAPS
Offline
I can not replicate this here...
Online
$ pacman -Q grep
grep 2.5.3-2
$ echo j | grep "[A-Z]"
$
Also can't reproduce.
-edit-
Totally offtopic, but sed works fine too.
$ echo j | sed -n "/[A-Z]/ p"
$ echo A | sed -n "/[A-Z]/ p"
A
Last edited by Cerebral (2007-11-29 00:27:25)
Offline
Any chance that you have grep aliased to 'grep -i' to ignore case?
Offline
@sullivanva
$ pacman -Q grep
2.5.3-2
$ echo j | grep "[A-Z]"
$
I cannot reproduce your error either.
Cheers,
Offline
You might try "egrep", which is part of the grep package and understands extended regular expressions. "which grep" should show which grep you are using; it might be some other executable, not the one in /usr/bin/grep.
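For anyone following along, here is a minimal sketch of that check, assuming a POSIX shell. "command -v" reports how the shell resolves the name, and running grep through "command" skips aliases and shell functions, so an alias such as 'grep -i' cannot interfere. What it prints depends on your system and locale.

```shell
# Print how the shell resolves the name "grep". An alias shows up as
# "alias grep=...", a file on disk as its full path.
resolved=$(command -v grep)
echo "grep resolves to: $resolved"

# "command" skips aliases and shell functions, so this runs the grep
# found on PATH with no extra options such as -i.
if echo j | command grep -q "[A-Z]"; then
    echo "lowercase j matched [A-Z]"
else
    echo "lowercase j did not match [A-Z]"
fi
```

If the second part still reports a match, the alias theory is ruled out and the cause lies elsewhere.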
Offline
[kishd@dozer ~]$ pacman -Q | grep grep
grep 2.5.3-2
[kishd@dozer ~]$ echo j | grep "[A-Z]"
j
same here
---for there is nothing either good or bad, but only thinking makes it so....
Hamlet, W Shakespeare
Offline
Hilarious! I have to try that when I get home.
Today's mistakes are tomorrow's catastrophes.
Offline
[nihathrael@reaper ~]$ pacman -Q | grep grep
grep 2.5.3-2
ngrep 1.45-3
[nihathrael@reaper ~]$ echo j | grep "[A-Z]"
j
Unknown Horizons - Open source real-time strategy game with the comfy Anno 1602 feeling!
Offline
Works as intended here. No problem at all.
[~] % pacman -Q | grep grep
grep 2.5.3-2
[~] % echo j | grep "[A-Z]"
[~] %
Last edited by mucknert (2007-11-29 10:21:26)
Offline
Tried as well, no problems here either
~ $ pacman -Q | grep grep
grep 2.5.3-2
~ $ echo j | grep "[A-Z]"
~ $
Offline
sullivanva and kishd, could you paste the output of locale?
And try with LANG=C grep.
pacman roulette : pacman -S $(pacman -Slq | LANG=C sort -R | head -n $((RANDOM % 10)))
Offline
I get a j too
LANG=en_US.utf8
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8
There shouldn't be any reason to learn more editor types than emacs or vi -- mg (1)
You learn that sarcasm does not often work well in international forums. That is why we avoid it. -- ewaller (arch linux forum moderator)
Offline
It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.
> echo $LC_ALL
en_US.utf8
> echo j | grep "[A-Z]"
j
> unset LC_ALL
> echo j | grep "[A-Z]"
<nothing>
Last edited by shining (2007-11-29 12:55:45)
Offline
It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.
I don't think it's a bug based on this documentation:
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, `[a-d]' is equivalent to `[abcd]'. Many locales sort characters in dictionary order, and in these locales `[a-d]' is typically not equivalent to `[abcd]'; it might be equivalent to `[aBbCcDd]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C'.
LC_COLLATE, I believe, is the really critical value. So, for example, in Dolby's case, unsetting LC_ALL by itself won't do any good if he has LC_COLLATE set to 'en_US.utf8'. LC_COLLATE needs to be set to LC_COLLATE=C, for example, to get the behavior that everyone is expecting from the posted 'grep' statement. Of course, setting LC_ALL=C would do the trick as well, but it might not really be what you want.
Edit:
Forgot to post link to above quote:
http://www.gnu.org/software/grep/doc/grep_8.html#IDX178
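A small sketch of what the quoted documentation implies, assuming GNU grep or any POSIX grep. The first command forces the C locale for a single invocation; the second uses the POSIX character class [[:upper:]], which is defined by character type (LC_CTYPE) rather than by sort order, so it does not depend on the collating sequence at all.

```shell
# In the C locale a range is evaluated by byte value, so [A-Z] is
# exactly the 26 ASCII uppercase letters.
c_range=$(printf 'j\nJ\n' | LC_ALL=C grep "[A-Z]")
echo "C locale, [A-Z]: $c_range"

# [[:upper:]] is a POSIX character class. It asks "is this an uppercase
# letter?" instead of "does this sort between A and Z?".
class=$(printf 'j\nJ\n' | grep "[[:upper:]]")
echo "[[:upper:]]: $class"
```

Both commands should print only the uppercase J. The character class is usually the better choice in scripts, because it keeps working when the user's locale changes.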
Last edited by MrWeatherbee (2007-11-29 14:46:11)
Offline
[kishd@dozer ~]$ locale
LANG=en_ZA.UTF8
LC_CTYPE="en_ZA.UTF8"
LC_NUMERIC="en_ZA.UTF8"
LC_TIME="en_ZA.UTF8"
LC_COLLATE="en_ZA.UTF8"
LC_MONETARY="en_ZA.UTF8"
LC_MESSAGES="en_ZA.UTF8"
LC_PAPER="en_ZA.UTF8"
LC_NAME="en_ZA.UTF8"
LC_ADDRESS="en_ZA.UTF8"
LC_TELEPHONE="en_ZA.UTF8"
LC_MEASUREMENT="en_ZA.UTF8"
LC_IDENTIFICATION="en_ZA.UTF8"
LC_ALL=
[kishd@dozer ~]$
Unsetting LC_COLLATE or LC_ALL does not seem to help
Last edited by kishd (2007-11-29 14:25:25)
Offline
Unsetting LC_COLLATE or LC_ALL does not seem to help
Don't just unset LC_COLLATE. As you have noticed, that doesn't work.
From my first post, you need to set LC_COLLATE=C (or set it to some other locale that provides the same collation scheme). Alternatively, you can affect all the variables by setting LC_ALL=C.
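To illustrate why unsetting alone does not help, here is a rough sketch of the lookup order for collation as POSIX describes it: LC_ALL wins if it is non-empty, then LC_COLLATE, then LANG. The function name effective_collate is made up for this example, and the final fallback to "C" is an assumption (the standard leaves that default to the implementation).

```shell
# Print the locale that governs collation, following the POSIX
# precedence: LC_ALL first, then LC_COLLATE, then LANG.
# effective_collate is a hypothetical helper, not a standard command.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        echo "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        echo "C"   # assumed default when nothing is set
    fi
}

# Subshells keep the experiments away from the caller's environment.
# LC_ALL empty and LC_COLLATE unset: LANG still decides.
( LC_ALL=; unset LC_COLLATE; LANG=en_US.UTF8; effective_collate )  # prints en_US.UTF8

# LC_COLLATE set explicitly: it overrides LANG.
( LC_ALL=; LC_COLLATE=C; LANG=en_US.UTF8; effective_collate )      # prints C
```

The first case matches the locale output posted above: with LC_ALL empty and nothing overriding LC_COLLATE, the value falls through to LANG, so the dictionary-order collation stays in effect.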
Last edited by MrWeatherbee (2007-11-29 14:47:22)
Offline
shining wrote: It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.
I don't think it's a bug based on this documentation:
Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, `[a-d]' is equivalent to `[abcd]'. Many locales sort characters in dictionary order, and in these locales `[a-d]' is typically not equivalent to `[abcd]'; it might be equivalent to `[aBbCcDd]', for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value `C'.
LC_COLLATE, I believe, is the really critical value. So, for example, in Dolby's case, unsetting LC_ALL by itself won't do any good because he has LC_COLLATE set to 'en_US.utf8'. LC_COLLATE needs to be set to LC_COLLATE=C, for example, to get the behavior that everyone is expecting from the posted 'grep' statement. Of course, setting LC_ALL=C would do the trick as well, but it might not really be what you want.
Edit:
Forgot to post link to above quote:
Despite this, I think there is a bug in grep:
$ pacman -Q grep
grep 2.5.3-2
$ echo J | grep --color=always [A-Z]
J
$ echo j | grep --color=always [A-Z]
j
The matched string is supposed to be colored (red, specifically) when grep prints matching lines. The uppercase J is printed in red, as it should be, but the lowercase j is printed in the normal terminal color.
Offline
About LC_COLLATE:
setting LC_COLLATE=C does indeed make this problem go away, while unsetting it makes [A-Z] match lowercase as well.
mico: I can confirm your test as well. The lowercase 'j' was indeed not colored when using the [A-Z] range.
I'm in deep water here, but I hope someone knows what's going on "behind the scenes". I could probably learn something good from this, I hope so at least.
$ echo J | grep --color=always [A-Z]
J (orange)
$ echo j | grep --color=always [A-Z]
j (white)
$ echo j | grep --color=always -i [A-Z]
j (orange)
$ echo j | grep --color=always [a-z]
j (orange)
Offline
[kishd@dozer ~]$ locale
LANG=en_US.UTF8
LC_CTYPE="en_US.UTF8"
LC_NUMERIC="en_US.UTF8"
LC_TIME="en_US.UTF8"
LC_COLLATE="en_US.UTF8"
LC_MONETARY="en_US.UTF8"
LC_MESSAGES="en_US.UTF8"
LC_PAPER="en_US.UTF8"
LC_NAME="en_US.UTF8"
LC_ADDRESS="en_US.UTF8"
LC_TELEPHONE="en_US.UTF8"
LC_MEASUREMENT="en_US.UTF8"
LC_IDENTIFICATION="en_US.UTF8"
LC_ALL=
[kishd@dozer ~]$ echo j | grep "[A-Z]"
j
[kishd@dozer ~]$
LC_ALL= unset but still get the j
Last edited by kishd (2007-11-29 15:19:57)
Offline
shining wrote: It's because of LC_ALL, it wasn't set here, and it worked fine.
I set it and I got the bug.
I don't think it's a bug based on this documentation:
You are right, my mistake. I actually already learned this in the past and then forgot.
It's indeed only LC_COLLATE which matters, and it's not a bug.
Offline
LC_ALL= unset but still get the j
You're not supposed to UNSET it. You're supposed to set it to C.
LC_COLLATE=C
or
LC_ALL=C
Last edited by Cerebral (2007-11-29 15:36:53)
Offline
I wasn't aware that [A-Z] was a valid grep string. I believe egrep (or simply grep -E) handles POSIX extended regular expressions.
Offline
Looks fine on my end.
[f|~]% pacman -Q | grep grep
grep 2.5.3-1
ngrep 1.45-3.1
[f|~]% echo j
j
[f|~]% echo j | grep "[A-Z]"
[f|~]% echo j | grep "[a-z]"
j
[f|~]%
Offline
So far I only use egrep when I need to find alternating strings (OR), like egrep 'this|that' file. For everything else, normal grep suffices.
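A short sketch of the difference, assuming any POSIX grep. Bracket expressions such as [A-Z] belong to basic regular expressions, so plain grep handles them. Alternation with | is extended syntax, available through grep -E (the option form of egrep) or, in plain grep, by passing one -e per pattern. The sample input lines are invented for the example.

```shell
# Bracket expressions work in plain (basic) grep. LC_ALL=C pins the
# range to the ASCII uppercase letters.
basic=$(printf 'abc\nABC\n' | LC_ALL=C grep "[A-Z]")
echo "$basic"

# Alternation needs extended syntax: grep -E behaves like egrep.
extended=$(printf 'this\nthat\nother\n' | grep -E 'this|that')
echo "$extended"

# The same OR search in plain grep, using one -e option per pattern.
multi=$(printf 'this\nthat\nother\n' | grep -e 'this' -e 'that')
echo "$multi"
```

The last two commands select the same lines, so egrep is a convenience here rather than a requirement.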
Offline