regex/grep word boundary (\b) bogusly matches utf8 character

Dieter@be · 2011-02-03 09:46:53

$ echo $LANG
en_US.UTF-8
$ grep "^stress\b" data/bom-nerfile | uniq
stress    O
stressÉ    O
stress    O

Why does the second result show up, and how can I prevent it?

Procyon · 2011-02-03 11:20:39

http://savannah.gnu.org/bugs/?29537

Here is an extended regexp that does almost the same thing. Unlike \b, it consumes the character after the word, and [:punct:] includes _, which \b treats as a word character and so does not match as a boundary.
grep -E '^stress([[:blank:][:punct:]]|$)'

Or sed:
sed -n '/^stress\b/p'
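
To see how the grep -E workaround above behaves, here is a minimal sketch. The sample lines are made up (the real data/bom-nerfile isn't shown in full), and the stress_test line is only there to illustrate the underscore difference.

```shell
# Hypothetical sample input standing in for data/bom-nerfile.
sample=$(mktemp)
printf 'stress\tO\nstressÉ\tO\nstress_test\tO\nstress\n' > "$sample"

# Keep lines where "stress" is followed by a blank, punctuation,
# or the end of the line. É is neither blank nor punctuation, so it is skipped.
matches=$(grep -E '^stress([[:blank:][:punct:]]|$)' "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

This should print both plain stress lines plus stress_test, and skip stressÉ. A correctly working \b would not match stress_test, because _ counts as a word character.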

Arch Linux
