#1 2011-02-03 09:46:53

Dieter@be
Forum Fellow
From: Belgium
Registered: 2006-11-05
Posts: 2,004

regex/grep word boundary (\b) bogusly matches utf8 character

$ echo $LANG
en_US.UTF-8
$ grep "^stress\b" data/bom-nerfile | uniq
stress    O
stressÉ    O
stress    O

Why does the second result show up, and how can I prevent it?


< Daenyth> and he works prolifically
4 8 15 16 23 42

#2 2011-02-03 11:20:39

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: regex/grep word boundary (\b) bogusly matches utf8 character

http://savannah.gnu.org/bugs/?29537

Here is an extended regexp that does almost the same thing. It consumes the character after the word, though, and [:punct:] includes _, which \b does not treat as a word boundary.
grep -E '^stress([[:blank:][:punct:]]|$)'

Or sed:
sed -n '/^stress\b/p'
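
To see how the grep -E alternative behaves, here is a small sketch. The sample lines are made up for illustration (modelled on the output in the first post, with a tab as the separator, plus a hypothetical "stress_test" line to show the underscore caveat); they are not the real contents of data/bom-nerfile.

```shell
#!/bin/sh
# Hypothetical sample data; the real bom-nerfile is not shown in this thread.
sample=$(printf 'stress\tO\nstressÉ\tO\nstress_test\tO\nstress\n')

# Require a blank, a punctuation character, or end of line right after "stress".
# "stressÉ" is rejected because É is none of those.
# "stress_test" is accepted because _ is in [:punct:], which \b would not allow.
matches=$(printf '%s\n' "$sample" | grep -E '^stress([[:blank:][:punct:]]|$)')

printf '%s\n' "$matches"
```

This should print the two plain "stress" lines and the "stress_test" line, but not "stressÉ". If underscores matter in your data, swap [:punct:] for an explicit list of the separators you actually expect.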
