I'm stuck on the following problem and can't seem to solve it without lots of ifs and elses.
I've got a program that you can pass patterns as parameters to. The program receives patterns as one single string.
The string could look like this:
a:i:foo r::bar t:ei:bark
or like this:
a:i:foo
What I'm getting at is that the string consists of several parts with the same structure. Each structure can be matched and saved with:
([art]:[ei]{0,2}:.*)
Now I want my regular expression to match all the occurrences without first checking the string for something that indicates how many structures it contains. The following does not seem to work:
([art]:[ei]{0,2}:.*)+
So now I'm looking for something that matches one or more occurrences of the structure and saves each of them for later use.
I'd be really happy if someone could help me out here.
Last edited by n0stradamus (2012-05-03 20:27:02)
Offline
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/'
1 r::bar t:ei:bark
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/g'
1 1 1
If [^ ]* is not usable (spaces are allowed arbitrarily), you need a non-greedy .* and non-consuming look-ahead of " [art]:"
In Python's re module, this is .*?(?=( [art]:|$))
>>> import re
>>> m=re.findall("([art]:[ei]{0,2}:.*?(?=( [art]:|$)))","a:i:foo r::bar t:ei:bark")
>>> print(m)
[('a:i:foo', ' r:'), ('r::bar', ' t:'), ('t:ei:bark', '')]
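Each tuple in the output above has two entries because the look-ahead alternation ( [art]:|$) is itself a capturing group. A minimal sketch, assuming only the structures themselves are wanted: drop the capturing parentheses so that re.findall returns plain strings. The sample text with a space inside a value ("foo bar") is made up for illustration.

```python
import re

# Hypothetical input: the first value contains a space, so [^ ]* would cut it short.
text = "a:i:foo bar r::baz t:ei:bark"

# No capturing groups: findall returns each whole match as a string.
# .*? is non-greedy and stops as soon as the look-ahead sees " [art]:" or end of string.
pattern = r"[art]:[ei]{0,2}:.*?(?= [art]:|$)"

parts = re.findall(pattern, text)
print(parts)
```

Note that a value which itself contains something like " a:" would still be split at that point; the look-ahead cannot tell it apart from the start of the next structure.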
Offline
The following does not seem to work:
([art]:[ei]{0,2}:.*)+
I suspect the .* is matching EVERYTHING that remains, so there is nothing left for a second repetition to match. Have you tried replacing the . with [^ ], like this?
([art]:[ei]{0,2}:[^ ]*)+
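The suspicion about the greedy .* can be checked with a small sketch. It uses Python's re module purely for illustration (the program in question is written in C), and it assumes values never contain spaces. It also collects the structures with re.findall rather than a trailing +, since a repeated group would only keep its last repetition and the pattern has nothing that matches the separating spaces.

```python
import re

text = "a:i:foo r::bar t:ei:bark"

# Greedy .* runs to the end of the string, so the first repetition
# of the group already swallows all three structures.
greedy = re.search(r"([art]:[ei]{0,2}:.*)+", text)
print(greedy.group(1))

# [^ ]* stops at the next space, so each structure is matched on its own.
parts = re.findall(r"[art]:[ei]{0,2}:[^ ]*", text)
print(parts)
```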
"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman
Offline
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/'
1 r::bar t:ei:bark
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/g'
1 1 1
If [^ ]* is not usable (spaces are allowed arbitrarily), you need a non-greedy .* and non-consuming look-ahead of " [art]:"
In python's re module, this is .*?(?=( [art]:|$))
>>> import re
>>> m=re.findall("([art]:[ei]{0,2}:.*?(?=( [art]:|$)))","a:i:foo r::bar t:ei:bark")
>>> print(m)
[('a:i:foo', ' r:'), ('r::bar', ' t:'), ('t:ei:bark', '')]
Exactly what I was looking for! I didn't know that you could specify .* to stop at a certain sequence of characters.
Could you please point me to some materials where I can read up on the topic?
Back to the regex: it works fine in Python, but sadly that is not the language I'm using.
The program I need this for is written in C, and until now the regex functions from glibc have worked fine for me.
Have I missed a function similar to re.findall in glibc?
Offline
Short description of [^ ] here:
http://www.grymoire.com/Unix/Regular.html#uh-6
I have no idea about glibc.
Offline