#1 2012-05-03 20:25:24

n0stradamus
Member
Registered: 2010-11-08
Posts: 94

[Regular Expressions] Saving a variable number of matches

I'm stuck with the following problem and I don't seem to be able to solve it without lots of ifs and elses.
I've got a program that takes patterns as parameters. It receives all of the patterns as one single string.
The string could look like this:

a:i:foo r::bar t:ei:bark

or like this:

a:i:foo

What I'm hinting at is that the string consists of several parts with the same structure. Each part can be matched and saved with:

([art]:[ei]{0,2}:.*)

Now I want my regular expression to match all the occurrences, without first checking the string for something that would indicate how many structures it contains. The following does not seem to work:

([art]:[ei]{0,2}:.*)+

So now I'm looking for something that would match one or more occurrences of the structure and save each of them for future use.
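
For illustration only, here is a minimal Python sketch of why the repeated group falls short. Python is an assumption here (the program in question may use a different regex engine), but most engines behave the same way on these two points: a greedy .* runs to the end of the string on the first pass, and a repeated capture group keeps only the text of its final iteration.

```python
import re

text = "a:i:foo r::bar t:ei:bark"

# Greedy .* consumes the rest of the string, so the group matches once
# and its capture contains all three structures glued together.
greedy = re.match(r"([art]:[ei]{0,2}:.*)+", text)
print(greedy.group(1))

# Even when each repetition stops at a space, the capture group is
# overwritten on every iteration and only the last one survives.
repeated = re.match(r"(?:([art]:[ei]{0,2}:[^ ]*) ?)+", text)
print(repeated.group(1))
```

So a single repeated group is unlikely to hand back a variable number of saved matches; the pattern has to be applied once per structure instead.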

I'd be really happy if someone could help me out here :)

Last edited by n0stradamus (2012-05-03 20:27:02)

#2 2012-05-03 21:12:58

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [Regular Expressions] Saving a variable number of matches

--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/'
1 r::bar t:ei:bark
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/g'
1 1 1

If [^ ]* is not usable (spaces are allowed arbitrarily), you need a non-greedy .* and non-consuming look-ahead of " [art]:"
In Python's re module, this is .*?(?=( [art]:|$))

>>> import re
>>> m=re.findall("([art]:[ei]{0,2}:.*?(?=( [art]:|$)))","a:i:foo r::bar t:ei:bark")
>>> print(m)
[('a:i:foo', ' r:'), ('r::bar', ' t:'), ('t:ei:bark', '')]
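
The tuples in that output come from the capturing group inside the look-ahead. As a hedged variant (not the pattern posted above, just a sketch of the same idea), dropping the outer group and leaving the look-ahead alternation ungrouped makes re.findall return a flat list of strings. The second input is a made-up example with a space inside a value.

```python
import re

# Same idea as above, but with no capturing groups at all, so that
# re.findall returns the whole match for each structure.
pattern = r"[art]:[ei]{0,2}:.*?(?= [art]:|$)"

matches = re.findall(pattern, "a:i:foo r::bar t:ei:bark")
print(matches)

# Hypothetical input where a value itself contains a space.
with_spaces = re.findall(pattern, "a:i:foo bar r::baz")
print(with_spaces)
```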

#3 2012-05-03 22:52:31

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 30,330

Re: [Regular Expressions] Saving a variable number of matches

n0stradamus wrote:

The following does not seem to work:

([art]:[ei]{0,2}:.*)+

I suspect the .* is matching EVERYTHING that remains, so there is nothing left to match. Have you tried replacing the . with [^ ], like this?

([art]:[ei]{0,2}:[^ ]*)+

"UNIX is simple and coherent" - Dennis Ritchie; "GNU's Not Unix" - Richard Stallman

#4 2012-05-04 08:19:10

n0stradamus
Member
Registered: 2010-11-08
Posts: 94

Re: [Regular Expressions] Saving a variable number of matches

Procyon wrote:
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/'
1 r::bar t:ei:bark
--> echo "a:i:foo r::bar t:ei:bark" | sed 's/\([art]:[ei]\{0,2\}:[^ ]*\)/1/g'
1 1 1

If [^ ]* is not usable (spaces are allowed arbitrarily), you need a non-greedy .* and non-consuming look-ahead of " [art]:"
In Python's re module, this is .*?(?=( [art]:|$))

>>> import re
>>> m=re.findall("([art]:[ei]{0,2}:.*?(?=( [art]:|$)))","a:i:foo r::bar t:ei:bark")
>>> print(m)
[('a:i:foo', ' r:'), ('r::bar', ' t:'), ('t:ei:bark', '')]

Exactly what I was looking for! I didn't know that you could specify .* to stop at a certain sequence of characters.
Could you please point me to some materials where I can read up on the topic?

Back to the regex: it works fine in Python, but sadly that is not the language I'm using :D
The program I need this for is written in C and until now the regex functions from glibc worked fine for me.
Have I missed a function similar to re.findall in glibc?
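
As far as I can tell, the POSIX interface in glibc (regcomp/regexec) has no direct counterpart to re.findall, and POSIX patterns support neither lazy quantifiers nor look-ahead, so the [^ ]* form would be the one to use there. The usual workaround is to run the matcher in a loop, restarting each search where the previous match ended. Below is a rough Python sketch of that loop; find_all is a hypothetical helper, and in C the same shape would presumably use regexec with the rm_eo offset from the previous match.

```python
import re

def find_all(pattern, text):
    """Collect group 1 of every non-overlapping match by searching
    again from the end of the previous match (hypothetical helper)."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append(m.group(1))
        # Always move forward, even after an empty match, so the loop ends.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return results

print(find_all(r"([art]:[ei]{0,2}:[^ ]*)", "a:i:foo r::bar t:ei:bark"))
```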

#5 2012-05-04 10:13:14

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: [Regular Expressions] Saving a variable number of matches

Short description of [^ ] here:
http://www.grymoire.com/Unix/Regular.html#uh-6

I have no idea about glibc.
