#1 2010-06-03 01:43:51

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,799

[SOLVED]perl regex matches twice in one line of text; i want first mat

echo 'anything<h1>something</h1>anything<h1>somethingelse</h1>anything' | perl -pe 's/.*\<h1\>(.*)\<\/h1\>.*/\1/g'
somethingelse

i want "something" (i.e. the first match left-to-right).  is there a regex that will accomplish this?

the end goal is to use this in php's preg_replace function, but it's easier to test and play with using the perl one-liner above.

Last edited by brisbin33 (2010-06-03 14:24:24)

#2 2010-06-03 02:50:18

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Oops, was too quick to respond.  I don't think I understand the question.  Let me ponder for a while...

Last edited by Trent (2010-06-03 02:51:52)

#3 2010-06-03 03:01:56

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Why are you parsing html with regex? Don't do that!

http://stackoverflow.com/questions/5907 … ml-why-not

#4 2010-06-03 12:31:26

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

You want a non-greedy quantifier (?) on the first .* so that it stops at the first <h1> instead of the last.  And the /g is unnecessary if you want only the first match.

s/.*?\<h1\>(.*)\<\/h1\>.*/\1/

#5 2010-06-03 12:46:05

rson451
Member
From: Annapolis, MD USA
Registered: 2007-04-15
Posts: 1,233

This works, but Daenyth is right here.

echo 'anything<h1>something</h1>anything<h1>somethingelse</h1>anything' | perl -pe 's/.*?\<h1\>(.*?)\<\/h1\>.*/\1/'

#6 2010-06-03 13:55:57

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,799

@Trent, i could only get it to work by adding a second ? inside the ( ), i.e. (.*?) instead of (.*).

@rson, nicely done, see above :)

@Daenyth, i'll be looking into DOMs and whatnot, but in my simple situation, this regex will work well.

Last edited by brisbin33 (2010-06-03 14:09:18)

#7 2010-06-03 17:41:33

Trent
Member
From: Baltimore, MD (US)
Registered: 2009-04-16
Posts: 990

Yep, that's right.  I think I assumed the capture group was already non-greedy, despite the obvious lack of a ? on it.  I have no excuse. :)

#8 2010-06-03 18:04:51

brisbin33
Member
From: boston, ma
Registered: 2008-07-24
Posts: 1,799

just as additional info for those that want it: i looked into PHP5's out-of-the-box DOM stuff.  it's quite easy to use, but you can't extract tags that sit inside comments (which i need).

http://simplehtmldom.sourceforge.net/manual.htm is a sweet drop-in script that works really well (i'll be using it in general going forward, Daenyth :P).

i found it b/c their manual specifically mentions a special feature where $html->find('comment') will return all comments, within which i could then $comment->find('my custom tag').

sadly, it doesn't work: no output.  i made a simple test script and emailed the writer of this software.  we'll see if i'm dumb or it needs fixing.

cheers!
