[kinda ot] help with sed or awk, whatever suits you better

Phrodo_00 · 2007-06-01 04:14:39

Hi, I've been reading a lot and trying, but I still can't do it. What I want is to delete everything in a file that's formatted like

<td class="j">something</td>

and replace every

<td class="e">another thing</td>

by

<p>another thing</p>

.
The closest I got to the former was with

sed '/<td class="j">/,/<\/td>/d'

, but it looks like it was too greedy with the matching, as it deleted way more than it should have.
Thank you.
(Of course, if you provide a Python or Ruby script or whatever that does this, it's as welcome as sed or awk. I don't care which tool I use, I just want to get this done.)

samlt · 2007-06-01 06:37:07

Well, since we don't know whether 'something' or 'another thing' can also contain other (similar) tags, or span multiple lines, the easiest way to do it is to replace <td class="."> with <p> and </td> with </p> one by one, rather than matching the whole thing ( <td class> blahblah </td>) at once:

sed -e 's#<td class=".">#<p>#g' -e 's#</td>#</p>#g'

tada!
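For comparison with the Python version further down, here is a minimal sketch of the same two substitutions in Python. The function name swap_tags and the sample text are made up for illustration. Note that the "." matches any single character, so this rewrites the class="j" cells into <p> as well instead of deleting them.

```python
import re

def swap_tags(text):
    """Mirror the two sed substitutions: rewrite the tags, leave contents alone."""
    # "." matches any one character, so class="j" and class="e" both match.
    text = re.sub(r'<td class=".">', '<p>', text)
    # The closing tag is a fixed string, so a plain replace is enough.
    return text.replace('</td>', '</p>')

# Hypothetical sample input, one cell per line.
sample = '<td class="j">something</td>\n<td class="e">another thing</td>\n'
print(swap_tags(sample), end='')
```

If the class="j" cells really have to go, they need to be removed before this step.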

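EDIT: btw, sed '/<td class="j">/,/<\/td>/d' doesn't work because it deletes a whole range of lines: it starts at a line containing <td class="j"> and runs through the next later line containing </td>. sed only starts looking for the end pattern on the line after the start, so when the whole cell sits on one line, the following lines get deleted too.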

Hope that's clear enough?
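To make the over-deletion concrete, here is a rough Python emulation of a sed range delete with two regex addresses. It is only a sketch: sed_range_delete and the sample lines are invented for illustration, and it assumes the usual sed rule that the end pattern is first tested on the line after the one that opened the range.

```python
import re

def sed_range_delete(lines, start, end):
    """Rough emulation of sed '/start/,/end/d' for two regex addresses."""
    kept = []
    in_range = False
    for line in lines:
        if in_range:
            # Inside a range every line is deleted; an end match closes it.
            if re.search(end, line):
                in_range = False
            continue
        if re.search(start, line):
            # The start line is deleted. The end pattern is NOT tested on it,
            # even if the closing tag sits on this very line.
            in_range = True
            continue
        kept.append(line)
    return kept

# Hypothetical input: every cell fits on a single line.
sample = [
    '<td class="j">something</td>',
    '<td class="e">another thing</td>',
    '<td class="e">keep me</td>',
]
print(sed_range_delete(sample, r'<td class="j">', r'</td>'))
# prints ['<td class="e">keep me</td>']
```

The first class="e" line disappears along with the class="j" line, which is the "deleted way more" effect from the original post.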

Last edited by samlt (2007-06-01 06:39:05)

gradgrind · 2007-06-01 06:51:45

Here's a Python version, but if you have these tags nested, you might need to use an XML parser!

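#!/usr/bin/env python

import re
import sys

# Non-greedy .*? stops at the first </td>; DOTALL lets "." span newlines.
r1 = re.compile(r'<td class="j">.*?</td>', re.DOTALL)
r2 = re.compile(r'<td class="e">(.*?)</td>', re.DOTALL)

text = sys.stdin.read()
# Delete the class="j" cells first, then rewrite the class="e" cells as <p>.
sys.stdout.write(r2.sub(r"<p>\1</p>", r1.sub("", text)))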

Save it as filter.py, make it executable (chmod +x filter.py), and pipe your file through it, e.g. "./filter.py < myfile > mynewfile"
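For the nested case mentioned above, a parser-based approach could look roughly like the sketch below. It uses html.parser from the Python 3 standard library rather than a real XML parser, the names CellFilter and convert are made up, and it assumes the class attribute is exactly "j" or "e". Comments and doctype declarations are silently dropped, so treat it as a starting point only.

```python
from html.parser import HTMLParser

class CellFilter(HTMLParser):
    """Drop <td class="j"> cells and turn <td class="e"> cells into <p>."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of output text
        self.stack = []  # class of every currently open <td>
        self.skip = 0    # > 0 while inside at least one class="j" cell

    def emit(self, text):
        if not self.skip:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            cls = dict(attrs).get('class')
            self.stack.append(cls)
            if cls == 'j':
                self.skip += 1
                return
            if cls == 'e':
                self.emit('<p>')
                return
        # Any other tag is passed through exactly as written.
        self.emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are passed through as written.
        self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'td' and self.stack:
            cls = self.stack.pop()
            if cls == 'j':
                self.skip -= 1
                return
            if cls == 'e':
                self.emit('</p>')
                return
        self.emit('</%s>' % tag)

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit('&%s;' % name)

    def handle_charref(self, name):
        self.emit('&#%s;' % name)

def convert(text):
    parser = CellFilter()
    parser.feed(text)
    parser.close()
    return ''.join(parser.out)

print(convert('<td class="e">x <td class="j">y</td> z</td>'))
```

Because the parser tracks which <td> each </td> closes, a class="j" cell nested inside a class="e" cell is removed without cutting the outer cell short, which the non-greedy regex cannot guarantee.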

Arch Linux
