Hi, I've been reading a lot and trying, but I still can't get it to work. What I want is to delete everything in a file that's formatted like
<td class="j">something</td>
and replace every
<td class="e">another thing</td>
by
<p>another thing</p>
.
The closest I got to the former was with
sed '/<td class="j">/,/<\/td>/d'
, but it looks like the match was too greedy, as it deleted far more than it should have.
Thank you.
(Of course, a Python or Ruby script, or anything else that does this, is just as welcome as sed or awk. I don't care which tool I use, I just want to get this done.)
Well, since we don't know whether 'something' or 'another thing' can contain other (similar) tags or span multiple lines, the easiest way is to replace <td class="."> with <p> and </td> with </p> separately, rather than matching the whole element ( <td class="."> blahblah </td> ) at once:
sed -e 's#<td class=".">#<p>#g' -e 's#</td>#</p>#g'
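Tada! (Note that the "." matches any single character, so this also turns the class "j" cells into <p> instead of deleting them.)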
EDIT: by the way, sed '/<td class="j">/,/<\/td>/d' doesn't work because it operates on whole lines: it deletes every line from the one containing <td class="j"> through the next later line containing </td>, along with anything else that happens to be on those lines.
Hope that's clear enough?
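To see the line-range behaviour concretely, here is a small Python sketch that imitates what sed does with '/start/,/end/d'. The function name sed_range_delete and the sample rows are made up for illustration; it is a rough emulation of the addressing rule, not sed's actual implementation.

```python
import re

def sed_range_delete(lines, start_pat, end_pat):
    """Roughly emulate sed '/start/,/end/d' on a list of lines."""
    start, end = re.compile(start_pat), re.compile(end_pat)
    kept, in_range = [], False
    for line in lines:
        if in_range:
            # Inside a range every line is deleted; an end match closes the range.
            if end.search(line):
                in_range = False
            continue
        if start.search(line):
            # The end pattern is only tested from the NEXT line onward,
            # so a </td> on this same line does not close the range.
            in_range = True
            continue
        kept.append(line)
    return kept

# Hypothetical sample input: the "j" cell opens and closes on the first line.
sample = [
    '<tr><td class="j">something</td>',
    '<td class="e">another thing</td></tr>',
    '<tr><td class="e">keep me</td></tr>',
]
result = sed_range_delete(sample, r'<td class="j">', r'</td>')
print(result)  # -> ['<tr><td class="e">keep me</td></tr>']
```

The second line is lost even though it only holds an "e" cell, which matches the "deleted way more than it should" symptom.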
Last edited by samlt (2007-06-01 06:39:05)
Here's a Python version, but if these tags can be nested, you might need to use an XML parser!
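#!/usr/bin/env python
import re
import sys
# Non-greedy .*? stops at the first </td>; DOTALL lets a cell span several lines.
r1 = re.compile(r'<td class="j">.*?</td>', re.DOTALL)
r2 = re.compile(r'<td class="e">(.*?)</td>', re.DOTALL)
text = sys.stdin.read()
# Delete the "j" cells first, then rewrite the "e" cells as paragraphs.
sys.stdout.write(r2.sub(r"<p>\1</p>", r1.sub("", text)))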
You can pipe your file through it, e.g. "python filter.py < myfile > mynewfile"
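If the cells can be nested, a regex stops at the first </td> it sees, which may belong to an inner cell. Here is a minimal sketch of the parser route using html.parser from the Python 3 standard library. The names CellFilter and convert are made up for illustration, it assumes reasonably well-formed input, and comments and doctype declarations are not passed through.

```python
from html.parser import HTMLParser

class CellFilter(HTMLParser):
    """Drop <td class="j"> cells and turn <td class="e"> cells into <p>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # one entry per open <td>: "drop", "p" or "keep"

    def _dropping(self):
        return "drop" in self.stack

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class")
            action = {"j": "drop", "e": "p"}.get(cls, "keep")
            self.stack.append(action)
            if self._dropping():
                return
            if action == "p":
                self.out.append("<p>")
                return
        elif self._dropping():
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "td" and self.stack:
            was_dropping = self._dropping()  # check before popping
            action = self.stack.pop()
            if was_dropping:
                return
            if action == "p":
                self.out.append("</p>")
                return
        elif self._dropping():
            return
        self.out.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if not self._dropping():
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self._dropping():
            self.out.append(data)

    def handle_entityref(self, name):
        if not self._dropping():
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self._dropping():
            self.out.append("&#%s;" % name)

def convert(text):
    parser = CellFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)

nested = ('<tr><td class="j">x<table><tr><td>inner</td></tr></table>y</td>'
          '<td class="e">a <b>b</b></td></tr>')
print(convert(nested))  # -> <tr><p>a <b>b</b></p></tr>
```

The stack is what makes nesting work: the inner </td> only closes the inner cell, so the outer "j" cell keeps being dropped until its own closing tag arrives.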
larch: http://larch.berlios.de