
#1 2009-04-14 00:21:54

goll
Member
From: Croatia
Registered: 2007-10-29
Posts: 50

Bash script to extract certain parts of an xml

Here is what I'm dealing with: I have two files, an XML file and a list of class names taken from it. The goal is to filter the XML file so that the newly created file contains only the classes named in the list file. So far I've come up with an approach but got stuck on the last step; I hope someone is willing to point me in the right direction :)

The xml file is called description.xml and the list is list.txt

#!/bin/bash

# First I wrap each required class name in the opening tag it appears in
awk '{print "<class name=\"" $1 "\">"}' list.txt > filter.txt

# Next I collect the line numbers of the matching opening tags
# (-F makes grep treat each pattern as a literal string, not a regex)
grep -n -F -f filter.txt description.xml | cut -d: -f1 > names.txt

# Now I need something like awk '/NR=$n/,/^$/' that prints the whole class
# definition, using the line number as the starting point and a blank line
# as the end of a class, and saves it all in a new XML file, but so far
# nothing I tried worked :)

The list file is one class name per line:

java/net/xyz
java/net/xzy
java/net/yxz

And a snippet from the description.xml:

.
.
.
<class name="java/net/xyz">
...
</class>

<class name="java/net/xzy">
...
</class>
.
.
.


#2 2009-04-14 00:48:20

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: Bash script to extract certain parts of an xml

Ahh, I see what you're doing. That is a smart approach. I would have gone for a script that reads every line, extracts the class name (e.g. java/net/xzy) if the line has one, and checks it against list.txt with grep -q, while maintaining a variable that says whether the current class should be printed; that variable gets set to false when </class> comes.
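The line-by-line approach described above could look roughly like the sketch below. filter_classes is a made-up name, and the sketch assumes each opening tag sits on a single line in the form <class name="...">, as in the snippet from the first post.

```shell
# Sketch only: filter_classes is a hypothetical helper, not code from the thread.
# It sticks to POSIX shell features, so it runs under bash or plain sh.
filter_classes() (
    list=$1 xml=$2
    printing=false
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<class name="'*'"'*)
                # Cut out the text between name=" and the next double quote.
                name=${line#*'<class name="'}
                name=${name%%'"'*}
                # -F fixed string, -x whole-line match, -q exit status only
                if grep -Fxq -e "$name" "$list"; then
                    printing=true
                fi
                ;;
        esac
        if [ "$printing" = true ]; then
            printf '%s\n' "$line"
        fi
        case $line in
            # End of the current class: stop printing until the next match.
            *'</class>'*) printing=false ;;
        esac
    done < "$xml"
)
```

It would be called as: filter_classes list.txt description.xml > filtered.xml (filtered.xml is just an example output name). Running grep once per class is slow on big files, but it keeps the logic easy to follow.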

With your line-number approach it is a bit hard to use awk, because it has to process both the line numbers from one file and the data from another, and line numbers aren't the easiest thing to work with.

But sed can do that easily:
# names.txt holds the line numbers and description.xml is the XML file from the first post
IFS="
"
for sedcmd in $(sed 's~$~,/^$/p~' names.txt); do sed -n "$sedcmd" description.xml; done
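To spell out what the loop above does: the inner sed appends ,/^$/p to every line number, so 41 becomes the command 41,/^$/p, meaning "print from line 41 up to the next blank line". As a variation, all of those commands can be handed to a single sed run as one script, which reads the XML only once. This is only a sketch; extract_by_line_numbers is a made-up name, and it assumes the numbers file has no empty lines and that the class blocks are separated by blank lines.

```shell
# Sketch only: extract_by_line_numbers is a hypothetical helper name.
# $1: file with one starting line number per line
# $2: the XML file to extract from
extract_by_line_numbers() {
    # The inner sed turns each number N into the command "N,/^$/p".
    # The outer sed receives all of them as one newline-separated script
    # and, because of -n, prints only the selected ranges.
    sed -n "$(sed 's~$~,/^$/p~' "$1")" "$2"
}
```

With the files from the first post that would be: extract_by_line_numbers names.txt description.xml > filtered.xml (filtered.xml is just an example output name).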

Last edited by Procyon (2009-04-14 00:49:41)


#3 2009-04-14 00:57:17

goll
Member
From: Croatia
Registered: 2007-10-29
Posts: 50

Re: Bash script to extract certain parts of an xml

Wow, man, you're a lifesaver. Thanks for the lightning-fast reply, it works like a charm :)


#4 2009-04-15 07:30:52

lefallen
Member
From: Melbourne, Australia
Registered: 2006-07-06
Posts: 36

Re: Bash script to extract certain parts of an xml

Worth noting that you can use xsltproc (part of libxslt) to process XML data quickly and easily too. That's what I use in bash scripts.
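For this thread's problem, an xsltproc version might look something like the sketch below. It rests on several assumptions that the thread does not confirm: description.xml is well-formed, every <class> element is a direct child of the root element, and no class name contains a | character. filter_with_xslt and the names parameter are made-up names.

```shell
# Sketch only: filter_with_xslt is a hypothetical helper, and the stylesheet
# ASSUMES every <class> element is a direct child of the root element.
filter_with_xslt() (
    list=$1 xml=$2
    xsl=$(mktemp) || exit 1
    # Stylesheet: copy the root element, keeping only the wanted classes.
    cat > "$xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="names"/>
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="class[contains($names, concat('|', @name, '|'))]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
EOF
    # Join the list into "|name1|name2|" so only whole names can match.
    names="|$(paste -s -d '|' "$list")|"
    xsltproc --stringparam names "$names" "$xsl" "$xml"
    status=$?
    rm -f "$xsl"
    exit "$status"
)
```

Usage would be: filter_with_xslt list.txt description.xml > filtered.xml (filtered.xml is just an example output name). Unlike the line-based versions, this does not care how the XML is laid out across lines, but the blank lines between classes are not preserved in the output.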


JABBER: krayon -A-T- chat.qdnx.org
E-MAIL: archlinuxforums -A-T- quadronyx.org
WEB: http://www.qdnx.org/krayon/
~o~


#5 2009-04-15 12:42:27

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Re: Bash script to extract certain parts of an xml

Boooo.
man xsltproc.

