I have a 30k-line file in the following format:
<tag1
data=1>
</tag1>
<tag2
data=2>
</tag2>
<accessControl inRealm='userA'
<inSrc address='1.2.3.4'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
<tag3
data=3>
</tag3>
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
I have a second file in the following format:
userA,192.168.100.1
userB,192.168.101.1
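If you end up using a scripting language, the second file maps naturally onto a dictionary. Here is a minimal Python sketch. It assumes one `user,ip` pair per line, no header, and no commas inside user names. `parse_user_ips` is a hypothetical helper name.

```python
def parse_user_ips(text):
    """Turn 'user,ip' lines into a {user: ip} dict, skipping blank or malformed lines."""
    user_ips = {}
    for line in text.splitlines():
        # splitlines() drops the line terminators, so no trailing newline ends up in the IP
        user, sep, ip = line.partition(",")
        if sep and user.strip() and ip.strip():
            user_ips[user.strip()] = ip.strip()
    return user_ips


mapping = parse_user_ips("userA,192.168.100.1\nuserB,192.168.101.1\n")
print(mapping)  # {'userA': '192.168.100.1', 'userB': '192.168.101.1'}
```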
What I need to do is iterate through the first file and modify each block that begins with "<accessControl inRealm=" and ends with "</accessControl>". Those are my only markers for determining paragraphs. There are no empty lines, and these blocks appear all over the file (no real pattern that I can see), but they are consistently in that format.
I need to replace the "inDst address='0.0.0.0'" in each block with the IP from the second file. So the 'userB' accessControl section would look like this:
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='192.168.101.1'
port='0'/>
</accessControl>
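For each block, the change is a single substitution. It has to be anchored on `inDst` so that an `inSrc` of 0.0.0.0 is never touched. Here is a minimal Python sketch that assumes the block text has already been isolated. `rewrite_block` is a hypothetical name.

```python
import re

# Match only the inDst placeholder; the captured prefix and closing quote are kept as-is.
INDST_PLACEHOLDER = re.compile(r"(<inDst address=')0\.0\.0\.0(')")


def rewrite_block(block, new_ip):
    """Replace the 0.0.0.0 inDst address in one accessControl block with new_ip."""
    return INDST_PLACEHOLDER.sub(lambda m: m.group(1) + new_ip + m.group(2), block, count=1)


block = """<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>"""
print(rewrite_block(block, "192.168.101.1"))
```

The callback is used instead of a `\1` template on purpose. In Python, `"\1" + "192.168..."` would be read as a reference to group 11.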
I have whatever tools pacman (or the AUR) can install at my disposal and am not tied to any particular language.
I've been trying to do this in Perl, but I'm a real novice. I can break it into paragraphs, but I can't really do much more:
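use strict;
use warnings;

open(my $in, '<', 'data.txt') or die "failed: $!";
my @file = <$in>;
close($in);

my @data;   # one string per accessControl block
my $str;    # block being collected; undef while outside a block
foreach my $line (@file) {
    if ($line =~ /<accessControl inRealm=/) { $str = $line; next; }
    next unless defined $str;    # skip lines between blocks
    $str .= $line;
    if ($line =~ m{</accessControl>}) {
        push @data, $str;
        undef $str;
    }
}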
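The same block-collecting idea can be written with an explicit "inside a block" state, so that stray lines between blocks are never picked up. Here is a Python sketch. It assumes each marker starts its own line, with leading whitespace tolerated. `split_blocks` is a hypothetical name.

```python
def split_blocks(lines):
    """Return a list of accessControl blocks, each joined into a single string."""
    blocks, current = [], None  # current is None while we are outside a block
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith("<accessControl inRealm="):
            current = [line]
        elif current is not None:
            current.append(line)
            if stripped.startswith("</accessControl>"):
                blocks.append("".join(current))
                current = None
    return blocks


sample = """<tag3
data=3>
</tag3>
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
"""
blocks = split_blocks(sample.splitlines(keepends=True))
print(len(blocks))  # 1
```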
I've also been trying in bash, but I'm not really progressing because I'm not sure how to split the paragraphs correctly:
paragraphs=()
while IFS= read -r line
do
    paragraphs+=("$line")
done < data.txt
for paragraph in "${paragraphs[@]}"
do
    : # do my stuff here (bash needs at least one command in a loop body)
done
Any advice? Am I on the right track? Is bash too inefficient for this? I'm assuming awk could probably do it, but my knowledge of it is rather limited.
Last edited by oliver (2013-10-02 01:21:16)
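On efficiency: bash `while read` loops tend to be slow on big files, although 30k lines is modest for any of these tools. Awk would typically handle this in a single streaming pass. It remembers which realm's block it is currently inside and rewrites the `inDst` line when it reaches it. Here is a Python sketch of that single pass. It assumes, as in the sample, that each marker starts its own line and that the user-to-IP mapping is already a dictionary. `rewrite_stream` is a hypothetical name.

```python
import re

REALM = re.compile(r"<accessControl inRealm='([^']+)'")
PLACEHOLDER = "<inDst address='0.0.0.0'"


def rewrite_stream(lines, user_ips):
    """Yield lines unchanged, except inDst placeholders inside a known user's block."""
    realm = None  # realm of the accessControl block we are currently inside, if any
    for line in lines:
        m = REALM.match(line)
        if m:
            realm = m.group(1)
        elif line.startswith("</accessControl>"):
            realm = None
        elif realm in user_ips and line.startswith(PLACEHOLDER):
            line = line.replace("0.0.0.0", user_ips[realm], 1)
        yield line


data = """<accessControl inRealm='userA'
<inSrc address='1.2.3.4'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
<tag3
data=3>
</tag3>
"""
out = "".join(rewrite_stream(data.splitlines(keepends=True), {"userA": "192.168.100.1"}))
print(out)
```

On the real file you would pass an open file object instead of the sample string and write the generated lines to a new file. Memory use then stays flat no matter how large the input is.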
vim and emacs both do a fine job with XML files and can automatically restructure them.
Also, though I've never tried it, a cursory look through the AUR turned up xmlindent. It may well do exactly what you need.
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
The file looks like malformed XML (malformed because some attribute values are unquoted, some start tags are missing their closing '>', etc.). If it's supposed to be proper XML, the right way would be to fix it and then use an XML parser in your favorite scripting language to load it, read the other file, and alter the data systematically.
xmllint in the libxml2 package may be useful for checking XML syntax and reformatting.
If you know Python then you could probably use the xml.dom.minidom or one of the other standard XML libraries.
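Before reaching for an XML library, it is worth checking whether the file parses at all. The sample as posted has an unquoted `data=1` and no `>` after `inRealm='userA'`, so a strict parser should refuse it. Here is a quick Python check using the standard `xml.etree.ElementTree`. Wrapping the text in a `<root>` element is an assumption, made so that several top-level elements are allowed.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(text):
    """Return True if text is well-formed XML once wrapped in a single root element."""
    try:
        ET.fromstring("<root>" + text + "</root>")
        return True
    except ET.ParseError:
        return False


print(parses_as_xml("<tag1\ndata=1>\n</tag1>\n"))    # False: unquoted attribute value
print(parses_as_xml("<tag1\ndata='1'>\n</tag1>\n"))  # True
```

If the real file fails this check, line- or regex-based editing is the practical route.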
If it is in the format posted then the following script should do what you want, or at least provide a starting point. It's not the most efficient approach but it seems to do the trick.
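#!/usr/bin/env perl
use strict;
use warnings;

# Usage: perl script.pl file1 file2 > new_file1
open (my $fh1, '<', $ARGV[0]) or die "failed to open $ARGV[0]: $!";
open (my $fh2, '<', $ARGV[1]) or die "failed to open $ARGV[1]: $!";

# Map each user in the second file to its IP address.
my %user_ips;
foreach my $line (<$fh2>)
{
  chomp $line;
  my ($user, $ip) = split /,/, $line, 2;
  next unless defined $ip;
  $user_ips{$user} = $ip;
}

# Slurp the whole first file into a single string.
my $old_xml;
{
  local $/;
  $old_xml = <$fh1>;
}

my $new_xml = $old_xml;
while ($old_xml =~ m/(<accessControl\s.*?<\/accessControl>)/sg)
{
  my $old_ac = $1;
  my ($user) = ($old_ac =~ m/inRealm='([^']+)/);
  # Leave the block alone if its user is missing from the second file.
  next unless defined $user and exists $user_ips{$user};
  my $new_ip = $user_ips{$user};
  my $new_ac = $old_ac;
  # Anchor on inDst so that an inSrc of 0.0.0.0 is never changed.
  $new_ac =~ s/(inDst address=')0\.0\.0\.0'/$1$new_ip'/;
  $old_ac = quotemeta $old_ac;
  $new_xml =~ s/$old_ac/$new_ac/;
}
print $new_xml;
close($fh1);
close($fh2);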
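The script above finds each block and then substitutes it back into the whole string, which rescans the text once per block. The same result can be had in one pass by giving the substitution a callback that rewrites each block as it is matched. Here is a Python sketch of that variant. It is a swapped-in technique, not a translation of the Perl. It assumes the block and placeholder formats from the first post and leaves blocks for unknown users untouched. `rewrite_all` is a hypothetical name.

```python
import re

BLOCK = re.compile(r"<accessControl\s.*?</accessControl>", re.S)
REALM = re.compile(r"inRealm='([^']+)'")
INDST = re.compile(r"(<inDst address=')0\.0\.0\.0'")


def rewrite_all(text, user_ips):
    """Rewrite the inDst placeholder in every accessControl block in a single pass."""
    def fix_block(m):
        block = m.group(0)
        realm = REALM.search(block)
        ip = user_ips.get(realm.group(1)) if realm else None
        if ip is None:
            return block  # unknown user: leave the block exactly as it was
        return INDST.sub(lambda d: d.group(1) + ip + "'", block, count=1)
    return BLOCK.sub(fix_block, text)


sample = """<tag1
data=1>
</tag1>
<accessControl inRealm='userA'
<inSrc address='1.2.3.4'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='0.0.0.0'
port='0'/>
</accessControl>
"""
print(rewrite_all(sample, {"userA": "192.168.100.1", "userB": "192.168.101.1"}))
```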
Now, if we lie down in the grass and remain very quiet, a wild sed wizard may appear to amaze us with a glorious, arcane one-liner. Remember, if he appears, avoid sudden movements. If you startle him, his expression will break and he'll slink off muttering something about it having worked a minute ago.
p.s. I haven't touched Perl in ages. I almost miss it.
Almost.
Last edited by Xyne (2013-10-02 00:25:12)
My Arch Linux Stuff • Forum Etiquette • Community Ethos - Arch is not for everyone
p.s. I haven't touched Perl in ages. I almost miss it.
Perl - The world's finest write only language
If the files are file1 and file2:
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done
Should be:
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/<accessControl inRealm='${i}'/,/^<\/accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done
So what does 's/\</...' do?
Last edited by cfr (2013-10-01 21:26:49)
CLI Paste | How To Ask Questions
Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L
If the files are file1 and file2:
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done
SSHHHhhhh shhhh, everyone, get down! Don't move!
Eric, hand me that tranquillizer rifle and a 10 ml dart.
karol, are you documenting this?
Wait, wait, there's an error in the expression.
*puts rifle down*
Let's see how this plays out.
A word of caution to anyone trying the expression: it modifies the file in place.
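One way to keep that risk small is to write the result to a new file and keep the original as a backup before swapping them in. GNU sed can do the backup part itself if you give `-i` a suffix, e.g. `sed -i.bak`. Here is a Python sketch of the write-then-swap approach. It assumes the file is UTF-8 or plain ASCII and that the rewrite is supplied as a function. `rewrite_file_safely` is a hypothetical name.

```python
import os
import shutil
import tempfile


def rewrite_file_safely(path, transform, backup_suffix=".bak"):
    """Replace path's contents with transform(old_text), keeping path + backup_suffix."""
    with open(path, encoding="utf-8", newline="") as f:
        original = f.read()
    shutil.copy2(path, path + backup_suffix)  # untouched copy of the original
    # Temp file in the same directory, so os.replace() is a same-filesystem (atomic) rename.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(transform(original))
        shutil.copymode(path, tmp)  # mkstemp creates 0600 files; keep the original mode
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```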
Thank you all for the code and the comments :-)
It's not really XML; it's the backup file from a proprietary box that acts as a glorified firewall with tons of virtual IPs that users can connect to. And while the <tag1 stuff (etc.) is obviously an example, it really is in that format (open tags, etc.). Those tags define the specific user data, so I didn't want to include them here.
The bigger picture is that we have to modify each ACL entry to include both a src *and* a dst IP, hence the request. If I can modify the file, I can upload it back to the box and "restore" it (this basic procedure is known to work). If I can't, some poor fellow has to log in and make thousands of manual changes.
If it is in the format posted then the following script should do what you want, or at least provide a starting point. It's not the most efficient approach but it seems to do the trick.
This is close :-) but one slight problem... output is:
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='192.168.101.1
'
port='0'/>
</accessControl>
Any idea why the closing quote mark of the dst IP is on the next line?
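This is the classic symptom of a line read from a file still carrying its trailing newline. Splitting `userB,192.168.101.1\n` on the comma leaves the `\n` attached to the IP, so the newline lands inside the quotes. In Perl, `chomp` removes it. Here is a Python illustration of the same effect, using the line from the second file:

```python
line = "userB,192.168.101.1\n"  # a line as read from the second file, newline included

user, raw_ip = line.split(",", 1)
print(repr(raw_ip))  # '192.168.101.1\n' <- the newline ends up in the replacement

user, clean_ip = line.rstrip("\n").split(",", 1)  # the equivalent of Perl's chomp
print(repr(clean_ip))  # '192.168.101.1'
```

If the second file was saved with Windows line endings, a `\r` can survive as well. `rstrip("\r\n")` removes both.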
If the files are file1 and file2:
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done
This is close too, but the dst IP is the same for both:
<tag1
data=1>
</tag1>
<tag2
data=2>
</tag2>
<accessControl inRealm='userA'
<inSrc address='1.2.3.4'
port='0'/>
<inDst address='192.168.100.1'
port='0'/>
</accessControl>
<tag3
data=3>
</tag3>
<accessControl inRealm='userB'
<inSrc address='5.6.7.8'
port='0'/>
<inDst address='192.168.100.1'
port='0'/>
</accessControl>
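The identical IPs are most likely caused by sed's range addresses. In the first version, the shell turns `\\` into `\`, so the end address reaches sed as `^\<\accessControl`. GNU sed reads `\a` as a BEL character, so the end of the range never matches and the range runs to the end of the file. The first user's substitution therefore rewrites every later `0.0.0.0` as well. By the time the loop reaches the other users, nothing is left to replace. Here is a Python sketch that simulates a `/start/,/end/` range to show the effect. `apply_range` is a hypothetical name and only approximates sed's semantics.

```python
import re


def apply_range(lines, start, end, edit):
    """Apply edit() from a line matching start through the next line matching end (sed-style)."""
    out, inside = [], False
    for line in lines:
        if not inside and re.search(start, line):
            inside = True
            out.append(edit(line))
            continue  # like sed, the end regex is only tried from the next line on
        if inside:
            if re.search(end, line):
                inside = False  # this line is the last one in the range
            line = edit(line)
        out.append(line)
    return out


lines = ["<accessControl inRealm='userA'", "<inDst address='0.0.0.0'", "</accessControl>",
         "<accessControl inRealm='userB'", "<inDst address='0.0.0.0'", "</accessControl>"]


def fill(s):
    return s.replace("0.0.0.0", "192.168.100.1")


# "\x07" stands in for the BEL character sed sees instead of "/".
broken = apply_range(lines, r"<accessControl inRealm='userA'", "\x07accessControl", fill)
fixed = apply_range(lines, r"<accessControl inRealm='userA'", r"^</accessControl", fill)
print(broken[4])  # <inDst address='192.168.100.1'  (userB's block was hit too)
print(fixed[4])   # <inDst address='0.0.0.0'        (left for userB's own pass)
```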
SSHHHhhhh shhhh, everyone, get down! Don't move!
I know it would be Karol's job to say something like this, but a quick search of the board (with an external search tool and some regex) told me that 101.3% of all Arch users recommend awk as soon as a single line with more than one mention of sed is posted. I know this might not be the actual solution, but I've made a habit of processing everything longer than a single line with awk. Mostly because I'm afraid of Perl and, of course, because this thread wouldn't turn up under "Posted" if I only subscribed to it.
I was playing around with the sed one, and I have it working now (I think; the proof will be on the real file).
I simplified it a little bit for my own readability and it now looks like this:
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1)
do
    j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2)
    sed -i "/<accessControl inRealm='${i}'/,/<\/accessControl>/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1
done
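Since the proof will be on the real file, a quick check afterwards can list every block whose `inDst` is still `0.0.0.0`. It should also flag blocks whose address came out empty: if a user is missing from file2, `j` is empty and the loop writes `inDst address=''`. Here is a Python sketch. It assumes the same block format as the sample (the example data below is trimmed to the relevant lines). `unfilled_realms` is a hypothetical name.

```python
import re

BLOCK = re.compile(r"<accessControl inRealm='([^']+)'(.*?)</accessControl>", re.S)
INDST = re.compile(r"<inDst address='([^']*)'")


def unfilled_realms(text):
    """Return realms whose inDst address is still the placeholder, empty, or missing."""
    bad = []
    for realm, body in BLOCK.findall(text):
        m = INDST.search(body)
        if m is None or m.group(1) in ("", "0.0.0.0"):
            bad.append(realm)
    return bad


sample = """<accessControl inRealm='userA'
<inDst address='192.168.100.1'
</accessControl>
<accessControl inRealm='userB'
<inDst address=''
</accessControl>
<accessControl inRealm='userC'
<inDst address='0.0.0.0'
</accessControl>
"""
print(unfilled_realms(sample))  # ['userB', 'userC']
```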
Thanks again to everyone
The extra linebreak was due to my omission of "chomp" (good ol' chomp). The posted script should work now.
Yes sir - works perfectly. Thank you