#1 2013-09-30 21:44:13

oliver
Member
Registered: 2007-12-12
Posts: 448

[solved] how to manipulate paragraphs or blocks in a text file?

I have a 30k-line file in the following format:

<tag1
  data=1>
</tag1>
<tag2
  data=2>
</tag2>
<accessControl inRealm='userA'
                <inSrc address='1.2.3.4'
                        port='0'/>
                <inDst address='0.0.0.0'
                        port='0'/>
</accessControl>
<tag3
  data=3>
</tag3>
<accessControl inRealm='userB'
                <inSrc address='5.6.7.8'
                        port='0'/>
                <inDst address='0.0.0.0'
                        port='0'/>
</accessControl>

I have a second file in the following format:

userA,192.168.100.1
userB,192.168.101.1

What I need to do is iterate through the first file and modify each block that begins with "<accessControl inRealm=" and ends with "</accessControl>". Those are my only markers for determining paragraphs. There are no empty lines, and these blocks appear all over the file (there's no real pattern that I can see), but they are consistently in that format.

I need to replace the inDst address='0.0.0.0' in each block with the IP from the second file. So the 'userB' accessControl section would look like this:

<accessControl inRealm='userB'
                <inSrc address='5.6.7.8'
                        port='0'/>
                <inDst address='192.168.101.1'
                        port='0'/>
</accessControl>

I have whatever tools pacman (or the AUR) can install at my disposal and am not tied to any particular language.

I've been trying to do this in Perl, but I'm a real novice... I can break it into paragraphs, but I can't really do much more:

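use strict;
use warnings;

# Collect each <accessControl ...> ... </accessControl> block into @data.
open(my $in, '<', 'data.txt') or die "failed: $!";
my (@data, $str);
while (my $line = <$in>) {
  if ($line =~ /<accessControl inRealm=/) { $str = $line; next; }  # block starts
  next unless defined $str;                                        # line outside any block
  $str .= $line;
  if ($line =~ m{</accessControl>}) {                              # block ends
    push @data, $str;
    undef $str;
  }
}
close($in);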

I've also been trying in bash, but I'm not really progressing because I'm not sure how to split the paragraphs correctly:

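paragraphs=()

# Note: read pulls in one *line* at a time, so each array element
# ends up holding a single line rather than a whole block.
while IFS= read -r line
do
  paragraphs+=("$line")
done < data.txt

for paragraph in "${paragraphs[@]}"
do
  : # do my stuff here (":" keeps the otherwise empty loop body valid)
done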

Any advice? Am I on the right track? Is bash too inefficient for this? I'm assuming awk could probably do it, but my knowledge of awk is rather limited.
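
For the splitting part of the question, one way to do it in pure bash is a small state machine: start collecting when a line begins with "<accessControl inRealm=", keep appending lines, and store the block once "</accessControl>" shows up. This is only a sketch; read_blocks and the blocks array are made-up names, and it assumes each marker sits at the start of its own line, as in the sample above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split FILE into its <accessControl ...> blocks.
# Each finished block (all of its lines, newline-separated) is appended
# to the global array "blocks"; lines outside any block are skipped.
read_blocks() {
  local file=$1 line block="" inside=0
  blocks=()
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "<accessControl inRealm="* ]]; then
      inside=1               # start marker: begin a new block
      block=$line
    elif (( inside )); then
      block+=$'\n'"$line"    # inside a block: keep collecting lines
      if [[ $line == "</accessControl>"* ]]; then
        blocks+=("$block")   # end marker: store the finished block
        inside=0
        block=""
      fi
    fi
  done < "$file"
}
```

After read_blocks data.txt, ${#blocks[@]} is the number of blocks and each element can be edited on its own. For a 30k-line file, a bash read loop like this will likely be noticeably slower than a single awk or Perl pass.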

Last edited by oliver (2013-10-02 01:21:16)

#2 2013-09-30 22:50:26

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,740

vim and emacs both do a fine job with XML files and can automatically restructure them.

Also (I've never tried it), a cursory look through the AUR turned up xmlindent. It may well do exactly what you need.


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

#3 2013-10-01 01:01:09

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

The file looks like malformed XML (malformed, because it's missing quotes around the attributes, closing tags, etc.). If it's supposed to be proper XML, the right way would be to fix it and use an XML parser in your favorite scripting language to load it, read the other file, and alter the data systematically.

xmllint in the libxml2 package may be useful for checking XML syntax and reformatting.

If you know Python, you could probably use xml.dom.minidom or one of the other standard XML libraries.

If it is in the format posted then the following script should do what you want, or at least provide a starting point. It's not the most efficient approach but it seems to do the trick.

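#!/usr/bin/env perl
use strict;
use warnings;

# Usage: perl script.pl DATA_FILE USER_FILE > NEW_FILE
open (my $fh1, '<', $ARGV[0]) or die "failed to open $ARGV[0]: $!";
open (my $fh2, '<', $ARGV[1]) or die "failed to open $ARGV[1]: $!";

# Build a realm => IP table from the "user,ip" file.
my %user_ips;
foreach my $line (<$fh2>)
{
  chomp $line;
  my ($user, $ip) = split /,/, $line, 2;
  next unless defined $ip;
  $user_ips{$user} = $ip;
}

# Slurp the whole data file into a single string.
my $old_xml;
{
  local $/;
  $old_xml = <$fh1>;
}
my $new_xml = $old_xml;

# Visit each <accessControl ...> ... </accessControl> block.
while ($old_xml =~ m/(<accessControl\s.*?<\/accessControl>)/sg)
{
  my $old_ac = $1;
  my ($user) = ($old_ac =~ m/inRealm='([^']+)/);
  # Leave the block alone if its realm has no entry in the user file.
  next unless defined $user and exists $user_ips{$user};
  my $new_ip = $user_ips{$user};
  my $new_ac = $old_ac;
  # Replace only the inDst placeholder, never the inSrc address.
  $new_ac =~ s/(inDst address=')0\.0\.0\.0/$1$new_ip/;
  $old_ac = quotemeta $old_ac;
  $new_xml =~ s/$old_ac/$new_ac/;
}

print $new_xml;

close($fh1);
close($fh2);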

Now, if we lie down in the grass and remain very quiet, a wild sed wizard may appear to amaze us with a glorious, arcane one-liner. Remember, if he appears, avoid sudden movements. If you startle him, his expression will break and he'll slink off muttering something about it having worked a minute ago.

p.s. I haven't touched Perl in ages. I almost miss it.

Almost.

Last edited by Xyne (2013-10-02 00:25:12)


My Arch Linux Stuff | Forum Etiquette | Community Ethos - Arch is not for everyone

#4 2013-10-01 01:25:03

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,740

Xyne wrote:

p.s. I haven't touched Perl in ages. I almost miss it.

Perl - the world's finest write-only language :D


#5 2013-10-01 01:30:18

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,131

If the files are file1 and file2:

for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done

Should be:

for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/<accessControl inRealm='${i}'/,/^<\/accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done

So what does 's/\</...' do?

Last edited by cfr (2013-10-01 21:26:49)


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

#6 2013-10-01 02:33:55

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

cfr wrote:

If the files are file1 and file2:

for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done

SSHHHhhhh shhhh, everyone, get down! Don't move!

Eric, hand me that tranquillizer rifle and a 10 ml dart.
karol, are you documenting this?

Wait, wait, there's an error in the expression.
*puts rifle down*
Let's see how this plays out.

A word of caution to anyone trying the expression: it modifies the file in place.
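
If you'd rather not risk the original, one option is to take a single copy before the loop starts. Using sed -i.bak inside the loop wouldn't help much, because every call would overwrite the .bak file with the previous iteration's result. The sketch below assumes GNU sed for -i; apply_ips is a hypothetical name, and unlike the one-liner it walks the user,ip file rather than the realms found in the data file.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumes GNU sed for -i): back up DATA once, then
# apply the thread's per-realm substitution to DATA in place.
#   $1 = data file, $2 = user,ip file
apply_ips() {
  local data=$1 users=$2 realm ip
  # One pristine copy, taken before anything is changed.
  cp -- "$data" "$data.orig" || return 1
  while IFS=, read -r realm ip || [[ -n $realm ]]; do
    [[ -n $realm && -n $ip ]] || continue   # skip blank/incomplete lines
    # Same range and substitution as the corrected one-liner.
    sed -i "/<accessControl inRealm='${realm}'/,/<\/accessControl>/s/\(inDst address='\)0\.0\.0\.0/\1${ip}/" "$data"
  done < "$users"
}
```

Afterwards DATA holds the edited text and DATA.orig the untouched original, so diff DATA.orig DATA shows exactly what changed.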


#7 2013-10-01 13:09:35

oliver
Member
Registered: 2007-12-12
Posts: 448

Thank you all for the code and the comments :-)

It's not really XML... it's the backup file from a proprietary box that acts as a glorified firewall with tons of virtual IPs that users can connect to. And while the <tag1 stuff (etc.) is obviously an example, the file really is in that format (open tags, etc.). Those tags define the specific user data, so I didn't want to include them here.

The bigger picture is that we have to modify each ACL entry to include both a src *and* a dst IP, hence the request. If I can modify the file, I can upload it back to the box and "restore" it (this basic procedure is known to work). If I can't modify the file, some poor fellow has to log in and make 1000s of manual changes.

#8 2013-10-01 14:31:07

oliver
Member
Registered: 2007-12-12
Posts: 448

Xyne wrote:

If it is in the format posted then the following script should do what you want, or at least provide a starting point. It's not the most efficient approach but it seems to do the trick.

This is close :-) but there's one slight problem... the output is:

<accessControl inRealm='userB'
                <inSrc address='5.6.7.8'
                        port='0'/>
                <inDst address='192.168.101.1
'
                        port='0'/>
</accessControl>

Any idea why the closing quote mark of the dst IP is on the next line?

#9 2013-10-01 14:43:21

oliver
Member
Registered: 2007-12-12
Posts: 448

cfr wrote:

If the files are file1 and file2:

for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1); do j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2); sed -i "/\<accessControl inRealm='${i}'/,/^\<\\accessControl/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1; done

This is close too... but the dst IP is the same for both:

<tag1
  data=1>
</tag1>
<tag2
  data=2>
</tag2>
<accessControl inRealm='userA'
                <inSrc address='1.2.3.4'
                        port='0'/>
                <inDst address='192.168.100.1'
                        port='0'/>
</accessControl>
<tag3
  data=3>
</tag3>
<accessControl inRealm='userB'
                <inSrc address='5.6.7.8'
                        port='0'/>
                <inDst address='192.168.100.1'
                        port='0'/>
</accessControl>
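
A plausible explanation for the identical addresses, assuming GNU sed: inside double quotes the shell turns \\ into \, so the first version of the one-liner gave sed the end address /^\<\accessControl/. In GNU sed, \< matches the start of a word, and the closing line starts with "<", so ^\< can never match there. The range that begins at userA's block therefore never closes: it runs to the end of the file and every later 0.0.0.0 gets userA's IP, leaving nothing for userB's turn. The corrected end address /^<\/accessControl/ closes each range at its own block. The hypothetical fill_ips function below runs the thread's loop with the end address passed in, so the two versions can be compared on copies of the sample.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumes GNU sed): the thread's loop, but with the
# range's end address passed in so the two versions can be compared.
#   $1 = data file, $2 = user,ip file, $3 = end address exactly as sed sees it
fill_ips() {
  local data=$1 users=$2 end=$3 i j
  for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" "$data"); do
    j=$(sed -n "s/^${i},\(.*\)$/\1/p" "$users")
    sed -i "/<accessControl inRealm='${i}'/,${end}s/\(inDst address='\)0\.0\.0\.0/\1${j}/" "$data"
  done
}

# Broken: what the first one-liner handed to sed (the range never closes).
#   fill_ips copy1 file2 '/^\<\accessControl/'
# Fixed: each range closes at its own </accessControl> line.
#   fill_ips copy2 file2 '/^<\/accessControl/'
```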

#10 2013-10-01 15:41:02

Awebb
Member
Registered: 2010-05-06
Posts: 6,275

Xyne wrote:

SSHHHhhhh shhhh, everyone, get down! Don't move!

I know it would be Karol's job to say something like this, but a quick search of the board (with an external search tool and some regex) told me that 101.3% of all Arch users recommend awk as soon as a single line with more than one mention of sed is posted. I know this might not be the actual solution, but I've made a habit of processing everything longer than a single line with awk. Mostly because I'm afraid of Perl and, of course, because this thread wouldn't turn up under "Posted" if I only subscribed to it.
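
Taking the awk hint, a single-pass version might look like the sketch below. It isn't from the thread: fill_ips_awk is a made-up name. It reads the user,ip file first (the common NR == FNR idiom, which assumes that file isn't empty), then streams the data file once, remembering the realm of the current accessControl block and rewriting only that block's inDst placeholder. The result goes to stdout, so the original file is left alone.

```bash
#!/usr/bin/env bash
# Hypothetical single-pass version in awk (not from the thread).
#   $1 = data file, $2 = user,ip file; the edited text goes to stdout.
fill_ips_awk() {
  awk -F, -v q="'" '
    # First file (user,ip lines): build the realm -> IP table.
    # NR == FNR only holds while reading the first file (if it is non-empty).
    NR == FNR { ip[$1] = $2; next }

    # Block start: extract the realm name between the quotes after inRealm=.
    /<accessControl inRealm=/ {
      realm = $0
      sub(".*inRealm=" q, "", realm)
      sub(q ".*", "", realm)
    }

    # Inside a block with a known realm: rewrite only the inDst placeholder.
    realm != "" && (realm in ip) && /inDst address=/ {
      sub("address=" q "0\\.0\\.0\\.0" q, "address=" q ip[realm] q)
    }

    # Block end: forget the realm.
    /<\/accessControl>/ { realm = "" }

    # Print every line, changed or not.
    { print }
  ' "$2" "$1"
}
```

For example, fill_ips_awk file1 file2 > file1.new. Compared with the sed loop, this reads the 30k-line file once instead of once per realm.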

#11 2013-10-01 16:19:15

oliver
Member
Registered: 2007-12-12
Posts: 448

I was playing around with the sed one and I have it working now (I think; the proof will be on the real file).

I simplified it a little bit for my own readability and it now looks like this:

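# For every realm named in file1...
for i in $(sed -n "s/^.*inRealm='\(.*\)'.*$/\1/p" file1)
do 
  # ...look up its IP in file2...
  j=$(sed -n "s/^${i},\(.*\)$/\1/p" file2)
  # ...and replace 0.0.0.0 on the inDst line of that realm's block only.
  sed -i "/<accessControl inRealm='${i}'/,/<\/accessControl>/s/\(inDst address='\)0\.0\.0\.0/\1${j}/" file1
done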
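
Since the proof will be on the real file, a quick check afterwards may be worth it. One thing to watch with the loop above: if a realm has no line in file2, j is empty and sed turns address='0.0.0.0' into address=''. The hypothetical check_result function below (it assumes GNU grep, for \? in a basic regex) lists any inDst lines that are still 0.0.0.0 or empty, plus any realms without an entry in file2, and returns non-zero if it finds any.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (assumes GNU grep/sed): run it on the edited file.
#   $1 = edited data file, $2 = user,ip file
# Prints anything suspicious and returns 1 if it found something.
check_result() {
  local data=$1 users=$2 realm status=0
  # inDst lines still on the placeholder, or emptied by a missing lookup.
  if grep -n "inDst address='\(0\.0\.0\.0\)\?'" "$data"; then
    status=1
  fi
  # Realms present in the data file but missing from the user file.
  while IFS= read -r realm; do
    if ! grep -q "^${realm}," "$users"; then
      echo "no IP for realm: $realm"
      status=1
    fi
  done < <(sed -n "s/^.*inRealm='\([^']*\)'.*$/\1/p" "$data")
  return "$status"
}
```

Usage: check_result file1 file2 && echo "looks good".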

Thanks again to everyone.

#12 2013-10-02 00:29:44

Xyne
Administrator/PM
Registered: 2008-08-03
Posts: 6,963
Website

The extra linebreak was due to my omission of "chomp" (good ol' chomp). The posted script should work now.
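
The shell counterpart of the missing chomp: read already drops the trailing newline, but it keeps a carriage return, so a user,ip file with DOS (CRLF) line endings would put the closing quote on the next line in much the same way. Whether the real file has CRLF endings is an assumption; nothing in the thread says so. load_ips is a hypothetical name and needs bash 4.2 or later for declare -gA.

```bash
#!/usr/bin/env bash
# Hypothetical helper (bash 4.2+): load "user,ip" lines into the global
# associative array user_ips, dropping a trailing carriage return if present.
load_ips() {
  local user ip
  declare -gA user_ips=()
  while IFS=, read -r user ip || [[ -n $user ]]; do
    [[ -n $user ]] || continue
    # read has already removed the "\n"; strip a DOS "\r" as well.
    user_ips[$user]=${ip%$'\r'}
  done < "$1"
}
```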


#13 2013-10-02 01:20:44

oliver
Member
Registered: 2007-12-12
Posts: 448

Yes sir, works perfectly. Thank you.
