#1 2008-06-27 21:49:29

delacruz
Member
From: /home/houston
Registered: 2007-12-09
Posts: 102

awk database manipulation

Hello, I need some help manipulating some data I have. I think the best way to do this is with awk. I'm not too familiar with Python or other languages. :(

Example of some data I have:

database_01.txt

01 01 01 01
02 02 02 02
03 03 03 03
04 04 04 04

database_02.txt

05 05 05
06 06 06
07 07 07

database_03.txt

08 08 08 08 09
09 09 09 09 09
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12

I need that data to look like the following:

newdatabase.txt

01 01 01 01 05 05 05 08 08 08 08 09
02 02 02 02 06 06 06 09 09 09 09 09
03 03 03 03 07 07 07 10 10 10 10 10
04 04 04 04          11 11 11 11 11
                     12 12 12 12 12

In other words, I need to append "sideways", not downwards. I found this post, but I'm not 100% sure it will work in my case, and I cannot get it to work. Any ideas? Is what I'm trying to do called "transposition"?


#2 2008-06-27 22:44:50

peets
Member
From: Montreal
Registered: 2007-01-11
Posts: 936

Re: awk database manipulation

Heh. I started writing up a little Perl script for you, because I like to do those things. I need to know some more stuff though:
1. do all columns in the same file have the same width (number of characters)?
2. do all columns across all files have the same width?

I have no idea how this could be done in awk. If you're willing to spend the effort to learn awk, you might as well learn Perl: http://perldoc.perl.org


#3 2008-06-27 22:46:42

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: awk database manipulation

You're going to have difficulty keeping the alignment with awk. Try this Perl program:

#!/usr/bin/perl
#

$num_files = scalar @ARGV;
$filenum=0; $maxlines=0;

# read each file and count lines in each
foreach $file ( @ARGV ){
    $filenum++;
    open F,$file;
    $n=0; $maxlength[$filenum] =0;
    while (<F>){
        $n++;
        # track the longest line of this file (measured before chomp, so the newline is counted)
        $maxlength[$filenum] = ($maxlength[$filenum] < length($_) ) ? length($_) : $maxlength[$filenum];
        chomp;
        $filedata[$filenum][$n]=$_;
    }
    $maxlines = ( $n>$maxlines ) ? $n : $maxlines;
    close F;
}

# write out lines sequentially, adding blank space when no more lines
for ( $line=1;$line<=$maxlines;$line++ ) {
    for ( $i=1;$i<=$num_files;$i++) {
        # print the line if file $i has one, otherwise pad with spaces to keep the columns aligned
        if ( defined($filedata[$i][$line]) ) { print "$filedata[$i][$line] " }
        else { print " "x$maxlength[$i] }
    }
    print "\n";
}

This takes the filenames you supply and writes the result to stdout, e.g.:

 ./file_append.pl database_01.txt database_02.txt database_03.txt 
01 01 01 01 05 05 05 08 08 08 08 09 
02 02 02 02 06 06 06 09 09 09 09 09 
03 03 03 03 07 07 07 10 10 10 10 10 
04 04 04 04          11 11 11 11 11 
                     12 12 12 12 12
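
For comparison, here is how the same sideways append might look in awk, which is what the thread originally asked about. This is only a sketch and not part of the Perl script above: the function name side_by_side is made up for illustration, it assumes the files are small enough to hold in memory, and it assumes none of them is empty (an empty file would be skipped instead of getting a blank column). Unlike the Perl version it also pads lines that are shorter than the longest line of their file, and it strips trailing spaces from each output line.

```sh
#!/bin/sh
# side_by_side FILE...: print the files next to each other, padding each
# column to the longest line of its file so later columns stay aligned.
side_by_side() {
    awk '
        FNR == 1 { nfiles++ }                 # first line of the next input file
        {
            line[nfiles, FNR] = $0            # remember every line of every file
            if (length($0) > width[nfiles]) width[nfiles] = length($0)
            if (FNR > maxlines) maxlines = FNR
        }
        END {
            for (n = 1; n <= maxlines; n++) {
                out = ""
                for (f = 1; f <= nfiles; f++)
                    # left-justify in a field as wide as the column, then one separator space
                    out = out sprintf("%-" width[f] "s ", line[f, n])
                sub(/ +$/, "", out)           # drop trailing padding
                print out
            }
        }
    ' "$@"
}
```

Something like `side_by_side database_01.txt database_02.txt database_03.txt > newdatabase.txt` would then write the combined table to a file instead of the terminal.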


#4 2008-06-27 22:49:24

delacruz
Member
From: /home/houston
Registered: 2007-12-09
Posts: 102

Re: awk database manipulation

peets wrote:

1. do all columns in the same file have the same width (number of characters)?
2. do all columns across all files have the same width?

question 1  --->  yes
question 2 ----> no

Thanks for the help. I will take a look at learning some Perl.


EDIT: ghostHack: I just tried your script and it works very well! Thanks.

Last edited by delacruz (2008-06-27 22:54:46)


#5 2008-06-27 22:58:21

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: awk database manipulation

Glad you liked it :) If you know what the output filename is, you could extend it so that it prints the output directly to that file rather than to stdout. Perl is a really useful language for stuff like this.


#6 2008-06-27 23:19:29

lucke
Member
From: Poland
Registered: 2004-11-30
Posts: 4,018

Re: awk database manipulation

Heh, I wanted to implement that in Ruby.


#7 2008-06-27 23:54:23

peets
Member
From: Montreal
Registered: 2007-01-11
Posts: 936

Re: awk database manipulation

Well done, ghostHack! Hooray for Perl!


#8 2008-06-28 01:10:30

Procyon
Member
Registered: 2008-05-07
Posts: 1,819

Re: awk database manipulation

If you can get the number of spaces you need from the file, e.g. with $(cat file1 | wc -L) (wc -L prints the length of the longest line; it is not a POSIX option, but GNU wc has it), and convert that into actual spaces (I can't figure out how to do that), you could do it like this. But you'll have to extend it for more files (it will get complicated FAST), it breaks if the data itself contains a |, and there are probably other unforeseen bugs. So, just for fun:

--> paste -d'|' file1 file2 file3 | sed -e 's/^|/           |/;s/||/|        |/g;s/|/ /g'
01 01 01 01 05 05 05 08 08 08 08 09
02 02 02 02 06 06 06 09 09 09 09 09
03 03 03 03 07 07 07 10 10 10 10 10
04 04 04 04          11 11 11 11 11
                     12 12 12 12 12
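
The missing piece, turning a number into that many spaces, can be done with printf and a field width. A minimal sketch follows, assuming exactly three non-empty input files whose data contains no | character. The helper names longest_line, spaces and paste_padded are invented for this example, and longest_line uses awk so that it does not depend on wc -L.

```sh
#!/bin/sh
# longest_line FILE: print the length of the longest line in FILE
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# spaces N: print N spaces with no trailing newline (N is assumed to be >= 1)
spaces() {
    # an empty string right-justified in a field N wide is just N spaces
    printf "%${1}s" ''
}

# paste_padded F1 F2 F3: the paste/sed pipeline from above, with the
# hard-coded runs of spaces replaced by padding measured from F1 and F2
paste_padded() {
    pad1=$(spaces "$(longest_line "$1")")
    pad2=$(spaces "$(longest_line "$2")")
    paste -d'|' "$1" "$2" "$3" |
        sed -e "s/^|/$pad1|/;s/||/|$pad2|/g;s/|/ /g"
}
```

Called as `paste_padded file1 file2 file3`, this runs the same substitutions as the one-liner, only with the widths worked out at run time. It still has the one-liner's other limits: it only fills in for missing lines, so a file whose own lines differ in length will not line up.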

Last edited by Procyon (2008-06-28 01:12:01)


#9 2008-06-30 19:23:08

delacruz
Member
From: /home/houston
Registered: 2007-12-09
Posts: 102

Re: awk database manipulation

I'm really new to Perl and I'm having a hard time reading and understanding ghostHack's code.

$num_files = scalar @ARGV;

What does "scalar @ARGV" mean?


    open F,$file;

What are F and $file?


    while (<F>){
        $maxlength[$filenum] = ($maxlength[$filenum] < length($_) ) ? length($_) : $maxlength[$filenum];
        chomp;
        $filedata[$filenum][$n]=$_;

What does "?" mean, what does ":" mean, and what does "$_" mean?



I think once I understand the above, I can work out the rest of the code. I have been looking at Perl tutorials, but they mostly cover only the very basics.


#10 2008-06-30 19:48:17

lucke
Member
From: Poland
Registered: 2004-11-30
Posts: 4,018

Re: awk database manipulation

My Perl is rusty, but I'll give it a go.

scalar @ARGV returns the number of arguments passed on the command line (here, the number of files).

F is a filehandle, the name the script uses to refer to the opened file; $file holds the name of that file (taken from the command line).

"expression ? a : b" is the so-called ternary operator (if expression is true then a, if not then b).

$_ is the "default variable": it's used when you pass some value without explicitly specifying a named variable.


#11 2008-06-30 19:57:29

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Re: awk database manipulation

$file is set to each argument in turn by the foreach loop. F is the filehandle. The ?: construction is sort of a shortcut for if/else: test ? do_this_if_true : do_this_if_false. $_ is the "default" variable. In this case, it contains the line read from the file, inside the while loop.

Last edited by Daenyth (2008-06-30 20:04:12)


#12 2008-06-30 20:03:13

delacruz
Member
From: /home/houston
Registered: 2007-12-09
Posts: 102

Re: awk database manipulation

lucke, your Perl isn't bad at all! I understood everything. Thanks.

EDIT: Daenyth, thanks for the info. It all makes sense now.

Last edited by delacruz (2008-06-30 20:11:07)


#13 2008-06-30 20:05:06

Daenyth
Forum Fellow
From: Boston, MA
Registered: 2008-02-24
Posts: 1,244

Re: awk database manipulation

sarnath'd :(


#14 2008-06-30 21:24:40

delacruz
Member
From: /home/houston
Registered: 2007-12-09
Posts: 102

Re: awk database manipulation

    else { print " "x$maxlength[$i] }

I understand everything except this line. What does it mean? I know it prints blank spaces, but I do not understand why.


#15 2008-06-30 21:44:15

lucke
Member
From: Poland
Registered: 2004-11-30
Posts: 4,018

Re: awk database manipulation

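"x" is the repetition operator: it repeats the string on its left the number of times given on its right. So " "x$maxlength[$i] prints as many spaces as the column of file $i is wide. That fills the column once the file has run out of lines, so the columns after it stay aligned.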

Last edited by lucke (2008-06-30 21:47:39)

