You are not logged in.

#1 2008-05-01 12:47:37

ibendiben
Member
Registered: 2007-10-10
Posts: 519

Combining three colums with frequency data to one (summed) column

I have this list:

John  30    Ann   4     John   12
Ann   13    Bert  8     Ann    20
Bert  40    James 23    James  12
Jimmy 3     Clara 17    Jimmy  34
Clara 32    Mary  20    Claude 27

And I want to format it like this:

John  30                John   12
Ann   13    Ann   4     Ann    20
Bert  40    Bert  8
Jimmy 3                 Jimmy  34
            James 23    James  12
            Mary  20
Clara 32    Clara 17
                        Claude 27

So I can then easily form a total name list plus the sum column of the frequencies:

John      42
Ann       37
Bert      48
Jimmy     37
James     35
Mary      20
Clara     49
Claude    27

I hope this explains my problem well enough. I wonder if anyone has a good idea to make this happen automated to a big list (5000 entry's). I have excel and think I can use it for this job. Either an excel, or an alternate linux solution would be greatly appreciated.
Any ideas, steps, thinkering on this welcome too...
Thanks!

Last edited by ibendiben (2008-05-01 12:53:23)

Offline

#2 2008-05-01 13:39:29

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: Combining three colums with frequency data to one (summed) column

Well, if your ultimate goal is to get the final list you could use a perl script like this:

#!/usr/bin/perl
#

open FILE,"data";
while(<FILE>){
    chomp;
    @line = split /\s+/,$_;
    # test for even number of items
    if ( scalar(@line) % 2 != 0 ) { die "Problem with '$_'\n" }
    while (@line) {
    $name = shift @line; $freq = shift @line; # get name and frequency
    if ( defined($freqs{$name}) ) { 
        # if we have found name before then increment value
        $freqs{$name} += $freq 
    } 
    else { 
        $freqs{$name} = $freq;
    }
    }
}
close FILE;

# output to STDOUT
foreach ( sort keys %freqs ) {
    printf "%10s %3d\n",$_, $freqs{$_};
}

This assumes that your data is in a plain text, space separated, file called 'data' and there are no extra lines in it (e.g. column titles etc).  If you do have extra lines its fairly straightforward to allow for them. 

If you wanted the intermediate table then the above script would need a bit of modification.

Note, This is a fairly basic script and it does little error checking.

Offline

#3 2008-05-01 14:30:57

ibendiben
Member
Registered: 2007-10-10
Posts: 519

Re: Combining three colums with frequency data to one (summed) column

Thanks alot, I'm gonna try it soon, can you edit it so I can use columns separated by tabs? (in my columns are sometimes spaced names)

Offline

#4 2008-05-01 14:33:28

ibendiben
Member
Registered: 2007-10-10
Posts: 519

Re: Combining three colums with frequency data to one (summed) column

I suppose \s=space, and \t=tab

Offline

#5 2008-05-01 15:03:47

ibendiben
Member
Registered: 2007-10-10
Posts: 519

Re: Combining three colums with frequency data to one (summed) column

Brilliant! Works great!

Offline

#6 2008-05-01 17:20:18

ghostHack
Member
From: Bristol UK
Registered: 2008-02-29
Posts: 261

Re: Combining three colums with frequency data to one (summed) column

ibendiben wrote:

I suppose \s=space, and \t=tab

In the pattern match (used in split) \s is any whitespace character \s+ is one or more consecutive whitespace characters.  As you have found \t is specifically a tab whitespace.

ibendiben wrote:

Brilliant! Works great!

Glad to help smile

Offline

Board footer

Powered by FluxBB