Combining three colums with frequency data to one (summed) column

ibendiben · 2008-05-01 12:47:37

I have this list:

John  30    Ann   4     John   12
Ann   13    Bert  8     Ann    20
Bert  40    James 23    James  12
Jimmy 3     Clara 17    Jimmy  34
Clara 32    Mary  20    Claude 27

And I want to format it like this:

John  30                John   12
Ann   13    Ann   4     Ann    20
Bert  40    Bert  8
Jimmy 3                 Jimmy  34
            James 23    James  12
            Mary  20
Clara 32    Clara 17
                        Claude 27

So I can then easily form a total name list plus the sum column of the frequencies:

John      42
Ann       37
Bert      48
Jimmy     37
James     35
Mary      20
Clara     49
Claude    27

I hope this explains my problem well enough. I wonder if anyone has a good idea to make this happen automated to a big list (5000 entry's). I have excel and think I can use it for this job. Either an excel, or an alternate linux solution would be greatly appreciated.
Any ideas, steps, thinkering on this welcome too...
Thanks!

Last edited by ibendiben (2008-05-01 12:53:23)

ghostHack · 2008-05-01 13:39:29

Well, if your ultimate goal is to get the final list you could use a perl script like this:

#!/usr/bin/perl
#

open FILE,"data";
while(<FILE>){
    chomp;
    @line = split /\s+/,$_;
    # test for even number of items
    if ( scalar(@line) % 2 != 0 ) { die "Problem with '$_'\n" }
    while (@line) {
    $name = shift @line; $freq = shift @line; # get name and frequency
    if ( defined($freqs{$name}) ) { 
        # if we have found name before then increment value
        $freqs{$name} += $freq 
    } 
    else { 
        $freqs{$name} = $freq;
    }
    }
}
close FILE;

# output to STDOUT
foreach ( sort keys %freqs ) {
    printf "%10s %3d\n",$_, $freqs{$_};
}

This assumes that your data is in a plain text, space separated, file called 'data' and there are no extra lines in it (e.g. column titles etc). If you do have extra lines its fairly straightforward to allow for them.

If you wanted the intermediate table then the above script would need a bit of modification.

Note, This is a fairly basic script and it does little error checking.

ibendiben · 2008-05-01 14:30:57

Thanks alot, I'm gonna try it soon, can you edit it so I can use columns separated by tabs? (in my columns are sometimes spaced names)

ibendiben · 2008-05-01 14:33:28

I suppose \s=space, and \t=tab

ibendiben · 2008-05-01 15:03:47

Brilliant! Works great!

ghostHack · 2008-05-01 17:20:18

ibendiben wrote:

I suppose \s=space, and \t=tab

In the pattern match (used in split) \s is any whitespace character \s+ is one or more consecutive whitespace characters. As you have found \t is specifically a tab whitespace.

ibendiben wrote:

Brilliant! Works great!

Glad to help

Arch Linux

#1 2008-05-01 12:47:37

Combining three colums with frequency data to one (summed) column

#2 2008-05-01 13:39:29

Re: Combining three colums with frequency data to one (summed) column

#3 2008-05-01 14:30:57

Re: Combining three colums with frequency data to one (summed) column

#4 2008-05-01 14:33:28

Re: Combining three colums with frequency data to one (summed) column

#5 2008-05-01 15:03:47

Re: Combining three colums with frequency data to one (summed) column

#6 2008-05-01 17:20:18

Re: Combining three colums with frequency data to one (summed) column

Board footer