You are not logged in.
I have this list:
John 30 Ann 4 John 12
Ann 13 Bert 8 Ann 20
Bert 40 James 23 James 12
Jimmy 3 Clara 17 Jimmy 34
Clara 32 Mary 20 Claude 27
And I want to format it like this:
John 30 John 12
Ann 13 Ann 4 Ann 20
Bert 40 Bert 8
Jimmy 3 Jimmy 34
James 23 James 12
Mary 20
Clara 32 Clara 17
Claude 27
So I can then easily form a total name list plus the sum column of the frequencies:
John 42
Ann 37
Bert 48
Jimmy 37
James 35
Mary 20
Clara 49
Claude 27
I hope this explains my problem well enough. I wonder if anyone has a good idea to make this happen automated to a big list (5000 entry's). I have excel and think I can use it for this job. Either an excel, or an alternate linux solution would be greatly appreciated.
Any ideas, steps, thinkering on this welcome too...
Thanks!
Last edited by ibendiben (2008-05-01 12:53:23)
Offline
Well, if your ultimate goal is to get the final list you could use a perl script like this:
#!/usr/bin/perl
#
open FILE,"data";
while(<FILE>){
chomp;
@line = split /\s+/,$_;
# test for even number of items
if ( scalar(@line) % 2 != 0 ) { die "Problem with '$_'\n" }
while (@line) {
$name = shift @line; $freq = shift @line; # get name and frequency
if ( defined($freqs{$name}) ) {
# if we have found name before then increment value
$freqs{$name} += $freq
}
else {
$freqs{$name} = $freq;
}
}
}
close FILE;
# output to STDOUT
foreach ( sort keys %freqs ) {
printf "%10s %3d\n",$_, $freqs{$_};
}
This assumes that your data is in a plain text, space separated, file called 'data' and there are no extra lines in it (e.g. column titles etc). If you do have extra lines its fairly straightforward to allow for them.
If you wanted the intermediate table then the above script would need a bit of modification.
Note, This is a fairly basic script and it does little error checking.
Offline
Thanks alot, I'm gonna try it soon, can you edit it so I can use columns separated by tabs? (in my columns are sometimes spaced names)
Offline
I suppose \s=space, and \t=tab
Offline
Brilliant! Works great!
Offline
I suppose \s=space, and \t=tab
In the pattern match (used in split) \s is any whitespace character \s+ is one or more consecutive whitespace characters. As you have found \t is specifically a tab whitespace.
Brilliant! Works great!
Glad to help
Offline