You are not logged in.

#1 2022-08-15 20:32:57

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595
Website

Pivot table with awk or something to sum replicate values [solved]

I want to take a file structured in two columns and for every unique value in the first column, sum the corresponding value in the 2nd column.

% cat example.txt
10 20
10 20
20 24
20 40
30 40
30 20

The desired output is:

10 40
20 44
30 60

Googling for this isn't turning up anything I can use.  Suggestions are welcomed, awk, perl, doesn't matter.  My goal is to put it into a shell script at the end of the day.

Last edited by graysky (2022-08-16 09:55:19)


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packagesZsh and other configs

Offline

#2 2022-08-15 20:51:02

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: Pivot table with awk or something to sum replicate values [solved]

I get different ouptput to you, because the 20 total is 64 according to your requirements (as I read them) tongue

 awk '{x[$1]+=$2;}END{for(i in x)print i" "x[i];}' example.txt                                                                                                            
10 40
20 64
30 60

Arch + dwm   •   Mercurial repos  •   Surfraw

Registered Linux User #482438

Offline

#3 2022-08-15 21:52:06

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595
Website

Re: Pivot table with awk or something to sum replicate values [solved]

Nice, thanks!  Yes, typo there, 44 != 64.


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packagesZsh and other configs

Offline

#4 2022-08-16 09:00:01

Alad
Wiki Admin/IRC Op
From: Bagelstan
Registered: 2014-05-04
Posts: 2,407
Website

Re: Pivot table with awk or something to sum replicate values [solved]

datamash makes this a bit easier:

datamash groupby 1 sum 2

Mods are just community members who have the occasionally necessary option to move threads around and edit posts. -- Trilby

Offline

#5 2022-08-16 09:54:45

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595
Website

Re: Pivot table with awk or something to sum replicate values [solved]

Thanks Alad, useful util!

Last edited by graysky (2022-08-16 09:55:00)


CPU-optimized Linux-ck packages @ Repo-ck  • AUR packagesZsh and other configs

Offline

#6 2022-08-17 02:28:09

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: Pivot table with awk or something to sum replicate values [solved]

So besides the awk and *interesting* datamash options already shared, this post intrigued me because it's an obvious place for SQL.

Some quick google-fu shows that there are a number of applications that will allow you to run SQL queries directly on text files and a couple of these are already packaged in the AUR. (Assuming you can work with csv/tsv for this, though some do allow for json and other formats, quite informative write up about the options here)

I tried textql first which is based on SQLite and could read from a .txt extension file without issue. It also handled spaces in the csv file which caused other of the programs to treat the second column as a text.

~ $ cat example.txt
a,b
10, 20
10, 20
20, 24
20, 40
30, 40
30, 20

~ $ textql -header -sql "SELECT a, sum(b), count(*) from example GROUP BY a" example.txt
10,40,2
20,64,2
30,60,2

I also tried octosql also packaged in aur which allows for more file formats (via plugins, by default only csv and JSON) and also supports joins! But it didn't seamlessly handle the spaces in the second column like textql (I didn't look further into casting than a simple try that also failed).

~ $ cat example.csv
a,b
10,20
10,20
20,24
20,40
30,40
30,20

~ $ octosql "SELECT a, SUM(b), COUNT(*) FROM example.csv GROUP BY a"
+----+-------+-------+
| a  | sum_b | count |
+----+-------+-------+
| 10 |    40 |     2 |
| 20 |    64 |     2 |
| 30 |    60 |     2 |
+----+-------+-------+

~ $ cat example_spaces.csv
a,b
10, 20
10, 20
20, 24
20, 40
30, 40
30, 20

~ $ octosql "SELECT a, SUM(b), COUNT(*) FROM example_spaces.csv GROUP BY a"
...
Error: typecheck error: unknown aggregate: sum(String)

Octosql uses a custom engine, not SQLite which may explain some of the differences, oh and there's also this tidbit when first run,

OctoSQL sends anonymous usage statistics to help us guide the development of OctoSQL.
You can view the most recently sent usage events in the ~/.octosql/telemetry/recent directory.
You can turn telemetry off by setting the environment variable OCTOSQL_NO_TELEMETRY to 1.
Please don't though, as it helps us a great deal.

Last edited by CarbonChauvinist (2022-08-17 02:32:17)


"the wind-blown way, wanna win? don't play"

Offline

#7 2022-08-17 02:42:57

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Pivot table with awk or something to sum replicate values [solved]

I'm not sure what textql really does here as sqlite will import flat files just fine:

# Assuming example.txt has added headers "A B" on the first line
printf '.import example.txt d\nselect A, sum(B) from d group by A;' | sqlite3

"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

#8 2022-08-17 02:51:25

CarbonChauvinist
Member
Registered: 2012-06-16
Posts: 412
Website

Re: Pivot table with awk or something to sum replicate values [solved]

@Trilby, interesting, is that importing the data though and not just querying the flat file directly (i.e what does the 'import' mean there)? Either way, I really need to play more with SQLite.


"the wind-blown way, wanna win? don't play"

Offline

#9 2022-08-17 03:00:58

Trilby
Inspector Parrot
Registered: 2011-11-29
Posts: 29,449
Website

Re: Pivot table with awk or something to sum replicate values [solved]

There is no difference betwee "importing the data" and "querying the flat file directly" ... or at least the latter doesn't exist.  All the tools you refer to import the flat file into a in-memory sqlite database; that's the only way any sql queries could run.  The tools you refer to just wrap sqlite3 and do the import for you.  You may as well use sqlite3 directly where you have more direct control of what is happening (and you don't need to build additional packages from the AUR).

Last edited by Trilby (2022-08-17 03:02:21)


"UNIX is simple and coherent..." - Dennis Ritchie, "GNU's Not UNIX" -  Richard Stallman

Offline

Board footer

Powered by FluxBB