# Assuming example.txt has added headers "A B" on the first line
printf '.import example.txt d\nselect A, sum(B) from d group by A;' | sqlite3
Some quick google-fu shows that there are a number of applications that will allow you to run SQL queries directly on text files and a couple of these are already packaged in the AUR. (Assuming you can work with csv/tsv for this, though some do allow for json and other formats, quite informative write up about the options here)
I tried textql first which is based on SQLite and could read from a .txt extension file without issue. It also handled spaces in the csv file which caused other of the programs to treat the second column as a text.
~ $ cat example.txt
a,b
10, 20
10, 20
20, 24
20, 40
30, 40
30, 20
~ $ textql -header -sql "SELECT a, sum(b), count(*) from example GROUP BY a" example.txt
10,40,2
20,64,2
30,60,2
I also tried octosql also packaged in aur which allows for more file formats (via plugins, by default only csv and JSON) and also supports joins! But it didn't seamlessly handle the spaces in the second column like textql (I didn't look further into casting than a simple try that also failed).
~ $ cat example.csv
a,b
10,20
10,20
20,24
20,40
30,40
30,20
~ $ octosql "SELECT a, SUM(b), COUNT(*) FROM example.csv GROUP BY a"
+----+-------+-------+
| a | sum_b | count |
+----+-------+-------+
| 10 | 40 | 2 |
| 20 | 64 | 2 |
| 30 | 60 | 2 |
+----+-------+-------+
~ $ cat example_spaces.csv
a,b
10, 20
10, 20
20, 24
20, 40
30, 40
30, 20
~ $ octosql "SELECT a, SUM(b), COUNT(*) FROM example_spaces.csv GROUP BY a"
...
Error: typecheck error: unknown aggregate: sum(String)
Octosql uses a custom engine, not SQLite which may explain some of the differences, oh and there's also this tidbit when first run,
OctoSQL sends anonymous usage statistics to help us guide the development of OctoSQL.
You can view the most recently sent usage events in the ~/.octosql/telemetry/recent directory.
You can turn telemetry off by setting the environment variable OCTOSQL_NO_TELEMETRY to 1.
Please don't though, as it helps us a great deal.
datamash groupby 1 sum 2
awk '{x[$1]+=$2;}END{for(i in x)print i" "x[i];}' example.txt
10 40
20 64
30 60
% cat example.txt
10 20
10 20
20 24
20 40
30 40
30 20
The desired output is:
10 40
20 44
30 60
Googling for this isn't turning up anything I can use. Suggestions are welcomed, awk, perl, doesn't matter. My goal is to put it into a shell script at the end of the day.
]]>