TIL: Datamash
2024-07-18
- shell
- stats
I often need to gather rudimentary statistics on a sample of numbers. For example, I was recently working on performance optimizations for a Rust swc plugin I was writing at Stripe. I had a file, stats.txt, full of numbers like these (an excerpt), each one the number of milliseconds it took to transform a single file:
```
40.478041
40.72755
40.788834
40.847896
40.90488
41.224775
41.336781
41.581588
41.73413
```
It turns out there’s a GNU tool called datamash that you can install with apt or brew. You use it like this:
```shell
datamash --header-out count 1 min 1 max 1 mean 1 median 1 < stats.txt | column -t
```
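Each operation is followed by the field it applies to (here, field 1), and `--header-out` adds a header row. Piping the result through `column -t` lines up datamash's tab-separated output into neat columns. The output looks like this: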
```
count(field-1)  min(field-1)  max(field-1)  mean(field-1)    median(field-1)
120             38.635759     69.312647     49.238020183333  46.318816
```
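If datamash isn't installed, you can piece together a rough equivalent from `sort` and `awk`. This is a minimal sketch, not part of datamash: the function name `summarize` is hypothetical, it assumes one number per line in the first column, and it prints the mean and median to six decimal places, so its formatting differs from datamash's. Sorting first turns the median into an index lookup; with an even count it averages the two middle values.

```shell
#!/bin/sh
# Hypothetical fallback for: datamash count 1 min 1 max 1 mean 1 median 1
# Reads one number per line (first column) on stdin and prints a header row
# plus count, min, max, mean and median, tab-separated.
summarize() {
  # LC_ALL=C keeps "." as the decimal point for the numeric sort.
  LC_ALL=C sort -n | awk '
    NF { v[++n] = $1; sum += $1 }   # skip blank lines; v[] is sorted ascending
    END {
      if (n == 0) exit 1            # no data: nothing to summarize
      if (n % 2) median = v[(n + 1) / 2]
      else       median = (v[n / 2] + v[n / 2 + 1]) / 2
      printf "count\tmin\tmax\tmean\tmedian\n"
      printf "%d\t%s\t%s\t%.6f\t%.6f\n", n, v[1], v[n], sum / n, median
    }'
}
```

Usage mirrors the datamash pipeline: `summarize < stats.txt | column -t`.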