Benchmarks the concatenation of a large number of CSV files using Awk (The One True Awk), csvtk, GoAWK, Miller, qsv, xsv, and a naive custom shell script.
- All of the scripts require GNU Bash.
- cat_csv_awk requires awk.
- cat_csv_csvtk requires csvtk.
- cat_csv_custom requires GNU Bash read, GNU cat, GNU tail, and GNU xargs.
- cat_csv_goawk requires goawk.
- cat_csv_mlr requires mlr.
- cat_csv_qsv requires qsv.
- cat_csv_xsv requires xsv.
- run_tests requires Hyperfine.
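To illustrate the general idea behind a naive shell-only concatenation, here is a minimal Bash sketch. It is not the repository's cat_csv_custom. The function name cat_csv_sketch is hypothetical, and the sketch assumes that every *.csv file under the directory begins with the same single header line and that files should be joined in sorted-name order. It also relies on find, sort, head, and the Bash 4.4+ mapfile -d builtin, none of which appear in the requirements above.

```shell
#!/usr/bin/env bash
# Hypothetical naive CSV concatenation sketch (NOT the repository's cat_csv_custom).
# Assumptions: every *.csv file under the directory starts with the same single
# header line, and files are joined in sorted-name order. Requires Bash 4.4+
# (mapfile -d) and GNU find/sort/head/tail/xargs.
set -euo pipefail

cat_csv_sketch() {
    local data_path=$1
    local -a files

    # Collect file names NUL-delimited and sorted, so unusual names survive.
    mapfile -d '' files < <(find "$data_path" -name '*.csv' -print0 | sort -z)

    if (( ${#files[@]} == 0 )); then
        echo "No CSV files found in $data_path" >&2
        return 1
    fi

    # Print the shared header once, taken from the first file.
    head -n 1 -- "${files[0]}"

    # Print each file's body (line 2 onward). xargs splits the list into
    # batches that fit within the system's argument-length limit, and -q
    # stops GNU tail from printing "==> file <==" separators between files.
    printf '%s\0' "${files[@]}" | xargs -0 tail -q -n +2 --
}
```

After sourcing the block, `cat_csv_sketch data > combined.csv` writes one header followed by every data row. Feeding the names through xargs rather than a single tail invocation avoids "Argument list too long" errors when the directory holds tens of thousands of files.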
The generate_test_data script creates FILE_COUNT CSV files in the directory DATA_PATH. Each file is named after its zero-padded file number and contains a header row, Column, followed by a single data row holding the file number. For example:
$ ./generate_test_data
Usage: ./generate_test_data [DATA_PATH] [FILE_COUNT]
$ ./generate_test_data data 100000
$ find data -name '*.csv' | sort | head -n 5; echo "..."; find data -name '*.csv' | sort | tail -n 5
data/0000001.csv
data/0000002.csv
data/0000003.csv
data/0000004.csv
data/0000005.csv
...
data/0099996.csv
data/0099997.csv
data/0099998.csv
data/0099999.csv
data/0100000.csv
$ cat data/0000001.csv
Column
1
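The behaviour shown above can be approximated with a short Bash function. This is a hypothetical sketch inferred only from the example output, not the repository's generate_test_data: the name generate_test_data_sketch is made up, and the fixed seven-digit zero padding is an assumption taken from the file names listed above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of generate_test_data's observable behaviour, inferred
# from the example output only (NOT the repository's script). Assumes file
# names are always zero-padded to seven digits.
set -euo pipefail

generate_test_data_sketch() {
    if (( $# != 2 )); then
        echo "Usage: generate_test_data_sketch [DATA_PATH] [FILE_COUNT]" >&2
        return 1
    fi
    local data_path=$1 file_count=$2 path i

    # Reject non-numeric counts before looping.
    if [[ ! $file_count =~ ^[0-9]+$ ]]; then
        echo "FILE_COUNT must be a non-negative integer" >&2
        return 1
    fi

    mkdir -p -- "$data_path"
    for (( i = 1; i <= file_count; i++ )); do
        # Build the path without a subshell: DATA_PATH/0000001.csv, ...
        printf -v path '%s/%07d.csv' "$data_path" "$i"
        # Header row "Column", then one data row holding the file number.
        printf 'Column\n%d\n' "$i" > "$path"
    done
}
```

Calling `generate_test_data_sketch data 100000` should yield data/0000001.csv through data/0100000.csv. Using printf -v instead of command substitution avoids forking a subshell for each of the 100,000 paths.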