cawk (columned-awk) is a fork of GNU gawk
that makes working with CSV files genuinely pleasant. It extends gawk's CSV
mode (--csv / -k) so you can address columns by their header name
($amount instead of $3) and use any single-character delimiter
(;, tab, |, …) while keeping RFC-style quoting.
Everything else is plain gawk: cawk is a strict superset of gawk built on GNU Awk 5.4, so every existing awk program, flag, and extension keeps working exactly as before. The new behaviour only activates in CSV mode.
# Sum the "amount" column for rows where "customer_id" is 5 — by name, not index
gawk --csv '$customer_id == 5 { sum += $amount } END { print sum }' sales.csv

Classic awk forces you to count columns:
# stock gawk: which column was the price again?
gawk -F, 'NR>1 { total += $7 }' orders.csv

That breaks the moment a column is added, removed, or reordered. cawk lets the file's own header label the columns for you:
# cawk: read the header, refer to columns by name
gawk --csv 'FNR>1 { total += $price } END { print total }' orders.csv

When CSV mode is on, the first record of each file is treated as a header.
For every column, cawk defines an awk variable named after the header whose
value is that column's number. Because $expr in awk uses the numeric value of
expr as a field index, $name just works:
name,age
Ada,36
Lin,29
$ gawk --csv '{ print $name }' people.csv
name # the header row is still seen by your rules
Ada
Lin
$ gawk --csv 'FNR>1 { print $name }' people.csv # skip the header
Ada
Lin

The header row is not consumed: it is still passed to your rules, so you
choose whether to skip it (FNR>1) or use it.
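
The same idea can be approximated in stock awk, which may help when reasoning about what $name resolves to. The sketch below is not cawk's implementation: it keeps the header-to-column mapping in an array (the name col is an assumption made for illustration) and splits on a bare comma, so it ignores CSV quoting.

```shell
# Portable awk sketch (no cawk required): map header names to column numbers.
# Assumption: fields contain no quotes or embedded commas.
names=$(printf 'name,age\nAda,36\nLin,29\n' | awk -F, '
  FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header: name -> column number
  { print $(col["name"]) }   # $(expr) uses the numeric value of expr as the field index
')
echo "$names"
```

Here the header row is consumed by the next statement; dropping it would pass the header through to the second rule as well, which is closer to what the text above describes for cawk.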
Stock gawk's --csv is locked to a comma. cawk takes the delimiter from FS,
so semicolon-, tab-, or pipe-separated files work in CSV mode too — with
quoting still honoured:
gawk -kF';' '{ print $name, $age }' people.scsv # semicolon
gawk -F'\t' --csv '{ print $1, $2 }' data.tsv # tab

- A default FS (unset) keeps the RFC-compliant comma. FIELDWIDTHS and FPAT still have no effect in CSV mode (they are not CSV concepts) and warn if set.
- A multi-character or regex FS cannot be a CSV delimiter; cawk warns once and falls back to a comma.
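
To make "a single-character delimiter with quoting still honoured" concrete, here is a minimal sketch of such a splitter in portable awk. It is an illustration only, not cawk's parser: the function name csvsplit is hypothetical, and the sketch assumes one record per line (no newlines inside quoted fields).

```shell
# Split a semicolon-separated line while respecting double-quoted fields.
note=$(printf 'name;note\nAda;"likes; semicolons"\n' | awk '
  # Split line on the single character delim, honouring "quoted" fields.
  # Fills out[1..n] with the unquoted field values and returns n.
  function csvsplit(line, out, delim,    n, i, c, inq, field) {
    n = 0; inq = 0; field = ""
    for (i = 1; i <= length(line); i++) {
      c = substr(line, i, 1)
      if (inq) {
        if (c != "\"") { field = field c }            # ordinary character inside quotes
        else if (substr(line, i + 1, 1) == "\"") {    # "" inside quotes is a literal quote
          field = field "\""; i++
        }
        else { inq = 0 }                              # closing quote
      }
      else if (c == "\"") { inq = 1 }                 # opening quote
      else if (c == delim) { out[++n] = field; field = "" }   # field boundary
      else { field = field c }
    }
    out[++n] = field
    return n
  }
  NR > 1 { csvsplit($0, f, ";"); print f[2] }         # second column of each data row
')
echo "$note"
```

The delimiter inside the quoted field is kept as data, which is the behaviour a plain -F';' split cannot provide.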
Header text that can't be an awk identifier is converted to one and a warning is printed to stderr so you know the name you must use:
| Header | Variable | Reason |
|---|---|---|
| first name | $first_name | spaces → _ |
| a-b | $a_b | - → _ |
| 2020 | $_2020 | leading digit gets a _ |
| (empty) | $_ | empty header |
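
The rows of the table can be reproduced with a few lines of portable awk. This is a sketch under assumptions, not cawk's actual conversion routine: the helper name ident is hypothetical, and it assumes every character outside A-Z, a-z, 0-9 and _ becomes exactly one underscore, which the table only confirms for spaces and hyphens.

```shell
# Convert sample header texts into awk-safe identifiers, one per line.
idents=$(printf 'first name\na-b\n2020\n\n' | awk '
  # Hypothetical helper: turn header text into a usable awk identifier.
  function ident(h) {
    if (h == "") { return "_" }       # empty header
    gsub(/[^A-Za-z0-9_]/, "_", h)     # spaces, "-" and other invalid characters become _
    if (h ~ /^[0-9]/) { h = "_" h }   # a leading digit gets a _ prefix
    return h
  }
  { print ident($0) }
')
echo "$idents"
```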
If a header name would overwrite a variable that already has a value (a -v
assignment, a special variable like NF, an array, or a function), cawk
prefixes the column variable with _ instead, and warns:
$ gawk -v amount=0 --csv '...' sales.csv
# header "amount" collides with -v amount -> use $_amount for the column

A name that is merely referenced in your program (and never assigned) is not
a collision, which is exactly why gawk -k '{ print $a }' works.
When the same header appears more than once, the first keeps the bare name and
the rest get a trailing _1, _2, … :
a,b,b
1,2,2
1,5,9
$ gawk -k '{ print $b }' # first "b" -> column 2
$ gawk -k '{ print $b_1 }' # second "b" -> column 3

The two schemes compose: a _-prefixed name that also collides becomes
_b_1.
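
The duplicate-numbering rule is easy to sketch in portable awk. The snippet below only illustrates the naming scheme described above (first occurrence bare, later ones suffixed _1, _2, …); the array name count is an assumption, the header a,b,b,b extends the example by one column, and the collision prefix is not modelled.

```shell
# Print the variable name each column of the header would receive.
vars=$(printf 'a,b,b,b\n' | awk -F, '
  FNR == 1 {
    for (i = 1; i <= NF; i++) {
      base = $i
      if (base in count) { name = base "_" count[base]; count[base]++ }  # repeat: b_1, b_2, ...
      else               { name = base; count[base] = 1 }                # first occurrence
      line = line (i > 1 ? " " : "") name
    }
    print line
  }
')
echo "$vars"
```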
With several files, each file's first row is its own header, and the variables from the previous file are cleared when the next one starts:
gawk -k '{ print FILENAME, $id }' january.csv february.csv

If february.csv lacks a column that january.csv had, that name reads as
empty rather than silently pointing at a stale column.
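
The per-file reset can be imitated in portable awk by rebuilding the name-to-column map whenever FNR is 1. This is a sketch, not cawk's code: the file contents and the col array are made up for illustration, and the explicit "in" guard is needed because in plain awk an unset index evaluates to 0, so an unguarded lookup would return $0 (the whole record) rather than an empty string.

```shell
# Two sample files; the second one has no "region" column.
dir=$(mktemp -d)
printf 'id,region\n1,north\n' > "$dir/january.csv"
printf 'id\n2\n' > "$dir/february.csv"

regions=$(awk -F, '
  FNR == 1 {
    split("", col)                          # forget the header of the previous file
    for (i = 1; i <= NF; i++) col[$i] = i   # learn the header of the current file
    next
  }
  # Guarded lookup: a missing column yields "" instead of a stale or wrong field.
  { print $(col["id"]) ":" (("region" in col) ? $(col["region"]) : "") }
' "$dir/january.csv" "$dir/february.csv")
echo "$regions"
rm -r "$dir"
```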
cawk builds exactly like gawk:
./configure
make
make check # runs the full gawk test suite
sudo make install # optional

The resulting gawk binary is cawk. (The project keeps the gawk program
name for drop-in compatibility.)
| Task | Syntax |
|---|---|
| Enable CSV mode | --csv or -k |
| Choose delimiter | -F';', -F'\t', … (single character) |
| Refer to a column | $header_name |
| Skip the header row | guard rules with FNR>1 (per file) or NR>1 |
| Duplicate column n | $name, $name_1, $name_2, … |
| Collision with a variable | column is $_name (warned) |
cawk tracks GNU Awk 5.4. The CSV column features are additive and confined to CSV mode; all other behaviour is upstream gawk. For the full language and feature reference, see the gawk manual.
cawk is free software under the GNU General Public License v3 or later,
the same license as gawk. See COPYING.
This is a derivative of GNU Awk, Copyright © the Free Software Foundation, Inc.