Inknyto/cawk

cawk — columned awk

cawk (columned awk) is a fork of GNU Awk (gawk) that makes working with CSV files genuinely pleasant. It extends gawk's CSV mode (--csv / -k) so you can address columns by their header name ($amount instead of $3) and use any single-character delimiter (;, tab, |, …) while keeping RFC 4180-style quoting.

Everything else is plain gawk: cawk is a superset of gawk built on GNU Awk 5.4, so every existing awk program, flag, and extension that does not use CSV mode keeps working exactly as before. The new behaviour only activates in CSV mode.

# Sum the "amount" column for rows where "customer_id" is 5 — by name, not index
gawk --csv '$customer_id == 5 { sum += $amount } END { print sum }' sales.csv

Why

Classic awk forces you to count columns:

# stock gawk: which column was the price again?
gawk -F, 'NR>1 { total += $7 }' orders.csv

That breaks the moment a column is added, removed, or reordered. cawk lets the file's own header label the columns for you:

# cawk: read the header, refer to columns by name
gawk --csv 'FNR>1 { total += $price } END { print total }' orders.csv
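For comparison, here is roughly what a by-name lookup takes in stock awk. This is a minimal sketch, not cawk's implementation: it assumes a plain comma-separated file with no quoted commas, and sum_by_name is a hypothetical helper name.

```shell
# Stock-awk sketch (not cawk): map header names to column numbers from the
# first row, then look the wanted column up by name instead of by index.
# Assumes a simple comma-separated input with no quoted commas.
sum_by_name() {
  awk -F, -v want="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row
    (want in col) { total += $(col[want]) }                   # data rows
    END { print total + 0 }
  '
}

printf 'id,price\n1,2.50\n2,4\n' | sum_by_name price   # prints 6.5
printf 'price,id\n2.50,1\n4,2\n' | sum_by_name price   # still 6.5 after reordering
```

Reordering the columns does not change the result, which is the property cawk gives you without the header-map boilerplate.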

Features

1. Access columns by name

When CSV mode is on, the first record of each file is treated as a header. For every column, cawk defines an awk variable named after the header whose value is that column's number. Because $expr in awk uses the numeric value of expr as a field index, $name just works:

Given people.csv:

name,age
Ada,36
Lin,29

$ gawk --csv '{ print $name }' people.csv
name      # the header row is still seen by your rules
Ada
Lin

$ gawk --csv 'FNR>1 { print $name }' people.csv   # skip the header
Ada
Lin

The header row is not consumed — it is still passed to your rules, so you choose whether to skip it (FNR>1) or use it.
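The mechanism behind $name can be checked in any awk, because $expr uses the numeric value of expr as the field index. In this sketch the variables name and age are set by hand with -v; cawk would instead derive them from the header row.

```shell
# In any awk, $expr uses the numeric value of expr as a field index.
# Here name and age are assigned by hand; cawk assigns them from the header.
printf 'name,age\nAda,36\nLin,29\n' |
  awk -F, -v name=1 -v age=2 'FNR > 1 { print $name, $age }'
# Ada 36
# Lin 29
```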

2. Any single-character delimiter

Stock gawk's --csv is locked to a comma. cawk takes the delimiter from FS, so semicolon-, tab-, or pipe-separated files work in CSV mode too — with quoting still honoured:

gawk -kF';'  '{ print $name, $age }' people.scsv     # semicolon
gawk -F'\t' --csv '{ print $1, $2 }' data.tsv        # tab
  • A default FS (unset) keeps the RFC-compliant comma.
  • FIELDWIDTHS and FPAT still have no effect in CSV mode (they are not CSV concepts) and warn if set.
  • A multi-character or regex FS cannot be a CSV delimiter; cawk warns once and falls back to a comma.
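To make "quoting still honoured" concrete, the following is a rough sketch of splitting one record on an arbitrary single-character delimiter while respecting RFC 4180-style double quotes. It is written in portable awk, csv_split is a hypothetical name, it ignores newlines inside quoted fields, and it is not cawk's actual parser.

```shell
# Sketch only: split each input line on a single-character delimiter,
# honouring double quotes ("" inside quotes is a literal quote).
# Does not handle newlines inside quoted fields. Prints fields as [f1][f2]...
csv_split() {
  awk -v sep="$1" '
    function csvsplit(s, f, sep,    n, i, c, q, cur) {
      n = 0; q = 0; cur = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (q) {                                   # inside quotes
          if (c == "\"" && substr(s, i + 1, 1) == "\"") { cur = cur "\""; i++ }
          else if (c == "\"") q = 0                # closing quote
          else cur = cur c                         # delimiter is literal here
        } else if (c == "\"") q = 1                # opening quote
        else if (c == sep) { f[++n] = cur; cur = "" }
        else cur = cur c
      }
      f[++n] = cur                                 # last field
      return n
    }
    {
      split("", fields)                            # clear the previous record
      n = csvsplit($0, fields, sep)
      for (i = 1; i <= n; i++) printf "[%s]", fields[i]
      print ""
    }
  '
}

printf 'name;note\nAda;"x; y"\n' | csv_split ';'
# [name][note]
# [Ada][x; y]
printf 'a|"say ""hi"""|b\n' | csv_split '|'
# [a][say "hi"][b]
```

Passing '\t' also works, because awk processes escape sequences in -v assignments.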

3. Headers that aren't valid variable names are sanitized (with a warning)

Header text that can't be an awk identifier is converted to one and a warning is printed to stderr so you know the name you must use:

| Header | Variable | Reason |
|---|---|---|
| first name | $first_name | space → _ |
| a-b | $a_b | - → _ |
| 2020 | $_2020 | leading digit gets a _ prefix |
| (empty) | $_ | empty header |
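The rules in the table can be approximated with a small awk function. This is an assumption-laden sketch: it treats every character outside [A-Za-z0-9_] as invalid and replaces it with _, which matches the rows above but may not match cawk for other input (for example non-ASCII headers). sanitize_header is a hypothetical name.

```shell
# Sketch of the table rules (assumed, not taken from cawk source):
# invalid characters become "_", and an empty or digit-led name gets "_" in front.
sanitize_header() {
  awk '
    function sanitize(h) {
      gsub(/[^A-Za-z0-9_]/, "_", h)              # spaces, "-", etc. -> "_"
      if (h == "" || h ~ /^[0-9]/) h = "_" h     # empty or leading digit
      return h
    }
    { print sanitize($0) }                       # one header per input line
  '
}

printf 'first name\na-b\n2020\n\n' | sanitize_header
# first_name
# a_b
# _2020
# _
```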

4. Name collisions are resolved, never clobbered

If a header name would overwrite a variable that already has a value (a -v assignment, a special variable like NF, an array, or a function), cawk prefixes the column variable with _ instead, and warns:

$ gawk -v amount=0 --csv '...' sales.csv
# header "amount" collides with -v amount -> use $_amount for the column

A name that is merely referenced in your program (and never assigned) is not a collision, which is exactly why gawk -k '{ print $a }' works.

5. Duplicate headers get a numbered suffix

When the same header appears more than once, the first keeps the bare name and the rest get a trailing _1, _2, …:

Given this input on standard input:

a,b,b
1,2,2
1,5,9
$ gawk -k '{ print $b }'    # first "b"  -> column 2
$ gawk -k '{ print $b_1 }'  # second "b" -> column 3

The two schemes compose: when a repeated header also collides with an existing variable, it gets both the _ prefix and the numbered suffix, e.g. $_b_1.
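One way to read the naming order described above, sketched in portable awk: a header that collides with a variable that already has a value gets the _ prefix first, then repeats of the resulting name get _1, _2, … suffixes. The comma-separated taken list is a hypothetical stand-in for variables set with -v, name_columns is a hypothetical helper, and the sketch does not check whether a generated name such as b_1 clashes with a real header of that name.

```shell
# Sketch (not cawk source): print the variable name each header column
# would get. $1 = comma-separated names assumed to already hold a value.
name_columns() {
  awk -F, -v taken="$1" '
    BEGIN { n = split(taken, t, ","); for (i = 1; i <= n; i++) busy[t[i]] = 1 }
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        name = $i
        if (name in busy) name = "_" name                        # collision
        if (++seen[name] > 1) name = name "_" (seen[name] - 1)   # duplicate
        print name " -> " i
      }
      exit                                                       # header only
    }
  '
}

printf 'a,b,b\n1,2,2\n' | name_columns ''
# a -> 1
# b -> 2
# b_1 -> 3
printf 'a,b,b\n1,2,2\n' | name_columns 'b'
# a -> 1
# _b -> 2
# _b_1 -> 3
```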

6. Per-file headers

With several files, each file's first row is its own header, and the variables from the previous file are cleared when the next one starts:

gawk -k '{ print FILENAME, $id }' january.csv february.csv

If february.csv lacks a column that january.csv had, that name reads as empty rather than silently pointing at a stale column.
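A stock-awk sketch of the same per-file behaviour: rebuild the name-to-column map at each file's first row and empty the old one (split("", col) is the portable way to clear an array), so a column missing from a later file reads as empty. print_column is a hypothetical helper, the files are created in a scratch directory, and quoted commas are not handled.

```shell
# Sketch (not cawk): print FILENAME and the named column for every data row,
# re-reading the header at the start of each file.
print_column() {
  want=$1; shift
  awk -F, -v want="$want" '
    FNR == 1 {                                   # header row of each file
      split("", col)                             # forget the previous map
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    { print FILENAME, ((want in col) ? $(col[want]) : "") }
  ' "$@"
}

cd "$(mktemp -d)"
printf 'id,total\n7,10\n' > january.csv
printf 'total\n20\n'      > february.csv
print_column id january.csv february.csv
# january.csv 7
# february.csv        (empty: february.csv has no id column)
```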


Building

cawk builds exactly like gawk:

./configure
make
make check          # runs the full gawk test suite
sudo make install   # optional

The resulting gawk binary is cawk. (The project keeps the gawk program name for drop-in compatibility.)


Quick reference

| Task | How |
|---|---|
| Enable CSV mode | --csv or -k |
| Choose delimiter | -F';', -F'\t', … (single character) |
| Refer to a column | $header_name |
| Skip the header row | guard rules with FNR>1 (every file) or NR>1 (first file only) |
| Duplicate header names | $name, $name_1, $name_2, … |
| Collision with a variable | the column is $_name (warned) |

Relationship to gawk

cawk tracks GNU Awk 5.4. The CSV column features are additive and confined to CSV mode; all other behaviour is upstream gawk. For the full language and feature reference, see the gawk manual.

License

cawk is free software under the GNU General Public License v3 or later, the same license as gawk. See COPYING.

This is a derivative of GNU Awk, Copyright © the Free Software Foundation, Inc.
