cawk — columned awk

cawk (columned-awk) is a fork of GNU gawk that makes working with CSV files genuinely pleasant. It extends gawk's CSV mode (--csv / -k) so you can address columns by their header name ($amount instead of $3) and use any single-character delimiter (;, tab, |, …) while keeping RFC-style quoting.

Everything else is plain gawk: cawk is a strict superset of gawk built on GNU Awk 5.4, so every existing awk program, flag, and extension keeps working exactly as before. The new behaviour only activates in CSV mode.

# Sum the "amount" column for rows where "customer_id" is 5 — by name, not index
gawk --csv '$customer_id == 5 { sum += $amount } END { print sum }' sales.csv

Why

Classic awk forces you to count columns:

# stock gawk: which column was the price again?
gawk -F, 'NR>1 { total += $7 } END { print total }' orders.csv

That breaks the moment a column is added, removed, or reordered. cawk lets the file's own header label the columns for you:

# cawk: read the header, refer to columns by name
gawk --csv 'FNR>1 { total += $price } END { print total }' orders.csv

Features

1. Access columns by name

When CSV mode is on, the first record of each file is treated as a header. For every column, cawk defines an awk variable named after the header whose value is that column's number. Because $expr in awk uses the numeric value of expr as a field index, $name just works:

name,age
Ada,36
Lin,29

$ gawk --csv '{ print $name }' people.csv
name      # the header row is still seen by your rules
Ada
Lin

$ gawk --csv 'FNR>1 { print $name }' people.csv   # skip the header
Ada
Lin

The header row is not consumed — it is still passed to your rules, so you choose whether to skip it (FNR>1) or use it.
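
To see why no new syntax is needed, the same lookup can be reproduced by hand in any POSIX awk by assigning the column numbers yourself. This is only an illustration of the `$expr` rule, not cawk's implementation: the `-v name=1 -v age=2` assignments are stand-ins for what cawk derives from the header, and the input is assumed to contain no quoted fields.

```shell
# Stock awk: set the "header variables" manually, then index fields with them.
# name=1 and age=2 are assigned by hand here; cawk would take them from the header row.
out=$(printf 'name,age\nAda,36\nLin,29\n' |
  awk -F, -v name=1 -v age=2 'NR > 1 { print $name, $age }')
printf '%s\n' "$out"
```

If a column is later inserted before `name`, the hand-written numbers go stale, which is the bookkeeping cawk takes over.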

2. Any single-character delimiter

Stock gawk's --csv is locked to a comma. cawk takes the delimiter from FS, so semicolon-, tab-, or pipe-separated files work in CSV mode too — with quoting still honoured:

gawk -kF';'  '{ print $name, $age }' people.scsv     # semicolon
gawk -F'\t' --csv '{ print $1, $2 }' data.tsv        # tab

A default FS (unset) keeps the RFC-compliant comma.
FIELDWIDTHS and FPAT still have no effect in CSV mode (they are not CSV concepts) and warn if set.
A multi-character or regex FS cannot be a CSV delimiter; cawk warns once and falls back to a comma.
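
As a rough picture of what "quoting still honoured" means for a non-comma delimiter, here is a small quote-aware splitter written in plain awk. It is a sketch under stated assumptions, not cawk's parser: the function name `csvsplit` and the variable `delim` are made up for this example, and it handles only single-line records with `""` as the escaped quote.

```shell
out=$(printf '%s\n' 'Ada;"Lovelace; Countess";36' | awk -v delim=';' '
  # Split line into f[1..n] on the single character d, honouring "..." quoting.
  # Hypothetical helper for illustration; embedded newlines are not handled.
  function csvsplit(line, f, d,    n, i, c, q, cur) {
    n = 0; q = 0; cur = ""
    for (i = 1; i <= length(line); i++) {
      c = substr(line, i, 1)
      if (q) {                                  # inside a quoted field
        if (c == "\"") {
          if (substr(line, i + 1, 1) == "\"") { cur = cur "\""; i++ }  # "" -> literal quote
          else q = 0                            # closing quote
        } else cur = cur c                      # delimiter is ordinary text here
      } else if (c == "\"") q = 1               # opening quote
      else if (c == d) { f[++n] = cur; cur = "" }   # field boundary
      else cur = cur c
    }
    f[++n] = cur                                # last field
    return n
  }
  { n = csvsplit($0, f, delim); print n ": " f[2] }
')
printf '%s\n' "$out"
```

The semicolon inside the quoted field stays part of the second field, so the record splits into three fields rather than four.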

3. Headers that aren't valid variable names are sanitized (with a warning)

Header text that can't be an awk identifier is converted to one and a warning is printed to stderr so you know the name you must use:

| Header | Variable | Reason |
| --- | --- | --- |
| `first name` | `$first_name` | spaces → `_` |
| `a-b` | `$a_b` | `-` → `_` |
| `2020` | `$_2020` | leading digit gets a `_` |
| (empty) | `$_` | empty header |
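
The rows above can be reproduced with a few lines of plain awk. This is a sketch of the rule as the table describes it, not cawk's actual code: the function name `sanitize` is hypothetical, and treating every character outside `[A-Za-z0-9_]` as `_` is an assumption that generalizes the two examples shown.

```shell
out=$(printf 'first name\na-b\n2020\n\n' | awk '
  # Hypothetical helper: turn header text into a usable awk identifier.
  function sanitize(h) {
    gsub(/[^A-Za-z0-9_]/, "_", h)      # assumed rule: any non-identifier character -> _
    if (h ~ /^[0-9]/) h = "_" h        # identifiers cannot start with a digit
    if (h == "") h = "_"               # empty header
    return h
  }
  { print "$" sanitize($0) }           # one header per input line
')
printf '%s\n' "$out"
```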

4. Name collisions are resolved, never clobbered

If a header name would overwrite a name that is already taken (a variable set with -v, a special variable such as NF, an array, or a function), cawk prefixes the column variable with _ instead and prints a warning:

$ gawk -v amount=0 --csv '...' sales.csv
# header "amount" collides with -v amount -> use $_amount for the column

A name that is merely referenced in your program (and never assigned) is not a collision, which is exactly why gawk -k '{ print $a }' works.

5. Duplicate headers get a numbered suffix

When the same header appears more than once, the first keeps the bare name and the rest get a trailing _1, _2, … :

a,b,b
1,2,2
1,5,9

$ gawk -k '{ print $b }'    # first "b"  -> column 2
$ gawk -k '{ print $b_1 }'  # second "b" -> column 3

The two schemes compose: a _-prefixed name that also collides becomes _b_1.
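
The numbering rule for repeated headers can be sketched in plain awk with a per-name counter. This only illustrates the naming scheme described above and is not taken from cawk's source; the array name `seen` is an arbitrary choice.

```shell
out=$(printf 'a,b,b\n' | awk -F, '
  {
    for (i = 1; i <= NF; i++) {
      n = seen[$i]++                        # how many times this header appeared before
      name = (n == 0) ? $i : $i "_" n       # first keeps the bare name, then _1, _2, ...
      print name " -> column " i
    }
  }
')
printf '%s\n' "$out"
```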

6. Per-file headers

With several files, each file's first row is its own header, and the variables from the previous file are cleared when the next one starts:

gawk -k '{ print FILENAME, $id }' january.csv february.csv

If february.csv lacks a column that january.csv had, that name reads as empty rather than silently pointing at a stale column.
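
The per-file reset can be imitated in stock awk by rebuilding a name-to-index map whenever `FNR == 1`. This is a sketch, not cawk's mechanism: the array `col` and the file contents are invented for the example, quoted fields are not handled, and unlike cawk this version skips the header row with `next`.

```shell
tmp=$(mktemp -d)
# Assumed sample data: the two files list the same columns in a different order.
printf 'id,total\n1,10\n' > "$tmp/january.csv"
printf 'total,id\n20,2\n' > "$tmp/february.csv"

out=$(cd "$tmp" && awk -F, '
  FNR == 1 {
    split("", col)                          # forget the previous file header
    for (i = 1; i <= NF; i++) col[$i] = i   # map header text -> column number
    next
  }
  { print FILENAME, $(col["id"]) }          # look up "id" in the current file
' january.csv february.csv)
printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the map is rebuilt for each file, `id` resolves to column 1 in january.csv and column 2 in february.csv.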

Building

cawk builds exactly like gawk:

./configure
make
make check          # runs the full gawk test suite
sudo make install   # optional

The resulting gawk binary is cawk. (The project keeps the gawk program name for drop-in compatibility.)

Quick reference


| Task | How |
| --- | --- |
| Enable CSV mode | `--csv` or `-k` |
| Choose delimiter | `-F';'`, `-F'\t'`, … (single character) |
| Refer to a column | `$header_name` |
| Skip the header row | guard rules with `FNR>1` (per file) or `NR>1` |
| Duplicate column n | `$name`, `$name_1`, `$name_2`, … |
| Collision with a variable | column is `$_name` (warned) |

Relationship to gawk

cawk tracks GNU Awk 5.4. The CSV column features are additive and confined to CSV mode; all other behaviour is upstream gawk. For the full language and feature reference, see the gawk manual.

License

cawk is free software under the GNU General Public License v3 or later, the same license as gawk. See COPYING.
