Skip to content

t: add greplint.pl and convert grep to test_grep#2135

Open
mmontalbo wants to merge 6 commits into
gitgitgadget:masterfrom
mmontalbo:mm/test-grep-docs
Open

t: add greplint.pl and convert grep to test_grep#2135
mmontalbo wants to merge 6 commits into
gitgitgadget:masterfrom
mmontalbo:mm/test-grep-docs

Conversation

@mmontalbo

@mmontalbo mmontalbo commented Jun 2, 2026

Copy link
Copy Markdown

test_grep is a wrapper around grep for test assertions that prints
the file contents on failure for easier debugging. Bare grep fails
silently, making it hard to diagnose what went wrong.

This series converts existing bare grep assertions to test_grep and
adds greplint.pl to prevent new ones from being introduced.

Patch 1 documents test_grep in t/README.

Patch 2 fixes three greps missing file arguments (t2402, t7507,
t7700). They were reading empty stdin and passing vacuously.

Patch 3 extracts chainlint's Lexer, ShellParser, and ScriptParser
into a shared module (lib-shell-parser.pl) so greplint.pl can
reuse the same tokenizer. No functional change to chainlint.

Patch 4 fixes a latent line-counting bug in scan_dqstring where
newlines from $() bodies inside double-quoted strings were counted
twice. This does not affect chainlint (which uses byte offsets)
but matters for greplint.pl's line-number reporting.

Patch 5 converts existing assertion greps to test_grep, including
sourced test helpers. Greps used as data filters or on files
that may not exist are left unconverted with lint-ok annotations.

Patch 6 adds greplint.pl with test fixtures (modeled on chainlint/)
and wires it into the Makefile as test-greplint and check-greplint.

Changes since v1:

  • Dropped lint-style.pl and --fix mode. Replaced with
    greplint.pl, which reuses chainlint's parser via a shared
    module to correctly identify command boundaries.

  • A regex approach to grep linting was prototyped in an attempt
    to reduce the number of patches in the series, but this
    approach produced false positives from grep inside heredoc
    bodies (e.g. write_script) and cross-line pipelines where
    the pipe or redirect is on a different line from the grep.
    The shared module's Lexer already collapses these into
    single tokens, giving zero false positives with less code
    than the regex heuristics would need, which is why it was
    retained in the current version.

  • Reverted two incorrect conversions where grep was used as a
    data filter inside redirected compound commands, not as a
    test assertion.

Known limitation / follow-up:

  • Assertions like grep pattern file >/dev/null are not
    converted because greplint.pl treats any stdout redirect as
    a filter. These are assertions (testing the exit code) but
    the correct conversion is ambiguous: the >/dev/null becomes
    dead code under test_grep since it already suppresses
    matching-line output. A follow-up series can address this
    once a convention is established.

cc: "D. Ben Knoble" ben.knoble@gmail.com, Eric Sunshine sunshine@sunshineco.com

@mmontalbo mmontalbo force-pushed the mm/test-grep-docs branch 30 times, most recently from 5d5366a to 9354adf Compare June 3, 2026 18:05
@mmontalbo mmontalbo force-pushed the mm/test-grep-docs branch 7 times, most recently from fe1f793 to bae968c Compare June 10, 2026 21:21
test_grep is a wrapper around grep for test assertions that prints
the file contents on failure for easier debugging.  It also accepts
'!' as its first argument for negation, which preserves the
diagnostic output that '! test_grep' would suppress.

Despite being widely used (and the preferred replacement for bare
grep in assertions), test_grep has no entry in t/README alongside
the other documented helpers like test_cmp and test_line_count.
Add one.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Three grep assertions were missing their file arguments, causing
them to read from empty stdin instead of the intended file:

- t2402: '! grep ...' should read from 'out', matching the
  grep on the preceding line.
- t7507: the closing quote is in the wrong place, making the
  entire 'diff --git actual' a single pattern with no file
  argument instead of pattern 'diff --git' and file 'actual'.
- t7700: '! grep ...' should read from 'packlist', matching
  the redirect on the preceding line.

Without file arguments these greps always succeed (empty stdin
matches nothing), so the assertions were not actually checking
anything.  All three tests pass with the corrected file arguments,
confirming the intended behavior is sound.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
@mmontalbo mmontalbo force-pushed the mm/test-grep-docs branch 3 times, most recently from f749f29 to 614b21f Compare June 11, 2026 07:20
@mmontalbo mmontalbo changed the title t: add lint-style.pl and convert grep to test_grep t: add check-grep.pl and convert grep to test_grep Jun 11, 2026
@mmontalbo mmontalbo force-pushed the mm/test-grep-docs branch 7 times, most recently from 1f7855c to 937f800 Compare June 11, 2026 17:25
@mmontalbo mmontalbo changed the title t: add check-grep.pl and convert grep to test_grep t: add greplint.pl and convert grep to test_grep Jun 11, 2026
Move chainlint.pl's Lexer, ShellParser, and ScriptParser into a
shared module (lib-shell-parser.pl) so other lint tools can reuse
the same shell parsing infrastructure.

ScriptParser's check_test() becomes a no-op in the shared module.
chainlint.pl defines ChainlintParser (extending ScriptParser)
with the &&-chain check_test() implementation.

No functional change: chainlint produces the same output and
check-chainlint self-tests pass.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
@mmontalbo mmontalbo force-pushed the mm/test-grep-docs branch 5 times, most recently from 45e09e3 to 14114d7 Compare June 11, 2026 22:32
scan_dqstring's post-loop newline counter re-counts newlines that
were already counted during recursive parsing of $() bodies.  This
happens because scan_dollar returns text containing newlines (from
multi-line command substitutions), and the catch-all counter at the
end of scan_dqstring counts all of them again.

Fix this by counting newlines inline as non-special characters are
consumed, and removing the post-loop catch-all.  Each newline is
now counted exactly once: literal newlines at the inline match,
line splices at the backslash handler, and $() newlines by
scan_token during the recursive parse.

This does not affect chainlint's output because chainlint annotates
the original body text using byte offsets, not token line numbers.
It matters for greplint.pl (introduced in a subsequent commit) which
uses token line numbers to report the source location of bare grep
assertions.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Replace bare grep with test_grep in test assertions across the
suite, including sourced test helpers (lib-*.sh, *-tests.sh).
test_grep prints the contents of the file being searched on
failure, making debugging easier than a bare grep which fails
silently.

Only assertion-style greps are converted: grep used as a filter
in pipelines, command substitutions, conditionals, or with
redirected I/O is left as-is with a "# lint-ok" annotation.
Existing '! test_grep' calls are rewritten to 'test_grep !' so
that the diagnostic output is preserved on failure.

To reproduce, apply greplint.pl from the next commit and run:

    # Step 1: mark bare greps that should not be converted
    sed -i '/! grep "\$m" \.git\/packed-refs/s/$/ # lint-ok: file may not exist (reftable)/' \
        t/t1400-update-ref.sh
    sed -i '/! grep dirty file3 &&/{/lint-ok/!s/$/ # lint-ok: file may not exist after --quit/}' \
        t/t3420-rebase-autostash.sh
    sed -i '/grep -vf before commits\.raw/s/$/ # lint-ok: data filter/' \
        t/t5326-multi-pack-bitmaps.sh
    sed -i '/! grep \$d shallow-client\/\.git\/shallow/s/$/ # lint-ok: file may not exist after repack/' \
        t/t5537-fetch-shallow.sh
    sed -i '/grep -E "^\[0-9a-f\].*|| :/s/$/ # lint-ok: data filter/' \
        t/t5702-protocol-v2.sh
    sed -i '/! grep gitdir squatting-clone/s/$/ # lint-ok: file may not exist after failed clone/' \
        t/t7450-bad-git-dotfiles.sh

    # Step 2: reorder pre-existing '! test_grep' to 'test_grep !'
    # (must come before steps 3-4 so greplint does not see them)
    sed -i 's/! test_grep/test_grep !/' t/t0031-lockfile-pid.sh
    sed -i 's/! test_grep/test_grep !/' t/t5300-pack-object.sh
    sed -i 's/! test_grep/test_grep !/' t/t5319-multi-pack-index.sh

    # Step 3: convert '! grep' -> 'test_grep !'
    perl t/greplint.pl t/*.sh 2>&1 | cut -d: -f1,2 |
    while IFS=: read f l; do
        sed -i "${l}s/! *grep/test_grep !/" "$f"
    done

    # Step 4: convert remaining 'grep' -> 'test_grep'
    perl t/greplint.pl t/*.sh 2>&1 | cut -d: -f1,2 |
    while IFS=: read f l; do
        sed -i "${l}s/grep/test_grep/" "$f"
    done

To verify, run: make -C t test-greplint

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Without a lint guard, bare grep assertions will creep back into
tests over time, defeating the previous commit's conversion.

Add greplint.pl to catch bare 'grep' used as a test assertion
(where 'test_grep' should be used) and '! test_grep' (where
'test_grep !' should be used).

greplint.pl reuses the shared shell parser from lib-shell-parser.pl
to tokenize test bodies.  The Lexer collapses heredocs, command
substitutions, and quoted strings into single tokens, so 'grep'
appearing inside these contexts is not flagged.  A flat walk over
the token stream tracks command position and pipeline state to
distinguish assertion greps from filter greps.

For double-quoted test bodies, a splice adjustment table compensates
for backslash-continuation lines that the Lexer consumes without
emitting into the body text, ensuring accurate line numbers.

Add test fixtures in greplint/ (modeled on chainlint/) covering
detection of bare grep assertions, correct skipping of filters,
pipelines, redirects, command substitutions, and lint-ok annotations.

Wire into the Makefile as:
  - test-greplint: runs greplint.pl on $(T) $(THELPERS) $(TPERF)
  - check-greplint: runs greplint.pl on fixtures, diffs against expected
  - clean-greplint: removes temp dir

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant