Skip to content

Propagate null through unnest and single() three-valued logic#2406

Merged
gregfelice merged 1 commit into
apache:masterfrom
MuhammadTahaNaveed:i2393
Jul 3, 2026
Merged

Propagate null through unnest and single() three-valued logic#2406
gregfelice merged 1 commit into
apache:masterfrom
MuhammadTahaNaveed:i2393

Conversation

@MuhammadTahaNaveed

Copy link
Copy Markdown
Member

Two related defects in how AGE handled cypher null inside list-iterating constructs.

age_unnest packaged every iterated element as a non-SQL-NULL agtype datum, even AGTV_NULL scalars. SQL IS NULL / IS NOT NULL then couldn't see those nulls, so [x IN [null, 1] WHERE x IS NULL] dropped the null it was meant to keep, and WHERE x IS NOT NULL kept the null it was meant to drop. The same mismatch surfaced in UNWIND. AGE already treats SQL NULL as the row-level representation of cypher null elsewhere (RETURN null AS v yields SQL NULL, strict operators short-circuit on it); age_unnest now does the same by emitting the row with nulls[0] = true when the element is AGTV_NULL.

single() previously transformed to SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE, with the grammar wrapping the result as (subquery) = 1. With the unnest fix, [null, 5] WHERE x > 0 left one definite true after the WHERE filter -> count = 1 -> true. Neo4j returns null because the unknown predicate could itself be a second match. Rewritten to a CASE built on count(*) FILTER (WHERE pred IS TRUE) and bool_or(pred IS NULL):

CASE WHEN count() FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count(
) FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END

The >=2 arm runs first so two definite trues dominate any unknowns. Fits inside the existing make_predicate_case_expr helper alongside all/any/none, removes the special-case transform branch and the grammar = 1 wrap. A small make_count_star_filter_agg helper mirrors the existing make_bool_or_agg. Verified against Neo4j for the new edge cases (one-true-plus-null, two-trues-plus-null, all-nulls, mixed-true-false-null).

The predicate_functions regression also picks up the corrected behavior of any/all/none over null elements: null > 0 now yields SQL NULL instead of being silently treated as true, so the three-valued combinators in those functions produce the openCypher results the comments previously documented as buggy.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes Cypher null handling in list-iteration constructs and aligns predicate functions (notably single()) with openCypher/Neo4j three-valued logic by propagating SQL NULL correctly through UNNEST/UNWIND and updating transformed predicate-function subqueries.

Changes:

  • Update age_unnest to emit SQL NULL rows for AGTV_NULL elements so IS NULL / IS NOT NULL behave correctly.
  • Remove the grammar-level (subquery) = 1 wrapping for single() and instead return a boolean directly from a CASE-based aggregate expression.
  • Extend regression coverage for null-related edge cases in predicate functions, list comprehension filters, and UNWIND.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/backend/utils/adt/agtype.c Emit SQL NULL for AGTV_NULL elements during unnest/unwind iteration.
src/backend/parser/cypher_gram.y Simplify predicate-function node building now that single() returns boolean directly.
src/backend/parser/cypher_clause.c Add count(*) FILTER aggregate helper and implement three-valued CASE logic for single().
regress/sql/predicate_functions.sql Add/adjust regression queries to validate three-valued semantics across any/all/none/single.
regress/sql/list_comprehension.sql Add regression cases for IS NULL / IS NOT NULL behavior over null list elements and UNWIND.
regress/expected/predicate_functions.out Update expected outputs for new three-valued/null propagation behavior.
regress/expected/list_comprehension.out Update expected outputs for list-comprehension and UNWIND null filtering behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/parser/cypher_clause.c Outdated
Comment thread src/backend/parser/cypher_clause.c Outdated
Comment thread src/backend/parser/cypher_clause.c
@jrgemignani

Copy link
Copy Markdown
Contributor

@MuhammadTahaNaveed It looks like you need to rebase this PR

@gregfelice gregfelice left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the age_unnest change, the single() CASE logic, and the new aggregate helpers. The approach is sound and I'd endorse the direction; a few items below before it's approve-ready.

Approach. Emitting a SQL-NULL row (nulls[0]=true) for AGTV_NULL elements in age_unnest is the right fix — it makes the iterated element match how AGE represents cypher null everywhere else, so SQL IS NULL/IS NOT NULL and strict aggregates behave per openCypher. One change covers list-comprehension filters (#2393), UNWIND [null] count (#2383), and single() three-valued logic together.

single() 3VL — verified correct. I worked the cases by hand:

  • ≥2 definite trues → false (and this arm is evaluated first, so definite trues correctly dominate any unknown/null elements);
  • exactly 1 true with no nulls → true; exactly 1 true with a null → NULL (the unknown could be a second match);
  • 0 trues with a null → NULL; all-false → false; empty list → false.

That matches Neo4j, including the empty-list and the one-true-plus-null edge cases.

Overlap with #2411 (worth a maintainer decision). Both PRs fix the same root cause in age_unnest. This PR changes age_unnest in place, so it fixes UNWIND and list comprehension and single() in one go; #2411 adds a separate UNWIND-specific function and fixes only the UNWIND [null] count case. They're mutually redundant — picking this one would subsume #2411.

Minor (the three Copilot inline notes are valid):

  1. makeConst(INT8OID, ..., FLOAT8PASSBYVAL) should use INT8PASSBYVAL. It's functionally identical on all supported platforms (both follow USE_FLOAT8_BYVAL), but it's the wrong macro for an int8 constant — worth fixing for clarity/correctness.
  2. The empty-list comment is inaccurate: count(*) (incl. count(*) FILTER (...)) returns 0, not NULL, on zero rows — only bool_or(...) returns NULL. The logic is correct; just the comment.
  3. make_count_star_filter_agg hard-codes INT8OID as the return type — fine for count(*), slightly more robust to derive it, but optional.

With the rebase @jrgemignani asked for plus the byval/comment tidy-ups, this looks good to me.

@gregfelice

Copy link
Copy Markdown
Contributor

@jrgemignani — a couple of cross-cutting notes from going through the open parser/agtype PRs, in case they help with sequencing:

1. #2406 and #2411 are competing fixes for the same UNWIND [null] bug (#2383).
They take opposite approaches and will conflict — whichever lands first forces a rework of the other:

Per openCypher, null > 0 should be null rather than truthy, so this PR is the more correct end state; #2411 is the smaller, lower-risk step. Both are sound for their stated scope — it's really a scoping call on whether to fix the whole null-handling family at once or just the UNWIND symptom. Flagging so they don't both get merged independently.

2. A few of these are blocked more on process than on code.

Happy to take another look at any of these once they're rebased / CI is green.

@gregfelice

Copy link
Copy Markdown
Contributor

@MuhammadTahaNaveed The approach here is solid and I'd like to get it in — it's currently blocked only on a rebase plus two small tidy-ups. If you can:

  1. Rebase onto latest master (it's conflicting now, and @jrgemignani asked back on Apr 30) so CI runs green again.
  2. INT8PASSBYVAL fix — the two single() makeConst(INT8OID, …) calls use FLOAT8PASSBYVAL for the byval flag; they should use INT8PASSBYVAL. Functionally identical on all supported platforms, but it's the correct macro for an int8 const.
  3. Empty-list comment — the revised comment still says "all aggregates return NULL on zero rows"; count(*) (incl. count(*) FILTER (…)) returns 0, not NULL. The logic is right, just the comment — worth a one-line correction.

(The make_count_star_filter_agg hardcoded INT8OID return is fine to leave as-is.)

Once that's done and CI's green, I'll approve. Thanks for working the full three-valued-logic fix rather than just the UNWIND symptom.

Two related defects in how AGE handled cypher null inside list-iterating
constructs.

age_unnest packaged every iterated element as a non-SQL-NULL agtype
datum, even AGTV_NULL scalars. SQL `IS NULL` / `IS NOT NULL` then
couldn't see those nulls, so `[x IN [null, 1] WHERE x IS NULL]` dropped
the null it was meant to keep, and `WHERE x IS NOT NULL` kept the null
it was meant to drop. The same mismatch surfaced in UNWIND. AGE already
treats SQL NULL as the row-level representation of cypher null
elsewhere (`RETURN null AS v` yields SQL NULL, strict operators
short-circuit on it); age_unnest now does the same by emitting the row
with `nulls[0] = true` when the element is AGTV_NULL.

single() previously transformed to `SELECT count(*) FROM unnest(list)
AS x WHERE pred IS TRUE`, with the grammar wrapping the result as
`(subquery) = 1`. With the unnest fix, `[null, 5] WHERE x > 0` left
one definite true after the WHERE filter -> count = 1 -> true. Neo4j
returns null because the unknown predicate could itself be a second
match. Rewritten to a CASE built on `count(*) FILTER (WHERE pred IS
TRUE)` and `bool_or(pred IS NULL)`:

  CASE WHEN count(*) FILTER (WHERE pred IS TRUE) >= 2 THEN false
       WHEN bool_or(pred IS NULL)                     THEN NULL
       WHEN count(*) FILTER (WHERE pred IS TRUE) =  1 THEN true
       ELSE false END

The >=2 arm runs first so two definite trues dominate any unknowns.
Fits inside the existing make_predicate_case_expr helper alongside
all/any/none, removes the special-case transform branch and the
grammar `= 1` wrap. A small `make_count_star_filter_agg` helper
mirrors the existing `make_bool_or_agg`. Verified against Neo4j for
the new edge cases (one-true-plus-null, two-trues-plus-null,
all-nulls, mixed-true-false-null).

The predicate_functions regression also picks up the corrected
behavior of any/all/none over null elements: `null > 0` now yields
SQL NULL instead of being silently treated as true, so the
three-valued combinators in those functions produce the openCypher
results the comments previously documented as buggy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@MuhammadTahaNaveed

Copy link
Copy Markdown
Member Author

@gregfelice fixed.

@gregfelice gregfelice left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three items addressed — rebased onto master (mergeable, CI green), INT8PASSBYVAL applied to the single() consts (with a portability guard), and the empty-list comment corrected. Verified the three-valued logic for single()/any()/all()/none() over null elements against the openCypher/Neo4j semantics. LGTM, thanks @MuhammadTahaNaveed.

@gregfelice gregfelice merged commit 39273ca into apache:master Jul 3, 2026
6 checks passed
jrgemignani pushed a commit that referenced this pull request Jul 3, 2026
Fix age_unnest to emit SQL NULL for AGTV_NULL elements and rewrite single() to proper three-valued logic, fixing UNWIND [null] count, list-comprehension null filters, and single() semantics per openCypher.

Closes #2383
Closes #2393
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

List-comprehension WHERE filters may mishandle null elements.

4 participants