Propagate null through unnest and single() three-valued logic#2406
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Fixes Cypher null handling in list-iteration constructs and aligns predicate functions (notably single()) with openCypher/Neo4j three-valued logic by propagating SQL NULL correctly through UNNEST/UNWIND and updating transformed predicate-function subqueries.
Changes:
- Update
age_unnestto emit SQL NULL rows forAGTV_NULLelements soIS NULL/IS NOT NULLbehave correctly. - Remove the grammar-level
(subquery) = 1wrapping forsingle()and instead return a boolean directly from a CASE-based aggregate expression. - Extend regression coverage for null-related edge cases in predicate functions, list comprehension filters, and UNWIND.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/backend/utils/adt/agtype.c | Emit SQL NULL for AGTV_NULL elements during unnest/unwind iteration. |
| src/backend/parser/cypher_gram.y | Simplify predicate-function node building now that single() returns boolean directly. |
| src/backend/parser/cypher_clause.c | Add count(*) FILTER aggregate helper and implement three-valued CASE logic for single(). |
| regress/sql/predicate_functions.sql | Add/adjust regression queries to validate three-valued semantics across any/all/none/single. |
| regress/sql/list_comprehension.sql | Add regression cases for IS NULL / IS NOT NULL behavior over null list elements and UNWIND. |
| regress/expected/predicate_functions.out | Update expected outputs for new three-valued/null propagation behavior. |
| regress/expected/list_comprehension.out | Update expected outputs for list-comprehension and UNWIND null filtering behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@MuhammadTahaNaveed It looks like you need to rebase this PR |
gregfelice
left a comment
There was a problem hiding this comment.
Reviewed the age_unnest change, the single() CASE logic, and the new aggregate helpers. The approach is sound and I'd endorse the direction; a few items below before it's approve-ready.
Approach. Emitting a SQL-NULL row (nulls[0]=true) for AGTV_NULL elements in age_unnest is the right fix — it makes the iterated element match how AGE represents cypher null everywhere else, so SQL IS NULL/IS NOT NULL and strict aggregates behave per openCypher. One change covers list-comprehension filters (#2393), UNWIND [null] count (#2383), and single() three-valued logic together.
single() 3VL — verified correct. I worked the cases by hand:
- ≥2 definite trues → false (and this arm is evaluated first, so definite trues correctly dominate any unknown/null elements);
- exactly 1 true with no nulls → true; exactly 1 true with a null → NULL (the unknown could be a second match);
- 0 trues with a null → NULL; all-false → false; empty list → false.
That matches Neo4j, including the empty-list and the one-true-plus-null edge cases.
Overlap with #2411 (worth a maintainer decision). Both PRs fix the same root cause in age_unnest. This PR changes age_unnest in place, so it fixes UNWIND and list comprehension and single() in one go; #2411 adds a separate UNWIND-specific function and fixes only the UNWIND [null] count case. They're mutually redundant — picking this one would subsume #2411.
Minor (the three Copilot inline notes are valid):
makeConst(INT8OID, ..., FLOAT8PASSBYVAL)should useINT8PASSBYVAL. It's functionally identical on all supported platforms (both followUSE_FLOAT8_BYVAL), but it's the wrong macro for an int8 constant — worth fixing for clarity/correctness.- The empty-list comment is inaccurate:
count(*)(incl.count(*) FILTER (...)) returns 0, not NULL, on zero rows — onlybool_or(...)returns NULL. The logic is correct; just the comment. make_count_star_filter_agghard-codesINT8OIDas the return type — fine forcount(*), slightly more robust to derive it, but optional.
With the rebase @jrgemignani asked for plus the byval/comment tidy-ups, this looks good to me.
|
@jrgemignani — a couple of cross-cutting notes from going through the open parser/agtype PRs, in case they help with sequencing: 1. #2406 and #2411 are competing fixes for the same
Per openCypher, 2. A few of these are blocked more on process than on code.
Happy to take another look at any of these once they're rebased / CI is green. |
|
@MuhammadTahaNaveed The approach here is solid and I'd like to get it in — it's currently blocked only on a rebase plus two small tidy-ups. If you can:
(The Once that's done and CI's green, I'll approve. Thanks for working the full three-valued-logic fix rather than just the UNWIND symptom. |
Two related defects in how AGE handled cypher null inside list-iterating
constructs.
age_unnest packaged every iterated element as a non-SQL-NULL agtype
datum, even AGTV_NULL scalars. SQL `IS NULL` / `IS NOT NULL` then
couldn't see those nulls, so `[x IN [null, 1] WHERE x IS NULL]` dropped
the null it was meant to keep, and `WHERE x IS NOT NULL` kept the null
it was meant to drop. The same mismatch surfaced in UNWIND. AGE already
treats SQL NULL as the row-level representation of cypher null
elsewhere (`RETURN null AS v` yields SQL NULL, strict operators
short-circuit on it); age_unnest now does the same by emitting the row
with `nulls[0] = true` when the element is AGTV_NULL.
single() previously transformed to `SELECT count(*) FROM unnest(list)
AS x WHERE pred IS TRUE`, with the grammar wrapping the result as
`(subquery) = 1`. With the unnest fix, `[null, 5] WHERE x > 0` left
one definite true after the WHERE filter -> count = 1 -> true. Neo4j
returns null because the unknown predicate could itself be a second
match. Rewritten to a CASE built on `count(*) FILTER (WHERE pred IS
TRUE)` and `bool_or(pred IS NULL)`:
CASE WHEN count(*) FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count(*) FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END
The >=2 arm runs first so two definite trues dominate any unknowns.
Fits inside the existing make_predicate_case_expr helper alongside
all/any/none, removes the special-case transform branch and the
grammar `= 1` wrap. A small `make_count_star_filter_agg` helper
mirrors the existing `make_bool_or_agg`. Verified against Neo4j for
the new edge cases (one-true-plus-null, two-trues-plus-null,
all-nulls, mixed-true-false-null).
The predicate_functions regression also picks up the corrected
behavior of any/all/none over null elements: `null > 0` now yields
SQL NULL instead of being silently treated as true, so the
three-valued combinators in those functions produce the openCypher
results the comments previously documented as buggy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@gregfelice fixed. |
gregfelice
left a comment
There was a problem hiding this comment.
All three items addressed — rebased onto master (mergeable, CI green), INT8PASSBYVAL applied to the single() consts (with a portability guard), and the empty-list comment corrected. Verified the three-valued logic for single()/any()/all()/none() over null elements against the openCypher/Neo4j semantics. LGTM, thanks @MuhammadTahaNaveed.
Two related defects in how AGE handled cypher null inside list-iterating constructs.
age_unnest packaged every iterated element as a non-SQL-NULL agtype datum, even AGTV_NULL scalars. SQL
IS NULL/IS NOT NULLthen couldn't see those nulls, so[x IN [null, 1] WHERE x IS NULL]dropped the null it was meant to keep, andWHERE x IS NOT NULLkept the null it was meant to drop. The same mismatch surfaced in UNWIND. AGE already treats SQL NULL as the row-level representation of cypher null elsewhere (RETURN null AS vyields SQL NULL, strict operators short-circuit on it); age_unnest now does the same by emitting the row withnulls[0] = truewhen the element is AGTV_NULL.single() previously transformed to
SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE, with the grammar wrapping the result as(subquery) = 1. With the unnest fix,[null, 5] WHERE x > 0left one definite true after the WHERE filter -> count = 1 -> true. Neo4j returns null because the unknown predicate could itself be a second match. Rewritten to a CASE built oncount(*) FILTER (WHERE pred IS TRUE)andbool_or(pred IS NULL):CASE WHEN count() FILTER (WHERE pred IS TRUE) >= 2 THEN false
WHEN bool_or(pred IS NULL) THEN NULL
WHEN count() FILTER (WHERE pred IS TRUE) = 1 THEN true
ELSE false END
The >=2 arm runs first so two definite trues dominate any unknowns. Fits inside the existing make_predicate_case_expr helper alongside all/any/none, removes the special-case transform branch and the grammar
= 1wrap. A smallmake_count_star_filter_agghelper mirrors the existingmake_bool_or_agg. Verified against Neo4j for the new edge cases (one-true-plus-null, two-trues-plus-null, all-nulls, mixed-true-false-null).The predicate_functions regression also picks up the corrected behavior of any/all/none over null elements:
null > 0now yields SQL NULL instead of being silently treated as true, so the three-valued combinators in those functions produce the openCypher results the comments previously documented as buggy.