ORCA: don't push LOJ ON-pred onto its own outer in PushThruOuterChild #1836
Open
yjhjstz wants to merge 2 commits into
When two non-inner joins in an NAry join have structurally identical ON
predicates, ORCA could silently drop or misplace predicates, producing
wrong results:
select x.c1, y2.c1 from x left join y1 on x.c1
left join y2 on x.c1
where y2.c1 is null;
returned 0 rows instead of the two null-padded FALSE rows, because a
copy of the ON pred ended up as a scan filter on x.
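Why a leaked ON-pred filter on the outer side changes the result can be shown with a toy LEFT JOIN. This is a minimal sketch, not ORCA code: the `Row`/`JoinedRow` types, the helper names, and the sample data are hypothetical, and it uses a single LEFT JOIN with `ON x.c1` rather than the two-join repro. It compares a correct nested-loop LEFT JOIN with a plan that first filters `x` by `c1` (the "Seq Scan on x, Filter: c1" shape), then counts the rows a `WHERE y.c1 IS NULL` would keep.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy model: each table has a single boolean column c1.
struct Row { bool c1; };

// Output row of x LEFT JOIN y; y_c1 is std::nullopt when the row is null-padded.
struct JoinedRow { bool x_c1; std::optional<bool> y_c1; };

// Nested-loop LEFT JOIN with ON x.c1: every x row is emitted at least once,
// and x rows that match nothing are null-padded on the y side.
std::vector<JoinedRow> left_join_on_x_c1(const std::vector<Row>& x,
                                         const std::vector<Row>& y) {
    std::vector<JoinedRow> out;
    for (const Row& xr : x) {
        bool matched = false;
        if (xr.c1) {  // ON x.c1: only x rows with c1 = true can match
            for (const Row& yr : y) {
                out.push_back({xr.c1, yr.c1});
                matched = true;
            }
        }
        if (!matched) out.push_back({xr.c1, std::nullopt});  // null-pad
    }
    return out;
}

// Buggy plan shape: the ON predicate is also applied as a filter on x first,
// so x rows with c1 = false never reach the join and are never null-padded.
std::vector<JoinedRow> left_join_with_leaked_filter(const std::vector<Row>& x,
                                                    const std::vector<Row>& y) {
    std::vector<Row> filtered;
    for (const Row& xr : x)
        if (xr.c1) filtered.push_back(xr);  // models "Filter: c1" on the outer scan
    return left_join_on_x_c1(filtered, y);
}

// Models WHERE y.c1 IS NULL: count only the null-padded output rows.
std::size_t count_null_padded(const std::vector<JoinedRow>& rows) {
    std::size_t n = 0;
    for (const JoinedRow& r : rows)
        if (!r.y_c1) ++n;
    return n;
}
```

With hypothetical data `x = {true, false, false}` and `y = {true}`, the correct join keeps both FALSE rows as null-padded output, while the leaked-filter plan returns none of them.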
Root cause: CJoinOrderDPv2's m_expression_to_edge_map is keyed on
structural equality (CExpression::HashValue / CUtils::Equals). With two
structurally identical ON preds, RecursivelyMarkEdgesAsUsed can only
ever mark one of the duplicate edges as used, so
AddSelectNodeForRemainingEdges treated the other edge as a leftover
WHERE predicate and emitted it into a Select on top of the join tree.
The normalizer then legitimately pushed that Select onto the LOJ's own
outer child, filtering out rows that outer-join semantics require to be
null-padded. (The map is only populated when a WHERE predicate
references an NIJ right child, which is why the WHERE clause is needed
to trigger the bug.)
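The bookkeeping failure can be sketched with a map keyed on structural equality. This is a hypothetical model, not the CJoinOrderDPv2 code: a predicate's string rendering stands in for `CExpression::HashValue` / `CUtils::Equals`, and `Edge`, `build_expression_to_edge_map`, and `mark_edges_as_used` are illustrative names. Because two identical ON preds produce the same key, every lookup resolves to the first edge, so the duplicate can never be marked as used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical edge: the predicate's string form plays the role of
// structural equality/hashing on CExpression.
struct Edge {
    std::string pred;   // e.g. "x.c1"
    int loj_num;        // > 0 for the ON predicate of a non-inner join
    bool used = false;
};

// Predicate -> edge index. Structurally identical predicates collide:
// emplace() does nothing when the key already exists, so only the first
// edge with a given predicate is reachable through the map.
std::unordered_map<std::string, std::size_t>
build_expression_to_edge_map(const std::vector<Edge>& edges) {
    std::unordered_map<std::string, std::size_t> m;
    for (std::size_t i = 0; i < edges.size(); ++i)
        m.emplace(edges[i].pred, i);
    return m;
}

// Mark the edge of every predicate the join tree consumed, via the map.
void mark_edges_as_used(std::vector<Edge>& edges,
                        const std::unordered_map<std::string, std::size_t>& m,
                        const std::vector<std::string>& consumed_preds) {
    for (const std::string& p : consumed_preds) {
        auto it = m.find(p);
        if (it != m.end()) edges[it->second].used = true;
    }
}

// Edges still unused after marking; these are the "remaining" predicates.
std::size_t count_unused(const std::vector<Edge>& edges) {
    std::size_t n = 0;
    for (const Edge& e : edges)
        if (!e.used) ++n;
    return n;
}
```

Even when both LOJs consume their own `x.c1` ON pred, both lookups land on the first edge, leaving the second one looking like a leftover predicate.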
Fix at the source: skip ON-pred edges (m_loj_num > 0) when collecting
remaining edges. An NIJ's ON predicate is always applied by the join
itself when its right child is placed (IsRightChildOfNIJ), so an
"unused" ON-pred edge can only be a bookkeeping artifact of the
structural-equality map and must never be duplicated above the join.
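The filtering rule of the fix can be sketched as follows. This is a simplified, hypothetical model of the remaining-edge collection, not the actual AddSelectNodeForRemainingEdges implementation; `Edge` and `remaining_where_preds` are illustrative names, and only the `loj_num > 0` test mirrors the `m_loj_num` check described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified edge; loj_num > 0 marks the ON predicate of a
// non-inner join (mirroring the m_loj_num > 0 check).
struct Edge {
    std::string pred;
    int loj_num;
    bool used;
};

// Collect predicates for the Select placed on top of the join tree.
// Unused ON-pred edges are skipped: the join applies its own ON predicate
// when its right child is placed, so an "unused" one is only an artifact
// of the structural-equality map.
std::vector<std::string> remaining_where_preds(const std::vector<Edge>& edges) {
    std::vector<std::string> out;
    for (const Edge& e : edges) {
        if (e.used) continue;
        if (e.loj_num > 0) continue;  // never re-emit an ON pred above the join
        out.push_back(e.pred);
    }
    return out;
}
```

In the duplicate-ON-pred case, the unmarked second `x.c1` edge is dropped, while a genuine WHERE conjunct that has not been applied yet is still collected.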
An earlier attempt fixed this downstream, by stripping conjuncts that
structurally match the LOJ's ON pred in CNormalizer::PushThruOuterChild.
That layer cannot distinguish the leaked ON-pred copy from legitimate,
structurally identical conjuncts arriving from above, and silently
deleted user predicates:
select * from x left join y on x.c1 where x.c1; -- 3 rows, not 1
select 1 from a t1
  left join (a t2 left join a t3 on t2.id = 1)
  on t2.id = 1;  -- lost the Index Cond on t2
                 -- and the ON pred entirely
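The weakness of the downstream approach can be reduced to one observation: once predicates are conjuncts of a Select, they carry no provenance. The sketch below is a hypothetical model of that rejected strip step (the function name and string predicates are illustrative, not CNormalizer code). A user-written `WHERE x.c1` and a leaked copy of `ON x.c1` compare equal, so a structural-match filter deletes both.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the rejected downstream fix: before pushing a Select's
// conjuncts onto the LOJ's outer child, drop any conjunct that structurally
// equals the LOJ's ON predicate. Nothing records where a conjunct came from,
// so a legitimate WHERE conjunct is removed along with a leaked ON-pred copy.
std::vector<std::string> strip_matching_on_pred(
        const std::vector<std::string>& conjuncts, const std::string& on_pred) {
    std::vector<std::string> kept;
    for (const std::string& c : conjuncts)
        if (c != on_pred) kept.push_back(c);
    return kept;
}
```

For `select * from x left join y on x.c1 where x.c1`, the user's `x.c1` conjunct is stripped, which matches the 3-rows-instead-of-1 symptom described above.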
With this fix, the original repro returns the correct 2 rows with no
scan filter on x, the queries above return the same results as the
Postgres planner, and the nested-LOJ query regains Index Cond: (id = 1)
on t2.
Add the repro as a regression test in bfv_joins.
…suite

Mirror the bfv_joins regression case for the duplicate-ON-pred DPv2 bug (two LEFT JOINs sharing the same boolean ON column plus a WHERE on the inner side) into the pax_storage copy of the suite.

Unlike the earlier version of this mirror, join_optimizer.out is left untouched: with the root-cause fix in CJoinOrderDPv2, the previously refreshed plan (Seq Scan on t2, outer Join Filter reduced to true) no longer exists, and the original expected plan with Index Cond: (id = 1) on t2 is produced again.
94372e7 to ad16de3
Summary
Fix a wrong-result bug in ORCA where a LEFT JOIN's own ON predicate ends up duplicated as a scan filter on the join's outer relation, discarding outer rows
that LOJ semantics require to be null-padded.
Repro
Seq Scan on x
  Filter: c1  ❌
Trigger: two or more chained LEFT JOINs whose ON-clauses use the same boolean column from the outer relation, with a WHERE on top.
Root cause
CNormalizer::PushThruOuterChild is invoked from PushThruSelect with a predicate that happens to be (or contain) the LOJ's own ON predicate. SplitConjunct/FPushable accept it as pushable to the outer relation, because FPushable only checks that the predicate's columns are a subset of the outer's output columns; an LOJ ON-pred that references only outer-side columns trivially satisfies that.
Consequence: ORCA wraps the outer with Select(outer, on_pred), producing LOJ(Select(x, c1), inner, x.c1). The LOJ's ON-pred is preserved as Join Filter, but a redundant Filter: c1 is also planted on the outer scan, which discards outer rows that don't satisfy the ON-pred: exactly the rows the LOJ must null-pad and keep.
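The column-subset test described above can be sketched in a few lines. This is a hypothetical model, not the real FPushable signature: columns are plain strings and `is_pushable` is an illustrative name. It shows why the check alone cannot reject an LOJ's own ON predicate: `ON x.c1` references only outer-side columns, so it passes exactly like a legitimate outer-side filter would.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the pushability test: a predicate is pushable to a
// child if every column it references appears in that child's output columns.
// The test knows nothing about whether the predicate is the LOJ's own ON pred.
bool is_pushable(const std::set<std::string>& pred_cols,
                 const std::set<std::string>& child_output_cols) {
    // std::includes needs both ranges sorted; std::set iterates in sorted order.
    return std::includes(child_output_cols.begin(), child_output_cols.end(),
                         pred_cols.begin(), pred_cols.end());
}
```

A predicate that also references an inner-side column fails the subset test, but an outer-only ON predicate always passes it.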
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions