Fix segfault on CREATE/MERGE/SET with generated or extra label columns (#2450)#2458
Conversation
|
@jrgemignani ready for review when you have a cycle. This fixes the #2450 segfault where a One design note called out in the PR: on SET, AGE rebuilds the row from |
There was a problem hiding this comment.
Pull request overview
This PR fixes a backend segfault in AGE’s Cypher CREATE/MERGE/SET paths when label tables contain user-added columns (including GENERATED ALWAYS ... STORED). It does this by fully NULL-initializing the entity tuple slot before populating AGE-managed columns, and by computing stored generated columns before materializing heap tuples, aligning behavior with PostgreSQL semantics.
Changes:
- Introduces
clear_entity_slot()and uses it at all entity slot build sites to ensure non-AGE columns default to NULL (avoiding uninitialized slot memory reads). - Computes stored generated columns during insert and update via
ExecComputeStoredGenerated()before heap tuple materialization. - Adds a
generated_columnsregression test and wires it into the regression suite.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/executor/cypher_utils.h | Declares clear_entity_slot() helper. |
| src/backend/executor/cypher_utils.c | Implements clear_entity_slot() and computes stored generated columns before materializing insert tuples. |
| src/backend/executor/cypher_set.c | Recomputes stored generated columns before materializing update tuples. |
| src/backend/executor/cypher_merge.c | Uses clear_entity_slot() when building MERGE insert slots. |
| src/backend/executor/cypher_create.c | Uses clear_entity_slot() when building CREATE insert slots. |
| regress/sql/generated_columns.sql | Adds regression coverage for stored generated columns and extra plain columns across CREATE/MERGE/SET. |
| regress/expected/generated_columns.out | Expected output for the new regression test. |
| Makefile | Adds generated_columns to the regression test list. |
| PROJECT.yaml | Adds project metadata (currently contains inaccurate license/repo fields). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - Standard SDLC (Makefile) | ||
| os: | ||
| - Linux/Debian | ||
| license: Proprietary |
| links: | ||
| docs: '' | ||
| repo: https://git.rizlabs.com/gregf/apache-age |
2ae8e16 to
3905048
Compare
|
GPT 5.5 xhigh review: Findings
Then CREATE and SET through Cypher both leave tbl as - / invalid. The fix should set elemTupleSlot->tts_tableOid before ExecComputeStoredGenerated(...) in both insert/create and update/set
|
|
GPT 5.5 xhigh review: Found one performance issue worth commenting on: Performance note:
Relevant spots:
This should probably cache/reuse Secondary, lower-priority nit: Minor:
On a matched |
apache#2450) AGE builds the insert/update tuple slot from the full label-table descriptor (table_slot_create), but only populates the columns it manages: id/properties for vertices and id/start_id/end_id/properties for edges. A user-added column on the label table -- most notably a GENERATED ALWAYS ... STORED column, but also any plain ADD COLUMN -- therefore left an uninitialized slot entry. ExecStoreVirtualTuple marks all attributes valid, so heap_form_tuple() -> heap_compute_data_size() read the uninitialized tts_isnull[]/tts_values[] for that column and segfaulted dereferencing garbage as a varlena. This crashed the backend on CREATE, MERGE and SET. Fix in two parts: - clear_entity_slot() clears the slot and marks every attribute NULL before AGE fills in its own columns, so columns AGE does not manage default to NULL instead of stale slot memory. It is used at every slot build site: create_vertex, create_edge, merge_vertex, merge_edge, and the populate_vertex_tts/populate_edge_tts helpers used by SET. - insert_entity_tuple_cid (CREATE/MERGE) and update_entity_tuple (SET) call ExecComputeStoredGenerated() before materializing the heap tuple when the relation has stored generated columns, so those columns are computed on insert and recomputed on update per PostgreSQL semantics. Adds a generated_columns regression test covering stored generated columns on vertex and edge labels across CREATE, MERGE and SET, plus a plain user column that must not crash.
3905048 to
4fc30ed
Compare
|
Thanks for the thorough review, @MuhammadTahaNaveed — genuinely useful. Addressed in the latest push: 1. [P1] tableoid generated columns — Fixed. Good catch. The slot now carries 2. [P1] PROJECT.yaml — Removed. That was a stray commit that rode in from a local branch base; the branch has been rebased onto current master so the PR now contains only the fix. 3. [P3] Rebase — Done, the branch is rebased onto current master and CI is green against the current tree. B. [P3] MERGE clears slot before the should_insert check — Fixed. A. [perf] SET builds a ResultRelInfo per row — Agreed this is worth doing, but I'd like to keep it out of this bugfix. The per-row 4. [P3] Trailing whitespace in the .out — Left as-is intentionally. pg_regress does an exact byte comparison against psql's output, which emits trailing spaces on table rows; every expected file in Thanks again — the tableoid case in particular was a real gap. |
|
Thanks for the careful review and the approval, @MuhammadTahaNaveed — the tableoid catch in particular was a real gap. All feedback is addressed, CI is green, and the follow-up perf item is tracked in #2459. Since this is my PR I won't self-merge — feel free to land it whenever you have a moment, or I can leave it for another committer. Thanks again! |
@gregfelice Its already merged :) |
#2458) AGE builds the insert/update tuple slot from the full label-table descriptor (table_slot_create), but only populates the columns it manages: id/properties for vertices and id/start_id/end_id/properties for edges. A user-added column on the label table -- most notably a GENERATED ALWAYS ... STORED column, but also any plain ADD COLUMN -- therefore left an uninitialized slot entry. ExecStoreVirtualTuple marks all attributes valid, so heap_form_tuple() -> heap_compute_data_size() read the uninitialized tts_isnull[]/tts_values[] for that column and segfaulted dereferencing garbage as a varlena. This crashed the backend on CREATE, MERGE and SET. Fix in two parts: - clear_entity_slot() clears the slot and marks every attribute NULL before AGE fills in its own columns, so columns AGE does not manage default to NULL instead of stale slot memory. It is used at every slot build site: create_vertex, create_edge, merge_vertex, merge_edge, and the populate_vertex_tts/populate_edge_tts helpers used by SET. - insert_entity_tuple_cid (CREATE/MERGE) and update_entity_tuple (SET) call ExecComputeStoredGenerated() before materializing the heap tuple when the relation has stored generated columns, so those columns are computed on insert and recomputed on update per PostgreSQL semantics. Adds a generated_columns regression test covering stored generated columns on vertex and edge labels across CREATE, MERGE and SET, plus a plain user column that must not crash.
Problem
Fixes #2450. A
GENERATED ALWAYS ... STOREDcolumn on a label table caused a backend segfault (signal 11) on CypherCREATE/MERGE/SET:Root cause
AGE builds the insert/update tuple slot from the full label-table descriptor (
table_slot_create), but only populates the columns it manages —id/propertiesfor vertices,id/start_id/end_id/propertiesfor edges. Any user-added column is left as an uninitialized slot entry.ExecStoreVirtualTuplethen marks all attributes valid, soheap_form_tuple()→heap_compute_data_size()reads the uninitializedtts_isnull[]/tts_values[]and segfaults dereferencing garbage as a varlena.The SET path crashes identically via
update_entity_tuple(cypher_set.c). It affects both stored generated columns and plainADD COLUMNs.Fix
clear_entity_slot()clears the slot and marks every attribute NULL before AGE fills in its own columns, so columns AGE doesn't manage default to NULL instead of stale slot memory. Used at all six slot-build sites:create_vertex,create_edge,merge_vertex,merge_edge, and thepopulate_vertex_tts/populate_edge_ttshelpers (SET).ExecComputeStoredGenerated()is called ininsert_entity_tuple_cid(CREATE/MERGE,CMD_INSERT) andupdate_entity_tuple(SET,CMD_UPDATE) before the heap tuple is materialized when the relation has stored generated columns, so those columns are computed on insert and recomputed on update per PostgreSQL semantics.Testing
New
generated_columnsregression test covering stored generated columns on vertex and edge labels across CREATE / MERGE / SET (values verified computed and recomputed), plus a plain user column that must not crash. Passes underpg_regress.Verified against the reporter's exact repro on PG18: no crash, and
categorycorrectly computes (diskon CREATE) and recomputes (ssdafter SET).Note
On
SET, AGE rebuilds the row fromid/properties, so a generated column recomputes correctly but a plain (non-generated) column is reset to NULL. Preserving plain-column values across updates would be a larger change and is out of scope here; the supported pattern is aGENERATED ... STOREDcolumn derived fromproperties.