Skip to content

Fix segfault on CREATE/MERGE/SET with generated or extra label columns (#2450)#2458

Merged
MuhammadTahaNaveed merged 1 commit into
apache:masterfrom
gregfelice:fix-2450-generated-column-segfault
Jul 3, 2026
Merged

Fix segfault on CREATE/MERGE/SET with generated or extra label columns (#2450)#2458
MuhammadTahaNaveed merged 1 commit into
apache:masterfrom
gregfelice:fix-2450-generated-column-segfault

Conversation

@gregfelice

Copy link
Copy Markdown
Contributor

Problem

Fixes #2450. A GENERATED ALWAYS ... STORED column on a label table caused a backend segfault (signal 11) on Cypher CREATE / MERGE / SET:

SELECT create_graph('age_bug');
SELECT create_vlabel('age_bug', 'Product');
ALTER TABLE age_bug."Product"
  ADD COLUMN category varchar(25)
  GENERATED ALWAYS AS (agtype_access_operator(properties, '"category"')) STORED;
-- crashes the backend:
SELECT * FROM cypher('age_bug', $$ CREATE (p:Product {category:'disk'}) RETURN p $$) AS (p agtype);

Root cause

AGE builds the insert/update tuple slot from the full label-table descriptor (table_slot_create), but only populates the columns it manages — id/properties for vertices, id/start_id/end_id/properties for edges. Any user-added column is left as an uninitialized slot entry. ExecStoreVirtualTuple then marks all attributes valid, so heap_form_tuple()heap_compute_data_size() reads the uninitialized tts_isnull[]/tts_values[] and segfaults dereferencing garbage as a varlena.

#0 heap_compute_data_size
#1 heap_form_tuple
#3 ExecFetchSlotHeapTuple
#4 insert_entity_tuple_cid   cypher_utils.c
#6 create_vertex            cypher_create.c

The SET path crashes identically via update_entity_tuple (cypher_set.c). It affects both stored generated columns and plain ADD COLUMNs.

Fix

  1. clear_entity_slot() clears the slot and marks every attribute NULL before AGE fills in its own columns, so columns AGE doesn't manage default to NULL instead of stale slot memory. Used at all six slot-build sites: create_vertex, create_edge, merge_vertex, merge_edge, and the populate_vertex_tts / populate_edge_tts helpers (SET).
  2. ExecComputeStoredGenerated() is called in insert_entity_tuple_cid (CREATE/MERGE, CMD_INSERT) and update_entity_tuple (SET, CMD_UPDATE) before the heap tuple is materialized when the relation has stored generated columns, so those columns are computed on insert and recomputed on update per PostgreSQL semantics.

Testing

New generated_columns regression test covering stored generated columns on vertex and edge labels across CREATE / MERGE / SET (values verified computed and recomputed), plus a plain user column that must not crash. Passes under pg_regress.

Verified against the reporter's exact repro on PG18: no crash, and category correctly computes (disk on CREATE) and recomputes (ssd after SET).

Note

On SET, AGE rebuilds the row from id/properties, so a generated column recomputes correctly but a plain (non-generated) column is reset to NULL. Preserving plain-column values across updates would be a larger change and is out of scope here; the supported pattern is a GENERATED ... STORED column derived from properties.

@gregfelice

Copy link
Copy Markdown
Contributor Author

@jrgemignani ready for review when you have a cycle. This fixes the #2450 segfault where a GENERATED ALWAYS ... STORED (or any extra) column on a label table crashed the backend on CREATE/MERGE/SET — root cause and the two-part fix (null-init the entity slot + ExecComputeStoredGenerated) are in the description. New generated_columns regression test covers vertex + edge across all three write paths, and CI is green.

One design note called out in the PR: on SET, AGE rebuilds the row from id/properties, so a generated column recomputes correctly but a plain column resets to NULL — I scoped plain-column preservation out. Happy to adjust if you'd rather handle that here.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a backend segfault in AGE’s Cypher CREATE/MERGE/SET paths when label tables contain user-added columns (including GENERATED ALWAYS ... STORED). It does this by fully NULL-initializing the entity tuple slot before populating AGE-managed columns, and by computing stored generated columns before materializing heap tuples, aligning behavior with PostgreSQL semantics.

Changes:

  • Introduces clear_entity_slot() and uses it at all entity slot build sites to ensure non-AGE columns default to NULL (avoiding uninitialized slot memory reads).
  • Computes stored generated columns during insert and update via ExecComputeStoredGenerated() before heap tuple materialization.
  • Adds a generated_columns regression test and wires it into the regression suite.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/include/executor/cypher_utils.h Declares clear_entity_slot() helper.
src/backend/executor/cypher_utils.c Implements clear_entity_slot() and computes stored generated columns before materializing insert tuples.
src/backend/executor/cypher_set.c Recomputes stored generated columns before materializing update tuples.
src/backend/executor/cypher_merge.c Uses clear_entity_slot() when building MERGE insert slots.
src/backend/executor/cypher_create.c Uses clear_entity_slot() when building CREATE insert slots.
regress/sql/generated_columns.sql Adds regression coverage for stored generated columns and extra plain columns across CREATE/MERGE/SET.
regress/expected/generated_columns.out Expected output for the new regression test.
Makefile Adds generated_columns to the regression test list.
PROJECT.yaml Adds project metadata (currently contains inaccurate license/repo fields).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread PROJECT.yaml Outdated
- Standard SDLC (Makefile)
os:
- Linux/Debian
license: Proprietary
Comment thread PROJECT.yaml Outdated
Comment on lines +23 to +25
links:
docs: ''
repo: https://git.rizlabs.com/gregf/apache-age
@gregfelice gregfelice force-pushed the fix-2450-generated-column-segfault branch from 2ae8e16 to 3905048 Compare July 3, 2026 11:27
@MuhammadTahaNaveed

Copy link
Copy Markdown
Member

@gregfelice

GPT 5.5 xhigh review:

Findings

  1. [P1] Stored generated columns using tableoid are computed incorrectly

    • src/backend/executor/cypher_utils.c:354
    • src/backend/executor/cypher_set.c:143

    The new ExecComputeStoredGenerated(...) calls compute stored generated columns, but the tuple slot does not appear to have tts_tableOid set first. PostgreSQL’s own executor sets
    slot->tts_tableOid = RelationGetRelid(...) before computing stored generated columns.

    Repro: adding this to a vertex label produces InvalidOid instead of the label table OID:

    ALTER TABLE tableoid_case."T"
      ADD COLUMN tbl oid GENERATED ALWAYS AS (tableoid) STORED;
    

Then CREATE and SET through Cypher both leave tbl as - / invalid. The fix should set elemTupleSlot->tts_tableOid before ExecComputeStoredGenerated(...) in both insert/create and update/set
paths, and add a regression case for tableoid.

  1. [P1] PROJECT.yaml adds incorrect Apache project metadata

    • PROJECT.yaml:16 says license: Proprietary
    • PROJECT.yaml:22 says owner: Rizlabs
    • PROJECT.yaml:25 points at a Rizlabs repo
      This file is completely unrelated to the fix and should be removed immediately.
  2. [P2] Branch needs a rebase onto current master
    The PR is based on merge base 12e2a31, while current apache/master is d84b457. It is missing several upstream commits, so the branch-local regression suite is stale. Please rebase so
    CI runs against the current tree.

  3. [P3] git diff --check fails on trailing whitespace
    The new regress/expected/generated_columns.out has multiple trailing-whitespace lines and a blank line at EOF. Please clean that up so whitespace checks pass.

@MuhammadTahaNaveed MuhammadTahaNaveed self-requested a review July 3, 2026 11:42
@MuhammadTahaNaveed

Copy link
Copy Markdown
Member

@gregfelice

GPT 5.5 xhigh review:

Found one performance issue worth commenting on:

Performance note: SET now calls ExecComputeStoredGenerated(...) for stored generated columns, but this path creates a fresh ResultRelInfo for each updated entity at src/backend/executor/cypher_set.c:607.

ExecComputeStoredGenerated(...) lazily initializes ri_GeneratedExprsU; PostgreSQL stores those expression states in the query context. Because AGE’s SET path uses a new ResultRelInfo per row/update item, multi-row SET on a label with stored generated columns will repeatedly rebuild generated-column expression state and retain that memory until query end.

Relevant spots:

  • src/backend/executor/cypher_set.c:607 creates a new ResultRelInfo
  • src/backend/executor/cypher_set.c:143 / :146 calls ExecComputeStoredGenerated(...)
  • src/backend/executor/cypher_set.c:840 / :841 closes indexes/relation, but the generated expression state has already been allocated in the executor query context

This should probably cache/reuse ResultRelInfo per label/relation for the duration of apply_update_list, similar to the existing relation/index/RLS caching pattern. That would let
ri_GeneratedExprsU initialize once per relation instead of once per updated row.

Secondary, lower-priority nit:

Minor: MERGE now calls clear_entity_slot(...) before checking whether the matched path will actually insert.

  • src/backend/executor/cypher_merge.c:1202
  • src/backend/executor/cypher_merge.c:1529

On a matched MERGE where should_insert == false, the code later skips insert_entity_tuple(...), so the new full-null-bitmap memset is avoidable work. Usually tiny for AGE’s base
label tables, but it scales with extra user-added columns. Consider moving the slot clear closer to the insert-only path if practical.

apache#2450)

AGE builds the insert/update tuple slot from the full label-table
descriptor (table_slot_create), but only populates the columns it
manages: id/properties for vertices and id/start_id/end_id/properties
for edges. A user-added column on the label table -- most notably a
GENERATED ALWAYS ... STORED column, but also any plain ADD COLUMN --
therefore left an uninitialized slot entry. ExecStoreVirtualTuple marks
all attributes valid, so heap_form_tuple() -> heap_compute_data_size()
read the uninitialized tts_isnull[]/tts_values[] for that column and
segfaulted dereferencing garbage as a varlena. This crashed the backend
on CREATE, MERGE and SET.

Fix in two parts:

- clear_entity_slot() clears the slot and marks every attribute NULL
  before AGE fills in its own columns, so columns AGE does not manage
  default to NULL instead of stale slot memory. It is used at every slot
  build site: create_vertex, create_edge, merge_vertex, merge_edge, and
  the populate_vertex_tts/populate_edge_tts helpers used by SET.

- insert_entity_tuple_cid (CREATE/MERGE) and update_entity_tuple (SET)
  call ExecComputeStoredGenerated() before materializing the heap tuple
  when the relation has stored generated columns, so those columns are
  computed on insert and recomputed on update per PostgreSQL semantics.

Adds a generated_columns regression test covering stored generated
columns on vertex and edge labels across CREATE, MERGE and SET, plus a
plain user column that must not crash.
@gregfelice

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review, @MuhammadTahaNaveed — genuinely useful. Addressed in the latest push:

1. [P1] tableoid generated columns — Fixed. Good catch. The slot now carries tts_tableOid = RelationGetRelid(...) before ExecComputeStoredGenerated() in both the insert (cypher_utils.c) and update (cypher_set.c) paths, matching PostgreSQL's own ExecInsert/ExecUpdate. Added a regression case with tbl oid GENERATED ALWAYS AS (tableoid) STORED that asserts the stored value equals the label table's real OID on both CREATE and SET.

2. [P1] PROJECT.yaml — Removed. That was a stray commit that rode in from a local branch base; the branch has been rebased onto current master so the PR now contains only the fix.

3. [P3] Rebase — Done, the branch is rebased onto current master and CI is green against the current tree.

B. [P3] MERGE clears slot before the should_insert check — Fixed. clear_entity_slot() is now gated on should_insert in both merge_vertex and merge_edge, so the matched path skips the null-init. Added a matched-MERGE regression case to cover that path.

A. [perf] SET builds a ResultRelInfo per row — Agreed this is worth doing, but I'd like to keep it out of this bugfix. The per-row ResultRelInfo is pre-existing structure, and the generated-column recompute is guarded by has_generated_stored, so only labels that actually have a stored generated column pay the cost — no regression for the common case. Caching ResultRelInfo/ri_GeneratedExprsU per label across apply_update_list is a cleaner standalone change (alongside the existing relation/index/RLS caching); filed as #2459 so it doesn't get lost.

4. [P3] Trailing whitespace in the .out — Left as-is intentionally. pg_regress does an exact byte comparison against psql's output, which emits trailing spaces on table rows; every expected file in regress/expected/ has them (e.g. cypher_set.out). Stripping them would make the expected file no longer match psql's actual output and break the test, and there's no diff --check/whitespace gate in CI. So this one I have to decline.

Thanks again — the tableoid case in particular was a real gap.

@MuhammadTahaNaveed MuhammadTahaNaveed merged commit 53ecacf into apache:master Jul 3, 2026
6 checks passed
@gregfelice

Copy link
Copy Markdown
Contributor Author

Thanks for the careful review and the approval, @MuhammadTahaNaveed — the tableoid catch in particular was a real gap. All feedback is addressed, CI is green, and the follow-up perf item is tracked in #2459.

Since this is my PR I won't self-merge — feel free to land it whenever you have a moment, or I can leave it for another committer. Thanks again!

@MuhammadTahaNaveed

Copy link
Copy Markdown
Member

Thanks for the careful review and the approval, @MuhammadTahaNaveed — the tableoid catch in particular was a real gap. All feedback is addressed, CI is green, and the follow-up perf item is tracked in #2459.

Since this is my PR I won't self-merge — feel free to land it whenever you have a moment, or I can leave it for another committer. Thanks again!

@gregfelice Its already merged :)

jrgemignani pushed a commit that referenced this pull request Jul 3, 2026
#2458)

AGE builds the insert/update tuple slot from the full label-table
descriptor (table_slot_create), but only populates the columns it
manages: id/properties for vertices and id/start_id/end_id/properties
for edges. A user-added column on the label table -- most notably a
GENERATED ALWAYS ... STORED column, but also any plain ADD COLUMN --
therefore left an uninitialized slot entry. ExecStoreVirtualTuple marks
all attributes valid, so heap_form_tuple() -> heap_compute_data_size()
read the uninitialized tts_isnull[]/tts_values[] for that column and
segfaulted dereferencing garbage as a varlena. This crashed the backend
on CREATE, MERGE and SET.

Fix in two parts:

- clear_entity_slot() clears the slot and marks every attribute NULL
  before AGE fills in its own columns, so columns AGE does not manage
  default to NULL instead of stale slot memory. It is used at every slot
  build site: create_vertex, create_edge, merge_vertex, merge_edge, and
  the populate_vertex_tts/populate_edge_tts helpers used by SET.

- insert_entity_tuple_cid (CREATE/MERGE) and update_entity_tuple (SET)
  call ExecComputeStoredGenerated() before materializing the heap tuple
  when the relation has stored generated columns, so those columns are
  computed on insert and recomputed on update per PostgreSQL semantics.

Adds a generated_columns regression test covering stored generated
columns on vertex and edge labels across CREATE, MERGE and SET, plus a
plain user column that must not crash.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Segmentation Fault] when creating graph node whose vertex label table has GENERATED ALWAYS columns

3 participants