
[mypyc] Backport cached-group Extension.depends fix from upstream #21609

Closed
georgesittas wants to merge 22 commits into python:release-2.1 from VaggelisD:fix/cached-group-header-deps

georgesittas commented Jun 12, 2026

Apologies for the noise; gh resolved the base repo to the wrong upstream.

VaggelisD and others added 22 commits May 21, 2026 14:27
- test.yml: full mypyc test suite (py3.10-3.14) + macOS runtime tests + typecheck + lint
- build_wheels.yml: mypyc-compiled wheels via cibuildwheel for manylinux (x86_64 + aarch64), macOS (x86_64 + arm64)
- cibuildwheel.toml: enable mypyc compilation, run test_run.py against built wheels, skip Windows/PyPy/32-bit
- All runners are free standard GitHub-hosted runners for public repos

Adds a first-class `char` native type to mypyc, modeled on i64: stored
unboxed as int32 codepoint, with -1 as the empty-string sentinel, and
bidirectional str<->char promotion. Unblocks codepoint-level fast paths
in per-char loops.

Core type plumbing:
- MYPYC_NATIVE_CHAR_NAMES alongside MYPYC_NATIVE_INT_NAMES
- str <-> char bidirectional _promote in semanal_classprop
- str covers char in subtypes.covers_at_runtime + overlap in meet
- char_rprimitive (int32, is_native_int, error_overlap=False)
- mypy_extensions.char stub

Boxing / unboxing:
- CPyChar_FromObject (accepts 0/1-char str, -113 on type error)
- CPyChar_ToStr (uses interned empty-str singleton for -1)
- bool(char) checks != -1, not != 0, so "\0" stays truthy
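
The boxing rules above can be modeled in plain Python to make the sentinel behavior concrete. This is a sketch of the described semantics only, not the C helpers: `char_from_object`, `char_to_str`, and `char_bool` are hypothetical names, and where the C code reportedly returns -113 on a type error, this model raises `TypeError` instead.

```python
EMPTY_CHAR = -1  # empty-string sentinel, per the commit message


def char_from_object(obj: object) -> int:
    """Model of CPyChar_FromObject: accept only a 0- or 1-char str."""
    if not isinstance(obj, str) or len(obj) > 1:
        # The C helper reportedly returns -113 here; this model raises instead.
        raise TypeError(f"expected a str of length 0 or 1, got {obj!r}")
    return ord(obj) if obj else EMPTY_CHAR


def char_to_str(cp: int) -> str:
    """Model of CPyChar_ToStr: the -1 sentinel maps back to the empty string."""
    return "" if cp == EMPTY_CHAR else chr(cp)


def char_bool(cp: int) -> bool:
    """Truthiness tests against -1, not 0, so NUL (codepoint 0) stays truthy."""
    return cp != EMPTY_CHAR


assert char_bool(char_from_object("\0"))    # NUL is truthy
assert not char_bool(char_from_object(""))  # empty is falsy
```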

Codegen fast paths:
- try_specialize_codepoint_compare in transform_comparison_expr handles
  char/char, char/s[i], char/0-or-1-char-literal, and s[i]/literal
  uniformly, compiling to int compare of the codepoint
- ord(s[i]) refactored to share the codepoint read path
- char.isspace/isdigit/isalnum/isalpha/isidentifier/upper method_ops
  route to codepoint-taking C helpers in str_extra_ops.h
- CPyChar_IsIdentifier delegates to PyUnicode_IsIdentifier for non-ASCII
  (correct XID_Start handling rather than Py_UNICODE_ISALPHA approximation)
- CPyChar_Upper falls back to str.upper() for non-ASCII, returning the
  original codepoint when upper() produces multiple chars (e.g. ß -> SS)
  since char holds one codepoint
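
A rough Python model of the two non-ASCII fallbacks described above. The ASCII branches are assumptions about how a fast path might look; only the non-ASCII behavior (defer to Python's own identifier rule, and keep the original codepoint when `upper()` yields more than one character) comes from the commit message. `char_upper` and `char_isidentifier` are hypothetical names.

```python
EMPTY_CHAR = -1  # empty-string sentinel, as described above


def char_upper(cp: int) -> int:
    """Model of the described CPyChar_Upper behavior for one codepoint."""
    if cp < 128:
        # ASCII: only a-z change (the -1 sentinel also passes through unchanged).
        return cp - 32 if ord("a") <= cp <= ord("z") else cp
    upper = chr(cp).upper()
    # A char holds one codepoint, so a multi-char result such as
    # "ß".upper() == "SS" falls back to the original codepoint.
    return ord(upper) if len(upper) == 1 else cp


def char_isidentifier(cp: int) -> bool:
    """Model of CPyChar_IsIdentifier for a single codepoint."""
    if cp == EMPTY_CHAR:
        return False  # "".isidentifier() is False
    if cp < 128:
        c = chr(cp)
        return c == "_" or c.isalpha()
    # Non-ASCII defers to Python's own XID_Start-based rule, mirroring the
    # described delegation to PyUnicode_IsIdentifier.
    return chr(cp).isidentifier()
```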

New IR transform pass (runs after lower_ir, before dep collection):
- char_str_index_fold: folds Unbox(CPyStr_GetItem(s, i) -> char) to a
  direct CPyStr_GetCharAt int32 read, avoiding the 1-char PyObject alloc
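
To illustrate the shape of the `char_str_index_fold` rewrite, here is a toy pattern match over two hypothetical IR node classes (`Call`, `Unbox`). mypyc's real ops, its operand representation, and any safety checks the actual pass performs (for example, on other uses of the boxed str) are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical IR op: call a C helper with named operands."""

    func: str
    args: tuple[str, ...]


@dataclass(frozen=True)
class Unbox:
    """Hypothetical IR op: unbox a PyObject* into a native type."""

    src: Call
    to_type: str


def fold_char_index(op: Call | Unbox) -> Call | Unbox:
    """Rewrite Unbox(CPyStr_GetItem(s, i) -> char) into CPyStr_GetCharAt(s, i)."""
    if (
        isinstance(op, Unbox)
        and op.to_type == "char"
        and op.src.func == "CPyStr_GetItem"
    ):
        # Read the codepoint directly instead of allocating a 1-char str.
        return Call("CPyStr_GetCharAt", op.src.args)
    return op
```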

Also adds str.isalpha() method_op via CPyStr_IsAlpha.

Tests:
- run-char.test covers boxing/unboxing, bool semantics (NUL is truthy,
  empty is falsy), equality, classification methods (including non-ASCII
  XID_Start for isidentifier), upper (including ß -> ß pinning for the
  multi-char fallback), str promotion, concatenation, s[i]=="x"
  specialization, ord, and astral-plane codepoints.
- char stub added to test-data/unit/lib-stub/mypy_extensions.pyi so the
  test harness can resolve the type.

Five small changes needed to get the matrix green on the fork's release-1.20
branch (all platform/version drift, no mypyc logic changes):

- Run black==25.9.0 over files that diverged from the pinned pre-commit
  version: emitmodule.py, emitwrapper.py, expression.py, char_str_index_fold.py,
  test_subclass_base.py.
- Gate run-char.test on `mypy_extensions.char` actually being importable at
  runtime. The fork ships a stub but the experimental runtime isn't on PyPI,
  so CI installs stock mypy_extensions and every char test fails with
  ImportError. Skip the file when char is missing; keep running it locally
  where the patched runtime is present.
- Skip test_decode_with_extra_data_after_padding on Python 3.13+.
  CPython 3.13.x and 3.14 tightened base64.b64decode to raise on trailing
  data after padding; our lenient native implementation doesn't, so the
  stdlib equivalence check diverges. Guarded by a sys.version_info check,
  with the same `type: ignore[operator]` that run-async.test already uses
  for this check.
- Cap pathspec at <1.1 in pyproject.toml. pathspec 1.1.0 (released
  2026-04-23) made PathSpec a Generic, which trips a `Missing type arguments`
  error when mypyc self-compiles mypy/modulefinder.py during the build-env
  install. test-requirements.txt already pins 1.0.0 but the build env only
  sees the pyproject constraint.
- Drop the `type: ignore[attr-defined]` on the `from mypy_extensions import
  char` check. The typeshed stub declares `class char`, so mypy flags the
  ignore as unused; the runtime still handles the stock-mypy_extensions case
  via the surrounding `except ImportError`.
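
The two runtime gates described above reduce to a couple of checks. This is a sketch: `char_runtime_available` and `SKIP_TRAILING_PADDING_TEST` are hypothetical names, and how they hook into the test harness is not shown in the source.

```python
import sys


def char_runtime_available() -> bool:
    """True only if the installed mypy_extensions actually exports `char`."""
    try:
        from mypy_extensions import char  # noqa: F401
    except ImportError:
        # Stock mypy_extensions (or no mypy_extensions at all) lands here.
        return False
    return True


# CPython 3.13+ raises on data after base64 padding, per the commit message.
SKIP_TRAILING_PADDING_TEST = sys.version_info >= (3, 13)
```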

…emental build

mypyc_build builds Extension.depends from get_header_deps(), which regex-matches
every `#include "foo"` and `#include <foo>` in the generated C and prepends
target_dir. That works for `<sqlglot/__native_errors.h>` (resolves to
`build/sqlglot/__native_errors.h`, exists) but produces nonexistent paths for:

  - lib-rt headers like `<CPy.h>`, `<Python.h>` -> `build/CPy.h`
    (the C compiler resolves these via -I, not target_dir)
  - per-module relative includes like `"__native_athena.h"` -> `build/__native_athena.h`
    (the actual file is at `build/sqlglot/parsers/__native_athena.h`,
    relative to the includer's directory)

setuptools' newer_group with missing="newer" treats every missing dep as
"newer than target", so any extension whose group ran codegen this build
was always recompiled. With separate=True that's anywhere from 0 to ~half
the codebase per incremental build, regardless of what actually changed.
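
The rebuild decision hinges on how a missing dependency is treated. The following simplified model of setuptools' `newer_group()` semantics (the name `newer_group_model` is hypothetical, and error handling is reduced to a bare exception) shows why one nonexistent path such as `build/CPy.h` forces a recompile even when every real header is older than the extension.

```python
import os


def newer_group_model(sources: list[str], target: str, missing: str = "newer") -> bool:
    """Return True if `target` should be rebuilt from `sources`.

    Simplified model of setuptools' newer_group(); not the real function.
    """
    if not os.path.exists(target):
        return True  # nothing built yet
    target_mtime = os.path.getmtime(target)
    for src in sources:
        if not os.path.exists(src):
            if missing == "newer":
                return True  # a missing dep counts as newer than the target
            if missing == "ignore":
                continue
            raise FileNotFoundError(src)
        if os.path.getmtime(src) > target_mtime:
            return True
    return False
```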

Resolve includes per-cfile against (cfile_dir, target_dir), keeping any
candidate that exists. lib-rt headers don't change between builds so
dropping them from depends is safe; per-module headers under target_dir
are preserved as the genuine cross-module struct-layout deps.

Run the resolution in a second pass over all groups so sibling-group
headers exist before each cfile's deps are checked.
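
A minimal sketch of the per-cfile resolution, assuming the include names have already been extracted. `resolve_includes` is a hypothetical helper, not the code in `mypyc/build.py`; it keeps the first existing candidate, trying the .c file's own directory before `target_dir`, and drops names that exist in neither.

```python
import os


def resolve_includes(cfile: str, includes: list[str], target_dir: str) -> list[str]:
    """Map include names to existing header paths, dropping unresolvable ones."""
    cfile_dir = os.path.dirname(cfile)
    deps: list[str] = []
    for name in includes:
        for base in (cfile_dir, target_dir):
            candidate = os.path.normpath(os.path.join(base, name))
            if os.path.exists(candidate):
                if candidate not in deps:
                    deps.append(candidate)
                break
        # Names found in neither directory (e.g. CPy.h, Python.h, which the
        # compiler finds via -I) are dropped instead of recorded as missing.
    return deps
```

With the sqlglot layout described above, `<CPy.h>` and `<Python.h>` drop out of the result, while `__native_athena.h` resolves next to its includer.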

Verified against sqlglot[c] (separate=True, ~100 modules):

  Edit                   Recompiles before fix   Recompiles after fix
  no-op rebuild          44                      0
  parsers/snowflake.py   44                      2 (snowflake parser+generator)
  parsers/mysql.py       44                      5 (mysql + 4 subclasses)
  expressions/core.py    44                      ~90 (real closure)

Pre-fix was wrong in both directions: too many for leaf edits, too few for
center edits (the same 44 modules every time, regardless of impact).

Per Copilot review on #5: `resolve_cfile_deps` previously tried the
includer's directory first regardless of include kind, which differs
from the C preprocessor's actual behavior for `#include <foo>`
(angle-bracket form skips the includer's dir, only -I paths are
searched). For mypyc's emitted code the two paths happen to converge
in practice — cross-group angled includes always use a qualified
prefix (`<lib/__native_functions.h>`) that won't collide with anything
in the includer's dir — but if a future emit introduces an unqualified
angled include, the resolver would record the wrong file's path and
mtime, leading to subtle incremental-rebuild bugs.

Carry the include kind through resolution:

- `_INCLUDE_RE` is rewritten as an alternation whose two capture groups
  separate the quoted vs angle-bracket forms. A small `_extract_includes`
  helper turns matches into `(is_angled, name)` tuples.
- `get_header_deps` now returns `list[tuple[bool, str]]`. Only one
  internal caller (in `mypyc_build`), updated accordingly.
- `resolve_cfile_deps` consults `(includer_dir, target_dir)` for quoted
  includes and `(target_dir,)` only for angled ones, matching what the
  C preprocessor actually does.
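
One way to carry the include kind through, sketched with a hypothetical regex (the real `_INCLUDE_RE` pattern is not reproduced in the commit message). The unmatched branch of an alternation leaves its capture group as `None`, which is what tells the two forms apart; `extract_includes` and `search_dirs` are hypothetical helpers encoding the behavior described above.

```python
import re

# Hypothetical pattern: group 1 captures quoted names, group 2 angled names.
_INCLUDE_RE = re.compile(r'#include\s*(?:"([^"]+)"|<([^>]+)>)')


def extract_includes(contents: str) -> list[tuple[bool, str]]:
    """Return (is_angled, name) pairs in source order."""
    out: list[tuple[bool, str]] = []
    for m in _INCLUDE_RE.finditer(contents):
        quoted, angled = m.group(1), m.group(2)
        if quoted is not None:
            out.append((False, quoted))
        else:
            out.append((True, angled))
    return out


def search_dirs(is_angled: bool, includer_dir: str, target_dir: str) -> tuple[str, ...]:
    """Quoted includes check the includer's dir first; angled ones skip it."""
    return (target_dir,) if is_angled else (includer_dir, target_dir)
```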

Unit tests in `mypyc/test/test_misc.py::TestHeaderDeps` updated to the
new return shape, and a new `test_resolve_search_order_matches_preprocessor`
asserts that the same header name resolves to the includer-dir copy
under quoted form and the target_dir copy under angled form.

mypyc_build's get_header_deps regex-matched only `#include "..."` and
only scanned the .c file's contents, not the headers it transitively
includes. That misses the cross-group export-table header chain:

  __native_<mod>.c
    #include "__native_internal_<mod>.h"        <-- picked up
        #include <other_group/__native_other.h> <-- MISSED
            (angle brackets, and inside a header)

`__native_internal_<mod>.h` is where mypyc emits the cross-group
`struct export_table_<other_group>` declaration, by `#include`ing the
other group's `__native_<other>.h`. The consumer's .c file then accesses
exported classes/functions as `exports_<other_group>.CPyDef_<X>`, which
gcc/clang resolve to byte offsets into that struct at C compile time
and bake into the consumer's .o.

When the cross-group header is missing from `Extension.depends`,
setuptools' `newer_group` doesn't see it as a reason to recompile the
consumer, so an incremental edit that shifts struct offsets in the
producer (e.g. inserting a new class earlier in the file, which adds
slots to its `export_table_<group>`) leaves the consumer's .o pointing
at stale offsets. The baked-in offset for `X` now resolves to whatever
class or function the producer's new layout placed at that slot, and
the consumer silently constructs the wrong thing — no compile error,
no load error, just `make_target()` returning `Inserted` instead of
`Target`.

Fix in two parts:

1. `_INCLUDE_RE` now matches both `"foo"` and `<foo>` includes.
   Lib-rt headers (`<Python.h>`, `<CPy.h>`, etc.) don't resolve under
   either the includer's dir or target_dir, so they're dropped during
   resolution and don't add spurious rebuilds.

2. Extract the dep resolution into `resolve_cfile_deps` and make the
   walk transitive: each resolved .h file is opened and re-scanned
   for its own includes, with the search dir set to that header's
   own directory. This is what `gcc -M` would do, and matches the
   actual C preprocessor's view of the dep graph. The walk is
   bounded by the `resolved` set (no revisits) and by the fact that
   only paths existing under `(includer_dir, target_dir)` are
   followed, so it terminates trivially.
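
The transitive walk in step 2 can be sketched as a worklist over (text, includer directory) pairs. As in this commit, both include kinds search `(includer_dir, target_dir)`; the regex and the name `resolve_deps_transitively` are assumptions. The `resolved` set is what guarantees termination, even for a header that includes itself.

```python
import os
import re

# Hypothetical pattern matching both `#include "x"` and `#include <x>`.
_INCLUDE_RE = re.compile(r'#include\s*(?:"([^"]+)"|<([^>]+)>)')


def resolve_deps_transitively(cfile: str, contents: str, target_dir: str) -> list[str]:
    """Collect every header reachable from `contents` that exists on disk."""
    resolved: set[str] = set()
    # Each item is (text to scan, directory of the file that text came from).
    work = [(contents, os.path.dirname(cfile))]
    while work:
        text, includer_dir = work.pop()
        for m in _INCLUDE_RE.finditer(text):
            name = m.group(1) or m.group(2)
            for base in (includer_dir, target_dir):
                path = os.path.normpath(os.path.join(base, name))
                if os.path.exists(path):
                    if path not in resolved:
                        resolved.add(path)
                        # Re-scan the header, relative to its own directory.
                        with open(path, encoding="utf-8") as f:
                            work.append((f.read(), os.path.dirname(path)))
                    break
            # Unresolvable names (lib-rt and Python headers) are dropped.
    return sorted(resolved)
```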

Pre-existing in mypyc and only reachable once the prior
over-conservative 44-file always-rebuild was lifted (1.20.0.post5),
because that wasteful behavior kept cross-group consumers
self-consistently rebuilt by accident.

Verified against a 4-file MRE with package re-export (mimicking
`from .functions import *`): cold build returns correct classes;
inserting `NewClass` between `Beta` and `Gamma` and running an
incremental build (with the same `.mypy_cache/` and `build/`)
previously returned `NewClass` from `make_gamma()` and `Gamma` from
`make_delta()`; after the fix, `caller__mypyc.o` is correctly
recompiled and both functions return their expected classes.

Adds unit tests in `mypyc/test/test_misc.py::TestHeaderDeps` covering
the regex change, transitive header walking (the exact bug scenario),
the lib-rt drop behavior, and the includer-dir-first resolution
preference.

…ental builds

In separate=True mode, when generate_c returns empty cfiles for a group
(the fully-cached path — typical of pip's second setup.py invocation),
per_cfile_deps was never populated for that group.  Extension.depends
therefore stayed empty, so cross-group export-table header changes caused
by inserting a new class that shifts struct offsets never triggered a
recompile of the cached consumer's .o.  The stale .o then baked in the
old struct offsets, silently resolving them to wrong classes at runtime.

Fix: when the on-disk .c file exists for a cached group, read it before
calling get_header_deps so the dep resolver can walk the transitive header
chain and include cross-group headers in Extension.depends.  Also fixes
an inconsistent errors="replace" in resolve_cfile_deps (now plain
encoding="utf-8" throughout) and adds a test that directly demonstrates
the before/after behavior.
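
The fallback for cached groups amounts to choosing which text to scan for includes. A minimal sketch, with `contents_for_deps` as a hypothetical name; it reads the previous build's .c with `encoding="utf-8"`, matching the encoding choice mentioned above.

```python
import os


def contents_for_deps(cfile_path: str, ctext: str) -> str:
    """Pick the C source whose includes should feed Extension.depends.

    A fully cached group comes back with empty text, so fall back to the
    .c file already on disk from the previous build.
    """
    if ctext:
        return ctext
    if os.path.exists(cfile_path):
        with open(cfile_path, encoding="utf-8") as f:
            return f.read()
    return ""
```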

With `separate=True` and cross-module inheritance, when a subclass module
is recompiled incrementally without its parent (parent loaded from
mypy's cache, so `ClassDef.defs.body` is empty), `find_attr_initializers`
gathers no defaults from the parent. The subclass therefore has no
`__mypyc_defaults_setup` of its own, and `ClassIR.get_method` returns
the parent's. `emit_attr_defaults_func_call` then emitted a raw
`CPyDef_<parent>___...` call with no cross-group export-table prefix,
producing C that fails to compile:

    error: call to undeclared function
    'CPyDef_<parent_module>___<Parent>_____mypyc_defaults_setup'

The parent's header only declares the function as a pointer inside
`struct export_table_<group>`, so the symbol isn't reachable as a free
function from the subclass's compilation unit.

Apply `emitter.get_group_prefix(defaults_fn.decl)` at this call site,
matching the pattern already used by `emit_setup_or_dunder_new_call`,
`generate_constructor_for_class`, and the other cross-group call sites
in `emitclass.py`. `get_group_prefix` returns `""` for same-group calls
(intra-group behaviour unchanged) and `"exports_<group>."` when the
target lives in a different group; it also registers the target group
in `context.group_deps` so the right header gets `#include`d.
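
A toy version of the prefix logic, to show what changes in the emitted call. `get_group_prefix_model` and `emit_defaults_call` are hypothetical stand-ins for `emitter.get_group_prefix` and the call site in `emit_attr_defaults_func_call`, and the `self` argument is a placeholder for whatever the real emitter passes.

```python
def get_group_prefix_model(caller_group: str, target_group: str, group_deps: set[str]) -> str:
    """Empty for same-group calls; otherwise go through the export table."""
    if caller_group == target_group:
        return ""
    # Record the dependency so the target group's header gets #included.
    group_deps.add(target_group)
    return f"exports_{target_group}."


def emit_defaults_call(caller_group: str, target_group: str, fn_name: str, group_deps: set[str]) -> str:
    """Emit a (simplified) C call to a defaults-setup function."""
    prefix = get_group_prefix_model(caller_group, target_group, group_deps)
    return f"{prefix}{fn_name}(self);"  # `self` is a placeholder argument
```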

Reproducer (`base.py` with attribute defaults, `child.py` empty subclass,
`mypycify([...], separate=True)`): cold build succeeds, then touching
only `child.py` and rebuilding previously failed with the
implicit-declaration error. Generated C now correctly emits
`exports_base.CPyDef_base___Parent_____mypyc_defaults_setup(...)` and
`Child().x` returns the inherited default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reproduces the bug fixed in the parent commit: under TestRunSeparate,
the subclass module gets recompiled while the parent module is loaded
from mypy's cache (so `ClassDef.defs.body` is empty and the subclass
inherits no own `__mypyc_defaults_setup`). The emitted call to the
parent's setup function must use the cross-group `exports_<group>.`
prefix or the generated C fails to compile.

The test passes under TestRun and TestRunMultiFile (which don't
exercise cross-group calls) and fails under TestRunSeparate without
the fix. Verified by temporarily reverting `emit_attr_defaults_func_call`
to the pre-fix form and observing the implicit-declaration error in
`__native_other_a.c`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

detect_undefined_bitmap() was extending cl.bitmap_attrs in place. Under
separate=True each SCC's analyze_always_defined_attrs is invoked once per
group, and detect_undefined_bitmap recurses through cl.base_mro from the
subclass into its base classes. The seen set passed in dedupes within one
call but is fresh per call, so every subclass-group call re-extends the
shared base class's bitmap_attrs with another copy of the contributions.

The base class's emitted ObjectStruct then grows by one bitmap field per
~32 subclasses processed in the same build. The exact final length is a
function of how many SCCs went through compile_scc_to_ir this run:

  - clean build: every SCC fresh -> base bitmap_attrs accumulates fully
  - incremental build affecting N subclasses: base accumulates a fraction
  - second incremental: yet another count

Subclasses not rebuilt this round still see their base's old, larger
struct layout. Any attribute access on the base segfaults with a
mismatched bitmap-field offset.

Pre-existing in mypyc; only manifested once the prior over-conservative
44-file always-rebuild was lifted (1.20.0.post5), because that wasteful
behavior kept rebuild sets self-consistent.

Fix: compute a fresh local list and assign at the end. The function
becomes naturally idempotent across repeated calls — same input, same
output, regardless of how many groups have visited the class. No new
fields, no serialization changes.
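
The difference between the in-place and fresh-list shapes can be reproduced with a toy class graph. `ClassInfo`, `detect_buggy`, and `detect_fixed` are hypothetical; they keep only the recursion-plus-`seen`-set structure described above, not mypyc's actual always-defined analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical stand-in for a class IR, reduced to what this sketch needs."""

    name: str
    own_attrs: list[str]
    bases: list[ClassInfo] = field(default_factory=list)
    bitmap_attrs: list[str] = field(default_factory=list)


def detect_buggy(cl: ClassInfo, seen: set[str]) -> None:
    """Pre-fix shape: extends the shared list on every call."""
    for base in cl.bases:
        if base.name not in seen:  # dedupes within one call only
            seen.add(base.name)
            detect_buggy(base, seen)
    cl.bitmap_attrs.extend(cl.own_attrs)


def detect_fixed(cl: ClassInfo, seen: set[str]) -> None:
    """Post-fix shape: build a fresh local list and assign it at the end."""
    for base in cl.bases:
        if base.name not in seen:
            seen.add(base.name)
            detect_fixed(base, seen)
    result: list[str] = []
    result.extend(cl.own_attrs)
    cl.bitmap_attrs = result
```

With three subclasses each triggering one call (with a fresh `seen` set), the buggy version leaves the base with three copies of its attributes, while the fixed version yields the same list no matter how many calls run.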

Verified against sqlglot[c] (separate=True, ~100 modules):

  Edit: add a method to MySQLParser (a class with 7 dialect subclasses)
  Before: parser.h struct layout differs between clean and incremental
          builds; make unitc segfaults at first parser-using test.
  After:  parser.h identical between clean and incremental;
          make unitc passes (1163 tests, 0 segfaults).

Unboxes IntEnum operands to int_rprimitive for native comparison
instead of PyObject_RichCompare. Applied selectively:
- Ordering ops (<, <=, >, >=): always (2.5x faster in microbench)
- ==/!=: only for IntEnum vs int (IntEnum-vs-IntEnum already uses
  fast identity comparison via singleton pointer equality)

When both sides of == or != are type objects (TypeType from type(x)
or CallableType from a class reference), use pointer identity
(`is`/`is not`) instead of PyObject_RichCompare. Type objects are singletons
so identity is equivalent to equality. Microbenchmark shows 3.2-3.7x
speedup.

When both sides of ==, !=, is, or is not will resolve to pointer
identity comparison (no custom __eq__), pass can_borrow=True when
accepting operands. This eliminates unnecessary INCREF/DECREF pairs
around the comparison.

The check mirrors ll_builder.compare_instances: both operands must
be the same RInstance type with no __eq__, final __eq__/__ne__, no
Python inheritance, and not augmented (dataclass etc).

Parser-only benchmark shows 5-7% speedup on representative queries
due to eliminated refcount ops in hot paths like _match().

When calling a method on a value loaded from a native struct field
(e.g. expression.args.get("key")), borrow the field value instead
of generating INCREF/DECREF. The struct owner is kept alive via
KeepAlive, guaranteeing the field value remains valid.

Eliminates ~850 INCREF/DECREF pairs in SQLGlot's generated C code.

Four issues blocked the initial publish:

- mypyc/build.py write_file unconditional on cached groups: the
  cda8316 cherry-pick dropped the `if ctext` guard around write_file,
  so an empty ctext (mypy returns this for cached groups under
  separate=True) overwrote the previously-emitted .c with an empty stub.
  The next compile then linked against a stub `.so` that re-declared
  cross-group export_table_<group> structs without a definition, and
  9 separate=True tests failed with "incomplete type" errors. Restore
  the guard so cached groups keep their on-disk .c intact.
- mypyc/build.py angled-vs-quoted include kind: the cd0c079 cherry-pick
  (784ec63) updated type annotations to list[tuple[bool, str]] but the
  matching _INCLUDE_RE rewrite, _extract_includes helper, and
  resolve_cfile_deps worklist unpacking were dropped during conflict
  resolution. Restore them and update mypyc/test/test_misc.py to the
  new return shape so build.py typechecks cleanly under self-compile.
- cibuildwheel.toml PyPy in default matrix: cibuildwheel 2.22 enables
  PyPy in its default matrix. PyPy lacks prebuilt ast-serialize wheels
  and the build env can't bootstrap Rust for maturin, so every pp* job
  died. Skip pp*, *-win*, *-musllinux_aarch64, *-manylinux_i686, and
  free-threaded builds, matching the release-1.20 skip list.
- .github/workflows/test.yml triggering twice on tag push: dropped the
  `tags: ['*']` push trigger so the Tests workflow only runs on branch
  pushes/PRs. A combined branch+tag push previously kicked off two Tests
  runs on top of the one Build-and-publish run.

…ss groups (python#21524)

The fix for this was included in python#21369, but no dedicated test was
added.

This adds `testIncrementalBuiltinBaseClassConstruction` to
`run-multimodule.test`: three modules compiled with `separate=True`,
where step 2 changes a helper module's signature to force the caller to
be recompiled while the exception module is only loaded from cache.

…thon#21547)

Fixes python#21542

Under `separate=True`, when a subclass is recompiled while its parent is
loaded from mypy's incremental cache, parent default-attribute
assignments are silently dropped from the subclass's
`__mypyc_defaults_setup`. The first read of an inherited default-attr
then raises:

```
    AttributeError: attribute '<name>' of '<Parent>' undefined
```

`find_attr_initializers` walks `cdef.info.mro` and reads
`info.defn.defs.body` for `AssignmentStmt`s. `ClassDef.serialize`
(mypy/nodes.py) does not serialize `defs`, so a cache-loaded parent has
`defs = Block([])`; the MRO walk collects no parent assignments and the
subclass's emitted setup leaves inherited slots in the
undefined-sentinel state.

This PR implements the fix discussed in the linked issue.
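
The failure mode can be seen in a small model of the MRO walk. `TypeInfoModel` and `collect_defaults` are hypothetical, and the base-to-derived walk order is an assumption; the point is only that a cache-loaded parent contributes nothing because its class body was never deserialized. The fix itself is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TypeInfoModel:
    """Hypothetical stand-in for a TypeInfo: a name plus class-body defaults."""

    name: str
    # Models the AssignmentStmts found in info.defn.defs.body.
    body_defaults: dict[str, object] = field(default_factory=dict)


def collect_defaults(mro: list[TypeInfoModel]) -> dict[str, object]:
    """Walk the MRO (most derived first) from the base end."""
    defaults: dict[str, object] = {}
    for info in reversed(mro):
        # More derived classes override earlier (base) defaults.
        defaults.update(info.body_defaults)
    return defaults
```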

…chain

fix(mypyc): preserve inherited class attribute defaults under separate=True

The cross-group header-deps work that shipped in 2.1.0.post1 was
upstreamed and merged as ab8e4bf, but
the improvements added during review never flowed back to release-2.1:

- mypyc_build: when a fully-cached group returns its cfile name with
  empty contents, re-read the on-disk .c before calling get_header_deps.
  The existing fallback only covered groups that return no cfile entries
  at all, so Extension.depends stayed empty for the shape that actually
  occurs and setuptools never recompiled stale consumer .o files when a
  dep's export-table struct layout shifted.
- get_header_deps: assert non-empty contents to keep this from
  regressing silently.
- fudge_dir_mtimes: stop shifting linker outputs back; combined with
  write_file's +1s bump this made every .c permanently newer than its
  .so, forcing unconditional rebuilds that masked depends bugs in tests.
- Add the testIncrementalCrossGroupExportTableOffsets regression test.

This is the bug behind the sqlglot CI segfault: a cached sqlglotc wheel
shipped a stale qualify.o whose quote_identifiers slot dispatched into
qualify_outputs after a new function was inserted mid-struct in
qualify_columns' export table.