[mypyc] Backport cached-group Extension.depends fix from upstream #21609
Closed
georgesittas wants to merge 22 commits into
- test.yml: full mypyc test suite (py3.10-3.14) + macOS runtime tests + typecheck + lint
- build_wheels.yml: mypyc-compiled wheels via cibuildwheel for manylinux (x86_64 + aarch64), macOS (x86_64 + arm64)
- cibuildwheel.toml: enable mypyc compilation, run test_run.py against built wheels, skip Windows/PyPy/32-bit
- All runners are free standard GitHub-hosted runners for public repos
Adds a first-class `char` native type to mypyc, modeled on i64: stored unboxed as int32 codepoint, with -1 as the empty-string sentinel, and bidirectional str<->char promotion. Unblocks codepoint-level fast paths in per-char loops.
Core type plumbing:
- MYPYC_NATIVE_CHAR_NAMES alongside MYPYC_NATIVE_INT_NAMES
- str <-> char bidirectional _promote in semanal_classprop
- str covers char in subtypes.covers_at_runtime + overlap in meet
- char_rprimitive (int32, is_native_int, error_overlap=False)
- mypy_extensions.char stub
Boxing / unboxing:
- CPyChar_FromObject (accepts 0/1-char str, -113 on type error)
- CPyChar_ToStr (uses interned empty-str singleton for -1)
- bool(char) checks != -1, not != 0, so "\0" stays truthy
Codegen fast paths:
- try_specialize_codepoint_compare in transform_comparison_expr handles char/char, char/s[i], char/0-or-1-char-literal, and s[i]/literal uniformly, compiling to int compare of the codepoint
- ord(s[i]) refactored to share the codepoint read path
- char.isspace/isdigit/isalnum/isalpha/isidentifier/upper method_ops route to codepoint-taking C helpers in str_extra_ops.h
- CPyChar_IsIdentifier delegates to PyUnicode_IsIdentifier for non-ASCII (correct XID_Start handling rather than Py_UNICODE_ISALPHA approximation)
- CPyChar_Upper falls back to str.upper() for non-ASCII, returning the original codepoint when upper() produces multiple chars (e.g. ß -> SS) since char holds one codepoint
New IR transform pass (runs after lower_ir, before dep collection):
- char_str_index_fold: folds Unbox(CPyStr_GetItem(s, i) -> char) to a direct CPyStr_GetCharAt int32 read, avoiding the 1-char PyObject alloc
Also adds str.isalpha() method_op via CPyStr_IsAlpha.
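The boxing and truthiness rules above can be modeled in plain Python. This is only an illustrative sketch, not the C helpers themselves: the function names are hypothetical, the C-level -113 error return is modeled as a `TypeError`, and rejecting strings longer than one character and passing the sentinel through `char_upper_sketch` are assumptions.

```python
EMPTY = -1  # sentinel codepoint for the empty string, as described above


def char_from_str_sketch(s: str) -> int:
    """Unbox a 0- or 1-character str into a codepoint (or the sentinel)."""
    # Assumption: anything else is rejected; the C helper signals errors differently.
    if not isinstance(s, str) or len(s) > 1:
        raise TypeError("char requires a str of length 0 or 1")
    return ord(s) if s else EMPTY


def char_to_str_sketch(c: int) -> str:
    """Box a codepoint back into a str; the sentinel becomes the empty string."""
    return "" if c == EMPTY else chr(c)


def char_bool_sketch(c: int) -> bool:
    # Truthiness compares against the sentinel, not zero, so "\0" stays truthy.
    return c != EMPTY


def char_upper_sketch(c: int) -> int:
    """Uppercase one codepoint, keeping the original if upper() expands it."""
    if c == EMPTY:
        return EMPTY
    upper = chr(c).upper()
    # A char holds exactly one codepoint, so a multi-char result cannot be stored.
    return ord(upper) if len(upper) == 1 else c
```

Under this model `char_upper_sketch(ord("ß"))` yields the codepoint of `ß` again, because `"ß".upper()` is the two-character string `"SS"`.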
Tests:
- run-char.test covers boxing/unboxing, bool semantics (NUL is truthy, empty is falsy), equality, classification methods (including non-ASCII XID_Start for isidentifier), upper (including ß -> ß pinning for the multi-char fallback), str promotion, concatenation, s[i]=="x" specialization, ord, and astral-plane codepoints.
- char stub added to test-data/unit/lib-stub/mypy_extensions.pyi so the test harness can resolve the type.
Five small changes needed to get the matrix green on the fork's release-1.20 branch (all platform/version drift, no mypyc logic changes):
- Run black==25.9.0 over files that diverged from the pinned pre-commit version: emitmodule.py, emitwrapper.py, expression.py, char_str_index_fold.py, test_subclass_base.py.
- Gate run-char.test on `mypy_extensions.char` actually being importable at runtime. The fork ships a stub but the experimental runtime isn't on PyPI, so CI installs stock mypy_extensions and every char test fails with ImportError. Skip the file when char is missing; keep running it locally where the patched runtime is present.
- Skip test_decode_with_extra_data_after_padding on Python 3.13+. CPython 3.13.x and 3.14 tightened base64.b64decode to raise on trailing data after padding; our lenient native implementation doesn't, so the stdlib equivalence check diverges. Guarded by sys.version_info with the operator type:ignore that run-async.test already uses for the same check.
- Cap pathspec at <1.1 in pyproject.toml. pathspec 1.1.0 (released 2026-04-23) made PathSpec a Generic, which trips a `Missing type arguments` error when mypyc self-compiles mypy/modulefinder.py during the build-env install. test-requirements.txt already pins 1.0.0 but the build env only sees the pyproject constraint.
- Drop the `type: ignore[attr-defined]` on the `from mypy_extensions import char` check. The typeshed stub declares `class char`, so mypy flags the ignore as unused; the runtime still handles the stock-mypy_extensions case via the surrounding `except ImportError`.
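The two runtime gates described above (char importability and the Python 3.13+ skip) could look roughly like this. The helper names are hypothetical, and how the test harness consumes them is not shown in the source.

```python
import sys


def char_runtime_available_sketch() -> bool:
    """Return True only if the experimental `char` runtime type can be imported."""
    try:
        from mypy_extensions import char  # noqa: F401
    except ImportError:
        # Stock mypy_extensions (or no mypy_extensions at all) lands here.
        return False
    return True


def should_skip_trailing_data_test_sketch() -> bool:
    """Skip the trailing-data base64 equivalence check on Python 3.13 and later."""
    return sys.version_info >= (3, 13)
```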
…emental build
mypyc_build builds Extension.depends from get_header_deps(), which regex-matches
every `#include "foo"` and `#include <foo>` in the generated C and prepends
target_dir. That works for `<sqlglot/__native_errors.h>` (resolves to
`build/sqlglot/__native_errors.h`, exists) but produces nonexistent paths for:
- lib-rt headers like `<CPy.h>`, `<Python.h>` -> `build/CPy.h`
(the C compiler resolves these via -I, not target_dir)
- per-module relative includes like `"__native_athena.h"` -> `build/__native_athena.h`
(the actual file is at `build/sqlglot/parsers/__native_athena.h`,
relative to the includer's directory)
setuptools' newer_group with missing="newer" treats every missing dep as
"newer than target", so any extension whose group ran codegen this build
was always recompiled. With separate=True that's anywhere from 0 to ~half
the codebase per incremental build, regardless of what actually changed.
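To make the `missing="newer"` behaviour concrete, here is a simplified pure-Python model of a `newer_group`-style check. It is a sketch for illustration, not setuptools' code; the function name is hypothetical.

```python
import os


def newer_group_sketch(sources, target, missing="error"):
    """Return True if `target` is out of date relative to any of `sources`."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for source in sources:
        if not os.path.exists(source):
            if missing == "error":
                raise FileNotFoundError(source)
            if missing == "ignore":
                continue
            # missing == "newer": a nonexistent dependency forces a rebuild.
            return True
        if os.path.getmtime(source) > target_mtime:
            return True
    return False
```

With a dependency list containing a path such as `build/CPy.h` that never exists, this check reports the target as stale on every build, which is the always-recompile behaviour described above.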
Resolve includes per-cfile against (cfile_dir, target_dir), keeping any
candidate that exists. lib-rt headers don't change between builds so
dropping them from depends is safe; per-module headers under target_dir
are preserved as the genuine cross-module struct-layout deps.
Run the resolution in a second pass over all groups so sibling-group
headers exist before each cfile's deps are checked.
Verified against sqlglot[c] (separate=True, ~100 modules):
Edit                   Before fix      After fix
no-op rebuild          44 recompiles   0
parsers/snowflake.py   44              2 (snowflake parser+generator)
parsers/mysql.py       44              5 (mysql + 4 subclasses)
expressions/core.py    44              ~90 (real closure)
Pre-fix was wrong in both directions: too many recompiles for leaf edits, too
few for center edits (the same 44 modules every time, regardless of impact).
Per Copilot review on #5: `resolve_cfile_deps` previously tried the includer's directory first regardless of include kind, which differs from the C preprocessor's actual behavior for `#include <foo>` (angle-bracket form skips the includer's dir, only -I paths are searched).
For mypyc's emitted code the two paths happen to converge in practice (cross-group angled includes always use a qualified prefix, `<lib/__native_functions.h>`, that won't collide with anything in the includer's dir), but if a future emit introduces an unqualified angled include, the resolver would record the wrong file's path and mtime, leading to subtle incremental-rebuild bugs.
Carry the include kind through resolution:
- `_INCLUDE_RE` is rewritten as an alternation whose two capture groups separate the quoted vs angle-bracket forms. A small `_extract_includes` helper turns matches into `(is_angled, name)` tuples.
- `get_header_deps` now returns `list[tuple[bool, str]]`. Only one internal caller (in `mypyc_build`), updated accordingly.
- `resolve_cfile_deps` consults `(includer_dir, target_dir)` for quoted includes and `(target_dir,)` only for angled ones, matching what the C preprocessor actually does.
Unit tests in `mypyc/test/test_misc.py::TestHeaderDeps` updated to the new return shape, and a new `test_resolve_search_order_matches_preprocessor` asserts that the same header name resolves to the includer-dir copy under quoted form and the target_dir copy under angled form.
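A minimal sketch of carrying the include kind through resolution might look like the following. The regex, helper names, and return conventions are assumptions made for illustration and are not copied from `mypyc/build.py`.

```python
import os
import re

# Alternation with two capture groups: group 1 = quoted form, group 2 = angled form.
_INCLUDE_RE_SKETCH = re.compile(
    r'^\s*#\s*include\s+(?:"([^"]+)"|<([^>]+)>)', re.MULTILINE
)


def extract_includes_sketch(text):
    """Return (is_angled, name) tuples for every #include found in `text`."""
    result = []
    for match in _INCLUDE_RE_SKETCH.finditer(text):
        quoted, angled = match.groups()
        result.append((True, angled) if quoted is None else (False, quoted))
    return result


def resolve_include_sketch(is_angled, name, includer_dir, target_dir):
    """Resolve one include using the search order of its kind, or return None."""
    # Angled includes skip the includer's directory; quoted ones try it first.
    search_dirs = (target_dir,) if is_angled else (includer_dir, target_dir)
    for base in search_dirs:
        candidate = os.path.join(base, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # e.g. lib-rt headers, which live outside both directories
```

When the same header name exists in both directories, the quoted form picks the copy next to the includer and the angled form picks the copy under `target_dir`.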
mypyc_build's get_header_deps regex-matched only `#include "..."` and
only scanned the .c file's contents, not the headers it transitively
includes. That misses the cross-group export-table header chain:
    __native_<mod>.c
      #include "__native_internal_<mod>.h"        <-- picked up
        #include <other_group/__native_other.h>   <-- MISSED
                                                  (angle brackets, and inside a header)
`__native_internal_<mod>.h` is where mypyc emits the cross-group
`struct export_table_<other_group>` declaration, by `#include`ing the
other group's `__native_<other>.h`. The consumer's .c file then accesses
exported classes/functions as `exports_<other_group>.CPyDef_<X>`, which
gcc/clang resolve to byte offsets into that struct at C compile time
and bake into the consumer's .o.
When the cross-group header is missing from `Extension.depends`,
setuptools' `newer_group` doesn't see it as a reason to recompile the
consumer, so an incremental edit that shifts struct offsets in the
producer (e.g. inserting a new class earlier in the file, which adds
slots to its `export_table_<group>`) leaves the consumer's .o pointing
at stale offsets. The baked-in offset for `X` now resolves to whatever
class or function the producer's new layout placed at that slot, and
the consumer silently constructs the wrong thing — no compile error,
no load error, just `make_target()` returning `Inserted` instead of
`Target`.
Fix in two parts:
1. `_INCLUDE_RE` now matches both `"foo"` and `<foo>` includes.
Lib-rt headers (`<Python.h>`, `<CPy.h>`, etc.) don't resolve under
either the includer's dir or target_dir, so they're dropped during
resolution and don't add spurious rebuilds.
2. Extract the dep resolution into `resolve_cfile_deps` and make the
walk transitive: each resolved .h file is opened and re-scanned
for its own includes, with the search dir set to that header's
own directory. This is what `gcc -M` would do, and matches the
actual C preprocessor's view of the dep graph. The walk is
bounded by the `resolved` set (no revisits) and by the fact that
only paths existing under `(includer_dir, target_dir)` are
followed, so it terminates trivially.
Pre-existing in mypyc and only reachable once the prior
over-conservative 44-file always-rebuild was lifted (1.20.0.post5),
because that wasteful behavior kept cross-group consumers
self-consistently rebuilt by accident.
Verified against a 4-file MRE with package re-export (mimicking
`from .functions import *`): cold build returns correct classes;
inserting `NewClass` between `Beta` and `Gamma` and running an
incremental build (with the same `.mypy_cache/` and `build/`)
previously returned `NewClass` from `make_gamma()` and `Gamma` from
`make_delta()`; after the fix, `caller__mypyc.o` is correctly
recompiled and both functions return their expected classes.
Adds unit tests in `mypyc/test/test_misc.py::TestHeaderDeps` covering
the regex change, transitive header walking (the exact bug scenario),
the lib-rt drop behavior, and the includer-dir-first resolution
preference.
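The transitive walk described in part 2 can be sketched with a worklist. This is a simplified illustration under stated assumptions: the names are hypothetical, the regex ignores include kind, and the first existing candidate in `(includer_dir, target_dir)` order is taken.

```python
import os
import re

_INC_SKETCH = re.compile(r'#\s*include\s+["<]([^">]+)[">]')


def walk_header_deps_sketch(cfile, target_dir):
    """Collect every header reachable from `cfile` that exists on disk."""
    resolved = set()
    worklist = [cfile]
    while worklist:
        path = worklist.pop()
        with open(path, encoding="utf-8") as f:
            text = f.read()
        # Each file's includes are searched relative to that file's own directory.
        includer_dir = os.path.dirname(path)
        for name in _INC_SKETCH.findall(text):
            for base in (includer_dir, target_dir):
                candidate = os.path.normpath(os.path.join(base, name))
                if os.path.isfile(candidate):
                    if candidate not in resolved:  # no revisits, so the walk ends
                        resolved.add(candidate)
                        worklist.append(candidate)
                    break
            # Names found under neither directory (lib-rt headers) are dropped.
    return sorted(resolved)
```

Because only files that exist under the two search directories are ever queued, and each is queued at most once, the loop is bounded by the number of headers in the build tree.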
…ental builds
In separate=True mode, when generate_c returns empty cfiles for a group (the fully-cached path, typical of pip's second setup.py invocation), per_cfile_deps was never populated for that group. Extension.depends therefore stayed empty, so cross-group export-table header changes caused by inserting a new class that shifts struct offsets never triggered a recompile of the cached consumer's .o. The stale .o then baked in the old struct offsets, silently resolving them to wrong classes at runtime.
Fix: when the on-disk .c file exists for a cached group, read it before calling get_header_deps so the dep resolver can walk the transitive header chain and include cross-group headers in Extension.depends.
Also fixes an inconsistent errors="replace" in resolve_cfile_deps (now plain encoding="utf-8" throughout) and adds a test that directly demonstrates the before/after behavior.
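The fallback for cached groups amounts to choosing which text to scan for includes. A minimal sketch, with a hypothetical helper name and signature:

```python
import os


def contents_for_dep_scan_sketch(cfile_path: str, ctext: str) -> str:
    """Return the C source to scan for #include lines.

    Freshly generated text wins; for a cached group (empty `ctext`) fall back
    to the .c file that a previous build left on disk.
    """
    if ctext:
        return ctext
    if os.path.isfile(cfile_path):
        with open(cfile_path, encoding="utf-8") as f:
            return f.read()
    return ""  # nothing generated and nothing on disk: no deps can be derived
```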
With `separate=True` and cross-module inheritance, when a subclass module
is recompiled incrementally without its parent (parent loaded from
mypy's cache, so `ClassDef.defs.body` is empty), `find_attr_initializers`
gathers no defaults from the parent. The subclass therefore has no
`__mypyc_defaults_setup` of its own, and `ClassIR.get_method` returns
the parent's. `emit_attr_defaults_func_call` then emitted a raw
`CPyDef_<parent>___...` call with no cross-group export-table prefix,
producing C that fails to compile:
error: call to undeclared function
'CPyDef_<parent_module>___<Parent>_____mypyc_defaults_setup'
The parent's header only declares the function as a pointer inside
`struct export_table_<group>`, so the symbol isn't reachable as a free
function from the subclass's compilation unit.
Apply `emitter.get_group_prefix(defaults_fn.decl)` at this call site,
matching the pattern already used by `emit_setup_or_dunder_new_call`,
`generate_constructor_for_class`, and the other cross-group call sites
in `emitclass.py`. `get_group_prefix` returns `""` for same-group calls
(intra-group behaviour unchanged) and `"exports_<group>."` when the
target lives in a different group; it also registers the target group
in `context.group_deps` so the right header gets `#include`d.
Reproducer (`base.py` with attribute defaults, `child.py` empty subclass,
`mypycify([...], separate=True)`): cold build succeeds, then touching
only `child.py` and rebuilding previously failed with the
implicit-declaration error. Generated C now correctly emits
`exports_base.CPyDef_base___Parent_____mypyc_defaults_setup(...)` and
`Child().x` returns the inherited default.
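The prefixing rule can be illustrated with a toy model. The functions below are hypothetical stand-ins, not mypyc's `Emitter` API, and the `(self)` argument list is a placeholder for the elided arguments.

```python
def group_prefix_sketch(target_group: str, current_group: str, group_deps: set) -> str:
    """Return the export-table prefix needed to call into `target_group`."""
    if target_group == current_group:
        return ""  # same-group calls reference the symbol directly
    # Record the dependency so the other group's header gets included.
    group_deps.add(target_group)
    return f"exports_{target_group}."


def emit_defaults_call_sketch(symbol, target_group, current_group, group_deps):
    """Build the C call expression for a defaults-setup function."""
    prefix = group_prefix_sketch(target_group, current_group, group_deps)
    return f"{prefix}{symbol}(self)"
```

Calling `emit_defaults_call_sketch` for a parent that lives in group `base` from group `child` produces an `exports_base.`-qualified call and leaves `"base"` in the dependency set.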
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproduces the bug fixed in the parent commit: under TestRunSeparate, the subclass module gets recompiled while the parent module is loaded from mypy's cache (so `ClassDef.defs.body` is empty and the subclass inherits no own `__mypyc_defaults_setup`). The emitted call to the parent's setup function must use the cross-group `exports_<group>.` prefix or the generated C fails to compile.
The test passes under TestRun and TestRunMultiFile (which don't exercise cross-group calls) and fails under TestRunSeparate without the fix. Verified by temporarily reverting `emit_attr_defaults_func_call` to the pre-fix form and observing the implicit-declaration error in `__native_other_a.c`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
detect_undefined_bitmap() was extending cl.bitmap_attrs in place. Under
separate=True each SCC's analyze_always_defined_attrs is invoked once per
group, and detect_undefined_bitmap recurses through cl.base_mro from the
subclass into its base classes. The seen set passed in dedupes within one
call but is fresh per call, so every subclass-group call re-extends the
shared base class's bitmap_attrs with another copy of the contributions.
The base class's emitted ObjectStruct then grows by one bitmap field per
~32 subclasses processed in the same build. The exact final length is a
function of how many SCCs went through compile_scc_to_ir this run:
- clean build: every SCC fresh -> base bitmap_attrs accumulates fully
- incremental build affecting N subclasses: base accumulates a fraction
- second incremental: yet another count
Subclasses not rebuilt this round still see their base's old, larger
struct layout. Any attribute access on the base segfaults with a
mismatched bitmap-field offset.
Pre-existing in mypyc; only manifested once the prior over-conservative
44-file always-rebuild was lifted (1.20.0.post5), because that wasteful
behavior kept rebuild sets self-consistent.
Fix: compute a fresh local list and assign at the end. The function
becomes naturally idempotent across repeated calls — same input, same
output, regardless of how many groups have visited the class. No new
fields, no serialization changes.
Verified against sqlglot[c] (separate=True, ~100 modules):
Edit: add a method to MySQLParser (a class with 7 dialect subclasses)
Before: parser.h struct layout differs between clean and incremental
builds; make unitc segfaults at first parser-using test.
After: parser.h identical between clean and incremental;
make unitc passes (1163 tests, 0 segfaults).
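The accumulation bug and the fix can be reproduced with a toy class hierarchy. This is a simplified model, not mypyc's `ClassIR`; all names are hypothetical.

```python
class ToyClassSketch:
    def __init__(self, name, overlap_attrs, bases=()):
        self.name = name
        self.overlap_attrs = list(overlap_attrs)  # attrs that need a bitmap bit
        self.bases = list(bases)
        self.bitmap_attrs = []


def detect_in_place_sketch(cl, seen):
    """Buggy shape: extends the shared list, so repeated calls accumulate."""
    if cl in seen:
        return
    seen.add(cl)
    for base in cl.bases:
        detect_in_place_sketch(base, seen)
    if cl.bases:
        cl.bitmap_attrs.extend(cl.bases[0].bitmap_attrs)
    cl.bitmap_attrs.extend(cl.overlap_attrs)


def detect_fresh_sketch(cl, seen):
    """Fixed shape: build a local list and assign it, so calls are idempotent."""
    if cl in seen:
        return
    seen.add(cl)
    for base in cl.bases:
        detect_fresh_sketch(base, seen)
    attrs = []
    if cl.bases:
        attrs.extend(cl.bases[0].bitmap_attrs)
    attrs.extend(cl.overlap_attrs)
    cl.bitmap_attrs = attrs
```

Invoking either function once per subclass with a fresh `seen` set mimics the per-group calls: the in-place version leaves the base with one duplicated entry per visiting subclass, while the fresh-list version always yields the same result.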
Unboxes IntEnum operands to int_rprimitive for native comparison instead of PyObject_RichCompare. Applied selectively:
- Ordering ops (<, <=, >, >=): always (2.5x faster in microbench)
- ==/!=: only for IntEnum vs int (IntEnum-vs-IntEnum already uses fast identity comparison via singleton pointer equality)
When both sides of == or != are type objects (TypeType from type(x) or CallableType from a class reference), use pointer identity (is/ is not) instead of PyObject_RichCompare. Type objects are singletons so identity is equivalent to equality. Microbenchmark shows 3.2-3.7x speedup.
When both sides of ==, !=, is, or is not will resolve to pointer identity comparison (no custom __eq__), pass can_borrow=True when accepting operands. This eliminates unnecessary INCREF/DECREF pairs around the comparison. The check mirrors ll_builder.compare_instances: both operands must be the same RInstance type with no __eq__, final __eq__/__ne__, no Python inheritance, and not augmented (dataclass etc). Parser-only benchmark shows 5-7% speedup on representative queries due to eliminated refcount ops in hot paths like _match().
When calling a method on a value loaded from a native struct field
(e.g. expression.args.get("key")), borrow the field value instead
of generating INCREF/DECREF. The struct owner is kept alive via
KeepAlive, guaranteeing the field value remains valid.
Eliminates ~850 INCREF/DECREF pairs in SQLGlot's generated C code.
Four issues blocked the initial publish:
- mypyc/build.py write_file unconditional on cached groups: the cda8316 cherry-pick dropped the `if ctext` guard around write_file, so an empty ctext (mypy returns this for cached groups under separate=True) overwrote the previously-emitted .c with an empty stub. The next compile then linked against a stub `.so` that re-declared cross-group export_table_<group> structs without a definition, and 9 separate=True tests failed with "incomplete type" errors. Restore the guard so cached groups keep their on-disk .c intact.
- mypyc/build.py angled-vs-quoted include kind: the cd0c079 cherry-pick (784ec63) updated type annotations to list[tuple[bool, str]] but the matching _INCLUDE_RE rewrite, _extract_includes helper, and resolve_cfile_deps worklist unpacking were dropped during conflict resolution. Restore them and update mypyc/test/test_misc.py to the new return shape so build.py typechecks cleanly under self-compile.
- cibuildwheel.toml PyPy in default matrix: cibuildwheel 2.22 enables PyPy in its default matrix. PyPy lacks prebuilt ast-serialize wheels and the build env can't bootstrap Rust for maturin, so every pp* job died. Skip pp*, *-win*, *-musllinux_aarch64, *-manylinux_i686, and free-threaded builds, matching the release-1.20 skip list.
- .github/workflows/test.yml triggering twice on tag push: dropped the `tags: ['*']` push trigger so the Tests workflow only runs on branch pushes/PRs. A combined branch+tag push previously kicked off two Tests runs on top of the one Build-and-publish run.
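The restored `if ctext` guard from the first item can be sketched as follows. The helper name and return value are hypothetical; the real `write_file` in `mypyc/build.py` has additional behaviour not modeled here.

```python
def write_generated_c_sketch(path: str, ctext: str) -> bool:
    """Write generated C to `path` unless the group was cached (empty text).

    Returns True if the file was written, False if the on-disk copy was kept.
    """
    if not ctext:
        # Cached group: leave the previously emitted .c untouched.
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(ctext)
    return True
```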
…ss groups (python#21524)
The fix for this was included in python#21369, but no dedicated test was added. This adds `testIncrementalBuiltinBaseClassConstruction` to `run-multimodule.test`: three modules compiled with `separate=True`, where step 2 changes a helper module's signature to force the caller to be recompiled while the exception module is only loaded from cache.
…thon#21547)
Fixes python#21542
Under `separate=True`, when a subclass is recompiled while its parent is loaded from mypy's incremental cache, parent default-attribute assignments are silently dropped from the subclass's `__mypyc_defaults_setup`. The first read of an inherited default-attr then raises:
```
AttributeError: attribute '<name>' of '<Parent>' undefined
```
`find_attr_initializers` walks `cdef.info.mro` and reads `info.defn.defs.body` for `AssignmentStmt`s. `ClassDef.serialize` (mypy/nodes.py) does not serialize `defs`, so a cache-loaded parent has `defs = Block([])`; the MRO walk collects no parent assignments and the subclass's emitted setup leaves inherited slots in the undefined-sentinel state.
This PR implements the fix discussed in the linked issue.
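A toy model of the MRO walk shows why a cache-loaded parent contributes nothing. The data structure and function below are hypothetical simplifications, not mypy's `TypeInfo` or mypyc's `find_attr_initializers`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyClassInfoSketch:
    name: str
    # (attribute, default) pairs standing in for AssignmentStmts in the class body.
    body: list[tuple[str, object]] = field(default_factory=list)
    mro: list[ToyClassInfoSketch] = field(default_factory=list)


def collect_defaults_sketch(info: ToyClassInfoSketch) -> dict[str, object]:
    """Gather attribute defaults by walking the MRO from base to derived."""
    defaults: dict[str, object] = {}
    for cls in reversed(info.mro):
        for attr, value in cls.body:
            defaults[attr] = value
    return defaults
```

If the parent's `body` is empty because it came from the cache, the walk returns no entry for the inherited attribute, which corresponds to the undefined-sentinel slot described above.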
…chain fix(mypyc): preserve inherited class attribute defaults under separate=True
The cross-group header-deps work that shipped in 2.1.0.post1 was upstreamed and merged as ab8e4bf, but the improvements added during review never flowed back to release-2.1:
- mypyc_build: when a fully-cached group returns its cfile name with empty contents, re-read the on-disk .c before calling get_header_deps. The existing fallback only covered groups that return no cfile entries at all, so Extension.depends stayed empty for the shape that actually occurs and setuptools never recompiled stale consumer .o files when a dep's export-table struct layout shifted.
- get_header_deps: assert non-empty contents to keep this from regressing silently.
- fudge_dir_mtimes: stop shifting linker outputs back; combined with write_file's +1s bump this made every .c permanently newer than its .so, forcing unconditional rebuilds that masked depends bugs in tests.
- Add the testIncrementalCrossGroupExportTableOffsets regression test.
This is the bug behind the sqlglot CI segfault: a cached sqlglotc wheel shipped a stale qualify.o whose quote_identifiers slot dispatched into qualify_outputs after a new function was inserted mid-struct in qualify_columns' export table.
Apologies for the noise, `gh` resolved the base repo to the wrong upstream.