Revert "Add wheel support for Newton-Schulz method via cuSolverMp" #3151

Merged
ksivaman merged 1 commit into main from revert-3004-expand_wheel_builds
Jun 29, 2026

@ksivaman

Member

Reverts #3004

Needs more thought on NVIDIA cuda-python library dependencies in the JAX ecosystem.

@ksivaman ksivaman added the 2.17 label Jun 28, 2026
@greptile-apps

greptile-apps Bot commented Jun 28, 2026

Contributor

Greptile Summary

This PR reverts the wheel-level cuSolverMp integration added in #3004, removing it from the Docker build images, the wheel build script, the PyPI dependency list, and the runtime library loader. The underlying C++ Newton-Schulz implementation and its NVTE_WITH_CUSOLVERMP CMake opt-in are intentionally left intact for manual/source builds.

  • Dockerfile.aarch / Dockerfile.x86: cuSolverMp dnf installation, symlinks, ldconfig entry, and CUSOLVERMP_HOME env-var are all removed; LD_LIBRARY_PATH is trimmed accordingly.
  • build_wheels.sh / setup.py: the NVTE_WITH_CUSOLVERMP=1 export and the nvidia-cusolvermp-cu* PyPI install requirement are dropped, so released wheels no longer pull in the cuSolverMp Python package.
  • transformer_engine/common/__init__.py: _is_cusolvermp_installed_in_system() and the conditional ctypes load are removed; _CUSOLVERMP_LIB_CTYPES has no remaining references.
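
For readers unfamiliar with this kind of runtime loader, the pattern described in the last bullet can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code that was removed: the helper names `find_system_library` and `load_optional_library` are hypothetical, and only the standard-library `ctypes` module is used.

```python
import ctypes
import ctypes.util
from typing import Optional


def find_system_library(name: str) -> Optional[str]:
    """Return what the platform's library search resolves for `name`, or None."""
    # ctypes.util.find_library searches the usual system library locations.
    return ctypes.util.find_library(name)


def load_optional_library(name: str) -> Optional[ctypes.CDLL]:
    """Load a shared library if it is installed; otherwise return None."""
    resolved = find_system_library(name)
    if resolved is None:
        return None
    try:
        # RTLD_GLOBAL makes the library's symbols visible to later-loaded extensions.
        return ctypes.CDLL(resolved, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # The library was located but could not be loaded (e.g. a missing dependency).
        return None


# Hypothetical usage: the handle stays None on machines without the library.
optional_handle = load_optional_library("cusolverMp")
```

With a loader of this shape, removing it only affects machines where the library would have been found. Everywhere else the handle was already `None`.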

Confidence Score: 5/5

Safe to merge — this is a targeted revert of wheel-level cuSolverMp wiring with no functional regressions; the C++ implementation remains intact behind its opt-in build flag.

All removed code is self-contained: Docker image setup, a wheel build export, a PyPI dependency call, and a ctypes loader that has no remaining references. The C++ cuSolverMp path is still available via the NVTE_WITH_CUSOLVERMP CMake flag for manual builds, so nothing is permanently deleted.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| build_tools/utils.py | Removes cusolvermp_pypi_package_name() and trims the importlib.metadata import; PackageNotFoundError is now referenced via importlib.metadata.PackageNotFoundError rather than the named import (flagged in prior thread). |
| build_tools/wheel_utils/Dockerfile.aarch | Removes cuSolverMp dnf package installation, symlink setup, ldconfig entry, and the CUSOLVERMP_HOME env-var; otherwise unchanged. |
| build_tools/wheel_utils/Dockerfile.x86 | Mirror of the Dockerfile.aarch change: removes cuSolverMp installation and the CUSOLVERMP_HOME env-var from the x86 wheel build image. |
| build_tools/wheel_utils/build_wheels.sh | Drops the export NVTE_WITH_CUSOLVERMP=1 line so cuSolverMp is no longer activated during wheel builds. |
| setup.py | Removes the cusolvermp_pypi_package_name() call from install_reqs; the opt-in NVTE_WITH_CUSOLVERMP CMake path is intentionally left in place for manual/source builds. |
| transformer_engine/common/__init__.py | Removes the _is_cusolvermp_installed_in_system() helper and the runtime ctypes load of cusolverMp; _CUSOLVERMP_LIB_CTYPES has no surviving references, so the removal is complete and safe. |
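
The opt-in path that remains for source builds can be illustrated with a small sketch. The environment variable NVTE_WITH_CUSOLVERMP and the CMake flag -DNVTE_WITH_CUSOLVERMP=ON are named in this summary; the helper `cusolvermp_cmake_flags` and the rule that only the value "1" enables the feature are assumptions made for illustration, not the actual setup.py logic.

```python
import os
from typing import List, Mapping


def cusolvermp_cmake_flags(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return extra CMake flags when the opt-in environment variable is set."""
    # Assumption: a value of "1" enables the feature; anything else leaves it off.
    if env.get("NVTE_WITH_CUSOLVERMP", "0") == "1":
        return ["-DNVTE_WITH_CUSOLVERMP=ON"]
    return []


# Wheel builds after the revert: the variable is not exported, so no flag is added.
flags_for_wheel = cusolvermp_cmake_flags({})
# Manual source build that opts in explicitly.
flags_for_source = cusolvermp_cmake_flags({"NVTE_WITH_CUSOLVERMP": "1"})
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.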

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Wheel Build Triggered] --> B[Dockerfile.x86 / Dockerfile.aarch]
    B -->|before revert| C[Install cuSolverMp via dnf\nSet CUSOLVERMP_HOME]
    B -->|after revert| D[No cuSolverMp installation]
    C --> E[build_wheels.sh]
    D --> E
    E -->|before revert| F[export NVTE_WITH_CUSOLVERMP=1]
    E -->|after revert| G[No cuSolverMp flag]
    F --> H[setup.py cmake flags -DNVTE_WITH_CUSOLVERMP=ON]
    G --> I[setup.py: NVTE_WITH_CUSOLVERMP not set]
    H --> J[__init__.py loads cusolverMp ctypes]
    I --> K[__init__.py: no cusolverMp loading]

Reviews (2): Last reviewed commit: "Revert "Add wheel support for Newton-Sch..."

Comment thread build_tools/utils.py
import platform
from pathlib import Path
-from importlib.metadata import PackageNotFoundError, distribution, version as get_version
+from importlib.metadata import version as get_version

Contributor

P2 The revert dropped PackageNotFoundError from the named import, and the catch site now refers to it as importlib.metadata.PackageNotFoundError. This works because import importlib is present and the from importlib.metadata import statement loads the submodule, but the style is inconsistent. Re-adding the name to the existing from ... import line keeps the catch site readable and consistent with how the rest of the file uses importlib.metadata symbols.

Suggested change
-from importlib.metadata import version as get_version
+from importlib.metadata import PackageNotFoundError, version as get_version

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
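
The claim that both spellings work can be checked with a short standalone snippet. It is an illustration only: the distribution name below is a made-up placeholder that is assumed not to be installed.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version as get_version

# Importing from importlib.metadata also binds the submodule as an attribute
# of the importlib package, so both spellings name the same exception class.
same_class = importlib.metadata.PackageNotFoundError is PackageNotFoundError

caught = None
try:
    get_version("a-distribution-that-is-not-installed")  # hypothetical name
except importlib.metadata.PackageNotFoundError as exc:
    # The fully qualified form catches the exception just like the short form.
    caught = type(exc).__name__
```

Since the two names refer to one class, choosing between them is purely a readability decision, which is why the comment is rated P2.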

Comment thread build_tools/utils.py
version_tuple = tuple(int(part) for part in version_str.split(".") if part.isdigit())
return version_tuple
-except PackageNotFoundError:
+except importlib.metadata.PackageNotFoundError:

Contributor

P2 Pairing change: once PackageNotFoundError is re-added to the from importlib.metadata import line above, the except clause can return to the shorter, original form.

Suggested change
-except importlib.metadata.PackageNotFoundError:
+except PackageNotFoundError:
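
To show the suggested form in context, here is a minimal sketch of a version lookup built around the parsing expression quoted in this thread. The function names and the choice to return `None` when the distribution is missing are assumptions. The visible diff does not show what the real handler in build_tools/utils.py does.

```python
from importlib.metadata import PackageNotFoundError, version as get_version
from typing import Optional, Tuple


def parse_version_tuple(version_str: str) -> Tuple[int, ...]:
    """Keep only the purely numeric dot-separated components of a version."""
    return tuple(int(part) for part in version_str.split(".") if part.isdigit())


def installed_version_tuple(dist_name: str) -> Optional[Tuple[int, ...]]:
    """Return the installed version as a tuple, or None if it is not installed."""
    try:
        return parse_version_tuple(get_version(dist_name))
    except PackageNotFoundError:
        # Assumption: a missing distribution is reported as None.
        return None
```

Note that components such as `dev0+abc` are skipped rather than rejected, so `parse_version_tuple("2.17.0.dev0+abc")` yields `(2, 17, 0)`.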

This reverts commit 20e185c.

Signed-off-by: ksivamani <ksivamani@nvidia.com>
@ksivaman ksivaman force-pushed the revert-3004-expand_wheel_builds branch from fec7c65 to d638d41 June 28, 2026 23:01
@ksivaman

Member Author

/te-ci

@ksivaman ksivaman merged commit ee78711 into main Jun 29, 2026
9 of 15 checks passed