Skip to content

refactor(dpmodel/dpa4)#5599

Open
OutisLi wants to merge 5 commits into
deepmodeling:masterfrom
OutisLi:pr/dpmodel
Open

refactor(dpmodel/dpa4)#5599
OutisLi wants to merge 5 commits into
deepmodeling:masterfrom
OutisLi:pr/dpmodel

Conversation

@OutisLi

@OutisLi OutisLi commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

This is first PR of porting the whole dpa4 descriptor

Summary by CodeRabbit

  • New Features

    • Added Cartesian tensor-product mixing options for SeZM/DPA4 descriptors (edge_cartesian, node_cartesian) with mixing_layers control.
    • Expanded LoRA support with adapter injection, merging, and LoRA-free checkpoint export.
    • Added opt-in runtime acceleration (Triton paths when available) plus eval-time activation checkpointing for the interaction block.
  • Bug Fixes

    • Improved destination-wise edge aggregation/softmax behavior, edge masking with sparse inputs, and more consistent dtype/precision handling.
    • Strengthened descriptor serialization/deserialization and variable restoration round-trips.
  • API Changes

    • SeZM SO(2) parameter renamed: so2_layersmixing_layers (with so2_layers kept as an alias).

Comment thread deepmd/dpmodel/descriptor/dpa4_nn/block.py Dismissed
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/ffn.py Fixed
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/lora.py Fixed
Comment thread deepmd/pt_expt/descriptor/dpa4.py Fixed
@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds Cartesian tensor-product routing for SeZM/DPA4, refactors dpmodel building blocks and serialization, introduces LoRA adapters, adds pt_expt Triton/runtime overrides, and updates compile and test coverage for the new descriptor paths.

Changes

SeZM/DPA4 descriptor extension and dpmodel port refactor

Layer / File(s) Summary
Cartesian routing and descriptor wiring
deepmd/pt/model/descriptor/sezm_nn/cartesian.py, deepmd/dpmodel/descriptor/dpa4_nn/cartesian.py, deepmd/pt/model/descriptor/sezm_nn/edge_cache.py, deepmd/dpmodel/descriptor/dpa4_nn/edge_cache.py, deepmd/pt/model/descriptor/sezm.py, deepmd/pt/model/descriptor/sezm_nn/so2.py, deepmd/pt/model/descriptor/sezm_nn/block.py, deepmd/dpmodel/descriptor/dpa4_nn/__init__.py, deepmd/pt/model/descriptor/sezm_nn/__init__.py, deepmd/utils/argcheck.py, examples/water/dpa4/input.json
Adds Cartesian tensor-product modules and helpers, threads edge_cartesian/node_cartesian/mixing_layers through SeZM wiring, and adds the build_wigner switch for edge-cache builders.
Depth attention and interaction block
deepmd/dpmodel/descriptor/dpa4_nn/attn_res.py, deepmd/dpmodel/descriptor/dpa4_nn/block.py, deepmd/pt_expt/descriptor/dpa4_nn/block.py
Adds DepthAttnRes, refactors SeZMInteractionBlock forward modes and LayerScale handling, and adds pt_expt activation-checkpointed execution.
Core dpmodel building blocks
deepmd/dpmodel/descriptor/dpa4_nn/activation.py, deepmd/dpmodel/descriptor/dpa4_nn/attention.py, deepmd/dpmodel/descriptor/dpa4_nn/norm.py, deepmd/dpmodel/descriptor/dpa4_nn/so3.py, deepmd/dpmodel/descriptor/dpa4_nn/radial.py, deepmd/dpmodel/descriptor/dpa4_nn/embedding.py, deepmd/dpmodel/descriptor/dpa4_nn/ffn.py, deepmd/dpmodel/descriptor/dpa4_nn/indexing.py, deepmd/dpmodel/descriptor/dpa4_nn/utils.py, deepmd/dpmodel/descriptor/dpa4_nn/edge_cache.py, deepmd/dpmodel/descriptor/dpa4_nn/attention.py
Refactors activation, attention, norm, SO3, radial, embedding, FFN, indexing, utils, and edge-cache helpers for the dpmodel backend.
Wigner-D, grid projection, and grid net
deepmd/dpmodel/descriptor/dpa4_nn/wignerd.py, deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py, deepmd/dpmodel/descriptor/dpa4_nn/projection.py
Rewrites Wigner-D evaluation and grid-projection/grid-net plumbing, including e3nn support and unified variable loading.
LoRA adapters and merge utilities
deepmd/dpmodel/descriptor/dpa4_nn/lora.py
Introduces LoRA wrappers and merge/state-dict helpers for SeZM.
pt_expt Triton kernels and runtime overrides
deepmd/pt_expt/descriptor/dpa4_nn/triton/so2_rotation.py, deepmd/pt_expt/descriptor/dpa4_nn/triton/radial_mix.py, deepmd/pt_expt/descriptor/dpa4_nn/triton/__init__.py, deepmd/pt_expt/descriptor/dpa4_nn/radial.py, deepmd/pt_expt/descriptor/dpa4_nn/so2.py, deepmd/pt_expt/descriptor/dpa4.py, deepmd/pt_expt/descriptor/dpa4_nn/__init__.py
Adds fused Triton SO(2) rotation and radial-mix kernels, runtime overrides, wrapper modules, and package availability wiring.
Compile fixes and test updates
deepmd/pt/model/model/sezm_model.py, deepmd/pt/utils/compile_compat.py, source/tests/...
Fixes trace_pad_dim, adjusts eval tracing behavior, and updates tests for Cartesian paths, sparse edges, Wigner parity, LoRA, Triton fallback, and serialization contracts.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • deepmodeling/deepmd-kit#5448: Implements matching SeZM/DPA4 descriptor components in the dpmodel backend, including the segment_envelope_gated_softmax and interaction wiring used here.
  • deepmodeling/deepmd-kit#5483: Touches the same SeZMModel.trace_and_compile path in deepmd/pt/model/model/sezm_model.py.
  • deepmodeling/deepmd-kit#5503: Implements the corresponding mixing_layers and edge_cartesian/node_cartesian SeZM API changes on the PT backend.

Suggested labels

enhancement, Python, Examples

Suggested reviewers

  • iProzd
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is related to the PR, but it is very broad and doesn’t convey the main change or scope. Use a more specific title that names the primary port or feature, such as the DPA4/SeZM descriptor refactor or backend port.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
deepmd/dpmodel/descriptor/dpa4_nn/norm.py (1)

354-361: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Restore mmax bounds validation before deriving balance weights.

With mmax < 0, n_coeff_l can become non-positive, producing invalid balance weights and NaNs in call(). Keep the constructor aligned with the documented 0 <= mmax <= lmax contract.

Proposed fix
         self.lmax = int(lmax)
         self.mmax = int(mmax)
+        if self.lmax < 0:
+            raise ValueError("`lmax` must be non-negative")
+        if self.mmax < 0:
+            raise ValueError("`mmax` must be non-negative")
+        if self.mmax > self.lmax:
+            raise ValueError("`mmax` must be <= `lmax`")
         self.channels = int(channels)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa4_nn/norm.py` around lines 354 - 361, Restore
the missing `mmax` validation in the `Norm.__init__` constructor before any
balance-weight derivation happens. Ensure the constructor still enforces the
documented `0 <= mmax <= lmax` contract, alongside the existing `lmax` check, so
`n_coeff_l` cannot become non-positive and propagate invalid values into
`call()`. Use the `Norm` initializer and the `mmax`/`lmax` guard logic as the
place to fix this.
deepmd/dpmodel/descriptor/dpa4_nn/attention.py (1)

77-78: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Restore the padded-layout shape guard before reshaping.

nnei = n_edge // n_nodes now turns n_nodes == 0 into ZeroDivisionError, and non-multiple E values fail later during reshape. Keep the explicit contract check here so callers get the intended error.

Proposed fix
     n_channel = n_focus * n_head
     eps_f = float(eps)
+    if n_nodes <= 0 or n_edge % n_nodes != 0:
+        raise ValueError("`logits.shape[0]` must be a positive multiple of `n_nodes`")
     nnei = n_edge // n_nodes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa4_nn/attention.py` around lines 77 - 78, Restore
the explicit padded-layout shape validation in attention.py before computing
nnei or reshaping: the logic around eps_f, n_edge, and n_nodes should reject
invalid inputs up front instead of relying on n_edge // n_nodes. Add back the
contract check in the attention reshaping path so n_nodes == 0 and non-multiple
edge counts raise the intended error, and keep this guard near the code that
derives nnei and reshapes the attention tensors.
🧹 Nitpick comments (2)
deepmd/dpmodel/descriptor/dpa4_nn/so3.py (1)

505-505: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Validate deterministic expand_index on load.

expand_index is fully determined by lmax; accepting the serialized value directly can silently corrupt degree-to-weight mapping if a checkpoint is stale or malformed but shape-compatible.

Proposed fix
-        obj.expand_index = np.asarray(variables["expand_index"], dtype=np.int64)
+        loaded_expand_index = np.asarray(variables["expand_index"], dtype=np.int64)
+        expected_expand_index = map_degree_idx(obj.lmax)
+        if not np.array_equal(loaded_expand_index, expected_expand_index):
+            raise ValueError("Invalid SO3Linear expand_index for configured lmax")
+        obj.expand_index = expected_expand_index
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa4_nn/so3.py` at line 505, The load path for
`SO3` is trusting the serialized `expand_index` value directly, even though it
is fully derived from `lmax`. Update the deserialization logic in `so3.py` so
`expand_index` is recomputed from the model’s `lmax` during load (or at least
validated against the expected deterministic value) instead of accepting the
checkpoint contents blindly. Keep the change localized to the object restore
flow around `obj.expand_index` to prevent stale or malformed checkpoints from
overriding the correct degree-to-weight mapping.
deepmd/pt_expt/descriptor/dpa4_nn/radial.py (1)

73-78: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Fail loudly on unexpected RadialMLP.net entries.

This fallback currently turns every non-Module/non-NativeOP entry into _ScalarActivation. If the dpmodel list shape changes later, this wrapper will silently build the wrong module tree.

Possible tightening
     def _convert_layer(self, layer: Any) -> torch.nn.Module:
         if isinstance(layer, torch.nn.Module):
             return layer
         if isinstance(layer, NativeOP):
             return try_convert_module(layer)
-        return _ScalarActivation(self.activation_function)
+        if callable(layer):
+            return _ScalarActivation(self.activation_function)
+        raise TypeError(f"Unsupported RadialMLP.net entry: {type(layer)!r}")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/descriptor/dpa4_nn/radial.py` around lines 73 - 78, The
_convert_layer method in RadialMLP currently falls back to _ScalarActivation for
any unexpected entry, which can silently mask bad dpmodel shapes. Tighten the
logic so only the known cases (torch.nn.Module and NativeOP) are accepted and
raise an explicit error for anything else, using _convert_layer and
RadialMLP.net as the key locations to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/descriptor/dpa4_nn/attn_res.py`:
- Around line 163-176: The main issue is that the `forward` logic in
`attn_res.py` dereferences `sources[0]` and calls `scalar_extractor(current_x)`
without validating required inputs first. Add explicit guards near the start of
the method to reject an empty `sources` sequence and to require `current_x`
whenever `self.input_dependent` is enabled, using the existing
`scalar_extractor` and `query_proj` path as the location to enforce the
contract. Raise a clear, descriptive error before any indexing or projection
happens so invalid calls fail fast instead of producing opaque runtime errors.

In `@deepmd/dpmodel/descriptor/dpa4_nn/block.py`:
- Around line 297-302: The dpmodel deserialization path is not
backward-compatible because older configs still use so2_layers while the
constructor now expects mixing_layers. Update the block’s config-loading logic
in the relevant deserialization/compatibility entry point for DPA4 block classes
(including the paths around the referenced constructor and any shared loaders)
to translate legacy so2_layers into mixing_layers before calling cls(**config).
Keep the old key readable, preserve existing behavior when mixing_layers is
already present, and avoid passing both keys into the constructor.

In `@deepmd/dpmodel/descriptor/dpa4_nn/embedding.py`:
- Around line 257-258: The reshape-based aggregation in embedding.py is only
valid for the padded build_edge_cache layout, so the sparse
build_edge_cache_from_edges path must not use `nnei = n_edge // n_nodes` or the
related reshape logic. Update the relevant aggregation methods in the dpa4_nn
embedding flow to either keep the scatter/index-add implementation for sparse
edge caches or explicitly guard these reshape paths so they only run on padded
inputs. Use the existing edge-cache handling in the descriptor/embedding methods
to distinguish padded versus sparse cases and apply the same rule to the other
reshape-based aggregations in this file.

In `@deepmd/dpmodel/descriptor/dpa4_nn/lora.py`:
- Around line 652-660: The tuple-handling in _swap_submodule is incorrect
because it treats tuple children like mutable lists and tries in-place
replacement. Update the replacement logic in _swap_submodule to either only
allow indexed assignment for list parents or, when the parent is a tuple,
rebuild the tuple with the new NativeOP and reassign it back to the container
instead of mutating it directly.
- Around line 706-739: The LoRA adapter parameters are ending up frozen because
apply_lora_to_sezm freezes every module first and the replacement
LoRASO3/LoRASO2 instances inherit that state from their serialized config.
Update the replacement logic in apply_lora_to_sezm so the injected LoRASO3 and
LoRASO2 adapters are explicitly marked trainable after construction, while
keeping the copied base weights/biases frozen under the dpmodel trainability
model. Make sure the same fix is applied in the later replacement block that
handles the other SO3Linear/SO2Linear occurrences.

In `@deepmd/dpmodel/descriptor/dpa4_nn/projection.py`:
- Around line 278-305: The e3nn quadrature setup in the projection path assumes
an even theta resolution, but `_normalize_s2_grid_resolution()` can still pass
through a custom odd `res_beta`, causing `quad_weight` to have the wrong length
and `harmonics` broadcasting to fail. Update the resolution handling in this
projection code to either reject odd e3nn `theta_resolution` values with a clear
error or normalize them to the next even value before computing `quad_weight`,
`to_grid_mat`, and `from_grid_mat`.

In `@deepmd/pt_expt/descriptor/dpa4_nn/block.py`:
- Around line 32-34: The pt_expt descriptor block is importing the dpmodel-only
exchange_ghost_features helper, which is not implemented for torch and will fail
whenever comm_dict is used. Update the import in block.py to use the
PyTorch-capable ghost exchange implementation, or add a pt_expt wrapper that
delegates to the torch version, and make sure the callers in the SO(2) message
passing path use that symbol instead of the dpmodel one.

In `@deepmd/pt_expt/descriptor/dpa4_nn/triton/radial_mix.py`:
- Around line 687-698: The backward launch in _radial_mix_bwd_kernel is mixing a
contiguous grad buffer with strides taken from the original grad_out, which can
produce incorrect reads when grad_out is non-contiguous. Update the Triton call
so the stride arguments come from the same contiguous tensor being passed into
the kernel, using the grad_out.contiguous() value consistently for the stride
calculations.

In `@deepmd/pt/model/descriptor/sezm_nn/block.py`:
- Around line 303-308: The block deserialization path is not backward compatible
because legacy serialized configs may still use so2_layers while the constructor
for the block class now expects mixing_layers. Update the config migration in
the block loading/deserialization flow (around the block constructor/init path
and any versioned load logic) to translate so2_layers into mixing_layers before
calling cls(**config), or explicitly branch on the serialized version and map
old payloads in the same block class. Ensure this compatibility handling is
applied in both affected load locations.

---

Outside diff comments:
In `@deepmd/dpmodel/descriptor/dpa4_nn/attention.py`:
- Around line 77-78: Restore the explicit padded-layout shape validation in
attention.py before computing nnei or reshaping: the logic around eps_f, n_edge,
and n_nodes should reject invalid inputs up front instead of relying on n_edge
// n_nodes. Add back the contract check in the attention reshaping path so
n_nodes == 0 and non-multiple edge counts raise the intended error, and keep
this guard near the code that derives nnei and reshapes the attention tensors.

In `@deepmd/dpmodel/descriptor/dpa4_nn/norm.py`:
- Around line 354-361: Restore the missing `mmax` validation in the
`Norm.__init__` constructor before any balance-weight derivation happens. Ensure
the constructor still enforces the documented `0 <= mmax <= lmax` contract,
alongside the existing `lmax` check, so `n_coeff_l` cannot become non-positive
and propagate invalid values into `call()`. Use the `Norm` initializer and the
`mmax`/`lmax` guard logic as the place to fix this.

---

Nitpick comments:
In `@deepmd/dpmodel/descriptor/dpa4_nn/so3.py`:
- Line 505: The load path for `SO3` is trusting the serialized `expand_index`
value directly, even though it is fully derived from `lmax`. Update the
deserialization logic in `so3.py` so `expand_index` is recomputed from the
model’s `lmax` during load (or at least validated against the expected
deterministic value) instead of accepting the checkpoint contents blindly. Keep
the change localized to the object restore flow around `obj.expand_index` to
prevent stale or malformed checkpoints from overriding the correct
degree-to-weight mapping.

In `@deepmd/pt_expt/descriptor/dpa4_nn/radial.py`:
- Around line 73-78: The _convert_layer method in RadialMLP currently falls back
to _ScalarActivation for any unexpected entry, which can silently mask bad
dpmodel shapes. Tighten the logic so only the known cases (torch.nn.Module and
NativeOP) are accepted and raise an explicit error for anything else, using
_convert_layer and RadialMLP.net as the key locations to update.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5f98d8e6-9f05-462e-abee-46098a57be4e

📥 Commits

Reviewing files that changed from the base of the PR and between a9bcbc5 and 02d6fa6.

📒 Files selected for processing (52)
  • deepmd/dpmodel/descriptor/dpa4.py
  • deepmd/dpmodel/descriptor/dpa4_nn/__init__.py
  • deepmd/dpmodel/descriptor/dpa4_nn/activation.py
  • deepmd/dpmodel/descriptor/dpa4_nn/attention.py
  • deepmd/dpmodel/descriptor/dpa4_nn/attn_res.py
  • deepmd/dpmodel/descriptor/dpa4_nn/block.py
  • deepmd/dpmodel/descriptor/dpa4_nn/cartesian.py
  • deepmd/dpmodel/descriptor/dpa4_nn/edge_cache.py
  • deepmd/dpmodel/descriptor/dpa4_nn/embedding.py
  • deepmd/dpmodel/descriptor/dpa4_nn/ffn.py
  • deepmd/dpmodel/descriptor/dpa4_nn/grid_net.py
  • deepmd/dpmodel/descriptor/dpa4_nn/indexing.py
  • deepmd/dpmodel/descriptor/dpa4_nn/lora.py
  • deepmd/dpmodel/descriptor/dpa4_nn/norm.py
  • deepmd/dpmodel/descriptor/dpa4_nn/projection.py
  • deepmd/dpmodel/descriptor/dpa4_nn/radial.py
  • deepmd/dpmodel/descriptor/dpa4_nn/so2.py
  • deepmd/dpmodel/descriptor/dpa4_nn/so3.py
  • deepmd/dpmodel/descriptor/dpa4_nn/utils.py
  • deepmd/dpmodel/descriptor/dpa4_nn/wignerd.py
  • deepmd/pt/model/descriptor/sezm.py
  • deepmd/pt/model/descriptor/sezm_nn/__init__.py
  • deepmd/pt/model/descriptor/sezm_nn/block.py
  • deepmd/pt/model/descriptor/sezm_nn/cartesian.py
  • deepmd/pt/model/descriptor/sezm_nn/edge_cache.py
  • deepmd/pt/model/descriptor/sezm_nn/so2.py
  • deepmd/pt/model/model/sezm_model.py
  • deepmd/pt/utils/compile_compat.py
  • deepmd/pt_expt/descriptor/dpa4.py
  • deepmd/pt_expt/descriptor/dpa4_nn/__init__.py
  • deepmd/pt_expt/descriptor/dpa4_nn/block.py
  • deepmd/pt_expt/descriptor/dpa4_nn/radial.py
  • deepmd/pt_expt/descriptor/dpa4_nn/so2.py
  • deepmd/pt_expt/descriptor/dpa4_nn/triton/__init__.py
  • deepmd/pt_expt/descriptor/dpa4_nn/triton/radial_mix.py
  • deepmd/pt_expt/descriptor/dpa4_nn/triton/so2_rotation.py
  • deepmd/utils/argcheck.py
  • examples/water/dpa4/input.json
  • source/tests/common/dpmodel/test_descrpt_dpa4.py
  • source/tests/common/dpmodel/test_dpa4_basegridnet_cross.py
  • source/tests/common/dpmodel/test_dpa4_frame_mixers.py
  • source/tests/common/dpmodel/test_dpa4_grid_descriptor.py
  • source/tests/common/dpmodel/test_dpa4_gridmlp_frames.py
  • source/tests/common/dpmodel/test_dpa4_so2_grid.py
  • source/tests/common/dpmodel/test_dpa4_so3_grid_utils.py
  • source/tests/common/dpmodel/test_dpa4_so3_gridnet.py
  • source/tests/common/dpmodel/test_lebedev_sh.py
  • source/tests/pt/model/test_descriptor_sezm.py
  • source/tests/pt/model/test_dpa4_dpmodel_parity.py
  • source/tests/pt/model/test_sezm_model.py
  • source/tests/pt_expt/descriptor/test_dpa4.py
  • source/tests/pt_expt/descriptor/test_dpa4_ckpt_triton.py
💤 Files with no reviewable changes (4)
  • source/tests/common/dpmodel/test_dpa4_frame_mixers.py
  • source/tests/pt_expt/descriptor/test_dpa4.py
  • source/tests/common/dpmodel/test_dpa4_basegridnet_cross.py
  • source/tests/common/dpmodel/test_lebedev_sh.py

Comment thread deepmd/dpmodel/descriptor/dpa4_nn/attn_res.py
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/block.py
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/embedding.py Outdated
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/lora.py
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/lora.py
Comment thread deepmd/dpmodel/descriptor/dpa4_nn/projection.py
Comment thread deepmd/pt_expt/descriptor/dpa4_nn/block.py
Comment thread deepmd/pt_expt/descriptor/dpa4_nn/triton/radial_mix.py
Comment thread deepmd/pt/model/descriptor/sezm_nn/block.py
@codecov

codecov Bot commented Jun 28, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 65.88589% with 1136 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.62%. Comparing base (a9bcbc5) to head (145d989).

Files with missing lines Patch % Lines
.../pt_expt/descriptor/dpa4_nn/triton/so2_rotation.py 19.54% 461 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/lora.py 45.10% 202 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/cartesian.py 20.23% 134 Missing ⚠️
...md/pt_expt/descriptor/dpa4_nn/triton/radial_mix.py 29.11% 112 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/wignerd.py 84.72% 106 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/block.py 83.61% 29 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/edge_cache.py 77.87% 25 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/radial.py 71.92% 16 Missing ⚠️
deepmd/pt/model/descriptor/sezm_nn/so2.py 94.41% 11 Missing ⚠️
deepmd/dpmodel/descriptor/dpa4_nn/projection.py 87.67% 9 Missing ⚠️
... and 11 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5599      +/-   ##
==========================================
- Coverage   82.35%   81.62%   -0.73%     
==========================================
  Files         896      908      +12     
  Lines      100952   103436    +2484     
  Branches     4059     4058       -1     
==========================================
+ Hits        83138    84434    +1296     
- Misses      16349    17540    +1191     
+ Partials     1465     1462       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@OutisLi OutisLi enabled auto-merge June 28, 2026 11:35
Comment thread source/tests/common/dpmodel/test_dpa4_sparse_edges.py Dismissed

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/dpmodel/descriptor/dpa4_nn/embedding.py (1)

147-156: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Normalize precision before serializing.

These serializers index PRECISION_DICT with self.precision, but the constructors accept mixed-case values via self.precision.lower(). SeZMTypeEmbedding(precision="Float64").serialize() now raises KeyError, and the same regression repeats in GeometricInitialEmbedding and EnvironmentInitialEmbedding.

Proposed fix
-                "precision": np.dtype(PRECISION_DICT[self.precision]).name,
+                "precision": np.dtype(PRECISION_DICT[self.precision.lower()]).name,
-            "precision": np.dtype(PRECISION_DICT[self.precision]).name,
+            "precision": np.dtype(PRECISION_DICT[self.precision.lower()]).name,
-                "precision": np.dtype(PRECISION_DICT[self.precision]).name,
+                "precision": np.dtype(PRECISION_DICT[self.precision.lower()]).name,

Also applies to: 336-343, 667-683

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/dpmodel/descriptor/dpa4_nn/embedding.py` around lines 147 - 156, The
serializer methods for SeZMTypeEmbedding, GeometricInitialEmbedding, and
EnvironmentInitialEmbedding are indexing PRECISION_DICT with the raw precision
value, which can fail when callers pass mixed-case strings that were accepted by
the constructors. Update each serialize() implementation to normalize
self.precision before the PRECISION_DICT lookup, matching the lowercasing
already used in the corresponding constructors, so serialization works
consistently for values like "Float64".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@deepmd/dpmodel/descriptor/dpa4_nn/embedding.py`:
- Around line 147-156: The serializer methods for SeZMTypeEmbedding,
GeometricInitialEmbedding, and EnvironmentInitialEmbedding are indexing
PRECISION_DICT with the raw precision value, which can fail when callers pass
mixed-case strings that were accepted by the constructors. Update each
serialize() implementation to normalize self.precision before the PRECISION_DICT
lookup, matching the lowercasing already used in the corresponding constructors,
so serialization works consistently for values like "Float64".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9cdaa53a-d8f8-4924-84b5-e04c7fc2562d

📥 Commits

Reviewing files that changed from the base of the PR and between 02d6fa6 and 2ff1f10.

📒 Files selected for processing (11)
  • deepmd/dpmodel/array_api.py
  • deepmd/dpmodel/descriptor/dpa4_nn/attention.py
  • deepmd/dpmodel/descriptor/dpa4_nn/embedding.py
  • deepmd/dpmodel/descriptor/dpa4_nn/ffn.py
  • deepmd/dpmodel/descriptor/dpa4_nn/lora.py
  • deepmd/dpmodel/descriptor/dpa4_nn/so2.py
  • deepmd/pt_expt/descriptor/__init__.py
  • deepmd/pt_expt/descriptor/dpa4.py
  • source/tests/common/dpmodel/test_dpa4_lora.py
  • source/tests/common/dpmodel/test_dpa4_sparse_edges.py
  • source/tests/pt/model/test_dpa4_dpmodel_parity.py
💤 Files with no reviewable changes (2)
  • deepmd/pt_expt/descriptor/dpa4.py
  • deepmd/dpmodel/descriptor/dpa4_nn/ffn.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • deepmd/dpmodel/descriptor/dpa4_nn/lora.py
  • source/tests/pt/model/test_dpa4_dpmodel_parity.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
deepmd/pt_expt/utils/serialization.py (1)

1009-1045: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Consider extracting the inductor config save/disable/restore into a context manager.

The threshold + assert_indirect_indexing snapshot/override/restore block is now duplicated verbatim in both the regular and with-comm compile paths. A small context manager centralizes the documented thread-safety caveat in one place and prevents the two copies from silently diverging later.

♻️ Proposed context-manager extraction
from contextlib import contextmanager


`@contextmanager`
def _inductor_compile_overrides(*, disable_fusion: bool):
    # See module notes: torch._inductor.config is a process-wide singleton;
    # this save/restore is NOT thread-safe. Serialise .pt2 exports per process.
    import torch._inductor.config as _inductor_config

    saved_threshold = _inductor_config.realize_opcount_threshold
    saved_assert_indexing = _inductor_config.assert_indirect_indexing
    if disable_fusion:
        _inductor_config.realize_opcount_threshold = 0
    _inductor_config.assert_indirect_indexing = False
    try:
        yield
    finally:
        _inductor_config.realize_opcount_threshold = saved_threshold
        _inductor_config.assert_indirect_indexing = saved_assert_indexing

Usage at both call sites:

-    saved_threshold = _inductor_config.realize_opcount_threshold
-    saved_assert_indexing = _inductor_config.assert_indirect_indexing
-    if is_cuda:
-        _inductor_config.realize_opcount_threshold = 0
-    _inductor_config.assert_indirect_indexing = False
-    try:
-        aoti_compile_and_package(exported, package_path=model_file)
-    finally:
-        _inductor_config.realize_opcount_threshold = saved_threshold
-        _inductor_config.assert_indirect_indexing = saved_assert_indexing
+    with _inductor_compile_overrides(disable_fusion=is_cuda):
+        aoti_compile_and_package(exported, package_path=model_file)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt_expt/utils/serialization.py` around lines 1009 - 1045, The inductor
config save/override/restore logic is duplicated in both
`aoti_compile_and_package` compile paths, so extract it into a shared context
manager (for example, a helper near the serialization/export code) to keep the
process-wide `torch._inductor.config` handling in one place. Move the
`realize_opcount_threshold` and `assert_indirect_indexing` snapshot/restore
behavior into that helper, then wrap both the regular export and the `with_comm`
export call sites with it so the behavior stays identical and cannot diverge.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 1009-1045: The inductor config save/override/restore logic is
duplicated in both `aoti_compile_and_package` compile paths, so extract it into
a shared context manager (for example, a helper near the serialization/export
code) to keep the process-wide `torch._inductor.config` handling in one place.
Move the `realize_opcount_threshold` and `assert_indirect_indexing`
snapshot/restore behavior into that helper, then wrap both the regular export
and the `with_comm` export call sites with it so the behavior stays identical
and cannot diverge.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 92be69fa-51ce-4796-821c-790f683b1700

📥 Commits

Reviewing files that changed from the base of the PR and between 2ff1f10 and 145d989.

📒 Files selected for processing (1)
  • deepmd/pt_expt/utils/serialization.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants