Skip to content

Enabling MetaTensor Persistent Caching#8940

Open
ericspod wants to merge 3 commits into
Project-MONAI:devfrom
ericspod:metatensor_persistence
Open

Enabling MetaTensor Persistent Caching#8940
ericspod wants to merge 3 commits into
Project-MONAI:devfrom
ericspod:metatensor_persistence

Conversation

@ericspod

Copy link
Copy Markdown
Member

Addresses GHSA-636w-j999-g7x5.

Description

This modifies the PersistentDataset class to permit storing MetaTensor objects. This is done by relying on the torch.load functionality to load only safe object types and those white-listed with torch.serialization.add_safe_globals. This also uses sha256 hashing in place of md5 in case of security concerns.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
@ericspod ericspod requested review from KumoLiu and Nic-Ma as code owners June 21, 2026 20:42
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 496aab82-83f4-44db-b5b9-f1943356205d

📥 Commits

Reviewing files that changed from the base of the PR and between 0ee969c and 19453a1.

📒 Files selected for processing (2)
  • monai/data/dataset.py
  • tests/data/test_persistentdataset.py
💤 Files with no reviewable changes (1)
  • monai/data/dataset.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/data/test_persistentdataset.py

📝 Walkthrough

Walkthrough

PersistentDataset now permits track_meta=True with weights_only=True after removing the validation guard. Cache-key generation in monai/data/utils.py moves from version-conditional MD5 to unconditional SHA-256 for both json_hashing and pickle_hashing; the sys import is removed. Docstrings detail weights_only behavior and how MetaTensor metadata loading may fail due to whitelisting constraints, triggering cache recomputation. Tests verify that MetaTensor objects round-trip with metadata and affine preserved, cache files load via torch.load(..., weights_only=True), and unsafe cached objects trigger pickle.UnpicklingError on direct load and UserWarning on dataset retrieval.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title directly matches main change: enabling MetaTensor persistent caching, clearly summarized in 4 words.
Description check ✅ Passed Description includes security advisory reference, clear summary of changes, and checked applicable items. Required sections are present.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/data/dataset.py (1)

280-283: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale docstring: Raises section describes removed behavior.

Lines 280-282 document a ValueError that is no longer raised since the guard was removed. Delete this section.

🧹 Proposed fix
-        Raises:
-            ValueError: When both `track_meta=True` and `weights_only=True`, since this combination
-                prevents cached MetaTensors from being reloaded and causes perpetual cache regeneration.
         """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/data/dataset.py` around lines 280 - 283, The Raises section in the
docstring describes a ValueError condition (when both track_meta=True and
weights_only=True are set) that is no longer raised by the code since the guard
was removed. Delete the entire Raises section (lines 280-283) that documents
this ValueError behavior to keep the docstring accurate and in sync with the
current implementation.
🧹 Nitpick comments (1)
tests/data/test_persistentdataset.py (1)

282-285: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Use f-string conversion flag instead of str().

Per Ruff RUF010: replace {str(Path(tempdir)/"out.txt")} with {Path(tempdir)/"out.txt"!s}.

♻️ Proposed fix
             class _BadType:
                 def __reduce__(self):
                     # something more insecure than this could be done with os.system
-                    return (os.system, (f'echo "Code injected!" > {str(Path(tempdir)/"out.txt")}',))
+                    return (os.system, (f'echo "Code injected!" > {Path(tempdir)/"out.txt"!s}',))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/data/test_persistentdataset.py` around lines 282 - 285, In the _BadType
class's __reduce__ method, replace the explicit str() call wrapping the Path
object inside the f-string with f-string conversion syntax. Change
{str(Path(tempdir)/"out.txt")} to {Path(tempdir)/"out.txt"!s} in the f-string to
use the conversion flag instead of calling str() explicitly.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@monai/data/dataset.py`:
- Around line 280-283: The Raises section in the docstring describes a
ValueError condition (when both track_meta=True and weights_only=True are set)
that is no longer raised by the code since the guard was removed. Delete the
entire Raises section (lines 280-283) that documents this ValueError behavior to
keep the docstring accurate and in sync with the current implementation.

---

Nitpick comments:
In `@tests/data/test_persistentdataset.py`:
- Around line 282-285: In the _BadType class's __reduce__ method, replace the
explicit str() call wrapping the Path object inside the f-string with f-string
conversion syntax. Change {str(Path(tempdir)/"out.txt")} to
{Path(tempdir)/"out.txt"!s} in the f-string to use the conversion flag instead
of calling str() explicitly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc50f62c-53a3-4d79-bce8-33e2e5e57fb7

📥 Commits

Reviewing files that changed from the base of the PR and between 15f5073 and 0ee969c.

📒 Files selected for processing (3)
  • monai/data/dataset.py
  • monai/data/utils.py
  • tests/data/test_persistentdataset.py

Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant