feat(vi/cvi): auto-recover VirtualImage and ClusterVirtualImage from ImageLost #2564

Open
danilrwx wants to merge 1 commit into main from feat/imagelost-auto-recovery

danilrwx commented Jul 1, 2026

Description

Automatically recover VirtualImage and ClusterVirtualImage that entered the ImageLost phase (image is missing in DVCR).

Previously the LifeCycleHandler did an unconditional early return for ImageLost, so the only way to recover was to delete and recreate the resource. Now, when the image is lost:

  • for recoverable data sources (HTTP, ContainerImage, ObjectRef) the controller resets the status to Pending, runs sources.CleanUp and requeues, reusing the same import-restart path already used on spec changes;
  • for Upload the resource stays in ImageLost, since the data was uploaded once and cannot be re-fetched.
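
The per-source decision above can be sketched as a small pure function. This is only an illustration under stated assumptions: the type names, constants and `handleImageLost` are hypothetical and not taken from the PR diff, and the real handler additionally runs sources.CleanUp and emits an event, which is omitted here.

```go
package main

// DataSourceType and Phase are hypothetical stand-ins for the API types;
// only the string values come from the PR description.
type DataSourceType string

type Phase string

const (
	DataSourceHTTP           DataSourceType = "HTTP"
	DataSourceContainerImage DataSourceType = "ContainerImage"
	DataSourceObjectRef      DataSourceType = "ObjectRef"
	DataSourceUpload         DataSourceType = "Upload"

	PhasePending   Phase = "Pending"
	PhaseReady     Phase = "Ready"
	PhaseImageLost Phase = "ImageLost"
)

// isRecoverable reports whether the image data can be fetched again
// from its original source without user action.
func isRecoverable(src DataSourceType) bool {
	switch src {
	case DataSourceHTTP, DataSourceContainerImage, DataSourceObjectRef:
		return true
	default:
		// Upload (and anything unknown) cannot be re-fetched.
		return false
	}
}

// handleImageLost returns the phase the resource should move to and
// whether the reconcile should be requeued to restart the import.
// Phases other than ImageLost pass through unchanged.
func handleImageLost(phase Phase, src DataSourceType) (next Phase, requeue bool) {
	if phase != PhaseImageLost {
		return phase, false
	}
	if !isRecoverable(src) {
		return PhaseImageLost, false
	}
	return PhasePending, true
}
```

Calling `handleImageLost(PhaseImageLost, DataSourceUpload)` in this sketch leaves the phase at ImageLost with no requeue, while any of the three recoverable sources yields Pending with a requeue.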

A VirtualImageLostRecovering / ClusterVirtualImageLostRecovering event is emitted on recovery start (a recorder was added to the CVI LifeCycleHandler, which previously had none). The ImagePVCLost phase is not affected.

Handler order (LifeCycleHandler, then ImagePresenceHandler) rules out a race: after the phase is reset to Pending, the presence handler (which only acts on Ready) skips the resource within the same reconcile.
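
A toy model of that ordering argument, assuming a reconcile simply runs handlers in sequence over shared state. The `image` struct, its fields and both handler functions are hypothetical simplifications, not the controller's actual types.

```go
package main

// Phase is a hypothetical stand-in for the resource phase type.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseReady     Phase = "Ready"
	PhaseImageLost Phase = "ImageLost"
)

// image is a simplified view of a VI/CVI for this sketch.
type image struct {
	Phase       Phase
	Recoverable bool // true for HTTP, ContainerImage, ObjectRef sources
	InDVCR      bool // whether the image currently exists in DVCR
}

// handler mutates the shared state, as handlers in one reconcile do.
type handler func(*image)

// lifeCycle restarts the import for a lost but recoverable image.
func lifeCycle(img *image) {
	if img.Phase == PhaseImageLost && img.Recoverable {
		img.Phase = PhasePending
	}
}

// imagePresence only inspects Ready resources; any other phase is skipped.
func imagePresence(img *image) {
	if img.Phase != PhaseReady {
		return
	}
	if !img.InDVCR {
		img.Phase = PhaseImageLost
	}
}

// reconcile runs the handlers in the order stated in the description.
func reconcile(img *image) {
	for _, h := range []handler{lifeCycle, imagePresence} {
		h(img)
	}
}
```

In this model a lost, recoverable image leaves `reconcile` in Pending: by the time `imagePresence` runs, the phase is no longer Ready, so it cannot flip the resource back to ImageLost in the same pass.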

Why do we need it, and what problem does it solve?

The image presence monitoring correctly moves VI/CVI to ImageLost when the image disappears from DVCR (manual deletion, storage failure, garbage collection), but the resource then gets stuck there. For most sources the data is fully reproducible: the URL, external registry or referenced cluster resource is still available, so a re-import would restore the image without user action.

This is especially valuable after a mass DVCR storage failure, where re-creating dozens of resources by hand is impractical.

What is the expected result?

  1. Create a VI/CVI with an HTTP / ContainerImage / ObjectRef source, wait for Ready.
  2. Remove the image from DVCR, wait for ImageLost.
  3. The resource automatically goes through Pending/Provisioning and returns to Ready; a VirtualImageLostRecovering event is recorded.
  4. For an Upload source, the resource stays in ImageLost and no re-import happens.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: feature
summary: Automatically recover VirtualImage and ClusterVirtualImage from the ImageLost phase for recoverable data sources.

danilrwx marked this pull request as ready for review July 1, 2026 13:41
danilrwx added this to the v1.10.0 milestone Jul 1, 2026
danilrwx changed the title from "feat: auto-recover VirtualImage and ClusterVirtualImage from ImageLost" to "feat(vi/cvi): auto-recover VirtualImage and ClusterVirtualImage from ImageLost" Jul 1, 2026
Restart the import process when a Ready image is lost in DVCR, for
recoverable data sources (HTTP, ContainerImage, ObjectRef). Upload
images stay in ImageLost since their data cannot be re-fetched.

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
danilrwx force-pushed the feat/imagelost-auto-recovery branch from 338e9b5 to 9454c18 July 1, 2026 14:00