
docs(metrics): update the metrics documentation #9799

Open

ttsuuubasa wants to merge 1 commit into kubernetes:master from ttsuuubasa:ca-docs-metrics


@ttsuuubasa

Contributor

What type of PR is this?

/kind documentation
/area cluster-autoscaler

What this PR does / why we need it:

This PR expands the metrics documentation by adding the metrics that are not yet documented, and re-syncs proposals/metrics.md with metrics.go so the doc reflects the actual implementation.

cluster-autoscaler/proposals/metrics.md is one of the entry points users rely on to discover what metrics Cluster Autoscaler exposes, but a number of metrics defined in cluster-autoscaler/metrics/metrics.go have never been listed there. This makes it hard for operators to learn which metrics are available and what their labels are without reading the source code.

Newly documented metrics:

  • Per-node-group metrics gated behind --emit-per-nodegroup-metrics
  • failed_node_creations_total
  • failed_gpu_scale_ups_total
  • unneeded_nodes_count
  • unremovable_nodes_count
  • scale_down_in_cooldown
  • overflowing_controllers_count
  • created_node_groups_total
  • deleted_node_groups_total
  • node_taints_count
  • inconsistent_instances_migs_count
  • binpacking_heterogeneity
  • max_node_skip_eval_duration_seconds
  • node_removal_latency_seconds
  • dra_node_template_resources_mismatch

Other changes:

  • Reorder the "Cluster state" and "Cluster Autoscaler operations" tables to match the declaration order in metrics.go, so future drift is easier to spot.
  • Fix labels on existing rows so they match what is actually emitted (e.g. add gpu_resource_name / dra_drivers to the scale-up/down rows, type to unschedulable_pods_count, eviction_result to evicted_pods_total).
  • Mark scaled_up_gpu_nodes_total, scaled_down_gpu_nodes_total, and failed_gpu_scale_ups_total as deprecated since 1.36.0, in line with their DeprecatedVersion in code.
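Keeping the tables in declaration order also makes drift mechanically checkable. Below is a minimal sketch of such a check, assuming the doc rows are Markdown tables whose first column is the metric name and that the list of names declared in metrics.go is supplied by hand. The helpers docMetricNames, notListed and sameOrder are hypothetical and are not part of this PR or the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// docMetricNames returns the first-column values of every Markdown table row,
// in order, skipping the "Metric name" header row and the "---" separator row.
func docMetricNames(markdown string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(markdown))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "|") {
			continue // not a table row
		}
		cells := strings.Split(strings.Trim(line, "|"), "|")
		name := strings.TrimSpace(cells[0])
		if name == "" || name == "Metric name" || strings.Trim(name, "-") == "" {
			continue // empty cell, header row, or separator row
		}
		names = append(names, name)
	}
	return names
}

// notListed returns the entries of want that do not appear in have.
// notListed(code, doc) finds undocumented metrics; notListed(doc, code) finds stale rows.
func notListed(want, have []string) []string {
	seen := make(map[string]bool, len(have))
	for _, n := range have {
		seen[n] = true
	}
	var missing []string
	for _, n := range want {
		if !seen[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

// sameOrder reports whether the names present in both lists appear
// in the same relative order in the code and in the doc.
func sameOrder(codeNames, docNames []string) bool {
	inCode := make(map[string]bool, len(codeNames))
	for _, n := range codeNames {
		inCode[n] = true
	}
	inDoc := make(map[string]bool, len(docNames))
	for _, n := range docNames {
		inDoc[n] = true
	}
	var fromCode, fromDoc []string
	for _, n := range codeNames {
		if inDoc[n] {
			fromCode = append(fromCode, n)
		}
	}
	for _, n := range docNames {
		if inCode[n] {
			fromDoc = append(fromDoc, n)
		}
	}
	if len(fromCode) != len(fromDoc) {
		return false
	}
	for i := range fromCode {
		if fromCode[i] != fromDoc[i] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical inputs: a doc excerpt and a hand-copied list of names
	// standing in for the declarations in metrics.go.
	doc := "| Metric name | Metric type | Labels | Description |\n" +
		"| ----------- | ----------- | ------ | ----------- |\n" +
		"| unneeded_nodes_count | Gauge | | (description) |\n" +
		"| nap_enabled | Gauge | | (description) |\n"
	code := []string{"unneeded_nodes_count", "unremovable_nodes_count"}

	docNames := docMetricNames(doc)
	fmt.Println("missing from doc:", notListed(code, docNames))
	fmt.Println("stale in doc:", notListed(docNames, code))
	fmt.Println("same order:", sameOrder(code, docNames))
}
```

With these sample inputs the check reports unremovable_nodes_count as missing from the doc and nap_enabled as a stale row, while the shared names are in the same order. Comparing only the names present in both lists keeps the order check independent of the missing/stale checks.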

The doc is intentionally only re-synced; no metrics are added, removed or renamed in this PR.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

Sync proposals/metrics.md with metrics.go by adding the missing
entries and reordering the existing ones to match the source order.

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. area/cluster-autoscaler Issues or PRs related to the Cluster Autoscaler component needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 11, 2026
@k8s-ci-robot

Contributor

This issue is currently awaiting triage.

If SIG Autoscaling contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ttsuuubasa
Once this PR has been reviewed and has the lgtm label, please assign aleksandra-malinowska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 11, 2026
@k8s-ci-robot

Contributor

Hi @ttsuuubasa. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 11, 2026

| Metric name | Metric type | Labels | Description |
| ----------- | ----------- | ------ | ----------- |
| nap_enabled | Gauge | | Whether or not Node Autoprovisioning is enabled. 1 if it is, 0 otherwise. |

Contributor


While we're here, can we clean up nap_enabled? This metric no longer exists.

Contributor


Actually, I think we can delete this entire section (starting from L159), since the node group metrics were added above.

