docs(metrics): update the metrics documentation #9799
Sync proposals/metrics.md with metrics.go by adding the missing entries and reordering the existing ones to match the source order.

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
| Metric name | Metric type | Labels | Description |
| ----------- | ----------- | ------ | ----------- |
| nap_enabled | Gauge | | Whether or not Node Autoprovisioning is enabled. 1 if it is, 0 otherwise. |
While we're here, can we clean up nap_enabled? This metric no longer exists.
Actually, I think we can delete this entire section (starting from L159), as the node group metrics were added above.
What type of PR is this?
/kind documentation
/area cluster-autoscaler
What this PR does / why we need it:
This PR expands the metrics documentation by adding the metrics that are not yet documented, and re-syncs `proposals/metrics.md` with `metrics.go` so the doc reflects the actual implementation.

`cluster-autoscaler/proposals/metrics.md` is one of the entry points users rely on to discover what metrics Cluster Autoscaler exposes, but a number of metrics defined in `cluster-autoscaler/metrics/metrics.go` have never been listed there. This makes it hard for operators to learn which metrics are available and what their labels are without reading the source code.

Newly documented metrics:

- `--emit-per-nodegroup-metrics`
- `failed_node_creations_total`
- `failed_gpu_scale_ups_total`
- `unneeded_nodes_count`
- `unremovable_nodes_count`
- `scale_down_in_cooldown`
- `overflowing_controllers_count`
- `created_node_groups_total`
- `deleted_node_groups_total`
- `node_taints_count`
- `inconsistent_instances_migs_count`
- `binpacking_heterogeneity`
- `max_node_skip_eval_duration_seconds`
- `node_removal_latency_seconds`
- `dra_node_template_resources_mismatch`

Other changes:

- Reorders the existing entries to match the source order in `metrics.go`, so future drift is easier to spot.
- Adds `gpu_resource_name`/`dra_drivers` to the scale-up/down rows, `type` to `unschedulable_pods_count`, and `eviction_result` to `evicted_pods_total`.
- Marks `scaled_up_gpu_nodes_total`, `scaled_down_gpu_nodes_total`, and `failed_gpu_scale_ups_total` as deprecated since 1.36.0, in line with their `DeprecatedVersion` in code.

The doc is intentionally only re-synced; no metrics are added, removed or renamed in this PR.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: