
[deckhouse-cli] Add new commands and fix bugs in the debug archive #354

Open
VaLosev wants to merge 3 commits into main from update-debug-archive

@VaLosev VaLosev commented May 18, 2026

The collect-debug-info command execution logic has been improved:

  • Fixed an issue where the stderr buffer was not cleared with stderr.Reset() after each iteration.
  • Removed the unused constant labelSelector = "leader=true", since GetDeckhousePod already checks for this label.
  • The list of files that can be excluded is now obtained through GetExcludableFiles, so it is loaded automatically.
  • RequiredModule and ExpandPerModule have been added to the Command structure to check more accurately whether a module is enabled and whether data should be collected from it. To support this, two functions have been modified: fetchActiveModules returns the list of modules with status.phase == "Ready", and filterAndExpandCommands takes that list and filters the commands collected into the archive by RequiredModule and ExpandPerModule. The final logic is as follows:
- RequiredModule == "" → The command is always included.
- RequiredModule != "", ExpandPerModule == false → The command is included if at least one enabled module has a name equal to RequiredModule or beginning with RequiredModule.
- RequiredModule != "", ExpandPerModule == true → One copy of the command is created for each matching active module, with {module-name} (only for cloud-provider) replaced by the actual module name in File and Args.
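
The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Command type is reduced to four fields, and filterAndExpand, the placeholder constant, and the sample commands are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a simplified stand-in for the structure described above.
// Only the fields needed for the filtering rules are kept (assumption).
type Command struct {
	File            string
	Args            []string
	RequiredModule  string
	ExpandPerModule bool
}

// placeholder is the token replaced with the real module name.
const placeholder = "{module-name}"

// filterAndExpand applies the three rules to cmds, given the active modules.
func filterAndExpand(cmds []Command, active []string) []Command {
	var out []Command
	for _, c := range cmds {
		// Rule 1: no required module, always include.
		if c.RequiredModule == "" {
			out = append(out, c)
			continue
		}
		for _, m := range active {
			// An exact name match is a special case of a prefix match.
			if !strings.HasPrefix(m, c.RequiredModule) {
				continue
			}
			// Rule 2: include once if any active module matches.
			if !c.ExpandPerModule {
				out = append(out, c)
				break
			}
			// Rule 3: one copy per matching module, placeholder substituted.
			cp := c
			cp.File = strings.ReplaceAll(c.File, placeholder, m)
			cp.Args = make([]string, len(c.Args))
			for i, a := range c.Args {
				cp.Args[i] = strings.ReplaceAll(a, placeholder, m)
			}
			out = append(out, cp)
		}
	}
	return out
}

func main() {
	// Sample commands and module names are illustrative only.
	cmds := []Command{
		{File: "queue.txt"},
		{File: "istio.json", RequiredModule: "istio"},
		{
			File:            "{module-name}-logs.txt",
			Args:            []string{"logs", "-n", "d8-{module-name}"},
			RequiredModule:  "cloud-provider",
			ExpandPerModule: true,
		},
	}
	active := []string{"cert-manager", "cloud-provider-aws"}
	for _, c := range filterAndExpand(cmds, active) {
		fmt.Println(c.File, c.Args)
	}
}
```

With these sample inputs the istio command is dropped because no active module matches it, and the cloud-provider command is expanded once for cloud-provider-aws.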

To prevent excessive API load in busy clusters, two flags have been added for finer control over command execution:

  • The --command-timeout flag sets the maximum execution time for a single collection command. If a command hangs or the API is unresponsive, it is interrupted after the timeout (2m by default).
  • The --request-interval flag sets a minimum pause between command starts to limit load on the Kubernetes API. By default, there is no delay.
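
For illustration, the behaviour of the two flags could look roughly like the sketch below. It assumes each collection command is modelled as a function that takes a context. The names runAll and demo are hypothetical, and the real implementation in this PR may differ.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runAll runs steps one by one. Each step gets its own deadline
// (commandTimeout), and consecutive starts are spaced by at least
// requestInterval. A zero interval disables the pause.
func runAll(steps []func(context.Context) error, commandTimeout, requestInterval time.Duration) []error {
	errs := make([]error, len(steps))
	var lastStart time.Time
	for i, step := range steps {
		if requestInterval > 0 && !lastStart.IsZero() {
			// Sleep only for the part of the interval not yet elapsed.
			if wait := requestInterval - time.Since(lastStart); wait > 0 {
				time.Sleep(wait)
			}
		}
		lastStart = time.Now()
		ctx, cancel := context.WithTimeout(context.Background(), commandTimeout)
		errs[i] = step(ctx)
		cancel() // release the timer before the next iteration
	}
	return errs
}

// demo runs one quick step and one step that hangs until its deadline.
func demo() (quickOK, hungTimedOut bool) {
	quick := func(ctx context.Context) error { return nil }
	hung := func(ctx context.Context) error {
		<-ctx.Done() // simulates an unresponsive API call
		return ctx.Err()
	}
	errs := runAll([]func(context.Context) error{quick, hung},
		50*time.Millisecond, 10*time.Millisecond)
	return errs[0] == nil, errors.Is(errs[1], context.DeadlineExceeded)
}

func main() {
	quickOK, hungTimedOut := demo()
	fmt.Println("quick step succeeded:", quickOK)
	fmt.Println("hung step timed out:", hungTimedOut)
}
```

A per-command context keeps one stuck command from blocking the rest of the collection, while the interval spaces out requests without affecting the timeout of any single command.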

An example of command execution has been added to Examples:

> d8 system collect-debug-info --help
Collect debug info from Deckhouse Kubernetes Platform.

 © Flant JSC 2025

Usage:
  d8 system collect-debug-info [flags] > deckhouse-debug-$(date +"%Y_%m_%d").tar.gz

Examples:
  # The --exclude flag can be used to exclude specific elements from the archive build:
  d8 system collect-debug-info --exclude ccm-logs,csi-controller-logs > deckhouse-debug-$(date +"%Y_%m_%d").tar.gz

  # The --list-exclude flag can be used to list all elements that can be excluded from the archive build
  d8 system collect-debug-info --list-exclude

Flags:
      --command-timeout duration    Timeout for each individual debug command execution (default 2m0s)
      --exclude strings             Exclude specific files from the debug archive. Use comma-separated values
  -h, --help                        help for collect-debug-info
  -l, --list-exclude                List all files that can be excluded from the debug archive
      --request-interval duration   Minimum interval between debug command executions to avoid overloading the cluster (e.g. 200ms, 500ms, 1s). Zero disables rate limiting (default 0s)

Global Flags:
      --context string      The name of the kubeconfig context to use
  -k, --kubeconfig string   KubeConfig of the cluster. (default is $KUBECONFIG when it is set, $HOME/.kube/config otherwise) (default "/Users/valery.losev/.kube/config")

Data added to the archive:

  • cert-manager-logs.txt
  • certificate-cert-manager.json
  • kube-system-control-plane-manager-logs.txt
  • kube-system-etcd-logs.txt
  • kube-system-kube-apiserver-logs.txt
  • kube-system-kube-controller-manager-logs.txt
  • kube-system-kube-scheduler-logs.txt
  • kube-system-kube-dns-logs.txt
  • prometheusremotewrites.json
  • mutatingwebhookconfigurations.json
  • validatingwebhookconfigurations.json
  • storage-deckhouse-io-terminating.txt (checks all objects in --api-group=storage.deckhouse.io and lists those that have metadata.deletionTimestamp set, meaning deletion has already been requested, but that are stuck in “terminating”)
  • namespace.json
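
The check behind storage-deckhouse-io-terminating.txt amounts to selecting objects whose metadata.deletionTimestamp is set. A minimal sketch of that selection over a JSON object list is shown below. The function stuckTerminating, the ExampleVolume kind, and the sample data are hypothetical, and the actual command in this PR may be implemented differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// objectList mirrors only the fields of a Kubernetes-style list
// that the check needs.
type objectList struct {
	Items []struct {
		Kind     string `json:"kind"`
		Metadata struct {
			Name              string  `json:"name"`
			DeletionTimestamp *string `json:"deletionTimestamp"`
		} `json:"metadata"`
	} `json:"items"`
}

// stuckTerminating returns "kind/name" for every object that has
// metadata.deletionTimestamp set, i.e. deletion was requested but
// the object still exists.
func stuckTerminating(raw []byte) ([]string, error) {
	var list objectList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	var out []string
	for _, it := range list.Items {
		if it.Metadata.DeletionTimestamp != nil {
			out = append(out, it.Kind+"/"+it.Metadata.Name)
		}
	}
	return out, nil
}

func main() {
	// ExampleVolume is a made-up kind used only for this illustration.
	raw := []byte(`{"items":[
		{"kind":"ExampleVolume","metadata":{"name":"a"}},
		{"kind":"ExampleVolume","metadata":{"name":"b","deletionTimestamp":"2026-05-18T09:00:00Z"}}
	]}`)
	stuck, err := stuckTerminating(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(stuck)
}
```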

P.S. Istio data collection now only occurs when the module is enabled.

Signed-off-by: Losev Valery <valery.losev@flant.com>
@VaLosev VaLosev self-assigned this May 18, 2026
@VaLosev VaLosev requested a review from ldmonster as a code owner May 18, 2026 09:31
VaLosev added 2 commits May 18, 2026 13:52
Signed-off-by: Losev Valery <valery.losev@flant.com>
Signed-off-by: Losev Valery <valery.losev@flant.com>