Skip to content

test: enhance e2e diagnostics for tenant readiness failures#127

Open
GatewayJ wants to merge 2 commits into
rustfs:mainfrom
GatewayJ:feat/cert-manager-tls
Open

test: enhance e2e diagnostics for tenant readiness failures#127
GatewayJ wants to merge 2 commits into
rustfs:mainfrom
GatewayJ:feat/cert-manager-tls

Conversation

@GatewayJ
Copy link
Copy Markdown
Member

Type of Change

  • Test/CI
  • New Feature
  • Bug Fix
  • Documentation
  • Performance Improvement
  • Refactor
  • Other:

Related Issues

N/A

Summary of Changes

  • Improve e2e live failure forensics by enriching Kubernetes snapshot collection.
  • Add dedicated snapshot commands for tenant describe, PVC/PV/PV path visibility, RustFS pod logs (current + previous), and tenant/Pod descriptions.
  • Add automatic snapshot diagnosis output (diagnosis.txt) and structured return values for easier failure handling.
  • Detect and explain known startup signatures, including erasure read quorum and DNS lookup failures, with recovery guidance.
  • In KindCluster cleanup/create flow, add explicit host storage directory checks:
    • pre-create reset requires directories to be empty,
    • post-delete requires directories to be absent.
  • Keep diagnostic logic isolated in e2e harness/test layer only; no operator runtime behavior changes.

Checklist

  • I have read and followed the CONTRIBUTING.md guidelines
  • Passed make pre-commit
  • Added/updated necessary tests
  • Documentation updated (if needed)
  • CHANGELOG.md updated under [Unreleased] (if user-visible change)
  • CI/CD passed (if applicable)

Impact

  • Breaking change (CRD/API compatibility)
  • Requires doc/config/deployment update
  • Other impact:
    • N/A

Verification

make pre-commit
cargo test --manifest-path e2e/Cargo.toml -- --quiet
cargo fmt --manifest-path e2e/Cargo.toml --check
cargo clippy --manifest-path e2e/Cargo.toml --all-targets -- -D warnings

Additional Notes

  • This change intentionally keeps observability/debug instrumentation in e2e harness only.

GatewayJ added 2 commits May 14, 2026 14:54
- enrich live e2e snapshot with tenant/PV/pod logs for startup issues\n- add explicit snapshot diagnosis for erasure read quorum and DNS lookup failures\n- print diagnosis + artifact path directly in smoke and cert TLS failure paths\n- harden kind storage cleanup checks for non-empty/leftover host dirs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5526485995

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/reconcile/phases.rs
Comment on lines +478 to +480
if let Some(tls_status) = tls_plan.status {
builder.set_tls_status(tls_status);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear TLS status when TLS is disabled

When spec.tls is removed or switched to Disabled, tls_plan.status is None and this branch skips any update to TLS fields, but StatusBuilder::from_tenant starts from the previous status snapshot. That leaves stale status.certificates.tls (and its old hash/error context) visible after TLS is turned off, which can mislead console/API consumers and automation that read tenant status. Please explicitly clear TLS status/condition when the plan is disabled.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant