
feat: FP8 quantization robustness, quality harness, kvo_cache patch #14

Open
forkni wants to merge 9 commits into SDTD_031_dev from feat/quantization-robustness


forkni commented May 18, 2026

Summary

  • FP8 QDQ hardening: a finite-scale gate prevents NaN/inf scales from propagating through quantized layers; the fused-MHA layer count is logged for diagnostics
  • FP8 calibration fix: per-input tile selection that accounts for ONNX inputs with a static dim 0; fixes a CFG batch mismatch during calibration
  • kvo_cache patch: ports varshith15's KV-cache optimization onto diffusers 0.38.0 via a runtime monkey-patch; a backward-compat sentinel keeps ControlNet ONNX export (which inspects return arity) from breaking
  • Quality regression harness: FP16-TRT goldens, SSIM/LPIPS thresholds and a manifest, giving automated pass/fail for future PRs
  • cp1252 fix: replaces emoji characters in the SDXL ONNX size warning to avoid a UnicodeEncodeError on Windows
  • Formatter pass: pre-existing quote/whitespace diffs staged and cleared
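
A minimal sketch of the idea behind the finite-scale gate, assuming a per-tensor FP8 E4M3 scale of amax / 448 (448 is the largest finite E4M3 value). The helpers `fp8_scale` and `gate_scales` and the layer names are hypothetical, not the functions this PR adds. The point is only that a NaN/inf amax yields a NaN/inf scale, and such layers are kept out of QDQ instead of being quantized. Rejecting a zero scale is an extra assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

# Largest finite magnitude representable in FP8 E4M3.
FP8_E4M3_MAX = 448.0


def fp8_scale(amax: float) -> float:
    """Per-tensor FP8 scale derived from a calibrated absolute maximum."""
    return amax / FP8_E4M3_MAX


def gate_scales(amax_by_layer: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Split layers into (layer -> usable scale) and layers to leave unquantized.

    A scale is usable only if it is finite and strictly positive; a NaN or inf
    scale would otherwise propagate through the Q/DQ pair into the outputs.
    """
    usable: Dict[str, float] = {}
    skipped: List[str] = []
    for name, amax in amax_by_layer.items():
        scale = fp8_scale(amax)
        if math.isfinite(scale) and scale > 0.0:
            usable[name] = scale
        else:
            skipped.append(name)  # hypothetical policy: keep this layer in higher precision
    return usable, skipped


if __name__ == "__main__":
    usable, skipped = gate_scales(
        {"down.0.attn1": 448.0, "mid.attn2": float("nan"), "up.1.ff": float("inf")}
    )
    print(usable)   # {'down.0.attn1': 1.0}
    print(skipped)  # ['mid.attn2', 'up.1.ff']
```

Skipping a layer rather than clamping its scale trades a little speed for outputs that stay NaN-free, which is what the cold-start test below checks.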

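One way to keep a runtime monkey-patch from changing the return arity seen by callers that do not opt in, sketched on a hypothetical `ToyAttention` class rather than diffusers' actual attention code. The sentinel here is a hypothetical marker attribute that also makes re-applying the patch a no-op; the sentinel in this PR and the varshith15 port may work differently.

```python
import functools

# Hypothetical marker attribute; the real patch's sentinel may differ.
_KVO_PATCH_SENTINEL = "_kvo_cache_patched"


class ToyAttention:
    """Stand-in for an attention module; NOT the diffusers class."""

    def forward(self, hidden_states):
        return hidden_states * 2


def apply_kvo_patch(cls) -> bool:
    """Monkey-patch cls.forward so it can optionally thread a KV cache.

    Returns False if the class was already patched (idempotent re-application).
    """
    original = cls.forward
    if getattr(original, _KVO_PATCH_SENTINEL, False):
        return False

    @functools.wraps(original)
    def forward(self, hidden_states, kvo_cache=None):
        out = original(self, hidden_states)
        if kvo_cache is None:
            # Legacy path: same single return value as before patching, so
            # callers that inspect return arity (e.g. an exporter) see no change.
            return out
        # Cache-aware path: an extra return value only when the caller opted in.
        return out, list(kvo_cache) + [hidden_states]

    setattr(forward, _KVO_PATCH_SENTINEL, True)
    cls.forward = forward
    return True
```

Calling `apply_kvo_patch(ToyAttention)` twice leaves a single wrapper in place; `ToyAttention().forward(3)` still returns `6`, while `forward(3, kvo_cache=[])` returns `(6, [3])`.
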
Test plan

  • Cold-start the SDXL-Turbo TRT pipeline with FP8 enabled: confirm no NaN in outputs
  • Run the quality harness (scripts/quality_harness.py or equivalent): all goldens pass SSIM/LPIPS thresholds
  • Export ControlNet ONNX with the kvo_cache patch active: confirm no return-arity error
  • Launch on Windows: confirm no cp1252 UnicodeEncodeError in the SD log
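
A sketch of the pass/fail rule the harness applies, assuming per-golden SSIM and LPIPS scores against the FP16-TRT goldens have already been computed. `Thresholds`, `evaluate`, `all_pass` and the manifest layout are hypothetical; the real manifest format is whatever scripts/quality_harness.py (or equivalent) reads, and the threshold values in any usage are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class Thresholds:
    min_ssim: float   # higher SSIM = structurally closer to the golden
    max_lpips: float  # lower LPIPS = perceptually closer to the golden


def evaluate(manifest: Mapping[str, Thresholds],
             scores: Mapping[str, Mapping[str, float]]) -> Dict[str, bool]:
    """Return golden name -> pass/fail.

    A golden with no score fails. A NaN score also fails, because every
    comparison against NaN evaluates to False.
    """
    results: Dict[str, bool] = {}
    for golden, th in manifest.items():
        s = scores.get(golden)
        if s is None:
            results[golden] = False
            continue
        results[golden] = s["ssim"] >= th.min_ssim and s["lpips"] <= th.max_lpips
    return results


def all_pass(results: Mapping[str, bool]) -> bool:
    """A run passes only if it checked at least one golden and all passed."""
    return bool(results) and all(results.values())
```

Writing the checks as `>=`/`<=` means a NaN-poisoned output fails automatically, which ties the harness back to the FP8 NaN check above.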

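Why the emoji in the SDXL ONNX size warning broke on Windows: a console using cp1252 cannot encode U+26A0 (warning sign) or U+FE0F (variation selector), so writing the message raises UnicodeEncodeError. A small check, with hypothetical message text and helper names:

```python
def encodes_cleanly(text: str, encoding: str = "cp1252") -> bool:
    """True if `text` can be encoded with `encoding` without raising."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def safe_for(text: str, encoding: str = "cp1252") -> str:
    """Fallback: replace unencodable characters with '?' for the target encoding."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical message text; only the leading marker differs.
EMOJI_WARNING = "\u26a0\ufe0f ONNX model is large; export may be slow"
ASCII_WARNING = "[WARNING] ONNX model is large; export may be slow"
```

Here `encodes_cleanly(EMOJI_WARNING)` is `False` and `encodes_cleanly(ASCII_WARNING)` is `True`. Swapping the emoji for ASCII, as this PR does, removes the error at the source; `safe_for` is only a fallback for text you don't control.
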
🤖 Generated with Claude Code

forkni and others added 9 commits May 16, 2026 08:36
kvo_cache_in_* tensors have ONNX dim 0 = 2 (a hard-static K/V pair), not a
symbolic batch dim. The previous naïve _max_rows tiling padded sample to
2×_n_itr rows, so modelopt's CalibrationDataProvider computed
n_itr = 2×_n_itr (sample's symbolic dim 0 resolves to 1) and split kvo into
chunks of shape (1,...) instead of (2,...); ORT then rejected them with
"Got 1 Expected 2".

Fix: compute a per-input target_rows = n_itr × resolved_dim0(name),
mirroring modelopt's substitution (symbolic dim → 1, static dim kept), so
every input splits into exactly n_itr uniform chunks. Adds a regression
test in tests/quality/.

Fixes the SDXL-Turbo TD config with use_cached_attn=True, cfg_type=self and use_controlnet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
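
A simplified model of the calibration fix, using lists of placeholder rows instead of tensors. `split_like_provider` mimics the behaviour described in the commit message (n_itr inferred from `sample`, then every input cut into n_itr equal chunks); it is not modelopt's CalibrationDataProvider code. All helper names are hypothetical, and the input shapes are illustrative apart from dim 0 = 2 for kvo_cache_in_*.

```python
from typing import Dict, List, Sequence, Union

Dim = Union[int, str]  # int: static ONNX dim; str: symbolic dim name (e.g. "batch")


def resolved_dim0(shape: Sequence[Dim]) -> int:
    """Symbolic dim 0 resolves to 1; a static dim 0 (e.g. 2 for a K/V pair) is kept."""
    d0 = shape[0]
    return d0 if isinstance(d0, int) else 1


def target_rows(shapes: Dict[str, Sequence[Dim]], n_itr: int) -> Dict[str, int]:
    """Per-input row count: n_itr x resolved_dim0(name)."""
    return {name: n_itr * resolved_dim0(shape) for name, shape in shapes.items()}


def tile(rows: List[int], target: int) -> List[int]:
    """Cycle through the available calibration rows until exactly `target` exist."""
    if not rows:
        raise ValueError("need at least one calibration row")
    return [rows[i % len(rows)] for i in range(target)]


def split_like_provider(data: Dict[str, List[int]],
                        shapes: Dict[str, Sequence[Dim]],
                        first: str) -> Dict[str, List[List[int]]]:
    """Simplified model (not modelopt code): infer n_itr from `first`,
    then cut every input into n_itr equal chunks."""
    n_itr = len(data[first]) // resolved_dim0(shapes[first])
    chunks: Dict[str, List[List[int]]] = {}
    for name, rows in data.items():
        size = len(rows) // n_itr
        chunks[name] = [rows[i * size:(i + 1) * size] for i in range(n_itr)]
    return chunks


if __name__ == "__main__":
    shapes = {"sample": ("batch", 4, "h", "w"),
              "kvo_cache_in_0": (2, "batch", "seq", "hidden")}
    n_itr = 3

    # Per-input targets: sample -> 3 rows, kvo_cache_in_0 -> 6 rows.
    fixed = {n: tile([0, 1], t) for n, t in target_rows(shapes, n_itr).items()}
    print({n: [len(c) for c in cs]
           for n, cs in split_like_provider(fixed, shapes, "sample").items()})
    # {'sample': [1, 1, 1], 'kvo_cache_in_0': [2, 2, 2]}

    # Naive: every input tiled to the max row count (6).
    max_rows = max(target_rows(shapes, n_itr).values())
    naive = {n: tile([0, 1], max_rows) for n in shapes}
    print({n: [len(c) for c in cs]
           for n, cs in split_like_provider(naive, shapes, "sample").items()})
    # {'sample': [1, 1, 1, 1, 1, 1], 'kvo_cache_in_0': [1, 1, 1, 1, 1, 1]}
```

In the naive run the kvo chunks have 1 row where the graph expects 2, which is the "Got 1 Expected 2" rejection; with per-input targets both inputs split into the same n_itr chunks.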
