feat: FP8 quantization robustness, quality harness, kvo_cache patch by forkni · Pull Request #14 · dotsimulate/StreamDiffusion

forkni · 2026-05-18T05:41:02Z

Summary

FP8 QDQ hardening: finite-scale gate prevents NaN/inf from propagating through quantized layers; fused-MHA layer count logged for diagnostics
FP8 calibration fix: per-input-aware tile selection for static-dim0 ONNX inputs; fixes CFG batch mismatch during calibration
kvo_cache patch: ports varshith15 KV-cache optimization onto diffusers 0.38.0 via runtime monkey-patch; backward-compat sentinel added so ControlNet ONNX export (which inspects return arity) does not break
Quality regression harness: FP16-TRT goldens + SSIM/LPIPS thresholds and manifest; automated pass/fail for future PRs
cp1252 fix: replaces emoji chars in SDXL ONNX size warning to avoid encode error on Windows
Formatter pass: pre-existing quote/whitespace diffs staged and cleared
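
The finite-scale gate from the first item can be pictured roughly as follows. This is a minimal sketch and not the code in this PR: the helper name `finite_scale_gate`, the layer names, and the choice to also reject zero or negative scales are all assumptions made for illustration.

```python
import math


def finite_scale_gate(scales):
    """Split {layer_name: scale} into (kept, rejected).

    Hypothetical helper: a scale is kept only if it is finite and > 0.
    Rejecting non-positive scales is an assumption of this sketch.
    """
    kept, rejected = {}, []
    for name, scale in scales.items():
        if math.isfinite(scale) and scale > 0.0:
            kept[name] = scale
        else:
            rejected.append(name)
    return kept, rejected


# Illustrative layer names and values only.
scales = {
    "attn.to_q": 0.02,
    "attn.to_k": float("nan"),
    "ff.proj": float("inf"),
    "ff.out": 0.0,
}
kept, rejected = finite_scale_gate(scales)
print(kept)      # {'attn.to_q': 0.02}
print(rejected)  # ['attn.to_k', 'ff.proj', 'ff.out']
```

Layers in `rejected` would presumably be left unquantized (or handled some other way) so that a bad scale never reaches a Q/DQ node; what the PR does with them is not stated here.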

Test plan

Cold-start SDXL-Turbo TRT pipeline with FP8 enabled — confirm no NaN in outputs
Run quality harness (scripts/quality_harness.py or equivalent) — all goldens pass SSIM/LPIPS thresholds
Export ControlNet ONNX with kvo_cache patch active — confirm no return-arity error
Windows launch — confirm no cp1252 UnicodeEncodeError in SD log
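
For the Windows check above, a quick way to reproduce the cp1252 failure without a Windows machine is to encode the message explicitly. This is only a sketch: both warning strings are made up for illustration and are not the actual text of the SDXL ONNX size warning.

```python
def encodes_as_cp1252(text: str) -> bool:
    """Return True if `text` can be encoded with the cp1252 codec."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical messages, not the strings used in the repository.
emoji_warning = "\u26a0\ufe0f SDXL ONNX model is large"
ascii_warning = "[WARNING] SDXL ONNX model is large"

print(encodes_as_cp1252(emoji_warning))  # False
print(encodes_as_cp1252(ascii_warning))  # True
```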

🤖 Generated with Claude Code

kvo_cache_in_* tensors have ONNX dim 0 = 2 (a hard-static K/V pair), not a symbolic batch dim. The previous naïve _max_rows tile pumped sample to 2×_n_itr rows, causing modelopt's CalibrationDataProvider to compute n_itr=2×_n_itr (sample's symbolic dim 0 resolves to 1) and split kvo into chunks of shape (1,...) instead of (2,...). ORT then rejected them with "Got 1 Expected 2".

Fix: compute per-input target_rows = n_itr × resolved_dim0(name), mirroring modelopt's symbolic→1/static-kept substitution, so every input splits into exactly n_itr uniform chunks. Adds regression test in tests/quality/. Fixes SDXL-Turbo + use_cached_attn=True + cfg_type=self + use_controlnet TD config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
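
The row arithmetic in this commit message can be checked with a small, dependency-free sketch. It is a simplified model of the behaviour described above, not modelopt's or this PR's actual code: `rows_per_chunk`, the `"batch"` symbolic dim name, and the input name `kvo_cache_in_0` are assumptions for illustration.

```python
def resolved_dim0(dim0):
    """Symbolic dims (e.g. "batch") resolve to 1; static ints are kept as-is."""
    return dim0 if isinstance(dim0, int) else 1


def rows_per_chunk(rows, onnx_dim0, driver="sample"):
    """Simplified model of the split described in the commit message:
    n_itr is inferred from the driver input, then every input is split
    into n_itr equal chunks along dim 0."""
    n_itr = rows[driver] // resolved_dim0(onnx_dim0[driver])
    return {name: r // n_itr for name, r in rows.items()}


# Declared ONNX dim 0 per input: symbolic for sample, static 2 for kvo.
onnx_dim0 = {"sample": "batch", "kvo_cache_in_0": 2}
n_itr = 4

# Old behaviour: tile every input to the same maximum row count.
max_rows = n_itr * max(resolved_dim0(d) for d in onnx_dim0.values())
naive = {name: max_rows for name in onnx_dim0}

# Fix: per-input target_rows = n_itr * resolved_dim0(name).
fixed = {name: n_itr * resolved_dim0(d) for name, d in onnx_dim0.items()}

print(rows_per_chunk(naive, onnx_dim0))  # {'sample': 1, 'kvo_cache_in_0': 1}
print(rows_per_chunk(fixed, onnx_dim0))  # {'sample': 1, 'kvo_cache_in_0': 2}
```

With the naive tiling the kvo chunks come out with 1 row where the graph expects 2, which matches the "Got 1 Expected 2" rejection; with per-input targets each kvo chunk keeps its 2 rows.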

forkni and others added 9 commits May 16, 2026 08:36

chore: gitignore local cgw tooling; add PR audit doc

5c5d5a6

feat: add FP8 QDQ finite-scale gate and fused-MHA layer count

275f293

feat: add quality regression harness with FP16-TRT goldens and SSIM/LPIPS metrics

90091d5

feat: port varshith15 kvo_cache patch onto diffusers 0.38.0 via runtime monkey-patch

d1763bb

fix: replace emoji chars in SDXL ONNX size warning to avoid cp1252 encode error on Windows

62548fb

feat: seed quality-harness goldens, manifest, thresholds; fix FP8 CFG batch mismatch in calibration

67c74c2

chore: stage pre-existing formatter diffs (quote/whitespace normalisation)

4b4aaf7

fix: kvo_cache patch breaks ControlNet ONNX export — sentinel for backward-compat return arity

1a8065f

forkni wants to merge 9 commits into SDTD_031_dev from feat/quantization-robustness