
fix: rate-limit Conv2d kernel-size underflow errors from community plugins #920

Open
livepeer-tessa wants to merge 2 commits into main from fix/917-conv2d-kernel-underflow-rate-limit

@livepeer-tessa
Contributor

Summary

Fixes #917. The flux-klein community plugin raises a Conv2d RuntimeError ("Kernel size can't be greater than actual input size") on every processed chunk when the input frame is too narrow: after padding it is only 2 px wide, smaller than the 3×3 kernel. This floods Grafana / CloudWatch at 2+ errors per second, the same flood pattern as the FileNotFoundError fixed in #524.
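For intuition, the failure is plain per-axis arithmetic. The sketch below implements the Conv2d output-size formula from the PyTorch docs for one spatial axis and raises when the padded input is smaller than the (dilated) kernel, mirroring the shape check behind this error. conv_output_size is a hypothetical helper for illustration, not code from this PR or from the flux-klein plugin, and its error text is paraphrased.

```python
def conv_output_size(in_size: int, kernel: int, padding: int = 0,
                     stride: int = 1, dilation: int = 1) -> int:
    """Output length along one spatial axis of a Conv2d (PyTorch docs formula)."""
    padded = in_size + 2 * padding
    # Span covered by a dilated kernel along this axis.
    effective_kernel = dilation * (kernel - 1) + 1
    if padded < effective_kernel:
        # Mirrors the condition that makes PyTorch reject the input.
        raise RuntimeError(
            f"padded input size {padded} < kernel size {effective_kernel}: "
            "Kernel size can't be greater than actual input size"
        )
    return (padded - effective_kernel) // stride + 1
```

Applied to the width axis, a 2 px padded width against a 3-wide kernel fails, while any width of at least 3 produces a valid output.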

Root Cause

Community plugins don't necessarily enforce minimum input dimensions (unlike built-in pipelines such as the WAN VAE encoder, fixed in #713). The exception handler in pipeline_processor.py passes every non-OOM Exception to the generic logger with no rate limiting, so the same error is logged again on every chunk indefinitely.

Changes

src/scope/server/pipeline_processor.py

  • Add a _is_conv2d_kernel_underflow(err_msg) static helper that matches PyTorch's canonical error string ("Kernel size can't be greater than actual input size")
  • Add _conv2d_last_logged / _CONV2D_LOG_INTERVAL instance fields (30-second suppression window, the same as _fnf_last_logged)
  • Add an explicit except RuntimeError handler in process_chunk(): if the message matches the Conv2d underflow pattern, log at most one ERROR per 30 s per unique message, then continue. Non-matching RuntimeErrors fall through to the existing generic except Exception handler unchanged.
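The handler described in these bullets can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in this PR: ProcessorSketch, _should_log_conv2d, _handle_generic_error, generic_errors and the injectable clock are hypothetical names, and _conv2d_last_logged is assumed to be a dict keyed by error message. Only the helper name, the field names and the 30 s window come from the description above.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("pipeline_processor_sketch")

# Canonical PyTorch text quoted in this PR; matched as a substring.
_CONV2D_UNDERFLOW_MARKER = "Kernel size can't be greater than actual input size"


class ProcessorSketch:
    """Hypothetical stand-in for PipelineProcessor: models only the error path."""

    _CONV2D_LOG_INTERVAL = 30.0  # seconds, same window as the FileNotFoundError limiter

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # Error message -> time it was last logged (assumed shape).
        self._conv2d_last_logged: dict[str, float] = {}
        self.generic_errors: list[Exception] = []  # hypothetical, for illustration

    @staticmethod
    def _is_conv2d_kernel_underflow(err_msg: str) -> bool:
        return _CONV2D_UNDERFLOW_MARKER in err_msg

    def _should_log_conv2d(self, err_msg: str) -> bool:
        """True at most once per _CONV2D_LOG_INTERVAL for each unique message."""
        now = self._clock()
        last = self._conv2d_last_logged.get(err_msg)
        if last is None or now - last >= self._CONV2D_LOG_INTERVAL:
            self._conv2d_last_logged[err_msg] = now
            return True
        return False

    def _handle_generic_error(self, err: Exception) -> None:
        # Stand-in for the existing generic handler: logs every occurrence.
        self.generic_errors.append(err)
        logger.error("Error processing chunk: %s", err)

    def process_chunk(self, run_pipeline: Callable[[], None]) -> bool:
        """Run one chunk; return False (and keep running) if it failed."""
        try:
            run_pipeline()
            return True
        except RuntimeError as e:
            msg = str(e)
            if self._is_conv2d_kernel_underflow(msg):
                if self._should_log_conv2d(msg):
                    logger.error("Conv2d kernel underflow (input too small): %s", msg)
                return False  # skip this chunk; a later resolution change can recover
            self._handle_generic_error(e)  # non-matching RuntimeError: unchanged path
            return False
        except Exception as e:
            self._handle_generic_error(e)
            return False
```

One Python detail shapes the sketch: a bare raise inside an except RuntimeError clause is not caught by a sibling except Exception clause of the same try statement, so the sketch hands non-matching RuntimeErrors to the generic handler explicitly. The injectable clock makes the 30 s window testable without sleeping.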

Behaviour

| Before | After |
|---|---|
| 2+ ERROR logs per second flooding Grafana, one per chunk | One ERROR per 30 s, then suppressed |
| Error message: PyTorch's opaque RuntimeError | Augmented message: identifies the spatial-dimension root cause and points to #917 / #713 |
| Pipeline terminates on the next exception-escalation path | Pipeline stays alive and recovers if the user corrects the resolution via a parameter update |
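The augmented message in the table could be produced along these lines. This is a hypothetical sketch: the function name, regex and hint wording are assumptions, and the regex assumes PyTorch's error text contains "Calculated padded input size per channel: (H x W). Kernel size: (kH x kW)." before the canonical sentence. If that prefix is absent, the sketch still appends a generic hint.

```python
import re

# Assumed shape of PyTorch's Conv shape-check message.
_SIZES_RE = re.compile(
    r"padded input size per channel: \(([\d x]+)\)\. Kernel size: \(([\d x]+)\)"
)


def augment_conv2d_underflow_message(err_msg: str) -> str:
    """Append a root-cause hint naming the axes where input < kernel."""
    hint = "input frame is smaller than the Conv2d kernel after padding"
    m = _SIZES_RE.search(err_msg)
    if m:
        padded = [int(v) for v in m.group(1).split(" x ")]
        kernel = [int(v) for v in m.group(2).split(" x ")]
        # Report only the axes that actually underflow, e.g. "2 < 3".
        too_small = [f"{p} < {k}" for p, k in zip(padded, kernel) if p < k]
        if too_small:
            hint += f" ({', '.join(too_small)})"
    return f"{err_msg} [{hint}; increase the input resolution, see #917 / #713]"
```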

Related Issues

Tessa (livepeer-tessa) added 2 commits April 12, 2026 06:26
…FileNotFoundError

Issue #916: ltx2 i2v_image path from client's /tmp crashes every chunk

Root cause (two parts):
1. _sanitize_initial_params was only called at pipeline LOAD time, not
   during runtime parameter updates received over WebSocket. When the
   frontend sends i2v_image mid-session (e.g. user picks a Reference Image
   while already streaming), the path from the client machine
   (/tmp/.daydream-scope/assets/foo.png) is forwarded raw to the fal.ai
   worker where that /tmp path doesn't exist.

2. Even with the right path, a race condition between CDN download and the
   first chunk causes FileNotFoundError on every chunk — 2500+ per session
   in the observed window — flooding Grafana logs.

Fixes:
- Add _sanitize_asset_path and _sanitize_initial_params static methods to
  PipelineManager (mirrors the fix from PR #827 for Windows paths, extended
  to also catch Linux /tmp paths from foreign machines).
- Call _sanitize_initial_params in PipelineManager._load_pipeline_implementation
  for plugin pipelines (load-time fix).
- Call _sanitize_initial_params in FrameProcessor.update_parameters for all
  runtime parameter updates (WebSocket mid-session updates).
- In PipelineProcessor.process_chunk, catch FileNotFoundError separately and
  rate-limit repeated log entries for the same missing path to one per 30s,
  preventing log floods while still making the error visible.

Tests: 9 new tests in TestSanitizeAssetPath covering Windows paths, Linux
/tmp foreign paths, relative paths, None values, and list params.

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
…ugins

The flux-klein community plugin (and other third-party pipelines) can produce
a Conv2d padded-input-underflow RuntimeError on every processed chunk when the
input video is too narrow (e.g. 2px after padding < 3x3 kernel). This fires
2+ errors/second and floods Grafana / CloudWatch identically to the
FileNotFoundError pattern fixed in #524.

Changes in pipeline_processor.py:
- Add _is_conv2d_kernel_underflow() static helper that matches PyTorch's
  canonical "Kernel size can't be greater than actual input size" message
- Add _conv2d_last_logged / _CONV2D_LOG_INTERVAL instance fields (same
  30-second suppression window as FileNotFoundError rate-limiting)
- Add explicit except RuntimeError handler in process_chunk(): if the
  error matches the Conv2d underflow pattern, rate-limit to one ERROR
  per 30s per unique message, then continue (the user may fix resolution
  via a parameter update). Non-matching RuntimeErrors fall through to the
  existing generic except Exception handler unchanged.

This is a general fix for all community plugins lacking minimum-resolution
guards. The pipeline stays alive across the underflow and recovers if the
user adjusts the input resolution.

Related: #557 (VaceEncodingBlock), #673 (temporal kernel underflow),
         #713 (WAN VAE spatial dims), #917 (flux-klein, this issue)

Fixes #917

Signed-off-by: livepeer-tessa <robot@livepeer.org>
Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
@livepeer-tessa added the bug label Apr 12, 2026
@coderabbitai

coderabbitai Bot commented Apr 12, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 07ab0cea-c94b-4771-afed-d2cff49e51d6


@github-actions
Contributor

🚀 fal.ai Preview Deployment

App ID: daydream/scope-pr-920--preview
WebSocket: wss://fal.run/daydream/scope-pr-920--preview/ws
Commit: f718914

Livepeer Runner

App ID: daydream/scope-livepeer-pr-920--preview
WebSocket: wss://fal.run/daydream/scope-livepeer-pr-920--preview/ws
Auth: private

Testing Livepeer Mode

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-920--preview/ws" uv run daydream-scope


Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[fal.ai] flux-klein plugin: Conv2d padded input underflow — kernel (3×3) > input width (2px) causes chunk errors

1 participant