fix: rate-limit Conv2d kernel-size underflow errors from community plugins #920
Open
livepeer-tessa wants to merge 2 commits into
added 2 commits
April 12, 2026 06:26
…FileNotFoundError

Issue #916: ltx2 i2v_image path from client's /tmp crashes every chunk

Root cause (two parts):
1. _sanitize_initial_params was only called at pipeline load time, not during runtime parameter updates received over WebSocket. When the frontend sends i2v_image mid-session (e.g. the user picks a Reference Image while already streaming), the path from the client machine (/tmp/.daydream-scope/assets/foo.png) is forwarded raw to the fal.ai worker, where that /tmp path doesn't exist.
2. Even with the right path, a race condition between the CDN download and the first chunk causes FileNotFoundError on every chunk (2500+ per session in the observed window), flooding Grafana logs.

Fixes:
- Add _sanitize_asset_path and _sanitize_initial_params static methods to PipelineManager (mirrors the fix from PR #827 for Windows paths, extended to also catch Linux /tmp paths from foreign machines).
- Call _sanitize_initial_params in PipelineManager._load_pipeline_implementation for plugin pipelines (load-time fix).
- Call _sanitize_initial_params in FrameProcessor.update_parameters for all runtime parameter updates (WebSocket mid-session updates).
- In PipelineProcessor.process_chunk, catch FileNotFoundError separately and rate-limit repeated log entries for the same missing path to one per 30s, preventing log floods while still making the error visible.

Tests: 9 new tests in TestSanitizeAssetPath covering Windows paths, Linux /tmp foreign paths, relative paths, None values, and list params.

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
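The commit message does not say what the sanitizer does with a foreign path once it detects one, so the following is only a rough sketch of the idea, not the PR's implementation. It assumes a foreign path is remapped to its basename under a worker-local assets directory. The names `LOCAL_ASSETS_DIR`, `is_foreign_path`, `sanitize_asset_path`, and `sanitize_params` are hypothetical. It covers the same input shapes the tests mention: Windows paths, Linux /tmp paths, relative paths, None values, and list params.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath
from typing import Any

# Hypothetical: directory on the worker where downloaded assets are stored.
LOCAL_ASSETS_DIR = "/data/assets"

# Matches a Windows drive prefix such as "C:\" or "C:/".
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def is_foreign_path(value: str) -> bool:
    """True for absolute paths that look like they came from a client machine."""
    return bool(_WINDOWS_DRIVE.match(value)) or value.startswith("/tmp/")


def sanitize_asset_path(value: Any) -> Any:
    """Remap a foreign absolute path to LOCAL_ASSETS_DIR/<basename>.

    Non-strings (None, numbers) and paths that do not look foreign are
    returned unchanged. The remapping policy is an assumption of this sketch.
    """
    if not isinstance(value, str) or not is_foreign_path(value):
        return value
    if _WINDOWS_DRIVE.match(value):
        name = PureWindowsPath(value).name
    else:
        name = PurePosixPath(value).name
    return f"{LOCAL_ASSETS_DIR}/{name}"


def sanitize_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params with plain values and list items sanitized."""
    out: dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, list):
            out[key] = [sanitize_asset_path(item) for item in value]
        else:
            out[key] = sanitize_asset_path(value)
    return out
```

Calling the same function from both the load path and the runtime update path is what closes the gap described in root cause 1. A real implementation would likely restrict sanitization to known asset-path keys rather than every string parameter.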
…ugins

The flux-klein community plugin (and other third-party pipelines) can produce a Conv2d padded-input-underflow RuntimeError on every processed chunk when the input video is too narrow (e.g. 2px after padding < 3x3 kernel). This fires 2+ errors/second and floods Grafana / CloudWatch identically to the FileNotFoundError pattern fixed in #524.

Changes in pipeline_processor.py:
- Add _is_conv2d_kernel_underflow() static helper that matches PyTorch's canonical "Kernel size can't be greater than actual input size" message.
- Add _conv2d_last_logged / _CONV2D_LOG_INTERVAL instance fields (same 30-second suppression window as FileNotFoundError rate-limiting).
- Add explicit except RuntimeError handler in process_chunk(): if the error matches the Conv2d underflow pattern, rate-limit to one ERROR per 30s per unique message, then continue (the user may fix resolution via a parameter update). Non-matching RuntimeErrors fall through to the existing generic except Exception handler unchanged.

This is a general fix for all community plugins lacking minimum-resolution guards. The pipeline stays alive across the underflow and recovers if the user adjusts the input resolution.

Related: #557 (VaceEncodingBlock), #673 (temporal kernel underflow), #713 (WAN VAE spatial dims), #917 (flux-klein, this issue)

Fixes #917

Signed-off-by: livepeer-tessa <robot@livepeer.org>
Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
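The match-then-suppress pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in pipeline_processor.py: `RateLimitedLogger`, `process_chunks`, and the injectable `clock` are hypothetical names. For simplicity, non-matching RuntimeErrors are re-raised to the caller here, whereas the PR routes them to its existing generic handler. No PyTorch import is needed because only the error message text is inspected.

```python
import time
from typing import Callable, Dict, Iterable, List

# Substring of the error PyTorch raises when the padded input is smaller than the kernel.
CONV2D_UNDERFLOW_MARKER = "Kernel size can't be greater than actual input size"
LOG_INTERVAL_S = 30.0  # suppression window taken from the commit message


def is_conv2d_kernel_underflow(err_msg: str) -> bool:
    """True if the message looks like a Conv2d kernel-size underflow."""
    return CONV2D_UNDERFLOW_MARKER in err_msg


class RateLimitedLogger:
    """Emit each distinct message at most once per interval (hypothetical helper)."""

    def __init__(self, interval: float = LOG_INTERVAL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._clock = clock
        self._last_logged: Dict[str, float] = {}
        self.emitted: List[str] = []  # stands in for real log output

    def error(self, message: str) -> bool:
        """Record the message unless it was emitted within the interval."""
        now = self._clock()
        last = self._last_logged.get(message)
        if last is not None and now - last < self._interval:
            return False  # suppressed
        self._last_logged[message] = now
        self.emitted.append(message)
        return True


def process_chunks(chunks: Iterable, run_pipeline: Callable,
                   limiter: RateLimitedLogger) -> int:
    """Run every chunk; survive Conv2d underflow, re-raise other RuntimeErrors."""
    processed = 0
    for chunk in chunks:
        try:
            run_pipeline(chunk)
            processed += 1
        except RuntimeError as exc:
            if not is_conv2d_kernel_underflow(str(exc)):
                raise  # simplification: the PR uses its generic handler instead
            limiter.error(str(exc))  # at most one entry per message per window
    return processed
```

Keying the suppression map on the full message means two plugins failing with different input sizes are still logged separately, which matches the "per unique message" wording above. Injecting the clock keeps the 30-second window testable without sleeping.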
Summary
Fixes #917: the flux-klein community plugin causes a Conv2d "Kernel size can't be greater than actual input size" RuntimeError on every processed chunk when the input frame is too narrow (2 px wide after padding < 3×3 kernel). This floods Grafana / CloudWatch at 2+ errors/second, identical to the FileNotFoundError flood pattern fixed in #524.
Root Cause
Community plugins don't necessarily enforce minimum input dimensions (unlike built-in pipelines like the WAN VAE encoder fixed in #713). The pipeline_processor.py exception handler passes all non-OOM Exception instances to the generic logger with no rate limiting, so the error repeats on every chunk indefinitely.
Changes
src/scope/server/pipeline_processor.py
- _is_conv2d_kernel_underflow(err_msg) static helper that matches PyTorch's canonical error string ("Kernel size can't be greater than actual input size")
- _conv2d_last_logged / _CONV2D_LOG_INTERVAL instance fields (30-second suppression window, same as _fnf_last_logged)
- except RuntimeError handler in process_chunk(): if the message matches the Conv2d underflow pattern, rate-limit to one ERROR per 30 s per unique message, then continue. Non-matching RuntimeErrors fall through to the existing generic except Exception handler unchanged.
Behaviour
Related Issues