
vLLM/SGLANG: add semantic degradation support#890

Open
podkidyshev wants to merge 5 commits into main from ipod/llm-semantic-degradation

Conversation

@podkidyshev
Contributor

Summary

  • Add semantic degradation support to vLLM and SGLANG

Test Plan

  • Automated CI
  • Manual runs

Additional Notes

@podkidyshev podkidyshev self-assigned this May 13, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 13, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 2bedbef7-785e-41b9-90aa-4552784f02af

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@podkidyshev podkidyshev marked this pull request as ready for review May 20, 2026 21:47
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@conf/experimental/vllm/test/vllm.toml`:
- Around line 21-24: The git_repos entry in vllm.toml currently pins commit =
"main", which is mutable; change the commit value to an immutable revision (a
specific full commit SHA or a tagged release) so the entry under [[git_repos]]
(url, commit, mount_as) uses a fixed commit SHA instead of "main"; update commit
= "<full-commit-sha-or-tag>" to ensure reproducible builds and CI.

In `@src/cloudai/workloads/common/llm_serving.py`:
- Around line 633-643: _gen_semantic_eval_block currently emits the semantic
eval command raw, skipping the project's custom_bash wrapper; change it to build
the semantic command via get_semantic_eval_command() and then wrap the full srun
invocation using the existing helper _with_custom_bash(...) so the semantic eval
runs in the same custom_bash environment as other steps. Concretely, keep using
self.semantic_eval_log_file and self.test_run.output_path, but replace the raw
f-string return with a call that passes the srun_prefix and the joined
semantic_cmd through self._with_custom_bash(...) (mirroring how serve/bench use
custom_bash) so the final returned string is the wrapped command.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 9ca78aef-fa48-4a85-ae69-192d360d60a3

📥 Commits

Reviewing files that changed from the base of the PR and between 1099207 and b32782a.

📒 Files selected for processing (22)
  • README.md
  • conf/experimental/sglang/test/sglang.toml
  • conf/experimental/sglang/test_scenario/sglang.toml
  • conf/experimental/vllm/test/vllm.toml
  • conf/experimental/vllm/test_scenario/vllm.toml
  • doc/workloads/sglang.rst
  • doc/workloads/vllm.rst
  • src/cloudai/workloads/common/llm_serving.py
  • src/cloudai/workloads/sglang/__init__.py
  • src/cloudai/workloads/sglang/report_generation_strategy.py
  • src/cloudai/workloads/sglang/sglang.py
  • src/cloudai/workloads/sglang/slurm_command_gen_strategy.py
  • src/cloudai/workloads/vllm/__init__.py
  • src/cloudai/workloads/vllm/report_generation_strategy.py
  • src/cloudai/workloads/vllm/slurm_command_gen_strategy.py
  • src/cloudai/workloads/vllm/vllm.py
  • tests/workloads/sglang/test_command_gen_strategy_slurm.py
  • tests/workloads/sglang/test_job_status_retrieval_strategy.py
  • tests/workloads/sglang/test_report_gen_strategy.py
  • tests/workloads/vllm/test_command_gen_strategy_slurm.py
  • tests/workloads/vllm/test_job_status_retrieval_strategy.py
  • tests/workloads/vllm/test_report_gen_strategy.py

Comment on lines +21 to +24
[[git_repos]]
url = "https://github.com/vllm-project/vllm.git"
commit = "main"
mount_as = "/vllm_repo"
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin git_repos.commit to an immutable revision.

Line 23 uses main, which makes semantic-eval behavior drift over time and weakens reproducibility/debuggability for benchmark results and CI.

Suggested change
 [[git_repos]]
 url = "https://github.com/vllm-project/vllm.git"
-commit = "main"
+commit = "<pinned-tag-or-sha>"
 mount_as = "/vllm_repo"
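A check along these lines could flag mutable refs before a run starts. This is only an illustrative sketch and not part of the PR: `is_pinned` and `unpinned_repos` are hypothetical names, the input is assumed to be the already-parsed list of `[[git_repos]]` tables, and only a full 40-character SHA-1 hex string is treated as immutable. That policy is stricter than the "tag or SHA" wording above, on the assumption that tags can be re-pointed.

```python
import re

# Assumption: a ref counts as "pinned" only if it is a full 40-hex SHA-1.
FULL_SHA1 = re.compile(r"[0-9a-fA-F]{40}")


def is_pinned(commit: str) -> bool:
    """Return True when the ref looks like a full commit SHA."""
    return FULL_SHA1.fullmatch(commit) is not None


def unpinned_repos(git_repos: list) -> list:
    """Return the URLs of entries whose commit is a branch, tag, or short SHA."""
    return [
        repo["url"]
        for repo in git_repos
        if not is_pinned(repo.get("commit", ""))
    ]


# Mirrors the [[git_repos]] entry quoted above, as a parsed dict.
repos = [
    {
        "url": "https://github.com/vllm-project/vllm.git",
        "commit": "main",
        "mount_as": "/vllm_repo",
    }
]
print(unpinned_repos(repos))
```

With the entry as written, the vLLM URL is reported; replacing `"main"` with a full SHA makes the list empty.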

Comment on lines +633 to +643
def _gen_semantic_eval_block(self, srun_prefix: str) -> str:
    semantic_cmd = self.get_semantic_eval_command()
    if not semantic_cmd:
        return ""

    return f"""\

echo "Running semantic validation..."
{srun_prefix} \\
    --output={(self.test_run.output_path / self.semantic_eval_log_file).absolute()} \\
    {" ".join(semantic_cmd)}"""
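To make the emitted text easier to picture, here is a minimal standalone sketch of the same f-string pattern. It is not the project's code: `render_block` is a hypothetical free function, and the srun prefix, log path, and command are made-up values standing in for the `self` attributes used in the hunk.

```python
def render_block(srun_prefix: str, log_path: str, cmd: list) -> str:
    """Standalone imitation of the hunk's f-string (names are hypothetical)."""
    if not cmd:
        # No semantic command configured: emit nothing into the script.
        return ""

    # The trailing backslash after the opening quotes drops the first newline;
    # each "\\" renders as one backslash, i.e. a shell line continuation.
    return f"""\

echo "Running semantic validation..."
{srun_prefix} \\
    --output={log_path} \\
    {" ".join(cmd)}"""


block = render_block(
    "srun --ntasks=1",                  # made-up prefix
    "/tmp/out/semantic_eval.log",       # made-up log path
    ["python", "eval.py", "--model", "m"],  # made-up command
)
print(block)
```

The rendered block starts with an empty line, and the command list is joined with plain spaces, so the command reaches `srun` without any wrapper around it.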
Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Apply custom_bash wrapping to semantic eval commands.

Semantic evaluation is the only launched step that bypasses _with_custom_bash(...). If custom_bash is used to set up runtime env, serve/bench can pass while semantic-eval fails or runs in a different environment.

💡 Proposed fix
 def _gen_semantic_eval_block(self, srun_prefix: str) -> str:
     semantic_cmd = self.get_semantic_eval_command()
     if not semantic_cmd:
         return ""
+    semantic_tail = self._with_custom_bash(" ".join(semantic_cmd))

     return f"""\
 
 echo "Running semantic validation..."
 {srun_prefix} \\
     --output={(self.test_run.output_path / self.semantic_eval_log_file).absolute()} \\
-    {" ".join(semantic_cmd)}"""
+    {semantic_tail}"""
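The idea behind the fix can be sketched in isolation. The real `_with_custom_bash(...)` helper is not shown in this conversation, so `with_custom_bash` below is a hypothetical stand-in with assumed behavior (run the setup snippet, then the command, inside one `bash -c`); `launch_line` and all values are likewise invented for illustration.

```python
import shlex
from typing import Optional


def with_custom_bash(command: str, custom_bash: Optional[str]) -> str:
    """Hypothetical stand-in: prefix `command` with setup inside one bash -c."""
    if not custom_bash:
        # No setup configured: the command is passed through unchanged.
        return command
    script = f"{custom_bash} && {command}"
    # shlex.quote keeps the whole script as a single shell argument.
    return f"bash -c {shlex.quote(script)}"


def launch_line(srun_prefix: str, command: str, custom_bash: Optional[str]) -> str:
    """Build one launch line; every step goes through the same wrapper."""
    return f"{srun_prefix} {with_custom_bash(command, custom_bash)}"


setup = "source /opt/env.sh"  # made-up environment setup
for step in ("serve-cmd", "bench-cmd", "semantic-eval-cmd"):
    print(launch_line("srun --ntasks=1", step, setup))
```

Routing all three steps through one wrapper is what keeps their environments identical; a step that skips it, as semantic eval does in the hunk under review, would run without the setup.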
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-def _gen_semantic_eval_block(self, srun_prefix: str) -> str:
-    semantic_cmd = self.get_semantic_eval_command()
-    if not semantic_cmd:
-        return ""
-
-    return f"""\
-
-echo "Running semantic validation..."
-{srun_prefix} \\
-    --output={(self.test_run.output_path / self.semantic_eval_log_file).absolute()} \\
-    {" ".join(semantic_cmd)}"""
+def _gen_semantic_eval_block(self, srun_prefix: str) -> str:
+    semantic_cmd = self.get_semantic_eval_command()
+    if not semantic_cmd:
+        return ""
+    semantic_tail = self._with_custom_bash(" ".join(semantic_cmd))
+
+    return f"""\
+
+echo "Running semantic validation..."
+{srun_prefix} \\
+    --output={(self.test_run.output_path / self.semantic_eval_log_file).absolute()} \\
+    {semantic_tail}"""
