Add UCC/NCCL alltoallv vs DeepEP by ybenvidia · Pull Request #891 · NVIDIA/cloudai

ybenvidia · 2026-05-14T13:58:09Z

Add UCC/NCCL alltoallv vs DeepEP

Add a UCC alltoallv test
Add an NCCL alltoallv test
Add report generation comparing DeepEP, NCCL, and UCC

coderabbitai · 2026-05-14T13:58:22Z

📝 Walkthrough

This PR integrates DeepEP benchmark outputs with NCCL and UCC test frameworks by adding discovery utilities, a new MoE throughput reporter, matrix artifact mounting, and updated command generation strategies to enable dependent test execution and cross-workload metric visualization.

Changes

DeepEP Integration Feature

| Layer / File(s) | Summary |
| --- | --- |
| DeepEP discovery and helper utilities `src/cloudai/workloads/deepep/deepep_combined_report.py` | New module exports `DEEPEP_PREV_MOUNT`, `start_post_comp_chain()` to traverse test dependencies, `deepep_benchmark_root()` to identify DeepEP outputs by marker files, and `deepep_results_json_files()` to collect result JSON files for downstream processors. |
| DeepEP MoE throughput reporter and registration `src/cloudai/workloads/deepep/deepep_moe_throughput_reporter.py`, `src/cloudai/workloads/deepep/__init__.py`, `src/cloudai/registration.py` | New `DeepEPMoEThroughputReporter` class extracts dispatch/combine bandwidth from DeepEP, UCC, and NCCL result files and renders an SVG bar chart; reporter is exported and registered as `"deepep_moe_throughput"` scenario report. |
| DeepEP report generation strategy updates `src/cloudai/workloads/deepep/report_generation_strategy.py` | Simplifies JSON discovery via `deepep_results_json_files()`, adds type guards for list/dict payloads, extends CSV columns with `bus_bw_avg`, `bus_bw_min`, `bus_bw_max`. |
| DeepEP command args and Slurm strategy updates `src/cloudai/workloads/deepep/deepep.py`, `src/cloudai/workloads/deepep/slurm_command_gen_strategy.py` | Adds `enable_tuning` field to `DeepEPCmdArgs`; updates head node detection docs, exports `MASTER_ADDR`/`MASTER_PORT`, mounts config file to target path, replaces torchrun with direct python invocation, broadens success pattern matching for bandwidth metrics. |
| NCCL alltoallv subtest and matrix field definitions `src/cloudai/workloads/nccl_test/nccl.py` | Extends `NCCLCmdArgs` with `alltoallv_perf_mpi` and `alltoallv_perf` subtest literals (single and list forms); adds `use_deepep_matrix` and `alltoallv_matrix_container_path` fields. |
| NCCL Slurm strategy integration with DeepEP matrices `src/cloudai/workloads/nccl_test/slurm_command_gen_strategy.py` | Adds matrix discovery helpers, conditionally mounts DeepEP nccl_matrix.txt and previous output, sets `ALLTOALLV_MATRIX_FILE` environment variable, implements specialized `alltoallv_perf_mpi` command generation with scalarized numeric arguments. |
| UCC integration with DeepEP matrices `src/cloudai/workloads/ucc_test/ucc.py`, `src/cloudai/workloads/ucc_test/slurm_command_gen_strategy.py` | Adds `use_deepep_matrix` field to `UCCCmdArgs`; implements `_deepep_ucc_matrix_host_path()` to locate ucc_matrix.txt, conditionally mounts matrix and previous directory, appends `--gen file:name=<container_path>` when matrix is available. |
| Configuration examples demonstrating integration `conf/experimental/test/deepep_standard.toml`, `conf/experimental/test/nccl_test_alltoallv.toml`, `conf/experimental/test_scenario/deepep_with_nccl_alltoallv.toml`, `conf/experimental/test_scenario/deepep_with_ucc_alltoallv.toml` | Updates deepep_standard tokens to 4096; defines nccl_test_alltoallv with alltoallv_perf_mpi subtest and DeepEP matrix flag; introduces test scenarios that chain DeepEP benchmark followed by NCCL or UCC tests via start_post_comp dependencies. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A DeepEP matrix flows through NCCL's veins,
UCC learns to dance to its bandwidth refrains,
From results to charts, in SVG delight,
Throughput blooms fair under MoE's new light!

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly reflects the main addition of UCC and NCCL alltoallv comparisons against DeepEP, which is the primary focus of the PR. |
| Description check | ✅ Passed | The description is related to the changeset and outlines the key additions: UCC alltoallv test, NCCL alltoallv test, and report generation for cross-tool comparison. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/cloudai/workloads/deepep/slurm_command_gen_strategy.py (1)
28-134: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix Ruff formatting for this file before merge.

CI is failing on ruff-format for this file; please run formatting and commit the normalized output to unblock the pipeline.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cloudai/workloads/deepep/slurm_command_gen_strategy.py` around lines 28 -
134, The file fails ruff-format checks; run the ruff formatter and commit the
normalized file so CI passes. Specifically, run your project's ruff/formatter
(e.g., ruff format) on the module containing methods like
_append_head_node_detection, _append_sbatch_directives, _container_mounts,
_generate_config_yaml and gen_srun_success_check, review the changes for any
unintended edits, and commit the formatted file to the branch.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/workloads/deepep/deepep_moe_throughput_reporter.py`:
- Line 15: The import of Reporter in deepep_moe_throughput_reporter.py currently
uses the private path cloudai._core.base_reporter; replace that with the allowed
public reporter interface used by other workloads (import the same Reporter
symbol from the public reporter module instead) so the file imports Reporter
from the public/allowed package rather than cloudai._core.base_reporter.
- Around line 23-25: The function _deepep_dispatch_combine_bars is too complex
and triggers C901 and has an unused loop variable causing B007; split it into
small helpers (e.g., deepep_results_json_files -> leave, add helpers like
_load_latest_results(path: Path) -> dict,
_extract_dispatch_combine_rows(results: dict) -> Iterable[dict], and
_row_to_bar(row: dict) -> tuple[str, float, str]) and have
_deepep_dispatch_combine_bars orchestrate these helpers to reduce cyclomatic
complexity; also rename any unused loop variable "lab" to "_lab" (or prefix with
underscore) to satisfy B007, and finally run ruff format to fix formatting
issues.

In `@src/cloudai/workloads/nccl_test/slurm_command_gen_strategy.py`:
- Around line 51-58: If use_deepep_matrix is true, don’t silently return None
when deepep_benchmark_root or the nccl matrix file is missing; instead fail fast
by raising a clear exception (e.g., RuntimeError/ValueError) from
_deepep_nccl_matrix_host_path that includes context (test/run id and that
use_deepep_matrix was requested) so the caller will stop the run; apply the same
change to the analogous methods referenced in the comment (the other
deepep-related path/resolution methods around lines 62-67 and 81-83, such as the
container-path resolver), and use deepep_benchmark_root and
_nccl_matrix_path_under_deepep_output in the error message to make the root
cause obvious.
- Around line 95-112: The alltoallv_perf_mpi branch only appends the fixed flags
and misses forwarding other configured NCCL options; update the branch in
slurm_command_gen_strategy.py (the block using _NCCL_TESTS_ALLTOALLV_PERF,
tdef.cmd_args and _nccl_cmd_scalar) to conditionally append the additional flags
present on tdef.cmd_args (e.g., stepfactor, nthreads, check, blocking and any
other optional fields) in the same way you handle minbytes/maxbytes/ngpus (use
_nccl_cmd_scalar where appropriate and add boolean flags when true), and keep
the existing inclusion of self.test_run.test.extra_args_str when
test.extra_cmd_args is set so TOML-configured options are not dropped.

In `@src/cloudai/workloads/ucc_test/slurm_command_gen_strategy.py`:
- Around line 55-64: If tdef.cmd_args.use_deepep_matrix is true, do not silently
return []; instead detect missing matrix by checking
deepep_benchmark_root(self.test_run) and self._deepep_ucc_matrix_host_path() and
raise a clear exception (or call fail-fast helper) when either is None and there
is no manual generation option provided (e.g. no tdef.cmd_args.gen or equivalent
flag). Update the logic in the block guarded by use_deepep_matrix (and the
analogous block later around the other check) to validate deepep_root and
matrix_host and raise an informative error mentioning the missing DeepEP matrix
and required --gen/manual generation flag rather than falling back to running
without a matrix.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 0416a4db-8f59-4498-afdb-cfa1d0fa7cce

📥 Commits

Reviewing files that changed from the base of the PR and between 2d672e1 and 12f6742.

📒 Files selected for processing (15)

conf/experimental/test/deepep_standard.toml
conf/experimental/test/nccl_test_alltoallv.toml
conf/experimental/test_scenario/deepep_with_nccl_alltoallv.toml
conf/experimental/test_scenario/deepep_with_ucc_alltoallv.toml
src/cloudai/registration.py
src/cloudai/workloads/deepep/__init__.py
src/cloudai/workloads/deepep/deepep.py
src/cloudai/workloads/deepep/deepep_combined_report.py
src/cloudai/workloads/deepep/deepep_moe_throughput_reporter.py
src/cloudai/workloads/deepep/report_generation_strategy.py
src/cloudai/workloads/deepep/slurm_command_gen_strategy.py
src/cloudai/workloads/nccl_test/nccl.py
src/cloudai/workloads/nccl_test/slurm_command_gen_strategy.py
src/cloudai/workloads/ucc_test/slurm_command_gen_strategy.py
src/cloudai/workloads/ucc_test/ucc.py

coderabbitai · 2026-05-14T14:02:36Z

+import re
+from pathlib import Path
+
+from cloudai._core.base_reporter import Reporter


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use an allowed public import path for Reporter to satisfy layering rules.

This import currently breaks the import-linter contract (cloudai.workloads → cloudai._core). Please switch to the public/allowed reporter interface module used elsewhere in workloads.

🧰 Tools

🪛 GitHub Actions: CI / Linting

[error] Ruff format check failed for this file (hook 'ruff-format' reformatted files, causing the hook to fail).

[error] import-linter contract broken: cloudai.workloads is not allowed to import cloudai._core (offending dependency: cloudai.workloads.deepep.deepep_moe_throughput_reporter -> cloudai._core.base_reporter).
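The layering rule reported above can be illustrated with a small stand-alone check. This is a minimal sketch, not import-linter itself: `find_forbidden_imports` is a hypothetical helper that parses source text and lists absolute imports of a forbidden package. The replacement path `cloudai.core` used in the example is an assumption about where the public `Reporter` is exported; confirm it against the repository before using it.

```python
import ast


def find_forbidden_imports(source: str, forbidden_prefix: str) -> list[str]:
    """Return imported module names that equal or live under forbidden_prefix."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Only absolute "from X import Y" statements are considered here.
            names = [node.module]
        else:
            continue
        for name in names:
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                hits.append(name)
    return hits


# The import flagged in this review.
flagged = "from cloudai._core.base_reporter import Reporter\n"
# Assumed public path (hypothetical); verify it in the repository.
public = "from cloudai.core import Reporter\n"

print(find_forbidden_imports(flagged, "cloudai._core"))
print(find_forbidden_imports(public, "cloudai._core"))
```

The first call reports the private module and the second reports nothing, which is the state the contract expects for files under `cloudai.workloads`.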

coderabbitai · 2026-05-14T14:02:36Z

+def _deepep_dispatch_combine_bars(test_output: Path) -> list[tuple[str, float, str]]:
+    """From latest ``results.json``: one bar per ``dispatch`` / ``combine`` row (``bus_bw_avg``)."""
+    paths = deepep_results_json_files(test_output)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve Ruff blockers in this new reporter file.

CI is red on C901 (_deepep_dispatch_combine_bars complexity), B007 (unused lab), and formatting. Please split _deepep_dispatch_combine_bars into smaller helpers, rename unused loop vars (e.g., _lab), and run ruff format.

Also applies to: 164-166

🧰 Tools

🪛 GitHub Actions: CI / 2_Linting.txt

[error] 23-25: Ruff error (C901): _deepep_dispatch_combine_bars is too complex (11 > 10).

🪛 GitHub Actions: CI / Linting

[error] 23-25: Ruff check (C901): _deepep_dispatch_combine_bars is too complex (11 > 10).

[error] Ruff format check failed for this file (hook 'ruff-format' reformatted files, causing the hook to fail).

[error] import-linter contract broken: cloudai.workloads is not allowed to import cloudai._core (offending dependency: cloudai.workloads.deepep.deepep_moe_throughput_reporter -> cloudai._core.base_reporter).


coderabbitai · 2026-05-14T14:02:36Z

+    def _deepep_nccl_matrix_host_path(self) -> Path | None:
+        tdef: NCCLTestDefinition = cast(NCCLTestDefinition, self.test_run.test)
+        if not tdef.cmd_args.use_deepep_matrix:
+            return None
+        root = deepep_benchmark_root(self.test_run)
+        if root is None:
+            return None
+        return _nccl_matrix_path_under_deepep_output(root)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when use_deepep_matrix is enabled but matrix discovery fails.

Current flow silently skips mounts/env var when the DeepEP dependency or nccl_matrix.txt is missing, so the run proceeds without the requested matrix input and can produce misleading comparison data.

💡 Suggested fix

     def _deepep_nccl_matrix_host_path(self) -> Path | None:
         tdef: NCCLTestDefinition = cast(NCCLTestDefinition, self.test_run.test)
         if not tdef.cmd_args.use_deepep_matrix:
             return None
         root = deepep_benchmark_root(self.test_run)
-        if root is None:
-            return None
-        return _nccl_matrix_path_under_deepep_output(root)
+        if root is None:
+            raise FileNotFoundError(
+                "use_deepep_matrix=true but no start_post_comp DeepEP output was found"
+            )
+        matrix = _nccl_matrix_path_under_deepep_output(root)
+        if matrix is None:
+            raise FileNotFoundError(
+                f"use_deepep_matrix=true but nccl_matrix.txt was not found under {root}"
+            )
+        return matrix

Also applies to: 62-67, 81-83
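The fail-fast behaviour can be exercised in isolation with a stand-alone sketch. `resolve_matrix` is a hypothetical function, not part of cloudai, and the recursive search with `rglob` is an assumption about how the matrix file is located under the DeepEP output directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_matrix(root: Path | None, filename: str, requested: bool) -> Path | None:
    """Return the matrix path, None when not requested, or raise when it is missing."""
    if not requested:
        return None  # matrix input was never asked for: nothing to resolve
    if root is None:
        raise FileNotFoundError("matrix requested but no DeepEP output directory was found")
    matches = sorted(root.rglob(filename))  # assumed lookup strategy
    if not matches:
        raise FileNotFoundError(f"matrix requested but {filename} was not found under {root}")
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        resolve_matrix(root, "nccl_matrix.txt", requested=True)
    except FileNotFoundError as exc:
        print(f"stopped early: {exc}")

    (root / "run0").mkdir()
    (root / "run0" / "nccl_matrix.txt").write_text("0 1\n1 0\n")
    print(resolve_matrix(root, "nccl_matrix.txt", requested=True).name)
```

Returning `None` is reserved for the case where the matrix was not requested, so callers can no longer confuse "not requested" with "requested but missing".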


coderabbitai · 2026-05-14T14:02:36Z

+        if tdef.cmd_args.subtest_name == "alltoallv_perf_mpi":
+            a = tdef.cmd_args
+            parts: List[str] = [
+                _NCCL_TESTS_ALLTOALLV_PERF,
+                "-b",
+                str(_nccl_cmd_scalar(a.minbytes)),
+                "-e",
+                str(_nccl_cmd_scalar(a.maxbytes)),
+                "-g",
+                str(_nccl_cmd_scalar(a.ngpus)),
+                "-w",
+                str(_nccl_cmd_scalar(a.warmup_iters)),
+                "-n",
+                str(_nccl_cmd_scalar(a.iters)),
+            ]
+            if self.test_run.test.extra_cmd_args:
+                parts.append(self.test_run.test.extra_args_str)
+            return parts


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dedicated alltoallv_perf_mpi path drops configured NCCL options.

This branch only forwards -b/-e/-g/-w/-n and ignores other configured fields (e.g. stepfactor, nthreads, check, blocking), so TOML values are silently ineffective for this subtest.

💡 Suggested fix

         if tdef.cmd_args.subtest_name == "alltoallv_perf_mpi":
             a = tdef.cmd_args
             parts: List[str] = [
                 _NCCL_TESTS_ALLTOALLV_PERF,
                 "-b",
                 str(_nccl_cmd_scalar(a.minbytes)),
                 "-e",
                 str(_nccl_cmd_scalar(a.maxbytes)),
                 "-g",
                 str(_nccl_cmd_scalar(a.ngpus)),
                 "-w",
                 str(_nccl_cmd_scalar(a.warmup_iters)),
                 "-n",
                 str(_nccl_cmd_scalar(a.iters)),
             ]
+            optional_flags = [
+                ("-f", a.stepfactor),
+                ("-t", a.nthreads),
+                ("-c", a.check),
+                ("-z", a.blocking),
+            ]
+            for flag, raw_value in optional_flags:
+                value = _nccl_cmd_scalar(raw_value)
+                if value is not None:
+                    parts.extend([flag, str(value)])
             if self.test_run.test.extra_cmd_args:
                 parts.append(self.test_run.test.extra_args_str)
             return parts

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        if tdef.cmd_args.subtest_name == "alltoallv_perf_mpi":
-            a = tdef.cmd_args
-            parts: List[str] = [
-                _NCCL_TESTS_ALLTOALLV_PERF,
-                "-b",
-                str(_nccl_cmd_scalar(a.minbytes)),
-                "-e",
-                str(_nccl_cmd_scalar(a.maxbytes)),
-                "-g",
-                str(_nccl_cmd_scalar(a.ngpus)),
-                "-w",
-                str(_nccl_cmd_scalar(a.warmup_iters)),
-                "-n",
-                str(_nccl_cmd_scalar(a.iters)),
-            ]
-            if self.test_run.test.extra_cmd_args:
-                parts.append(self.test_run.test.extra_args_str)
-            return parts
+        if tdef.cmd_args.subtest_name == "alltoallv_perf_mpi":
+            a = tdef.cmd_args
+            parts: List[str] = [
+                _NCCL_TESTS_ALLTOALLV_PERF,
+                "-b",
+                str(_nccl_cmd_scalar(a.minbytes)),
+                "-e",
+                str(_nccl_cmd_scalar(a.maxbytes)),
+                "-g",
+                str(_nccl_cmd_scalar(a.ngpus)),
+                "-w",
+                str(_nccl_cmd_scalar(a.warmup_iters)),
+                "-n",
+                str(_nccl_cmd_scalar(a.iters)),
+            ]
+            optional_flags = [
+                ("-f", a.stepfactor),
+                ("-t", a.nthreads),
+                ("-c", a.check),
+                ("-z", a.blocking),
+            ]
+            for flag, raw_value in optional_flags:
+                value = _nccl_cmd_scalar(raw_value)
+                if value is not None:
+                    parts.extend([flag, str(value)])
+            if self.test_run.test.extra_cmd_args:
+                parts.append(self.test_run.test.extra_args_str)
+            return parts
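The same idea can be tried outside the strategy class. The sketch below is self-contained and makes several assumptions: `Args` is a hypothetical stand-in for `NCCLCmdArgs`, `scalar` guesses that `_nccl_cmd_scalar` collapses a list to its first element, the binary name is a placeholder for `_NCCL_TESTS_ALLTOALLV_PERF`, and the default values are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


def scalar(value):
    """Collapse a list to its first element (assumed behaviour of _nccl_cmd_scalar)."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


@dataclass
class Args:
    """Hypothetical stand-in for NCCLCmdArgs; defaults are illustrative only."""

    minbytes: str | list[str] = "32M"
    maxbytes: str | list[str] = "32M"
    ngpus: int | list[int] = 1
    warmup_iters: int | list[int] = 5
    iters: int | list[int] = 20
    stepfactor: int | list[int] | None = None
    nthreads: int | list[int] | None = None
    check: int | list[int] | None = None
    blocking: int | list[int] | None = None


ALWAYS = [("-b", "minbytes"), ("-e", "maxbytes"), ("-g", "ngpus"),
          ("-w", "warmup_iters"), ("-n", "iters")]
OPTIONAL = [("-f", "stepfactor"), ("-t", "nthreads"), ("-c", "check"), ("-z", "blocking")]


def build_command(binary: str, args: Args, extra: str = "") -> list[str]:
    """Emit the fixed flags, then every optional flag that has a value, then extras."""
    parts = [binary]
    for flag, field in ALWAYS:
        parts += [flag, str(scalar(getattr(args, field)))]
    for flag, field in OPTIONAL:
        value = scalar(getattr(args, field))
        if value is not None:  # unset options are left to the binary's defaults
            parts += [flag, str(value)]
    if extra:
        parts.append(extra)
    return parts


# "alltoallv_perf_mpi" is a placeholder binary name for this example.
cmd = build_command("alltoallv_perf_mpi", Args(minbytes=["1M", "2M"], maxbytes="1G", stepfactor=2))
print(" ".join(cmd))
```

Driving both the fixed and the optional flags from tables keeps the branch count flat as more options are forwarded.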


coderabbitai · 2026-05-14T14:02:36Z

+        if not tdef.cmd_args.use_deepep_matrix:
+            return []
+
+        deepep_root = deepep_benchmark_root(self.test_run)
+        if deepep_root is None:
+            return []
+
+        matrix_host = self._deepep_ucc_matrix_host_path()
+        if matrix_host is None:
+            return []


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when use_deepep_matrix is enabled but the matrix is missing.

Current flow silently falls back (no mount + no --gen) and can run a non-matrix UCC test while the scenario claims DeepEP-matrix-driven comparison. Please raise an explicit error when matrix resolution fails and no manual gen is provided.

Suggested fix

     def _container_mounts(self) -> List[str]:
         tdef: UCCTestDefinition = cast(UCCTestDefinition, self.test_run.test)
         if not tdef.cmd_args.use_deepep_matrix:
             return []
@@
         matrix_host = self._deepep_ucc_matrix_host_path()
-        if matrix_host is None:
-            return []
+        if matrix_host is None:
+            raise FileNotFoundError(
+                "use_deepep_matrix=true but no ucc_matrix.txt was found in DeepEP outputs"
+            )
@@
     def generate_test_command(self) -> List[str]:
@@
-        elif self._deepep_ucc_matrix_host_path() is not None:
-            srun_command_parts.append(f"--gen file:name={_UCC_GEN_MATRIX_CONTAINER}")
+        elif tdef_cmd_args.use_deepep_matrix:
+            matrix_host = self._deepep_ucc_matrix_host_path()
+            if matrix_host is None:
+                raise FileNotFoundError(
+                    "use_deepep_matrix=true but no ucc_matrix.txt was found in DeepEP outputs"
+                )
+            srun_command_parts.append(f"--gen file:name={_UCC_GEN_MATRIX_CONTAINER}")

Also applies to: 85-86
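The intended precedence (manual generator first, then the DeepEP matrix, otherwise nothing) can be sketched as a pure function. `gen_argument` and its parameters are hypothetical, the container path in the example is a placeholder for `_UCC_GEN_MATRIX_CONTAINER`, and treating a manual `gen` value as taking priority is an assumption drawn from the `elif` in the suggested fix.

```python
from __future__ import annotations

from pathlib import Path


def gen_argument(
    manual_gen: str | None,
    use_deepep_matrix: bool,
    matrix_host: Path | None,
    container_path: str,
) -> list[str]:
    """Return the --gen argument as a one-element list, or an empty list."""
    if manual_gen:
        # A manually configured generator wins (assumed precedence).
        return [f"--gen {manual_gen}"]
    if not use_deepep_matrix:
        return []
    if matrix_host is None:
        # The matrix was requested but could not be resolved: stop instead of
        # silently running a non-matrix test.
        raise FileNotFoundError(
            "use_deepep_matrix=true but no ucc_matrix.txt was found in DeepEP outputs"
        )
    return [f"--gen file:name={container_path}"]


# Placeholder paths for illustration only.
print(gen_argument(None, True, Path("/out/ucc_matrix.txt"), "/deepep/ucc_matrix.txt"))
print(gen_argument(None, False, None, "/deepep/ucc_matrix.txt"))
```

Keeping the decision in one function means the mount logic and the command logic cannot disagree about whether a matrix is in use.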


podkidyshev · 2026-05-15T13:11:07Z

please resolve coderabbit comments and make CI pass before review

ping me again directly once it's done please

ybenvidia and others added 6 commits March 5, 2026 13:09

add tuning option

e912a9c

Merge branch 'NVIDIA:main' into dp-benchmark

aee7ebb

Merge branch 'NVIDIA:main' into dp-benchmark

ed01ea8

Merge branch 'NVIDIA:main' into dp-benchmark

8b029c9

add ucc/nccl all2allv

1b85a6e

Merge branch 'NVIDIA:main' into dp-benchmark

12f6742

ybenvidia requested review from jeffnvidia, podkidyshev and srivatsankrishnan as code owners May 14, 2026 13:58

ybenvidia wants to merge 6 commits into NVIDIA:main from ybenvidia:main