[Configurator] Add GymnasiumAdapter for CloudAI envs#894
Conversation
📝 WalkthroughWalkthroughAdds GymnasiumAdapter wrapping BaseGym to expose Gymnasium-style action/observation spaces and APIs; makes CloudAIGym observation size metric-driven; exports the adapter; adds optional gymnasium dev/rl extras and a comprehensive pytest suite for the adapter. ChangesGymnasium adapter integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/gymnasium_adapter.py`:
- Around line 92-112: decode_action and step_raw currently allow partial or
unexpected parameter sets which can silently produce incorrect runs; update
decode_action, step, and step_raw to validate that the final parameter keys
exactly match the expected set (the union of self._tunable_params.keys() and
self._fixed_params.keys()) before calling _step_with_params, and raise a clear
exception (e.g., ValueError) listing missing or unexpected keys if validation
fails so the caller fails fast; enforce the same check when step builds params
from decode_action and when step_raw is passed params directly, and reference
the symbols decode_action, step, step_raw, self._tunable_params,
self._fixed_params, and _step_with_params to locate the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 9474faf6-0f97-42f8-82a0-1e23932471bc
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
pyproject.tomlsrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_gymnasium_adapter.py
- GymnasiumAdapter wraps a CloudAI BaseGym as a gymnasium-compatible env: Dict of Discrete actions over the tunable params, float32 Box observation derived from define_observation_space(), gymnasium 5-tuple from step/step_raw. - Fixed (single-value) params are injected on every step so agents only ever see the tunable subset; step_raw bypasses index decoding and fixed injection for callers that already have a fully-formed parameter dict. - The adapter mirrors handle_dse_job's 1-based test_run.step so artifact paths match whether the env is driven by the DSE loop or an external training loop. - CloudAIGymEnv.define_observation_space() now returns one slot per agent metric (at least one), giving adapters the correct Box shape. - gymnasium is an optional dependency: lazy-imported inside the adapter and exposed via a new rl extra (also added to dev) pinned to ~=1.2.
decode_action used to accept partial action dicts and step_raw accepted arbitrary key sets, both silently propagating incomplete params to the underlying env. Validate up front: - decode_action requires keys exactly == self._tunable_params, and each discrete index must be in range. - step_raw requires keys exactly == self._tunable_params | self._fixed_params. A new _assert_keys helper raises ValueError with sorted missing / extra lists so callers see exactly what was wrong. Adds tests for missing key, unknown key, out-of-range index, missing fixed param, and unknown raw key paths.
9937ef3 to
78b1bb3
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 79-82: The constraint-failure branch returns a hardcoded
single-element observation that violates the declared observation shape; update
the branch to return the same-width list as the normal case by using the
metric-derived width (use max(len(self.test_run.test.agent_metrics), 1)) instead
of a single 0.0 so that both the regular return (currently using [0.0] *
max(len(self.test_run.test.agent_metrics), 1)) and the constraint-failure branch
produce identically shaped observations.
In `@tests/test_gymnasium_adapter.py`:
- Around line 209-215: The tests assert on ValueError messages using exact list
formatting (e.g., match=r"missing=\['param_b'\]"), which is brittle; update the
assertions in test_decode_action_missing_key and
test_decode_action_rejects_unknown_keys to use more focused regexes that only
check for the presence of the key names (e.g., look for "param_b" and "bogus")
or words like "missing" / "extra" rather than exact list punctuation; adjust the
two places mentioned (the current block around adapter.decode_action({"param_a":
0}) and the block for adapter.decode_action({"param_a": 0, "param_b": 1,
"bogus": 0}) and the similar assertions at lines 226-234) to use these looser
key-focused regexes so tests validate behavior without depending on formatting.
- Around line 131-136: Remove the test's assertion on the private attribute
adapter._fixed_params and instead verify public behavior: keep the assertion
that adapter.action_space.spaces equals {"param_a","param_b"}, then call
GymnasiumAdapter.step (using the _FixedParamGym instance which records received
actions) with an action containing only "param_a"/"param_b" and assert the
environment received/processed an action that includes the fixed parameter (e.g.
the recorded last_action or returned observation/info shows "fixed_param": 42);
this exercises action-space exclusion plus the merged action behavior without
relying on the private _fixed_params field.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: bf1b3854-aa01-4f65-8aeb-7699901ac890
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
pyproject.tomlsrc/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/configurator/gymnasium_adapter.pytests/test_gymnasium_adapter.py
| list: One float slot per agent metric (at least one), giving the correct shape | ||
| for adapters that derive ``gymnasium.spaces.Box`` from this output. | ||
| """ | ||
| return [0.0] | ||
| return [0.0] * max(len(self.test_run.test.agent_metrics), 1) |
There was a problem hiding this comment.
Return constraint-failure observations with the declared shape.
Line 82 makes observation width metric-driven, but the constraint-failure branch still returns a hardcoded single-element observation (Line 143). That breaks the fixed observation-shape contract expected by Gymnasium consumers.
Suggested fix
- return [-1.0], self.rewards.constraint_failure, True, {}
+ fallback_obs = [-1.0] * len(self.define_observation_space())
+ return fallback_obs, self.rewards.constraint_failure, True, {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cloudai/configurator/cloudai_gym.py` around lines 79 - 82, The
constraint-failure branch returns a hardcoded single-element observation that
violates the declared observation shape; update the branch to return the
same-width list as the normal case by using the metric-derived width (use
max(len(self.test_run.test.agent_metrics), 1)) instead of a single 0.0 so that
both the regular return (currently using [0.0] *
max(len(self.test_run.test.agent_metrics), 1)) and the constraint-failure branch
produce identically shaped observations.
| def test_single_value_params_are_excluded_from_action_space() -> None: | ||
| adapter = GymnasiumAdapter(_FixedParamGym()) | ||
|
|
||
| assert set(adapter.action_space.spaces) == {"param_a", "param_b"} | ||
| assert adapter._fixed_params == {"fixed_param": 42} | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Avoid asserting private adapter internals in tests.
Line 135 couples the test to adapter._fixed_params, which is implementation detail. Assert only public behavior (action-space exclusion + merged action on step) to keep tests stable across refactors.
Suggested diff
def test_single_value_params_are_excluded_from_action_space() -> None:
adapter = GymnasiumAdapter(_FixedParamGym())
assert set(adapter.action_space.spaces) == {"param_a", "param_b"}
- assert adapter._fixed_params == {"fixed_param": 42}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_single_value_params_are_excluded_from_action_space() -> None: | |
| adapter = GymnasiumAdapter(_FixedParamGym()) | |
| assert set(adapter.action_space.spaces) == {"param_a", "param_b"} | |
| assert adapter._fixed_params == {"fixed_param": 42} | |
| def test_single_value_params_are_excluded_from_action_space() -> None: | |
| adapter = GymnasiumAdapter(_FixedParamGym()) | |
| assert set(adapter.action_space.spaces) == {"param_a", "param_b"} | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_gymnasium_adapter.py` around lines 131 - 136, Remove the test's
assertion on the private attribute adapter._fixed_params and instead verify
public behavior: keep the assertion that adapter.action_space.spaces equals
{"param_a","param_b"}, then call GymnasiumAdapter.step (using the _FixedParamGym
instance which records received actions) with an action containing only
"param_a"/"param_b" and assert the environment received/processed an action that
includes the fixed parameter (e.g. the recorded last_action or returned
observation/info shows "fixed_param": 42); this exercises action-space exclusion
plus the merged action behavior without relying on the private _fixed_params
field.
| with pytest.raises(ValueError, match=r"missing=\['param_b'\]"): | ||
| adapter.decode_action({"param_a": 0}) | ||
|
|
||
|
|
||
| def test_decode_action_rejects_unknown_keys(adapter: GymnasiumAdapter) -> None: | ||
| with pytest.raises(ValueError, match=r"extra=\['bogus'\]"): | ||
| adapter.decode_action({"param_a": 0, "param_b": 1, "bogus": 0}) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Relax exception-message matching to avoid brittle formatting dependencies.
These assertions currently depend on exact list representation/order in error strings. Prefer key-focused regex so tests validate behavior, not message formatting details.
Suggested diff
def test_decode_action_rejects_missing_keys(adapter: GymnasiumAdapter) -> None:
- with pytest.raises(ValueError, match=r"missing=\['param_b'\]"):
+ with pytest.raises(ValueError, match=r"missing=.*param_b"):
adapter.decode_action({"param_a": 0})
def test_decode_action_rejects_unknown_keys(adapter: GymnasiumAdapter) -> None:
- with pytest.raises(ValueError, match=r"extra=\['bogus'\]"):
+ with pytest.raises(ValueError, match=r"extra=.*bogus"):
adapter.decode_action({"param_a": 0, "param_b": 1, "bogus": 0})
@@
def test_step_raw_rejects_missing_fixed_param() -> None:
@@
- with pytest.raises(ValueError, match=r"missing=\['fixed_param'\]"):
+ with pytest.raises(ValueError, match=r"missing=.*fixed_param"):
adapter.step_raw({"param_a": 1, "param_b": 10})
@@
def test_step_raw_rejects_unknown_keys() -> None:
@@
- with pytest.raises(ValueError, match=r"extra=\['bogus'\]"):
+ with pytest.raises(ValueError, match=r"extra=.*bogus"):
adapter.step_raw({"param_a": 1, "param_b": 10, "fixed_param": 42, "bogus": 0})Also applies to: 226-234
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_gymnasium_adapter.py` around lines 209 - 215, The tests assert on
ValueError messages using exact list formatting (e.g.,
match=r"missing=\['param_b'\]"), which is brittle; update the assertions in
test_decode_action_missing_key and test_decode_action_rejects_unknown_keys to
use more focused regexes that only check for the presence of the key names
(e.g., look for "param_b" and "bogus") or words like "missing" / "extra" rather
than exact list punctuation; adjust the two places mentioned (the current block
around adapter.decode_action({"param_a": 0}) and the block for
adapter.decode_action({"param_a": 0, "param_b": 1, "bogus": 0}) and the similar
assertions at lines 226-234) to use these looser key-focused regexes so tests
validate behavior without depending on formatting.
|
|
||
| from .base_gym import BaseGym | ||
|
|
||
| _GYMNASIUM_INSTALL_HINT = "gymnasium is required for GymnasiumAdapter. Install it with: pip install gymnasium" |
There was a problem hiding this comment.
| _GYMNASIUM_INSTALL_HINT = "gymnasium is required for GymnasiumAdapter. Install it with: pip install gymnasium" | |
| _GYMNASIUM_INSTALL_HINT = "gymnasium is required for GymnasiumAdapter. Install it with: uv sync --extra rl" |
| _GYMNASIUM_INSTALL_HINT = "gymnasium is required for GymnasiumAdapter. Install it with: pip install gymnasium" | ||
|
|
||
|
|
||
| def _import_gymnasium(): |
There was a problem hiding this comment.
please introduce/move it into src/cloudai/util/lazy_imports.py. numpy is already there, just add gymnasium and maybe optionally spaces as a separate property
| def __init__(self, env: BaseGym) -> None: | ||
| _, spaces, np = _import_gymnasium() | ||
|
|
||
| self._np = np |
There was a problem hiding this comment.
please do not keep module as a variable, use it when needed from lazy.np.<smth> with from cloudai.util.lazy_imports import lazy
| """ | ||
| test_run = getattr(self._env, "test_run", None) | ||
| if test_run is not None: | ||
| test_run.step = self._step_count + 1 |
There was a problem hiding this comment.
who's responsible for steps budget now? for standard argents it's handle_dse_job loop with agent.max_steps
There was a problem hiding this comment.
apparently this is the only places that anyhow touches cloudai internals. if we solve it somehow - I suggest the adapter is to be put not in cloudai but in the private repo.
if the only consumer is our private RL agents, I’d prefer this live with those agents, while CloudAI exposes only the minimal env/execution hooks they need. Moving it to core makes Gymnasium compatibility look like a supported CloudAI API, but we do not yet have the integration tests or API ownership story for that
GymnasiumAdapterwraps a CloudAIBaseGymas agymnasium-compatible env:DictofDiscreteactions over the tunable params,float32Boxobservation derived fromdefine_observation_space(), gymnasium 5-tuple fromstep/step_raw.step_rawbypasses index decoding and fixed injection for callers that already have a fully-formed parameter dict.handle_dse_job's 1-basedtest_run.stepso artifact paths match whether the env is driven by the DSE loop or by an external training loop wrapping the adapter.CloudAIGymEnv.define_observation_space()now returns one slot per agent metric (at least one), giving adapters the correctBoxshape.gymnasiumis an optional dependency: lazy-imported inside the adapter and exposed via a newrlextra (also added todev) pinned to~=1.2.