feat: v2.0.0-rc3 — fix audit findings (finalizer, state_overrides, idempotency + 6 important)#41

Merged
aksOps merged 1 commit into main from feat/v2-rc4-audit-fixes
May 16, 2026

aksOps commented May 16, 2026

Findings Fixed

  1. CRITICAL #1 - Background and non-streaming runs now finalize terminal status.

    • src/runtime/orchestrator.py:1295
    • src/runtime/service.py:563
    • tests/test_finalizer_paths.py:107
  2. CRITICAL #2 - state_overrides persist beyond environment.

    • src/runtime/storage/session_store.py:208
    • src/runtime/orchestrator.py:1263
    • src/runtime/service.py:425
    • examples/code_review/state.py:22
    • tests/test_state_overrides_schema.py:334
  3. CRITICAL #3 - Webhook idempotency uses atomic reservation before starting sessions.

    • src/runtime/triggers/idempotency.py:188
    • src/runtime/triggers/registry.py:258
    • tests/test_triggers/test_registry.py:134
  4. IMPORTANT #1 - Recent-events SSE emits React-compatible session summary payloads.

    • src/runtime/api_recent_events.py:27
    • tests/test_api_recent_events.py:15
  5. IMPORTANT #2 - Backend emits session.agent_running start and terminal clear events.

    • src/runtime/graph.py:748
    • src/runtime/orchestrator.py:307
    • tests/test_agent_node.py:80
    • tests/test_status_change_telemetry.py:103
  6. IMPORTANT #3 - Per-session reducer handles real backend delta shapes.

    • web/src/state/sessionReducer.ts:84
    • web/tests/unit/sessionReducer.test.ts:46
  7. IMPORTANT #4 - Vector write-through failures no longer bubble after SQL commit.

    • src/runtime/storage/session_store.py:449
    • tests/test_storage_vector.py:98
  8. IMPORTANT #5 - Generic session start accepts state_overrides and rejects environment conflicts.

    • src/runtime/api.py:123
    • src/runtime/api.py:533
    • web/src/api/types.ts:67
    • web/src/modals/NewSessionModal.tsx:4
    • tests/test_api.py:222
  9. IMPORTANT #6 - TS/Python literal drift is corrected and guarded by parity tests.

    • src/runtime/state.py:25
    • web/src/api/types.ts:8
    • tests/test_type_literal_parity.py:45
  10. STYLE #1 - CORS config is wired through cfg.api.

    • src/runtime/config.py:783
    • src/runtime/api.py:366
  11. STYLE #2 - Unknown-route envelope test now probes an API path.

    • tests/test_api_react_surface.py:487
  12. STYLE #3 - Web test warning noise was not addressed in this PR.

    • Follow-up: Radix Dialog Description, React act(...), Select read-only, and NO_COLOR warning cleanup.
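
Item 3 in the list above describes reserving the idempotency key atomically before a session is started. The block below is a minimal sketch of that pattern, not the code in src/runtime/triggers/idempotency.py: it assumes a SQLite table with a primary-key constraint, and the names `idempotency_keys`, `reserve_key`, `handle_webhook`, and `make_conn` are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def make_conn() -> sqlite3.Connection:
    """In-memory store for the sketch; the table name is an assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, session_id TEXT)"
    )
    return conn


def reserve_key(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if this caller won the reservation, False on a duplicate.

    The PRIMARY KEY constraint makes the insert atomic: of two concurrent
    deliveries with the same key, only one INSERT can succeed.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_keys (key, session_id) VALUES (?, NULL)",
                (key,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def handle_webhook(
    conn: sqlite3.Connection, key: str, start_session: Callable[[], str]
) -> Optional[str]:
    """Reserve first, start the session only if the reservation was won."""
    if not reserve_key(conn, key):
        row = conn.execute(
            "SELECT session_id FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        # May be None if the first delivery has reserved but not yet started.
        return row[0]
    session_id = start_session()
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET session_id = ? WHERE key = ?",
            (session_id, key),
        )
    return session_id
```

Reserving before starting means a duplicate delivery can never trigger a second session, at the cost of a short window in which the duplicate sees a reservation with no session id yet.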

Verification

  • ASR_WEB_DIST=/tmp/empty uv run pytest -x --no-cov - 1345 passed, 8 skipped
  • uv run ruff check src/ tests/ - passed
  • uv run pyright src/runtime - 0 errors
  • cd web && npm run typecheck && npm run lint && npm run test:unit && npm run build && npm run check:size - passed
  • uv run python scripts/build_single_file.py - regenerated dist/ bundles

Notes

  • Left pre-existing local config/config.yaml and .superpowers/ unstaged.
  • Web lint still reports two existing Fast Refresh warnings in web/src/state/selectedRef.tsx; tests still emit the known dialog/act/select warning noise.


aksOps merged commit 8be7ea2 into main May 16, 2026
10 checks passed
aksOps deleted the feat/v2-rc4-audit-fixes branch May 16, 2026 16:17
aksOps added a commit that referenced this pull request May 16, 2026
#45)

The v2.0.0-rc3 finalizer-asymmetry fix (PR #41) correctly handled
graph-completed-without-pause: every non-streaming run now calls
_finalize_session_status_async() to flip the row to a terminal status.

But the *paused* branch was left as a status-write no-op: when the
graph paused at a HITL gate, the session row stayed at its pre-pause
status ('new' / 'in_progress') instead of transitioning to
'awaiting_input'. UIs that filter by status='awaiting_input' (the
approvals queue, the sessions rail Active group, /sessions in-flight
listing) missed paused sessions entirely.

This was the sibling case of CRITICAL #1 and was filed as #42 after
the rc3 ship.

The fix adds a new lock-guarded helper that mirrors
_finalize_session_status_async on the paused side:

    Orchestrator._mark_session_paused_async(session_id) -> str | None

It loads the row, captures the from-status, flips status to
'awaiting_input', saves, and emits both status_changed (per-session
event log) and session.status_changed (cross-session SSE) events via
the existing _emit_status_changed_event helper. It is a no-op when
the row is already at 'awaiting_input', already at a terminal status
(guard against a late paused-write unwinding a finalize that landed
in between), or the row is missing.
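
The guard logic described above can be sketched as follows. This is an illustration under stated assumptions, not the Orchestrator method itself: an in-memory dict stands in for the session store, a list stands in for `_emit_status_changed_event`, the terminal status names are guesses, and returning the new status (or None on a no-op) is one possible reading of the `str | None` return type.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field

# Assumption: the real set of terminal statuses is not given in the source.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


@dataclass
class PausedMarker:
    """Hypothetical stand-in for the orchestrator's paused-status helper."""

    rows: dict[str, str]  # session_id -> status
    events: list[tuple[str, str, str, str]] = field(default_factory=list)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def mark_session_paused(self, session_id: str) -> str | None:
        async with self._lock:  # serialize against concurrent status writes
            from_status = self.rows.get(session_id)
            if from_status is None:
                return None  # row is missing
            if from_status == "awaiting_input":
                return None  # already paused: no duplicate events
            if from_status in TERMINAL_STATUSES:
                return None  # never unwind a finalize that landed first
            self.rows[session_id] = "awaiting_input"
            # Emit both the per-session and the cross-session event.
            for kind in ("status_changed", "session.status_changed"):
                self.events.append(
                    (kind, session_id, from_status, "awaiting_input")
                )
            return "awaiting_input"
```

Checking the terminal statuses under the same lock as the write is what keeps a late paused-write from overwriting a finalize that completed in between.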

Four call sites now use it: the start_session, stream_session, and
retry_session paths in orchestrator.py, and the
OrchestratorService._run() background task in service.py.

Regression coverage in tests/test_finalizer_paths.py:
  - _PausedAtGateGraph fake graph that returns from ainvoke and
    reports a non-empty `next` tuple to aget_state (mirrors langgraph
    1.x's interrupt-boundary state).
  - test_orchestrator_start_session_marks_paused_run_awaiting_input:
    direct (non-streaming) path produces status='awaiting_input' on
    the row AND emits status_changed + session.status_changed events.
  - test_service_background_start_session_marks_paused_run_awaiting_input:
    same guarantee through the OrchestratorService._run() task.
  - test_mark_session_paused_is_no_op_when_already_awaiting_input:
    no spurious event emission.
  - test_mark_session_paused_is_no_op_on_terminal_status: guard
    against a late paused-write unwinding a completed terminal flip.
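
For readers unfamiliar with the fake-graph approach, a test double in the spirit of `_PausedAtGateGraph` might look like the sketch below. It is not the fixture from tests/test_finalizer_paths.py: the class name, the `hitl_gate` node name, and the `run_is_paused` helper are hypothetical, and the snapshot is a plain namespace rather than a real langgraph state object.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Optional


class PausedAtGateGraph:
    """Fake graph: ainvoke returns normally, but work is still pending."""

    async def ainvoke(self, state: dict, config: Optional[dict] = None) -> dict:
        return state  # returning without raising mimics stopping at an interrupt

    async def aget_state(self, config: Optional[dict]) -> Any:
        # A non-empty `next` tuple signals nodes that have not run yet.
        return SimpleNamespace(values={}, next=("hitl_gate",))


class CompletedGraph(PausedAtGateGraph):
    """Counterpart fake: nothing left to run after ainvoke."""

    async def aget_state(self, config: Optional[dict]) -> Any:
        return SimpleNamespace(values={}, next=())


async def run_is_paused(graph: Any, config: Optional[dict] = None) -> bool:
    """Run the graph, then decide paused vs. finished from the snapshot."""
    await graph.ainvoke({}, config)
    snapshot = await graph.aget_state(config)
    return bool(snapshot.next)
```

A caller would take the paused branch when `run_is_paused` is true and the finalize branch otherwise, which is the split the regression tests above exercise.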

Two pre-existing shim tests updated to stub the new method:
  - tests/test_retry_session_locked_post_policy.py _StubOrch
  - tests/test_triggers/test_orchestrator_trigger_kwarg.py
    (orchestrator built via __new__, bypasses __init__)

Verified:
  - ASR_WEB_DIST=/tmp/empty uv run pytest --no-cov -> 1349 passed, 8 skipped
  - uv run ruff check src/ tests/ -> clean
  - uv run pyright src/runtime -> clean
  - cd web && npm run typecheck/lint/test:unit/build/check:size -> green
  - dist/* regenerated (HARD-08)