bug: paused-at-gate sessions stuck at status='new' instead of transitioning to 'awaiting_input' #42

@aksOps

Summary

When a session pauses at a HITL gate (gate_fired event), the session row's status column stays at 'new' instead of transitioning to 'awaiting_input'. The graph IS paused correctly, but the persisted status doesn't reflect that — UIs that filter by status='awaiting_input' (like the approvals queue) miss it.

This is the sibling case of CRITICAL #1 (finalizer asymmetry) closed in v2.0.0-rc3 / PR #41. The rc3 fix correctly handles graph-completed-without-pause via _finalize_session_status_async(). The paused branch was left untouched and the status-write that should accompany gate_fired is missing.

Reproduction (live, against rc3 backend on main @ 8be7ea2)

# Backend on uvicorn :37777 serving SPA + API, Ollama gpt-oss provider.
SID=$(curl -sf -X POST https://clm.randomcodespace.dev/api/v1/sessions \
  -H "Content-Type: application/json" \
  -d '{"query":"rc3 agent_running smoke","environment":"dev","submitter":{"id":"t"}}' \
  | jq -r .session_id)
# Wait ~15s for the LLM to drive the graph through to the gate.
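# Parenthesize the count: in jq, ',' binds tighter than '|', so without the parentheses 'length' would also be applied to the status string.
curl -sf "https://clm.randomcodespace.dev/api/v1/sessions/$SID/full" | jq '.session.status, ([.events[] | select(.kind == "gate_fired")] | length)'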

Observed for INC-20260516-061:

  • session.status → 'new'
  • events[].kind == 'gate_fired' → 1 occurrence (local_remediation:apply_fix, reason high_risk)
  • agents_run → 2 entries (triage, deep_investigator), resolution started but didn't finish
  • Event log clearly shows the graph paused at the gate

INC-20260516-058 (from earlier today, pre-PR #41 backend) has the same symptom — this is pre-existing, not introduced by rc3.

Expected behavior

When the gateway emits gate_fired, the session row's status should be updated to 'awaiting_input', and a session.status_changed event with to='awaiting_input' should fire.

Suggested fix

Likely lives next to where gate_fired is emitted — either in src/runtime/tools/gateway.py (where the gate decision lands) or where the graph reads the gate signal in src/runtime/graph.py. Add a paired event_log.record(sid, "session.status_changed", payload={"from": ..., "to": "awaiting_input"}) AND a SessionStore.update_status(sid, "awaiting_input") write, both in the same transaction as the audit row.
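A minimal sketch of the paired write, assuming the session row and the event log live in the same SQL database. The table names (sessions, events), their columns, and the helper mark_awaiting_input are hypothetical and only illustrate the "same transaction" requirement; the real SessionStore and event_log APIs may look different.

```python
import json
import sqlite3


def mark_awaiting_input(conn: sqlite3.Connection, sid: str) -> bool:
    """Flip a session to 'awaiting_input' and log the change atomically.

    Returns True if the status changed, False if the session is missing
    or already in 'awaiting_input'.
    """
    with conn:  # commits on success, rolls back if any statement raises
        row = conn.execute(
            "SELECT status FROM sessions WHERE id = ?", (sid,)
        ).fetchone()
        if row is None or row[0] == "awaiting_input":
            return False
        conn.execute(
            "UPDATE sessions SET status = 'awaiting_input' WHERE id = ?", (sid,)
        )
        # The event row is written in the same transaction as the status row,
        # so readers never see one without the other.
        conn.execute(
            "INSERT INTO events (session_id, kind, payload) VALUES (?, ?, ?)",
            (
                sid,
                "session.status_changed",
                json.dumps({"from": row[0], "to": "awaiting_input"}),
            ),
        )
    return True


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE events (session_id TEXT, kind TEXT, payload TEXT)")
    conn.execute("INSERT INTO sessions VALUES ('INC-DEMO', 'new')")
    conn.commit()
    return conn
```

Returning False when the row is already 'awaiting_input' keeps the write idempotent, so a gate that fires twice would not produce a duplicate status_changed event.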

Reuse the same async-finalizer machinery from rc3 (_finalize_session_status_async) but with the paused branch taken explicitly: when is_graph_paused returns True after ainvoke, set status to awaiting_input.
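The branch could look roughly like the sketch below. Everything here is a stand-in: FakeGraph, the dict-based status store, the list-based event log, and the finalize_completed callback are hypothetical, and the real is_graph_paused may inspect checkpoint state rather than an attribute. Only the control flow (check for a pause after ainvoke, otherwise defer to the existing completed-path finalizer) is the point.

```python
import asyncio


class FakeGraph:
    """Hypothetical graph double: `pause` controls whether ainvoke stops at a gate."""

    def __init__(self, pause: bool):
        self._pause = pause
        self.paused = False

    async def ainvoke(self, state: dict) -> dict:
        self.paused = self._pause
        return state


def is_graph_paused(graph) -> bool:
    # Assumed shape; the real helper may read the checkpointer instead.
    return graph.paused


async def invoke_and_finalize(sid, graph, statuses, events, finalize_completed):
    """Run the graph, then take the paused or the completed finalizer branch."""
    await graph.ainvoke({"session_id": sid})
    if is_graph_paused(graph):
        previous = statuses.get(sid, "new")
        if previous != "awaiting_input":  # idempotent if the gate fires again
            statuses[sid] = "awaiting_input"
            events.append(
                (sid, "session.status_changed",
                 {"from": previous, "to": "awaiting_input"})
            )
        return statuses[sid]
    # Not paused: hand off to the existing completed-path finalizer.
    return await finalize_completed(sid)
```

Passing the completed-path finalizer in as a callback keeps the sketch from guessing which terminal status rc3 writes.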

Regression test

In tests/test_finalizer_paths.py add a case where the fake graph pauses at a gate (interrupt). Assert:

  1. store.load(sid).status == 'awaiting_input'
  2. event_log.iter_for(sid) contains a session.status_changed with to='awaiting_input'
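A possible shape for that test case, written with in-memory fakes so it runs on its own. FakeStore, FakeEventLog, PausingGraph, and run_session are all hypothetical; run_session in particular is a placeholder for the real entry point and should be swapped for the actual runtime call, at which point the test is expected to fail until the fix lands.

```python
import asyncio
from types import SimpleNamespace


class FakeStore:
    def __init__(self):
        self._rows = {}

    def create(self, sid):
        self._rows[sid] = SimpleNamespace(status="new")

    def load(self, sid):
        return self._rows[sid]

    def update_status(self, sid, status):
        self._rows[sid].status = status


class FakeEventLog:
    def __init__(self):
        self._events = []

    def record(self, sid, kind, payload=None):
        self._events.append(SimpleNamespace(sid=sid, kind=kind, payload=payload or {}))

    def iter_for(self, sid):
        return (e for e in self._events if e.sid == sid)


class PausingGraph:
    """Fake graph that stops at a gate instead of running to completion."""

    def __init__(self, event_log):
        self.paused = False
        self._event_log = event_log

    async def ainvoke(self, state):
        self._event_log.record(state["session_id"], "gate_fired", {"reason": "high_risk"})
        self.paused = True


async def run_session(sid, graph, store, event_log):
    # Placeholder for the code under test; replace with the real runtime call.
    await graph.ainvoke({"session_id": sid})
    if graph.paused:
        previous = store.load(sid).status
        store.update_status(sid, "awaiting_input")
        event_log.record(sid, "session.status_changed",
                         {"from": previous, "to": "awaiting_input"})


def test_paused_at_gate_sets_awaiting_input():
    store, event_log = FakeStore(), FakeEventLog()
    store.create("INC-TEST")
    asyncio.run(run_session("INC-TEST", PausingGraph(event_log), store, event_log))

    assert store.load("INC-TEST").status == "awaiting_input"
    changes = [e for e in event_log.iter_for("INC-TEST")
               if e.kind == "session.status_changed"]
    assert [e.payload["to"] for e in changes] == ["awaiting_input"]
```

Asserting on the full list of status_changed targets, rather than mere membership, also catches a duplicate event being emitted for a single pause.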

Touches v2.1 scope

This is the obvious next item after the v2.0.0-rc3 audit fixes. Tagging for the v2.0.0-rc4 / v2.0.0 GA cleanup pass.
