
fix(0.6.1): pipeline firstMessage barge-in aborts TTS stream + unblocks LLM #100

Open
nicolotognoni wants to merge 1 commit into feat/observability-otel-attrs-0.6.1 from fix/0.6.1-pipeline-firstmessage-bargein-tts-reset


Summary

  • Fixes a critical pipeline-mode bug where interrupting the agent during the firstMessage left the call silent. cancelSpeaking flipped isSpeaking to false, but the for await over tts.synthesizeStream stayed suspended on the provider WS until FRAME_TIMEOUT_MS expired (30 s on ElevenLabs WS), so no subsequent LLM turn was dispatched even though Deepgram kept transcribing.
  • Adds an abort signal raced against the iterator's next() / __anext__ plus an optional TTSAdapter.cancel() hook that closes the in-flight provider WS, so the firstMessage loop exits within one event-loop tick on barge-in.
  • Full parity Python ↔ TypeScript: firstMessageAbort / _first_message_abort, activeSocket / _active_socket, cancel() on both ElevenLabsWebSocketTTS implementations.
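
The race described above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in stream_handler.py: the helper name `consume_until_abort` and the `on_abort` callback are hypothetical, and the real handler sends frames to the call instead of collecting them in a list.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional


async def consume_until_abort(
    agen: AsyncIterator[bytes],
    abort: asyncio.Event,
    on_abort: Optional[Callable[[], Awaitable[None]]] = None,
) -> List[bytes]:
    """Collect frames from agen, exiting as soon as abort is set (hypothetical helper)."""
    frames: List[bytes] = []
    abort_task = asyncio.ensure_future(abort.wait())
    try:
        while True:
            # Request the next frame as a task so it can be raced against the abort.
            next_task = asyncio.ensure_future(agen.__anext__())
            done, _ = await asyncio.wait(
                {next_task, abort_task}, return_when=asyncio.FIRST_COMPLETED
            )
            if abort_task in done:
                # Barge-in won the race: cancel the pending frame request and
                # wait for it to settle so aclose() does not hit a running generator.
                next_task.cancel()
                await asyncio.gather(next_task, return_exceptions=True)
                if on_abort is not None:
                    await on_abort()
                break
            try:
                frames.append(next_task.result())
            except StopAsyncIteration:
                break  # the stream ended normally
    finally:
        abort_task.cancel()
        # Always drive the generator's finally block, as the PR describes.
        await agen.aclose()
    return frames
```

The pending `__anext__` call has to settle before `aclose()` runs, because closing an async generator that is still executing raises a RuntimeError.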

Implementation

  • libraries/typescript/src/stream-handler.ts
    • New private field firstMessageAbort: AbortController | null.
    • cancelSpeaking() now also calls firstMessageAbort.abort().
    • Replaced the firstMessage for await with a manual async-iterator loop that races iter.next() against the abort signal; on abort, it calls this.tts?.cancel?.() and then awaits iter.return() to drive the generator's finally.
  • libraries/typescript/src/provider-factory.ts — added optional cancel?() method on TTSAdapter.
  • libraries/typescript/src/providers/elevenlabs-ws-tts.ts — track activeSocket across synthesizeStream, add cancel() that closes it, fold cancel() into close().
  • libraries/python/getpatter/stream_handler.py
    • New field self._first_message_abort: asyncio.Event | None.
    • _do_cancel_for_barge_in sets the abort event and calls tts.cancel() when the adapter exposes it.
    • Replaced the firstMessage async for with a manual asyncio.wait race of __anext__ against fm_abort.wait(); agen.aclose() always runs in finally.
  • libraries/python/getpatter/providers/elevenlabs_ws_tts.py — track self._active_socket across synthesize, add cancel() that schedules ws.close() on the loop, catch ConnectionClosed in the recv loop as end-of-stream.
  • Regression tests added in both SDKs: _handle_barge_in sets abort event + invokes tts.cancel when available, plus a standalone iterator-race test that proves a stalled generator unblocks within one tick of (abort + adapter.cancel).
  • CHANGELOG.md ## 0.6.1 ### Fixed entry.
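
As a rough illustration of the adapter-side changes listed above, the sketch below tracks an active socket, exposes a `cancel()` that schedules the close on the running loop, and treats a closed connection as end-of-stream. `FakeSocket`, `SketchTTS`, and the local `ConnectionClosed` class are hypothetical stand-ins so the example runs without a network; they are not the ElevenLabs adapter or a real WebSocket library.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional


class ConnectionClosed(Exception):
    """Stand-in for a WebSocket library's closed-connection error."""


class FakeSocket:
    """Hypothetical in-memory socket: recv() blocks until a frame or a close."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def feed(self, frame: bytes) -> None:
        self._queue.put_nowait(frame)

    async def recv(self) -> bytes:
        item = await self._queue.get()
        if item is None:
            raise ConnectionClosed()
        return item

    async def close(self) -> None:
        # A None sentinel wakes any pending recv() with ConnectionClosed.
        self._queue.put_nowait(None)


class SketchTTS:
    """Hypothetical adapter showing the socket-tracking pattern only."""

    def __init__(self, socket_factory: Callable[[], FakeSocket]) -> None:
        self._socket_factory = socket_factory
        self._active_socket: Optional[FakeSocket] = None

    async def synthesize(self, text: str) -> AsyncIterator[bytes]:
        ws = self._socket_factory()
        self._active_socket = ws
        try:
            while True:
                try:
                    frame = await ws.recv()
                except ConnectionClosed:
                    return  # a closed socket is end-of-stream, not an error
                yield frame
        finally:
            self._active_socket = None
            await ws.close()

    def cancel(self) -> None:
        """Schedule a close of the in-flight socket; must be called on the loop."""
        ws = self._active_socket
        if ws is not None:
            asyncio.get_running_loop().create_task(ws.close())
```

Keeping `cancel()` synchronous lets a barge-in handler call it without awaiting; the close itself runs on the next loop iteration and unblocks the pending `recv()`.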

Breaking change?

No. The new TTSAdapter.cancel() is optional — existing adapters without it are unaffected (the stream handler null-checks before calling). All existing public API shapes are preserved.
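
A minimal sketch of how a caller might guard the optional hook on the Python side. The helper name `cancel_if_supported` is hypothetical, and the actual check in the stream handler may look different.

```python
from typing import Any


def cancel_if_supported(tts: Any) -> bool:
    """Call tts.cancel() only when the adapter defines it (hypothetical helper).

    Returns True when the hook was invoked, False for adapters without it.
    """
    cancel = getattr(tts, "cancel", None)
    if callable(cancel):
        cancel()
        return True
    return False
```

An adapter written before this change has no `cancel` attribute, so the helper returns False and leaves it untouched.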

Test plan

  • Python: pytest tests/ — 1845 passed, 7 skipped, 0 failed.
  • TypeScript: npx vitest run — 1518 passed, 0 failed.
  • TypeScript: npm run lint (tsc --noEmit) — clean.
  • TypeScript: npx tsup build — success.
  • Manual: place an outbound pipeline-mode call with Deepgram STT + Cerebras LLM + ElevenLabsWebSocketTTS, interrupt the agent ~500 ms into the firstMessage, confirm the next user turn dispatches an LLM response (no 30 s silence, no FRAME_TIMEOUT_MS error).

Docs updates

N/A — internal fix, no public API surface change beyond the optional cancel() hook (which is documented inline in the TTSAdapter interface).


When the user interrupted the agent during the firstMessage in pipeline mode
(Deepgram STT + LLM + ElevenLabs WS TTS), the existing barge-in cancel flipped
``isSpeaking`` / ``_is_speaking`` to ``False`` but the ``for await`` / ``async for``
consuming ``tts.synthesizeStream`` / ``tts.synthesize`` stayed suspended on the
next-frame wait (``ws.recv()``). The check at the top of the loop body never
re-ran, the provider WS sat idle until ``FRAME_TIMEOUT_MS`` (30 s on ElevenLabs
WS TTS), and the "speaking lock" was never released — subsequent Deepgram
finals were captured but the LLM dispatch path never fired, leaving the call
silent for the user.
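
The stall can be reproduced in isolation with a small sketch. Everything here is hypothetical: `stalled_stream` stands in for a provider stream whose next frame never arrives, and `Handler` mimics only the flag check at the top of the loop body.

```python
import asyncio
from typing import AsyncIterator, List


async def stalled_stream() -> AsyncIterator[bytes]:
    yield b"frame-1"
    await asyncio.sleep(3600)  # stands in for an idle ws.recv()
    yield b"frame-2"


class Handler:
    def __init__(self) -> None:
        self.is_speaking = True
        self.sent: List[bytes] = []

    async def play_first_message(self) -> None:
        async for frame in stalled_stream():
            # This check only runs when a frame arrives.
            if not self.is_speaking:
                break
            self.sent.append(frame)


async def demo() -> bool:
    handler = Handler()
    task = asyncio.create_task(handler.play_first_message())
    await asyncio.sleep(0.01)
    handler.is_speaking = False  # barge-in flips the flag
    await asyncio.sleep(0.05)
    still_blocked = not task.done()  # the loop is parked on the next frame
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return still_blocked
```

Flipping the flag has no effect while the loop is parked on the next frame, which is why the fix needs a signal that can interrupt the wait itself.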

Fix (parity Py/TS):

* Add ``firstMessageAbort`` (TS, ``AbortController``) / ``_first_message_abort``
  (Py, ``asyncio.Event``) raced against the iterator's ``next()`` / ``__anext__``.
* Add an optional ``cancel()`` hook on the TTS adapter interface. Implementation
  in ``ElevenLabsWebSocketTTS`` (both SDKs) closes the in-flight WS
  (``activeSocket`` / ``_active_socket``) so the next-frame wait unblocks via
  ``ConnectionClosed`` within one event-loop tick.
* Use the manual iterator protocol in the firstMessage loop so we can race
  ``next()`` with the abort signal and call ``iter.return()`` / ``agen.aclose()``
  on abort, ensuring the generator's ``finally`` runs and closes the WS.
* Regression tests in both SDKs: standalone iterator race + ``_handle_barge_in``
  must invoke ``tts.cancel`` when available and set the abort event.

CHANGELOG entry added under ``## 0.6.1 (2026-05-12)``.