diag(0.6.1): Realtime firstMessage lockout — instrumentation + finding #99

Open
nicolotognoni wants to merge 18 commits into main from diag/0.6.1-bug1-followup

@nicolotognoni
Collaborator

Summary

PR #95 (commit cc9e51b) shipped a server-VAD lockout for the OpenAI Realtime firstMessage on the prewarm-adopted path: a session.update {turn_detection: null} armed immediately before response.create, restored on the matching response.done. A subsequent live outbound test on Realtime (prewarm enabled, source=adopted ms=0) still produced the original symptom — the agent did not deliver the scripted opening when the caller spoke first.

This PR is diagnostics-only, no behaviour change. It adds [DIAG-VAD] INFO-level instrumentation around the four lockout checkpoints in both Python and TypeScript SDKs so the next live test produces a deterministic root-cause trace.

Investigation so far

| Hypothesis | Verified | Status |
|---|---|---|
| H3: stale npm-link / dist cache (live test used pre-fix bundle) | Symlink + dist mtime + fix string at chunk-D4XCC4FF.mjs:603 | Ruled out |
| H4: sendFirstMessage never reached on adopt path | instanceof OpenAIRealtimeAdapter holds after adoptWebSocket (it's a method on the same class); stream-handler.ts:2535 enters the branch | Ruled out |
| H1: server processes response.create before the session.update to turn_detection: null propagates to the VAD subsystem | Needs the [DIAG-VAD] echo-ordering trace from a live test | Probable |
| H2: server silently ignores turn_detection: null (reported by OpenAI community) | Needs the inbound session.updated turn_detection=... echo to be checked | Probable |

Community evidence for H2: a forum thread reports turn_detection: null in session.update is silently ignored — defaults persist. A nested form audio.input.turn_detection: null is offered as a workaround. Separately, OpenAI documents turn_detection.interrupt_response: false + turn_detection.create_response: false as a less aggressive alternative to disabling VAD outright; this would keep VAD detection on but prevent it from cancelling our in-flight response.create.
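
The three payload shapes discussed above can be written side by side for comparison. This is a minimal sketch: the `lockout_payload` helper and its variant names are hypothetical, and the field layouts are copied from this description rather than verified against a live session.

```python
import json


def lockout_payload(variant: str) -> dict:
    """Build one candidate session.update payload (variant names are hypothetical)."""
    if variant == "flat_null":
        # Shape shipped in PR #95: top-level turn_detection set to null.
        session = {"turn_detection": None}
    elif variant == "nested_null":
        # Community workaround: the same null, nested under audio.input.
        session = {"audio": {"input": {"turn_detection": None}}}
    elif variant == "vad_no_interrupt":
        # Keep server VAD on, but stop it cancelling or creating responses.
        session = {
            "turn_detection": {
                "type": "server_vad",
                "interrupt_response": False,
                "create_response": False,
            }
        }
    else:
        raise ValueError(f"unknown variant: {variant}")
    return {"type": "session.update", "session": session}


if __name__ == "__main__":
    for name in ("flat_null", "nested_null", "vad_no_interrupt"):
        print(name, json.dumps(lockout_payload(name)))
```

Keeping the variants behind one helper would let a live test switch shapes without touching the send site.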

Instrumentation

Six [DIAG-VAD] log lines added (parity across Py / TS):

  1. [DIAG-VAD] sent session.update turn_detection=null (lockout arm) — at the send site.
  2. [DIAG-VAD] sent response.create (firstMessage) — at the next send.
  3. [DIAG-VAD] session.created|session.updated turn_detection=... — echoes the server's acknowledged turn_detection.
  4. [DIAG-VAD] speech_started fired DURING lockout — server-VAD still active! — fires only if firstMessageProtectionPending is true at the time.
  5. [DIAG-VAD] response.cancelled|response.canceled (firstMessageProtectionPending=...) — server cancellation observed; pending=true means the firstMessage was silently dropped.
  6. [DIAG-VAD] response.done received during lockout — restoring turn_detection + sent session.update turn_detection=<saved> (restore) — success path.

The smoking gun for H1: line 3's echo arrives AFTER line 2 (in wall-clock order). The smoking gun for H2: line 3 never shows turn_detection=null even after line 1 was sent, OR line 4 fires while lockout is pending.
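
The two smoking-gun checks can be applied mechanically to a captured log. The sketch below assumes the log lines contain the message texts listed above verbatim and arrive in chronological order; `classify_trace` and its finding strings are hypothetical and not part of either SDK.

```python
from typing import List, Optional


def classify_trace(lines: List[str]) -> List[str]:
    """Map a chronologically ordered [DIAG-VAD] trace to H1/H2 findings."""
    diag = [ln.lower() for ln in lines if "[DIAG-VAD]" in ln]

    def first_index(needle: str, start: int = 0) -> Optional[int]:
        for i in range(start, len(diag)):
            if needle in diag[i]:
                return i
        return None

    arm = first_index("sent session.update turn_detection=null")  # line 1
    create = first_index("sent response.create")                  # line 2
    if arm is None or create is None:
        return ["incomplete trace"]

    findings = []
    # Line 3: first server echo observed after the lockout was armed.
    echo = first_index("session.updated turn_detection=", arm + 1)
    if echo is None or "turn_detection=null" not in diag[echo]:
        findings.append("H2: no turn_detection=null echo after arm")
    elif echo > create:
        findings.append("H1: null echo arrived after response.create")
    # Line 4: server VAD fired while the lockout was pending.
    if first_index("speech_started fired during lockout", arm + 1) is not None:
        findings.append("H2: server VAD active during lockout")
    # Line 5: cancellation observed with the protection flag still set.
    if first_index("firstmessageprotectionpending=true", arm + 1) is not None:
        findings.append("firstMessage dropped by server cancellation")
    return findings or ["no H1/H2 signal in trace"]
```

Lower-casing the lines keeps the pending-flag check working whether the SDK prints `true` or `True`.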

Implementation

  • Pure logging additions. Zero behavioural change. Zero new dependencies.
  • Mirrors byte-for-byte across libraries/python/getpatter/providers/openai_realtime.py and libraries/typescript/src/providers/openai-realtime.ts.
  • TypeScript build re-emits the dist with the new INFO lines (verified: grep "DIAG-VAD" dist/index.js returns 8 hits).
  • Existing unit tests confirm the logs fire correctly in the lockout sequence (openai-realtime.test.ts: 30 passed; test_providers_io_unit.py: 98 passed).

Breaking change?

No. INFO-level logs only.

Next-test playbook for whoever picks this up

  1. cd libraries/typescript && npm run build (already done on this branch).
  2. Pick up the acceptance package (the symlink at releases/0.6.1/typescript/node_modules/getpatter already points at the SDK source).
  3. Place an outbound call where the callee answers and immediately says "Hello?" — the canonical repro.
  4. Grep the server log for [DIAG-VAD] and look at chronological order:
    • If line 3 arrives BEFORE line 2 → H1 (race) is ruled out.
    • If line 3 echoes turn_detection=null → H2 (silent ignore) is ruled out.
    • If line 4 fires before line 6 does → server-VAD is active despite the lockout (H2 confirmed).
    • If line 5 fires with pending=true → firstMessage was silently dropped (root cause confirmed).
  5. Based on the trace, the next PR is one of:
    • H1 fix: await session.updated ack before sending response.create. Adds ~1 server-round-trip latency to firstMessage — acceptable on the adopt path because we already saved ~250-450 ms by not paying the cold-connect handshake.
    • H2 fix (option A): try turn_detection: {type: 'server_vad', interrupt_response: false, create_response: false} instead of null (per OpenAI VAD guide).
    • H2 fix (option B): try the nested payload audio: {input: {turn_detection: null}} (per community workaround).
    • Belt-and-suspenders: client-side audio gate on OpenAIRealtimeAdapter.sendAudio — drop inbound audio frames for ~400 ms after sendFirstMessage is invoked. Cheap, doesn't depend on server cooperation, but adds local complexity.
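
As an illustration of the H1 option, the send path could wait for the server's acknowledgement, with a timeout so a missing echo never blocks the greeting. This is a rough sketch under stated assumptions: `FirstMessageSender`, its method names, the 1 s timeout, and the `response.create` body are all hypothetical, and the real adapter's send path may look different.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional


class FirstMessageSender:
    """Hypothetical helper: arm the lockout, wait for the ack, then respond."""

    def __init__(self, send: Callable[[str], Awaitable[None]], ack_timeout_s: float = 1.0):
        self._send = send
        self._ack_timeout_s = ack_timeout_s
        self._ack: Optional[asyncio.Event] = None

    def on_server_event(self, event: dict) -> None:
        # Called by the receive loop for every inbound server event.
        if event.get("type") == "session.updated" and self._ack is not None:
            self._ack.set()

    async def send_first_message(self, text: str) -> bool:
        """Return True if the ack was seen before response.create was sent."""
        self._ack = asyncio.Event()
        await self._send(json.dumps(
            {"type": "session.update", "session": {"turn_detection": None}}
        ))
        try:
            await asyncio.wait_for(self._ack.wait(), self._ack_timeout_s)
            acked = True
        except asyncio.TimeoutError:
            acked = False  # proceed anyway so the greeting is never blocked
        await self._send(json.dumps(
            {"type": "response.create", "response": {"instructions": text}}
        ))
        return acked
```

Returning the `acked` flag lets the caller log whether the greeting went out with or without a confirmed lockout.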

The [DIAG-VAD] lines MUST be removed once the root cause is pinned down — they're not safe to ship indefinitely.

Test plan

  • Python: cd libraries/python && python3 -m pytest tests/unit/test_providers_io_unit.py -q (98 passing)
  • TypeScript: cd libraries/typescript && npx vitest run tests/unit/openai-realtime.test.ts (30 passing) + npm run lint + npm run build
  • [DIAG-VAD] strings present in dist/index.js (8 hits) and chunk-XQ5ROISS.mjs (8 hits)
  • Live outbound Realtime test with prewarm=default — produces the trace required to pick the root-cause fix

Docs updates

N/A — the only doc surface touched is the repo-root CHANGELOG.md ## Unreleased section.

nicolotognoni and others added 18 commits May 9, 2026 00:10
…6.1 release

Ports the observability work from the now-closed PR #82 onto the
post-refactor `libraries/python/` layout. PR #82 was authored against
the legacy `sdk-py/` paths and was consolidated into the 0.6.0 release
branch; this commit lands the actual implementation against the new
layout for 0.6.1.

What it adds:

- `getpatter.observability.attributes` — three new helpers:
  `record_patter_attrs(attrs)`, `patter_call_scope(call_id, side)`
  context manager, `attach_span_exporter(patter, exporter, side)`.
  Lazy-OTel-guarded; no-op when the `[tracing]` extra is not installed.
  Two ContextVars (`patter.call_id`, `patter.side`) propagate through
  the asyncio task tree so spans emitted by deeply nested provider
  code inherit the active call's identity automatically.
- `Patter._attach_span_exporter(exporter, *, side="uut")` — public-but-
  underscore hook for tools that observe Patter from outside (e.g. an
  out-of-process agent runner).
- Per-provider cost emission across 19 surfaces: `patter.cost.{
  telephony_minutes, stt_seconds, tts_chars, llm_input_tokens,
  llm_output_tokens, realtime_minutes}` stamped on the active span.
  Provider tag emitted alongside as `patter.{telephony,stt,tts,llm,
  realtime}.provider`. All call sites wrapped in defensive try/except
  so observability cannot kill a live call.
- Per-turn latency: `patter.latency.{ttfb_ms, turn_ms}` stamped from
  `StreamHandler._emit_turn_metrics` via a new
  `PipelineHookExecutor.record_turn_latency(*, ttfb_ms, turn_ms)`.
- Bridge-level `patter_call_scope` entry on Twilio + Telnyx — entire
  WebSocket bridge lifetime (incl. hangup/cleanup) bound to the call
  identity via `contextlib.ExitStack`.
- `TwilioAdapter.record_call_end_cost` /
  `TelnyxAdapter.record_call_end_cost` — adapter helpers used by the
  bridge to emit `patter.cost.telephony_minutes` once wall-clock
  duration is known.
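
The ContextVar propagation described above can be illustrated in isolation. This is a simplified stand-in, not the shipped `patter_call_scope`: the `call_scope` and `current_identity` helpers are hypothetical, and only the two variable names come from this commit message.

```python
import asyncio
import contextvars
from contextlib import contextmanager

# Variable names taken from the commit message; everything else is illustrative.
_call_id = contextvars.ContextVar("patter.call_id", default=None)
_side = contextvars.ContextVar("patter.side", default=None)


@contextmanager
def call_scope(call_id: str, side: str = "uut"):
    """Bind the call identity for the duration of the with-block."""
    id_token = _call_id.set(call_id)
    side_token = _side.set(side)
    try:
        yield
    finally:
        _side.reset(side_token)
        _call_id.reset(id_token)


def current_identity() -> dict:
    return {"patter.call_id": _call_id.get(), "patter.side": _side.get()}


async def nested_provider_work() -> dict:
    # Deeply nested code reads the identity without it being passed explicitly.
    return current_identity()


async def handle_call(call_id: str) -> dict:
    with call_scope(call_id):
        # A task created here copies the current context, so it inherits both values.
        return await asyncio.create_task(nested_provider_work())
```

Because `asyncio.create_task` snapshots the context at creation time, tasks spawned inside the scope keep the identity even if they outlive the with-block.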

Versions bumped 0.6.0 → 0.6.1 in `__init__.py`, `pyproject.toml`,
`package.json`. CHANGELOG entry added under a new `## 0.6.1
(2026-05-09)` block; the existing `## 0.6.0 (2026-05-08)` block is
preserved verbatim — it reflects exactly what was published to PyPI
and npm at that tag.

⚠️ TS parity gap: Python only. TypeScript follow-up tracked separately.
This is a known time-boxed exception per `.claude/rules/sdk-parity.md`.

5 new unit tests in `libraries/python/tests/unit/
test_observability_attributes_unit.py` exercise the helper module's
public surface (`patter_call_scope`, `record_patter_attrs` no-op,
`attach_span_exporter` side stamping). Full Python suite: 1719 passed,
7 skipped — green.

Refs: closed PR #82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…in gate from first audio

Two bugs caught during 0.6.0 acceptance against
`releases/0.6.0/typescript/matrix/outbound-cartesia-cerebras-elevenlabs.ts`:

1. **Dashboard hydrate schema mismatch**: `CallLogger.log_call_end` writes
   `cost`/`latency`/`duration_ms`/`telephony_provider` as top-level keys of
   `metadata.json`, but `MetricsStore.hydrate` looked for them under
   `meta.metrics.cost`/`meta.metrics.latency`. Every hydrated row landed
   with `metrics=null`, so cost/latency rendered as `$0.00`/`—` for all
   on-disk calls (only the in-flight call had real numbers). Fix synthesizes
   a `metrics` dict from the top-level fields when `meta.metrics` is absent
   while preserving any explicit `meta.metrics` payload untouched.

2. **Early barge-in self-cancellation**: cloud TTS first-byte latency is
   200–700 ms; the 250 ms anti-flicker gate (no-AEC PSTN default) was
   anchored on `_speaking_started_at`/`speakingStartedAt` and expired
   BEFORE TTS produced audio. VAD then picked up background noise and
   self-cancelled the agent's first turn — 0 bytes emitted, line silent.
   Fix anchors the gate on a new `_first_audio_sent_at`/`firstAudioSentAt`
   set AFTER `bridge.sendAudio` / `audio_sender.send_audio` succeeds at
   the four pipeline emit sites (firstMessage, streaming, regular,
   WebSocket remote). `_can_barge_in`/`canBargeIn` returns false while
   the marker is null. Gate values (250 ms / 1000 ms) unchanged — only
   the anchor moves.
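
The anchor change in item 2 can be sketched as a small state holder. The class and method names below are hypothetical; only the 250 ms gate value and the "closed until first audio is sent" rule come from this commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BargeInGate:
    """Illustrative gate: closed until audio has been sent, then for gate_ms more."""

    gate_ms: float = 250.0
    first_audio_sent_at: Optional[float] = None  # timestamp in milliseconds

    def begin_turn(self) -> None:
        # A new agent turn starts with no audio on the wire.
        self.first_audio_sent_at = None

    def mark_audio_sent(self, now_ms: float) -> None:
        # Stamp only the first successful send of the turn.
        if self.first_audio_sent_at is None:
            self.first_audio_sent_at = now_ms

    def can_barge_in(self, now_ms: float) -> bool:
        if self.first_audio_sent_at is None:
            return False  # nothing audible yet, so there is nothing to interrupt
        return now_ms - self.first_audio_sent_at >= self.gate_ms
```

With this shape, slow TTS first-byte latency only delays the stamp; it no longer consumes the gate window.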

Tests:
- Py 1717/1717, TS 1394/1394 green; lint clean.
- New regressions: `test_hydrate_lifts_top_level_cost_and_latency_into_metrics`,
  `test_hydrate_preserves_explicit_metrics_when_present`,
  `test_barge_in_suppressed_before_first_audio_emitted` (Py) +
  parity TS cases in `tests/dashboard-store.test.ts` and
  `tests/unit/stream-handler.test.ts`.
- Existing `_handle_barge_in`/`handleBargeIn` tests updated to set both
  timestamps for the new contract.
Cloud TTS first-byte latency (200-700 ms) plus PSTN background noise
mean the legacy "any VAD speech_start cancels the agent" contract
produced frequent false-positive cancels — cough, click, HVAC, breath,
or a quick "okay" cut the agent mid-sentence and lost the
conversational thread.

This PR adds an opt-in two-stage confirmation pipeline. With the new
empty-tuple default, behaviour is unchanged. Configure
``Agent.barge_in_strategies`` / ``agent.bargeInStrategies`` to enable:

  1. VAD speech_start during TTS marks the barge-in PENDING. TTS keeps
     streaming naturally — the LLM stream stays alive.
  2. Each STT transcript is evaluated by every configured strategy
     (short-circuit OR; per-strategy errors are isolated).
  3. First strategy that returns True confirms the cancel: runs the
     existing send_clear + flush ring + LLM abort sequence.
  4. If no strategy confirms within ``barge_in_confirm_ms``
     (default 1500 ms) the pending state is dropped and the agent
     finishes its sentence.

New module ``getpatter.services.barge_in_strategies`` exposes:
  - ``BargeInStrategy`` Protocol (async ``evaluate`` + optional ``reset``)
  - ``MinWordsStrategy`` — filters short backchannels by requiring N
    words while the agent is speaking and letting any single word
    through while the agent is silent (so the first user turn is
    never delayed).
  - ``evaluate_strategies`` / ``reset_strategies`` helpers.
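
A rough sketch of the strategy contract and the short-circuit OR evaluation follows. The `evaluate(transcript, agent_speaking)` signature, the class names, and the default of 3 words are assumptions made for illustration; the real module's signatures are not shown in this message.

```python
import asyncio
from typing import Iterable, Protocol


class Strategy(Protocol):
    """Assumed shape of a barge-in strategy (signature is hypothetical)."""

    async def evaluate(self, transcript: str, agent_speaking: bool) -> bool: ...


class MinWords:
    """Require min_words while the agent speaks; any single word otherwise."""

    def __init__(self, min_words: int = 3):
        self.min_words = min_words

    async def evaluate(self, transcript: str, agent_speaking: bool) -> bool:
        needed = self.min_words if agent_speaking else 1
        return len(transcript.split()) >= needed


async def evaluate_all(strategies: Iterable[Strategy], transcript: str,
                       agent_speaking: bool) -> bool:
    """Short-circuit OR: the first True wins; a failing strategy is skipped."""
    for strategy in strategies:
        try:
            if await strategy.evaluate(transcript, agent_speaking):
                return True
        except Exception:
            continue  # one broken strategy must not block the others
    return False
```

An empty strategy list returns False here; in the SDK the empty-tuple case is instead handled upstream by keeping the legacy cancel-on-VAD path.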

TS parity in ``src/services/barge-in-strategies.ts`` with the same
public surface (``MinWordsStrategy``, ``BargeInStrategy`` interface,
``evaluateStrategies``/``resetStrategies``).

Wiring lives in stream_handler.py ``_handle_barge_in`` and
stream-handler.ts ``handleBargeIn`` — both keep the existing
canBargeIn gate (firstAudioSentAt anchor) and only add the strategy
check when at least one strategy is configured.

Tests:
- Py: 1741/1741 green; new ``test_barge_in_strategies.py`` (14) +
  ``test_barge_in_two_stage.py`` (10).
- TS: 1419/1419 green; new ``barge-in-strategies.test.ts`` (15) +
  ``barge-in-two-stage.test.ts`` (10). Lint clean.
- Existing barge-in regression suites still pass byte-for-byte:
  empty strategies preserve legacy behaviour exactly.

CHANGELOG ``## Unreleased`` updated with full design + file list.
…n strategies

Bundle three changes from branch fix/dashboard-hydrate-schema-and-bargein-grace
into the 0.6.1 release:

1. Dashboard MetricsStore.hydrate now lifts top-level cost/latency
   from CallLogger metadata.json into the synthesized metrics dict —
   hydrated calls in the dashboard show real $/p95 instead of
   $0.00 / "—".

2. Barge-in gate anchored on firstAudioSentAt (not beginSpeaking) so
   ElevenLabs/Cartesia first-byte latency no longer lets background
   noise cancel the agent before any audio reaches the wire.

3. New opt-in barge-in confirmation pipeline with MinWordsStrategy
   reference implementation. Empty-tuple default preserves legacy
   cancel-on-VAD behaviour.

# Conflicts:
#	CHANGELOG.md
…ixes

Three user-visible features plus a hardening sweep from a 5-agent code
review covering security, billing safety, race conditions, and resource
leaks.

## Features

### Dashboard cost panel: STT and TTS as separate rows
The cost breakdown previously combined STT and TTS into one "STT / TTS"
line, hiding which side dominated cost. Now rendered as two adjacent
rows labelled with the actual provider name (e.g. "Cartesia STT" /
"ElevenLabs TTS"), driven by ``record.metrics.stt_provider`` /
``tts_provider`` already exposed by the backend. Files:
``dashboard-app/src/components/CostPanel.tsx``,
``dashboard-app/src/lib/mappers.ts``.

### stt_ms is now finalization-only (BREAKING semantic change)
Previously ``LatencyBreakdown.stt_ms`` measured ``stt_complete -
turn_start`` — which conflated user speech duration with STT processing.
A 5 s utterance produced ``stt_ms ≈ 5000`` even when Cartesia/Deepgram
finalized in 200 ms after end-of-speech. Industry benchmarks
(Picovoice/Deepgram/Gladia/Speechmatics) all report STT latency as the
finalization window: ``final_transcript - end_of_speech``. ``stt_ms``
now matches that definition. New optional field
``user_speech_duration_ms`` carries the displaced "how long did the
user speak" number. Files: ``libraries/python/getpatter/models.py``,
``libraries/python/getpatter/services/metrics.py``,
``libraries/typescript/src/metrics.ts``.
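
The new definition reduces to two subtractions over three timestamps. In the sketch below the function name and argument names are hypothetical, and it assumes `user_speech_duration_ms` is measured from turn start to end of speech.

```python
def latency_fields(turn_start_ms: float, end_of_speech_ms: float,
                   final_transcript_ms: float) -> dict:
    """Split the old combined number into finalization time and speech time."""
    return {
        # New definition: finalization window only.
        "stt_ms": final_transcript_ms - end_of_speech_ms,
        # The displaced "how long did the user speak" number.
        "user_speech_duration_ms": end_of_speech_ms - turn_start_ms,
    }


if __name__ == "__main__":
    # A 5 s utterance finalized 200 ms after end of speech.
    print(latency_fields(0, 5000, 5200))
```

For that example the helper reports `stt_ms` of 200 and a speech duration of 5000, where the old definition would have reported roughly the sum of the two.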

### Pre-warm services + pre-synth firstMessage
``Agent.prewarm: bool = True`` (default on) warms STT/TTS/LLM provider
connections in parallel with carrier ``initiate_call`` so DNS, TLS,
HTTP/2 / WebSocket handshakes are complete by the time the callee
answers. Concrete ``warmup()`` overrides shipped on Deepgram / Cartesia
/ AssemblyAI STT, ElevenLabs WS / Cartesia / Inworld TTS, OpenAI
Realtime. ``Agent.prewarm_first_message: bool = False`` (opt-in)
pre-renders ``first_message`` to TTS bytes during ringing and streams
the cached buffer instantly when the carrier emits ``start`` —
eliminates 200-700 ms of TTS first-byte latency on the greeting at the
cost of paying TTS even when the call isn't answered (logged at WARN
level when wasted).

## Review fixes (12 issues from 5-agent multi-perspective review)

### Provider warmup correctness
- 🔴 OpenAI Realtime warmup uses ``session.update`` (not the non-spec
  ``response.create`` with ``generate:false`` which could silently bill
  tokens or return ``invalid_request_error``). Files:
  ``providers/openai_realtime.py``, ``providers/openai-realtime.ts``.
- 🟡 ElevenLabs WS warmup BOS frame now mirrors the live ``synthesize``
  BOS byte-for-byte (``voice_settings`` + ``generation_config``). Shared
  helper ``_build_bos_frame`` / ``buildBosFrame``. Verified billing-safe
  via no ``flush:true``, no real text. Files:
  ``providers/elevenlabs_ws_tts.py``, ``providers/elevenlabs-ws-tts.ts``.
- 🟡 Inworld TTS warmup uses ``GET /tts/v1/voices`` instead of ``HEAD``
  against POST-only stream endpoint (was returning 405 in audit logs).
- 🟡 Cartesia STT + AssemblyAI STT warmup error logs no longer leak the
  API key — catches ``WSServerHandshakeError`` specifically and logs
  only the HTTP status code, never ``str(exc)`` (which embeds the URL).

### StreamHandler / barge-in correctness
- 🟠 Double ``record_overlap_start`` on strategy-confirmed barge-in
  fixed: VAD start path now stamps T1, the strategy-confirm path no
  longer overwrites with T2 — ``detection_delay_ms`` is now correct for
  every user opting into ``barge_in_strategies``. Files:
  ``stream_handler.py:_do_cancel_for_barge_in``,
  ``stream-handler.ts:runBargeInCancel``.
- 🟠 Pending barge-in task leak fixed: ``cleanup`` (Py) /
  ``handleStop`` + ``handleWsClose`` (TS) now call
  ``_clear_pending_barge_in`` so a call ending mid-pending no longer
  leaves an asyncio.Task / setTimeout firing on a finalized handler.
- 🟢 Pre-warm bytes now chunked (1280 B / 40 ms) before
  ``audio_sender.send_audio`` so barge-in mid-greeting can flush
  cleanly via the existing mark/clear bookkeeping.
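
The chunking step in the last item can be sketched as a plain slice loop. The `chunk_audio` name is hypothetical; 1280 bytes per 40 ms works out to 32,000 bytes per second, and the audio format itself is not stated here.

```python
from typing import List


def chunk_audio(buf: bytes, chunk_bytes: int = 1280) -> List[bytes]:
    """Split a pre-rendered buffer into fixed-size frames; the last may be short."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [buf[i:i + chunk_bytes] for i in range(0, len(buf), chunk_bytes)]
```

Sending frame by frame gives the sender a boundary every 40 ms at which a barge-in can stop the greeting.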

### Patter client + cache hardening
- 🟠 Cache eviction on abnormal hangup: the Twilio status callback
  (``no-answer`` / ``busy`` / ``failed`` / ``canceled``) and the Telnyx
  ``call.hangup`` / AMD-machine paths now call ``_record_prewarm_waste``
  so memory doesn't leak proportional to no-answer rate.
- 🟠 Race start-vs-prewarm fixed: a ``_prewarm_consumed`` set tracks
  consumed call_ids so a late-arriving prewarm task drops its bytes
  instead of orphaning them in the cache.
- 🟡 ``disconnect()`` now cancels in-flight prewarm tasks and clears
  the cache (no spend leak across serve/disconnect cycles).
- 🟡 ``prewarm_first_message=True`` on Realtime / ConvAI mode now logs
  a WARN and skips the spawn (was silently paying TTS for bytes the
  StreamHandler never consumed).
- 🟡 Prewarm cache bounded at 200 entries with TTL-based eviction
  (``ring_timeout + 5 s``) — caps memory under outbound flood
  scenarios.
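
The cap-plus-TTL behaviour in the last item can be sketched with an ordered map. This is an illustrative structure only: `BoundedTTLCache` is a hypothetical name, and the 35 s default stands in for ``ring_timeout + 5 s`` with an assumed 30 s ring timeout.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class BoundedTTLCache:
    """Illustrative cache: oldest-first eviction at the cap, plus expiry by TTL."""

    def __init__(self, max_entries: int = 200, ttl_s: float = 35.0,
                 clock: Callable[[], float] = time.monotonic):
        self._max = max_entries
        self._ttl_s = ttl_s
        self._clock = clock
        self._items: "OrderedDict[str, tuple]" = OrderedDict()

    def _evict_expired(self) -> None:
        now = self._clock()
        for key in [k for k, (expires, _) in self._items.items() if expires <= now]:
            del self._items[key]

    def put(self, call_id: str, audio: bytes) -> None:
        self._evict_expired()
        self._items.pop(call_id, None)
        self._items[call_id] = (self._clock() + self._ttl_s, audio)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry

    def pop(self, call_id: str) -> Optional[bytes]:
        """One-shot read: the entry is removed whether or not it is used."""
        self._evict_expired()
        item = self._items.pop(call_id, None)
        return None if item is None else item[1]

    def __len__(self) -> int:
        return len(self._items)
```

Injecting the clock keeps the expiry path testable without sleeping.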

### Documentation
- Docstring for ``Agent.barge_in_strategies`` corrected: TTS continues
  streaming naturally during pending state (was misleadingly described
  as "paused").

## Tests

47 new regression tests across 4 new files plus updates to existing
suites. Verifies every fix above with authentic mocks at the network
boundary only:

- ``libraries/python/tests/test_prewarm.py`` (new — 28 tests covering
  default flag values, no-op default ``warmup``, all-three-providers
  warmup invocation, opt-out, exception swallow, cache populate / skip
  / empty-message / timeout, one-shot pop, waste-warn log, StreamHandler
  cache-hit short-circuit + cache-miss live-TTS fallback, race orphan,
  disconnect cleanup, cap+TTL eviction, provider-mode validation,
  chunking).
- ``libraries/python/tests/unit/test_provider_warmup.py`` (new — 18
  tests covering all 7 concrete ``warmup()`` overrides + billing-safety
  regressions + key-leak regressions).
- ``libraries/typescript/tests/unit/prewarm.test.ts`` (new — 23 TS
  twins).
- ``libraries/typescript/tests/unit/provider-warmup.mocked.test.ts``
  (new — 19 TS twins).
- Updates to ``test_barge_in_two_stage.py`` (3 ``record_overlap_start``
  tests + 2 cleanup tests), ``barge-in-two-stage.test.ts`` (4 TS
  twins), ``server-routes.test.ts`` (2 status-callback eviction tests).

## Verification

- Python: 1797 passed, 7 skipped, 0 failed (was 1707 + 14 prewarm + 76
  inherited from new subclass collection-tests)
- TypeScript: 1467 passed across 83 files (was 1430 + 37 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (ESM + CJS + dts + CLI): clean

## CHANGELOG

All entries under ``## 0.6.1 (2026-05-09)`` with file paths, line
numbers, rationale, and test paths.
…atency

Live PSTN smoke tests against ``outbound-cartesia-cerebras-elevenlabs.ts``
exposed several issues in 0.6.1 that were not caught by the unit suite.
This commit ships seven fixes plus three quick wins on top of the
prewarm pipeline.

## Architectural — WebSocket handoff for prewarm (replaces open-then-close)

The 0.6.1 prewarm pipeline as previously shipped (commit ``c585f6d``)
opened a streaming-STT and streaming-TTS WebSocket during the carrier
ringing window, idled ~250 ms, and closed it. Investigation showed the
strategy is structurally insufficient on Node: the ``ws`` package does
not thread a TLS session ticket across separate ``new WebSocket(...)``
constructions, so every fresh ``connect()`` at call pickup pays full
TCP+TLS+HTTP-101 upgrade. Net saved time was 50–250 ms (DNS cache only)
versus 700–1500 ms of cold-start budget. Live test reported "several
seconds" first-turn latency, p95 3048 ms.

The new strategy keeps the warmed WS open and hands it off to the
``StreamHandler`` at call pickup. New API surface:

- ``Patter._prewarmedConnections: Map<callId, ParkedProviderConnections>``
  (TS) / ``self._prewarmed_connections: dict[str, ParkedProviderConnections]``
  (Py) — keyed by carrier-issued ``call_id``, populated during ringing,
  drained on call end or after a 30 s safety TTL.
- ``provider.openParkedConnection()`` / ``open_parked_connection()`` —
  added to ``CartesiaSTT``, ``ElevenLabsWebSocketTTS``,
  ``OpenAIRealtimeAdapter``. Opens the WS, sends the same initial config
  the live ``connect()`` sends (STT: empty config; TTS: BOS frame
  matching ``synthesize`` BOS byte-for-byte; Realtime: ``session.update``),
  and returns a handle the caller parks.
- ``provider.adoptWebSocket(handle)`` / ``adopt_websocket(handle)`` —
  added to the same three providers. Accepts a pre-opened WS, validates
  ``readyState === OPEN``, and proceeds with the live message loop. For
  ElevenLabs WS TTS the handle carries a ``bosAlreadySent: true`` flag so
  the first ``synthesizeStream`` iteration does not double-send BOS
  (which would be a protocol error).
- ``StreamHandler`` checks ``client.popPrewarmedConnections(callId)``
  before falling back to fresh ``connect()``. On adopt, the path skips
  TCP+TLS+upgrade and the BOS round-trip — STT connects in 0 ms, TTS in
  0 ms.
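
The adopt-or-connect decision in the last item can be sketched as below. `FakeProvider`, the dict-shaped handle, and the fall-back-on-stale-handle behaviour are assumptions for illustration; only the `adopt_websocket` / `connect` method names and the `source=adopted` label come from this message.

```python
import asyncio
import time
from typing import Optional, Tuple


class FakeProvider:
    """Stand-in provider; real providers hold an actual WebSocket."""

    def __init__(self) -> None:
        self.ws: Optional[str] = None

    async def connect(self) -> None:
        await asyncio.sleep(0.01)  # simulated TCP + TLS + upgrade
        self.ws = "fresh"

    def adopt_websocket(self, handle: dict) -> None:
        if not handle.get("open"):
            raise ValueError("parked socket is no longer open")
        self.ws = handle["ws"]


async def adopt_or_connect(provider: FakeProvider,
                           parked: Optional[dict]) -> Tuple[str, float]:
    """Prefer a parked connection; otherwise pay for a fresh connect."""
    started = time.monotonic()
    source = "fresh"
    if parked is not None:
        try:
            provider.adopt_websocket(parked)
            source = "adopted"
        except ValueError:
            pass  # assumed behaviour: a stale handle falls back to connect()
    if source == "fresh":
        await provider.connect()
    return source, (time.monotonic() - started) * 1000.0
```

The returned pair maps directly onto the ``source=`` and ``ms=`` fields of a ``[CONNECT]`` style log line.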

Cleanup wiring: the same status callback paths that already drain the
prewarm-audio cache (FIX #91) now also close any parked WS for failed
calls (no-answer / busy / failed / canceled / AMD-machine). The 30 s
TTL covers the rare carrier path that emits neither ``start`` nor a
status callback.

Live validation against ``outbound-cartesia-cerebras-elevenlabs.ts``:
``[PREWARM] callId=… provider=stt ms=769`` followed by
``[CONNECT] callId=… provider=stt source=adopted ms=0`` — STT connect
went from 150–400 ms to 0 ms. First-turn greeting wire-time dropped from
"several seconds" to **990 ms**. Files:
``libraries/typescript/src/client.ts`` (cache + ``parkProviderConnections``,
``popPrewarmedConnections``, ``closePrewarmedConnections``,
``ParkedProviderConnections`` interface, ``closeParkedConnections``
helper); ``libraries/typescript/src/server.ts`` (forwards
``popPrewarmedConnections`` into ``StreamHandlerDeps``);
``libraries/typescript/src/stream-handler.ts`` (adopt-or-connect logic);
``libraries/typescript/src/providers/{cartesia-stt,elevenlabs-ws-tts,openai-realtime}.ts``
(park + adopt API surface). Python parity in
``libraries/python/getpatter/{client,server,stream_handler,telephony/twilio,telephony/telnyx}.py``
and ``libraries/python/getpatter/providers/{cartesia_stt,elevenlabs_ws_tts,openai_realtime}.py``.
Realtime mode has the API surface but the ``OpenAIRealtimeStreamHandler``
adoption is deferred to a follow-up — pipeline mode dominates the
affected use case.

## Quick wins (parallel to WS handoff, smaller individual savings)

- **Eager AEC import on ``Patter.serve()``** (gated on
  ``agent.echo_cancellation=true``). Was previously a lazy
  ``await import('./audio/aec')`` on first ``start`` event, paying
  150–400 ms JIT on the first call. Files:
  ``libraries/typescript/src/client.ts``, ``libraries/python/getpatter/client.py``.
- **Parallel ``stt.connect()`` and TTS-firstMessage kickoff**. Previously
  the StreamHandler awaited STT before TTS firstMessage — STT does not
  need to be ready to send firstMessage out, only to receive caller
  audio. Now both kick off concurrently. Saves 200–400 ms on the first
  turn. Files: ``libraries/typescript/src/stream-handler.ts``,
  ``libraries/python/getpatter/stream_handler.py``.
- **Timing instrumentation**: new ``[PREWARM]`` and ``[CONNECT]`` INFO
  logs in the prewarm spawn and provider connect paths, with elapsed-ms
  per provider. Lets us A/B-test future prewarm changes with numerical
  evidence rather than perceptual reports.
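
The parallel kickoff from the second quick win can be sketched with `asyncio.gather`. The `start_call` and `timed_step` names are hypothetical; the two steps are stand-ins for the real STT connect and firstMessage send.

```python
import asyncio
from typing import Awaitable, Callable, List


async def timed_step(events: List[str], name: str, delay_s: float) -> None:
    """Stand-in for a network step; records when it starts and finishes."""
    events.append(f"{name}:start")
    await asyncio.sleep(delay_s)
    events.append(f"{name}:done")


async def start_call(stt_connect: Callable[[], Awaitable[None]],
                     send_first_message: Callable[[], Awaitable[None]]) -> None:
    # Both steps start immediately; neither waits for the other to finish.
    await asyncio.gather(stt_connect(), send_first_message())


if __name__ == "__main__":
    log: List[str] = []
    asyncio.run(start_call(lambda: timed_step(log, "stt", 0.02),
                           lambda: timed_step(log, "tts", 0.01)))
    print(log)
```

Under this arrangement the total wait is the slower of the two steps rather than their sum.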

## Dashboard fixes (third pass — issues found during the round-2 PSTN test)

### Live transcript shows only one turn at a time (BUG #102)

``MetricsStore.recordTurn`` correctly accumulated turns into
``active.turns[]`` but the frontend ``toUiTranscript`` mapper had two
paths: a primary keyed on ``record.transcript.length > 0`` (used for
completed calls) and a fallback that derived rows from ``record.turns``.
For an in-flight call the primary always returned empty (active records
never carried ``transcript[]``) and only the fallback rendered, so the
two paths diverged. Each ``recordTurn`` now mirrors the round-trip into
a flat ``active.transcript`` array (one user entry + one assistant entry
per turn, filtering empty ``user_text`` and the ``[interrupted]`` agent
sentinel), so the primary path sees the same accumulating ``user →
assistant → user → assistant → …`` history for live calls and completed
calls alike. Files: ``libraries/typescript/src/dashboard/store.ts``,
``libraries/typescript/tests/dashboard-store.test.ts`` (5 new authentic
tests).

### Transcript disappears after call end (BUG #101)

The Twilio status callback for ``CallStatus=completed`` fires a beat
before the WS ``stop`` frame, so ``MetricsStore.updateCallStatus``
moved the active record into the completed buffer **without preserving
``turns[]`` or ``transcript[]``**. The subsequent ``recordCallEnd``
overwrote that completed entry, but in the gap any ``useTranscript``
fetch returned a record with no transcript and the live pane went
blank. Three-point fix: (a) ``updateCallStatus`` terminal branch now
copies ``active.turns`` and ``active.transcript`` into the new
completed entry; (b) ``recordCallEnd`` falls back to active/existing
transcript when ``data.transcript`` is empty; (c) the
``useTranscript`` hook subscribes to ``call_end`` SSE events
(independent of ``isLive``) so the pane refetches the moment
``recordCallEnd`` lands the SDK-authoritative ``history.entries``.
Files: ``libraries/typescript/src/dashboard/store.ts``,
``dashboard-app/src/hooks/useTranscript.ts``.

### Sparkline tooltip generic / wrong metric (BUG #104)

The metric-tile sparkline tooltip rendered ``"N call(s)"`` plus a
per-call sample list regardless of which card it was attached to —
the latency and spend cards therefore showed the same headline as the
calls card. New ``MetricKind`` prop (``'count' | 'latency' | 'spend'``)
threaded through ``Metric`` → ``SparkBar`` → ``SparkTooltip``, with a
pure ``bucketHeadline(bucket, kind)`` helper that computes per-card
aggregates: ``TOTAL COST $X.XXX`` (sum of per-call cost),
``AVG LATENCY <p95-mean> ms`` (mean of per-call P95), or
``N CALL(S)``. Headline label uppercased, monospace, styled to match
the existing time-range header on the same tooltip. Files:
``dashboard-app/src/App.tsx``, ``dashboard-app/src/components/Metric.tsx``,
``dashboard-app/src/styles/dashboard.css``.

### caller / callee never persisted to metadata.json (BUG B from the second pass)

Every persisted ``metadata.json`` showed ``"caller": ""``,
``"callee": ""`` for completed calls — only the in-memory
``MetricsStore`` had the right values. The persist layer received empty
strings because the ``CallLogger.log_call_end`` data shape was built
from agent options rather than the live record. ``server.ts``
``wrappedStart`` now resolves ``caller``/``callee`` from the active
store record before persisting; Python ``record_call_start`` parity fix
stops clobbering caller/callee with empty strings on the
upgrade-from-initiated path (TS already had the right pattern).

### Call disappears from dashboard after end (BUG C from the second pass)

Race-induced duplicate row: Twilio's status callback for
``CallStatus=completed`` fires ~50–200 ms before the WS ``stop`` frame.
``updateCallStatus`` moved the row out of ``activeCalls`` into
``calls[]`` correctly, then the WS ``stop`` drove ``recordCallEnd``,
``activeCalls.get(callId)`` returned undefined, and a duplicate entry
was pushed with ``started_at = 0`` and empty caller/callee. The
duplicate masked the well-formed earlier row and the 24h window filter
excluded it. ``recordCallEnd`` / ``record_call_end`` now searches
``calls[]`` for the existing entry when active is gone and **updates
in place**, preserving caller/callee/started_at and merging in the
just-collected metrics.
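
The update-in-place rule can be sketched over plain dicts. The record fields and the `record_call_end` signature below are simplified assumptions and do not reflect the real ``MetricsStore`` layout.

```python
from typing import Dict, List


def record_call_end(active: Dict[str, dict], calls: List[dict],
                    call_id: str, metrics: dict) -> dict:
    """Finalize a call without creating a duplicate row."""
    record = active.pop(call_id, None)
    if record is not None:
        # Normal order: the call is still active when the WS stop frame arrives.
        done = {**record, "status": "completed", "metrics": metrics}
        calls.append(done)
        return done
    # Race order: the status callback already moved the row into calls[].
    for i, existing in enumerate(calls):
        if existing.get("call_id") == call_id:
            calls[i] = {**existing, "metrics": metrics}  # keep caller/callee/started_at
            return calls[i]
    # Unknown call: fall back to a bare row.
    done = {"call_id": call_id, "status": "completed", "metrics": metrics}
    calls.append(done)
    return done
```

The in-place branch is what keeps a single row per call regardless of which event arrives first.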

## Tests

25 new regression tests across 6 files (TS + Py parity):
- ``libraries/python/tests/test_prewarm_handoff.py`` (new — 6 tests)
- ``libraries/typescript/tests/unit/prewarm-handoff.test.ts`` (new — 6 tests)
- ``libraries/python/tests/unit/test_dashboard_store_unit.py`` (+4 dedup
  + active-accessor tests)
- ``libraries/python/tests/unit/test_server_unit.py`` (+1 caller/callee
  persist test)
- ``libraries/typescript/tests/dashboard-store.test.ts`` (+7 dedup +
  transcript accumulate + accessor tests)
- ``libraries/typescript/tests/server.test.ts`` (+1 caller/callee persist
  test using real ``CallLogger``)

## Verification

- Python: ``pytest -q`` → 1808 passed, 7 skipped (was 1797 + 11 new)
- TypeScript: ``npm test`` → 1481 passed (was 1467 + 14 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (esm + cjs + dts + cli): clean
- Dashboard SPA build (``cd dashboard-app && npm run build``): clean
  (204.93 kB / 63.47 kB gz)
- Dashboard sync: both ``libraries/{python,typescript}/.../dashboard/ui.html``
  refreshed
- Live PSTN smoke test (``outbound-cartesia-cerebras-elevenlabs.ts``):
  WS handoff log fired, first-turn greeting 990 ms, transcript live and
  post-end render OK, sparkline tooltip per-card OK
…ffold

Headline changes since cbe1886:

* Rolled back the 400 ms STT-final → LLM dispatch debounce introduced
  earlier in 0.6.1 (`_scheduleTurnCommit` / `_runDeferredTurnCommit` in
  TS, `_schedule_turn_commit` / `_delayed_turn_commit` in Python). The
  partial-transcript reschedule branch was overwriting the dispatched
  FINAL text with the latest partial, causing entire user turns to be
  dropped during slow-LLM windows. Verified on real PSTN (round 10k
  with gpt-5-nano dropped 3 of 5 user turns). Dispatch is now
  synchronous on `is_final` again. The original double-talk symptom is
  re-opened with a better fix path documented internally.

* Kept beneficial 0.6.1 work: `beginSpeaking` stamps
  `firstAudioSentAt = Date.now()` on every turn so the
  `canBargeIn()` anti-flicker gate runs in parallel with LLM TTFT +
  TTS TTFB; VAD `speech_start` calls `anchorUserSpeechStart()` and
  skips on phantom-during-warmup-gate; commit-drop path re-anchors;
  WARN log when pipeline has no `llm` / `onMessage` handler; char/4
  fallback billing for providers that don't emit a usage chunk;
  `OpenAILLMProvider.providerKey` static; firstMessage TTS char
  billing; persist full latency breakdown per percentile in
  metadata.json; dashboard hydrate reads `transcript.jsonl`;
  ElevenLabs default flipped to WS.

* Lowered dashboard percentile threshold 5 → 2 turns so the detail
  pane no longer shows `—` for p50/p95 on typical 4-7 turn PSTN calls
  while the list column already shows a real number via avg fallback.

* Added Krisp VIVA noise-suppression scaffold for the TypeScript SDK
  at `libraries/typescript/src/providers/krisp-filter.ts` for cross-
  SDK parity with the existing Python `KrispVivaFilter`. Throws at
  construction time because Krisp does not publish an official Node
  SDK as of 2026-05; users supply SDK + `.kef` model + license. New
  top-level exports: `KrispVivaFilter`, `KrispVivaFilterOptions`,
  `KrispSampleRate`, `KrispFrameDuration`, `DeepFilterNetFilter`,
  `DeepFilterNetOptions`.

* CHANGELOG 0.6.1 section revised to reflect the rollback narrative
  honestly (debounce attempted, rolled back before release) and to
  document the new entries.

* Scrubbed competitor-name references from source files (Pipecat,
  LiveKit) per project rule `.claude/rules/no-competitor-references.md`;
  replaced with "industry-standard pattern" wording. Source files
  affected: `stream-handler.ts`, `stream_handler.py`, `metrics.ts`,
  `services/metrics.py`, `silero_vad.py`.

* Krisp Python wrapper unchanged.

Tests: TS lint clean, vitest 1486/1486 pass; Python pytest unit 1252
pass, 5 skip. Validated on real PSTN: post-rollback p95 wait
1844 ms over 4 clean sequential turns (no drops) on cellular
hotspot — vs catastrophic 8521 ms with 3 dropped turns pre-rollback.

Pre-commit end-of-file-fixer was failing on this single file.
Trim the extra blank line so the file ends with exactly one '\n'.

The Python ``CallMetricsAccumulator._emit_eou_metrics`` had
``end_of_utterance_delay`` and ``transcription_delay`` swapped relative
to the TypeScript ``emitEouMetrics`` AND emitted them in seconds while
TS emits milliseconds. Dashboards or exporters reading the same metric
across both SDKs saw a 1000x disagreement on top of swapped field
semantics.

Locked convention (now identical in both SDKs):

- end_of_utterance_delay = stt_final  - vad_stopped  (ms)
- transcription_delay    = turn_commit - vad_stopped (ms)
- on_user_turn_completed_delay                       (ms, unchanged)

Python now clamps negative deltas to 0 (TS already did). The Python
``EOUMetrics`` docstring updated from "seconds" to "milliseconds".
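
The locked convention can be illustrated with a minimal sketch. It assumes the three timestamps are captured in seconds (for example from a monotonic clock); the function and class names below are hypothetical and are not the SDK's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EouMetricsSketch:
    """Hypothetical container; both fields are in milliseconds."""
    end_of_utterance_delay: float
    transcription_delay: float


def _delta_ms(later_s: float, earlier_s: float) -> float:
    """Difference in milliseconds, clamped so it is never negative."""
    return max(0.0, (later_s - earlier_s) * 1000.0)


def compute_eou_metrics(vad_stopped_s: float,
                        stt_final_s: float,
                        turn_commit_s: float) -> EouMetricsSketch:
    # end_of_utterance_delay = stt_final   - vad_stopped
    # transcription_delay    = turn_commit - vad_stopped
    return EouMetricsSketch(
        end_of_utterance_delay=_delta_ms(stt_final_s, vad_stopped_s),
        transcription_delay=_delta_ms(turn_commit_s, vad_stopped_s),
    )
```

Clamping in one helper keeps both fields consistent when events arrive out of order (for example an STT final that lands before the VAD stop is recorded).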

Tests pin both behaviours:
- libraries/python/tests/test_metrics.py::TestEOUMetricsEmission
- libraries/typescript/tests/unit/metrics.test.ts ::
  CallMetricsAccumulator > emitEouMetrics field semantics

Refs: 0.6.1 observability parity audit.

The Python SDK has exposed three OTel-related helpers since 0.6.1:
``record_patter_attrs``, ``patter_call_scope``, ``attach_span_exporter``
(in ``getpatter.observability.attributes``). The TypeScript SDK had no
equivalent surface, so provider adapters that call these helpers in
Python had nothing to call on the TypeScript side, violating
``.claude/rules/sdk-parity.md``.

Port the helpers to TypeScript as no-ops by default. When
``PATTER_OTEL_ENABLED`` is unset or ``@opentelemetry/api`` is not
installed, each helper returns immediately, keeping the zero-cost
disabled path that the rest of the observability module already
respects.

Semantic mapping:
- recordPatterAttrs(attrs)                       <-> record_patter_attrs
- patterCallScope({ callId, side }, fn)          <-> patter_call_scope
- attachSpanExporter(patterInstance, exporter)   <-> attach_span_exporter

The JS form of patterCallScope takes an async callback because JS lacks
``with``-style context managers; the closure is the scope body. The
module uses a module-level stack instead of a ContextVar, which is
sufficient for the SDK's one-call-per-handler model.

Tests:
- libraries/typescript/tests/unit/observability-attributes.test.ts
  (7 smoke cases covering the public surface + scope unwind on throw)
…nt loop

``ElevenLabsWebSocketTTS.adopt_websocket`` closed any previously parked
WS handle via ``asyncio.create_task(prev.ws.close())`` and silently
swallowed the resulting ``RuntimeError`` whenever the method ran
outside an event loop. The FD on our side leaked until process exit.
Real scenario: cleanup hooks fired from ``__del__``, atexit handlers,
or signal-driven teardown.

Fix:
- Keep the async fast path when a loop is running.
- Fall back to a best-effort synchronous ``transport.close()`` when
  no loop is available. ``transport.close`` is non-blocking and safe
  off-loop; it skips the WS close handshake but cleans up the socket.
- Log a warning on the fallback path so the FD-leak symptom shifts
  from "silent" to "logged".
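
A minimal sketch of the fast-path / fallback split described above. It assumes the parked handle exposes an async ``close()`` and a ``transport`` attribute with a synchronous ``close()``; the ``FakeWebSocket`` class and the ``close_parked_ws`` name are hypothetical stand-ins, not the SDK's code.

```python
import asyncio
import logging

logger = logging.getLogger("adopt_cleanup_sketch")


class FakeTransport:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class FakeWebSocket:
    """Hypothetical stand-in for a parked WS handle."""

    def __init__(self) -> None:
        self.transport = FakeTransport()
        self.closed_gracefully = False

    async def close(self) -> None:
        self.closed_gracefully = True
        self.transport.close()


def close_parked_ws(ws) -> str:
    """Close a previously parked socket from sync code, on or off the loop."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (e.g. atexit or __del__): skip the close
        # handshake, release the socket, and make the event visible.
        ws.transport.close()
        logger.warning("closed parked WS synchronously (no event loop)")
        return "sync"
    # Loop is running: schedule the graceful close handshake.
    loop.create_task(ws.close())
    return "async"
```

``asyncio.get_running_loop()`` raises ``RuntimeError`` when no loop is running, which makes it a convenient explicit probe instead of catching the error from ``create_task``.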

The TypeScript counterpart ``adoptWebSocket`` is unaffected —
``ws.close()`` from the ``ws`` package is synchronous so the same
scenario doesn't reach an analogous error branch.

Tests:
- libraries/python/tests/unit/test_elevenlabs_ws_tts.py::TestAdoptWebSocketCleanup
  (3 cases: with running loop, without loop, idempotent same-handle).

Add regression coverage that ``_stream_prewarm_bytes`` /
``streamPrewarmBytes`` open the barge-in gate
(``_first_audio_sent_at`` / ``firstAudioSentAt``) once the first chunk
reaches the wire. The current code already does this — the gate is
opened both by ``_begin_speaking(is_first_message=True)`` ahead of
streaming AND by ``_mark_first_audio_sent`` per-iteration inside the
prewarm loop — but a future refactor of the begin-speaking path could
silently regress the prewarm-specific case. The per-chunk mark call
inside the streaming loop is the last line of defence and now has
explicit coverage on both SDKs.

Test names match across SDKs for grep-friendly parity:
- Python: tests/test_prewarm.py
  ::test_stream_prewarm_bytes_opens_barge_in_gate_on_first_chunk
- TypeScript: tests/unit/prewarm.test.ts
  > "opens the barge-in gate by stamping firstAudioSentAt after the
     first chunk"

The four fix/feat entries that landed in ## Unreleased during the
0.6.1 review pass (EOU semantics + unit, OTel TS no-op stubs,
ElevenLabs adopt_websocket cleanup, prewarm barge-in regression
tests) belong under the 0.6.1 release block since version
literals stay at 0.6.1 (no separate 0.6.2 bump). Date bumped to
2026-05-12 to reflect the actual release-prep date.
…se of #90) (#91)

* chore(cerebras): debug log when usage chunk missing + fallback fires

When an upstream LLM stream (Cerebras and similar) does not emit a
`usage` chunk despite `stream_options={include_usage:true}`, the
char/4 fallback billing path previously emitted WARN on every
tool-loop iteration. Multi-tool turns logged 5-10 identical WARN
lines for the same call, drowning real warnings.

Replace with one-shot INFO at first fallback per LLMLoop instance
(provider, model, char counts, est_tokens), then DEBUG for every
subsequent iteration with the running `_usage_missing_count` /
`_usageMissingCount` total. No billing behaviour change — char/4
estimation still drives `record_llm_usage` / `recordLlmUsage`.
Symmetric Python (`logger.info`/`logger.debug`) and TypeScript
(`getLogger().info`/`.debug`).
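
The one-shot INFO followed by DEBUG pattern could look roughly like this. The class name, message format, and integer-division rounding for the char/4 estimate are assumptions made for illustration.

```python
import logging


class UsageFallbackLogger:
    """Hypothetical helper: INFO on the first fallback, DEBUG afterwards."""

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger
        self.usage_missing_count = 0

    def record_fallback(self, provider: str, model: str, chars: int):
        # char/4 estimate; the exact rounding used by the SDK is assumed.
        est_tokens = chars // 4
        self.usage_missing_count += 1
        level = (logging.INFO if self.usage_missing_count == 1
                 else logging.DEBUG)
        self._logger.log(
            level,
            "usage chunk missing: provider=%s model=%s chars=%d "
            "est_tokens=%d missing_count=%d",
            provider, model, chars, est_tokens, self.usage_missing_count,
        )
        return est_tokens, level
```

Keeping the counter per instance matches the "per LLMLoop instance" scope: a new call gets a fresh INFO line, while repeated tool-loop iterations stay at DEBUG.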

* docs(krisp): refresh unavailable message with current SDK status

KrispVivaFilter constructor in the TypeScript SDK still throws — no
official Krisp Node.js server SDK exists as of 2026-05. Verified via
`npm search krisp`:

- `@livekit/krisp-noise-filter` (0.4.3, 2026-04) — browser WASM
  track processor on the local microphone; cannot run server-side.
- `@livekit/react-native-krisp-noise-filter` (0.0.3) — mobile native.
- `@krisp.ai/kr-local-monitoring` — Krisp's only first-party npm
  package; "Local Monitoring API", not noise cancellation.

Refreshed the thrown message to (a) stamp the verification date,
(b) explicitly distinguish "server Node SDK" from the existing
browser/RN wrappers, (c) list the LiveKit packages with the reason
they don't apply to Patter (server-received PCM/mulaw stream).
Python KrispVivaFilter and TS DeepFilterNetFilter remain the only
shipped paths. No code behaviour change.

* fix(krisp): remove competitor package names from error message

Per .claude/rules/no-competitor-references.md the TS Krisp filter
error message cannot cite competitor package names — refactored
the "Browser/React Native" block to describe the category
generically (third-party wrappers, client-side scope) without
naming specific packages. Same cleanup applied to the matching
CHANGELOG entry. No behavioural change.
…s (re-base of #89) (#92)

* fix(dashboard): preserve existing calls when new call arrives in SSE stream

`mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts`
rebuilt the calls array from the server snapshot via `next.map(...)`,
so any call present in the previous UI state but missing from the next
payload was silently dropped. With back-to-back calls, the SSE
`call_start` refresh occasionally landed before the prior call
propagated to `/api/dashboard/calls` and the row vanished from the
SPA — regression reported as #124.

The merge is now a true upsert: rows present in `prev` but absent from
`next` are appended, so prior calls stay visible until the server
snapshot stabilises. Server-side eviction (ring buffer of 500) bounds
long-running sessions.

Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts`
and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added
Vitest to the SPA so the helpers can be tested in isolation without a
React harness).

Refs #124.

* fix(barge-in): firstMessage interruptible via per-chunk mark gating

The firstMessage TTS chunks were pushed into the carrier WebSocket as
fast as the provider yielded them. Twilio's outbound buffer ended up
several seconds deep, and a barge-in's sendClear was queued behind the
already-enqueued media frames — the agent kept talking on the user's
earpiece for up to ~2 s after the user spoke (#128).

The firstMessage send path is now a paced loop:
* Twilio: every chunk is followed by a unique mark; the loop waits for
  the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks
  ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the
  next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``)
  resolves every pending mark waiter so the loop exits on the next
  tick and ``sendClear`` lands on a near-empty carrier buffer.
* Telnyx (no mark concept): the loop falls back to a playout-duration-
  based sleep so the buffer can't out-run a clear by more than one
  chunk.
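
The Twilio branch of the paced loop can be sketched as below. This is a simplified asyncio model under stated assumptions: ``send_media`` and ``send_mark`` are hypothetical callbacks, the class name is invented, and the Telnyx playout-sleep branch is omitted.

```python
import asyncio
from collections import deque

FIRST_MESSAGE_MARK_WINDOW = 3  # chunks in flight before the loop waits


class PacedSenderSketch:
    def __init__(self, send_media, send_mark) -> None:
        self._send_media = send_media
        self._send_mark = send_mark
        self._pending = deque()  # FIFO of (mark_name, Future)
        self._counter = 0
        self.cancelled = False

    async def send(self, chunks) -> None:
        self._counter = 0  # each paced send starts a fresh fm_1, fm_2, ...
        loop = asyncio.get_running_loop()
        for chunk in chunks:
            # Window full: wait for the oldest unconfirmed mark.
            while (len(self._pending) >= FIRST_MESSAGE_MARK_WINDOW
                   and not self.cancelled):
                await self._pending[0][1]
            if self.cancelled:
                return
            self._counter += 1
            name = f"fm_{self._counter}"
            self._pending.append((name, loop.create_future()))
            self._send_media(chunk)
            self._send_mark(name)

    def on_mark(self, name: str) -> None:
        # Carrier echoed a mark: release the oldest waiter if it matches.
        if self._pending and self._pending[0][0] == name:
            _, fut = self._pending.popleft()
            if not fut.done():
                fut.set_result(None)

    def cancel(self) -> None:
        # Barge-in: resolve every waiter so the loop exits on its next tick.
        self.cancelled = True
        while self._pending:
            _, fut = self._pending.popleft()
            if not fut.done():
                fut.set_result(None)
```

Because at most ``FIRST_MESSAGE_MARK_WINDOW`` chunks are unconfirmed at any time, a clear issued after ``cancel()`` has little queued audio ahead of it.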

Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py
``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` /
``_stream_prewarm_bytes`` delegate to the new helper. The existing
prewarm chunking test was updated to echo marks via the mock bridge so
it interoperates with the new pacing.

Coverage:
* libraries/typescript/tests/unit/stream-handler.test.ts —
  ``firstMessage mark-gated pacing`` (3 cases: window cap +
  barge-in, mark echo slides window, Telnyx playout pacing).
* libraries/python/tests/unit/test_first_message_pacing.py — 4 cases
  including FIFO mark resolution.

Refs #128.

* fix(barge-in): drain pending marks on call cleanup/stop/ws-close

The firstMessage paced sender accumulates one mark waiter (asyncio.Future
on Python / Promise on TS) per chunk in _pending_marks / pendingMarks
while audio is streaming to the carrier. The barge-in cancel path
already drained these, but a call that ended without going through
cancel — carrier WebSocket drop, hangup mid firstMessage, stop event
arriving before the paced sender finished — left every queued future
unresolved. The send loop was awaiting them, so the orphan futures
leaked until the handler itself was garbage-collected.

Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks
before tearing down adapters; the TS handleStop and handleWsClose do
the equivalent via drainPendingMarks(). Idempotent and safe when the
queue is already empty.

Added regression coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestCleanupDrainsPendingMarks)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (cleanup drains pending firstMessage marks — handleStop + handleWsClose)

* fix(barge-in): reset firstMessage mark counter per send + on cleanup

PipelineStreamHandler._first_message_mark_counter (Py) and
StreamHandler.firstMessageMarkCounter (TS) were never reset between
turns or calls. With handler re-use, the counter incremented
monotonically across turns — a paced send for the second turn issued
fm_<previous_count + 1> while the carrier could still be echoing a
stale fm_<N> from the previous turn, corrupting FIFO matching in
on_mark / onMark.

Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes
(Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a
fresh fm_1, fm_2, ... sequence. Also reset on cleanup
(PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a
belt-and-braces guard at the cross-call boundary.

Coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (firstMessage mark counter resets across sends + on cleanup)

* fix(dashboard): cap merged UI calls at 500 + sort by startedAt desc

mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved
prev_only calls indefinitely by appending them after the fresh snapshot
block. Two consequences on a long-lived session:

1. The UI array grew unbounded — once the session cycled through more
   than 500 calls (the server-side MetricsStore ring buffer default),
   rows the server had already evicted stayed pinned by prev and were
   re-appended on every refresh.
2. Ordering was non-deterministic — prev_only rows always landed at
   the bottom regardless of their startedAtMs, so a newer call could
   end up below an older one if the snapshot ordering shifted.

Fix: after the upsert pass, sort the merged list by startedAtMs
descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the
server ring buffer.
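
The upsert, sort, and cap steps are sketched here in Python rather than the SPA's TypeScript. The ``id`` key is an assumed identifier field; ``startedAtMs`` and the cap of 500 come from the description above.

```python
MAX_UI_CALLS = 500


def merge_calls_preserving(prev, nxt, max_calls=MAX_UI_CALLS):
    """Upsert `nxt` over `prev`, newest first, capped at `max_calls`."""
    merged = {call["id"]: call for call in prev}
    # Rows in the fresh server snapshot win over stale UI copies.
    merged.update({call["id"]: call for call in nxt})
    ordered = sorted(merged.values(),
                     key=lambda call: call["startedAtMs"],
                     reverse=True)
    return ordered[:max_calls]
```

Sorting after the upsert makes the output order independent of whether a row came from ``prev`` or from the snapshot, and the slice drops the oldest rows first.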

Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a
600-prev+1-fresh cap test and an explicit startedAtMs ordering test.

* fix(realtime): only update lastConfirmedMark on matched mark (parity with Python)

StreamHandler.onMark in libraries/typescript/src/stream-handler.ts
unconditionally assigned this.lastConfirmedMark = markName before
checking whether the name corresponded to a queued mark. Any echo
arriving after the queue was drained, or any mark name emitted by
adapters outside the firstMessage queue, would overwrite the handler-
level field and contaminate downstream barge-in heuristics gated on
lastConfirmedMark.

Python stream_handler.py's on_mark never touches a handler-level
field at all — the equivalent state lives on
TwilioAudioSender.last_confirmed_mark and is updated only by the
carrier's own echo handler. The TS path now matches that behaviour
defensively: lastConfirmedMark is updated only after the queue lookup
confirms a matching entry, mirroring the safer Python semantics.

Coverage: libraries/typescript/tests/unit/stream-handler.test.ts
(onMark only updates lastConfirmedMark on a matched mark) asserts
that an unmatched echo cannot clobber a previously-set value.
… duck-type adopt (re-base of #88) (#93)

* feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework

The `warmup()` method on `OpenAIRealtimeAdapter` (Python + TS) was
defined but unreachable from `Patter.call()` — the prewarm framework
only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI
Realtime is an all-in-one provider that's server-instantiated at
`StreamHandler.start()` time and therefore not stored on the Agent.

`_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now
constructs a transient `OpenAIRealtimeAdapter` from the resolved
Agent + the configured `openai_key` when `agent.provider ==
"openai_realtime"` and runs `warmup()` in parallel with the carrier
`initiate_call`. The transient adapter is configured identically to
the production one (model, voice, instructions, language, audio
format = g711_ulaw for both Twilio and Telnyx, plus optional
reasoning_effort / input_audio_transcription_model knobs from the
engine marker) so the upstream `session.update` primes the same
session state that the live call will use.

Saves 150-400 ms of TLS + WebSocket handshake + `session.created`
round-trip on the first turn. Best-effort: failures during warmup
adapter build or `warmup()` itself are logged at DEBUG and never
abort the call.

* feat(realtime): persist primed Realtime session across warmup → live call boundary

Builds on the previous warmup wiring. The transient warmup adapter
closes its WS after a session.update / session.updated round-trip,
so the live call still pays a fresh ``new WebSocket`` + handshake.
This change parks the primed Realtime WS instead — same pattern the
SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS).

`_park_provider_connections` (Py) / `parkProviderConnections` (TS)
now build a transient `OpenAIRealtimeAdapter` when
`agent.provider == "openai_realtime"`, call its
`open_parked_connection` to keep the `session.updated` WS OPEN,
and stash it under the `openai_realtime` slot key alongside the
existing `stt` / `tts` parked handles.

`OpenAIRealtimeStreamHandler` (Py) accepts a new
`pop_prewarmed_connections` callback (wired through the Twilio and
Telnyx telephony adapters). `StreamHandler.start()` consults the
parked slot before calling `connect()` and calls
`adapter.adopt_websocket(...)` when a live WS is available — saving
~250-450 ms of cold-handshake on the first turn. TS mirrors the same
flow in `StreamHandler.initRealtimeAdapter` for both Twilio and
Telnyx bridges.

All failure modes (missing OpenAI key, dead parked WS, park-task
exception, adoption error) fall through transparently to the cold
`connect()` path. Existing 36-test TS handoff/prewarm suite and
45-test Python suite all green after change.

* fix(realtime): include agent tools + built-ins in primed warmup session

The prewarm path built the transient OpenAIRealtimeAdapter without a
``tools=`` argument, so the ``session.update`` sent during ringing
carried an empty tool list. When ``StreamHandler.start()`` adopted that
parked WebSocket it skipped a fresh ``session.update``, leaving the
upstream session permanently unaware that the two Patter built-ins
(``transfer_call`` / ``end_call``) existed — they silently no-op'd on
every hit-prewarm call (~80% of outbound calls when prewarm is enabled).

Extracted the canonical tool-list construction (user tools +
``transfer_call`` + ``end_call``) into a shared helper —
``build_realtime_tools()`` in Python and ``buildRealtimeTools()`` in
TypeScript — and call it from both the live ``buildAIAdapter`` /
``StreamHandler.start()`` path and the warmup-side
``_build_realtime_warmup_adapter`` / ``buildRealtimeWarmupAdapter``
path so the two ``session.update`` bodies match byte-for-byte.
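
A rough sketch of the shared-helper idea follows. The dict shape of a tool entry and the function name are assumptions for illustration; only the composition (user tools plus the two built-ins) is taken from the description above.

```python
BUILT_IN_TOOL_NAMES = ("transfer_call", "end_call")


def build_realtime_tools_sketch(user_tools=None):
    """Return user tools followed by the two built-ins (hypothetical shape)."""
    tools = [dict(tool) for tool in (user_tools or [])]
    for name in BUILT_IN_TOOL_NAMES:
        # Minimal placeholder entry; real definitions would also carry
        # a description and a parameters schema.
        tools.append({"type": "function", "name": name})
    return tools
```

Calling the same function from both the warmup path and the live path is what keeps the two ``session.update`` bodies from drifting apart.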

Tests: 4 new regression tests (2 Py + 2 TS) verifying that the warmup
adapter carries user-defined tools plus both built-ins, and that the
built-ins are still injected when the agent declares no user tools.

* fix(realtime): eliminate double-handshake on outbound prewarm (park does warmup work)

Both ``_spawn_provider_warmup`` and ``_park_provider_connections`` built
a transient ``OpenAIRealtimeAdapter`` and opened its own WebSocket
against ``api.openai.com`` during the ringing window — two handshakes
per outbound call where one suffices.

The warmup-only handshake is a strict subset of what park performs
(open WS → ``session.created`` → ``session.update`` → ``session.updated``)
and park keeps the socket open for adoption. The warmup-side WS was
opened, primed, and immediately discarded — pure waste of 150-400 ms
of ringing-window budget, plus doubled rate-limit pressure against
OpenAI for no benefit.

Fix: ``_spawn_provider_warmup`` no longer builds the Realtime adapter
at all; park is now the sole Realtime warm path on outbound calls.
Pipeline-mode STT / TTS / LLM ``warmup()`` calls are unchanged.

Tests: 2 new regression tests verify (1) ``_spawn_provider_warmup``
does not construct a Realtime adapter, and (2) end-to-end
warmup+park together construct exactly one adapter (the one park uses).
Updated 3 existing tests that asserted the old double-build behaviour.

* fix(realtime): recreate adapter on adopt failure to avoid stale state

When ``adopt_websocket`` / ``adoptWebSocket`` raised mid-adoption, the
partially-adopted ``OpenAIRealtimeAdapter`` was left in an inconsistent
state: ``_running`` / ``messageListenerAttached`` was already true, the
heartbeat task may have started, ``_current_response_item_id`` /
``currentResponseItemId`` may have carried leaked state from the parked
session, and the ``_ws`` / ``ws`` reference pointed at a now-closed
socket.

Falling through to ``connect()`` on that carcass raced
``session.created`` against stale state, ran two heartbeat timers, and
sometimes attached a second message listener to the new socket — silent
corruption of every adopt-failed call.

Fix: when adopt raises, re-instantiate the adapter (via the existing
``adapter_kwargs`` in Python, ``deps.buildAIAdapter`` in TS) before the
cold ``connect()`` path runs, guaranteeing a clean slate.
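
The recovery path can be sketched as follows, assuming a zero-argument ``build_adapter`` factory. ``FlakyAdapter`` is a hypothetical test double whose adoption always fails after mutating state.

```python
import asyncio


class FlakyAdapter:
    """Hypothetical adapter whose adoption fails after partial setup."""
    instances = 0

    def __init__(self) -> None:
        FlakyAdapter.instances += 1
        self.running = False
        self.connected = False

    async def adopt_websocket(self, ws) -> None:
        self.running = True  # partial state left behind on failure
        raise ConnectionError("parked socket already closed")

    async def connect(self) -> None:
        self.connected = True


async def start_with_adopt_fallback(build_adapter, parked_ws):
    adapter = build_adapter()
    if parked_ws is not None:
        try:
            await adapter.adopt_websocket(parked_ws)
            return adapter
        except Exception:
            # Discard the half-adopted instance; never reuse it.
            adapter = build_adapter()
    await adapter.connect()
    return adapter
```

Rebuilding instead of resetting fields one by one avoids having to enumerate every piece of state the failed adoption may have touched.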

Tests: regression test in each SDK constructs an adapter whose
``adopt_websocket`` throws, then asserts (a) a second adapter instance
was created, (b) ``connect()`` ran on the fresh adapter, (c) the
handler's adapter reference points at the fresh instance.

* refactor(stream-handler): duck-type adoptWebSocket capability (drop instanceof)

The TS realtime adopt branch in ``stream-handler.ts:initRealtimeAdapter``
previously gated the prewarm-handoff path with two
``this.adapter instanceof OpenAIRealtimeAdapter`` checks. Switched both
to a single duck-type check (``typeof adoptWebSocket === 'function'``)
so:

1. The generic ``stream-handler`` module stays provider-agnostic on this
   hot path. Pipeline-only users still get the symbol resolved at module
   load (the import is used elsewhere in this file for legitimate
   provider-specific behaviour), but the adopt-handoff gate no longer
   demands a concrete class identity.

2. The check mirrors the Python handler's
   ``getattr(self._adapter, "adopt_websocket", None)`` shape — both
   SDKs now use capability-based detection rather than identity.

3. Future Realtime-like adapters (e.g. a different vendor's all-in-one
   provider that also exposes ``adoptWebSocket``) can opt into the
   adopt flow simply by implementing the method, no SDK change needed.

No behaviour change: the same WS-adopt path runs for the same adapter
class. Existing adopt-handoff tests cover the behaviour and continue
to pass.
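
For reference, the capability check on the Python side amounts to something like the sketch below. The helper name and the two adapter classes are hypothetical.

```python
def supports_adopt(adapter) -> bool:
    """True when the adapter exposes a callable adopt_websocket."""
    return callable(getattr(adapter, "adopt_websocket", None))


class AdoptingAdapter:
    async def adopt_websocket(self, ws) -> None:
        self.ws = ws


class PipelineOnlyAdapter:
    async def connect(self) -> None:
        pass
```

Any adapter that implements the method passes the gate, regardless of its class.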

With ``agent.prewarm=true`` (default) the OpenAI Realtime WebSocket is
parked, primed, and adopted at call pickup with ``source=adopted ms=0``.
The audio bridge is live the instant the callee answers, and the
caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms
before the firstMessage audio starts streaming back. OpenAI's server-VAD
treats that early caller audio as a barge-in and silently cancels the
in-flight ``response.create``, so the configured ``first_message`` is
never delivered. The cold ``connect()`` path masked the bug because the
WS handshake naturally buffered ~300 ms of caller silence.

Fix: ``send_first_message`` / ``sendFirstMessage`` now arm a one-shot
server-VAD lockout. A ``session.update`` with ``turn_detection: null``
(OpenAI-documented: disables server-VAD entirely, no audio-driven
response cancellation) is sent immediately before ``response.create``,
then the receive loop / message listener restores the original
``turn_detection`` block (snapshotted from the configured ``vad_type``
/ ``silence_duration_ms`` / ``threshold`` / ``prefix_padding_ms``) on
the firstMessage ``response.done`` so barge-in works normally for every
subsequent turn. The lockout is strictly one-shot.
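
The intended message sequence can be sketched as a small state machine. The class name and the ``send`` callback are hypothetical, and the ``response.create`` body is left minimal; the sketch only shows ordering and the one-shot restore, not whether the server honours the update.

```python
import json


class FirstMessageLockoutSketch:
    def __init__(self, send, turn_detection: dict) -> None:
        self._send = send                    # callable taking a JSON string
        self._saved = dict(turn_detection)   # snapshot to restore later
        self.pending = False

    def send_first_message(self) -> None:
        # 1. Arm the lockout: disable server-VAD.
        self._send(json.dumps(
            {"type": "session.update", "session": {"turn_detection": None}}))
        self.pending = True
        # 2. Ask for the scripted opening.
        self._send(json.dumps({"type": "response.create"}))

    def on_event(self, raw: str) -> None:
        event = json.loads(raw)
        # 3. Restore once, on the first response.done after arming.
        if event.get("type") == "response.done" and self.pending:
            self.pending = False
            self._send(json.dumps(
                {"type": "session.update",
                 "session": {"turn_detection": self._saved}}))
```

Clearing ``pending`` before sending the restore keeps the lockout strictly one-shot even if a second ``response.done`` arrives immediately.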

``turn_detection: null`` was chosen over a temporary high
``silence_duration_ms`` because it is fully OpenAI-documented and
guarantees zero server-side cancellation (timer-based fallbacks remain
sensitive to clock skew on multi-second response.done windows).

Complements the client-side ``firstAudioSentAt`` guard from PR #92
which prevents the local audio bridge from clearing the playout buffer
on caller speech — this closes the same gap on the *server* side.

Coverage: 3 new Python tests + 4 new TypeScript tests in the
``OpenAIRealtimeAdapter`` IO suites, covering lockout sequence, custom
``silence_duration_ms`` / ``vad_type`` restore, one-shot semantics, and
no-ws no-op.

Files: libraries/python/getpatter/providers/openai_realtime.py,
libraries/typescript/src/providers/openai-realtime.ts,
libraries/python/tests/unit/test_providers_io_unit.py,
libraries/typescript/tests/unit/openai-realtime.test.ts,
CHANGELOG.md.

PR #95 (cc9e51b) shipped the `session.update {turn_detection: null}`
lockout for the OpenAI Realtime firstMessage on the prewarm-adopted
path. A live outbound test after the rebuild still showed the original
symptom (the user speaks first, the agent does not greet).

This branch adds temporary `[DIAG-VAD]` INFO-level instrumentation so
the next live test produces a deterministic root-cause trace. No
behavioural changes.

Log points (both SDKs, byte-for-byte parity):

1. `[DIAG-VAD] sent session.update turn_detection=null (lockout arm)`
   — right after the lockout `session.update` is enqueued.
2. `[DIAG-VAD] sent response.create (firstMessage)` — right after the
   response.create is enqueued. Pairing 1 and 2 vs the server echo in
   step 3 reveals whether the server applied the update before
   processing response.create.
3. `[DIAG-VAD] session.created|session.updated turn_detection=...` —
   every inbound `session.created` / `session.updated` event logs the
   server-acknowledged `turn_detection` value. This is the smoking gun
   for the "server silently ignores turn_detection=null" hypothesis
   reported in OpenAI community threads (see
   community.openai.com/t/error-turning-turn-detection-off-...).
4. `[DIAG-VAD] speech_started fired DURING lockout` — the canonical
   "server-VAD still active" smoking gun if observed while
   firstMessageProtectionPending is true.
5. `[DIAG-VAD] response.cancelled (firstMessageProtectionPending=...)`
   — server cancelled the response. If pending=true, the firstMessage
   was silently dropped (the exact symptom we're chasing).
6. `[DIAG-VAD] response.done received during lockout — restoring
   turn_detection` + `sent session.update turn_detection=<saved>
   (restore)` — success path. If we never see step 6, the firstMessage
   turn never completed.

Diagnosis hypotheses being validated:

- H1 (race): server processes response.create before the
  turn_detection session.update propagates to the VAD subsystem.
  Confirmed if the [DIAG-VAD] echo for `session.updated turn_detection
  =null` arrives AFTER `sent response.create`.
- H2 (silent ignore): server-VAD remains active even after
  `turn_detection: null` because the payload is rejected silently.
  Confirmed if the [DIAG-VAD] session.updated echo never shows
  turn_detection=null, OR if speech_started fires during lockout.
- H3 (npm link cache): the acceptance package was running stale
  code. RULED OUT: `releases/0.6.1/typescript/node_modules/getpatter`
  is a symlink to `libraries/typescript`, dist was freshly rebuilt
  May 12 19:23, and the fix string is present at chunk-D4XCC4FF.mjs
  line 603 (`session: { turn_detection: null }`).
- H4 (reachability): `sendFirstMessage` not called on adopt path.
  RULED OUT: stream-handler.ts:2535 uses `instanceof
  OpenAIRealtimeAdapter`; the adopted path uses the same class
  (`adoptWebSocket` is a method on `OpenAIRealtimeAdapter`).

Next steps:

1. Rebuild SDK: `cd libraries/typescript && npm run build` (already
   done; dist contains the new logs).
2. Re-run live outbound Realtime test with prewarm=default. Collect
   the full server log.
3. Search for `[DIAG-VAD]` lines in chronological order. The pattern
   tells us which hypothesis is correct.
4. Based on H1 / H2, the follow-up fix is either:
   - Wait for `session.updated` ack before sending `response.create`
     (fixes H1 race).
   - Try `turn_detection: {type: 'server_vad', interrupt_response:
     false, create_response: false}` instead of null
     (fixes H2 silent-ignore; documented per OpenAI guide).
   - Or fall back to a brief client-side audio gate (drop inbound
     audio for ~400 ms after `sendFirstMessage` returns) as
     belt-and-suspenders.
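
Step 3 could be partly automated with a small triage helper like the one below. It is a sketch that assumes the log lines contain the substrings quoted in the log-point list and that a null value is rendered as ``turn_detection=null``; the Python SDK may print ``None`` instead, in which case the match string needs adjusting.

```python
def classify_diag_trace(lines):
    """Return the hypotheses (H1/H2) consistent with a [DIAG-VAD] trace."""
    diag = [line for line in lines if "[DIAG-VAD]" in line]

    def first_index(*needles):
        for i, line in enumerate(diag):
            if all(n in line for n in needles):
                return i
        return None

    create_idx = first_index("sent response.create")
    if create_idx is None:
        return []  # firstMessage was never sent; nothing to classify

    # Server echo acknowledging the lockout ("session.updated" does not
    # match the outbound "sent session.update ..." line).
    null_echo_idx = first_index("session.updated", "turn_detection=null")
    speech_in_lockout = first_index(
        "speech_started fired DURING lockout") is not None

    matched = []
    # H1: the null echo exists but arrives after response.create was sent.
    if null_echo_idx is not None and null_echo_idx > create_idx:
        matched.append("H1")
    # H2: no null echo at all, or VAD fired while the lockout was armed.
    if null_echo_idx is None or speech_in_lockout:
        matched.append("H2")
    return matched
```

The helper returns a list because the two criteria are not mutually exclusive: a late echo combined with ``speech_started`` during lockout matches both descriptions and still needs a human read of the trace.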

Files: `libraries/python/getpatter/providers/openai_realtime.py`,
`libraries/typescript/src/providers/openai-realtime.ts`,
`CHANGELOG.md`. Coverage: unchanged — diagnostic logs are
INFO-only and don't alter call flow. Existing
test_providers_io_unit.py (98 passing) and openai-realtime.test.ts
(30 passing) all green.

The diagnostic logs MUST be removed once the root cause is pinned down.