diag(0.6.1): Realtime firstMessage lockout — instrumentation + finding by nicolotognoni · Pull Request #99 · PatterAI/Patter

nicolotognoni · 2026-05-12T17:52:52Z

Summary

PR #95 (commit cc9e51b) shipped a server-VAD lockout for the OpenAI Realtime firstMessage on the prewarm-adopted path: a session.update {turn_detection: null} armed immediately before response.create, restored on the matching response.done. A subsequent live outbound test on Realtime (prewarm enabled, source=adopted ms=0) still produced the original symptom — the agent did not deliver the scripted opening when the caller spoke first.

This PR is diagnostics-only, no behaviour change. It adds [DIAG-VAD] INFO-level instrumentation around the four lockout checkpoints in both Python and TypeScript SDKs so the next live test produces a deterministic root-cause trace.

Investigation so far

| Hypothesis | Verified | Status |
| --- | --- | --- |
| H3: stale npm-link / dist cache (live test used pre-fix bundle) | Symlink + dist mtime + fix string at `chunk-D4XCC4FF.mjs:603` | Ruled out |
| H4: `sendFirstMessage` never reached on adopt path | `instanceof OpenAIRealtimeAdapter` holds after `adoptWebSocket` (it's a method on the same class); stream-handler.ts:2535 enters the branch | Ruled out |
| H1: server processes `response.create` before the `session.update` to `turn_detection: null` propagates to the VAD subsystem | Needs the [DIAG-VAD] echo-ordering trace from a live test | Probable |
| H2: server silently ignores `turn_detection: null` (reported by OpenAI community) | Needs the inbound `session.updated turn_detection=...` echo to be checked | Probable |

Community evidence for H2: a forum thread reports turn_detection: null in session.update is silently ignored — defaults persist. A nested form audio.input.turn_detection: null is offered as a workaround. Separately, OpenAI documents turn_detection.interrupt_response: false + turn_detection.create_response: false as a less aggressive alternative to disabling VAD outright; this would keep VAD detection on but prevent it from cancelling our in-flight response.create.
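
For reference, the three candidate payload shapes discussed above can be written side by side. This is a sketch only: the inner field names are copied from this PR's text and the forum thread and have not been checked against the OpenAI Realtime API reference, and the helper names (`lockout_flat`, `lockout_nested`, `lockout_soft`) are hypothetical and do not exist in either SDK.

```python
import json


def lockout_flat() -> dict:
    """Shape shipped in PR #95: top-level turn_detection set to null."""
    return {"type": "session.update", "session": {"turn_detection": None}}


def lockout_nested() -> dict:
    """Community workaround (unverified): null nested under audio.input."""
    return {
        "type": "session.update",
        "session": {"audio": {"input": {"turn_detection": None}}},
    }


def lockout_soft() -> dict:
    """Keep server VAD on, but ask it not to interrupt or auto-respond."""
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "interrupt_response": False,
                "create_response": False,
            }
        },
    }


if __name__ == "__main__":
    for build in (lockout_flat, lockout_nested, lockout_soft):
        print(build.__name__, json.dumps(build()))
```

Only one of these would be sent per lockout; which one the server actually honours is exactly what the [DIAG-VAD] echo is meant to reveal.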

Instrumentation

Six [DIAG-VAD] log lines added (parity across Py / TS):

1. `[DIAG-VAD] sent session.update turn_detection=null (lockout arm)`: at the send site.
2. `[DIAG-VAD] sent response.create (firstMessage)`: at the next send.
3. `[DIAG-VAD] session.created|session.updated turn_detection=...`: echoes the server's acknowledged turn_detection.
4. `[DIAG-VAD] speech_started fired DURING lockout — server-VAD still active!`: fires only if firstMessageProtectionPending is true at the time.
5. `[DIAG-VAD] response.cancelled|response.canceled (firstMessageProtectionPending=...)`: server cancellation observed; pending=true means the firstMessage was silently dropped.
6. `[DIAG-VAD] response.done received during lockout — restoring turn_detection` + `sent session.update turn_detection=<saved> (restore)`: success path.

The smoking gun for H1: line 3's echo arrives AFTER line 2 (in wall-clock order). The smoking gun for H2: line 3 never shows turn_detection=null even after line 1 was sent, OR line 4 fires while lockout is pending.
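
The two smoking-gun rules can be applied mechanically once the [DIAG-VAD] lines have been pulled out of the server log in chronological order. The sketch below is illustrative only: `Event` and `classify` are hypothetical names, the checkpoint numbers follow the list above, and parsing the raw log text into events is left out.

```python
from typing import List, NamedTuple, Optional, Set


class Event(NamedTuple):
    """One [DIAG-VAD] log line, in the order it appeared in the server log."""
    checkpoint: int                     # 1..6, numbering from the list above
    td_is_null: Optional[bool] = None   # only set for checkpoint 3 (server echo)


def _first_after(trace: List[Event], start: int, checkpoint: int) -> Optional[int]:
    """Index of the first event with this checkpoint after `start`, if any."""
    for i in range(start + 1, len(trace)):
        if trace[i].checkpoint == checkpoint:
            return i
    return None


def classify(trace: List[Event]) -> Set[str]:
    checkpoints = [e.checkpoint for e in trace]
    if 1 not in checkpoints:
        return {"incomplete"}
    arm = checkpoints.index(1)
    create = _first_after(trace, arm, 2)
    if create is None:
        return {"incomplete"}
    restore = _first_after(trace, arm, 6)
    end = restore if restore is not None else len(trace)

    findings: Set[str] = set()
    # Echoes seen after the lockout was armed that acknowledge null.
    null_echoes = [i for i in range(arm + 1, len(trace))
                   if trace[i].checkpoint == 3 and trace[i].td_is_null]
    if not null_echoes:
        findings.add("H2")   # server never acknowledged turn_detection=null
    elif null_echoes[0] > create:
        findings.add("H1")   # ack arrived only after response.create was sent
    # speech_started while the lockout was still pending also points at H2.
    if any(e.checkpoint == 4 for e in trace[arm + 1:end]):
        findings.add("H2")
    return findings or {"inconclusive"}
```

Returning a set rather than a single label keeps the case where both signals show up in the same trace visible instead of hiding one behind the other.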

Implementation

- Pure logging additions. Zero behavioural change. Zero new dependencies.
- Mirrors byte-for-byte across `libraries/python/getpatter/providers/openai_realtime.py` and `libraries/typescript/src/providers/openai-realtime.ts`.
- TypeScript build re-emits the dist with the new INFO lines (verified: `grep "DIAG-VAD" dist/index.js` returns 8 hits).
- Existing unit tests confirm the logs fire correctly in the lockout sequence (`openai-realtime.test.ts`: 30 passed; `test_providers_io_unit.py`: 98 passed).

Breaking change?

No. INFO-level logs only.

Next-test playbook for whoever picks this up

1. `cd libraries/typescript && npm run build` (already done on this branch).
2. Pick up the acceptance package (the symlink at `releases/0.6.1/typescript/node_modules/getpatter` already points at the SDK source).
3. Place an outbound call where the callee answers and immediately says "Hello?". This is the canonical repro.
4. Grep the server log for [DIAG-VAD] and look at chronological order:
   - If line 3 arrives BEFORE line 2 → H1 (race) is ruled out.
   - If line 3 echoes turn_detection=null → H2 (silent ignore) is ruled out.
   - If line 4 fires while line 6 has not yet → server-VAD is active despite the lockout (H2 confirmed).
   - If line 5 fires with pending=true → firstMessage was silently dropped (root cause confirmed).
5. Based on the trace, the next PR is one of:
   - H1 fix: await the `session.updated` ack before sending `response.create`. Adds ~1 server round trip of latency to firstMessage, which is acceptable on the adopt path because we already saved ~250-450 ms by not paying the cold-connect handshake.
   - H2 fix (option A): try `turn_detection: {type: 'server_vad', interrupt_response: false, create_response: false}` instead of null (per OpenAI VAD guide).
   - H2 fix (option B): try the nested payload `audio: {input: {turn_detection: null}}` (per community workaround).
   - Belt-and-suspenders: client-side audio gate on `OpenAIRealtimeAdapter.sendAudio` that drops inbound audio frames for ~400 ms after `sendFirstMessage` is invoked. Cheap, doesn't depend on server cooperation, but adds local complexity.
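
The H1 fix listed above (wait for the ack, then request the response) might look roughly like the following. It is a sketch under assumptions, not the adapter's code: `send` and `recv` are hypothetical stand-ins for the adapter's WebSocket I/O, the timeout value is arbitrary, and sending `response.create` anyway after a timeout is a choice made here so the greeting is never withheld.

```python
import asyncio
import json
from typing import Awaitable, Callable

Send = Callable[[str], Awaitable[None]]
Recv = Callable[[], Awaitable[str]]


async def send_first_message_after_ack(
    send: Send, recv: Recv, timeout_s: float = 1.0
) -> bool:
    """Arm the lockout, wait for the server echo, then request the response.

    Returns True if a session.updated echo confirmed turn_detection=null
    before response.create went out, False if the wait timed out.
    """
    await send(json.dumps(
        {"type": "session.update", "session": {"turn_detection": None}}
    ))

    async def wait_for_ack() -> None:
        # A real adapter would observe this in its message loop; the sketch
        # drains events inline and ignores everything except the ack.
        while True:
            event = json.loads(await recv())
            session = event.get("session") or {}
            if (event.get("type") == "session.updated"
                    and "turn_detection" in session
                    and session["turn_detection"] is None):
                return

    acked = True
    try:
        await asyncio.wait_for(wait_for_ack(), timeout_s)
    except asyncio.TimeoutError:
        acked = False  # assumption: still send the greeting, just unprotected

    await send(json.dumps({"type": "response.create"}))
    return acked
```

The returned flag gives the caller something to log, so a timed-out ack is distinguishable from a confirmed one in the next live trace.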

The [DIAG-VAD] lines MUST be removed once the root cause is pinned down — they're not safe to ship indefinitely.

Test plan

- Python: `cd libraries/python && python3 -m pytest tests/unit/test_providers_io_unit.py -q` (98 passing)
- TypeScript: `cd libraries/typescript && npx vitest run tests/unit/openai-realtime.test.ts` (30 passing) + `npm run lint` + `npm run build`
- [DIAG-VAD] strings present in `dist/index.js` (8 hits) and `chunk-XQ5ROISS.mjs` (8 hits)
- Live outbound Realtime test with prewarm=default, which produces the trace required to pick the root-cause fix

Docs updates

N/A — the only doc surface touched is the repo-root CHANGELOG.md ## Unreleased section.

…6.1 release

Ports the observability work from the now-closed PR #82 onto the post-refactor `libraries/python/` layout. PR #82 was authored against the legacy `sdk-py/` paths and was consolidated into the 0.6.0 release branch; this commit lands the actual implementation against the new layout for 0.6.1.

What it adds:

- `getpatter.observability.attributes`: three new helpers, `record_patter_attrs(attrs)`, the `patter_call_scope(call_id, side)` context manager, and `attach_span_exporter(patter, exporter, side)`. Lazy-OTel-guarded; no-op when the `[tracing]` extra is not installed. Two ContextVars (`patter.call_id`, `patter.side`) propagate through the asyncio task tree so spans emitted by deeply nested provider code inherit the active call's identity automatically.
- `Patter._attach_span_exporter(exporter, *, side="uut")`: public-but-underscore hook for tools that observe Patter from outside (e.g. an out-of-process agent runner).
- Per-provider cost emission across 19 surfaces: `patter.cost.{telephony_minutes, stt_seconds, tts_chars, llm_input_tokens, llm_output_tokens, realtime_minutes}` stamped on the active span. Provider tag emitted alongside as `patter.{telephony,stt,tts,llm,realtime}.provider`. All call sites wrapped in defensive try/except so observability cannot kill a live call.
- Per-turn latency: `patter.latency.{ttfb_ms, turn_ms}` stamped from `StreamHandler._emit_turn_metrics` via a new `PipelineHookExecutor.record_turn_latency(*, ttfb_ms, turn_ms)`.
- Bridge-level `patter_call_scope` entry on Twilio + Telnyx: entire WebSocket bridge lifetime (incl. hangup/cleanup) bound to the call identity via `contextlib.ExitStack`.
- `TwilioAdapter.record_call_end_cost` / `TelnyxAdapter.record_call_end_cost`: adapter helpers used by the bridge to emit `patter.cost.telephony_minutes` once wall-clock duration is known.

Versions bumped 0.6.0 → 0.6.1 in `__init__.py`, `pyproject.toml`, `package.json`. CHANGELOG entry added under a new `## 0.6.1 (2026-05-09)` block; the existing `## 0.6.0 (2026-05-08)` block is preserved verbatim, since it reflects exactly what was published to PyPI and npm at that tag.

⚠️ TS parity gap: Python only. TypeScript follow-up tracked separately. This is a known time-boxed exception per `.claude/rules/sdk-parity.md`.

5 new unit tests in `libraries/python/tests/unit/test_observability_attributes_unit.py` exercise the helper module's public surface (`patter_call_scope`, `record_patter_attrs` no-op, `attach_span_exporter` side stamping). Full Python suite: 1719 passed, 7 skipped (green).

Refs: closed PR #82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
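
The ContextVar propagation described in this commit can be illustrated with the standard library alone. The sketch below is not the `getpatter.observability.attributes` implementation: `call_scope_sketch` and `current_attrs` are hypothetical names, and no OpenTelemetry calls are made. It only shows why code running in a child task sees the call identity set by its parent.

```python
import asyncio
import contextlib
import contextvars

# ContextVar names mirror the ones mentioned in the commit message.
_call_id = contextvars.ContextVar("patter.call_id", default=None)
_side = contextvars.ContextVar("patter.side", default=None)


@contextlib.contextmanager
def call_scope_sketch(call_id: str, side: str = "uut"):
    """Bind call identity for the duration of the with-block."""
    id_token = _call_id.set(call_id)
    side_token = _side.set(side)
    try:
        yield
    finally:
        _side.reset(side_token)
        _call_id.reset(id_token)


def current_attrs() -> dict:
    return {"patter.call_id": _call_id.get(), "patter.side": _side.get()}


async def nested_provider_work() -> dict:
    await asyncio.sleep(0)   # yield to the loop, as real provider code would
    return current_attrs()


async def handle_call(call_id: str) -> dict:
    with call_scope_sketch(call_id):
        # A task copies the current context when it is created, so the
        # child sees this call's identity without it being passed explicitly.
        return await asyncio.create_task(nested_provider_work())


async def main():
    results = await asyncio.gather(handle_call("CA1"), handle_call("CA2"))
    return results, current_attrs()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each `handle_call` runs in its own task, the two calls do not see each other's identity, and nothing leaks into the outer context once the scopes exit.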

…in gate from first audio

Two bugs caught during 0.6.0 acceptance against `releases/0.6.0/typescript/matrix/outbound-cartesia-cerebras-elevenlabs.ts`:

1. **Dashboard hydrate schema mismatch**: `CallLogger.log_call_end` writes `cost`/`latency`/`duration_ms`/`telephony_provider` as top-level keys of `metadata.json`, but `MetricsStore.hydrate` looked for them under `meta.metrics.cost`/`meta.metrics.latency`. Every hydrated row landed with `metrics=null`, so cost/latency rendered as `$0.00`/`—` for all on-disk calls (only the in-flight call had real numbers). Fix synthesizes a `metrics` dict from the top-level fields when `meta.metrics` is absent while preserving any explicit `meta.metrics` payload untouched.
2. **Early barge-in self-cancellation**: cloud TTS first-byte latency is 200–700 ms; the 250 ms anti-flicker gate (no-AEC PSTN default) was anchored on `_speaking_started_at`/`speakingStartedAt` and expired BEFORE TTS produced audio. VAD then picked up background noise and self-cancelled the agent's first turn (0 bytes emitted, line silent). Fix anchors the gate on a new `_first_audio_sent_at`/`firstAudioSentAt` set AFTER `bridge.sendAudio` / `audio_sender.send_audio` succeeds at the four pipeline emit sites (firstMessage, streaming, regular, WebSocket remote). `_can_barge_in`/`canBargeIn` returns false while the marker is null. Gate values (250 ms / 1000 ms) unchanged; only the anchor moves.

Tests:

- Py 1717/1717, TS 1394/1394 green; lint clean.
- New regressions: `test_hydrate_lifts_top_level_cost_and_latency_into_metrics`, `test_hydrate_preserves_explicit_metrics_when_present`, `test_barge_in_suppressed_before_first_audio_emitted` (Py) + parity TS cases in `tests/dashboard-store.test.ts` and `tests/unit/stream-handler.test.ts`.
- Existing `_handle_barge_in`/`handleBargeIn` tests updated to set both timestamps for the new contract.
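
A minimal sketch of the re-anchored gate from the second fix, assuming millisecond timestamps supplied by the caller. `BargeInGateSketch` and its method names are hypothetical and only model the rule stated above: no barge-in until the first audio has actually been sent, then the usual 250 ms window.

```python
from typing import Optional


class BargeInGateSketch:
    """Anti-flicker gate anchored on first audio sent, not on turn start."""

    def __init__(self, gate_ms: float = 250.0) -> None:
        self.gate_ms = gate_ms
        self.first_audio_sent_at: Optional[float] = None

    def begin_speaking(self) -> None:
        # New agent turn: nothing has reached the wire yet.
        self.first_audio_sent_at = None

    def mark_audio_sent(self, now_ms: float) -> None:
        # Called after a successful send; only the first chunk sets the anchor.
        if self.first_audio_sent_at is None:
            self.first_audio_sent_at = now_ms

    def can_barge_in(self, now_ms: float) -> bool:
        if self.first_audio_sent_at is None:
            return False  # TTS has not produced audio, so nothing to interrupt
        return now_ms - self.first_audio_sent_at >= self.gate_ms
```

With a 700 ms TTS first byte, the old anchor would have let the gate expire 450 ms before any audio existed; here the clock starts only when the first chunk is sent.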

Cloud TTS first-byte latency (200-700 ms) plus PSTN background noise mean the legacy "any VAD speech_start cancels the agent" contract produced frequent false-positive cancels: a cough, click, HVAC, breath, or a quick "okay" cut the agent mid-sentence and lost the conversational thread.

This PR adds an opt-in two-stage confirmation pipeline. With the new empty-tuple default, behaviour is unchanged. Configure ``Agent.barge_in_strategies`` / ``agent.bargeInStrategies`` to enable:

1. VAD speech_start during TTS marks the barge-in PENDING. TTS keeps streaming naturally and the LLM stream stays alive.
2. Each STT transcript is evaluated by every configured strategy (short-circuit OR; per-strategy errors are isolated).
3. First strategy that returns True confirms the cancel: runs the existing send_clear + flush ring + LLM abort sequence.
4. If no strategy confirms within ``barge_in_confirm_ms`` (default 1500 ms) the pending state is dropped and the agent finishes its sentence.

New module ``getpatter.services.barge_in_strategies`` exposes:

- ``BargeInStrategy`` Protocol (async ``evaluate`` + optional ``reset``)
- ``MinWordsStrategy``: filters short backchannels by requiring N words while the agent is speaking and letting any single word through while the agent is silent (so the first user turn is never delayed).
- ``evaluate_strategies`` / ``reset_strategies`` helpers.

TS parity in ``src/services/barge-in-strategies.ts`` with the same public surface (``MinWordsStrategy``, ``BargeInStrategy`` interface, ``evaluateStrategies``/``resetStrategies``).

Wiring lives in stream_handler.py ``_handle_barge_in`` and stream-handler.ts ``handleBargeIn``. Both keep the existing canBargeIn gate (firstAudioSentAt anchor) and only add the strategy check when at least one strategy is configured.

Tests:

- Py: 1741/1741 green; new ``test_barge_in_strategies.py`` (14) + ``test_barge_in_two_stage.py`` (10).
- TS: 1419/1419 green; new ``barge-in-strategies.test.ts`` (15) + ``barge-in-two-stage.test.ts`` (10). Lint clean.
- Existing barge-in regression suites still pass byte-for-byte: empty strategies preserve legacy behaviour exactly.

CHANGELOG ``## Unreleased`` updated with full design + file list.
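
The strategy evaluation in steps 2 and 3 might be sketched as follows. This does not reproduce the real ``BargeInStrategy`` protocol: the ``evaluate(transcript, agent_speaking)`` signature, the class name ``MinWordsSketch``, and the default of 3 words are all assumptions made for illustration.

```python
import asyncio
from typing import Iterable


class MinWordsSketch:
    """Require N words while the agent speaks, any single word otherwise."""

    def __init__(self, min_words: int = 3) -> None:  # 3 is a placeholder value
        self.min_words = min_words

    async def evaluate(self, transcript: str, agent_speaking: bool) -> bool:
        needed = self.min_words if agent_speaking else 1
        return len(transcript.split()) >= needed


async def evaluate_any(strategies: Iterable, transcript: str,
                       agent_speaking: bool) -> bool:
    """Short-circuit OR over strategies; one failing strategy cannot veto."""
    for strategy in strategies:
        try:
            if await strategy.evaluate(transcript, agent_speaking):
                return True
        except Exception:
            continue  # per-strategy errors are isolated
    return False


class BrokenStrategy:
    async def evaluate(self, transcript: str, agent_speaking: bool) -> bool:
        raise RuntimeError("boom")


if __name__ == "__main__":
    strategies = [BrokenStrategy(), MinWordsSketch(3)]
    print(asyncio.run(evaluate_any(strategies, "okay", True)))
    print(asyncio.run(evaluate_any(strategies, "wait stop please", True)))
```

A quick "okay" during agent speech stays pending and eventually expires, while a three-word interruption confirms the cancel even though an earlier strategy in the list raised.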

…n strategies

Bundle three changes from branch fix/dashboard-hydrate-schema-and-bargein-grace into the 0.6.1 release:

1. Dashboard MetricsStore.hydrate now lifts top-level cost/latency from CallLogger metadata.json into the synthesized metrics dict, so hydrated calls in the dashboard show real $/p95 instead of $0.00 / "—".
2. Barge-in gate anchored on firstAudioSentAt (not beginSpeaking) so ElevenLabs/Cartesia first-byte latency no longer lets background noise cancel the agent before any audio reaches the wire.
3. New opt-in barge-in confirmation pipeline with MinWordsStrategy reference implementation. Empty-tuple default preserves legacy cancel-on-VAD behaviour.

# Conflicts:
# CHANGELOG.md

…ixes

Three user-visible features plus a hardening sweep from a 5-agent code review covering security, billing safety, race conditions, and resource leaks.

## Features

### Dashboard cost panel: STT and TTS as separate rows

The cost breakdown previously combined STT and TTS into one "STT / TTS" line, hiding which side dominated cost. Now rendered as two adjacent rows labelled with the actual provider name (e.g. "Cartesia STT" / "ElevenLabs TTS"), driven by ``record.metrics.stt_provider`` / ``tts_provider`` already exposed by the backend.

Files: ``dashboard-app/src/components/CostPanel.tsx``, ``dashboard-app/src/lib/mappers.ts``.

### stt_ms is now finalization-only (BREAKING semantic change)

Previously ``LatencyBreakdown.stt_ms`` measured ``stt_complete - turn_start``, which conflated user speech duration with STT processing. A 5 s utterance produced ``stt_ms ≈ 5000`` even when Cartesia/Deepgram finalized in 200 ms after end-of-speech. Industry benchmarks (Picovoice/Deepgram/Gladia/Speechmatics) all report STT latency as the finalization window: ``final_transcript - end_of_speech``. ``stt_ms`` now matches that definition. New optional field ``user_speech_duration_ms`` carries the displaced "how long did the user speak" number.

Files: ``libraries/python/getpatter/models.py``, ``libraries/python/getpatter/services/metrics.py``, ``libraries/typescript/src/metrics.ts``.

### Pre-warm services + pre-synth firstMessage

``Agent.prewarm: bool = True`` (default on) warms STT/TTS/LLM provider connections in parallel with carrier ``initiate_call`` so DNS, TLS, HTTP/2 / WebSocket handshakes are complete by the time the callee answers. Concrete ``warmup()`` overrides shipped on Deepgram / Cartesia / AssemblyAI STT, ElevenLabs WS / Cartesia / Inworld TTS, OpenAI Realtime.
``Agent.prewarm_first_message: bool = False`` (opt-in) pre-renders ``first_message`` to TTS bytes during ringing and streams the cached buffer instantly when the carrier emits ``start``. This eliminates 200-700 ms of TTS first-byte latency on the greeting at the cost of paying TTS even when the call isn't answered (logged at WARN level when wasted).

## Review fixes (12 issues from 5-agent multi-perspective review)

### Provider warmup correctness

- 🔴 OpenAI Realtime warmup uses ``session.update`` (not the non-spec ``response.create`` with ``generate:false`` which could silently bill tokens or return ``invalid_request_error``). Files: ``providers/openai_realtime.py``, ``providers/openai-realtime.ts``.
- 🟡 ElevenLabs WS warmup BOS frame now mirrors the live ``synthesize`` BOS byte-for-byte (``voice_settings`` + ``generation_config``). Shared helper ``_build_bos_frame`` / ``buildBosFrame``. Verified billing-safe via no ``flush:true``, no real text. Files: ``providers/elevenlabs_ws_tts.py``, ``providers/elevenlabs-ws-tts.ts``.
- 🟡 Inworld TTS warmup uses ``GET /tts/v1/voices`` instead of ``HEAD`` against POST-only stream endpoint (was returning 405 in audit logs).
- 🟡 Cartesia STT + AssemblyAI STT warmup error logs no longer leak the API key: the code catches ``WSServerHandshakeError`` specifically and logs only the HTTP status code, never ``str(exc)`` (which embeds the URL).

### StreamHandler / barge-in correctness

- 🟠 Double ``record_overlap_start`` on strategy-confirmed barge-in fixed: VAD start path now stamps T1, the strategy-confirm path no longer overwrites with T2, so ``detection_delay_ms`` is now correct for every user opting into ``barge_in_strategies``. Files: ``stream_handler.py:_do_cancel_for_barge_in``, ``stream-handler.ts:runBargeInCancel``.
- 🟠 Pending barge-in task leak fixed: ``cleanup`` (Py) / ``handleStop`` + ``handleWsClose`` (TS) now call ``_clear_pending_barge_in`` so a call ending mid-pending no longer leaves an asyncio.Task / setTimeout firing on a finalized handler.
- 🟢 Pre-warm bytes now chunked (1280 B / 40 ms) before ``audio_sender.send_audio`` so barge-in mid-greeting can flush cleanly via the existing mark/clear bookkeeping.

### Patter client + cache hardening

- 🟠 Cache eviction on abnormal hangup: the Twilio status callback (``no-answer`` / ``busy`` / ``failed`` / ``canceled``) and the Telnyx ``call.hangup`` / AMD-machine paths now call ``_record_prewarm_waste`` so memory doesn't leak proportional to no-answer rate.
- 🟠 Race start-vs-prewarm fixed: a ``_prewarm_consumed`` set tracks consumed call_ids so a late-arriving prewarm task drops its bytes instead of orphaning them in the cache.
- 🟡 ``disconnect()`` now cancels in-flight prewarm tasks and clears the cache (no spend leak across serve/disconnect cycles).
- 🟡 ``prewarm_first_message=True`` on Realtime / ConvAI mode now logs a WARN and skips the spawn (was silently paying TTS for bytes the StreamHandler never consumed).
- 🟡 Prewarm cache bounded at 200 entries with TTL-based eviction (``ring_timeout + 5 s``), which caps memory under outbound flood scenarios.

### Documentation

- Docstring for ``Agent.barge_in_strategies`` corrected: TTS continues streaming naturally during pending state (was misleadingly described as "paused").

## Tests

47 new regression tests across 4 new files plus updates to existing suites.
Verifies every fix above with authentic mocks at the network boundary only:

- ``libraries/python/tests/test_prewarm.py`` (new: 28 tests covering default flag values, no-op default ``warmup``, all-three-providers warmup invocation, opt-out, exception swallow, cache populate / skip / empty-message / timeout, one-shot pop, waste-warn log, StreamHandler cache-hit short-circuit + cache-miss live-TTS fallback, race orphan, disconnect cleanup, cap+TTL eviction, provider-mode validation, chunking).
- ``libraries/python/tests/unit/test_provider_warmup.py`` (new: 18 tests covering all 7 concrete ``warmup()`` overrides + billing-safety regressions + key-leak regressions).
- ``libraries/typescript/tests/unit/prewarm.test.ts`` (new: 23 TS twins).
- ``libraries/typescript/tests/unit/provider-warmup.mocked.test.ts`` (new: 19 TS twins).
- Updates to ``test_barge_in_two_stage.py`` (3 ``record_overlap_start`` tests + 2 cleanup tests), ``barge-in-two-stage.test.ts`` (4 TS twins), ``server-routes.test.ts`` (2 status-callback eviction tests).

## Verification

- Python: 1797 passed, 7 skipped, 0 failed (was 1707 + 14 prewarm + 76 inherited from new subclass collection-tests)
- TypeScript: 1467 passed across 83 files (was 1430 + 37 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (ESM + CJS + dts + CLI): clean

## CHANGELOG

All entries under ``## 0.6.1 (2026-05-09)`` with file paths, line numbers, rationale, and test paths.
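
To make the ``stt_ms`` redefinition in this commit concrete, here is the arithmetic on the 5 s utterance example. The function name and the three timestamp parameters are hypothetical, and deriving ``user_speech_duration_ms`` as ``end_of_speech - turn_start`` is an assumption based on the field's description rather than on the code.

```python
def stt_latency_sketch(turn_start_ms: float, end_of_speech_ms: float,
                       final_transcript_ms: float) -> dict:
    """Compare the old and new stt_ms definitions for one user turn."""
    return {
        # New definition: finalization window only.
        "stt_ms": final_transcript_ms - end_of_speech_ms,
        # Assumed derivation of the new optional field.
        "user_speech_duration_ms": end_of_speech_ms - turn_start_ms,
        # Old definition, kept here only for comparison.
        "legacy_stt_ms": final_transcript_ms - turn_start_ms,
    }


if __name__ == "__main__":
    # User speaks for 5 s; the provider finalizes 200 ms after end of speech.
    print(stt_latency_sketch(0.0, 5000.0, 5200.0))
```

Under the old definition this turn reports roughly five seconds of "STT latency"; under the new one it reports the 200 ms the provider actually took, with the speech duration moved to its own field.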

…atency

Live PSTN smoke tests against ``outbound-cartesia-cerebras-elevenlabs.ts`` exposed several issues in 0.6.1 that were not caught by the unit suite. This commit ships seven fixes plus three quick wins on top of the prewarm pipeline.

## Architectural: WebSocket handoff for prewarm (replaces open-then-close)

The 0.6.1 prewarm pipeline as previously shipped (commit ``c585f6d``) opened a streaming-STT and streaming-TTS WebSocket during the carrier ringing window, idled ~250 ms, and closed it. Investigation showed the strategy is structurally insufficient on Node: the ``ws`` package does not thread a TLS session ticket across separate ``new WebSocket(...)`` constructions, so every fresh ``connect()`` at call pickup pays full TCP+TLS+HTTP-101 upgrade. Net saved time was 50–250 ms (DNS cache only) versus 700–1500 ms of cold-start budget. Live test reported "several seconds" first-turn latency, p95 3048 ms.

The new strategy keeps the warmed WS open and hands it off to the ``StreamHandler`` at call pickup. New API surface:

- ``Patter._prewarmedConnections: Map<callId, ParkedProviderConnections>`` (TS) / ``self._prewarmed_connections: dict[str, ParkedProviderConnections]`` (Py): keyed by carrier-issued ``call_id``, populated during ringing, drained on call end or after a 30 s safety TTL.
- ``provider.openParkedConnection()`` / ``open_parked_connection()``: added to ``CartesiaSTT``, ``ElevenLabsWebSocketTTS``, ``OpenAIRealtimeAdapter``. Opens the WS, sends the same initial config the live ``connect()`` sends (STT: empty config; TTS: BOS frame matching ``synthesize`` BOS byte-for-byte; Realtime: ``session.update``), and returns a handle the caller parks.
- ``provider.adoptWebSocket(handle)`` / ``adopt_websocket(handle)``: added to the same three providers. Accepts a pre-opened WS, validates ``readyState === OPEN``, and proceeds with the live message loop. For ElevenLabs WS TTS the handle carries a ``bosAlreadySent: true`` flag so the first ``synthesizeStream`` iteration does not double-send BOS (which would be a protocol error).
- ``StreamHandler`` checks ``client.popPrewarmedConnections(callId)`` before falling back to fresh ``connect()``. On adopt, the path skips TCP+TLS+upgrade and the BOS round-trip: STT connects in 0 ms, TTS in 0 ms.

Cleanup wiring: the same status callback paths that already drain the prewarm-audio cache (FIX #91) now also close any parked WS for failed calls (no-answer / busy / failed / canceled / AMD-machine). The 30 s TTL covers the rare carrier path that emits neither ``start`` nor a status callback.

Live validation against ``outbound-cartesia-cerebras-elevenlabs.ts``: ``[PREWARM] callId=… provider=stt ms=769`` followed by ``[CONNECT] callId=… provider=stt source=adopted ms=0``. STT connect went from 150–400 ms to 0 ms. First-turn greeting wire-time dropped from "several seconds" to **990 ms**.

Files: ``libraries/typescript/src/client.ts`` (cache + ``parkProviderConnections``, ``popPrewarmedConnections``, ``closePrewarmedConnections``, ``ParkedProviderConnections`` interface, ``closeParkedConnections`` helper); ``libraries/typescript/src/server.ts`` (forwards ``popPrewarmedConnections`` into ``StreamHandlerDeps``); ``libraries/typescript/src/stream-handler.ts`` (adopt-or-connect logic); ``libraries/typescript/src/providers/{cartesia-stt,elevenlabs-ws-tts,openai-realtime}.ts`` (park + adopt API surface). Python parity in ``libraries/python/getpatter/{client,server,stream_handler,telephony/twilio,telephony/telnyx}.py`` and ``libraries/python/getpatter/providers/{cartesia_stt,elevenlabs_ws_tts,openai_realtime}.py``.

Realtime mode has the API surface but the ``OpenAIRealtimeStreamHandler`` adoption is deferred to a follow-up, since pipeline mode dominates the affected use case.
## Quick wins (parallel to WS handoff, smaller individual savings)

- **Eager AEC import on ``Patter.serve()``** (gated on ``agent.echo_cancellation=true``). Was previously a lazy ``await import('./audio/aec')`` on first ``start`` event, paying 150–400 ms JIT on the first call. Files: ``libraries/typescript/src/client.ts``, ``libraries/python/getpatter/client.py``.
- **Parallel ``stt.connect()`` and TTS-firstMessage kickoff**. Previously the StreamHandler awaited STT before TTS firstMessage, but STT does not need to be ready to send firstMessage out, only to receive caller audio. Now both kick off concurrently. Saves 200–400 ms on the first turn. Files: ``libraries/typescript/src/stream-handler.ts``, ``libraries/python/getpatter/stream_handler.py``.
- **Timing instrumentation**: new ``[PREWARM]`` and ``[CONNECT]`` INFO logs in the prewarm spawn and provider connect paths, with elapsed-ms per provider. Lets us A/B-test future prewarm changes with numerical evidence rather than perceptual reports.

## Dashboard fixes (third pass: issues found during the round-2 PSTN test)

### Live transcript shows only one turn at a time (BUG #102)

``MetricsStore.recordTurn`` correctly accumulated turns into ``active.turns[]`` but the frontend ``toUiTranscript`` mapper had two paths: a primary keyed on ``record.transcript.length > 0`` (used for completed calls) and a fallback that derived rows from ``record.turns``. For an in-flight call the primary always returned empty (active records never carried ``transcript[]``) and only the fallback rendered, so the two paths diverged. Each ``recordTurn`` now mirrors the round-trip into a flat ``active.transcript`` array (one user entry + one assistant entry per turn, filtering empty ``user_text`` and the ``[interrupted]`` agent sentinel), so the primary path sees the same accumulating ``user → assistant → user → assistant → …`` history live calls and completed calls both expose.
Files: ``libraries/typescript/src/dashboard/store.ts``, ``libraries/typescript/tests/dashboard-store.test.ts`` (5 new authentic tests).

### Transcript disappears after call end (BUG #101)

The Twilio status callback for ``CallStatus=completed`` fires a beat before the WS ``stop`` frame, so ``MetricsStore.updateCallStatus`` moved the active record into the completed buffer **without preserving ``turns[]`` or ``transcript[]``**. The subsequent ``recordCallEnd`` overwrote that completed entry, but in the gap any ``useTranscript`` fetch returned a record with no transcript and the live pane went blank. Three-point fix:

- (a) ``updateCallStatus`` terminal branch now copies ``active.turns`` and ``active.transcript`` into the new completed entry;
- (b) ``recordCallEnd`` falls back to active/existing transcript when ``data.transcript`` is empty;
- (c) the ``useTranscript`` hook subscribes to ``call_end`` SSE events (independent of ``isLive``) so the pane refetches the moment ``recordCallEnd`` lands the SDK-authoritative ``history.entries``.

Files: ``libraries/typescript/src/dashboard/store.ts``, ``dashboard-app/src/hooks/useTranscript.ts``.

### Sparkline tooltip generic / wrong metric (BUG #104)

The metric-tile sparkline tooltip rendered ``"N call(s)"`` plus a per-call sample list regardless of which card it was attached to, so the latency and spend cards showed the same headline as the calls card. New ``MetricKind`` prop (``'count' | 'latency' | 'spend'``) threaded through ``Metric`` → ``SparkBar`` → ``SparkTooltip``, with a pure ``bucketHeadline(bucket, kind)`` helper that computes per-card aggregates: ``TOTAL COST $X.XXX`` (sum of per-call cost), ``AVG LATENCY <p95-mean> ms`` (mean of per-call P95), or ``N CALL(S)``. Headline label uppercased, monospace, styled to match the existing time-range header on the same tooltip.

Files: ``dashboard-app/src/App.tsx``, ``dashboard-app/src/components/Metric.tsx``, ``dashboard-app/src/styles/dashboard.css``.
### caller / callee never persisted to metadata.json (BUG B from the second pass)

Every persisted ``metadata.json`` showed ``"caller": ""``, ``"callee": ""`` for completed calls; only the in-memory ``MetricsStore`` had the right values. The persist layer received empty strings because the ``CallLogger.log_call_end`` data shape was built from agent options rather than the live record. ``server.ts`` ``wrappedStart`` now resolves ``caller``/``callee`` from the active store record before persisting; Python ``record_call_start`` parity fix stops clobbering caller/callee with empty strings on the upgrade-from-initiated path (TS already had the right pattern).

### Call disappears from dashboard after end (BUG C from the second pass)

Race-induced duplicate row: Twilio's status callback for ``CallStatus=completed`` fires ~50–200 ms before the WS ``stop`` frame. ``updateCallStatus`` moved the row out of ``activeCalls`` into ``calls[]`` correctly, then the WS ``stop`` drove ``recordCallEnd``, ``activeCalls.get(callId)`` returned undefined, and a duplicate entry was pushed with ``started_at = 0`` and empty caller/callee. The duplicate masked the well-formed earlier row and the 24h window filter excluded it. ``recordCallEnd`` / ``record_call_end`` now searches ``calls[]`` for the existing entry when active is gone and **updates in place**, preserving caller/callee/started_at and merging in the just-collected metrics.
## Tests

25 new regression tests across 6 files (TS + Py parity):

- ``libraries/python/tests/test_prewarm_handoff.py`` (new: 6 tests)
- ``libraries/typescript/tests/unit/prewarm-handoff.test.ts`` (new: 6 tests)
- ``libraries/python/tests/unit/test_dashboard_store_unit.py`` (+4 dedup + active-accessor tests)
- ``libraries/python/tests/unit/test_server_unit.py`` (+1 caller/callee persist test)
- ``libraries/typescript/tests/dashboard-store.test.ts`` (+7 dedup + transcript accumulate + accessor tests)
- ``libraries/typescript/tests/server.test.ts`` (+1 caller/callee persist test using real ``CallLogger``)

## Verification

- Python: ``pytest -q`` → 1808 passed, 7 skipped (was 1797 + 11 new)
- TypeScript: ``npm test`` → 1481 passed (was 1467 + 14 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (esm + cjs + dts + cli): clean
- Dashboard SPA build (``cd dashboard-app && npm run build``): clean (204.93 kB / 63.47 kB gz)
- Dashboard sync: both ``libraries/{python,typescript}/.../dashboard/ui.html`` refreshed
- Live PSTN smoke test (``outbound-cartesia-cerebras-elevenlabs.ts``): WS handoff log fired, first-turn greeting 990 ms, transcript live and post-end render OK, sparkline tooltip per-card OK
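
The adopt-or-connect flow from the WebSocket handoff in this commit can be sketched with a small in-memory cache. None of the names below come from the SDK: `ParkedConnectionsSketch` and `adopt_or_connect` are hypothetical, handles are plain strings, and an expired entry is simply discarded where a real implementation would also close the socket.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ParkedConnectionsSketch:
    """call_id -> (parked_at, handle), with a safety TTL checked on pop."""

    def __init__(self, ttl_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._parked: Dict[str, Tuple[float, Any]] = {}

    def park(self, call_id: str, handle: Any) -> None:
        self._parked[call_id] = (self._clock(), handle)

    def pop(self, call_id: str) -> Optional[Any]:
        entry = self._parked.pop(call_id, None)  # one-shot: removed on read
        if entry is None:
            return None
        parked_at, handle = entry
        if self._clock() - parked_at > self._ttl_s:
            return None  # expired; a real implementation would close it here
        return handle


def adopt_or_connect(cache: ParkedConnectionsSketch, call_id: str,
                     connect: Callable[[], Any]) -> Tuple[Any, str]:
    """Prefer a parked connection; fall back to a fresh connect()."""
    handle = cache.pop(call_id)
    if handle is not None:
        return handle, "adopted"
    return connect(), "fresh"
```

The injected `clock` exists only so the TTL path can be exercised without sleeping; the `"adopted"` / `"fresh"` labels echo the `source=adopted` field in the `[CONNECT]` log line.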

…ffold

Headline changes since cbe1886:

* Rolled back the 400 ms STT-final → LLM dispatch debounce introduced earlier in 0.6.1 (`_scheduleTurnCommit` / `_runDeferredTurnCommit` in TS, `_schedule_turn_commit` / `_delayed_turn_commit` in Python). The partial-transcript reschedule branch was overwriting the dispatched FINAL text with the latest partial, causing entire user turns to be dropped during slow-LLM windows. Verified on real PSTN (round 10k with gpt-5-nano dropped 3 of 5 user turns). Dispatch is now synchronous on `is_final` again. The original double-talk symptom is re-opened with a better fix path documented internally.
* Kept beneficial 0.6.1 work: `beginSpeaking` stamps `firstAudioSentAt = Date.now()` on every turn so the `canBargeIn()` anti-flicker gate runs in parallel with LLM TTFT + TTS TTFB; VAD `speech_start` calls `anchorUserSpeechStart()` and skips on phantom-during-warmup-gate; commit-drop path re-anchors; WARN log when pipeline has no `llm` / `onMessage` handler; char/4 fallback billing for providers that don't emit a usage chunk; `OpenAILLMProvider.providerKey` static; firstMessage TTS char billing; persist full latency breakdown per percentile in metadata.json; dashboard hydrate reads `transcript.jsonl`; ElevenLabs default flipped to WS.
* Lowered dashboard percentile threshold 5 → 2 turns so the detail pane no longer shows `—` for p50/p95 on typical 4-7 turn PSTN calls while the list column already shows a real number via avg fallback.
* Added Krisp VIVA noise-suppression scaffold for the TypeScript SDK at `libraries/typescript/src/providers/krisp-filter.ts` for cross-SDK parity with the existing Python `KrispVivaFilter`. Throws at construction time because Krisp does not publish an official Node SDK as of 2026-05; users supply SDK + `.kef` model + license. New top-level exports: `KrispVivaFilter`, `KrispVivaFilterOptions`, `KrispSampleRate`, `KrispFrameDuration`, `DeepFilterNetFilter`, `DeepFilterNetOptions`.
* CHANGELOG 0.6.1 section revised to reflect the rollback narrative honestly (debounce attempted, rolled back before release) and to document the new entries. * Scrubbed competitor-name references from source files (Pipecat, LiveKit) per project rule `.claude/rules/no-competitor-references.md`; replaced with "industry-standard pattern" wording. Source files affected: `stream-handler.ts`, `stream_handler.py`, `metrics.ts`, `services/metrics.py`, `silero_vad.py`. * Krisp Python wrapper unchanged. Tests: TS lint clean, vitest 1486/1486 pass; Python pytest unit 1252 pass, 5 skip. Validated on real PSTN: post-rollback p95 wait 1844 ms over 4 clean sequential turns (no drops) on cellular hotspot — vs catastrophic 8521 ms with 3 dropped turns pre-rollback.

Pre-commit end-of-file-fixer was failing on this single file. Trim extra blank line so file ends with exactly one '\n'.

The Python ``CallMetricsAccumulator._emit_eou_metrics`` had ``end_of_utterance_delay`` and ``transcription_delay`` swapped relative to the TypeScript ``emitEouMetrics`` AND emitted them in seconds while TS emits milliseconds. Dashboards or exporters reading the same metric across both SDKs saw a 1000x disagreement on top of swapped field semantics.

Locked convention (now identical in both SDKs):

- end_of_utterance_delay = stt_final - vad_stopped (ms)
- transcription_delay = turn_commit - vad_stopped (ms)
- on_user_turn_completed_delay (ms, unchanged)

Python now clamps negative deltas to 0 (TS already did). The Python ``EOUMetrics`` docstring updated from "seconds" to "milliseconds".

Tests pin both behaviours:

- libraries/python/tests/test_metrics.py::TestEOUMetricsEmission
- libraries/typescript/tests/unit/metrics.test.ts :: CallMetricsAccumulator > emitEouMetrics field semantics

Refs: 0.6.1 observability parity audit.
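The locked convention above can be sketched as a small pure function. This is an illustrative sketch, not the SDK's code: `compute_eou_delays`, `EouDelays`, and the assumption that the raw timestamps arrive in seconds are all hypothetical; only the two field definitions, the millisecond unit, and the clamp to 0 come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EouDelays:
    """End-of-utterance delays, both expressed in milliseconds."""

    end_of_utterance_delay: float
    transcription_delay: float


def compute_eou_delays(
    vad_stopped_s: float, stt_final_s: float, turn_commit_s: float
) -> EouDelays:
    """Apply the locked convention to three timestamps (assumed to be seconds)."""

    def delta_ms(later_s: float, earlier_s: float) -> float:
        # Convert to milliseconds and clamp negative deltas to 0.
        return max(0.0, (later_s - earlier_s) * 1000.0)

    return EouDelays(
        # end_of_utterance_delay = stt_final - vad_stopped
        end_of_utterance_delay=delta_ms(stt_final_s, vad_stopped_s),
        # transcription_delay = turn_commit - vad_stopped
        transcription_delay=delta_ms(turn_commit_s, vad_stopped_s),
    )
```

Keeping the unit conversion and the clamp in one helper makes it harder for the two fields to drift apart again, which is the failure the parity audit found.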

The Python SDK exposed three OTel-related helpers since 0.6.1: ``record_patter_attrs``, ``patter_call_scope``, ``attach_span_exporter`` (in ``getpatter.observability.attributes``). The TypeScript SDK had no equivalent surface: every provider adapter that called the Python helpers had no place to call across the parity boundary, violating ``.claude/rules/sdk-parity.md``.

Port the helpers to TypeScript as no-ops by default. When ``PATTER_OTEL_ENABLED`` is unset or ``@opentelemetry/api`` is not installed, each helper returns immediately, keeping the zero-cost disabled path that the rest of the observability module already respects.

Semantic mapping:

- recordPatterAttrs(attrs) <-> record_patter_attrs
- patterCallScope({ callId, side }, fn) <-> patter_call_scope
- attachSpanExporter(patterInstance, exporter) <-> attach_span_exporter

The JS form of patterCallScope takes an async callback because JS lacks ``with``-style context managers; the closure is the scope body. The module uses a module-level stack instead of a ContextVar, which is sufficient for the SDK's one-call-per-handler model.

Tests:

- libraries/typescript/tests/unit/observability-attributes.test.ts (7 smoke cases covering the public surface + scope unwind on throw)

…nt loop

``ElevenLabsWebSocketTTS.adopt_websocket`` closed any previously parked WS handle via ``asyncio.create_task(prev.ws.close())`` and silently swallowed the resulting ``RuntimeError`` whenever the method ran outside an event loop. The FD on our side leaked until process exit. Real scenario: cleanup hooks fired from ``__del__``, atexit handlers, or signal-driven teardown.

Fix:

- Keep the async fast path when a loop is running.
- Fall back to a best-effort synchronous ``transport.close()`` when no loop is available. ``transport.close`` is non-blocking and safe off-loop; it skips the WS close handshake but cleans up the socket.
- Log a warning on the fallback path so the FD-leak symptom shifts from "silent" to "logged".

The TypeScript counterpart ``adoptWebSocket`` is unaffected: ``ws.close()`` from the ``ws`` package is synchronous, so the same scenario doesn't reach an analogous error branch.

Tests:

- libraries/python/tests/unit/test_elevenlabs_ws_tts.py::TestAdoptWebSocketCleanup (3 cases: with running loop, without loop, idempotent same-handle).
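The loop-aware close described above might look roughly like the following. This is a minimal sketch under stated assumptions: `close_parked_ws`, `FakeWebSocket`, and `FakeTransport` are hypothetical stand-ins, and the real handle's attribute layout may differ. Only `asyncio.get_running_loop()`, which raises `RuntimeError` when no loop is running, is standard-library behaviour.

```python
import asyncio
import logging

logger = logging.getLogger("patter.sketch.tts")


class FakeTransport:
    """Hypothetical stand-in for the socket transport under the WS handle."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class FakeWebSocket:
    """Hypothetical parked handle: async close() performs the WS close handshake."""

    def __init__(self) -> None:
        self.transport = FakeTransport()
        self.handshake_closed = False

    async def close(self) -> None:
        self.handshake_closed = True
        self.transport.close()


def close_parked_ws(ws):
    """Close a previously parked handle whether or not an event loop is running.

    Returns the scheduled task on the async path, or None on the sync fallback.
    """
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        # No running loop (e.g. __del__, atexit, signal teardown): skip the
        # close handshake, release the socket, and leave a trace in the logs.
        logger.warning("no running event loop; closing parked WS transport synchronously")
        ws.transport.close()
        return None
    # Fast path: schedule the graceful close on the running loop.
    return loop.create_task(ws.close())
```

Returning the task lets the caller keep a reference to it, so the scheduled close is not garbage-collected before it runs.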

Add regression coverage that ``_stream_prewarm_bytes`` / ``streamPrewarmBytes`` open the barge-in gate (``_first_audio_sent_at`` / ``firstAudioSentAt``) once the first chunk reaches the wire. The current code already does this — the gate is opened both by ``_begin_speaking(is_first_message=True)`` ahead of streaming AND by ``_mark_first_audio_sent`` per-iteration inside the prewarm loop — but a future refactor of the begin-speaking path could silently regress the prewarm-specific case. The per-chunk mark call inside the streaming loop is the last line of defence and now has explicit coverage on both SDKs. Test names match across SDKs for grep-friendly parity: - Python: tests/test_prewarm.py ::test_stream_prewarm_bytes_opens_barge_in_gate_on_first_chunk - TypeScript: tests/unit/prewarm.test.ts > "opens the barge-in gate by stamping firstAudioSentAt after the first chunk"

The four fix/feat entries that landed in ## Unreleased during the 0.6.1 review pass (EOU semantics + unit, OTel TS no-op stubs, ElevenLabs adopt_websocket cleanup, prewarm barge-in regression tests) belong under the 0.6.1 release block since version literals stay at 0.6.1 (no separate 0.6.2 bump). Date bumped to 2026-05-12 to reflect the actual release-prep date.

…se of #90) (#91)

* chore(cerebras): debug log when usage chunk missing + fallback fires

  When an upstream LLM stream (Cerebras and similar) does not emit a `usage` chunk despite `stream_options={include_usage:true}`, the char/4 fallback billing path previously emitted WARN on every tool-loop iteration. Multi-tool turns logged 5-10 identical WARN lines for the same call, drowning real warnings.

  Replace with one-shot INFO at first fallback per LLMLoop instance (provider, model, char counts, est_tokens), then DEBUG for every subsequent iteration with the running `_usage_missing_count` / `_usageMissingCount` total. No billing behaviour change: char/4 estimation still drives `record_llm_usage` / `recordLlmUsage`. Symmetric Python (`logger.info`/`logger.debug`) and TypeScript (`getLogger().info`/`.debug`).

* docs(krisp): refresh unavailable message with current SDK status

  KrispVivaFilter constructor in the TypeScript SDK still throws: no official Krisp Node.js server SDK exists as of 2026-05. Verified via `npm search krisp`:

  - `@livekit/krisp-noise-filter` (0.4.3, 2026-04): browser WASM track processor on the local microphone; cannot run server-side.
  - `@livekit/react-native-krisp-noise-filter` (0.0.3): mobile native.
  - `@krisp.ai/kr-local-monitoring`: Krisp's only first-party npm package; "Local Monitoring API", not noise cancellation.

  Refreshed the thrown message to (a) stamp the verification date, (b) explicitly distinguish "server Node SDK" from the existing browser/RN wrappers, (c) list the LiveKit packages with the reason they don't apply to Patter (server-received PCM/mulaw stream). Python KrispVivaFilter and TS DeepFilterNetFilter remain the only shipped paths. No code behaviour change.

* fix(krisp): remove competitor package names from error message

  Per .claude/rules/no-competitor-references.md the TS Krisp filter error message cannot cite competitor package names. Refactored the "Browser/React Native" block to describe the category generically (third-party wrappers, client-side scope) without naming specific packages. Same cleanup applied to the matching CHANGELOG entry. No behavioural change.
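The one-shot INFO followed by DEBUG pattern from the cerebras entry above can be illustrated as follows. This is a sketch only: the class name `UsageFallbackLogger`, its method names, and the integer-division rounding of char/4 are assumptions. The source only states that the first fallback per loop instance logs at INFO and later ones at DEBUG with a running count.

```python
import logging

logger = logging.getLogger("patter.sketch.llm")


class UsageFallbackLogger:
    """Hypothetical helper: one instance per LLM loop, as in the commit message."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model
        self.usage_missing_count = 0

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # char/4 estimate; rounding down is an assumption of this sketch.
        return len(text) // 4

    def record_missing_usage(self, prompt: str, completion: str) -> int:
        """Return the estimated token total and log at the appropriate level."""
        est_tokens = self.estimate_tokens(prompt) + self.estimate_tokens(completion)
        self.usage_missing_count += 1
        if self.usage_missing_count == 1:
            # First fallback for this instance: visible once at INFO.
            logger.info(
                "usage chunk missing; char/4 fallback provider=%s model=%s "
                "prompt_chars=%d completion_chars=%d est_tokens=%d",
                self.provider, self.model, len(prompt), len(completion), est_tokens,
            )
        else:
            # Subsequent iterations only carry the running total at DEBUG.
            logger.debug(
                "usage chunk still missing (count=%d) est_tokens=%d",
                self.usage_missing_count, est_tokens,
            )
        return est_tokens
```

The estimate is returned regardless of log level, which mirrors the stated intent that billing behaviour does not change.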

…s (re-base of #89) (#92) * fix(dashboard): preserve existing calls when new call arrives in SSE stream `mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts` rebuilt the calls array from the server snapshot via `next.map(...)`, so any call present in the previous UI state but missing from the next payload was silently dropped. With back-to-back calls, the SSE `call_start` refresh occasionally landed before the prior call propagated to `/api/dashboard/calls` and the row vanished from the SPA — regression reported as #124. The merge is now a true upsert: rows present in `prev` but absent from `next` are appended, so prior calls stay visible until the server snapshot stabilises. Server-side eviction (ring buffer of 500) bounds long-running sessions. Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts` and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added Vitest to the SPA so the helpers can be tested in isolation without a React harness). Refs #124. * fix(barge-in): firstMessage interruptible via per-chunk mark gating The firstMessage TTS chunks were pushed into the carrier WebSocket as fast as the provider yielded them. Twilio's outbound buffer ended up several seconds deep, and a barge-in's sendClear was queued behind the already-enqueued media frames — the agent kept talking on the user's earpiece for up to ~2 s after the user spoke (#128). The firstMessage send path is now a paced loop: * Twilio: every chunk is followed by a unique mark; the loop waits for the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``) resolves every pending mark waiter so the loop exits on the next tick and ``sendClear`` lands on a near-empty carrier buffer. 
* Telnyx (no mark concept): the loop falls back to a playout-duration- based sleep so the buffer can't out-run a clear by more than one chunk. Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py ``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` / ``_stream_prewarm_bytes`` delegate to the new helper. The existing prewarm chunking test was updated to echo marks via the mock bridge so it interoperates with the new pacing. Coverage: * libraries/typescript/tests/unit/stream-handler.test.ts — ``firstMessage mark-gated pacing`` (3 cases: window cap + barge-in, mark echo slides window, Telnyx playout pacing). * libraries/python/tests/unit/test_first_message_pacing.py — 4 cases including FIFO mark resolution. Refs #128. * fix(barge-in): drain pending marks on call cleanup/stop/ws-close The firstMessage paced sender accumulates one mark waiter (asyncio.Future on Python / Promise on TS) per chunk in _pending_marks / pendingMarks while audio is streaming to the carrier. The barge-in cancel path already drained these, but a call that ended without going through cancel — carrier WebSocket drop, hangup mid firstMessage, stop event arriving before the paced sender finished — left every queued future unresolved. The send loop was awaiting them, so the orphan futures leaked until the handler itself was garbage-collected. Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks before tearing down adapters; the TS handleStop and handleWsClose do the equivalent via drainPendingMarks(). Idempotent and safe when the queue is already empty. 
Added regression coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestCleanupDrainsPendingMarks) - libraries/typescript/tests/unit/stream-handler.test.ts (cleanup drains pending firstMessage marks — handleStop + handleWsClose) * fix(barge-in): reset firstMessage mark counter per send + on cleanup PipelineStreamHandler._first_message_mark_counter (Py) and StreamHandler.firstMessageMarkCounter (TS) were never reset between turns or calls. With handler re-use, the counter incremented monotonically across turns — a paced send for the second turn issued fm_<previous_count + 1> while the carrier could still be echoing a stale fm_<N> from the previous turn, corrupting FIFO matching in on_mark / onMark. Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes (Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a fresh fm_1, fm_2, ... sequence. Also reset on cleanup (PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a belt-and-braces against the cross-call boundary. Coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset) - libraries/typescript/tests/unit/stream-handler.test.ts (firstMessage mark counter resets across sends + on cleanup) * fix(dashboard): cap merged UI calls at 500 + sort by startedAt desc mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved prev_only calls indefinitely by appending them after the fresh snapshot block. Two consequences on a long-lived session: 1. The UI array grew unbounded — once the session cycled through more than 500 calls (the server-side MetricsStore ring buffer default), rows the server had already evicted stayed pinned by prev and were re-appended on every refresh. 2. Ordering was non-deterministic — prev_only rows always landed at the bottom regardless of their startedAtMs, so a newer call could end up below an older one if the snapshot ordering shifted. 
Fix: after the upsert pass, sort the merged list by startedAtMs descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the server ring buffer. Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a 600-prev+1-fresh cap test and an explicit startedAtMs ordering test. * fix(realtime): only update lastConfirmedMark on matched mark (parity with Python) StreamHandler.onMark in libraries/typescript/src/stream-handler.ts unconditionally assigned this.lastConfirmedMark = markName before checking whether the name corresponded to a queued mark. Any echo arriving after the queue was drained, or any mark name emitted by adapters outside the firstMessage queue, would overwrite the handler- level field and contaminate downstream barge-in heuristics gated on lastConfirmedMark. Python stream_handler.py's on_mark never touches a handler-level field at all — the equivalent state lives on TwilioAudioSender.last_confirmed_mark and is updated only by the carrier's own echo handler. The TS path now matches that behaviour defensively: lastConfirmedMark is updated only after the queue lookup confirms a matching entry, mirroring the safer Python semantics. Coverage: libraries/typescript/tests/unit/stream-handler.test.ts (onMark only updates lastConfirmedMark on a matched mark) asserts that an unmatched echo cannot clobber a previously-set value.
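The dashboard merge described in the entries above (upsert, then sort by `startedAtMs` descending, then cap at 500) can be sketched in Python for illustration. The real helper lives in the TypeScript SPA; the function name `merge_calls_preserving`, the `id` key, and the rule that a fresh row fully replaces the previous one are assumptions of this sketch.

```python
MAX_UI_CALLS = 500  # mirrors the server-side ring buffer size named in the source


def merge_calls_preserving(prev, fresh, max_calls=MAX_UI_CALLS):
    """Upsert `fresh` into `prev`, newest first, bounded to `max_calls` rows.

    Each call is a dict with at least an "id" (assumed key) and "startedAtMs".
    """
    merged = {call["id"]: call for call in prev}
    # Upsert: rows in the fresh snapshot replace or extend the previous state,
    # while rows missing from the snapshot are kept instead of dropped.
    merged.update({call["id"]: call for call in fresh})
    # Deterministic ordering: newest call first.
    ordered = sorted(merged.values(), key=lambda c: c["startedAtMs"], reverse=True)
    # Bound the list so rows the server has already evicted do not pile up.
    return ordered[:max_calls]
```

Sorting before slicing matters: slicing first could discard a newer call that happened to sit at the end of the merged list.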

… duck-type adopt (re-base of #88) (#93) * feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework The `warmup()` method on `OpenAIRealtimeAdapter` (Python + TS) was defined but unreachable from `Patter.call()` — the prewarm framework only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time and therefore not stored on the Agent. `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now constructs a transient `OpenAIRealtimeAdapter` from the resolved Agent + the configured `openai_key` when `agent.provider == "openai_realtime"` and runs `warmup()` in parallel with the carrier `initiate_call`. The transient adapter is configured identically to the production one (model, voice, instructions, language, audio format = g711_ulaw for both Twilio and Telnyx, plus optional reasoning_effort / input_audio_transcription_model knobs from the engine marker) so the upstream `session.update` primes the same session state that the live call will use. Saves 150-400 ms of TLS + WebSocket handshake + `session.created` round-trip on the first turn. Best-effort: failures during warmup adapter build or `warmup()` itself are logged at DEBUG and never abort the call. * feat(realtime): persist primed Realtime session across warmup → live call boundary Builds on the previous warmup wiring. The transient warmup adapter closes its WS after a session.update / session.updated round-trip, so the live call still pays a fresh ``new WebSocket`` + handshake. This change parks the primed Realtime WS instead — same pattern the SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS). 
`_park_provider_connections` (Py) / `parkProviderConnections` (TS) now build a transient `OpenAIRealtimeAdapter` when `agent.provider == "openai_realtime"`, call its `open_parked_connection` to keep the `session.updated` WS OPEN, and stash it under the `openai_realtime` slot key alongside the existing `stt` / `tts` parked handles. `OpenAIRealtimeStreamHandler` (Py) accepts a new `pop_prewarmed_connections` callback (wired through the Twilio and Telnyx telephony adapters). `StreamHandler.start()` consults the parked slot before calling `connect()` and calls `adapter.adopt_websocket(...)` when a live WS is available — saving ~250-450 ms of cold-handshake on the first turn. TS mirrors the same flow in `StreamHandler.initRealtimeAdapter` for both Twilio and Telnyx bridges. All failure modes (missing OpenAI key, dead parked WS, park-task exception, adoption error) fall through transparently to the cold `connect()` path. Existing 36-test TS handoff/prewarm suite and 45-test Python suite all green after change. * fix(realtime): include agent tools + built-ins in primed warmup session The prewarm path built the transient OpenAIRealtimeAdapter without a ``tools=`` argument, so the ``session.update`` sent during ringing carried an empty tool list. When ``StreamHandler.start()`` adopted that parked WebSocket it skipped a fresh ``session.update``, leaving the upstream session permanently unaware that the two Patter built-ins (``transfer_call`` / ``end_call``) existed — they silently no-op'd on every hit-prewarm call (~80% of outbound calls when prewarm is enabled). 
Extracted the canonical tool-list construction (user tools + ``transfer_call`` + ``end_call``) into a shared helper — ``build_realtime_tools()`` in Python and ``buildRealtimeTools()`` in TypeScript — and call it from both the live ``buildAIAdapter`` / ``StreamHandler.start()`` path and the warmup-side ``_build_realtime_warmup_adapter`` / ``buildRealtimeWarmupAdapter`` path so the two ``session.update`` bodies match byte-for-byte. Tests: 4 new regression tests (2 Py + 2 TS) verifying that the warmup adapter carries user-defined tools plus both built-ins, and that the built-ins are still injected when the agent declares no user tools. * fix(realtime): eliminate double-handshake on outbound prewarm (park does warmup work) Both ``_spawn_provider_warmup`` and ``_park_provider_connections`` built a transient ``OpenAIRealtimeAdapter`` and opened its own WebSocket against ``api.openai.com`` during the ringing window — two handshakes per outbound call where one suffices. The warmup-only handshake is a strict subset of what park performs (open WS → ``session.created`` → ``session.update`` → ``session.updated``) and park keeps the socket open for adoption. The warmup-side WS was opened, primed, and immediately discarded — pure waste of 150-400 ms of ringing-window budget, plus doubled rate-limit pressure against OpenAI for no benefit. Fix: ``_spawn_provider_warmup`` no longer builds the Realtime adapter at all; park is now the sole Realtime warm path on outbound calls. Pipeline-mode STT / TTS / LLM ``warmup()`` calls are unchanged. Tests: 2 new regression tests verify (1) ``_spawn_provider_warmup`` does not construct a Realtime adapter, and (2) end-to-end warmup+park together construct exactly one adapter (the one park uses). Updated 3 existing tests that asserted the old double-build behaviour. 
* fix(realtime): recreate adapter on adopt failure to avoid stale state When ``adopt_websocket`` / ``adoptWebSocket`` raised mid-adoption, the partially-adopted ``OpenAIRealtimeAdapter`` was left in an inconsistent state: ``_running`` / ``messageListenerAttached`` was already true, the heartbeat task may have started, ``_current_response_item_id`` / ``currentResponseItemId`` may have carried leaked state from the parked session, and the ``_ws`` / ``ws`` reference pointed at a now-closed socket. Falling through to ``connect()`` on that carcass raced ``session.created`` against stale state, ran two heartbeat timers, and sometimes attached a second message listener to the new socket — silent corruption of every adopt-failed call. Fix: when adopt raises, re-instantiate the adapter (via the existing ``adapter_kwargs`` in Python, ``deps.buildAIAdapter`` in TS) before the cold ``connect()`` path runs, guaranteeing a clean slate. Tests: regression test in each SDK constructs an adapter whose ``adopt_websocket`` throws, then asserts (a) a second adapter instance was created, (b) ``connect()`` ran on the fresh adapter, (c) the handler's adapter reference points at the fresh instance. * refactor(stream-handler): duck-type adoptWebSocket capability (drop instanceof) The TS realtime adopt branch in ``stream-handler.ts:initRealtimeAdapter`` previously gated the prewarm-handoff path with two ``this.adapter instanceof OpenAIRealtimeAdapter`` checks. Switched both to a single duck-type check (``typeof adoptWebSocket === 'function'``) so: 1. The generic ``stream-handler`` module stays provider-agnostic on this hot path. Pipeline-only users still get the symbol resolved at module load (the import is used elsewhere in this file for legitimate provider-specific behaviour), but the adopt-handoff gate no longer demands a concrete class identity. 2. 
The check mirrors the Python handler's ``getattr(self._adapter, "adopt_websocket", None)`` shape — both SDKs now use capability-based detection rather than identity. 3. Future Realtime-like adapters (e.g. a different vendor's all-in-one provider that also exposes ``adoptWebSocket``) can opt into the adopt flow simply by implementing the method, no SDK change needed. No behaviour change: the same WS-adopt path runs for the same adapter class. Existing adopt-handoff tests cover the behaviour and continue to pass.
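The capability-based adopt gate and the recreate-on-failure rule from the entries above can be combined in one short sketch. It is deliberately synchronous and simplified: the real methods are likely coroutines, and `start_realtime_adapter` and `build_adapter` are hypothetical names. Only the `getattr(adapter, "adopt_websocket", None)` shape and the "fresh instance before cold `connect()`" rule come from the source.

```python
def start_realtime_adapter(build_adapter, parked_ws):
    """Adopt a parked WebSocket when the adapter supports it, else connect cold.

    `build_adapter` is a zero-argument factory (hypothetical) that returns a new
    adapter. Returns (adapter, source) with source "adopted" or "connected".
    """
    adapter = build_adapter()
    # Duck-type the capability instead of checking for a concrete class.
    adopt = getattr(adapter, "adopt_websocket", None)
    if parked_ws is not None and callable(adopt):
        try:
            adopt(parked_ws)
            return adapter, "adopted"
        except Exception:
            # A half-adopted adapter may hold stale state (listeners, timers,
            # a closed socket), so discard it and start from a clean instance.
            adapter = build_adapter()
    adapter.connect()
    return adapter, "connected"
```

Any adapter that exposes `adopt_websocket` opts into the handoff, and every failure mode still ends on the cold `connect()` path with a clean instance.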

With ``agent.prewarm=true`` (default) the OpenAI Realtime WebSocket is parked, primed, and adopted at call pickup with ``source=adopted ms=0``. The audio bridge is live the instant the callee answers, and the caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms before the firstMessage audio starts streaming back. OpenAI's server-VAD treats that early caller audio as a barge-in and silently cancels the in-flight ``response.create``, so the configured ``first_message`` is never delivered. The cold ``connect()`` path masked the bug because the WS handshake naturally buffered ~300 ms of caller silence. Fix: ``send_first_message`` / ``sendFirstMessage`` now arm a one-shot server-VAD lockout. A ``session.update`` with ``turn_detection: null`` (OpenAI-documented: disables server-VAD entirely, no audio-driven response cancellation) is sent immediately before ``response.create``, then the receive loop / message listener restores the original ``turn_detection`` block (snapshotted from the configured ``vad_type`` / ``silence_duration_ms`` / ``threshold`` / ``prefix_padding_ms``) on the firstMessage ``response.done`` so barge-in works normally for every subsequent turn. The lockout is strictly one-shot. ``turn_detection: null`` was chosen over a temporary high ``silence_duration_ms`` because it is fully OpenAI-documented and guarantees zero server-side cancellation (timer-based fallbacks remain sensitive to clock skew on multi-second response.done windows). Complements the client-side ``firstAudioSentAt`` guard from PR #92 which prevents the local audio bridge from clearing the playout buffer on caller speech — this closes the same gap on the *server* side. Coverage: 3 new Python tests + 4 new TypeScript tests in the ``OpenAIRealtimeAdapter`` IO suites, covering lockout sequence, custom ``silence_duration_ms`` / ``vad_type`` restore, one-shot semantics, and no-ws no-op. 
Files: libraries/python/getpatter/providers/openai_realtime.py, libraries/typescript/src/providers/openai-realtime.ts, libraries/python/tests/unit/test_providers_io_unit.py, libraries/typescript/tests/unit/openai-realtime.test.ts, CHANGELOG.md.
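The one-shot lockout sequence described above (disable server-VAD, send `response.create`, restore on `response.done`) can be sketched as a small state machine. This is an illustration under assumptions: the class `FirstMessageLockout` and its `send` callback are hypothetical, the `response.create` payload is reduced to its type, and the sketch restores on the first `response.done` seen while pending rather than matching a specific response id.

```python
import json


class FirstMessageLockout:
    """Hypothetical sketch of the one-shot server-VAD lockout."""

    def __init__(self, send, turn_detection: dict) -> None:
        self._send = send  # callable taking one JSON string (e.g. ws.send)
        self._saved = dict(turn_detection)  # snapshot to restore later
        self._pending = False

    def send_first_message(self) -> None:
        # 1. Arm: disable server-VAD so caller audio cannot cancel the response.
        self._send(json.dumps(
            {"type": "session.update", "session": {"turn_detection": None}}
        ))
        self._pending = True
        # 2. Ask the server to speak the scripted opening.
        self._send(json.dumps({"type": "response.create"}))

    def on_server_event(self, raw: str) -> None:
        event = json.loads(raw)
        if event.get("type") == "response.done" and self._pending:
            # 3. Restore exactly once so barge-in works on later turns.
            self._pending = False
            self._send(json.dumps(
                {"type": "session.update", "session": {"turn_detection": self._saved}}
            ))
```

Clearing the pending flag before sending the restore keeps the lockout strictly one-shot even if several `response.done` events arrive back to back.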

PR #95 (cc9e51b) shipped the `session.update {turn_detection: null}` lockout for the OpenAI Realtime firstMessage on the prewarm-adopted path. A live outbound test after the rebuild still showed the original symptom (the user speaks first, the agent does not greet). This branch adds temporary `[DIAG-VAD]` INFO-level instrumentation so the next live test produces a deterministic root-cause trace. No behavioural changes.

Log points (both SDKs, byte-for-byte parity):

1. `[DIAG-VAD] sent session.update turn_detection=null (lockout arm)`: right after the lockout `session.update` is enqueued.
2. `[DIAG-VAD] sent response.create (firstMessage)`: right after the response.create is enqueued. Pairing 1 and 2 vs the server echo in step 3 reveals whether the server applied the update before processing response.create.
3. `[DIAG-VAD] session.created|session.updated turn_detection=...`: every inbound `session.created` / `session.updated` event logs the server-acknowledged `turn_detection` value. This is the smoking gun for the "server silently ignores turn_detection=null" hypothesis reported in OpenAI community threads (see community.openai.com/t/error-turning-turn-detection-off-...).
4. `[DIAG-VAD] speech_started fired DURING lockout`: the canonical "server-VAD still active" smoking gun if observed while firstMessageProtectionPending is true.
5. `[DIAG-VAD] response.cancelled (firstMessageProtectionPending=...)`: server cancelled the response. If pending=true, the firstMessage was silently dropped (the exact symptom we're chasing).
6. `[DIAG-VAD] response.done received during lockout — restoring turn_detection` + `sent session.update turn_detection=<saved> (restore)`: success path. If we never see step 6, the firstMessage turn never completed.

Diagnosis hypotheses being validated:

- H1 (race): server processes response.create before the turn_detection session.update propagates to the VAD subsystem. Confirmed if the [DIAG-VAD] echo for `session.updated turn_detection=null` arrives AFTER `sent response.create`.
- H2 (silent ignore): server-VAD remains active even after `turn_detection: null` because the payload is rejected silently. Confirmed if the [DIAG-VAD] session.updated echo never shows turn_detection=null, OR if speech_started fires during lockout.
- H3 (npm link cache): the acceptance package was running stale code. RULED OUT: `releases/0.6.1/typescript/node_modules/getpatter` is a symlink to `libraries/typescript`, dist was freshly rebuilt May 12 19:23, and the fix string is present at chunk-D4XCC4FF.mjs line 603 (`session: { turn_detection: null }`).
- H4 (reachability): `sendFirstMessage` not called on adopt path. RULED OUT: stream-handler.ts:2535 uses `instanceof OpenAIRealtimeAdapter`; the adopted path uses the same class (`adoptWebSocket` is a method on `OpenAIRealtimeAdapter`).

Next steps:

1. Rebuild SDK: `cd libraries/typescript && npm run build` (already done; dist contains the new logs).
2. Re-run live outbound Realtime test with prewarm=default. Collect the full server log.
3. Search for `[DIAG-VAD]` lines in chronological order. The pattern tells us which hypothesis is correct.
4. Based on H1 / H2, the follow-up fix is either:
   - Wait for `session.updated` ack before sending `response.create` (fixes H1 race).
   - Try `turn_detection: {type: 'server_vad', interrupt_response: false, create_response: false}` instead of null (fixes H2 silent-ignore; documented per OpenAI guide).
   - Or fall back to a brief client-side audio gate (drop inbound audio for ~400 ms after `sendFirstMessage` returns) as belt-and-suspenders.

Files: `libraries/python/getpatter/providers/openai_realtime.py`, `libraries/typescript/src/providers/openai-realtime.ts`, `CHANGELOG.md`.

Coverage: unchanged. Diagnostic logs are INFO-only and don't alter call flow. Existing test_providers_io_unit.py (98 passing) and openai-realtime.test.ts (30 passing) all green.

The diagnostic logs MUST be removed once the root cause is pinned down.
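Step 3 of the next steps (reading the `[DIAG-VAD]` lines in chronological order) could be partly automated with a small classifier like the one below. This is a hedged sketch, not part of the PR: `classify_trace` is a hypothetical helper, and it assumes a null acknowledgement is rendered literally as `session.updated turn_detection=null` in the log, which the source does not confirm. The H1 and H2 criteria are the ones stated in the hypotheses list above.

```python
ARM = "sent session.update turn_detection=null"
CREATE = "sent response.create"
NULL_ACK = "session.updated turn_detection=null"  # assumed rendering of the echo
SPEECH = "speech_started fired DURING lockout"


def _first_index(lines, needle):
    """Index of the first line containing `needle`, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def classify_trace(log_lines):
    """Return the hypotheses ("H1", "H2") that a chronological trace supports."""
    diag = [line for line in log_lines if "[DIAG-VAD]" in line]
    arm = _first_index(diag, ARM)
    create = _first_index(diag, CREATE)
    if arm is None or create is None:
        raise ValueError("trace lacks the lockout arm or response.create line")
    after_arm = diag[arm + 1:]
    ack = _first_index(after_arm, NULL_ACK)
    supported = []
    # H1: the null acknowledgement arrives only after response.create was sent.
    if ack is not None and arm + 1 + ack > create:
        supported.append("H1")
    # H2: the server never echoes null, or VAD still fires during the lockout.
    if ack is None or _first_index(after_arm, SPEECH) is not None:
        supported.append("H2")
    return supported
```

An empty result means the trace matches neither criterion; in that case the lines for log points 5 and 6 are the next place to look.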

nicolotognoni and others added 18 commits May 9, 2026 00:10

chore: fix trailing newline in cartesia-stt.ts

10a4bfa


nicolotognoni wants to merge 18 commits into main from diag/0.6.1-bug1-followup
