Babelive (.NET)

Avalonia 12 desktop app that captures Windows audio output (any app's playback), streams it to OpenAI's realtime translation model (gpt-realtime-translate), renders the translated audio + dual-language transcript live in a desktop-lyric overlay, and optionally records the whole session to WAV + SRT for later replay. Windows-only today; the audio layer is being abstracted for a macOS port.

Stack

  • .NET 9 + Avalonia 12 (Fluent theme, Inter font fallback)
  • NAudio: WasapiLoopbackCapture / Process Loopback for input, WasapiOut / WaveOutEvent for playback, WdlResamplingSampleProvider for high-quality 48 kHz → 24 kHz resampling, WaveFileWriter for recording
  • MessageBox.Avalonia — dialog replacement (Avalonia has no built-in MessageBox)
  • System.Net.WebSockets.ClientWebSocket (built-in) for the realtime API
  • System.Text.Json (built-in) for protocol serialization

How it works

[Any Windows app] ──► WasapiLoopbackCapture ──► downmix ──► WDL resample to 24 kHz ──► PCM16
                                                                                          │
                                                                                          ▼
                                                              ClientWebSocket → OpenAI Realtime
                                                                                          │
                          ┌───────────────────────────────────────────────────────────────┴───┐
                          ▼                                                                   ▼
              translated audio (PCM16)                                       dual transcript deltas
                          │                                                                   │
                          ▼                                                                   ▼
                    WasapiOut device                                Avalonia TextBox + LyricWindow overlay
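
The capture side of the diagram above (downmix, resample to 24 kHz, convert to PCM16) can be sketched in a few lines. This is an illustrative Python sketch, not the app's C# code: the function names are hypothetical, and the resampler is a naive 2:1 pair-averaging decimator standing in for the WDL resampler, which filters far more carefully.

```python
import struct

def downmix_to_mono(interleaved, channels):
    """Average each frame's channels into a single mono sample."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]

def halve_sample_rate(mono):
    """Naive 48 kHz -> 24 kHz: average adjacent sample pairs.
    Assumption: a stand-in for WDL resampling, not equivalent to it."""
    return [(mono[i] + mono[i + 1]) / 2 for i in range(0, len(mono) - 1, 2)]

def float_to_pcm16(samples):
    """Clamp floats to [-1, 1] and pack as little-endian signed 16-bit."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

# Four stereo frames at 48 kHz become two mono samples at 24 kHz (4 bytes).
stereo = [0.5, 0.5, 0.5, 0.5, -1.0, -1.0, -1.0, -1.0]
pcm = float_to_pcm16(halve_sample_rate(downmix_to_mono(stereo, 2)))
```

The resulting bytes are what would be streamed to the API; how they are framed or encoded on the wire is not covered here.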

Requirements

  • Windows 10 / 11
  • .NET 9 SDK
  • An OpenAI API key with access to gpt-realtime-translate

Setup & run

dotnet restore
dotnet run

On first launch, click the API… button in the settings panel and paste your sk-… key. The key is stored locally at %APPDATA%\Babelive\settings.json (plain JSON, never transmitted anywhere except to the configured API endpoint).

For a self-contained release build:

dotnet publish -c Release

This produces a single Babelive.exe in bin\Release\net9.0-windows\win-x64\publish\ (~80–90 MB; a compressed single-file build that bundles the .NET 9 runtime and the Avalonia/Skia/HarfBuzz native libraries). That one file is all you need to ship.

Using it

  1. Pick a target language.
  2. Pick a Capture source — recommended is All system audio (no echo) which uses Win10 build 20348+ Process Loopback to exclude Babelive's own playback. Per-app entries (Teams, Chrome, Spotify, …) and legacy device loopbacks are also available.
  3. Pick a Playback device for the translated audio. Read the feedback warning below.
  4. Optional checkboxes:
    • Transcript only — silence the translation audio, keep just the on-screen subtitles.
    • Alt endpoint — fall back to the non-translations endpoint if your account doesn't have access to the dedicated one.
    • Echo suppress — pause API input while translation plays (prevents feedback at the cost of occasional model stalls).
    • Mute source — physically mute every speaker except Playback so you only hear the translation. Loopback still captures the source for the API (mute is downstream of the engine tap).
  5. Optional sliders (both support mouse-wheel adjust on hover):
    • Source volume — level source apps are ducked to during translation playback (5–100%, default 10%). At 100% the ducker is fully disabled.
    • Translation volume — PCM-level gain applied to translated audio (0–200%). 100% is unity; the OpenAI TTS is quieter than typical system audio so you'll often want 130–180%. Sliding this does not change Babelive's session volume in Windows Volume Mixer.
  6. Click Start (or the red ▶ on the lyric overlay), then play any video / call / song.
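
The Translation volume slider is described above as a PCM-level gain. A minimal Python sketch of that idea, assuming little-endian PCM16 input and hard clipping at the int16 limits (whether Babelive clips or soft-limits is not stated here; apply_gain_pcm16 is a hypothetical name):

```python
import struct

def apply_gain_pcm16(pcm: bytes, percent: float) -> bytes:
    """Scale little-endian PCM16 samples by percent/100.
    Assumption: out-of-range values are hard-clipped to int16."""
    gain = percent / 100.0
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack(f"<{count}h", *scaled)

quiet = struct.pack("<3h", 1000, -2000, 30000)
louder = apply_gain_pcm16(quiet, 150)  # 150% gain; the last sample clips
```

Because the gain is applied to the samples themselves, it is independent of the per-app session volume in Windows Volume Mixer, which matches the behaviour described for the slider.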

The settings window is hide-on-close: closing it leaves the lyric overlay and tray icon running. To quit fully, choose Exit from the tray menu (right-click the 译 icon).

Lyric overlay

A transparent always-on-top desktop-lyric panel docks bottom-center on first launch. Hover to fade in the toolbar:

Button               Action
▶ Start / ■ Stop     Toggle translation
● Record / ■ Stop    Toggle recording (auto-starts translation if not running)
A− / A+              Decrease / increase translation font size
🔉 / 🔊              Step translation volume by ±10%
(settings button)    Open settings window
(hide button)        Hide overlay (re-open from tray menu)

Drag the panel by any non-button area to move it. Double-click the top empty strip to snap between top / bottom of the screen. Drag the bottom-right grip to resize.

Recording

Press ● Record (lyric overlay or main window) to start saving. Press again to stop. If translation isn't running yet, recording auto-starts it.

Each Record → Stop cycle creates a fresh timestamped folder under %APPDATA%\Babelive\Recordings\{yyyy-MM-dd_HHmmss}\ containing:

source.wav                 ← original captured audio (24 kHz mono PCM16)
source.srt                 ← source-language transcript with timecodes
translation.<lang>.wav     ← model's translated audio (same format)
translation.<lang>.srt     ← target-language transcript with timecodes

The path is shown next to the Record button on the main window — click it to open the folder in Explorer.
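
As a rough illustration of the layout above, the sketch below builds the four file paths for one recording. It is Python rather than the app's C#; recording_paths is a hypothetical helper, and the two-letter language code is an assumption about what <lang> looks like. The .NET pattern yyyy-MM-dd_HHmmss corresponds to %Y-%m-%d_%H%M%S in strftime.

```python
from datetime import datetime
from pathlib import PureWindowsPath

def recording_paths(root: str, started: datetime, lang: str) -> list[PureWindowsPath]:
    """Return the WAV/SRT paths for one Record -> Stop cycle."""
    folder = PureWindowsPath(root) / started.strftime("%Y-%m-%d_%H%M%S")
    names = [
        "source.wav",
        "source.srt",
        f"translation.{lang}.wav",  # assumption: lang is a short code such as "en"
        f"translation.{lang}.srt",
    ]
    return [folder / name for name in names]

paths = recording_paths(r"C:\Recordings", datetime(2025, 1, 2, 3, 4, 5), "en")
```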

SRT files share their base name with the matching WAV, so players that auto-load sidecar subtitles pick them up when the WAV is opened. The transcript splits cues at sentence terminators (., ?, ! and their full-width counterparts 。, ?, !) plus a delta-arrival-gap heuristic (~800 ms) that catches sentence boundaries the model didn't punctuate.
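
The cue-splitting rule can be sketched as follows. This is a simplified Python illustration, not the app's implementation: the function names are hypothetical, the full-width CJK terminators in TERMINATORS are an assumption, and each cue simply ends at the arrival time of its last delta.

```python
TERMINATORS = ".?!。?!"  # assumption: ASCII plus full-width CJK terminators
GAP_MS = 800  # approximate arrival-gap threshold

def srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timecode (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def split_cues(deltas):
    """deltas: list of (arrival_ms, text). Returns (start_ms, end_ms, text) cues."""
    cues, text, start, last = [], "", None, None
    for t, piece in deltas:
        # A long pause before this delta closes the pending cue.
        if text and t - last > GAP_MS:
            cues.append((start, last, text.strip()))
            text, start = "", None
        if start is None:
            start = t
        text += piece
        last = t
        # A sentence terminator closes the cue immediately.
        if text.rstrip().endswith(tuple(TERMINATORS)):
            cues.append((start, t, text.strip()))
            text, start = "", None
    if text.strip():
        cues.append((start, last, text.strip()))
    return cues

def to_srt(cues):
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(a)} --> {srt_time(b)}\n{txt}"
        for i, (a, b, txt) in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Note that splitting on every "." also breaks on decimals and abbreviations; a real implementation would likely need extra guards for those cases.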

Playing back recordings

Windows' built-in audio players don't render external SRT for .wav files — they treat audio as audio, no subtitle track. Two ways around it:

  • VLC — open source.wav, then Audio → Visualizations → Spectrum. The visualizer activates VLC's video output surface, which the subtitle renderer needs. Subtitles appear immediately.
  • mpv: running mpv source.wav shows subtitles without configuration. mpv is the most reliable choice for audio + external SRT on Windows.

Microsoft Teams / Skype audio

Teams and Skype set AUDCLNT_STREAMFLAGS_PREVENT_LOOPBACK_CAPTURE on their call audio for privacy, so Windows' Process Loopback API returns silence for them. Babelive auto-detects this and, if VB-CABLE is installed, redirects the Teams/Skype process tree to CABLE Input via IAudioPolicyConfig per-app routing, then loopback-captures from the cable. No manual Teams/Skype audio config needed.

Without VB-CABLE installed, Teams/Skype audio cannot be captured — this is a Windows DRM-style restriction, not a Babelive bug.

Zoom / Discord / Google Meet / WebEx / Slack use WebRTC and don't set the flag — they work via plain Process Loopback.

⚠️ Feedback loop warning

If translated audio plays through the same speakers you're capturing, the loopback re-translates it forever. Three fixes:

  1. Use headphones for playback (different physical device than the captured speakers).
  2. Install VB-CABLE — free virtual audio cable. Send the source app's output to CABLE Input; Babelive can then loopback-capture the cable while playing translation through your real speakers / headphones.
  3. Tick "Transcript only" — only spoken text appears, nothing replays.

The recommended All system audio (no echo) capture mode also avoids the loop: it uses Process Loopback to exclude Babelive's own playback from the captured stream, so even when the translation plays on the same device, the API never re-hears it.

API quirks / things that may need tuning

The realtime translation API is new. The exact event/field names in Translation/RealtimeTranslatorClient.cs are best-effort based on https://developers.openai.com/api/docs/guides/realtime-translation plus the standard /v1/realtime event conventions. If your account sees errors:

  • Endpoint: defaults to wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate. Tick "Alt endpoint" in the UI to fall back to wss://api.openai.com/v1/realtime?model=gpt-realtime-translate.
  • Session config: RealtimeTranslatorClient.SendSessionUpdateAsync sends session.update with input_audio_format=pcm16, output_audio_format=pcm16, and translation.target_language=<code>. Adjust if the official schema differs.
  • Event names: Dispatch matches both the output_*.delta and response.output_*.delta shapes. If transcripts/audio don't arrive, log every incoming event and adjust.
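
When debugging these three points, it can help to reproduce them outside the app. The Python sketch below builds the two endpoint URLs, a session.update payload, and an event-type matcher that accepts both delta shapes. It is not a copy of RealtimeTranslatorClient: the helper names are hypothetical, and nesting the fields under a "session" object is an assumption that should be checked against the official schema.

```python
import json
from fnmatch import fnmatchcase
from urllib.parse import urlencode

MODEL = "gpt-realtime-translate"

def endpoint(alt: bool = False) -> str:
    """Default translations endpoint, or the "Alt endpoint" fallback."""
    path = "/v1/realtime" if alt else "/v1/realtime/translations"
    return f"wss://api.openai.com{path}?{urlencode({'model': MODEL})}"

def session_update(target_language: str) -> str:
    """Serialize a session.update message.
    Assumption: the fields live under a "session" key; only their names are known."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "translation": {"target_language": target_language},
        },
    })

def is_output_delta(event_type: str) -> bool:
    """Accept both the output_*.delta and response.output_*.delta shapes."""
    return fnmatchcase(event_type.removeprefix("response."), "output_*.delta")
```

Logging every incoming event's type alongside is_output_delta(type) makes it quick to spot event names that fall outside both shapes.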

Quick sanity test

Play a YouTube video in any language other than your target language and hit Start. The translation should start streaming into the lyric overlay (and the settings window's transcript panes) within a second or two of the source audio playing.

About

Real-time simultaneous interpretation of all Windows audio, with subtitles
