Babelive (.NET)

Avalonia 12 desktop app that captures Windows audio output (any app's playback), streams it to OpenAI's realtime translation model (gpt-realtime-translate), plays the translated audio, shows a dual-language transcript live in a desktop-lyric overlay, and optionally records the whole session to WAV + SRT for later replay. Windows-only today; the audio layer is being abstracted for a macOS port.

Stack

.NET 9 + Avalonia 12 (Fluent theme, Inter font fallback)
NAudio — WasapiLoopbackCapture / Process Loopback for input, WasapiOut / WaveOutEvent for playback, WdlResamplingSampleProvider for high-quality 48 kHz → 24 kHz resampling, WaveFileWriter for recording
MessageBox.Avalonia — dialog replacement (Avalonia has no built-in MessageBox)
System.Net.WebSockets.ClientWebSocket (built-in) for the realtime API
System.Text.Json (built-in) for protocol serialization

How it works

[Any Windows app] ──► WasapiLoopbackCapture ──► downmix ──► WDL resample to 24 kHz ──► PCM16
                                                                                          │
                                                                                          ▼
                                                              ClientWebSocket → OpenAI Realtime
                                                                                          │
                          ┌───────────────────────────────────────────────────────────────┴───┐
                          ▼                                                                   ▼
              translated audio (PCM16)                                       dual transcript deltas
                          │                                                                   │
                          ▼                                                                   ▼
                    WasapiOut device                                Avalonia TextBox + LyricWindow overlay
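
The capture side of the diagram (downmix, resample to 24 kHz, convert to PCM16) can be sketched in a few lines. This is a language-neutral illustration in Python, not Babelive's code: it assumes the loopback stream arrives as interleaved float samples at 48 kHz, and it halves the rate by averaging adjacent samples, which is a crude stand-in for the WDL resampler the app actually uses. All function names here are hypothetical.

```python
import struct

def downmix_to_mono(interleaved, channels):
    """Average the channels of each frame into a single mono sample."""
    frames = len(interleaved) // channels
    return [
        sum(interleaved[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]

def halve_sample_rate(mono):
    """48 kHz -> 24 kHz by averaging adjacent pairs.

    Assumption: a simple 2:1 decimator, NOT the WDL resampler.
    A trailing unpaired sample is dropped.
    """
    return [(mono[i] + mono[i + 1]) / 2 for i in range(0, len(mono) - 1, 2)]

def to_pcm16(samples):
    """Clamp floats to [-1, 1] and pack them as little-endian signed 16-bit."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

def prepare_chunk(interleaved, channels=2):
    """Full capture-side chain: downmix, resample, convert."""
    return to_pcm16(halve_sample_rate(downmix_to_mono(interleaved, channels)))
```

The resulting bytes are what would be sent over the WebSocket; how they are framed into API events is not covered by this sketch.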

Requirements

Windows 10 / 11
.NET 9 SDK
An OpenAI API key with access to gpt-realtime-translate

Setup & run

dotnet restore
dotnet run

On first launch, click the API… button in the settings panel and paste your sk-… key. The key is stored locally at %APPDATA%\Babelive\settings.json (plain JSON, never transmitted anywhere except to the configured API endpoint).

For a self-contained release build:

dotnet publish -c Release

Produces a single Babelive.exe at bin\Release\net9.0-windows\win-x64\publish\ (~80–90 MB, bundles the .NET 9 runtime + Avalonia/Skia/HarfBuzz native libs, single-file compressed). Just ship that one file.

Using it

Pick a target language.
Pick a Capture source. The recommended choice is All system audio (no echo), which uses the Process Loopback API (Windows 10 build 20348 or later) to exclude Babelive's own playback. Per-app entries (Teams, Chrome, Spotify, …) and legacy device loopbacks are also available.
Pick a Playback device for the translated audio. Read the feedback warning below.
Optional checkboxes:
- Transcript only — silence the translation audio, keep just the on-screen subtitles.
- Alt endpoint — fall back to the non-translations endpoint if your account doesn't have access to the dedicated one.
- Echo suppress — pause API input while translation plays (prevents feedback at the cost of occasional model stalls).
- Mute source: mute every output device except the selected Playback device, so you only hear the translation. Loopback still captures the source for the API (the mute is applied downstream of the engine tap).
Optional sliders (both support mouse-wheel adjust on hover):
- Source volume — level source apps are ducked to during translation playback (5–100%, default 10%). At 100% the ducker is fully disabled.
- Translation volume — PCM-level gain applied to translated audio (0–200%). 100% is unity; the OpenAI TTS is quieter than typical system audio so you'll often want 130–180%. Sliding this does not change Babelive's session volume in Windows Volume Mixer.
Click Start (or the red ▶ on the lyric overlay), then play any video / call / song.
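
The two sliders above amount to simple scaling. The sketch below is an illustration in Python under stated assumptions, not the app's implementation: `apply_gain_pcm16` shows one way a PCM-level gain of 0–200% could be applied to little-endian PCM16 with saturation, and `ducked_source_level` shows the Source volume rule (duck only while translation plays, disabled at 100%). Both function names are hypothetical.

```python
import struct

def apply_gain_pcm16(pcm, percent):
    """Scale little-endian PCM16 samples by percent/100, saturating at the
    16-bit limits instead of wrapping around."""
    if not 0 <= percent <= 200:
        raise ValueError("gain must be between 0 and 200 percent")
    count = len(pcm) // 2
    samples = struct.unpack("<%dh" % count, pcm[:count * 2])
    gain = percent / 100.0
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack("<%dh" % count, *scaled)

def ducked_source_level(slider_percent, translation_playing):
    """Return the volume scalar for source apps.

    Assumption: sources play at full level unless translation audio is
    playing, and a slider value of 100 disables ducking entirely.
    """
    if slider_percent >= 100 or not translation_playing:
        return 1.0
    return slider_percent / 100.0
```

Saturating rather than wrapping matters at gains above 100%: a loud sample pushed past the 16-bit range would otherwise flip sign and produce a loud click.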

The settings window is hide-on-close: closing it leaves the lyric overlay and tray icon running. To quit completely, right-click the 译 tray icon and choose Exit.

Lyric overlay

A transparent always-on-top desktop-lyric panel docks bottom-center on first launch. Hover to fade in the toolbar:

Button	Action
▶ Start / ■ Stop	Toggle translation
● Record / ■ Stop	Toggle recording (auto-starts translation if not running)
A− / A+	Decrease / increase translation font size
🔉 / 🔊	Step translation volume by ±10%
⚙	Open settings window
✕	Hide overlay (re-open from tray menu)

Drag the panel by any non-button area to move it. Double-click the top empty strip to snap between top / bottom of the screen. Drag the bottom-right grip to resize.

Recording

Press ● Record (lyric overlay or main window) to start saving. Press again to stop. If translation isn't running yet, recording auto-starts it.

Each Record → Stop cycle creates a fresh timestamped folder under %APPDATA%\Babelive\Recordings\{yyyy-MM-dd_HHmmss}\ containing:

source.wav                 ← original captured audio (24 kHz mono PCM16)
source.srt                 ← source-language transcript with timecodes
translation.<lang>.wav     ← model's translated audio (same format)
translation.<lang>.srt     ← target-language transcript with timecodes

The path is shown next to the Record button on the main window — click it to open the folder in Explorer.
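
For scripts that post-process recordings, the folder layout above is easy to reproduce. The following Python sketch builds the four expected paths for one session; it assumes the timestamp is the session start time and that `<lang>` is the target-language code, neither of which the layout above states explicitly. `recording_paths` is a hypothetical helper.

```python
import ntpath
from datetime import datetime

def recording_paths(appdata, started, lang):
    """Return the four file paths of one recording session.

    appdata: value of %APPDATA%
    started: datetime used for the folder name (assumed: session start)
    lang:    language code used in the translation file names (assumed)
    """
    folder = ntpath.join(
        appdata, "Babelive", "Recordings",
        started.strftime("%Y-%m-%d_%H%M%S"),  # yyyy-MM-dd_HHmmss
    )
    names = [
        "source.wav",
        "source.srt",
        f"translation.{lang}.wav",
        f"translation.{lang}.srt",
    ]
    return [ntpath.join(folder, name) for name in names]
```

`ntpath` is used instead of `os.path` so the sketch produces Windows-style paths on any platform.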

SRT files share their base name with the matching WAV, so players that support sidecar subtitles load them automatically when the WAV is opened. The transcript splits cues at sentence terminators (., ?, !, 。, ？, ！) plus a delta-arrival-gap heuristic (~800 ms) that catches sentence boundaries the model didn't punctuate.
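
The cue-splitting rule can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the app's code: it assumes transcript deltas arrive as `(arrival_seconds, text)` pairs, that the last two terminators in the list are the full-width ？ and ！, and it only checks the end of the accumulated text, so punctuation in the middle of a delta does not split a cue. All names are hypothetical.

```python
TERMINATORS = ".?!。？！"   # assumed terminator set (ASCII + full-width)
GAP_SECONDS = 0.8           # the ~800 ms delta-arrival-gap heuristic

def srt_time(seconds):
    """Format seconds as an SRT timecode, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def split_cues(deltas):
    """Group (arrival_seconds, text) deltas into (start, end, text) cues."""
    cues, text, start, last = [], "", 0.0, 0.0
    for t, piece in deltas:
        # A long pause before this delta closes the pending cue.
        if text.strip() and t - last > GAP_SECONDS:
            cues.append((start, last, text.strip()))
            text = ""
        if not text:
            start = t
        text += piece
        last = t
        # A sentence terminator at the end closes the cue immediately.
        stripped = text.strip()
        if stripped and stripped[-1] in TERMINATORS:
            cues.append((start, t, stripped))
            text = ""
    if text.strip():
        cues.append((start, last, text.strip()))
    return cues

def to_srt(cues):
    """Render cues as SRT text with 1-based indices."""
    return "\n".join(
        f"{i}\n{srt_time(a)} --> {srt_time(b)}\n{txt}\n"
        for i, (a, b, txt) in enumerate(cues, 1)
    )
```

A cue closed by the gap rule after a single delta has equal start and end times in this sketch; a real writer would likely extend its end time so players actually display it.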

Playing back recordings

Windows' built-in audio players don't render external SRT for .wav files — they treat audio as audio, no subtitle track. Two ways around it:

VLC — open source.wav, then Audio → Visualizations → Spectrum. The visualizer activates VLC's video output surface, which the subtitle renderer needs. Subtitles appear immediately.
mpv — mpv source.wav shows subtitles without configuration. mpv is the most reliable choice for audio + external SRT on Windows.

Microsoft Teams / Skype audio

Teams and Skype set AUDCLNT_STREAMFLAGS_PREVENT_LOOPBACK_CAPTURE on their call audio for privacy, so Windows' Process Loopback API returns silence for them. Babelive auto-detects this and, if VB-CABLE is installed, redirects the Teams/Skype process tree to CABLE Input via IAudioPolicyConfig per-app routing, then loopback-captures from the cable. No manual Teams/Skype audio config needed.

Without VB-CABLE installed, Teams/Skype audio cannot be captured — this is a Windows DRM-style restriction, not a Babelive bug.

Zoom / Discord / Google Meet / WebEx / Slack use WebRTC and don't set the flag — they work via plain Process Loopback.

⚠️ Feedback loop warning

If translated audio plays through the same speakers you're capturing, the loopback re-translates it forever. Three fixes:

Use headphones for playback (different physical device than the captured speakers).
Install VB-CABLE — free virtual audio cable. Send the source app's output to CABLE Input; Babelive can then loopback-capture the cable while playing translation through your real speakers / headphones.
Tick "Transcript only": the translation appears as on-screen text only, and no audio is played back that could be re-captured.

The recommended All system audio (no echo) capture mode also avoids the loop: it uses Process Loopback to exclude Babelive's own playback from the captured stream, so even with the translation playing on the same device, the API never re-hears it.

API quirks / things that may need tuning

The realtime translation API is new. The exact event/field names in Translation/RealtimeTranslatorClient.cs are best-effort based on https://developers.openai.com/api/docs/guides/realtime-translation plus the standard /v1/realtime event conventions. If your account sees errors:

Endpoint: defaults to wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate. Tick "Alt endpoint" in the UI to fall back to wss://api.openai.com/v1/realtime?model=gpt-realtime-translate.
Session config: RealtimeTranslatorClient.SendSessionUpdateAsync sends session.update with input_audio_format=pcm16, output_audio_format=pcm16, and translation.target_language=<code>. Adjust if the official schema differs.
Event names: Dispatch matches both the output_*.delta and response.output_*.delta shapes. If transcripts/audio don't arrive, log every incoming event and adjust.
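
When debugging these three points, it can help to see them side by side. The Python sketch below is an illustration only, not a port of RealtimeTranslatorClient.cs: the two URLs and the three session fields come from the list above, but nesting the fields under a `session` key follows the usual /v1/realtime convention and is an assumption, and the function names are hypothetical.

```python
import json
import re

PRIMARY = "wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate"
ALTERNATE = "wss://api.openai.com/v1/realtime?model=gpt-realtime-translate"

def endpoint(use_alt):
    """Pick the WebSocket URL, mirroring the "Alt endpoint" checkbox."""
    return ALTERNATE if use_alt else PRIMARY

def session_update(target_language):
    """Build a session.update message.

    Assumption: the fields sit under a "session" object, as in the
    standard realtime API; the translation schema may differ.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "translation": {"target_language": target_language},
        },
    })

# Accept both "output_<kind>.delta" and "response.output_<kind>.delta".
DELTA = re.compile(r"^(?:response\.)?(output_[a-z_]+)\.delta$")

def delta_kind(event_json):
    """Return the normalized delta kind (e.g. "output_audio"), or None."""
    event = json.loads(event_json)
    match = DELTA.match(event.get("type", ""))
    return match.group(1) if match else None
```

Logging every event for which `delta_kind` returns `None` is a quick way to spot event names the client does not yet handle.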

Quick sanity test

Open YouTube in any non-target language, hit Start, and the translation should start streaming into the lyric overlay (and the settings window's transcript panes) within a second or two of the source audio playing.
