A multi-agent AI system that watches your live stream and reacts in real time — roasting bad plays, hyping victories, triggering sound effects, and firing visual overlays. Three autonomous agents coordinate via A2A (Agent-to-Agent) communication to create a fully automated co-host experience.
Built by Sudharshan Ramesh & Tanish Vardhineni
sr7431@nyu.edu / sudharshanthunk@gmail.com and tv2291@nyu.edu
┌─────────────┐     WebSocket (video + audio)      ┌─────────────────────┐
│             │◄──────────────────────────────────►│  Agent 1: Co-Host   │
│             │                                    │  Gemini Live API    │
│  Frontend   │                                    └──────────┬──────────┘
│  (React)    │                                               │ A2A
│             │                                    ┌──────────▼──────────┐
│             │◄─── SSE (sound triggers) ──────────│  Sound Board        │
│             │◄─── SSE (visual triggers) ─────────│  Hype Producer      │
└─────────────┘                                    └─────────────────────┘
- Frontend captures video frames at ~1 FPS and streams them over WebSocket to the Co-Host
- Co-Host (Gemini Live API) watches the frames, decides when to react, and speaks back with native AI-generated voice
- When the Co-Host decides a moment needs a sound effect or visual, it fires a function call that becomes an A2A task to the Sound Board or Hype Producer
- Sound Board and Hype Producer have their own pacing logic — cooldowns, energy tracking, and the ability to refuse if effects are too frequent
- Effects are pushed to the frontend via SSE and rendered as Web Audio / CSS animations
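
The SSE hop in the flow above can be sketched as follows. This is a minimal illustration of the `text/event-stream` wire format, not the project's actual code: the `EffectEvent` shape and both function names are assumptions made for the example.

```typescript
// Hypothetical payload shape; the project's real event fields are not documented here.
interface EffectEvent {
  kind: "sound" | "visual";
  effect: string;   // e.g. "airhorn" or "confetti"
  issuedAt: number; // epoch milliseconds
}

// Serialize one event in the text/event-stream format:
// an "event:" line, a "data:" line, and a blank line that terminates the message.
function formatSseMessage(evt: EffectEvent): string {
  return `event: ${evt.kind}\ndata: ${JSON.stringify(evt)}\n\n`;
}

// Parse a single message back into an event. In the browser, EventSource
// does this framing work and hands the "data" payload to the listener.
function parseSseMessage(raw: string): EffectEvent {
  const dataLine = raw.split("\n").find((line) => line.startsWith("data: "));
  if (dataLine === undefined) {
    throw new Error("no data line in SSE message");
  }
  return JSON.parse(dataLine.slice("data: ".length)) as EffectEvent;
}
```

On the server side, a string like this would typically be written to a response whose `Content-Type` is `text/event-stream`; on the frontend, an `EventSource` listener for the `sound` or `visual` event would receive the JSON payload.
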
- The Co-Host runs on the Gemini Live API with native audio output: the model generates its speech directly rather than passing text through a separate TTS engine
- Receives continuous video frames through a persistent WebSocket connection
- Personality-prompted as a witty, sarcastic co-host: roasts bad plays, hypes good ones, reacts to what the streamer does and says
- Uses proactive audio — waits for natural pauses, doesn't talk over the streamer
- Delegates all sound/visual effects via function calls to the other agents
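
One way to picture the delegation step is a small router that turns a model function call into a task for the right agent. This is a sketch under stated assumptions: the tool names `trigger_sound` and `trigger_visual`, the task fields, and the helper names are all hypothetical, and the body is not the official A2A message schema.

```typescript
// Hypothetical shapes: tool names and task fields are illustrative only.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface AgentEndpoints {
  soundBoardUrl: string;   // e.g. the value of SOUND_BOARD_URL
  hypeProducerUrl: string; // e.g. the value of HYPE_PRODUCER_URL
}

interface OutboundTask {
  url: string;
  body: { id: string; context: string; mood: string };
}

// Read a string argument, falling back when it is missing or not a string.
function readString(args: Record<string, unknown>, key: string, fallback: string): string {
  const value = args[key];
  return typeof value === "string" ? value : fallback;
}

// Map a function call to the agent that should handle it.
// Returns null for tools that are not delegated to another agent.
function routeToolCall(
  call: ToolCall,
  endpoints: AgentEndpoints,
  taskId: string,
): OutboundTask | null {
  const targets = new Map<string, string>([
    ["trigger_sound", endpoints.soundBoardUrl],
    ["trigger_visual", endpoints.hypeProducerUrl],
  ]);
  const url = targets.get(call.name);
  if (url === undefined) {
    return null;
  }
  return {
    url,
    body: {
      id: taskId,
      context: readString(call.args, "context", ""),
      mood: readString(call.args, "mood", "neutral"),
    },
  };
}
```

The resulting task could then be sent with an HTTP POST (for example `fetch(task.url, { method: "POST", body: JSON.stringify(task.body) })`), leaving the receiving agent free to accept or refuse it.
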
- The Sound Board receives A2A tasks from the Co-Host with context about what just happened
- Has its own decision-making: picks the best sound from its library based on mood (roast, hype, neutral)
- Enforces cooldowns to prevent sound spam
- Can refuse requests if a sound was just played
- Library: airhorn, sad trombone, bruh, vine boom, crickets, crowd cheer, fail horn, dramatic reverb, MLG hitmarker, emotional damage
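
The Sound Board behaviour described above (mood-based selection, a cooldown, and the option to refuse) might look roughly like this. The sound names come from the library list; the grouping by mood, the cooldown value, and the `SoundBoardPacer` class are assumptions for illustration only.

```typescript
type Mood = "roast" | "hype" | "neutral";

// Assumed grouping of the documented library by mood.
const SOUNDS_BY_MOOD: Record<Mood, string[]> = {
  roast: ["sad trombone", "bruh", "crickets", "fail horn", "emotional damage"],
  hype: ["airhorn", "crowd cheer", "MLG hitmarker"],
  neutral: ["vine boom", "dramatic reverb"],
};

type SoundDecision =
  | { accepted: true; sound: string }
  | { accepted: false; reason: string };

class SoundBoardPacer {
  private readonly cooldownMs: number;
  private lastPlayedAt = Number.NEGATIVE_INFINITY;
  private lastSound: string | null = null;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Decide whether to play a sound for this mood at time nowMs (milliseconds).
  request(mood: Mood, nowMs: number): SoundDecision {
    // Refuse while the previous sound is still inside the cooldown window.
    if (nowMs - this.lastPlayedAt < this.cooldownMs) {
      return { accepted: false, reason: "cooldown active" };
    }
    // Prefer the first sound for this mood that was not the last one played.
    const sound = SOUNDS_BY_MOOD[mood].find((s) => s !== this.lastSound);
    if (sound === undefined) {
      return { accepted: false, reason: "no sound available" };
    }
    this.lastPlayedAt = nowMs;
    this.lastSound = sound;
    return { accepted: true, sound };
  }
}
```

Passing the clock in as `nowMs` rather than reading `Date.now()` inside the class keeps the pacing logic easy to test.
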
- The Hype Producer receives A2A tasks from the Co-Host for visual moments
- Manages overlays: confetti bursts, screen shake, "L" / "W" graphic overlays, zoom effects
- Tracks stream energy level — escalates effects when hype builds, pulls back when things are calm
- Enforces pacing: one visual at a time, minimum intervals between effects
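
The energy tracking and pacing described above can be sketched with a decaying energy score and a single "busy until" timestamp. The half-life decay, the thresholds, the tier-to-overlay mapping, and the `HypePacer` class are all assumptions chosen for the example, not the project's actual tuning.

```typescript
class HypePacer {
  private readonly halfLifeMs: number;
  private readonly minIntervalMs: number;
  private energy = 0; // 0 = calm, 1 = maximum hype
  private lastUpdateMs = 0;
  private busyUntilMs = Number.NEGATIVE_INFINITY;

  constructor(halfLifeMs: number, minIntervalMs: number) {
    this.halfLifeMs = halfLifeMs;
    this.minIntervalMs = minIntervalMs;
  }

  // Decay the stored energy for the elapsed time, then add the new moment's
  // intensity, clamped to 1.
  private bump(intensity: number, nowMs: number): void {
    const elapsed = Math.max(0, nowMs - this.lastUpdateMs);
    this.energy *= Math.pow(0.5, elapsed / this.halfLifeMs);
    this.energy = Math.min(1, this.energy + intensity);
    this.lastUpdateMs = nowMs;
  }

  // Returns the overlay to show, or null when the request is refused because
  // the minimum interval since the previous visual has not passed.
  request(intensity: number, nowMs: number): string | null {
    this.bump(intensity, nowMs);
    if (nowMs < this.busyUntilMs) {
      return null;
    }
    this.busyUntilMs = nowMs + this.minIntervalMs;
    if (this.energy >= 0.8) return "confetti + screen shake";
    if (this.energy >= 0.4) return "zoom";
    return "W overlay";
  }
}
```

Refused requests still raise the energy level in this sketch, so a burst of hype moments escalates the next accepted effect even though only one visual is shown at a time.
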
| Component | Technology |
|---|---|
| AI Model | Gemini Flash Live (native audio) via Live API |
| AI SDK | @google/genai TypeScript SDK with Vertex AI |
| Agent Communication | A2A over HTTP, function calling for delegation |
| Co-Host Server | Node.js, Express, WebSocket (ws) |
| Sound Board / Hype Producer | Node.js, Express, Server-Sent Events (SSE) |
| Frontend | React 19, Vite, Tailwind CSS, Web Audio API |
| Audio Playback | Raw PCM streaming at 24kHz via Web Audio API |
| Deployment | Google Cloud Run (4 services) |
| Auth | Google Cloud Application Default Credentials (ADC) |
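
For the audio playback row, a minimal sketch of turning raw PCM bytes into samples the Web Audio API can play is shown below. It assumes 16-bit signed little-endian mono PCM; only the 24kHz rate is stated in the table, and the function names are hypothetical.

```typescript
const OUTPUT_SAMPLE_RATE = 24000; // Hz, from the tech stack table

// Convert 16-bit signed little-endian PCM (an assumption about the wire
// format) into floats in [-1, 1), the range Web Audio buffers expect.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}

// Duration of a chunk, useful for scheduling chunks back to back.
function chunkDurationSeconds(sampleCount: number): number {
  return sampleCount / OUTPUT_SAMPLE_RATE;
}
```

In the browser, the resulting `Float32Array` could be copied into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, 24000)` via `copyToChannel`, then played through an `AudioBufferSourceNode` scheduled at the end of the previous chunk.
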
PROJECT=your-gcp-project
REGION=us-central1
# 1. Deploy backend agents
gcloud run deploy sound-board --source agents/sound-board --region $REGION \
--allow-unauthenticated --set-env-vars GOOGLE_PROJECT_ID=$PROJECT
gcloud run deploy hype-producer --source agents/hype-producer --region $REGION \
--allow-unauthenticated --set-env-vars GOOGLE_PROJECT_ID=$PROJECT
# 2. Deploy co-host with agent URLs
SOUND_URL=$(gcloud run services describe sound-board --region $REGION --format 'value(status.url)')
HYPE_URL=$(gcloud run services describe hype-producer --region $REGION --format 'value(status.url)')
gcloud run deploy co-host --source agents/co-host --region $REGION \
--allow-unauthenticated --session-affinity \
--set-env-vars GOOGLE_PROJECT_ID=$PROJECT,SOUND_BOARD_URL=${SOUND_URL}/tasks,HYPE_PRODUCER_URL=${HYPE_URL}/tasks
# 3. Deploy frontend with all backend URLs
CO_HOST_URL=$(gcloud run services describe co-host --region $REGION --format 'value(status.url)')
gcloud run deploy frontend --source frontend --region $REGION --allow-unauthenticated \
--set-build-env-vars VITE_CO_HOST_WS_URL=wss://${CO_HOST_URL#https://},VITE_SOUND_BOARD_URL=$SOUND_URL,VITE_HYPE_PRODUCER_URL=$HYPE_URL

# Install dependencies for the root, each agent, and the frontend
npm install && cd agents/co-host && npm install && cd ../sound-board && npm install && cd ../hype-producer && npm install && cd ../../frontend && npm install && cd ..
cp .env.example .env # Set GOOGLE_PROJECT_ID
npm run dev # Starts all 4 services

Open the frontend, upload a video, and hit play: the AI co-host starts watching and reacting.