Jerry — Discord Voice Assistant Bot

Jerry is a Discord voice bot that listens in voice channels, transcribes speech with Whisper, generates replies via a local LLM, and speaks back using Google's Gemini TTS. Speech recognition is built to run on AMD GPUs via DirectML.

Stack

Component | Technology
STT | OpenAI Whisper (ONNX, runs on AMD GPU via DirectML)
LLM | Ollama, local model (default: mistral)
TTS | Google Gemini gemini-2.5-flash-preview-tts (REST API, voice: Charon)
Audio playback | FFmpeg via discord.py

Prerequisites

  • Python 3.10+
  • FFmpeg installed and on PATH (or set FFMPEG_PATH in .env)
  • Ollama running locally with your chosen model pulled
  • AMD GPU (RX 6000/7000 series recommended) with up-to-date drivers for DirectML
  • A Google Gemini API key (the free tier works, subject to Google's current rate limits for the TTS model)

Setup

1. Clone the repo

git clone https://github.com/YOUR_USERNAME/jerry-discord-bot.git
cd jerry-discord-bot

2. Create and activate a virtual environment

python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Download the Whisper ONNX model

The bot expects a whisper-small-onnx/ directory (or whichever size you configure). Export it using Optimum:

optimum-cli export onnx --model openai/whisper-small --task automatic-speech-recognition ./whisper-small-onnx

Change small to tiny, base, medium, etc. if you want a different model size. Update WHISPER_MODEL_SIZE in .env to match.
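
The naming convention above (whisper-<size>-onnx, matched by WHISPER_MODEL_SIZE) can be checked before starting the bot. This is a minimal sketch and not the bot's actual loader; the helper name whisper_model_dir is hypothetical.

```python
import os
from pathlib import Path


def whisper_model_dir(size: str, root: Path = Path(".")) -> Path:
    """Build the export directory name the README describes, e.g. whisper-small-onnx."""
    return root / f"whisper-{size}-onnx"


if __name__ == "__main__":
    # Same default as the README: WHISPER_MODEL_SIZE=small
    size = os.environ.get("WHISPER_MODEL_SIZE", "small")
    model_dir = whisper_model_dir(size)
    if not model_dir.is_dir():
        print(f"Missing {model_dir}; re-run the optimum-cli export for openai/whisper-{size}")
```

Running this from the project directory gives an early hint when the exported directory and WHISPER_MODEL_SIZE disagree.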

5. Pull your Ollama model

ollama pull mistral

6. Create .env

Copy the example and fill in your values:

cp .env.example .env

Then edit .env:

DISCORD_TOKEN=your_discord_bot_token_here
GEMINI_API_KEY=your_gemini_api_key_here
OLLAMA_MODEL=mistral
WHISPER_MODEL_SIZE=small
FFMPEG_PATH=ffmpeg

7. Run the bot

python bot.py

Bot Commands

Command | Description
!join | Jerry joins your current voice channel and starts listening
!leave | Jerry leaves the voice channel
!clear | Clears Jerry's conversation history for this server

Usage

  1. Use !join in a text channel while you're in a voice channel
  2. Say "Jerry" (or a phonetic variant — see below) followed by your request
  3. Jerry transcribes your speech, generates a reply via Ollama, and speaks it back using Gemini TTS

Wake word variants supported (handles different accents): jerry, gerry, gary, jeri, gerri, sherry, terry, jury, and many more phonetic variants.
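
Wake word matching of this kind can be sketched with a regular expression. The list below contains only the variants named above, and extract_request is a hypothetical helper, not the bot's actual function; the real matcher reportedly accepts more variants.

```python
import re

# Only the variants listed in this README; the real list is longer.
WAKE_WORDS = ("jerry", "gerry", "gary", "jeri", "gerri", "sherry", "terry", "jury")

# A wake word as a whole word, plus any punctuation or spaces right after it.
_WAKE_RE = re.compile(
    r"\b(?:" + "|".join(WAKE_WORDS) + r")\b[\s,.!?:;-]*",
    re.IGNORECASE,
)


def extract_request(transcript: str):
    """Return the text after the first wake word, or None if none was heard."""
    match = _WAKE_RE.search(transcript)
    if match is None:
        return None
    return transcript[match.end():].strip()


if __name__ == "__main__":
    print(extract_request("Hey Gary, mute John"))
    print(extract_request("nothing to see here"))
```

Broad variants such as "jury" or "terry" trade false positives for accent coverage, so a stricter list may suit servers where those words come up in normal conversation.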


What Jerry Can Do

Jerry can carry out most moderation and administration tasks in your Discord server, limited by the permissions you grant the bot. Speak naturally, for example:

  • "Jerry, how are you doing?" — casual chat
  • "Jerry, send a message in general saying game night is at 9"
  • "Jerry, move Ari to the AFK channel"
  • "Jerry, mute John"
  • "Jerry, kick that guy"
  • "Jerry, create a voice channel called chill zone"
  • "Jerry, create an event called Tetris Night on Friday at 10pm"
  • "Jerry, give Dave the Moderator role"

Available Actions

Action | What it does
send_message | Send a text message to a channel
send_dm | Send a DM to a user
move_user | Move a user to a voice channel
disconnect_user | Disconnect a user from voice
kick_user / ban_user | Kick or ban a user
mute_user / unmute_user | Server mute/unmute
deafen_user / undeafen_user | Server deafen/undeafen
rename_channel | Rename a text or voice channel
create_text_channel / create_voice_channel | Create channels
delete_channel | Delete a channel
create_role / delete_role | Manage roles
assign_role / remove_role | Assign/remove roles from users
rename_server | Rename the server
rename_user | Change a user's nickname
create_event / delete_event | Manage scheduled events
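
The LLM is described as producing JSON of the form {reply, action, params}. A minimal sketch of parsing that output and routing it to a handler follows. The handler table, the parameter name user, and the function names are assumptions for illustration; the bot's own execute_action is not shown in this README and is likely asynchronous.

```python
import json


def parse_llm_output(raw: str) -> dict:
    """Parse the model's JSON; fall back to treating raw text as a plain reply."""
    fallback = {"reply": raw.strip(), "action": None, "params": {}}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    params = data.get("params")
    return {
        "reply": str(data.get("reply", "")),
        "action": data.get("action") or None,
        "params": params if isinstance(params, dict) else {},
    }


def dispatch(parsed: dict, handlers: dict):
    """Call the handler registered for the action; refuse unknown actions."""
    action = parsed["action"]
    if action is None:
        return None  # plain chat, nothing to execute
    handler = handlers.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action!r}")
    return handler(**parsed["params"])


if __name__ == "__main__":
    # Hypothetical handler; a real one would call the Discord API.
    handlers = {"mute_user": lambda user: f"muted {user}"}
    raw = '{"reply": "Done.", "action": "mute_user", "params": {"user": "John"}}'
    print(dispatch(parse_llm_output(raw), handlers))
```

Routing through an explicit table means the model can only trigger actions that have a registered handler, which matters when the bot holds kick and ban permissions.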

Configuration Reference (.env)

Variable | Default | Description
DISCORD_TOKEN | (required) | Your Discord bot token
GEMINI_API_KEY | (required) | Google Gemini API key
OLLAMA_MODEL | mistral | Any model pulled in Ollama (e.g. llama3, phi3)
WHISPER_MODEL_SIZE | small | Whisper model size: tiny, base, small, medium
FFMPEG_PATH | ffmpeg | Full path to ffmpeg binary if not on PATH
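
The defaults in this table can be applied in a few lines of Python. This is a sketch under the assumption that the variables are already in the process environment (for example, loaded from .env by a helper library); load_config is a hypothetical name, not the bot's actual code.

```python
import os
from typing import Mapping

# Defaults and required keys copied from the table above.
DEFAULTS = {
    "OLLAMA_MODEL": "mistral",
    "WHISPER_MODEL_SIZE": "small",
    "FFMPEG_PATH": "ffmpeg",
}
REQUIRED = ("DISCORD_TOKEN", "GEMINI_API_KEY")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return settings with defaults applied; fail early if a required key is missing."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    config = {key: env[key] for key in REQUIRED}
    for key, default in DEFAULTS.items():
        # An empty value (as in .env.example) falls back to the default.
        config[key] = env.get(key) or default
    return config
```

Failing at startup on a missing token is easier to diagnose than a login or TTS error later in the session.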

Discord Bot Setup

  1. Go to the Discord Developer Portal
  2. Click New Application, give it a name, then open the Bot tab
  3. Copy the token → paste into .env as DISCORD_TOKEN
  4. Under Bot: enable Server Members Intent and Message Content Intent
  5. Under OAuth2 → URL Generator: select scopes bot + applications.commands
  6. Bot permissions needed: Send Messages, Connect, Speak, Mute Members, Deafen Members, Move Members, Kick Members, Ban Members, Manage Channels, Manage Roles, Manage Nicknames, Manage Events
  7. Use the generated URL to invite the bot to your server

Architecture

User speaks in VC
      │
      ▼
AssistantSink (discord-ext-voice-recv)
  — buffers PCM audio per user
  — detects silence (0.8s timeout)
      │
      ▼
Whisper ONNX (DirectML / AMD GPU)
  — transcribes audio to text
      │
      ▼
Wake word detection ("Jerry" + variants)
      │
      ▼
Ollama (local LLM, streaming)           ←── streams tokens as they arrive
  — generates JSON: {reply, action, params}
      │
      ├── sentences → Gemini TTS (gemini-2.5-flash-preview-tts)
      │                  — synthesizes each sentence as it arrives
      │                  — plays via FFmpeg in Discord VC
      │
      └── actions → execute_action()
                      — Discord API calls (mute, move, create channel, etc.)
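
The "sentences → Gemini TTS" branch depends on cutting the token stream into sentences as it arrives. A minimal sketch of that step is below. It assumes the tokens are plain reply text (the real bot would first have to pull the reply field out of the streamed JSON), and stream_sentences is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary: whitespace that directly follows ., ! or ?
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream finishes them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    tokens = ["Sure", " thing", ".", " Muting", " John", " now", "!"]
    for sentence in stream_sentences(tokens):
        print(sentence)
```

Waiting for whitespace after the punctuation mark avoids cutting a number like "3.5" in half, at the cost of emitting each sentence one token later.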

Concurrency model:

  • _whisper_executor — single-threaded, DirectML GPU (not thread-safe)
  • _ollama_executor — single-threaded, Ollama streaming
  • _tts_executor — 2 threads, Gemini TTS REST calls (allows prefetch of next sentence)
  • Ollama streaming and TTS playback run concurrently — Jerry starts speaking the first sentence while still generating the rest
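
The prefetch behaviour of the two-thread TTS pool can be sketched with concurrent.futures. The executor names follow the list above; speak_all, synthesize, and play are hypothetical stand-ins for the bot's own functions, which are not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor

# Pool sizes as described above: one worker each for Whisper and Ollama,
# two for TTS so the next sentence can be synthesized during playback.
_whisper_executor = ThreadPoolExecutor(max_workers=1)
_ollama_executor = ThreadPoolExecutor(max_workers=1)
_tts_executor = ThreadPoolExecutor(max_workers=2)


def speak_all(sentences, synthesize, play):
    """Play sentences in order while synthesizing one sentence ahead."""
    pending = None
    for sentence in sentences:
        # Start synthesis of this sentence before playing the previous one.
        future = _tts_executor.submit(synthesize, sentence)
        if pending is not None:
            play(pending.result())
        pending = future
    if pending is not None:
        play(pending.result())


if __name__ == "__main__":
    played = []
    speak_all(["first.", "second."], str.upper, played.append)
    print(played)
```

Because sentences can come from a generator, the loop also overlaps with LLM streaming: the first sentence is submitted for synthesis as soon as it exists.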

Troubleshooting

Bot joins but never responds:

  • Check that Ollama is running (start it with ollama serve if it is not)
  • Check your GEMINI_API_KEY is set correctly

TTS audio sounds wrong / garbled:

  • The Gemini TTS model returns 24kHz mono 16-bit PCM. If audio sounds off, check that FFmpeg is up to date.
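
Raw PCM carries no header, so a player that guesses the wrong sample rate or channel count produces sped-up, slowed-down, or noisy audio. One way to rule this out while debugging is to wrap the bytes in a WAV container using the format stated above. This is a sketch; pcm_to_wav is a hypothetical helper and not part of the bot.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV header (defaults: 24 kHz, mono, 16-bit)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * 24000
    with open("tts_debug.wav", "wb") as f:
        f.write(pcm_to_wav(one_second_of_silence))
```

If the saved file sounds right in a normal player, the problem is more likely in how the raw stream is described to FFmpeg. When feeding raw PCM directly, FFmpeg needs the input format spelled out, for example -f s16le -ar 24000 -ac 1 placed before the input.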

Wake word not detected:

  • Try speaking "Jerry" clearly — the bot supports many phonetic variants
  • Check the console for [username] Said: ... to see what Whisper transcribed

Whisper model not found:

  • Make sure whisper-small-onnx/ (or your configured size) exists in the project directory
  • Re-run the optimum-cli export command from step 4

AMD GPU not used for Whisper:

  • Ensure onnxruntime-directml is installed (not just onnxruntime)
  • Update your AMD drivers
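
Whether DirectML is actually available can be checked from the provider list that ONNX Runtime reports. The sketch below keeps the selection logic separate from the library so it runs anywhere; pick_providers is a hypothetical helper, and how the bot builds its session is an assumption.

```python
def pick_providers(available):
    """Prefer DirectML, then CPU, keeping only providers that are installed."""
    preferred = ["DmlExecutionProvider", "CPUExecutionProvider"]
    return [name for name in preferred if name in available]


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        chosen = pick_providers(ort.get_available_providers())
        if "DmlExecutionProvider" not in chosen:
            print("DirectML not available; Whisper would fall back to CPU")
        print("Providers:", chosen)
```

If DmlExecutionProvider is missing from the output, the plain onnxruntime package is probably installed instead of onnxruntime-directml.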

.env.example

DISCORD_TOKEN=
GEMINI_API_KEY=
OLLAMA_MODEL=mistral
WHISPER_MODEL_SIZE=small
FFMPEG_PATH=ffmpeg
