Skip to content

HackingPain/TranscribeX

Repository files navigation

TranscribeX v1.2

A fast, accurate desktop app for transcribing audio and video files. Powered by faster-whisper with a clean PyQt6 interface.

Runs on CPU by default for maximum compatibility, with optional GPU acceleration on NVIDIA CUDA systems.


Features

  • Accurate Transcription - Powered by faster-whisper (CTranslate2 Whisper implementation)
  • Multiple Export Formats - .txt, .docx, .srt (SubRip), .vtt (WebVTT)
  • 40+ Languages - Auto-detect or force a specific language
  • Drag and Drop - Drop media files directly onto the window
  • Timestamps - Optional timestamp insertion in text/docx output
  • GPU Acceleration - NVIDIA CUDA with automatic CPU fallback
  • Voice Activity Detection - Skip silence to speed up processing (requires onnxruntime)
  • Elapsed Time / ETA - Real-time progress with time remaining estimate
  • Settings Persistence - Remembers your preferences between sessions
  • Same-as-Source Output - Save transcripts next to the original files
  • File List Management - See, review, and clear selected files
  • Dark Theme - Modern dark UI with styled controls
  • Desktop Notifications - Get notified when transcription finishes
  • Per-File Error Recovery - Corrupt files are skipped; the batch continues

Supported Input Formats

MP3, MP4, M4A, WAV, FLAC, AAC, OGG, WMA, MOV, WEBM, MKV


Requirements

  • Windows 10/11 (Linux/macOS possible from source)
  • Python 3.9-3.12 (if running from source)
  • FFmpeg on PATH (required for media decoding)

Installation (from source)

# Clone and enter directory
git clone https://github.com/HackingPain/TranscribeX.git
cd TranscribeX

# Create and activate virtual environment
python -m venv .venv
# Windows
.\.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run
python transcribex.py

Building a Standalone Executable

# Windows only
build.bat
# Output: dist\TranscribeX.exe

Project Structure

transcribex.py    Entry point, FFmpeg check
constants.py      Configuration, defaults, TranscribeJob dataclass
worker.py         Background transcription thread
gui.py            PyQt6 GUI window
requirements.txt  Python dependencies
pyinstaller.spec  Build configuration
build.bat         Windows build script

Model Sizes

Model Speed Accuracy RAM
tiny Fastest Lower ~1 GB
base Fast Fair ~1 GB
small Medium Good ~2 GB
medium Slow Better ~5 GB
large-v3 Slowest Best ~10 GB

Models are downloaded automatically on first use from HuggingFace.


License

MIT

About

Fast, accurate desktop app for transcribing audio/video to TXT, DOCX, SRT, and VTT. Powered by faster-whisper with a modern PyQt6 dark-themed GUI. CPU by default, optional GPU acceleration.

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors