PodGraph

Status: paused. Active development has stopped. The CLI prototype works as documented below; the project did not advance into Phase 1 (Next.js + Postgres). Code is preserved here as a reference and starting point if work resumes.

A pipeline that ingests podcast interviews, transcribes them with speaker diarization, and builds AI-generated profiles of the people who appear — entirely from their own words across all their appearances.

The product is the person page: a single research surface where someone's worldview, conviction-ranked positions, taste clusters, and deep-dive topics are synthesized across every episode they've been on. Episodes are internal data units, not destinations.

The cardinal rule

Every fact on a person's page comes from that person's own transcribed words. No Wikipedia, no LinkedIn, no external bios. The only outside data is the person's name and basic deduplication metadata. This constraint is non-negotiable.
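One way to picture this rule is as a check that runs before a profile is published. The sketch below is illustrative only: the `SourcedQuote` and `ProfileFact` shapes, the `findUnsourcedFacts` helper, and the sample data (including the name "Jane Doe") are assumptions made for this example, not types or data from the PodGraph codebase.

```typescript
// Hypothetical shapes: none of these names come from the PodGraph codebase.
interface SourcedQuote {
  episodeId: string;    // which episode the quote came from
  speaker: string;      // speaker name after speaker identification
  startSeconds: number; // timestamp of the utterance within the episode
  text: string;         // the person's transcribed words
}

interface ProfileFact {
  statement: string;      // synthesized claim shown on the person page
  quotes: SourcedQuote[]; // transcript evidence backing the claim
}

// Returns the facts that break the rule: either no evidence at all, or
// evidence spoken by someone other than the profiled person.
function findUnsourcedFacts(person: string, facts: ProfileFact[]): ProfileFact[] {
  const name = person.trim().toLowerCase();
  return facts.filter(
    (fact) =>
      fact.quotes.length === 0 ||
      fact.quotes.some((q) => q.speaker.trim().toLowerCase() !== name),
  );
}

// Made-up sample data for illustration.
const facts: ProfileFact[] = [
  {
    statement: "Prefers short daily routines",
    quotes: [{ episodeId: "ep-1", speaker: "Jane Doe", startSeconds: 312, text: "I keep it to ten minutes a day." }],
  },
  { statement: "Grew up in Ohio", quotes: [] }, // no transcript evidence
  {
    statement: "Dislikes long meetings",
    quotes: [{ episodeId: "ep-2", speaker: "Host", startSeconds: 45, text: "She hates long meetings." }],
  },
];

const violations = findUnsourcedFacts("Jane Doe", facts);
console.log(violations.map((f) => f.statement));
```

Under these assumptions, the second and third facts are flagged: one has no quote, and the other rests on what the host said rather than the person's own words.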

Current status

Phase 0 + 0.5: complete (CLI prototype). The Andrew Huberman profile across 12 podcast appearances is the primary test case — 81 themes, 31 convictions, 130 tools, full timestamp-linked attribution.

What works today (run as TypeScript CLI scripts):

5-stage episode pipeline — transcribe → correct → identify speakers → extract → registry update
4-pass extraction by default (segmentation + entities in parallel via Haiku, theme synthesis from summaries via Sonnet, targeted quote selection via Haiku)
Post-extraction quote correction against raw transcript utterances (no API calls)
Lex Fridman fast path that scrapes pre-made transcripts (skips Deepgram)
Person aggregation — semantic theme merging, conviction extraction & ranking, worldview synthesis, deep-on badge identification, taste clustering
Static HTML profile pages with collapsible sections and timestamp-linked quotes
Per-step cost tracking (costs.json ledger per episode)
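The quote-correction item above runs without API calls, so it is presumably plain string matching against the transcript. The following is a minimal sketch of that idea under stated assumptions: the token-overlap score, the 0.8 threshold, and every name here (`Utterance`, `matchQuote`, and so on) are hypothetical and are not taken from scripts/correct-quotes.ts.

```typescript
// Hypothetical sketch; not the logic in scripts/correct-quotes.ts.
interface Utterance {
  speaker: string;
  startSeconds: number;
  text: string;
}

interface QuoteMatch {
  utterance: Utterance;
  score: number; // fraction of quote tokens found in the utterance
}

// Lowercase the text, replace punctuation with spaces, and split into words.
function tokenize(text: string): string[] {
  return text.toLowerCase().replace(/[^a-z0-9\s']/g, " ").split(/\s+/).filter(Boolean);
}

// Fraction of the quote's tokens that also appear in the utterance (0 to 1).
function overlap(quote: string, utterance: string): number {
  const quoteTokens = tokenize(quote);
  if (quoteTokens.length === 0) return 0;
  const utteranceTokens = new Set(tokenize(utterance));
  const hits = quoteTokens.filter((t) => utteranceTokens.has(t)).length;
  return hits / quoteTokens.length;
}

// Pick the best-scoring utterance, or null when nothing clears the threshold,
// which would mark the quote as unverifiable.
function matchQuote(quote: string, utterances: Utterance[], minScore = 0.8): QuoteMatch | null {
  let best: QuoteMatch | null = null;
  for (const utterance of utterances) {
    const score = overlap(quote, utterance.text);
    if (best === null || score > best.score) best = { utterance, score };
  }
  return best !== null && best.score >= minScore ? best : null;
}

// Made-up transcript lines for illustration.
const utterances: Utterance[] = [
  { speaker: "Guest", startSeconds: 125, text: "I think consistent sleep matters more than any supplement." },
  { speaker: "Host", startSeconds: 140, text: "We recorded this episode in Austin." },
];

const verified = matchQuote("consistent sleep matters more than any supplement", utterances);
const rejected = matchQuote("coffee is bad for everyone", utterances);
console.log(verified?.utterance.startSeconds, rejected);
```

A matched quote can then take its text and timestamp from the raw utterance, which is what makes timestamp-linked attribution possible. A real implementation would likely need something more robust than bag-of-words overlap, for example quotes that span several utterances.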

Originally planned next: profile a second person, add more podcast feeds, then start Phase 1 (Next.js + PostgreSQL + Prisma + BullMQ). Not in progress — see status banner above.

Tech stack

TypeScript + tsx (no build step in the prototype)
Deepgram Nova-3 — transcription with speaker diarization
Anthropic Claude — extraction (Haiku + Sonnet, multi-pass), aggregation (Sonnet + Opus where nuance matters)
Google Gemini — A/B comparison for extraction quality
Zod — runtime schema validation with retry-on-failure
yt-dlp — YouTube audio download

Phase 1 was planned to add Next.js (App Router), PostgreSQL, Prisma, and BullMQ + Redis.
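The Zod entry in the stack list mentions retry-on-failure. As a rough sketch of that pattern, the helper below re-asks a model whenever its output fails validation. The function name, its signature, the attempt limit, and the fake model are all assumptions for illustration. To keep the example dependency-free, the validator is hand-written; with Zod, a function such as `(raw) => schema.parse(raw)` could be passed instead, since `parse` throws on invalid input.

```typescript
// Hypothetical helper; the name and signature are assumptions, not PodGraph code.
type Validator<T> = (raw: unknown) => T; // must throw when the shape is wrong

async function generateValidated<T>(
  generate: (attempt: number, lastError?: string) => Promise<string>,
  validate: Validator<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Pass the previous error back so the caller can include it in the next prompt.
    const text = await generate(attempt, lastError);
    try {
      return validate(JSON.parse(text));
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`No valid output after ${maxAttempts} attempts: ${lastError}`);
}

// Hand-written stand-in for a schema so the sketch has no dependencies.
interface Theme {
  title: string;
}

const parseThemes: Validator<Theme[]> = (raw) => {
  if (!Array.isArray(raw) || raw.some((t) => typeof t?.title !== "string")) {
    throw new Error("expected an array of { title: string }");
  }
  return raw as Theme[];
};

// Fake model: malformed output on the first attempt, valid output afterwards.
const fakeModel = async (attempt: number): Promise<string> =>
  attempt === 1 ? '{"oops": true}' : '[{"title": "Sleep"}]';

generateValidated(fakeModel, parseThemes).then((themes) => console.log(themes));
```

Feeding the validation error into the next attempt is a design choice rather than a requirement; a simpler variant would resend the same prompt unchanged.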

Quick start

git clone https://github.com/dstrunin/PodGraph.git
cd PodGraph
npm install

cp .env.example .env
# Edit .env and set DEEPGRAM_API_KEY and ANTHROPIC_API_KEY
npm run test-keys     # verify both keys authenticate

End-to-end on a single episode:

# Register a podcast feed (uses iTunes Search API)
npm run add-podcast -- "Lex Fridman"

# Find appearances by a person
npm run discover -- "Andrew Huberman"

# Run the full pipeline on a YouTube or direct audio URL
npm run pipeline -- "https://www.youtube.com/watch?v=VIDEO_ID"

# After processing at least one episode for someone, build their profile
npm run aggregate -- "Andrew Huberman"
npm run build-profile -- "Andrew Huberman"   # → output/andrew-huberman.html

For the full command reference, prompts, data file layout, and operational tips, see PODGRAPH_PIPELINE_GUIDE.md.

Project layout

PodGraph/
├── scripts/                  # CLI pipeline (current implementation)
│   ├── pipeline.ts           # End-to-end episode pipeline
│   ├── transcribe.ts         # Deepgram Nova-3 + diarization
│   ├── correct-transcript.ts # Claude proper-noun correction
│   ├── identify-speakers.ts  # Claude maps speakers to real names
│   ├── extract-4pass.ts      # 4-pass extraction (default)
│   ├── extract-multipass.ts  # 2-pass extraction (legacy)
│   ├── correct-quotes.ts     # Programmatic quote verification
│   ├── validate-extraction.ts # Cross-reference + accuracy checks
│   ├── aggregate.ts          # Person aggregation pipeline
│   ├── build-profile.ts      # Static HTML profile page
│   ├── lex/                  # Lex Fridman fast path
│   └── lib/                  # Shared schemas, manifest, dirs, cost ledger
├── prompts/                  # All Claude prompts (one file each)
├── data/
│   ├── episodes/             # Per-episode artifacts (gitignored)
│   ├── profiles/             # Aggregated person profiles
│   ├── entities.json         # Global entity registry
│   ├── corrections-global.json
│   └── manifest.json         # Processed-episode index
└── .env.example

Cost profile

Rough per-episode cost: $0.80–$2.40 depending on episode length, most of it from Deepgram transcription and multi-pass Claude extraction. Aggregation costs about $0.12 per person, spread across 5 Claude calls.

The npm run costs command totals spending across every episode and profile from the per-episode costs.json ledgers.
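To make the roll-up concrete, here is a small sketch of summing per-episode ledgers into totals. The ledger shape (`{ step, usd }` entries keyed by episode ID), the function name, and the dollar figures are assumptions chosen for illustration; the real costs.json format is described in PODGRAPH_PIPELINE_GUIDE.md and may differ.

```typescript
// Hypothetical ledger shape; the real costs.json format may differ.
interface CostEntry {
  step: string; // pipeline step that incurred the cost, e.g. "transcribe"
  usd: number;  // cost of that step in US dollars
}

interface CostSummary {
  totalUsd: number;
  byStep: Record<string, number>;
  byEpisode: Record<string, number>;
}

// Sum every entry of every ledger into overall, per-step, and per-episode totals.
function summarizeCosts(ledgers: Record<string, CostEntry[]>): CostSummary {
  const summary: CostSummary = { totalUsd: 0, byStep: {}, byEpisode: {} };
  for (const [episodeId, entries] of Object.entries(ledgers)) {
    for (const { step, usd } of entries) {
      summary.totalUsd += usd;
      summary.byStep[step] = (summary.byStep[step] ?? 0) + usd;
      summary.byEpisode[episodeId] = (summary.byEpisode[episodeId] ?? 0) + usd;
    }
  }
  return summary;
}

// Made-up figures, not measured costs.
const summary = summarizeCosts({
  "ep-1": [{ step: "transcribe", usd: 0.5 }, { step: "extract", usd: 0.75 }],
  "ep-2": [{ step: "transcribe", usd: 0.25 }, { step: "extract", usd: 0.5 }],
});
console.log(summary);
```

A production version would probably track costs in integer cents or round on output, since repeated floating-point addition of dollar amounts can drift.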

Documentation

File	What it covers
PODGRAPH_PIPELINE_GUIDE.md	Full CLI reference, prompts, data files, operational tips
podgraph-roadmap-revised.md	Source-of-truth architecture and full implementation plan
TODO.md	Current task list across all phases
CASE_STUDY.md	Portfolio narrative — problem, architecture, tradeoffs, challenges
CLAUDE.md	Project instructions for Claude / AI agents

License

ISC.
