PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated May 5, 2026 - Python
Extract structured data using local or remote LLMs
Reproducible diagnostic investigation of a fine-tuned SLM that scored 99.75% on evaluation and failed silently on 10% of production inputs. Full pipeline. Every number verified.
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
A simple LLM library
Collection of purpose-built MCP servers for AI agent workflows.
news-summizr extracts structured summaries from headlines, labeling key points such as announcement, products, and region for quick insight.
A new package designed for structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Turn tutorial videos into structured specs — Pine Script, recipes, code walkthroughs
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
Auditable LLM extraction for Java: structured output with source citations, PDF bounding boxes, confidence, provenance, and audit JSON.
Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.
Robust extraction of structured signals from messy unstructured text. Hybrid LLM + tool-use schema + source span linking + eval harness.
Human-in-the-loop LLM orchestration with structured signal extraction and session persistence. Annotate confusion and curiosity—feedback shapes responses, topology accumulates over time. API-first design, no gamification. FastAPI + Claude + SQLite + D3.
Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.
Evaluate local LLM accuracy on structured data extraction. Tests models' ability to extract JSON from unstructured text with ground-truth comparison, F1 scoring, and fuzzy matching. Supports MLX and Ollama backends. Generates interactive reports with charts and per-model analysis.
Multilingual structured OCR (11+ languages, CJK-tuned) — MCP server with verified per-character bboxes for AI agents
AI-agent-driven venue governance database. Extracts editorial boards and program committees from journal websites using local LLMs, with entity resolution against OpenAlex.
AI-assisted PDF/DOCX packet structuring workflow with source citations, semantic retrieval, deterministic validation, and reviewer-facing run sheets.