Cookbook/cookbooks revamp by SuhaniNagpal7 · Pull Request #657 · future-agi/docs

SuhaniNagpal7 · 2026-05-15T05:57:09Z

Pull Request

Description

Describe the changes in this pull request:

What feature/bug does this PR address?
Provide any relevant links or screenshots.

Checklist

Code compiles correctly.
Created/updated tests.
Linting and formatting applied.
Documentation updated.

Related Issues

Closes #<issue_number>

…and explainable steps

- Replace TLDR with outcome-first intro paragraph naming the concrete outcomes
- Add 'What is an eval correction loop?' section explaining the technique
- Beef up each Step intro with the why before the how (surface-form evals miss domain rules; disagreements are the calibration signal; rule prompt has two jobs; same-batch re-scoring is the controlled variable; iteration discipline)
- Replace samples with subtler failures (off-policy refund commits, in-support upsells) so the demo actually lands: is_helpful passes them, custom eval catches them, agreement jumps 50% to 100%
- Add 'What you solved' closer with 4 specific pain to solution bullets
- Document the full create_custom_evals response shape (status + result.eval_template_id)
- Drop all em/en dashes per house style
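The agreement figure in this commit can be read as the share of samples where the eval verdict matches the human label, with the mismatches serving as the calibration signal. A minimal sketch of that arithmetic, assuming boolean pass/fail verdicts; the function names and the sample values are hypothetical and are not taken from the cookbook:

```python
def agreement_rate(eval_verdicts, human_labels):
    """Return the fraction of samples where the eval and the human agree."""
    if len(eval_verdicts) != len(human_labels):
        raise ValueError("verdicts and labels must have the same length")
    if not eval_verdicts:
        raise ValueError("need at least one sample")
    matches = sum(e == h for e, h in zip(eval_verdicts, human_labels))
    return matches / len(eval_verdicts)


def disagreements(eval_verdicts, human_labels):
    """Return the indices where the eval and the human disagree."""
    return [i for i, (e, h) in enumerate(zip(eval_verdicts, human_labels)) if e != h]


# Illustrative values only: True = pass, False = fail.
human = [True, False, True, False]
generic_eval = [True, True, True, True]    # passes everything, misses domain rules
custom_eval = [True, False, True, False]   # hypothetical eval after corrections

print(agreement_rate(generic_eval, human))  # 0.5
print(disagreements(generic_eval, human))   # [1, 3]
print(agreement_rate(custom_eval, human))   # 1.0
```

Re-scoring the same batch with both evals keeps the samples fixed, so any change in the rate comes from the eval and not from the data.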

…ntro and explainable steps

- Replace TLDR with outcome-first intro paragraph naming the concrete outcomes
- Add 'What is MCP?' section defining the protocol and FAGI's tool surface
- Beef up each Step intro with the why before the how (config registration is the trust boundary; OAuth scopes are user-revocable; assistant reads tool descriptions to pick the right tool; chat carries context across turns; full debug loop in one IDE thread is the payoff)
- Add 'What you solved' closer with 4 specific pain to solution bullets
- Fix broken inline link: [agent.py](agent.py) -> backticked agent.py
- Drop all em/en dashes per house style

…ncrete intro and explainable steps

- Replace TLDR with outcome-first intro paragraph (21 containers, dashboard URL, local backend URL, smoke-test trace) leading with what the reader gets
- Add 'What is the self-hosted stack?' section with the four-layer architecture (apps, databases, workflow engine, replication stack) named by purpose first and technology second so non-experts can skip the parens
- Beef up each Step intro with the why before the how:
  * Step 1: the repo IS the install, no separate package manager
  * Step 2: the four secrets are baked into running services at boot
  * Step 3: docker compose up builds + starts in dependency order
  * Step 4: why three URLs (frontend + backend + PeerDB admin)
  * Step 5: a single trace is a smoke test for every layer
- Split the env-config and Django-shell password tip across Steps 2 and 4
- Add jump-link from Step 2 to Step 4 for the no-Mailgun path
- Add 'docker compose ps --format' to keep the health table scannable
- Add 'What you solved' closer with 4 specific pain to solution bullets
- Drop all em/en dashes per house style

…intro and explainable steps

- Replace TLDR with outcome-first intro paragraph naming the concrete outcomes (exact + semantic caching enabled, paraphrases returning cached answers, bypass/invalidate path) plus the one-line app-code-change closer
- Add 'What is Agent Command Center?' section explaining the gateway and the two cache tiers (L1 exact + L2 semantic) before diving into steps
- Beef up each Step intro with the why before the how:
  * Step 1: baseline as a 'no caching' control; name the three response headers
  * Step 2: explain what L1 exact-match actually does (hash + store)
  * Step 3: paraphrased duplicates miss L1; vector embedding + similarity for L2
  * Step 4: why a realistic mixed batch measures actual hit rate vs single hit
  * Step 5: two real bypass situations (prompt change, post-fix invalidation)
- Add 'What you solved' closer with 4 specific pain to solution bullets
- Fix dashboard navigation: 'Gateway -> Providers -> Cache' (not 'Agent Command Center -> Caching') matching the real UI
- Clarify L1 Backend value (memory) vs the L1 layer (always exact-match)
- Drop all em/en dashes per house style
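The two cache tiers named in this commit (L1 exact-match via hash + store, L2 via embedding + similarity) can be sketched roughly as follows. This is an illustration of the general technique only, not the gateway's implementation: the class name, the similarity threshold, and the toy bag-of-words "embedding" are all assumptions standing in for whatever the product actually uses.

```python
import hashlib
import math
from collections import Counter


def _embed(text):
    """Toy bag-of-words vector; a real gateway would use an embedding model."""
    return Counter(text.lower().split())


def _cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoTierCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cutoff for L2
        self.exact = {}             # L1: prompt hash -> response
        self.semantic = []          # L2: list of (vector, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get(self, prompt):
        """Return (response, tier) where tier is 'L1', 'L2', or 'miss'."""
        hit = self.exact.get(self._key(prompt))
        if hit is not None:
            return hit, "L1"
        vec = _embed(prompt)
        best_score, best_resp = 0.0, None
        for stored_vec, resp in self.semantic:
            score = _cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_resp = score, resp
        if best_resp is not None and best_score >= self.threshold:
            return best_resp, "L2"
        return None, "miss"

    def put(self, prompt, response):
        """Store the response in both tiers."""
        self.exact[self._key(prompt)] = response
        self.semantic.append((_embed(prompt), response))


cache = TwoTierCache(threshold=0.6)
cache.put("what is your refund policy", "Refunds within 30 days.")
print(cache.get("what is your refund policy"))  # exact repeat, served by L1
print(cache.get("what is the refund policy"))   # paraphrase, misses L1, served by L2
print(cache.get("how do I reset my password"))  # unrelated, misses both tiers
```

The paraphrase hashes differently from the stored prompt, which is why it misses L1 and only the similarity lookup can serve it.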

…l-task and alert-monitor API

- Four cost levers (sampling, judge model tiers, span filters, alert-driven attention) laid out as a table up front, then one explainable step per lever
- Step 1: judge model comparison with a sanity-check snippet across the three Turing tiers; recommend turing_flash unless evidence justifies escalation
- Step 2: filter to user-facing LLM spans before sampling so the math runs against the right denominator; embeds the Create Task UI screenshot
- Step 3: sampling-rate cheatsheet keyed to daily LLM volume; sampling_rate + spans_limit combo for hard ceiling on traffic spikes
- Step 4: alert-monitor wiring with percentage_change threshold so the only time a human looks at the dashboard is when a real regression fires
- 'What you solved' closer with 4 specific pain to solution bullets
- 'Eval task API' reference section with verified field names (project, evals, run_type, sampling_rate, spans_limit, filters) and status enum
- All field names verified against EvalTaskSerializer and UserAlertMonitorSerializer
- Colab + GitHub badges added
- Drop all em/en dashes per house style
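The filter-then-sample order from Steps 2 and 3, with a hard ceiling on top, might look like the sketch below. It assumes sampling_rate is a fraction between 0 and 1 and spans_limit is a maximum count per run; those semantics, the `kind` field, and the `select_spans` helper are assumptions for illustration and are not confirmed by the commit message.

```python
import random


def select_spans(spans, sampling_rate, spans_limit, seed=None):
    """Sample spans at the given rate, stopping once the hard ceiling is reached."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    if spans_limit < 0:
        raise ValueError("spans_limit must be non-negative")
    rng = random.Random(seed)
    selected = []
    for span in spans:
        if len(selected) >= spans_limit:
            break  # ceiling reached: a traffic spike cannot raise the cost further
        if rng.random() < sampling_rate:
            selected.append(span)
    return selected


# Hypothetical traffic: half of the spans are LLM calls, half are tool calls.
spans = [{"id": i, "kind": "llm" if i % 2 == 0 else "tool"} for i in range(1000)]

# Filter first so the sampling rate applies to the right denominator.
llm_spans = [s for s in spans if s["kind"] == "llm"]
picked = select_spans(llm_spans, sampling_rate=0.1, spans_limit=20, seed=0)
print(len(llm_spans), len(picked))
```

Stopping at the limit favours earlier spans in the batch; a production sampler would more likely spread the budget across the whole window.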

…injection, jailbreak, and social engineering testing

- Outcome-first intro: regression suite of three adversarial classes with prompt_injection, answer_refusal, and is_harmful_advice evals scoring every response, plus a fail list of exact prompts that broke through
- 'What is adversarial simulation?' section names the three attack classes (prompt injection, jailbreak, social engineering) with concrete patterns
- Step 1: three custom hostile personas (Script Injector, DAN Jailbreaker, Social Engineer) with persona descriptions and behavioural settings
- Step 2: scenario auto-generated via Workflow Builder, 30 conversation paths pairing personas with concrete attack situations
- Step 3: attach three security evals to the run, with verified required-input signatures (prompt_injection: input only; answer_refusal: input+output; is_harmful_advice: output only)
- Step 4: read the fail list and patch the system prompt with a paste-ready security-rules diff
- 'What you solved' closer with 4 specific pain to solution bullets
- Colab + GitHub badges linked to the matching notebook
- Drop all em/en dashes per house style

Suhani Nagpal added 6 commits May 14, 2026 02:39

SuhaniNagpal7 requested a review from nik13 May 15, 2026 05:57

SuhaniNagpal7 wants to merge 6 commits into dev from cookbook/cookbooks-revamp
