Cookbook/cookbooks revamp#657
Open
SuhaniNagpal7 wants to merge 6 commits into
Open
Conversation
added 6 commits
May 14, 2026 02:39
…and explainable steps - Replace TLDR with outcome-first intro paragraph naming the concrete outcomes - Add 'What is an eval correction loop?' section explaining the technique - Beef up each Step intro with the why before the how (surface-form evals miss domain rules; disagreements are the calibration signal; rule prompt has two jobs; same-batch re-scoring is the controlled variable; iteration discipline) - Replace samples with subtler failures (off-policy refund commits, in-support upsells) so the demo actually lands: is_helpful passes them, custom eval catches them, agreement jumps 50% to 100% - Add 'What you solved' closer with 4 specific pain to solution bullets - Document the full create_custom_evals response shape (status + result.eval_template_id) - Drop all em/en dashes per house style
…ntro and explainable steps - Replace TLDR with outcome-first intro paragraph naming the concrete outcomes - Add 'What is MCP?' section defining the protocol and FAGI's tool surface - Beef up each Step intro with the why before the how (config registration is the trust boundary; OAuth scopes are user-revocable; assistant reads tool descriptions to pick the right tool; chat carries context across turns; full debug loop in one IDE thread is the payoff) - Add 'What you solved' closer with 4 specific pain to solution bullets - Fix broken inline link: [agent.py](agent.py) -> backticked agent.py - Drop all em/en dashes per house style
…ncrete intro and explainable steps - Replace TLDR with outcome-first intro paragraph (21 containers, dashboard URL, local backend URL, smoke-test trace) leading with what the reader gets - Add 'What is the self-hosted stack?' section with the four-layer architecture (apps, databases, workflow engine, replication stack) named by purpose first and technology second so non-experts can skip the parens - Beef up each Step intro with the why before the how: * Step 1: the repo IS the install, no separate package manager * Step 2: the four secrets are baked into running services at boot * Step 3: docker compose up builds + starts in dependency order * Step 4: why three URLs (frontend + backend + PeerDB admin) * Step 5: a single trace is a smoke test for every layer - Split the env-config and Django-shell password tip across Steps 2 and 4 - Add jump-link from Step 2 to Step 4 for the no-Mailgun path - Add 'docker compose ps --format' to keep the health table scannable - Add 'What you solved' closer with 4 specific pain to solution bullets - Drop all em/en dashes per house style
…intro and explainable steps - Replace TLDR with outcome-first intro paragraph naming the concrete outcomes (exact + semantic caching enabled, paraphrases returning cached answers, bypass/invalidate path) plus the one-line app-code-change closer - Add 'What is Agent Command Center?' section explaining the gateway and the two cache tiers (L1 exact + L2 semantic) before diving into steps - Beef up each Step intro with the why before the how: * Step 1: baseline as a 'no caching' control; name the three response headers * Step 2: explain what L1 exact-match actually does (hash + store) * Step 3: paraphrased duplicates miss L1; vector embedding + similarity for L2 * Step 4: why a realistic mixed batch measures actual hit rate vs single hit * Step 5: two real bypass situations (prompt change, post-fix invalidation) - Add 'What you solved' closer with 4 specific pain to solution bullets - Fix dashboard navigation: 'Gateway -> Providers -> Cache' (not 'Agent Command Center -> Caching') matching the real UI - Clarify L1 Backend value (memory) vs the L1 layer (always exact-match) - Drop all em/en dashes per house style
…l-task and alert-monitor API - Four cost levers (sampling, judge model tiers, span filters, alert-driven attention) laid out as a table up front, then one explainable step per lever - Step 1: judge model comparison with a sanity-check snippet across the three Turing tiers; recommend turing_flash unless evidence justifies escalation - Step 2: filter to user-facing LLM spans before sampling so the math runs against the right denominator; embeds the Create Task UI screenshot - Step 3: sampling-rate cheatsheet keyed to daily LLM volume; sampling_rate + spans_limit combo for hard ceiling on traffic spikes - Step 4: alert-monitor wiring with percentage_change threshold so the only time a human looks at the dashboard is when a real regression fires - 'What you solved' closer with 4 specific pain to solution bullets - 'Eval task API' reference section with verified field names (project, evals, run_type, sampling_rate, spans_limit, filters) and status enum - All field names verified against EvalTaskSerializer and UserAlertMonitorSerializer - Colab + GitHub badges added - Drop all em/en dashes per house style
…injection, jailbreak, and social engineering testing - Outcome-first intro: regression suite of three adversarial classes with prompt_injection, answer_refusal, and is_harmful_advice evals scoring every response, plus a fail list of exact prompts that broke through - 'What is adversarial simulation?' section names the three attack classes (prompt injection, jailbreak, social engineering) with concrete patterns - Step 1: three custom hostile personas (Script Injector, DAN Jailbreaker, Social Engineer) with persona descriptions and behavioural settings - Step 2: scenario auto-generated via Workflow Builder, 30 conversation paths pairing personas with concrete attack situations - Step 3: attach three security evals to the run, with verified required-input signatures (prompt_injection: input only; answer_refusal: input+output; is_harmful_advice: output only) - Step 4: read the fail list and patch the system prompt with a paste-ready security-rules diff - 'What you solved' closer with 4 specific pain to solution bullets - Colab + GitHub badges linked to the matching notebook - Drop all em/en dashes per house style
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request
Description
Describe the changes in this pull request:
Checklist
Related Issues
Closes #<issue_number>