[Hackathon] feat: Workflow Snippets + Quick Steps — Reusable Operator Bundles & One-Click Automation ⚡#5117

Open
EmilySun621 wants to merge 5 commits into
apache:mainfrom
EmilySun621:hackathon/snippets-quicksteps

😤 The Problem

Every ML project starts the same way: drag CSV Source, add Missing Value Handler, add Train/Test Split, add Model, add Evaluation. Every. Single. Time. And when you want to run + generate a report? That's 3 separate manual steps.

✨ The Solution

Snippets: select operators → save as a bundle → drag the bundle onto any future workflow. All operators + connections appear at once.

Quick Steps: pre-defined action sequences. One click = multiple steps executed automatically.


🧩 Snippets

Save from Canvas

Select 2+ operators on canvas → Right-click → "Save as Snippet"
→ Name it → All operators + connections saved as a reusable bundle
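
The save step above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `SnippetOperator`, `SnippetLink`, and `Snippet` shapes and the `saveAsSnippet` name are assumptions, since the description does not show the stored schema. The point is that only connections with both endpoints inside the selection travel with the bundle.

```typescript
// Hypothetical shapes; the PR description does not show the real snippet schema.
interface SnippetOperator {
  id: string;
  operatorType: string;
}

interface SnippetLink {
  sourceId: string;
  targetId: string;
}

interface Snippet {
  name: string;
  operators: SnippetOperator[];
  links: SnippetLink[];
}

// Build a snippet from the selected operators, keeping only the links
// whose two endpoints are both inside the selection.
function saveAsSnippet(
  name: string,
  selectedIds: string[],
  operators: SnippetOperator[],
  links: SnippetLink[]
): Snippet {
  if (selectedIds.length < 2) {
    throw new Error("Select at least 2 operators to save a snippet");
  }
  const selected = new Set(selectedIds);
  return {
    name,
    operators: operators.filter(op => selected.has(op.id)),
    links: links.filter(l => selected.has(l.sourceId) && selected.has(l.targetId)),
  };
}
```

Links that cross the selection boundary are dropped in this sketch, so the saved bundle never references an operator it does not contain.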

Create from Operator Catalog

Snippets page → "+ Create Snippet" → Browse operator catalog
→ Click operators to add → They chain automatically → Save
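
One plausible reading of "they chain automatically" is that each operator is linked to the one added before it. The sketch below assumes that behavior; `ChainLink` and `chainOperators` are hypothetical names, and operators with multiple ports are not handled.

```typescript
// Hypothetical link shape for illustration only.
interface ChainLink {
  sourceId: string;
  targetId: string;
}

// Connect operators in the order they were added: 1 → 2 → 3 → ...
function chainOperators(operatorIds: string[]): ChainLink[] {
  const links: ChainLink[] = [];
  for (let i = 0; i + 1 < operatorIds.length; i++) {
    links.push({ sourceId: operatorIds[i], targetId: operatorIds[i + 1] });
  }
  return links;
}
```

For example, `chainOperators(["Filter", "Distinct", "TypeCasting"])` yields two links, matching the Filter → Distinct → TypeCasting layout of the Data Cleaning Kit.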

Drag to Reuse

Operator panel → Snippets section (bottom) → Drag "Classification Bundle"
→ Split + Logistic Regression + SklearnTesting (evaluation) appear connected on canvas
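
Dropping a bundle onto the canvas presumably requires giving each operator a fresh ID (so the same snippet can be reused many times) and remapping the stored connections to those IDs. The sketch below shows that idea under assumed types; `StoredSnippet`, `PlacedOperator`, `instantiateSnippet`, and the relative `dx`/`dy` offsets are all hypothetical.

```typescript
// Hypothetical stored form: positions are offsets relative to the drop point.
interface StoredOperator {
  id: string;
  operatorType: string;
  dx: number;
  dy: number;
}

interface StoredLink {
  sourceId: string;
  targetId: string;
}

interface StoredSnippet {
  operators: StoredOperator[];
  links: StoredLink[];
}

interface PlacedOperator {
  id: string;
  operatorType: string;
  x: number;
  y: number;
}

interface PlacedSnippet {
  operators: PlacedOperator[];
  links: StoredLink[];
}

// Copy a snippet onto the canvas with fresh operator IDs and remapped links.
function instantiateSnippet(
  snippet: StoredSnippet,
  dropX: number,
  dropY: number,
  newId: (operatorType: string) => string
): PlacedSnippet {
  const idMap = new Map<string, string>();
  const operators = snippet.operators.map(op => {
    const id = newId(op.operatorType);
    idMap.set(op.id, id);
    return { id, operatorType: op.operatorType, x: dropX + op.dx, y: dropY + op.dy };
  });
  const links = snippet.links.map(l => {
    const sourceId = idMap.get(l.sourceId);
    const targetId = idMap.get(l.targetId);
    if (sourceId === undefined || targetId === undefined) {
      throw new Error("Link references an operator outside the snippet");
    }
    return { sourceId, targetId };
  });
  return { operators, links };
}
```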

3 Built-in Snippets

| Snippet | Operators | Use case |
| -- | -- | -- |
| 🧹 Data Cleaning Kit | Filter → Distinct → TypeCasting | Clean any dataset |
| 🧠 Classification Bundle | Split → LogisticRegression → SklearnTesting | Quick ML pipeline |
| 📊 EDA Starter | CSVFileScan → Aggregate → ScatterMatrixChart | Explore any dataset |

⚡ Quick Steps

Real Execution

Quick Steps run real actions rather than simulating them:

  • "Run and Report" programmatically triggers the Run button, waits for execution to complete, then sends a real message to the AI agent requesting a comprehensive analysis report
  • "Clean and Profile" opens the actual Data Profiling panel
  • Progress panel shows live status: ✅ Step 1 → ⏳ Step 2 → ⬜ Step 3
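
The sequential execution and live status described above could look roughly like the following. This is a sketch under assumptions: `QuickStepAction`, `StepStatus`, and `runQuickStep` are hypothetical names, and the real actions (triggering Run, messaging the agent, opening the profiling panel) are stood in for by arbitrary async callbacks.

```typescript
type StepStatus = "pending" | "running" | "done" | "failed";

// Hypothetical action shape: a label for the progress panel plus async work.
interface QuickStepAction {
  label: string;
  run: () => Promise<void>;
}

// Run actions one after another, reporting a status snapshot after each change.
// If a step fails, the remaining steps stay "pending".
async function runQuickStep(
  actions: QuickStepAction[],
  onProgress: (statuses: StepStatus[]) => void
): Promise<StepStatus[]> {
  const statuses: StepStatus[] = actions.map((): StepStatus => "pending");
  for (let i = 0; i < actions.length; i++) {
    statuses[i] = "running";
    onProgress([...statuses]);
    try {
      await actions[i].run();
      statuses[i] = "done";
    } catch (err) {
      statuses[i] = "failed";
      onProgress([...statuses]);
      break;
    }
    onProgress([...statuses]);
  }
  return statuses;
}
```

In this sketch each step waits for the previous one to resolve, which is what lets "Run and Report" defer the report request until execution has finished.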

Create Custom Quick Steps

  • "+ Create Quick Step" in the dropdown
  • Pick from available actions, set order
  • Save for future use

🎬 Demo
Snippets:

Open operator panel → scroll to "📦 Snippets" section
Drag "Classification Bundle" onto canvas
→ Split + Logistic Regression + SklearnTesting appear connected ✅
Select 2 operators on canvas → right-click → "Save as Snippet"
New snippet appears in the panel

Quick Steps:

Open a workflow with data → click "⚡ Quick Steps" in toolbar
Click "🚀 Run and Report"
Progress: Run workflow ✅ → Generate report ✅ → Report ready ✅
Agent produces real analysis in chat / Results Dashboard

Emily Sun and others added 5 commits May 15, 2026 21:55
This bundles the feature work that built up on this branch:

- Custom agents: dashboard CRUD page and editor dialog (48px icon tile,
  chip-style guardrails, model selector). Each custom agent now carries a
  LiteLLM model_name (Opus 4.7 / Haiku 4.5) that is passed through to the
  agent-service so different agents can use different models.

- Conversation history is scoped per (workflowId, agentId): switching
  agent or workflow yields a different conversation list. localStorage
  key: texera.workflowConversations.v1.{workflowId}.{agentId}.

- Time machine: workflow snapshot list, revert, and agent-tagged
  checkpoints. New workflow-history-tool in agent-service backs the
  "undo my last change" flow; amber gains a WorkflowSnapshotResource;
  sql/updates/23.sql adds the snapshot table.

- Operator-aware custom-agent prompts: the system prompt now injects the
  full operator catalog with a "prefer built-in operators over Python
  UDFs" rule, sourced from WorkflowSystemMetadata at request time.

- LiteLLM: added the claude-opus-4.7 entry alongside claude-haiku-4.5
  and gpt-5-mini in bin/litellm-config.yaml.

- Agent panel rewritten around the (conversation list / chat) two-view
  model with subscription-managed list reloads and per-step persistence.
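
As an illustration of the per-(workflowId, agentId) scoping above, the storage key can be derived with a small helper. The key format comes from the commit message; the function name and the ID types are assumptions.

```typescript
// Key format from the commit message:
// texera.workflowConversations.v1.{workflowId}.{agentId}
// The parameter types are assumed; the real IDs may be typed differently.
function conversationStorageKey(workflowId: number | string, agentId: string): string {
  return `texera.workflowConversations.v1.${workflowId}.${agentId}`;
}
```

Because both IDs are part of the key, switching either the workflow or the agent reads a different entry, which matches the "different conversation list" behavior described.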

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, role detection

Adds a Data Profiling Panel triggered from data-source operator properties
(CSV/JSON/Parquet/FileScan). The panel surfaces three derived views on top of
a single profile response — no new backend calls:

  - Data Quality Score (0–100): completeness, duplicates, outliers, constant
    columns, high-cardinality categoricals, and class-imbalance penalties,
    with a colored progress bar and sub-score badges.
  - Auto-Suggest Cleaning Actions: severity-sorted rules (drop sparse/ID/
    constant cols, impute via median/mode, deduplicate, review outliers) with
    an Add-to-Workflow button that copies an operator hint to the clipboard.
  - Column Relationship Detector: heuristic ID/target/feature/datetime/
    constant classification with badges per column and an auto-detected
    summary section.
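
To make the Data Quality Score concrete, here is a minimal penalty-based sketch. The commit message names the penalty categories but not how they are weighted, so the weights, the reduced set of inputs, and the `qualityScore` name below are purely illustrative assumptions.

```typescript
// Each field is a fraction in the range 0..1.
interface QualityInputs {
  missingFraction: number;        // share of missing cells
  duplicateFraction: number;      // share of duplicate rows
  outlierFraction: number;        // share of outlier values
  constantColumnFraction: number; // share of constant columns
}

// Illustrative weights only; the PR does not state the real ones.
function qualityScore(q: QualityInputs): number {
  const penalty =
    40 * q.missingFraction +
    30 * q.duplicateFraction +
    15 * q.outlierFraction +
    15 * q.constantColumnFraction;
  // Clamp to the 0..100 range shown by the progress bar.
  return Math.max(0, Math.min(100, Math.round(100 - penalty)));
}
```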

Wires a small "📊 Profile Data" button into the operator property editor that
opens the panel as a draggable modal seeded with the operator's file path.
Backend integration is intentionally a follow-up; the service ships a
deterministic mock so the UX is fully exercised.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ble rule

Adds a console.debug so we can see what operatorType is on the selected
operator (helps when the rule doesn't match an unexpected name). Also
broadens the profileable regex to include Text/File so anything that looks
remotely like a data source shows the button.
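
A sketch of what such a broadened rule might look like follows. The exact regex is not shown in the commit, so `PROFILEABLE_PATTERN` and `isProfileable` are assumptions built from the operator names mentioned in these messages.

```typescript
// Assumed pattern: the original data-source names plus the broadened Text/File.
// Note that "file" alone already matches "FileScan"; it is listed for readability.
const PROFILEABLE_PATTERN = /csv|json|parquet|filescan|text|file/i;

function isProfileable(operatorType: string): boolean {
  return PROFILEABLE_PATTERN.test(operatorType);
}
```

A pattern this permissive trades precision for coverage: it will also match any non-source operator whose type name happens to contain "text" or "file".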

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DataProfilingService now fetches the actual dataset file via
DatasetService.retrieveDatasetVersionSingleFile (presign-download endpoint),
parses with papaparse (first 5000 rows for performance), and runs a new
pure-TS profiler that computes:

  - dtype inference per column (numeric / datetime / boolean / categorical / text)
  - per-column: count, missing, missingPercent, unique, plus dtype-specific stats
  - numeric: mean, median, std, min, max, ±3σ outlier count, 10-bin histogram
  - categorical/boolean: top-5 value counts
  - dataset-level: row-key duplicate count
  - Pearson correlation matrix across (up to 8) numeric columns
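
The numeric statistics and correlation listed above can be sketched in plain TypeScript as follows. This is not the PR's profiler: the function names are hypothetical, and details the commit does not state (population rather than sample standard deviation, equal-width bins, returning 0 for an undefined correlation) are assumptions.

```typescript
interface NumericStats {
  mean: number;
  median: number;
  std: number;
  min: number;
  max: number;
  outliers: number;    // values more than 3 standard deviations from the mean
  histogram: number[]; // 10 equal-width bins between min and max
}

function profileNumeric(values: number[]): NumericStats {
  if (values.length === 0) throw new Error("No values to profile");
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population standard deviation (assumption).
  const std = Math.sqrt(values.reduce((s, v) => s + (v - mean) * (v - mean), 0) / n);
  const min = sorted[0];
  const max = sorted[n - 1];
  const outliers = std === 0 ? 0 : values.filter(v => Math.abs(v - mean) > 3 * std).length;
  const histogram = new Array<number>(10).fill(0);
  const width = (max - min) / 10;
  for (const v of values) {
    // The maximum value falls into the last bin rather than an 11th one.
    const bin = width === 0 ? 0 : Math.min(9, Math.floor((v - min) / width));
    histogram[bin] += 1;
  }
  return { mean, median, std, min, max, outliers, histogram };
}

// Pearson correlation of two equally long numeric columns.
function pearson(xs: number[], ys: number[]): number {
  const n = Math.min(xs.length, ys.length);
  if (n === 0) return 0;
  let mx = 0;
  let my = 0;
  for (let i = 0; i < n; i++) {
    mx += xs[i];
    my += ys[i];
  }
  mx /= n;
  my /= n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) * (xs[i] - mx);
    syy += (ys[i] - my) * (ys[i] - my);
  }
  const denom = Math.sqrt(sxx * syy);
  return denom === 0 ? 0 : sxy / denom; // 0 when either column is constant
}
```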

If the source isn't a dataset path or any step fails (fetch / parse / empty
headers), we fall back to the deterministic mock so the panel always renders.
The panel header now shows a short filename (full path on hover) and surfaces
fetch/parse errors inline.
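
The fallback behavior described above amounts to wrapping the real pipeline in a try/catch and substituting the mock on any failure. A minimal sketch, with `ProfileResult` and `profileWithFallback` as hypothetical names and the fetch/parse pipeline abstracted into a callback:

```typescript
// Hypothetical result shape; the real profile response has more fields.
interface ProfileResult {
  source: "real" | "mock";
  columns: string[];
  error?: string; // surfaced inline in the panel header
}

// Try the real fetch + parse + profile pipeline; on any failure
// (including empty headers) return the mock with the error attached.
async function profileWithFallback(
  loadRealProfile: () => Promise<ProfileResult>,
  mockProfile: () => ProfileResult
): Promise<ProfileResult> {
  try {
    const real = await loadRealProfile();
    if (real.columns.length === 0) {
      throw new Error("empty headers");
    }
    return real;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ...mockProfile(), source: "mock", error: message };
  }
}
```

Carrying the error message alongside the mock data is what would let the panel render while still telling the user the real profile could not be produced.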

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…report generation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions bot added the engine, ddl-change, frontend, dev, common, and agent-service labels on May 17, 2026