Agent-guided ML pipeline framework with PyQt6 GUI, experiment tracking, and production-ready model automation.
- Overview
- Key Features
- Architecture
- Usage Flow
- Algorithm Coverage
- Technology Stack
- Setup & Installation
- Usage
- Core Capabilities
- Experiment Tracking
- Data Versioning
- Roadmap
- Development Status
- Contributing
- License
Machine Learning Model is a comprehensive, agent-driven ML framework that automates the full machine learning lifecycle, from raw data ingestion to production model deployment, through a 13-step guided pipeline. It targets data scientists, ML engineers, and developers who want structured, reproducible ML workflows without sacrificing flexibility.
The framework pairs a PyQt6 graphical interface with an intelligent ML agent that provides context-aware recommendations at every pipeline stage. Traditional algorithm exploration and AI-guided automation coexist in a single, unified environment.
Important
Agent Mode is the primary workflow entry point. It guides you step-by-step through the entire ML pipeline with automatic state persistence so you can pause and resume at any stage.
| Feature | Description | Impact | Status |
|---|---|---|---|
| ML Agent | AI-powered assistant navigating the 13-step pipeline | Critical | ✅ Stable |
| PyQt6 GUI | Interactive workflow navigator with real-time progress | High | ✅ Stable |
| State Persistence | Auto save/load of workflow progress across sessions | High | ✅ Stable |
| Enhanced Results | Execution timing, hyperparameters, smart recommendations | High | ✅ Stable |
| MLflow Tracking | Experiment logging: params, metrics, feature importances | Medium | ✅ Stable |
| DVC Versioning | Reproducible data & model pipelines via DVC | Medium | ✅ Stable |
| Docker Support | GUI-in-container with X11 forwarding and font rendering | Medium | ✅ Stable |
| Hyperparameter Tuning | Automated optimization integrated into pipeline | High | 🟡 Beta |
| Drift Monitoring | Continuous learning and model drift detection | Medium | 🟡 Beta |
Highlights:
- 13-step automated pipeline: Data Collection → Preprocessing → EDA → Feature Engineering → Splitting → Algorithm Selection → Training → Evaluation → Tuning → Deployment → Monitoring → Experiment Tracking → Data Versioning
- Rich algorithm output: every algorithm run returns execution time, full hyperparameter config, performance category (Excellent/Good/Fair/Poor), and actionable recommendations
- Cross-platform: Linux, Windows, and basic macOS support with both shell and batch launchers
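The execution time and hyperparameter configuration mentioned in the highlights above could be captured with a thin wrapper like the one below. This is a minimal sketch, not the project's implementation: `timed_run`, `RunInfo`, and `toy_fit` are hypothetical names introduced for illustration, and the framework's own `EnhancedResult` may be populated differently.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class RunInfo:
    """Hypothetical container; not the project's EnhancedResult."""
    output: Any
    execution_time: float
    params: Dict[str, Any]


def timed_run(fn: Callable[..., Any], **params: Any) -> RunInfo:
    """Call fn(**params) and record wall-clock time plus the parameters used."""
    start = time.perf_counter()
    output = fn(**params)
    elapsed = time.perf_counter() - start
    return RunInfo(output=output, execution_time=elapsed, params=dict(params))


def toy_fit(n_estimators: int = 10, max_depth: int = 3) -> int:
    """Stand-in for a training routine; returns a dummy value."""
    return n_estimators * max_depth


info = timed_run(toy_fit, n_estimators=100, max_depth=5)
print(f"Execution Time : {info.execution_time:.4f}s")
print(f"Parameters     : {info.params}")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock, which makes it better suited to measuring elapsed time than `time.time()`.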
flowchart TD
User([User]) --> Entry{Entry Point}
Entry -->|Agent Mode| Agent[ML Agent\nml_agent.py]
Entry -->|GUI Mode| GUI[PyQt6 GUI\nmain_window_pyqt6.py]
Entry -->|CLI Mode| CLI[CLI\ncli.py]
Agent --> Workflow[ML Workflow\nml_workflow.py]
GUI --> Workflow
Workflow --> Steps[Step Implementations\nstep_implementations.py]
Steps --> DataLayer[Data Layer]
DataLayer --> Loader[Data Loader]
DataLayer --> Validator[Data Validator]
Steps --> Supervised[Supervised Algorithms]
Supervised --> DT[Decision Tree]
Supervised --> RF[Random Forest]
Supervised --> SKLearn[scikit-learn Suite]
Steps --> Eval[Evaluation\nMetrics & Reports]
Steps --> Track[MLflow Tracking]
Steps --> DVC[DVC Versioning]
Eval --> Results[EnhancedResult\nTiming + Recommendations]
Results --> Viz[Visualization\nmatplotlib / plotly]
| Component | Location | Responsibility |
|---|---|---|
| ML Agent | `workflow/ml_agent.py` | Orchestrates pipeline steps, provides context-aware recommendations |
| ML Workflow | `workflow/ml_workflow.py` | State machine managing 13-step progression and persistence |
| Step Implementations | `workflow/step_implementations.py` | Concrete logic for each pipeline stage |
| PyQt6 GUI | `gui/main_window_pyqt6.py` | Interactive dashboard, progress tracking, real-time output |
| CLI | `cli.py` | Typer-based command-line interface |
| Supervised | `supervised/` | Decision Tree, Random Forest with enhanced result output |
| Tracking | `tracking/` | MLflow integration for experiment logging |
| Visualization | `visualization/` | matplotlib, seaborn, plotly chart generation |
Note
All pipeline state is automatically serialized to disk, so sessions survive crashes and intentional exits. To resume, relaunch the application; the agent picks up where you left off.
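One way such save-and-resume behavior can work is sketched below. The README does not show the actual serialization format, so everything here is an assumption: the JSON layout, the `workflow_state.json` file name, and the `save_state` / `load_state` helpers are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_state(state: Dict[str, Any], path: Path) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    A crash mid-write then leaves the previous state file intact.
    """
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)  # atomic when both paths share a filesystem


def load_state(path: Path) -> Dict[str, Any]:
    """Return the saved state, or a fresh default if no state file exists."""
    if not path.exists():
        return {"current_step": 1, "completed_steps": []}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as workdir:
    state_file = Path(workdir) / "workflow_state.json"  # hypothetical name
    state = load_state(state_file)      # first launch: default state
    state["completed_steps"].append(1)  # pretend Step 1 finished
    state["current_step"] = 2
    save_state(state, state_file)
    resumed = load_state(state_file)    # simulated relaunch
    print(resumed["current_step"])
```

Writing to a temporary file and renaming it is a common pattern for crash-safe persistence, since readers never observe a half-written file.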
sequenceDiagram
participant Dev as Developer
participant GUI as PyQt6 GUI
participant Agent as ML Agent
participant Pipeline as Workflow
participant MLflow as MLflow
participant DVC as DVC
Dev->>GUI: Launch application
GUI->>Agent: Initialize agent session
Agent->>Pipeline: Load or create workflow state
Pipeline-->>Agent: Current step (e.g. Step 1: Data Collection)
Agent-->>GUI: Display step + recommendations
Dev->>GUI: Load dataset
GUI->>Pipeline: execute_step(data_collection)
Pipeline->>DVC: Track raw data file
DVC-->>Pipeline: ✅ data versioned
Pipeline-->>GUI: Step complete, advance to Step 2
Dev->>GUI: Run model training (Step 7)
GUI->>Pipeline: execute_step(model_training)
Pipeline->>MLflow: log_params(), log_metrics()
MLflow-->>Pipeline: Run ID logged
Pipeline-->>GUI: EnhancedResult{timing, metrics, recommendations}
GUI-->>Dev: Display results + performance category
pie title Algorithm Coverage by Category
"Supervised Classification" : 40
"Supervised Regression" : 30
"Ensemble Methods" : 20
"Unsupervised (planned)" : 10
| Category | Algorithms | Status |
|---|---|---|
| Supervised Classification | Decision Tree, Random Forest, SVM, KNN, Logistic Regression | ✅ Stable |
| Supervised Regression | Linear Regression, Decision Tree Regressor, Random Forest Regressor | ✅ Stable |
| Ensemble Methods | Random Forest, XGBoost, LightGBM | ✅ Stable |
| Unsupervised Clustering | K-Means, DBSCAN | 🟡 Planned |
| Neural Networks | scikit-learn MLPClassifier | 🟡 Planned |
| Technology | Purpose | Why Chosen | Alternatives Considered |
|---|---|---|---|
| Python 3.8+ | Core runtime | Ubiquitous ML ecosystem, broad OS support | Julia, R |
| scikit-learn | ML algorithms | Battle-tested, consistent API, rich estimator library | PyTorch, TensorFlow |
| XGBoost / LightGBM | Gradient boosting | State-of-the-art tabular performance | CatBoost |
| PyQt6 | Desktop GUI | Native look/feel, rich widget set, Linux/Win/Mac | Tkinter, Dear PyGui |
| MLflow | Experiment tracking | Self-hostable, rich UI, scikit-learn autolog | Weights & Biases, Neptune |
| DVC | Data versioning | Git-native, storage-agnostic, pipeline support | LakeFS, Pachyderm |
| Docker | Containerization | Reproducible GUI environment, CI isolation | Podman |
| pytest | Testing | Fixture system, coverage plugins, hypothesis | unittest |
| loguru | Logging | Structured logs, rotation, zero-boilerplate | standard logging |
| Typer + Rich | CLI | Auto-help generation, colored output | Click, argparse |
- Python 3.8 to 3.12
- Git
- Docker (optional, for containerized GUI)
- A display server (X11 or Wayland for GUI)
git clone https://github.com/hkevin01/Machine-Learning-Model.git
cd Machine-Learning-Model

Linux / macOS:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Dev + ML + viz extras
pip install -r requirements-dev.txt

Windows:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Copy the example environment file and configure as needed:
cp .env.example .env
# .env
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

Validate the setup:
python scripts/validate_setup.py

Tip
Run make mlflow-ui after installing dev dependencies to open the MLflow experiment dashboard at http://localhost:5000.
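For reference, the two `.env` variables above can be read from the process environment as sketched here. This is an illustration under stated assumptions: `read_mlflow_settings` is a hypothetical helper, the fallback values simply mirror the example `.env`, and the README does not show how the project itself loads the file (`python-dotenv` is listed among the core dependencies, but this sketch reads only the environment).

```python
import os
from typing import Dict, Mapping


def read_mlflow_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the MLflow settings, falling back to the example .env values."""
    return {
        "tracking_uri": env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        "experiment_name": env.get("MLFLOW_EXPERIMENT_NAME", "default"),
    }


# Passing an explicit mapping keeps the example independent of the real environment.
settings = read_mlflow_settings({"MLFLOW_TRACKING_URI": "http://tracking.internal:5000"})
print(settings)
```

Accepting the environment as a parameter makes the helper easy to test without mutating `os.environ`.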
# Linux / macOS
./run_agent.sh
# Windows
run_agent.bat

The agent launches an interactive CLI + GUI session and guides you through all 13 pipeline steps.
# Unified launcher (Docker or local)
./run.sh # Launch GUI in Docker
./run.sh --local # Launch GUI natively
./run.sh --headless # Headless import smoke-test
./run.sh --rebuild # Force rebuild Docker image
./run.sh --healthcheck # Environment & ML diagnostics

Command-line help:
python -m machine_learning_model --help

Python API:
from machine_learning_model.workflow.ml_agent import MLAgent
agent = MLAgent()
agent.run() # Starts the guided 13-step pipeline

Enhanced Algorithm Output:
from machine_learning_model.supervised.random_forest import run_algorithm
# `spec` is the run specification; its construction is not shown in this README
result = run_algorithm("Random Forest", "classification", spec)
print(f"Execution Time : {result.execution_time:.4f}s")
print(f"Performance : {result.performance_summary}") # "Accuracy: 0.934 (Excellent)"
print(f"Recommendations: {result.recommendations}") # ["Try cross-validation", ...]

The ML Agent executes a deterministic 13-step workflow. Each step is independently resumable:
| # | Step | Description |
|---|---|---|
| 1 | Data Collection | Automated dataset loading and schema validation |
| 2 | Data Preprocessing | Cleaning, null handling, encoding, type coercion |
| 3 | Exploratory Data Analysis | Automated statistical summary and distribution plots |
| 4 | Feature Engineering | Scaling, polynomial features, selection |
| 5 | Data Splitting | Stratified train / validation / test splitting |
| 6 | Algorithm Selection | Automatic algorithm recommendation based on data profile |
| 7 | Model Training | Multi-algorithm training with MLflow logging |
| 8 | Model Evaluation | Accuracy, F1, ROC-AUC, R², MSE with visual reports |
| 9 | Hyperparameter Tuning | Grid/random search with cross-validation |
| 10 | Model Deployment | Pickle + ONNX export, production-ready persistence |
| 11 | Monitoring | Drift detection and continuous learning hooks |
| 12 | Experiment Tracking | MLflow run comparison and artifact logging |
| 13 | Data Versioning | DVC pipeline for fully reproducible data & model history |
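The fixed ordering in the table above can be modeled as an enumeration with a simple "advance" rule. The sketch below is illustrative only: `PipelineStep` and `next_step` are hypothetical names, and the project's real state machine in `workflow/ml_workflow.py` may be structured differently.

```python
from enum import IntEnum
from typing import Optional


class PipelineStep(IntEnum):
    """The 13 steps in table order; member names are assumptions."""
    DATA_COLLECTION = 1
    DATA_PREPROCESSING = 2
    EXPLORATORY_DATA_ANALYSIS = 3
    FEATURE_ENGINEERING = 4
    DATA_SPLITTING = 5
    ALGORITHM_SELECTION = 6
    MODEL_TRAINING = 7
    MODEL_EVALUATION = 8
    HYPERPARAMETER_TUNING = 9
    MODEL_DEPLOYMENT = 10
    MONITORING = 11
    EXPERIMENT_TRACKING = 12
    DATA_VERSIONING = 13


def next_step(current: PipelineStep) -> Optional[PipelineStep]:
    """Return the step that follows `current`, or None after the final step."""
    if current is PipelineStep.DATA_VERSIONING:
        return None
    return PipelineStep(current + 1)


step: Optional[PipelineStep] = PipelineStep.MODEL_TRAINING
while step is not None:
    print(f"{step.value:>2}  {step.name}")
    step = next_step(step)
```

Because each step is identified by a plain integer, resuming amounts to looking up the stored step number and continuing from there.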
Every algorithm execution returns an EnhancedResult object:
from dataclasses import dataclass
from typing import List

@dataclass
class EnhancedResult:
    execution_time: float        # Precise wall-clock timing
    model_params: dict           # Full hyperparameter configuration
    performance_summary: str     # "Accuracy: 0.934 (Excellent)"
    recommendations: List[str]   # Context-aware next-step suggestions (List[...] keeps Python 3.8 compatibility)
    extended_metrics: dict       # AUC, F1-macro, confusion matrix, etc.
    model_insights: dict         # Algorithm-specific info (feature importances, etc.)

Warning
Performance categories (Excellent/Good/Fair/Poor) are heuristic thresholds. Always validate against your domain's acceptable error bounds before deployment.
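To make the idea of heuristic thresholds concrete, here is a minimal sketch of how a score might be mapped to a category. The cut-off values (0.90, 0.80, 0.70) are assumptions chosen for illustration; the README does not state the thresholds the framework actually uses, and `performance_category` and `summarize` are hypothetical helpers.

```python
def performance_category(score: float) -> str:
    """Map a score in [0, 1] to a label using assumed, illustrative cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= 0.90:
        return "Excellent"
    if score >= 0.80:
        return "Good"
    if score >= 0.70:
        return "Fair"
    return "Poor"


def summarize(metric_name: str, score: float) -> str:
    """Format a summary string in the style shown in this README."""
    return f"{metric_name}: {score:.3f} ({performance_category(score)})"


print(summarize("Accuracy", 0.934))  # Accuracy: 0.934 (Excellent)
print(summarize("Accuracy", 0.712))  # Accuracy: 0.712 (Fair)
```

Fixed cut-offs like these ignore class imbalance and the cost of errors in a given domain, which is why the categories should be treated as a rough guide only.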
- Real-time pipeline step progress tracker
- Side-by-side algorithm comparison panel
- Integrated log viewer with severity filtering
- Decision boundary and feature importance charts
- Keyboard shortcuts for power users:
  - Press Ctrl+R to run the current pipeline step
  - Press Ctrl+N to advance to the next step
  - Press Ctrl+S to save workflow state
MLflow is integrated into Steps 7 to 9 of the pipeline. Enable it with:
pip install -r requirements-dev.txt
make mlflow-ui # Opens http://localhost:5000

Configure in .env:
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

When enabled, all built-in algorithms automatically log:
- Hyperparameters (`log_params`)
- Evaluation metrics (`log_metrics`)
- Feature importances (`log_artifact`)
- Trained model artifacts (`mlflow.sklearn.log_model`)
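The first two items in the list above can be illustrated with a small sketch. It is not the project's tracking code: `log_run` and `RecordingTracker` are hypothetical names, and the tracker is passed in as a parameter so the example runs without MLflow installed. The only MLflow behavior assumed is that `log_params` and `log_metrics` each accept a dictionary.

```python
from typing import Any, Dict


def log_run(tracker: Any, params: Dict[str, Any], metrics: Dict[str, float]) -> None:
    """Send one run's hyperparameters and metrics to an MLflow-like tracker."""
    tracker.log_params(params)
    tracker.log_metrics(metrics)


class RecordingTracker:
    """In-memory stand-in exposing the two calls used by log_run."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: Dict[str, float] = {}

    def log_params(self, params: Dict[str, Any]) -> None:
        self.params.update(params)

    def log_metrics(self, metrics: Dict[str, float]) -> None:
        self.metrics.update(metrics)


tracker = RecordingTracker()
log_run(tracker, {"n_estimators": 100, "max_depth": 5}, {"accuracy": 0.934})
print(tracker.params)
print(tracker.metrics)
```

With MLflow installed, passing the `mlflow` module itself as `tracker` inside a `with mlflow.start_run():` block should behave the same way, since `mlflow.log_params` and `mlflow.log_metrics` both take dictionaries.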
A minimal DVC pipeline is defined in dvc.yaml with two stages: prepare and train.
pip install -r requirements-dev.txt
make dvc-init
dvc repro # Executes the full pipeline

Add a remote storage backend (optional):
dvc remote add -d origin <remote-url> # S3, GCS, SSH, local path
dvc push

gantt
title Machine Learning Model Roadmap
dateFormat YYYY-MM-DD
section Foundation
Core pipeline & agent mode :done, f1, 2025-01-01, 2025-06-01
PyQt6 GUI :done, f2, 2025-04-01, 2025-08-01
MLflow + DVC integration :done, f3, 2025-06-01, 2025-10-01
section Enhancement
Enhanced algorithm results :done, e1, 2025-09-01, 2026-01-01
Docker GUI support :done, e2, 2025-11-01, 2026-02-01
Hyperparameter tuning engine :active, e3, 2026-01-01, 2026-05-01
section Expansion
Unsupervised algorithms : x1, 2026-05-01, 2026-08-01
Neural network support : x2, 2026-06-01, 2026-10-01
REST API / model serving : x3, 2026-08-01, 2026-12-01
| Phase | Goals | Target | Status |
|---|---|---|---|
| Foundation | Core pipeline, Agent Mode, PyQt6 GUI | Q2 2025 | ✅ Complete |
| Enhancement | Enhanced results, Docker, MLflow/DVC | Q1 2026 | ✅ Complete |
| Tuning | Hyperparameter engine, drift monitoring | Q2 2026 | 🟡 In Progress |
| Expansion | Unsupervised algorithms, neural nets | Q3 2026 | Planned |
| Serving | REST API, model serving, cloud export | Q4 2026 | Planned |
| Version | Stability | Test Coverage | Known Limitations |
|---|---|---|---|
| 0.1.0 | Alpha | Growing | macOS untested, neural nets planned |
# Run full test suite
python -m pytest tests/ -v
# With coverage report
python -m pytest tests/ --cov=src/machine_learning_model --cov-report=html
# Cross-platform compatibility
python -m pytest tests/test_platform_compatibility.py -v
# Linux / macOS convenience script
./scripts/run_comprehensive_tests.sh

Development Tools:
| Tool | Purpose |
|---|---|
| `pytest` + `pytest-cov` | Test runner and coverage |
| `black` | Code formatting |
| `isort` | Import ordering |
| `flake8` | Linting |
| `mypy` | Static type checking |
| `ruff` | Fast linting |
| `pre-commit` | Git hook automation |
| `commitizen` | Conventional commits |
| Platform | Support Level | Notes |
|---|---|---|
| ✅ Linux (Ubuntu 18.04+) | Full | Primary development target |
| ✅ Windows 10/11 | Full | Batch scripts provided |
| macOS | Basic | Untested; use the Linux scripts |
- Fork the repository
- Create a feature branch: `git checkout -b feat/my-feature`
- Commit using conventional commits: `git commit -m "feat: add new algorithm"`
- Ensure tests pass: `./scripts/run_comprehensive_tests.sh`
- Open a Pull Request

Detailed Contribution Guidelines
- Formatter: `black` (run `black src/ tests/` before committing)
- Imports: `isort` (run `isort src/ tests/`)
- Linting: `flake8 src/ tests/`
- Type hints: all public functions must have type annotations
- New features require unit tests in `tests/`
- Bug fixes require a regression test
- Run `pytest tests/ --cov=src/machine_learning_model` and ensure coverage does not decrease
| Type | Pattern | Example |
|---|---|---|
| Feature | `feat/*` | `feat/add-kmeans` |
| Bug fix | `fix/*` | `fix/workflow-resume` |
| Documentation | `docs/*` | `docs/update-readme` |
| Chore | `chore/*` | `chore/bump-deps` |
Follow Conventional Commits:
feat(agent): add drift detection to monitoring step
fix(gui): resolve PyQt6 thread crash on large datasets
docs(readme): add mermaid architecture diagram

Docker Development Workflow
# Build GUI image
docker build -f Dockerfile.gui -t ml-model-gui .
# Run with X11 forwarding (Linux)
docker run -e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix \
ml-model-gui
# Use docker-compose
docker-compose up

Full Dependency List
Core (`requirements.txt`)
- `numpy`, `pandas`: data manipulation
- `scikit-learn`, `xgboost`, `lightgbm`: ML algorithms
- `matplotlib`, `seaborn`, `plotly`: visualization
- `PyQt6`: desktop GUI
- `loguru`: structured logging
- `python-dotenv`: environment management
- `pydantic`: data validation
- `typer`, `rich`, `click`: CLI

Dev (`requirements-dev.txt`)
- `pytest`, `pytest-cov`, `hypothesis`: testing
- `black`, `isort`, `flake8`, `ruff`, `mypy`: code quality
- `pre-commit`, `commitizen`: git automation
- `mlflow`: experiment tracking
- `dvc`: data versioning
- `mkdocs`: documentation site
This project is licensed under the MIT License: you are free to use, modify, and distribute it with attribution. See the LICENSE file for full terms.
Built with ❤️ by hkevin01