🤖 Machine Learning Model

Agent-guided ML pipeline framework with PyQt6 GUI, experiment tracking, and production-ready model automation.

Overview

Machine Learning Model is a comprehensive, agent-driven ML framework that automates the full machine learning lifecycle — from raw data ingestion to production model deployment — through a 13-step guided pipeline. It targets data scientists, ML engineers, and developers who want structured, reproducible ML workflows without sacrificing flexibility.

The framework pairs a PyQt6 graphical interface with an intelligent ML agent that provides context-aware recommendations at every pipeline stage. Traditional algorithm exploration and AI-guided automation coexist in a single, unified environment.

Important

Agent Mode is the primary workflow entry point. It guides you step-by-step through the entire ML pipeline with automatic state persistence so you can pause and resume at any stage.

(back to top ↑)

Key Features

Icon	Feature	Description	Impact	Status
🤖	ML Agent	AI-powered assistant navigating the 13-step pipeline	Critical	✅ Stable
🖥️	PyQt6 GUI	Interactive workflow navigator with real-time progress	High	✅ Stable
💾	State Persistence	Auto save/load of workflow progress across sessions	High	✅ Stable
📊	Enhanced Results	Execution timing, hyperparameters, smart recommendations	High	✅ Stable
🧪	MLflow Tracking	Experiment logging: params, metrics, feature importances	Medium	✅ Stable
🗃️	DVC Versioning	Reproducible data & model pipelines via DVC	Medium	✅ Stable
🐳	Docker Support	GUI-in-container with X11 forwarding and font rendering	Medium	✅ Stable
⚙️	Hyperparameter Tuning	Automated optimization integrated into pipeline	High	🟡 Beta
📡	Drift Monitoring	Continuous learning and model drift detection	Medium	🟡 Beta

Highlights:

13-step automated pipeline: Data Collection → Preprocessing → EDA → Feature Engineering → Splitting → Algorithm Selection → Training → Evaluation → Tuning → Deployment → Monitoring → Experiment Tracking → Data Versioning
Rich algorithm output: every algorithm run returns execution time, full hyperparameter config, performance category (Excellent/Good/Fair/Poor), and actionable recommendations
Cross-platform: full Linux and Windows support plus basic (currently untested) macOS support, with both shell and batch launchers

(back to top ↑)

Architecture

System Architecture

flowchart TD
    User([👤 User]) --> Entry{Entry Point}
    Entry -->|Agent Mode| Agent[🤖 ML Agent\nml_agent.py]
    Entry -->|GUI Mode| GUI[🖥️ PyQt6 GUI\nmain_window_pyqt6.py]
    Entry -->|CLI Mode| CLI[⌨️ CLI\ncli.py]

    Agent --> Workflow[📋 ML Workflow\nml_workflow.py]
    GUI --> Workflow
    Workflow --> Steps[🔧 Step Implementations\nstep_implementations.py]

    Steps --> DataLayer[📦 Data Layer]
    DataLayer --> Loader[Data Loader]
    DataLayer --> Validator[Data Validator]

    Steps --> Supervised[🌲 Supervised Algorithms]
    Supervised --> DT[Decision Tree]
    Supervised --> RF[Random Forest]
    Supervised --> SKLearn[scikit-learn Suite]

    Steps --> Eval[📈 Evaluation\nMetrics & Reports]
    Steps --> Track[🧪 MLflow Tracking]
    Steps --> DVC[🗃️ DVC Versioning]

    Eval --> Results[EnhancedResult\nTiming + Recommendations]
    Results --> Viz[📊 Visualization\nmatplotlib / plotly]

Component Responsibilities

Component	Location	Responsibility
`ML Agent`	`workflow/ml_agent.py`	Orchestrates pipeline steps, provides context-aware recommendations
`ML Workflow`	`workflow/ml_workflow.py`	State machine managing 13-step progression and persistence
`Step Implementations`	`workflow/step_implementations.py`	Concrete logic for each pipeline stage
`PyQt6 GUI`	`gui/main_window_pyqt6.py`	Interactive dashboard, progress tracking, real-time output
`CLI`	`cli.py`	Typer-based command-line interface
`Supervised`	`supervised/`	Decision Tree, Random Forest with enhanced result output
`Tracking`	`tracking/`	MLflow integration for experiment logging
`Visualization`	`visualization/`	matplotlib, seaborn, plotly chart generation

Note

All pipeline state is automatically serialized to disk so sessions survive crashes or intentional exits. Resume by re-launching — the agent picks up where you left off.
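One way the persistence described in the note could work is sketched below. This is a minimal illustration, not the project's actual code: `WorkflowState`, `save_state`, `load_state`, and the JSON file layout are all hypothetical names chosen for the example. The sketch writes to a temporary file and then renames it, so a crash mid-write should not leave a half-written state file behind.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class WorkflowState:
    """Hypothetical snapshot of pipeline progress (not the project's real schema)."""
    current_step: int = 1
    completed_steps: List[int] = field(default_factory=list)


def save_state(state: WorkflowState, path: Path) -> None:
    """Write the state to a temp file, then swap it into place in one rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(state), handle)
    os.replace(tmp_name, path)  # atomic when source and target share a filesystem


def load_state(path: Path) -> WorkflowState:
    """Return the saved state, or a fresh one if nothing has been saved yet."""
    if not path.exists():
        return WorkflowState()
    with path.open(encoding="utf-8") as handle:
        return WorkflowState(**json.load(handle))
```

On relaunch, calling `load_state` first and continuing from `current_step` would give the resume behavior the note describes.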

(back to top ↑)

Usage Flow

End-to-End Interaction Sequence

sequenceDiagram
    participant Dev as 👤 Developer
    participant GUI as 🖥️ PyQt6 GUI
    participant Agent as 🤖 ML Agent
    participant Pipeline as 📋 Workflow
    participant MLflow as 🧪 MLflow
    participant DVC as 🗃️ DVC

    Dev->>GUI: Launch application
    GUI->>Agent: Initialize agent session
    Agent->>Pipeline: Load or create workflow state
    Pipeline-->>Agent: Current step (e.g. Step 1: Data Collection)
    Agent-->>GUI: Display step + recommendations

    Dev->>GUI: Load dataset
    GUI->>Pipeline: execute_step(data_collection)
    Pipeline->>DVC: Track raw data file
    DVC-->>Pipeline: ✅ data versioned
    Pipeline-->>GUI: Step complete → advance to Step 2

    Dev->>GUI: Run model training (Step 7)
    GUI->>Pipeline: execute_step(model_training)
    Pipeline->>MLflow: log_params(), log_metrics()
    MLflow-->>Pipeline: Run ID logged
    Pipeline-->>GUI: EnhancedResult{timing, metrics, recommendations}
    GUI-->>Dev: Display results + performance category
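The `execute_step(...)` calls in the sequence above suggest a dispatcher that runs one handler per step and then advances. The sketch below shows that idea under stated assumptions: the `Workflow` class, the handler registry, and the step name `eda` are hypothetical, and only `data_collection` and `model_training` are names that actually appear in the diagram.

```python
from typing import Callable, Dict, List

# Only the first three steps are listed; "data_preprocessing" and "eda" are assumed names.
STEP_ORDER: List[str] = ["data_collection", "data_preprocessing", "eda"]


class Workflow:
    """Hypothetical stand-in for the workflow participant in the diagram."""

    def __init__(self, handlers: Dict[str, Callable[[], dict]]) -> None:
        self.handlers = handlers
        self.position = 0  # index of the current step in STEP_ORDER

    @property
    def current_step(self) -> str:
        return STEP_ORDER[self.position]

    def execute_step(self, name: str) -> dict:
        """Run the handler for `name`; reject any step that is not the current one."""
        if name != self.current_step:
            raise ValueError(f"expected {self.current_step!r}, got {name!r}")
        result = self.handlers[name]()
        if self.position < len(STEP_ORDER) - 1:
            self.position += 1  # advance, but stay on the last step once reached
        return result
```

Rejecting out-of-order steps is one possible design choice; a real implementation might instead allow re-running earlier steps.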

(back to top ↑)

Algorithm Coverage

Supported Algorithm Distribution

pie title Algorithm Coverage by Category
    "Supervised Classification" : 40
    "Supervised Regression" : 30
    "Ensemble Methods" : 20
    "Unsupervised (planned)" : 10

Category	Algorithms	Status
Supervised Classification	Decision Tree, Random Forest, SVM, KNN, Logistic Regression	✅ Stable
Supervised Regression	Linear Regression, Decision Tree Regressor, Random Forest Regressor	✅ Stable
Ensemble Methods	Random Forest, XGBoost, LightGBM	✅ Stable
Unsupervised Clustering	K-Means, DBSCAN	⭕ Planned
Neural Networks	scikit-learn MLPClassifier	⭕ Planned

(back to top ↑)

Technology Stack

Technology	Purpose	Why Chosen	Alternatives Considered
Python 3.8–3.12	Core runtime	Ubiquitous ML ecosystem, broad OS support	Julia, R
scikit-learn	ML algorithms	Battle-tested, consistent API, rich estimator library	PyTorch, TensorFlow
XGBoost / LightGBM	Gradient boosting	State-of-the-art tabular performance	CatBoost
PyQt6	Desktop GUI	Native look/feel, rich widget set, Linux/Win/Mac	Tkinter, Dear PyGui
MLflow	Experiment tracking	Self-hostable, rich UI, scikit-learn autolog	Weights & Biases, Neptune
DVC	Data versioning	Git-native, storage-agnostic, pipeline support	LakeFS, Pachyderm
Docker	Containerization	Reproducible GUI environment, CI isolation	Podman
pytest	Testing	Fixture system, coverage plugins, hypothesis	unittest
loguru	Logging	Structured logs, rotation, zero-boilerplate	standard logging
Typer + Rich	CLI	Auto-help generation, colored output	Click, argparse

(back to top ↑)

Setup & Installation

Prerequisites

Python 3.8 – 3.12
Git
Docker (optional, for containerized GUI)
A display server (X11 or Wayland for GUI)

Clone & Install

git clone https://github.com/hkevin01/Machine-Learning-Model.git
cd Machine-Learning-Model

Linux / macOS:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Dev extras: testing, code quality, MLflow, DVC, docs
pip install -r requirements-dev.txt

Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Copy the example environment file and configure as needed:

cp .env.example .env

# .env
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default
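To illustrate how these two settings might be resolved at startup, here is a standard-library sketch. The project lists python-dotenv as a dependency and presumably uses it; the functions `parse_env_text` and `resolve_settings` below are hypothetical and only demonstrate one common precedence order: process environment first, then `.env`, then built-in defaults.

```python
import os
from typing import Dict, Mapping

# Defaults mirror the example .env values shown above.
DEFAULTS = {
    "MLFLOW_TRACKING_URI": "http://localhost:5000",
    "MLFLOW_EXPERIMENT_NAME": "default",
}


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env_text: str, environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Process environment wins over .env contents, which win over defaults."""
    file_values = parse_env_text(env_text)
    return {
        key: environ.get(key, file_values.get(key, default))
        for key, default in DEFAULTS.items()
    }
```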

Verify Setup

python scripts/validate_setup.py

Tip

Run make mlflow-ui after installing dev dependencies to open the MLflow experiment dashboard at http://localhost:5000.

(back to top ↑)

Usage

Option 1 — Agent Mode (Recommended)

# Linux / macOS
./run_agent.sh

# Windows
run_agent.bat

The agent launches an interactive CLI + GUI session and guides you through all 13 pipeline steps.

Option 2 — PyQt6 GUI

# Unified launcher (Docker or local)
./run.sh               # Launch GUI in Docker
./run.sh --local       # Launch GUI natively
./run.sh --headless    # Headless import smoke-test
./run.sh --rebuild     # Force rebuild Docker image
./run.sh --healthcheck # Environment & ML diagnostics

Option 3 — CLI

python -m machine_learning_model --help

Option 4 — Python API

from machine_learning_model.workflow.ml_agent import MLAgent

agent = MLAgent()
agent.run()  # Starts the guided 13-step pipeline

Enhanced Algorithm Output:

from machine_learning_model.supervised.random_forest import run_algorithm

# `spec` is not defined in this snippet; build it as run_algorithm expects before calling.
result = run_algorithm("Random Forest", "classification", spec)
print(f"Execution Time : {result.execution_time:.4f}s")
print(f"Performance    : {result.performance_summary}")   # "Accuracy: 0.934 (Excellent)"
print(f"Recommendations: {result.recommendations}")       # ["Try cross-validation", ...]
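The `execution_time` value printed above is presumably a wall-clock measurement. A minimal way to take such a measurement is sketched here; the helper `timed` is a hypothetical name, and the project's own timing code may differ.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    value = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return value, elapsed


# Stand-in workload; a real call would wrap model fitting instead.
total, seconds = timed(sum, range(1_000_000))
print(f"Execution Time : {seconds:.4f}s")
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and is not affected by system clock adjustments.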

(back to top ↑)

Core Capabilities

🤖 Agent Mode Pipeline

The ML Agent executes a deterministic 13-step workflow. Each step is independently resumable:

#	Step	Description
1	Data Collection	Automated dataset loading and schema validation
2	Data Preprocessing	Cleaning, null handling, encoding, type coercion
3	Exploratory Data Analysis	Automated statistical summary and distribution plots
4	Feature Engineering	Scaling, polynomial features, selection
5	Data Splitting	Stratified train / validation / test splitting
6	Algorithm Selection	Automatic algorithm recommendation based on data profile
7	Model Training	Multi-algorithm training with MLflow logging
8	Model Evaluation	Accuracy, F1, ROC-AUC, R², MSE with visual reports
9	Hyperparameter Tuning	Grid/random search with cross-validation
10	Model Deployment	Pickle + ONNX export, production-ready persistence
11	Monitoring	Drift detection and continuous learning hooks
12	Experiment Tracking	MLflow run comparison and artifact logging
13	Data Versioning	DVC pipeline for fully reproducible data & model history
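Step 5 in the table mentions stratified train / validation / test splitting. The sketch below shows what stratification means using only the standard library; `stratified_split` and its default 70/15/15 fractions are assumptions made for illustration. In practice the project may rely on scikit-learn's `train_test_split` with its `stratify` argument instead.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    fractions: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, validation, test) index lists that keep per-class proportions."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test
```

Splitting each class separately is what keeps rare classes represented in all three subsets.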

📊 Enhanced Algorithm Results

Every algorithm execution returns an EnhancedResult object:

from dataclasses import dataclass
from typing import List

@dataclass
class EnhancedResult:
    execution_time: float          # Precise wall-clock timing, in seconds
    model_params: dict             # Full hyperparameter configuration
    performance_summary: str       # "Accuracy: 0.934 (Excellent)"
    recommendations: List[str]     # Context-aware next-step suggestions (List keeps Python 3.8 support)
    extended_metrics: dict         # AUC, F1-macro, confusion matrix, etc.
    model_insights: dict           # Algorithm-specific info (feature importances, etc.)

Warning

Performance categories (Excellent/Good/Fair/Poor) are heuristic thresholds. Always validate against your domain's acceptable error bounds before deployment.
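To make the warning concrete, here is a sketch of how a threshold-based category could be derived. The cut-off values below are illustrative assumptions, not the project's documented thresholds, and `categorize` and `performance_summary` are hypothetical function names.

```python
from typing import List, Tuple

# Illustrative cut-offs only; the project's real thresholds are not stated in this README.
THRESHOLDS: List[Tuple[float, str]] = [
    (0.90, "Excellent"),
    (0.80, "Good"),
    (0.70, "Fair"),
]


def categorize(score: float) -> str:
    """Map a 0-1 score to a category using the first threshold it meets."""
    for minimum, label in THRESHOLDS:
        if score >= minimum:
            return label
    return "Poor"


def performance_summary(metric: str, score: float) -> str:
    """Format a summary string in the style shown for EnhancedResult."""
    return f"{metric}: {score:.3f} ({categorize(score)})"
```

Because fixed cut-offs like these ignore class imbalance and domain cost, a score labeled "Excellent" can still be unacceptable for a given application.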

🖥️ PyQt6 GUI Features

Real-time pipeline step progress tracker
Side-by-side algorithm comparison panel
Integrated log viewer with severity filtering
Decision boundary and feature importance charts
Keyboard shortcuts for power users:
- Press Ctrl+R to run the current pipeline step
- Press Ctrl+N to advance to the next step
- Press Ctrl+S to save workflow state

(back to top ↑)

Experiment Tracking

MLflow is integrated into Steps 7–9 of the pipeline. Enable it with:

pip install -r requirements-dev.txt
make mlflow-ui        # Opens http://localhost:5000

Configure in .env:

MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

When enabled, all built-in algorithms automatically log:

Hyperparameters (log_params)
Evaluation metrics (log_metrics)
Feature importances (log_artifact)
Trained model artifacts (mlflow.sklearn.log_model)

(back to top ↑)

Data Versioning

A minimal DVC pipeline is defined in dvc.yaml with two stages: prepare and train.

pip install -r requirements-dev.txt
make dvc-init
dvc repro              # Executes the full pipeline

Add a remote storage backend (optional):

dvc remote add -d origin <remote-url>   # S3, GCS, SSH, local path
dvc push

(back to top ↑)

Roadmap

gantt
    title Machine Learning Model — Roadmap
    dateFormat  YYYY-MM-DD
    section Foundation
        Core pipeline & agent mode    :done,    f1, 2025-01-01, 2025-06-01
        PyQt6 GUI                     :done,    f2, 2025-04-01, 2025-08-01
        MLflow + DVC integration      :done,    f3, 2025-06-01, 2025-10-01
    section Enhancement
        Enhanced algorithm results    :done,    e1, 2025-09-01, 2026-01-01
        Docker GUI support            :done,    e2, 2025-11-01, 2026-02-01
        Hyperparameter tuning engine  :active,  e3, 2026-01-01, 2026-05-01
    section Expansion
        Unsupervised algorithms       :         x1, 2026-05-01, 2026-08-01
        Neural network support        :         x2, 2026-06-01, 2026-10-01
        REST API / model serving      :         x3, 2026-08-01, 2026-12-01

Phase	Goals	Target	Status
Foundation	Core pipeline, Agent Mode, PyQt6 GUI	Q2 2025	✅ Complete
Enhancement	Enhanced results, Docker, MLflow/DVC	Q1 2026	✅ Complete
Tuning	Hyperparameter engine, drift monitoring	Q2 2026	🟡 In Progress
Expansion	Unsupervised algorithms, neural nets	Q3 2026	⭕ Planned
Serving	REST API, model serving, cloud export	Q4 2026	⭕ Planned

(back to top ↑)

Development Status

Version	Stability	Test Coverage	Known Limitations
0.1.0	Alpha	Growing	macOS untested, neural nets planned

Testing

# Run full test suite
python -m pytest tests/ -v

# With coverage report
python -m pytest tests/ --cov=src/machine_learning_model --cov-report=html

# Cross-platform compatibility
python -m pytest tests/test_platform_compatibility.py -v

# Linux / macOS convenience script
./scripts/run_comprehensive_tests.sh

Development Tools:

Tool	Purpose
`pytest` + `pytest-cov`	Test runner and coverage
`black`	Code formatting
`isort`	Import ordering
`flake8`	Linting
`mypy`	Static type checking
`ruff`	Fast linting
`pre-commit`	Git hook automation
`commitizen`	Conventional commits

(back to top ↑)

Platform Support

Platform	Support Level	Notes
✅ Linux (Ubuntu 18.04+)	Full	Primary development target
✅ Windows 10/11	Full	Batch scripts provided
⚠️ macOS	Basic	Untested — use Linux scripts

Contributing

Fork the repository
Create a feature branch: git checkout -b feat/my-feature
Commit using conventional commits: git commit -m "feat: add new algorithm"
Ensure tests pass: ./scripts/run_comprehensive_tests.sh
Open a Pull Request

📋 Detailed Contribution Guidelines

Code Style

Formatter: black — run black src/ tests/ before committing
Imports: isort — run isort src/ tests/
Linting: flake8 src/ tests/
Type hints: all public functions must have type annotations

Testing Requirements

New features require unit tests in tests/
Bug fixes require a regression test
Run pytest tests/ --cov=src/machine_learning_model and ensure coverage does not decrease

Branch Naming

Type	Pattern	Example
Feature	`feat/*`	`feat/add-kmeans`
Bug fix	`fix/*`	`fix/workflow-resume`
Documentation	`docs/*`	`docs/update-readme`
Chore	`chore/*`	`chore/bump-deps`

Commit Format

Follow Conventional Commits:

feat(agent): add drift detection to monitoring step
fix(gui): resolve PyQt6 thread crash on large datasets
docs(readme): add mermaid architecture diagram
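A commit subject in this format can be checked mechanically. The project lists commitizen for that purpose; the regex below is only a simplified sketch, and both `COMMIT_PATTERN` and `is_conventional` are hypothetical names. It accepts just the four types from the branch naming table, whereas Conventional Commits tooling usually allows more.

```python
import re

# Types taken from the branch table (feat, fix, docs, chore); real setups often allow more.
COMMIT_PATTERN = re.compile(r"^(feat|fix|docs|chore)(\([a-z0-9-]+\))?: \S.*$")


def is_conventional(message: str) -> bool:
    """Check only the first line (the subject) of the commit message."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None
```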

🐳 Docker Development Workflow

# Build GUI image
docker build -f Dockerfile.gui -t ml-model-gui .

# Run with X11 forwarding (Linux)
docker run -e DISPLAY=$DISPLAY \
           -v /tmp/.X11-unix:/tmp/.X11-unix \
           ml-model-gui

# Use docker-compose
docker-compose up

📦 Full Dependency List

Core (requirements.txt)

numpy, pandas — data manipulation
scikit-learn, xgboost, lightgbm — ML algorithms
matplotlib, seaborn, plotly — visualization
PyQt6 — desktop GUI
loguru — structured logging
python-dotenv — environment management
pydantic — data validation
typer, rich, click — CLI

Dev (requirements-dev.txt)

pytest, pytest-cov, hypothesis — testing
black, isort, flake8, ruff, mypy — code quality
pre-commit, commitizen — git automation
mlflow — experiment tracking
dvc — data versioning
mkdocs — documentation site

(back to top ↑)

License

This project is licensed under the MIT License — you are free to use, modify, and distribute it with attribution. See the LICENSE file for full terms.

Built with ❤️ by hkevin01

Report Bug · Request Feature
