
hkevin01/Machine-Learning-Model


🤖 Machine Learning Model

Agent-guided ML pipeline framework with PyQt6 GUI, experiment tracking, and production-ready model automation.





Overview

Machine Learning Model is an agent-driven ML framework that automates the full machine learning lifecycle, from raw data ingestion to production model deployment, through a 13-step guided pipeline. It targets data scientists, ML engineers, and developers who want structured, reproducible ML workflows without sacrificing flexibility.

The framework pairs a PyQt6 graphical interface with an intelligent ML agent that provides context-aware recommendations at every pipeline stage. Traditional algorithm exploration and AI-guided automation coexist in a single, unified environment.

Important

Agent Mode is the primary workflow entry point. It guides you step-by-step through the entire ML pipeline with automatic state persistence so you can pause and resume at any stage.



Key Features

| Icon | Feature | Description | Impact | Status |
|------|---------|-------------|--------|--------|
| 🤖 | ML Agent | AI-powered assistant navigating the 13-step pipeline | Critical | ✅ Stable |
| 🖥️ | PyQt6 GUI | Interactive workflow navigator with real-time progress | High | ✅ Stable |
| 💾 | State Persistence | Auto save/load of workflow progress across sessions | High | ✅ Stable |
| 📊 | Enhanced Results | Execution timing, hyperparameters, smart recommendations | High | ✅ Stable |
| 🧪 | MLflow Tracking | Experiment logging: params, metrics, feature importances | Medium | ✅ Stable |
| 🗃️ | DVC Versioning | Reproducible data & model pipelines via DVC | Medium | ✅ Stable |
| 🐳 | Docker Support | GUI-in-container with X11 forwarding and font rendering | Medium | ✅ Stable |
| ⚙️ | Hyperparameter Tuning | Automated optimization integrated into pipeline | High | 🟡 Beta |
| 📡 | Drift Monitoring | Continuous learning and model drift detection | Medium | 🟡 Beta |

Highlights:

  • 13-step automated pipeline: Data Collection → Preprocessing → EDA → Feature Engineering → Splitting → Algorithm Selection → Training → Evaluation → Tuning → Deployment → Monitoring → Experiment Tracking → Data Versioning
  • Rich algorithm output: every algorithm run returns execution time, full hyperparameter config, performance category (Excellent/Good/Fair/Poor), and actionable recommendations
  • Cross-platform: Linux, Windows, and basic macOS support with both shell and batch launchers
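The step order in the first highlight can be modelled as an ordered enumeration. The following is a minimal sketch, not the project's actual `ml_workflow.py`; the names `PipelineStep` and `next_step` are hypothetical and exist only for illustration.

```python
from enum import IntEnum
from typing import Optional


class PipelineStep(IntEnum):
    """The 13 pipeline stages in the order the README lists them (names are illustrative)."""
    DATA_COLLECTION = 1
    PREPROCESSING = 2
    EDA = 3
    FEATURE_ENGINEERING = 4
    SPLITTING = 5
    ALGORITHM_SELECTION = 6
    TRAINING = 7
    EVALUATION = 8
    TUNING = 9
    DEPLOYMENT = 10
    MONITORING = 11
    EXPERIMENT_TRACKING = 12
    DATA_VERSIONING = 13


def next_step(current: PipelineStep) -> Optional[PipelineStep]:
    """Return the stage that follows `current`, or None after the final stage."""
    if current is PipelineStep.DATA_VERSIONING:
        return None
    return PipelineStep(current + 1)
```

Using an `IntEnum` keeps the step numbers comparable and sortable, which is convenient when a saved step number has to be mapped back to a stage.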



Architecture

System Architecture

flowchart TD
    User([👤 User]) --> Entry{Entry Point}
    Entry -->|Agent Mode| Agent[🤖 ML Agent\nml_agent.py]
    Entry -->|GUI Mode| GUI[🖥️ PyQt6 GUI\nmain_window_pyqt6.py]
    Entry -->|CLI Mode| CLI[⌨️ CLI\ncli.py]

    Agent --> Workflow[📋 ML Workflow\nml_workflow.py]
    GUI --> Workflow
    Workflow --> Steps[🔧 Step Implementations\nstep_implementations.py]

    Steps --> DataLayer[📦 Data Layer]
    DataLayer --> Loader[Data Loader]
    DataLayer --> Validator[Data Validator]

    Steps --> Supervised[🌲 Supervised Algorithms]
    Supervised --> DT[Decision Tree]
    Supervised --> RF[Random Forest]
    Supervised --> SKLearn[scikit-learn Suite]

    Steps --> Eval[📈 Evaluation\nMetrics & Reports]
    Steps --> Track[🧪 MLflow Tracking]
    Steps --> DVC[🗃️ DVC Versioning]

    Eval --> Results[EnhancedResult\nTiming + Recommendations]
    Results --> Viz[📊 Visualization\nmatplotlib / plotly]

Component Responsibilities

| Component | Location | Responsibility |
|-----------|----------|----------------|
| ML Agent | workflow/ml_agent.py | Orchestrates pipeline steps, provides context-aware recommendations |
| ML Workflow | workflow/ml_workflow.py | State machine managing 13-step progression and persistence |
| Step Implementations | workflow/step_implementations.py | Concrete logic for each pipeline stage |
| PyQt6 GUI | gui/main_window_pyqt6.py | Interactive dashboard, progress tracking, real-time output |
| CLI | cli.py | Typer-based command-line interface |
| Supervised | supervised/ | Decision Tree, Random Forest with enhanced result output |
| Tracking | tracking/ | MLflow integration for experiment logging |
| Visualization | visualization/ | matplotlib, seaborn, plotly chart generation |

Note

All pipeline state is automatically serialized to disk, so sessions survive crashes or intentional exits. To resume, relaunch the application; the agent picks up where you left off.
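How the state is serialized is not shown in this README. As a rough illustration of crash-safe persistence, the sketch below writes a small state object to JSON through a temporary file and a rename, so a crash mid-write does not leave a truncated file. The `WorkflowState` fields, the function names, and the JSON format are all assumptions, not the project's actual implementation.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class WorkflowState:
    """Hypothetical snapshot of pipeline progress."""
    current_step: int = 1
    completed_steps: List[int] = field(default_factory=list)


def save_state(state: WorkflowState, path: Path) -> None:
    """Dump the state to a temp file in the target directory, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(asdict(state), handle)
    os.replace(tmp_name, path)  # the rename is atomic on POSIX filesystems


def load_state(path: Path) -> WorkflowState:
    """Return the saved state, or a fresh one when no file exists yet."""
    if not path.exists():
        return WorkflowState()
    with path.open(encoding="utf-8") as handle:
        return WorkflowState(**json.load(handle))
```

Calling `load_state` at startup and `save_state` after each completed step would give the pause-and-resume behaviour described above.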



Usage Flow

End-to-End Interaction Sequence

sequenceDiagram
    participant Dev as 👤 Developer
    participant GUI as 🖥️ PyQt6 GUI
    participant Agent as 🤖 ML Agent
    participant Pipeline as 📋 Workflow
    participant MLflow as 🧪 MLflow
    participant DVC as 🗃️ DVC

    Dev->>GUI: Launch application
    GUI->>Agent: Initialize agent session
    Agent->>Pipeline: Load or create workflow state
    Pipeline-->>Agent: Current step (e.g. Step 1: Data Collection)
    Agent-->>GUI: Display step + recommendations

    Dev->>GUI: Load dataset
    GUI->>Pipeline: execute_step(data_collection)
    Pipeline->>DVC: Track raw data file
    DVC-->>Pipeline: ✅ data versioned
    Pipeline-->>GUI: Step complete → advance to Step 2

    Dev->>GUI: Run model training (Step 7)
    GUI->>Pipeline: execute_step(model_training)
    Pipeline->>MLflow: log_params(), log_metrics()
    MLflow-->>Pipeline: Run ID logged
    Pipeline-->>GUI: EnhancedResult{timing, metrics, recommendations}
    GUI-->>Dev: Display results + performance category



Algorithm Coverage

Supported Algorithm Distribution

pie title Algorithm Coverage by Category
    "Supervised Classification" : 40
    "Supervised Regression" : 30
    "Ensemble Methods" : 20
    "Unsupervised (planned)" : 10

| Category | Algorithms | Status |
|----------|------------|--------|
| Supervised Classification | Decision Tree, Random Forest, SVM, KNN, Logistic Regression | ✅ Stable |
| Supervised Regression | Linear Regression, Decision Tree Regressor, Random Forest Regressor | ✅ Stable |
| Ensemble Methods | Random Forest, XGBoost, LightGBM | ✅ Stable |
| Unsupervised Clustering | K-Means, DBSCAN | 🟡 Planned |
| Neural Networks | scikit-learn MLPClassifier | 🟡 Planned |



Technology Stack

| Technology | Purpose | Why Chosen | Alternatives Considered |
|------------|---------|------------|-------------------------|
| Python 3.8+ | Core runtime | Ubiquitous ML ecosystem, broad OS support | Julia, R |
| scikit-learn | ML algorithms | Battle-tested, consistent API, rich estimator library | PyTorch, TensorFlow |
| XGBoost / LightGBM | Gradient boosting | State-of-the-art tabular performance | CatBoost |
| PyQt6 | Desktop GUI | Native look/feel, rich widget set, Linux/Win/Mac | Tkinter, Dear PyGui |
| MLflow | Experiment tracking | Self-hostable, rich UI, scikit-learn autolog | Weights & Biases, Neptune |
| DVC | Data versioning | Git-native, storage-agnostic, pipeline support | LakeFS, Pachyderm |
| Docker | Containerization | Reproducible GUI environment, CI isolation | Podman |
| pytest | Testing | Fixture system, coverage plugins, hypothesis | unittest |
| loguru | Logging | Structured logs, rotation, zero-boilerplate | standard logging |
| Typer + Rich | CLI | Auto-help generation, colored output | Click, argparse |



Setup & Installation

Prerequisites

  • Python 3.8 to 3.12
  • Git
  • Docker (optional, for containerized GUI)
  • A display server (X11 or Wayland for GUI)

Clone & Install

git clone https://github.com/hkevin01/Machine-Learning-Model.git
cd Machine-Learning-Model

Linux / macOS:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Dev + ML + viz extras
pip install -r requirements-dev.txt

Windows:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Environment Variables

Copy the example environment file and configure as needed:

cp .env.example .env
# .env
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default
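As an illustration of how these two variables might be consumed, the sketch below reads them from the process environment and falls back to the sample values shown above. The `TrackingSettings` class and `load_tracking_settings` function are hypothetical; the project lists python-dotenv among its dependencies, which could populate the environment from `.env` first, but this sketch uses only the standard library.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TrackingSettings:
    """Hypothetical container for the MLflow-related settings."""
    tracking_uri: str
    experiment_name: str


def load_tracking_settings(env: Mapping[str, str] = os.environ) -> TrackingSettings:
    """Read the two MLflow variables, falling back to the values in the sample .env."""
    return TrackingSettings(
        tracking_uri=env.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        experiment_name=env.get("MLFLOW_EXPERIMENT_NAME", "default"),
    )
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.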

Verify Setup

python scripts/validate_setup.py

Tip

Run make mlflow-ui after installing dev dependencies to open the MLflow experiment dashboard at http://localhost:5000.



Usage

Option 1: Agent Mode (Recommended)

# Linux / macOS
./run_agent.sh

# Windows
run_agent.bat

The agent launches an interactive CLI + GUI session and guides you through all 13 pipeline steps.

Option 2: PyQt6 GUI

# Unified launcher (Docker or local)
./run.sh               # Launch GUI in Docker
./run.sh --local       # Launch GUI natively
./run.sh --headless    # Headless import smoke-test
./run.sh --rebuild     # Force rebuild Docker image
./run.sh --healthcheck # Environment & ML diagnostics

Option 3: CLI

python -m machine_learning_model --help

Option 4: Python API

from machine_learning_model.workflow.ml_agent import MLAgent

agent = MLAgent()
agent.run()  # Starts the guided 13-step pipeline

Enhanced Algorithm Output:

from machine_learning_model.supervised.random_forest import run_algorithm

# NOTE: `spec` must be defined by the caller; its structure is not shown in this README.
result = run_algorithm("Random Forest", "classification", spec)
print(f"Execution Time : {result.execution_time:.4f}s")
print(f"Performance    : {result.performance_summary}")   # "Accuracy: 0.934 (Excellent)"
print(f"Recommendations: {result.recommendations}")       # ["Try cross-validation", ...]
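How `execution_time` is measured is not documented here. One common way to capture wall-clock timing in Python is `time.perf_counter`; the helper `timed_call` below is a hypothetical sketch of that approach, not the project's code.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run `func` and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed_call(sum, range(1_000_000))
    print(f"Execution Time : {seconds:.4f}s")
```

`perf_counter` is monotonic, so the measured interval is not affected by system clock adjustments during the run.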



Core Capabilities

🤖 Agent Mode Pipeline

The ML Agent executes a deterministic 13-step workflow. Each step is independently resumable:

| # | Step | Description |
|---|------|-------------|
| 1 | Data Collection | Automated dataset loading and schema validation |
| 2 | Data Preprocessing | Cleaning, null handling, encoding, type coercion |
| 3 | Exploratory Data Analysis | Automated statistical summary and distribution plots |
| 4 | Feature Engineering | Scaling, polynomial features, selection |
| 5 | Data Splitting | Stratified train / validation / test splitting |
| 6 | Algorithm Selection | Automatic algorithm recommendation based on data profile |
| 7 | Model Training | Multi-algorithm training with MLflow logging |
| 8 | Model Evaluation | Accuracy, F1, ROC-AUC, R², MSE with visual reports |
| 9 | Hyperparameter Tuning | Grid/random search with cross-validation |
| 10 | Model Deployment | Pickle + ONNX export, production-ready persistence |
| 11 | Monitoring | Drift detection and continuous learning hooks |
| 12 | Experiment Tracking | MLflow run comparison and artifact logging |
| 13 | Data Versioning | DVC pipeline for fully reproducible data & model history |
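Step 5 mentions stratified train / validation / test splitting without showing how it is done. The project presumably relies on scikit-learn for this; the standard-library sketch below only illustrates the idea of splitting each class separately so that class proportions are preserved. The function name `stratified_split` and the default 70/15/15 fractions are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable],
    val_fraction: float = 0.15,
    test_fraction: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, val, test) index lists that keep each class's share roughly equal."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)  # randomize within the class before slicing
        n_val = round(len(indices) * val_fraction)
        n_test = round(len(indices) * test_fraction)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test
```

The function returns indices rather than data so it can be applied to any container of features and labels.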

📊 Enhanced Algorithm Results

Every algorithm execution returns an EnhancedResult object:

from dataclasses import dataclass
from typing import List  # `list[str]` in a dataclass field fails on Python 3.8


@dataclass
class EnhancedResult:
    execution_time: float          # Precise wall-clock timing
    model_params: dict             # Full hyperparameter configuration
    performance_summary: str       # "Accuracy: 0.934 (Excellent)"
    recommendations: List[str]     # Context-aware next-step suggestions
    extended_metrics: dict         # AUC, F1-macro, confusion matrix, etc.
    model_insights: dict           # Algorithm-specific info (feature importances, etc.)

Warning

Performance categories (Excellent/Good/Fair/Poor) are heuristic thresholds. Always validate against your domain's acceptable error bounds before deployment.
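To make the idea of heuristic thresholds concrete, the sketch below maps a score to one of the four categories and formats a summary string in the style shown for `performance_summary`. The cut-off values (0.90, 0.80, 0.70) and the function names are assumptions chosen for illustration; the project's actual thresholds are not documented in this README.

```python
from typing import Sequence, Tuple

# Assumed cut-offs for illustration only; the project's real thresholds may differ.
DEFAULT_THRESHOLDS: Sequence[Tuple[float, str]] = (
    (0.90, "Excellent"),
    (0.80, "Good"),
    (0.70, "Fair"),
)


def categorize(score: float, thresholds: Sequence[Tuple[float, str]] = DEFAULT_THRESHOLDS) -> str:
    """Map a higher-is-better score in [0, 1] to a label; below every cut-off is 'Poor'."""
    for minimum, label in thresholds:
        if score >= minimum:
            return label
    return "Poor"


def performance_summary(metric_name: str, score: float) -> str:
    """Format a summary line such as 'Accuracy: 0.934 (Excellent)'."""
    return f"{metric_name}: {score:.3f} ({categorize(score)})"
```

Because the thresholds are a parameter, a stricter domain can pass its own cut-offs instead of relying on the defaults.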

🖥️ PyQt6 GUI Features

  • Real-time pipeline step progress tracker
  • Side-by-side algorithm comparison panel
  • Integrated log viewer with severity filtering
  • Decision boundary and feature importance charts
  • Keyboard shortcuts for power users:
    • Press Ctrl+R to run the current pipeline step
    • Press Ctrl+N to advance to the next step
    • Press Ctrl+S to save workflow state



Experiment Tracking

MLflow is integrated into Steps 7 to 9 of the pipeline. Enable it with:

pip install -r requirements-dev.txt
make mlflow-ui        # Opens http://localhost:5000

Configure in .env:

MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=default

When enabled, all built-in algorithms automatically log:

  • Hyperparameters (log_params)
  • Evaluation metrics (log_metrics)
  • Feature importances (log_artifact)
  • Trained model artifacts (mlflow.sklearn.log_model)
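The logging calls listed above could be combined roughly as follows. This is a sketch under assumptions, not the project's tracking module: the function names are hypothetical, feature importances are written with `mlflow.log_dict` as a convenient stand-in for the `log_artifact` call named in the list, and the positional `"model"` argument to `mlflow.sklearn.log_model` may trigger a deprecation warning on recent MLflow releases.

```python
from typing import Any, Dict, Mapping


def numeric_metrics(metrics: Mapping[str, Any]) -> Dict[str, float]:
    """Keep only values MLflow can store as metrics (plain numbers), converted to float."""
    return {
        name: float(value)
        for name, value in metrics.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def log_training_run(
    model: Any,
    params: Mapping[str, Any],
    metrics: Mapping[str, Any],
    feature_importances: Mapping[str, float],
) -> str:
    """Log one training run to the active MLflow experiment and return its run ID."""
    import mlflow  # imported lazily so numeric_metrics works without MLflow installed
    import mlflow.sklearn

    with mlflow.start_run() as run:
        mlflow.log_params(dict(params))
        mlflow.log_metrics(numeric_metrics(metrics))
        mlflow.log_dict(dict(feature_importances), "feature_importances.json")
        mlflow.sklearn.log_model(model, "model")
        return run.info.run_id
```

Filtering metrics first avoids errors when a results dictionary also holds non-scalar entries such as a confusion matrix.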



Data Versioning

A minimal DVC pipeline is defined in dvc.yaml with two stages: prepare and train.

pip install -r requirements-dev.txt
make dvc-init
dvc repro              # Executes the full pipeline

Add a remote storage backend (optional):

dvc remote add -d origin <remote-url>   # S3, GCS, SSH, local path
dvc push
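DVC identifies tracked data by content hash (MD5 by default), which is what lets `dvc repro` skip stages whose inputs have not changed. The sketch below only illustrates that idea with the standard library; `file_md5` is a hypothetical helper, and DVC's own implementation differs in its details.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes produce the same digest regardless of name or timestamp, so comparing digests is enough to decide whether a dataset has changed.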



Roadmap

gantt
    title Machine Learning Model - Roadmap
    dateFormat  YYYY-MM-DD
    section Foundation
        Core pipeline & agent mode    :done,    f1, 2025-01-01, 2025-06-01
        PyQt6 GUI                     :done,    f2, 2025-04-01, 2025-08-01
        MLflow + DVC integration      :done,    f3, 2025-06-01, 2025-10-01
    section Enhancement
        Enhanced algorithm results    :done,    e1, 2025-09-01, 2026-01-01
        Docker GUI support            :done,    e2, 2025-11-01, 2026-02-01
        Hyperparameter tuning engine  :active,  e3, 2026-01-01, 2026-05-01
    section Expansion
        Unsupervised algorithms       :         x1, 2026-05-01, 2026-08-01
        Neural network support        :         x2, 2026-06-01, 2026-10-01
        REST API / model serving      :         x3, 2026-08-01, 2026-12-01

| Phase | Goals | Target | Status |
|-------|-------|--------|--------|
| Foundation | Core pipeline, Agent Mode, PyQt6 GUI | Q2 2025 | ✅ Complete |
| Enhancement | Enhanced results, Docker, MLflow/DVC | Q1 2026 | ✅ Complete |
| Tuning | Hyperparameter engine, drift monitoring | Q2 2026 | 🟡 In Progress |
| Expansion | Unsupervised algorithms, neural nets | Q3 2026 | ⭕ Planned |
| Serving | REST API, model serving, cloud export | Q4 2026 | ⭕ Planned |



Development Status

| Version | Stability | Test Coverage | Known Limitations |
|---------|-----------|---------------|-------------------|
| 0.1.0 | Alpha | Growing | macOS untested, neural nets planned |

Testing

# Run full test suite
python -m pytest tests/ -v

# With coverage report
python -m pytest tests/ --cov=src/machine_learning_model --cov-report=html

# Cross-platform compatibility
python -m pytest tests/test_platform_compatibility.py -v

# Linux / macOS convenience script
./scripts/run_comprehensive_tests.sh

Development Tools:

| Tool | Purpose |
|------|---------|
| pytest + pytest-cov | Test runner and coverage |
| black | Code formatting |
| isort | Import ordering |
| flake8 | Linting |
| mypy | Static type checking |
| ruff | Fast linting |
| pre-commit | Git hook automation |
| commitizen | Conventional commits |



Platform Support

| Platform | Support Level | Notes |
|----------|---------------|-------|
| ✅ Linux (Ubuntu 18.04+) | Full | Primary development target |
| ✅ Windows 10/11 | Full | Batch scripts provided |
| ⚠️ macOS | Basic | Untested; use the Linux scripts |

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feat/my-feature
  3. Commit using conventional commits: git commit -m "feat: add new algorithm"
  4. Ensure tests pass: ./scripts/run_comprehensive_tests.sh
  5. Open a Pull Request

📋 Detailed Contribution Guidelines

Code Style

  • Formatter: black (run black src/ tests/ before committing)
  • Imports: isort (run isort src/ tests/)
  • Linting: flake8 src/ tests/
  • Type hints: all public functions must have type annotations

Testing Requirements

  • New features require unit tests in tests/
  • Bug fixes require a regression test
  • Run pytest tests/ --cov=src/machine_learning_model and ensure coverage does not decrease

Branch Naming

| Type | Pattern | Example |
|------|---------|---------|
| Feature | feat/* | feat/add-kmeans |
| Bug fix | fix/* | fix/workflow-resume |
| Documentation | docs/* | docs/update-readme |
| Chore | chore/* | chore/bump-deps |

Commit Format

Follow Conventional Commits:

feat(agent): add drift detection to monitoring step
fix(gui): resolve PyQt6 thread crash on large datasets
docs(readme): add mermaid architecture diagram

🐳 Docker Development Workflow

# Build GUI image
docker build -f Dockerfile.gui -t ml-model-gui .

# Run with X11 forwarding (Linux)
docker run -e DISPLAY=$DISPLAY \
           -v /tmp/.X11-unix:/tmp/.X11-unix \
           ml-model-gui

# Use docker-compose
docker-compose up

📦 Full Dependency List

Core (requirements.txt)

  • numpy, pandas: data manipulation
  • scikit-learn, xgboost, lightgbm: ML algorithms
  • matplotlib, seaborn, plotly: visualization
  • PyQt6: desktop GUI
  • loguru: structured logging
  • python-dotenv: environment management
  • pydantic: data validation
  • typer, rich, click: CLI

Dev (requirements-dev.txt)

  • pytest, pytest-cov, hypothesis: testing
  • black, isort, flake8, ruff, mypy: code quality
  • pre-commit, commitizen: git automation
  • mlflow: experiment tracking
  • dvc: data versioning
  • mkdocs: documentation site



License

This project is licensed under the MIT License. You are free to use, modify, and distribute it with attribution. See the LICENSE file for full terms.


Built with ❤️ by hkevin01
