Phyriad

A high-performance C++23 runtime built on stigmergic dispatch. Workers coordinate like ant colonies — no work-stealing, no locks in the hot path, every decision traced. The same primitives extend to any domain where N agents share observable state.


📚 Quick read: human-written documents

BEEP BOOP HUMANS KNOW HOW TO WRITE. Read them; I think they are worth it.

| Document | What it covers |
|---|---|
| docs/EVOLUCION.md | How Phyriad was born. Short. |
| docs/POR_QUE_AYAMA.md | Why I built Ayama: my 7950X3D, the old games, the Compaq from my childhood. |
| docs/COMO_TRABAJO_CON_LLM.md | My method for working with LLMs. A manifesto. What works and what does not. |
| docs/EXPERIMENTAL_BENCHMARKS.md | The numbers. With a disclaimer about validation. |
| docs/example_plans/AYAMA_IMPLEMENTATION_STRATEGIES.md | A complete example (1900+ lines) of a pre-code implementation plan. |

Phyriad is my personal framework, free for anyone who wants to use it. It is the foundation for my current and future implementations and projects: a performance flex and a toolbox for whatever I might want to build.

Everything below this section is AI-generated, which does not mean it is false. You are free to run the tests yourselves; I am keeping these numbers and will work with them. When an implementation needs those numbers validated, I will verify them manually and update them.

Much of the code, if not all of it, is AI-generated. Not carelessly, of course: I followed a very strict regimen during implementation. The idea behind the framework has been in the works for years, but the arrival of LLMs finally let me test and implement it, thanks to the powerful workforce they provide.

For now there are 23 fundamental pillars that make up this framework, each implemented, I repeat, from rigorous implementation plans. I attached one as an example in docs/example_plans/AYAMA_IMPLEMENTATION_STRATEGIES.md in case you want to use it as a reference for your own implementations.

To be honest, many of these implementations take years of real study to understand fully. This is where AI and LLMs come in. I am not trying to belittle the theory; I am trying to make it accessible thanks to this new tool.

These text fragments are written by me, Swately, the developer. I deliberately keep the AI away from them so the message I want to convey does not get lost under the polish the AI adds; that way, at the very least, the humanity is not lost in a repo this "polished" and seemingly conjured out of nowhere.

This project is a piece of my heart and I have poured a lot of love into it <3.

From here on, the text is BEEP BOOP FIRE FIRE 🤖🤖🤖


Implementation plan example (one picked at random)

Below is an excerpt from the Ayama implementation plan as it existed in the project's Gamma era (when the framework still went by that name). It is the kind of document that precedes every pillar: constraints, concrete technical decisions, patterns to apply. I include it to show the "strict regimen" I mention above. The full file (1900+ lines) is at docs/example_plans/AYAMA_IMPLEMENTATION_STRATEGIES.md.

Note: this document uses the old names (gma::, Gamma); the project was renamed to Phyriad after this planning. The methodology and the patterns still apply; only the namespace changes.

# Ayama — Implementation Strategies
## Concrete technical patterns for maximum performance, efficiency, and UX

Version: 0.3, May 2026
Foundation: Gamma Framework release line 1.1.0
Audience: implementers of each Block of the Master Plan.

This document does NOT redefine the Blocks (that is in the Master Plan).
It is the **technical toolbox**: concrete patterns, hard decisions
already made, and reference code. When implementing a Block, first
check here for an applicable pattern.

## Table of contents

- §1 Performance: zero-alloc hot path
- §2 Performance: lock-free and IPC
- §3 Efficiency: anti-parasitic resource budget
- §4 Efficiency: adaptive ticking and power awareness
- §5 Efficiency: low-overhead ETW
- §6 UX: Auto mode pipeline
- §7 UX: process classification
- §8 UX: persistence and learning
- §9 UX: transparency and reversibility
- §10 Robustness: error handling and degradation
- §11 Compatibility: zero invasive operations
- §12 Reusable code templates
- §13 New pillars (Gamma 1.1.0): process / ipc / etw

## §1 Performance: zero-alloc hot path

### 1.1 Master rule

**Every hot-path struct/buffer is constructed once in start() and
lives until stop().** No call inside the tick loop may invoke
new, malloc, std::vector::resize, or any path that can allocate.

### 1.2 Standard pre-allocation

Each main component exposes a kMax* constant and uses std::array
or a raw aligned buffer:

    class ProcessObserver {
    public:
        static constexpr uint32_t kMaxTargets = 32u;
    private:
        alignas(64) std::array<TargetProcess, kMaxTargets> targets_;
        alignas(64) std::array<TargetMetrics, kMaxTargets> metrics_;
        uint32_t n_targets_{0u};
    };

Do not use std::vector for bounded-size collections. Use
std::array<T, N> with a separate counter. This:
- Eliminates indirection.
- Guarantees a contiguous layout (cache-friendly).
- Makes the limit explicit at compile time.

(The rest, §1.3 through §13.5, is in the full file.)
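
To make the §1.2 pattern concrete, here is a minimal sketch of a bounded list built from std::array plus a separate counter. BoundedList and its member names are hypothetical, invented for this illustration; they are not taken from the Ayama plan or from Phyriad.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of Phyriad): a fixed-capacity list backed by
// std::array plus a separate counter, so no operation ever allocates.
template <typename T, std::uint32_t N>
class BoundedList {
    static_assert(N > 0, "capacity must be positive");
    static_assert(std::is_trivially_copyable_v<T>,
                  "sketch assumes plain-data elements");
public:
    static constexpr std::uint32_t kCapacity = N;

    // Returns false instead of growing when the list is full.
    bool try_push(const T& value) noexcept {
        if (size_ == N) return false;
        items_[size_++] = value;
        return true;
    }

    // Swap-with-last removal: O(1), does not preserve element order.
    bool remove_at(std::uint32_t index) noexcept {
        if (index >= size_) return false;
        items_[index] = items_[--size_];
        return true;
    }

    std::uint32_t size() const noexcept { return size_; }

    const T& operator[](std::uint32_t i) const noexcept {
        assert(i < size_);
        return items_[i];
    }

private:
    alignas(64) std::array<T, N> items_{};  // storage lives inside the object
    std::uint32_t size_{0u};                // number of live elements
};
```

The caller decides what to do when try_push returns false (drop, log, or evict), which keeps the capacity limit visible at every call site instead of hiding it behind a reallocation.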


Two layers, one library

Layer 1 — the runtime

A C++23 thread pool with stigmergic dispatch: workers don't pull from a shared queue, they advertise their fill_pct into a Pheromone field and a TaskClassifier reads the field to decide where each task lands. No work-stealing, no contention on a central scheduler.
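
As a rough illustration of the idea, the sketch below has each worker write its fill percentage into its own slot while a dispatcher scans the slots and picks the least-loaded worker. FillBoard, deposit and pick_least_loaded are hypothetical names for this sketch; Phyriad's Pheromone field and TaskClassifier may work quite differently (the real classifier also reads metrics and picks a tier).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not Phyriad's implementation: workers write into a
// shared board, the dispatcher only reads it. Nobody sends messages.
template <std::size_t N>
class FillBoard {
    static_assert(N > 0, "need at least one worker slot");
public:
    // Worker side: advertise how full the local queue is (0..100).
    void deposit(std::size_t worker, std::uint8_t fill_pct) noexcept {
        assert(worker < N);
        slots_[worker].value.store(fill_pct, std::memory_order_relaxed);
    }

    // Dispatcher side: scan every slot and return the least-loaded worker.
    // Ties resolve to the lowest index.
    std::size_t pick_least_loaded() const noexcept {
        std::size_t best = 0;
        std::uint8_t best_fill = slots_[0].value.load(std::memory_order_relaxed);
        for (std::size_t i = 1; i < N; ++i) {
            const std::uint8_t fill =
                slots_[i].value.load(std::memory_order_relaxed);
            if (fill < best_fill) {
                best_fill = fill;
                best = i;
            }
        }
        return best;
    }

private:
    // One cache line per slot so workers do not false-share.
    struct alignas(64) Slot {
        std::atomic<std::uint8_t> value{0};
    };
    std::array<Slot, N> slots_{};
};
```

The values the dispatcher reads can be slightly stale. That is acceptable in this pattern because the next deposit corrects the picture, and no lock or central queue is ever taken.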

Two API styles:

// Ergonomic (lambda with captures — same shape as Taskflow / std::async).
// Heap-allocates the lambda.
uint64_t id;
pool.try_submit_callable([&local_var]() noexcept {
    do_work(local_var);
}, id);

// Zero-alloc native (captureless function + explicit ctx). Use for hot loops.
static void fn(void* ctx) noexcept { /* ... */ }
phyriad::pool::Task t{ .fn = fn, .ctx = &my_ctx };
pool.try_submit(t, id);

See docs/ERGONOMICS_ANALYSIS.md for the trade-off and docs/QUICKSTART_POOL.md for both patterns in a runnable example.
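
The zero-alloc style can still carry state if the state lives outside the task. The sketch below shows one way to do that with a captureless trampoline. TaskSketch is a hypothetical stand-in that only mirrors the { fn, ctx } shape shown above; Accumulator, invoke_run and make_task are likewise invented for illustration and are not Phyriad APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the { fn, ctx } shape; it is not
// phyriad::pool::Task.
struct TaskSketch {
    void (*fn)(void*) noexcept;
    void* ctx;
};

// Example context object that lives outside the task.
struct Accumulator {
    std::uint64_t total{0};
    void run() noexcept { total += 1; }
};

// Captureless trampoline: recovers the typed object from ctx and calls run().
template <typename T>
void invoke_run(void* ctx) noexcept {
    assert(ctx != nullptr);
    static_cast<T*>(ctx)->run();
}

// Builds a task that refers to obj without copying it or touching the heap.
// The caller must keep obj alive until the task has executed.
template <typename T>
TaskSketch make_task(T& obj) noexcept {
    return TaskSketch{&invoke_run<T>, &obj};
}
```

The trade-off is lifetime management: a capturing lambda can own its state, whereas here the submitter is responsible for keeping the context alive until the task has run.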

📊 Performance numbers (throughput, latency, comparisons vs Taskflow / concurrencpp / Boost.Lockfree) live in docs/EXPERIMENTAL_BENCHMARKS.md. They are preliminary and not independently validated — see that document's top-of-file disclaimer for context on the limitations.

Layer 2 — the pattern as a library

Phyriad's stigmergy primitives are extracted as a standalone pillar: build YOUR own stigmergic system on top of them, without taking the pool runtime.

#include <phyriad/stigmergy/Stigmergy.hpp>

phyriad::stigmergy::Field<MarketTick>       ticks;
phyriad::stigmergy::Pheromone<uint8_t, 32>  fill_pct;

ticks.publish(latest);
auto t = ticks.read();
fill_pct.deposit(worker_id, fill);
auto snap = fill_pct.read_all();

Stigmergy is the only thing Phyriad does that no other C++ library ships as a primitive. The pattern is documented separately with Grassé's 1959 origin and three worked examples. Primitive-level latency measurements (publish / read / deposit / cross-CCD penalty) are in docs/EXPERIMENTAL_BENCHMARKS.md.


Optional — Network dispatch (net pillar, WIP)

PhyriadNet/1 lets you ship work to a Phyriad runtime running on another process or another host over UDP. The protocol is custom (16-byte header, xxHash32 checksum, frame-type-scoped reliability — TCP-style retransmits for TASK_REQUEST, fire-and-forget for PHERO_UPDATE).

// Server (echo handler at task_id=0x42)
pn::NetGateway<> gw({pn::Endpoint::any(9742)});
gw.register_handler(0x42u, &echo_handler);
gw.start();
std::jthread driver([&](std::stop_token t){ gw.run(t); });

// Client
pn::NetClient<> client({pn::Endpoint::localhost(9742)});
client.connect();
uint8_t resp[64]{};
auto n = client.submit_wait(0x42u, msg, msg_len, resp, sizeof(resp));

📊 Codec micro-benchmarks and loopback RTT figures (vs raw UDP, TCP echo, gRPC) live in docs/EXPERIMENTAL_BENCHMARKS.md. The codec numbers are reproducible to ~3%; the loopback RTT figures are heavily affected by Windows scheduling and should be read with the qualifications in that document.

Where PhyriadNet/1 wins:

  • Thin protocol overhead on top of raw UDP — framing + reliability + dedup + checksum cost a small percentage of bandwidth (see docs/EXPERIMENTAL_BENCHMARKS.md).
  • Frame-type-scoped reliability: TASK_REQUEST is reliable+ordered, PHERO_UPDATE is fire-and-forget. TCP can't model the latter.
  • Single header for end users (framework/netclient/) — C++17, no Phyriad dependency, compiles on gcc 8 / clang 7 / MSVC 19.14. ~400 LOC for the whole client.
  • Zero external dependencies. No protoc step. No DLL bloat (gRPC's shared libraries add 5-15 MB to a binary distribution).
  • The same phyriad::pool::Pool you use locally is what handles remote TASK_REQUESTs on the receiver side — one dispatch model for both same-process and over-the-network tasks.
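
To give a feel for what a 16-byte header with frame-type-scoped reliability could look like, here is a sketch. Only the 16-byte size, the 32-bit checksum and the two frame-type names come from the text above; every field, offset, numeric value and the little-endian choice are assumptions, and the checksum is carried as an opaque value rather than computed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Frame types named in the text; the numeric values are assumptions.
enum class FrameType : std::uint8_t { TaskRequest = 1, PheroUpdate = 2 };

// Hypothetical header fields. The real PhyriadNet/1 layout is not shown
// in this README, so treat every field and offset here as illustrative.
struct Header {
    std::uint8_t  version;
    FrameType     type;
    std::uint16_t payload_len;
    std::uint32_t task_id;
    std::uint32_t sequence;
    std::uint32_t checksum;  // would hold the xxHash32 value; not computed here
};

constexpr std::size_t kHeaderSize = 16;

// Writes the low `bytes` bytes of v in little-endian order.
inline void put_le(std::uint8_t* out, std::uint32_t v, std::size_t bytes) noexcept {
    assert(bytes <= 4);
    for (std::size_t i = 0; i < bytes; ++i)
        out[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Reads `bytes` little-endian bytes into an integer.
inline std::uint32_t get_le(const std::uint8_t* in, std::size_t bytes) noexcept {
    assert(bytes <= 4);
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    return v;
}

inline std::array<std::uint8_t, kHeaderSize> encode(const Header& h) noexcept {
    std::array<std::uint8_t, kHeaderSize> b{};
    b[0] = h.version;
    b[1] = static_cast<std::uint8_t>(h.type);
    put_le(&b[2], h.payload_len, 2);
    put_le(&b[4], h.task_id, 4);
    put_le(&b[8], h.sequence, 4);
    put_le(&b[12], h.checksum, 4);
    return b;
}

inline Header decode(const std::array<std::uint8_t, kHeaderSize>& b) noexcept {
    Header h{};
    h.version     = b[0];
    h.type        = static_cast<FrameType>(b[1]);
    h.payload_len = static_cast<std::uint16_t>(get_le(&b[2], 2));
    h.task_id     = get_le(&b[4], 4);
    h.sequence    = get_le(&b[8], 4);
    h.checksum    = get_le(&b[12], 4);
    return h;
}

// Frame-type-scoped reliability: only task requests are tracked for
// retransmission; pheromone updates are fire-and-forget.
constexpr bool needs_ack(FrameType t) noexcept {
    return t == FrameType::TaskRequest;
}
```

Serializing field by field, instead of copying the struct's memory, keeps the wire format independent of compiler padding and host endianness.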

Where the alternatives win:

  • gRPC: cross-language support (Java, Go, Python, …). Phyriad is C++ only. If your service mesh has non-C++ peers, gRPC is the right call.
  • TCP: kernel-managed reliability with no application-level retransmit logic. PN1 has retransmits but they're application-level and currently single-host validated only.
  • Raw UDP: zero overhead, but you re-invent framing, dedup, checksum, and session management. Realistic feature parity is ~300-500 LOC of hand-written code.

See docs/QUICKSTART_NET.md for a 5-minute runnable walkthrough and docs/internal/NET_PILLAR_DX_COMPARISON.md for a side-by-side LOC comparison with raw UDP / TCP / gRPC.

Status: WIP.


What is this for

Layer 1 is for you if you need: a fast lock-free thread pool with < 2 µs round-trip latency, predictable p99/p99.9 under sustained load, and zero allocation in the steady state. C++23, header-only-ish, no runtime dependencies.

Layer 2 is for you if you're building: emergent-coordination systems, sensor fusion pipelines, autoscalers, routers / classifiers, load balancers, last-writer-wins broadcast, or anywhere N agents need to observe shared state without a central queue.

What this is NOT for

Hard real-time (no microsecond-bounded GC, no deterministic scheduler — best-effort low-latency only)

Distributed (multi-machine) coordination: the core runtime is in-process and shared-memory only. The optional net pillar (PhyriadNet/1) is WIP and so far validated on a single host only.

Strict ordering — stigmergy is by-design eventually-consistent. If you need every message delivered exactly once in order, use a ring (phyriad::transport::Ring<T>) not a Field<T>.

Anti-cheat compatibility out of the box — process affinity APIs trip many anti-cheat systems. Use only in single-player / anti-cheat-disabled scenarios.

Sub-nanosecond single-thread baseline — Phyriad's primitives are on the order of single-digit nanoseconds, but the floor is reached by hand-tuned single-purpose code that won't scale across threads.

A drop-in replacement for std::async — different API surface, different mental model.
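
The strict-ordering caveat above is easiest to see with a last-writer-wins cell. The sketch below is a hypothetical LatestValue<T> built on std::atomic, loosely modelled on the publish()/read() shape of Field<T>; it is not Phyriad's implementation and only works for small trivially copyable types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical last-writer-wins cell. Each publish overwrites the previous
// value, so a reader only ever sees the most recent one.
template <typename T>
class LatestValue {
    static_assert(std::is_trivially_copyable_v<T>,
                  "std::atomic<T> requires a trivially copyable T");
    static_assert(std::atomic<T>::is_always_lock_free,
                  "sketch assumes T fits in a lock-free atomic");
public:
    void publish(T v) noexcept { cell_.store(v, std::memory_order_release); }
    T read() const noexcept { return cell_.load(std::memory_order_acquire); }

private:
    std::atomic<T> cell_{T{}};
};

// Publishes 1..n back to back and returns what a reader arriving afterwards
// observes: the intermediate values are gone.
inline std::uint32_t publish_then_read(std::uint32_t n) noexcept {
    LatestValue<std::uint32_t> cell;
    for (std::uint32_t i = 1; i <= n; ++i) cell.publish(i);
    return cell.read();
}
```

A consumer that must see every one of the n values needs a queue-like structure such as the ring mentioned above, because a cell like this one keeps no history.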


Quick start — runtime (Layer 1)

git clone https://github.com/Swately/phyriad.git
cd phyriad
cmake -B build -G Ninja -DPHYRIAD_BUILD_BENCHMARKS=ON
cmake --build build --config Release --parallel
./build/bench/bench_pool_throughput

Using an AI assistant to integrate Phyriad? Paste docs/LLM_INTEGRATION_GUIDE.md into your assistant's context. It's a single self-contained file covering the build commands, verification benchmarks, the five most-used primitives with minimal working examples, and the gotchas that bite first-time integrators. Designed for developers who lean on Claude / ChatGPT / Copilot / Gemini for C++ work.

Expected output (first run on a clean Zen3+ box):

Round-trip p99 = 1200 ns
Sustained submit = 16.4 M/s
Pool tasks_completed = 4.6 M (2 s window), dropped = 0
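
For readers who want to reproduce a p99 figure from their own latency samples, here is a sketch of a nearest-rank percentile. It is not taken from bench_pool_throughput, whose method is not described in this README; percentile_nearest_rank and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile over latency samples in nanoseconds.
// The percentile is given as a fraction numer/denom (99/100 for p99,
// 999/1000 for p99.9) so the rank is computed in exact integer arithmetic.
// Takes the samples by value because nth_element reorders them.
inline std::uint64_t percentile_nearest_rank(std::vector<std::uint64_t> samples,
                                             std::uint64_t numer,
                                             std::uint64_t denom) {
    assert(!samples.empty());
    assert(numer > 0 && numer <= denom);
    const std::uint64_t n = samples.size();
    // rank = ceil(n * numer / denom), a 1-based position in sorted order.
    const std::uint64_t rank = (n * numer + denom - 1) / denom;
    const auto pos = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(samples.begin(), pos, samples.end());
    return *pos;
}
```

Usage would be percentile_nearest_rank(samples, 99, 100) for p99 and percentile_nearest_rank(samples, 999, 1000) for p99.9. Tail percentiles need many samples to be meaningful, so a p99.9 from a few hundred measurements says little.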

Quick start — primitives (Layer 2)

cmake --build build --target bench_stigmergy_primitives
./build/bench/bench_stigmergy_primitives

Or in your own code:

#include <phyriad/stigmergy/Stigmergy.hpp>

// 32 sensors, each deposits noisy readings into its own slot
phyriad::stigmergy::Pheromone<uint16_t, 32> sensors;
phyriad::stigmergy::Field<FusedState>       fused;

// sensor i (any thread):
sensors.deposit(i, sample_sensor(i));

// fusion thread:
auto snap = sensors.read_all();
fused.publish(compute_median(snap));

// any consumer:
auto state = fused.read();   // wait-free

See examples/sensor_fusion/ for a live ImGui demo using all four primitives (Field, Pheromone, Classifier, Worker).
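
The snippet above calls compute_median without defining it. One possible definition is sketched below, assuming the snapshot behaves like std::array<uint16_t, N> and that a plain uint16_t is an acceptable fused value. Both are assumptions: the actual return type of read_all() and the FusedState type are not shown in this README.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical compute_median for a snapshot of N sensor slots.
// Takes the snapshot by value because nth_element reorders it.
template <std::size_t N>
std::uint16_t compute_median(std::array<std::uint16_t, N> snap) noexcept {
    static_assert(N > 0, "need at least one reading");
    const auto mid = snap.begin() + N / 2;
    // Partial sort: *mid ends up where it would be in sorted order, and
    // everything before mid is less than or equal to it.
    std::nth_element(snap.begin(), mid, snap.end());
    if constexpr (N % 2 == 1) {
        return *mid;
    } else {
        // Even count: average the two middle values.
        const std::uint32_t upper = *mid;
        const std::uint32_t lower = *std::max_element(snap.begin(), mid);
        return static_cast<std::uint16_t>((lower + upper) / 2u);
    }
}
```

The median is a common choice for noisy sensors because one wild reading cannot drag the result the way it would with a mean.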


The stigmergy pattern

   N worker threads                            classifier
   ┌────────┐  deposit fill_pct[i]   ┌─────────────────────┐
   │ Worker₀│ ────────────────────▶  │ TaskClassifier      │
   │ Worker₁│ ────────────────────▶  │   reads metrics     │
   │ Workerₙ│ ────────────────────▶  │   reads Pheromone   │
   └────────┘                        │   picks tier        │
        ▲                            └─────────┬───────────┘
        │ publish metrics                      │ decide(task)
        │                                      ▼
   ┌─────────────────────────┐         dispatch to worker
   │ MetricsAggregator       │
   │   publishes Field       │
   └─────────────────────────┘

   Workers never message the classifier directly.
   They modify the shared environment (Field + Pheromone);
   the classifier observes the environment and decides.
   This is what termites do. Grassé named it stigmergy in 1959.

Documentation

Conceptual

QUICKSTART guides (5-minute, runnable)

Pillar reference (brief, what-is-it-for)

  • docs/pillars/ — one page per pillar: STIGMERGY · TOPOLOGY · PROCESS · SCHEMA · NODE · BEHAVIOR · CORRELATION · TUNING · ETW

Extensibility

  • docs/PLUGINS.md — write your own plugin (v3 control-plane today, v4 data-plane in v1.1)
  • docs/PGO.md — 2-pass PGO build for production binaries (+10-20% on the hot path)

Live demos


Showcase application: Ayama

Phyriad's stigmergy pillar was originally extracted from Ayama, a runtime optimizer for AMD V-Cache asymmetric CPUs. Ayama uses Phyriad's pool + stigmergy primitives to classify processes, pick policies, and pin game threads to the V-Cache CCD.
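
Pinning to one CCD comes down to building an affinity bitmask. The sketch below computes such a mask under the assumption that logical processors are numbered contiguously per CCD, which real code should confirm by querying the OS topology. ccd_affinity_mask is a hypothetical helper, not Ayama's code, and which CCD carries the V-Cache is left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Returns a 64-bit mask selecting the logical processors of one CCD,
// or 0 if the request does not fit in a single 64-bit mask.
// Assumes CCD k owns logical processors
// [k * logical_per_ccd, (k + 1) * logical_per_ccd).
constexpr std::uint64_t ccd_affinity_mask(unsigned ccd_index,
                                          unsigned logical_per_ccd) noexcept {
    if (logical_per_ccd == 0 || logical_per_ccd > 64) return 0;
    const std::uint64_t first =
        static_cast<std::uint64_t>(ccd_index) * logical_per_ccd;
    if (first + logical_per_ccd > 64) return 0;
    // A run of logical_per_ccd one-bits, shifted to the CCD's first processor.
    const std::uint64_t run = (logical_per_ccd == 64)
                                  ? ~std::uint64_t{0}
                                  : ((std::uint64_t{1} << logical_per_ccd) - 1);
    return run << first;
}
```

On a two-CCD part with 16 logical processors per CCD, ccd_affinity_mask(0, 16) yields 0xFFFF and ccd_affinity_mask(1, 16) yields 0xFFFF0000. On Windows, a mask of this form is what SetProcessAffinityMask accepts.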

I built Ayama because I was frustrated that old games ran badly on my 7950X3D. The motivation, the methodology, and an honest discussion of what I measured (and what I didn't) are in docs/POR_QUE_AYAMA.md. Per-game A/B/A/B/A reports live in docs/ayama/reports/.

The classifier abstraction generalized far beyond gaming — that's how phyriad::stigmergy::Classifier<S, A> became Layer 2.


Build matrix

| Platform | Compiler | Status |
|---|---|---|
| Windows 11 | gcc 15.2 (MinGW-w64) | ✅ Primary dev/test (69/69 tests, 100%) |
| Windows 11 | MSVC 19+ | 🟡 Build verified; bench numbers TBD |
| Linux x86-64 | gcc 13.3 (libstdc++) | ✅ Verified 2026-05-18 (54/58 tests, 93%); SLOs met |
| Linux x86-64 | clang 18 (libc++, -fexperimental-library) | ✅ Verified 2026-05-18 (54/58 tests, 93%); some SIMD wins over gcc |
| Linux AArch64 | clang 18+ | 🟡 HAL is ARM-correct (STLR/LDAR); not yet HW-verified |

All performance numbers in this README are from the Primary dev/test configuration. Linux + ARM numbers will land in follow-up releases as hardware verification completes.


Windows distribution — runtime DLLs and Defender

Two end-user concerns specific to the Windows MinGW build path.

Static MinGW runtime (zero external DLLs)

End-user-facing binaries (Ayama tooling) are built with statically linked MinGW runtime. Default MinGW links these three runtime libraries dynamically:

  • libstdc++-6.dll (C++ standard library)
  • libgcc_s_seh-1.dll (GCC runtime, SEH unwinding)
  • libwinpthread-1.dll (POSIX threads)

A user without MinGW on PATH sees three consecutive "DLL not found" errors and the binary fails to start. We solve this at link time:

# Defined in the root CMakeLists.txt — call once per end-user binary.
phyriad_static_mingw_runtime(<target>)

The helper passes -static -static-libgcc -static-libstdc++ only on Windows + MinGW. Linux, MSVC, Clang are no-ops (those toolchains either ship runtimes with the OS or already link statically). System DLLs (kernel32, user32, opengl32, etc.) stay dynamic — -static on MinGW is smart about which libraries actually have static archives.

Cost: ~2-3 MB extra per binary. Benefit: a clean dist with zero MinGW runtime DLLs to ship alongside.

Licensing — all three runtimes are static-link compatible:

  • libgcc / libstdc++ — GPL with the GCC Runtime Library Exception (explicitly permits static linking from non-GPL applications)
  • libwinpthread — MIT (unrestricted)

Windows Defender false positives

Ayama binaries are unsigned in v1.0 (no commercial code-signing cert yet). Defender and SmartScreen heuristics tend to flag any unsigned binary that calls SetProcessAffinityMask on processes owned by other users — process-affinity tools have a history of being bundled with cheats and grey-market "optimizers", so the heuristic is understandably aggressive.

Ayama is verifiably benign — Win32-only, no kernel driver, no code injection, no game-memory reads (see apps/ayama/action/ for the complete list of API calls) — but Defender can't tell that from the binary alone without a signature.

Mitigation shipped with the dist:

ayama-dist/
├── ayama-ui.exe
├── scripts/
│   └── add-defender-exclusion.ps1   ← whitelists Ayama with Defender
└── runtime/
    └── ...

The script:

  • Resolves the install dir from its own location (no hardcoded paths).
  • Uses Add-MpPreference (Microsoft's official cmdlet) to add path + process exclusions for Ayama only.
  • Supports -Remove to undo cleanly.

The script itself doesn't trigger Defender — it modifies Defender config through the sanctioned API and requires admin elevation (UAC).

Why a script instead of disabling Defender or signing?

| Approach | Cost | Effect | Decision |
|---|---|---|---|
| Disable Defender globally | $0 | Security regression for the whole system | No |
| Per-binary exclusion script | $0 | Localized whitelist, easy undo | Yes (shipped) |
| Submit to Microsoft SmartScreen | $0 | Builds reputation over time (slow) | Recommended in parallel |
| Code-signing certificate (EV/OV) | $200-500/year | Eliminates the warning entirely | Tracked as v1.1 milestone |
| SignPath open-source signing | $0 | Same as commercial cert | Application pending |

For the framework-only build (no Ayama), Defender does not flag the framework's tests or benchmarks — they don't manipulate other processes' affinity.

Full design rationale (which option was picked and why the others were rejected, license compatibility for static linking, the signing roadmap, what the script does and does NOT do): see docs/WINDOWS_DISTRIBUTION.md.


Project status

v1.0.0: first public release (2026-05-18). The framework reaches v1.0 with Linux parity verified (gcc 13 + clang 18 + libc++), a locked performance baseline, comparison data against Taskflow / Boost.Lockfree / concurrencpp, and the Ayama showcase application demonstrating real-world usage. See CHANGELOG.md for the full set of shipped features and known limitations.


Contributing

See CONTRIBUTING.md for development setup, coding standards, and PR workflow. Phyriad is open to bug reports and benchmark contributions; perf-sensitive changes require a scripts/check_perf_regression.ps1 pass.

Community standards: CODE_OF_CONDUCT.md.


License

Licensed under the Apache License, Version 2.0.

Phyriad is a research/engineering artifact. It is provided "as is" with no warranty. See LICENSE §7-8 for full disclaimers.


Citation

@software{phyriad_2026,
  author  = {Eduardo Ramos Mendoza (Swately)},
  title   = {Phyriad: A C++23 runtime built on stigmergic dispatch},
  year    = {2026},
  url     = {https://github.com/Swately/phyriad},
  version = {1.0.0},
  note    = {Stigmergy as a first-class library pillar. Cross-platform
             C++23 (Linux gcc 13 / clang 18 + libc++, Windows MinGW /
             MSVC; ARM-ready in source). Empirical cross-CCD scaling
             measurements published for AMD Zen X3D; Intel hybrid
             (P+E) and non-X3D multi-CCD AMD are code-tested and
             benchmarks are accepted as PRs (see CONTRIBUTING.md).}
}