A harmonic-substrate programming language with first-class φ, dual-band execution, an LLVM-backed JIT, self-healing, and an O(log_φπfib N) algorithm family — built toward a transformerless LLM.
OMC is not a thin layer over IEEE-754 and types. Its substrate is φ (the golden ratio) and the canonical 40-entry Fibonacci attractor table reaching 63,245,986. Every harmonic operation in the language — fold(n), phi.res(n), harmony(x), zeckendorf(n), substrate_search(arr, target), the heal pass's literal-rewrite, the bucketing in the harmonic anomaly detector — routes through the same substrate.
It runs as one binary with two execution engines kept byte-identical, optional LLVM-18 JIT producing dual-band SSE2 code, embedded CPython for bidirectional interop, WASM and LSP targets, a self-hosting compiler that's gen2==gen3 byte-identical, a self-healing pass that fixes typos/off-attractor literals/divide-by-zero, and a registry-backed package manager.
The endpoint is a transformerless LLM — a model whose attention, positional encoding, and OOD gating are built from harmonic primitives instead of softmax + sinusoidal PE + L2. CRT-Fibonacci positional encoding wins -19.9% (tiny scale) and -5.4% (TinyShakespeare scale) vs sinusoidal. HBit cross-cutting tension is a reference-free OOD signal at AUROC 1.0. The architectural pieces are being built and measured one at a time.
These are concrete, present-in-the-code features, not aspirations:
- **The substrate is a primitive, not a library.** HInt, OMC's integer type, carries a φ-resonance and HIM score computed at construction. Every `Value::HInt(_)` ever created has been routed through `compute_resonance` and `nearest_attractor_with_dist`. This is at the type level, not at the user-code level.
- **Dual-band executable code.** OMC values have a classical α-band and a harmonic shadow β-band, packed into LLVM `<2 x i64>` SSE2 vectors inside JIT'd functions. `phi_shadow(x)` makes β diverge; `harmony(x)` reads the substrate-routed coherence between the bands. Branch elision based on harmony is shipped: high-coherence inputs skip entire conditional blocks at native code speed.
- **O(log_phi_pi_fibonacci N) algorithm family.** The `phi_pi_fib_search_v2` algorithm uses F(k)/φ^(π·k) split-points — each iteration shrinks the live range by φ^π ≈ 4.534, not 2. The substrate-canonical iteration bound is `log_phi_pi_fibonacci(n) ≈ 0.459 · log₂ n`. Exposed as a complete primitive family: `substrate_search`, `substrate_lower_bound`, `substrate_upper_bound`, `substrate_rank`, `substrate_count_range`, `substrate_slice_range`, `substrate_intersect`, `substrate_difference`, `substrate_insert`, `substrate_quantile`, `substrate_select_k`, `substrate_nearest`, `substrate_min_distance`, `substrate_hash`.
- **Zeckendorf as first-class integer encoding.** Every positive integer has a unique representation as a sum of non-consecutive Fibonacci numbers (Zeckendorf 1972). OMC exposes the canonical encoder/decoder: `zeckendorf(n) -> [indices]`, `from_zeckendorf(idxs) -> n`, plus `zeckendorf_weight`, `zeckendorf_bit`, `is_zeckendorf_valid`, and `substrate_hash` (Zeckendorf-mixed avalanche). The iteration count is bounded by `log_phi_pi_fibonacci(n)`.
- **CRT-Fibonacci positional encoding wins on a real LM training task.** Pairs of `(sin(2π·pos%m_i/m_i), cos(2π·pos%m_i/m_i))` with Fibonacci moduli `{5, 8, 13, 21, ...}`. Validation loss −19.9% at toy scale (4/5 seeds) and −5.4% at TinyShakespeare scale (3/3 seeds) vs Vaswani sinusoidal. See `experiments/transformerless_lm/README.md` for the full numbers.
- **Self-healing compiler.** A heal pass runs at the AST level: typo correction (Levenshtein over the symbol table), off-attractor literal snap, divide-by-zero rescue, arity correction, parser-error recovery. Enabled with `OMC_HEAL=1`.
- **Two-engine byte-identical parity.** A tree-walking interpreter AND a bytecode VM, kept in lockstep. 44 of 45 functional examples produce byte-identical output between engines (the diverger is a timing-only benchmark). Verified by `--audit FILE`.
- **Self-hosting compiler V.9b.** Compiles itself; gen2 == gen3 byte-identical. See `examples/self_hosting_v9b.omc`.
- **`@harmony` and `@predict` JIT pragmas.** Mark a function or branch as harmony-eligible; the JIT compounds these with `@hbit` for layered speedups (270× alone; 95% additional branch reduction with `@harmony` + `@predict`).
- **Substrate-routed harmonic libraries.** `harmonic_anomaly` beats scikit-learn's IsolationForest 10/10 vs 7/10 on multi-dim credential-stuffing detection (the structural-anomaly regime).
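Zeckendorf's theorem makes the encoder a greedy loop: take the largest Fibonacci number that fits, then skip its neighbor. A minimal Python reference sketch — not the OMC implementation — where the F(1) = F(2) = 1 indexing is inferred from the `zeckendorf(100) -> [11, 6, 4]` example shown later in this README:

```python
def fib(k):
    """F(1) = F(2) = 1 indexing (assumed to match OMC's zeckendorf indices)."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

def zeckendorf(n):
    """Greedy decomposition into indices of non-consecutive Fibonacci numbers."""
    assert n > 0
    k = 2
    while fib(k + 1) <= n:  # largest usable index
        k += 1
    idxs = []
    while n > 0:
        if fib(k) <= n:
            idxs.append(k)
            n -= fib(k)
            k -= 2  # skipping the adjacent index guarantees non-consecutiveness
        else:
            k -= 1
    return idxs

def from_zeckendorf(idxs):
    """Decode: sum the Fibonacci numbers at the given indices."""
    return sum(fib(i) for i in idxs)
```

Under this indexing, `zeckendorf(100)` yields `[11, 6, 4]` (89 + 8 + 3) and round-trips through `from_zeckendorf`.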
```
git clone https://github.com/RandomCoder-lab/OMC.git
cd OMC
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo build --release -p omnimcode-cli
./target/release/omnimcode-standalone --init
./target/release/omnimcode-standalone main.omc
```

For the JIT path (LLVM-backed, dual-band native code):

```
sudo apt install llvm-18-dev libpolly-18-dev libzstd-dev
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 LLVM_SYS_180_PREFIX=/usr/lib/llvm-18 \
  cargo build --release -p omnimcode-cli --features llvm-jit
PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 OMC_HBIT_JIT=1 OMC_HBIT_JIT_VERBOSE=1 \
  ./target/release/omnimcode-standalone main.omc
```

Eligible user fns get compiled to dual-band native code: 272× over tree-walk, 119× over the bytecode VM on factorial(12); see `docs/jit_benchmark.md`.
OMC ships one binary. Six layers, each validated:
A 40-entry FIBONACCI table reaching 63,245,986 and a single canonical search algorithm:
- `nearest_attractor_with_dist(value)` — closest Fibonacci attractor + distance, `#[inline]`
- `phi_pi_fib_search_v2` — F(k)/φ^(π·k) split-point search with binary-search fallback when the offset rounds to zero
- `log_phi_pi_fibonacci(n) = ln(n) / (π · ln(φ))` — the substrate's iteration-count bound
- `zeckendorf_indices` / `from_zeckendorf_indices` — canonical sparse decomposition
- `substrate_search_i64` / `substrate_lower_bound` / `substrate_upper_bound` — O(log_φπfib N) primitives backing the OMC builtins
- `attractor_bucket` — FIBONACCI-table index of nearest attractor; used for substrate-aligned hash bucketing
Every harmonic operation in the language routes through these. 16 of 17 duplicate Fibonacci arrays were deleted across the codebase; the substrate is single-source. See SUBSTRATE_CHANGES.md for the audit.
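The table and its canonical lookup are easy to mirror outside OMC. A hedged Python sketch (the real `nearest_attractor_with_dist` lives in the Rust substrate; this is an illustrative reconstruction): building F(0)..F(39) gives exactly 40 entries ending at 63,245,986, and nearest-attractor lookup is one bisect plus a two-candidate comparison:

```python
import bisect

# 40-entry attractor table F(0)..F(39); the last entry is 63,245,986.
FIBONACCI = [0, 1]
while len(FIBONACCI) < 40:
    FIBONACCI.append(FIBONACCI[-1] + FIBONACCI[-2])

def nearest_attractor_with_dist(value):
    """Closest Fibonacci attractor and its distance (illustrative mirror)."""
    i = bisect.bisect_left(FIBONACCI, value)
    # The nearest attractor is either the first table entry >= value
    # or the one just below it; clamp at the table ends.
    candidates = FIBONACCI[max(0, i - 1):i + 1] or [FIBONACCI[-1]]
    best = min(candidates, key=lambda f: abs(value - f))
    return best, abs(value - best)
```

For example, 100 sits between attractors 89 and 144, so the lookup returns (89, 11).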
Values carry α (classical) and β (harmonic shadow). Inside JIT'd code, they're packed into <2 x i64> LLVM vectors so harmonic ops execute as SIMD on both lanes.
- `phi_shadow(x)` — makes β diverge from α
- `harmony(x)` — substrate-routed coherence: `1 / (1 + attractor_distance(|α-β|))`
- `@hbit`, `@harmony`, `@predict` — function-level pragmas
- Branch elision on harmony: 95.2% reduction on high-harmony inputs with 5–8% break-even fraction
- LLVM 18 via inkwell, feature-gated as `llvm-jit`
- 77 codegen tests pass: locals via allocas, CFG branches, loops, recursion, comparisons, floats, arrays (read + write), cross-fn calls, L1.6 array bridges (both directions), 22 harmonic-primitive intrinsics
- Dual-band lowerer produces packed `<2 x i64>` for `phi_shadow` and `harmony`
- Cascade cleanup: fns that fail to lower get `unreachable` trap stubs (not raw deletion), plus a fixpoint marks dependent fns as failed
- `OMC_HBIT_JIT_VERIFY=1`, `OMC_HBIT_JIT_DUMP_IR=1` for diagnostics
- Empirical: 272× on factorial(12), 115× on array-sum hot loop, 10.6× on substrate-heavy mixed workload
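The harmony formula above is small enough to sanity-check in plain Python. A sketch under two stated assumptions: the attractor table is F(0)..F(39) as elsewhere in this README, and the two bands are passed as explicit arguments (in OMC both bands live inside one value):

```python
import bisect

FIB = [0, 1]
while len(FIB) < 40:
    FIB.append(FIB[-1] + FIB[-2])

def attractor_distance(n):
    """Distance from n to the nearest Fibonacci attractor."""
    i = bisect.bisect_left(FIB, n)
    return min(abs(n - f) for f in FIB[max(0, i - 1):i + 1] or [FIB[-1]])

def harmony(alpha, beta):
    """README formula: 1 / (1 + attractor_distance(|α-β|)).
    Bands are explicit args here; OMC packs them into one dual-band value."""
    return 1.0 / (1.0 + attractor_distance(abs(alpha - beta)))
```

When the bands agree, the distance is 0 and harmony is exactly 1.0; the further |α-β| drifts from any attractor, the closer harmony falls toward 0 — which is what makes it usable as a branch-elision predicate.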
L1.6 Array↔JIT bridge (both directions): Value::Array(int_only) marshals to a length-prefixed Box<[i64]> for arg-passing; the @jit_returns_array_int pragma triggers omc_arr_heapify so a JIT'd fn can return a Value::Array it built internally. Same layout both ways, no codegen changes needed at the lowerer.
JIT'd harmonic primitives (table-driven HARMONIC_INTRINSICS in dual_band.rs, 3 lines per new entry):
| Arity | Primitives |
|---|---|
| `i64 → i64` | `nth_fibonacci`, `is_attractor`, `attractor_distance`, `hbit_tension`, `fibonacci_index`, `attractor_bucket`, `substrate_hash`, `zeckendorf_weight`, `bit_count`, `bit_length`, `digit_sum`, `digit_count`, `harmonic_align`, `harmonic_unalign` |
| `i64, i64 → i64` | `gcd`, `lcm`, `safe_mod` |
| `i64, i64, i64 → i64` | `mod_pow` |
| `array_ptr → i64` | `arr_sum_int`, `arr_product`, `arr_min_int`, `arr_max_int` |
| `array_ptr, i64 → i64` | `int_binary_search`, `int_lower_bound`, `substrate_search` |
A first-class API surface of 50+ builtins:

| Substrate-routed (probe count: log_φπfib N) | Native baseline (probe count: log₂ N) |
|---|---|
| `substrate_search(arr, t)` | `int_binary_search(arr, t)` |
| `substrate_lower_bound(arr, t)` | `int_lower_bound(arr, t)` |
| `substrate_upper_bound(arr, t)` | `int_upper_bound(arr, t)` |
| `substrate_rank`, `substrate_count_range`, `substrate_slice_range` | — |
| `substrate_intersect`, `substrate_difference` | `sorted_merge`, `sorted_union`, `sorted_dedupe` |
| `substrate_insert`, `substrate_quantile`, `substrate_select_k` | — |
| `substrate_nearest`, `substrate_min_distance` | — |
| `substrate_hash`, `attractor_bucket` | `fnv1a_hash` |
Plus Zeckendorf encoding (zeckendorf, from_zeckendorf, zeckendorf_weight, zeckendorf_bit, is_zeckendorf_valid), substrate analytics (harmonic_align, harmonic_unalign, harmonic_score, harmonic_resample, resonance_band_histogram, is_phi_resonant, phi_pi_log_distance), and phi primitives (phi_pow, phi_pi_pow, nth_fibonacci, attractor_table, fib_chunks).
The architectural trade-off: substrate ops use fewer probes (7.3 vs 16 at N=65536) but each probe pays for F(k)/φ^(π·k) floating-point math. Native int-binary wins on raw throughput against uniform data; substrate ops win when probe sequence coherence matters (substrate-indexed data, attractor-aligned queries). Both paths coexist so callers pick. See experiments/substrate_primitives/bench_substrate_search.omc.
Substrate-routed end-to-end. harmonic_anomaly (+ v2 with substrate-routed lookup), harmonic_clustering, harmonic_recommend. Plus the high-level examples/lib/substrate.omc wrapper exposing s_* (substrate-routed), i_* (int-binary), h_* (harmonic) naming.
Anomaly detection vs scikit-learn IsolationForest (full results in docs/anomaly_detection.md):
| Workload | OMC harmonic | IsolationForest |
|---|---|---|
| Multi-dim credential stuffing, K=10 | 10/10 | 7/10 |
| Multi-dim K=25 | 24/25 | 17/25 |
| Multi-dim K=50 | 49/50 | 40/50 |
OMC loses on volumetric-dominated data (NSL-KDD K=500: 302 vs 351) and ties on simple time-series. The pattern: the harmonic substrate is a structural detector, not a wholesale replacement for volumetric methods.
JIT integration impact on NSL-KDD harmonic_anomaly fit (5000 rows, 6 dims):
| Configuration | fit + score | Speedup vs tree-walk |
|---|---|---|
| Tree-walk | 363 ms | 1× |
| JIT pre-L1.6 (arrays in dispatch → tree-walk) | 363 ms | 1× (no JIT actually used) |
| JIT + L1.6 input bridge | 191 ms | 1.9× |
| JIT + L1.6 + harmonic-primitive intrinsics | 107 ms | 3.4× |
| JIT + L1.6 + intrinsics + harmonic_anomaly_v2 (substrate_search) | 271 ms total fit+score | substrate-routed lookup keeps recall byte-identical (nsl_kdd_v1_vs_v2.omc) |
- Self-hosting compiler V.9b (gen2 == gen3 byte-identical)
- Self-healing pass — 7 classes of automatic correction (typo, arity-pad, arity-truncate, div-zero → safe_divide, mod-zero → safe_mod, harmonic-index snap, missing-return). Substrate-routed typo lookup uses a 32-bucket `substrate_hash_name` index for ~10× speedup on projects with hundreds of names (`docs/heal_pass.md` has the bench table). Per-class disable pragmas (`@no_heal_typo`, etc.) + a per-pass heal budget.
- Two-engine parity verified by `--audit FILE`
- Embedded CPython via PyO3: `py_import`, `py_call`, `py_callback("omc_fn")` for callbacks
- WASM target (`omnimcode-wasm`, no LLVM/Python deps)
- LSP server (`omnimcode-lsp`) + VS Code extension
- Package manager (`--install` from registry, sha256-verified, or arbitrary URL)
- 161 OMC tests + 77 codegen tests + cargo unit tests — all green
A modern transformer has four primitives. The hybrid LLM experiments measure each against a harmonic alternative:
| Transformer piece | Harmonic alternative | Empirical status |
|---|---|---|
| Sinusoidal PE | CRT-Fibonacci PE (pairwise-coprime moduli {5, 8, 13, 21, ...}) | Harmonic wins: −19.9% loss (tiny), −5.4% on TinyShakespeare (3/3 seeds) |
| Softmax attention | OmniWeight (`φ^(-\|q-k\|)` kernel) | — |
| Softmax-only attention | Hybrid: softmax × HBit-tension gate | Harmonic wins on adversarial mixes (experiment 12) |
| L2-NN OOD detection | HBit cross-cutting tension | Harmonic wins: AUROC 1.0 on scenario A |
CRT-PE is the first per-component substitution that beats the transformer baseline on a real LM training task, at two scales roughly two orders of magnitude apart in model and data size. The open question for the transformerless thesis is whether the same substitution holds at modern transformer scale.
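A minimal sketch of the CRT-Fibonacci encoding as described above: one (sin, cos) pair per residue modulo a pairwise-coprime Fibonacci modulus. The default moduli here stop at 21 because F(9) = 34 shares a factor 2 with 8; the moduli actually used are in the experiment configs under `experiments/transformerless_lm/`:

```python
import math

def crt_fib_pe(pos, moduli=(5, 8, 13, 21)):
    """One (sin, cos) pair per modulus; by CRT, residue tuples are
    distinct for all positions below lcm(moduli) = 10920 here."""
    feats = []
    for m in moduli:
        phase = 2 * math.pi * (pos % m) / m
        feats.extend((math.sin(phase), math.cos(phase)))
    return feats
```

With these four moduli the encoding is 8-dimensional and exactly periodic with period 10920, so every position below 10920 gets a unique feature vector.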
See experiments/hybrid_llm/README.md for the per-experiment record and experiments/transformerless_lm/README.md for the end-to-end LM results.
```
# φ-resonance is built into the integer type. No imports needed.
h x = 89;                 # FIBONACCI[11], on-attractor
println(phi.res(x));      # high resonance

# Substrate-routed O(log_phi_pi_fibonacci N) search:
h sorted = arr_range(0, 1000000);
println(substrate_search(sorted, 524287));   # 524287

# Zeckendorf decomposition: every int is a unique sum of non-consecutive Fibonaccis
h z = zeckendorf(100);    # [11, 6, 4] (89 + 8 + 3)
println(from_zeckendorf(z));   # 100 (round-trip)

# Substrate-coherence diagnostic — how Fibonacci-aligned is this data?
println(harmonic_score([89, 1, 1, 2, 3, 5, 8]));   # 1.0
println(harmonic_score([100, 7, 42, 99]));         # 0.0

# Self-healing: typo + off-attractor literal both auto-corrected by `--check`
fn near_attractor(x) {
    return fold(x);       # snap to nearest substrate attractor
}

# Dual-band JIT: harmony(x) reads coherence at native code speed
@harmony
fn coherent_loop(n) {
    h i = 0;
    while i < n {
        if harmony(i) > 0.5 {   # branch eliminable on high-harmony inputs
            i = i + 1;
        } else {
            i = i + 2;
        }
    }
    return i;
}
```
- **The transformerless LLM itself.** CRT-PE wins at the per-component level; the hybrid attention gate as currently formulated lost the distractor-mix test 0/3 (`experiments/transformerless_lm/distractor_mix_README.md`). Two concrete follow-on architectures are documented (score-level gate, learned-threshold gate). Building a harmonic-only architecture top-to-bottom and training it competitively is the next step.
- **AVX-512 widening.** Dual-band uses `<2 x i64>` (SSE2). Wider lanes need array-processing OMC fns to fill them.
- **JIT for float-returning harmonic primitives.** `harmony_value`/`value_danger` shims exist as extern Rust fns; the dispatch boundary needs a `returns_float` flag mirroring `returns_array_int` to materialize their f64-bit-pattern returns correctly. Nothing in current hot paths needs it.
- **JIT for string/dict ops.** Pure JIT operates on i64 only; strings and dicts stay tree-walk by design. The harmonic libraries' L1 rewrite to array-of-hashed-int eliminated this constraint for the hot path (now 3.4× faster end-to-end).
| File | Story |
|---|---|
| `experiments/transformerless_lm/` | CRT-PE wins on real LM training, two scales, 3-of-3 + 4-of-5 seeds |
| `experiments/transformerless_lm/distractor_mix_README.md` | Adversarial-mix scaling test: CRT-PE generalizes, hybrid gate falsified |
| `experiments/hybrid_llm/experiment_5_hbit_combined.omc` | HBit cross-cutting tension as reference-free OOD: AUROC 1.0 |
| `experiments/hybrid_llm/experiment_12_hybrid_attention.omc` | Hybrid softmax × HBit-gate attention beats softmax on adversarial mixes |
| `experiments/substrate_primitives/bench_substrate_search.omc` | 4-way bench: linear vs OMC binary vs substrate vs native int-binary |
| `examples/datascience/nsl_kdd_v1_vs_v2.omc` | A/B: harmonic_anomaly v1 (linear) vs v2 (substrate_search) — 10.3% speedup, identical recall |
| `examples/datascience/multidim_anomaly.omc` | Credential stuffing: harmonic 10/10 vs IsolationForest 7/10 @ K=10 |
| `examples/datascience/nsl_kdd_validation.omc` | NSL-KDD intrusion detection — honest mixed result, JIT 3.4× via L1.6 + intrinsics |
| `examples/self_hosting_v9b.omc` | Self-hosting compiler, gen2 == gen3 byte-identical |
| `examples/lisp.omc` | Mini Scheme interpreter in OMC |
| `examples/datascience/titanic.omc` | Kaggle Titanic via embedded Python pipeline |
| `examples/lib/substrate.omc` | High-level wrappers around the substrate primitives |
| `examples/tests/test_heal_pass.omc` | 16 tests for the self-healing compiler's heal classes + per-class pragmas |
| Path | What |
|---|---|
| `omnimcode-core/` | Parser, AST, interpreter, bytecode VM, substrate (phi_pi_fib), HBit, harmonic types, 50+ substrate builtins, substrate-routed heal pass |
| `omnimcode-codegen/` | LLVM-backed JIT, dual-band lowerer, L1.6 array bridges, 22 harmonic-primitive intrinsics (table-driven) |
| `omnimcode-cli/` | Standalone binary (`omnimcode-standalone`) + `omc-bench` |
| `omnimcode-wasm/` | WebAssembly target (no LLVM, no Python) |
| `omnimcode-lsp/` | LSP server for editor integration |
| `omnimcode-gdextension/` | Godot 4 GDExtension binding |
| `experiments/transformerless_lm/` | PyTorch end-to-end CRT-PE vs sinusoidal training, two scales |
| `experiments/hybrid_llm/` | 12 pure-OMC per-component substitution experiments |
| `experiments/substrate_primitives/` | Empirical comparison of substrate vs native vs OMC search |
| `examples/lib/` | substrate.omc, harmonic_anomaly, harmonic_clustering, harmonic_recommend, np/pd/sklearn/torch/requests/sqlite |
| `examples/datascience/` | Real-data demos with honest numbers |
| `examples/tests/` | test_substrate_primitives.omc (57), test_new_builtins.omc (70), test_harmonic_libs.omc (18), test_heal_pass.omc (16) — 161 total |
| `docs/` | Substrate audit, JIT benchmarks, anomaly-detection comparisons |
| `registry/` | Central package registry (sha256-verified) |
```
omnimcode-standalone FILE              # run a program
omnimcode-standalone                   # REPL
omnimcode-standalone --init            # scaffold project
omnimcode-standalone --install [SPEC]  # package install
omnimcode-standalone --check FILE      # heal-pass diagnostics
omnimcode-standalone --fmt FILE        # pretty-print
omnimcode-standalone --test FILE       # run fn test_*() suite
omnimcode-standalone --bench FILE      # run fn bench_*() suite
omnimcode-standalone --audit FILE      # tree-walk vs VM divergence check
omnimcode-standalone --help            # all flags + env vars
```

```
OMC_HBIT_JIT=1          # JIT-compile eligible user fns via omnimcode-codegen
OMC_HBIT_JIT_VERBOSE=1  # report which fns got JIT'd
OMC_HBIT_JIT_VERIFY=1   # LLVM module verification (debug)
OMC_HBIT_JIT_DUMP_IR=1  # dump LLVM IR for inspection
OMC_VM=1                # use bytecode VM (default: tree-walk)
OMC_HEAL=1              # auto-heal AST iteratively
OMC_HEAL_RETRY=1        # retry after runtime errors
OMC_NO_PYTHON=1         # skip embedded Python init
OMC_REGISTRY=<url>      # alternative package registry
```

```
omnimcode-standalone --install harmonic_anomaly                 # registry name (sha256-verified)
omnimcode-standalone --install                                  # everything in omc.toml
omnimcode-standalone --install https://example.com/raw/lib.omc  # explicit URL
omnimcode-standalone --list                                     # what's installed
```

omc.toml example:
```
[package]
name = "my-omc-project"
version = "0.1.0"

[dependencies]
np = "np"
sklearn = "sklearn"
substrate = "substrate"
custom = "https://example.com/raw/my_lib.omc"
```

Submit a package: PR an entry to `registry/index.json`.
| Layer | Status |
|---|---|
| Substrate (log_phi_pi_fibonacci everywhere) | shipped, audited |
| O(log_phi_pi_fibonacci N) primitive family | shipped, 50+ builtins, 57 tests |
| HBit dual-band executable | shipped (OMC_HBIT_JIT=1) |
| LLVM JIT for pure-int/array/float | shipped, 77 codegen tests, 272× factorial(12), 3.4× harmonic_anomaly NSL-KDD |
| L1.6 Array↔JIT bridges (both directions) | shipped, 11 codegen tests; 115× synthetic, 1.9× real-world harmonic_anomaly |
| 22 harmonic-primitive JIT intrinsics | shipped, table-driven, 10.6× substrate-heavy hot loop |
| Zeckendorf encoding + substrate hash | shipped |
| harmonic_anomaly v2 (substrate_search lookup) | shipped, 10.3% speedup, byte-identical recall to v1 |
| Harmonic libraries on real data | shipped, mixed-honest results |
| Hybrid LLM experiments (12 experiments) | shipped, 1 perfect AUROC, 1 architectural negative, 1 CRT-PE win |
| End-to-end transformerless LM (PyTorch) | CRT-PE wins -19.9% (tiny), -5.4% (TinyShakespeare, 3/3 seeds), -2.9% (distractor mix, 3/3) |
| Hybrid HBit-gate distractor-mix test | falsified at current gate formulation (0/3 wins), score-level / learned-threshold reformulations documented |
| Self-hosting compiler V.9b | shipped, gen2 == gen3 byte-identical |
| Self-healing pass (7 classes, substrate-routed typo) | shipped, OMC_HEAL=1, 10× typo lookup, 16 tests, per-class pragmas |
| Two-engine parity (tree-walk + VM) | shipped, 44/45 byte-identical |
| Embedded CPython + callbacks | shipped, 6 wrapper libs |
| WASM + LSP + GDExtension targets | shipped |
| Package manager + registry | shipped |
| Per-component harmonic wins on PE | shipped (CRT-PE), validated under noise |
| Per-component harmonic wins on attention | hybrid only — clean softmax still wins; gate reformulation pending |
| End-to-end transformerless model | not yet — building blocks validated, integration is the open work |
Build dependencies for the JIT path: llvm-18-dev, libpolly-18-dev, libzstd-dev. For the no-JIT build, just Rust + (optionally) Python 3.
License: MIT.
Built around φ (1.6180339887…). The substrate is the architecture. The transformerless LLM is what the substrate is for.