
op12no2/patchwork


Patchwork

An informal, cumulative and competitive frontier model eval using a JavaScript chess engine.

Procedure

Assume A is currently the leading engine (initially 0000_original). A model/CLI is selected to improve it by creating a new engine B via prompt.md. If B passes an SPRT against A, B becomes the new leader; otherwise A remains the leader. For example, 0002_sonnet_4_6 was derived from 0000_original, not from 0001_haiku_4_5.

    /---> 0001          /---> 0004
0000 ---> 0002 ---> 0003 ---> 0005 ---> 0006 etc.
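The branching in the diagram can be sketched as a small loop. This is only an illustration of the rule described above, not code from the repository: the function name `lineage` is hypothetical, and the pass/fail outcomes are inferred from the diagram (0001 and 0004 are side branches, so they are assumed not to have passed).

```python
def lineage(outcomes, first="0000"):
    """Return ({candidate: parent}, final_leader) for ordered (candidate, passed) pairs."""
    leader = first
    parents = {}
    for candidate, passed in outcomes:
        # Every candidate is derived from whichever engine leads at that moment.
        parents[candidate] = leader
        if passed:
            # A candidate that passes its SPRT becomes the new leader.
            leader = candidate
    return parents, leader


# Outcomes inferred from the diagram (an assumption, not recorded data).
outcomes = [
    ("0001", False),
    ("0002", True),
    ("0003", True),
    ("0004", False),
    ("0005", True),
]

parents, leader = lineage(outcomes)
print(parents)
print(leader)
```

Under these assumed outcomes, 0002 is derived from 0000 and 0005 from 0003, which matches the diagram.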

See bin/sprt.
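For readers unfamiliar with SPRT (sequential probability ratio test), here is a minimal sketch of the idea, using the normal approximation to the log-likelihood ratio that is common in chess engine testing. It is not taken from bin/sprt: the hypotheses `elo0` and `elo1`, the error rates `alpha` and `beta`, and all function names are assumptions chosen for illustration, and the actual script may use different settings or a different model.

```python
import math


def expected_score(elo):
    """Expected score for a given Elo advantage (standard logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - score ** 2
    if var <= 0.0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var / n)


def sprt(wins, draws, losses, elo0=0.0, elo1=10.0, alpha=0.05, beta=0.05):
    """Return 'pass', 'fail' or 'continue' (parameter defaults are illustrative)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    value = llr(wins, draws, losses, elo0, elo1)
    if value >= upper:
        return "pass"
    if value <= lower:
        return "fail"
    return "continue"


print(sprt(400, 300, 200))  # results from B's point of view
print(sprt(10, 10, 10))
```

The test is sequential: it is re-evaluated as games accumulate and stops as soon as either bound is crossed, so clearly better or clearly worse candidates need fewer games.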

Progress

| Engine | Diff | Model | CLI | SPRT |
| --- | --- | --- | --- | --- |
| 0007_opus_4_7 | Δ | Anthropic Claude Opus 4.7 | Claude Code | |
| 0006_gpt_5_5 | Δ | OpenAI GPT 5.5 | Codex | |
| 0005_opus_4_7 | Δ | Anthropic Claude Opus 4.7 | Claude Code | |
| 0004_gpt_5_5 | Δ | OpenAI GPT 5.5 | Codex | |
| 0003_opus_4_7 | Δ | Anthropic Claude Opus 4.7 | Claude Code | |
| 0002_sonnet_4_6 | Δ | Anthropic Claude Sonnet 4.6 | Claude Code | |
| 0001_haiku_4_5 | Δ | Anthropic Claude Haiku 4.5 | Claude Code | |
| 0000_original | | | | |

Tournament

| Rank | Engine | Elo | Games | Score | Draws |
| --- | --- | --- | --- | --- | --- |
| 1 | 0007_opus_4_7 | 2169 ±19.50 | 1400 | 75.8% | 22.6% |
| 2 | 0006_gpt_5_5 | 2063 ±16.41 | 1400 | 62.9% | 29.7% |
| 3 | 0005_opus_4_7 | 2020 ±16.18 | 1400 | 57.0% | 33.0% |
| 4 | 0004_gpt_5_5 | 2014 ±16.12 | 1400 | 56.2% | 33.1% |
| 5 | 0003_opus_4_7 | 2007 ±16.50 | 1400 | 55.2% | 30.6% |
| 6 | 0002_sonnet_4_6 | 1912 ±17.79 | 1400 | 41.6% | 27.7% |
| 7 | 0000_original | 1800 ±18.23 | 1400 | 27.2% | 27.6% |
| 8 | 0001_haiku_4_5 | 1771 ±19.14 | 1400 | 24.1% | 23.4% |

See bin/tourny.
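As a rough aid to reading the Elo column, the standard logistic Elo model relates a rating difference to an expected score and back. The sketch below applies it to two ratings from the table. It is not how bin/tourny computes its figures (that method is not described here), and the function names are hypothetical.

```python
import math


def expected_score(elo_diff):
    """Expected score for the stronger side given an Elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score):
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# Ratings copied from the tournament table above.
ratings = {"0007_opus_4_7": 2169, "0006_gpt_5_5": 2063}

diff = ratings["0007_opus_4_7"] - ratings["0006_gpt_5_5"]
head_to_head = expected_score(diff)
implied = elo_from_score(0.758)  # 75.8% is the overall score of 0007_opus_4_7

print(round(head_to_head, 3))
print(round(implied, 1))
```

Under this model, a 106-point gap corresponds to an expected score of roughly 65% for the higher-rated engine in a direct match.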

Notes

  • A Windows executable for each engine is available in ./engines for anyone who is interested.

Acknowledgements
