some perf improvements #3
Conversation
"some" 🤣 I actually started with a variant of this approach. There are two big problems with it: maintainability and extensibility. I wonder if extensibility is worth it though. Need to think about this. |
Been working a lot with external Markdown lately, like the files that websites now provide for agents. Many have their own MDX components. If you want to allow MDX parsing eventually, then the extensibility is probably worth it. Not sure how others consume Markdown, but I'd assume parsing is, most of the time, a (near) one-time cost thanks to result caching.
Maintainability is improved, the whole
Fixes #1, #2.
Replaces the Lexer/LexerRules/Tokens object graph with a single-pass parser of free functions in `src/Parser.php`. Output is byte-identical; 104 existing tests pass unchanged. Public API (`Parser`, `Markdown`) preserved.

The previous parser allocated approximately 1800 token objects per parse and cloned the Parser+Lexer tree once per block for inline re-parsing. The new implementation scans the source once with `strcspn`/`strpos` and writes HTML directly into a by-ref string buffer.

Benchmark (`composer bench:tempest`, opcache+JIT, same host):
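The scanning strategy can be sketched roughly as follows. This is a minimal illustration of the technique, not the actual `src/Parser.php` code; the function name and the two delimiters handled here are hypothetical:

```php
<?php

// Sketch of the single-pass approach: strcspn skips over runs of plain
// text in one call, strpos finds the closing delimiter, and everything
// is appended to a by-ref string buffer with no token objects.
function renderInline(string $src, string &$out): void
{
    $len = strlen($src);
    $pos = 0;

    while ($pos < $len) {
        // Length of the run containing no special characters.
        $plain = strcspn($src, "*`", $pos);
        if ($plain > 0) {
            $out .= htmlspecialchars(substr($src, $pos, $plain));
            $pos += $plain;
        }
        if ($pos >= $len) {
            break;
        }

        $char = $src[$pos];
        // Find the matching delimiter; unmatched ones are emitted literally.
        $close = strpos($src, $char, $pos + 1);
        if ($close === false) {
            $out .= htmlspecialchars($char);
            $pos++;
            continue;
        }

        $inner = substr($src, $pos + 1, $close - $pos - 1);
        $tag = $char === '*' ? 'em' : 'code';
        $out .= "<{$tag}>" . htmlspecialchars($inner) . "</{$tag}>";
        $pos = $close + 1;
    }
}

$html = '';
renderInline('a *b* and `c`', $html);
echo $html, "\n"; // a <em>b</em> and <code>c</code>
```

Because the buffer is passed by reference and the helpers are free functions, no Parser+Lexer tree needs to be cloned for inline re-parsing; block-level code can hand the same buffer down.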