perf(walk): flatten recursion via rayon::scope, drop -S stack-size knob#574
Open
arcuru wants to merge 1 commit into
Open
perf(walk): flatten recursion via rayon::scope, drop -S stack-size knob#574arcuru wants to merge 1 commit into
arcuru wants to merge 1 commit into
Conversation
Assisted-by: Claude Opus 4.7 (code generation, refactoring, code review)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces the recursive
walk()function with a scope-based work queue. Each directory becomes an independent rayon task that spawns subtasks for its subdirectories, then completes. AnAtomicUsizepending counter alongside aMutex<Vec<Node>>for children, combined with anArcparent chain,iteratively bubbles completions up to the root.
Stack depth is now O(1) regardless of tree depth, so the
-S/--stack-sizeflag (and its 1 GiB-per-thread heuristic at startup) are no longer needed.sysinfois dropped with them.Benchmarks
The gains primarily come from 3 places:
sysinfo::System::new_all()(costs >100ms on my host and is unnecessary even on the old path)Further Work
I have a stack of changes on top of this with further work, most of which is fairly minor improvements but some edge cases benefit a lot.
e.g. using AT_STATX_DONT_SYNC can give big speed boosts for networked drives from my testing. Let me know if you'd be interested in further changes, I have to clean them up a little bit.Nvm I got bored - #575The main blocker here becomes the stat calls on everything. For cold caches, slow drives, and networked drives they still benefit quite a bit from more parallelism in the stat calls. In one case I was able to see a 14x improvement for a networked filesystem by running everything async, but that heavily regressed the more common warm cache performance.
After experimenting with io_uring, async rust, batching, and more, I didn't find anything that was a pure win for the common case and also handled the edge cases well without a major rewrite. To squeeze much more performance out of the core loop in all scenarios you likely need to switch to a custom threadpool that can be resized as needed. I think that's what ripgrep does. I suspect further speed boosts probably aren't desirable here.
Increasing the thread count with
-Tis easy though and would get perf conscious people most of the way there. In the warm cache case you don't want to increase that above core count so the default still shouldn't change.AI-Usage
I am a real human person who is knowledgeable in this area. I did use AI to assist with this change and as such I labeled commits with "Assisted-by:" following the same scheme used by Fedora.