Refactor(a5): align profiling stack with a2a3 (host CRTP + stable ring) by ChaoZheng109 · Pull Request #777 · hw-native-sys/simpler

ChaoZheng109 · 2026-05-14T09:45:45Z

Summary

Brings a5's profiling stack to the same shape as a2a3 (PRs #705 / #709 / #714):

Host CRTP framework: L2PerfCollector / PmuCollector / TensorDumpCollector now derive from ProfilerBase<Derived, Module> over a shared BufferPoolManager<Module> under src/a5/platform/include/host/profiling_common/. The management thread, poll thread, and buffer-pool logic are deduplicated across the three collectors.
Stable AICore staging ring: AICore writes per-task records to a per-core L2PerfAicoreRing / PmuAicoreRing allocated once by the host; address is published via KernelArgs::aicore_*_ring_addrs[block_idx] and never reassigned. Decouples AICore writes from AICPU's rotating records buffer, and fixes the latent PMU buffer-flip bug where AICore kept writing to the stale buffer.
Profiling out of Handshake: enablement bits + per-core address arrays moved from runtime Handshake to KernelArgs + AICore platform-owned slots (aicore_profiling_state.h). Adding a new profiling field no longer touches the runtime sync protocol. AICore resolves its PMU MMIO base directly from the existing KernelArgs::regs table at kernel entry (regs[get_physical_core_id()]), so there is no separate per-core pmu_reg_addrs table and no AICPU-fill-after-handshake dependency — the reg base is stable from Phase 1, and aicore_execute caches it at Phase 3 alongside the rings.
Three-bucket reconcile: L2PerfBufferState and PmuBufferState now track total / dropped / mismatch; collectors cross-check collected + dropped + mismatch == device_total per pool, matching a2a3.
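For readers unfamiliar with the pattern, the host CRTP framework in the first item can be pictured roughly as below. This is a minimal sketch, not the code in this PR: the class names ProfilerBase, BufferPoolManager and PmuCollector come from the description, while PmuModule, poll_once, on_buffer_collected, acquire/release and every signature are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shared pool. The real BufferPoolManager<Module>
// interface is not shown in the PR description.
template <typename Module>
class BufferPoolManager {
public:
    explicit BufferPoolManager(std::size_t buffers) : free_(buffers) {}
    bool acquire() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    void release() { ++free_; }
    std::size_t free_count() const { return free_; }

private:
    std::size_t free_;
};

// CRTP base: the shared loop logic lives here, and the per-collector record
// handling is dispatched statically to Derived (no virtual calls).
template <typename Derived, typename Module>
class ProfilerBase {
public:
    explicit ProfilerBase(std::size_t buffers) : pool_(buffers) {}

    // One iteration of an assumed poll loop: take a buffer, let the derived
    // collector consume its records, then return the buffer to the pool.
    bool poll_once(const std::vector<std::uint64_t>& records) {
        if (!pool_.acquire()) return false;
        static_cast<Derived*>(this)->on_buffer_collected(records);
        pool_.release();
        return true;
    }

    const BufferPoolManager<Module>& pool() const { return pool_; }

private:
    BufferPoolManager<Module> pool_;
};

struct PmuModule {};  // assumed tag type selecting the PMU pool

class PmuCollector : public ProfilerBase<PmuCollector, PmuModule> {
public:
    using ProfilerBase::ProfilerBase;

    // Collector-specific hook called by the base class.
    void on_buffer_collected(const std::vector<std::uint64_t>& records) {
        collected_ += records.size();
    }
    std::size_t collected() const { return collected_; }

private:
    std::size_t collected_ = 0;
};
```

With this shape, a second collector only supplies its own hook and module tag; the pool handling in the base is written once.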

a5's transport channel still differs from a2a3 (no halHostRegister on DAV_3510). The framework absorbs that as:

MemoryOps carries 5 callbacks (adds copy_to_device / copy_from_device)
mgmt loop mirrors the shm region per tick via profiling_copy.h
release_owned_buffers frees the paired host-shadow malloc()
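The three points above can be sketched as follows, under stated assumptions. Only copy_to_device and copy_from_device are named in this description, so the other three MemoryOps callbacks are left out; ShadowedBuffer, mirror_tick, release_host_shadow and make_sim_memory_ops are hypothetical names, and "device" memory is simulated with ordinary host memory.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>

// Only the two callbacks named in the PR description are modelled; the
// remaining three MemoryOps callbacks are not listed there and are omitted.
struct MemoryOps {
    std::function<void(void* dev, const void* host, std::size_t n)> copy_to_device;
    std::function<void(void* host, const void* dev, std::size_t n)> copy_from_device;
};

// A device buffer paired with its host-shadow malloc() (hypothetical struct).
struct ShadowedBuffer {
    void* dev = nullptr;
    void* host_shadow = nullptr;
    std::size_t size = 0;
};

// One mgmt-loop tick: refresh the host shadow from the device-side region.
inline void mirror_tick(const MemoryOps& ops, ShadowedBuffer& buf) {
    ops.copy_from_device(buf.host_shadow, buf.dev, buf.size);
}

// Frees the host shadow paired with the buffer. Releasing the device side
// is out of scope for this sketch.
inline void release_host_shadow(ShadowedBuffer& buf) {
    std::free(buf.host_shadow);
    buf.host_shadow = nullptr;
}

// Simulation-only MemoryOps in which both copies are plain memcpy calls.
inline MemoryOps make_sim_memory_ops() {
    MemoryOps ops;
    ops.copy_to_device = [](void* dev, const void* host, std::size_t n) {
        std::memcpy(dev, host, n);
    };
    ops.copy_from_device = [](void* host, const void* dev, std::size_t n) {
        std::memcpy(host, dev, n);
    };
    return ops;
}
```

Because the host never maps device memory directly in this model, every read of profiling data goes through the shadow, which is why the shadow has to be freed together with the buffer it mirrors.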

Docs

docs/profiling-framework.md §8 — new a5-specifics section
docs/dfx/{pmu-profiling,l2-swimlane-profiling,tensor-dump}.md §5.x — a5 transport channel + lifecycle + reconcile semantics updated to match the code as it stands

Naming note

a5's new KernelArgs fields are named *_addrs (plural) for the per-core address arrays: aicore_l2_perf_ring_addrs, aicore_pmu_ring_addrs. a2a3's aicore_ring_addr (singular but pointing to an array) is logged for a follow-up a2a3 rename PR; not touched here per the arch-independence rule.

Test plan

python -m simpler_setup.build_runtimes --platform a5sim — host_build_graph + tensormap_and_ringbuffer build clean
a5sim CI (./ci.sh -p a5sim)
a5 hardware smoke
Verify that the reconcile equation collected + dropped + mismatch == device_total holds per pool on a representative case
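The reconcile check in the last item amounts to a one-line invariant. A minimal sketch, assuming 64-bit counters: the bucket names total / dropped / mismatch come from the description above, while BufferCounters and reconciles are hypothetical names.

```cpp
#include <cstdint>

// Device-side buckets as described for L2PerfBufferState / PmuBufferState.
// The struct name and the integer widths are assumptions.
struct BufferCounters {
    std::uint64_t total = 0;     // records the device attempted to emit
    std::uint64_t dropped = 0;   // records dropped on the device side
    std::uint64_t mismatch = 0;  // records rejected by a consistency check
};

// True when every device-side record is accounted for in one of the buckets:
// collected + dropped + mismatch == device_total.
inline bool reconciles(std::uint64_t collected, const BufferCounters& dev) {
    return collected + dev.dropped + dev.mismatch == dev.total;
}
```

For example, with total = 100, dropped = 3 and mismatch = 2, the host must have collected exactly 95 records for the pool to reconcile.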

gemini-code-assist

Code Review

This pull request refactors the profiling framework for the a5 architecture, centralizing host-side infrastructure for PMU, L2Perf, and TensorDump collectors. Key architectural changes include migrating profiling state from the runtime handshake to KernelArgs, introducing stable AICore staging rings to decouple writes from buffer rotations, and implementing a host-shadow transport mechanism to support systems without SVM. Review feedback recommends adding warning logs for null headers during buffer switches and utilizing store barriers when initializing hardware registers to prevent race conditions on weak memory model architectures.
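The store-barrier recommendation can be illustrated with a portable analogue. This is a sketch under assumptions, not the AICPU code: real MMIO register writes need the platform's own barrier intrinsics, and PublishedSlot, publish and try_consume are hypothetical names. It only shows the ordering idea, namely that data is written first and the flag is published with release semantics.

```cpp
#include <atomic>
#include <cstdint>

struct PublishedSlot {
    std::uint64_t payload = 0;       // plain data, written before the flag
    std::atomic<bool> ready{false};  // flag the consumer polls
};

// Producer: the release store keeps the payload write from being reordered
// after the flag becomes visible, which matters on weakly ordered CPUs.
inline void publish(PublishedSlot& slot, std::uint64_t value) {
    slot.payload = value;
    slot.ready.store(true, std::memory_order_release);
}

// Consumer: the acquire load pairs with the release store above, so a
// consumer that observes ready == true also observes the payload.
inline bool try_consume(const PublishedSlot& slot, std::uint64_t& out) {
    if (!slot.ready.load(std::memory_order_acquire)) return false;
    out = slot.payload;
    return true;
}
```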

…ICore ring, profiling off Handshake)

Brings a5 to the same shape as a2a3's PRs hw-native-sys#705 / hw-native-sys#709 / hw-native-sys#714: host-side collectors share a ProfilerBase<Derived, Module> + BufferPoolManager<Module> framework; AICore writes through a stable per-core L2PerfAicoreRing / PmuAicoreRing decoupled from AICPU buffer rotation; profiling state moves off Handshake onto KernelArgs + AICore platform-owned slots (aicore_profiling_state.h). a5's transport channel deviates only in MemoryOps carrying copy_to_device / copy_from_device, the mgmt loop mirroring the shm region per tick, and release_owned_buffers freeing the paired host shadow.

AICore now resolves its own PMU MMIO base at kernel entry directly from KernelArgs::regs[get_physical_core_id()] (the per-physical-core register-base table the host already fills for AICPU), instead of indexing a separate pmu_reg_addrs table that AICPU filled during handshake. The resolved base is valid from Phase 1 onward, so aicore_execute caches it at Phase 3 alongside the rings rather than re-reading per PMU record.

Drops KernelArgs::pmu_reg_addrs, set/get_platform_pmu_reg_addrs, and the corresponding host-side table allocation in PmuCollector.
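The stable-ring and register-base lookup described in this commit message can be modelled on the host as below. This is a hedged sketch: aicore_pmu_ring_addrs, regs and PmuAicoreRing are names from the PR, but the struct layouts, the array sizes, PmuRecord, AicoreProfilingState, cache_profiling_state and write_record are all hypothetical, and the real AICore code is not reproduced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxCores = 4;   // assumed, for illustration only
constexpr std::size_t kRingSlots = 8;  // assumed ring depth

struct PmuRecord {
    std::uint64_t task_id = 0;
    std::uint64_t counter = 0;
};

// Stand-in for PmuAicoreRing: allocated once by the host and never moved.
struct PmuAicoreRing {
    std::array<PmuRecord, kRingSlots> slots{};
    std::uint32_t write_idx = 0;
};

// Assumed subset of KernelArgs relevant to this sketch.
struct KernelArgs {
    std::array<std::uint64_t, kMaxCores> aicore_pmu_ring_addrs{};  // by block_idx
    std::array<std::uint64_t, kMaxCores> regs{};                   // by physical core id
};

// Values a core caches once instead of re-reading them per record.
struct AicoreProfilingState {
    PmuAicoreRing* ring = nullptr;
    std::uint64_t pmu_reg_base = 0;
};

// Resolve both addresses once: the ring by block index, the MMIO base by
// physical core id. Neither changes for the lifetime of the kernel.
inline AicoreProfilingState cache_profiling_state(const KernelArgs& args,
                                                  std::size_t block_idx,
                                                  std::size_t physical_core_id) {
    AicoreProfilingState st;
    st.ring = reinterpret_cast<PmuAicoreRing*>(
        static_cast<std::uintptr_t>(args.aicore_pmu_ring_addrs[block_idx]));
    st.pmu_reg_base = args.regs[physical_core_id];
    return st;
}

// Per-task write into the stable ring; the slot index wraps around.
inline void write_record(AicoreProfilingState& st, const PmuRecord& rec) {
    PmuAicoreRing& r = *st.ring;
    r.slots[r.write_idx % kRingSlots] = rec;
    ++r.write_idx;
}
```

Since the ring address is fixed for the kernel's lifetime, a rotation of the AICPU-side records buffer cannot leave the writer pointing at a stale buffer.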

gemini-code-assist Bot reviewed May 14, 2026

Comment thread src/a5/platform/src/aicpu/l2_perf_collector_aicpu.cpp

Comment thread src/a5/platform/src/aicpu/pmu_collector_aicpu.cpp

ChaoZheng109 force-pushed the refactor/a5-profiling-framework-alignment branch 4 times, most recently from c0630d2 to 0572269 Compare May 18, 2026 02:52

ChaoZheng109 force-pushed the refactor/a5-profiling-framework-alignment branch from 0572269 to a5a594b Compare May 18, 2026 03:00

ChaoZheng109 wants to merge 1 commit into hw-native-sys:main from ChaoZheng109:refactor/a5-profiling-framework-alignment