Redesign intrinsic-test to use simple comparison#2063
Conversation
feb1dcd to
6ef8b8f
Compare
This seems weird ( unsafe extern "C" {
fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}In fact most Edit: To work around this issue I have modified the tool to communicate with C via pointers (e.g. the C wrapper for @Amanieu is this intended behavior or a bug? |
e2346ff to
db1b2ca
Compare
|
Btw the time gains are significant, it reduces the Arm and aarch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev |
|
Great work. Quick sanity check on |
|
@folkertdev ooh, that makes sense. I don't particularly care about windows, but we are using LLVM20 in the CI. I can change it to use the build from kernel.org |
|
I'm seeing |
|
yeah, but I can use the LLVM github builds or the kernel.org builds |
ce53e81 to
76dd339
Compare
|
Can f16 tests just be gated with |
|
@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing in all 3 arm archs, e.g. edit: sorry, my mistake, they are still failing in ARMv7. I will gate them against the flag |
|
With LLVM 22 |
|
FTZ/DAZ-related perhaps? |
I don't really think so, the outputs seem completely distinct. I noticed that use core::arch::aarch64::*;
#[unsafe(no_mangle)]
#[target_feature(enable = "neon")]
pub unsafe extern "C" fn foo(dst: *mut uint8x16x2_t, a: *const uint8x16_t, b: *const uint8x16_t) {
unsafe {
*dst = vzipq_u8(*a, *b);
}
}produces foo:
ld1 { v0.16b }, [x1]
ld1 { v1.16b }, [x2]
add x8, x0, #16
zip1 v2.16b, v0.16b, v1.16b
zip2 v0.16b, v0.16b, v1.16b
st1 { v2.16b }, [x0]
st1 { v0.16b }, [x8]
retBut the C code seemingly has different behavior on GCC and clang https://godbolt.org/z/T3YnrejjG @adamgemmell can you help in this? |
|
I'm not sure it will fix your issue but the difference in instructions comes from the fact that in arm_neon.h, they reverse every vector before and after the operation on big endian. It's not always actually necessary so we only do it if it's broken without it - however, the intrinsic test tool doesn't detect the difference in behaviour because both arguments it picks are identical. e.g.: |
|
You can try adding Also I don't actually see vzipq_u8 on the latest CI run, why is that? |
I have no idea, I can confirm that locally the test is generated and run.
I will check. Thanks Edit: @adamgemmell adding Edit2: sorry, |
|
None of the unsigned variants of vzipq seem to be seen there, weird. I'd quite like to know why this patch detects the difference - when I looked locally the codegen of the tests seemed very similar |
|
Yeah I fixed the test not being included, I used |
This comment has been minimized.
This comment has been minimized.
a057d30 to
2dfa840
Compare
This comment has been minimized.
This comment has been minimized.
19cfd95 to
421ca05
Compare
|
r? @Amanieu rustbot has assigned @Amanieu. Use Why was this reviewer chosen?The reviewer was selected based on:
|
|
@Amanieu @folkertdev sorry for the inconvenience, I will split up just one more PR (last one I promise). Figured out how to split the @rustbot author |
|
Error: The feature Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip. |
8a85fab to
5a3c59a
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
| )*} | ||
| } | ||
|
|
||
| make_nice!(NiceF16(f16), NiceF32(f32), NiceF64(f64)); |
There was a problem hiding this comment.
What is "nice" here? is it really just partialeq wrapper? Maybe there is a better name for that? Or just document that that is the goal.
There was a problem hiding this comment.
yes, it is just wrapping PartialEq to deal with NaNs, I couldn't come up with a nice name so just named it Nice lol. Any suggestions for the name?
There was a problem hiding this comment.
idk, CustomPartialEqF16 etc?
There was a problem hiding this comment.
Decided to go with NanEqF32 and added a small comment
| .enumerate() | ||
| .map(|(i, chunk)| { | ||
| let c_filename = format!("c_programs/wrapper_{i}.cpp"); | ||
| let c_filename = format!("c_programs/wrapper_{i}.c"); |
There was a problem hiding this comment.
oh are we just C now (no c++ any more)?
There was a problem hiding this comment.
yea, C++ was used previously to take advantage of some templates, but C typically has much better compile times, and linking to C++ from Rust is weird af
| # top bits are undefined, unclear how to test these | ||
| _mm256_castph128_ph256 | ||
| _mm256_castps128_ps256 | ||
| _mm256_castpd128_pd256 |
There was a problem hiding this comment.
and we just didn't notice before?
There was a problem hiding this comment.
Actually not sure how, maybe the C compiler constant-folding some extracts to 0 (it was UB so the compiler could do anything)
Currently
intrinsic-testprints the outputs and then compares the outputs manually. This PR uses a different approach -- generate C wrappers for the intrinsics, link to them from Rust, and then just use simple rust tests to compare outputsIt is much easier to review commit-by-commit