From SIMD Wrappers to SIMD Ranges (pt 1).
Joel Falcou, Denis Yaroshevskiy
This talk
- tinyurl.com/jfdy2025pt1
- SIMD ranges are coming
- We want to share knowledge
- We need to teach you SIMD first
- Discussed with Matthias Kretz
No Magic Compiler
- std::unseq is in progress
Let's explain some SIMD
- memcmp / memchr / strlen
- reduce
- structs
- min_element
- copy_if
eve library
- github
- Joel Falcou, Jean-Thierry Lapresté, Alexis Aune, Denis Yaroshevskiy
- eve::algo::mismatch
Vector Processor Extensions
- x86
- 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
- 256 bits: AVX, AVX2, XOP
- 512 bits: AVX512 and its myriad of sub-genres
- ARM
- 128 bits: NEON, ASIMD
- SVE (VLS/VLA)
- RVV
- PowerPC
- WASM
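Which of these extensions the code may use is fixed when it is compiled. A minimal sketch of how that shows up in source, assuming the predefined macros that GCC and Clang set for each target (MSVC defines only a subset, so the fallback branch may be hit there). The function name `compiled_simd_target` is hypothetical.

```cpp
#include <string_view>

// Reports the widest SIMD extension this translation unit was compiled for.
// Assumption: GCC/Clang-style predefined macros; other compilers may differ.
constexpr std::string_view compiled_simd_target() {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__ARM_FEATURE_SVE)
    return "SVE";
#elif defined(__ARM_NEON)
    return "NEON";
#elif defined(__riscv_vector)
    return "RVV";
#elif defined(__ALTIVEC__)
    return "AltiVec";
#elif defined(__wasm_simd128__)
    return "WASM SIMD128";
#else
    return "no known SIMD macro defined";
#endif
}
```

Building the same file with, for example, `-mavx2` or `-march=native` changes the answer, which is why the target has to be chosen deliberately.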
Strlen
- allocations happen in pages
- aligned addresses are safe
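The two bullets above can be checked with plain address arithmetic. This sketch assumes 4 KiB pages (common, not universal) and uses hypothetical helper names. It only models the hardware argument: a register-sized load from an address aligned to the register width never straddles two pages, so it cannot touch an unmapped page that the string itself does not reach. The C++ abstract machine still treats reading past the end of an object as undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages.
inline constexpr std::size_t page_size = 4096;

// Round an address down to a multiple of `alignment` (must be a power of two).
inline std::uintptr_t align_down(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// True if reading `width` bytes starting at `addr` stays inside one page.
inline bool load_stays_in_page(std::uintptr_t addr, std::size_t width) {
    assert(width != 0 && width <= page_size);
    return align_down(addr, page_size) == align_down(addr + width - 1, page_size);
}
```

An unaligned 16-byte load starting 8 bytes before a page boundary crosses it; the same load after `align_down(addr, 16)` does not.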
memcmp/memchr/strlen conclusions
- Compilation target is important
- Types are important
- You can't just do "for each"
- Overlapping registers tail handling
- Aligned loads tail handling
- Registers can contain garbage
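One of the tail strategies listed above, overlapping registers, can be sketched without intrinsics. Here a 16-byte array stands in for a 128-bit register and an inner loop stands in for the compare plus movemask a real implementation would use. All names are hypothetical and this is an illustration, not how any particular library does it.

```cpp
#include <cstddef>
#include <cstring>

inline constexpr std::size_t width = 16;  // stand-in for one 128-bit register

// Index of the first byte equal to `value` in one full chunk, or `width` if none.
inline std::size_t find_in_chunk(const unsigned char* p, unsigned char value) {
    unsigned char reg[width];
    std::memcpy(reg, p, width);  // models an unaligned load
    for (std::size_t i = 0; i < width; ++i)
        if (reg[i] == value) return i;
    return width;
}

// Index of the first match in [data, data + n), or n if there is none.
inline std::size_t find_byte(const unsigned char* data, std::size_t n,
                             unsigned char value) {
    if (n < width) {  // too short for even one full chunk: scalar fallback
        for (std::size_t i = 0; i < n; ++i)
            if (data[i] == value) return i;
        return n;
    }
    std::size_t i = 0;
    for (; i + width <= n; i += width) {
        const std::size_t k = find_in_chunk(data + i, value);
        if (k != width) return i + k;
    }
    if (i != n) {
        // Tail: reload the LAST `width` bytes, overlapping the previous chunk.
        // Bytes seen twice held no match, so the first hit is still correct.
        const std::size_t start = n - width;
        const std::size_t k = find_in_chunk(data + start, value);
        if (k != width) return start + k;
    }
    return n;
}
```

Overlap works for a search because re-examining a byte is harmless. It would not be safe as-is for an operation that must touch each element exactly once, such as an in-place increment.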
memcmp/memchr/strlen acknowledgements
reduce
- reduce to the same type
- reduce to a different type
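The difference between the two cases is visible even in scalar code. A minimal sketch with hypothetical function names: summing `std::uint8_t` values into a `std::uint8_t` wraps modulo 256, while summing into a `std::uint32_t` requires widening every element, which in SIMD terms means fewer lanes per register and extra conversion steps.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Reduce to the same type: the accumulator is 8 bits wide, so the sum wraps.
inline std::uint8_t sum_same_type(const std::vector<std::uint8_t>& v) {
    return std::reduce(v.begin(), v.end(), std::uint8_t{0},
                       [](std::uint8_t a, std::uint8_t b) {
                           return static_cast<std::uint8_t>(a + b);
                       });
}

// Reduce to a different type: every element is widened to 32 bits first.
inline std::uint32_t sum_widened(const std::vector<std::uint8_t>& v) {
    return std::reduce(v.begin(), v.end(), std::uint32_t{0},
                       [](std::uint32_t a, std::uint32_t b) { return a + b; });
}
```

For 300 elements all equal to 1, `sum_same_type` yields 44 (300 modulo 256) and `sum_widened` yields 300.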
reduce conclusions
- shuffles
- different interfaces
- mixing types
min_element(1)
- std::reduce(rng, min)
- reduce + find
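The "reduce + find" formulation can be written with standard algorithms alone. This is a scalar sketch with a hypothetical function name, meant only to show the shape of the two passes: a reduction yields the minimum value but loses its position, so a second pass recovers the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Index of the first minimum element, computed in two passes.
inline std::size_t min_index_two_pass(const std::vector<int>& v) {
    assert(!v.empty());
    // Pass 1: reduce to the minimum value (the position is lost here).
    const int lowest = std::reduce(v.begin() + 1, v.end(), v.front(),
                                   [](int a, int b) { return std::min(a, b); });
    // Pass 2: find the first position holding that value.
    return static_cast<std::size_t>(
        std::find(v.begin(), v.end(), lowest) - v.begin());
}
```

Like `std::min_element`, this returns the first of several equal minima, at the cost of reading the data twice.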
min_element conclusions
- index handling
- writing a loop is difficult
- "Not for each"
min_element acknowledgements
copy_if conclusions
- compress_*
- different interfaces
- "Not for each" (two loops inside)
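A scalar model of the compress idea may help: each chunk produces a mask, and the selected lanes are written contiguously, so the output cursor advances by a different amount than the input cursor. The sketch assumes a 4-lane register, and `compress_store` and `copy_if_chunked` are hypothetical names, not the API of any library.

```cpp
#include <cstddef>
#include <vector>

inline constexpr std::size_t lanes = 4;  // stand-in for a 4-lane register

// Scalar stand-in for a compress-store: writes the lanes selected by `mask`
// contiguously to `out` and returns how many were written.
inline std::size_t compress_store(const int* in, unsigned mask, int* out) {
    std::size_t written = 0;
    for (std::size_t lane = 0; lane < lanes; ++lane)
        if (mask & (1u << lane)) out[written++] = in[lane];
    return written;
}

template <typename Pred>
std::vector<int> copy_if_chunked(const std::vector<int>& in, Pred pred) {
    std::vector<int> out(in.size());  // worst case: every element is kept
    std::size_t i = 0;  // input cursor, advances by `lanes`
    std::size_t o = 0;  // output cursor, advances by the number kept
    for (; i + lanes <= in.size(); i += lanes) {
        unsigned mask = 0;  // models the result of a vector compare
        for (std::size_t lane = 0; lane < lanes; ++lane)
            if (pred(in[i + lane])) mask |= 1u << lane;
        o += compress_store(in.data() + i, mask, out.data() + o);
    }
    for (; i < in.size(); ++i)  // scalar tail
        if (pred(in[i])) out[o++] = in[i];
    out.resize(o);
    return out;
}
```

The input and output advance at different rates, which is what keeps this from being a plain "for each" over registers.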
copy_if acknowledgements
- tiny lookup tables (@aqrit)
- bmi2 (Peter Cordes)
- switch + shuffle (@Z boson)
- simd-scalar: Peter Cordes, Ilya Albrecht