Advanced SIMD algorithms in pictures
Denis Yaroshevskiy
Vector Processor Extensions
- x86
- 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
- 256 bits: AVX, AVX2, XOP
- 512 bits: AVX512 and its myriad of sub-genre
- ARM
- 128 bits: NEON, ASIMD
- SVE (VLS/VLA)
- PowerPC
- WASM
compress_copy (pure simd)
- compress instructions (avx512, sve)
- tiny lookup tables (@aqrit)
- bmi2 (Peter Cordes)
- switch + shuffle (@Z boson)
compress_copy (simd/scalar)
- Peter Cordes, Ilya Albrecht
set intersection (simd/simd)
sort(wide)
- sorting networks
- bitonic sort
Quick Sort (components)
- pivot selection
- partition
- insertion sort
Thanks
- Joel Falcou, Jean-Thierry Lapresté
- Amazing people mentioned