Advanced SIMD algorithms in pictures

Denis Yaroshevskiy

This talk

memcmp

eve library

Vector Processor Extensions

  • x86
    • 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
    • 256 bits: AVX, AVX2, XOP
    • 512 bits: AVX512 and its myriad of sub-genre
  • ARM
    • 128 bits: NEON, ASIMD
    • SVE (VLS/VLA)
  • PowerPC
  • WASM

What's inside mismatch?

  • find_if + zip

mismatch from scratch

copy_if

compress_copy (pure simd)

  • compress instructions (avx512, sve)
  • tiny lookup tables (@aqrit)
  • bmi2 (Peter Cordes)
  • switch + shuffle (@Z boson)

compress_copy (simd/scalar)

  • Peter Cordes, Ilya Albrecht

set_intersection

set intersection (simd/simd)

sort

Previous work (0)

Previous work (1)

sort(wide)

sort(wide)

  • sorting networks
  • bitonic sort

Quick Sort (components)

  • pivot selection
  • partition
  • insertion sort

Pivot Selection

Partition

~sort

Thanks

  • Joel Falcou, Jean-Thierry Lapresté
  • Amazing people mentioned

Links