My first SIMD

Implementing STL Algorithms Using SIMD Extensions

Denis Yaroshevskiy

So, what's this about?

https://tinyurl.com/myfirstsimd2

Disclaimers

  • Performance is tricky.
  • No ARM measurements.
  • Some of my statements are my opinions.

The Plan

  • What's SIMD and how to get it?
  • std::strlen
  • Some other algorithms.

What's SIMD and how to get it?

Vector Processor Extensions

  • x86
    • 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
    • 256 bits: AVX, AVX2, XOP
    • 512 bits: AVX512 and its myriad of sub-genre
  • ARM
    • 128 bits: NEON, ASIMD
    • SVE (VLS/VLA)
  • PowerPC
  • WASM

Tell your compiler

  • -march=native
  • Compile for specific architectures
  • Runtime selection (hard)

Now get some SIMD code

  • auto-vectorization
  • openmp
  • by hand with intrinsics or Assembly

On using intrinsics

Wrap it!

eve library: GitHub , CppCon

strlen

Ideas

  • allocations happen in pages
  • aligned addresses are safe

strlen code (buggy)

Dynamic analysis

  • unsafe(load)
  • __attribute__((no_sanitize_address))

strlen code (correct)

Measuring

  • Google benchmark
  • code alignment
  • 40, 1000, 10'000 bytes
  • char, short, int

Notes on find/find_unguarded

reduce

reduce (same type)

  • replace(wide, ignore, wide)
  • reduce(wide, op) -> wide

reduce (different type)

  • wide<T, fixed<N>>
  • convert

Notes on reduce

  • sse2-sse4.2 char -> int reduction
  • load[ignore.else_(x)]

inclusive_scan (inplace)

inclusive_scan (inplace)

  • store[ignore](wide, ptr)
  • scan(wide, op, zero)

Notes on inclusive_scan

remove

remove

  • safe/unsafe compress_store[ignore]

Notes on remove

General notes

  • massive speed ups
  • code alignment impact on SIMD
  • data impact on SIMD
  • aligned/unaligned memory access

What we didn't cover?

  • multiple range algorithms
  • set operations
  • partition (in place) / reverse
  • floating point / math
  • cache effects
  • gather

Write strlen

Thanks to

  • Joel Falcou, Jean-Thierry Lapresté
  • Stack Overflow: @aqrit, @Peter Cordes, @Z boson, @Stephen Canon
  • eve library links

  • github.com/jfalcou/eve (discussions/issues)
  • CppCon 2020: SIMD in C++20: EVE of a new Era
  • Discord
  • My videos on SIMD
  • cpplang slack: jfalcou, dyaroshev
  • email: joel.falcou@lri.fr, denis.yaroshevskij@gmail.com
  • twitter: @dyaroshev