SIMD substring in a string

Denis Yaroshevskiy

This talk

Interview problem

  • nested loops
  • find + equal

substring in a string (interview)

find + equal (asm)

performance expectations

  • first char is rare: 22 in 1111...s
  • first char is common: 12 in 1111...s
  • very long needle

C++ standard

  • string/string_view::find
  • std::search
  • std::strstr
  • regex
  • searchers

SIMD

  • memcmp
  • memchr
  • search

Notes

  • std::mismatch - Nikolas Klauser

eve library

Vector Processor Extensions

  • x86
    • 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
    • 256 bits: AVX, AVX2, XOP
    • 512 bits: AVX512 and its myriad of sub-genre
  • ARM
    • 128 bits: NEON, ASIMD
    • SVE (VLS/VLA)
  • RVV
  • PowerPC
  • WASM

What's inside mismatch?

  • find_if + zip

find

Unrolling: Notes

  • less instructions
  • perf stat
  • Duff's device

search

search

StringZilla

code

Improvements

  • unrolled pre-check
  • inline needle validation

Thanks

  • Joel Falcou
  • Ilya Albrecht
  • Amazing people publishing their results

Links