My first SIMD
        Implementing STL Algorithms Using SIMD Extensions
        Denis Yaroshevskiy
      
      
      
      
        Disclaimers
        
          - Performance is tricky.
- No ARM measurements.
- Some of my statements are my opinions.
The Plan
        
          - What's SIMD and how to get it?
- std::strlen
- Some other algorithms.
What's SIMD and how to get it?
      
      
        Vector Processor Extensions
        
          - x86
            - 128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
- 256 bits: AVX, AVX2, XOP
- 512 bits: AVX512 and its myriad of sub-genre
- ARM
            - 128 bits: NEON, ASIMD
-  SVE (VLS/VLA)
- PowerPC
- WASM
Tell your compiler
        
          - -march=native
- Compile for specific architectures
- Runtime selection (hard)
Now get some SIMD code
        
          - auto-vectorization
- openmp
- by hand with intrinsics or Assembly
Ideas
        
          - allocations happen in pages
- aligned addresses are safe
Dynamic analysis
        
          - unsafe(load)
- __attribute__((no_sanitize_address))
Measuring
        
          - Google benchmark
- code alignment
- 40, 1000, 10'000 bytes
- char, short, int
 Notes on find/find_unguarded 
        
      
      
      
         reduce (same type)
        
          - replace(wide, ignore, wide) 
- reduce(wide, op) -> wide
 reduce (different type) 
        
          - wide<T, fixed<N>>
- convert
 Notes on reduce 
      
        -  sse2-sse4.2 char -> int reduction 
-  load[ignore.else_(x)]
 inclusive_scan (inplace) 
        
          -  store[ignore](wide, ptr) 
-  scan(wide, op, zero)
 remove 
        
          -  safe/unsafe compress_store[ignore] 
 General notes 
        
          -  massive speed ups 
-  code alignment impact on SIMD 
-  data impact on SIMD 
-  aligned/unaligned memory access 
 What we didn't cover? 
        
          -  multiple range algorithms 
-  set operations
-  partition (in place) / reverse
-  floating point / math 
-  cache effects 
-  gather 
Thanks to
         Joel Falcou, Jean-Thierry Lapresté 
         Stack Overflow: @aqrit, @Peter Cordes, @Z boson, @Stephen Canon