My First SIMD

Implementing STL Algorithms Using SIMD Extensions

Denis Yaroshevskiy

So, what's this about?

https://tinyurl.com/myfirstsimd2

Disclaimers

Performance is tricky.
No ARM measurements.
Some of my statements are my opinions.

The Plan

What's SIMD and how to get it?
std::strlen
Some other algorithms.

What's SIMD and how to get it?

Vector Processor Extensions

128 bits: SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2
256 bits: AVX, AVX2, XOP
512 bits: AVX-512 and its many subsets

128 bits: NEON, ASIMD
SVE (VLS/VLA)

PowerPC
WASM

Tell your compiler

-march=native
Compile for specific architectures
Runtime selection (hard)

Now get some SIMD code

auto-vectorization
OpenMP
by hand with intrinsics or assembly
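
As a small illustration of the first two options, the same loop is shown below once as plain code that the optimizer may auto-vectorize, and once with an OpenMP `simd` hint. The function names are hypothetical. The pragma only takes effect when the compiler is given an OpenMP flag (for example `-fopenmp-simd` on GCC and Clang); otherwise it is ignored and both functions behave identically.

```cpp
#include <cstddef>

// Plain loop: the compiler may vectorize it at higher optimization levels.
inline long long sum_plain(const int* data, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Same loop with an explicit request to vectorize the reduction.
inline long long sum_omp_simd(const int* data, std::size_t n) {
    long long sum = 0;
    #pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}
```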

On using intrinsics

#include <immintrin.h> / #include <arm_neon.h>
Intel Intrinsics Guide
ARM Intrinsics Guide

Wrap it!

eve library: GitHub, CppCon

strlen

Ideas

allocations happen in pages
aligned addresses are safe
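
The two ideas combine as follows: if memory is mapped page by page, and a 16-byte load starts at a 16-byte-aligned address, the load cannot cross into another page. The sketch below just checks that arithmetic. The 4096-byte page size and the helper names are assumptions for illustration; real page sizes vary by platform.

```cpp
#include <cstdint>

constexpr std::uintptr_t kPageSize   = 4096;  // assumed page size
constexpr std::uintptr_t kVectorSize = 16;    // one 128-bit register

// Round an address down to the previous multiple of `alignment`
// (alignment must be a power of two).
constexpr std::uintptr_t align_down(std::uintptr_t addr,
                                    std::uintptr_t alignment) {
    return addr & ~(alignment - 1);
}

// True when every byte of the aligned 16-byte chunk containing `addr`
// lies on the same page as `addr` itself.
constexpr bool aligned_load_stays_in_page(std::uintptr_t addr) {
    const std::uintptr_t first = align_down(addr, kVectorSize);
    const std::uintptr_t last  = first + kVectorSize - 1;
    return first / kPageSize == addr / kPageSize &&
           last / kPageSize == addr / kPageSize;
}
```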

strlen code (buggy)

Dynamic analysis

unsafe(load)
__attribute__((no_sanitize_address))

strlen code (correct)

Measuring

Google benchmark
code alignment
40, 1000, 10'000 bytes
char, short, int

Notes on find/find_unguarded

Stephen Canon (Stack Overflow)
int* has to be aligned to alignof(int)
strlen example in eve

reduce

reduce (same type)

replace(wide, ignore, wide)
reduce(wide, op) -> wide
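
For readers without eve at hand, here is a rough SSE2 equivalent of a same-type reduction (summing `int`s). It is a sketch with hypothetical names, not the talk's code. The tail is padded with zeros, the identity for addition, which plays the role of replacing ignored lanes before the final horizontal reduce.

```cpp
#include <cstddef>
#include <cstring>
#include <emmintrin.h>  // SSE2

// Adds the four 32-bit lanes of v together.
inline int horizontal_sum(__m128i v) {
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    v = _mm_add_epi32(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(v);
}

inline int simd_sum(const int* data, std::size_t n) {
    __m128i acc = _mm_setzero_si128();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_epi32(
            acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i)));
    }
    if (i < n) {
        // Partial last chunk: unused lanes stay 0 and do not affect the sum.
        int tail[4] = {0, 0, 0, 0};
        std::memcpy(tail, data + i, (n - i) * sizeof(int));
        acc = _mm_add_epi32(
            acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(tail)));
    }
    return horizontal_sum(acc);
}
```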

reduce (different type)

wide<T, fixed<N>>
convert

Notes on reduce

sse2-sse4.2 char -> int reduction
load[ignore.else_(x)]
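
One well-known SSE2 way to sum bytes into a wider type is `_mm_sad_epu8` against zero, which adds each group of eight unsigned bytes into a 64-bit lane. Whether the talk used this instruction is an assumption; the sketch below (hypothetical name `sum_bytes`) only illustrates a reduction whose result type is wider than its element type. The tail is handled with a scalar loop for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2

inline std::uint64_t sum_bytes(const std::uint8_t* data, std::size_t n) {
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;  // two 64-bit partial sums
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // |byte - 0| summed over each half of the register.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(chunk, zero));
    }
    std::uint64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    std::uint64_t total = lanes[0] + lanes[1];
    for (; i < n; ++i) total += data[i];  // scalar tail
    return total;
}
```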

inclusive_scan (inplace)

store[ignore](wide, ptr)
scan(wide, op, zero)
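
The in-register scan can be sketched in plain SSE2 with byte shifts: shifting in zeros (the identity for addition) and adding twice yields a prefix sum of four lanes, and the last lane is then broadcast as the carry for the next chunk. This is an illustrative version with a hypothetical name and a scalar tail, not the eve implementation.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2

inline void inclusive_scan_inplace(int* data, std::size_t n) {
    __m128i carry = _mm_setzero_si128();  // running total in every lane
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        x = _mm_add_epi32(x, _mm_slli_si128(x, 4));  // [a, a+b, b+c, c+d]
        x = _mm_add_epi32(x, _mm_slli_si128(x, 8));  // [a, a+b, a+b+c, a+b+c+d]
        x = _mm_add_epi32(x, carry);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(data + i), x);
        carry = _mm_shuffle_epi32(x, _MM_SHUFFLE(3, 3, 3, 3));
    }
    int running = _mm_cvtsi128_si32(carry);
    for (; i < n; ++i) {  // scalar tail
        running += data[i];
        data[i] = running;
    }
}
```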

Notes on inclusive_scan

Z boson (Stack Overflow)
Avoid _mm_maskmoveu_si128

remove

safe/unsafe compress_store[ignore]
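
A deliberately simplified sketch of `remove` follows. The comparison is vectorized with SSE2, but the kept lanes are written out one at a time from the mask; a real compress store would use a shuffle driven by that mask instead. The name `simd_remove` and this scalar compaction step are assumptions made to keep the example short and SSE2-only.

```cpp
#include <emmintrin.h>  // SSE2

// Removes every element equal to `value`; returns the new logical end.
inline int* simd_remove(int* first, int* last, int value) {
    const __m128i needle = _mm_set1_epi32(value);
    int* out = first;
    int* in = first;
    for (; last - in >= 4; in += 4) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
        // One bit per 32-bit lane: set where the lane equals `value`.
        const int remove_mask = _mm_movemask_ps(
            _mm_castsi128_ps(_mm_cmpeq_epi32(chunk, needle)));
        alignas(16) int lanes[4];
        _mm_store_si128(reinterpret_cast<__m128i*>(lanes), chunk);
        for (int lane = 0; lane < 4; ++lane) {
            if (!(remove_mask & (1 << lane))) *out++ = lanes[lane];
        }
    }
    for (; in != last; ++in) {  // scalar tail
        if (*in != value) *out++ = *in;
    }
    return out;
}
```

Because the chunk is loaded before anything is written and `out` never gets ahead of `in`, the function is safe to run in place.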

Notes on remove

Aqrit, Peter Cordes (Stack Overflow)
remove_copy(f, l, o)

General notes

massive speedups
code alignment impact on SIMD
data impact on SIMD
aligned/unaligned memory access

What didn't we cover?

multiple range algorithms
set operations
partition (in place) / reverse
floating point / math
cache effects
gather

Write strlen

Thanks to

Joel Falcou, Jean-Thierry Lapresté

Stack Overflow: @aqrit, @Peter Cordes, @Z boson, @Stephen Canon

eve library links

github.com/jfalcou/eve (discussions/issues)

CppCon 2020: SIMD in C++20: EVE of a new Era

Discord

My videos on SIMD

cpplang slack: jfalcou, dyaroshev

email: joel.falcou@lri.fr, denis.yaroshevskij@gmail.com

twitter: @dyaroshev