My first simd

My first SIMD

Implementing STL Algorithms Using AVX2 Extensions

Denis Yaroshevskiy

So, what's this about?

https://tinyurl.com/y5hn6e3c

Disclaimers

I am NOT an expert in SIMD.
One compiler, one machine.
Everything is x86 specific.
Some of my statements are my opinions.

The Plan

What's SIMD and how to get it?
std::strlen
Some other algorithms.

What's SIMD and how to get it?

x86 vector extensions

SSE 1-3 (Pentium 4)
SSE 4-4.2
AVX (PS4)
AVX-2 (MacBook Air 2015)
AVX-512 (servers only)

Tell your compiler

-march=native
Compile for specific architectures
Runtime selection (hard)

Now get some SIMD code

auto-vectorization
openmp
by hand with intrinsics or Assembly

On using intrinsics

#immintrin.h
Intel Intrinsics Guide

Wrap it!

Jefferson Amstutz talk (CppCon 2018)
documentation
extensibility
type support

Wrap it!

nsimd, tsimd, xsimd, vc
write my own
eve

strlen

Ideas

allocations happen in pages
aligned addresses are safe

strlen code

					
std::size_t strlen(const char* s) {
  using wide = eve::wide<char>;

  const wide zeroes{0};

  aligned_ptr aligned_s =
    previous_aligned_address<sizeof(wide)>(s);
  wide cur{aligned_s};

  ignore_first ignore(s - aligned_s.get());
  std::optional match = first_true(cur == zeroes, ignore);

  while (!match) {
    aligned_s += wide::static_size;
    cur = wide{aligned_s};
    match = first_true(cur == zeroes, ignore_none);
  }

  return static_cast<std::size_t>(aligned_s.get() + *match - s);
}

Dynamic analysis

load_unsafe(ptr, as_<wide>)
__attribute__((no_sanitize_address))

strlen code

					
std::size_t strlen(const char* s) {
  using wide = eve::wide<char>;

  const wide zeroes{0};

  aligned_ptr aligned_s =
    previous_aligned_address<sizeof(wide)>(s);
  wide cur = load_unsafe(aligned_s, as_<wide>{});

  ignore_first ignore(s - aligned_s.get());
  std::optional match = first_true(cur == zeroes, ignore);

  while (!match) {
    aligned_s += wide::static_size;
    cur = load_unsafe(aligned_s, as_<wide>{});
    match = first_true(cur == zeroes, ignore_none);
  }

  return static_cast<std::size_t>(aligned_s.get() + *match - s);
}

Measuring

Google benchmark
code alignment
40, 1000, 10'000 bytes
char, short, int

Notes on find/find_unguarded

Stephen Canon (Stack Overflow)
int* has to be aligned to alignof(int)

reduce

reduce (same type)

replace_ignored(wide, ignore, wide)
reduce_wide(wide, op) -> wide

reduce (different type)

wide<T, fixed<N>>
convert

inclusive_scan

inclusive_scan (in place)

store(ptr, wide, ignore)
inclusive_scan_wide(wide, zero)

Notes on inclusive_scan

Z boson (Stack Overflow)
Avoid _mm_maskmoveu_si128

remove

compress_store_unsafe(wide, T*, logical, ignore) -> T*
compress_store_precise
if_not_

Notes on remove

Peter Cordes (Stack Overflow)
remove_copy(f, l, o)

General notes

massive speed ups
code alignment impact on SIMD
data impact on SIMD
aligned/unaligned memory access

What we didn't cover?

multiple range algorithms
partition (in place) / reverse
floating point / math
cache effects
gather

Don't know how to

merge/set_union
insertion sort
operating on partial structs

Thanks to

Joel Falcou
Peter Cordes

Links

write strlen

Questions?

https://denisyaroshevskiy.github.io/my_first_simd_1_presentation

Dealing with code alignment

__attribute__((always_inline))
-mllvm -align-all-functions=7

Dealing with code alignment


template <typename Slide, /*...*/>
NOINLINE void benmark_body(Slide slide, benchmark::State& state, /* ... */) {
  noop_slide(slide);  // asm volatile("nop");

  for (auto _ : state) {
    // measure this
  }
}

My first SIMD

Implementing STL Algorithms Using AVX2 Extensions

Disclaimers

The Plan

What's SIMD and how to get it?

x86 vector extensions

Tell your compiler

Now get some SIMD code

On using intrinsics

Wrap it!

Wrap it!

strlen

Ideas

strlen code

Dynamic analysis

strlen code

Measuring

Notes on find/find_unguarded

reduce

reduce (same type)

reduce (different type)

inclusive_scan

inclusive_scan (in place)

Notes on inclusive_scan

remove

remove

Notes on remove

General notes

What we didn't cover?

Don't know how to

Thanks to

Links

Questions?

Dealing with code alignment

Dealing with code alignment

std::remove, alignment