std::reduce comparison

std::reduce is one of the algorithms that compilers can autovectorize.
On small sizes Intel is slightly faster most of the time.
However, once the SIMD code gets going, AMD wins by a large margin.
From my point of view, this is a clear win for AMD.
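For reference, a call of the kind being measured might look like the sketch below. It assumes that chart titles such as `char -> int` mean "input element type -> accumulator type", i.e. the type of the `init` argument; `sum_as` is a hypothetical helper, not part of any benchmark code. Unlike std::accumulate, std::reduce is allowed to regroup and reorder the additions, which is what gives the compiler room to vectorize the loop.

```cpp
#include <numeric>  // std::reduce (C++17)
#include <vector>

// Hypothetical helper: sums the elements of `v` into an accumulator of type Acc.
// The type of the init value (Acc{0}) determines the type of every partial sum.
// With a narrow accumulator such as `char`, each partial sum is converted back
// to `char`, so large inputs can wrap around.
template <typename Acc, typename T>
Acc sum_as(const std::vector<T>& v) {
  return std::reduce(v.begin(), v.end(), Acc{0});
}
```

For example, `sum_as<int>(chars)` corresponds to `char -> int` and `sum_as<short>(shorts)` to `short -> short`.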

Code alignment does not have a significant impact on `simd` code:
even in the worst case, which you can see by putting `minmax` instead of `min` in `padding`,
AMD is still better.

NOTE: if a few nanoseconds on smaller sizes are a problem, see `unsq_eve::reduce`.
Since it uses `ignore` instead of a scalar cleanup loop, this issue goes away.
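The difference between the two tail strategies can be sketched in plain scalar C++. This is only an emulation under stated assumptions: `kLanes` stands in for the width of a SIMD register, the two function names are hypothetical, and neither function is the actual std::reduce or `unsq_eve::reduce` code. The first handles leftovers with a separate scalar loop; the second runs the last, partial chunk through the same lane-wise code and masks out the lanes past the end, in the spirit of `ignore`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lane count standing in for a SIMD register width.
constexpr std::size_t kLanes = 8;

// Strategy 1: full chunks lane by lane, then a scalar cleanup loop for the rest.
int sum_scalar_tail(const char* first, const char* last) {
  const std::size_t n = static_cast<std::size_t>(last - first);
  std::array<int, kLanes> acc{};
  std::size_t i = 0;
  for (; i + kLanes <= n; i += kLanes)
    for (std::size_t j = 0; j != kLanes; ++j) acc[j] += first[i + j];
  int total = 0;
  for (int a : acc) total += a;
  for (; i != n; ++i) total += first[i];  // scalar cleanup: up to kLanes - 1 steps
  return total;
}

// Strategy 2: no scalar loop. The final partial chunk goes through the same
// lane-wise code; inactive lanes contribute 0, the identity of +.
// This emulation never reads past `last`; a real SIMD version would typically
// do a full aligned load and then mask the lanes it must ignore.
int sum_masked_tail(const char* first, const char* last) {
  const std::size_t n = static_cast<std::size_t>(last - first);
  std::array<int, kLanes> acc{};
  for (std::size_t i = 0; i < n; i += kLanes)
    for (std::size_t j = 0; j != kLanes; ++j) {
      const bool active = i + j < n;  // the mask: ignore lanes past the end
      acc[j] += active ? first[i + j] : 0;
    }
  int total = 0;
  for (int a : acc) total += a;
  return total;
}
```

For small inputs the scalar cleanup loop can make up most of the work, while the masked version handles them in a single chunk.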

std::reduce (char -> char)

std::reduce (char -> short)

std::reduce (char -> int)

std::reduce (short -> short)

std::reduce (short -> int)

std::reduce (int -> int)