std::transform comparison

Benchmark: x = x + x for every element. This gets autovectorized.
Results are more or less the same in the good case.
AMD is more sensitive to code alignment.
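
A minimal sketch of the kernel described above, written with std::transform. The helper names make_input and double_in_place are hypothetical, and the element type (std::int32_t in the usage note) is an assumption, since the notes only give data sizes in bytes. This is not the benchmark harness itself, only the loop body being measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: builds an input of `bytes` bytes filled with 0, 1, 2, ...
// The element type T is an assumption; the notes only state sizes in bytes.
template <typename T>
std::vector<T> make_input(std::size_t bytes) {
    assert(bytes % sizeof(T) == 0 && "size must be a whole number of elements");
    std::vector<T> v(bytes / sizeof(T));
    std::iota(v.begin(), v.end(), T{0});
    return v;
}

// The kernel: x = x + x for every element, written back in place.
// The cast keeps the result in T when T is narrower than int.
template <typename T>
void double_in_place(std::vector<T>& v) {
    std::transform(v.begin(), v.end(), v.begin(),
                   [](T x) { return static_cast<T>(x + x); });
}
```

For example, make_input<std::int32_t>(40) yields 10 elements, and double_in_place then turns 0, 1, ..., 9 into 0, 2, ..., 18. Whether the loop is actually vectorized depends on the compiler and optimization flags.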

40 bytes of data

1000 bytes

10000 bytes