std::remove comparison

Benchmark: remove 0s from an array. The x axis is the percentage of zeroes. It seems that at big sizes (10'000 elements) AMD is better.
AMD is also better for 1000 chars, which still counts as a 'bigger size':
this is a scalar algorithm, so it depends more on the element count than on the byte count.
For smaller sizes Intel seems better.
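To illustrate the 'scalar' point, here is a minimal sketch of a remove loop in the style of typical std::remove implementations. It is not the actual libstdc++ or libc++ source, and `scalar_remove` and `agrees_with_std_remove` are hypothetical names. The loop handles one element per iteration with one data-dependent branch each, so the work grows with the element count whether the element is a char, a short or an int.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical scalar remove, similar in spirit to common std::remove implementations:
// skip the prefix that contains no matches, then compact the remaining elements.
template <typename ForwardIt, typename T>
ForwardIt scalar_remove(ForwardIt first, ForwardIt last, const T& value) {
    first = std::find(first, last, value);    // nothing to move before the first match
    if (first == last) return first;
    ForwardIt out = first;
    for (ForwardIt it = std::next(first); it != last; ++it) {
        if (!(*it == value)) {                // one data-dependent branch per element
            *out = std::move(*it);
            ++out;
        }
    }
    return out;                               // new logical end of the range
}

// Runs scalar_remove and std::remove on copies of the same input and reports
// whether both keep the same elements in the same order.
template <typename T>
bool agrees_with_std_remove(std::vector<T> a, const T& value) {
    std::vector<T> b = a;
    auto end_a = scalar_remove(a.begin(), a.end(), value);
    auto end_b = std::remove(b.begin(), b.end(), value);
    return std::equal(a.begin(), end_a, b.begin(), end_b);
}
```

The branch in the loop is what makes the percentage of zeroes matter: with very few or very many zeroes it is easy to predict, and in between it is not.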
I didn't look at code alignment in detail; there is just a lot to go through.
I only know that it can be really bad on Intel (about 5x slower) for 10'000 ints.
For AMD I didn't find anything that extreme, but I did see differences of ~30%.
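Some people claim that my benchmarks are overlearned (the branch predictor can just remember the whole input).
I'm not sure: if the predictor remembered everything, the percentage of zeroes should barely matter, yet the difference between 5 and 50 percent of zeroes is big.
But I don't know for certain.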
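One common way to address the 'overlearned' concern is to cycle through a pool of different random inputs that share the same percentage of zeroes, so the branch predictor cannot memorize a single pattern. The sketch below assumes that approach; it is not necessarily how the benchmarks above were produced, and `make_input`, `make_input_pool` and `run_once` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: `count` elements where each one is 0 with probability
// zero_percent / 100, otherwise a non-zero value in [1, 100].
template <typename T>
std::vector<T> make_input(std::size_t count, int zero_percent, std::mt19937& rng) {
    std::bernoulli_distribution is_zero(zero_percent / 100.0);
    std::uniform_int_distribution<int> nonzero(1, 100);
    std::vector<T> v(count);
    for (auto& x : v) x = is_zero(rng) ? T{0} : static_cast<T>(nonzero(rng));
    return v;
}

// Hypothetical helper: a pool of distinct inputs with the same zero percentage.
// Cycling through the pool instead of reusing one array makes it harder for
// the branch predictor to learn the exact positions of the zeroes.
template <typename T>
std::vector<std::vector<T>> make_input_pool(std::size_t pool_size, std::size_t count,
                                            int zero_percent, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::vector<T>> pool;
    pool.reserve(pool_size);
    for (std::size_t i = 0; i < pool_size; ++i)
        pool.push_back(make_input<T>(count, zero_percent, rng));
    return pool;
}

// One iteration of the measured loop: copy the next input from the pool and
// remove the zeroes. Returns the number of kept elements so the result is used.
template <typename T>
std::size_t run_once(const std::vector<std::vector<T>>& pool, std::size_t iteration,
                     std::vector<T>& scratch) {
    const auto& src = pool[iteration % pool.size()];
    scratch.assign(src.begin(), src.end());
    auto end = std::remove(scratch.begin(), scratch.end(), T{0});
    return static_cast<std::size_t>(end - scratch.begin());
}
```

Note that `run_once` also pays for the copy into `scratch`, so a real harness would need to measure that copy separately to isolate the cost of std::remove. If timings with a large pool roughly match timings with a single reused input, memorization is probably not a big factor.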

std::remove (char, 40)

std::remove (short, 40)

std::remove (int, 40)

std::remove (char, 1000)

std::remove (short, 1000)

std::remove (int, 1000)

std::remove (char, 10000)

std::remove (short, 10000)

std::remove (int, 10000)