In C++, this can be achieved portably using the Highway library: https://github.com/google/highway
(Disclaimer: I work at Google, but not on this, and I am writing this in my personal capacity.)
Makes me wonder why IBM never tried to make a GPU.
They sort of did with the Cell https://en.wikipedia.org/wiki/Cell_(processor)