
I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:
$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1
The failure is due to:
typedef __vector unsigned long long uint64x2_p8;
I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the compiler reports 64-bit availability:
$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1
The OpenPOWER manual (6.1. Vector Data Types) has good information on vector data types, but it does not discuss when the vector long long types are available.
What is the availability of __vector unsigned long long? When can I use the typedef?
Answer1:
TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.
It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.
#include <altivec.h>
auto vec32(void) { // compiles with your options: Power4
return vec_splats((int) 1);
}
// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
return vec_splats((long long) 1);
}
(With auto instead of vector long long, the second function compiles and returns its value in two 64-bit integer registers.)
Adding -mvsx lets the second function compile. Using -mcpu=power7 also works, but -mcpu=power6 doesn't.
source + asm on Godbolt (PowerPC64 gcc6.3)
# with auto without VSX:
vec64(): # -O3 -mcpu=power4 -maltivec -mregnames
li %r4,1
li %r3,1
blr
vec64(): # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0: addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
addis %r9,%r2,.LC0@toc@ha
addi %r9,%r9,.LC0@toc@l # TOC-relative addressing for the static constant
lxvd2x %vs34,0,%r9 # VSX vector load of two doublewords
xxpermdi %vs34,%vs34,%vs34,2
blr
.LC0: # in .rodata
.quad 1
.quad 1
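As a way to apply this to the typedef in the question, here is a minimal sketch that gates the 64-bit vector type on the __VSX__ macro, which GCC defines when VSX is enabled (by -mvsx or by -mcpu=power7 and later). The names HAVE_UINT64X2 and splat_u64 are hypothetical, introduced only for illustration, and other compilers may need a different test. Without VSX the code falls back to plain scalar stores, so it also builds with -mcpu=power4 -maltivec.

```cpp
#include <cstring>

#if defined(__VSX__)
#  include <altivec.h>
// altivec.h may define these as macros, which can clash with C++ code.
#  undef vector
#  undef pixel
#  undef bool
typedef __vector unsigned long long uint64x2_p8;
#  define HAVE_UINT64X2 1   // hypothetical feature macro
#else
#  define HAVE_UINT64X2 0   // no VSX: do not declare the typedef
#endif

// Hypothetical helper: writes `value` into both 64-bit slots of out[].
// Uses the vector type only when the compiler says VSX is available.
inline void splat_u64(unsigned long long value, unsigned long long out[2]) {
#if HAVE_UINT64X2
    const uint64x2_p8 v = vec_splats(value);
    std::memcpy(out, &v, sizeof(v));
#else
    out[0] = value;
    out[1] = value;
#endif
}
```

Testing __VSX__ rather than a CPU macro such as _ARCH_PWR7 keeps the guard tied to the feature the typedef actually needs, since VSX can be switched on or off independently of -mcpu.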
And BTW, vec_splats (splat scalar) with a small constant can compile to a single splat-immediate instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.
The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
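The difference between the two intrinsics can be sketched as follows. This is only an illustration: the function names splat_runtime and splat_five are hypothetical, and the AltiVec path is guarded by __ALTIVEC__ with a scalar fallback so the snippet also builds on other targets.

```cpp
#include <cstring>

#if defined(__ALTIVEC__)
#  include <altivec.h>
// altivec.h may define these as macros, which can clash with C++ code.
#  undef vector
#  undef pixel
#  undef bool
#endif

// Broadcast a runtime value into four 32-bit lanes.
// vec_splats accepts any scalar, including a function argument.
inline void splat_runtime(int x, int out[4]) {
#if defined(__ALTIVEC__)
    const __vector signed int v = vec_splats(x);
    std::memcpy(out, &v, sizeof(v));
#else
    for (int i = 0; i < 4; ++i) out[i] = x;
#endif
}

// Broadcast the literal 5 into four 32-bit lanes.
// vec_splat_s32 requires a compile-time constant in the range -16..15.
inline void splat_five(int out[4]) {
#if defined(__ALTIVEC__)
    const __vector signed int v = vec_splat_s32(5);
    std::memcpy(out, &v, sizeof(v));
#else
    for (int i = 0; i < 4; ++i) out[i] = 5;
#endif
}
```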
This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but it gets that wrong (it claims that vec_splats splats a signed byte).