Aug 01, 2008 at 14:00

The other day, a co-worker (Leo Benaducci) and I started a small contest: adding support for the swizzle operator available in shader languages (HLSL, Cg, GLSL) to any standard Vector 2, 3 or 4 class in C++. Something like:

Vector3 a;
Vector3 b;
Vector3 c;

a = b.xyy;
c.yz = b.zz;


About an hour later, we had each come up with a solution, using different approaches: Leo solved it with a template, and I just used a couple of macros. With the VC++ 2005 compiler, both solutions produce optimal assembly, with no overhead at all from the swizzle operator. Here are both versions, with examples showing how to use them.

Leo’s swizzle, with a mini Vector3 class:

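class LVector3
{
public:
    float x,y,z;

    inline LVector3()
    {
        x=y=z=0;
    }

    inline LVector3(float _x, float _y, float _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }

    // srcSwz and dstSwz are multi-character constants such as 'xzy'.
    // Their value is implementation-defined; VC++ packs the letters from the
    // high byte down, so 'xzy' == ('x' << 16) | ('z' << 8) | 'y'.
    template<int srcSwz, int dstSwz>
    __forceinline void _swz(LVector3 v)
    {
        const int _srcSwz = srcSwz;
        const int _dstSwz = dstSwz;
        const char *srcSwizzle = (const char*) &_srcSwz;
        const char *dstSwizzle = (const char*) &_dstSwz;

        int i = 255;
        if(*(char*)&i & 255)
        {
            // little-endian: the first letter is in byte 2, the last in byte 0
            *(&x + (srcSwizzle[2] - 'x')) = *(&v.x + (dstSwizzle[2] - 'x'));
            *(&x + (srcSwizzle[1] - 'x')) = *(&v.x + (dstSwizzle[1] - 'x'));
            *(&x + (srcSwizzle[0] - 'x')) = *(&v.x + (dstSwizzle[0] - 'x'));
        }
        else
        {
            // big-endian: byte 0 is zero and the letters are in bytes 1 to 3
            *(&x + (srcSwizzle[1] - 'x')) = *(&v.x + (dstSwizzle[1] - 'x'));
            *(&x + (srcSwizzle[2] - 'x')) = *(&v.x + (dstSwizzle[2] - 'x'));
            *(&x + (srcSwizzle[3] - 'x')) = *(&v.x + (dstSwizzle[3] - 'x'));
        }
    }
};

// #@ is the VC++-specific charizing operator: b.swz(xzy, a, zyx) expands to
// b._swz<'xzy', 'zyx'>(a), i.e. b.xzy = a.zyx.
#define swz(a, b, c) _swz<#@a, #@c>(b)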


Now you can simply use the operator like this:

LVector3 a;
LVector3 b;
LVector3 c;

b.swz(xzy, a, zyx);
c.swz(xxx, a, zzz); // just to test optimizer!


Looks complex, doesn’t it? But don’t be afraid of using it: the compiler resolves everything at compile time and generates optimal code. The first operation unfolds to three floating-point assignments, and the second to only one.
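For comparison, here is a minimal sketch of the same compile-time idea without multi-character constants or an endianness test. It assumes a C++11 compiler (which postdates this thread), and the names `Vec3`, `swizzle_assign` and `swizzle_assign_demo` are hypothetical, not part of Leo’s class. The component indices are plain template arguments (0 = x, 1 = y, 2 = z), so after inlining each line is a single float copy.

```cpp
#include <cassert>

// Hypothetical minimal 3-component vector; not Leo's LVector3.
struct Vec3
{
    float v[3];
};

// dst.<D0 D1 D2> = src.<S0 S1 S2>, where 0 = x, 1 = y, 2 = z.
// All indices are compile-time constants, and out-of-range ones are
// rejected by static_assert. src is taken by value (as in _swz), so
// writing a swizzle of a vector back into the same vector is safe.
template <int D0, int D1, int D2, int S0, int S1, int S2>
inline void swizzle_assign(Vec3& dst, Vec3 src)
{
    static_assert(D0 >= 0 && D0 < 3 && D1 >= 0 && D1 < 3 && D2 >= 0 && D2 < 3,
                  "destination component out of range");
    static_assert(S0 >= 0 && S0 < 3 && S1 >= 0 && S1 < 3 && S2 >= 0 && S2 < 3,
                  "source component out of range");
    dst.v[D0] = src.v[S0];
    dst.v[D1] = src.v[S1];
    dst.v[D2] = src.v[S2];
}

// The two statements from the usage example above.
inline void swizzle_assign_demo()
{
    Vec3 a = {{1.0f, 2.0f, 3.0f}};
    Vec3 b = {{0.0f, 0.0f, 0.0f}};
    swizzle_assign<0, 2, 1, 2, 1, 0>(b, a);  // b.xzy = a.zyx
    assert(b.v[0] == 3.0f && b.v[1] == 1.0f && b.v[2] == 2.0f);

    Vec3 c = {{1.0f, 2.0f, 3.0f}};
    swizzle_assign<0, 0, 0, 2, 2, 2>(c, a);  // c.xxx = a.zzz
    assert(c.v[0] == 3.0f && c.v[1] == 2.0f && c.v[2] == 3.0f);
}
```

The trade-off is readability: `swizzle_assign<0, 2, 1, 2, 1, 0>` is far less pleasant to write than `xzy`/`zyx`, which is exactly what the string and macro tricks in this thread try to recover.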

My version uses only macros, but it supports any assignment operator (not just copying) and works with any 2-, 3- or 4-component vector class. The only requirement is that the vector class implements operator[] to access its elements.

Enlight’s swizzle:

#define i__(e,c) (((*(c+(char*)(e))-0x77)-1)&3)

#define SW2(src,ss,op,dest,ds)  src[i__(#ss,0)] op dest[i__(#ds,0)]; \
                                src[i__(#ss,1)] op dest[i__(#ds,1)];

#define SW3(src,ss,op,dest,ds)  SW2(src,ss,op,dest,ds)  \
                                src[i__(#ss,2)] op dest[i__(#ds,2)];

#define SW4(src,ss,op,dest,ds)  SW3(src,ss,op,dest,ds)  \
                                src[i__(#ss,3)] op dest[i__(#ds,3)];


Now, use it like this:

Vec4 a(1,2,3,4);
Vec4 b(5,6,7,8);

SW4(a,xyzw,=,b,xxyy);    // a.xyzw = b.xxyy; (a=5,5,6,6)
SW2(b,xy,+=,b,zz);       // b.xy += b.zz; (b=12,13,7,8)


The tricky part is the i__ macro. It converts the characters ‘x’, ‘y’, ‘z’ and ‘w’ to the values 0, 1, 2 and 3. Then I simply use those values to access the vector elements one by one, and let the optimizer do the rest.
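The arithmetic in `i__` is easy to check in isolation. The sketch below restates it as C++11 `constexpr` functions, which were not available to the compilers of 2008. The name `swizzle_index` is hypothetical. Because the functions are `constexpr`, the mapping is verified at compile time: `'x'` (0x78) minus 0x77 minus 1 gives 0, `'y'` gives 1 and `'z'` gives 2. `'w'` (0x77) gives -1, which the `& 3` turns into 3 on two’s-complement machines.

```cpp
// Same arithmetic as the i__ macro, applied to a single character.
constexpr int swizzle_index(char c)
{
    return ((c - 0x77) - 1) & 3;
}

// i__(e, c) reads character number c of the stringized pattern e.
constexpr int swizzle_index(const char* pattern, int pos)
{
    return swizzle_index(pattern[pos]);
}

static_assert(swizzle_index('x') == 0, "x -> 0");
static_assert(swizzle_index('y') == 1, "y -> 1");
static_assert(swizzle_index('z') == 2, "z -> 2");
static_assert(swizzle_index('w') == 3, "w -> 3 (wraps from -1)");

// SW4(a,xyzw,=,b,xxyy) therefore reads b[0], b[0], b[1], b[1].
static_assert(swizzle_index("xxyy", 0) == 0 && swizzle_index("xxyy", 1) == 0 &&
              swizzle_index("xxyy", 2) == 1 && swizzle_index("xxyy", 3) == 1,
              "xxyy -> 0, 0, 1, 1");
```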

Have fun!

Enlight.

#### 23 Replies

103 Aug 01, 2008 at 15:01

Within Leo’s template:

int i = 255;
if(*(char*)&i & 255)
{
// ...
}
else
{
// ...
}


You may want to fix that.

Ciao ciao : )

167 Aug 01, 2008 at 16:18

I believe that if statement is checking the endianness of the machine. You see, it sets an integer to 255 and then checks its first byte, which will be 255 on a little-endian machine and 0 on a big-endian one.

However, the cast should really be to unsigned char, as a signed char can’t hold the value 255. (The bitwise AND makes the test work either way, though: a byte of 0xFF read as a signed char is -1, and -1 & 255 is still 255.)
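Here is a minimal sketch of the same runtime test done through `unsigned char`, as suggested above. It assumes C++11, and the helper names `is_little_endian` and `endian_demo` are hypothetical. Copying the first byte with `std::memcpy` sidesteps both the signedness question and any pointer-cast aliasing concerns. The `static_assert` shows why the original plain-`char` version works anyway.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the lowest-addressed byte of an unsigned
// int holds its least significant bits (little-endian, e.g. x86).
inline bool is_little_endian()
{
    const unsigned int one = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &one, 1);  // copy only the lowest-addressed byte
    return first_byte == 1;
}

// Why the plain-char cast still works: if char is signed, the byte 0xFF
// reads as -1, and -1 & 255 is 255 after promotion to int.
static_assert((static_cast<signed char>(-1) & 255) == 255, "signed byte & 255");

inline void endian_demo()
{
    const std::uint32_t pattern = 0x01020304u;
    unsigned char bytes[sizeof pattern];
    std::memcpy(bytes, &pattern, sizeof pattern);
    // Little-endian stores 0x04 first; big-endian stores 0x01 first.
    assert(is_little_endian() == (bytes[0] == 0x04));
}
```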

103 Aug 01, 2008 at 17:05

I didn’t notice. Right you are.

Ciao ciao : )

101 Aug 01, 2008 at 17:46

Hmm, it looks a bit inefficient with the strings and all. For the second implementation, you should at least put the code of the macros inside a do {…} while(0) block. Otherwise you could get into serious trouble when using things like if statements without braces.
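To illustrate the problem, here is a sketch with a hypothetical two-component swizzle macro written both ways. It uses plain numeric indices instead of `i__`, and `std::array` stands in for a vector class with `operator[]`. The names `SW2_BARE`, `SW2_SAFE` and `sw2_demo` are made up for this sketch. Behind an unbraced `if`, only the first statement of the bare version is guarded, and with an `else` after it, the bare version does not even compile. The wrapped version behaves like a single statement.

```cpp
#include <array>
#include <cassert>

// Stand-in for any vector class with operator[] (an assumption for this sketch).
typedef std::array<float, 4> Vec4;

// No wrapper: expands to two separate statements.
#define SW2_BARE(src, s0, s1, op, dest, d0, d1) \
    src[s0] op dest[d0];                        \
    src[s1] op dest[d1]

// Same statements inside do { ... } while (0): the expansion is one
// statement, so an unbraced if (with or without else) guards all of it.
#define SW2_SAFE(src, s0, s1, op, dest, d0, d1) \
    do {                                        \
        src[s0] op dest[d0];                    \
        src[s1] op dest[d1];                    \
    } while (0)

inline void sw2_demo()
{
    const Vec4 b = {{5.0f, 6.0f, 7.0f, 8.0f}};
    const bool apply = false;

    Vec4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    if (apply)
        SW2_BARE(a, 0, 1, =, b, 2, 2);   // a.xy = b.zz, but only a.x is guarded
    assert(a[0] == 1.0f && a[1] == 7.0f);  // a.y was overwritten anyway

    Vec4 c = {{1.0f, 2.0f, 3.0f, 4.0f}};
    if (apply)
        SW2_SAFE(c, 0, 1, =, b, 2, 2);   // whole swizzle skipped
    else
        c[3] = 0.0f;                     // the else binds to the right if
    assert(c[0] == 1.0f && c[1] == 2.0f && c[3] == 0.0f);
}
```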

My version:

template<int X, int Y, int Z, int W> struct swizzle { };
template<int X> struct swizzle1 : public swizzle<X,X,X,X> { };
template<int X, int Y> struct swizzle2 : public swizzle<X,Y,Y,Y> { };
template<int X, int Y, int Z> struct swizzle3 : public swizzle<X,Y,Z,Z> { };

template<int X, int Y> swizzle2<X,Y> operator,(swizzle1<X>, swizzle1<Y>) { return swizzle2<X,Y>(); }
template<int X, int Y, int Z> swizzle3<X,Y,Z> operator,(swizzle2<X,Y>, swizzle1<Z>) { return swizzle3<X,Y,Z>(); }
template<int X, int Y, int Z, int W> swizzle<X,Y,Z,W> operator,(swizzle3<X,Y,Z>, swizzle1<W>) { return swizzle<X,Y,Z,W>(); }

static swizzle1<0> _x;
static swizzle1<1> _y;
static swizzle1<2> _z;
static swizzle1<3> _w;

template<int X, int Y, int Z, int W> struct SwizzledVector;

struct Vector
{
    float x, y, z, w;

    Vector() { }
    Vector(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) { }

    template<int X, int Y, int Z, int W>
    SwizzledVector<X,Y,Z,W> & operator[](swizzle<X,Y,Z,W>)
    {
        return reinterpret_cast<SwizzledVector<X,Y,Z,W>&>(*this);
    }

    template<int X, int Y, int Z, int W>
    const SwizzledVector<X,Y,Z,W> & operator[](swizzle<X,Y,Z,W>) const
    {
        return reinterpret_cast<const SwizzledVector<X,Y,Z,W>&>(*this);
    }
};

template<int X, int Y, int Z, int W, int I> struct SwizzleSelector;
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,0> { static const int index = X; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,1> { static const int index = Y; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,2> { static const int index = Z; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,3> { static const int index = W; };

template<int X, int Y, int Z, int W>
struct SwizzledVector
{
    template<int I> float & Get() { return reinterpret_cast<float*>(this)[SwizzleSelector<X,Y,Z,W,I>::index]; }
    template<int I> float Get() const { return const_cast<SwizzledVector&>(*this).Get<I>(); }
    template<int I> float & Get(swizzle1<I>) { return Get<I>(); }
    template<int I> float Get(swizzle1<I>) const { return Get<I>(); }

    SwizzledVector & operator=(const Vector & v)
    {
        Get<0>() = v.x;
        Get<1>() = v.y;
        Get<2>() = v.z;
        Get<3>() = v.w;
        return *this;
    }

    // Assignment from a differently swizzled vector. The source goes through
    // a temporary Vector, so swizzles of the same vector (e.g.
    // a[_y,_x] = a[_x,_y]) don't overwrite components before reading them.
    template<int X2, int Y2, int Z2, int W2>
    SwizzledVector & operator=(const SwizzledVector<X2,Y2,Z2,W2> & v)
    {
        return *this = Vector(v);
    }

    // Same-swizzle assignment: without this, the implicit copy assignment of
    // this empty struct would be chosen, and it copies nothing.
    SwizzledVector & operator=(const SwizzledVector & v)
    {
        return *this = Vector(v);
    }

    operator Vector() const
    {
        return Vector(Get<0>(), Get<1>(), Get<2>(), Get<3>());
    }
};

// usage
int main()
{
    Vector a(1, 2, 3, 4);
    Vector b = a[_w,_y,_x,_w];  // (4, 2, 1, 4)
    Vector c = a[_z];           // (3, 3, 3, 3)

    c[_x,_z,_y,_w] = a;         // (1, 3, 2, 4)
    c[_y,_w] = a[_x];           // (*, 1, *, 1)
}


Of course, you could predefine combinations like _xxy to get rid of the commas. The nice thing about this implementation is that, because it uses compile-time constants, you could even use compiler intrinsics to permute actual SSE registers.
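As a rough illustration of that last point, here is a sketch that turns four compile-time component indices into a single `_mm_shuffle_ps`. It targets x86 with SSE only and assumes C++11, and `permute` and `permute_demo` are hypothetical names. It uses the same index convention as `swizzle<X,Y,Z,W>` (0 = x through 3 = w). Because `_mm_shuffle_ps` needs its mask as a compile-time constant, template arguments are a natural fit.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _MM_SHUFFLE, _mm_setr_ps, _mm_storeu_ps

// Hypothetical helper: returns (v[X], v[Y], v[Z], v[W]) with one shufps.
// _MM_SHUFFLE lists the lane selectors from lane 3 down to lane 0, so the
// selector for lane 0 (X) goes last.
template <int X, int Y, int Z, int W>
inline __m128 permute(__m128 v)
{
    static_assert(X >= 0 && X < 4 && Y >= 0 && Y < 4 && Z >= 0 && Z < 4 && W >= 0 && W < 4,
                  "component index out of range");
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(W, Z, Y, X));
}

inline void permute_demo()
{
    const __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);  // x, y, z, w
    float out[4];
    _mm_storeu_ps(out, permute<3, 1, 0, 3>(a));            // a.wyxw
    assert(out[0] == 4.0f && out[1] == 2.0f && out[2] == 1.0f && out[3] == 4.0f);
}
```

In `Vector`/`SwizzledVector` terms, `permute<3, 1, 0, 3>` corresponds to `a[_w,_y,_x,_w]`.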

101 Aug 01, 2008 at 18:28

Not inefficient at all: check the assembler output. Write the code however you like; the assembly output is perfect in both cases, although I haven’t tried your code…

101 Aug 01, 2008 at 18:33

I’m not sure why you say that; check the generated asm:

    __asm int 3
0040107C  int         3
b.swz(xzy, a, zyx);
0040107D  fld         dword ptr [esp+1Ch]
00401081  fstp        dword ptr [esp+8]
00401085  fld         dword ptr [esp+18h]
00401089  fstp        dword ptr [esp+10h]
0040108D  fld         dword ptr [esp+14h]
00401091  fstp        dword ptr [esp+0Ch]
__asm int 3
00401095  int         3


Do you think there is a faster way to do this?

101 Aug 01, 2008 at 18:52

I like Leo’s solution best; it seems clearer for the programmer, at least for me (a noob one :)), and more object-oriented.

Greetings!

101 Aug 01, 2008 at 19:07

They both suffer from the fact that they rely on strings:

b.swz(aaa, a, bbb);
SW4(a,beer,=,b,good);

Both statements will compile just fine, and crash at runtime.

101 Aug 01, 2008 at 19:16

Sorry, but both versions can be made to fire compile-time asserts.

167 Aug 01, 2008 at 20:33

Kenneth’s right; there’s no compile-time protection against using letters outside the ‘w’ to ‘z’ range. Although with the Enlight version, due to the ‘& 3’ in the macro, any other letters will get silently remapped into the ‘w’-‘z’ range; that’s slightly better than the Leo version, in which other letters will result in runtime out-of-bounds array accesses.

To fix this, you could add compile-time asserts that each character is within the expected range. Here is a bit of code to do a compile-time assert (from boost):

template <bool> struct STATIC_ASSERTION_FAILURE;
template <> struct STATIC_ASSERTION_FAILURE<true> {};

#define STATIC_ASSERT(f) \
sizeof(STATIC_ASSERTION_FAILURE<(bool)(f)>);


(The real one is slightly more complicated, as it is designed to work outside of a function scope, but that gives you the idea.)
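Since C++11, `static_assert` is part of the language, so the Boost-style trick is no longer needed. Here is a minimal sketch of how a macro that stringizes its pattern (as `SW4` does with `#ss`) could reject `beer` or `good` at compile time, instead of silently remapping them or reading out of bounds. The names `is_swizzle_char`, `valid_swizzle` and `CHECK_SWIZZLE` are hypothetical, and the recursion is there because C++11 `constexpr` functions cannot contain loops.

```cpp
// True for the four component letters.
constexpr bool is_swizzle_char(char c)
{
    return c == 'x' || c == 'y' || c == 'z' || c == 'w';
}

// True when the NUL-terminated pattern has 1 to 4 characters and every
// character is a component letter.
constexpr bool valid_swizzle(const char* p, int len = 0)
{
    return *p == '\0' ? (len >= 1 && len <= 4)
                      : (is_swizzle_char(*p) && valid_swizzle(p + 1, len + 1));
}

// Stringizes its argument and fails compilation on a bad pattern.
#define CHECK_SWIZZLE(s) static_assert(valid_swizzle(#s), "invalid swizzle: " #s)

CHECK_SWIZZLE(xyzw);
CHECK_SWIZZLE(zyx);
static_assert(!valid_swizzle("beer"), "beer is rejected");
static_assert(!valid_swizzle("good"), "good is rejected");
static_assert(!valid_swizzle("xyzwx"), "five letters is too long");
```

Because `static_assert` is also allowed at block scope, a line like `CHECK_SWIZZLE(ss);` could in principle be placed at the start of the statements a macro such as `SW2` expands to. Leo’s multi-character-constant version would need a different check on the integer value instead.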

101 Aug 01, 2008 at 21:30

Here it is, with out-of-bounds protection:

#define OUT_OF_BOUNDS(a, b, c)  ((a)<b || (a)>c)

#define UN_OP(name, op) template<int srcSwz, int dstSwz>  \
__forceinline void _##name(LVector3 v)  \
{                                   \
const int _srcSwz = srcSwz;     \
const int _dstSwz = dstSwz;     \
const char *srcSwizzle = (char*) &_srcSwz;  \
const char *dstSwizzle = (char*) &_dstSwz;  \
\
if(OUT_OF_BOUNDS(*(srcSwizzle+0), 'x', 'z'))        \
return;                                         \
if(OUT_OF_BOUNDS(*(dstSwizzle+0), 'x', 'z'))        \
return;                                         \
if(OUT_OF_BOUNDS(*(srcSwizzle+1), 'x', 'z'))        \
return;                                         \
if(OUT_OF_BOUNDS(*(dstSwizzle+1), 'x', 'z'))        \
return;                                         \
if(OUT_OF_BOUNDS(*(srcSwizzle+2), 'x', 'z'))        \
return;                                         \
if(OUT_OF_BOUNDS(*(dstSwizzle+2), 'x', 'z'))        \
return;                                         \
\
int i = 255;                                        \
if(*(char*)&i & 255)                                \
{                                                   \
*(&x + (srcSwizzle[2] - 'x')) op *(&v.x + (dstSwizzle[2] - 'x'));   \
*(&x + (srcSwizzle[1] - 'x')) op *(&v.x + (dstSwizzle[1] - 'x'));   \
*(&x + (srcSwizzle[0] - 'x')) op *(&v.x + (dstSwizzle[0] - 'x'));   \
}                                                   \
else                                                \
{                                                   \
*(&x + (srcSwizzle[0] - 'x')) op *(&v.x + (dstSwizzle[0] - 'x'));   \
*(&x + (srcSwizzle[1] - 'x')) op *(&v.x + (dstSwizzle[1] - 'x'));   \
*(&x + (srcSwizzle[2] - 'x')) op *(&v.x + (dstSwizzle[2] - 'x'));   \
}                                                   \
}


and the asm

    __asm int 3
0040103A  int         3
b.mul(feb, a, zxy);
__asm int 3
0040103B  int         3

101 Aug 02, 2008 at 10:24

and the asm

    __asm int 3
0040103A  int         3
b.mul(feb, a, zxy);
__asm int 3
0040103B  int         3


Clearly something has gone wrong here… :)

Anyway, I tried to create some kind of swizzle support for a float4 class I was working on, but didn’t have much luck. After seeing this thread, I decided to go back and give it another go, and finally succeeded. Here it is, in its entirety:

#pragma once
#include <xmmintrin.h>

struct __declspec(align(16)) float4
{
private:
template <unsigned mask = ((3 << 6) | (2 << 4) | (1 << 2) | 0)>
struct swizzle_proxy
{
__m128 &ref;
swizzle_proxy(__m128 &ref)
: ref(ref)
{ }

__m128 get_swizzled() const { return _mm_shuffle_ps(ref, ref, mask); }
swizzle_proxy& operator = (const float4 &other);

template <unsigned other_mask>
swizzle_proxy& operator = (const swizzle_proxy<other_mask> &other)
{
__m128 data = other.get_swizzled();
ref = _mm_shuffle_ps(data, data, mask);
return *this;
}
};

public:
float4()
{ }

float4(const float4 &other)
: x(other.x)
, y(other.y)
, z(other.z)
, w(other.w)
{ }

explicit float4(const __m128 _xmm)
: xmm(_xmm)
{ }

float4(float a, float b, float c, float d)
: x(a)
, y(b)
, z(c)
, w(d)
{ }

template <unsigned mask>
float4(const swizzle_proxy<mask> &other)
: xmm(other.get_swizzled())
{ }

float4 operator + (const float4 &other) const       { return float4(_mm_add_ps(xmm, other.xmm)); }
float4 operator - (const float4 &other) const       { return float4(_mm_sub_ps(xmm, other.xmm)); }
float4 operator * (const float4 &other) const       { return float4(_mm_mul_ps(xmm, other.xmm)); }
float4 operator / (const float4 &other) const       { return float4(_mm_div_ps(xmm, other.xmm)); }
float4 operator & (const float4 &other) const       { return float4(_mm_and_ps(xmm, other.xmm)); }
float4 operator | (const float4 &other) const       { return float4(_mm_or_ps(xmm, other.xmm)); }
float4 operator ^ (const float4 &other) const       { return float4(_mm_xor_ps(xmm, other.xmm)); }
float4 andnot(const float4 &other) const            { return float4(_mm_andnot_ps(xmm, other.xmm)); } // "~this & other"

float4 operator + (float f) const                   { return float4(_mm_add_ps(xmm, _mm_set_ps1(f))); }
float4 operator - (float f) const                   { return float4(_mm_sub_ps(xmm, _mm_set_ps1(f))); }
float4 operator * (float f) const                   { return float4(_mm_mul_ps(xmm, _mm_set_ps1(f))); }
float4 operator / (float f) const                   { return float4(_mm_div_ps(xmm, _mm_set_ps1(f))); }
float4 operator & (float f) const                   { return float4(_mm_and_ps(xmm, _mm_set_ps1(f))); }
float4 operator | (float f) const                   { return float4(_mm_or_ps(xmm, _mm_set_ps1(f))); }
float4 operator ^ (float f) const                   { return float4(_mm_xor_ps(xmm, _mm_set_ps1(f))); }
float4 andnot(float f) const                        { return float4(_mm_andnot_ps(xmm, _mm_set_ps1(f))); } // "~this & f"

template <unsigned mask>
float4& operator  = (const swizzle_proxy<mask> &other)            { xmm = other.get_swizzled(); return *this; }

float4& operator  = (const float4 &other)           { xmm = other.xmm; return *this; }
float4& operator += (const float4 &other)           { xmm = _mm_add_ps(xmm, other.xmm); return *this; }
float4& operator -= (const float4 &other)           { xmm = _mm_sub_ps(xmm, other.xmm); return *this; }
float4& operator *= (const float4 &other)           { xmm = _mm_mul_ps(xmm, other.xmm); return *this; }
float4& operator /= (const float4 &other)           { xmm = _mm_div_ps(xmm, other.xmm); return *this; }
float4& operator &= (const float4 &other)           { xmm = _mm_and_ps(xmm, other.xmm); return *this; }
float4& operator |= (const float4 &other)           { xmm = _mm_or_ps(xmm, other.xmm); return *this; }
float4& operator ^= (const float4 &other)           { xmm = _mm_xor_ps(xmm, other.xmm); return *this; }
float4& andnot_asg(const float4 &other)             { xmm = _mm_andnot_ps(xmm, other.xmm); return *this; } // "this = ~this & other"

float4& operator += (float f)                       { xmm = _mm_add_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator -= (float f)                       { xmm = _mm_sub_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator *= (float f)                       { xmm = _mm_mul_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator /= (float f)                       { xmm = _mm_div_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator &= (float f)                       { xmm = _mm_and_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator |= (float f)                       { xmm = _mm_or_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& operator ^= (float f)                       { xmm = _mm_xor_ps(xmm, _mm_set_ps1(f)); return *this; }
float4& andnot_asg(float f)                         { xmm = _mm_andnot_ps(xmm, _mm_set_ps1(f)); return *this; } // "this = ~this & f"

friend float4 operator / (float f, const float4 &a) { return float4(_mm_div_ps(_mm_set_ps1(f), a.xmm)); } // f / a, per component

friend float4 sqrt(const float4 &a)                                     { return float4(_mm_sqrt_ps(a.xmm)); }
friend float4 rcp(const float4 &a)                                      { return float4(_mm_rcp_ps(a.xmm)); }
friend float4 rsqrt(const float4 &a)                                    { return float4(_mm_rsqrt_ps(a.xmm)); }
friend float4 min(const float4 &a, const float4 &b)                     { return float4(_mm_min_ps(a.xmm, b.xmm)); }
friend float4 max(const float4 &a, const float4 &b)                     { return float4(_mm_max_ps(a.xmm, b.xmm)); }
friend float4 dot(const float4 &a, const float4 &b)                     { return horizontal_add(a*b); }
friend float4 length(const float4 &a)                                   { return sqrt(dot(a, a)); }
friend float4 rlength(const float4 &a)                                  { return rsqrt(dot(a, a)); }
friend float4 normalize(const float4 &a)                                { return a * rlength(a); }
friend float4 distance(const float4 &a, const float4 &b)                { return length(a-b); }
friend float4 clamp(const float4 &x, const float4 &a, const float4 &b)  { return max(a, min(b, x)); }

friend float4 cross(const float4 &a, const float4 &b)
{
enum
{
shuf_yzxw = _MM_SHUFFLE(3, 0, 2, 1),
shuf_zxyw = _MM_SHUFFLE(3, 1, 0, 2)
};

__m128 left  = _mm_mul_ps(_mm_shuffle_ps(a.xmm, a.xmm, shuf_yzxw), _mm_shuffle_ps(b.xmm, b.xmm, shuf_zxyw));
__m128 right = _mm_mul_ps(_mm_shuffle_ps(a.xmm, a.xmm, shuf_zxyw), _mm_shuffle_ps(b.xmm, b.xmm, shuf_yzxw));
#if 0
return float4(_mm_add_ps(_mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f), _mm_sub_ps(left, right)));
#else
return float4(_mm_sub_ps(left, right)); // .w equals zero
#endif
}

// NewtonRaphson Reciprocal
// [2 * rcpps(a) - (a * rcpps(a) * rcpps(a))]
friend float4 rcp_nr(const float4 &a)
{
float4 ra0 = rcp(a);
return (ra0 + ra0) - (a * ra0 * ra0);
}

template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> shuffle()
{
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
return sw;
}

public:
union
{
struct { float x,y,z,w; };
struct { float r,g,b,a; };
__m128 xmm;
};
};

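template <unsigned mask>
float4::swizzle_proxy<mask>& float4::swizzle_proxy<mask>::operator = (const float4 &other)
{
ref = _mm_shuffle_ps(other.xmm, other.xmm, mask);
return *this;
}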

// Test-defines
#define xyzw shuffle<0,1,2,3>()
#define wzyx shuffle<3,2,1,0>()
#define xyxy shuffle<0,1,0,1>()
#define yzyx shuffle<1,2,1,0>()

#define xxxx shuffle<0,0,0,0>()
#define yyyy shuffle<1,1,1,1>()
#define zzzz shuffle<2,2,2,2>()
#define wwww shuffle<3,3,3,3>()


Using the supplied defines, it is now possible to write code like this (which is pretty close to Cg):

float4 float4_test()
{
float4 f1 = float4(1,2,3,4);
printf("'f1' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);

float4 f3;
f3.wzyx = f1;
printf("'f3.wzyx = f1' : %f, %f, %f, %f\n", f3.x, f3.y, f3.z, f3.w);

float4 f2 = f1.yyyy;
printf("'f2 = f1.yyyy' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);

f2.wzyx = f3.xyxy;
printf("'f2.wzyx = f3.xyxyx' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);

float4 f4 = f2 + f1.wzyx;
f1 = f1.wzyx;
printf("'f1.wzyx' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
printf("'f2.xyzw' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
printf("'f4 = f2 + f1.wzyx' : %f, %f, %f, %f\n", f4.x, f4.y, f4.z, f4.w);

float4 f5 = f4 * f2.yzyx;
printf("f5 = 'f4 * f2.yzyx' : %f, %f, %f, %f\n", f5.x, f5.y, f5.z, f5.w);

return f5;
}


which results in the following output:

‘f1’ : 1.000000, 2.000000, 3.000000, 4.000000
‘f3.wzyx = f1’ : 4.000000, 3.000000, 2.000000, 1.000000
‘f2 = f1.yyyy’ : 2.000000, 2.000000, 2.000000, 2.000000
‘f2.wzyx = f3.xyxyx’ : 3.000000, 4.000000, 3.000000, 4.000000
‘f1.wzyx’ : 4.000000, 3.000000, 2.000000, 1.000000
‘f2.xyzw’ : 3.000000, 4.000000, 3.000000, 4.000000
‘f4 = f2 + f1.wzyx’ : 7.000000, 7.000000, 5.000000, 5.000000
f5 = ‘f4 * f2.yzyx’ : 28.000000, 21.000000, 20.000000, 15.000000
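One caveat worth checking in the proxy assignment: shuffling the source by the destination’s own mask is only correct when that mask is its own inverse, as `wzyx` and the identity are. For a write mask such as `yzxw`, `v.yzxw = s` must store `s.x` into `v.y`, `s.y` into `v.z` and `s.z` into `v.x`. That is the shuffle by the *inverse* permutation (`zxyw`). Here is a minimal sketch, assuming C++11, of computing that inverse at compile time on the same 8-bit layout as `shuffle<a,b,c,d>()` (two bits per lane, lane 0 in the low bits). The helpers `lane` and `inverse_shuffle_mask` are hypothetical, and they only make sense for masks with no repeated component, which is also the condition for a swizzle to be writable.

```cpp
// Selector stored for lane i of a shuffle mask laid out like
// shuffle<a,b,c,d>(): two bits per lane, lane 0 in the lowest bits.
constexpr unsigned lane(unsigned mask, unsigned i)
{
    return (mask >> (2 * i)) & 3u;
}

// Inverse permutation of a mask with no repeated components:
// if lane i of 'mask' reads component k, lane k of the result reads i.
constexpr unsigned inverse_shuffle_mask(unsigned mask)
{
    return (0u << (2 * lane(mask, 0))) | (1u << (2 * lane(mask, 1))) |
           (2u << (2 * lane(mask, 2))) | (3u << (2 * lane(mask, 3)));
}

// wzyx = shuffle<3,2,1,0> = 27 (0000001bH) is its own inverse...
static_assert(inverse_shuffle_mask(27u) == 27u, "wzyx is an involution");
// ...but yzxw = shuffle<1,2,0,3> = 201 is not: writing through it needs
// zxyw = shuffle<2,0,1,3> = 210.
static_assert(inverse_shuffle_mask(201u) == 210u, "inverse of yzxw is zxyw");
static_assert(inverse_shuffle_mask(210u) == 201u, "and vice versa");
```

In `swizzle_proxy`, the write would then become something like `_mm_shuffle_ps(data, data, inverse_shuffle_mask(mask))`. This is an untested suggestion; it relies on `mask` being a template argument, so the call is a constant expression.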

And finally, the generated assembly (without all the printf calls):

; 6    :    float4 f1 = float4(1,2,3,4);

fld1
fstp    DWORD PTR _f1$[esp+16]
fld     DWORD PTR __real@40000000
fstp    DWORD PTR _f1$[esp+20]
fld     DWORD PTR __real@40400000
fstp    DWORD PTR _f1$[esp+24]
fld     DWORD PTR __real@40800000
fstp    DWORD PTR _f1$[esp+28]

; 7    :    //printf("'f1' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
; 8    :
; 9    :    float4 f3;
; 10   :    f3.wzyx = f1;

movaps  xmm1, XMMWORD PTR _f1$[esp+16]
shufps  xmm1, xmm1, 27                  ; 0000001bH

; 11   :    //printf("'f3.wzyx = f1' : %f, %f, %f, %f\n", f3.x, f3.y, f3.z, f3.w);
; 12   :
; 13   :    float4 f2 = f1.yyyy;
; 14   :    //printf("'f2 = f1.yyyy' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 15   :
; 16   :    f2.wzyx = f3.xyxy;

movaps  xmm0, xmm1
shufps  xmm0, xmm1, 68                  ; 00000044H
shufps  xmm0, xmm0, 27                  ; 0000001bH

; 17   :    //printf("'f2.wzyx = f3.xyxyx' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 18   :
; 19   :    float4 f4 = f2 + f1.wzyx;
; 20   :    f1 = f1.wzyx;
; 21   :    //printf("'f1.wzyx' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
; 22   :    //printf("'f2.xyzw' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 23   :    //printf("'f4 = f2 + f1.wzyx' : %f, %f, %f, %f\n", f4.x, f4.y, f4.z, f4.w);
; 24   :
; 25   :    float4 f5 = f4 * f2.yzyx;

movaps  xmm2, xmm0
shufps  xmm2, xmm0, 25                  ; 00000019H
addps   xmm1, xmm0
mulps   xmm2, xmm1
movaps  XMMWORD PTR [eax], xmm2

; 26   :    //printf("f5 = 'f4 * f2.yzyx' : %f, %f, %f, %f\n", f5.x, f5.y, f5.z, f5.w);
; 27   :
; 28   :    return f5;
; 29   : }

101 Aug 02, 2008 at 15:38

A small addendum to the above: instead of the single ‘shuffle’ function, there should be two, a read-only and a read-write version.

template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> rw_shuffle()
{
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
return sw;
}

template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
const swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> ro_shuffle()
{
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
return sw;
}

...

#define xyzw rw_shuffle<0,1,2,3>()
#define wzyx rw_shuffle<3,2,1,0>()
#define xyxy ro_shuffle<0,1,0,1>()
#define yzyx ro_shuffle<1,2,1,0>()

#define xxxx ro_shuffle<0,0,0,0>()
#define yyyy ro_shuffle<1,1,1,1>()
#define zzzz ro_shuffle<2,2,2,2>()
#define wwww ro_shuffle<3,3,3,3>()

This way you can’t accidentally perform operations like ‘v1.xyxy = float4(1234)’.

101 Aug 03, 2008 at 07:04

@Kenneth Gorking

Clearly something has gone wrong here…

No, it is fine! It should NOT compile anything, because it has bad syntax:

b.mul(feb, a, zxy); // b.feb *= a.zxy;

What is “feb”? Nothing, so it doesn’t do anything, hehehe.

Now, about your work on the swizzle operator: I’m just amazed, I didn’t imagine someone would get *that* far… I haven’t tried it out yet, but it looks amazing. Great work, especially for using 128-bit registers.

101 Aug 03, 2008 at 11:10

@Enlight

No, it is fine! It should NOT compile anything, because it has bad syntax:

b.mul(feb, a, zxy); // b.feb *= a.zxy;

What is “feb”? Nothing, so it doesn’t do anything, hehehe.

Oh yeah, I missed that part :) Instead of just doing nothing, maybe you should use a compile-time assert to alert the user to his mistake?

@Enlight

Now, about your work on the swizzle operator: I’m just amazed, I didn’t imagine someone would get *that* far… I haven’t tried it out yet, but it looks amazing. Great work, especially for using 128-bit registers.

Thanks :)

101 Aug 03, 2008 at 12:10

Btw, it’s pretty pointless to make template integer arguments const :)

101 Aug 07, 2008 at 15:39

I have also implemented the swizzle operator in my math library (glm.g-truc.net). My implementation is based on a third-party class that only contains references, and it follows GLSL syntax, so we can write something like this:

vec4 v1(1, 2, 3, 4);
vec4 v2(1);
v2.yzx = v1.xyz + v1.yzx;

Here are some details of the implementation… it was annoying to do because of the #defines like yzx that wrap function calls. I have no SSE optimization for this yet.
I will definitely come back to this post to have a closer look at your implementation! :) Enjoy:

namespace glm{
namespace detail{

template <typename T>
class _xref4
{
public:
    _xref4(T& x, T& y, T& z, T& w);

    _xref4<T>& operator= (const _xref4<T>& r);
    _xref4<T>& operator+=(const _xref4<T>& r);
    _xref4<T>& operator-=(const _xref4<T>& r);
    _xref4<T>& operator*=(const _xref4<T>& r);
    _xref4<T>& operator/=(const _xref4<T>& r);

    _xref4<T>& operator= (const _xvec4<T>& v);
    _xref4<T>& operator+=(const _xvec4<T>& v);
    _xref4<T>& operator-=(const _xvec4<T>& v);
    _xref4<T>& operator*=(const _xvec4<T>& v);
    _xref4<T>& operator/=(const _xvec4<T>& v);

    T& x;
    T& y;
    T& z;
    T& w;
};

} //namespace detail
} //namespace glm

namespace glm{
namespace detail{

template <typename T>
class _cvec4
{
public:
    typedef T value_type;
    typedef int size_type;
    static const size_type value_size;

    ////////////////
    // Components //
#ifndef GLM_SINGLE_COMP_NAME
#if GLM_FORCE_HALF_COMPATIBILITY || (defined(GLM_COMPILER) && (GLM_COMPILER & GLM_COMPILER_VC) && (GLM_COMPILER <= GLM_COMPILER_VC71))
    union
    {
        struct{T x, y, z, w;};
        struct{T r, g, b, a;};
        struct{T s, t, p, q;};
    };
#else
    union{T x, r, s;};
    union{T y, g, t;};
    union{T z, b, p;};
    union{T w, a, q;};
#endif
#else
    T x, y;
#endif//GLM_SINGLE_COMP_NAME
    // Components //
    ////////////////

    const T* _address() const{return (T*)(this);}
    T* _address(){return (T*)(this);}

    // Constructor
    _cvec4(){}
    _cvec4(const T x, const T y, const T z, const T w);

    // Accesses
    T& operator[](size_type i);
    T operator[](size_type i) const;
#if (!defined(GLM_AUTO_CAST) || (GLM_AUTO_CAST == GLM_ENABLE))
    operator T*();
    operator const T*() const;
#endif//GLM_AUTO_CAST

#if defined(GLM_SWIZZLE)
    // Left hand side 2 components common swizzle operators
    _xref2<T> _yx(); _xref2<T> _zx(); _xref2<T> _wx();
    _xref2<T> _xy(); _xref2<T> _zy(); _xref2<T> _wy();
    _xref2<T> _xz(); _xref2<T> _yz(); _xref2<T> _wz();
    _xref2<T> _xw(); _xref2<T> _yw(); _xref2<T> _zw();

    // Right hand side 2 components common swizzle operators
    const _xvec2<T> _xx() const; const _xvec2<T> _yx() const; const _xvec2<T> _zx() const; const _xvec2<T> _wx() const;
    const _xvec2<T> _xy() const; const _xvec2<T> _yy() const; const _xvec2<T> _zy() const; const _xvec2<T> _wy() const;
    const _xvec2<T> _xz() const; const _xvec2<T> _yz() const; const _xvec2<T> _zz() const; const _xvec2<T> _wz() const;
    const _xvec2<T> _xw() const; const _xvec2<T> _yw() const; const _xvec2<T> _zw() const; const _xvec2<T> _ww() const;

    // Left hand side 3 components common swizzle operators
    _xref3<T> _zyx(); _xref3<T> _wyx(); _xref3<T> _yzx(); _xref3<T> _wzx(); _xref3<T> _ywx(); _xref3<T> _zwx();
    _xref3<T> _zxy(); _xref3<T> _wxy(); _xref3<T> _xzy(); _xref3<T> _wzy(); _xref3<T> _xwy(); _xref3<T> _zwy();
    _xref3<T> _yxz(); _xref3<T> _wxz(); _xref3<T> _xyz(); _xref3<T> _wyz(); _xref3<T> _xwz(); _xref3<T> _ywz();
    _xref3<T> _yxw(); _xref3<T> _zxw(); _xref3<T> _xyw(); _xref3<T> _zyw(); _xref3<T> _xzw(); _xref3<T> _yzw();

    // Right hand side 3 components common swizzle operators
    const _xvec3<T> _xxx() const; const _xvec3<T> _yxx() const; const _xvec3<T> _zxx() const; const _xvec3<T> _wxx() const;
    const _xvec3<T> _xyx() const; const _xvec3<T> _yyx() const; const _xvec3<T> _zyx() const; const _xvec3<T> _wyx() const;
    const _xvec3<T> _xzx() const; const _xvec3<T> _yzx() const; const _xvec3<T> _zzx() const; const _xvec3<T> _wzx() const;
    const _xvec3<T> _xwx() const; const _xvec3<T> _ywx() const; const _xvec3<T> _zwx() const; const _xvec3<T> _wwx() const;
    const _xvec3<T> _xxy() const; const _xvec3<T> _yxy() const; const _xvec3<T> _zxy() const; const _xvec3<T> _wxy() const;
    const _xvec3<T> _xyy() const; const _xvec3<T> _yyy() const; const _xvec3<T> _zyy() const; const _xvec3<T> _wyy() const;
    const _xvec3<T> _xzy() const; const _xvec3<T> _yzy() const; const _xvec3<T> _zzy() const; const _xvec3<T> _wzy() const;
    const _xvec3<T> _xwy() const; const _xvec3<T> _ywy() const; const _xvec3<T> _zwy() const; const _xvec3<T> _wwy() const;
    const _xvec3<T> _xxz() const; const _xvec3<T> _yxz() const; const _xvec3<T> _zxz() const; const _xvec3<T> _wxz() const;
    const _xvec3<T> _xyz() const; const _xvec3<T> _yyz() const; const _xvec3<T> _zyz() const; const _xvec3<T> _wyz() const;
    const _xvec3<T> _xzz() const; const _xvec3<T> _yzz() const; const _xvec3<T> _zzz() const; const _xvec3<T> _wzz() const;
    const _xvec3<T> _xwz() const; const _xvec3<T> _ywz() const; const _xvec3<T> _zwz() const; const _xvec3<T> _wwz() const;
    const _xvec3<T> _xxw() const; const _xvec3<T> _yxw() const; const _xvec3<T> _zxw() const; const _xvec3<T> _wxw() const;
    const _xvec3<T> _xyw() const; const _xvec3<T> _yyw() const; const _xvec3<T> _zyw() const; const _xvec3<T> _wyw() const;
    const _xvec3<T> _xzw() const; const _xvec3<T> _yzw() const; const _xvec3<T> _zzw() const; const _xvec3<T> _wzw() const;
    const _xvec3<T> _xww() const; const _xvec3<T> _yww() const; const _xvec3<T> _zww() const; const _xvec3<T> _www() const;

    // Left hand side 4 components common swizzle operators
    _xref4<T> _wzyx(); _xref4<T> _zwyx(); _xref4<T> _wyzx(); _xref4<T> _ywzx(); _xref4<T> _zywx(); _xref4<T> _yzwx();
    _xref4<T> _wzxy(); _xref4<T> _zwxy(); _xref4<T> _wxzy(); _xref4<T> _xwzy(); _xref4<T> _zxwy(); _xref4<T> _xzwy();
    _xref4<T> _wyxz(); _xref4<T> _ywxz(); _xref4<T> _wxyz(); _xref4<T> _xwyz(); _xref4<T> _yxwz(); _xref4<T> _xywz();
    _xref4<T> _zyxw(); _xref4<T> _yzxw(); _xref4<T> _zxyw(); _xref4<T> _xzyw(); _xref4<T> _yxzw(); _xref4<T> _xyzw();

    // Right hand side 4 components common swizzle operators
    const _xvec4<T> _xxxx() const; const _xvec4<T> _yxxx() const; const _xvec4<T> _zxxx() const; const _xvec4<T> _wxxx() const;
    const _xvec4<T> _xyxx() const; const _xvec4<T> _yyxx() const; const _xvec4<T> _zyxx() const; const _xvec4<T> _wyxx() const;
    const _xvec4<T> _xzxx() const; const _xvec4<T> _yzxx() const; const _xvec4<T> _zzxx() const; const _xvec4<T> _wzxx() const;
    const _xvec4<T> _xwxx() const; const _xvec4<T> _ywxx() const; const _xvec4<T> _zwxx() const; const _xvec4<T> _wwxx() const;
    const _xvec4<T> _xxyx() const; const _xvec4<T> _yxyx() const; const _xvec4<T> _zxyx() const; const _xvec4<T> _wxyx() const;
    const _xvec4<T> _xyyx() const; const _xvec4<T> _yyyx() const; const _xvec4<T> _zyyx() const; const _xvec4<T> _wyyx() const;
    const _xvec4<T> _xzyx() const; const _xvec4<T> _yzyx() const; const _xvec4<T> _zzyx() const; const _xvec4<T> _wzyx() const;
    const _xvec4<T> _xwyx() const; const _xvec4<T> _ywyx() const; const _xvec4<T> _zwyx() const; const _xvec4<T> _wwyx() const;
    const _xvec4<T> _xxzx() const; const _xvec4<T> _yxzx() const; const _xvec4<T> _zxzx() const; const _xvec4<T> _wxzx() const;
    const _xvec4<T> _xyzx() const; const _xvec4<T> _yyzx() const; const _xvec4<T> _zyzx() const; const _xvec4<T> _wyzx() const;
    const _xvec4<T> _xzzx() const; const _xvec4<T> _yzzx() const; const _xvec4<T> _zzzx() const; const _xvec4<T> _wzzx() const;
    const _xvec4<T> _xwzx() const; const _xvec4<T> _ywzx() const; const _xvec4<T> _zwzx() const; const _xvec4<T> _wwzx() const;
    const _xvec4<T> _xxwx() const; const _xvec4<T> _yxwx() const; const _xvec4<T> _zxwx() const; const _xvec4<T> _wxwx() const;
    const _xvec4<T> _xywx() const; const _xvec4<T> _yywx() const; const _xvec4<T> _zywx() const; const _xvec4<T> _wywx() const;
    const _xvec4<T> _xzwx() const; const _xvec4<T> _yzwx() const; const _xvec4<T> _zzwx() const; const _xvec4<T> _wzwx() const;
    const _xvec4<T> _xwwx() const; const _xvec4<T> _ywwx() const; const _xvec4<T> _zwwx() const; const _xvec4<T> _wwwx() const;
    const _xvec4<T> _xxxy() const; const _xvec4<T> _yxxy() const; const _xvec4<T> _zxxy() const; const _xvec4<T> _wxxy() const;
    const _xvec4<T> _xyxy() const; const _xvec4<T> _yyxy() const; const _xvec4<T> _zyxy() const; const _xvec4<T> _wyxy() const;
    const _xvec4<T> _xzxy() const; const _xvec4<T> _yzxy() const; const _xvec4<T> _zzxy() const; const _xvec4<T> _wzxy() const;
    const _xvec4<T> _xwxy() const; const _xvec4<T> _ywxy() const; const _xvec4<T> _zwxy() const; const _xvec4<T> _wwxy() const;
    const _xvec4<T> _xxyy() const; const _xvec4<T> _yxyy() const; const _xvec4<T> _zxyy() const; const _xvec4<T> _wxyy() const;
    const _xvec4<T> _xyyy() const; const _xvec4<T> _yyyy() const; const _xvec4<T> _zyyy() const; const _xvec4<T> _wyyy() const;
    const _xvec4<T> _xzyy() const; const _xvec4<T> _yzyy() const; const _xvec4<T> _zzyy() const; const _xvec4<T> _wzyy() const;
    const _xvec4<T> _xwyy() const; const _xvec4<T> _ywyy() const; const _xvec4<T> _zwyy() const; const _xvec4<T> _wwyy() const;
    const _xvec4<T> _xxzy() const; const _xvec4<T> _yxzy() const; const _xvec4<T> _zxzy() const; const _xvec4<T> _wxzy() const;
    const _xvec4<T> _xyzy() const; const _xvec4<T> _yyzy() const; const _xvec4<T> _zyzy() const; const _xvec4<T> _wyzy() const;
    const _xvec4<T> _xzzy() const; const _xvec4<T> _yzzy() const; const _xvec4<T> _zzzy() const; const _xvec4<T> _wzzy() const;
    const _xvec4<T> _xwzy() const; const _xvec4<T> _ywzy() const; const _xvec4<T> _zwzy() const; const _xvec4<T> _wwzy() const;
    const _xvec4<T> _xxwy() const; const _xvec4<T> _yxwy() const; const _xvec4<T> _zxwy() const; const _xvec4<T> _wxwy() const;
    const _xvec4<T> _xywy() const; const _xvec4<T> _yywy() const; const _xvec4<T> _zywy() const; const _xvec4<T> _wywy() const;
    const _xvec4<T> _xzwy() const; const _xvec4<T> _yzwy() const; const _xvec4<T> _zzwy() const; const _xvec4<T> _wzwy() const;
    const _xvec4<T> _xwwy() const; const _xvec4<T> _ywwy() const; const _xvec4<T> _zwwy() const; const _xvec4<T> _wwwy() const;
    const _xvec4<T> _xxxz() const; const _xvec4<T> _yxxz() const; const _xvec4<T> _zxxz() const; const _xvec4<T> _wxxz() const;
    const _xvec4<T> _xyxz() const; const _xvec4<T> _yyxz() const; const _xvec4<T> _zyxz() const; const _xvec4<T> _wyxz() const;
    const _xvec4<T> _xzxz() const; const _xvec4<T> _yzxz() const; const _xvec4<T> _zzxz() const; const _xvec4<T> _wzxz() const;
    const _xvec4<T> _xwxz() const; const _xvec4<T> _ywxz() const; const _xvec4<T> _zwxz() const; const _xvec4<T> _wwxz() const;
    const _xvec4<T> _xxyz() const; const _xvec4<T> _yxyz() const; const _xvec4<T> _zxyz() const; const _xvec4<T> _wxyz() const;
    const _xvec4<T> _xyyz() const; const _xvec4<T> _yyyz() const; const _xvec4<T> _zyyz() const; const _xvec4<T> _wyyz() const;
    const _xvec4<T> _xzyz() const; const _xvec4<T> _yzyz() const; const _xvec4<T> _zzyz() const; const _xvec4<T> _wzyz() const;
    const _xvec4<T> _xwyz() const; const _xvec4<T> _ywyz() const; const _xvec4<T> _zwyz() const; const _xvec4<T> _wwyz() const;
    const _xvec4<T> _xxzz() const; const _xvec4<T> _yxzz() const; const _xvec4<T> _zxzz() const; const _xvec4<T> _wxzz() const;
    const _xvec4<T> _xyzz() const; const _xvec4<T> _yyzz() const; const _xvec4<T> _zyzz() const; const _xvec4<T> _wyzz() const;
    const _xvec4<T> _xzzz() const; const _xvec4<T> _yzzz() const; const _xvec4<T> _zzzz() const; const _xvec4<T> _wzzz() const;
    const _xvec4<T> _xwzz() const; const _xvec4<T> _ywzz() const; const _xvec4<T> _zwzz() const; const _xvec4<T> _wwzz() const;
    const _xvec4<T> _xxwz() const; const _xvec4<T> _yxwz() const; const _xvec4<T> _zxwz() const; const _xvec4<T> _wxwz() const;
    const _xvec4<T> _xywz() const; const _xvec4<T> _yywz() const; const _xvec4<T> _zywz() const; const _xvec4<T> _wywz() const;
    const _xvec4<T> _xzwz() const; const _xvec4<T> _yzwz() const; const _xvec4<T> _zzwz() const; const _xvec4<T> _wzwz() const;
    const _xvec4<T> _xwwz() const; const _xvec4<T> _ywwz() const; const _xvec4<T> _zwwz() const; const _xvec4<T> _wwwz() const;
    const _xvec4<T> _xxxw() const; const _xvec4<T> _yxxw() const; const _xvec4<T> _zxxw() const; const _xvec4<T> _wxxw() const;
    const _xvec4<T> _xyxw() const; const _xvec4<T> _yyxw() const; const _xvec4<T> _zyxw() const; const _xvec4<T> _wyxw() const;
    const _xvec4<T> _xzxw() const; const _xvec4<T> _yzxw() const; const _xvec4<T> _zzxw() const; const _xvec4<T> _wzxw() const;
    const _xvec4<T> _xwxw() const; const _xvec4<T> _ywxw() const; const _xvec4<T> _zwxw() const; const _xvec4<T> _wwxw() const;
    const _xvec4<T> _xxyw() const;
const _xvec4<T> _yxyw() const; const _xvec4<T> _zxyw() const; const _xvec4<T> _wxyw() const; const _xvec4<T> _xyyw() const; const _xvec4<T> _yyyw() const; const _xvec4<T> _zyyw() const; const _xvec4<T> _wyyw() const; const _xvec4<T> _xzyw() const; const _xvec4<T> _yzyw() const; const _xvec4<T> _zzyw() const; const _xvec4<T> _wzyw() const; const _xvec4<T> _xwyw() const; const _xvec4<T> _ywyw() const; const _xvec4<T> _zwyw() const; const _xvec4<T> _wwyw() const; const _xvec4<T> _xxzw() const; const _xvec4<T> _yxzw() const; const _xvec4<T> _zxzw() const; const _xvec4<T> _wxzw() const; const _xvec4<T> _xyzw() const; const _xvec4<T> _yyzw() const; const _xvec4<T> _zyzw() const; const _xvec4<T> _wyzw() const; const _xvec4<T> _xzzw() const; const _xvec4<T> _yzzw() const; const _xvec4<T> _zzzw() const; const _xvec4<T> _wzzw() const; const _xvec4<T> _xwzw() const; const _xvec4<T> _ywzw() const; const _xvec4<T> _zwzw() const; const _xvec4<T> _wwzw() const; const _xvec4<T> _xxww() const; const _xvec4<T> _yxww() const; const _xvec4<T> _zxww() const; const _xvec4<T> _wxww() const; const _xvec4<T> _xyww() const; const _xvec4<T> _yyww() const; const _xvec4<T> _zyww() const; const _xvec4<T> _wyww() const; const _xvec4<T> _xzww() const; const _xvec4<T> _yzww() const; const _xvec4<T> _zzww() const; const _xvec4<T> _wzww() const; const _xvec4<T> _xwww() const; const _xvec4<T> _ywww() const; const _xvec4<T> _zwww() const; const _xvec4<T> _wwww() const; #endif// defined(GLM_SWIZZLE) }; } //namespace detail } //namespace glm #include "_cvec4.inl" #endif//glm_core_cvec4  0 167 Aug 07, 2008 at 16:09 Groove, thanks for your post, but please use the …[/code[b][/b]] tags. B) [code]…[/code**] tags. B) 0 101 Aug 12, 2008 at 16:28 Sorry for that I’ll try to remember for next time ! Thanks ! 0 101 Sep 09, 2008 at 01:38 Hi Groove, Wouldn’t using the intermediate references cause pointers to be created and extra overhead in the assembly code? The GLM library looks awesome! Keep up the good work. 
Eric 0 101 Sep 09, 2008 at 02:10 Some people were already afraid of that but using references provide to compiler that support cross function optimizations to just skip them all: Have a look on this: vec2 v1(1.0); vec2 v2(2.0); vec2 v3(3.0); The following is just the asm code for this line: v1.xy = vec2(v2.xy) + vec2(v3.xy); GCC 3.4.5: 0x00401354 <main+100>: mov %ebx,0xffffffc0(%ebp) 0x00401357 <main+103>: lea 0xffffffec(%ebp),%eax 0x0040135a <main+106>: mov %eax,0xffffffc4(%ebp) 0x0040135d <main+109>: mov 0xffffffc0(%ebp),%eax 0x00401360 <main+112>: mov %esi,0xffffffb0(%ebp) 0x00401363 <main+115>: mov 0xffffffc4(%ebp),%edx 0x00401366 <main+118>: mov %eax,0xffffffb8(%ebp) 0x00401369 <main+121>: lea 0xffffffdc(%ebp),%eax 0x0040136c <main+124>: mov %eax,0xffffffb4(%ebp) 0x0040136f <main+127>: mov 0xffffffb0(%ebp),%eax 0x00401372 <main+130>: mov %edx,0xffffffbc(%ebp) 0x00401375 <main+133>: mov 0xffffffb4(%ebp),%edx 0x00401378 <main+136>: mov %eax,0xffffffa8(%ebp) 0x0040137b <main+139>: mov 0xffffffa8(%ebp),%eax 0x0040137e <main+142>: mov %edx,0xffffffac(%ebp) 0x00401381 <main+145>: flds (%eax) 0x00401383 <main+147>: mov 0xffffffb8(%ebp),%eax 0x00401386 <main+150>: fadds (%eax) 0x00401388 <main+152>: mov 0xffffffac(%ebp),%eax 0x0040138b <main+155>: flds (%eax) 0x0040138d <main+157>: mov 0xffffffbc(%ebp),%eax 0x00401390 <main+160>: fadds (%eax) 0x00401392 <main+162>: fxch %st(1) 0x00401394 <main+164>: fsts 0xffffffc8(%ebp)  GCC 4.3.0: 0040809F mov eax,DWORD PTR [ebp+12] 004080A2 mov ecx,DWORD PTR [ebp+16] 004080A5 mov edx,DWORD PTR [ebp+20] 004080A8 fld DWORD PTR [edx+4] 004080AB fadd DWORD PTR [ecx+4] 004080AE fld DWORD PTR [edx] 004080B0 fadd DWORD PTR [ecx] 004080B2 fstp DWORD PTR [eax] 004080B4 fstp DWORD PTR [eax+4]  with SSE 00408123 mov edx,DWORD PTR [ebp+20] 00408126 mov ecx,DWORD PTR [ebp+16] 00408129 mov eax,DWORD PTR [ebp+12] 0040812C movss xmm1,DWORD PTR [edx+4] 00408131 movss xmm0,DWORD PTR [edx] 00408135 addss xmm1,DWORD PTR [ecx+4] 0040813A addss 
xmm0,DWORD PTR [ecx] 0040813E movss DWORD PTR [eax+4],xmm1 00408143 movss DWORD PTR [eax],xmm0  VC8  mov eax, DWORD PTR _b$[esp-4]
fld    DWORD PTR [eax]
fld    DWORD PTR [eax+4]
mov    eax, DWORD PTR _a$[esp-4]
fld    DWORD PTR [eax+4]
fld    DWORD PTR [eax]
mov    eax, DWORD PTR ___$ReturnUdt$[esp-4]
faddp    ST(3), ST(0)
fxch    ST(2)
fstp    DWORD PTR [eax]
faddp    ST(1), ST(0)
fstp    DWORD PTR [eax+4]

VC8 with SSE:
mov    eax, DWORD PTR _b$[esp-4]
movss    xmm2, DWORD PTR [eax]
movss    xmm3, DWORD PTR [eax+4]
mov    eax, DWORD PTR _a$[esp-4]
movss    xmm0, DWORD PTR [eax]
movss    xmm1, DWORD PTR [eax+4]
mov    eax, DWORD PTR ___$ReturnUdt$[esp-4]
movss    DWORD PTR [eax], xmm0
movss    DWORD PTR [eax+4], xmm1


GCC 3.4.5 shows the problem you point out, but GCC 3.x didn’t support cross-function optimizations. I think they became available with GCC 4.1, maybe partly with GCC 4.0. Visual Studio has supported them for ages under the name whole program optimization, possibly even Visual C++ 6, but I’m not sure about that one.
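To make the question about intermediate references concrete, here is a minimal C++11 sketch of a reference-based swizzle. It assumes a hypothetical vec2 whose xy()/yx() members return a small swizzle2_ref proxy holding references to the selected components; GLM's real classes are organized differently, and C++ needs the call parentheses that GLSL's v.xy syntax omits. The proxy never outlives the statement, so once these small functions are inlined an optimizing compiler can normally keep everything in registers and drop the references, which is consistent with the short GCC 4.3.0 and VC8 listings above.

```cpp
#include <cassert>

struct vec2;  // forward declaration for the proxy below

// Hypothetical proxy holding references to two components of a vec2.
struct swizzle2_ref {
    float& a;  // first selected component
    float& b;  // second selected component
    swizzle2_ref& operator=(const vec2& v);  // defined after vec2
};

// Hypothetical 2-component vector used only for this sketch.
struct vec2 {
    float x, y;
    explicit vec2(float s) : x(s), y(s) {}
    vec2(float x_, float y_) : x(x_), y(y_) {}
    // Reading a swizzle: copy the referenced components into a new vec2.
    explicit vec2(const swizzle2_ref& s) : x(s.a), y(s.b) {}

    // Swizzle accessors return proxies; no component is copied here.
    swizzle2_ref xy() { return swizzle2_ref{x, y}; }
    swizzle2_ref yx() { return swizzle2_ref{y, x}; }
};

inline swizzle2_ref& swizzle2_ref::operator=(const vec2& v) {
    // Read both source components before writing anything, so that
    // aliasing cases such as v.yx() = v behave like a swap.
    const float na = v.x;
    const float nb = v.y;
    a = na;
    b = nb;
    return *this;
}

inline vec2 operator+(const vec2& l, const vec2& r) {
    return vec2(l.x + r.x, l.y + r.y);
}

// Usage sketch mirroring: v1.xy = vec2(v2.xy) + vec2(v3.xy);
inline void swizzle_example() {
    vec2 v1(1.0f), v2(2.0f), v3(3.0f);
    v1.xy() = vec2(v2.xy()) + vec2(v3.xy());
    assert(v1.x == 5.0f && v1.y == 5.0f);
}
```

Reading both source components before writing is what keeps a self-assignment such as v.yx() = v correct; a proxy that wrote a before reading v.y would end up with both components equal to the old x.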

I’m doing my best to keep GLM up to date, and GLM 0.8.x will be a good step in that direction: GLSL 1.30 support, of course, but also a lot of internal improvements :)

Nov 12, 2013 at 22:05

Personally, I find GLM too verbose, with a lot of code repetition despite heavy macro usage.

Compared to GLM, CxxSwizzle (https://github.com/gwiazdorrr/CxxSwizzle) uses more modern C++ (I settled on the subset of C++11 supported by MSVC 2010), which gives a much smaller codebase with no code repetition. For instance, there is only one vector class and one matrix class (GLM has dozens).
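The one-vector-class idea can be sketched as a single class template parameterized by component type and count, with vec2/vec3/vec4 reduced to aliases. This is a minimal sketch assuming C++11 variadic templates (which MSVC 2010 itself does not support), not CxxSwizzle's actual code; basic_vector, swizzle<...>() and all_below are hypothetical names, and the read-only swizzle takes component indices (0 = x, 1 = y, ...) instead of GLSL's member syntax.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// True when every index in the pack is below n (C++11-friendly recursion).
constexpr bool all_below(std::size_t) { return true; }
template <typename... Rest>
constexpr bool all_below(std::size_t n, std::size_t first, Rest... rest) {
    return first < n && all_below(n, rest...);
}

// Hypothetical single vector template standing in for vec2, vec3 and vec4.
template <typename T, std::size_t N>
struct basic_vector {
    std::array<T, N> data;

    T& operator[](std::size_t i) { return data[i]; }
    const T& operator[](std::size_t i) const { return data[i]; }

    // Read-only swizzle by component index: swizzle<0, 1, 1>() acts like .xyy
    template <std::size_t... I>
    basic_vector<T, sizeof...(I)> swizzle() const {
        static_assert(sizeof...(I) >= 1 && all_below(N, I...),
                      "swizzle index out of range");
        return basic_vector<T, sizeof...(I)>{{{data[I]...}}};
    }
};

// Component-wise addition, written once for every size N.
template <typename T, std::size_t N>
basic_vector<T, N> operator+(const basic_vector<T, N>& a,
                             const basic_vector<T, N>& b) {
    basic_vector<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = a[i] + b[i];
    return r;
}

using vec2 = basic_vector<float, 2>;
using vec3 = basic_vector<float, 3>;
using vec4 = basic_vector<float, 4>;

// Usage sketch: b.xyy and b.zx expressed with index-based swizzles.
inline void basic_vector_example() {
    vec3 b{{{1.0f, 2.0f, 3.0f}}};
    vec3 a = b.swizzle<0, 1, 1>();
    vec2 c = b.swizzle<2, 0>();
    assert(a[0] == 1.0f && a[1] == 2.0f && a[2] == 2.0f);
    assert(c[0] == 3.0f && c[1] == 1.0f);
}
```

Because the component count is a template parameter, operators such as + are written once for every size, instead of being duplicated per class or generated with macros.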

Nov 12, 2013 at 22:10

I know this topic is older than the mountains, but this may still help someone.

My own take on the topic is the CxxSwizzle library (https://github.com/gwiazdorrr/CxxSwizzle). I focused on maximal GLSL compatibility, to the point where you can take most GLSL fragment shaders from either http://glsl.heroku.com or http://shadertoy.com and run them as C++, literally without any changes. This means you can debug shaders in the IDE of your choice just like any other C++ code, with assertions, watches, conditional breakpoints and so on. For instance, this is valid C++ code (see the sample):

uniform float time;
uniform vec2 mouse;
uniform vec2 resolution;
//varying vec2 surfacePosition;

#define MAX_ITER 16
void main( void ) {
    //vec2 p = surfacePosition*8.0;
    vec2 uv = gl_FragCoord.xy / resolution.xy;
    uv = uv * 4.0 - 2.0;
    vec2 p = uv;

    vec2 i = p;
    float c = 2.0;
    float inten = 1.0;

    for (int n = 0; n < MAX_ITER; n++) {
        float t = time * (1.0 - (1.0 / float(n+1)));
        i = p + vec2(cos(t - i.x) + sin(t + i.y), sin(t - i.y) + cos(t + i.x));
        c += 1.0/length(vec2(p.x / (sin(i.x+t)/inten),p.y / (cos(i.y+t)/inten)));
    }
    c /= float(MAX_ITER);
    float pulse = abs(sin(time*5.));
    float pulse2 = pow(abs(sin(time*3.)),.25);
    float pulse3 = pow(abs(sin(time*5.)),2.);
    gl_FragColor = vec4(vec3(pow(c,1.5+pulse2/2.))*vec3(1.0+pulse2, 2.0-pulse2, 1.5+pulse3)*(1.+pulse2)/2., 1.0);

}
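One way a shader like this can run as ordinary C++ is to treat its source as the body of a class and invoke main() once per pixel, the way a rasterizer would. The sketch below assumes exactly that; it is not taken from CxxSwizzle, fragment_shader and render are hypothetical names, the shader is reduced to a simple gradient, and gl_FragCoord is simplified to a vec2 (it is a vec4 in GLSL).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; a real library provides full GLSL vectors.
struct vec2 { float x, y; };
struct vec4 { float x, y, z, w; };

// GLSL qualifiers with no meaning on the CPU can be defined away.
#define uniform

// Wrapping the shader source in a struct turns its globals into members
// and its main() into a member function, so it cannot clash with C++ main().
struct fragment_shader {
    // --- shader source (GLSL-like) ---
    uniform vec2 resolution;
    vec2 gl_FragCoord;  // simplified: a vec4 in real GLSL
    vec4 gl_FragColor;

    void main() {
        // Simple gradient: red grows left to right, green bottom to top.
        gl_FragColor = vec4{gl_FragCoord.x / resolution.x,
                            gl_FragCoord.y / resolution.y, 0.0f, 1.0f};
    }
    // --- end of shader source ---
};

#undef uniform  // keep the macro confined to the shader source

// Runs the shader once per pixel; each call can be stepped in a debugger.
std::vector<vec4> render(int width, int height) {
    std::vector<vec4> image(static_cast<std::size_t>(width) * height);
    fragment_shader s{};
    s.resolution = vec2{float(width), float(height)};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // GLSL samples at pixel centers, hence the +0.5 offset.
            s.gl_FragCoord = vec2{x + 0.5f, y + 0.5f};
            s.main();
            image[static_cast<std::size_t>(y) * width + x] = s.gl_FragColor;
        }
    }
    return image;
}

// Usage sketch: in a 4x2 image the last pixel's center is (3.5, 1.5).
inline void render_example() {
    std::vector<vec4> img = render(4, 2);
    assert(img[7].x == 0.875f && img[7].y == 0.75f);
}
```

Because every invocation is an ordinary member-function call, you can put a conditional breakpoint on a particular pixel's coordinates and inspect the shader's variables directly.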


At the moment the underlying math implementation is naive, but thanks to the component-based structure of the code, replacing it with a more efficient one should be straightforward. The point of CxxSwizzle was to compile GLSL without any changes, and it worked!

That said, compiling HLSL is also possible, albeit with minimal code changes (you just can’t replicate HLSL semantics).