Swizzle operator in C++!

Anonymous Aug 01, 2008 at 14:00

The other day a co-worker (Leo Benaducci) and I started a small contest: adding support for the swizzle operator, available in shader languages (HLSL, Cg, GLSL), to any standard Vector 2, 3 or 4 class in C++. Something like:

Vector3 a;
Vector3 b;
Vector3 c;

a = b.xyy;
c.yz = b.zz;

An hour later (more or less), we had each come up with a solution, using different approaches. Leo solved it with a template, and I just used a couple of macros. Both solutions produce optimal assembly code with the VC++ 2005 compiler, with no additional overhead at all from the swizzle operator. Here are both versions, with examples showing how to use them.

Leo swizzle with mini vector3 class:

class LVector3
{
public:
    float x,y,z;

    inline LVector3()
    {
        x=y=z=0;
    }

    inline LVector3(float _x, float _y, float _z)
    {
        x = _x;
        y = _y;
        z = _z;
    }

    template<int srcSwz, int dstSwz>
    __forceinline void _swz(LVector3 v)
    {
        const int _srcSwz = srcSwz;
        const int _dstSwz = dstSwz;
        const char *srcSwizzle = (const char*) &_srcSwz;
        const char *dstSwizzle = (const char*) &_dstSwz;

        int i = 255;
        if(*(char*)&i & 255)
        {
            *(&x + (srcSwizzle[2] - 'x')) = *(&v.x + (dstSwizzle[2] - 'x'));
            *(&x + (srcSwizzle[1] - 'x')) = *(&v.x + (dstSwizzle[1] - 'x'));
            *(&x + (srcSwizzle[0] - 'x')) = *(&v.x + (dstSwizzle[0] - 'x'));
        }
        else
        {
            *(&x + (srcSwizzle[0] - 'x')) = *(&v.x + (dstSwizzle[0] - 'x'));
            *(&x + (srcSwizzle[1] - 'x')) = *(&v.x + (dstSwizzle[1] - 'x'));
            *(&x + (srcSwizzle[2] - 'x')) = *(&v.x + (dstSwizzle[2] - 'x'));
        }

    }
};

#define swz(a, b, c) _swz<#@a, #@c>(b)

Now you can simply use the operator like this:

LVector3 a;
LVector3 b;
LVector3 c;

b.swz(xzy, a, zyx);
c.swz(xxx, a, zzz); // just to test optimizer!

Looks complex, doesn’t it? But don’t be afraid of using it, because the compiler resolves everything at compile time and generates optimal code. The first operation unfolds to 3 floating-point assignments and the second to only 1.
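To illustrate why this can collapse to plain assignments, here is a minimal sketch of the same idea with ordinary integer template arguments instead of multi-character constants. This is not Leo's code: Vec3 and swizzle_assign are hypothetical names made up for the example, and the range checks assume C++11 static_assert.

```cpp
#include <cassert>

struct Vec3 { float v[3]; };   // hypothetical stand-in for a vector class

// Hypothetical helper: D0..D2 select the destination elements, S0..S2 the
// source elements. All six indices are template arguments, so they are
// compile-time constants and no index arithmetic is left for run time.
template <int D0, int D1, int D2, int S0, int S1, int S2>
inline void swizzle_assign(Vec3 &dst, const Vec3 &src)
{
    static_assert(D0 >= 0 && D0 < 3 && D1 >= 0 && D1 < 3 && D2 >= 0 && D2 < 3,
                  "destination index out of range");
    static_assert(S0 >= 0 && S0 < 3 && S1 >= 0 && S1 < 3 && S2 >= 0 && S2 < 3,
                  "source index out of range");

    // Read all sources first, then write, so dst and src may be the same object.
    const float t0 = src.v[S0];
    const float t1 = src.v[S1];
    const float t2 = src.v[S2];
    dst.v[D0] = t0;
    dst.v[D1] = t1;
    dst.v[D2] = t2;
}
```

With this sketch, swizzle_assign<0,2,1, 2,1,0>(b, a) plays the role of b.xzy = a.zyx. Reading into temporaries before writing is a choice made for this example, not something the original does.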

My version only uses macros, but it additionally supports any operation (not just copying) and works with any vector 2, 3, or 4 class. The only requirement is that the vector class implements the [] operator to access vector elements.

Enlight swizzle:

#define i__(e,c) (((*(c+(char*)(e))-0x77)-1)&3)

#define SW2(src,ss,op,dest,ds)  src[i__(#ss,0)] op dest[i__(#ds,0)]; \
                src[i__(#ss,1)] op dest[i__(#ds,1)];

#define SW3(src,ss,op,dest,ds)  SW2(src,ss,op,dest,ds)  \
                src[i__(#ss,2)] op dest[i__(#ds,2)];

#define SW4(src,ss,op,dest,ds)  SW3(src,ss,op,dest,ds)  \
                src[i__(#ss,3)] op dest[i__(#ds,3)];

Now, use it like this:

Vec4 a(1,2,3,4);
Vec4 b(5,6,7,8);

SW4(a,xyzw,=,b,xxyy);    // a.xyzw = b.xxyy; (a=5,5,6,6)
SW2(b,xy,+=,b,zz);       // b.xy += b.zz; (b=12,13,7,8)

The tricky part is the i__ macro. It converts “xyzw” characters to values 0,1,2 and 3. Then, I simply use those values to access vector elements one by one, and let the optimizer do the rest.
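To make the arithmetic in i__ easier to follow, here is a small sketch of the same mapping as a constexpr function. The names swizzle_index and copy_xxyy are made up for this illustration and are not part of the macros above; ASCII is assumed, where 'w' is 0x77 and 'x', 'y', 'z' follow it.

```cpp
#include <cassert>

// Hypothetical helper mirroring the arithmetic of the i__ macro (ASCII assumed).
// 'x' = 0x78 -> 0, 'y' -> 1, 'z' -> 2; 'w' = 0x77 gives -1, which "& 3" wraps to 3.
constexpr int swizzle_index(char c)
{
    return ((c - 0x77) - 1) & 3;
}

static_assert(swizzle_index('x') == 0, "x maps to element 0");
static_assert(swizzle_index('y') == 1, "y maps to element 1");
static_assert(swizzle_index('z') == 2, "z maps to element 2");
static_assert(swizzle_index('w') == 3, "w maps to element 3");

// The same element-by-element copy that SW4(a, xyzw, =, b, xxyy) expands to,
// written against plain arrays instead of a vector class.
inline void copy_xxyy(float (&a)[4], const float (&b)[4])
{
    a[swizzle_index('x')] = b[swizzle_index('x')];
    a[swizzle_index('y')] = b[swizzle_index('x')];
    a[swizzle_index('z')] = b[swizzle_index('y')];
    a[swizzle_index('w')] = b[swizzle_index('y')];
}
```

Because every index is a constant expression, an optimizer has everything it needs to turn each line into a direct element access.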

Have fun!

Enlight.

23 Replies


Nautilus 103 Aug 01, 2008 at 15:01

Within Leo’s template:

int i = 255;
if(*(char*)&i & 255)
{
    // ...
}
else
{
    // ...
}

You may want to fix that.

Ciao ciao : )

Reedbeta 167 Aug 01, 2008 at 16:18

I believe that if statement is checking the endianness of the machine. You see, it sets an integer to 255 and then checks its first byte, which will be 255 on a little-endian machine and 0 on a big-endian one.

However, the cast should really be to unsigned char, as a signed char can’t hold the value 255 (although the bitwise-and masks off the sign-extended bits after promotion, so the test still works).
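As a minimal sketch of that check (the function names are hypothetical, and a two's complement platform with a multi-byte int is assumed), the two variants can be written side by side:

```cpp
#include <cassert>

// Returns true when the lowest-addressed byte of an int holds its least
// significant bits, i.e. on a little-endian machine.
inline bool is_little_endian()
{
    const unsigned int probe = 255;
    // unsigned char can represent 255, so no sign question arises.
    return *reinterpret_cast<const unsigned char*>(&probe) == 255;
}

// Variant that reads through plain char, as in the original test. If char is
// signed, the byte 0xFF reads as -1; after integer promotion "& 255" still
// yields 255, so the result is the same.
inline bool is_little_endian_char_read()
{
    const int probe = 255;
    return (*reinterpret_cast<const char*>(&probe) & 255) != 0;
}
```

Both functions should agree on any given machine; the unsigned char version just makes the intent explicit.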

Nautilus 103 Aug 01, 2008 at 17:05

I didn’t notice. Right you are.

Ciao ciao : )

_oisyn 101 Aug 01, 2008 at 17:46

Hmm, it looks a bit inefficient with the strings and all. For the second implementation, you should at least put the code of the macros inside a do {…} while(0) block. Otherwise you could run into serious problems when using things like if-statements without braces.
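A small sketch of the problem, using two made-up macros (COPY2_UNSAFE and COPY2_SAFE are hypothetical and much simpler than the SW macros):

```cpp
#include <cassert>

// Two statements with no wrapper: an unbraced `if` only governs the first one.
#define COPY2_UNSAFE(dst, src) dst[0] = src[0]; dst[1] = src[1];

// Wrapped version: expands to a single statement and still takes a trailing ';'.
#define COPY2_SAFE(dst, src) do { dst[0] = src[0]; dst[1] = src[1]; } while (0)

inline void demo(bool enabled, float (&unsafe)[2], float (&safe)[2])
{
    const float src[2] = {5.0f, 6.0f};

    if (enabled)
        COPY2_UNSAFE(unsafe, src)   // the second assignment runs even when enabled is false

    if (enabled)
        COPY2_SAFE(safe, src);      // both assignments are skipped when enabled is false
}
```

With enabled set to false, the unsafe variant still overwrites element 1. Adding an else branch after the unsafe macro would not even compile, since the if statement has already ended.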

My version:

template<int X, int Y, int Z, int W> struct swizzle { };
template<int X> struct swizzle1 : public swizzle<X,X,X,X> { };
template<int X, int Y> struct swizzle2 : public swizzle<X,Y,Y,Y> { };
template<int X, int Y, int Z> struct swizzle3 : public swizzle<X,Y,Z,Z> { };

template<int X, int Y> swizzle2<X,Y> operator,(swizzle1<X>, swizzle1<Y>) { return swizzle2<X,Y>(); }
template<int X, int Y, int Z> swizzle3<X,Y,Z> operator,(swizzle2<X,Y>, swizzle1<Z>) { return swizzle3<X,Y,Z>(); }
template<int X, int Y, int Z, int W> swizzle<X,Y,Z,W> operator,(swizzle3<X,Y,Z>, swizzle1<W>) { return swizzle<X,Y,Z,W>(); }

static swizzle1<0> _x;
static swizzle1<1> _y;
static swizzle1<2> _z;
static swizzle1<3> _w;

template<int X, int Y, int Z, int W> struct SwizzledVector;

struct Vector
{
    float x, y, z, w;

    Vector() { }
    Vector(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) { }

    template<int X, int Y, int Z, int W>
    SwizzledVector<X,Y,Z,W> & operator[](swizzle<X,Y,Z,W>)
    {
        return reinterpret_cast<SwizzledVector<X,Y,Z,W>&>(*this);
    }

    template<int X, int Y, int Z, int W>
    const SwizzledVector<X,Y,Z,W> & operator[](swizzle<X,Y,Z,W>) const
    {
        return reinterpret_cast<const SwizzledVector<X,Y,Z,W>&>(*this);
    }
};

template<int X, int Y, int Z, int W, int I> struct SwizzleSelector;
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,0> { static const int index = X; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,1> { static const int index = Y; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,2> { static const int index = Z; };
template<int X, int Y, int Z, int W> struct SwizzleSelector<X,Y,Z,W,3> { static const int index = W; };


template<int X, int Y, int Z, int W>
struct SwizzledVector
{
    template<int I> float & Get() { return reinterpret_cast<float*>(this)[SwizzleSelector<X,Y,Z,W,I>::index]; }
    template<int I> float Get() const { return const_cast<SwizzledVector&>(*this).Get<I>(); }
    template<int I> float & Get(swizzle1<I>) { return Get<I>(); }
    template<int I> float Get(swizzle1<I>) const { return Get<I>(); }

    SwizzledVector & operator=(const Vector & v)
    {
        Get<0>() = v.x;
        Get<1>() = v.y;
        Get<2>() = v.z;
        Get<3>() = v.w;
        return *this;
    }

    template<int X2, int Y2, int Z2, int W2>
    SwizzledVector & operator=(const SwizzledVector<X2,Y2,Z2,W2> & v)
    {
        // 'template' is needed because v has a dependent type
        Get<0>() = v.template Get<0>();
        Get<1>() = v.template Get<1>();
        Get<2>() = v.template Get<2>();
        Get<3>() = v.template Get<3>();
        return *this;
    }

    operator Vector() const
    {
        return Vector(Get<0>(), Get<1>(), Get<2>(), Get<3>());
    }
};




// usage
int main()
{
    Vector a(1, 2, 3, 4);
    Vector b = a[_w,_y,_x,_w];  // (4, 2, 1, 4)
    Vector c = a[_z];  // (3, 3, 3, 3)

    c[_x,_z,_y,_w] = a; // (1, 3, 2, 4)
    c[_y,_w] = a[_x]; // (*, 1, *, 1)
}

Of course you could generate combinations like _xxy etc. to get rid of the commas. The cool thing about this implementation is that, because it uses compile-time constants, you could even make use of compiler intrinsics to permute actual SSE registers.

Enlight 101 Aug 01, 2008 at 18:28

Not inefficient at all. Check the assembler output. Write the code as you wish; the assembly output is perfect in both cases, although I haven’t tried your code…

leobenaducci 101 Aug 01, 2008 at 18:33

I’m not sure why you say that; check the generated asm:

    __asm int 3
0040107C  int         3    
    b.swz(xzy, a, zyx);
0040107D  fld         dword ptr [esp+1Ch] 
00401081  fstp        dword ptr [esp+8] 
00401085  fld         dword ptr [esp+18h] 
00401089  fstp        dword ptr [esp+10h] 
0040108D  fld         dword ptr [esp+14h] 
00401091  fstp        dword ptr [esp+0Ch] 
    __asm int 3
00401095  int         3    

do you think there is a faster way to do this?

Shuank 101 Aug 01, 2008 at 18:52

I like Leo’s solution most; it seems clearer for the programmer, at least for me (a noob :)), and more OO.

Greetings!

Kenneth_Gorking 101 Aug 01, 2008 at 19:07

They both suffer from the fact that they rely on strings:

b.swz(aaa, a, bbb);
SW4(a,beer,=,b,good);

Both statements will compile just fine, and crash at runtime.

leobenaducci 101 Aug 01, 2008 at 19:16

Sorry, but both versions can emit compile-time asserts.

Reedbeta 167 Aug 01, 2008 at 20:33

Kenneth’s right; there’s no compile-time protection against using letters outside the ‘w’ to ‘z’ range. Although with the Enlight version, due to the ‘& 3’ in the macro, any other letters will get silently remapped into the ‘w’-‘z’ range; that’s slightly better than the Leo version, in which other letters will result in runtime out-of-bounds array accesses.

To fix this, you could add compile-time asserts that each character is within the expected range. Here is a bit of code to do a compile-time assert (from boost):

template <bool> struct STATIC_ASSERTION_FAILURE;
template <> struct STATIC_ASSERTION_FAILURE<true> {};

#define STATIC_ASSERT(f) \
    sizeof(STATIC_ASSERTION_FAILURE<(bool)(f)>);

(The real one is slightly more complicated, as it is designed to work outside of a function scope, but that gives you the idea.)
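With a C++11 compiler the same idea can be expressed with the built-in static_assert. The sketch below is only an illustration under that assumption: is_component, Swizzle3 and Zyx are hypothetical names, not part of either implementation above, and the letters are limited to x, y, z as in the 3-component version.

```cpp
#include <cassert>

// Hypothetical compile-time check: true only for the letters x, y, z (ASCII assumed).
constexpr bool is_component(char c)
{
    return c >= 'x' && c <= 'z';
}

// Hypothetical 3-component swizzle descriptor. Instantiating it with any other
// letter stops the build instead of producing out-of-bounds accesses.
template <char A, char B, char C>
struct Swizzle3
{
    static_assert(is_component(A) && is_component(B) && is_component(C),
                  "swizzle letters must be x, y or z");
    enum { i0 = A - 'x', i1 = B - 'x', i2 = C - 'x' };
};

typedef Swizzle3<'z', 'y', 'x'> Zyx;   // valid: indices 2, 1, 0

// Using Swizzle3<'f', 'e', 'b'>::i0 anywhere would fail to compile.
```

The check fires when the template is instantiated, so a bad swizzle is reported at the line that uses it.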

leobenaducci 101 Aug 01, 2008 at 21:30

Here it is, out-of-bounds protection:

#define OUT_OF_BOUNDS(a, b, c)  ((a)<b || (a)>c)

#define UN_OP(name, op) template<int srcSwz, int dstSwz>  \
                        __forceinline void _##name(LVector3 v)  \
                        {                                   \
                            const int _srcSwz = srcSwz;     \
                            const int _dstSwz = dstSwz;     \
                            const char *srcSwizzle = (char*) &_srcSwz;  \
                            const char *dstSwizzle = (char*) &_dstSwz;  \
                                                                                \
                            if(OUT_OF_BOUNDS(*(srcSwizzle+0), 'x', 'z'))        \
                                return;                                         \
                            if(OUT_OF_BOUNDS(*(dstSwizzle+0), 'x', 'z'))        \
                                return;                                         \
                            if(OUT_OF_BOUNDS(*(srcSwizzle+1), 'x', 'z'))        \
                                return;                                         \
                            if(OUT_OF_BOUNDS(*(dstSwizzle+1), 'x', 'z'))        \
                                return;                                         \
                            if(OUT_OF_BOUNDS(*(srcSwizzle+2), 'x', 'z'))        \
                                return;                                         \
                            if(OUT_OF_BOUNDS(*(dstSwizzle+2), 'x', 'z'))        \
                                return;                                         \
                                                                                \
                            int i = 255;                                        \
                            if(*(char*)&i & 255)                                \
                            {                                                   \
                                *(&x + (srcSwizzle[2] - 'x')) op *(&v.x + (dstSwizzle[2] - 'x'));   \
                                *(&x + (srcSwizzle[1] - 'x')) op *(&v.x + (dstSwizzle[1] - 'x'));   \
                                *(&x + (srcSwizzle[0] - 'x')) op *(&v.x + (dstSwizzle[0] - 'x'));   \
                            }                                                   \
                            else                                                \
                            {                                                   \
                                *(&x + (srcSwizzle[0] - 'x')) op *(&v.x + (dstSwizzle[0] - 'x'));   \
                                *(&x + (srcSwizzle[1] - 'x')) op *(&v.x + (dstSwizzle[1] - 'x'));   \
                                *(&x + (srcSwizzle[2] - 'x')) op *(&v.x + (dstSwizzle[2] - 'x'));   \
                            }                                                   \
                        }

and the asm

    __asm int 3
0040103A  int         3    
    b.mul(feb, a, zxy);
    __asm int 3
0040103B  int         3

Kenneth_Gorking 101 Aug 02, 2008 at 10:24

@leobenaducci

and the asm

    __asm int 3
0040103A  int         3    
    b.mul(feb, a, zxy);
    __asm int 3
0040103B  int         3

Clearly something has gone wrong here… :)

Anyways, I tried to create some kind of swizzle support for a float4 class I was working on, but didn’t have much luck. After seeing this thread, I decided to go back and give it another go, and finally succeeded. Here it is, in its entirety:

#pragma once
#include <xmmintrin.h>

struct __declspec(align(16)) float4
{
private:
    template <unsigned mask = ((3 << 6) | (2 << 4) | (1 << 2) | 0)>
    struct swizzle_proxy
    {
        __m128 &ref;
        swizzle_proxy(__m128 &ref)
            : ref(ref)
        { }

        __m128 get_swizzled() const { return _mm_shuffle_ps(ref, ref, mask); }
        swizzle_proxy& operator = (const float4 &other);

        template<unsigned other_mask>
        swizzle_proxy& operator = (const swizzle_proxy<other_mask> &other)
        {
            __m128 data = other.get_swizzled();
            ref = _mm_shuffle_ps(data, data, mask);
            return *this;
        }
    };

public:
    float4()
    { }

    float4(const float4 &other)
        : x(other.x)
        , y(other.y)
        , z(other.z)
        , w(other.w)
    { }

    explicit float4(const __m128 _xmm)
        : xmm(_xmm)
    { }

    float4(float a, float b, float c, float d)
        : x(a)
        , y(b)
        , z(c)
        , w(d)
    { }

    template<unsigned mask>
    float4(const swizzle_proxy<mask> &other)
        : xmm(other.get_swizzled())
    { }

    float4 operator + (const float4 &other) const       { return float4(_mm_add_ps(xmm, other.xmm)); }
    float4 operator - (const float4 &other) const       { return float4(_mm_sub_ps(xmm, other.xmm)); }
    float4 operator * (const float4 &other) const       { return float4(_mm_mul_ps(xmm, other.xmm)); }
    float4 operator / (const float4 &other) const       { return float4(_mm_div_ps(xmm, other.xmm)); }
    float4 operator & (const float4 &other) const       { return float4(_mm_and_ps(xmm, other.xmm)); }
    float4 operator | (const float4 &other) const       { return float4(_mm_or_ps(xmm, other.xmm)); }
    float4 operator ^ (const float4 &other) const       { return float4(_mm_xor_ps(xmm, other.xmm)); }
    float4 andnot(const float4 &other) const            { return float4(_mm_andnot_ps(xmm, other.xmm)); } // "~this & other"

    float4 operator + (float f) const                   { return float4(_mm_add_ps(xmm, _mm_set_ps1(f))); }
    float4 operator - (float f) const                   { return float4(_mm_sub_ps(xmm, _mm_set_ps1(f))); }
    float4 operator * (float f) const                   { return float4(_mm_mul_ps(xmm, _mm_set_ps1(f))); }
    float4 operator / (float f) const                   { return float4(_mm_div_ps(xmm, _mm_set_ps1(f))); }
    float4 operator & (float f) const                   { return float4(_mm_and_ps(xmm, _mm_set_ps1(f))); }
    float4 operator | (float f) const                   { return float4(_mm_or_ps(xmm, _mm_set_ps1(f))); }
    float4 operator ^ (float f) const                   { return float4(_mm_xor_ps(xmm, _mm_set_ps1(f))); }
    float4 andnot(float f) const                        { return float4(_mm_andnot_ps(xmm, _mm_set_ps1(f))); } // "~this & f"

    template<unsigned mask>
    float4& operator  = (const swizzle_proxy<mask> &other)            { xmm = other.get_swizzled(); return *this; }

    float4& operator  = (const float4 &other)           { xmm = other.xmm; return *this; }
    float4& operator += (const float4 &other)           { xmm = _mm_add_ps(xmm, other.xmm); return *this; }
    float4& operator -= (const float4 &other)           { xmm = _mm_sub_ps(xmm, other.xmm); return *this; }
    float4& operator *= (const float4 &other)           { xmm = _mm_mul_ps(xmm, other.xmm); return *this; }
    float4& operator /= (const float4 &other)           { xmm = _mm_div_ps(xmm, other.xmm); return *this; }
    float4& operator &= (const float4 &other)           { xmm = _mm_and_ps(xmm, other.xmm); return *this; }
    float4& operator |= (const float4 &other)           { xmm = _mm_or_ps(xmm, other.xmm); return *this; }
    float4& operator ^= (const float4 &other)           { xmm = _mm_xor_ps(xmm, other.xmm); return *this; }
    float4& andnot_asg(const float4 &other)             { xmm = _mm_andnot_ps(xmm, other.xmm); return *this; } // "this = ~this & other"

    float4& operator += (float f)                       { xmm = _mm_add_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator -= (float f)                       { xmm = _mm_sub_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator *= (float f)                       { xmm = _mm_mul_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator /= (float f)                       { xmm = _mm_div_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator &= (float f)                       { xmm = _mm_and_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator |= (float f)                       { xmm = _mm_or_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& operator ^= (float f)                       { xmm = _mm_xor_ps(xmm, _mm_set_ps1(f)); return *this; }
    float4& andnot_asg(float f)                         { xmm = _mm_andnot_ps(xmm, _mm_set_ps1(f)); return *this; } // "this = ~this & f"

    friend float4 operator / (float f, const float4 &a) { return float4(_mm_div_ps(_mm_set_ps1(f), a.xmm)); } // f / a, per component


    friend float4 sqrt(const float4 &a)                                     { return float4(_mm_sqrt_ps(a.xmm)); }
    friend float4 rcp(const float4 &a)                                      { return float4(_mm_rcp_ps(a.xmm)); }
    friend float4 rsqrt(const float4 &a)                                    { return float4(_mm_rsqrt_ps(a.xmm)); }
    // Note: only the x lane of the result holds the sum; y, z and w keep a's values.
    friend float4 horizontal_add(const float4 &a)                           { return float4(_mm_add_ss(a.xmm,_mm_add_ss(_mm_shuffle_ps(a.xmm, a.xmm, 1),_mm_add_ss(_mm_shuffle_ps(a.xmm, a.xmm, 2),_mm_shuffle_ps(a.xmm, a.xmm, 3))))); }
    friend float4 min(const float4 &a, const float4 &b)                     { return float4(_mm_min_ps(a.xmm, b.xmm)); }
    friend float4 max(const float4 &a, const float4 &b)                     { return float4(_mm_max_ps(a.xmm, b.xmm)); }
    friend float4 dot(const float4 &a, const float4 &b)                     { return horizontal_add(a*b); }
    friend float4 length(const float4 &a)                                   { return sqrt(dot(a, a)); }
    friend float4 rlength(const float4 &a)                                  { return rsqrt(dot(a, a)); }
    friend float4 normalize(const float4 &a)                                { return a * rlength(a); }
    friend float4 distance(const float4 &a, const float4 &b)                { return length(a-b); }
    friend float4 clamp(const float4 &x, const float4 &a, const float4 &b)  { return max(a, min(b, x)); }

    friend float4 cross(const float4 &a, const float4 &b)
    {
        enum
        {
            shuf_yzxw = _MM_SHUFFLE(3, 0, 2, 1),
            shuf_zxyw = _MM_SHUFFLE(3, 1, 0, 2) 
        };

        __m128 left  = _mm_mul_ps(_mm_shuffle_ps(a.xmm, a.xmm, shuf_yzxw), _mm_shuffle_ps(b.xmm, b.xmm, shuf_zxyw));
        __m128 right = _mm_mul_ps(_mm_shuffle_ps(a.xmm, a.xmm, shuf_zxyw), _mm_shuffle_ps(b.xmm, b.xmm, shuf_yzxw));
#if 0
        return float4(_mm_add_ps(_mm_set_ps(1.0f, 0.0f, 0.0f, 0.0f), _mm_sub_ps(left, right)));
#else
        return float4(_mm_sub_ps(left, right)); // .w equals zero
#endif
    }

    // NewtonRaphson Reciprocal
    // [2 * rcpps(a) - (a * rcpps(a) * rcpps(a))]
    friend float4 rcp_nr(const float4 &a)
    {
        float4 ra0 = rcp(a);
        return (ra0 + ra0) - (a * ra0 * ra0);
    }


    template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
    swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> shuffle()
    {
        swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
        return sw;
    }

public:
    union
    {
        struct { float x,y,z,w; };
        struct { float r,g,b,a; };
        __m128 xmm;
    };
};

template<unsigned mask>
float4::swizzle_proxy<mask>& float4::swizzle_proxy<mask>::operator = (const float4 &other)
{
    ref = _mm_shuffle_ps(other.xmm, other.xmm, mask);
    return *this;
}

// Test-defines
#define xyzw shuffle<0,1,2,3>()
#define wzyx shuffle<3,2,1,0>()
#define xyxy shuffle<0,1,0,1>()
#define yzyx shuffle<1,2,1,0>()

#define xxxx shuffle<0,0,0,0>()
#define yyyy shuffle<1,1,1,1>()
#define zzzz shuffle<2,2,2,2>()
#define wwww shuffle<3,3,3,3>()

Using the supplied defines, it is now possible to write code like this (which is pretty close to Cg):

float4 float4_test()
{
    float4 f1 = float4(1,2,3,4);
    printf("'f1' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);

    float4 f3;
    f3.wzyx = f1;
    printf("'f3.wzyx = f1' : %f, %f, %f, %f\n", f3.x, f3.y, f3.z, f3.w);

    float4 f2 = f1.yyyy;
    printf("'f2 = f1.yyyy' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);

    f2.wzyx = f3.xyxy;
    printf("'f2.wzyx = f3.xyxyx' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);

    float4 f4 = f2 + f1.wzyx;
    f1 = f1.wzyx;
    printf("'f1.wzyx' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
    printf("'f2.xyzw' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
    printf("'f4 = f2 + f1.wzyx' : %f, %f, %f, %f\n", f4.x, f4.y, f4.z, f4.w);

    float4 f5 = f4 * f2.yzyx;
    printf("f5 = 'f4 * f2.yzyx' : %f, %f, %f, %f\n", f5.x, f5.y, f5.z, f5.w);

    return f5;
}

which results in the following output:

'f1' : 1.000000, 2.000000, 3.000000, 4.000000
'f3.wzyx = f1' : 4.000000, 3.000000, 2.000000, 1.000000
'f2 = f1.yyyy' : 2.000000, 2.000000, 2.000000, 2.000000
'f2.wzyx = f3.xyxyx' : 3.000000, 4.000000, 3.000000, 4.000000
'f1.wzyx' : 4.000000, 3.000000, 2.000000, 1.000000
'f2.xyzw' : 3.000000, 4.000000, 3.000000, 4.000000
'f4 = f2 + f1.wzyx' : 7.000000, 7.000000, 5.000000, 5.000000
f5 = 'f4 * f2.yzyx' : 28.000000, 21.000000, 20.000000, 15.000000
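For readers who want to check the shuffle masks by hand, here is a scalar sketch of the encoding used by shuffle<a,b,c,d>(). The helper names make_mask and shuffle_model are made up for this illustration, and the model covers only the case where both shufps inputs are the same register, as in the class above.

```cpp
#include <cassert>

// Same packing as shuffle<a,b,c,d>(): two bits per lane, lane 0 in the lowest bits.
constexpr unsigned make_mask(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (d << 6) | (c << 4) | (b << 2) | a;
}

static_assert(make_mask(3, 2, 1, 0) == 27, "wzyx");
static_assert(make_mask(0, 1, 0, 1) == 68, "xyxy");
static_assert(make_mask(1, 2, 1, 0) == 25, "yzyx");

// Scalar model of _mm_shuffle_ps(v, v, mask): output lane i takes the input
// lane selected by the two mask bits starting at bit 2*i.
inline void shuffle_model(const float (&in)[4], unsigned mask, float (&out)[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = in[(mask >> (2 * i)) & 3];
}
```

The three constants are the same values that show up as shufps immediates in the compiler listing.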

And finally, the generated assembly (without all the printf calls):

; 6    :    float4 f1 = float4(1,2,3,4);

    fld1
    fstp    DWORD PTR _f1$[esp+16]
    fld DWORD PTR __real@40000000
    fstp    DWORD PTR _f1$[esp+20]
    fld DWORD PTR __real@40400000
    fstp    DWORD PTR _f1$[esp+24]
    fld DWORD PTR __real@40800000
    fstp    DWORD PTR _f1$[esp+28]

; 7    :    //printf("'f1' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
; 8    : 
; 9    :    float4 f3;
; 10   :    f3.wzyx = f1;

    movaps  xmm1, XMMWORD PTR _f1$[esp+16]
    shufps  xmm1, xmm1, 27              ; 0000001bH

; 11   :    //printf("'f3.wzyx = f1' : %f, %f, %f, %f\n", f3.x, f3.y, f3.z, f3.w);
; 12   : 
; 13   :    float4 f2 = f1.yyyy;
; 14   :    //printf("'f2 = f1.yyyy' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 15   : 
; 16   :    f2.wzyx = f3.xyxy;

    movaps  xmm0, xmm1
    shufps  xmm0, xmm1, 68              ; 00000044H
    shufps  xmm0, xmm0, 27              ; 0000001bH

; 17   :    //printf("'f2.wzyx = f3.xyxyx' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 18   : 
; 19   :    float4 f4 = f2 + f1.wzyx;
; 20   :    f1 = f1.wzyx;
; 21   :    //printf("'f1.wzyx' : %f, %f, %f, %f\n", f1.x, f1.y, f1.z, f1.w);
; 22   :    //printf("'f2.xyzw' : %f, %f, %f, %f\n", f2.x, f2.y, f2.z, f2.w);
; 23   :    //printf("'f4 = f2 + f1.wzyx' : %f, %f, %f, %f\n", f4.x, f4.y, f4.z, f4.w);
; 24   : 
; 25   :    float4 f5 = f4 * f2.yzyx;

    movaps  xmm2, xmm0
    shufps  xmm2, xmm0, 25              ; 00000019H
    addps   xmm1, xmm0
    mulps   xmm2, xmm1
    movaps  XMMWORD PTR [eax], xmm2

; 26   :    //printf("f5 = 'f4 * f2.yzyx' : %f, %f, %f, %f\n", f5.x, f5.y, f5.z, f5.w);
; 27   : 
; 28   :    return f5;
; 29   : }

Kenneth_Gorking 101 Aug 02, 2008 at 15:38

A small addendum to the above: instead of the single ‘shuffle’ function, there should be two, a read-only version and a read-write version.

template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> rw_shuffle()
{
    swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
    return sw;
}

template<const unsigned a, const unsigned b, const unsigned c, const unsigned d>
const swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> ro_shuffle()
{
    swizzle_proxy<(d << 6) | (c << 4) | (b << 2) | a> sw(xmm);
    return sw;
}

...

#define xyzw rw_shuffle<0,1,2,3>()
#define wzyx rw_shuffle<3,2,1,0>()
#define xyxy ro_shuffle<0,1,0,1>()
#define yzyx ro_shuffle<1,2,1,0>()

#define xxxx ro_shuffle<0,0,0,0>()
#define yyyy ro_shuffle<1,1,1,1>()
#define zzzz ro_shuffle<2,2,2,2>()
#define wwww ro_shuffle<3,3,3,3>()

This way you can’t accidentally perform operations like ‘v1.xyxy = float4(1234)’
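The reason this works is that the proxy's assignment operator is a non-const member, so it cannot be called on a const temporary. Below is a stripped-down sketch without SSE; Vec4, Proxy, rw_view and ro_view are hypothetical stand-ins for float4, swizzle_proxy, rw_shuffle and ro_shuffle, and <type_traits> from C++11 is assumed.

```cpp
#include <cassert>
#include <type_traits>

struct Vec4 { float v[4]; };   // hypothetical stand-in for float4

// Hypothetical proxy: assignment is a non-const member function.
struct Proxy
{
    Vec4 &ref;
    explicit Proxy(Vec4 &r) : ref(r) {}
    Proxy& operator=(const Vec4 &other) { ref = other; return *this; }
};

inline Proxy       rw_view(Vec4 &v) { return Proxy(v); }   // assignable
inline const Proxy ro_view(Vec4 &v) { return Proxy(v); }   // not assignable

static_assert(std::is_assignable<Proxy, const Vec4&>::value,
              "a non-const proxy accepts assignment");
static_assert(!std::is_assignable<const Proxy, const Vec4&>::value,
              "a const proxy rejects assignment at compile time");
```

So rw_view(a) = b compiles, while ro_view(a) = b is rejected by the compiler.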

Enlight 101 Aug 03, 2008 at 07:04

@Kenneth Gorking

Clearly something has gone wrong here…

No, it is fine! It should NOT compile anything because it has bad syntax:

b.mul(feb, a, zxy); // b.feb *= a.zxy;

what is “feb”?, nothing, so it doesn’t do anything hehehe

Now, about your work on the swizzle operator, I’m just amazed, I didn’t imagine someone would get *that* far…

I haven’t tried it out yet, but it looks amazing. Great work, especially for using 128-bit registers.

Kenneth_Gorking 101 Aug 03, 2008 at 11:10

@Enlight

No, it is fine! It should NOT compile anything because it has bad syntax:

b.mul(feb, a, zxy); // b.feb *= a.zxy;

what is “feb”?, nothing, so it doesn’t do anything hehehe

Oh yeah, I missed that part :)
Instead of just doing nothing, maybe you should use a compile-time assert to alert the user to his mistake?
@Enlight

Now, about your work on the swizzle operator, I’m just amazed, I didn’t imagine someone would get *that* far… I didn’t try it out yet, but looks amazing, great work, specially for using 128 bit registers.

Thanks :)

_oisyn 101 Aug 03, 2008 at 12:10

Btw, it’s pretty pointless to make template integer arguments const :)
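A short sketch of why: top-level cv-qualifiers on a non-type template parameter are ignored, so the two spellings below name the same template. Lane is a hypothetical name used only for this example.

```cpp
#include <cassert>

// Declared with a const-qualified parameter type...
template <const unsigned N> struct Lane;

// ...and defined without it. This is a definition of the same template,
// not a second one, because the top-level const is ignored.
template <unsigned N> struct Lane
{
    enum { value = N };
};
```

Either way N cannot be modified inside the template, so the const adds nothing.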

Groove 101 Aug 07, 2008 at 15:39

I have also implemented a swizzle operator in my math library (glm.g-truc.net).

My implementation is based on a third-party class that only contains references.

It follows GLSL syntax, so that we can do something like this:

vec4 v1(1, 2, 3, 4);
vec4 v2(1);
v2.yzx = v1.xyz + v1.yzx;
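As a rough sketch of the reference-class idea (this is not GLM's code; vec3, ref3, vec4 and the member names are hypothetical, and explicit function calls are used instead of #defines):

```cpp
#include <cassert>

struct vec3 { float x, y, z; };   // hypothetical minimal value type

inline vec3 operator+(const vec3 &l, const vec3 &r)
{
    return vec3{l.x + r.x, l.y + r.y, l.z + r.z};
}

// Hypothetical proxy that only holds references to three components,
// in the order requested by the swizzle.
struct ref3
{
    float &a, &b, &c;
    ref3(float &a_, float &b_, float &c_) : a(a_), b(b_), c(c_) {}

    // Assigning a value vector writes through the references.
    ref3& operator=(const vec3 &v) { a = v.x; b = v.y; c = v.z; return *this; }
};

struct vec4
{
    float x, y, z, w;

    ref3 yzx_ref()   { return ref3(y, z, x); }    // left-hand side: references
    vec3 xyz() const { return vec3{x, y, z}; }    // right-hand side: copies
    vec3 yzx() const { return vec3{y, z, x}; }
};
```

With v1 = (1, 2, 3, 4) and v2 = (1, 1, 1, 1), the statement v2.yzx_ref() = v1.xyz() + v1.yzx() writes (3, 5, 4) into v2's y, z and x. Returning copies on the right-hand side keeps the expression safe even when both sides refer to the same vector.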

Here is some detail of the implementation… it was annoying to write because of the #defines like yzx that wrap function calls. I have no SSE optimization for this yet.

I will definitely come back to this post to have a closer look at your implementation! :)

Enjoy:

namespace glm{
namespace detail{

    template <typename T>
    class _xref4
    {
    public:
        _xref4(T& x, T& y, T& z, T& w);
        _xref4<T>& operator= (const _xref4<T>& r);
        _xref4<T>& operator+=(const _xref4<T>& r);
        _xref4<T>& operator-=(const _xref4<T>& r);
        _xref4<T>& operator*=(const _xref4<T>& r);
        _xref4<T>& operator/=(const _xref4<T>& r);
        _xref4<T>& operator= (const _xvec4<T>& v);
        _xref4<T>& operator+=(const _xvec4<T>& v);
        _xref4<T>& operator-=(const _xvec4<T>& v);
        _xref4<T>& operator*=(const _xvec4<T>& v);
        _xref4<T>& operator/=(const _xvec4<T>& v);
        T& x;
        T& y;
        T& z;
        T& w;
    };

} //namespace detail
} //namespace glm



namespace glm{
namespace detail{

    template <typename T>
    class _cvec4
    {
    public:
        typedef T value_type;
        typedef int size_type;
        static const size_type value_size;

////////////////
// Components //

#ifndef GLM_SINGLE_COMP_NAME

#if GLM_FORCE_HALF_COMPATIBILITY || (defined(GLM_COMPILER) && (GLM_COMPILER & GLM_COMPILER_VC) && (GLM_COMPILER <= GLM_COMPILER_VC71))
        union
        {
            struct{T x, y, z, w;};
            struct{T r, g, b, a;};
            struct{T s, t, p, q;};
        };
#else
        union{T x, r, s;};
        union{T y, g, t;};
        union{T z, b, p;};
        union{T w, a, q;};
#endif

#else
        T x, y;
#endif//GLM_SINGLE_COMP_NAME

// Components //
////////////////

        const T* _address() const{return (T*)(this);}
        T* _address(){return (T*)(this);}

        // Constructor
        _cvec4(){}
        _cvec4(const T x, const T y, const T z, const T w);

        // Accesses
        T& operator[](size_type i);
        T operator[](size_type i) const;

#if (!defined(GLM_AUTO_CAST) || (GLM_AUTO_CAST == GLM_ENABLE))
        operator T*();
        operator const T*() const;
#endif//GLM_AUTO_CAST



#if defined(GLM_SWIZZLE)
        // Left hand side 2 components common swizzle operators
        _xref2<T> _yx();
        _xref2<T> _zx();
        _xref2<T> _wx();
        _xref2<T> _xy();
        _xref2<T> _zy();
        _xref2<T> _wy();
        _xref2<T> _xz();
        _xref2<T> _yz();
        _xref2<T> _wz();
        _xref2<T> _xw();
        _xref2<T> _yw();
        _xref2<T> _zw();

        // Right hand side 2 components common swizzle operators
        const _xvec2<T> _xx() const;
        const _xvec2<T> _yx() const;
        const _xvec2<T> _zx() const;
        const _xvec2<T> _wx() const;
        const _xvec2<T> _xy() const;
        const _xvec2<T> _yy() const;
        const _xvec2<T> _zy() const;
        const _xvec2<T> _wy() const;
        const _xvec2<T> _xz() const;
        const _xvec2<T> _yz() const;
        const _xvec2<T> _zz() const;
        const _xvec2<T> _wz() const;
        const _xvec2<T> _xw() const;
        const _xvec2<T> _yw() const;
        const _xvec2<T> _zw() const;
        const _xvec2<T> _ww() const;

        // Left hand side 3 components common swizzle operators
        _xref3<T> _zyx();
        _xref3<T> _wyx();
        _xref3<T> _yzx();
        _xref3<T> _wzx();
        _xref3<T> _ywx();
        _xref3<T> _zwx();
        _xref3<T> _zxy();
        _xref3<T> _wxy();
        _xref3<T> _xzy();
        _xref3<T> _wzy();
        _xref3<T> _xwy();
        _xref3<T> _zwy();
        _xref3<T> _yxz();
        _xref3<T> _wxz();
        _xref3<T> _xyz();
        _xref3<T> _wyz();
        _xref3<T> _xwz();
        _xref3<T> _ywz();
        _xref3<T> _yxw();
        _xref3<T> _zxw();
        _xref3<T> _xyw();
        _xref3<T> _zyw();
        _xref3<T> _xzw();
        _xref3<T> _yzw();

        // Right hand side 3 components common swizzle operators
        const _xvec3<T> _xxx() const;
        const _xvec3<T> _yxx() const;
        const _xvec3<T> _zxx() const;
        const _xvec3<T> _wxx() const;
        const _xvec3<T> _xyx() const;
        const _xvec3<T> _yyx() const;
        const _xvec3<T> _zyx() const;
        const _xvec3<T> _wyx() const;
        const _xvec3<T> _xzx() const;
        const _xvec3<T> _yzx() const;
        const _xvec3<T> _zzx() const;
        const _xvec3<T> _wzx() const;
        const _xvec3<T> _xwx() const;
        const _xvec3<T> _ywx() const;
        const _xvec3<T> _zwx() const;
        const _xvec3<T> _wwx() const;
        const _xvec3<T> _xxy() const;
        const _xvec3<T> _yxy() const;
        const _xvec3<T> _zxy() const;
        const _xvec3<T> _wxy() const;
        const _xvec3<T> _xyy() const;
        const _xvec3<T> _yyy() const;
        const _xvec3<T> _zyy() const;
        const _xvec3<T> _wyy() const;
        const _xvec3<T> _xzy() const;
        const _xvec3<T> _yzy() const;
        const _xvec3<T> _zzy() const;
        const _xvec3<T> _wzy() const;
        const _xvec3<T> _xwy() const;
        const _xvec3<T> _ywy() const;
        const _xvec3<T> _zwy() const;
        const _xvec3<T> _wwy() const;
        const _xvec3<T> _xxz() const;
        const _xvec3<T> _yxz() const;
        const _xvec3<T> _zxz() const;
        const _xvec3<T> _wxz() const;
        const _xvec3<T> _xyz() const;
        const _xvec3<T> _yyz() const;
        const _xvec3<T> _zyz() const;
        const _xvec3<T> _wyz() const;
        const _xvec3<T> _xzz() const;
        const _xvec3<T> _yzz() const;
        const _xvec3<T> _zzz() const;
        const _xvec3<T> _wzz() const;
        const _xvec3<T> _xwz() const;
        const _xvec3<T> _ywz() const;
        const _xvec3<T> _zwz() const;
        const _xvec3<T> _wwz() const;
        const _xvec3<T> _xxw() const;
        const _xvec3<T> _yxw() const;
        const _xvec3<T> _zxw() const;
        const _xvec3<T> _wxw() const;
        const _xvec3<T> _xyw() const;
        const _xvec3<T> _yyw() const;
        const _xvec3<T> _zyw() const;
        const _xvec3<T> _wyw() const;
        const _xvec3<T> _xzw() const;
        const _xvec3<T> _yzw() const;
        const _xvec3<T> _zzw() const;
        const _xvec3<T> _wzw() const;
        const _xvec3<T> _xww() const;
        const _xvec3<T> _yww() const;

        const _xvec3<T> _zww() const;

        const _xvec3<T> _www() const;



        // Left hand side 4 components common swizzle operators

        _xref4<T> _wzyx();

        _xref4<T> _zwyx();

        _xref4<T> _wyzx();

        _xref4<T> _ywzx();

        _xref4<T> _zywx();

        _xref4<T> _yzwx();

        _xref4<T> _wzxy();

        _xref4<T> _zwxy();

        _xref4<T> _wxzy();

        _xref4<T> _xwzy();

        _xref4<T> _zxwy();

        _xref4<T> _xzwy();

        _xref4<T> _wyxz();

        _xref4<T> _ywxz();

        _xref4<T> _wxyz();

        _xref4<T> _xwyz();

        _xref4<T> _yxwz();

        _xref4<T> _xywz();

        _xref4<T> _zyxw();

        _xref4<T> _yzxw();

        _xref4<T> _zxyw();

        _xref4<T> _xzyw();

        _xref4<T> _yxzw();

        _xref4<T> _xyzw();



        // Right hand side 4 components common swizzle operators

        const _xvec4<T> _xxxx() const;

        const _xvec4<T> _yxxx() const;

        const _xvec4<T> _zxxx() const;

        const _xvec4<T> _wxxx() const;

        const _xvec4<T> _xyxx() const;

        const _xvec4<T> _yyxx() const;

        const _xvec4<T> _zyxx() const;

        const _xvec4<T> _wyxx() const;

        const _xvec4<T> _xzxx() const;

        const _xvec4<T> _yzxx() const;

        const _xvec4<T> _zzxx() const;

        const _xvec4<T> _wzxx() const;

        const _xvec4<T> _xwxx() const;

        const _xvec4<T> _ywxx() const;

        const _xvec4<T> _zwxx() const;

        const _xvec4<T> _wwxx() const;

        const _xvec4<T> _xxyx() const;

        const _xvec4<T> _yxyx() const;

        const _xvec4<T> _zxyx() const;

        const _xvec4<T> _wxyx() const;

        const _xvec4<T> _xyyx() const;

        const _xvec4<T> _yyyx() const;

        const _xvec4<T> _zyyx() const;

        const _xvec4<T> _wyyx() const;

        const _xvec4<T> _xzyx() const;

        const _xvec4<T> _yzyx() const;

        const _xvec4<T> _zzyx() const;

        const _xvec4<T> _wzyx() const;

        const _xvec4<T> _xwyx() const;

        const _xvec4<T> _ywyx() const;

        const _xvec4<T> _zwyx() const;

        const _xvec4<T> _wwyx() const;

        const _xvec4<T> _xxzx() const;

        const _xvec4<T> _yxzx() const;

        const _xvec4<T> _zxzx() const;

        const _xvec4<T> _wxzx() const;

        const _xvec4<T> _xyzx() const;

        const _xvec4<T> _yyzx() const;

        const _xvec4<T> _zyzx() const;

        const _xvec4<T> _wyzx() const;

        const _xvec4<T> _xzzx() const;

        const _xvec4<T> _yzzx() const;

        const _xvec4<T> _zzzx() const;

        const _xvec4<T> _wzzx() const;

        const _xvec4<T> _xwzx() const;

        const _xvec4<T> _ywzx() const;

        const _xvec4<T> _zwzx() const;

        const _xvec4<T> _wwzx() const;

        const _xvec4<T> _xxwx() const;

        const _xvec4<T> _yxwx() const;

        const _xvec4<T> _zxwx() const;

        const _xvec4<T> _wxwx() const;

        const _xvec4<T> _xywx() const;

        const _xvec4<T> _yywx() const;

        const _xvec4<T> _zywx() const;

        const _xvec4<T> _wywx() const;

        const _xvec4<T> _xzwx() const;

        const _xvec4<T> _yzwx() const;

        const _xvec4<T> _zzwx() const;

        const _xvec4<T> _wzwx() const;

        const _xvec4<T> _xwwx() const;

        const _xvec4<T> _ywwx() const;

        const _xvec4<T> _zwwx() const;

        const _xvec4<T> _wwwx() const;

        const _xvec4<T> _xxxy() const;

        const _xvec4<T> _yxxy() const;

        const _xvec4<T> _zxxy() const;

        const _xvec4<T> _wxxy() const;

        const _xvec4<T> _xyxy() const;

        const _xvec4<T> _yyxy() const;

        const _xvec4<T> _zyxy() const;

        const _xvec4<T> _wyxy() const;

        const _xvec4<T> _xzxy() const;

        const _xvec4<T> _yzxy() const;

        const _xvec4<T> _zzxy() const;

        const _xvec4<T> _wzxy() const;

        const _xvec4<T> _xwxy() const;

        const _xvec4<T> _ywxy() const;

        const _xvec4<T> _zwxy() const;

        const _xvec4<T> _wwxy() const;

        const _xvec4<T> _xxyy() const;

        const _xvec4<T> _yxyy() const;

        const _xvec4<T> _zxyy() const;

        const _xvec4<T> _wxyy() const;

        const _xvec4<T> _xyyy() const;

        const _xvec4<T> _yyyy() const;

        const _xvec4<T> _zyyy() const;

        const _xvec4<T> _wyyy() const;

        const _xvec4<T> _xzyy() const;

        const _xvec4<T> _yzyy() const;

        const _xvec4<T> _zzyy() const;

        const _xvec4<T> _wzyy() const;

        const _xvec4<T> _xwyy() const;

        const _xvec4<T> _ywyy() const;

        const _xvec4<T> _zwyy() const;

        const _xvec4<T> _wwyy() const;

        const _xvec4<T> _xxzy() const;

        const _xvec4<T> _yxzy() const;

        const _xvec4<T> _zxzy() const;

        const _xvec4<T> _wxzy() const;

        const _xvec4<T> _xyzy() const;

        const _xvec4<T> _yyzy() const;

        const _xvec4<T> _zyzy() const;

        const _xvec4<T> _wyzy() const;

        const _xvec4<T> _xzzy() const;

        const _xvec4<T> _yzzy() const;

        const _xvec4<T> _zzzy() const;

        const _xvec4<T> _wzzy() const;

        const _xvec4<T> _xwzy() const;

        const _xvec4<T> _ywzy() const;

        const _xvec4<T> _zwzy() const;

        const _xvec4<T> _wwzy() const;

        const _xvec4<T> _xxwy() const;

        const _xvec4<T> _yxwy() const;

        const _xvec4<T> _zxwy() const;

        const _xvec4<T> _wxwy() const;

        const _xvec4<T> _xywy() const;

        const _xvec4<T> _yywy() const;

        const _xvec4<T> _zywy() const;

        const _xvec4<T> _wywy() const;

        const _xvec4<T> _xzwy() const;

        const _xvec4<T> _yzwy() const;

        const _xvec4<T> _zzwy() const;

        const _xvec4<T> _wzwy() const;

        const _xvec4<T> _xwwy() const;

        const _xvec4<T> _ywwy() const;

        const _xvec4<T> _zwwy() const;

        const _xvec4<T> _wwwy() const;

        const _xvec4<T> _xxxz() const;

        const _xvec4<T> _yxxz() const;

        const _xvec4<T> _zxxz() const;

        const _xvec4<T> _wxxz() const;

        const _xvec4<T> _xyxz() const;

        const _xvec4<T> _yyxz() const;

        const _xvec4<T> _zyxz() const;

        const _xvec4<T> _wyxz() const;

        const _xvec4<T> _xzxz() const;

        const _xvec4<T> _yzxz() const;

        const _xvec4<T> _zzxz() const;

        const _xvec4<T> _wzxz() const;

        const _xvec4<T> _xwxz() const;

        const _xvec4<T> _ywxz() const;

        const _xvec4<T> _zwxz() const;

        const _xvec4<T> _wwxz() const;

        const _xvec4<T> _xxyz() const;

        const _xvec4<T> _yxyz() const;

        const _xvec4<T> _zxyz() const;

        const _xvec4<T> _wxyz() const;

        const _xvec4<T> _xyyz() const;

        const _xvec4<T> _yyyz() const;

        const _xvec4<T> _zyyz() const;

        const _xvec4<T> _wyyz() const;

        const _xvec4<T> _xzyz() const;

        const _xvec4<T> _yzyz() const;

        const _xvec4<T> _zzyz() const;

        const _xvec4<T> _wzyz() const;

        const _xvec4<T> _xwyz() const;

        const _xvec4<T> _ywyz() const;

        const _xvec4<T> _zwyz() const;

        const _xvec4<T> _wwyz() const;

        const _xvec4<T> _xxzz() const;

        const _xvec4<T> _yxzz() const;

        const _xvec4<T> _zxzz() const;

        const _xvec4<T> _wxzz() const;

        const _xvec4<T> _xyzz() const;

        const _xvec4<T> _yyzz() const;

        const _xvec4<T> _zyzz() const;

        const _xvec4<T> _wyzz() const;

        const _xvec4<T> _xzzz() const;

        const _xvec4<T> _yzzz() const;

        const _xvec4<T> _zzzz() const;

        const _xvec4<T> _wzzz() const;

        const _xvec4<T> _xwzz() const;

        const _xvec4<T> _ywzz() const;

        const _xvec4<T> _zwzz() const;

        const _xvec4<T> _wwzz() const;

        const _xvec4<T> _xxwz() const;

        const _xvec4<T> _yxwz() const;

        const _xvec4<T> _zxwz() const;

        const _xvec4<T> _wxwz() const;

        const _xvec4<T> _xywz() const;

        const _xvec4<T> _yywz() const;

        const _xvec4<T> _zywz() const;

        const _xvec4<T> _wywz() const;

        const _xvec4<T> _xzwz() const;

        const _xvec4<T> _yzwz() const;

        const _xvec4<T> _zzwz() const;

        const _xvec4<T> _wzwz() const;

        const _xvec4<T> _xwwz() const;

        const _xvec4<T> _ywwz() const;

        const _xvec4<T> _zwwz() const;

        const _xvec4<T> _wwwz() const;

        const _xvec4<T> _xxxw() const;

        const _xvec4<T> _yxxw() const;

        const _xvec4<T> _zxxw() const;

        const _xvec4<T> _wxxw() const;

        const _xvec4<T> _xyxw() const;

        const _xvec4<T> _yyxw() const;

        const _xvec4<T> _zyxw() const;

        const _xvec4<T> _wyxw() const;

        const _xvec4<T> _xzxw() const;

        const _xvec4<T> _yzxw() const;

        const _xvec4<T> _zzxw() const;

        const _xvec4<T> _wzxw() const;

        const _xvec4<T> _xwxw() const;

        const _xvec4<T> _ywxw() const;

        const _xvec4<T> _zwxw() const;

        const _xvec4<T> _wwxw() const;

        const _xvec4<T> _xxyw() const;

        const _xvec4<T> _yxyw() const;

        const _xvec4<T> _zxyw() const;

        const _xvec4<T> _wxyw() const;

        const _xvec4<T> _xyyw() const;

        const _xvec4<T> _yyyw() const;

        const _xvec4<T> _zyyw() const;

        const _xvec4<T> _wyyw() const;

        const _xvec4<T> _xzyw() const;

        const _xvec4<T> _yzyw() const;

        const _xvec4<T> _zzyw() const;

        const _xvec4<T> _wzyw() const;

        const _xvec4<T> _xwyw() const;

        const _xvec4<T> _ywyw() const;

        const _xvec4<T> _zwyw() const;

        const _xvec4<T> _wwyw() const;

        const _xvec4<T> _xxzw() const;

        const _xvec4<T> _yxzw() const;

        const _xvec4<T> _zxzw() const;

        const _xvec4<T> _wxzw() const;

        const _xvec4<T> _xyzw() const;

        const _xvec4<T> _yyzw() const;

        const _xvec4<T> _zyzw() const;

        const _xvec4<T> _wyzw() const;

        const _xvec4<T> _xzzw() const;

        const _xvec4<T> _yzzw() const;

        const _xvec4<T> _zzzw() const;

        const _xvec4<T> _wzzw() const;

        const _xvec4<T> _xwzw() const;

        const _xvec4<T> _ywzw() const;

        const _xvec4<T> _zwzw() const;

        const _xvec4<T> _wwzw() const;

        const _xvec4<T> _xxww() const;

        const _xvec4<T> _yxww() const;

        const _xvec4<T> _zxww() const;

        const _xvec4<T> _wxww() const;

        const _xvec4<T> _xyww() const;

        const _xvec4<T> _yyww() const;

        const _xvec4<T> _zyww() const;

        const _xvec4<T> _wyww() const;

        const _xvec4<T> _xzww() const;

        const _xvec4<T> _yzww() const;

        const _xvec4<T> _zzww() const;

        const _xvec4<T> _wzww() const;

        const _xvec4<T> _xwww() const;

        const _xvec4<T> _ywww() const;

        const _xvec4<T> _zwww() const;

        const _xvec4<T> _wwww() const;

#endif// defined(GLM_SWIZZLE)

    };



} //namespace detail

} //namespace glm



#include "_cvec4.inl"



#endif//glm_core_cvec4
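
The declaration lists above follow a mechanical pattern: the right-hand-side swizzles cover every combination of x, y, z and w, while the left-hand-side ones skip any name that repeats a component, and the first letter varies fastest. As a rough sketch of that pattern, the listing could be regenerated with something like the following. The helper names `swizzle_names` and `declaration` are made up for this illustration, and nothing here claims that GLM's header was actually produced this way.

```cpp
#include <string>
#include <vector>

// Build every swizzle name of length n over the components "xyzw".
// The first letter varies fastest, matching the order of the listing.
// With allow_repeats == false, names that use a component twice are
// skipped, as in the left-hand-side (assignable) lists.
std::vector<std::string> swizzle_names(int n, bool allow_repeats)
{
    const char comps[] = "xyzw";
    int total = 1;
    for (int i = 0; i < n; ++i) total *= 4;

    std::vector<std::string> names;
    for (int code = 0; code < total; ++code) {
        std::string name(n, ' ');
        bool seen[4] = {false, false, false, false};
        bool repeated = false;
        int rest = code;
        for (int i = 0; i < n; ++i) {
            const int c = rest % 4;   // base-4 digit i selects letter i
            rest /= 4;
            if (seen[c]) repeated = true;
            seen[c] = true;
            name[i] = comps[c];
        }
        if (allow_repeats || !repeated) names.push_back("_" + name);
    }
    return names;
}

// Format one declaration in the style of the listing (requires C++11
// for std::to_string). The type names are copied from the header above.
std::string declaration(const std::string& name, bool lhs)
{
    const std::string n = std::to_string(name.size() - 1);
    return lhs ? "_xref" + n + "<T> " + name + "();"
               : "const _xvec" + n + "<T> " + name + "() const;";
}
```

Calling `swizzle_names(3, false)` and passing each result to `declaration(name, true)` yields the 24 three-component left-hand-side lines, starting with `_xref3<T> _zyx();`.
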
Reedbeta 167 Aug 07, 2008 at 16:09

Groove, thanks for your post, but please use the [code]…[/code] tags. B)

Groove 101 Aug 12, 2008 at 16:28

Sorry about that, I’ll try to remember next time!
Thanks!

WizardOfOzzz 101 Sep 09, 2008 at 01:38

Hi Groove,

Wouldn’t using the intermediate references cause pointers to be created and extra overhead in the assembly code?

The GLM library looks awesome! Keep up the good work.

Eric

Groove 101 Sep 09, 2008 at 02:10

Some people were already afraid of that, but a compiler that supports cross-function optimizations can simply optimize the references away.

Have a look at this:

vec2 v1(1.0);
vec2 v2(2.0);
vec2 v3(3.0);

The following is just the asm code for this line:
v1.xy = vec2(v2.xy) + vec2(v3.xy);

GCC 3.4.5:

0x00401354 <main+100>: mov   %ebx,0xffffffc0(%ebp)
0x00401357 <main+103>: lea   0xffffffec(%ebp),%eax
0x0040135a <main+106>: mov   %eax,0xffffffc4(%ebp)
0x0040135d <main+109>: mov   0xffffffc0(%ebp),%eax
0x00401360 <main+112>: mov   %esi,0xffffffb0(%ebp)
0x00401363 <main+115>: mov   0xffffffc4(%ebp),%edx
0x00401366 <main+118>: mov   %eax,0xffffffb8(%ebp)
0x00401369 <main+121>: lea   0xffffffdc(%ebp),%eax
0x0040136c <main+124>: mov   %eax,0xffffffb4(%ebp)
0x0040136f <main+127>: mov   0xffffffb0(%ebp),%eax
0x00401372 <main+130>: mov   %edx,0xffffffbc(%ebp)
0x00401375 <main+133>: mov   0xffffffb4(%ebp),%edx
0x00401378 <main+136>: mov   %eax,0xffffffa8(%ebp)
0x0040137b <main+139>: mov   0xffffffa8(%ebp),%eax
0x0040137e <main+142>: mov   %edx,0xffffffac(%ebp)
0x00401381 <main+145>: flds  (%eax)
0x00401383 <main+147>: mov   0xffffffb8(%ebp),%eax
0x00401386 <main+150>: fadds (%eax)
0x00401388 <main+152>: mov   0xffffffac(%ebp),%eax
0x0040138b <main+155>: flds  (%eax)
0x0040138d <main+157>: mov   0xffffffbc(%ebp),%eax
0x00401390 <main+160>: fadds (%eax)
0x00401392 <main+162>: fxch  %st(1)
0x00401394 <main+164>: fsts  0xffffffc8(%ebp)

GCC 4.3.0:

0040809F        mov    eax,DWORD PTR [ebp+12]
004080A2        mov    ecx,DWORD PTR [ebp+16]
004080A5        mov    edx,DWORD PTR [ebp+20]
004080A8        fld    DWORD PTR [edx+4]
004080AB        fadd   DWORD PTR [ecx+4]
004080AE        fld    DWORD PTR [edx]
004080B0        fadd   DWORD PTR [ecx]
004080B2        fstp   DWORD PTR [eax]
004080B4        fstp   DWORD PTR [eax+4]

With SSE

00408123        mov    edx,DWORD PTR [ebp+20]
00408126        mov    ecx,DWORD PTR [ebp+16]
00408129        mov    eax,DWORD PTR [ebp+12]
0040812C        movss  xmm1,DWORD PTR [edx+4]
00408131        movss  xmm0,DWORD PTR [edx]
00408135        addss  xmm1,DWORD PTR [ecx+4]
0040813A        addss  xmm0,DWORD PTR [ecx]
0040813E        movss  DWORD PTR [eax+4],xmm1
00408143        movss  DWORD PTR [eax],xmm0

VC8

    mov    eax, DWORD PTR _b$[esp-4]
    fld    DWORD PTR [eax]
    fld    DWORD PTR [eax+4]
    mov    eax, DWORD PTR _a$[esp-4]
    fld    DWORD PTR [eax+4]
    fld    DWORD PTR [eax]
    mov    eax, DWORD PTR ___$ReturnUdt$[esp-4]
    faddp    ST(3), ST(0)
    fxch    ST(2)
    fstp    DWORD PTR [eax]
    faddp    ST(1), ST(0)
    fstp    DWORD PTR [eax+4]

With SSE

    mov    eax, DWORD PTR _b$[esp-4]
    movss    xmm2, DWORD PTR [eax]
    movss    xmm3, DWORD PTR [eax+4]
    mov    eax, DWORD PTR _a$[esp-4]
    movss    xmm0, DWORD PTR [eax]
    movss    xmm1, DWORD PTR [eax+4]
    mov    eax, DWORD PTR ___$ReturnUdt$[esp-4]
    addss    xmm0, xmm2
    addss    xmm1, xmm3
    movss    DWORD PTR [eax], xmm0
    movss    DWORD PTR [eax+4], xmm1

GCC 3.4.5 shows the problem you point out, but GCC 3.x didn’t support cross-function optimizations. They have been available since GCC 4.1, I think, and maybe partly in GCC 4.0. Visual Studio has supported them for ages under the name whole program optimization, maybe even since Visual C++ 6, but I’m not sure.
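
To make the discussion concrete, here is a minimal sketch of the kind of reference proxy being talked about. The types `Vec2` and `Ref2` and the free functions `xy`, `yx` and `add_swizzled` are simplified stand-ins invented for this example; GLM's real `_xvec2` and `_xref2` differ in detail. The proxy only stores two references, so once every call is inlined the compiler is free to read and write the components directly, which is consistent with the shorter listings above.

```cpp
// Hypothetical simplified vector type (not GLM's _xvec2).
struct Vec2 {
    float x, y;
    Vec2(float s = 0.0f) : x(s), y(s) {}
    Vec2(float x_, float y_) : x(x_), y(y_) {}
};

inline Vec2 operator+(const Vec2& a, const Vec2& b)
{
    return Vec2(a.x + b.x, a.y + b.y);
}

// Hypothetical left-hand-side proxy (not GLM's _xref2): it refers to
// two components of an existing vector, in swizzled order.
struct Ref2 {
    float& a;
    float& b;
    Ref2(float& a_, float& b_) : a(a_), b(b_) {}

    // Assigning a vector writes through the references.
    Ref2& operator=(const Vec2& v) { a = v.x; b = v.y; return *this; }

    // Reading the proxy copies the referenced components into a value.
    operator Vec2() const { return Vec2(a, b); }
};

// Free functions standing in for the .xy and .yx swizzle members.
inline Ref2 xy(Vec2& v) { return Ref2(v.x, v.y); }
inline Ref2 yx(Vec2& v) { return Ref2(v.y, v.x); }

// Counterpart of the line: v1.xy = vec2(v2.xy) + vec2(v3.xy);
inline void add_swizzled(Vec2& out, Vec2& lhs, Vec2& rhs)
{
    xy(out) = Vec2(xy(lhs)) + Vec2(xy(rhs));
}
```

Because the right-hand side is copied into a temporary `Vec2` before the assignment, an expression such as `yx(v) = Vec2(xy(v))` swaps the two components of `v` without any aliasing problem.
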

I’m doing my best to keep GLM going, and GLM 0.8.x will be a good step in that direction: GLSL 1.30 support, of course, but also a lot of internal improvements :)

gwiazdorrr 101 Nov 12, 2013 at 22:05

I personally find GLM too verbose, with a lot of code repetition despite heavy macro usage.

Compared to GLM, CxxSwizzle (https://github.com/gwiazdorrr/CxxSwizzle) uses more modern C++ (I settled for the subset of C++11 supported by MSVC 2010), which results in a much smaller codebase with no code repetition. For instance, there’s only one vector class and one matrix class (GLM has dozens).

gwiazdorrr 101 Nov 12, 2013 at 22:10

I know this topic is older than mountains, but it still may help someone.

My own take on the topic is the CxxSwizzle library (https://github.com/gwiazdorrr/CxxSwizzle). I focused on maximal GLSL compatibility, to the extent that you can take most GLSL fragment shaders from either http://glsl.heroku.com or http://shadertoy.com and run them as C++, literally without any changes. Basically, you can now debug shaders in the IDE of your liking just like any other C++ code, including assertions, watches, conditional breakpoints and such. For instance, this is now valid C++ code (check the sample):

uniform float time;
uniform vec2 mouse;
uniform vec2 resolution;
//varying vec2 surfacePosition;

#define MAX_ITER 16
void main( void ) {
	//vec2 p = surfacePosition*8.0;
	vec2 uv = gl_FragCoord.xy / resolution.xy;
	uv = uv * 4.0 - 2.0;
	vec2 p = uv;
	
	vec2 i = p;
	float c = 2.0;
	float inten = 1.0;

	for (int n = 0; n < MAX_ITER; n++) {
		float t = time * (1.0 - (1.0 / float(n+1)));
		i = p + vec2(cos(t - i.x) + sin(t + i.y), sin(t - i.y) + cos(t + i.x));
		c += 1.0/length(vec2(p.x / (sin(i.x+t)/inten),p.y / (cos(i.y+t)/inten)));
	}
	c /= float(MAX_ITER);
	float pulse = abs(sin(time*5.));
	float pulse2 = pow(abs(sin(time*3.)),.25);
	float pulse3 = pow(abs(sin(time*5.)),2.);
	gl_FragColor = vec4(vec3(pow(c,1.5+pulse2/2.))*vec3(1.0+pulse2, 2.0-pulse2, 1.5+pulse3)*(1.+pulse2)/2., 1.0);

}

At the moment the underlying math implementation is naive, but thanks to the component-based structure of the code, a better one should be a straightforward addition. The point of CxxSwizzle was to compile GLSL without any changes, and it worked!

That said, compiling HLSL is also possible, but with minimal code changes (you just can’t replicate semantics).
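
As a very rough illustration of the general idea of compiling shader-style code as C++, the sketch below maps the `uniform` qualifier to nothing and supplies a tiny `vec2` type so that the first two lines of the shader body above become ordinary C++. Everything in it (the empty `uniform` macro, this `vec2`, the `shader` namespace and the `centered` function) is an assumption made up for this example and is not how CxxSwizzle is actually implemented. Swizzles are left out to keep it short, so `gl_FragCoord.xy` is replaced by a plain parameter.

```cpp
// Hypothetical stand-ins, not CxxSwizzle's definitions.
#define uniform   // the GLSL storage qualifier becomes a no-op in C++

struct vec2 {
    float x, y;
    vec2(float s = 0.0f) : x(s), y(s) {}   // scalar broadcast, as in GLSL
    vec2(float x_, float y_) : x(x_), y(y_) {}
};

// Component-wise arithmetic; scalars convert to vec2 implicitly.
inline vec2 operator-(vec2 a, vec2 b) { return vec2(a.x - b.x, a.y - b.y); }
inline vec2 operator*(vec2 a, vec2 b) { return vec2(a.x * b.x, a.y * b.y); }
inline vec2 operator/(vec2 a, vec2 b) { return vec2(a.x / b.x, a.y / b.y); }

namespace shader {

// In GLSL the host application would set this; here it is a plain global.
uniform vec2 resolution = vec2(4.0, 2.0);

// Mirrors the first two statements of the shader body shown above.
inline vec2 centered(vec2 fragCoord)
{
    vec2 uv = fragCoord / resolution;   // GLSL: gl_FragCoord.xy / resolution.xy
    uv = uv * 4.0 - 2.0;
    return uv;
}

} // namespace shader
```

With the resolution set to (4, 2), `shader::centered(vec2(2.0, 1.0))` maps the centre of the image to the origin, and the function can be stepped through in a debugger like any other C++ code.
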