[SIMD] Direct access on __m128i data type...

wobuu 101 Nov 30, 2009 at 12:01

The documentation for the __m128i datatype on MSDN reads:

You should not access the __m128i fields directly. You can, however, see these types in the debugger. A variable of type __m128i maps to the XMM[0-7] registers.

However, when I access one of the fields directly and store it in an integer, it works like expected (in debug and release builds).

__m128i var;
int n = var.m128i_i32[3];

What is the reason the documentation tells me not to write code like this? Or can I ignore that advice and safely use this approach?

Perhaps it’s not _guaranteed_ to work in all circumstances/CPUs?

5 Replies


_oisyn 101 Nov 30, 2009 at 12:46

It should always work. However, it might not be as efficient as keeping the values in XMM registers and doing the math operations there, because the compiler has to store the register to memory first in order to access a field. If you simply want to store a single int, that’s fine, since you’re storing it in memory anyway. :)

Nick 102 Nov 30, 2009 at 13:33

A lot of compilers don’t handle __m128 as a struct at all. The mapping between the SSE vector types and their individual components as fields of a structure is specific to Visual C++. The names of the fields might even differ between compiler versions.

Another important reason not to read or write those fields directly is to allow the compiler to optimize things better. The compiler will have to store the SSE registers in memory to access individual components. So it’s advised to always use the SSE intrinsics to read and write components, which the compiler does know how to turn into efficient code.
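As a minimal sketch of the intrinsics-only approach (the function names and example values below are my own, not from the thread), individual lanes can be read with SSE2 intrinsics alone, without touching the Visual C++-specific fields:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Read the lowest 32-bit lane. _mm_cvtsi128_si32 moves lane 0
// of an __m128i into a general-purpose register directly.
static int lowest_lane(__m128i v) {
    return _mm_cvtsi128_si32(v);
}

// Read lane 3 by first broadcasting it into lane 0 with a shuffle.
// This needs only SSE2, so it is portable across compilers, unlike
// the m128i_i32 fields.
static int lane3(__m128i v) {
    return _mm_cvtsi128_si32(_mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 3, 3)));
}
```

Because the compiler understands these intrinsics, it is free to keep the vector in a register and emit a single shuffle/move, rather than spilling the whole value to the stack as field access forces it to do.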

If you don’t care that much about performance and it only needs to work for a specific version of Visual C++, then it’s probably fine to access those fields directly. Just be aware of the compromises you’re making. I would highly advise always keeping a scalar C++ version of your code as well, though. Maybe in a couple of years you’ll have to use that code again, and if the SSE version doesn’t compile, at least you still have the plain C++ code…

wobuu 101 Dec 01, 2009 at 20:53

Ok thanks for the advice guys. I ended up using the proper ‘_mm_store_si128’ approach. Speedwise, this made no difference as far as I could tell.
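The store-based approach described here might look something like the following sketch (the helper name and buffer layout are assumptions, not code from the thread):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Spill the whole register to 16-byte-aligned memory with
// _mm_store_si128, then index the lanes as ordinary ints.
static int lane_via_store(__m128i v, int i) {
    alignas(16) int tmp[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(tmp), v);
    return tmp[i];
}
```

This does the same register-to-memory round trip that direct field access implies, which is consistent with seeing no measurable speed difference; the gain is portability, since `_mm_store_si128` is standard across SSE2 compilers.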

Nick 102 Dec 02, 2009 at 14:59

@wobuu

Ok thanks for the advice guys. I ended up using the proper ‘_mm_store_si128’ approach. Speedwise, this made no difference as far as I could tell.

The Visual Studio compiler is pretty lame when it comes to optimizing SSE code. The Intel compiler, for example, does a far better job. Anyway, it’s quite possible that the code you rewrote wasn’t in a performance-critical path anyway.

I also often see people spending days, weeks or even months writing an SSE-optimized vector library, while their application actually only spends 5% of its execution time on vector math. So before optimizing anything, it’s important to profile instead of assuming and guessing. After identifying a hotspot, I’d also first prototype the optimization to see if it’s worth it. If not, stick with readable C++ code.

poita 101 Dec 02, 2009 at 17:14

GCC apparently does a really good job of optimizing code. More so than Intel in many cases.

Good read on optimisations done by compilers (very recent): http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf