I'm looking for a way to emulate the nonexistent packusdw MMX instruction. I need it to convert four floating-point numbers in an SSE register to 0.16 fixed-point format, without saturation. Currently I use this:

    mulps xmm0, _65536
    cvtps2pi mm0, xmm0
    movhlps xmm0, xmm0
    cvtps2pi mm1, xmm0
    // packusdw mm0, mm1
    pshufw mm0, mm0, 0x08
    pshufw mm1, mm1, 0x08
    punpckldq mm0, mm1

But that's significantly slower than it could be if packusdw existed. This matters a lot to me because it's the bottleneck of my application. If you have any ideas for doing this conversion/packing faster, please let me know!
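To make the intended result unambiguous, here is a rough scalar C sketch of what the sequence above is meant to compute. The function name pack_fixed_0_16 is made up for illustration, and the sketch assumes truncation toward zero, whereas cvtps2pi rounds according to the current MXCSR rounding mode, so the lowest bit may differ from the assembly version.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference (hypothetical helper, not the actual MMX/SSE code):
   convert four floats to 0.16 fixed point and keep only the low 16 bits
   of each 32-bit result, i.e. pack without saturation.
   Assumptions: inputs are small enough that in[i] * 65536 fits in an
   int32_t, and truncation toward zero is acceptable. */
void pack_fixed_0_16(const float in[4], uint16_t out[4])
{
    assert(in != 0 && out != 0);
    for (int i = 0; i < 4; ++i) {
        /* corresponds to mulps by 65536 followed by cvtps2pi */
        int32_t scaled = (int32_t)(in[i] * 65536.0f);
        /* corresponds to the pshufw/punpckldq packing: low word only */
        out[i] = (uint16_t)((uint32_t)scaled & 0xFFFFu);
    }
}
```

For example, 0.5 becomes 32768, while 1.0 scales to 65536 and wraps to 0 because only the low word is kept.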
Thanks.













