C# Managed code too fast?


10 replies to this topic

#1 Qlone

    New Member

  • Members
  • Pip
  • 7 posts

Posted 27 June 2006 - 09:41 AM

Hello all,

I'm working on a little side project for my big project (an RTS in managed code, just for fun, nothing serious) and have been benchmarking several early approaches to the math problem.

Managed DirectX contains classes for matrices, quaternions, vectors, etc., and uses D3DX internally to get things done. Since I want my project to run on several target platforms while still getting the best possible performance, I've been benchmarking both plain .NET math code and Managed DirectX's own implementations. Here I get some odd benchmark results...

The easiest way, using methods on an MDX matrix object:

public bool Execute()
{
	o = Matrix.Multiply(m, n);
	return true;
}


................!
Took: 0h, 0m, 1.656s. for 8000000 iterations.
Iters per ms: 4830,18867924528 - Ms per iter: 0,00020703125

The slightly less easy way, using MDX unsafe native methods:

public bool Execute()
{
	unsafe
	{
		fixed (Matrix* pm = &m) fixed (Matrix* pn = &n) fixed (Matrix* po = &o)
		{
			UnsafeNativeMethods.Matrix.Multiply(po, pm, pn);
		}
	}
	return true;
}


................!
Took: 0h, 0m, 0.750s. for 8000000 iterations.
Iters per ms: 10666,6666666667 - Ms per iter: 9,375E-05

And the final way, using some very simple .NET matrix code:

public bool Execute()
{
	Multiply(ref o, ref m, ref n);
	return true;
}

private void Multiply(ref DNMatrix o, ref DNMatrix m, ref DNMatrix n)
{
	o.m11 = m.m11 * n.m11 + m.m12 * n.m21 + m.m13 * n.m31 + m.m14 * n.m41;
	o.m12 = m.m11 * n.m12 + m.m12 * n.m22 + m.m13 * n.m32 + m.m14 * n.m42;
	o.m13 = m.m11 * n.m13 + m.m12 * n.m23 + m.m13 * n.m33 + m.m14 * n.m43;
	o.m14 = m.m11 * n.m14 + m.m12 * n.m24 + m.m13 * n.m34 + m.m14 * n.m44;
	o.m21 = m.m21 * n.m11 + m.m22 * n.m21 + m.m23 * n.m31 + m.m24 * n.m41;
	o.m22 = m.m21 * n.m12 + m.m22 * n.m22 + m.m23 * n.m32 + m.m24 * n.m42;
	o.m23 = m.m21 * n.m13 + m.m22 * n.m23 + m.m23 * n.m33 + m.m24 * n.m43;
	o.m24 = m.m21 * n.m14 + m.m22 * n.m24 + m.m23 * n.m34 + m.m24 * n.m44;
	o.m31 = m.m31 * n.m11 + m.m32 * n.m21 + m.m33 * n.m31 + m.m34 * n.m41;
	o.m32 = m.m31 * n.m12 + m.m32 * n.m22 + m.m33 * n.m32 + m.m34 * n.m42;
	o.m33 = m.m31 * n.m13 + m.m32 * n.m23 + m.m33 * n.m33 + m.m34 * n.m43;
	o.m34 = m.m31 * n.m14 + m.m32 * n.m24 + m.m33 * n.m34 + m.m34 * n.m44;
	o.m41 = m.m41 * n.m11 + m.m42 * n.m21 + m.m43 * n.m31 + m.m44 * n.m41;
	o.m42 = m.m41 * n.m12 + m.m42 * n.m22 + m.m43 * n.m32 + m.m44 * n.m42;
	o.m43 = m.m41 * n.m13 + m.m42 * n.m23 + m.m43 * n.m33 + m.m44 * n.m43;
	o.m44 = m.m41 * n.m14 + m.m42 * n.m24 + m.m43 * n.m34 + m.m44 * n.m44;
}


................!
Took: 0h, 0m, 0.750s. for 8000000 iterations.
Iters per ms: 10666,6666666667 - Ms per iter: 9,375E-05
(no, this is not a copy/paste error)

My question, seeing these results, is the following: am I doing something wrong with the MDX calls? A simple implementation in .NET code seems to be just as fast as the MDX implementation, whereas I expected the .NET code to be quite a bit slower than the native implementations...

#2 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 27 June 2006 - 10:51 AM

lousy cross-forum-poster ;)
C++ addict
-
Currently working on: the 3D engine for Tomb Raider.

#3 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • Location: Ottawa, Ontario, Canada

Posted 27 June 2006 - 11:13 AM

The 'unsafe' code is just a C++ routine that looks exactly like your C# implementation. The only difference is that the former is already compiled and the latter is compiled at run-time (JIT). But they should produce exactly the same code. It doesn't use arrays and such so there's no overhead from safety checks either.

In other words, C# can perform on par with C++ in many cases. It does have its limitations, but if you're aware of them you can get really good performance. It doesn't have inline assembly or intrinsics support though, so you can't access the blazing fast MMX/SSE instructions. But that's not a big problem for the average project, and you can still write an external C++ function...

#4 Qlone

    New Member

  • Members
  • Pip
  • 7 posts

Posted 27 June 2006 - 11:22 AM

.oisyn said:

lousy cross-forum-poster ;)
Hey, two know more than one, right :p:

#5 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 27 June 2006 - 11:43 AM

Nick said:

The 'unsafe' code is just a C++ routine that looks exactly like your C# implementation.
In essence, yes, but I believe the d3dx matrix mul uses SSE code. And somehow I don't think the jitter actually produces SSE code from that piece of C#.

#6 roel

    Senior Member

  • Members
  • PipPipPipPip
  • 698 posts

Posted 27 June 2006 - 11:47 AM

What is the resolution of your timer? And maybe you can compare the resulting (both JIT generated and C++ compiler generated) asm code to be sure that they are identical. VS2005 allows you to switch to asm view when you are debugging C# code, if I'm correct.

#7 Qlone

    New Member

  • Members
  • Pip
  • 7 posts

Posted 27 June 2006 - 10:36 PM

roel said:

What is the resolution of your timer? And maybe you can compare the resulting (both JIT generated and C++ compiler generated) asm code to be sure that they are identical. VS2005 allows you to switch to asm view when you are debugging C# code, if I'm correct.
The resolution of my timer is the standard Windows (16ms?) resolution. I know it's not very accurate for this kind of measurement, but increasing the number of iterations sort of solves that problem.

If I do that, the numbers don't really change. On some hardware (AMD) the .NET implementation is even slightly faster than the MDX calls. On Intel hardware, MDX seems to be a bit faster, but not much (about 1 to 2 percent).
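
For anyone who wants to sidestep the coarse timer altogether, here is a minimal sketch of a timing helper built on System.Diagnostics.Stopwatch (available since .NET 2.0), which uses the high-resolution performance counter when the hardware provides one. The names BenchSketch, Operation and NanosecondsPerCall are made up for illustration; this is not the benchmark app used for the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class BenchSketch
{
    // Hypothetical delegate type standing in for an Execute() method.
    public delegate bool Operation();

    // Calls op 'iterations' times and returns the average cost per call in nanoseconds.
    public static double NanosecondsPerCall(Operation op, int iterations)
    {
        if (op == null) throw new ArgumentNullException("op");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        // One warm-up call so JIT compilation is not part of the measurement.
        op();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            op();
        }
        sw.Stop();

        // Stopwatch.Frequency is the number of ticks per second.
        double seconds = (double)sw.ElapsedTicks / Stopwatch.Frequency;
        return seconds * 1e9 / iterations;
    }
}
```

Called as BenchSketch.NanosecondsPerCall(test.Execute, 8000000), where test is whatever object holds the Execute() method, this reports the per-call cost directly. For reference, 0.750 s over 8000000 iterations works out to 93.75 ns per call, which matches the "Ms per iter: 9,375E-05" line above. The delegate call adds some overhead of its own, so absolute numbers will include that.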

Comparing the generated asm is not something I have the time for at the moment. However, if people are interested in playing with this, I can post my little benchmark app's source...

Edit: OK, so I checked and this is the assembly view for the .NET implementation:

o.m11 = m.m11 * n.m11 + m.m12 * n.m21 + m.m13 * n.m31 + m.m14 * n.m41;
00000000  push        edi
00000001  push        esi
00000002  push        ebx
00000003  mov         ebx,ecx
00000005  mov         esi,edx
00000007  mov         edi,dword ptr [esp+10h]
0000000b  cmp         dword ptr ds:[035AD030h],0
00000012  je          00000019
00000014  call        7943FEDE
00000019  fld         dword ptr [esi]
0000001b  fmul        dword ptr [edi]
0000001d  fld         dword ptr [esi+4]
00000020  fmul        dword ptr [edi+10h]
00000023  faddp       st(1),st
00000025  fld         dword ptr [esi+8]
00000028  fmul        dword ptr [edi+20h]
0000002b  faddp       st(1),st
0000002d  fld         dword ptr [esi+0Ch]
00000030  fmul        dword ptr [edi+30h]
00000033  faddp       st(1),st
00000035  fstp        dword ptr [ebx]
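
The offsets in that listing (+4, +8, +0Ch on one operand and +10h, +20h, +30h on the other) are what you would expect if the matrix is sixteen consecutive 4-byte floats stored row by row. The sketch below is a hypothetical definition of DNMatrix along those lines, since the actual struct is not shown in this thread, plus a small helper (LayoutCheck, also made up) that reports field offsets via Marshal.OffsetOf. Strictly speaking that call reports the marshaled layout, which for a sequential struct of plain floats should coincide with the in-memory one.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical layout: 16 floats in row order, 64 bytes in total.
[StructLayout(LayoutKind.Sequential)]
public struct DNMatrix
{
    public float m11, m12, m13, m14;
    public float m21, m22, m23, m24;
    public float m31, m32, m33, m34;
    public float m41, m42, m43, m44;
}

public static class LayoutCheck
{
    // Returns the byte offset of the named field inside DNMatrix.
    public static int OffsetOf(string fieldName)
    {
        return Marshal.OffsetOf(typeof(DNMatrix), fieldName).ToInt32();
    }
}
```

With this layout, LayoutCheck.OffsetOf("m12") is 4 and LayoutCheck.OffsetOf("m21") is 16 (10h), so "fld [esi+4]" followed by "fmul [edi+10h]" is the m.m12 * n.m21 term, and the +8/+20h and +0Ch/+30h pairs are the m13 * n31 and m14 * n41 terms.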



#8 tbp

    Valued Member

  • Members
  • PipPipPip
  • 135 posts

Posted 28 June 2006 - 12:36 AM

"best possible performance" and d3dx in the same sentence?
http://math-atlas.sourceforge.net would make more sense to bench against.

#9 Nick

    Senior Member

  • Members
  • PipPipPipPip
  • 1227 posts
  • Location: Ottawa, Ontario, Canada

Posted 28 June 2006 - 08:55 AM

.oisyn said:

In essence, yes, but I believe the d3dx matrix mul uses SSE code. And somehow I don't think the jitter actually produces SSE code from that piece of C#.
I verified that it doesn't use SSE. So the code being produced for the C# version is practically equivalent to D3DX for C++. :mellow:

#10 .oisyn

    DevMaster Staff

  • Moderators
  • 1842 posts

Posted 29 June 2006 - 08:54 AM

Hmm, that just plain sucks. I reckoned they'd have optimized the D3DX library. Are you sure you're not looking at the debug version?

On the other hand, you should be able to target every reasonable x86 platform with that library, including the pre-XP Athlons that don't have SSE support yet. So maybe it's no surprise after all.

Then again, what serious game developer is using the D3DX math functions anyway? ;)

#11 Qlone

    New Member

  • Members
  • Pip
  • 7 posts

Posted 29 June 2006 - 09:20 AM

Its performance slightly disappointed me too. It's very tempting to try to do all the math in .NET code now, knowing (assuming) it will be as fast or nearly as fast as D3DX's functions, which should be fast enough for what I'm trying to do.

The only real advantage of using D3DX now is that it's a pre-built and (hopefully) debugged math library, which makes for a nice quick start on that front. Since platform independence is a target for my project, eventually I'll need a platform-independent math solution anyway...




