

Strange behaviour (slowdown) in pixel operations using MSVC

c++ fixed-point floating-point algorithm

7 replies to this topic

#1 hellhound_01

    New Member

  • 58 posts

Posted 19 January 2012 - 03:07 PM

Hi,

I've implemented some setPixel color operations for native-endian and non-native-endian pixel formats. With GCC
everything runs fine. With MSVC, however, some operations for SHORT and HALF_FLOAT values execute slowly,
and I can't explain why.

Here is a fragment of my setPixelColor method:
brPixelFormatInfo info = gfxUtils.getPixelFormatInfo(m_format);
unsigned int nativeColor = 0;
// If the pixel format is native-endian, compute the native color once,
// before entering the loop that writes the color values.
if (info.isNativeEndian()) {
    nativeColor = this->getNativePixelColor(color);
}
unsigned int bytesPerPixel = this->getBytesPerPixel();
for (unsigned int y = rect.getY(); y < rect.getY() + rect.getHeight(); y++)
{
    // byte offset of the first pixel of this row (row stride * y + x offset)
    unsigned int byteIndex = (this->getBytesPerRow() * y) + rect.getX() * bytesPerPixel;
    for (unsigned int x = rect.getX(); x < rect.getX() + rect.getWidth(); x++)
    {
        if (info.isNativeEndian())
        {
            [...]
        }
        else
        {
            switch (m_format)
            {
                // 32-bit float value formats
                case PF_FLOAT32_RGB:
                {
                    m_data[byteIndex]     = (unsigned char)color.getRed();
                    m_data[byteIndex + 1] = (unsigned char)color.getGreen();
                    m_data[byteIndex + 2] = (unsigned char)color.getBlue();
                    break;
                }
                case PF_SHORT_RGB:
                {
                    unsigned int red = 0, green = 0, blue = 0;
                    brPixelFormatInfo info = gfxUtils.getPixelFormatInfo(PF_SHORT_RGB);
                    brPixelFormatInfo::RGBA_BITS bits = info.getBitValues();
                    brColor::RGBA rgba = color.getRGBA();
                    red   = gfxUtils.convertColorToFixedPoint(rgba.m_red, bits.m_red);
                    green = gfxUtils.convertColorToFixedPoint(rgba.m_green, bits.m_green);
                    blue  = gfxUtils.convertColorToFixedPoint(rgba.m_blue, bits.m_blue);

                    m_data[byteIndex]     = (unsigned char)red;
                    m_data[byteIndex + 1] = (unsigned char)green;
                    m_data[byteIndex + 2] = (unsigned char)blue;
                    break;
                }
                // half float precision values
                case PF_FLOAT16_R:
                {
                    brColor::RGBA rgba = color.getRGBA();
                    m_data[byteIndex] = (unsigned char)gfxUtils.convertColorToHalfFloat(rgba.m_red);
                    break;
                }
                default:
                    throw brCore::brIllegalStateException(
                        "[brImage]::setPixelColor: Invalid pixel format!");
            }
        }
        byteIndex += bytesPerPixel;
    }
}

And here is my convert-to-fixed-point method:
unsigned int brGraphicsUtils::convertColorToFixedPoint(float color, unsigned int bits) const
{
    unsigned int fixed = 0;
    if (color <= 0.0f) {
        fixed = 0;
    }
    else if (color >= 1.0f) {
        fixed = (1U << bits) - 1U;
    }
    else {
        fixed = (unsigned int)(color * (1U << bits));
    }
    return fixed;
}

Nothing special... What's strange is that if I step through my code line by line in the debugger, everything is fast
enough, but if I do a single step over the getPixelColor method, it takes several seconds before the call returns
and the unit test continues...

I've checked the sources again and again, and they look correct. To investigate, I've added some timestamp calls
to the SHORT_RGB handling:


- in: 2012-Jan-19 15:44:52.664127
- start convert: 2012-Jan-19 15:44:52.669127
- end convert: 2012-Jan-19 15:44:52.690127
- start writing: 2012-Jan-19 15:44:52.695127
- end writing: 2012-Jan-19 15:44:52.700127
- out: 2012-Jan-19 15:44:52.705127
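Reading the log as intervals rather than absolute times makes it easier to see where the time goes. A small sketch that does the subtraction, using only the microsecond part of each timestamp (all six entries fall within 15:44:52); LogEntry, kShortRgbLog and stepDurations are illustrative names, not from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One log line: its label and the microseconds past 15:44:52
// (the only part of the timestamps above that changes).
struct LogEntry {
    const char* label;
    long micros;
};

// The six timestamps from the SHORT_RGB log above.
const std::vector<LogEntry> kShortRgbLog = {
    {"in", 664127},          {"start convert", 669127},
    {"end convert", 690127}, {"start writing", 695127},
    {"end writing", 700127}, {"out", 705127},
};

// Elapsed time between consecutive entries, in microseconds.
std::vector<long> stepDurations(const std::vector<LogEntry>& log)
{
    std::vector<long> durations;
    for (std::size_t i = 1; i < log.size(); ++i)
        durations.push_back(log[i].micros - log[i - 1].micros);
    return durations;
}
```

With these values the intervals come out as 5, 21, 5, 5 and 5 ms (in, start convert, end convert, start writing, end writing, out), about 41 ms for the whole call.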

It looks to me as if my operations are OK, but writing the data takes a few milliseconds. But why only
with MSVC?

Does anyone have an idea what could be wrong and how I could speed up these elementary operations?
Thanks for any hints.

Best regards,
Hellhound
"There is only one god and his name is death. And there is only one thing we have to say to Death: Not today!" -- Syrio Forel (from Game of Thrones)

#2 Kenneth Gorking

    Senior Member

  • 939 posts

Posted 19 January 2012 - 03:54 PM

I would start by moving the tests out of your inner loop. Testing isNativeEndian() and m_format for every single pixel is just a waste of time.

GCC might be optimizing this for you, which is why you are seeing the speed difference. You would have to examine the assembly output to be sure though...
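A minimal sketch of moving the tests out of the inner loop, using hypothetical simplified types (PixelFormat, Rect, toFixed, encodePixel and fillRect are not the poster's classes): the format is examined once to encode the color into a small byte pattern, and the per-pixel loop only copies those bytes. The 16-bit format here is assumed to store one little-endian 16-bit value per channel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-ins for the poster's image types.
enum PixelFormat { PF_RGB8, PF_RGB16_LE };
struct Rect { unsigned x, y, width, height; };

// Clamp to [0, 1] and scale to an unsigned fixed-point value
// (same mapping as convertColorToFixedPoint above).
unsigned toFixed(float c, unsigned bits)
{
    if (c <= 0.0f) return 0u;
    if (c >= 1.0f) return (1u << bits) - 1u;
    return static_cast<unsigned>(c * static_cast<float>(1u << bits));
}

// The only place the format is examined: once per call, not once per pixel.
std::vector<unsigned char> encodePixel(PixelFormat fmt, float r, float g, float b)
{
    const float rgb[3] = {r, g, b};
    std::vector<unsigned char> bytes;
    switch (fmt) {
    case PF_RGB8:
        for (float c : rgb)
            bytes.push_back(static_cast<unsigned char>(toFixed(c, 8)));
        break;
    case PF_RGB16_LE:
        for (float c : rgb) {
            const unsigned v = toFixed(c, 16);
            bytes.push_back(static_cast<unsigned char>(v & 0xFFu)); // low byte first
            bytes.push_back(static_cast<unsigned char>(v >> 8));    // then high byte
        }
        break;
    }
    return bytes;
}

// Copies the pre-encoded pixel into every pixel of rect; the loops contain
// no branches on format or endianness.
void fillRect(std::vector<unsigned char>& data, std::size_t bytesPerRow,
              const std::vector<unsigned char>& pixel, const Rect& rect)
{
    const std::size_t bytesPerPixel = pixel.size();
    for (unsigned y = rect.y; y < rect.y + rect.height; ++y) {
        std::size_t byteIndex = bytesPerRow * y + std::size_t(rect.x) * bytesPerPixel;
        for (unsigned x = 0; x < rect.width; ++x) {
            std::memcpy(&data[byteIndex], pixel.data(), bytesPerPixel);
            byteIndex += bytesPerPixel;
        }
    }
}
```

The same idea applies to the native-endian branch: the check and the switch run once per call, so whatever the compiler does or does not hoist on its own no longer matters for the inner loop.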
"Stupid bug! You go squish now!!" - Homer Simpson

#3 Kenneth Gorking

    Senior Member

  • 939 posts

Posted 19 January 2012 - 04:00 PM

Another thing is the convertColorToFixedPoint function. It seems that values outside the 0..1 range are invalid? Maybe it would be better to catch them with an assert, and lose the two conditionals-per-pixel...
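A sketch of the assert-based variant, assuming callers guarantee a color in [0, 1] and channel depths of at most 16 bits; the function name is hypothetical. Note that it swaps in a slightly different mapping: without the `color >= 1.0f` branch, 1.0f must still land in range, so this version scales by 2^bits - 1 and rounds to nearest instead of scaling by 2^bits and truncating. Results can therefore differ by one from the original for some inputs.

```cpp
#include <cassert>

// Hypothetical assert-based variant of convertColorToFixedPoint.
// Out-of-range input is treated as a caller bug: caught by assert in
// debug builds, with no clamping branch at all in release builds.
inline unsigned int convertColorToFixedPointUnchecked(float color, unsigned int bits)
{
    assert(color >= 0.0f && color <= 1.0f && "color must be in [0, 1]");
    assert(bits >= 1 && bits <= 16 && "sketch only covers channels up to 16 bits");
    const unsigned int maxValue = (1U << bits) - 1U;
    // Scaling by maxValue (instead of 1 << bits) keeps 1.0f in range without
    // a branch; adding 0.5f rounds to nearest instead of truncating.
    return static_cast<unsigned int>(color * static_cast<float>(maxValue) + 0.5f);
}
```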

#4 hellhound_01

    New Member

  • 58 posts

Posted 19 January 2012 - 07:39 PM

It's not just the test: if I run my demo on MSVC, startup takes many seconds while those formats are initialized.
At first it looks like a deadlock or hang, but after more than 10 seconds it runs. With GCC, startup takes less than 2
seconds...

I've implemented those tests to figure out what's wrong and why such simple operations take so long with
MSVC...

Out-of-range color values are clamped to the min/max color values. Asserts may be an option, but those two conditionals
are not the reason for the slowdown.

#5 Kenneth Gorking

    Senior Member

  • 939 posts

Posted 19 January 2012 - 07:47 PM

Maybe you should try using CodeAnalyst or VTune (depending on your CPU); they should be able to pinpoint exactly where the time is spent. I am using CodeAnalyst, and it is a great tool for finding hotspots and slowdowns.

#6 hellhound_01

    New Member

  • 58 posts

Posted 20 January 2012 - 06:44 AM

I've profiled the demo with VTune and found that my debug logging takes too much CPU
time. It looks like GCC optimizes string operations better than MSVC does...

http://j18.img-up.ne...rofile9r2h1.jpg

Thanks for the tip about the profiler. VTune looks good, but it is too expensive ($800 for a single-user
license). Do you know a good, free, non-proprietary alternative to VTune? CodeAnalyst looks good,
but if I understood it correctly, it provides less information on Intel CPUs...

#7 Stainless

    Member

  • 610 posts
  • Location: Southampton

Posted 20 January 2012 - 10:36 AM

OutputDebugString on Windows can be very, very, very slow.

If you are running a Visual Studio plugin and the debug output is captured by VS in the Output window, a single print can take as much as 175 milliseconds.

Logging to a file is a lot faster; I know that's counterintuitive. :)
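A minimal sketch of what logging to a file could look like, together with a compile-time switch so that per-pixel trace messages (and the string formatting that builds them) disappear entirely unless explicitly enabled. FileLogger, PIXEL_TRACE and ENABLE_PIXEL_TRACE are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical minimal file logger. std::ofstream buffers its output, and
// '\n' is used instead of std::endl so each message does not force a flush.
class FileLogger {
public:
    explicit FileLogger(const std::string& path)
        : out_(path, std::ios::out | std::ios::trunc) {}

    bool ok() const { return static_cast<bool>(out_); }
    void log(const std::string& message) { out_ << message << '\n'; }
    void flush() { out_.flush(); }

private:
    std::ofstream out_;
};

// Compile-time switch: unless ENABLE_PIXEL_TRACE is defined, PIXEL_TRACE
// expands to a no-op, so its arguments (including any string building)
// are never evaluated.
#ifdef ENABLE_PIXEL_TRACE
#define PIXEL_TRACE(logger, message) (logger).log(message)
#else
#define PIXEL_TRACE(logger, message) ((void)0)
#endif
```

Usage would be along the lines of `PIXEL_TRACE(log, "converted " + std::to_string(red));` inside the conversion code, which costs nothing in builds where the trace is switched off.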

#8 Kenneth Gorking

    Senior Member

  • 939 posts

Posted 21 January 2012 - 11:10 AM

hellhound_01, on 20 January 2012 - 06:44 AM, said:

I've tested the demo source with VTunes and figured out that my debug logs takes too much CPU
time. It looks like the GCC optimizes String operations instead of MSVC ...
Well, do you really need to output all this info? Dumping every single conversion into a log seems a bit overkill... If you absolutely must, then maybe a rewrite of your toString() function would be needed. Since you are dealing with 16-bit values here (I'm assuming from the name), you could precompute all the string representations into a lookup table, and use the half-value to retrieve the string representation.
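A sketch of the lookup-table idea, assuming the 16-bit values are IEEE 754 binary16 (half-float) bit patterns and that a `%g`-style string is acceptable. The original toString() isn't shown, so halfToFloat, buildHalfStringTable and halfToString are hypothetical. The table is built once (65,536 short strings); after that, each conversion is just an index into it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Decode an IEEE 754 binary16 ("half") bit pattern to float.
float halfToFloat(std::uint16_t h)
{
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;
    const int mantissa = h & 0x3FF;
    float value;
    if (exponent == 0)        // zero or subnormal: mantissa * 2^-24
        value = std::ldexp(static_cast<float>(mantissa), -24);
    else if (exponent == 31)  // infinity or NaN
        value = mantissa ? NAN : INFINITY;
    else                      // normal: (1024 + mantissa) * 2^(exponent - 25)
        value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    return sign ? -value : value;
}

// Format every possible half value once.
std::vector<std::string> buildHalfStringTable()
{
    std::vector<std::string> table(65536);
    char buf[32];
    for (std::uint32_t bits = 0; bits < 65536; ++bits) {
        const float f = halfToFloat(static_cast<std::uint16_t>(bits));
        std::snprintf(buf, sizeof buf, "%g", static_cast<double>(f));
        table[bits] = buf;
    }
    return table;
}

// Lookup instead of formatting; the table is built on first use.
const std::string& halfToString(std::uint16_t h)
{
    static const std::vector<std::string> table = buildHalfStringTable();
    return table[h];
}
```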

hellhound_01, on 20 January 2012 - 06:44 AM, said:

Thanks for the hint with the analyzer. VTunes looks good, but is too expensive (800$ for single user
license). Do you know a good free not properitary alternative for VTunes? CodeAnalyzer looks good,
but if I understoot it correctly with Intel i got less informations ...
Yes, VTune is for Intel CPUs, and CodeAnalyst is for AMD CPUs. The reason they work as well as they do is that they hook directly into the CPU driver, where they can access performance counters and whatnot. This is also why you can't use them on other vendors' CPUs for anything but rudimentary timing. I don't know of any alternative that works on both vendors' CPUs, sorry.




