0
101 Apr 26, 2010 at 23:00

Hello everyone :)

Recently, I’ve been having quite some performance problem with using a large (\~\~500 MB) std::vector<unsigned char>, which is repeatedly getting resized, cleared, and dropped in a tight loop.

Weirdly, the problem comes from the STL that accompanies Visual Studio 2008.

In particular, using std::vector::resize() is slow, since it uses a “normal” inner loop for default-constructing (zero-initializing) the new vector entries. On my machine, resizing() the vector takes around 8 seconds, while allocating the memory with std::vector::reserve() or copying an entire new dataset into the vector with one of the optimized RTL functions takes an order of magnitude less time.

After some digging in the STL, I noticed that there is an optimized _fill_n method that uses memset for unsigned-char type elements. Yet this method does not get called by the compiler on template resolution, but instead, the vector is resized with the loop-based, default version of _fill_n, i.e. this

// TEMPLATE FUNCTION fill_n
template<class _OutIt, class _Diff, class _Ty> inline
void __CLRCALL_OR_CDECL _Fill_n(_OutIt _First, _Diff _Count, const _Ty& _Val, _Range_checked_iterator_tag)
{
// copy _Val _Count times through [_First, ...)
for (; 0 < _Count; --_Count, ++_First)
*_First = _Val;
}

with:

_OutIt == unsigned char*
_Diff == unsigned int
_Ty = unsigned char


is preferred over the non-template

inline void __CLRCALL_OR_CDECL _Fill_n
(unsigned char *_First,
size_t _Count, int _Val, _Range_checked_iterator_tag)
{
// copy unsigned char _Val _Count times through [_First, ...)
::memset(_First, _Val, _Count);
}


a few lines further down.

The reason, I think, is that the compiler decides to prefer a template over implicitly casting any parameters (also makes sense when considering classes with custom casts), but then again, the optimized methods won’t probably ever be used (target pointer type != value type…)

For other methods, such as vector copying, the correct, optimized version using memcpy is used for certain scalar types.

Is this explanation correct ?
Is this a compiler bug or a problem in the STL ?
Any ideas on how to fix the misbehaviour ?

(Apart from designing a better algorithm, or using a custom, malloc-ed container…)

Cheers,
- Wernaeh

#### 5 Replies

0
102 Apr 27, 2010 at 07:03

Yeah, the templated version is better match due to implicit cast so looks like a bug in the STL implementation (should be “unsigned char _Val”). You could just implement the proper version of the function yourself which should be called by std::vector, afaik.

0
101 Apr 27, 2010 at 09:23

Okay, works like a charm :)
My vector resize() time is down to less than half a second now.

I just hope this modification does not have any side effects… Well, I guess I’ll notice soon enough :o)

Thank you,
Cheers,
- Wernaeh

0
102 Apr 27, 2010 at 09:36

You may have some ODR issues if the std::vector is instantiated for unsigned char somewhere where the function isn’t defined, but meh d:

0
101 Apr 27, 2010 at 09:47

Seems vector does not do you good. See if you can use different container than std::vector.

You might benefit from making a a new class simulating behavior of std::vector, which would store data in chunks and resize only chunk that needs it. Also if you randomly add items in the middle of vector, those chunks may as well grow and interchange data trying to rebalance their sizes.

0
101 Apr 27, 2010 at 13:37

You may have some ODR issues if the std::vector is instantiated for unsigned char somewhere where the function isn’t defined, but meh d:

That’s why I placed the replacement function directly into <xutility> ;)

Seems vector does not do you good. See if you can use different container than std::vector. You might benefit from making a a new class simulating behavior of std::vector, which would store data in chunks and resize only chunk that needs it. Also if you randomly add items in the middle of vector, those chunks may as well grow and interchange data trying to rebalance their sizes.

Basically, I’m using the std::vector less because its usefulness, but more because its standard and everyone knows how it works. In particular, I could without much effort also simply change to a unsigned char*, use malloc, and add a ScopeGuard for exception safety. std::vector just reads much nicer, and if correctly implemented, isn’t all that much slower than a specialized container.

Thank you for the input, though :)

Cheers,
- Wernaeh