Visual C++ 2008 STL vector resize

B7568a7d781a2ebebe3fa176215ae667
0
Wernaeh 101 Apr 26, 2010 at 23:00

Hello everyone :)

Recently, I’ve been having quite some performance problem with using a large (\~\~500 MB) std::vector<unsigned char>, which is repeatedly getting resized, cleared, and dropped in a tight loop.

Weirdly, the problem comes from the STL that accompanies Visual Studio 2008.

In particular, using std::vector::resize() is slow, since it uses a “normal” inner loop for default-constructing (zero-initializing) the new vector entries. On my machine, resizing() the vector takes around 8 seconds, while allocating the memory with std::vector::reserve() or copying an entire new dataset into the vector with one of the optimized RTL functions takes an order of magnitude less time.

After some digging in the STL, I noticed that there is an optimized _fill_n method that uses memset for unsigned-char type elements. Yet this method does not get called by the compiler on template resolution, but instead, the vector is resized with the loop-based, default version of _fill_n, i.e. this

// TEMPLATE FUNCTION fill_n
template<class _OutIt, class _Diff, class _Ty> inline
    void __CLRCALL_OR_CDECL _Fill_n(_OutIt _First, _Diff _Count, const _Ty& _Val, _Range_checked_iterator_tag)
{
    // copy _Val _Count times through [_First, ...)
    for (; 0 < _Count; --_Count, ++_First)
        *_First = _Val;
}

with:

_OutIt == unsigned char*
_Diff == unsigned int
_Ty = unsigned char

is preferred over the non-template

inline void __CLRCALL_OR_CDECL _Fill_n
    (unsigned char *_First, 
     size_t _Count, int _Val, _Range_checked_iterator_tag)
{
    // copy unsigned char _Val _Count times through [_First, ...)
    ::memset(_First, _Val, _Count);
}

a few lines further down.

The reason, I think, is that the compiler decides to prefer a template over implicitly casting any parameters (also makes sense when considering classes with custom casts), but then again, the optimized methods won’t probably ever be used (target pointer type != value type…)

For other methods, such as vector copying, the correct, optimized version using memcpy is used for certain scalar types.

Is this explanation correct ?
Is this a compiler bug or a problem in the STL ?
Any ideas on how to fix the misbehaviour ?

(Apart from designing a better algorithm, or using a custom, malloc-ed container…)

Thank you for your time,
Cheers,
- Wernaeh

5 Replies

Please log in or register to post a reply.

Fe8a5d0ee91f9db7f5b82b8fd4a4e1e6
0
JarkkoL 102 Apr 27, 2010 at 07:03

Yeah, the templated version is better match due to implicit cast so looks like a bug in the STL implementation (should be “unsigned char _Val”). You could just implement the proper version of the function yourself which should be called by std::vector, afaik.

B7568a7d781a2ebebe3fa176215ae667
0
Wernaeh 101 Apr 27, 2010 at 09:23

Okay, works like a charm :)
My vector resize() time is down to less than half a second now.

I just hope this modification does not have any side effects… Well, I guess I’ll notice soon enough :o)

Thank you,
Cheers,
- Wernaeh

Fe8a5d0ee91f9db7f5b82b8fd4a4e1e6
0
JarkkoL 102 Apr 27, 2010 at 09:36

You may have some ODR issues if the std::vector is instantiated for unsigned char somewhere where the function isn’t defined, but meh d:

Ed823df35ecbbb211ffda5974a156439
0
kvakvs 101 Apr 27, 2010 at 09:47

Seems vector does not do you good. See if you can use different container than std::vector.

You might benefit from making a a new class simulating behavior of std::vector, which would store data in chunks and resize only chunk that needs it. Also if you randomly add items in the middle of vector, those chunks may as well grow and interchange data trying to rebalance their sizes.

B7568a7d781a2ebebe3fa176215ae667
0
Wernaeh 101 Apr 27, 2010 at 13:37

You may have some ODR issues if the std::vector is instantiated for unsigned char somewhere where the function isn’t defined, but meh d:

That’s why I placed the replacement function directly into <xutility> ;)

Seems vector does not do you good. See if you can use different container than std::vector. You might benefit from making a a new class simulating behavior of std::vector, which would store data in chunks and resize only chunk that needs it. Also if you randomly add items in the middle of vector, those chunks may as well grow and interchange data trying to rebalance their sizes.

Basically, I’m using the std::vector less because its usefulness, but more because its standard and everyone knows how it works. In particular, I could without much effort also simply change to a unsigned char*, use malloc, and add a ScopeGuard for exception safety. std::vector just reads much nicer, and if correctly implemented, isn’t all that much slower than a specialized container.

Thank you for the input, though :)

Cheers,
- Wernaeh