STL "hidden instancing"?

D4b6814dd631916fc55440c1c4e95cec
0
osirisp 101 Aug 20, 2005 at 20:52

Hi all,

1st post here…

I’m having some trouble with std::min_element and my code. Basically, what I’m trying to do is a small partial sort of the first N elements of a list (note that partial_sort only works with random access iterators). The thing is, my sorting criteria is a binary predicate that depends on a 3rd varible. Hence, I’m using a functor with an internal reference to my other variable.

However, when I run it it gives an invalid reference/pointer and bails out. I’ve traced the error to the min_element call. Until then, everything is fine but once it gets inside it my iterators gets invalidated. I know I could do it another way but I was quickly prototyping my program and I want to understand what’s happening here. Any ideas?

Aaaanyway, here’s the code:

// distance to a reference disk comparison
struct distPred : binary_function<const Disk*,const Disk*,bool>
{
    Disk* dr;
    bool operator()(const Disk* d1, const Disk* d2) const
    {
        return ( d1->distanceTo(*dr) < d2->distanceTo(*dr) );
    }

    distPred(Disk* rPtr) { dr = rPtr; }
};

and the sorting function…

unsigned short
small_sort(DiskList::iterator from,
      DiskList::iterator to, 
      const distPred& cmp, 
      unsigned short n) 
{
    DiskList::iterator mit, it;
    
    unsigned short i = 0;
    for (; from != to, i < n; ++i, ++from ) {
        mit = min_element(from, to, cmp);
        ptr_swap(*mit,*from);
    }
    
    return i;
}

Feel free to bash my code :blush:

cheers!

Ozz

10 Replies

Please log in or register to post a reply.

22b3033832c5c699c856814b0cf80cb1
0
bladder 101 Aug 21, 2005 at 03:55

what you want in your for loop is this:

“from != to && i < n”

When you use the comma operator, the first value is evaluated and discarded, then the second value is evaluated and returned. So “from != to” is meanigless in the for loop. The way you have it, you call your small_sort with iterators values form beginning to end, and if ‘n’ is greater then the size of the list, you’ll crash. Other then that, make sure the iterators you’re passing in are valid.

D4b6814dd631916fc55440c1c4e95cec
0
osirisp 101 Aug 21, 2005 at 14:44

fixed it! I guess my 4am logic is not very good…

Thanks bladder!

25bbd22b0b17f557748f601922880554
0
bramz 101 Aug 23, 2005 at 08:00

Now your bug is fixed, may I be curious on the reason why you can’t have random access iterators? I’m guessing you’re currently using a std::list, but it’s often a lot faster to use a std::vector especially if the elements are cheap to copy (I guess pointers in your case?). I hope you don’t mind :)

D4b6814dd631916fc55440c1c4e95cec
0
osirisp 101 Aug 23, 2005 at 09:51

@bramz

Now your bug is fixed, may I be curious on the reason why you can’t have random access iterators? I’m guessing you’re currently using a std::list, but it’s often a lot faster to use a std::vector especially if the elements are cheap to copy (I guess pointers in your case?). I hope you don’t mind :) [snapback]20161[/snapback]

You are right. I am using a std::list. Some of the design reasons behind it are:

  • The number of elements in it changes constantly and I need to insert/delete elements in the middle from time to time. Using a vector would either waste a lot of memory if I overestimate the number of elements or just plain slow if it has to expand to accomodate more if I underestimate it.
  • I need to splice it at a random pivot. I know I could still use a vector for it by sorting it in decreasing order and using pop_back repeatedly or something but it seems impractical to me.
  • Ok, I forgot what the 3rd one was :blush:

I must admit, I’m kinda new to the “ways of the STL” so any suggestions or comments would be greatly appreciated.

25bbd22b0b17f557748f601922880554
0
bramz 101 Aug 23, 2005 at 12:37

@osirisp

  • The number of elements in it changes constantly and I need to insert/delete elements in the middle from time to time. Using a vector would either waste a lot of memory if I overestimate the number of elements or just plain slow if it has to expand to accomodate more if I underestimate it.

You might be supprised of the memory waste you have with a std::list. It’s double linked so each node has two pointers. Add your data to it (another pointer) and you have 12 bytes. Round that up to fit nice in memory and it is likely that you need 16 bytes of storage for only 4 bytes of data. So that means you may overestimate your vector 4 times before using the same amount of data :) Plus, memory of list may get fragmented.

You’d also be suprised about the speed of the otherwise O(N) push_front and insert. When data is cheap to copy (like your pointer), it is often much faster to push_front on a vector than a list. Of course, as long as the vector is small enough (small being a couple of hundred elements? I’m not sure, I have some article of Koenig about this matter, I’ll look it up tonight.

About the cost of ‘expanding’ the vector when necessary. Koenig also tested that, and he found out that for data which is cheap to copy (integers in his case), the cost of expanding was negligible.

  • I need to splice it at a random pivot. I know I could still use a vector for it by sorting it in decreasing order and using pop_back repeatedly or something but it seems impractical to me.
  • Ok, I forgot what the 3rd one was :blush:

I must admit, I’m kinda new to the “ways of the STL” so any suggestions or comments would be greatly appreciated.

[snapback]20168[/snapback]

I’m not sure what you mean by splice it at a random point. Splicing is something that’s unique for lists for moving elements from one list to another while iterators remain valid. I’m not sure if that’s what you mean or need.

Bramz

D4b6814dd631916fc55440c1c4e95cec
0
osirisp 101 Aug 23, 2005 at 14:00

You might be supprised of the memory waste you have with a std::list. It’s double linked so each node has two pointers. Add your data to it (another pointer) and you have 12 bytes. Round that up to fit nice in memory and it is likely that you need 16 bytes of storage for only 4 bytes of data. So that means you may overestimate your vector 4 times before using the same amount of data Plus, memory of list may get fragmented.

You have a good point there. I even looked into an slist but as it turns out, I need the bidirectional iterator for another part of the code. I’ll see if I can fit a vector into it without disrupting the code too much and see what happens.

You’d also be suprised about the speed of the otherwise O(N) push_front and insert. When data is cheap to copy (like your pointer), it is often much faster to push_front on a vector than a list. Of course, as long as the vector is small enough (small being a couple of hundred elements? I’m not sure, I have some article of Koenig about this matter, I’ll look it up tonight.

That might be a problem. My dataset is anywhere from a couple of thousand to 30k elements aprox. However, as I understand it, push_back is preferable to push_front on a vector as I doesn’t necessarily means moving the whole thing but I’ll have to look it up to make sure.

I’m not sure what you mean by splice it at a random point. Splicing is something that’s unique for lists for moving elements from one list to another while iterators remain valid. I’m not sure if that’s what you mean or need.

I need to separate in two parts. As far as I know, splicing doesn’t copy the range but moves the pointers instead. At least that’s what it should do. The big advange of lists is the insertion/deletion of chunks with a simple re-link operation.

I’ll think about it and see if I can do without those things as the vector option is starting to sound pretty good to me.

Thanks!

25bbd22b0b17f557748f601922880554
0
bramz 101 Aug 23, 2005 at 16:30

I wouldn’t switch over to std::vector too soon though, because a std::list might be better suitable for your case after all :) I was merely giving you some facts about performance of std::vector vs. std::list. I wasn’t trying to convince you to make the switch.

I looked up that article of Andrew Koenig and Barbara E. Moo I was talking about. It’s called: “Are Vectors Really Fastest?”. You can find in the August 2004 issue of C/C++ Users Journal.

This are the results (for cheap to copy elements):
- One append to a list took about 4.5 time units no matter what the number of elements in the list is. (it doesn’t mention the exact unit, but it doesn’t matter because relative speed is the issue here).
- push_back on a vector was also 4.5 time units for the first element and then quickly drops to 0.045 time units for each element added. It didn’t matter if he used reserve to allocate enough memory beforehand or not.
- push_front on a vector was also 4.5 time units for the first element, then dropped to 0.75 time units for 1000 elements. Then it started to increase again: 6.1 for 10k elements, 60 time units for 100k and 1360 for 1000k elements.

So, regarding that you have about 3k to 30k elements, we can say that std::vector and std::list would be roughly as fast? Not a very accurate statement I know :)

But! If you mostly need push_front and push_back and not much insert in the middle of the sequence, you might consider using a std::deque! It is optimized for both front and back insertion. It only takes 0.11 time units to do both push_front or push_back undependent on the number of elements. And … it has random access iterators. It is nearly as compact in memory as a std::vector too.

By now it should be clear that for cheap-to-copy elements, and for rather small sequences (say to 10k elements), it might be cheaper to copy the whole thing than to do the equivalent list operation. Though if that also counts for splicing, i’m not sure because I’m not sure if there’s allocation of some nodes involved in it. AFAIK it’s exactly this allocation that makes the list operators slower. So you should test that.

IMPORTANT: you should also notice that you’re sort algo is something like O(N*n) with N total number of elements in list. partial_sort is O(N*log(n)). Of course, that doesn’t say anything about the actual speed but it sure is something you have to check out.

So, all together: I don’t know what solution is the fastest, all I know is that it really could be either one. So it should be tested in your particular case. And the bottomline is as always: is the speed up worth the troubles?

Bramz

D4b6814dd631916fc55440c1c4e95cec
0
osirisp 101 Aug 23, 2005 at 16:49

Very interesting stuff. Thanks for the reference. I’ll have a look. As for now, I’m revising my algorithm and I think there’s a better way to do it… and it doesn’t involve the splicing and random insertion part so I’ll definitely have a look into the other container options. I think it’s worth a shot since I’m starting to get the “need for speed” as I try to add some features.
Anyhow, thanks for the advice. Stay tuned for more STL newbie questions!

Ozz

25bbd22b0b17f557748f601922880554
0
bramz 101 Aug 24, 2005 at 09:08

Heh … STL seems daunting at first, but once it starts growing on you, it’s when you discover how nice it is. I know a lot of people totally disagree with me. Though it isn’t perfect, I really like the concept of iterators. You need some container that’s not in the STL? Write your own and you still can use the algorithms in STL. You can even use the algorithms on a plain old C array (yes Nick, even on your plain old C strings ;) ) But it takes time to learn …

You can also write new algorithms to work on existing containers …

e.g.

template <typename ForwardIterator, typename BinaryPredicate>
unsigned short
small_sort(ForwardIterator from,
      ForwardIterator to, 
      const BinaryPredicate& cmp, 
      unsigned short n) 
{
  ...

And suddenly, you have an algorithme with a much broader use …

Isn’t that great?

F7a4a748ecf664f189bb704a660b3573
0
anubis 101 Aug 24, 2005 at 09:29

Throw in boost and you are set…