# Out parameter versus return value performance?

### #1 eddie

Posted 10 January 2007 - 04:20 AM

This is more of a musing thought than anything.

I much prefer the syntax of:

    MyFoo const kMyFoo = GetMyFoo();

over:


    MyFoo myFoo;
    GetMyFoo(myFoo);

First off, I can make things const in the former declaration (I try to be a const freak when I can; I find it makes things easier to read), and it's also nice to have one line instead of two.

But is there any sort of (albeit minor) performance issue with what I assume will be another value created on the stack to hold the return value? Or can I assume that the compiler will use the variable I'm assigning to as the storage, instead of putting a temporary on the stack?

---

Somewhat related: in C++, I hear a lot of "Oh, don't do that. The compiler is smart enough to do that for you." While I like hearing that I can put my faith in the compiler, I'd like a little more hard truth (belief is for religion, is my motto. ;) ). Is there any easier way of knowing these truths (a FAQ or something with all these rules outlined somewhere) than digging around in disassembly to see what's going on?

Thanks for putting up with my musing. ;)

### #2 dave_

Posted 10 January 2007 - 09:19 AM

Both have their uses; it's the same trade-off as passing by value versus passing by reference.
Sometimes by value can be faster, but most of the time by reference is faster.
For example, which is better?

    std::vector<MyStruct> getValue() const;
    bool getValue(std::vector<MyStruct> &result) const;

The second one will win in terms of performance nearly every time.
So for the sake of consistency, most of my getters are in the form of the latter, apart from very simple/small structs.
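One concrete case where the out-parameter form does tend to win is when the caller calls the getter repeatedly with the same vector: clear() destroys the elements but keeps the allocated buffer, so later refills of the same size need no new allocation. A minimal sketch; MyStruct's fields, getValues and refillReusesBuffer are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload standing in for MyStruct.
struct MyStruct {
    int id;
    float weight;
};

// Out-parameter style: the caller owns the vector, so its buffer can be
// reused across calls. Returns false when nothing was produced.
bool getValues(std::size_t count, std::vector<MyStruct>& result) {
    result.clear();  // destroys the elements but keeps the capacity
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(MyStruct{static_cast<int>(i), 0.5f * static_cast<float>(i)});
    }
    return !result.empty();
}

// Calls getValues twice with the same size on one vector and reports whether
// the second call reused the first call's allocation.
bool refillReusesBuffer(std::size_t count) {
    std::vector<MyStruct> values;
    getValues(count, values);                  // first call allocates
    const MyStruct* firstBuffer = values.data();
    getValues(count, values);                  // size <= capacity: no reallocation
    return values.data() == firstBuffer;
}
```

A by-value getter would typically build a fresh vector (and a fresh allocation) on every call, which is where the out-parameter version gets its edge in a loop.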

### #3 .oisyn

Posted 10 January 2007 - 12:10 PM

I prefer the return value over the output parameter as well. In defense of C++, the next version of the standard will probably include move semantics aside from copy semantics, which basically means that the value of one variable can be moved to another variable while destroying the first.

A function returning a std::vector can be very efficient in that way - the contents of the vector don't need to be copied, the internal pointers will just be moved to the final variable. Much like swapping with a default-constructed vector.
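A rough illustration of the "much like swapping" point, written against the std::vector that eventually shipped with move support, so it assumes C++11 or later. makeSquares, swapTransfersBuffer and moveTransfersBuffer are hypothetical helpers. Both swap and move construction hand the internal buffer over to the destination instead of copying the elements.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical producer that builds its result in a local and returns it.
std::vector<int> makeSquares(std::size_t n) {
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result;  // NRVO, or failing that an implicit move: no element-wise copy
}

// Swapping with a default-constructed vector: the destination takes over the
// source's buffer and the source is left empty.
bool swapTransfersBuffer() {
    std::vector<int> source = makeSquares(100);
    const int* buffer = source.data();
    std::vector<int> destination;
    destination.swap(source);
    return destination.data() == buffer && source.empty();
}

// Move construction does the same hand-over in one step.
bool moveTransfersBuffer() {
    std::vector<int> source = makeSquares(100);
    const int* buffer = source.data();
    std::vector<int> destination(std::move(source));
    return destination.data() == buffer;
}
```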
-
Currently working on: the 3D engine for Tomb Raider.

### #4 Nick

Posted 10 January 2007 - 12:44 PM

eddie said:

Is there any easier way of knowing these truths (...) than digging around in disassembly to see what's going on?
No. :lol:

Your question is a bit contradictory. You wish to know what happens below the surface, without 'believing' and without actually going below the surface. Learning some assembly basics is actually very productive. You'll be able to answer these kinds of questions yourself, making you a better C++ programmer. You'll also be able to locate some nasty bugs more easily.

To answer your question directly: return values that fit in 32 bits are stored in the 'eax' register (on 32-bit x86 architectures). Returning values via a pointer or reference always requires a memory operation. Working with registers is faster than working with memory. However, if you're only going to change a few fields in a structure, it's best to pass it by reference (or even better, write a member function).
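A minimal sketch of the two styles for a 4-byte struct; Rgba, makeGrey and makeGreyInto are hypothetical names. Whether a returned struct actually travels in a register is decided by the ABI, not by C++ itself: as far as I know, 32-bit MSVC returns small plain structs in 'eax' (or 'edx:eax' for 8 bytes), the i386 System V ABI used by GCC on Linux returns structs through a hidden pointer by default, and the x86-64 System V ABI can return up to 16 bytes in 'rax:rdx'. Treat those as things to confirm in your own compiler's disassembly.

```cpp
#include <cstdint>

// Hypothetical 4-byte colour: small enough to fit in one 32-bit register.
struct Rgba {
    std::uint8_t r, g, b, a;
};
static_assert(sizeof(Rgba) == 4, "expected a 4-byte struct");

// Return-value style: allows `Rgba const kGrey = makeGrey(128);`
Rgba makeGrey(std::uint8_t level) {
    return Rgba{level, level, level, 255};
}

// Out-parameter style: writes through the reference, which is a memory
// store unless the call gets inlined.
void makeGreyInto(std::uint8_t level, Rgba& out) {
    out.r = level;
    out.g = level;
    out.b = level;
    out.a = 255;
}
```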

x86-64 changes these 'truths' significantly, though. So really the most productive approach is to learn some assembly, locate the real bottlenecks, and optimize those. If parameter passing/returning is not a bottleneck, then don't bother about it.

### #5 SpreeTree

Posted 10 January 2007 - 03:25 PM

.oisyn said:

the next version of the standard will probably include move semantics aside from copy semantics, which basically means that the value of one variable can be moved to another variable while destroying the first.

That will be a real winner in terms of performance and usability :)

Regarding the return value/parameter question, you are not going to see a performance issue whichever way you go; it's such a small area to be concerned about. Unfortunately, that leads to the dreaded 'It just depends what you like'.

We had a discussion at work about this not too long ago, not from a performance perspective, but from a readability point of view. Take the following, for example:


    uchar returnedNum = GetCalculatedValue();

This is what I prefer, but the question is 'how do you signal failure?'.

Suppose GetCalculatedValue can fail: should it return 0? But 0 might be a valid number. OK, so a negative number indicates failure... Fair enough, but we wanted to keep returning an unsigned value to indicate exactly what the purpose of the function is.

If we write it as follows:


    uchar returnedNum = 0;
    bool success = GetCalculatedValue(returnedNum);

We can now explicitly check for success without having to resort to a magic number that could represent anything. But we might have multiple error codes: do we return -3, -2, -1 to represent them, or return an enum?
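One way to answer the enum question, sketched under assumptions: 'uchar' is taken to be a typedef for unsigned char, and CalcResult, its error names and this body of GetCalculatedValue (with its two input parameters) are hypothetical. The function returns a status enum and writes the out parameter only on success, so no magic numbers are needed.

```cpp
#include <limits>

// Assumed typedef: the thread uses 'uchar' without showing its definition.
typedef unsigned char uchar;

// Hypothetical status codes in place of -1/-2/-3 style magic numbers.
enum class CalcResult {
    Ok,
    DivideByZero,
    Overflow
};

// Hypothetical calculation: numerator / divisor, which must fit in a uchar.
// 'result' is written only when the function returns CalcResult::Ok.
CalcResult GetCalculatedValue(unsigned numerator, unsigned divisor, uchar& result) {
    if (divisor == 0) {
        return CalcResult::DivideByZero;
    }
    const unsigned quotient = numerator / divisor;
    if (quotient > static_cast<unsigned>(std::numeric_limits<uchar>::max())) {
        return CalcResult::Overflow;
    }
    result = static_cast<uchar>(quotient);
    return CalcResult::Ok;
}
```

A result of 0 is now unambiguous: returnedNum may legitimately hold 0 while the status says Ok.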

There's no right or wrong answer, and like I said, I prefer the former. But it all depends on what you like, think and need from the function.

Spree

### #6 eddie

Posted 10 January 2007 - 03:25 PM

Nick said:

Your question is a bit contradictory. You wish to know what happens below the surface, without 'believing' and without actually going below the surface.

Yeah, I kinda realized that I was being weaselish/lazy as I wrote the question. ;) I'm not averse to reading the disassembly, I suppose. I guess I'm just daunted by having to read the disassembly for the various compilers I work with, since the C++ specification [more than likely] doesn't specify these kinds of things as cross-compiler 'truths'.

I suppose I'll just suck it up and start reading disassembly to get a better picture. That said, I don't regret my question, as it's led to some very interesting findings from you and the other posters. On that note:

Nick said:

To answer your question directly: return values that fit in 32 bits are stored in the 'eax' register (on 32-bit x86 architectures). Returning values via a pointer or reference always requires a memory operation. Working with registers is faster than working with memory. However, if you're only going to change a few fields in a structure, it's best to pass it by reference (or even better, write a member function).

Interesting! First off, I didn't know this. Is there any 'source' (MSDN, some architecture manual) where I could read this (as well as other goodies) from directly? (Not that I distrust you, but I'd love to know how you came to know this and how I can uncover similar things). I suppose it makes sense since that's the basic 'word' size for a 32-bit computer.

Second, to paraphrase: it's best to use return values for classes/structs/primitives that are 4 bytes or smaller; otherwise use a pointer or reference (alongside your member-function suggestion). Interesting. I always thought a compiler would be 'smart' enough to use my existing variable (the one I'm constructing on the left-hand side of the assignment) as the storage for the return value, rather than a temporary on the stack. I suppose I was asking too much. :)

Nick said:

x86-64 changes these 'truths' significantly, though. So really the most productive approach is to learn some assembly, locate the real bottlenecks, and optimize those. If parameter passing/returning is not a bottleneck, then don't bother about it.

Just because we're on the topic: could you qualify 'significantly'? Doesn't x86-64 just double the size of the eax register? Or have whole gobs of the system changed?

Is there a place I can read about this on my own, or is this best uncovered, as you say, with disassembly and experimentation?

Finally (sorry for the long post, but you and the others gave me reams to respond to. ;)): just to indicate my stance, I'm a total believer in "optimize the problem, don't optimize non-problems". However, I do believe it's good to do basic discipline checks on yourself once in a while, and change your habits accordingly (e.g. "++myFoo;" versus "myFoo++;").
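The "++myFoo" versus "myFoo++" habit is a good example of a discipline with a concrete reason behind it: for class types, the conventional postfix operator must copy the object so it can return the old value, while prefix does not. Counter and its copy tally are hypothetical, written only to make that copy visible; for built-in types like int, an optimizing compiler normally makes both forms identical.

```cpp
// Hypothetical iterator-like type that counts how often it is copied.
struct Counter {
    static int copies;
    int value = 0;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Prefix: increment in place and return *this; no copy is made.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: the conventional implementation saves a copy of the old state.
    Counter operator++(int) {
        Counter old(*this);  // the copy the prefix form avoids
        ++*this;
        return old;          // may add a second copy if the compiler skips NRVO
    }
};

int Counter::copies = 0;
```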

I just have to learn what those types of disciplines should be. :)

Thanks!

### #7 dave_

Posted 10 January 2007 - 04:33 PM

eddie said:

Interesting! First off, I didn't know this. Is there any 'source' (MSDN, some architecture manual) where I could read this (as well as other goodies) from directly? (Not that I distrust you, but I'd love to know how you came to know this and how I can uncover similar things). I suppose it makes sense since that's the basic 'word' size for a 32-bit computer.

http://en.wikipedia....ing_conventions

### #8 .oisyn

Posted 10 January 2007 - 04:39 PM

SpreeTree said:

If we write it as follows:

    uchar returnedNum = 0;
    bool success = GetCalculatedValue(returnedNum);


    errorable<uchar> returnedNum = GetCalculatedValue();
    if (returnedNum.succeeded())
    {
        // cast so the uchar prints as a number rather than as a character
        std::cout << static_cast<int>(returnedNum.value()) << std::endl;
    }
    else
    {
        std::cout << "STUPID USER!" << std::endl;
    }

Or, more generally:

    std::tuple<uchar, bool> GetCalculatedValue(); // declaration

    uchar returnedNum;
    bool success;
    std::tie(returnedNum, success) = GetCalculatedValue();

There are lots of ways to increase the number of returned values; you don't have to use output parameters per se.
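errorable is not a standard class, so here is one hypothetical way it could be written. The error type is a second template parameter (defaulting to bool) so it can also carry specific error codes; Status and this version of GetCalculatedValue are illustrative assumptions. Later standards cover the same ground with std::optional (C++17) and std::expected (C++23).

```cpp
#include <stdexcept>

// Hypothetical result wrapper: either a value of type T or an error of type E.
// In this simplified sketch, T and E must be default-constructible.
template <class T, class E = bool>
class errorable {
public:
    static errorable success(const T& value) { return errorable(value, E(), true); }
    static errorable failure(const E& error) { return errorable(T(), error, false); }

    bool succeeded() const { return m_ok; }
    const E& error() const { return m_error; }

    const T& value() const {
        if (!m_ok) {
            throw std::logic_error("errorable::value() called on a failed result");
        }
        return m_value;
    }

private:
    errorable(const T& value, const E& error, bool ok)
        : m_value(value), m_error(error), m_ok(ok) {}

    T m_value;
    E m_error;
    bool m_ok;
};

typedef unsigned char uchar;  // assumed definition of the thread's 'uchar'

// Hypothetical status codes, so callers can tell failures apart.
enum class Status { Ok, NoInput, OutOfRange };

// Hypothetical calculation that can fail in two distinct ways.
errorable<uchar, Status> GetCalculatedValue(int input) {
    if (input < 0) return errorable<uchar, Status>::failure(Status::NoInput);
    if (input > 255) return errorable<uchar, Status>::failure(Status::OutOfRange);
    return errorable<uchar, Status>::success(static_cast<uchar>(input));
}
```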

### #9 dave_

Posted 10 January 2007 - 04:45 PM

.oisyn said:

A function returning a std::vector can be very efficient in that way - the contents of the vector don't need to be copied, the internal pointers will just be moved to the final variable. Much like swapping with a default-constructed vector.

Are you sure? I've just had a quick check, and it seems terrible in comparison:

    std::vector<MyStruct> test1;
    func1(test1);

One vector is created. In the function, the values can be pushed directly onto the result.

    std::vector<MyStruct> test2;
    test2 = func2();

In this version you have to create a local in your function to store your results. Then it seems the compiler makes a copy of this to be returned. Then it assigns the value to the one you've already created. Then it destroys the temporary.

Seems worse to me? Or am I missing the point?
The vector was just an example. It could have been any complicated class.
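A small instrumented type turns this kind of quick check into something you can run rather than read. This sketch assumes a C++17 compiler, where initialising a variable directly from a returned temporary is guaranteed to construct it in place; Tracked and makeTracked are hypothetical names. Declaring first and assigning afterwards, as in "test2 = func2();", still goes through an assignment operator, which accounts for part of the extra work described above (a copy assignment before C++11, a move assignment after).

```cpp
// Hypothetical type that records how it is copied, moved and assigned.
struct Tracked {
    static int copies;
    static int moves;
    static int assignments;

    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
    Tracked& operator=(const Tracked&) { ++assignments; ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++assignments; ++moves; return *this; }

    static void reset() { copies = moves = assignments = 0; }
};

int Tracked::copies = 0;
int Tracked::moves = 0;
int Tracked::assignments = 0;

// Returns a prvalue: in C++17, `Tracked t = makeTracked();` constructs t
// directly, with no copy or move at all.
Tracked makeTracked() {
    return Tracked();
}
```

For a named local returned from the function, as in func2, the copy or move can also be removed (NRVO), but that is permitted rather than guaranteed, so counts for that case can differ between compilers and between debug and release builds.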

### #10 .oisyn

Posted 10 January 2007 - 04:59 PM

dave_: I was talking about the new C++ move-semantics feature (rvalue references), where you don't care what happens to the contents of the source variable.

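    #include <cstddef>

    template<class T> class vector
    {
    public:
        // move ctor: takes over other's buffer instead of copying the elements
        vector(vector && other) : m_buffer(other.m_buffer), m_size(other.m_size)
        {
            // leave the source empty so its destructor won't free the stolen buffer
            other.m_buffer = 0;
            other.m_size = 0;
        }

        // ...

    private:
        T * m_buffer;
        std::size_t m_size;
    };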

Read the original proposal and related papers and follow-ups:
http://www.open-std..../2002/n1385.htm
http://www.open-std....2004/n1690.html
http://www.open-std....2005/n1770.html
http://www.open-std....2005/n1855.html
http://www.open-std....2006/n1952.html

### #11 dave_

Posted 10 January 2007 - 05:09 PM

I see. Which compilers actually implement that now?

I'm stuck in the past with Visual Studio .NET 2003 at the moment... and I'm not even allowed to use templates :sad:

### #12 .oisyn

Posted 10 January 2007 - 05:49 PM

Oh, don't worry about that: the next C++ revision (C++09) won't come any earlier than 2009 (hence the name). Most compilers currently implement C++98, which was the first official C++ standard, if I'm not mistaken.

### #13 SpreeTree

Posted 10 January 2007 - 09:12 PM

.oisyn said:

<Couple Of Cool Ideas>

I do like your ideas, and did think about something similar but avoided them for the following reasons.

The problem with the first example (which you probably realised, hence the more general version) is that the errorable template would need to be more generic than you'd like, unless you passed the 'error' value as another template parameter. That said, having specific error values (for example D3D_ERROR or D3D_OK) makes code much clearer to use.

The second one I do personally like, though the syntax isn't really to my taste. It would also concern me having to copy over two distinct values from various functions. And I can imagine the response if I slid that into a work project ;)

Also, I tend to avoid using lots of generalised templates like you described, simply to avoid code bloat - I'm working on downloadable games at the moment, and while I love templates, I can't go crazy on them, especially when I could use pass by reference and simply return a bool ;)

dave_ said:

and I'm not even allowed to use templates

Is that a work-imposed restriction, because .NET is more than capable?

Spree

### #14 dave_

Posted 10 January 2007 - 09:14 PM

SpreeTree said:

Is that a work-imposed restriction, because .NET is more than capable?

It is; it's to do with portability. There are some really rubbish compilers out there.

### #15 SmokingRope

Posted 17 January 2007 - 09:35 PM

I've had similar performance questions. Looking around today, I found the options necessary to generate assembly language output from your programs. Since all of these performance questions are highly compiler-dependent, this is how you can find the answer for your own compiler; the only requirement is that you can read (and compare) the assembly instructions of your example program.

Visual C++ 2005 (source):
1. Open the project's Properties dialog.
2. Select C/C++ > Output Files.
3. Set the 'Assembler Output' property to whatever format you want.
4. Set the 'ASM List Location' property to some location. Using '$(IntDir)/' will put the .asm files in the project's debug/release (intermediate) folder.

I also found this link regarding GCC, but I'm not familiar enough with GCC to reproduce the necessary switches here.
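For GCC (and Clang), the rough equivalent is the -S switch, which stops after compilation and writes a .s assembly file; -masm=intel selects the Intel syntax that MSVC listings use, and -fverbose-asm adds explanatory comments. On the MSVC command line, the 'Assembler Output' property corresponds to the /FA family of options (/FAs interleaves the source). Below is a sketch of a tiny translation unit set up for this kind of comparison; the file name, Pair, returnPair and fillPair are hypothetical, and extern "C" is only there to keep the symbol names unmangled and easy to find in the listing.

```cpp
// compare_returns.cpp (hypothetical file name), meant to be inspected, e.g.:
//   g++ -O2 -S -masm=intel -fverbose-asm compare_returns.cpp  -> compare_returns.s
//   cl /O2 /FAs /c compare_returns.cpp                         -> compare_returns.asm
// extern "C" keeps the symbol names unmangled so they are easy to search for.

struct Pair {
    int first;
    int second;
};

// Return-value style: look at how the 8-byte result leaves the function.
extern "C" Pair returnPair(int a, int b) {
    return Pair{a, b};
}

// Out-parameter style: the result is stored through the pointer.
extern "C" void fillPair(int a, int b, Pair* out) {
    out->first = a;
    out->second = b;
}
```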

### #16 eddie

Posted 17 January 2007 - 10:27 PM

Personally, I just press Ctrl+F11 (I think that's the keystroke) to step through the assembly in question. I think that's the same, yes?

### #17 .oisyn

Posted 17 January 2007 - 11:09 PM

eddie said:

Personally, I just press Ctrl+F11 (I think that's the keystroke) to step through the assembly in question. I think that's the same, yes?

The point is that you have to run it first, then break into the program and either press Ctrl+F11 to show the disassembly of the current line or type the address of the function you want to view into the address field.

If you're only interested in assembly output of a large program it might be more useful to enable the option to output the assembly code, so you can browse through it more easily.