Boris Kolpackov has written a series of 3 articles on the minefield of efficient argument passing in C++11.
In part 1 he discusses rvalue references, the C++11 addition that improves argument-passing efficiency in some cases but adds complexity in the general case. There are some interesting points here, particularly that there is no one-size-fits-all choice of parameter type (const lvalue reference, rvalue reference, or value).
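To make the three options concrete, here is a minimal sketch. The `Person` class and its method names are hypothetical and only there for illustration; they are not taken from Boris's articles.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class, used only to compare the three parameter styles.
class Person {
public:
    // (1) const lvalue reference: accepts anything, but always copies
    //     into the member, even when the caller passed a temporary.
    void set_name_cref(const std::string& n) { name_ = n; }

    // (2) rvalue reference: accepts only temporaries or std::move'd
    //     objects, and moves them into the member.
    void set_name_rref(std::string&& n) { name_ = std::move(n); }

    // (3) by value: the caller's argument is copied or moved into the
    //     parameter, which is then moved into the member.
    void set_name_value(std::string n) { name_ = std::move(n); }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Small usage check: each style ends up storing the expected value.
inline void signature_demo() {
    Person p;
    std::string s = "Ada";

    p.set_name_cref(s);                     // copies s
    assert(p.name() == "Ada");

    p.set_name_rref(std::string("Grace"));  // moves from the temporary
    assert(p.name() == "Grace");

    p.set_name_value(s);                    // copy into parameter, then move
    assert(p.name() == "Ada" && s == "Ada");
}
```

Each signature bakes in an assumption about what the caller has and what the callee will do with it, which is exactly the leak of implementation detail that makes the choice awkward.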
I think for a while I’ve been vaguely aware that something didn’t feel quite right, since I’d been tailoring a few method signatures with the knowledge of whether or not the caller would be moving an object in, and whether I want to copy or just reference the argument. Clearly the loss of opacity and the fragility in the face of implementation change are undesirable in our code.
In part 2 Boris outlines a wrapper similar to std::reference_wrapper that can be constructed with either an rvalue reference or const lvalue reference. It’s rather a good wrapper, but impractical for use everywhere; more suitable for generic library code that needs to be as tight as possible.
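As a rough idea of what such a wrapper might look like, here is a minimal sketch. The name `in_arg`, its interface, and the `Widget` class are my own assumptions for illustration; this is not Boris's actual code.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper: remembers whether it was bound to an lvalue or
// an rvalue, so the callee can decide later whether to copy or move.
template <typename T>
class in_arg {
public:
    in_arg(const T& l) : ptr_(&l), rvalue_(false) {}  // bound to an lvalue
    in_arg(T&& r) : ptr_(&r), rvalue_(true) {}        // bound to an rvalue

    bool is_rvalue() const { return rvalue_; }
    const T& get() const { return *ptr_; }

    // Produce a T: move out of the argument if it was an rvalue,
    // otherwise copy it. The const_cast is only reached when the
    // wrapper was built from a non-const T&&.
    T take() const {
        return rvalue_ ? T(std::move(*const_cast<T*>(ptr_))) : T(*ptr_);
    }

private:
    const T* ptr_;
    bool rvalue_;
};

// Hypothetical user: one signature serves both copying and moving callers.
class Widget {
public:
    void set_name(in_arg<std::string> n) { name_ = n.take(); }
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Small usage check.
inline void wrapper_demo() {
    Widget w;
    std::string s = "Ada";
    w.set_name(s);                       // lvalue: copied, s is untouched
    assert(w.name() == "Ada" && s == "Ada");
    w.set_name(std::string("Grace"));    // rvalue: moved
    assert(w.name() == "Grace");
}
```

The wrapper only holds a pointer to the argument, so it must not outlive the call it was created for.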
Part 3 summarizes and appraises the argument wrapper approach, and draws some general conclusions about argument passing in C++11.
It’s a shame that this area is not as clear-cut as it seemed; whether a language change to mitigate this will occur, or is even possible, remains to be seen. For now, following Boris’ suggestion to think about the conceptual requirements of each passed argument, and to avoid premature optimisation, seems to be the best approach.
Sumant Tambe of C++ Truths recently gave a talk entitled “C++11 idioms” at Silicon Valley Code Camp, in which he discussed the above approach to efficient argument passing, plus pros/cons and alternative approaches.
Scott Meyers has also posted a response to Sumant’s presentation, building on it and adding his own wisdom, in particular:
“The fundamental problem is that perfect forwarding and overloading make very bad bedfellows, because perfect forwarding functions want to take everything. They’re the greediest functions in C++.”
There’s an interesting discussion with plenty of good stuff in the comments under that post too.
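The greediness is easy to reproduce. In this small sketch (the function name `record` is hypothetical), a perfect-forwarding template sits alongside an `int` overload, and the template wins for anything that is not exactly an `int`:

```cpp
#include <cassert>
#include <string>

// Ordinary overload, intended for integer arguments.
inline const char* record(int) { return "int overload"; }

// Perfect-forwarding overload: T&& deduces an exact match for almost
// any argument, so it beats overloads that would need a conversion.
template <typename T>
const char* record(T&&) { return "forwarding template"; }

// Small usage check.
inline void greedy_demo() {
    short s = 7;
    // Exact match for both; the non-template overload is preferred.
    assert(std::string(record(42)) == "int overload");
    // short -> int needs a promotion, so the template (T = short&) wins.
    assert(std::string(record(s)) == "forwarding template");
}
```

A caller passing a `short` almost certainly expected the integer overload, which is why mixing the two needs care.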
I’ve just read this week’s stackoverflow newsletter, and was intrigued by a recent top question “Why is processing a sorted array faster than an unsorted array?”
Anyone writing performance-critical processing code should be aware of this; the C++ comparison showed that processing the sorted array was nearly 6x faster than processing the unsorted one, which is certainly an eyebrow-raising result. A straight Java equivalent showed a similar, albeit less extreme, difference.
The reason for this potentially surprising behaviour is that, with the unsorted array, the CPU’s branch predictor guesses wrong roughly half the time, and each misprediction forces the pipeline to be flushed and refilled. I’ll hand you over to GManNickG to show you the code and Mysticial to explain, rather well, what is going on: Why is processing a sorted array faster than an unsorted array?
This is something I was aware of but have never taken into account, so I’ll be trying to remember Mysticial’s general rule of thumb to “avoid data-dependent branching in critical loops”.
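As a minimal sketch of what that rule can mean in practice, here are two ways of summing the elements that are at least 128. The function names are hypothetical, and the branch-free form below is a portable variant I have chosen for illustration, not the exact code from the answer. Whether it is actually faster depends on the compiler and CPU, since an optimiser may already remove the branch from the first version, so measure before relying on it.

```cpp
#include <cassert>
#include <vector>

// Data-dependent branch: whether the body runs depends on each value,
// which is hard to predict when the data is in random order.
inline long long sum_branchy(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        if (v >= 128) {
            sum += v;
        }
    }
    return sum;
}

// Branch-free formulation: the comparison yields 0 or 1, which is used
// as a multiplier, so every iteration performs the same operations.
inline long long sum_branchless(const std::vector<int>& data) {
    long long sum = 0;
    for (int v : data) {
        sum += static_cast<long long>(v) * (v >= 128);
    }
    return sum;
}

// Small usage check: both versions must agree.
inline void branch_demo() {
    std::vector<int> d = {0, 127, 128, 255, 64, 200};
    assert(sum_branchy(d) == sum_branchless(d));
}
```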