Archive
ISO C++ committee draft
Following the recent meeting of the ISO C++ standards committee, Working Group 21 (WG21) in Bristol, UK, a number of proposals have been accepted into the ‘Committee Draft’ of the next standard, currently scheduled for next year (“C++14″).
Although this is a ‘bug fixes and minor update’ revision of the standard, it contains some improvements that I know will show significant improvement in some areas. Generic lambdas, generalized captures and return type deduction for normal functions, for example, will enable creation of code that is less brittle in the face of type changes, and also allow things that weren’t possible without hacks (e.g. moving unique_ptr objects into lambdas).
Herb Sutter has written up his trip report, which shows an overview of the changes, and below is a list I have compiled of the accepted papers. I wrote the original for my team, so it is ordered by level of relevance/benefit to our own work or my own interest, and I have given less attention to detail for the proposals I don’t see affecting me/us. I will gladly accept suggested corrections and omissions for the list.
[April 26th] Michael Wong has also written a trip report covering some of the discussions and rationale. This is part 1 (Language).
[April 27th] Reddit discussion of some of the new facilities.
[April 30th] Part 2 (Library) of Michael Wong’s trip report.
[May 1st] Part 3 (Concurrency) of Michael Wong’s trip report.
N3656 make_unique
Proposes the missing unique_ptr analogue for make_shared. There are 3 overloads: 1 for a single object with argument pack that is forwarded to the object constructor, and 2 for arrays, one of which is deleted to ensure it is used correctly (it takes a single argument: the array size, so cannot be used for a fixed-size array). The implementation of the single object overload matches that suggested by Herb in GotW 102.
N3638 Return type deduction for normal functions
This adds the ability to use auto for any function’s return type, with the same effect as exists now for lambdas, i.e. deduce the return type from the function’s implementation. For both normal functions and lambdas the requirement that the implementation be of the form return expression; is relaxed, and the deduction even works for recursive functions as long as a single type is deducible!
auto sum(int i) {
if (i == 1)
return i; // return type deduced to int
else
return sum(i-1)+i; // ok to call it now
}
Return type deduction is not supported for virtual functions.
N3648 Wording Changes for Generalized Lambda-Capture
Adds lambda captures of the form identifier initializer and &identifier initializer, which both behave like a declaration of the same form preceded by auto. The paper is standardese only, but N3610 has a more readable description.
int x = 4;
auto y = [&r = x]() {...}; // equivalent to auto& r = x
// shows moving an object into a capture variable, as well as the capture
// identifier capturing from and hiding the same name from outer scope.
std::unique_ptr p( new int(42) );
auto z = [p = std::move(p)]() {...};
N3649 Generic (Polymorphic) Lambda Expressions (rev3)
Enables ‘templated’ lambdas, and without an angle bracket or template in sight; marvellous!
auto Identity = [](auto a) { return a; };
In addition this change will allow ‘universal references’, parameter packs, and perfect forwarding:
template< class T >
auto make_factory() { // using N3638 return type deduction
return []( auto&&... args ) {
return std::unique_ptr( new T( std::forward< decltype(args) >(args)... ));
}
}
N3659 Shared locking in C++
This provides shared_mutex, and shared_lock for non-exclusive (‘read’) locking, with unique_lock continuing to provide the exclusive (write) locking as it does today.
N3655 Transformation traits redux, v2
This is a ‘convenience’ proposal to simplify usage of the ‘transformation traits’ that produce one type from another, such as add_lvalue_reference.
With today’s library you must write typename add_lvalue_reference<T>::type to use the facility, whereas the proposal adds a template alias interface enabling add_lvalue_reference_t<T>. Note the trailing _t there: each of the transformation traits gains an alias with the same name transformation.
Although I’m sure some will deem these additions too trivial to fill pages of the standard with – indeed section 4, which adds the same treatment for the other traits, was not accepted for this reason – they do clean up syntax particularly where transformations are nested, and I will be using them myself. The paper includes the additions for all traits, so those who wish can use them immediately including the section 4 transformation aliases.
N3672 The standardisation of Boost.Optional
What it sounds like, with some differences to the current Boost.Optional implementation, particularly useful being the addition of move support.
From the paper, “The basic usage of optional can be illustrated with the following example.”
optional<int> str2int(string); // converts int to string if possible
int get_int_form_user()
{
string s;
for (;;) {
cin >> s;
optional<int> o = str2int(s); // 'o' may or may not contain an int
if (o) { // does optional contain a value?
return *o; // use the value
}
}
}
N3662 C++ dynamic arrays (dynarray)
A kind of std::vector/std::array hybrid; has a capacity fixed on construction (unlike vector) that is not part of the type (unlike array). This provides much-needed sequence container operations over a raw array where we don’t want a different type, or cannot fix size at compile-time.
// construct
std::dynarray<int> ints(5, 42); // the sequence {42,42,42,42,42}
auto another(ints); // copy-constructed sequence
std::dynarray<int> empty(0); // empty sequence {}
// size & unchecked element access
for (int i=0; i < ints.size(); ++i)
ints[i] = i;
ints.at(10) = 10; // throws std::out_of_range
// front/back
ints.front() = 5;
ints.back() = 5;
// sequence iteration output "25 1 25 3 25"
std::copy( ints.begin(), ints.end(), std::ostream_iterator(std::cout, " "));
// contiguous storage access
int* data = another.data();
another.fill(0);
N3639 Runtime-sized arrays with automatic storage duration (rev5)
Based on the recent addition of Variable-Length Arrays (VLAs) in C11, this allows e.g. int a[n]; where n has a runtime value, and restricts a to automatic storage duration. The dynarray proposal (N3662) allows for the freestore-based equivalent.
N3652 Relaxing Constraints on constexpr Functions
Allow some declarations, if, switch, for, while, do-while, and mutation of locally-scoped objects in constexpr functions.
N3668 exchange() utility function (rev3)
Specifies T std::exchange(T& old, U&& new) that updates old to new and returns old‘s original value, similar to the facilities provided for atomics.
N3651 Variable templates
Enables type-parameterized constants e.g. for different representations of pi: pi<double>, pi<float> and, God forbid, pi<int>
N3654 Quoted strings (rev2)
Allows for fixing the inconsistency in the following:
std::stringstream ss; ss << "Hello world"; std::string in; ss >> in; // 'in' is "Hello" instead of "Hello world"
N3642 User-defined literals for standard library types
Adds ‘user-defined’ (albeit defined by the library…) literals for std::basic_string (e.g. "blah"s), std::chrono::duration members (e.g. 21min, 42ms), plus binary and complex numbers.
Other proposals
As Herb’s trip report, and the comments beneath it, show, additional proposals have been worked on during the meeting, in particular N3505, a Boost.Filesystem (v3)-based library, is looking good for release in parallel, along with a networking library and N3580 “concepts lite”.
One of the even more exciting things that I hope will truly revolutionise how we use C++ is the ‘modules’ proposal. This formalises package (module) management to make the compiler aware of them, and neatly combines the facilities that are currently rather clunkily provided by namespaces and headers. If my understanding is correct, this will also provide a basis for much faster builds as the include vs. forward declare, and exposure of implementation of class/function templates in headers become non-issues that need not be considered. According to Herb this is being worked on by the Clang guys and “the intent is to issue a Modules technical specification as soon as it’s ready”. Hurrah!
Finally, a massive thanks to all those at the coalface for the work that went into these proposals and last week. The results are impressive, and the scale of accepted proposals stretches the bounds of the ‘bug fixes and minor update’ description. Roll on 2012
Boost compile error “no class template named ‘result’”
I’ve just come across a compiler error resulting from using boost::result_of with a C++11 lambda.
I wasn’t actually using boost::result_of directly, but provided a lambda as the transforming functor for a boost::transform_iterator, which uses it behind the scenes.
Here’s a small listing that shows the problem:
#include <boost/iterator/transform_iterator.hpp>
#include <iostream>
int main()
{
int i[] = {1, 2, 3, 4};
auto square = [](int i) { return i*i; };
auto begin = boost::make_transform_iterator( i, square );
auto end = boost::make_transform_iterator( i+4, square );
}
with Boost 1.48 that yields the following error:
In file included from /usr/include/boost/iterator/transform_iterator.hpp:23:0,
from boost-result-of-error-with-lambdas.cpp:2:
/usr/include/boost/utility/result_of.hpp: In instantiation of ‘struct boost::detail::result_of_nested_result<main()::, main()::(int&)>’:
/usr/include/boost/utility/result_of.hpp:83:8: required from ‘struct boost::detail::tr1_result_of_impl<main()::, main()::(int&), false>’
/usr/include/boost/utility/detail/result_of_iterate.hpp:23:8: required from ‘struct boost::tr1_result_of<main()::(int&)>’
/usr/include/boost/utility/detail/result_of_iterate.hpp:80:8: required from ‘struct boost::result_of<main()::(int&)>’
/usr/include/boost/mpl/eval_if.hpp:38:31: required from ‘struct boost::mpl::eval_if<boost::is_same, boost::result_of<main()::(int&)>, boost::mpl::identity >’
/usr/include/boost/iterator/iterator_adaptor.hpp:160:12: required from ‘struct boost::detail::ia_dflt_help<boost::use_default, boost::result_of<main()::(int&)> >’
/usr/include/boost/iterator/transform_iterator.hpp:50:17: required from ‘struct boost::detail::transform_iterator_base<main()::, int*, boost::use_default, boost::use_default>’
/usr/include/boost/iterator/transform_iterator.hpp:74:9: required from ‘class boost::transform_iterator<main()::, int*, boost::use_default, boost::use_default>’
boost-result-of-error-with-lambdas.cpp:10:60: required from here
/usr/include/boost/utility/result_of.hpp:79:8: error: no class template named ‘result’ in ‘struct main()::’
The problem is that boost::result_of doesn’t play well with C++11 lambdas straight off, fortunately the solution is simple: define preprocessor symbol BOOST_RESULT_OF_USE_DECLTYPE and all is well again
Thanks to Akira Takahashi for the solution in this response to the same issue triggered from the range adaptors.
An implementation of generic lambdas (request for feedback)
An implementation of generic lambdas (request for feedback)
Faisal Vali has created an initial, alpha implementation (in Clang) of the current Generic (Polymorphic) Lambda Expressions proposal (N3418). This would see the addition of support for templated lambda expressions; the current state achieves this either via explicit parameter list after the lambda introducer, or by using the ‘auto’ keyword in place of a typename.
The article links to an interesting (FSVO interesting) discussion of the proposal, and the Clang source with support for it.
Adventures in lock-free programming
I’ve been intrigued by the idea of lock-free programming since I discovered that C++0x (as it was then) was to gain library facilities for standardised, portable lock-free programming. The new <atomic> header enables us to perform atomic reads and writes on a restricted set of types, and to use atomic operations as synchronisation primitives in lock-free programming.
I am still new to this topic, and have approximately equal measures of interest in the possibilities and fear of the complexity and difficulty, but wish to expose the list of writings I have found informative in my learnings. Get in touch if you know of any good additions to this list as I will be updating it with further discoveries.
volatile vs. volatile [Herb Sutter]
An explanation of what ordered atomic variables do. Also covers the use of volatile in C++ for ‘unusual’ memory semantics.
The Trouble With Locks [Herb Sutter]
Lock-based programming, our status quo, is difficult for experts to get right. Worse, it is also fundamentally flawed for building large programs.
Lock-free programming is difficult for gurus to get right.
Herb explains why lock-based programming is difficult, brittle, and why locks are not generally composable.
Lock-Free Code: A False Sense of Security [Herb Sutter]
Herb dissects a lock-free queue implementation to show the flaws in its design, discussing the twin requirements of atomicity and ordering on any lock-free variable. He exposes the points in the lock-free algorithm in which instruction re-ordering can break the queue’s invariants.
Writing Lock-Free Code: A Corrected Queue [Herb Sutter]
The typical coding pattern to use is to do work off to the side, then “publish” each change to the shared data with a single atomic write or compare-and-swap.
In the sequel to A False Sense of Security Herb shows a lock-free queue implementation with a working implementation based on atomic variables.
Writing a Generalized Concurrent Queue [Herb Sutter]
![]()
In this final part of the series Herb writes a generalized queue structure; he does this without mutexes (mutices?), but uses atomic variables equivalently with no relaxation of the barriers for each read/write operation.
Apply Critical Sections Consistently [Herb Sutter]
Herb shows how to express a critical section with several different synchronisation primitives, and shows some example code too.
Memory Barriers Are Like Source Control Operations [Jeff Preshing]
This article uses the analogy of source code control to explain the different memory barriers (fences). This was a great one for me, though I can’t honestly say I completely grok it yet… Jeff’s other articles linked from that page are also good background on the subject.
C++ atomics and memory ordering [Bartosz Milewski]
Bartosz provides a much more readily understandable description of C++11′s different memory_order enum values than the standard; anyone who understands all of section 1.10 Multi-threaded executions and data races has my respect.
C++ Concurrency in Action: Practical Multithreading [Anthony Williams]

I’ve only just begun reading this loan from a colleague, and so far am very impressed by the clarity with which concepts and techniques are explained and demonstrated. It was written for the new, standard thread support facilities by the creator of the just::thread C++11 implementation who also helped guide the standardisation of threads and other language features.
Eric Niebler: stupid name lookup
Eric Niebler: stupid name lookup
Eric Niebler shows a name lookup failure caused by a recursive template function inferring its return type via decltype. Stupid problem with a rather nasty hacky workaround, but also a likely change in C++14 that will be most welcome, and makes the whole problem go away
Using the standard allocator for a different type [and get free functionality for your custom allocator]
Someone on comp.lang.c++.moderated recently asked “if I have a different allocator, can I use that allocator to allocate memory with vector or string…?”. Since there is not only the mechanism supported by C++98/03, but also a more general – and helpful – mechanism introduced in C++11, I thought I’d write a short post on both approaches.
The problem is this: we have a class template that has an allocator type parameter (defaulted to std::allocator), e.g:
template<class T, class Allocator = std::allocator<T> >
struct linked_list {
typedef Allocator value_alloc;
typedef /* ??? */ node_alloc;
};
We can now create new value type objects via the provided allocator, but how do we create node objects? You could use new, but if the user wants memory to be allocated in a particular way for the value type it’s likely they want the same policy to apply to all memory you allocate.
Fit the First
The solution for this is to use the ‘static rebind’ facility supported by the standard allocator, and hopefully any custom allocators:
template<class T, class Allocator = std::allocator<T>>
struct linked_list {
typedef typename Allocator::template rebind< T >::other value_alloc;
typedef typename Allocator::template rebind<node<T> >::other node_alloc;
};
You’ll notice that I rebound both allocator types; this is because nothing in the template parameter list specifies that Allocator must be an allocator of T objects, so is really just doing due diligence on an unknown that we’ve been given. (I know nothing even specifies that it’s an allocator or has an allocator-like interface, but hey, until we have concepts we’ll just go with it…)
This does exactly what is needed to solve the issue, and is supported by all standard-compliant standard library implementations, but let’s see if newer facilities can help us to improve on it.
A viable alternative?
An alternative that avoids the rebind is to take the allocator as a template-template parameter instead, employing a typical policy-based design pattern:
template<class T, template<class> class Allocator = std::allocator>
struct linked_list {
typedef Allocator< T > value_alloc;
typedef Allocator<node<T> > node_alloc;
};
Although this will work it has a major downside: it forces the allocator to be a class template taking only 1 type parameter, which is restrictive to the point of brokenness. You could improve on this in C++11 by using a template parameter pack – template<class...> class Allocator – but unless all template parameters are type parameters, and all but the first are defaulted, it is still broken.
C++11: a uniform rebind interface
With C++11′s added support for allocator traits, we now have a uniform interface for static rebinding. (Note that I make use of the new ‘alias declaration’ facility via the using keyword.)
template< class T, class Allocator = std::allocator<T> >
struct linked_list {
using value_alloc = typename
std::allocator_traits<Allocator>::template rebind_alloc<T>;
using node_alloc = typename
std::allocator_traits<Allocator>::template rebind_alloc<node<T>>;
};
std::allocator_traits supplies a uniform interface to all allocator types without enforcing the entire default allocator interface on every allocator type. This is not all however; the default allocator_traits instantiation gives you the following for free:
- default definitions of 9 member typedefs and 2 template alias declarations
- default implementations for 4 member functions, including
construct()anddestroy(), which call placement new and the object destructor respectively.
If you are writing a custom allocator then this saves writing a fair amount of dull boilerplate, but note that if you wish to provide backward-compatibility with user code that does not use allocator_traits then you will still need to provide it all yourself.
Conclusion
If you don’t yet have support for the new allocator_traits then the first is the only viable option. For anyone able to take advantage of the new facilities afforded by allocator_traits then I recommend doing so.
If you are creating a custom allocator then it is up to you to determine whether the backward compatibility or minimisation of boilerplate is more important.
Generalised type-deduction for class template instance construction
I’ll wager that most C++ developers with experience of templates have at some point discovered the limitation that a class template’s parameters cannot be deduced in construction. Since type deduction can be performed in invocation of a function template however, I commonly create a helper function such as make_wrapper in the following sample to enable type deduction to take place when I want it:
#include <type_traits>
template< class T >
struct wrapper
{
wrapper( const T& );
};
template
wrapper make_wrapper(const T& t)
{
return wrapper(t);
}
static_assert( std::is_same< decltype(make_wrapper(42)), wrapper >::value, "incorrect type deduction" );
Common examples of this in the standard library are make_pair, make_shared and make_tuple (the latter 2 being available in C++11 only).
The annoyance of this however is that you must write a deduced construction helper for each class template you wish to be able to perform deduced construction of. Except that’s rather untrue: you could write a deduced construction helper for each class template parameter pattern, e.g. <class>, <class, class>, <class, class, class> etc, but with the aid of C++11 language & library additions we can do better than that.
Generalising the type-deduction helper
Thanks to perfect forwarding and variadic templates it is now possible to generalise this pattern to support any template class. If you are unfamiliar with these techniques I recommend reading Sutter’s description of a make_unique function (see the section entitled Enter make_unique) and Meyers’ post on what he calls ‘universal references‘.
#include
#include <type_traits>
#include
/** Changes std::reference_wrapper to the corresponding reference type. */
template< class T >
struct unwrap {
typedef T type;
};
template< class T >
struct unwrap< std::reference_wrapper > {
typedef T& type;
};
template< class T >
struct unwrap< std::reference_wrapper > {
typedef const T& type;
};
/** Deduced construction of a template class with given parameters. */
template< template class T, class... Args >
auto make( Args&&... args ) -> T< typename unwrap< typename std::decay::type >::type... >
{
return T<
typename unwrap<
typename std::decay::type
>::type...
>( std::forward(args)... );
}
Things to note about this implementation:
- The
unwraptemplate and specialisations enables the call site to force a reference type in the deduction in a similar way tostd::bind. std::decayenables this to model function parameter passing more closely by performing certain conversions, such as array-to-pointer.- a
noexceptconstructor does not result inmakebeingnoexcept, although it should be straightforward to add support to do so.
Putting it to use
The following code sample shows how this may be used:
#include
#include
#include
#include
/** Simple wrapper class. */
template< class T >
struct wrapper
{
explicit wrapper( const T& t_ ): t(t_) {}
template< class U >
explicit wrapper( U&& u )
: t( std::forward<span style="text-decoration: underline;">(u) )
{}
T t;
};
/** Simple helper to clarify the assertions below. */
template< class, class >
struct assert_wrapped_type_same;
template< class T, class U >
struct assert_wrapped_type_same< wrapper, U >
{
static_assert( std::is_same< T, U >::value, "wrapped type mismatch" );
};
int main(int, char**)
{
// basic test from literal; copy makes from rvalue
auto w0 = make( 42 );
assert_wrapped_type_same< decltype(w0), int >();
assert( w0.t == 42 );
// basic test from lvalue: copy makes and drops const qualification
const std::string hello("world!");
auto w1 = make( hello );
assert_wrapped_type_same< decltype(w1), std::string >();
assert( w1.t == "world!" );
// test decay from array type.
auto w2 = make( "xyzzy" );
assert_wrapped_type_same< decltype(w2), const char* >();
assert( std::string(w2.t, 5) == "xyzzy" );
// test moving a non-copyable object into makeor
std::unique_ptr upi( new long(120623) );
auto w3 = make( std::move(upi) );
assert_wrapped_type_same< decltype(w3), std::unique_ptr >();
assert( !upi );
assert( w3.t and *w3.t == 120623 );
// test use of reference wrappers to qualify deduced type; std::cref -> const T&
std::string s("must go faster");
auto w4 = make( std::cref(s) );
assert_wrapped_type_same< decltype(w4), const std::string& >();
assert( &w4.t == &s );
// and std::ref -> T&
auto w5 = make( std::ref(s) );
assert_wrapped_type_same< decltype(w5), std::string& >();
assert( &w5.t == &s );
// now a sample with a variadic template
auto tup0 = make( 42, "hello", std::string("world!"), std::unique_ptr(new long(120623) ), std::cref(s), std::ref(s) );
static_assert(
std::is_same< decltype(tup0), std::tuple<int, const char*, std::string, std::unique_ptr, const std::string&, std::string&> >::value,
"tuple type mismatch" );
assert( std::get(tup0) == 42 );
assert( std::get(tup0) == std::string("hello") );
assert( std::get(tup0) == std::string("world!") );
assert( std::get(tup0) and *std::get(tup0) == 120623 );
assert( &std::get(tup0) == &s );
assert( &std::get(tup0) == &s );
}
You can download the complete code sample here; I have tested it successfully with gcc 4.7.2, but it won’t build on 4.6.3.
I’d love to hear what you have to say on this – whether it’s useful to you, if you have any suggestions/corrections etc.
Update, 2013-03-28
Mike Spertus and Daveed Vandevoorde have recenly submitted a standards proposal that would enable deduction of class template parameters in some constructor calls: n3602. The approach is fundamentally different from the make-based approach, which is good as that has some significant issues (not least that it only works where the template type parameter list matches the constructor parameter list).
In their approach for each of the class template’s constructors the compiler will add a function template with the same parameter list to an overload set. The template parameters can then be deduced by performing overload resolution on that set via a synthesised (i.e. pretend) call.
Extending the empty base class optimisation (EBO) to members
The empty base class optimisation or EBO is a space-saving technique that is employed by many popular C++ compilers. Simply put by the standard (§1.8/5 in all versions):
…a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size.
This technique is employed in particular by some standard library implementation where a container template class’ constructor takes a (templated) allocator instance. Even if an allocator object is empty, to accept and store it as a distinct member sub-object requires that it occupy real space within the container class instance; this will likely be 1 byte (the minimum non-zero size) plus whatever padding is required for your platform’s alignment (likely a further 3 or 7 bytes for 32-bit or 64-bit respectively). By instead making the allocator type a base class of the container, this space is no longer required.
I recently found an alternative approach described by Nathan Myers in 1997, in which he wraps some other member object in a special-purpose struct, and then makes the empty type a base of that, thus still employing EBO. It’s a nifty little technique that, although rarely likely to be useful, will be of immense help should I ever have an issue with memory footprint of such a type.
All the detail can be read on Nathan’s website here: The “Empty Member” C++ Optimization.
Efficient argument passing in c++11 [via A Sense of Design]
Boris Kolpackov has written a series of 3 articles on the minefield of efficient argument passing in C++11.
In part 1 he discusses the addition to C++11 – rvalue references – that improves argument-passing efficiency in some cases, but that it adds complexity for the general case. There are some interesting points here, particularly that there is no 1-size-fits-all solution for the argument signature (const lvalue reference, rvalue reference, value).
I think for a while I’ve been vaguely aware that something didn’t feel quite right, since I’d been tailoring a few method signatures with the knowledge of whether or not the caller would be moving an object in, and whether I want to copy or just reference the argument. Clearly the loss of opacity, and fragility in the face of implementation change are undesirable in our code.
In part 2 Boris outlines a wrapper similar to std::reference_wrapper that can be constructed with either an rvalue reference or const lvalue reference. It’s rather a good wrapper, but impractical for use everywhere; more suitable for generic library code that needs to be as tight as possible.
Part 3 summarizes and appraises the argument reference wrapper approach, and makes some general conclusions about argument passing generally in C++11.
It’s a shame that this area is as un-clear cut as it seemed; whether a language change to mitigate this occurs, or is even possible, will remain to be seen. For now, following Boris’ suggestion to think about the conceptual requirements of each passed argument, and avoid premature optimisation, seems to be the best approach.
Update 2012/10/17
Sumant Tambe of C++ Truths recently gave a talk entitled “C++11 idioms” at Silicon Valley Code Camp, in which he discussed the above approach to efficient argument passing, plus pros/cons and alternative approaches.
Scott Meyers has also posted a response to Sumant’s presentation, building on it and adding his own wisdom, in particular:
The fundamental problem is that perfect forwarding and overloading make very bad bedfellows, because perfect forwarding functions want to take everything. They’re the greediest functions in C++.
There’s an interesting discussion with plenty of good stuff in the comments under that post too.
Why processing a sorted array might be faster than unsorted
I’ve just read this week’s stackoverflow newsletter, and was intrigued by a recent top question “Why is processing a sorted array faster than an unsorted array?”
Anyone writing performance-critical processing code should be aware of this; the C++ comparison showed that processing the sorted array was nearly 6x faster than with the unsorted, which is certainly an eyebrow-raising result. A straight Java equivalent showed a similar, albeit less extreme, difference.
The reason for this potentially surprising behaviour is that in the code with the unsorted array, the CPU’s branch predictor fails 50% of the time and stalls the pipeline. I’ll hand you over to GManNickG to show you the code and Mysticial to explain, rather well, what is going on: Why is processing a sorted array faster than an unsorted array?
This is something I was aware of but have never taken into account, so I’ll be trying to remember Mysticial’s general rule of thumb to “avoid data-dependent branching in critical loops“.
![Sutter's Mill [from Alena & Jim's 'The C++ Lands']](http://theotherbranch.files.wordpress.com/2012/12/cppmap-2012-mill1.png?w=595)





