Eric Niebler: out parameters, move semantics, and stateful algorithms
This post from Eric, inspired by a prompt from Andrei Alexcandrescu, examines the interface of std::getline appraising its usability and performance. Eric looks at a few different designs, in particular those that leverage the move semantics introduced in C++11.
This is a particularly good exploration of a small and simple API – almost too simple to bother with – and Eric’s range-based solution is far more natural and still efficient. The crux of his design approach stems from the insightful realisation that “
getline is a curious example because what looks at first blush like a pure out parameter is, in fact, an in/out parameter; on the way in,
getline uses the passed-in buffer’s capacity to make it more efficient. This puts
getline into a large class of algorithms that work better when they have a chance to cache or precompute something.“
The discussion in the comments below the post also merits reading, particularly Bartosz Milewski’s suggestion that
front be thought of as
take 1 (
take n xs is a function in Haskell that returns a list of the first
n elements of the list
I’ve just read this week’s stackoverflow newsletter, and was intrigued by a recent top question “Why is processing a sorted array faster than an unsorted array?”
Anyone writing performance-critical processing code should be aware of this; the C++ comparison showed that processing the sorted array was nearly 6x faster than with the unsorted, which is certainly an eyebrow-raising result. A straight Java equivalent showed a similar, albeit less extreme, difference.
The reason for this potentially surprising behaviour is that in the code with the unsorted array, the CPU’s branch predictor fails 50% of the time and stalls the pipeline. I’ll hand you over to GManNickG to show you the code and Mysticial to explain, rather well, what is going on: Why is processing a sorted array faster than an unsorted array?
This is something I was aware of but have never taken into account, so I’ll be trying to remember Mysticial’s general rule of thumb to “avoid data-dependent branching in critical loops“.