
Andrey Karpov

Dmitry Sviridkin

C++ programmer's guide to undefined behavior: part 7 of 11

This is the seventh part of an e-book on undefined behavior. It isn't a textbook: it's intended for those who are already familiar with C++ programming. Think of it as a C++ programmer's guide to undefined behavior and its most secret and exotic corners. The book was written by Dmitry Sviridkin and edited by Andrey Karpov.

Broken syntax and standard library: null-terminated strings

In the early '70s, Ken Thompson, Dennis Ritchie, and Brian Kernighan worked on the first versions of C and Unix. They made a decision that resonates with pain, suffering, bugs, and inefficiency today, 50 years later. They decided that developers were to write strings—variable-length data—in a sequence that terminated with a null character. Assembly has it, and C should have it too, if people call it "high-level assembly"! After all, the poor old PDP has limited memory: it's better to have one extra byte per string than 2, 4, or even all 8 bytes (depending on the platform) to store the size... Nah, it's better to have a byte at the end! But even other languages store a size, a reference, or a pointer to the data...

Let's look at where that got us.

String length

If we want to know how long the null-terminated string is, we have to traverse it and count the characters, so the complexity is linear.

const char* str = ...;
for (size_t i = 0 ; i < strlen(str); ++i) {
  ....
}

Even this seemingly simple loop takes a quadratic number of operations rather than a linear one, because strlen(str) is evaluated again on every iteration. It's a textbook example. A smart compiler is even known to move the string length computation out of the loop:

const char* str = ...;
const size_t len = strlen(str);
for (size_t i = 0 ; i < len; ++i) {
  ....
}

However, the example might be more challenging. One popular game centered on money, mafia, and car theft offers a curious example of using sscanf to parse a large array of numbers from the JSON string.

Someone reverse-engineered the final binary file and got that:

const char* config = ....;
size_t N = ....;

for (size_t i = 0; i < N; ++i) {
  int value = 0;
  size_t parsed = sscanf(config, "%d", &value);
  if (parsed > 0) {
    config += parsed;
  }
}

What a marvelous loop! The body is executed N times, but in most implementations of the C standard library, every sscanf call costs strlen(config) operations. After all, sscanf has to compute the string length so that it doesn't accidentally overrun the buffer! And since the string is null-terminated, the only way to find its length is to scan it to the end.
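For comparison, here is a minimal sketch of a parsing loop that never rescans the string. It assumes the numbers are separated by whitespace (so it's not real JSON), and parse_numbers is a hypothetical helper, not the code from the game. std::strtol reports where it stopped through its end pointer, so the next call resumes from that position:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: reads up to max_count integers separated by
// whitespace. The input is never rescanned from the beginning.
std::vector<long> parse_numbers(const char* text, std::size_t max_count) {
  assert(text != nullptr);
  std::vector<long> values;
  const char* cursor = text;
  while (values.size() < max_count) {
    char* end = nullptr;
    const long value = std::strtol(cursor, &end, 10);
    if (end == cursor) {
      break;  // nothing was parsed: end of input or not a number
    }
    values.push_back(value);
    cursor = end;  // resume right after the parsed number
  }
  return values;
}
```

A call such as parse_numbers("10 20 -3", 8) walks the input once, no matter how long it is.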

String length computation is a ubiquitous operation that should be optimized first—it's more efficient to evaluate it once and then store it with the string... Then why is the null terminator needed? That's just a superfluous byte in memory!

C++ and std::string

C++ is a high-level language! More high-level than C, definitely. Having learned from C's mistake, the standard strings are now stored as a size plus a pointer to the data. Yay!

But it's not exactly a "yay". After all, many C libraries are still here. Most of them have null-terminated strings in their interfaces. Therefore, std::string is also inevitably null-terminated. Congratulations! We're storing an extra byte just for compatibility reasons. We also store it implicitly: std::string::capacity() is always one less than the allocated memory block.

C++ and std::string_view

"Use std::string_view in your APIs! Save developers from writing overloads for const char* and const std::string&! Avoid redundant copying!"

Yeah, sure, std::string_view is also a pointer plus a string length. Unlike std::string, it doesn't necessarily point to a null-terminated string. Great news! We can use std::vector and not store an extra byte!

But beware, if behind the curtain of our handy API using std::string_view, there's a call to some C library that needs the null-terminated string...

// The tiny little program cheerfully and merrily outputs

// Hello
// Hello World

// Whether you like it or not.

void print_me(std::string_view s) {
  printf("%s\n", s.data());
}

int main() {
  std::string_view hello = "Hello World";
  std::string_view sub = hello.substr(0, 5);
  std::cout << sub << "\n";
  print_me(sub);
}

Let's tweak the arguments:

// Now the little program will cheerfully and merrily output

// Hello
// Hello Worldnext (or it just crashes with a segfault)

// Whether you like it or not.

void print_me(std::string_view s) {
  printf("%s\n", s.data());
}

int main() {
  char next[] = {'n','e','x','t'};
  char hello[] = {'H','e','l','l','o', ' ',
                  'W','o','r','l','d'};
  std::string_view sub(hello, 5);
  std::cout << sub << "\n";
  print_me(sub);
}

The function hasn't changed. However, when we pass in other parameters, everything crashes! It's just a print. With another function, something completely out of whack can happen if it goes outside the range defined by std::string_view.

What shall we do?

We need to guarantee the null-termination. To do that, we have to copy the string... But we deliberately used std::string_view in the API to avoid copying!

Alas. If we encounter old C-based APIs, we face a choice when wrapping them: either write two implementations (one with a raw char* and the other with const std::string&) or accept copying the string at some point in the process.
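The "accept copying" option can be sketched like this. Here, legacy_length is only a stand-in for some C function that expects a null-terminated string, and safe_length is a hypothetical wrapper. Neither comes from a real library:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Stand-in for a legacy C function that expects a null-terminated string.
std::size_t legacy_length(const char* text) {
  assert(text != nullptr);
  return std::strlen(text);
}

// Wrapper with a std::string_view interface. Copying into std::string
// guarantees the terminator, at the cost of a copy (and possibly an
// allocation) on every call.
std::size_t safe_length(std::string_view text) {
  const std::string copy(text);
  return legacy_length(copy.c_str());
}
```

With this wrapper, a view over the first five characters of "Hello World" is seen by the C function as exactly five characters, because the copy ends where the view ends.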

How to tackle it?

The answer is: we can't.

Null-terminated strings carry inefficiencies and error-prone behavior that we may never fully get rid of. All we can do is not breed more evil: in new C libraries, we can design APIs that take a pointer plus a length, not just a pointer to a null-terminated sequence.

Programs across all languages that are forced to interact with the C-based API suffer from this legacy. For instance, Rust uses distinct types, CStr and CString, for such strings, resulting in cumbersome conversions whenever transitioning from regular code.

Developers use terminators not only for text strings. For example, the SRILM library relies heavily on null-terminated sequences of numeric IDs, leading to additional issues. The exec functions in Linux accept null-terminated sequences of pointers. EGL takes attribute lists that end with a sentinel value (EGL_NONE). And so on.

We shouldn't create awkward and error-prone APIs without a compelling reason. Saving space on a pointer-sized parameter in a function rarely justifies the potential pitfalls.

Broken syntax and standard library: std::shared_ptr constructors

More than 10 years have passed since C++11 was introduced, so most C++ developers are already familiar with smart pointers. Just give them the raw pointer and sleep well—memory will be released. And it's okay.

They even know the difference between std::unique_ptr and std::shared_ptr. A couple of years ago, I interviewed a candidate who didn't know the distinction because he hadn't used STL...

The non-copyable, unique-owning std::unique_ptr just stores the pointer (and possibly the deleter function too) and applies the deleter to the stored pointer in its destructor. By contrast, std::shared_ptr is trickier and needs a reference counter to maintain shared ownership between copies. Everyone knows that.

Let's just use them and not overthink it.

It's surprising, but in real C++ cases, it's quite common to define an entity that should never be created on the stack. It must reside in the heap.

The simplest example is a thread-safe object protected internally by the mutex/atomic variables. Ideally, we'd like to freely move this object between containers and the stack. However, std::mutex and std::atomic have no move constructors. We have two options in this case:

class MyComponent1 {
  ComponentData data_;
  // Make a non-movable data member movable by adding
  // an indirection layer and sending data to the heap.
  std::unique_ptr<std::mutex> data_mutex_;
};

// Somehow force class users to create
// objects only on the heap and work with
// std::unique_ptr<MyComponent2> or
// std::shared_ptr<MyComponent2>
class MyComponent2 {
  ComponentData data_;
  std::mutex data_mutex_;
};

Developers usually prefer the second option, since there are usually more mutex accesses within MyComponent2 than object address loads. So, we'll explore that option further.

Since we've been talking about thread-safe objects, it makes sense to keep managing the object lifetime via std::shared_ptr.

To restrict where objects can be created, it's usually enough to make constructors private and use a special factory function for object creation.

class MyComponent {
public:
  static auto make(Arg1 arg1, Arg2 arg2) ->
    std::shared_ptr<MyComponent>
  {
    // ???
  }

  // Ban copy and move constructors
  // to avoid accidentally copying and moving
  // the object data into a stack instance.
  MyComponent(const MyComponent&) = delete;
  MyComponent(MyComponent&&) = delete;
  // Ban those bros too, that's optional.
  MyComponent& operator = (const MyComponent&) = delete;
  MyComponent& operator = (MyComponent&&) = delete;

private:
  MyComponent(Arg1, Arg2) { ... };
  ....
};

Let's go inside the make factory member function. This is where we often discover that experienced C++ developers may not actually be that experienced. But it doesn't bother them in any way. It rarely bothers anyone at all.

We may write the function like this:

auto MyComponent::make(Arg1 arg1, Arg2 arg2) ->
  std::shared_ptr<MyComponent>
{
  return std::make_shared<MyComponent>(std::move(arg1),
                                       std::move(arg2));
}

We're immediately faced with fifty lines of errors because std::make_shared can't call our private constructor!

No problem! Our C++ developer quickly fixes the error.

auto MyComponent::make(Arg1 arg1, Arg2 arg2) ->
  std::shared_ptr<MyComponent>
{
  return
    std::shared_ptr<MyComponent>(
      new MyComponent(std::move(arg1), std::move(arg2)));
}

The code compiles and works. Can we go?

It really does work. But these two approaches handle memory in different ways! In many cases, this is not a big deal, but when many objects are created this way, the difference becomes noticeable.

As we know, std::shared_ptr counts both strong and weak (std::weak_ptr) references. For that, it needs to allocate a small memory block for a couple of (atomic) size_t counters and maybe something else. This block is called a control block.

When we use std::make_shared, the control block is allocated next to the object being created. So, a single chunk of memory is allocated for at least sizeof(MyComponent) + 2 * sizeof(size_t). The standard suggests this behavior, but it's not mandatory. That said, all the known implementations follow the recommendation.

When we call the std::shared_ptr constructor from a raw pointer, the object is already created, and we can't cram the control block next to it. Therefore, an additional 2 * sizeof(size_t) of memory will be allocated but somewhere else. This is where the allocator implementations and alignment dances kick the door in. In reality, we end up allocating more than just sizeof(MyComponent) + 2 * sizeof(size_t), and if it's a direct constructor call from the raw pointer, we allocate much more. When the control block is placed near the data, the data locality can become relevant, and you may see some cache benefits—especially if the object is small.

What if it's large?

Let's say it is: we create the object via std::make_shared and then spawn several std::weak_ptr instances. As a result, we can bump into something that looks like a memory leak, even though the objects are properly destroyed and their destructors are called. We've seen that in the log!

Once again, the control block. If we have the live std::weak_ptr instances bound to the already destroyed std::shared_ptr, the control block stays alive. Well, so we can call std::weak_ptr::expired(), and it'd return us true. But if the control block is in the same memory chunk as the destroyed object—which happens when we create via std::make_shared—the object memory chunk won't be returned to the operating system until the control block dies! Here they are—our memory leaks.
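A minimal sketch of this situation, with a hypothetical 1 MiB type called Big. The code can only show that the object is gone while the observer is still usable. Whether the 1 MiB chunk is really held back depends on the standard library implementation, as discussed above:

```cpp
#include <cassert>
#include <memory>

struct Big {
  char payload[1 << 20];  // 1 MiB
};

// Returns a weak_ptr whose owner has already been destroyed.
std::weak_ptr<Big> make_orphaned_observer() {
  auto owner = std::make_shared<Big>();
  std::weak_ptr<Big> observer = owner;
  assert(!observer.expired());
  return observer;  // 'owner' dies at the end of this function
}
```

While the returned std::weak_ptr is alive, expired() returns true and lock() returns an empty pointer, yet the control block (and, with std::make_shared, typically the whole chunk around it) has to stay allocated.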

There is also a difference in how operator new is called: std::make_shared always uses the global one. If we overload operator new for the type, the behavior may not be what we expect.
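To see the operator new difference in action, here is a small sketch with a hypothetical Tracked type that counts calls to its class-specific operator new. The counter and both helper functions are made up for the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

struct Tracked {
  static inline int class_new_calls = 0;  // C++17 inline variable

  static void* operator new(std::size_t size) {
    ++class_new_calls;
    return ::operator new(size);
  }
  static void operator delete(void* ptr) { ::operator delete(ptr); }
};

// How many times Tracked::operator new runs when we use a raw 'new'.
int calls_with_raw_new() {
  const int before = Tracked::class_new_calls;
  std::shared_ptr<Tracked> p(new Tracked);
  assert(p != nullptr);
  return Tracked::class_new_calls - before;
}

// The same measurement for std::make_shared.
int calls_with_make_shared() {
  const int before = Tracked::class_new_calls;
  auto p = std::make_shared<Tracked>();
  assert(p != nullptr);
  return Tracked::class_new_calls - before;
}
```

The first function goes through Tracked::operator new once, while the second bypasses it completely, so any bookkeeping hidden in the overloaded operator is silently skipped.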

It's bad as usual

So, what shall we do if we need to allocate our object and potentially save resources? Is there a solution?

Indeed! In C++, we can always find some sinister solution, sometimes even without undefined behavior. This is one of those cases.

We have the access token technique that will help us.

The idea is to give std::make_shared a public constructor that can be called only with an instance of a private type (the access token).

class MyComponent {
private:
  // access token
  struct private_ctor_token {
    // only 'MyComponent' can create them
    friend class MyComponent;
    private:
      private_ctor_token() = default;
  };
public:
  static auto
  make(Arg1 arg1, Arg2 arg2) -> std::shared_ptr<MyComponent> {
    return
      std::make_shared<MyComponent>(
        private_ctor_token{}, std::move(arg1), std::move(arg2));
  }


  // This constructor is effectively private,
  // even though it's in the public block. It can only be called with
  // the private token.
  MyComponent(private_ctor_token, Arg1, Arg2) { .... };
  ....
};

It works.

It's worth noting that the token constructor must be explicitly private. Otherwise, we can easily work around our private-type security system like this:

int main() {
  MyComponent c({}, // Create the private token without naming it!
                    // We don't have access only to the name.
                {}, {});
}
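One more caveat, as far as we can tell: before C++20, a class whose only constructor is defaulted on its first declaration is still an aggregate, so {} may create the token through aggregate initialization even if that constructor is private. The sketch below plays it safe and gives the token a user-provided constructor. The int and std::string parameters are placeholders for Arg1 and Arg2, and the accessors are added only to make the example observable:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

class MyComponent {
  // Access token: only MyComponent can create it.
  class private_ctor_token {
    friend class MyComponent;
    // User-provided (not '= default'), so the token is never an aggregate.
    private_ctor_token() {}
  };

public:
  static std::shared_ptr<MyComponent> make(int id, std::string name) {
    return std::make_shared<MyComponent>(private_ctor_token{}, id,
                                         std::move(name));
  }

  MyComponent(private_ctor_token, int id, std::string name)
      : id_(id), name_(std::move(name)) {}

  MyComponent(const MyComponent&) = delete;
  MyComponent& operator=(const MyComponent&) = delete;

  int id() const { return id_; }
  const std::string& name() const { return name_; }

private:
  int id_;
  std::string name_;
};

// Without a token, the (int, std::string) pair alone can't construct it.
static_assert(!std::is_constructible<MyComponent, int, std::string>::value,
              "MyComponent must only be created through make()");
```

MyComponent::make(7, "worker") still gets the single allocation from std::make_shared, while the MyComponent c({}, {}, {}) trick from above no longer compiles.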


Broken syntax and standard library: std::aligned_storage

There are two types of C++ developers:

1. Those who write like this:

char buff[sizeof(T)];
....
T* obj = new (buff) T(...);

It works, no problems here.

2. Those who write the same code and end up with a SIGSEGV, SIGBUS, or something even more interesting.

The first type often builds their programs only for the x86 architecture. They may use a prehistoric compiler without SIMD instructions and with the "aggressive" optimizations turned off, just to make sure nothing breaks.

Let's not forget about the C developers. For them, the same code would look like this:

char buff[sizeof(T)];
....
T* obj = (T*)buff;

That's even scarier because of uninitialized memory, but that's an issue for another time.

The buffer we're using here may not be aligned as required by the T type.

Many folks learn and recall pretty quickly that the size of data written to the buffer shouldn't be larger than the buffer size—at least out of common sense. However, the alignment is tricky.

To help a soulless machine successfully read or write T-typed data at the address stored in a T* ptr, or even perform a tricky operation on that data, the address must be aligned: it should be a multiple of some number N (usually a power of two). Let's thank the engineers and developers of the machine microarchitecture for the alignment that they impose upon us. Why? Because:

  • It's supposed to be more efficient;
  • There was no other choice;
  • They needed to save resources on instruction length (the larger the alignment, the more least significant bits of the address can be omitted);
  • And, if we've ever designed a set of instructions and know more nuances, we can come up with any other reason for it.

Back to C++. We usually know the alignment for built-in types:

Type       Alignment (bytes)
char       1
int16      2
int32      4
int64      8
__m128i    16

Indeed, we know... For example, the double type is quite standard and occupies 8 bytes on both 32-bit and 64-bit Linux systems. However, unexpectedly, the alignment can be different. On a 32-bit system, double is aligned to the 4-byte boundary. On a 64-bit system, it's aligned to the 8-byte boundary. That's life.

For other types, especially those intended for SSE/SIMD instructions, the alignment requirement may be larger.

For custom structures and classes, the largest alignment of all data members is applied. Then, the implicit padding bytes appear between the data members to accommodate the alignment requirements of each individual data member.

struct S {
  char c;    // 1
  // implicitly char _padding[3] 
  int32_t i; // 4
};

static_assert(alignof(S) == 4);

The array alignment follows the alignment of its elements.
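We can ask the compiler to confirm these rules. The sketch below repeats the S structure and adds a hypothetical helper, padding_before_i, that computes how many padding bytes ended up between the two data members. The concrete numbers assume a mainstream platform where alignof(std::int32_t) is 4:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct S {
  char c;          // offset 0, alignment 1
  std::int32_t i;  // alignment 4 on mainstream platforms
};

// Number of padding bytes the compiler inserted between 'c' and 'i'.
constexpr std::size_t padding_before_i() {
  return offsetof(S, i) - (offsetof(S, c) + sizeof(char));
}

// The structure takes the strictest alignment among its data members,
// and an array of S is aligned exactly like a single S.
static_assert(alignof(S) == alignof(std::int32_t), "largest member wins");
static_assert(alignof(S[4]) == alignof(S), "arrays follow their elements");
```

On such a platform, padding_before_i() evaluates to 3, which matches the implicit char _padding[3] in the comment above.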

Hence:

char buff[sizeof(T)]; // alignment == 1
...
T* obj = new (buff) T(...); // (uintptr_t)(obj) shall be a
// multiple of 'alignof(T)', but the code only guarantees
// that it's a multiple of 1

For built-in types on x86, accessing by unaligned pointers often results in a slowdown. However, on other platforms, this may lead to a segfault. Tools like the PVS-Studio analyzer can help identify such issues with warnings like V1032: "Pointer is cast to a more strictly aligned pointer type."
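When in doubt, the "multiple of alignof(T)" condition can be checked directly. This is a minimal sketch: is_aligned, Wide, and aligned_buffer are hypothetical names, and the check relies on std::uintptr_t being available, which it is on all mainstream platforms:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the address in 'ptr' is a multiple of 'alignment',
// which must be a power of two.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// A type with a strict alignment requirement, similar to SSE types.
struct alignas(16) Wide {
  unsigned char bytes[16];
};

// A byte buffer that explicitly borrows the alignment of Wide.
alignas(Wide) unsigned char aligned_buffer[sizeof(Wide) * 2];
```

A check like assert(is_aligned(buff, alignof(T))) before a placement new turns a rare, platform-dependent crash into an immediate and readable failure in debug builds.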

When dealing with the SSE types and on x86, we can easily run into a segfault, and do it quite elegantly:

#include <memory>
#include <xmmintrin.h>

const size_t head = 0;
struct StaticStorage {
  char buffer[256];
} storage;

int main() {
  __m128i* a = new (storage.buffer) __m128i();
  // comment line above & uncomment any line below for segfault
  // __m128i* b = new (storage.buffer + 0) __m128i();
  // __m128i* c = new (storage.buffer + head) __m128i();
}

What shall we do?! How can we write the code without encountering this curious undefined behavior? Don't worry! C++11 is on its way to the rescue!

The standard provides the alignas specifier. We can explicitly set the alignment requirements when defining variables and structures.

#include <memory>
#include <xmmintrin.h>

const size_t head = 0;
struct StaticStorage {
  alignas(__m128i) char buffer[256];
} storage;

int main() {
  __m128i* a = new (storage.buffer) __m128i();
  __m128i* b = new (storage.buffer + 0) __m128i();
  __m128i* c = new (storage.buffer + head) __m128i();
}

And then, the program doesn't crash.

But the code still feels a bit bulky. Isn't there any convenient function in the C++ standard library for such an important task as creating a buffer of suitable size and alignment?

Of course, there is!

Welcome std::aligned_storage and its big bro, std::aligned_union:

template<std::size_t Len,
         std::size_t Align =
           /* default alignment not implemented */
        >
struct aligned_storage
{
  struct type
  {
    alignas(Align) unsigned char data[Len];
  };
};

template <std::size_t Len, class... Types>
struct aligned_union
{
  static constexpr std::size_t alignment_value =
                                 std::max({alignof(Types)...});
 
  struct type
  {
    alignas(alignment_value)
      char _s[std::max({Len, sizeof(Types)...})];
  };
};

It's almost the same as the example above! The first one is quite low-level, it needs the alignment value to be set manually. The second one is smarter, as it selects the proper value from the list of types. It'll also adjust the buffer size if we set the wrong one. What a handy metafunction!

Let's use it:

#include <memory>
#include <xmmintrin.h>
#include <type_traits>

const size_t head = 0;
std::aligned_union<256, __m128i> storage;

int main() {
  __m128i* a = new (&storage) __m128i();
  __m128i* b = new ((char*)(&storage) + 0) __m128i();
  __m128i* c = new ((char*)&storage + head) __m128i();
}

And it immediately crashes: SIGSEGV.

But how?! We've done everything right, right?...

Let's check it.

static_assert(sizeof(storage) >= 256);
<source>:9:1: error:
static assertion failed due to requirement
'sizeof (storage) >= 256'
static_assert(sizeof(storage) >= 256);
^             ~~~~~~~~~~~~~~~~~~~~~~
<source>:9:31: note: expression evaluates to '1 >= 256'
static_assert(sizeof(storage) >= 256);

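Wonderful. If we look closely at the definitions of the std::aligned_storage and std::aligned_union templates above, we'll notice the prank: the buffer lives in the nested type member, not in the template itself. The template on its own is just an empty struct, which is what makes it so error-prone.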

We need to use typename std::aligned_union<256, __m128i>::type storage!

Or, in C++17, std::aligned_union_t<256, __m128i> storage.

Everything works now. It's only a two-character difference, yet the consequences are significant.

At the time of writing, GCC 14.1 can issue warnings out of the box:

<source>:12:23: warning:
placement new constructing an object of type '__m128i' and
size '16' in a region of type
'std::aligned_union<256, __vector(2) long long int>' and
size '1' [-Wplacement-new=]
   12 |     __m128i* a = new (&storage) __m128i();

These warnings suggest an error here.

The Clang 18.1 compiler doesn't report this by default.

Developers consider the std::aligned_* templates dangerous to use: because of their poor design, errors can easily go unnoticed.

C++23 marked them as deprecated. But who knows when we'll see C++23 in larger and more legacy codebases....

If you use std::aligned_* in your code, make sure you use it correctly. Better yet, replace it with your own structure that explicitly uses alignas.
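Such a structure can be tiny. Below is a minimal sketch: Storage and roundtrip_length are hypothetical names, and std::string is used only as a convenient non-trivial type to put into the buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical replacement for std::aligned_storage: the buffer is a
// direct data member, so there is no '::type' to forget.
template <class T>
struct Storage {
  alignas(T) unsigned char bytes[sizeof(T)];
};

static_assert(sizeof(Storage<std::string>) >= sizeof(std::string), "size");
static_assert(alignof(Storage<std::string>) == alignof(std::string), "align");

// Constructs a string in the storage, reads its size, and destroys it.
inline std::size_t roundtrip_length(const char* text) {
  assert(text != nullptr);
  Storage<std::string> storage;
  std::string* s = new (storage.bytes) std::string(text);
  const std::size_t length = s->size();
  std::destroy_at(s);  // placement new needs an explicit destructor call
  return length;
}
```

Here, forgetting a suffix is impossible, and the static_assert lines catch a wrong size or alignment at compile time instead of at run time.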


Broken syntax and standard library: (im)explicit type conversion

As a software engineer ran their eyes over the metrics graphs displaying the number of pending requests, they noticed that something was wrong: no pulse, just a straight horizontal line. Looking at the Y axis, the engineer saw that the graph had frozen at 18446744073709552000. Such a staggering scale of traffic would make any large company envious. But, of course, it wasn't traffic. It was a metrics error, or rather two of them. To the delight of all C++ fans, it turned out this code was written in Rust.

The first error stemmed from sloppily written code for counting requests in the queue. Instead of assigning this task to a constructor/destructor pair to ensure exactly one increment at the beginning and one decrement at the end, a programmer opted for the old-fashioned way, performing decrements in different branches of different nested conditional statements. As a result, the subtraction could occasionally occur twice, leading to an overflow in the unsigned counter. A more detailed analysis revealed more serious problems, but they aren't really relevant to the discussion.

An overflow has occurred. Okay. The counter seems to be 64-bit given the large number. But wait. The value of unsigned -1 in uint64 is 18446744073709551615. The number the engineer observed is a little higher...

The code that generates the metrics is as follows:

metrics.set(Metric::InflightRequests, counter as _);

Obviously, type casting is involved here. The second argument of the set function expects the f64 type.

A curious reader can refer to the IEEE 754 standard to unravel the mystery of these numbers. For those less inclined to investigate, let me just state that f64(u64(-1)) == f64(u64(-1024)).
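The same effect is easy to reproduce in C++. A double has a 53-bit significand, so just below 2^64 the representable values are 2048 apart, and everything from 2^64 - 1024 up to 2^64 - 1 rounds to 2^64 under the default rounding mode. The helper names in this sketch are made up. The value 18446744073709552000 on the graph is presumably just a shortened decimal rendering of 2^64:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converts an unsigned 64-bit counter to double, as the metrics code did.
inline double to_metric_value(std::uint64_t counter) {
  return static_cast<double>(counter);
}

// An unsigned counter decremented below zero wraps around to the maximum.
inline std::uint64_t decrement(std::uint64_t counter) { return counter - 1; }
```

With these helpers, to_metric_value(decrement(0)) compares equal to 18446744073709551616.0, that is, to 2^64 rather than to 2^64 - 1.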

counter as _

Here's an explicit conversion to something obscure, recognizable in context yet unclear upon first reading. It's a useful but dubious Rust feature.

Now let's return to C++. In C++, trivial type conversions not only occur implicitly but can also lead to undefined behavior. Therefore, we need to treat the issue very seriously.

As library authors, we want to make our library as robust, foolproof, and user-friendly as possible, letting the compiler guide the user toward correct usage.

Considering the Rust code fiasco, we immediately decide to use strong typedefs (in the Rust case, they would have helped too).

// Here, we write a lengthy and detailed comment,
// explaining that values are of type 'double (f64)' 
// because it's mandatory.
// The user should know about the associated constraints.
// And all the other relevant details...
struct MetricSample {
  // To avoid implicit conversions, we immediately
  // added 'explicit', as all the best practices recommend.
  explicit MetricSample(double val): value {val} {}
private:
  double value;
};

class Metrics {
public:
  // Great, now the user has no other choice but to explicitly
  // convert to 'MetricSample',
  // and at that point, they can check the documentation...
  void set(std::string_view metric_name, MetricSample val);
};

// Writing a UX test.
int main() {
  uint64_t value = -1;
  Metrics m;
  m.set("InflightRequests", value);
  m.set("InflightRequests" MetricSample{value});
}

And, just as we wanted, it doesn't compile.

<source>:23:31: error:
no viable conversion from 'uint64_t' (aka 'unsigned long') to
'MetricSample'
   23 |     m.set("InflightRequests", value);
      |                               ^~~~~
<source>:4:8: note: candidate constructor (the implicit copy constructor)
not viable: no known conversion from 'uint64_t' (aka 'unsigned long')
to 'const MetricSample &' for 1st argument
    4 | struct MetricSample{
      |        ^~~~~~~~~~~~
<source>:4:8: note: candidate constructor (the implicit move constructor)
not viable: no known conversion from 'uint64_t' (aka 'unsigned long')
to 'MetricSample &&' for 1st argument
    4 | struct MetricSample{
      |        ^~~~~~~~~~~~
<source>:7:14: note: explicit constructor is not a candidate
    7 |     explicit MetricSample(double val): value {val} {}
      |              ^
<source>:16:57: note: passing argument to parameter 'val' here
   16 |     void set(std::string_view metric_name, MetricSample val);
      |                                                         ^
<source>:24:30: error: expected ')'
   24 |     m.set("InflightRequests" MetricSample{value});
      |                              ^
<source>:24:10: note: to match this '('
   24 |     m.set("InflightRequests" MetricSample{value});
      |          ^

That's great. Case solved. Let's release it.

A week later, an experienced user comes back to us, saying they've managed to shoot themselves in the foot with our library.

Your protection against implicit type conversion didn't account for the experienced fool:

int main() {
  uint64_t value = -1;
  Metrics m;
  m.set("InflightRequests", MetricSample(value));
}

And it compiles.

We've been enjoying all the benefits of modern and safe C++ (thanks to the C++ Core Guidelines) for quite some time now. Suddenly, we remember that damned difference between parentheses and curly braces when calling constructors. We clutch our heads and start pondering how to save our users.

We have a solution! Thank you, C++20:

#include <concepts>

struct MetricSample {
  // Now only 'double' can be passed. 
  // No implicit conversions since it's a template.
  explicit MetricSample(std::same_as<double> auto val) :
    value {val} {}
private:
  double value;
};

int main() {
  uint64_t value = -1;
  Metrics m;
  m.set("InflightRequests", MetricSample(value));
  m.set("InflightRequests", MetricSample{value});
  m.set("InflightRequests", value);
}

Nothing compiles now.

Is that it? No, wait a minute. It's C++, and not everyone has C++20! Here's the version for C++14 and C++17. We can even make one for C++11 (consider it a homework assignment):

#include <type_traits>

struct MetricSample{
  // Now only 'double' can be passed. 
  // No implicit conversions since it's a template.
  template <typename Double, 
            typename = std::enable_if_t<
              std::is_same<Double, double>::value>
            >
  explicit MetricSample(Double val): value {val} {}
private:
  double value;
};

I hope this will convince people to move to C++20 and higher versions.
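Whichever version we pick, we don't have to wait for a bug report to find out whether the constraint works. A minimal sketch of a compile-time check for the pre-C++20 variant follows. The get() accessor is added only for the example. std::is_constructible models initialization with parentheses, which is exactly the MetricSample(value) form that slipped through earlier:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

struct MetricSample {
  template <typename Double,
            typename = std::enable_if_t<std::is_same<Double, double>::value>>
  explicit MetricSample(Double val) : value{val} {}

  double get() const { return value; }  // hypothetical accessor

private:
  double value;
};

// Only an exact 'double' is accepted, nothing convertible to it.
static_assert(std::is_constructible<MetricSample, double>::value, "");
static_assert(!std::is_constructible<MetricSample, std::uint64_t>::value, "");
static_assert(!std::is_constructible<MetricSample, float>::value, "");
static_assert(!std::is_constructible<MetricSample, int>::value, "");
```

If someone later relaxes the constructor, these static_assert lines fail at compile time, so the guarantee is enforced by the build rather than by documentation.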

Time passes. Our library is on a roll. Sooner or later, a user comes to us and says, "I'd like to add a comment to the metric value."

No problem. We decide to add the set function overload with the third parameter—a string.

class Metrics {
public:
  // To make it clear to the user that a string will be
  // allocated, and to avoid redundant implicit copies,
  // we take an rvalue reference.
  // After all, this is a great way to show that
  // the interface intends to take ownership of the string.
  // The user has to perform the move explicitly.
  void set(std::string_view metric_name, MetricSample val,
           std::string&& comment);
};

int main() {
  Metrics m;
  auto val = MetricSample(1.0);
  std::string comment = "comment";
  m.set("MName", val, comment); // doesn't compile, as intended
  m.set("MName", val, "comment"); // doubtful, but OK for convenience
  m.set("MName", val, std::move(comment));
  m.set("MName", val,
        std::string_view("comment")); // doesn't compile, OK
  auto gen_comment = []()->std::string { return "comment"; };
  m.set("MName", val, gen_comment()); // nice
}

Everything's fine, so we release it. A few days later, a user comes to us and says they've shot themselves in the foot with our library again. And then shows THIS:

int main() {
  Metrics m;
  auto val = MetricSample(1.0);
  m.set("Metric", val, 0);
}

Output:
terminate called after throwing an instance
of 'std::logic_error' what():
basic_string: construction from null is not valid
Program terminated with signal: SIGSEGV

At this point, we find ourselves cursing the std::string class, the implicit interpretation of 0 as a pointer, and a user who still hasn't bothered to read the documentation or even the code they wrote. We fight the impulse to write our own string class and start thinking about how to mitigate this issue as well.

Different options arise here. We can allow only the rvalue string:

class Metrics {
public:
  // only the 'rvalue' references for string. No implicit type conversion
  void set(std::string_view metric_name, 
           MetricSample val, 
           std::same_as<std::string> auto&& comment) {};
};


int main() {
  Metrics m;
  auto val = MetricSample(1.0);
  std::string comm = "comment";
  m.set("Metric", val, comm); // doesn't compile
  m.set("Metric", val, 0); // doesn't compile
  m.set("Metric", val, std::move(comm)); // compiles, as intended
  m.set("MName", val,
        std::string_view("comment")); // doesn't compile, okay
  auto gen_comment = []() -> std::string { return "comment"; };
  m.set("MName", val, gen_comment()); // nice
}
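For those stuck before C++20, the same idea can be sketched with std::enable_if. To keep the example short, Metrics here takes a plain double instead of MetricSample and remembers the last comment. The can_set detector and the set_and_read helper are hypothetical, added only to verify the behavior:

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

class Metrics {
public:
  // S is deduced as std::string only for rvalue arguments. For an lvalue
  // it would be std::string&, which the constraint rejects.
  template <typename S,
            typename = std::enable_if_t<std::is_same<S, std::string>::value>>
  void set(std::string_view metric_name, double value, S&& comment) {
    last_name_ = std::string(metric_name);
    last_value_ = value;
    last_comment_ = std::move(comment);
  }

  const std::string& last_comment() const { return last_comment_; }

private:
  std::string last_name_;
  double last_value_ = 0.0;
  std::string last_comment_;
};

// Detects whether m.set(name, value, Arg) would compile.
template <typename Arg, typename = void>
struct can_set : std::false_type {};

template <typename Arg>
struct can_set<Arg, std::void_t<decltype(std::declval<Metrics&>().set(
                        std::string_view{}, 1.0, std::declval<Arg>()))>>
    : std::true_type {};

static_assert(can_set<std::string>::value, "rvalue string is accepted");
static_assert(!can_set<std::string&>::value, "lvalue string is rejected");
static_assert(!can_set<const char*>::value, "C strings are rejected");
static_assert(!can_set<int>::value, "so is a literal 0, deduced as int");

// Moves a comment into a fresh Metrics object and reads it back.
inline std::string set_and_read(std::string comment) {
  Metrics m;
  m.set("Metric", 1.0, std::move(comment));
  return m.last_comment();
}
```

Since the third parameter is a template, a literal 0 is deduced as int and never gets the chance to turn into a null const char*.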

However, we already live in a doomed world: we allow the use of string literals directly. If we disable them, users will be upset because their code will fail to compile. So, we'll have to add them as well. To prevent users from inserting null pointers and to ensure that they only use string literals, there's a great solution: array references! Indeed, string literals are arrays...

class Metrics {
public:
  // The interface only allows string literals and explicit ownership 
  // transfer of the string.
  void set(std::string_view metric_name, MetricSample val,
           std::same_as<std::string> auto&& comment) {};
  template <size_t N>
  void set(std::string_view metric_name, MetricSample val,
           const char(&comment)[N]) requires (N > 0) {
    this->set(metric_name, val, std::string(comment, N - 1));
  }
};


int main() {
  Metrics m;
  auto val = MetricSample(1.0);
  std::string comm = "comment";
  const char* null_comment = 0;
  m.set("Metric", val, "comment"); // "ok"
  m.set("Metric", val, null_comment); // doesn't compile
  m.set("Metric", val, comm); // doesn't compile
  m.set("Metric", val, 0); // doesn't compile
  m.set("Metric", val, std::move(comm)); // compiles, as intended
  m.set("MName", val,
        std::string_view("comment")); // doesn't compile, okay
  auto gen_comment = []()->std::string { return "comment"; };
  m.set("MName", val, gen_comment()); // works fine
}

Everything's great. Yay! Let's release it!

At some point, a particularly cunning user will construct a crooked array and pass it instead of a string literal... But at that point, the only option is to ignore it: any attempt to express such a restriction in C++ while still accepting literals in the user's code could drive anyone mad.

One last thing: C++23 has a new use for the auto keyword.

void call_it(auto&& obj) {
  call_impl(auto(obj));
}

I've seen developers who work a lot with both Rust and C++. They interpret this as converting obj to a specific type defined as an argument for call_impl. Just like as _ or a call to Into::into() in Rust. That would seem logical...

However, it's a completely different feature. The C++ compilers don't perform type inference as easily. In this position, we need auto to create copies without having the type name at hand.
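For comparison, here is a sketch of how the same copy can be spelled before C++23, when the type has to be named explicitly. The consumer call_impl and its behavior are assumptions made up for the illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical consumer that takes ownership of its argument.
inline std::string call_impl(std::string s) {
  s += "!";
  return s;
}

// Pre-C++23 spelling: name the decayed type and copy-construct a prvalue.
// In C++23, the argument expression can be written as auto(obj) instead.
template <class T>
std::string call_it(T&& obj) {
  return call_impl(std::decay_t<T>(obj));
}
```

Calling call_it(s) with an lvalue std::string s leaves s untouched: call_impl receives a copy, and no type conversion toward the parameter type of call_impl is involved.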

Broken syntax and standard library: std::ranges::views (kind of lazy)

It's 2024. C++20 has (almost) been ready for serious production development for about four years now. At least, someone has told me that the compilers have finally been updated, and now we can...

C++20 introduces four major features. Two of them are immediately usable in our code, while the other two are not quite there yet. Here, we'll talk about the first two.

std::ranges

It's a total game-changer for working with sequences in C++! The last time something like this happened was the introduction of the range-based for loop in C++11. And here we go again.

Let's forget about the begin/end iterators. Forget about the struggle of throwing away all odd numbers and squaring the even ones. Now we can do it as easily as in other high-level languages:

let v : Vec<_> = ints.iter()
                     .filter(|x| x % 2 == 0)
                     .map(|x| x * x)
                     .collect();

List<int> v = Stream.of(ints)
      .filter(x -> x % 2 == 0)
      .map(x -> x * x)
      .collect(Collectors.toList());

var v = ints.Where(x => x % 2 == 0)
            .Select(x => x * x)
            .ToList();

// Before C++20
std::vector<int> v;
std::copy_if(ints.begin(), ints.end(),
             std::back_inserter(v), [](int x)
             { return x % 2 == 0;});
std::transform(v.begin(), v.end(), v.begin(),
               [](int x){return x * x;});

// After C++20
std::vector<int> v;
std::ranges::copy(
    ints | std::views::filter([](int x){ return x % 2 == 0;})
         | std::views::transform([](int x) { return x * x;}),
    std::back_inserter(v)
);

// After C++23
auto v = 
    ints | std::views::filter([](int x){ return x % 2 == 0;})
         | std::views::transform([](int x) { return x * x;})
         | std::ranges::to<std::vector>();

It's beautiful! It does take a long time to compile and doesn't always optimize well, but that's not a big deal...

Concepts

Developers need concepts as a standalone feature because modern C++, and library code in particular, uses SFINAE extensively. This technique is hard to read and write, and the compilation errors are monstrous... Concepts, as named constraints, were supposed to improve the situation.

So now we can write a perfectly correct and probably clear generic function for adding integers. But only integers. We clearly see this from the function signature, without any special magic or the quirks of enable_if.

std::integral auto sum(std::integral auto a,
                       std::integral auto b) {
  return a + b;
}

Concepts are desperately needed for ranges. The entire ranges library is an incredible bunch of templates with constraints. It would be incredibly difficult to read function signatures without the syntactic sugar provided by concepts. We could spend eternity trying to fix the compilation errors.

Now we're firing on all cylinders!

A developer mentions in the general C++ chat that they're working on adding tests for a new feature in the library, which is brilliantly implemented using C++20 and ranges. But something strange is happening: the tests are failing, and Valgrind outputs something completely incomprehensible...

Here's the feature code:

struct MetricsRecord {
   std::string tag;
   // ....
};

struct Metrics {
  std::vector<MetricsRecord> records;

  std::ranges::range auto by_tag(const std::string& tag) const;
  // ....
};

// .... lots and lots of code

std::ranges::range auto
Metrics::by_tag(const std::string& tag) const {
  return records |
         std::ranges::views::filter([&](auto&& r)
           { return r.tag == tag; });
}

It's no big deal; I think it's fine. I see no problem.

But let's examine the tests:

int main() {
  auto m = Metrics {
    {
      {"hello"}, {"world"}, {"loooooooooooooooongtag"}
    }
  };

  {
    // outputs 'found'
    auto found = m.by_tag("hello");
    for (const auto& f: found) {
      std::cout << std::format("found {}\n", f.tag);
    }
  }

  {
    // doesn't output... strange enough
    auto found = m.by_tag("loooooooooooooooongtag");
    for (const auto& f: found) {
      std::cout << std::format("found {}\n", f.tag);
    }
  }

  {
    // but it works here
    std::string tag = "loooooooooooooooongtag";
    auto found = m.by_tag(tag);
    for (const auto& f: found) {
      std::cout << std::format("found {}\n", f.tag);
    }
  }
}

As any sophisticated reader has already realized, the issue lies in the temporary variable:

// Here, devs implicitly create the temporary
// 'std::string' variable in the argument!
auto found = m.by_tag("loooooooooooooooongtag");

And in the fact that the filter predicate captures the variable by reference, too:

std::ranges::range auto
Metrics::by_tag(const std::string& tag) const {
  return records |
         std::ranges::views::filter([&](auto&& r)
           { return r.tag == tag; });
}

And also because the developer has been writing in JavaScript for a long time, where Array.prototype.filter immediately creates a new array:

const words = ['spray', 'elite', 'exuberant',
               'destruction', 'present'];

const result = words.filter((word) => {
   console.log(word);
   return word.length > 6
  });  
// All items will be output immediately.

It seems the developer doesn't realize that the function is lazy (and that all std::ranges views are lazy).

Easy-peasy! Well, we'll just use C++ correctly and won't have any problems! Or will we?

std::ranges::range auto
  Metrics::by_tag(const std::string& tag) const

Can we guess from the function signature whether it's lazy? Hardly, because std::ranges::range auto doesn't give any indication.

This should have just been stated in a comment. But it isn't there. Alternatively, the developers could have used the std::ranges::view concept here. Ah, as usual, if only everything had been done correctly...

But Valgrind caught the error! Yes, in the library tests. Who knows if the library users have tests of their own...

Let them write tests! Let them use static analysis! Sure, maybe this will help. At the time of writing (April 2024), neither Clang-Tidy nor PVS-Studio can detect this error.

All right. Everyone will now read the documentation beforehand and know that std::views are lazy. So, we need to be careful when capturing references with them. Let's wrap up this question.

Thanks to Nicolai Josuttis' keynote at Meeting C++ 2022 for the inspiration.

Wait, std::ranges views aren't just lazy, they're incredibly lazy! Sometimes they're not only too lazy to iterate over the container, but even too lazy to call begin() and end() on it. This laziness stems from the standard's requirement that begin() and end() run in amortized constant time:

Given an expression t such that decltype((t)) is T&, T models range only if

(2.1) [ranges::begin(t), ranges::end(t)) denotes a range ([iterator.requirements.general]),

(2.2) both ranges::begin(t) and ranges::end(t) are amortized constant time and non-modifying, and ....

That's why some views:

  • delay the begin/end call on the container when constructing;
  • cache their begin/end after the first evaluation.

And we get some interesting special effects:

void print_range(std::ranges::range auto&& r) {
  for (auto&& x: r) {
    std::cout << x << " ";
  }
  std::cout << "\n";
}

void test_drop_print() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::drop(2); // Skip first two.
  ints.push_front(-5);
  print_range(v); // -5 and 1 are skipped.
                  // 'drop' calls 'begin' and 'end' only now.
}

void test_drop_print_before_after() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::drop(2);
  print_range(v); // 1 and 2 are skipped.
  ints.push_front(-5);
  print_range(v); // 1 and 2 are skipped!
                  // 'drop' doesn't call 'begin' and 'end' again.
}

void test_take_print() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::take(2);
  ints.push_front(-5);
  print_range(v); // -5 and 1 are output.
}

void test_take_print_before_after() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::take(2);
  print_range(v); // 1 and 2 are output
  ints.push_front(-5);
  print_range(v); // -5 and 1 are output.
                  // 'take' calls 'begin' and 'end' every time.
}

drop: 
2 3 4 5 
------
3 4 5 
3 4 5 
take: 
-5 1 
------
1 2 
-5 1

Great, utterly natural, and most importantly, predictable! There is no magic if we know how it works... The main thing is to avoid mistakes when using it in practice.

Just don't modify the container while a view refers to it. It's really that simple!

By the way, if we make a tiny, little change:

void print_range(std::ranges::range auto r) // now by value
{
  for (auto&& x: r) {
    std::cout << x << " ";
  }
  std::cout << "\n";
}

void test_drop_print_before_after() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::drop(2);
  print_range(v); // 1 and 2 are skipped.
  ints.push_front(-5);
  print_range(v); // -5 and 1 are skipped.
                  // We have copied the view, and
                  // the copy calls 'begin()' and 'end()' again.
}

Another consequence of this lazy and sometimes caching behavior is that we can't pass just any view to a function that accepts const std::ranges::range&.

void print_range(const std::ranges::range auto& r) {
  for (auto&& x: r) {
    std::cout << x << " ";
  }
  std::cout << "\n";
}

void test_drop_print() {
  std::list<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::drop(2);
  print_range(v); // Doesn't compile!
                  // 'drop' from 'std::list' should be mutable.
  /*
    <source>: In instantiation of
      'void print_range(const auto:42&) [with auto:42 =
      std::ranges::drop_view<std::ranges::ref_view<
        std::__cxx11::list<int> > >]':
    <source>:19:16:   required from here
             19 |     print_range(v);
                |     ~~~~~~~~~~~^~~
    <source>:10:5: error: passing 'const std::ranges::drop_view<
      std::ranges::ref_view<std::__cxx11::list<int> > >' as
      'this' argument discards qualifiers [-fpermissive]
             10 |     for (auto&& x: r) { 
  */
}

void test_drop_print_vector() {
  std::vector<int> ints = {1, 2, 3 ,4, 5};
  auto v = ints | std::views::drop(2);
  print_range(v); // It's ok.
}

So, the same abstract view can never be used directly by reference in multiple threads: even calling begin() may modify the view's cache. We need to ensure const correctness or synchronize access. Developers writing generic code should make an extra effort to set concept constraints correctly.

Here are four key concepts to start with:

  • std::ranges::range is too abstract, only begin and end;
  • std::ranges::view is also range, but only views fulfill it;
  • std::ranges::borrowed_range is also too abstract, but its iterators are safe to return from functions;
  • std::ranges::constant_range (C++23) is also abstract, but iterators provide only read-only access.

Then other concepts will join in.

The last outstanding consequence of lazy caching is the following curiosity:

enum State {
  Stopped,
  Working,
  ....
};

struct Unit {
  State state;
  ....
};

....

std::vector<Unit> units; 
....

// stop all working units
for (auto& unit: units | std::views::filter([](auto& unit)
  { return unit.state == Working; })) {
    ....
    unit.state = State::Stopped; // UB!
    // https://eel.is/c++draft/range.filter#iterator-1
    /*
    Modification of the element a filter_view​::​iterator
    denotes is permitted, but results in undefined behavior
    if the resulting value does not satisfy the filter predicate.
    */
}

The standard explicitly forbids modifying the elements found by std::views::filter in a way that changes the result of the predicate! The rule exists because we might iterate over the same view again: to avoid doing the search twice, the view caches begin(), and the cached iterator would then point to an element that no longer satisfies the predicate.
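One way to sidestep the problem is to keep the condition inside an ordinary loop, so that no filter_view is involved. This is a minimal sketch: the State and Unit types are reduced to the bare minimum, and stop_all_working is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class State { Stopped, Working };

struct Unit {
  State state;
};

// Stops every working unit and returns how many units were stopped.
// The predicate is checked inline, so changing 'state' is harmless.
inline std::size_t stop_all_working(std::vector<Unit>& units) {
  std::size_t stopped = 0;
  for (auto& unit : units) {
    if (unit.state == State::Working) {
      unit.state = State::Stopped;
      ++stopped;
    }
  }
  return stopped;
}
```

It is less fashionable than a pipeline, but nothing is cached here, so there is nothing to invalidate.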

The most troubling aspect is that this behavior is mandated by the standard. It's not implementation-defined:

Remarks: In order to provide the amortized constant time complexity required by the range concept when filter_view models forward_range, this function caches the result within the filter_view for use on subsequent calls.

Broken syntax and standard library: how to pass a standard function and not break anything

Suppose we need to perform some calculations—or we're just students, and we desperately need to complete an assignment on numerical methods.

Well, we've taken the ready-made integration function:

template <class F>
concept NumericFunction =
  std::is_invocable_v<F, float> && requires (float arg, F f) {
    { f(arg) } -> std::convertible_to<float>;
};

float integrate(NumericFunction auto f) {
  float sum = 0;
  /* We're not going into details of accuracy,
     convergence, splitting steps, and point selection.
     Although it's very important, we'll leave it for another book. */
  for (float x : std::views::iota(1, 26)) {
    sum += f(x);
  }
  return sum;
}

Splendid. We start testing it on a certain standard function:

#include <cmath>
....
int main() {
  return integrate(sqrt); // Is everything okay?
                          // (we consider int casting as OK)
}

It seems so. The program returns 85.

Actually, it's not okay. There are at least two issues here.

1. The C++ standard library includes the standard C library, which makes the use of standard math functions a bit of a hassle:

static_assert(std::abs(5.8) > 5.5);
static_assert(abs(5.8) > 5.5);

//---------------------

<source>:26:24: error: static assertion failed
   26 | static_assert(abs(5.8) > 5.5);
      |               ~~~~~~~~~^~~~~
<source>:26:24: note: the comparison reduces
to '(5.0e+0 > 5.5e+0)'
Compiler returned: 1

Okay. Got it. Just need to use std::sqrt to avoid the wrong overload.

int main() {
  return integrate(std::sqrt);
}

// ---------------

<source>:22:21: error: no matching function for call to
'integrate(<unresolved overloaded function type>)'
   22 |     return integrate(std::sqrt);

Here it is: an overloaded function. So how do we choose the right overload?

We google this question, and the first link directs us to the Qt forum. Oh, it's a common problem to specify overload when connecting signals and slots. The most relevant answer is to perform an explicit type cast for function pointers.

int main() {
  return integrate(
    static_cast<float(*)(float)>(&std::sqrt));
}

Yay, it works! The program still returns 85. It's a slightly different 85 this time :)

Congratul...

2. ...We've violated 16.4.5.2.6 ([namespace.std], paragraph 6).

Let F denote a standard library function ([global.functions]), a standard library static member function, or an instantiation of a standard library function template. Unless F is designated an addressable function, the behavior of a C++ program is unspecified (possibly ill-formed) if it explicitly or implicitly attempts to form a pointer to F.

The integrate(static_cast<float(*)(float)>(&std::sqrt)); call does exactly that. We are taking a pointer to a function, but we can't take the pointers to almost any function in the standard library.

The original version with return integrate(sqrt), which uses sqrt from the C library, also falls into this trap but implicitly.

Since C++20, they warn us that it may stop compiling, but I haven't seen it yet.

Why is this the case?

Who said that it's a function?

Yes, almost all functions of the C++17 standard library turn out to be normal functions after template instantiation. That's why everything has been working smoothly for so many years.

The C standard library functions are worse. They can be macros, and who knows what address we're actually getting in this case.

Since C++20 (inspired by Eric Niebler's ranges), new functions, and potentially old ones (after std moves to modules), can suddenly turn out to be niebloids. These are global objects with a defined operator(). So, they may look and behave like old-school functions, but they aren't. If we use a C-style cast instead of the cumbersome static_cast, we may get some interesting results:

// The old version
// float f(float a) {
//   return a;
// }

// We've upgraded, and now it's a niebloid!
auto f = [](float a) -> float {
    return a;
};

int main() {
  return integrate((float(*)(float))(&f));
  // A segfault
}

We could have saved the situation by removing & before the function name. A function name decays to a pointer implicitly, and a captureless lambda converts to a function pointer implicitly:

// float f(float a) {
//   return a;
// }

auto f = [](float a) -> float {
  return a;
};

int main() {
  return integrate((float(*)(float))(f));
  // It compiles and works.
}
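Since that conversion is implicit, static_cast is enough and the C-style cast is unnecessary. A minimal sketch, where identity and call_through_pointer are hypothetical names:

```cpp
#include <cassert>

inline auto identity = [](float a) -> float { return a; };

inline float call_through_pointer(float x) {
  // A captureless lambda converts to a function pointer implicitly,
  // so static_cast accepts it.
  // static_cast<float (*)(float)>(&identity) would be rejected at compile
  // time, because a pointer to a closure object is not a function pointer.
  float (*fp)(float) = static_cast<float (*)(float)>(identity);
  return fp(x);
}
```

With static_cast, the mistake with the stray & becomes a compilation error instead of a segfault.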

Niebloids in std are more often defined as shown below, rather than using lambdas:

struct {
  static float operator()(float x) {
    return x;
  } 
} f;

int main() {
  return integrate((float(*)(float))(f));
  // error: invalid cast from type '<unnamed struct>'
  // to type 'float (*)(float)'
}

It doesn't compile, and we're awfully lucky that it doesn't!

Unfortunately, there are numerous code examples that explicitly get the address of a function.

What shall we do?

The good news is that if, at some point, all the wonderful bunch of standard functions becomes callable objects, our integrate(std::sqrt) will compile and work perfectly out of the box. And everyone will be happy.

The bad news is that this is not likely to happen, so we'll have to write code.

We can fix the issue by wrapping the call to the std function in our function or lambda.

int main() {
  return integrate([](float x) {
    return std::sqrt(x);
  });
}

Or we can add a helper macro. If we use C++20, it looks less scary than usual.

#define LAMBDA_WRAP(f) []<class... T>(T&&... args) \
  noexcept(noexcept(f(std::forward<T>(args)...))) -> \
    decltype(auto) \
  { return f(std::forward<T>(args)...); }

int main() {
  return integrate(LAMBDA_WRAP(std::sqrt));
}
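The macro wraps the whole overload set, not a single function. The sketch below repeats the macro and checks which overload is chosen for each argument type; wrapped_sqrt is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <utility>

#define LAMBDA_WRAP(f) []<class... T>(T&&... args) \
  noexcept(noexcept(f(std::forward<T>(args)...))) -> \
    decltype(auto) \
  { return f(std::forward<T>(args)...); }

inline constexpr auto wrapped_sqrt = LAMBDA_WRAP(std::sqrt);

// Overload resolution happens inside the lambda, at the call site:
// a float argument selects the float overload, a double argument the double one.
static_assert(std::is_same_v<decltype(wrapped_sqrt(4.0f)), float>);
static_assert(std::is_same_v<decltype(wrapped_sqrt(4.0)), double>);
```

No pointer to std::sqrt is ever formed: the lambda only calls the function, which is what the standard library permits.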

It's best to use the lambda rather than the function for optimization. Let's take a look.

If, for any reason, the compiler can't inline a call to the integrate template function, and we pass a pointer to the function, the compiler will have no choice but to generate the call instruction on that pointer.

#define LAMBDA_WRAP(f) []<class... T>(T&&... args) \
  noexcept(noexcept(f(std::forward<T>(args)...))) -> \
    decltype(auto) \
  { return f(std::forward<T>(args)...); }

float my_sqrt(float f) {
  return std::sqrt(f);
}

int main() {
  return integrate(my_sqrt) + integrate(LAMBDA_WRAP(std::sqrt));
}

Here's the assembly code when using the function:

// float integrate<float (*)(float)>(float (*)(float)):
    push    rbp
    mov     rbp, rdi
    ....
.L24:
    pxor    xmm0, xmm0
    cvtsi2ss        xmm0, ebx
    add     ebx, 1
    call    rbp  // ! the callee is unknown here: an indirect call via pointer
    addss   xmm0, DWORD PTR [rsp+12]
    movss   DWORD PTR [rsp+12], xmm0
    cmp     ebx, 26
    ....
    ret

Here's the assembly code when using the lambda:

float integrate<
  main::{lambda<typename... $T0>(($T0&&)...)#1}>(
    main::{lambda<typename... $T0>(($T0&&)...)#1})
      [clone .isra.0]:
    ....
.L16:
    pxor    xmm0, xmm0
    cvtsi2ss        xmm0, ebx
    ucomiss xmm2, xmm0
    ja      .L21
    sqrtss  xmm0, xmm0 // ! sqrt is inlined as a single instruction
    add     ebx, 1
    addss   xmm1, xmm0
    cmp     ebx, 26
    jne     .L16
.L11:
    ...
.L21:
    movss   DWORD PTR [rsp+12], xmm1
    add     ebx, 1
    call    sqrtf  // ! a direct call to sqrtf, not a call via pointer
    ....

Why is lambda preferred, but not always?

For example, GCC and Clang generate a separate copy of the integrate code for each lambda, even if the lambdas are identical. Well, that's how it has to be: each lambda expression has its own unique type.

int main() {
  return integrate(my_sqrt) +
         integrate(LAMBDA_WRAP(std::sqrt)) + 
         integrate(LAMBDA_WRAP(std::sqrt)) +
         integrate(LAMBDA_WRAP(std::sqrt));
}
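The uniqueness of lambda types can be checked directly. In this sketch, first, second, and the probe instantiation_id are hypothetical names; the probe relies on every template instantiation owning its own static variable.

```cpp
#include <cassert>
#include <type_traits>

inline auto first = [](float x) { return x; };
inline auto second = [](float x) { return x; };  // same text, different type

static_assert(!std::is_same_v<decltype(first), decltype(second)>);

// Hypothetical probe: every distinct F produces its own instantiation,
// and therefore its own static variable with its own address.
template <class F>
const void* instantiation_id(F) {
  static char tag;
  return &tag;
}
```

Two textually identical lambdas produce two instantiations, while passing the same lambda object twice reuses a single one.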

What can we do? Code bloat is a well-known result from the monomorphization of templates and generic functions.

Just reuse the lambda, it'll be better:

// The generated code is two times smaller than for the example above:
int main() {
  auto sqrt_f = LAMBDA_WRAP(std::sqrt);
  return integrate(my_sqrt) +
         integrate(sqrt_f) + 
         integrate(sqrt_f) +
         integrate(sqrt_f);
}

Author: Dmitry Sviridkin

Dmitry has over eight years of experience in high-performance software development in C and C++. From 2019 to 2021, he taught Linux system programming at SPbU and hands-on C++ courses at HSE. He currently works on system and embedded development in Rust and C++ for edge servers as a Software Engineer at AWS (Cloudfront). His main area of interest is software security.

Editor: Andrey Karpov

Andrey has over 15 years of experience with static code analysis and software quality. He is the author of numerous articles on writing high-quality code in C++. Andrey Karpov was honored with the Microsoft MVP award in the Developer Technologies category from 2011 to 2021. Andrey is a co-founder of the PVS-Studio project. He has long been the company's CTO and was involved in the development of the C++ analyzer core. Andrey is currently responsible for team management, personnel training, and DevRel activities.