
Andrey Karpov

Dmitry Sviridkin

C++ programmer's guide to undefined behavior: part 12 of 11

Your attention is invited to the final part of an e-book on undefined behavior. This is not a textbook, as it's intended for those who are already familiar with C++ programming. It's a kind of C++ programmer's guide to undefined behavior and to its most secret and exotic corners. The book was written by Dmitry Sviridkin and edited by Andrey Karpov.

Why is it chapter 12 of 11? We couldn't resist highlighting the favorite error of C and C++ programmers, the one that even has its own name—the Off-by-one Error. And we didn't want to break the tradition ;)

std::vector::reserve and std::vector::resize

Many C++ programs have fallen victim to despicable tricks of these twin brothers—not to mention carelessness, haste, and autocomplete, of course.

Almost all C++ programming books advise allocating memory in advance when creating a std::vector, especially if we know how many elements it will contain. Then, as the vector fills up, it won't need to reallocate, and the program runs faster without costly memory reallocations.

Here's a catch, though: std::vector doesn't have a constructor that lets you state, "I want an empty vector, but with a reserved capacity of N." What it does have is a constructor that fills the vector with N default-constructed elements:

// Creates N empty strings
std::vector<std::string> text(N);

But creating an entire vector, initializing every element in it, only to overwrite them later isn't optimal... No, no, this isn't the right way to use C++!

For that, we have the reserve() member function; it should be called right after an empty vector is created:

std::vector<std::string> text;
text.reserve(N);

Ugh, here's another pitfall! We have the resize() member function, which can also take exactly one argument if the vector elements are default-constructible.

So, naturally, programmers tend to mix them up:

  • the names are short and start with the same letters;
  • they have a similar purpose;
  • they appear next to each other in the autocomplete suggestions;
  • and nowadays, even AI assistants can suggest using the wrong one!

As a result, a programmer ends up creating a vector twice as large as needed and wondering why all the elements they need are empty.

auto read_text(size_t N_lines) {
  std::vector<std::string> text;
  text.resize(N_lines);    // Ouch!
  for (size_t line_no = 0; line_no < N_lines; ++line_no) {
    std::string line;
    std::getline(std::cin, line);
    text.emplace_back(std::move(line));
  }
  return text; 
}

There's no undefined behavior, though. Sigh.

But what if a developer accidentally used reserve() when resize() was actually needed? This kind of mistake has a pretty good chance of occurring, because autocomplete suggestions are often ordered alphabetically, and reserve comes first.

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

std::pair<std::vector<std::byte>, size_t>
  read_text(std::istream& in, size_t buffer_len)
{
  std::vector<std::byte> buffer;
  buffer.reserve(buffer_len);
  in.read(reinterpret_cast<char*>(buffer.data()), buffer_len);
  return {
    std::move(buffer), static_cast<size_t>(in.gcount())
  };
}

int main() {
  auto [buffer, actual_size] = read_text(std::cin, 45);
  for (size_t i = 0; i < actual_size; ++i) {
    std::cout << static_cast<int>(buffer[i]) << "\n";
  }
}

The program will run successfully!

// I/O example
>> hello
104
101
108
108
111

Let's pause here for a moment and take a step back. Countless times, I've seen statements like these on all sorts of tech forums:

  • "C and C++ are the languages for programming close to the hardware".
  • "Undefined behavior is just a formality."
  • "If a programmer knows how memory works, how their program and types work, they can safely ignore this nonsense."
  • and so on.

The reserve() example shown above could be used to support such controversial statements. Indeed, I know that:

  • reserve() does allocate memory, so the [buffer.data(), buffer.data() + buffer_len) range is valid;
  • std::istream::read initializes memory in the [buffer.data(), buffer.data() + actual_size) range;
  • by default, the operator[] of a vector doesn't check the passed index;
  • the vector is accessed in the loop within the initialized bounds.

Thus, everything works. Moreover, it works without invoking undefined behavior in the language. If we copy the std::vector implementation, remove all asserts and conditional sanitizer instrumentation from it, and use it in the same weird way, we won't have undefined behavior in our code. At least in this example.

But this is std::vector. std! It declares library undefined behavior: we accessed an element at an index outside the [0, vector::size()) range. This is unforgivable.

I couldn't find any compiler available online that could reproduce the sequence of optimizations leading to the incredibly beautiful crash. However, I've seen a few closed bug reports for Apple Clang that exhibit this behavior.

LLVM can generate the ud2 instruction on x86—an invalid instruction often used as a marker of unreachable code. If the program tries to execute it, it crashes with the SIGILL signal. Code that triggers undefined behavior can be marked as unreachable and replaced with ud2, or discarded entirely. In our wonderful example, the compiler is fully aware that buffer.size() == 0 and that it never changes.

So, this is what it looks like if we try to rewrite the same mess in Rust, which actively uses LLVM features:

fn read(n: usize, mut reader: impl std::io::Read) ->
  std::io::Result<(Vec<u8>, usize)>
{
  // We're reserving memory. It will be uninitialized.
  let mut buf = Vec::<u8>::with_capacity(n);

  // unsafe Rust is quite complex, and technically
  // we can't directly create &mut [u8]
  // in uninitialized memory here.
  // But it won't affect the outcome
  // because "we know what we're doing."
  // For now...
  let actual_size = reader.read(unsafe {
      std::slice::from_raw_parts_mut(buf.as_mut_ptr(), n)
  })?;
  // Unlike in C++, in Rust, we can use
  // unsafe { buf.set_len(actual_size) };
  // And make this example almost correct.
  // We're here to look at undefined behavior, though.
  Ok((buf, actual_size))
}

pub fn main() {
  let (buf, n) = read(42, std::io::stdin()).unwrap();
  for i in 0..n {
    println!("{}",
      unsafe { buf.get_unchecked(i) }
    )
  }
}

In the debug build, the program crashes with a segmentation error and the following message:

thread 'main' panicked at library/core/src/panicking.rs:220:5:
unsafe precondition(s) violated: slice::get_unchecked requires
that the index is within the slice
note: run with `RUST_BACKTRACE=1` environment variable to
display a backtrace thread caused non-unwinding panic. aborting.
Program terminated with signal: SIGSEGV

In the release version with -C opt-level=3, we see empty output and a successful exit. If we look at the generated code, we won't find the loop in it at all: the code accessing the vector elements was discarded as unreachable, thanks to the assert_unsafe_precondition!(check_language_ub, ...) annotation.

example::main::h67df0b7f9b5f8d1a:
      ....
      call    qword ptr [rip + <std::io::stdio::Stdin as
                std::io::Read>::read::h30ce8d6974df759c@GOTPCREL]
      mov     esi, 42
      test    rax, rax
      jne     .LBB1_3    # This is unwrap
      mov     edx, 1
      mov     rdi, rbx
      call    qword ptr [rip + __rust_dealloc@GOTPCREL]
      add     rsp, 8
      pop     rbx
      pop     r14
      ret
.LBB1_7:

"Well, this is Rust!"—one might argue, rolling their eyes. Yes. However, it's only a matter of time before Clang applies the same optimizations to C++.

What can we do?

It would be nice not to mix up reserve() and resize(), of course...

Numerous experiments with various utilities have shown that the state of diagnostics for such undefined behavior in C++ is still quite poor in 2024:

  • unfortunately, static analyzers are silent;
  • sanitizers, by default, don't react either;
  • with MSVC's _ITERATOR_DEBUG_LEVEL enabled, the program crashes silently;
  • -fsanitize=address stops being silent only with -stdlib=libc++:

==1==ERROR: AddressSanitizer: container-overflow on address
0x5040000000000050 at pc 0x59481461d1b0 bp 0x7ffcf01b08b0
sp 0x7ffcf01b08a8 READ of size 1 at 0x504000000050 thread T0

Hold on, wait! What if this isn't an error? We deliberately used reserve() because it doesn't initialize memory. We wanted to overwrite it, as in the Rust example, with some data from a file and adjust the size() at the end. But the vector just doesn't offer such an API...

The C++ standard library includes two more appropriate solutions for this case.

std::make_unique_for_overwrite


std::pair<std::unique_ptr<std::byte[]>, size_t>
read_text(std::istream& in, size_t buffer_len)
{
  // We allocate a default-initialized buffer; for trivial
  // objects, default initialization of an array
  // means no initialization at all.
  auto buffer = std::make_unique_for_overwrite<std::byte[]>(buffer_len);
  in.read(reinterpret_cast<char*>(buffer.get()), buffer_len);
  return {
    std::move(buffer), static_cast<size_t>(in.gcount())
  };
}

This option also works just as well as the original incorrect one, but without undefined behavior.

However, this way, we've lost the information about the remaining capacity, as it's not tied to unique_ptr. We need to handle it separately within its own structure or forget about it.

C++23: std::basic_string::resize_and_overwrite

Yes! The miracle has happened, and in C++23 we can do things almost as beautifully and efficiently as we can in Rust. However, this works only for "strings". As the good old C tradition goes, we have strings that are just sequences of bytes...

// We'll have to write a bit of the CharTraits magic
// if we want to use
// std::basic_string with the std::byte type.
struct ByteTraits {
  using char_type = ::std::byte;
  static char_type* copy(char_type* dst,
                         const char_type* src, size_t n) {
    memcpy(dst, src, n);
    return dst;
  }
  static void assign(char_type& dst, const char_type& src) {
    dst = src;
  } 
};

std::basic_string<std::byte, ByteTraits>
  read_text(std::istream& in, size_t buffer_len)
{
  std::basic_string<std::byte, ByteTraits> buffer;
  buffer.resize_and_overwrite(buffer_len,
                              [&in](std::byte* buf, size_t len) {
    in.read(reinterpret_cast<char*>(buf), len);
    return static_cast<size_t>(in.gcount());
  });
    
  return buffer;
}

int main() {
  auto buffer = read_text(std::cin, 45);
  size_t actual_size = buffer.size();
  std::cout << actual_size << std::endl;
  for (size_t i = 0; i < actual_size; ++i) {
    std::cout << static_cast<int>(buffer[i]) << "\n";
  }
}

Hooray! This code also works as expected.

Unary minus and unsigned integers

Let's imagine that we're designing a GUI for a game. We already have buttons, panels, and icons in place. Neat. Then we decide to make the interface more unique by implementing an animation for the checkbox element—when a player clicks to uncheck the box, it smoothly slides to the side by about 30% of its width.

We had these structures and functions:

struct Element {
  size_t width; // original non-scaled width
  ....
};

// We're using a smart component system that refers
// to elements by their IDs.
using ElementID = uint64_t; 

// Positions in OpenGL/DirectX/Vulkan worlds are floats
struct Offset {
  float x;
  float y;
};

size_t get_width(ElementID);
float screen_scale();
void move_by(ElementID, Offset);

Then we added this:

void on_unchecked(ElementID el) {
  auto w = get_width(el);
  move_by(el, Offset {
    -w * screen_scale() * 0.3f,
    0.0f
  });
}

The checkbox was 50 pixels wide. We ran the test... and the element flew off the screen!

We looked at the logs and found this:

Offset: 5.5340234e+18 0

How could it be?! Is this undefined behavior?

Nope. It's quite defined.

The unary minus, which we accidentally applied to an unsigned variable, is to blame.

For unsigned a, the value of -a is 2^N − a,
where N is the number of bits after promotion.

This is a very vicious error that Clang and GCC with the -Wall -Wextra -Wpedantic flags fail to detect; MSVC, on the other hand, has a warning for it.

Static analyzers such as PVS-Studio can also spot the error.

In more modern programming languages, if we apply unary minus to unsigned values, the code most often doesn't compile. This is the case in languages like Rust, Zig, and Kotlin.


Unaligned references

A programmer was busy formatting bytes. After all, what could be more fun for a C++ programmer than writing, over and over again, code to format the output of user-defined structures?

The programmer's bytes were packed so that there'd be no padding! The data members were also carefully ordered, so no extra alignment was required:

#include <format>
#include <iostream>

#pragma pack(1)
struct Record {
  long value;
  int data;
  char status;
};

int main() {
  Record r { 42, 42, 42};
  static_assert(sizeof(r) == sizeof(int) + sizeof(char) + sizeof(long));
  std::cout <<
    std::format("{} {} {}", r.data, r.status, r.value); // 42 - '*'
}

They checked the code using the sanitizer, and it informed them that everything was fine:

Program returned: 0
42 * 42

Well, since everything's okay, more bytes can be formatted!

int main() {
  Record records[] = { { 42, 42, 42}, { 42, 42, 42}  };
  static_assert(sizeof(records) ==
                2 * ( sizeof(int) + sizeof(char) + sizeof(long) ));
  for (const auto& r: records) {
    std::cout << std::format("{} {} {}", r.data, r.status, r.value); // 42 - '*'
  }
}

Then, something exploded (this would definitely happen under ARM):

Program returned: 0
/app/example.cpp:16:48: runtime error: reference binding to
misaligned address 0x7ffd1eda9f85 for type 'const int',
which requires 4 byte alignment
0x7ffd1eda9f85: note: pointer points here
 00 00 00 00 2a 00 00  
 00 2a 00 00 00 00 00 00 
 00 00 00 00 00 00 00 00 
 03 00 00 00 00 00 00 00  b0

Yes, unaligned memory can't be read—it would cause undefined behavior. We already know that. We shouldn't dereference an unaligned pointer. Unfortunately, C++ has references, and they must be correctly aligned too.

We clearly see one reference:

for (const auto& r: records)

But this isn't a const int type! Well, yes, it's Record, and there's nothing wrong with it. The #pragma pack(1) directive sets the alignment requirement to 1, so there's no issue here.

Where did the const int reference come from?

It was created implicitly. After all, implicit reference creation is a key feature of C++!

// Here they are, the two tricky &&!
template< class... Args >
std::string format( std::format_string<Args...> fmt, Args&&... args );

// All three data members are passed by reference!
std::cout << std::format("{} {} {}", r.data, r.status, r.value);

Yes, a "universal reference" is still a reference.

In a packed structure, the data members aren't aligned. We can't take references to them.

But it worked without warnings in the original version with a single structure...

Ha! We're just lucky that:

  • data members in the structure are ordered in such a way that there's no padding between them, even without a pragma pack;
  • a stack is usually aligned to sizeof(void*), which is enough for all data members in the structure.

We can add an extra char to the stack, and things will change.

int main() {
  char data[1];
  Record r { 42, 42, 42};
  memset(data, 0, 1);
  std::cout <<
    std::format("{} {} {}", r.data, r.status, r.value); // 42 - '*'
}

Program returned: 0
/app/example.cpp:17:44: runtime error:
reference binding to misaligned address 0x7ffe3b4e1f36 for type 'int',
which requires 4 byte alignment
0x7ffe3b4e1f36: note: pointer points here
 00 00 00 00 2a 00  00 00 2a 00 00 00 00 00 
 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00

So, how can we fix this unfortunate error?

We need to separately read from each data member into a properly aligned temporary variable, i.e. make a copy.

int main() {
  Record records[] = { { 42, 42, 42}, { 42, 42, 42}  };
  for (const auto& r: records) {

    // C++23 has wonderful auto() for this purpose
    std::cout << std::format("{} {} {}",
      auto(r.data), auto(r.status), auto(r.value)); 

    // In C++20,
    auto data = r.data; auto status = r.status; auto value = r.value;
    std::cout << std::format("{} {} {}", data, status, value); 

    // Or, completely ugly and brittle to type changes:
    std::cout << std::format("{} {} {}", static_cast<int>(r.data), 
                                         static_cast<char>(r.status), 
                                         static_cast<long>(r.value));
  }
}

In slightly safer languages, if we take unaligned references to data members of packed structures, the code simply won't compile.

In Rust:

#[repr(C, packed)]
struct Record {
  value: i64,
  data: i32,
  status: i8, 
}

fn main() {
  let r = Record { value: 42, data: 42, status: 42 };
  // In Rust, macros are one of the few places where references
  // can appear implicitly, hidden from the person reading the code.
  println!("{} {} {}", r.data, r.status, r.value); 

  /*
  error[E0793]: reference to packed field is unaligned
  --> <source>:10:26
      |
   10 |     println!("{} {} {}", r.data, r.status, r.value);
      = note: packed structs are only aligned by one byte,
              and many modern architectures penalize unaligned
              field accesses
      = note: creating a misaligned reference is undefined
              behavior (even if that reference is never dereferenced)
      = help: copy the field contents to a local variable,
              or replace the reference with a raw pointer and
              use `read_unaligned`/`write_unaligned`
              (loads and stores via `*p` must be properly aligned
              even when using raw pointers)
  */

  // This is the right way to do it:
  println!("{} {} {}", {r.data}, {r.status}, {r.value});
}

Ownership, exceptions, and errors

A team once received a bug report: "The service crashed with a segmentation fault. In the core dump, the stack trace points to something from your library as the last function before the crash. Fix it!" The crash had occurred exactly once in six months.

This "something" turned out to be a call to free somewhere deep inside the Protobuf library. Several subsequent stack frames pointed to a destructor call in our library. After analyzing the destructor code for a while, the engineer on duty found nothing suspicious and assumed it might resemble an issue previously encountered in Protobuf. Nobody had any idea how to reproduce it—a dead end...

I got curious about this mysterious story and went deeper into the core dump.

A few dozen stack frames higher, within code that belonged to someone else's service, the lru_insert function popped up. Interesting. This turned out to be the LRU cache insertion function. We could suspect that the destructor call might be related to an object being evicted from the cache.

Determined to figure it out, I tracked down the code for this service to see what happens during the insertion. The code I discovered confused me at first, then it fascinated me:

auto metadata = new Metadata(....);
metadata->cached = true;
lru_insert(cache, key, metadata);
// check if the insert was successful!
if (auto item_handle = lru_get(cache, key)) {
  ....
} else {
  // not found -> it's not cached
  metadata->cached = false;
}

If there's a bare new, there must also be a delete somewhere... I went looking and found two of them.

The first is during the creation of the cache:

auto cache =
  lru_create(n, [](void* data){ delete static_cast<Metadata*>(data); });

And the other one is somewhere else:

if (!metadata->cached) {
  delete metadata;
}

This whole situation looks like a double deletion. What went wrong here, though?

lru_insert(cache, key, metadata);
// check if the insert was successful!
if (auto item_handle = lru_get(cache, key)) {
  ....
} else {
  // not found -> it's not cached
  metadata->cached = false;
}

The code is remarkable in that it makes two separate calls into the cache: one to insert an item and one to check for it. We could limit it to a single call if lru_insert reported success. Could this be the issue? Does the service have a race condition that could interfere between the insertion and the check? I was assured, however, that the process is single-threaded.

We should probably take a closer look at the lru_insert function. It was created 10 years ago and has never been touched since. It's been tested, it's reliable. How can I doubt it?

void lru_insert(Cache* c, const char* key, void* data) 
{
  try {
    c->cache.insert(std::string(key), 
                    boost::intrusive_ptr(new LRUItem(data,
                                         c->deleter)));
  } catch (...) {}
}

I was horrified by what I saw. Here we have a bare new again, which can lead to a memory leak in the rarest and most often overlooked case: when the std::string constructor throws std::bad_alloc. And as we can see, the function simply swallows every exception it catches.

However, according to Rust, leaks are perfectly safe! They can't lead to use-after-free or double-free. There's a very high chance that the segfault was caused by double-free.

Of course, there's still the possibility that cache.insert itself could throw, fail to perform the insertion, and force intrusive_ptr to delete the object. However, that scenario doesn't match the core dump we saw: the double deletion and the crash involved an old object and occurred within insert. If the newly created object had been deleted, the crash would've happened somewhere else... or not at all. Undefined behavior detected!

We have another suspect in this code snippet: the lru_get function. It was also written 10 years ago and tested, and there isn't the slightest reason to doubt it!

// LRUItemHandle protects data from deletion,
// if it's evicted from the cache
LRUItemHandle* lru_get(Cache* c, const char* key) {
  try {
    auto item_ptr = c->cache.get(std::string(key));
    if (!item_ptr) {
      return nullptr;
    } 
    return new LRUItemHandle(item_ptr);
  } catch (...) {
    return nullptr;
  }
}

I was horrified (once again). I hope you are, too. Let's get back to the unfortunate fragment.

lru_insert(cache, key, metadata);
// let's say insert worked out well.
if (auto item_handle = lru_get(cache, key)) {
  ....
} else {
  // lru_get has at least two opportunities to mislead us 
  // about whether an item is in the cache, 
  // and it looks like we've been fooled.
  metadata->cached = false;
}

It turned out that in an extremely rare case, depending on the load, the service would run out of memory every six months. However, instead of crashing due to an out-of-memory error, it kept trying to operate—those were the requirements. It often succeeded in doing so, until one day std::bad_alloc was thrown in this reliable and well-tested LRU-cache library.

Now that we've gathered all the evidence and reconstructed the crime scene, we need to take a break and reflect on what happened.

  • The service developers lost the game of manual ownership management: the data is passed between components, and who owns it is decided dynamically. Who will release this data? This is a tough game.
  • The LRU cache library they wanted to share ownership with turned out to have a monstrous API. There's a reason it's a C API, but that doesn't concern our story. It's a horrendous C API regardless:

// The function doesn't report a possible insertion error.
// The function tries to take ownership of data,
// but if an error occurs, data may or may not be deleted
// depending on the error type:
// — Not deleted if allocation error occurs
//   before entering c->cache.insert
// — Deleted if any error occurs within c->cache.insert
void lru_insert(Cache* c, const char* key, void* data)

// This function may crash with an error or fail to find an element.
// However, the user has no way to distinguish
// between these two outcomes.
LRUItemHandle* lru_get(Cache* c, const char* key)

Passing and sharing ownership across C-API boundaries via raw pointers, while accounting for errors and exceptions, is a challenging task that requires great focus. It becomes a nightmare if the code is written in a C style that ignores C++ features.

I've extended the library API that contains the "robust and tested" LRU cache with new functions that are more error-resistant. I also tried to fix the old ones as much as possible: for example, the ownership issue in lru_insert couldn't be completely fixed... because there was a user who was relying on its incorrect behavior.

This is the way the function should've been written from the start:

// The function takes ownership of data and passes it to the cache.
// In case of any error, data will NOT be released.
// If successful, the cache will control the data lifetime.
ErrorCode lru_try_insert(Cache* c, const char* key, void* data) try {
  // Preparing a slot for the raw pointer.
  auto slot =
    boost::intrusive_ptr(new LRUItem(nullptr, c->deleter));
  // Inserting an empty slot into the cache. 
  // The slot must be empty to prevent data from being erased
  // if an insertion error occurs.
  c->cache.insert(std::string(key), slot);
  // Passing ownership to the slot.
  // No further error can occur at this stage.
  // The LRUItem destructor guarantees the deleter(data) call
  slot->data = data;
  return ErrorCode::LRU_OK;
} catch (...) {
  return ErrorCode::LRU_ERROR;
}

It would be better, of course, to revise dependencies and use the C++ library with safer RAII types for the LRU cache.

Exceptions (or panics, as in Rust) always complicate manual resource management. This isn't specific to low-level languages. For example, programs written in Go, Java, C#, Python, and other languages also suffer from unclosed files or database connections if a programmer forgets to use try-with-resources, finally, or defer blocks.

It's highly recommended to minimize manual resource management.

I also highly recommend that C++ developers try Rust, at least as a tutorial on ownership, its transfer, and sharing. The strict borrow checker reveals many interesting patterns where fully manual control can easily lead to double-free.

Coroutines: time of life and death

The async/await syntax is tightly integrated into the lives of modern developers: from frontend to backend, and even low-level systems programming. It first appeared in F# in 2007 and gradually spread to a variety of languages over the following years: C# (2012), Python (2015), JavaScript (2017), Kotlin (2018), Rust (2019), and Zig (2020, though it was removed in 2024 due to implementation issues in the self-hosted compiler).

Despite all its controversial aspects (the famous "colored" function problem), the capability to write simple linear code instead of classic callback spaghetti for complex asynchronous tasks is useful, and it reduces the effort required for prototyping.

To keep up, C++20 also added long-awaited support for asynchronous programming: instead of async functions, C++ offers relatively explicit and more general constructs called coroutines. There's await for them too... or rather, co_await! There are also co_return and co_yield. In C++, they fixed both the asynchronous-function and the generator problems at once—or introduced new problems instead...

Unfortunately, even though the language now supports coroutines, there are no ready-to-use coroutine types in the standard library! If you want them, implement your own. For the following fascinating example, though, I'll take the coroutines from boost::asio.

#include <iostream>
#include <concepts>
#include <vector>
#include <string>
#include <ranges>
#include <chrono>

#include <boost/asio/co_spawn.hpp>
#include <boost/asio/detached.hpp>
#include <boost/asio/io_context.hpp>
#include <boost/asio/steady_timer.hpp>

using namespace boost::asio;
namespace this_coro = boost::asio::this_coro;

using namespace std::literals::string_literals;
using namespace std::chrono;

using Request = std::string;

struct MetricEmitter {
  std::string metric_class;
  void emit(std::chrono::milliseconds elapsed) const {
    std::cout << metric_class << " " << elapsed << "\n";
  }
};

// A demonstration coroutine for simulating I/O
awaitable<void> some_io(int delay) {
  steady_timer timer(co_await this_coro::executor);
  timer.expires_after(milliseconds(delay));
  co_await timer.async_wait(use_awaitable);
  co_return;
}

awaitable<void> handle_request(const Request& r) {
  co_await some_io(15);
  std::cout << "Hello " << r << "\n";
  co_return;
}

template <std::ranges::range Requests>
awaitable<void> process_requests_batch(Requests&& reqs) 
requires
std::convertible_to<std::ranges::range_value_t<Requests>, Request> {
  auto executor = co_await this_coro::executor;
  // We add runtime metrics to the query processing.
  auto handle_with_metrics =
    [metrics = MetricEmitter { "batch_processor" } ]
    (auto&& request) -> awaitable<void> 
  {
      auto start = steady_clock::now();
      co_await handle_request(std::move(request));
      auto finish = steady_clock::now();
      metrics.emit(duration_cast<milliseconds>(finish - start));
  };
  for (auto&& r: std::move(reqs)) {
    // we run a concurrent execution for each request.
    co_spawn(executor, handle_with_metrics(std::move(r)), detached);
  }
  co_return;
}

awaitable<std::vector<Request>> accept_requests_batch() {
  co_return std::vector{ "Adam"s, "Helen"s, "Bob"s };
}

awaitable<void> run() {
  co_await process_requests_batch(co_await accept_requests_batch());
  co_await some_io(100);
}

int main()
{
  // We run our coroutines in a single-threaded execution context.
  boost::asio::io_context io_context(1);
  co_spawn(io_context, run(), detached);
  io_context.run();
}

You might assume that this code will successfully output the following three times:

Hello <name>
batch_processor <processing time>

In some order.

However, it's very likely to crash with a segmentation error.

Let's see the welcome message from the address sanitizer:

gcc -std=c++23 -O0 -fsanitize=address
AddressSanitizer:DEADLYSIGNAL
==1==ERROR: AddressSanitizer:
SEGV on unknown address 0x00000000001b
(pc 0x7a9f129aedf4 bp 0x7a9f12a1b780 sp 0x7fff9f00a228 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0x7a9f129aedf4  (/lib/x86_64-linux-gnu/libc.so.6+0x1aedf4)
       (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #1 0x7a9f1288b664 in _IO_file_xsputn
       (/lib/x86_64-linux-gnu/libc.so.6+0x8b664) (BuildId:
       490fef8403240c91833978d494d39e537409b92e)
    #2 0x7a9f1287ffd6 in fwrite
       (/lib/x86_64-linux-gnu/libc.so.6+0x7ffd6)
       (BuildId: 490fef8403240c91833978d494d39e537409b92e)
    #3 0x7a9f12e900ab 
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libasan.so.8+0x820ab)
       (BuildId: e522418529ce977df366519db3d02a8fbdfe4494)
    #4 0x7a9f12ce8d1c in
       std::basic_ostream<char, std::char_traits<char> >
       & std::__ostream_insert<char, std::char_traits<char> >
       (std::basic_ostream<char,
       std::char_traits<char> >&, char const*, long)
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libstdc++.so.6+0x14cd1c)
       (BuildId: 998334304023149e8c44e633d4a2c69800a2eb79)
    #5 0x407e03 in handle_request /app/example.cpp:39
    #6 0x40f697 in
       std::__n4861::coroutine_handle<void>::resume() const
       /opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/coroutine:137
    #7 0x449403 in
       boost::asio::detail::awaitable_frame_base
       <boost::asio::any_io_executor>::resume()
       /app/boost/include/boost/asio/impl/awaitable.hpp:501
    #8 0x445ba4 in
       boost::asio::detail::awaitable_thread
       <boost::asio::any_io_executor>::pump()
       app/boost/include/boost/asio/impl/awaitable.hpp:770
    #9 0x454bc7 in
       boost::asio::detail::awaitable_handler<
       boost::asio::any_io_executor, boost::system::error_code>::operator()
       (boost::system::error_code const&)
       /app/boost/include/boost/asio/impl/use_awaitable.hpp:93
    #10 0x4517e6 in
        boost::asio::detail::binder1<
        boost::asio::detail::awaitable_handler<
        boost::asio::any_io_executor, boost::system::error_code>,
        boost::system::error_code>::operator()()
        /app/boost/include/boost/asio/detail/bind_handler.hpp:115
    #11 0x44f337 in void
        boost::asio::detail::handler_work<
        boost::asio::detail::awaitable_handler<
        boost::asio::any_io_executor, boost::system::error_code>,
        boost::asio::any_io_executor, void>::complete<
        boost::asio::detail::binder1<
        boost::asio::detail::awaitable_handler<
        boost::asio::any_io_executor, boost::system::error_code>,
        boost::system::error_code>>
         (boost::asio::detail::binder1<
        boost::asio::detail::awaitable_handler<boost::asio::any_io_executor,
        boost::system::error_code>, boost::system::error_code>&,
        boost::asio::detail::awaitable_handler<boost::asio::any_io_executor,
        boost::system::error_code>&)
        /app/boost/include/boost/asio/detail/handler_work.hpp:433
    #12 0x44ccfe in
        boost::asio::detail::wait_handler<
        boost::asio::detail::awaitable_handler<
        boost::asio::any_io_executor, boost::system::error_code>,
        boost::asio::any_io_executor>::do_complete
         (void*, boost::asio::detail::scheduler_operation*,
        boost::system::error_code const&, unsigned long)
        /app/boost/include/boost/asio/detail/wait_handler.hpp:76
    #13 0x41747c in
        boost::asio::detail::scheduler_operation::complete
        (void*, boost::system::error_code const&, unsigned long)
        /app/boost/include/boost/asio/detail/scheduler_operation.hpp:40
    #14 0x41e925 in
        boost::asio::detail::scheduler::do_run_one
        (boost::asio::detail::conditionally_enabled_mutex::scoped_lock&,
        boost::asio::detail::scheduler_thread_info&,
        boost::system::error_code const&)
        /app/boost/include/boost/asio/detail/impl/scheduler.ipp:493
    #15 0x41dcfb in
        boost::asio::detail::scheduler::run(boost::system::error_code&)
        /app/boost/include/boost/asio/detail/impl/scheduler.ipp:210
    #16 0x41f27a in boost::asio::io_context::run()
        /app/boost/include/boost/asio/impl/io_context.ipp:64
    #17 0x4099bf in main /app/example.cpp:75

Looks like the following reference...

awaitable<void> handle_request(const Request& r)

...got a little messed up. What went wrong, though?

Coroutines are very complex objects that are deceptively easy to use thanks to syntactic sugar. That's the point! Language- and compiler-level support for async/await makes it easy to do things that are hard to do manually... This ease of use is a characteristic of high-level, safe languages with automatic memory management, such as Python, C#, JavaScript, and Kotlin, but not C++ or Rust (or Zig).

In the example above, we have at least three failure points that contain errors. You can reflect on this as we unfold the coroutine issues in C++.

What's a coroutine?

Let's nail down some terminology first, as many different definitions exist. Here's a rather abstract, high-level one: a coroutine is a function that can suspend its execution with the option to resume it later.

However, here's a nuance that leads to a common misunderstanding: the "magic" doesn't reside in the function itself but in the object it returns.

I hope JavaScript developers know that this:

async function myFunction() {
  return "Hello";
}

Is the same thing as this:

function myFunction() {
  return Promise.resolve("Hello");
}

Just like this in Rust:

async fn my_foo() -> String
{
  "Hello".to_string()
}

Is the same thing as this:

fn my_foo() -> impl Future<Output = String> {
  // creates the Future anonymous object
  async {
    "Hello".to_string()
  }
}

C++ doesn't have special syntax for declaring functions as coroutines. Instead, we explicitly specify a special return type (for example, awaitable). The requirements for this type are quite specific and aren't clear at first glance:

  • awaitable must satisfy std::coroutine_traits. By default, that means it must have the promise = typename awaitable::promise_type associated type.
  • The promise type must comply with the promise requirements, which are so complex that people write entire books about coroutines in C++. In short, promise controls the behavior of the co_await, co_yield, and co_return operators.
  • The handle = std::coroutine_handle<promise> instantiation must succeed.
  • It must be possible to construct awaitable using promise.get_return_object().

And only then, within the body of the function returning awaitable, it will be possible (and often necessary) to use the syntactic sugar—co_await, co_yield, and co_return:

awaitable<std::string> myFunction() {
  co_return "Hello";
}

Which sheds its sugar coating to reveal something like this underneath (rough pseudocode):

awaitable<std::string> myFunction() {
  using Promise = awaitable::promise_type;
  using Handle = std::coroutine_handle<Promise>;
  Promise p;
  auto state = new ImplicitlyGeneratedStateMachine<Handle>(p);
  // state = _0;
  // ....
  //{  
  //  switch(state) {
  //     case _0: { state = _1; p.initial_suspend(); }
  //     case _1: { p.return_value("Hello"); }
  //   }
  // }
  return p.get_return_object();
}

If none of the co_* keywords appear in the function body, no magic transformations happen. It's hard to believe this was done without some malicious intent. Look at this!

awaitable<void> process_request(const std::string& r) { 
  co_await some_io(1);
  std::cout << r; 
}

awaitable<void> send_dummy_request() {
  return process_request("hello");
}

int main(){
  boost::asio::io_context io_context(1);
  co_spawn(io_context, send_dummy_request(), detached);
  io_context.run();
}

We run it. We check it. Does it work? It doesn't successfully output anything... This is weird. Let's remove some_io().

awaitable<void> process_request(const std::string& r) { 
  std::cout << r; 
}

We run it. We check it. What do we get? That's right:

<source>: In function 'boost::asio::awaitable<void>
          process_request(const std::string&)':
<source>:19:73: warning: no return statement in function returning
 non-void [-Wreturn-type]
 19 | awaitable<void> process_request(const std::string& r) { std::cout << r; }
    |                                                                         ^
ASM generation compiler returned: 0
<source>: In function 'boost::asio::awaitable<void>
process_request(const std::string&)':
<source>:19:73: warning: no return statement in function
returning non-void [-Wreturn-type]
 19 | awaitable<void> process_request(const std::string& r) { std::cout << r; }
    |                                                                         ^
Execution build compiler returned: 0
Program returned: 132
Program terminated with signal: SIGILL

Of course. Sure. After all, there's no magic without magic keywords, and our process_request function returns nothing. This is undefined behavior!

We'll add co_return:

awaitable<void> process_request(const std::string& r) { 
  std::cout << r; 
  co_return;
}

Nothing is displayed again...

Let's bring the sanitizer in!

==1==ERROR: AddressSanitizer: stack-use-after-return on
address 0x758796100150 at pc 0x7587985d01e6
bp 0x7ffda95e0290 sp 0x7ffda95dfa50
READ of size 5 at 0x758796100150 thread T0
    #0 0x7587985d01e5 
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libasan.so.8+0x821e5)
       (BuildId: e522418529ce977df366519db3d02a8fbdfe4494)
    #1 0x758798428d1c in
       std::basic_ostream<char, std::char_traits<char> >&
       std::__ostream_insert<char, std::char_traits<char> >
       (std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libstdc++.so.6+0x14cd1c)
       (BuildId: 998334304023149e8c44e633d4a2c69800a2eb79)
    #2 0x405c47 in process_request /app/example.cpp:21
    #3 0x4082f5 in std::__n4861::coroutine_handle<void>::resume() const
       /opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/coroutine:137
    #4 0x427e8b in
       boost::asio::detail::awaitable_frame_base
       <boost::asio::any_io_executor>::resume()
       /app/boost/include/boost/asio/impl/awaitable.hpp:501
    #5 0x425e2c in
       boost::asio::detail::awaitable_thread
       <boost::asio::any_io_executor>::pump()
       /app/boost/include/boost/asio/impl/awaitable.hpp:770

Oh no, what a disaster, the const std::string& r reference seems to have died. Why?

Here's the reason:

awaitable<void> send_dummy_request() {
  return process_request("hello");
}

There are no magic co_* keywords here either. This means we simply call the process_request function, and it returns a coroutine object, which...

It's a good time to clarify what happens to the arguments of a function that uses co_*: they are implicitly copied into the implicitly created coroutine state. A value is copied as a value, a reference as a reference.

awaitable<std::string>
myFunction(int arg1, const std::string& arg2)
{
  using Promise = awaitable::promise_type;
  using Handle = std::coroutine_handle<Promise>;
  Promise p;
  auto state = new ImplicitlyGeneratedStateMachine<Handle>(p);
  //  state = _0;
  //  int __arg1 { arg1 };
  //  const std::string& __arg2 { arg2 };
  //{  
  //  switch(state) {
  //     case _0: { state = _1; p.initial_suspend(); }
  //     case _1: { p.return_value("Hello"); }
  //   }
  //}
  return p.get_return_object();
}

It means that:

awaitable<void> process_request(const std::string& r) {....}

awaitable<void> send_dummy_request() {
  // The local temporary std::string is implicitly constructed,
  // its reference is passed to the process_request function,
  // where it's further copied to the state machine.
  return process_request("hello");
  // We return the state machine, and the local temporary object dies.
  // Use-after-free when trying to get
  // the result of the state machine.
}

We need those magic words... Meanwhile, can you already feel how messy things can get when it comes to templates? No? What about when it's completely unclear whether we've received the coroutine or not? Still no? Oh, that's okay. Take this as a homework assignment: write std::invoke that supports coroutines. In the meantime, we'll keep adding magic words.

awaitable<void> send_dummy_request() {
  co_return process_request("hello");
}
<source>: In function
'boost::asio::awaitable<void> send_dummy_request()':
<source>:26:5: error: no member named 'return_value' in
'std::__n4861::coroutine_traits
<boost::asio::awaitable<void> >::promise_type' {aka
'boost::asio::detail::awaitable_frame
<void, boost::asio::any_io_executor>'}
   26 |   co_return process_request("hello");
      |   ^~~~~~~~~
Compiler returned: 1

Well, of course—process_request returns a coroutine... Let's break down what should happen step by step. After all, we want to understand and figure things out.

awaitable<void> send_dummy_request() {
  auto task = process_request("hello"); // The function returned
                                        // the state machine
                                        // coroutine.
  auto result = co_await task; // We need to wait for completion
  co_return result;            // and return the result.
}

Oops, here's a little hiccup...

<source>: In function
'boost::asio::awaitable<void> send_dummy_request()':
<source>:27:28: error: use of deleted function
'boost::asio::awaitable<T, Executor>::awaitable
(const boost::asio::awaitable<T, Executor>&)
[with T = void; Executor = boost::asio::any_io_executor]'
   27 |     auto result = co_await task;
      |                            ^~~~
In file included from /app/boost/include/boost/asio/co_spawn.hpp:22,
                 from <source>:8:
/app/boost/include/boost/asio/awaitable.hpp:123:3: note: declared here
  123 |   awaitable(const awaitable&) = delete;
      |   ^~~~~~~~~

This is a feature of Boost.Asio: for safety reasons, co_await can only be applied to an rvalue. Here's a little fix:

auto result = co_await std::move(task);

And we get...

<source>:28:10: error: deduced type 'void' for 'result' is incomplete
   28 |     auto result = co_await std::move(task);
      |          ^~~~~~

Does it finally hit you how bad things can get with templates? That's okay, let's not forget that C++ has always had issues with the wonderful void type. We'll just combine co_return and co_await into a single line:

awaitable<void> send_dummy_request() {
  auto task = process_request("hello"); // The function returned
                                        // the state machine
                                        // coroutine.
  
  co_return co_await std::move(task); // Let's wait for completion
                                      // and return the result.
}

The code compiles and...

==1==ERROR: AddressSanitizer: stack-use-after-return on address
0x76cc8d801070 at pc 0x76cc8fa541e6 bp 0x7ffed68d7510
sp 0x7ffed68d6cd0
READ of size 5 at 0x76cc8d801070 thread T0
    #0 0x76cc8fa541e5
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libasan.so.8+0x821e5)
       (BuildId: e522418529ce977df366519db3d02a8fbdfe4494)
    #1 0x76cc8f8acd1c in
       std::basic_ostream<char, std::char_traits<char> >&
       std::__ostream_insert<char, std::char_traits<char> >
       (std::basic_ostream<char, std::char_traits<char> >&,
       char const*, long)
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libstdc++.so.6+0x14cd1c)
       (BuildId: 998334304023149e8c44e633d4a2c69800a2eb79)
    #2 0x405c47 in process_request /app/example.cpp:20
    #3 0x408cc7 in std::__n4861::coroutine_handle<void>::resume() const
       /opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/coroutine:137
    #4 0x428cef in
       boost::asio::detail::awaitable_frame_base
       <boost::asio::any_io_executor>::resume()
       /app/boost/include/boost/asio/impl/awaitable.hpp:501
    #5 0x426e00 in
       boost::asio::detail::awaitable_thread
       <boost::asio::any_io_executor>::pump()
       /app/boost/include/boost/asio/impl/awaitable.hpp:770
    #6 0x432170 in
       boost::asio::detail::awaitable_async_op_handler
       <void (), boost::asio::any_io_executor>::operator()()
       /app/boost/include/boost/asio/impl/awaitable.hpp:804
    #7 0x431a95 in
       boost::asio::detail::binder0
       <boost::asio::detail::awaitable_async_op_handler
       <void (), boost::asio::any_io_executor> >::operator()()
       /app/boost/include/boost/asio/detail/bind_handler.hpp:56
    #8 0x431f04 in void
       boost::asio::detail::executor_function::complete
       <boost::asio::detail::binder0
       <boost::asio::detail::awaitable_async_op_handler
       <void (), boost::asio::any_io_executor> >, std::allocator<void> >
        (boost::asio::detail::executor_function::impl_base*, bool)
       /app/boost/include/boost/asio/detail/executor_function.hpp:113
    #9 0x40a5ac in boost::asio::detail::executor_function::operator()()
       /app/boost/include/boost/asio/detail/executor_function.hpp:61
    #10 0x42a66d in
       boost::asio::detail::executor_op
       <boost::asio::detail::executor_function, std::allocator<void>,
       boost::asio::detail::scheduler_operation>::do_complete
       (void*, boost::asio::detail::scheduler_operation*,
       boost::system::error_code const&, unsigned long)
       /app/boost/include/boost/asio/detail/executor_op.hpp:70
    #11 0x410754 in
        boost::asio::detail::scheduler_operation::complete
        (void*, boost::system::error_code const&, unsigned long)
        /app/boost/include/boost/asio/detail/scheduler_operation.hpp:40
    #12 0x41728b in
        boost::asio::detail::scheduler::do_run_one
         (boost::asio::detail::conditionally_enabled_mutex::scoped_lock&,
        boost::asio::detail::scheduler_thread_info&,
        boost::system::error_code const&)
        /app/boost/include/boost/asio/detail/impl/scheduler.ipp:493
    #13 0x41680d in
        boost::asio::detail::scheduler::run(boost::system::error_code&)
        /app/boost/include/boost/asio/detail/impl/scheduler.ipp:210
    #14 0x417be0 in boost::asio::io_context::run()
        /app/boost/include/boost/asio/impl/io_context.ipp:64
    #15 0x406b67 in main /app/example.cpp:36

This is the same issue as before. The string died for the same reason. What if now we put it all in one line of code?

awaitable<void> send_dummy_request() {
  co_return co_await process_request("hello");
}

Now the code is finally correct—the program displays the lovely "hello".

Isn't it great how we went from the incorrect...

awaitable<void> send_dummy_request() {
  return process_request("hello");
}

...to the correct code?

awaitable<void> send_dummy_request() {
  co_return co_await process_request("hello");
}

Could such beauty exist if C++ required marking function declarations with [co_]async? Then it would be boring, just like in Rust:

[[co_async]] awaitable<void> send_dummy_request() {
  return process_request("hello"); // Compilation error!
                                   // Type mismatch / co_return
                                   // should be used.
}

But it was all just syntactic fun, during which we caught a real class of errors: const references, rvalue references, and implicit creation of temporary objects. The state machine implicitly captures references, and implicit temporary objects implicitly die. The recipe is simple: explicitly create temporary variables, control their lifetimes, and avoid reference parameters in coroutines whenever possible. Let's go back to the very first example and fix the errors and potential reference issues.

// We now take all parameters for coroutines by value. 

awaitable<void> handle_request(Request r) {
  co_await some_io(15);
  std::cout << "Hello " << r << "\n";
  co_return;
}

template <std::ranges::range Requests>
awaitable<void> process_requests_batch(Requests reqs) 
requires
std::convertible_to<std::ranges::range_value_t<Requests>, Request>
{
  auto executor = co_await this_coro::executor;
  // we add runtime metrics to the query processing
  auto handle_with_metrics = 
   [metrics = MetricEmitter { "batch_processor" } ](auto request) ->
     awaitable<void>
  {
      auto start = steady_clock::now();
      co_await handle_request(std::move(request));
      auto finish = steady_clock::now();
      metrics.emit(duration_cast<milliseconds>(finish - start));
  };
  for (auto&& r: std::move(reqs)) {
    // we run a concurrent execution for each request.
    co_spawn(executor, handle_with_metrics(std::move(r)), detached);
  }
  co_return;
}

awaitable<std::vector<Request>> accept_requests_batch() {
  co_return std::vector{ "Adam"s, "Helen"s, "Bob"s };
}

awaitable<void> run() {
  co_await process_requests_batch(co_await accept_requests_batch());
  co_await some_io(100);
}

We compile the code. We run it. And we get... that's right, a new crash!

==1==ERROR: AddressSanitizer: heap-use-after-free on address
0x511000000228 at pc 0x7c812eca71e6 bp 0x7ffded723390
sp 0x7ffded722b50
READ of size 15 at 0x511000000228 thread T0
    #0 0x7c812eca71e5
       (/opt/compiler-explorer/gcc-14.2.0/lib64/libasan.so.8+0x821e5)
       (BuildId: e522418529ce977df366519db3d02a8fbdfe4494)
    #1 0x7c812eaffd1c in
       std::basic_ostream<char, std::char_traits<char> >&
       std::__ostream_insert<char, std::char_traits<char> >
       (std::basic_ostream<char, std::char_traits<char> >&,
       char const*, long)
        (/opt/compiler-explorer/gcc-14.2.0/lib64/libstdc++.so.6+0x14cd1c)
        (BuildId: 998334304023149e8c44e633d4a2c69800a2eb79)
    #2 0x41f592 in
       MetricEmitter::emit
       (std::chrono::duration<long, std::ratio<1l, 1000l> >)
       const /app/example.cpp:24
    #3 0x40ab8e in operator() /app/example.cpp:52
    #4 0x40f771 in std::__n4861::coroutine_handle<void>::resume() const
       /opt/compiler-explorer/gcc-14.2.0/include/c++/14.2.0/coroutine:137
    #5 0x4494e7 in
       boost::asio::detail::awaitable_frame_base
       <boost::asio::any_io_executor>::resume()
       /app/boost/include/boost/asio/impl/awaitable.hpp:501
    #6 0x445c86 in
       boost::asio::detail::awaitable_thread
       <boost::asio::any_io_executor>::pump()
       /app/boost/include/boost/asio/impl/awaitable.hpp:770
    #7 0x454cab in boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>::operator()
       (boost::system::error_code const&)
       /app/boost/include/boost/asio/impl/use_awaitable.hpp:93
    #8 0x4518ca in
       boost::asio::detail::binder1
       <boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>,
       boost::system::error_code>::operator()()
       /app/boost/include/boost/asio/detail/bind_handler.hpp:115
    #9 0x44f41b in void
       boost::asio::detail::handler_work
       <boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>,
       boost::asio::any_io_executor, void>::complete
       <boost::asio::detail::binder1
       <boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>,
       boost::system::error_code> >
       (boost::asio::detail::binder1
       <boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>,
       boost::system::error_code>&,
       boost::asio::detail::awaitable_handler<boost::asio::any_io_executor,
       boost::system::error_code>&)
       /app/boost/include/boost/asio/detail/handler_work.hpp:433
    #10 0x44cde2 in
       boost::asio::detail::wait_handler
       <boost::asio::detail::awaitable_handler
       <boost::asio::any_io_executor, boost::system::error_code>,
       boost::asio::any_io_executor>::do_complete
       (void*, boost::asio::detail::scheduler_operation*,
       boost::system::error_code const&, unsigned long)
       /app/boost/include/boost/asio/detail/wait_handler.hpp:76
    #11 0x417556 in
       boost::asio::detail::scheduler_operation::complete
        (void*, boost::system::error_code const&, unsigned long)
       /app/boost/include/boost/asio/detail/scheduler_operation.hpp:40
    #12 0x41e9ff in
       boost::asio::detail::scheduler::do_run_one
        (boost::asio::detail::conditionally_enabled_mutex::scoped_lock&,
       boost::asio::detail::scheduler_thread_info&,
       boost::system::error_code const&)
       /app/boost/include/boost/asio/detail/impl/scheduler.ipp:493
    #13 0x41ddd5 in
       boost::asio::detail::scheduler::run(boost::system::error_code&)
       /app/boost/include/boost/asio/detail/impl/scheduler.ipp:210
    #14 0x41f354 in boost::asio::io_context::run()
       /app/boost/include/boost/asio/impl/io_context.ipp:64
    #15 0x4099ae in main /app/example.cpp:75

Judging by the trace, we now have another dead string—the one stored in MetricEmitter.

template <std::ranges::range Requests>
awaitable<void> process_requests_batch(Requests reqs) 
requires
std::convertible_to<std::ranges::range_value_t<Requests>, Request>
{
  auto executor = co_await this_coro::executor;
  // we add runtime metrics to the query processing
  auto handle_with_metrics =
    [metrics = MetricEmitter { "batch_processor" } ](auto request) ->
      awaitable<void>
  {
    auto start = steady_clock::now();
    co_await handle_request(std::move(request));
    auto finish = steady_clock::now();
    metrics.emit(duration_cast<milliseconds>(finish - start));
  };
  for (auto&& r: std::move(reqs)) {
    // we run a concurrent execution for each request.
    co_spawn(executor, handle_with_metrics(std::move(r)), detached);
  }
  co_return;
}

If you haven't guessed it yet, here's a non-obvious reminder: coroutines can be class member functions as well.

struct MetricEmitter {
  std::string metric_class;
  void emit(std::chrono::milliseconds elapsed) const {
    std::cout << metric_class << " " << elapsed << "\n";
  }

  awaitable<void> wrap_request(Request r) const {
    auto start = steady_clock::now();
    co_await handle_request(std::move(r));
    auto finish = steady_clock::now();
    // The coroutine also implicitly captures this pointer!
    emit(duration_cast<milliseconds>(finish - start));
  }
};

// So, it's probably obvious that
auto task =
  MetricEmitter{"batch_process"}.wrap_request(request);
co_await task; // MetricEmitter is dead. We'll have use-after-free

We have something similar, although not identical, in our example.

// handle_with_metric is an anonymous structure with
// certain operator().
auto handle_with_metrics =
  [metrics = MetricEmitter { "batch_processor" } ](auto request) ->
    awaitable<void>
{
  auto start = steady_clock::now();
  co_await handle_request(std::move(request));
  auto finish = steady_clock::now();
  // The coroutine implicitly captures this...
  // In this case, this points to the lambda's closure object! 
  metrics.emit(duration_cast<milliseconds>(finish - start));
};
// If the lambda function dies before the coroutine execution completes,
// there will be use-after-free.

Let's see how we can call it:

for (auto&& r: std::move(reqs)) {
    // We run a concurrent execution for each request...
    // IN THE BACKGROUND!!! The result of calling 
    // the handle_with_metrics coroutine
    // is stored somewhere inside the boost::asio::co_spawn function.
    co_spawn(executor, handle_with_metrics(std::move(r)), detached);
    // It'll be handled later.
}
co_return; // Here's where our lambda dies.

For such unpleasant cases, co_spawn has an overload that takes a function directly rather than awaitable<T>. However, the function must have no arguments.

We can fix the error in different ways by following the recommendation: all coroutine parameters should be passed by value and moved inside the body. In C++23, deduced this can help in the case of class member functions:

struct MetricEmitter {
  std::string metric_class;
  void emit(std::chrono::milliseconds elapsed) const {
    std::cout << metric_class << " " << elapsed << "\n";
  }

  awaitable<void> wrap_request(this auto self, Request r) {
    // Self is copied by value!
    auto start = steady_clock::now();
    co_await handle_request(std::move(r));
    auto finish = steady_clock::now();
    // The coroutine also implicitly captures this pointer!
    self.emit(duration_cast<milliseconds>(finish - start));
  }
};

Returning coroutines from stateful lambdas isn't recommended. Take away the capture list!

template <std::ranges::range Requests>
awaitable<void> process_requests_batch(Requests reqs)
requires
std::convertible_to<std::ranges::range_value_t<Requests>, Request>
{
  auto executor = co_await this_coro::executor;
  // We add runtime metrics to the query processing.
  auto handle_with_metrics = [](auto request) -> awaitable<void> {
    auto metrics = MetricEmitter { "batch_processor"}; 
    auto start = steady_clock::now();
    co_await handle_request(std::move(request));
    auto finish = steady_clock::now();
    metrics.emit(duration_cast<milliseconds>(finish - start));
  };
  for (auto&& r: std::move(reqs)) {
    // We run a concurrent execution for each request.
    co_spawn(executor, handle_with_metrics(std::move(r)), detached);
  }
  co_return;
}

Everything works now.

Hello Adam
batch_processor 15ms
Hello Helen
batch_processor 15ms
Hello Bob
batch_processor 15ms

Manually tracking reference lifetimes in asynchronous code is extremely difficult. Automation, as in Rust, does this much better, but it may also produce completely incomprehensible reports that can only be understood if you already know what could have gone wrong. This is why async Rust is disliked and criticized. The simplest and most productive way to satisfy the borrow checker is to copy everything (.clone(), .clone() everywhere).

C++ gives us total control, but with it comes an incredible amount of implicit reference captures! We can do whatever we want with them, and the code will compile without any issues or checks. We can put a lot of effort into keeping track of all the references and ensuring that objects don't die at the wrong time. Alternatively, get desperate, read the guidelines, and always pass everything by value, copying and moving. No references needed.

The reference issue is only the beginning. Things get much worse if we also enable multi-threaded coroutine execution.

  • If we acquire a lock with std::unique_lock and hold it across a co_await, we should prepare for undefined behavior: after co_await, the coroutine may resume on another thread, and we'll end up unlocking a mutex from a thread that doesn't own it! Only careful attention, experience, and maybe a static analyzer can help avoid these pitfalls.
  • Of course, there's even more room for race conditions, especially if one forgets to copy something inside the coroutine body.

Oh yeah, there's one more thing: depending on the implementation, coroutines can be lazy or not so lazy (check out promise::initial_suspend()). boost::asio::awaitable is lazy, which is why we got that beautiful use-after-free right away. For coroutines whose promise::initial_suspend() returns suspend_never, the following code...

awaitable<void> process_request(const std::string& r) { 
  std::cout << r; 
  co_await something();
  /* r isn't used */
  co_return;
}

...can continue to work successfully and not cause any issues for a long time.

C++ coroutines are flexible, powerful, and completely unsafe. I hope you've configured your build and tests with sanitizers before using them.

As of 2024, C++ static analyzers partially highlight such issues:

  • Clang-Tidy has an aggressive check for using references in coroutines;
  • the analyzer can also be configured to highlight stateful lambdas;
  • but they won't notify us that we need to use co_return co_await foo() instead of return foo().

However, these aren't errors in a general sense. We can successfully use references and not pay for copies. We can also avoid unnecessary wrapping in a coroutine layer and return foo() directly.

Afterword: static analysis and UB

Undefined behavior (UB) is frequently detected using static code analysis, but it's challenging to precisely describe what the analyzers should look for. There's no specific pattern or description of what exactly needs to be detected. Undefined behavior is a collection of various ways to write incorrect code. Moreover, in the realm of UB, a static analyzer usually addresses separately each vast continent of ever-changing errors: dereferencing null pointers, signed integer overflows, buffer overflows, and so on.

For analyzer developers, detecting UB is both a challenging and interesting task. A single technology isn't enough here. Data-flow analysis can help detect an invalid pointer, but it's of no use when the goal is, say, to detect the use of reserved names. Detecting UB means detecting a plethora of special cases of errors.

In other words, it involves searching for a variety of patterns, regularities, and edge cases. This is why Andrey Karpov and the PVS-Studio team were so eager to participate in the creation of this book. A lot has already been accomplished in analyzer tools to detect UB, yet there remains just as much work to be done. We hope that this book will become a guiding star and a source of inspiration for developers.

Thanks to Dmitry from the PVS-Studio team!

One more thing

You can refer to this book, published as a series of articles. You can provide examples from these articles, as long as proper references are given. You must obtain the author's permission for copying or any other reproduction. Contact me: dmisvrl1@gmail.com

This content must not be used in paid services or for any fee-based teaching.

Author: Dmitry Sviridkin

Dmitry has over eight years of experience in high-performance software development in C and C++. From 2019 to 2021, he taught Linux system programming at SPbU and hands-on C++ courses at HSE. He currently works on system and embedded development in Rust and C++ for edge servers as a Software Engineer at AWS (CloudFront). His main area of interest is software security.

Editor: Andrey Karpov

Andrey has over 15 years of experience with static code analysis and software quality, and is the author of numerous articles on writing high-quality C++ code. He received the Microsoft MVP award in the Developer Technologies category from 2011 to 2021. Andrey is a co-founder of the PVS-Studio project. He was the company's CTO for a long time and was involved in developing the C++ analyzer core. Andrey is currently responsible for team management, personnel training, and DevRel activities.

It's time to see what UBs PVS-Studio can find in your code!