Andrey Karpov , Dmitry Sviridkin

Aug 05 2024

Tags:

#Cpp #Knowledge

C++ programmer's guide to undefined behavior: part 3 of 11

Aug 05 2024

Author: Andrey Karpov , Dmitry Sviridkin

Errors in object lifetime: dangling references, pointers, and use-after-free: general cases
Errors in object lifetime: auto or non-auto, reference is still dangling!
- Problem of explicit type declaration
- Problems of automatic type deduction
Errors in object lifetime: string_view is just const& but with extra hassle
Errors in object lifetime: a fly in the syntactic sugar (range-based for)
- C++23
Errors in object lifetime: Schrödinger's self-reference
Errors in object lifetime: std::vector and reference invalidation

Your attention is invited to the third part of an e-book on undefined behavior. This is not a textbook, as it's intended for those who are already familiar with C++ programming. It's a kind of C++ programmer's guide to undefined behavior and to its most secret and exotic corners. The book was written by Dmitry Sviridkin and edited by Andrey Karpov.

Errors in object lifetime: dangling references, pointers, and use-after-free: general cases

80% of undefined behavior in C++ occurs because of these things.

The object lived on the stack and died on the stack. The object lived in the heap and died in the heap. There isn't much difference. The bug reproduction scenario is the same: there's a pointer or a reference to an already dead object somewhere. Then a developer uses that reference (or pointer) to access a dead object. Such a spiritual séance leads to undefined behavior. If lucky, we just get a segmentation fault that'll give a chance to find out exactly the accessed object.

However, the first stack/heap object may differ in whether the dynamic analysis method can detect it or not. To get the stack instrumented with sanitizers, we should recompile a program. To get the instrumented stack, we can substitute the allocator library.

Indeed, it's pretty rare to see code like this because most devs don't write it this way:

int main() {
  int* p = nullptr;
  {
    int x = 5;
    p = &x;
  }
  return *p;
}

However, such C++ code can be cleverly hidden in a layer of abstractions from classes and functions.

Here's a simple example:

int main() {
  const int x = 11;
  auto&& y = std::min(x, 10);
  std::cout << y << "\n";
}

This code has undefined behavior because of a dangling reference. For example, if we use the GCC 14.1 compiler with -std=c++17 -O3, the program won't crash but will unexpectedly output 0.

The problem is that std::min is declared as:

template<class T> const T& min(const T& a, const T& b);

The number 10 is a temporary object (prvalue) that dies immediately when it leaves the std::min function.

In C++, we can assign temporary objects to constant references. In this case, the constant reference extends the lifetime of the temporary object (the object "materializes") and lives until the reference leaves its scope. Further assignments to constant references won't extend the lifetime.

Any code that returns a reference or raw pointer from a function or method is a potential source of problems anywhere. If you see passed-by-reference arguments without a specified save location, it may also be an error hotspot, but in much more subtle cases.

template <class T>
void append_n_copies(std::vector<T>* elements,
                     const T& x, int N)
{
  for (int i = 0; i < N; ++i) {
    elements->push_back(x);
  }
}

void foo() {
  std::vector<int> v; v.push_back(10);
  ...
  append_n_copies(&v, v.front(), 5); // we get UB at vector reallocation! 
}

This code has all the opportunities to burst into a real project and ensnare developers.

In non-performance-critical code blocks, it's better to use pass-by-value and shifting instead of pass-by-reference. While we can't prevent all issues, this approach limits errors to the function call instead of spreading them throughout its body.

template <class T>
std::vector<T> append_n_copies(std::vector<T> elements,
                               T x, int N) {
  for (int i = 0; i < N; ++i) {
    elements.push_back(x);
  }
  return elements;               // implicit move
}

void foo() {
  std::vector<int> v; v.push_back(10);
  ...
  // v = append_n_copies(std::move(v), v.front(), 5);
  // UB, use-after-move, order of evaluation is unspecified:
  // v.front() it can be called on empty vector

  auto el = v.front();
  v = append_n_copies(std::move(v), std::move(el), 5);
}

If you need to use references, you're better of thinking about their security.

For example, we can use std::reference_wrapper, which can't be assigned temporary objects.

#include <utility>

template <class T>
std::reference_wrapper<const T>
  safe_min(std::reference_wrapper<const T> a,
           std::reference_wrapper<const T> b)
{
  return std::min(a, b);
}

int main() {
  const int x = 11;
  auto&& y = safe_min<int>(x, 11);   // compilation error
}

Or we can use forwarding references to analyze the category (rvalue/lvlaue) of the passed argument and decide what to do with it. In C++20, it looks like this:

#include <type_traits>

template <class T1, class T2>
requires
  std::is_same_v<std::decay_t<T1>,
                 std::decay_t<T2>> // std::min requires identical types
decltype(auto) // interfere without dropping references
safe_min(T1&& a, T2&& b) // forwarding reference to each argument.
{
  if constexpr (std::is_lvalue_reference_v<decltype(a)> &&
                std::is_lvalue_reference_v<decltype(b)>) {
      // both arguments were lvalue: we can safely return reference
      return std::min(a, b);
  } else {
    // one of the arguments is a temp object.
    // return it by value.
    // to do this, copy it.
    auto temp = std::min(a,b); // auto&& NO!
                               // otherwise return will output the reference
    return temp;
  }
}

The standard library has safe versions of std::min and std::max that accept arguments by value and return the result by value. Moreover, they "support" more than two arguments.

const int x = 11;
const int y = 20;
auto&& y = std::min({x, 10, 15, y}); // OK

It looks like the issue with forwarding references affects only the const references. It seems like everything should be fine since non-constant references don't have a redundant lifetime extension. However, this isn't quite true.

Above, we cover only free functions and similar static class functions.

However, it's common for class functions to return references. The issues with them are similar but less apparent. It all comes down to passing this pointer to the current object.

For example, it's quite tricky to safely implement the conditional Builder with method chaining:

class VectorBuilder {
  std::vector<int> v;

public:
  VectorBuilder& Append(int x) {
    v.push_back(x);
    return *this;
  }

  const std::vector<int>& GetVector() { return v; }
};

int main() {
  auto&& v = VectorBuilder{}
              .Append(1)
              .Append(2)
              .Append(3)
              .GetVector(); // dangling reference
}

It's all because of a dying object that returns a reference to its contents.

If we overload only GetVector to see the difference between lvalue and rvalue, the problem doesn't disappear:

class VectorBuilder {
  ....
  const std::vector<int>& GetVector() & {
    std::cout << "As Lvalue\n";
    return v;
  }

  std::vector<int> GetVector() && {
    std::cout << "As Rvalue\n";
    return std::move(v);
  }
};

We'll get the "As Lvalue" message. The Append chain implicitly turned an unnamed temporary object into a not-quite-temporary object.

Append also needs to be overloaded to handle the rvalue and lvalue cases:

class VectorBuilder {
  ....
  // lvalue
  VectorBuilder& Append(int x) & {
    v.push_back(x);
    return *this;
  }

  VectorBuilder&& Append(int x) && {
    v.push_back(x);
    return std::move(*this);
  }
};

We've handled the dangling reference to the vector elements.

However, if we want to write it this way:

auto&& builder = VectorBuilder{}.Append(1).Append(2).Append(3);

Again, we'll receive the dangling reference but to the VectorBuilder object. Don't blame the innocent Apprend overload. The implicit this originally glues on the temporary object and extends its lifetime by one time.

To prevent this, we need to choose one between one of the options.

Configure the code analyzer that restricts using auto&& and const auto& with this class in the right part;
Sacrifice performance. The Rvalue version of Append can always return VectorBuilder by value (with the shift). If the VectorBuilder contains many primitive objects, the performance drawdown can be noticeable.

Fully restrict using VectorBuilder in the rvalue context like in the code below:

class VectorBuilder {
  ....
  auto Append() && = delete;
}

However, we couldn't build chains starting from the unnamed temporary object.

Also, it's better to avoid juggling with operation chains on temporary objects like op= (+=, -=, /=). It's rare for them to handle the rvalue case:

struct Min {
  int x;

  Min& operator += (const Min& other) {
    x = std::min(x, other.x);
    return *this;
  };
};


int main() {
  auto&& m = (Min{5} += Min {10});
  return m.x; // dangling reference
}

The program returns the null value. What's interesting is that the undefined behavior causes the compiler to immediately generate the null return. Here's the assembly code (GCC 14.1, -std=c++20 -O3):

main:
  xor eax, eax
  ret

Or here's the code what happens if we use the standard library types:

int main() {
    using namespace std::literals;
    auto&& m = "hello"s += "world";
    std::cout << m; // dangling reference
}

This program, built by GCC 10.1, -std=c++20 -O3, doesn't crash, but it doesn't output anything either. If we take GCC 14.1 and the same keys, we suddenly get "helloworld" in the output. It's old but gold undefined behavior.

Errors in object lifetime: auto or non-auto, reference is still dangling!

When we work with standard containers, we deal with very long and cumbersome type names (std::vector<std::pair<T1, T2>>::const_iterator).

Starting with C++11, the previously useless auto keyword is used to indicate the compiler to auto-infer the type of variable—or, starting with C++ 14, to auto-infer the return value of the function. There is also the decltype(auto) construct that works the same way but in a different way.

Not everyone likes auto-inference in C++. Some developers encourage explicit type declaration, as the code become clearer—especially in the case of the function return value:

template <class T>
auto find(const std::vector<T>& v, const T& x) {
  ....
  // too long body with many returns
  ....
}

What is this auto here? Is it bool, an index, an iterator, or something even more complex and scary? It'd rather explicitly declared it...

It's not as simple as it seems, to choose between explicit type declaration or auto-interference. In both cases, you may bump into one problem or another.

Problem of explicit type declaration

Does explicit type declaration make you write tons of code? The using aliases can help with that. So, it's not a problem. One other thing to keep in mind is that if you're changing one type, you'll probably need to make other synchronized changes.

It could have all been fine: make a change in the declaration, get some compilation errors, and fix them in all the places the errors pointed out. However, C++ has implicit type conversion, which is particularly punishing developers when they use references.

std::map<std::string, int> counters = { {"hello", 5}, {"world", 5} };

// get list of keys, use string_view,
// to avoid redundant copies
std::vector<std::string_view> keys;
keys.reserve(counters.size());
std::transform(std::begin(counters),
               std::end(counters),
               std::back_inserter(keys),
               [](const std::pair<std::string, int>& item) ->
                 std::string_view {
                   return item.first;
               });

// handle list of keys somehow:
for (std::string_view k : keys) {
  std::cout << k << "\n";         // UB! dangling reference!
}

We've made a tiny mistake in the lambda function argument and got a reference to a temporary object with undefined behavior as a gift.

Here's an example of how UB affects the specific options of the GCC 14.1 compiler:

-std=c++17 -O3: outputs "world" twice;
-std=c++17: outputs nothing.

We fix the error by adding const before string:

[](const std::pair<const std::string, int>& item) -> std::string_view

Now we get what we want:

hello
world

It's been a few weeks now, and the code has been refactored. The counters map moves to some class field. The second method now obtains and processes keys. Then, it turns out that the value type in the associative array should be changed to a smaller one—let's go with short.

We change it. We forget about the key handling method. The compiler doesn't issue the warning message.

We run the test and come back to the same error. open icon

int main() {
  std::map<std::string, short> counters = { {"hello", 5}, {"world", 5} };

  // get the list of keys, use string_view,
  // to avoid redundant copies
  std::vector<std::string_view> keys; 
  keys.reserve(counters.size());
  std::transform(std::begin(counters), 
                 std::end(counters),
                 std::back_inserter(keys),
                 [](const std::pair<const std::string, int>& item) ->
                   std::string_view {
                     return item.first;
                 });
  // process list of keys somehow:
  for (std::string_view k : keys) {
    std::cout << k << "\n"; // UB! dangling reference!
  }  
}

Here's the program output, built by the GCC 14.1 compiler:

with -std=c++17 -O3: doubled "world" again.
with -std=c++17: outputs nothing.

In the example, the solution is to replace the explicit type of the lambda function argument with const auto&.

Here's another example: there's no argument, but there's a return value:

template <class K, class V>
class MapWrapper : private std::map<K, V> {
public:
  template <class Predicate>
  const std::pair<K, V>& find_if_or_throw(Predicate p) const {
    auto item = std::find_if(this->cbegin(), this->cend(), p);
    if (item == this->cend()) {
      throw std::runtime_error("not found");
    }
    return *item;
  }
}

We're mistaken again. We need to make another fix. There's a chance that the std::map container will change to something else in the future. If that happens, the iterator won't return the true pair, but the proxy object. In this case, the universal solution would be to decltype(auto) as the return value.

Problems of automatic type deduction

The auto keyword is handy in at least four different approaches.

1. Casual auto. No problems. The result is always a type with no references:

auto x = f(...); // but it may not be what you want:
                 // copy instead reference

class C {
public:
   auto Name() const { return name; } // copy instead reference
private:
   T name;
};

2. const auto&: it can be binded to the dangling reference:

const auto& x = std::min(1, 2);
// x — dangling reference

If the compiler warnings are properly configured, in 90% of cases you won't be able to use const auto& as the return value of the function.

Here's the GCC compiler (-std=c++17 -O3 -Werror) output:

<source>: In function 'const auto& add(int, int)':
<source>:9:14: error: returning reference to temporary
  [-Werror=return-local-addr]
    9 |     return x + y;
      |            ~~^~~
cc1plus: all warnings being treated as errors
Compiler returned: 1

3. The auto&& variable: universal/forwarding reference. Likewise, it can bind to the dangling reference:

auto&& x = std::min(1, 2);
// x — dangling reference

When it comes to the return value, this case is similar to const auto& . We get the compilation error when building with -Werror.

4. The type of decltype(auto) variable is deduced "as declared". If there is a reference on the right, the deduced type will be a reference. If there is no reference on the right, there will be no reference on the left. In some ways it's similar to auto&& when we declare variables, but not quite:

auto&& x = 5;
static_assert(std::is_same_v<decltype(x), int&&>);
decltype(auto) y = 5;
static_assert(std::is_same_v<decltype(y), int>);

The auto&& variable is always a reference, while decltype(auto) is "as declared in the return value". This might be useful for further type evaluations.

The decltype(auto) variable can behave unpredictably when we use it as the return value that needs more attention when we write code:

class C {
public:
  decltype(auto) Name1() const {
    return name; // copy. name is declared as T
  }

  decltype(auto) Name2() const {
    return (name); // reference. (name) expression has const T& type:
                   // but (name) is T&,
                   // but 'this' is labeled as const, so
                   // it means that const T&
  }

  decltype(auto) ConstName() const {
    return const_name; // const is copy. const_name is declared as const T
  }

  decltype(auto) DataRef() const {
    return data_ref; // DataT&, as declared.
    // return (data_ref); it's the same.
    //                    const from this doesn't extend further for
    //                    reference fields and pointers.
  }

  decltype(auto) DanglingName1() const {
    auto&& local_name = Name1(); // returns copy.
                                 // Copy is glued to rvalue reference
    return local_name; // local_name is a reference to local variable
  }

  decltype(auto) DanglingName2() const {
    auto local_name = Name1(); // returns copy.
    return (local_name); // (local_name) is reference to local variable
  }

  decltype(auto) NonDanglingName() const {
    decltype(auto) local_name = Name1(); // returns copy.
    return local_name; // returns copy.
  }

private:
  T name;
  const T const_name;
  DataT& data_ref;
};

The decltype(auto) type is a fragile and delicate mechanism that can turn everything upside down with a minimal change in the code: "redundant" curly brackets or &&.

Useful links

MSDN. decltype (C++).
Arthur O'Dwyer. On lifetime extension and decltype(auto).
Stack Overflow. What are some uses of decltype(auto)?

Errors in object lifetime: string_view is just const& but with extra hassle

C++17 gave us std::string_view to kill two birds with one type:

problems with function overloads that should operate well with both C and C++ strings;
programmers who sometimes forget about the rules of the lifetime extension for temporary objects.

So, the problem is that the function wants to count how many times a particular character appears in the string.

int count_char(const std::string& s, char c) {
  ....
}

count_char("hello world", 'l');
// temporary std::string object will be created,
// memory will be allocated, string will be copied, 
string will die, and memory will 
// be deallocated—it's bad, many redundant operations

So, we need to overload the C strings:

int count_char(const char* s, char c) {
  // we don't know anything about string length here.
  // is it null-terminated, though?

  // we can only write code,
  // naively expecting that it
  // will be correctly called.
  ....
}

We'll either duplicate the code, adapting it a bit for C strings, or we'll write a function:

int count_char_impl(const char* s, size_t len, char c) {
  ....
}

There, we add all the duplicated code and use function overloading:

int count_char(const std::string& s, char c) {
  return count_char_impl(s.data(), s.size(), c);
}

int count_char(const char* s, char c) {
  return count_char_impl(s, strlen(s), c);
}

Our rescue ranger, string_view, comes to help. It's just a pointer and a size. And it kills both overloads:

int count_char(std::string_view s, char c) {
  ....
}

Overall, everything is great, good, and wonderful. However, there is one catch:

The little "catch" is std::string_view. It's in fact a reference type like const&, we can construct it from temporary values. However, we don't extend the lifetime as we did with const&. Well, the lifetime will actually be extended, but not in the way we're expecting.

auto GetString = []() -> std::string { return "hello"; };
std::string_view sv = GetString();
std::cout << sv << "\n"; // dangling reference!

In this example, we're almost clearly shooting ourselves in the foot. We can make it a bit less obvious:

std::string_view common_prefix(std::string_view a, std::string_view b) {
  auto len = std::min(a.size(), b.size());
  auto common_count = [&]{
    for (size_t common_len = 0; common_len < len; ++common_len) {
        if (a[common_len] != b[common_len]) {
            return common_len;
        }
    }
    return len;
  }();
  return a.substr(0, common_count);
}

int main() {
  using namespace std::string_literals;
  {
    auto common =
      common_prefix("helloW",
                    "hello"s + "World111111111111111111111");
   std::cout << common << "\n"; // ok
  }
  {
    auto common =
      common_prefix("hello"s + "World111111111111111111111111",
                    "helloW");
    std::cout << common << "\n"; // dangling ref
  }
}

The case with is similar to std::min. However, protecting against the common_prefix function is harder if we wrap it in the template with the rvalue/lvalue analysis. We need to handle both const char* and std::string cases for each argument. Actually, it's precisely that kind of complexity std::string_view "spared" us from.

We can handle string_view more elegantly, too:

struct Person {
  std::string name;

  std::string_view Initials() const {
    if (name.length() <= 2) {
      return name;
    }
    return name.substr(0, 2); // copy — dangling reference!
  }
};

We can see that Clang (18.1.0) at least issues the warning, while GCC (14.1) does not.

Clang, 18.1.0, -std=c++17 -O3: open icon

<source>:16:16: warning: returning address of local temporary object
  [-Wreturn-stack-address]
   16 |         return name.substr(0, 2); // copy -- dangling reference!
      |                ^~~~~~~~~~~~~~~~~

All because std::string_view is so legendary that the first thing Clang made for it was at least some lifetime checker.

Useful links

Arthur O'Dwyer. std::string_view is a parameter-only type.
Jonathan. std::string_view accepting temporaries: good idea or horrible pitfall?
Nascardriver. C++ Tutorial. Introduction to std::string_view.
Herb Sutter. CppCoreGuidelines. Lifetime safety: Preventing common dangling.

Errors in object lifetime: a fly in the syntactic sugar (range-based for)

As we found out earlier, the constant lvalue—and rvalue, though—references are a lot of fun in C++ thanks to the lifetime extension rule for temporary objects.

The rule is tricky because it's not just that const&& or && extend the lifetime of the temporary object (but only the first such reference). Here's the rule: all temporary objects live until the execution of the entire statement that includes them is finished. Roughly speaking, they live until the nearest semicolon (;), or until the end of the scope of the first const& or && reference that gets in the way of this temporary object, if the reference scope is larger than the lifetime of this temporary object.

It means that:

const int& x = 1 + 2; // there are temporary objects 1 and 2
// that generate temporary object: 3 (sum).
// their lifetime ends on semicolon
// but we assign 3 to constant reference,
// its scope extends further down;
// so, its lifetime is extended.
// Thus: 1 and 2 die, 3 is still alive


const int& y =
  std::max([](const int& a, const int& b) -> const int&
  {
    return a > b ? a : b;
  } (1 + 2, 4), 5); // temporary objects 1,2,3(sum),4, and 5 live 
until THIS semicolon

// 3 and 4 are assigned to constant references in lambda function arguments.
// scope of these references ends after return
// - it's LESS than temporary object lifetime.
// references didn't extend anything but 
deprived temporary object of extended lifetime

// 5 is added to constant reference in std::max argument.
// with references to 4 and 5, std::max successfully operates,
// their lifetime is still. references are valid.

// result reference is assigned to `y`. lifetime no longer extends.
// all temporary objects have already tried 
// their luck on function arguments, no success.
// it stumbles over semicolon, lifetime of all objects (1,2,3,4 and 5) ends.
// `y` is dangling. bring down curtain on it.

With our newfound insights, let's look at another example and start unraveling things once more:

struct Point {
  int x;
  int y;
};

struct Shape {
public:
  using VertexList = std::vector<Point>;
  VertexList vertexes;
};

Shape MakeShape() {
  return { Shape::VertexList{ {1,0}, {0,1}, {0,0}, {1,1} } };
}

int main() {
  for (auto v : MakeShape().vertexes) {
    std::cout << v.x << " " << v.y << "\n";
  }
}

Everything is working as expected. open icon

Let's build the code (GCC 14.1,-std=c++20 -O3) and make sure it outputs the following:

Let's increase encapsulation, do a minimum refactoring, and make vertexes as the private field with the read-only access:

struct Shape {
public:
  using VertexList = std::vector<Point>;
  explicit Shape(VertexList v) : vertexes(std::move(v)) {}

  const VertexList& Vertexes() const {
    return vertexes;
  }

private:
  VertexList vertexes;
};

....

int main() {
  for (auto v : MakeShape().Vertexes()) {
    std::cout << v.x << " " << v.y << "\n";
  }
}

The code has been broken. Oh no, undefined behavior is here. open icon

The code built by old GCC 10.2 (-std=c++20 -O3) outputs garbage values:

5741 0
243908863 -1980499632
0 0
1 1

If we build it with GCC 14.1 (-std=c++20 -O3), we get "Program terminated with signal: SIGSEGV".

How could it be? Although the range-based for header looks like a single expression, it's written and seen as a single expression, it's not a single expression.

Since the C++17 standard, we have the construction:

for (T v : my_cont) {
  ....
}

It roughly extends to the following:

auto&& container_ = my_cont; // sic!
auto&& begin_ = std::begin(container_);
auto&& end_ = std::end(container_);
for (; begin_ != end_; ++begin_) {
  T v = *begin_;
}

In the first case, we get:

auto&& container_ = MakeShape().vertexes;

The temporary Shape object lives until the semicolon. It hasn't encountered any const& or && references yet. The vertexes subobject is seen as temporary too. Its lifetime will end with the semicolon sign. However, it encounters the && reference, whose scope extends further down and extends its lifetime. And it extends not only the vertexes subobject lifetime but the whole temporary Shape object.

In the second case, we get:

auto&& container_ = MakeShape().Vertexes();

The temporary Shape object lives until the semicolon. However, it encounters the implicit const& reference in the Vertexes() function. Its scope is limited to the function body. The lifetime isn't extended. The reference to the part of the temporary object is returned and assigned to container_ reference. It comes down to the semicolon. The Temporary Shape dies, and container_ becomes the dangling reference. Bring down the curtain on it. Again.

That's how simple and broken it is.

How to avoid the range-based for issue?

Never forget to do the rvalue overload for any const methods.
Never use any expression after a colon (:) in the loop header. Use only variables or its fields.
In C++20, use the range-based for syntax with the initializer: for (auto cont = expr; auto x : cont).
If you use the syntax with an initializer, think before using auto&& or const auto& for the initializer. However, it's not only about for...
Use std::ranges::for_each;
Don't use range-based for in C++ until it's fixed.

C++23

The extension of the object lifetime in the range-based for header has finally been fixed. Now it's harder to get a dangling reference. However, it's now easier to catch another problem such as the lifetime extension.

Useful links

Cppreference. std::ranges::for_each, std::ranges::for_each_result.
Cppreference. Range-based for loop.
Cppreference. Reference initialization. Lifetime of a temporary.
Nicolai Josuttis. WG21 D2012R0. Heal C++: Fix the range‐based for loop, Rev0.
Šimon Tóth. C++20 Ranges — Complete Guide.

Errors in object lifetime: Schrödinger's self-reference

The object scope begins immediately after it's declared—in the same line. Therefore, it's very easy in C++ to construct a syntactically correct expression using an object that has not yet been constructed.

// plain and simple
int x = x + 5; // UB

//--------------
// less explicit
const int max_v = 10;

void fun(int y) {
 const int max_v = [&]{
   // local max_v overrides global max_v
   return std::min(max_v, y);
 }();
 ....
}

Indeed, nobody would write such code on purpose. However, it can occur spontaneously when developers use auto-refactoring tools. The local max_v in the second example could have been called another way. Somebody used auto-renaming and got the code with undefined behavior instead of non-compiled code.

Although no issue arises in the following case:

const int max_v = 10;

void fun(int y) {
 const int max_v = [y]{
   // here's only global max_v
   return std::min(max_v, y);
 }();
 ....
}

Code that escapes into undefined behavior when only one character is added. Just the way we like it.

Such code is syntactically valid, and nobody is going to stop you from writing it. What's more, it also doesn't always lead to UB. Roughly speaking, UB only occurs when you dereference this object. Why roughly? Because the rules are similar to nullptr dereferencing. That's why these rules are rather confusing, not simply "you can never use it, you will always catch UB." Although, adopting such a radical interpretation could save you out from a lot of trouble.

By the way, null dereference is a pretty philosophical topic, though. If you doubt the depth of the issue, I suggest you watch JF Bastien's fascinating talk: *(char*)0 = 0; - What Does the C++ Programmer Intend With This Code?

Let's get back to the lifetime.

We can easily refer to the uninitialized object and call some static method on it. For example, if the class name is too long, and we're too lazy to type it out, or if the 80-character limit of the classical terminal isn't enough.

struct ExtremelyLongClassName {

  using UnspeekableInternalType = size_t;

  UnspeekableInternalType val;

  static UnspeekableInternalType Default() { return 5;}
};

// instead of ExtremelyLongClassName::Default()
ExtremelyLongClassName x { x.Default() + 5 }; // Ok, well-defined

For example, a case in the non-executable decltype context: do types have too long names? No problem, let's declare them from ourselves!

ExtremelyLongClassName z {
  [] ()-> decltype(z.Default()) { // Ok, well-defined
    // complex evaluations
    return 1;
  }()
};
// code above is equal to more bulk code:
ExtremelyLongClassName y {
  [] ()-> ExtremelyLongClassName::UnspeekableInternalType {
    // complex evaluations
    return 1;
  }()
};

These examples compile perfectly and are perfectly correct.

Also, the option to refer to a variable during its initialization may be useful in some specific scenarios where an object needs to reference itself. Although these cases are considered specific, they're not rare. Self-referential objects are a common-to-see pattern: circular linked lists, interfaces with nested widgets that notify each other, and more.

struct Iface {
  virtual ~Iface() = default;
  virtual int method(int) const = 0;
};

struct Impl : Iface {
  explicit Impl(const Iface* other_ = nullptr) : other(other_) {
  };

  int method(int x) const override {
    if (x == 0) {
      return 1;
    }
    if (other){
      return x * other->method(x - 1);
    }
    return 0;
  }
  const Iface* other = nullptr;
};

int main() {
  Impl impl {&impl};
  std::cout << impl.method(5);
}

We can create circular references between objects in the same way but more messily, using delegating constructors. But we'll cover it in a separate chapter.

You can avoid using an object during its own initialization by following the AAA rule—Almost Always Auto:

Whenever possible, use the auto x = T {....} entry to declare and initialize variables.

In such an entry, using the declared variable inside the initializing expression results in the compilation error. For example:

auto x = x + 1;

The compilers will say:

GCC: error: use of 'x' before deduction of 'auto';
MSVC: error C3536: 'x': cannot be used before it is initialized;
Clang: error: variable 'x' declared with deduced type 'auto' cannot appear in its own initializer.

Useful links

Almost Always Auto style.
Eugene Neruchek. Lifetime extension of temporary objects in C++: common recommendations and pitfalls.
Richard E. Pattis. Self-Referential Classes/Linked Objects Advanced Programming/Practicum.

Errors in object lifetime: std::vector and reference invalidation

There aren't many sequential containers with dynamic length in the C++ standard library:

std::list;
std::forward_list;
std::deque;
std::vector.

Among them, we more often see the std::vector being used. The others are used only if their specific features become necessary and significantly enhance the performance. For example, inserting at an arbitrary position in a constant number of operations in std::list doesn't give an advantage over std::vector (which requires linear time) unless the containers are large enough or the size of the elements is small.

Although std::vector is the most efficient container, it's also the most insecure due to the invalidation of references and iterators.

Careless use of std::vector with an abundance of "sugared" syntactic constructs can easily lead to undefined behavior.

Here's a simple example with a task queue:

std::optional<Action> evaluate(const Action&);

void run_actions(std::vector<Action> actions) {
  for (auto&& act: actions) { // UB
    if (auto new_act = evaluate(act)) {
      actions.push_back(std::move(*new_act)); // UB
    }
  }
}

Beautiful and short code, with undefined behavior, and consequently incorrect.

The push_back function can lead to the vector reallocation. If the begin/end iterators are invalidated, the loop will continue over the destroyed data.
If reallocation doesn't occur, the loop will pass only through the set of elements that were in the vector initially. It won't get to the ones added in the process.

Here's the fixed code:

void run_actions(std::vector<Action> actions) {
  for (size_t idx = 0; idx < actions.size(); ++idx) {
    const auto& act = actions[idx];
    if (auto new_act = evaluate(act)) {
      actions.push_back(std::move(*new_act));
    }
  }
}

At some point, we want to quickly add logging to check something:

void run_actions(std::vector<Action> actions) {
  for (size_t idx = 0; idx < actions.size(); ++idx) {
    const auto& act = actions[idx];
    if (auto new_act = evaluate(act)) {
      actions.push_back(std::move(*new_act));
    }
    std::cerr << act.Id() << "\n"; // UB!
  }
}

Here we go again—undefined behavior. The push_back can call the vector reallocation, and then the act reference becomes dangling.

Here's the fixed code:

void run_actions(std::vector<Action> actions) {
  for (size_t idx = 0; idx < actions.size(); ++idx) {
    if (auto new_act = evaluate(actions[idx])) {
      actions.push_back(std::move(*new_act));
    }
    std::cerr << actions[idx].Id() << "\n";
  }
}

This simple pattern of reference invalidation in the vector can very easily hide under the layer of abstractions. For example, if the processing loop is a public function of the TaskQueue class, and the processing of each task is handled by its private function changes in one function, which is perfectly correct in isolation, will lead to UB due to implicit influence on the other function.

You can somewhat mitigate the issue using static analyzers that track the program execution thread. For example, PVS-Studio has the V789 diagnostic rule that detects these kinds of problems. Also, sanitizers or memory-checking utilities (like valgrind) can spot this issue too (if, of course, your tests are good enough).

In Rust, the borrow checker detects this problem at compile time.

If you can afford a performance trade-off, it's better to use specialized containers (or container adapters) for specific tasks.

For example, the std::queue and std::stack adapters by default use the std::dequeue container, which doesn't invalidate references when new elements are added. Also, neither std::queue nor std::stack can be used carelessly in range-based for loops: they have no iterators and no begin/end functions.

Useful links:

Baptiste Wicht. C++ benchmark - std::vector VS std::list.
Stack Overflow. Iterator invalidation rules for C++ containers.

Author: Dmitry Sviridkin

Dmitry has over eight years of experience in high-performance software development in C and C++. From 2019 to 2021, Dmitry Sviridkin has been teaching Linux system programming at SPbU and C++ hands-on courses at HSE. Currently works on system and embedded development in Rust and C++ for edge servers as a Software Engineer at AWS (Cloudfront). His main area of interest is software security.

Editor: Andrey Karpov

Andrey has over 15 years of experience with static code analysis and software quality. The author of numerous articles on writing high-quality code in C++. Andrey Karpov has been honored with the Microsoft MVP award in the Developer Technologies category from 2011 to 2021. Andrey is a co-founder of the PVS-Studio project. He has long been the company's CTO and was involved in the development of the C++ analyzer core. Andrey is currently responsible for team management, personnel training, and DevRel activities.

#Cpp #Knowledge

Tags:

#Cpp #Knowledge

Fill out the form in 2 simple steps below:

Your contact information:

Desired license type:

C++ programmer's guide to undefined behavior: part 3 of 11

Errors in object lifetime: dangling references, pointers, and use-after-free: general cases

Errors in object lifetime: auto or non-auto, reference is still dangling!

Problem of explicit type declaration

Problems of automatic type deduction

Useful links

Errors in object lifetime: string_view is just const& but with extra hassle

Useful links

Errors in object lifetime: a fly in the syntactic sugar (range-based for)

C++23

Useful links

Errors in object lifetime: Schrödinger's self-reference

Useful links

Errors in object lifetime: std::vector and reference invalidation

Useful links:

Posts: articles

Poll:

Comments (0)

Want to try PVS‑Studio for free?

Achievements

PVS-Studio

Licensing

Company