Andrey Karpov

Dmitry Sviridkin

C++ programmer's guide to undefined behavior: part 6 of 11

This is the sixth part of an e-book on undefined behavior. It isn't a textbook: it's intended for those who are already familiar with C++ programming. It's a kind of C++ programmer's guide to undefined behavior and to its most secret and exotic corners. The book was written by Dmitry Sviridkin and edited by Andrey Karpov.

Broken syntax and standard library: ellipsis and functions with arbitrary number of arguments

Surely all C++ (and even more so C) programmers are familiar with the printf function family. Among the amazing features of these functions is the capability to take an arbitrary number of arguments. You can also create full-fledged programs using just printf! There are even articles dedicated to researching and describing this madness.

However, we'll focus only on an arbitrary number of arguments. But first, let me tell you an entertaining story.

Once upon a time some wonderful library had a charming function:

template <class HandlerFunc>
void ProcessBy(HandlerFunc&& fun) 
requires std::is_invocable_v<HandlerFunc, T1, T2, T3, T4, T5>;

A programmer, eager to call this delightful function, decided to use a lambda as HandlerFunc. They didn't care much for the T1, T2, T3, T4, and T5 arguments passed to it. What could they do about them?

Option one was to fairly list all five arguments with their types. Just the way C++ masterminds used to do it in the old days.

ProcessBy([](T1, T2, T3, T4, T5) { do_something(); });

If the type names are short, then why not. But it's still a little too detailed and inconvenient. A new argument is added, and we have to correct it too. This isn't a very modern C++ approach.

Option two was to use functions that take an arbitrary number of arguments.

ProcessBy([](...){ do_something(); });

Wow, so beautiful! It's both compact and impressive. The wonders of technology! The code has compiled and is even working. So, the programmer left it that way.

However, one day the wonderful library got an update. It became better and more secure. That was when mysterious, inexplicable crashes started. SIGILL, SIGABRT, SIGSEGV. All of our favorite errors flooded the project.

What happened? Who was to blame? What could the programmer do? We need the help of an experienced detective here...

Let's get to the bottom of this case.

In C, we can define variadic functions that take an arbitrary number of arguments. There are two ways to do it:

1. An empty argument list.

void foo() {
  printf("foo");
}

foo(1,2,4,5,6);

Seems like the foo function shouldn't take any arguments. However, this isn't the case. In C, functions declared with an empty argument list are actually functions with an arbitrary number of arguments. A function that really takes no arguments is declared like this:

void foo(void);

This mess has been fixed in C++.

2. Ellipsis and va_list.

#include <stdarg.h>

int sum(int count, /* To access the argument list, 
                       we need at least one explicit argument. */
         ...) {
  int result = 0;
  va_list args;
  va_start(args, count);
  for (int i = 0; i < count; ++i) // The function doesn't know how 
                                  // many arguments have been passed.
  {
    result += va_arg(args, int); // We request another argument.
                                 // The function doesn't know what
                                 // type it is.
                                 // We specify int.
  }
  va_end(args);
  return result;
}

If there's no explicit argument, we can't access the list of others. Moreover, we get implementation-defined behavior.

Also, restrictions are placed on this explicit argument that precedes the variadic part:

  • it can't be marked with the register specifier, though few people need that anyway;
  • it can't be of a type that changes under default argument promotion. Hello to our favorite integer/float promotion. We can't use float, short, or char.

If we violate the explicit argument restrictions, we get undefined behavior. We ask va_arg for a promoted type and get undefined behavior again. We pass the wrong type and get... that's right, undefined behavior.

The numerous possibilities of shooting oneself and code users in the feet are unbelievable! Hackers fully exploit this vulnerability when attacking the printf function.

C++, of course, still has this issue. Moreover, it has considerably worsened!

C is a simple and relatively small language. There are not many types in it: primitives, pointers, and custom structures.

C++ introduces references, along with objects that have interesting constructors and destructors. And you may have already guessed that passing a reference or such an object as an argument to a variadic function results in undefined behavior. There's even more opportunity for fun debugging experience!

However, C++ wouldn't be C++ if it hadn't "addressed" this issue. So, we have the C++ style variadics:

template <class... ArgT>
int avg(ArgT... arg) {
  // The number of arguments is available.
  const size_t args_cnt = sizeof...(ArgT);
  // Their types are available.

  // We can't iterate over the arguments.
  // We need to write recursive calls for processing
  // or use fold expressions.
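  // The count is cast to int: dividing by size_t would turn
  // a negative sum into a huge unsigned value.
  return (arg + ... + 0) / static_cast<int>((args_cnt == 0) ? 1 : args_cnt);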
}

It's not the most convenient, but it's much better and safer.
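To make the "count and types are available" point concrete, here is a minimal sketch. The names sum_all and arg_count are hypothetical helpers introduced only for illustration. Unlike the va_list version, passing a wrong type is rejected at compile time instead of causing undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: sums integer arguments of any count.
template <class... ArgT>
long long sum_all(ArgT... args) {
  // The types are known at compile time, so misuse fails to compile.
  static_assert((std::is_integral_v<ArgT> && ...),
                "sum_all accepts only integer arguments");
  // Binary fold with an initial value, so an empty pack yields 0.
  return (static_cast<long long>(args) + ... + 0LL);
}

// The argument count is available without a separate 'count' parameter.
template <class... ArgT>
constexpr std::size_t arg_count(ArgT...) {
  return sizeof...(ArgT);
}
```

A call such as sum_all(1, 2, 3) returns 6, while sum_all(1.5) doesn't compile.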

Well, now, having all the cards on the table, let's get back to our detective.

The C variadic did it!

ProcessBy([](...){ do_something(); });

When the library got updated, one of the T types passed to HandlerFunc by the ProcessBy function changed, which seemed insignificant at first glance. However, this change has led to undefined behavior.

And the programmer should've used the C++ variadic.

ProcessBy([](auto...){ do_something(); });

That's it. Just one word — auto — and everything would've been fine. It's handy.

Of course, to avoid unnecessary copying, we need to add two ampersands:

ProcessBy([](auto&&...){ do_something(); });

That's it for sure now. This is a great way to accept and ignore as many arguments as you like. Well, I used to be that kind of programmer once.
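The following sketch shows why the generic lambda is safe where the C ellipsis isn't. The process_by function is a hypothetical stand-in for the library function (the real T1...T5 types aren't known), and it deliberately passes std::string objects, which must not go through a C variadic.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the library function: it calls the handler
// with several arguments, two of them non-trivial (std::string).
template <class HandlerFunc>
void process_by(HandlerFunc&& fun) {
  std::string name = "non-trivial argument";
  std::forward<HandlerFunc>(fun)(1, 2.5, 'c', name, std::string("temporary"));
}

// Counts how many times the handler is invoked.
int count_calls() {
  int calls = 0;
  // auto&&... binds to every argument by reference and ignores it.
  process_by([&calls](auto&&...) { ++calls; });
  return calls;
}
```

Each argument is bound to a forwarding reference, so nothing is copied and no type is passed through the C calling convention for variadics.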

Broken syntax and standard library: operator[] of associative containers

Surprisingly, this chapter won't cover anything related to undefined behavior, at least not directly.

The C++ standard library contains a lot of controversial things. One such thing is that operator[] of associative containers combines insertion and retrieval in a single operation.

The subscript operator for associative containers attempts to call the default constructor for the element if it doesn't find the passed key.

On the one hand, it's convenient:

std::map<Word, int> counts;
for (const Word& word : text) {
  ++counts[word]; // only one key search
}

In some other languages, one has to try hard to write the same thing and not get repeated searches. For example, here's what happens in Java:

// There's the triple search!
map.put(key, map.containsKey(key) ? map.get(key) + 1 : 1);

// There's the double search!
map.put(key, map.getOrDefault(key, 0) + 1);

Of course, the JIT compiler can optimize it... but we, in C++, love having guarantees.

On the other hand, calling the constructor when an element isn't found can backfire:

struct S {
  int x;
  explicit S (int x) : x {x} {}
};

std::map<int, S> m { { 1, S{2} }}; // Ok
m[0] = S(5);   // A huge hard-to-read compilation error.
auto s = m[1]; // Another huge hard-to-read compilation error.

Here's another case:

struct Huge {
  Huge() { 
    data.reserve(4096);
  }
  std::vector<int> data;
};

std::map<int, Huge> m;
Huge h; 
.... /* filling h */
m[0] = std::move(h); // Useless call to the default constructor,
                     // redundant allocation, 
                     // and then, the move operation.

To overcome this trouble, C++17 (20) associative containers have a whole mountain of the insert_or_assign, try_emplace, and insert member functions with the pair<iterator, bool> return value that is incomprehensible for novice devs.

Of course, all these things are difficult and inconvenient to use. People write long blog posts about how to use container search efficiently...
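As a small illustration of these member functions, here is a sketch that reuses the S structure from above (explicit constructor, no default constructor). The build_map function is a hypothetical name used only for this example.

```cpp
#include <cassert>
#include <map>

struct S {
  int x;
  explicit S(int x) : x{x} {}
};

std::map<int, S> build_map() {
  std::map<int, S> m;
  // try_emplace constructs S in place from 5; no default constructor needed.
  m.try_emplace(0, 5);
  // The key already exists, so this call leaves the stored value untouched.
  m.try_emplace(0, 6);
  // insert_or_assign inserts a new element or overwrites an existing one.
  m.insert_or_assign(1, S{7});
  m.insert_or_assign(1, S{8});
  return m;
}
```

After build_map(), the element with key 0 holds 5 and the element with key 1 holds 8, and operator[] was never needed.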

Of course, using operator[] is easier, "more concise" and faster, but it's also a trap for careless programmers. If we combine it with the awkward features of other objects...

std::map<std::string, std::string> options {
  {"max_value", "1000"}
};
....
const auto ParseInt = [](std::string s) {
  std::istringstream iss(s);
  int val;
  iss >> val;
  return val;
};

// This isn't right! There's no such key!
const int value = ParseInt(options["min_value"]);

// value == 0. It's ok. Happy debugging!
// operator[] returned an empty string.
// operator>> read nothing and wrote zero in the result.

By adding const, we can avoid the trouble with operator[] for associative containers. Then this operator won't be available to us, and we'll have to use either .at, which throws exceptions, or everyone's favorite alternative:

if (auto it = m.find(key); it != m.end()) {
  // We can do whatever we want with *it, it->second.
}

Everything is simple.
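The find-based lookup can be wrapped into a tiny helper. This is only a sketch: find_option and the Options alias are hypothetical names, and returning std::optional is one possible design among several.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using Options = std::map<std::string, std::string>;

// Hypothetical helper: looks a key up without ever inserting it.
std::optional<std::string> find_option(const Options& options,
                                       const std::string& key) {
  if (auto it = options.find(key); it != options.end()) {
    return it->second;
  }
  return std::nullopt;  // The caller has to handle the missing key.
}
```

Because the map is taken by const reference, operator[] isn't even available inside the helper, and a missing "min_value" shows up as an empty optional rather than as a silent zero.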

Broken syntax and standard library: iostreams — good luck debugging!

The C++ standard I/O stream library is old, clunky, and may give you nightmares. Although it's hard to encounter undefined behavior directly while using the library, it can still make you uneasy. The real fun usually begins when your small, isolated, and perfectly correct code that uses std::istream or std::ostream becomes part of a larger project.

First pitfall. State of i/ostream object

Even the most zealous advocates of function purity and immutability have to admit that deep under the hood, the entity responsible for I/O has some mutable state, and this is perfectly fine.

The not-so-fine part here is that this entity has an additional mutable state responsible for data formatting... and the manipulator mechanism is terrible.

std::cout << std::hex << 10; // 'a', ok
std::cout << 10; // 'a' again?!?!

The manipulator changes the stream state and switches the formatting mode for all subsequent read or write operations until it's back to its original state!

auto state = std::cout.flags();
std::cout << std::hex << 10; // 'a'
std::cout.flags(state);
std::cout << 10; // 10, ok

It's hard to imagine the chaos if someone passed a stream with rearranged formatting flags to our function, or if we forgot to reset the flags back to their original state.

Using the same member function name for setting and retrieving flags is fun too, especially for the fans of returning values via lvalue references in function arguments. But that's how almost all stream customization works, though. So, let's just bear with it.

Well, of course, the formatting state is yet another opportunity to shoot yourself in the foot in a concurrent environment.
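One common way to avoid forgetting the reset is a small RAII guard. The sketch below is an assumption-laden illustration: FlagsGuard, print_hex, and hex_then_decimal are hypothetical names, and the guard restores only the flags, not the precision, width, or fill.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical guard: saves the formatting flags and restores them on exit.
class FlagsGuard {
 public:
  explicit FlagsGuard(std::ios_base& stream)
      : stream_(stream), saved_(stream.flags()) {}
  ~FlagsGuard() { stream_.flags(saved_); }
  FlagsGuard(const FlagsGuard&) = delete;
  FlagsGuard& operator=(const FlagsGuard&) = delete;

 private:
  std::ios_base& stream_;
  std::ios_base::fmtflags saved_;
};

// Prints the value in hex without leaking std::hex to the caller.
void print_hex(std::ostream& out, int value) {
  FlagsGuard guard(out);
  out << std::hex << value;
}

// Writes 10 in hex, then 10 with whatever flags the stream had before.
std::string hex_then_decimal() {
  std::ostringstream out;
  print_hex(out, 10);
  out << ' ' << 10;
  return out.str();
}
```

Here hex_then_decimal() produces "a 10": the second number is decimal again because the guard's destructor put the flags back.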

Second pitfall. Global locale

As if having a mutable state with formatting flags wasn't enough for us. At least, the state is tied to a specific instance of i/ostream. To top it all off, we also have the construction of new instances tied to a global mutable variable that is the current global locale.

Of course, locales are a real headache not just in C and C++ but in general. However, the topic is far beyond the scope of this book.

The only thing that matters here is that i/ostreams are locale-dependent. They're not the only ones, though: functions like std::to_string, atof, strtol, and other wonderful string conversion functions are also locale-dependent.

Now here's a little trick that demonstrates an issue that every C++ library parsing text data formats eventually runs into (and then reluctantly fixes):

int main(int argc, char **argv) {
  auto s = std::to_string(1.5f);
  std::istringstream iss1(s);
  float f1 = 0; iss1 >> f1;
  assert(fabs(f1 - 1.5) < 1e-6); // Ok

  std::locale::global(std::locale("de_DE.UTF8"));
  std::istringstream iss2(s);
  float f2 = 0; iss2 >> f2;
  assert(fabs(f2 - 1.5) < 1e-6); // Surprise! f2 == 1500000
}
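A typical way for a parsing library to protect itself is to imbue each stream with the classic "C" locale instead of relying on the global one. The sketch below assumes that this is acceptable for the data format in question; parse_float_classic is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses a float with the classic "C" locale,
// regardless of what std::locale::global has been set to.
float parse_float_classic(const std::string& text) {
  std::istringstream iss(text);
  iss.imbue(std::locale::classic());  // '.' is always the decimal point here.
  float value = 0;
  iss >> value;
  return value;
}
```

With this helper, the string produced by std::to_string(1.5f) in the default locale is read back as 1.5 even after the global locale has been switched.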

Third pitfall. Encoding of file paths and fstream

UTF8 is great. UTF8 is cool. Your code is most likely written in UTF8. By default, Python handles strings in UTF8. It's 2024, everybody's using UTF8! Ave, Unicode!

Although, it's not like everything is always perfect. GCC hasn't fixed the BOM issue for 13 years now.

What if we have a C++ program that receives a UTF8 string with the path to a file to be opened?

void fun(const std::string& filename) {
  std::ifstream file(filename);
}

Is everything okay? Does it all work? What about the Cyrillic letters? What's with the Chinese characters? Are you sure it's working? What about Windows compatibility? That's about the time we find out it doesn't work.

The std::fstream constructor, as well as the C fopen function, are not very clever. They know nothing about Unicode or about the fact that it may differ from the system native encoding.

As a result, almost every C++ program running on Windows faces a bug: if a file-path contains a non-ASCII character, the file can't be found.

Fourth pitfall. Binary mode

The binary mode for reading and writing files is another headache across all sorts of programming languages. Reading binary data from stdin, writing to stdout (which is opened in text mode by default), losing or unnecessarily adding CR (\r) bytes, everything just the way we like it.

However, C++ provides additional opportunities for eternal suffering.

I see this mistake quite often and not just in student projects:

std::ifstream file(name, std::ios::binary);
char x = 0;
file >> x; // expecting to read one byte.

Instead, operator>> for standard types always tries to perform a formatted read. By default, it skips over whitespace characters. Also, we have no way of knowing in what mode the stream is open! We need to manually save the information about it somewhere.
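The difference between a formatted and an unformatted read of a single char can be seen in this short sketch. Both function names are hypothetical, and a string stream stands in for the file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Unformatted read: returns the next byte, whitespace included.
char first_byte_unformatted(std::istream& in) {
  char x = 0;
  in.get(x);
  return x;
}

// Formatted read: skips leading whitespace before extracting a character.
char first_byte_formatted(std::istream& in) {
  char x = 0;
  in >> x;
  return x;
}
```

For the input "\n A", get returns the line feed, while operator>> silently skips it together with the space and returns 'A'.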

Same here, but the error shows up faster:

std::ifstream file(name, std::ios::binary);
int x = 0;
file >> x; // think there will be a read of sizeof(int) bytes.

Here's also a frequent and really nasty case:

std::ifstream file(name, std::ios::binary);
std::string s;
file.read(reinterpret_cast<char*>(&s), sizeof(s)); // UB!

Inexperienced programmers who test this code on short strings and are perfectly fine with it may conclude that the code works "as intended". This illusion is due to the peculiarities of modern string implementations and the SSO (small string optimization) technique: the string is implemented as three data members (data, size, and capacity), but if the contents are short enough, they're stored directly in the space of these data members.

Of course, this is wrong.
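A safer approach is to read into the character buffer owned by the string rather than over the string object itself. Here is a minimal sketch; read_block is a hypothetical helper, and how the size becomes known (a header, a length prefix) is left out as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to 'size' bytes into a string's own buffer.
std::string read_block(std::istream& in, std::size_t size) {
  std::string buffer(size, '\0');
  // &buffer[0] points at the characters, not at the std::string object.
  in.read(&buffer[0], static_cast<std::streamsize>(size));
  // Keep only the bytes that were actually extracted.
  buffer.resize(static_cast<std::size_t>(in.gcount()));
  return buffer;
}
```

The string object keeps its invariants because only its character storage is overwritten, and the result is trimmed when the stream ends early.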

Fifth pitfall. Read errors. End of stream

I/O streams have other flags that represent the state of the stream: whether there were errors, whether we reached the end. Many people know that you can check whether an operation was successful by putting a stream object into a conditional statement (or any context where it is converted to bool).

Those unfamiliar with it might use the while (!iss.eof()) check that will one day lead to the infinite loop issue. This happens when the file isn't finished, but can no longer be read—say, if the file is on a network drive, and the network has gone down. Well, that's a story for another time. Let's focus on the correct way to check readability.

std::istringstream iss("\t aaaa \n bb  \t ccc dd e ");
std::string token;
int count = 0;
while (iss >> token) {
  ++count;
}
assert(count == 5); // OK

All five tokens from the string will be read here—no more, no less.

What happens, if there's an error?

std::istringstream iss("1 2 3 gd 5");
int token = 0;
int count = 0;
while (iss >> token) {
  ++count;
}
std::cout << token; // Displays 0 !
assert(count == 3); // OK

Well, that makes sense. The output is zeroed for the token where the error occurred. If necessary, we can configure it to throw an exception when it happens.

What if we read the binary data?

std::istringstream iss("12345");
std::array<char, 4> buf;
int read_count = 0;
while (iss.read(buf.data(), 4)) {
  read_count += iss.gcount();
}
assert(read_count == 5); // Oops, the last byte was left out.

Here we have EOF when reading, so this is an error. Even though that one byte was read successfully, it doesn't matter.
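One way to keep the last partial chunk is to check gcount() even when read reports failure. This is a sketch of that idiom, with count_bytes as a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: counts every byte, including a short final chunk.
std::size_t count_bytes(std::istream& in) {
  std::array<char, 4> buf{};
  std::size_t total = 0;
  // A short final read sets failbit, yet gcount() still reports its bytes.
  while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
         in.gcount() > 0) {
    total += static_cast<std::size_t>(in.gcount());
  }
  return total;
}
```

For the "12345" input, the loop body runs twice (4 bytes, then 1 byte), and the third read extracts nothing, so the loop ends with a total of 5.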

Well, okay, C has the nice fread function that immediately returns the number of bytes read. So, we get a nice loop. Maybe C++ streams have something like that too? Of course, they do!

Sixth pitfall. The readsome member function

std::istringstream iss("12345");
std::array<char, 4> buf;
int read_count = 0;
while (iss.readsome(buf.data(), 4) > 0) {
  read_count += iss.gcount();
}
assert(read_count == 5);

Wow, it works!

Well, actually, it doesn't. We go to cppreference and read this:

The behavior of this function is highly implementation-specific. For example, when used with std::ifstream, some library implementations fill the underlying filebuf with data as soon as the file is opened (and readsome() on such implementations reads data, potentially, but not necessarily, the entire file), while other implementations only read from file when an actual input operation is requested (and readsome() issued after file opening never extracts any characters).

Yeah, it doesn't work. As an exercise, I invite the reader to replace istringstream with ifstream in the example above.

Broken syntax and standard library: comma operator

If you started your programming journey with Pascal or C#, you probably know that in these languages, the elements of a two-dimensional array (as well as arrays of higher dimensions) are accessed by enumerating the indices separated by a comma inside square brackets:

double [,] array = new double[10, 10];
double x = array[1,1];

This or similar (parentheses) methods are also often used in pseudocode or in specialized languages for mathematical calculations (MatLab, MathCAD).

In C and C++, each dimension must have its own set of square brackets:

double array[10][10];
double x = array[1][1];

However, no one is stopping us from writing the code "incorrectly", and the compiler should compile it!

int array[5][5] = {};
std::cout << array[1, 4]; // oops!

Besides implicit type conversion and array overruns, we can run into a lot of bugs when porting code carelessly.

Why does it even compile?

It's all about the comma operator (,). It sequentially evaluates both of its arguments and returns the second (right) one.

int array[2][5] = {};
auto x = array[1, 4]; // Oops! This is array[4]. 
// However, the maximum value for the first dimension is 1.
// Undefined behavior!

In C++20, luckily for us, using the comma operator (,) for indexing arrays was marked as deprecated, and now compilers issue warnings (we can always turn them into errors).

By the way, one can mess up a comma not only when working with arrays. For example, we can make a typo when writing constants:

double A;
A = 1,23; // Oops, this parses as (A = 1), 23, so A equals 1, not 1.23.

Other comma typos exist. That would be the end of it, except for one thing.

The comma operator (,) overloads

We can overload the comma to wreak even more havoc.

return f1(), f2(), f3();

If (,) isn't overloaded, the standard ensures that functions are called sequentially. If we call the overloaded comma here, there's no such guarantee for pre-C++17 versions.

In the case of the built-in comma, it's guaranteed that the result type matches the last argument in the chain. However, if the operator is overloaded, it can be of any type.

auto test() {
  return f1(), f2(), f3();
}

int main() {
  test();
  static_assert(!std::is_same_v<decltype(f3()), int>);
  static_assert(std::is_same_v<decltype(test()), int>); // ??!
  return 0;
}

We often use the comma in various patterns to expand parameter packs of arbitrary length, or to check for multiple conditions that trigger SFINAE.

Due to the possibility of encountering an overloaded comma in expressions containing it, library authors resort to casting each argument to void. An overload that takes void is impossible to write.

template <class... F>
void invoke_all(F&&... f) {
  (static_cast<void>(f()), ...);
}

int main() {
  invoke_all([]{
    std::cout << "hello!\n";
  },
  []{
    std::cout << "World!\n";
  });
  return 0;
}
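To see both effects in one place, here is a self-contained sketch. Loud and the three functions are hypothetical and exist only to demonstrate the mechanics: an overloaded comma changes the result type, and the cast to void brings the built-in comma back.

```cpp
#include <cassert>
#include <type_traits>

struct Loud {};  // Hypothetical type with an overloaded comma.

// This overload hijacks "int , Loud" and returns an int instead of a Loud.
int operator,(int, Loud) { return 42; }

int f1() { return 1; }
Loud f3() { return {}; }

// The overloaded comma is selected, so the deduced return type is int.
auto with_overload() { return f1(), f3(); }

// Casting the left operand to void forces the built-in comma.
auto with_void_cast() { return static_cast<void>(f1()), f3(); }

static_assert(std::is_same_v<decltype(with_overload()), int>);
static_assert(std::is_same_v<decltype(with_void_cast()), Loud>);
```

No overload can accept a void operand, so the second function is guaranteed to use the built-in operator and returns the right-hand value unchanged.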

Why would one even need to overload a comma?

Maybe they do it for some kind of DSL (domain-specific language).

Or maybe they want to make comma indexing work.

struct Index { size_t idx; };

template <size_t N>
struct MultiIndex : std::array<Index, N> {};

template <size_t N, size_t M>
auto operator , (MultiIndex<N> i1, MultiIndex<M> i2) { .... }

template <size_t M>
auto operator , (Index i1, MultiIndex<M> i2) { .... }

template <size_t N>
auto operator , (MultiIndex<N> i1, Index i2) { .... }

auto operator , (Index i1, Index i2) { .... }

Index operator "" _i (unsigned long long x) {
  return Index { static_cast<size_t>(x) };
}

template <class T, size_t N, size_t M>
struct Array2D {
  T arr[N][M];

  T& operator [] (MultiIndex<2> idx) {
    return arr[idx[0].idx][idx[1].idx];
  }
};

int main() {
  Array2D<int, 5, 6> arr;

  arr[1_i, 2_i] = 5;
  std::cout << arr[1_i, 2_i]; // Ok
  std::cout << arr[1_i, 2_i, 3_i]; // Compilation error
}

Broken syntax and standard library: function-try-block

C++ has an alternative syntax for defining the function body, which lets us wrap the entire body in a try block together with its exception handlers.

// The standard option
void f() {
  try {
    may_throw();
  } catch (...) {
    handle_error(); 
  }
}

// Alternative syntax 
void f() try {
  may_throw();
} catch (...) {
  handle_error();
}

Firstly, the code is shorter and less nested. Secondly, this feature enables us to catch exceptions where it's impossible to do in the standard way: in the member initializer list, when initializing a base class subobject, and so on.

struct ThrowInCtor {
  ThrowInCtor() {
    throw std::runtime_error("err1");
  }
};


struct TryStruct1 {
  TryStruct1() try {

  } catch (const std::exception& e) {
    // An exception from the 'c' constructor is caught.
    std::cout << e.what() << "\n";
  }
  ThrowInCtor c;
};

struct TryStruct2 {
  TryStruct2() {
    try {

    } catch (const std::exception& e) {
      // The exception isn't caught
      // because the constructor body
      // is executed after the data member initialization.
      std::cout << e.what() << "\n";
    }
  }
  ThrowInCtor c;
};

In the example with the try-block for the constructor, we encounter a seemingly strange surprise. Despite the catch block, the exception still propagates to the code that calls the constructor. If each structure is created inside a try block whose handler prints "something wrong", the output is the following:

err1
something wrong
something wrong

This makes sense because if an exception is thrown when initializing the class data members, we have no way to resolve the issue and fix the object.
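The calling code isn't shown here, so the following is a reconstruction under assumptions: a hypothetical construct helper creates each object inside a try block, and a global g_log string collects the messages instead of std::cout so that the result is easy to inspect.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

std::string g_log;  // Hypothetical log that replaces std::cout.

struct ThrowInCtor {
  ThrowInCtor() { throw std::runtime_error("err1"); }
};

struct TryStruct1 {
  TryStruct1() try {
  } catch (const std::exception& e) {
    g_log += e.what();
    g_log += '\n';
    // The exception is implicitly rethrown when the handler ends.
  }
  ThrowInCtor c;
};

struct TryStruct2 {
  TryStruct2() {
    try {
    } catch (const std::exception& e) {
      g_log += e.what();  // Never reached: 'c' throws before the body runs.
    }
  }
  ThrowInCtor c;
};

// Assumed caller: reports every exception that escapes a constructor.
template <class T>
void construct() {
  try {
    T object;
    (void)object;
  } catch (const std::exception&) {
    g_log += "something wrong\n";
  }
}

std::string run_demo() {
  g_log.clear();
  construct<TryStruct1>();
  construct<TryStruct2>();
  return g_log;
}
```

Under these assumptions, run_demo() returns the three lines "err1", "something wrong", "something wrong": the first structure logs and rethrows, and the second never gets a chance to catch anything.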

That's why we sometimes see such cluttered code:

struct S {
  S(....) try :
      a(....),
      b(....) {
    try {
      init();
    } catch (const std::exception& e) {
      log(e);
      try_repair();
    }    
  } catch (const std::exception& e) {
    // Failed to fix or
    // an unfixable error in the data members.
    log(e);
    // implicit rethrow
  }

  A a;
  B b;
};

Well, okay, what about destructors? After all, throwing exceptions out of destructors is highly discouraged. Surely a function-try-block would handle it easily and catch everything it can.

struct DctorThrowTry {
  ~DctorThrowTry() try {
    throw std::runtime_error("err");
  } catch (const std::exception& e) {
    std::cout << e.what() << "\n";
  }
};

It looks good enough. However, this is C++, so it doesn't work!

Someone had a "brilliant idea" to make the default behavior of destructors the same as that of constructors. The catch block of the destructor implicitly throws the exception further. So, say hello to all sorts of problems with destructor exceptions, as well as violating the implicit noexcept(true).

However, unlike for constructors, they added the option for destructors to suppress the implicit throwing of a caught exception. All we need to do is add return!

struct DctorThrowTry {
  ~DctorThrowTry() try {
     throw std::runtime_error("err");
  } catch (const std::exception& e) {
    std::cout << e.what() << "\n";
    return; // The exception won't be thrown again!
  }
};

Surprisingly, this gives C++ a case where a return statement placed last in a function with a void return type changes the function's behavior.

Also, we can't access non-static data members and class member functions in the catch block of destructors and constructors, as this would lead to undefined behavior for obvious reasons. The moment we enter the catch block, they're all dead.

struct S {
  A a;
  B b;

  S() try {
    ....
  } catch (...) {
     do_something(a); // UB!
  }

  ~S() try {
    ....
  } catch (...) {
    do_something(b); // UB!
    return;
  } 
};

// However

bool fun(T1 a, T2 b) try {
  ....
  return true;
} catch (...) {
  // Note: this block doesn't catch exceptions
  // that occur during the initialization of a and b.
  do_something(a); // Ok!
  return false;
}

Summing up

  • For regular functions and main, using the alternate syntax, we can easily and nicely catch any exceptions that would be thrown. The default behavior here is catching. The exception doesn't go further.
  • For constructors, we can catch exceptions from data member constructors, handle them (print to log), but we can't suppress them. Either we throw a new exception ourselves, or the caught exception is implicitly rethrown.
  • There'll also be an implicit rethrow for destructors, but we can suppress it by adding return.
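For the regular-function case from the first point, a compact sketch might look as follows. The parse_or function is a hypothetical example; it relies on std::stoi throwing std::invalid_argument or std::out_of_range on bad input.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: the whole body is a function-try-block.
int parse_or(const std::string& text, int fallback) try {
  return std::stoi(text);
} catch (const std::exception&) {
  // Function parameters are still alive and usable in the handler.
  return fallback;
}
```

The exception stops in the handler, so parse_or("abc", -1) simply returns -1, and the caller never sees the failure.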

Broken syntax and standard library: zero-sized types

In C++, when defining our own classes and structures, nothing stops us from not specifying a single data member, leaving the structure empty:

struct MyTag {};

Of course, we can not only declare empty structures but also create objects of these types.

struct Tag {};

Tag func(Tag t1) {
  Tag t2;
  return Tag{};
}

This is incredibly useful for:

  • defining an abstract static or dynamic polymorphic interface;
  • introducing tags for selecting the required overload;
  • defining various predicates and metafunctions over types.

Why don't we play a game? I'll show you different structure definitions. Try to guess their size in bytes (sizeof). Shall we begin?

struct StdAllocator {};

struct Vector1 {
  int* data;
  int* size_end;
  int* capacity_end;
  StdAllocator alloc;
};

struct Vector2 {
  StdAllocator alloc;
  int* data;
  int* size_end;
  int* capacity_end;
};

struct Vector3 : StdAllocator {
  int* data;
  int* size_end;
  int* capacity_end;
};

Did you guess right?

Vector1 and Vector2 have the 4*sizeof(int*) sizes. How is that possible?! Where 3*sizeof(int*) comes from is quite obvious. But where did the fourth one come from?!

It's very simple: C++ doesn't have zero-sized structures. That's why the size of the empty structure is sizeof(StdAllocator) == 1.

However, sizeof(int*) != 1, at least on x86. The rest is alignment and padding. Vector1 is padded with trailing bytes so that its size is a multiple of its alignment, which here is the alignment of int*. And Vector2 is padded with bytes between alloc and data, so that the offset to data is a multiple of its alignment. It's all very simple and easy! If you, like many people who don't deal with it every day, are unsure about how padding works, I recommend using the -Wpadded compiler flag for GCC/Clang.

Okay, we've dealt with Vector1 and Vector2. What about Vector3? Is it 4*sizeof(int*) as well? After all, we know that the base class subobject has to be placed somewhere. Its size, as we've already learned, isn't zero... Not at all! The size of Vector3 is 3*sizeof(int*)! How is that possible?! This is called EBO (empty base optimization).
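The sizes discussed above can be checked by the compiler itself. The static_assert lines in this sketch hold on mainstream ABIs, which is an assumption; the standard itself only guarantees that an empty structure has a non-zero size.

```cpp
#include <cassert>

struct StdAllocator {};

struct Vector1 {
  int* data;
  int* size_end;
  int* capacity_end;
  StdAllocator alloc;  // 1 byte plus trailing padding
};

struct Vector2 {
  StdAllocator alloc;  // 1 byte plus padding before 'data'
  int* data;
  int* size_end;
  int* capacity_end;
};

struct Vector3 : StdAllocator {  // The empty base takes no space (EBO).
  int* data;
  int* size_end;
  int* capacity_end;
};

static_assert(sizeof(StdAllocator) == 1, "empty structures aren't zero-sized");
static_assert(sizeof(Vector1) == 4 * sizeof(int*), "member plus padding");
static_assert(sizeof(Vector2) == 4 * sizeof(int*), "member plus padding");
static_assert(sizeof(Vector3) == 3 * sizeof(int*), "empty base optimization");
```

If one of the assertions fails on your platform, the compiler's layout rules differ from the ones described here, which is itself worth knowing.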

This is an interesting zero-cost! For comparison, we can look at similar empty structures in Rust where their size can be zero.

Okay, we've learned that careless use of empty structures can increase memory consumption. Let's continue our game.

struct StdAllocator {};
struct StdComparator {};

struct Map1 {
  StdAllocator alloc;
  StdComparator comp;
};

struct Map2 {
  StdAllocator alloc;
  [[no_unique_address]] StdComparator comp;
};

struct Map3 {
  [[no_unique_address]] StdAllocator alloc;
  [[no_unique_address]] StdComparator comp;
};

struct MapImpl1 : Map1 {
  int x;
};

struct MapImpl2 : Map2 {
  int x;
};

struct MapImpl3 : Map3 {
  int x;
};

What are the sizes of Map1, Map2, and Map3?

Well, this is easy:

  • obviously sizeof(Map1) == 2, because it consists of two empty structures, each of size 1;
  • due to the [[no_unique_address]] attribute from the C++20 standard (Clang supports it since C++11), Map2 and Map3 should be of size 1. In Map3, both data members share a common address. It's the same for Map2, and it's never less than 1.

Good, but what about inherited structures now?

Are they all 2*sizeof(int)? Well, they aren't because MapImpl3 benefits from EBO!

All right, then. There's some logic and pattern to it that's still acceptable. Although... In fact, you were right! After all, if you have the MSVC compiler, [[no_unique_address]] just doesn't work. And it won't work, because for a long time, MSVC simply ignored attributes it didn't know of, so supporting [[no_unique_address]] now would break binary compatibility. Use [[msvc::no_unique_address]]! EBO, however, doesn't work yet on MSVC.

Zero-size array

Since C99 (not C++), the language enables us to use the following curious construct:

struct ImageHeader{
  int h;
  int w;
};

struct Image {
  struct ImageHeader header;
  char data[];
};

The data member named data in the Image structure has zero size. This is a FAM (flexible array member), a very handy feature for accessing an array of statically unknown length that is placed right after some header in a binary buffer. The length of the array is usually specified in the header. FAMs can only appear as the last data member in a structure.

The C++ standard doesn't allow us to make use of such features, but there's GCC with its non-standard extensions enabled by default.

What happens if we do this?

struct S {
  char data[];
};

What is the size of the S structure?

In standard C, empty structures are forbidden, as they cause undefined behavior in a program. GCC defines their size to be zero when compiling C programs. As we've found out earlier, the size is one when compiling C++ code. Carelessly designing C++ libraries with a C interface or using C libraries in C++ can lead to terrible bugs and real nightmares!

Let's get back to our structure with a flexible array member. It does have a data member in it. The C standard, again, requires at least one non-zero-size data member before the FAM. GNU C is happy to give us a zero-sized structure anyway.

Now let's take a look at GCC C++:

struct S1 {
  char data[];
};

struct S2 {};

static_assert(sizeof(S1) != sizeof(S2));
static_assert(sizeof(S1) == 0);

Suddenly, we have zero-sized structures in C++, except it's not standard C++. One should refer to the GCC specifications to see how such structures interact with EBO.

Tag dispatching

We've seen that using empty structures carelessly increases the size of other non-empty structures. Are there any other pitfalls when using empty tag structures to select an overload, for example?

Is there a difference between this:

struct Mul {};
struct Add {};

int op(Mul, int x, int y) {
  return x * y;
} 

int op(Add, int x, int y) {
  return x + y;
}

and this:

int mul(int x, int y) {
  return x * y;
} 

int add(int x, int y) {
  return x + y;
}

in terms of generated code?

The short answer is yes, there's a difference. It depends on the specific implementation. The standard doesn't guarantee optimization of empty arguments. Changing tag positions can change the binary interface. We can experiment with the most noticeable changes using MSVC as an example.

Afterword about optimizing structure sizes

Rearranging data members can reduce the structure sizes. For example, for a classic 32-bit architecture, the size of this structure is 16 bytes due to the alignment:

struct A {
  int  x;
  char foo_x;
  int  y;
  char foo_y;
};

And the size of this structure is 12 bytes:

struct A {
  int  x;
  int  y;
  char foo_x;
  char foo_y;
};
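The two layouts can be compared directly with sizeof. In this sketch the structures are renamed to the hypothetical Interleaved and Grouped so that both fit into one translation unit. The 16 and 12 byte figures assume a 4-byte int with 4-byte alignment.

```cpp
#include <cassert>

// Original order: each char is followed by 3 bytes of padding.
struct Interleaved {
  int  x;
  char foo_x;
  int  y;
  char foo_y;
};

// Reordered: the two chars share one padded tail.
struct Grouped {
  int  x;
  int  y;
  char foo_x;
  char foo_y;
};

// Holds on typical platforms; the standard doesn't promise it.
static_assert(sizeof(Grouped) < sizeof(Interleaved),
              "grouping small members should reduce padding");
```

On such platforms, a million Interleaved objects take about 16 MB while the same data in Grouped takes about 12 MB.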

Such optimizations are unnecessary if objects are created one at a time, and not always possible if the structures mirror some kind of external data layout.

However, when dealing with millions of objects, we can significantly optimize memory consumption with the simplest refactoring. The only problem is that it's not always obvious which structures can be optimized and which ones can't. Different data sizes and alignment rules also apply to different architectures. Code analyzers make life easier. For example, PVS-Studio has the V802 diagnostic rule for this purpose.

When rearranging data members isn't enough, I recommend considering an increasingly popular technique: converting an array of structures (AoS, Array-of-Structs) into a structure of arrays (SoA, Struct-of-Arrays).
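A minimal sketch of the AoS to SoA conversion is shown below. Point and Points are hypothetical names, and the byte count ignores the bookkeeping of the vectors themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structs: every element carries its own padding.
struct Point {
  int x;
  char tag;
};

// Struct of arrays: each field lives in its own densely packed array.
struct Points {
  std::vector<int> x;
  std::vector<char> tag;

  void push_back(int new_x, char new_tag) {
    x.push_back(new_x);
    tag.push_back(new_tag);
  }

  std::size_t size() const { return x.size(); }

  // Payload bytes per element, ignoring vector bookkeeping.
  static constexpr std::size_t bytes_per_element() {
    return sizeof(int) + sizeof(char);
  }
};
```

With a 4-byte int, a std::vector<Point> spends 8 bytes per element, while Points spends 5, at the cost of keeping the two arrays in sync manually.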

Author: Dmitry Sviridkin

Dmitry has over eight years of experience in high-performance software development in C and C++. From 2019 to 2021, he taught Linux system programming at SPbU and C++ hands-on courses at HSE. He currently works on system and embedded development in Rust and C++ for edge servers as a Software Engineer at AWS (Cloudfront). His main area of interest is software security.

Editor: Andrey Karpov

Andrey has over 15 years of experience with static code analysis and software quality. He is the author of numerous articles on writing high-quality code in C++. Andrey Karpov was honored with the Microsoft MVP award in the Developer Technologies category from 2011 to 2021. Andrey is a co-founder of the PVS-Studio project. He has long been the company's CTO and was involved in the development of the C++ analyzer core. Andrey is currently responsible for team management, personnel training, and DevRel activities.