How often do you see the sizeof(array)/sizeof(array[0]) expression used to get the size of an array? I really hope it's not too often, because it's 2024 already. In this note, we'll talk about the flaws of this expression, where it comes from in modern code, and how to finally get rid of it.
Some time ago, I was surfing the internet looking for an interesting project to check. OpenTTD, an open-source simulator inspired by Transport Tycoon Deluxe (a transport company simulation), caught my eye. "A good, mature project," I thought at the time. And there's even an occasion for the check, as the project recently turned 20 years old! Even PVS-Studio is younger :)
At this point, I would normally move on to the errors found by the analyzer, but not this time. First, I'd like to compliment the developers. Even though the project has been around for over 20 years, its code base looks great: there's CMake, the code supports modern C++ standards, and it doesn't have that many errors. We all have something to learn from the devs.
However, as you might have guessed, this note wouldn't exist if there were nothing to find. Let's look at the following code (GitHub):
NetworkCompanyPasswordWindow(WindowDesc *desc, Window *parent)
  : Window(desc)
  , password_editbox(
      lengthof(_settings_client.network.default_company_pass) // <=
    )
{
  ....
}
Nothing interesting at first glance, but the evaluation of the _settings_client.network.default_company_pass container size confused the analyzer. Upon closer inspection, it turns out that lengthof is a macro, and the actual code looks like this (I've formatted it a bit for your convenience):
NetworkCompanyPasswordWindow(WindowDesc *desc, Window *parent)
  : Window(desc)
  , password_editbox(
      (sizeof(_settings_client.network.default_company_pass) /
       sizeof(_settings_client.network.default_company_pass[0]))
    )
{
  ....
}
And since we're putting our cards on the table, here's the analyzer warning:
V1055 [CWE-131] The 'sizeof (_settings_client.network.default_company_pass)' expression returns the size of the container type, not the number of elements. Consider using the 'size()' function. network_gui.cpp 2259
In this case, _settings_client.network.default_company_pass is actually std::string. Often, the size of a container object obtained using sizeof tells us nothing about its true size. An attempt to get the size of a string this way almost always results in an error.
This is all due to the peculiarities of modern standard library container implementations, and std::string in particular. They are usually implemented using two pointers (the start and end of the buffer) and a variable containing the actual number of elements. That's why when we try to determine the size of std::string using sizeof, we get the same value regardless of the actual buffer size. To see for yourself, take a look at a small example that I've prepared for you.
Of course, the standard library you use, and various optimizations have an effect on the implementation and final size of the container (see Small String Optimization), so you may get different results. You can read some interesting research on the inner workings of std::string here.
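Since the exact numbers differ between standard libraries, here is a minimal sketch of the effect that doesn't hard-code any of them. The helper names object_size, element_count, and object_size_ignores_content are made up for this illustration and don't come from OpenTTD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: what the sizeof-based idiom effectively measures,
// i.e. the size of the std::string object itself.
std::size_t object_size(const std::string &s)
{
    return sizeof(s); // same as sizeof(std::string), fixed at compile time
}

// Hypothetical helper: what we actually wanted to know.
std::size_t element_count(const std::string &s)
{
    return s.size(); // number of characters currently stored
}

// Compares both values for an empty string and a 1000-character one.
bool object_size_ignores_content()
{
    const std::string empty;
    const std::string large(1000, 'x');
    return object_size(empty) == object_size(large)
        && element_count(empty) == 0
        && element_count(large) == 1000;
}
```

Whatever sizeof(std::string) happens to be on your platform, it stays the same for both strings, while size() follows the content.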
So, we've found the issue and learned how not to determine the size of an array. But how did the code end up like this?
In the case of OpenTTD, it's quite simple. Judging by the blame, almost four years ago, someone changed default_company_pass from char[NETWORK_PASSWORD_LENGTH] to std::string. Interestingly, the value that the lengthof macro returns now differs from the value expected back then: 32 vs. 33. I admit that I haven't delved deep into the project code. However, I hope that the developers have considered this detail. According to the comment, the 33rd character of the default_company_pass field was reserved for the terminating null.
// The maximum length of the password, in bytes including '\0'
// (must be >= NETWORK_SERVER_ID_LENGTH)
Legacy code and slightly careless refactoring seem to be the obvious reason for this. Surprisingly, however, this way of determining the array size still appears in new code. Well, there's no other way in C, but why in C++? To find an answer, I went to Google Search. I can't say I was surprised...
Right at the very beginning, even before the main search results, it gives you this :( I'd like to note that I used a private mode, a clean computer, and other things to negate the suspicion that the search was based on my past queries.
Author's note: that's interesting. Please tell me in the comments what Google shows you in the top results for the same search query.
That's sad. Hopefully, AIs trained on present-day code won't make errors like these.
It wouldn't be nice to point out the issue and not offer good ways to resolve it. All that's left is to figure out what to do about it. Let's start step by step and gradually reach the best solution at the moment.
So, sizeof((expr)) / sizeof((expr)[0]) is literally an error magnet. Just think about it:
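One well-known way the idiom goes wrong is a pointer sneaking in where an array used to be. Below is a minimal sketch of that case; NAIVE_LENGTHOF is a stand-in name for macros such as lengthof, and both function names are invented for the example. Modern compilers may emit a warning for the second function, but the code still compiles.

```cpp
#include <cstddef>

// Stand-in for macros such as lengthof; the name is an assumption.
#define NAIVE_LENGTHOF(x) (sizeof(x) / sizeof((x)[0]))

// Works as intended: 'values' is a real array in this scope.
std::size_t count_real_array()
{
    int values[10] = {};
    return NAIVE_LENGTHOF(values); // 10 elements
}

// Compiles as well, but 'data' is only a pointer. The result is
// sizeof(const int *) / sizeof(int), no matter how long the caller's array is.
std::size_t count_through_pointer(const int *data)
{
    return NAIVE_LENGTHOF(data);
}
```

The second function returns the same small number whether the caller passes an array of 3 or 3000 elements.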
Since we're coding in C++ here, let's harness the power of templates! This brings us to the legendary ArraySizeHelper (aka "the safe sizeof" in some articles), which developers write sooner or later in almost every project. In the old days — before C++11 — you could encounter such monstrosities:
template <typename T, size_t N>
char (&ArraySizeHelper(T (&array)[N]))[N];
#define countof(array) (sizeof(ArraySizeHelper(array)))
ArraySizeHelper is a function template that accepts a reference to an array of N elements of the T type. The function returns a reference to an array of N elements of the char type.
Let's look at a small example to understand how it works:
void foo()
{
  int arr[10];
  const size_t count = countof(arr);
}
When ArraySizeHelper is called, the compiler has to deduce the template arguments from the function argument. In our case, T is deduced as int and N is deduced as 10. The return type of the function is char (&)[10]. As a result, sizeof returns the size of that char array, which is equal to the number of elements because sizeof(char) is always 1.
As you can see, the function is missing a body. The reason for this is that such a function can be used ONLY in an unevaluated context. For example, when a function call is in sizeof.
I'd also like to note that the function signature explicitly states that it accepts an array and nothing else. That's how the protection against passing pointers works. If we try to pass a pointer to such ArraySizeHelper, we get a compilation error:
void foo(uint8_t *data)
{
  auto count = countof(data); // compilation error: 'data' is a pointer
  ....
}
I'm not exaggerating when I talk about the old days. Back in 2011, my colleague figured out how this magic worked in the Chromium project. With C++11 and C++14, writing such helper functions has become much easier:
template <typename T, size_t N>
constexpr size_t countof(T (&arr)[N]) noexcept
{
  return N;
}
But wait, we can do even better!
Most likely, further on, you may want to count the size of containers: std::vector, std::string, or QList — it doesn't matter. Such containers already have the function we need — size. So, that's what we need to call. Let's overload the above function:
template <typename Cont>
constexpr auto countof(const Cont &cont) noexcept(noexcept(cont.size()))
  -> decltype(cont.size()) // the noexcept specifier goes before the trailing return type
{
  return cont.size();
}
Here we've simply defined a function that takes any object and returns the result of the call to its size function. Now our function has the protection against passing pointers, can work with both builtin arrays and containers, and even does it at compile time.
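To see the two overloads cooperate, here is a self-contained sketch that puts them side by side. The constant kPrimes and the functions count_vector and count_string are illustrative names only. Note that the noexcept specifier is placed before the trailing return type, which is the order the grammar requires.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Overload for built-in arrays: N is deduced from the array type.
template <typename T, std::size_t N>
constexpr std::size_t countof(T (&)[N]) noexcept
{
    return N;
}

// Overload for anything that has a size() member function.
// For built-in arrays and pointers it silently drops out of overload resolution.
template <typename Cont>
constexpr auto countof(const Cont &cont) noexcept(noexcept(cont.size()))
    -> decltype(cont.size())
{
    return cont.size();
}

// Built-in array: the result is usable in a constant expression.
constexpr int kPrimes[] = { 2, 3, 5, 7 };
static_assert(countof(kPrimes) == 4, "the array overload is selected");

// Containers: the size() overload is selected.
std::size_t count_vector() { return countof(std::vector<int>{ 1, 2, 3 }); }
std::size_t count_string() { return countof(std::string("hello")); }
```

Passing a raw pointer matches neither overload, so the mistake shows up as a compilation error instead of a wrong number at run time.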
Aaand... congratulations! We've successfully reinvented std::size. This is what I suggest using starting from C++17 instead of the obsolete sizeof kludges and ArraySizeHelper. You also don't need to rewrite it every time: it's declared in <iterator> and is also available after including the header file of almost any container.
Below, I also invite you to look at some common scenarios for people who have found their way here from search results. For the following cases, let's assume that std::size is available in the standard library. Otherwise, you can copy the functions described above and use them as its analogs.
For a container, most of the time it's better to use the size member function of its class. For example, std::string::size, std::vector::size, QList::size, etc. Starting with C++17, I recommend switching to the std::size I've described above.
std::vector<int> first { 1, 2, 3 };
std::string second { "hello" };
....
const auto firstSize = first.size();
const auto secondSize = second.size();
For a built-in array, use the free std::size function. As we've already learned, it returns the number of elements not only for containers, but also for built-in arrays.
static const int MyData[] = { 2, 9, -1, ...., 14 };
....
const auto size = std::size(MyData);
The obvious advantage of this function is that we get a compilation error if we try to give it an inappropriate type or pointer.
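A compilation error is hard to show in a runnable snippet, so the sketch below uses a small detection trait instead. The trait name accepts_std_size is hypothetical; it merely reports whether the std::size(x) call would compile for a given type.

```cpp
#include <cstddef>
#include <iterator>     // std::size
#include <type_traits>  // std::void_t, std::true_type, std::false_type
#include <utility>      // std::declval
#include <vector>

// Hypothetical trait: true when std::size(x) compiles for an lvalue x of type T.
template <typename T, typename = void>
struct accepts_std_size : std::false_type {};

template <typename T>
struct accepts_std_size<T, std::void_t<decltype(std::size(std::declval<T &>()))>>
    : std::true_type {};

static_assert(accepts_std_size<int[4]>::value, "built-in arrays are accepted");
static_assert(accepts_std_size<std::vector<int>>::value, "containers are accepted");
static_assert(!accepts_std_size<int *>::value, "raw pointers are rejected");
static_assert(!accepts_std_size<int>::value, "so are types without size()");
```

In everyday code you wouldn't need such a trait: calling std::size on a pointer simply fails to compile.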
In generic code, use the free std::size function as well. In addition to being adaptable in terms of object type, it also works at compile time.
template <typename Container>
void DoSomeWork(const Container& data)
{
  const auto size = std::size(data);
  ....
}
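As an illustration of the compile-time part, the sketch below applies std::size to constexpr objects and uses the result as a template argument. The names kWeights, kBuffer, and make_scaled are invented for the example.

```cpp
#include <array>
#include <cstddef>
#include <iterator>

constexpr int kWeights[] = { 3, 1, 4, 1, 5 };
constexpr std::array<char, 8> kBuffer{};

// Both calls are evaluated by the compiler.
static_assert(std::size(kWeights) == 5, "built-in array");
static_assert(std::size(kBuffer) == 8, "std::array");

// The result can be used wherever a constant expression is required,
// for example as the size of another array.
std::array<int, std::size(kWeights)> make_scaled(int factor)
{
    std::array<int, std::size(kWeights)> result{};
    for (std::size_t i = 0; i < std::size(kWeights); ++i)
        result[i] = kWeights[i] * factor;
    return result;
}
```

For containers whose size is only known at run time, such as std::vector, the same call still works but is evaluated at run time.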
If all you have is a pair of iterators, there are two options, depending on your needs. If you just want to know the size, it's enough to use std::distance:
void SomeFunc(iterator begin, iterator end)
{
  const auto size = static_cast<size_t>(std::distance(begin, end));
}
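Here is a self-contained variant of the same idea. The template range_length and the function count_list_nodes are illustrative names. Keep in mind that std::distance takes constant time for random access iterators, but has to walk the whole range for other iterator categories, such as the std::list iterators used here.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: number of elements in the range [begin, end).
template <typename Iterator>
std::size_t range_length(Iterator begin, Iterator end)
{
    return static_cast<std::size_t>(std::distance(begin, end));
}

// Works with any iterator pair, not only with pointers into an array.
std::size_t count_list_nodes()
{
    const std::list<int> values{ 4, 8, 15, 16, 23, 42 };
    return range_length(values.begin(), values.end());
}
```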
If you have something more interesting in mind than just determining the size, use read-only wrapper classes: std::string_view for strings, std::span in general, etc. Here's an example:
void SomeFunc(const char *begin, const char *end)
{
  std::string_view view { begin, end }; // iterator-pair constructor, C++20
  const auto size = view.size();
  ....
  char first = view[0];
}
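The std::string_view constructor that takes two iterators was added in C++20. If you're limited to C++17, one possible workaround is to combine both options: get the length with std::distance and pass it to the pointer-and-length constructor. The helper name make_view is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical helper: builds a view over [begin, end) using only C++17 facilities.
std::string_view make_view(const char *begin, const char *end)
{
    const auto length = static_cast<std::size_t>(std::distance(begin, end));
    return std::string_view(begin, length);
}
```

As with any view, the caller has to make sure that both pointers refer to the same buffer and that the buffer outlives the returned object.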
More experienced readers might add an option with address arithmetic. However, I'd rather not discuss it here, since the target audience for this note is novice programmers. Let's not teach them bad things :)
If all you have is a pointer to the first element, in most cases you have to rewrite the program a bit and pass the array size along with the pointer. Sadly, that's how it works.
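A minimal sketch of such a rewrite might look like this. The function name sum_elements is made up for the example: the pointer-based overload receives the element count explicitly, and a thin array overload computes it with std::size at the point where the array type is still known.

```cpp
#include <cstddef>
#include <iterator>

// The callee cannot recover the length from 'data', so it becomes a parameter.
long long sum_elements(const int *data, std::size_t count)
{
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Convenience overload: here 'values' is still an array, so std::size works.
template <std::size_t N>
long long sum_elements(const int (&values)[N])
{
    return sum_elements(values, std::size(values));
}
```

Callers that hold a real array never have to spell out its length, and callers that only hold a pointer are forced to provide one.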
If you work with strings (const char *, const wchar_t *, etc.), and you know for sure that the string contains a terminating null, things get a little better. In such a case, you can use std::basic_string_view:
const char *text = GetSomeText();
std::string_view view { text };
Just like in the example above, we get all the benefits of view classes while initially having only one pointer.
I'd also like to mention a less preferred but in some cases handy option of using std::char_traits::length:
const char *text = GetSomeText();
const auto size = std::char_traits<char>::length(text);
Being literally a Swiss Army knife, std::char_traits is a must-have for working with strings. It lets you write generalized algorithms no matter what character type is used in the string (char, wchar_t, char8_t, char16_t, char32_t). With it, you no longer have to worry about when to use std::strlen and when to use std::wcslen. As I've said, the string must contain a terminating null. Otherwise, you get undefined behavior.
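As a small illustration of such a generalized algorithm, here is a sketch of a length function that works for any character type. The name cstr_length is hypothetical, and the function relies on the same precondition: the string must be null-terminated.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of a null-terminated string of any character type.
// Precondition: 'text' points to a string that contains a terminating null.
template <typename CharT>
std::size_t cstr_length(const CharT *text)
{
    return std::char_traits<CharT>::length(text);
}
```

The same call handles cstr_length("hello"), cstr_length(L"hello"), and cstr_length(u"hello") without a separate function per character type.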
I hope I've managed to show good alternatives to such a simple but dangerous expression as sizeof(array) / sizeof(array[0]). If you think I've left something out, feel free to share it in the comments :)