The PVS-Studio team along with Sergei Kushnirenko prepared a quiz based on his publications. Take the quiz, challenge your focus and coding skills while looking for errors in the C++ code. This article provides a detailed analysis of all the questions.
If you haven't taken the quiz, but have opened this article, we urge you not to read it just yet. Don't deny yourself the pleasure of "bug-spotting", that's the funniest part.
Take the quiz and come back to us.
Now that you have completed the quiz, you may have questions or not understand some of the tasks and answers. Make yourself some tea or coffee and we'll take a closer look at the tasks. But first, let us say a few words of gratitude.
We created the quiz with the help of Sergei Kushnirenko's articles on Habr. The PVS-Studio team came up with the idea of organizing the code fragments from his articles into an entertaining exercise book.
Sergei Kushnirenko is an AI Gameplay Engineer at 4A Games.
He develops classic game AIs and enjoys dissecting algorithms as much as dismantling old cars. He took the errors and examples from the projects he worked on. Exceptions are disabled in the examples, and they also have vendor compiler features. As a result, some obvious issues may not be detected, and the programs may not crash but behave strangely.
LinkedIn account: dalerank.
Many thanks to Sergei for taking part in this project.
This code fragment was in the Unity 2014 engine to control an object rotation using gizmo (a control element that usually has the shape of three circles — it enables you to rotate the object in the scene). The setDeltaPitch function here changes the object angle relative to its vertical axis (pitch). At angles close to 0 (depending on the editor settings), the function simply turned the object upside down, which really annoyed level designers.
void UObject::setDeltaPitch(const UMatrix &gizmo) {
....
if (_fpzero(amount, eps))
return
rotateAccum.setAnglesXYZ(axis);
....
}
Here are the answer options:
The answer
There is no semicolon after the return. So, under certain conditions, it can cause the SetAnglesXYZ function to be called in the controlled object. As a result, the object would rotate to an arbitrary angle.
A note from the PVS-Studio team
It seems that errors caused by forgotten semicolons are quite common. Various articles mention them frequently. In reality, however, they are very rare. In more than 10 years of checking open-source projects, we've found only 3 such errors. The fact that everyone is aware of such errors and pays attention to them helps. Analyzers and compilers also seem to detect them well.
The compiler had its fun here. The developers used this function to calculate the hash sum when checking file integrity. During content creation, a hash of the files is calculated and stored with the files. Later, when loading the same file, the Unity player would again calculate the hash of the file and compare it to the saved one to ensure that the file hasn't been modified or corrupted. Encryption was used at the final packaging stage. The code author believed that the key will not leak outside the function, but something went wrong. With some of the leaked Unity source code and the bug in the engine, SKIDROW cracked Syberia 3 and several other big Unity games in 2017. The games used native content encryption tools. Denuvo was there too, but SKIDROW learned how to bypass it even earlier.
void UContentPackage::createPackage(dsaFile *m, const void *v, int size) {
unsigned char secretDsaKey[3096], L;
const unsigned char *p = v;
int i, j, t;
....
UContent::generateDsaKey(secretDsaKey, sizeof(salt));
....
// there was some code to encrypt and sign the file
....
memset (secretDsaKey, 0, sizeof(secretDsaKey));
}
Here are the answer options:
The answer
The compiler applied aggressive optimizations and removed the call to the memset function. No operations with secretDsaKey were executed after memset, so the compiler would just optimize the function call out. All key contents remained on the stack.
A note from the PVS-Studio team
This case is the exact opposite of the previous error pattern. The disappearing memset function call seems exotic and rare. However, it's a very common potential vulnerability. At the time of writing the article, our collection already counts 330 such errors in various open-source projects, and the number keeps growing. Although, it seems that a lot has already been written on this topic:
Private data can also be sent externally, for example, over a network. How? We discussed this topic in the article: "Overwriting memory - why?".
Working in two or more threads with any of the a/b variables can be problematic. The error was in the CryTek engine when synchronizing car states over the network. It caused jerky movements and teleportation when players were driving a vehicle in FarCry 1 multiplayer. The more players were on the map, the more likely the last player would teleport. With 16 players on the map, the last player would always teleport when using a car.
struct X {
int a : 2
int b : 2
} x;
Thread 1:
void foo() { x.a = 1 }
Thread 2:
void boo() { x.b = 1 }
Here are the answer options:
Here's the answer
The atomicity of the write operation is violated. The a and b fields are non-atomic, so they can be partially executed and interrupted by other operations. This code fragment has shared access to the complex AB variable, which has two two-bit parts. However, the compiler can't do such an assignment atomically by grabbing the ab byte and using bitwise operations to set the desired value. This can lead to data races and undefined behavior in a multithreaded environment.
foo(): # @foo()
mov AL, BYTE PTR [RIP + x] ; broken assignment 1
and AL, -4 ; broken assignment 2
or AL, 1 ; broken assignment 3
mov BYTE PTR [RIP + x], AL ; finished
ret
boo(): # @boo()
mov AL, BYTE PTR [RIP + x]
and AL, -13
or AL, 4
mov BYTE PTR [RIP + x], AL
ret
The following code fragment contains a data race even with a seemingly functional mutex. The error has been found in the Nintendo Switch 4.5.1 firmware and above. Developers accidentally stumbled across it while trying to speed up the creation of UI texture atlases at the game start. When they tried to load more than 100 textures, the atlas would break. However, when there were fewer textures, the atlas would be built without any issues. The developers still haven't fixed the zombie mutexes on Nintendo Switch. Moreover, there you couldn't create more than 256 mutexes per application. So, the system is "fun".
const size_t maxThreads = 10;
void fill_texture_mt(int thread_id, std::mutex *pm) {
std::lock_guard<std::mutex> lk(*pm);
// Access data protected by the lock.
}
void prepare_texture() {
std::thread threads[maxThreads];
std::mutex m;
for (size_t i = 0; i < maxThreads; ++i) {
threads[i] = std::thread(fill_texture_mt, i, &m);
}
}
Here are the answer options:
The answer
The mutex is deleted after exiting the prepare_texture function. The OS doesn't react to an inactive mutex because, as a core object, it stays for a while. The address is valid and contains correct data but doesn't provide actual thread locking.
Note. According to the documentation, destructors for active std::threads are called when exiting the function. This should cause a crash (a call to terminate). However, according to the code author, this didn't happen in this project. This is strange, but it doesn't really matter because the code is incorrect anyway.
We can define functions so that they take more arguments when called than when declared. Such functions are called variadic. C++ provides two mechanisms for defining a variadic function: a pattern with a variable number of parameters and a C-style ellipsis as the last parameter declaration. A rather unpleasant behavior occurred in the popular sound library — FMOD Engine. The code is given as it was in the source: it seems that the devs wanted to save on templates. You can play with the code on the OnlineGDB website.
int var_add(int first, int second, ...) {
int r = first + second;
va_list va;
va_start(va, second);
while (int v = va_arg(va, int)) {
r += v;
}
va_end(va);
return r;
}
Here are the answer options:
The answer
This is a C-style variadic function for adding a series of integers. The arguments are read until 0 is found. Calling the function without passing 0 as the last argument (after the first two arguments) causes undefined behavior. Passing any type other than int also results in undefined behavior.
Here we didn't lock anything again, even though it looked like we did. This code fragment was found in the Metro: Exodus engine. When working with resources, the game strangely crashed. The developers found it thanks to bug reports and one clever French guy.
static std::mutex m;
static int shared_resource = 0;
void increment_by_42() {
std::unique_lock<std::mutex>(m);
shared_resource += 42;
}
Here are the answer options:
The answer
The code author expects the std::unique_lock anonymous local variable to lock and unlock the m mutex via RAII. In fact, the new m variable is created here, it's initialized by default. So, the mutex isn't captured. See also: "Most vexing parse".
A note from the PVS-Studio team
The error is quite rare. PVS-Studio has the V1025 diagnostic rule for detecting it. However, we have not yet found such an error when checking open-source projects. It was interesting to see it, as they say, in its natural habitat :).
The following code fragment was used to calculate region checksums in PathEngine. In the release build, it had issues on different compilers. The PlayStation build used a specific flag that masked the issue, while the Xbox build didn't. Developers found the error when trying to build the library on a PC using Clang.
struct AreaCrc32 {
unsigned char buffType = 0;
int crc = 0;
};
AreaCrc32 s1 {};
AreaCrc32 s2 {};
void similarArea(const AreaCrc32 &s1, const AreaCrc32 &s2) {
if (!std::memcmp(&s1, &s2, sizeof(AreaCrc32))) {
// ....
}
}
Here are the answer options:
The answer
The fields in the structure are aligned on a 4-byte boundary. Padding bytes, which may be filled with zeros or with garbage, are formed between buffType and sgs.
struct S {
unsigned char buffType = 0;
char[3] _padding = { compiler don't care about content in release };
int crc = 0;
};
The memcmp function compares memory on a byte-by-byte basis, including garbage bytes. So, we get an unknown result when we compare s1 to s2. The PlayStation compiler settings explicitly tell the compiler to fill padding bytes with zeros. The C++ rules for structures state that we need to use the operator == here. The memcmp function isn't designed for this task.
By the way, here's another case of comparing redundancies. We noticed it once during a code review, and just in time.
class C {
int i;
public:
virtual void f();
// ...
};
void similar(C &c1, C &c2) {
if (!std::memcmp(&c1, &c2, sizeof(C))) {
// ...
}
}
In C++, comparing structures and classes using memcmp is a bad idea, especially if they contain a pointer to a virtual function table.
Let's say there are two different classes inherited from C. Then, if c1 and c2 are references to instances of these two different classes, they won't be equal, even if all the data is the same (except for the pointer to the virtual function table).
But wait, there's more. The presence of an implicit pointer in the class can cause alignment issues (the presence of padding bytes).
Just in case, we would like to remind you that this text is based on Sergei Kushnirenko's personal experience and his publications. We would like to avoid questions why the PVS-Studio team asks these things or writes such code. We don't do that :). Sergei is not a part of the PVS-Studio team. We decided to leave this comment to avoid repeating strange discussions like the one that happened after we posted a translation of a third-party article about "Google programmers".
It should be simple, but things like this often confuse applicants during interviews.
struct S { S(const S *) noexcept; /* ... */ };
class T {
int n;
S *s1;
public:
T(const T &rhs) : n(rhs.n),
s1(rhs.s1 ? new S(rhs.s1) : nullptr) {}
~T() { delete s1; }
// ...
T& operator=(const T &rhs) {
n = rhs.n;
delete s1;
s1 = new S(rhs.s1);
return *this;
}
};
Here are the answer options:
The answer
There's no check for assigning the object to itself in operator =. As a result, we delete the data and load some garbage.
There's another weak point in operator =. If exceptions are not disabled in the project, new may potentially throw std::bad_alloc. In this case, this->s1 keeps the old and possibly non-zero value which is then passed to delete in the destructor. As a result, we get double free.
A note from the PVS-Studio team
Indeed, when rearranging the fields in a structure or class, sometimes you can reduce their size. However, it doesn't apply to the class in question. In a 32-bit program, the class size is always 8 bytes; in a 64-bit program, it's always 16 bytes because of the alignment required. At least, this is true for the most common data models and architectures. Learn more about reducing the size of classes and structures: V802.
You can catch an unobvious bug in as little as four lines of code. Especially in Unity Engine, which likes to include common headers in different DLLs and then compare them like this: super_secret() == super_secret(). What could possibly go wrong, right? Well, secret != secret, and the game doesn't run on Windows Phone.
// header a.h
constexpr inline const char* super_secret(void) {
constexpr const char *STRING = "string";
return STRING;
}
// somewhere in a.dll
if (super_secret() == saved_string_ptr)
// somewhere in b.dll
if (super_secret() != saved_string_ptr)
Here are the answer options:
The answer
The error is caused by using a common a.h header in different libraries. Everything works fine as long as the engine is built as a single DLL. By building two different DLLs, we get two different addresses for STRING. The error here occurs because strings are compared as pointers instead of using the strcmp function.
A note from the PVS-Studio team
Most of the bugs we've described in this article are related to the game development industry in one way or another. So, there are probably a lot of game devs reading this article. The PVS-Studio team wants to ask you for something. We have started to implement special diagnostic rules in the analyzer. They are designed to detect bugs specific to Unity and Unreal Engine projects. We appreciate your ideas for new diagnostic rules, so don't hesitate to share them.
The function may have UB with a buffer overflow. This often happens with the -O3 optimization for Clang, and doesn't happen with -O0. The issue has already been fixed for all optimization modes in Clang 12.10 and above. I didn't write the code. It came up during one of the interviews when we were just having a heart-to-heart.
char destBuffer[16];
void serialize(bool x) {
const char* whichString = x ? "true" : "false";
const size_t len = strlen(whichString);
memcpy(destBuffer, whichString, len);
}
int main() {
bool x;
serialize(x);
printf("%s", destBuffer);
return 0;
}
Here are the answer options:
The answer
The compiler fooled itself here. So, what's going on? First, remember that bool isn't just 0 or 1. It's compiler-dependent. Clang determines that 0 is false and everything else is true. A strict compiler optimization converts the len calculation to 'len = 5 - x'. However, the x variable is not initialized, so we may get 'len = 5-7'. In this case, calling memcpy causes the program to crash due to a buffer overflow.
A note from the PVS-Studio team
Programmers are often wrong when they try to predict how undefined behavior will manifest itself. It can't and shouldn't be done. Here's a good article on the subject: "Falsehoods programmers believe about undefined behavior".
What will be displayed as a result of running the program?
void sayHello() {
std::cout << "Hello, World!\n";
}
void sayHello() {
std::cout << "Goodbye, World!\n";
}
int main() {
sayHello();
return 0;
}
Here are the answer options:
The answer
The console will display "Goodbye, World!" when you run the code. This is because the name of the second sayHello function is written in Unicode, which unfortunately can't always be displayed. So, this function is called (Clang, latest).
sayHello(): # @sayHello()
push rbp
mov rbp, rsp
mov rdi, qword ptr [rip + std::cout@GOTPCREL]
lea rsi, [rip + .L.str]
call std::basic_ostream<char, std::char_traits<char> >&
std::operator<< <std::char_traits<char> >(std::basic_ostream<char,
std::char_traits<char> >&, char const*)@PLT
pop rbp
ret
"_Z9sayHellov": # @"_Z9say\D0\9Dellov"
push rbp
mov rbp, rsp
mov rdi, qword ptr [rip + std::cout@GOTPCREL]
lea rsi, [rip + .L.str.1]
call std::basic_ostream<char, std::char_traits<char> >&
std::operator<< <std::char_traits<char> >(std::basic_ostream<char,
std::char_traits<char> >&, char const*)@PLT
pop rbp
ret
A note from the PVS-Studio team
Anyone interested in learning more about the Unicode nuances is welcome to read: "Trojan Source attack for introducing invisible vulnerabilities".
The function has the amazing capability to alter reality depending on how many legs its caller has.
int abs_legs(int my_legs) {
if (my_legs < 0) {
return -my_legs;
}
}
Here are the answer options:
The answer
A value-returning function should return a value (hm!) from all possible branches, because there will always be a condition that leads to undefined behavior.
Here's the correct code:
int abs_legs(int my_legs) {
if (my_legs < 0) {
return -my_legs;
}
return my_legs;
}
A note from the PVS-Studio team
Many people think that undefined behavior here boils down to the fact that the function can return a random value. Let us say once again that this is not the case. All we can say for sure is that this code contains an error. It's impossible to predict exactly how it will reveal itself. For example, the compiler may not generate a function exit statement at all. Don't you believe? Then you may want to read "An example of undefined behavior caused by absence of return".
The following function doesn't fully expand, leaving its magic behind the compiler veil.
int get_money(int index, const int *pockets) {
int a = index + pockets[++index];
// ....
return a;
}
Here are the answer options:
The answer
The compiler may rearrange the evaluation order for index + pockets[++index], leading to undefined behavior with different optimization settings. An unordered or undefined sequence of operations will cause a side effect when working with the index variable.
The same code given in "Order of evaluation" is considered to result in undefined behavior:
n = ++i + i; // undefined behavior
So, we can rewrite the code like this:
int a = index + pockets[index + 1];
++index;
Or like this, depending on what we want to get:
++index;
int a = index + pockets[index];
A note from the PVS-Studio team
Do you know how the following code will behave?
printf("%d, %d\n", i++, i++);
Are you sure? We invite you to read the following article: "How deep the rabbit hole goes, or C++ job interviews at PVS-Studio".
The code looks correct, but we put it here for a reason, right? :)
void umcp_read_buffer_from_pipe() {
char bufKernel[12];
char bufMatrix[12];
std::cin.width(12);
std::cin >> bufKernel;
std::cin >> bufMatrix;
}
Here are the answer options:
The answer
In this example, the first read doesn't lead to overflow and fills bufKernel with a truncated string. But the second read can overflow bufMatrix. To prevent this from happening, let's call std::cin.width(12); before getting the bufMatrix. Or we can safely work via strings.
void umcp_read_buffer_from_pipe() {
std::string bufKernel, bufMatrix;
std::cin >> bufKernel >> bufMatrix;
}
If you're using C++20, the next function overload is selected and everything goes safely. Even if you didn't do it on purpose...
template< class Traits, std::size_t N >
basic_istream<char, Traits>&
operator>>( basic_istream<char, Traits>& st, signed char (&s)[N] );
It seems that in this code fragment, the "we-don't-do-that-here" programmer simply mixes randomness with absurdity and calls it "display customization".
std::string str_func();
void display_string(const char *);
void set_display_options() {
const char *str = str_func().c_str();
display_string(str);
}
Here are the answer options:
The answer
Here std::string::c_str is called for the temporary std::string object. The resulting pointer points to freed but possibly still valid memory after destroying the std::string object at the end of the assignment expression. However, don't count on it. You will get undefined behavior when dereferencing this pointer.
We can fix the code like this:
std::string str = str_func();
const char *cstr = str.c_str();
display_string(cstr);
Or like this:
display_string(str_func().c_str())
Why is it dangerous to go to a bar in the morning?
void morning(const std::string &owner) {
std::fstream bar(owner);
if (!bar.is_open()) {
// Handle error
return;
}
bar << "customer";
std::string str;
bar >> str;
}
Here are the answer options:
The answer
Because there is a chance you won't exit the bar. The data is added to the end of the file and then read from the same file. However, since the position indicator in the file remains in the same place (at the end), an attempt to read the data (from the end) of the file leads nowhere. You have to go to the bar in the evening, then you can exit it without any issues.
void evening(const std::string &owner) {
std::fstream bar(owner);
if (!bar.is_open()) {
// Handle error
return;
}
bar << "customer";
std::string str;
bar.seekg(0, std::ios::beg);
bar >> str;
}
Will the bard become a drunkard?
class Bard {
int _beer;
int _meal;
public:
Bard(int meal) : _meal(meal), _beer(_meal - 1) {}
};
Here are the answer options:
The answer
Class members are initialized in the order of their declaration, not in the order in which they are listed in the constructor initialization list.
The class member initialization order is broken. So, the _beer variable is initialized first, and then _meal. Trying to read the _meal value before it's initialized leads to undefined behavior. Most likely, the bard will get drunk without a snack.
We can fix it like this:
Bard(int meal) : _beer(meal - 1), _meal(meal) {}
Another way to fix this is to use the meal variable for initialization everywhere:
Bard(int meal) : _meal(meal), _beer(meal - 1) {}
We can also swap the class fields:
class Bard {
int _meal;
int _beer;
public:
Bard(int meal) : _meal(meal), _beer(_meal - 1) {}
};
When would a shot in the leg be particularly effective?
std::vector<....> formats;
....
for (auto format = begin(formats), __end = end(formats);
format != __end; ++format)
{
if (snd::CodecNamesEq(....)) {
format->is_stereo = true;
formats.push_back(stereo_format);
}
}
Here are the answer options:
The answer
If the push_back process reallocates inside the formats container, the iterator format points to deleted memory, which causes a lot of issues.
Moreover, a variable named __end is not a good thing. Variables containing a double underscore are reserved for a standard library and internal use.
Note. It's implied that the used container can invalidate iterators when inserting elements at the end, although this isn't specified in the code. That's important because push_back in std::list doesn't do that.
We'd like to thank Sergei Kushnirenko for providing materials for the quiz and this article. A kind word to Andrey Karpov, Mikhail Gelvih, Phillip Khandeliants, and other PVS-Studio team members. Why not actually praise ourselves? :) Thank you, readers, we hope you had fun. If you want more stuff like this, follow the link.
Drop a comment below to share your funny stories or the weirdest code you've ever come across.