Webinar: Parsing C++ - 10.10
It has been three months since 2018 had ended. For many, it has just flew by, but for us, PVS-Studio developers, it was quite an eventful year. We were working up a sweat, fearlessly competing for spreading the word about static analysis and were searching for errors in open source projects, written in C, C++, C#, and Java languages. In this article, we gathered the top 10 most interesting of them right for you!
To find the most intriguing places, we used the PVS-Studio static code analyzer. It can detect bugs and potential vulnerabilities in code, written in languages listed above.
If you're excited about searching for errors by yourself, you're always welcome to download and try our analyzer. We provide the free analyzer version for students and enthusiastic developers, the free license for developers of open-source projects, and also the trial version for all the world and his dog. Who knows, maybe by the next year you will be able to create your own top 10? :)
Note: I invite you to check yourself and before you look at the analyzer warning, try to reveal defects yourself. How many errors will you be able to find?
Source: Into Space Again: how the Unicorn Visited Stellarium
This error was detected when checking a virtual planetarium called Stellarium.
The above code fragment, though small, is fraught with a pretty tricky error:
Plane::Plane(Vec3f &v1, Vec3f &v2, Vec3f &v3)
: distance(0.0f), sDistance(0.0f)
{
Plane(v1, v2, v3, SPolygon::CCW);
}
Found it?
PVS-Studio warning: V603 The object was created but it is not being used. If you wish to call constructor, 'this->Plane::Plane(....)' should be used. Plane.cpp 29
The code author intended to initialize some object's fields, using another constructor, nested in the main one. Well, instead of it, he only managed to create a temporary object destroyed when leaving its scope. By so doing, several object's fields will remain uninitialized.
The author should have used a delegate constructor, introduced in C++11, instead of a nested constructor call. For example, he could have written like this:
Plane::Plane(Vec3f& v1, Vec3f& v2, Vec3f& v3)
: Plane(v1, v2, v3, SPolygon::CCW)
{
distance = 0.0f;
sDistance = 0.0f;
}
This way, all necessary fields would have been initialized correctly. Isn't it wonderful?
Source: Perl 5: How to Hide Errors in Macros
A very remarkable macro stands out in all its beauty on the ninth place.
When gathering errors for writing an article, my colleague Svyatoslav came across a warning, issued by the analyzer, which related to macro usage. Here it is:
PP(pp_match)
{
....
MgBYTEPOS_set(mg, TARG, truebase, RXp_OFFS(prog)[0].end);
....
}
To find out what was wrong, Svyatoslav dug deeper. He opened the macro definition and saw that it contained several nested macros, some of which in turn also had nested macros. It was so hard to make sense out of that, so he had to use a preprocessed file. Sadly, it didn't help. This is what Svyatoslav found in the previous line of code:
(((targ)->sv_flags & 0x00000400) && (!((targ)->sv_flags & 0x00200000) ||
S_sv_only_taint_gmagic(targ)) ? (mg)->mg_len = ((prog->offs)[0].end),
(mg)->mg_flags |= 0x40 : ((mg)->mg_len = (((targ)->sv_flags & 0x20000000)
&& !__builtin_expect(((((PL_curcop)->cop_hints + 0) & 0x00000008) ?
(_Bool)1 :(_Bool)0),(0))) ? (ssize_t)Perl_utf8_length( (U8 *)(truebase),
(U8 *)(truebase)+((prog->offs)[0].end)) : (ssize_t)((prog->offs)[0].end),
(mg)->mg_flags &= ~0x40));
PVS-Studio warning: V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '&&' operator. pp_hot.c 3036
I think, it would be challenging to simply notice such an error. We have been dwelling on this code for long, but, frankly speaking, we haven't found an error in it. Anyway, it's quite an amusing example of poorly readable code.
They say that macros are evil. Sure, there are cases, when macros are indispensable, but if you can replace a macro with a function - you should definitely do it.
Nested macros are especially full of pitfalls. Not only because it is difficult to make sense of them, but also because they can give unpredictable results. If a programmer makes a mistake in such a macro - it will be much more difficult to find it in a macro, than in a function.
Source: Chromium: Other Errors
Next example was taken from the series of articles on the analysis of the Chromium project. The error was hiding in the WebRTC library.
std::vector<SdpVideoFormat>
StereoDecoderFactory::GetSupportedFormats() const
{
std::vector<SdpVideoFormat> formats = ....;
for (const auto& format : formats) {
if (cricket::CodecNamesEq(....)) {
....
formats.push_back(stereo_format);
}
}
return formats;
}
PVS-Studio warning: V789 CWE-672 Iterators for the 'formats' container, used in the range-based for loop, become invalid upon the call of the 'push_back' function. stereocodecfactory.cc 89
The error is that the size of the formats vector varies within the range-based for loop. Range-based loops are based on iterators, that's why changing of the container size inside of such loops might result in invalidation of these iterators.
This error persists, if rewrite the loop with an explicit usage of iterators. For clarity, I can cite the following code:
for (auto format = begin(formats), __end = end(formats);
format != __end; ++format) {
if (cricket::CodecNamesEq(....)) {
....
formats.push_back(stereo_format);
}
}
For example, when using the push_back method, a vector reallocation may occur - this way, the iterators will address an invalid memory location.
To avoid such errors, follow the rule: never change a container size inside a loop with conditions bound to this container. It also relates to range-based loops and loops using iterators. You're welcome to read this discussion on Stack Overflow that covers the topic of operations causing invalidation of iterators.
Source: Godot: On Regular Use of Static Analyzers
The first example from the game industry will be a code snippet that we found in the Godot game engine. Probably, it'll take some work to notice the error, but I'm sure that our conversant readers will cope with that.
void AnimationNodeBlendSpace1D::add_blend_point(
const Ref<AnimationRootNode> &p_node, float p_position, int p_at_index)
{
ERR_FAIL_COND(blend_points_used >= MAX_BLEND_POINTS);
ERR_FAIL_COND(p_node.is_null());
ERR_FAIL_COND(p_at_index < -1 || p_at_index > blend_points_used);
if (p_at_index == -1 || p_at_index == blend_points_used) {
p_at_index = blend_points_used;
} else {
for (int i = blend_points_used - 1; i > p_at_index; i++) {
blend_points[i] = blend_points[i - 1];
}
}
....
}
PVS-Studio warning: V621 CWE-835 Consider inspecting the 'for' operator. It's possible that the loop will be executed incorrectly or won't be executed at all. animation_blend_space_1d.cpp 113
Let's have a close look at the loop condition. The counter variable is initialized by the value blend_points_used - 1. In addition, judging from two previous checks (in ERR_FAIL_COND and in if), it becomes clear, that by the moment of the blend_points_used loop execution, blend_points_used will always be greater, than p_at_index. Thus, either the loop condition is always true or the loop isn't executed at all.
If blend_points_used - 1 == p_at_index, the loop doesn't execute.
In all other cases the check i > p_at_index will always be true, as the i counter goes up on each loop iteration.
It seems that the loop is eternal, but it is not so.
Firstly, an integer overflow of the i variable (which is undefined behavior) will occur. This means, we shouldn't to rely on it.
If i was unsigned int, then after the counter reaches the largest possible value, the operator i++ would turn it into 0. Such behavior is defined by the standard and is called "Unsigned wrapping". However, you should be aware that the use of such a mechanism is also not a good idea.
It was the first point, but we still have the second! The case is that we won't even get to an integer overflow. The array index will go out of bounds a way earlier. This means, that there will be an attempt to access memory outside the block allocated for the array. Which is undefined behavior as well. A classic example:)
I can give you a couple of recommendations to make it easier to avoid similar errors:
Source: Amazon Lumberyard: A Scream of Anguish
Here is another example from the gamedev industry, namely from the source code of the AAA-engine of Amazon Lumberyard.
void TranslateVariableNameByOperandType(....)
{
// Igor: yet another Qualcomm's special case
// GLSL compiler thinks that -2147483648 is
// an integer overflow which is not
if (*((int*)(&psOperand->afImmediates[0])) == 2147483648)
{
bformata(glsl, "-2147483647-1");
}
else
{
// Igor: this is expected to fix
// paranoid compiler checks such as Qualcomm's
if (*((unsigned int*)(&psOperand->afImmediates[0])) >= 2147483648)
{
bformata(glsl, "%d",
*((int*)(&psOperand->afImmediates[0])));
}
else
{
bformata(glsl, "%d",
*((int*)(&psOperand->afImmediates[0])));
}
}
bcatcstr(glsl, ")");
....
}
PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement. toglsloperand.c 700
Amazon Lumberyard is developed as a cross-platform engine. For this reason, developers try to support as many compilers as possible. As we can see from the comments, a programmer Igor came against the Qualcomm compiler.
We don't know if he managed to carry out his task and wade though "paranoid" compiler checks, but he left very strange code. The weird thing about it is that both then- and else-branches of the if statement contain absolutely identical code. Most likely, such an error resulted from using a sloppy Copy-Paste method.
I don't even know what to advise here. So I just wish Amazon Lumberyard developers all the best in fixing errors and good luck for the developer Igor!
Source: Once again, the PVS-Studio analyzer has proved to be more attentive than a person
An interesting story happened with the next example. My colleague Andrey Karpov was preparing an article about another check of the Qt framework. When writing out some notable errors, he stumbled upon the analyzer warning, which he considered false. Here is that code fragment and the warning for it:
QWindowsCursor::CursorState QWindowsCursor::cursorState()
{
enum { cursorShowing = 0x1, cursorSuppressed = 0x2 };
CURSORINFO cursorInfo;
cursorInfo.cbSize = sizeof(CURSORINFO);
if (GetCursorInfo(&cursorInfo)) {
if (cursorInfo.flags & CursorShowing) // <= V616
....
}
PVS-Studio warning: V616 CWE-480 The 'CursorShowing' named constant with the value of 0 is used in the bitwise operation. qwindowscursor.cpp 669
Which means, that PVS-Studio was complaining at the place, which obviously had no error! It's impossible for the CursorShowing constant to be 0, as just a couple of lines above it is initialized by 1.
Since Andrey was using an unstable analyzer version, he questioned the correctness of the warning. He carefully looked through that piece of code and still didn't find a bug. He eventually gave it a false positive in the bugtracker so other colleagues could remedy the situation.
Only a detailed analysis showed that PVS-Studio turned out to be more careful than a person again. The 0x1 value is assigned to a named constant called cursorShowing while CursorShowing takes part in a bitwise "and" operation. These are two totally different constants, the first begins with a lowercase letter, the second - with a capital.
The code compiles successfully, because the class QWindowsCursor really contains a constant with this name. Here's its definition:
class QWindowsCursor : public QPlatformCursor
{
public:
enum CursorState {
CursorShowing,
CursorHidden,
CursorSuppressed
};
....
}
If you don't assign a value to an enum constant explicitly, it will be initialized by default. As CursorShowing is the first element in the enumeration, it will be assigned 0.
To avoid such errors, you shouldn't give entities too similar names. You should particularly closely follow this rule if entities are of the same type or can be implicitly cast to each other. As in such cases it will be almost impossible to notice the error, but the incorrect code will still be compiled and live in easy street inside your project.
Source: Shoot yourself in the foot when handling input data
We are getting closer to the top three finalists and the next in line is the error from the FreeSWITCH project.
static const char *basic_gets(int *cnt)
{
....
int c = getchar();
if (c < 0) {
if (fgets(command_buf, sizeof(command_buf) - 1, stdin)
!= command_buf) {
break;
}
command_buf[strlen(command_buf)-1] = '\0'; /* remove endline */
break;
}
....
}
PVS-Studio warning: V1010 CWE-20 Unchecked tainted data is used in index: 'strlen(command_buf)'.
The analyzer warns you that some unchecked data is used in the expression strlen(command_buf) - 1. Indeed: if command_buf is an empty string in terms of the C language (containing the only character - '\0'), strlen(command_buf) will return 0. In such a case, command_buf[-1] will be accessed, which is undefined behavior. That's bad!
The actual zest of this error is not why it occurs, but how. This error is one of those nicest examples, which you "touch" by yourself, reproduce. You can run FreeSwitch, undertake some actions that will lead to execution of the code fragment mentioned above and pass an empty string to the input of the program.
As a result, with a subtle movement of the hand a working program turns into a nonworking! You can find the details about how to reproduce this error in the source article by the link given above. Meanwhile, let me provide you a telling result:
Keep in mind, that output data can be anything, so you should always check it. This way, the analyzer will not complain and the program will become more reliable.
Now it's time to go for our winner: we are in the endgame now! By the way, bugs-finalists have already waited a long wait, then got bored and even started being chicky. Just look what they staged while we were away!
Source: NCBI Genome Workbench: Scientific Research under Threat
A code snippet from the NCBI Genome Workbench project, which is a set of tools for studying and analyzing genetic data, opens the top 3 winners. Even though you don't have to be a genetically modified superhuman to find this bug, only few people know about the contingency to make an error here.
/**
* Crypt a given password using schema required for NTLMv1 authentication
* @param passwd clear text domain password
* @param challenge challenge data given by server
* @param flags NTLM flags from server side
* @param answer buffer where to store crypted password
*/
void
tds_answer_challenge(....)
{
....
if (ntlm_v == 1) {
....
/* with security is best be pedantic */
memset(hash, 0, sizeof(hash));
memset(passwd_buf, 0, sizeof(passwd_buf));
...
} else {
....
}
}
PVS-Studio warnings:
Did you find a bug? If yes, you are an attaboy!..or a genetically modified superhuman.
The fact of the matter is that modern optimizing compilers are able to do a lot to enable a built program to work faster. Including the fact that compilers can now track that a buffer, passed to memset, is not used anywhere else.
In this case, they can remove the "unnecessary" call of memset, having all rights for that. Then the buffer that stores important data may remain in memory to the delight of the attackers.
Against this background, this geek comment "with security is best be pedantic" sounds even funnier. Judging by a small number of warnings, given for this project, its developers did their best to be precise and write safe code. However, as we can see, one can easily overlook such a security defect. According to the Common Weakness Enumeration, this defect is classified as CWE-14: Compiler Removal of Code to Clear Buffers.
You should use the memset_s() function so that the memory deallocation was safe. The function is both safer than memset() and cannot be ignored by a compiler.
Source: How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers
A silver medalist was kindly sent to us by one of our clients. He was sure that the analyzer issued some false positives.
Evgeniy got the email, looked through that and sent to Svyatoslav. Svyatoslav had a close look at the piece of code, sent by the client and thought: "how is it possible the analyzer has made such a blunder?". So he went for advice to Andrey. He also checked that place and determined: indeed, the analyzer generated false positives.
So it goes, that needed fixing. Only after Svyatoslav began making synthetic examples to create the task in our bug tracker, he got what was wrong.
None of the programmers could find the errors, but they really were in the code. Frankly speaking, the author of this article also failed to find them despite the fact, that the analyzer clearly issued warnings for erroneous places!
Will you find such a crafty bug? Test yourself on vigilance and attentiveness.
PVS-Studio warning:
If you did it - kudos to you!
The error lies in the fact that the logical negation operator (!) is not applied to the entire condition, but only its first sub-expression:
!((ch >= 0x0FF10) && (ch <= 0x0FF19))
If this condition is true, the ch variable value lies in the range [0x0FF10...0x0FF19]. Thus, four further comparisons are already meaningless: they will always be either true or false.
To avoid such errors, it's worth sticking to a few rules. Firstly, it is very convenient and informative to align the code like a table. Secondly, you shouldn't overload the expressions with parentheses. For example, this code could be rewritten like this:
const bool isLetterOrDigit = (ch >= 0x0FF10 && ch <= 0x0FF19) // 0..9
|| (ch >= 0x0FF21 && ch <= 0x0FF3A) // A..Z
|| (ch >= 0x0FF41 && ch <= 0x0FF5A); // a..z
if (!isLetterOrDigit)
This way, there will be fewer parentheses and on the other hand - you will more likely notice an occasional error.
Here comes the cherry on top - let's move on to the first place!
Source: Shocked System: Interesting Errors in the Source Code of the Legendary System Shock
Today's top finalist is an error from the legendary System Shock! It is a game released quite long ago in 1994, which became a predecessor and inspiration for such iconic games, as Dead Space, BioShock and Deus Ex.
But first I have something to confess. What I'm going to show you now, does not contain any errors. Actually, it is not even a piece of code, but I just couldn't help sharing it with you!
The thing is that while analyzing the source code of the game, my colleague Victoria discovered plenty of fascinating comments. In different fragments she found some witty and ironic remarks, and even poetry.
// I'll give you fish, I'll give you candy,
// I'll give you, everything I have in my hand
// that kid from the wrong side came over my house again,
// decapitated all my dolls
// and if you bore me, you lose your soul to me
// - "Gepetto", Belly, _Star_
// And here, ladies and gentlemen,
// is a celebration of C and C++ and their untamed passion...
// ==================
TerrainData terrain_info;
// Now the actual stuff...
// =======================
// this is all outrageously horrible, as we dont know what
// we really need to deal with here
// And if you thought the hack for papers was bad,
// wait until you see the one for datas... - X
// Returns whether or not in the humble opinion of the
// sound system, the sample should be politely obliterated
// out of existence
// it's a wonderful world, with a lot of strange men
// who are standing around, and they all wearing towels
This is how the comments left in games by developers in latest 90s look like... By the way, Doug Church - a Chief Designer of the System Shock, was also busy writing code. Who knows, maybe some of these comments were written by him? Hope, men-in-towels stuff is not his handiwork :)
In conclusion, I'd like to thank my colleagues for searching for new bugs and writing about them in articles. Thank you, guys! Without you, this article wouldn't be that interesting.
Also I'd like to tell a bit about our achievements, as the whole year we haven't been busy with only searching for errors. We've also been developing and improving the analyzer, which resulted in its significant changes.
For example, we have added support of several new compilers and expanded the list of diagnostic rules. Also we have implemented initial support of standards MISRA C and MISRA C++. The most important and time-consuming new feature was support of a new language. Yes, we now can analyze code in Java! And what is more, we have a renewed icon :)
I also want to thank our readers. Thanks for reading our articles and writing to us! You are so responsive and you're so important for us!
Our top 10 C++ errors of 2018 has come to an end. What fragments did you like most and why? Did you come across some interesting examples in 2018?
All the best, see you next time!
0