How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers
Just like other static analyzers, PVS-Studio often produces false positives. This short story, however, is about how PVS-Studio proved, once again, to be more attentive than several people.
A guy sent an email to our support saying that the analyzer was producing four false positives at once on one line of his code. The email first landed in Evgeny Ryzhkov's inbox. He glanced through the feedback, found nothing strange, and forwarded it to our leading developer Svyatoslav Razmyslov. Since Evgeny didn't really examine the code, he counts as just half a programmer :).
Svyatoslav read the email and didn't believe the analyzer could be so wrong. So he came to me and asked for help. He hoped I had a better eye for such things and could notice something to help us find out the reason why the analyzer had issued all those strange messages. Sadly, I could only admit that they were strange indeed and shouldn't have been there. Yet I still had no idea about the cause. So we opened a task in the bug tracker to track it down.
It was not until Svyatoslav started making up synthetic tests to describe the problem in detail in the bug tracker that he had the "Aha!" moment. Now, let's see if you guys can quickly spot the defect that triggered those four messages.
Here's the email text (published with the author's permission) along with the attached image illustrating the problem.
V560 warnings here are all false. Running with most recent version of PVS-Studio for personal use. Basically, "IF" statement is correct. Outer one is done for speed - inner ones are still needed and non are always true or false.
Now guys, it's time to test yourselves! Can you see the bug?
Don't hurry, look carefully. And the unicorn will just sit here and wait.
With that introduction, I bet it didn't take you much time to spot the bug. When you are determined to find one, it comes up quickly. But it's way harder to notice it after reading an email that calls it "false positives" :).
Now let me explain it to those who were too lazy to bother trying. Look at the condition once again:
if (!((ch >= 0x0FF10) && (ch <= 0x0FF19)) || ((ch >= 0x0FF21) && (ch <= 0x0FF3A)) || ((ch >= 0x0FF41) && (ch <= 0x0FF5A)))
The programmer intended to check that the character didn't fall into any of the three ranges.
The error here is that the logical NOT (!) operator is applied only to the first subexpression.
If this condition is true:
!((ch >= 0x0FF10) && (ch <= 0x0FF19))
then further evaluation of the expression is aborted, just as prescribed by the short-circuit evaluation semantics. If the condition is false, then the value of the ch variable lies in the range [0xFF10..0xFF19], and the outcome of each of the next four comparisons is known in advance: every one of them is either always true or always false.
So, once again, just to make it clear: if ch is within the range [0xFF10..0xFF19] and the evaluation continues, then:
- ch >= 0x0FF21 is always false
- ch <= 0x0FF3A is always true
- ch >= 0x0FF41 is always false
- ch <= 0x0FF5A is always true
That's what PVS-Studio is telling us.
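To see the effect in practice, here is a minimal C++ sketch. The function names and the `unsigned int` parameter type are assumptions made for illustration, since the email shows only the `if` condition itself. The sketch checks that the condition as written behaves exactly like a test of the first range alone, so characters from the other two ranges slip through.

```cpp
#include <cassert>

// Hypothetical wrapper around the condition from the email, copied as written:
// the `!` applies only to the first range.
bool conditionAsWritten(unsigned int ch) {
    return !((ch >= 0x0FF10) && (ch <= 0x0FF19))
        || ((ch >= 0x0FF21) && (ch <= 0x0FF3A))
        || ((ch >= 0x0FF41) && (ch <= 0x0FF5A));
}

// What the condition effectively reduces to: "ch is outside [0xFF10..0xFF19]".
bool outsideFirstRange(unsigned int ch) {
    return ch < 0x0FF10 || ch > 0x0FF19;
}

// Compares the two functions on every value in [0..0xFFFF].
bool sameOnAllValues() {
    for (unsigned int ch = 0; ch <= 0xFFFF; ++ch) {
        if (conditionAsWritten(ch) != outsideFirstRange(ch)) {
            return false;
        }
    }
    return true;
}

// Small self-check; call it from main() to run the comparison.
void checkConditionAsWritten() {
    assert(sameOnAllValues());
    assert(conditionAsWritten(0xFF21));   // a character from the second range still passes
    assert(!conditionAsWritten(0xFF15));  // only the first range is rejected
}
```

In other words, the second and third ranges never influence the result, which is the practical meaning of the four "always true/false" warnings.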
That's it. The static analyzer proved to be more attentive than one user and two and a half programmers from our team.
To fix the bug, we just need to add one more pair of parentheses so that the negation covers all three ranges:
if (!(((ch >= 0x0FF10) && (ch <= 0x0FF19)) || ((ch >= 0x0FF21) && (ch <= 0x0FF3A)) || ((ch >= 0x0FF41) && (ch <= 0x0FF5A))))
Or rewrite the condition:
if (((ch < 0x0FF10) || (ch > 0x0FF19)) && ((ch < 0x0FF21) || (ch > 0x0FF3A)) && ((ch < 0x0FF41) || (ch > 0x0FF5A)))
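The two fixes are related by De Morgan's law, and a quick way to convince yourself is to compare them over every 16-bit value. The sketch below does that; the function names and the `unsigned int` type are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Fix 1: the negation wraps all three ranges (hypothetical function name).
bool fixedWithParentheses(unsigned int ch) {
    return !(((ch >= 0x0FF10) && (ch <= 0x0FF19))
          || ((ch >= 0x0FF21) && (ch <= 0x0FF3A))
          || ((ch >= 0x0FF41) && (ch <= 0x0FF5A)));
}

// Fix 2: the same check rewritten without the negation.
bool fixedWithRewrittenRanges(unsigned int ch) {
    return ((ch < 0x0FF10) || (ch > 0x0FF19))
        && ((ch < 0x0FF21) || (ch > 0x0FF3A))
        && ((ch < 0x0FF41) || (ch > 0x0FF5A));
}

// Returns true if both fixes give the same answer for every value in [0..0xFFFF].
bool fixesAgree() {
    for (unsigned int ch = 0; ch <= 0xFFFF; ++ch) {
        if (fixedWithParentheses(ch) != fixedWithRewrittenRanges(ch)) {
            return false;
        }
    }
    return true;
}

// Small self-check; call it from main() to run the comparison.
void checkFixes() {
    assert(fixesAgree());
    assert(!fixedWithParentheses(0xFF21));  // inside the second range: rejected
    assert(fixedWithParentheses(0x0041));   // outside all three ranges: accepted
}
```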
Actually, I wouldn't recommend either of these solutions. Personally, I'd make the code clearer by writing it as follows:
const bool isLetterOrDigit =    (ch >= 0x0FF10 && ch <= 0x0FF19)  // 0..9
                             || (ch >= 0x0FF21 && ch <= 0x0FF3A)  // A..Z
                             || (ch >= 0x0FF41 && ch <= 0x0FF5A); // a..z
if (!isLetterOrDigit)
Note how I removed some of the parentheses. As you just saw, adding a bunch of parentheses doesn't help prevent an error. Parentheses are meant to make code easier to read, not to obscure it. Programmers remember very well that the precedence of the comparison operators <= and >= is higher than that of the && operator. That's why you don't need parentheses to handle them. But if you ask which operator, && or ||, has higher precedence, many will be confused. That's why it's better to add parentheses to define the order of evaluation of && and || just to be sure.
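For those who hesitated over that last question: in C and C++, && binds tighter than ||. Here is a small sketch with hypothetical function names that shows the difference. GCC and Clang can warn about the unparenthesized form, which is one more hint that explicit parentheses are welcome here.

```cpp
#include <cassert>

// No parentheses: && binds tighter than ||, so this parses as a || (b && c).
bool withoutParentheses(bool a, bool b, bool c) {
    return a || b && c;
}

// The grouping the compiler actually uses.
bool andFirst(bool a, bool b, bool c) {
    return a || (b && c);
}

// The grouping a reader might mistakenly assume.
bool orFirst(bool a, bool b, bool c) {
    return (a || b) && c;
}

// Small self-check; call it from main() to run it.
void checkPrecedence() {
    // With a = true, b = false, c = false the two groupings disagree.
    assert(withoutParentheses(true, false, false) == andFirst(true, false, false));
    assert(withoutParentheses(true, false, false) != orFirst(true, false, false));
}
```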
The question why it is better to write || at the beginning was addressed in my article "The Ultimate Question of Programming, Refactoring, and Everything" (see the chapter "Table-style formatting").
Thanks for reading. Come over to our website to download PVS-Studio and give it a try. It'll help you catch lots of bugs and potential vulnerabilities at the earliest development stages.