This introduction begins a series of articles dealing with a recurrent check of a Chromium project using PVS-Studio static code analyzer. The articles include various patterns of errors and recommendations that reduce the likelihood of such errors appearing in code. However, to start with, some sort of an introduction should be presented, which will answer a number of questions in advance, and present all of the bugs discovered to the developers of Chromium, so that they can start fixing them without waiting for the end of this article series.
My name is Andrey Karpov and I am the evangelist of static analysis as a whole and of the PVS-Studio static analysis tool in particular. However, the term "technical evangelist" is already outdated and was replaced by "developer advocate".
I dedicate a lot of time for writing material on improving code quality and increasing reliability of programs. Now I have a new reason to write more articles on this topic, which is a check of an open source project Chromium using PVS-Studio. This is a big project, and in any large project you can find bugs of various kinds living in it. Such diversity allows to review several interesting topics related to the causes of these bugs and ways of preventing them.
It is worth noting that this is not the first article dedicated to the Chromium project. Here are my previous publications:
As you can see, I was not that good at thinking of interesting titles for the articles and I had ran out of energy. So, the baton was picked up by my colleagues:
By the way, while I was studying a fresh report, I could not resist and posted a small note about a bug, that I liked. Since the article is already published, I'll give a link here to it as well:
Every time we checked this project, a huge number of errors were discovered in it. The new check is not an exception. Moreover, since the PVS-Studio analyzer is getting better at detecting errors, at first I just didn't know what to do with all of them. Briefly looking through the report, I wrote down about 250 errors and pondered. Shall I describe all the 250 errors in one article? It will be some kind of horror: long, boring and uninteresting. Separate this account into several parts? It will not do better, as we will get several boring articles instead of one.
Then I decided to divide bugs by type and consider them separately. Besides, I decided not to just describe the errors, but to suggest some methods of dealing them in addition to static code analysis. It is far better not to make an error than to find it then using static/dynamic code analysis/or something else. It is even worse, if a user finds errors. So, if you can improve your coding style in a manner that reduces the possibility of a bug occurrence, than this topic is worth talking about. This will be the matter of our series of articles.
Before considering the patterns of errors, I need an introduction that you are reading. For example, I need to explain why I haven't found enough energy to carefully study the report, why I can't say about the percentage of false positives and where you can get acquainted with all errors that I discovered.
At the end of 2017, my colleague Svyatoslav Razmyslov downloaded source codes of the Chromium project, did some magic with them and gave me the generated project for Visual Studio and a report of PVS-Studio. Unfortunately, it turned out to be impossible to work with the solution in Visual Studio environment. The environment could not stand the solution containing 5021 project.
Everything was incredibly slow, and the environment crashed after a while. That's why I studied the report using PVS-Studio Standalone. It's certainly not as convenient to use as the familiar Visual Studio environment, but quite acceptable.
It should be reminded, that the Chromium project is very large. Not just large. This is an ENORMOUS project.
The Chromium project and the libraries used in it consist of 114 201 files in C and C++. The number of lines of code is 30 263 757. Comments constitute 16%.
It is already an achievement, that PVS-Studio can check a project that large :).
During Christmas holidays, I spent three evenings looking through the report and wrote down about 250 fragments of code, which, in my opinion, require reviewing and correction. I confess that I have not found time and energy to study the report carefully. I glanced through many warnings very quickly, and even ignored some of them, when I became bored of some kind of error. I will give more details about it in the next chapter.
It is important that I found a lot of bugs, which will be enough to be described in a several articles. By the time I finish publishing the last line, the information about errors in the project may become slightly out of date. But it doesn't matter. My purpose is to demonstrate the methodology of static code analysis and share with readers some advice on coding style.
I cited the errors that I have found in a separate file so that developers of Chromium and the libraries could correct them without waiting for the end of the series of articles. This also had to be done by the reason that, perhaps, not all of the warnings will be presented in the articles.
Link to the file with a description of the discovered defects is available here: chromium.txt.
I have not configured the analyzer to reduce the number of false positives. Therefore, false warnings hindered me from reviewing the report, and I was often skipping similar messages, not looking at them.
Even more, I skipped fragments of code, where it was not clear at once if there was an error or not. A lot of warnings and one of me. If I started looking carefully at the code, I would write articles only in several months.
Let me demonstrate with examples why some warnings are so difficult to understand, especially if it's unfamiliar code. And I'm unfamiliar with ALL code in Chromium.
So, PVS-Studio analyzer had issued a warning on one of files of V8 project:
V547 CWE-570 Expression 'truncated' is always false. objects.cc 2867
Is this an error, or a false positive? Try to understand yourself what is the matter here. I added the comment "// <=" at which the analyzer points.
void String::StringShortPrint(StringStream* accumulator,
bool show_details) {
int len = length();
if (len > kMaxShortPrintLength) {
accumulator->Add("<Very long string[%u]>", len);
return;
}
if (!LooksValid()) {
accumulator->Add("<Invalid String>");
return;
}
StringCharacterStream stream(this);
bool truncated = false;
if (len > kMaxShortPrintLength) {
len = kMaxShortPrintLength;
truncated = true;
}
bool one_byte = true;
for (int i = 0; i < len; i++) {
uint16_t c = stream.GetNext();
if (c < 32 || c >= 127) {
one_byte = false;
}
}
stream.Reset(this);
if (one_byte) {
if (show_details)
accumulator->Add("<String[%u]: ", length());
for (int i = 0; i < len; i++) {
accumulator->Put(static_cast<char>(stream.GetNext()));
}
if (show_details) accumulator->Put('>');
} else {
// Backslash indicates that the string contains control
// characters and that backslashes are therefore escaped.
if (show_details)
accumulator->Add("<String[%u]\\: ", length());
for (int i = 0; i < len; i++) {
uint16_t c = stream.GetNext();
if (c == '\n') {
accumulator->Add("\\n");
} else if (c == '\r') {
accumulator->Add("\\r");
} else if (c == '\\') {
accumulator->Add("\\\\");
} else if (c < 32 || c > 126) {
accumulator->Add("\\x%02x", c);
} else {
accumulator->Put(static_cast<char>(c));
}
}
if (truncated) { // <=
accumulator->Put('.');
accumulator->Put('.');
accumulator->Put('.');
}
if (show_details) accumulator->Put('>');
}
return;
}
Did you figure it out? Was it difficult?
Yeap! This is the reason I cannot review all of the analyzer warnings myself.
For those who were lazy to go dip, I will explain the main point.
So, the analyzer says that the condition if (truncated) is always false. Let's cut the function, leaving the main point:
void F() {
int len = length();
if (len > kMaxShortPrintLength)
return;
bool truncated = false;
if (len > kMaxShortPrintLength)
truncated = true;
if (truncated) { // <=
accumulator->Put('.');
accumulator->Put('.');
accumulator->Put('.');
}
}
The truncated flag has to be true, if the text is too long, i.e. if the condition if (len > kMaxShortPrintLength) is executed.
However, if the text is too long, then exit from the function occurs above.
This is the reason why truncated is always false and three dots will not be added to the end. And even now, after ascertaining the reason for which the analyzer issues a warning, I don't know how the code should be written. Either you need to leave the function at once, and the code which adds the dots is redundant, or the points are indeed needed, and the first check which prematurely terminates the function should be removed. It is very, very difficult to review the errors in third party code. PVS-Studio analyzer issued a lot of V547 warnings. I looked through only the 10th part of them. Therefore, if you undertake to view them closely, you will find a lot more errors than I cited.
Here's another example explaining why I was bored to work with all those warnings.
void ResourcePrefetcher::OnReadCompleted(net::URLRequest* request,
int bytes_read) {
DCHECK_NE(net::ERR_IO_PENDING, bytes_read);
if (bytes_read <= 0) {
FinishRequest(request);
return;
}
if (bytes_read > 0)
ReadFullResponse(request);
}
PVS-Studio warning: V547 CWE-571 Expression 'bytes_read > 0' is always true. resource_prefetcher.cc 308
Unlike the previous case, here everything is simple. The analyzer is surely right, stating that the second condition is always true.
However, it is not an error, but a redundant code. Is this code worth editing? Difficult question. By the way, this is the reason why it is much better to write code right under the supervision of the analyzer, rather than to heroically make your way through the warnings during one-time runs.
If the analyzer was used regularly, most likely the redundant code would not even get into the version control system. The programmer would see the warning and write more gracefully. For example, as follows:
void ResourcePrefetcher::OnReadCompleted(net::URLRequest* request,
int bytes_read) {
DCHECK_NE(net::ERR_IO_PENDING, bytes_read);
if (bytes_read <= 0)
FinishRequest(request);
else
ReadFullResponse(request);
}
The analyzer hasn't produced any warnings. Besides, code became shorter, simpler and clearer.
In addition to V547, the analyzer issued a bunch of V560 warnings. This warning informs you that not the whole condition, but some part of it is always true or false.
These messages were also boring to study. It does not mean that the V560 warnings are bad. But the real, serious errors are quite rare. Basically these warnings point at redundant code of low-quality.
An example of a boring redundant check:
template <typename ConditionT, typename ActionT>
std::unique_ptr<DeclarativeRule<ConditionT, ActionT>>
DeclarativeRule<ConditionT, ActionT>::Create(....) {
....
bool bad_message = false; // <=
std::unique_ptr<ActionSet> actions = ActionSet::Create(
browser_context, extension, rule->actions, error,
&bad_message); // <=
if (bad_message) { // <=
*error = "An action of a rule set had an invalid "
"structure that should have been caught "
"by the JSON validator.";
return std::move(error_result);
}
if (!error->empty() || bad_message) // <=
return std::move(error_result);
....
}
PVS-Studio warning: V560 CWE-570 A part of conditional expression is always false: bad_message. declarative_rule.h 472
A condition:
if (!error->empty() || bad_message)
can be simplified to:
if (!error->empty())
Another option is to rewrite the code as follows:
if (bad_message) {
*error = "An action of a rule set had an invalid "
"structure that should have been caught "
"by the JSON validator.";
}
if (!error->empty() || bad_message)
return std::move(error_result);
I hope I could explain why I have not studied the report carefully. It's a big job that requires a lot of time.
I can't say what is the percentage of false positives. Firstly, I was not even able to look through the entire log to the end and I do not know the exact number of errors detected by PVS-Studio. Secondly, there is no point to talk about the percentage of false positives without the preliminary configuration of the analyzer.
If you configure the PVS-Studio analyzer, you can expect 10-15% of false positives. An example of such a configuration is described in the article "Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives.
Of course, it is possible to perform such a configuration for Chromium, but it is unreasonable to do so, aiming just to cite some numbers in the article. It's a big job that we are ready to do, but not for free. Google may well involve our team to configure the analyzer and, at the same time, to fix all found errors. Yes, you can think of it as a hint.
Undoubtedly, the configuration will give a good result. For example, about a half of all false positives is related to the use of DCHECK macro in code.
This is how this macro looks like:
#define LAZY_STREAM(stream, condition) \
!(condition) ? (void) 0 : ::logging::LogMessageVoidify() & (stream)
#define DCHECK(condition) \
LAZY_STREAM(LOG_STREAM(DCHECK), !ANALYZER_ASSUME_TRUE(condition))\
<< "Check failed: " #condition ". "
According to PVS-Studio analyzer, it is just a check of a condition and a set of actions preceding the execution of the rest of the code.
As a result, the analyzer issues false positives, for example, for such code:
bool Value::Equals(const Value* other) const {
DCHECK(other);
return *this == *other;
}
PVS-Studio informs: V1004 CWE-476 The 'other' pointer was used unsafely after it was verified against nullptr. Check lines: 621, 622. values.cc 622
In terms of the analyzer, a check of the pointer other for equality to nullptr is performed. But regardless of whether the other is a null pointer or not, its dereference will occur further. Analyzer considers such actions as dangerous.
DCHECK macro is a kind of assert-macros. The analyzer knows what is assert, but as for DCHECK - it does not. To explain better what is happening, I will write pseudo code:
bool Equals(T* ptr) const
{
if (!ptr)
LogMessage();
return *this == *ptr;
}
This is how the analyzer considers the code. Initially, the pointer is checked for equality to nullptr. If the pointer is null, then the function LogMessage is called. Yet the function is not marked as one that does not return control. It means, that despite the fact if the ptr is null or not, the function continues to be executed.
Further on, the pointer is dereferenced. But there was a check, where it was checked for null! Therefore, the pointer might be null and the analyzer indicates about the problem in code. And this is how the analyzer issues a lot of correct but useless warnings.
By the way, this implementation of macro confuses not only PVS-Studio. So, for the analyzer, which is built into Visual Studio, a special "backup" is made:
#if defined(_PREFAST_) && defined(OS_WIN)
// See comments on the previous use of __analysis_assume.
#define DCHECK(condition) \
__analysis_assume(!!(condition)), \
LAZY_STREAM(LOG_STREAM(DCHECK), false) \
<< "Check failed: " #condition ". "
#define DPCHECK(condition) \
__analysis_assume(!!(condition)), \
LAZY_STREAM(PLOG_STREAM(DCHECK), false) \
<< "Check failed: " #condition ". "
#else // !(defined(_PREFAST_) && defined(OS_WIN))
If you also implement a similar backup for the PVS-Studio analyzer, the situation with false positives will change dramatically. According to my estimate, a half of false positives will immediately disappear. Yes, exactly a half. The thing is that the DCHECK macro is used so many times.
This is the end of the introductory article and here I'll gradually give links to other articles. Thank you for your attention.