Webinar: Parsing C++ - 10.10
We checked Chromium more than once before, and those who follow our blog could reasonably ask, "Why another check? Weren't there enough of them?" Sure, Chromium's source code is particularly clean, which was shown by each of the previous checks, but new errors inevitably continue to appear. Repeated checks prove that the more often you use static analysis, the better. A good practice is to use the analyzer every day. An even better practice is to analyze the new code right after you finish writing it (automatic analysis of recently modified code).
We have checked Chromium four times already:
All the previous checks were done with the Windows-version of PVS-Studio. Now it supports Linux as well, and it is this version that we used this time.
The Chromium solution has grown over the years: at the time of the third check, the number of projects reached the 1169 mark. When I was writing this article, there were 4420 projects. The source code has grown in size quite a bit, too, and is now 370 Mbytes (260 Mbytes in 2013).
The previous four checks found Chromium's source code to be of extremely high quality, given its size. Has it become worse during these two and a half years? No, it hasn't. It's still up to the mark; but since it is so big and is still in development, there are still plenty of errors for us to catch there.
Let's talk about the details of analyzing Chromium using PVS-Studio. We are going to do it under Linux this time. Once you have downloaded the source files using depot_tools and prepared them for analysis (see the details here, before the 'Building' section), build the solution:
pvs-studio-analyzer trace -- ninja -C out/Default chrome
After that, run the following command (in one line):
pvs-studio-analyzer analyze -l /path/to/PVS-Studio.lic
-o /path/to/save/chromium.log -j<N>
where the "-j" option initializes analysis in multithreaded mode. The recommended number of threads is the number of physical CPU cores plus one (for example, "-j5" for a four-core CPU).
When the check is finished, PVS-Studio will output an analysis log. Use PlogConverter utility, which comes with the PVS-Studio package, to convert that log into one of the three formats that can be conveniently viewed in other applications: xml, errorfile, tasklist. We will be using the tasklist format in this article. Here, we are interested only in the General Analysis warnings of every severity level (High, Medium, Low). This is what the conversion command should look like (in one line):
plog-converter -t tasklist -o /path/to/save/chromium.tasks
-a GA:1,2,3 /path/to/saved/chromium.log
More information on PlogConverter's parameters can be found here. To open the "chromium.tasks" tasklist in QtCreator (you need to install it in advance), run the following command:
qtcreator path/to/saved/chromium.tasks
We strongly recommend examining the warnings of the High and Medium levels first: they are very likely to deal with real defects and errors. Low-level warnings might point to potential bugs, but they are also more likely to produce false positives, so we don't usually discuss them in our articles.
This is how the log is displayed in QtCreator:
Figure 1 - Viewing analysis results in QtCreator (click to enlarge)
PVS-Studio issued a total of 2312 warnings. The chart below shows the distribution of the warnings across the severity levels:
Figure 2 - Warning distribution across severity levels
Let me briefly comment on this chart: the analyzer issued 171 High-level, 290 Medium-level, and 1851 Low-level warnings.
Despite the seemingly large amount of warnings, it's actually small for such a huge project. The total number of SLOC, without the linked libraries, is 6468751. If we consider the warnings of High and Medium levels only, I'd say there are just 220 genuine errors among them. Well, that's the statistics, and the real error density is 0,034 per 1000 LOC. This figure, however, takes into account only those errors that PVS-Studio found, or, to be more exact, that caught my eye when looking through the log.
Other projects usually have higher error density, so the Chromium developers did well! Even so, don't get lax: there are still errors, and they are far from harmless.
The most interesting ones are discussed below.
PVS-Studio warning: V501 There are identical sub-expressions 'request_body_send_buf_ == nullptr' to the left and to the right of the '&&' operator. http_stream_parser.cc 1222
bool HttpStreamParser::SendRequestBuffersEmpty()
{
return request_headers_ == nullptr
&& request_body_send_buf_ == nullptr
&& request_body_send_buf_ == nullptr; // <=
}
This is classic. The request_body_send_buf_ pointer is compared with nullptr twice. It must be a typo, so there is some other class member that should be compared with nullptr.
PVS-Studio warning: V766 An item with the same key '"colorSectionBorder"' has already been added. ntp_resource_cache.cc 581
void NTPResourceCache::CreateNewTabCSS()
{
....
substitutions["colorSectionBorder"] = // <=
SkColorToRGBAString(color_section_border);
....
substitutions["colorSectionBorder"] = // <=
SkColorToRGBComponents(color_section_border);
....
}
The analyzer detected a strange double initialization of the object associated with the "colorSectionBorder" key. The substitutions variable is an associative array here. When being initialized, the color_section_border variable of type SkColor (defined as uint32_t) is cast to a string representation of RGBA (as suggested by the SkColorToRGBAString method's name) and mapped to the "colorSectionBorder" key. After that, color_section_border is cast to another string format (method SkColorToRGBComponents) and mapped to the same key. It means that the previous value associated with the key "colorSectionBorder" will be lost. If this is what the programmer intended, then one of the assignments should be removed. Otherwise, the color components should be mapped to different keys.
Note. By the way, this is the first error found by the V766 diagnostic in a real-life project. This is a specific type of bugs, but Chromium is so big that even exotic errors like that can be found there.
Now a small warm-up for your brains. Look at the code below and try to find the bug by yourself.
// Returns the item associated with the component |id| or nullptr
// in case of errors.
CrxUpdateItem* FindUpdateItemById(const std::string& id) const;
void ActionWait::Run(UpdateContext* update_context,
Callback callback)
{
....
while (!update_context->queue.empty())
{
auto* item =
FindUpdateItemById(update_context->queue.front());
if (!item)
{
item->error_category =
static_cast<int>(ErrorCategory::kServiceError);
item->error_code =
static_cast<int>(ServiceError::ERROR_WAIT);
ChangeItemState(item, CrxUpdateItem::State::kNoUpdate);
} else {
NOTREACHED();
}
update_context->queue.pop();
}
....
}
PVS-Studio warning: V522 Dereferencing of the null pointer 'item' might take place. action_wait.cc 41
The authors of this code made a conscious decision to shoot themselves in the foot. The code iterates over the queue queue consisting of identifiers presented as strings. An identifier is taken out of the queue, and then the FindUpdateItemById method is called to return a pointer to the object of type CrxUpdateItem associated with that identifier. If FindUpdateItemById fails, it will return nullptr, which will then be dereferenced in the if statement's then branch.
This is the fixed code:
....
while (!update_context->queue.empty())
{
auto* item =
FindUpdateItemById(update_context->queue.front());
if (item != nullptr)
{
....
}
....
}
....
PVS-Studio warning: V620 It's unusual that the expression of sizeof(T)*N kind is being summed with the pointer to T type. string_conversion.cc 62
int UTF8ToUTF16Char(const char *in, int in_length, uint16_t out[2])
{
const UTF8 *source_ptr = reinterpret_cast<const UTF8 *>(in);
const UTF8 *source_end_ptr = source_ptr + sizeof(char);
uint16_t *target_ptr = out;
uint16_t *target_end_ptr = target_ptr + 2 * sizeof(uint16_t); // <=
out[0] = out[1] = 0;
....
}
The analyzer detected a code fragment with strange address arithmetic. As suggested by its name, the function converts characters from the UTF-8 format to UTF-16. The current standard, Unicode 6.x, implies widening a UTF-8 character to four bytes, which is the reason why a UTF-8 character is decoded as two UTF-16 characters (UTF-16 characters are hardcoded with two bytes). Decoding is done using four pointers: two pointing to the beginning, and two others pointing to the end of the arrays in and out. The pointers to the end of the arrays act like STL iterators: they point to the location after the last array element. While the source_end_ptr pointer is evaluated correctly, things get complicated for target_end_ptr. It was meant to point to the location after the second element of the out array (i.e. move by four bytes in relation to the out pointer), but what it will be actually pointing to is the address after the fourth element (i.e. out will be shifted by eight bytes).
This is the planned logic:
And this is what actually happens:
The fixed code:
int UTF8ToUTF16Char(const char *in, int in_length, uint16_t out[2])
{
const UTF8 *source_ptr = reinterpret_cast<const UTF8 *>(in);
const UTF8 *source_end_ptr = source_ptr + 1;
uint16_t *target_ptr = out;
uint16_t *target_end_ptr = target_ptr + 2;
out[0] = out[1] = 0;
....
}
The analyzer also reported one more potential defect of this type:
Another warm-up. Can you find the bug in the code below?
CheckReturnValue& operator=(const CheckReturnValue& other)
{
if (this != &other)
{
DCHECK(checked_);
value_ = other.value_;
checked_ = other.checked_;
other.checked_ = true;
}
}
PVS-Studio warning: V591 Non-void function should return a value. memory_allocator.h 39
We are dealing with undefined behavior here. The C++ standard says that any non-void method must return a value. What about our example? In the assignment statement, the current object is tested for being equal to itself (the objects are compared by using their pointers) and the fields are copied (if the pointers are different). However, the method does not return the reference to itself (return *this).
Two more non-void methods that don't return:
PVS-Studio warning: V583 The '?:' operator, regardless of its conditional expression, always returns one and the same value: 1. configurator_impl.cc 133
int ConfiguratorImpl::StepDelay() const
{
return fast_update_ ? 1 : 1;
}
This code always returns 1 as the delay time. Perhaps it's just incomplete code to be developed later, but the current implementation of the ternary operator doesn't do any good.
PVS-Studio warning: V590 Consider inspecting the 'rv == OK || rv != ERR_ADDRESS_IN_USE' expression. The expression is excessive or contains a misprint. udp_socket_posix.cc 735
int UDPSocketPosix::RandomBind(const IPAddress& address)
{
DCHECK(bind_type_ == DatagramSocket::RANDOM_BIND
&& !rand_int_cb_.is_null());
for (int i = 0; i < kBindRetries; ++i) {
int rv = DoBind(IPEndPoint(address,
rand_int_cb_
.Run(kPortStart, kPortEnd)));
if (rv == OK || rv != ERR_ADDRESS_IN_USE) // <=
return rv;
}
return DoBind(IPEndPoint(address, 0));
}
The analyzer warns us about a potential redundant comparison. The code above maps an IP to a random port. Successful mapping terminates the loop (which counts the number of mapping attempts). Removing one of the comparisons won't affect the code's logic (in the current version, the loop stops if the mapping has succeeded or if no error about the port being mapped to another IP was issued).
PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement.
bool ResourcePrefetcher::ShouldContinueReadingRequest(
net::URLRequest* request,
int bytes_read
)
{
if (bytes_read == 0) { // When bytes_read == 0, no more data.
if (request->was_cached())
FinishRequest(request); // <=
else
FinishRequest(request); // <=
return false;
}
return true;
}
The analyzer detected identical statements in the then and else branches of the if statement. What are the possible implications? The current logic suggests that an uncached URL-request (net::URLRequest *request) will be finished in the same way as a cached one. If this is exactly what the programmer meant, then the else statement can be safely removed:
....
if (bytes_read == 0) { // When bytes_read == 0, no more data.
FinishRequest(request); // <=
return false;
}
....
Otherwise, a wrong method will be called, which could result in spending numerous sleepless nights and drinking tons of coffee trying to debug the code.
PVS-Studio warning: V609 Divide by zero. Denominator range [0..4096]. addr.h 159
static int BlockSizeForFileType(FileType file_type)
{
switch (file_type)
{
....
default:
return 0; // <=
}
}
static int RequiredBlocks(int size, FileType file_type)
{
int block_size = BlockSizeForFileType(file_type);
return (size + block_size - 1) / block_size; // <=
}
What about this code? It may produce an elusive bug. The RequiredBlocks method performs division by the value of the block_size variable (evaluated by the BlockSizeForFileType method). The switch statement in the BlockSizeForFileType method compares the value of the FileType enumeration passed to the method with some values and returns one of them, but there is also the default value, 0. Suppose the programmer decided to add a new value to the FileType enumeration but forgot to add the corresponding case label to the switch statement's body. This mistake would lead to undefined behavior: the C++ standard does not imply raising a software exception when division-by-zero occurs. Instead, a hardware exception will be raised, which can't be caught by using the standard try/catch block (instead, signal handlers are used; more information can be found here and here).
PVS-Studio warning: V519 The '* list' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 136, 138. util.cc 138
bool GetListName(ListType list_id, std::string* list)
{
switch (list_id) {
....
case IPBLACKLIST:
*list = kIPBlacklist;
break;
case UNWANTEDURL:
*list = kUnwantedUrlList;
break;
case MODULEWHITELIST:
*list = kModuleWhitelist; // <=
case RESOURCEBLACKLIST:
*list = kResourceBlacklist;
break;
default:
return false;
}
....
}
This is a common mistake when implementing a switch statement. The programmer expects that if the list_id variable is found to be equal to the value MODULEWHITELIST from the ListType enumeration, the string pointed to by the list pointer will be initialized to the value kModuleWhitelist and execution will leave the switch statement. However, because of the missing break statement, the execution will move on to the next case label, RESOURCEBLACKLIST, which will result in associating *list with the kResourceBlacklist string instead.
Chromium is as cool as it used to be, but PVS-Studio can still catch bugs in its code, again and again. Static analysis can help you detect bugs as early as at the coding stage, before testing.
What static analysis tools to use? Well, there are actually lots of them. As for me, I naturally suggest trying PVS-Studio. It can integrate smoothly with the Visual Studio IDE or, alternatively, any build system. There is also a Linux version available since recently. More information about the Windows and Linux versions can be found here and here.
0