Webinar: Evaluation - 05.12
Today's article is somewhat unusual, if only because instead of reviewing one project, we'll be comparing three projects at once, looking for the one with the most interesting bugs and - which is of particular interest - the one with the highest code quality. The projects we are going to review are Firebird, MySQL, and PostgreSQL. So let's get started!
Firebird (FirebirdSQL) is an open-source SQL relational database management system that runs on Linux, Microsoft Windows, macOS X, and a variety of Unix. The database forked from Borland's open-source edition of InterBase in 2000, but since Firebird 1.5 the code has been largely rewritten.
Additional information:
MySQL is an open-source relational database management system (RDBMS). MySQL is typically used as a server for local and remote clients, but the distribution also includes an embedded MySQL server library, which makes it possible to run a MySQL server inside a client application.
MySQL supports multiple table types, which makes it a very flexible tool: users can choose between MyISAM tables, which support full-text search, and InnoDB tables, which support transactions at the level of individual records. MySQL also comes with a special table type called EXAMPLE, which is used to demonstrate the principles of creating new table types. Thanks to the open architecture and GPL-licensing, new types are regularly added to MySQL.
Additional information:
PostgreSQL is an object-relational database management system (ORDBMS).
It can handle workloads ranging from small single-machine applications to large Internet-facing applications (or for data warehousing) with many concurrent users; on macOS Server, PostgreSQL is the default database; and it is also available for Microsoft Windows and Linux (supplied in most distributions). PostgreSQL is developed by the PostgreSQL Global Development Group, a diverse group of many companies and individual contributors. It is free and open-source, released under the terms of the PostgreSQL License, a permissive software license.
Additional information:
I was using static code analyzer PVS-Studio to detect bugs. PVS-Studio is an analyzer for source code written in C, C++, and C# which helps reduce software development costs due to early detection of bugs, defects, and security issues in programs' source code. It runs on Windows and Linux.
Download links:
Because each of the three projects is fairly easy to build and includes .sln files (either available right from the start or generated through CMake), the analysis itself becomes quite a trivial task: you just need to start a check in the PVS-Studio plugin for Visual Studio.
Before starting our discussion, we have to decide what comparison criteria to use. This is one of the primary concerns of this article.
"Head-on" comparison based on the number of error messages produced by the analyzer (or rather the number of messages / number of LOC ratio) for each project is not a good idea, even though it's the least costly way. Why so? Take PostgreSQL project, for instance. It triggers 611 high-certainty-level GA warnings, but if you filter these warnings by the code of PVS-Studio diagnostic rule (V547) and by the part of the message ret < 0, you'll see that there are 419 warnings! That's too many, isn't it? It seems all these messages come from a single source such as a macro or automatically generated code. Well, the comments in the beginning of the files, at which the warnings were issued, prove that our assumption is correct:
/* This file was generated automatically
by the Snowball to ANSI C compiler */
Now that you know that the code was generated automatically, you have two options:
Another problem is errors found in third-party components used in the projects. Again, you have to choose between the same two options:
These are just a couple of examples of how you have to make a choice that may affect (sometimes drastically) the number of warnings to deal with.
Let's agree right off to leave out messages of the 3 (low-certainty) level. These issues are not the ones that worth to be paid attention in the first place. Sure, some of them might be interesting, but it's better to ignore them when you write articles and when you are only getting started with static analysis.
This review is not a full-fledged comparison, as such a comparison would be too tedious for many reasons. For one thing, it would require preliminary configuration of the analyzer for each of the projects, as well as looking through and examining hundreds of messages after the check. It all takes too much time, while there's doubt if such an undertaking is really worth it.
Instead, I will look through the logs for each of the projects, pick the most interesting bugs, comment on them, and check the other two projects for similar issues.
There's one more thing I should mention. We've started to pay attention to security issues lately and even posted an article titled "How Can PVS-Studio Help in the Detection of Vulnerabilities?" Since one of today's participants, MySQL, had been mentioned in that article, I was curious to see if PVS-Studio would detect any of those specific code patterns. No gimmicks - we'll just additionally look for warnings similar to those discussed in the article above.
So, again, I'll be evaluating the code quality based on the following criteria:
As we proceed, I'll be giving demerit points to each project, so the one with the fewest points will be the winner (within the restrictions discussed earlier). There are some specific details, of course, but I'll be commenting on these along the way and at the end of the article.
Here we go!
The table below shows the total analysis results "as is', i.e. with no false-positives suppressed, without any filtering by folders, and so on. Note that the warnings refer only to the General Analysis set.
Project |
High Certainty |
Medium Certainty |
Low Certainty |
Total |
---|---|---|---|---|
Firebird |
156 |
680 |
1045 |
1881 |
MySQL |
902 |
1448 |
2925 |
5275 |
PostgreSQL |
611 |
1432 |
1576 |
3619 |
This table, however, is a poor basis for drawing any conclusions about the code quality. As I already said, there is a number of reasons:
As for the density of warnings (not bugs!), i.e. the ratio between the number of messages and LOC, as measured without preliminary configuration, it is roughly the same for Firebird and PostgreSQL, and is a bit higher for MySQL. But let's not jump to conclusions because, you know, the devil is in the detail.
The V597 diagnostic is issued by a presence of such a call of memset function, performing data clearing, which can be removed by a compiler when optimization. As a result, private data might remain uncleared. For details, see the documentation on the diagnostic.
Neither Firebird, nor PostgreSQL triggered any messages of this type, but MySQL did. So, it is MySQL that the following example is taken from:
extern "C"
char *
my_crypt_genhash(char *ctbuffer,
size_t ctbufflen,
const char *plaintext,
size_t plaintext_len,
const char *switchsalt,
const char **params)
{
int salt_len;
size_t i;
char *salt;
unsigned char A[DIGEST_LEN];
unsigned char B[DIGEST_LEN];
unsigned char DP[DIGEST_LEN];
unsigned char DS[DIGEST_LEN];
....
(void) memset(A, 0, sizeof (A));
(void) memset(B, 0, sizeof (B));
(void) memset(DP, 0, sizeof (DP));
(void) memset(DS, 0, sizeof (DS));
return (ctbuffer);
}
PVS-Studio warnings:
The analyzer detected a function with as many as 4 buffers (!), which must be forcibly cleared. However, the function could fail to do so, causing the data to remain in memory "as is". Since buffers A, B, DP, and DS are not used later on, the compiler is allowed to remove the call to the memset function because such an optimization does not affect the program's behavior from the viewpoint of the C/C++ language. For more information about this issue, see the article "Safe Clearing of Private Data".
The rest messages are not any different, so I'll just list them:
Here's a more interesting case.
void win32_dealloc(struct event_base *_base, void *arg)
{
struct win32op *win32op = arg;
....
memset(win32op, 0, sizeof(win32op));
free(win32op);
}
PVS-Studio warning: V597 The compiler could delete the 'memset' function call, which is used to flush 'win32op' object. The RtlSecureZeroMemory() function should be used to erase the private data. win32.c 442
It is similar to the previous example except that after the memory block is cleared the pointer will be passed to the free function. But even then the compiler is still allowed to remove the call to memset, leaving only the call to free (which clears the memory block). As a result, the data that was to be cleared remains in memory. For more information, see the above-mentioned article.
Assigning demerit points. This is quite a serious error - even more so because there are three instances of it. 3 demerit points go to MySQL.
All the three projects triggered V769 warnings.
Since we agreed to ignore third-level warnings, we continue without Firebird (so much the better for it). All the three warnings in PostgreSQL proved irrelevant too. This leaves only MySQL: it also triggered a few false positives, but some of the warnings are worth looking at.
bool
Gcs_message_stage_lz4::apply(Gcs_packet &packet)
{
....
unsigned char *new_buffer =
(unsigned char*) malloc(new_capacity);
unsigned char *new_payload_ptr =
new_buffer + fixed_header_len + hd_len;
// compress payload
compressed_len=
LZ4_compress_default((const char*)packet.get_payload(),
(char*)new_payload_ptr,
static_cast<int>(old_payload_len),
compress_bound);
....
}
PVS-Studio warning: V769 The 'new_buffer' pointer in the 'new_buffer + fixed_header_len' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 74, 73. gcs_message_stage_lz4.cc 74
If it fails to allocate the requested memory block, the malloc function returns a null pointer that could be stored to the new_buffer variable. Next, as the new_payload_ptr variable is initialized, the value of the new_buffer pointer is added to the values of variables fixed_header_len and hd_len. This is a point of no return for new_payload_ptr: if later on (say, in another function) we decide to check it for NULL, such a check won't help. No need to tell you what the implications are. So, it would be wiser to make sure that new_buffer is non-null before initializing new_payload_ptr.
You may argue that since malloc has failed to allocate the requested memory block, then there's not much sense in checking its return value for NULL either. The application can't continue its normal work anyway, so why not let it crash the next time it uses the pointer?
Since quite a lot of developers stick to this approach, it can be called legitimate - but is this approach right? After all, you could try to somehow handle that case to save the data or have the application crash in a "softer way". Besides, this approach might lead to security issues because if the application happens to handle another memory block (null pointer + value) rather than the null pointer itself, it may well damage some data. All this makes your program even more vulnerable. Are you sure you want it that way? Anyway, you have to decide for yourself what the pros and cons are and which choice is right.
I recommend the second approach - the V769 diagnostic will help you detect those issues.
However, if you are sure that such functions can never return NULL, tell the analyzer about it so you don't get the same warnings again. See the article "Additional diagnostics configuration" to find out how.
Assigning demerit points. Considering everything said above, MySQL is given 1 demerit point.
Warnings of this type (diagnostic V575) were found in each of the three projects.
This is an example from Firebird (medium certainty):
static void write_log(int log_action, const char* buff)
{
....
log_info* tmp = static_cast<log_info*>(malloc(sizeof(log_info)));
memset(tmp, 0, sizeof(log_info));
....
}
PVS-Studio warning: V575 The potential null pointer is passed into 'memset' function. Inspect the first argument. Check lines: 1106, 1105. iscguard.cpp 1106
This defect is similar to the previous one - no check for the return value of the malloc function. If it fails to allocate the requested block of memory, malloc will return a null pointer, which will then be passed to the memset function.
Here is a similar example from MySQL:
Xcom_member_state::Xcom_member_state(....)
{
....
m_data_size= data_size;
m_data= static_cast<uchar *>(malloc(sizeof(uchar) * m_data_size));
memcpy(m_data, data, m_data_size);
....
}
PVS-Studio warning: V575 The potential null pointer is passed into 'memcpy' function. Inspect the first argument. Check lines: 43, 42. gcs_xcom_state_exchange.cc 43
This is similar to what we saw in Firebird. Just to make it clear, there are some fragments of code where the returned value malloc is checked for inequality to null.
The following is a similar fragment from PostgreSQL:
static void
ecpg_filter(const char *sourcefile, const char *outfile)
{
....
n = (char *) malloc(plen);
StrNCpy(n, p + 1, plen);
....
}
PVS-Studio warning: V575 The potential null pointer is passed into 'strncpy' function. Inspect the first argument. Check lines: 66, 65. pg_regress_ecpg.c 66
MySQL and PostgreSQL, however, triggered a few high-certainty-level warnings, which are of more interest.
An example from MySQL:
View_change_event::View_change_event(char* raw_view_id)
: Binary_log_event(VIEW_CHANGE_EVENT),
view_id(), seq_number(0), certification_info()
{
memcpy(view_id, raw_view_id, strlen(raw_view_id));
}
PVS-Studio warning: V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. control_events.cpp 830
The memcpy function is used to copy the string from raw_view_id to view_id; the number of bytes to copy is calculated using the strlen function. The problem here is that strlen ignores the terminating null character, so the string is copied without it. If you then don't add it by hand, other string functions will not be able to handle view_id properly. To ensure correct copying of the string, use strcpy / strcpy_s.
Now, the following fragment from PostgreSQL looks very much the same:
static int
PerformRadiusTransaction(char *server,
char *secret,
char *portstr,
char *identifier,
char *user_name,
char *passwd)
{
....
uint8 *cryptvector;
....
cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
memcpy(cryptvector, secret, strlen(secret));
}
PVS-Studio warning: V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. auth.c 2956
There is, however, an interesting difference from the previous example. The cryptvector variable is of type uint8*. While uint8 is an alias for unsigned char, the programmer seems to be using it to explicitly indicate that these data is not meant to be handled as a string; so, given the context, this operation is valid and is not as suspicious as the previous case.
Some of the reported fragments, however, don't look that safe.
int
intoasc(interval * i, char *str)
{
char *tmp;
errno = 0;
tmp = PGTYPESinterval_to_asc(i);
if (!tmp)
return -errno;
memcpy(str, tmp, strlen(tmp));
free(tmp);
return 0;
}
PVS-Studio warning: V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. informix.c 677
This issue follows the same pattern but is more like the example from MySQL: it deals with string operations and copying of a string's contents (except for the terminating null character) to memory used outside the function...
Assigning demerit points. 1 demerit point goes to Firebird and 3 demerit points go to PostgreSQL and MySQL each (one point for a medium-certainty warning, two points for a high-certainty one).
Only Firebird triggered a few V618 warnings.
Take a look at this example:
static const char* const USAGE_COMP = " USAGE IS COMP";
static void gen_based( const act* action)
{
....
fprintf(gpreGlob.out_file, USAGE_COMP);
....
}
PVS-Studio warning: V618 It's dangerous to call the 'fprintf' function in such a manner, as the line being passed could contain format specification. The example of the safe code: printf("%s", str); cob.cpp 1020
What alerted the analyzer is the fact that formatted-output function fprintf is used, while the string is writtendirectly, without using the format string and related specifiers. This may be dangerous and even cause a security issue (see CVE-2013-4258) if the input string happens to contain format specifiers. In this case, though, the USAGE_COMP string is explicitly defined in the source code and doesn't include any format specifiers, so fprintf can be used safely here.
The same applies to the rest cases: the input strings are hard-coded and have no format specifiers.
Assigning demerit points. Considering all said above, I'm not giving any demerit points to Firebird.
None of the projects triggered any V642 and V640 warnings - they all did well.
An example from MySQL:
enum wkbType
{
wkb_invalid_type= 0,
wkb_first= 1,
wkb_point= 1,
wkb_linestring= 2,
wkb_polygon= 3,
wkb_multipoint= 4,
wkb_multilinestring= 5,
wkb_multipolygon= 6,
wkb_geometrycollection= 7,
wkb_polygon_inner_rings= 31,
wkb_last=31
};
bool append_geometry(....)
{
....
if (header.wkb_type == Geometry::wkb_multipoint)
....
else if (header.wkb_type == Geometry::wkb_multipolygon)
....
else if (Geometry::wkb_multilinestring)
....
else
DBUG_ASSERT(false);
....
}
PVS-Studio warning: V768 The enumeration constant 'wkb_multilinestring' is used as a variable of a Boolean-type. item_geofunc.cc 1887
The message actually says it all. Two of the conditional expressions compare header.wkb_type with the elements of the Geomerty enumeration, while the entire third expression is itself an enumerator. Since Geometry::wkb_multilinestring has the value 5, the body of the third conditional statement will execute every time the previous two checks fail. Therefore, the else-branch, containing the call to the DBUG_ASSERT macro, will never be executed. This suggests that the third conditional expression was meant to look like this:
header.wkb_type == Geometry::wkb_multilinestring
What about the rest? PostgreSQL didn't trigger any warnings of this type, while Firebird triggered as many as 9. Those, however, are all one level less critical (medium certainty), and the detected pattern is different too.
The V768 diagnostic detects the following bug patterns:
While there is no excuse for first-level warnings, second-level ones leave room for debate.
For example, this is what most cases look like:
enum att_type {
att_end = 0,
....
};
void fix_exception(...., att_type& failed_attrib, ....)
{
....
if (!failed_attrib)
....
}
PVS-Studio warning: V768 The variable 'failed_attrib' is of enum type. It is odd that it is used as a variable of a Boolean-type. restore.cpp 8580
The analyzer finds it suspicious that the failed_attrib variable is checked for the value att_type::att_end in a way like that. If you ask me, I'd prefer an explicit comparison with the enumerator, yet I can't call this code incorrect. True, I don't like this style (and neither does the analyzer), but it's still legitimate.
However, two fragments look more suspecious. Both have the same pattern, so we'll discuss only one of them.
namespace EDS {
....
enum TraScope {traAutonomous = 1, traCommon, traTwoPhase};
....
}
class ExecStatementNode : ....
{
....
EDS::TraScope traScope;
....
};
void ExecStatementNode::genBlr(DsqlCompilerScratch* dsqlScratch)
{
....
if (traScope)
....
....
}
PVS-Studio warning: V768 The variable 'traScope' is of enum type. It is odd that it is used as a variable of a Boolean-type. stmtnodes.cpp 3448
This example is similar to the previous one: the programmer is also checking that the value of the traScope variable is the same as the non-zero value of the enumerator member. However, unlike the previous example, there are no enumerator members with the value '0' here, which makes this code more suspicious.
Now that we've started talking about medium-certainty warnings, I should add that 10 such messages were issued for MySQL as well.
Assigning demerit points. Firebird is given 1 demerit point and MySQL is given 2 points.
Now, here's another interesting fragment of code. Note that we already saw it when discussing the problem with the clearing of private data.
struct win32op {
int fd_setsz;
struct win_fd_set *readset_in;
struct win_fd_set *writeset_in;
struct win_fd_set *readset_out;
struct win_fd_set *writeset_out;
struct win_fd_set *exset_out;
RB_HEAD(event_map, event_entry) event_root;
unsigned signals_are_broken : 1;
};
void win32_dealloc(struct event_base *_base, void *arg)
{
struct win32op *win32op = arg;
....
memset(win32op, 0, sizeof(win32op));
free(win32op);
}
PVS-Studio warning: V579 The memset function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. win32.c 442
Note the third argument in the call to the memset function. The sizeof operator returns the size of its argument in bytes, but here its argument is a pointer, so it returns the size of the pointer rather than the size of the structure.
This will result in incomplete memory clearing even if the compiler won't throw away the call to memset.
The moral is that you should choose variables' names carefully and try to avoid using similarly looking names. It's not always possible, so pay special attention to such cases. A lot of errors detected by diagnostic V501 in C/C++ projects and V3001 in C# projects stem from this variable-naming issue.
No V579 warnings were issued for the other two projects.
Assigning demerit points. MySQL is given 2 points.
Another similar bug was also found in MySQL.
typedef char Error_message_buf[1024];
const char* get_last_error_message(Error_message_buf buf)
{
int error= GetLastError();
buf[0]= '\0';
FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM,
NULL, error, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
(LPTSTR)buf, sizeof(buf), NULL );
return buf;
}
PVS-Studio warning: V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (buf)' expression. common.cc 507
Error_message_buf is an alias for an array of 1024 elements of type char. There's one crucial thing to keep in mind: even if a function signature is written like this:
const char* get_last_error_message(char buf[1024])
buf is still a pointer, while the array size is only a hint to the programmer. This means that the sizeof(buf) expression works with the pointer here, not the array. This results in passing an incorrect buffer size to the function - 4 or 8 instead of 1024.
Again, no warnings of this type in Firebird and PostgreSQL.
Assigning demerit points. MySQL is given 2 points.
Here's another interesting bug - this time in... MySQL again. It's a small fragment, so I'm giving it in full:
mysqlx::XProtocol* active()
{
if (!active_connection)
std::runtime_error("no active session");
return active_connection.get();
}
PVS-Studio warning: V596 The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO); mysqlxtest.cc 509
The programmer creates an object of class std::runtime_error but doesn't use it in any way. They obviously meant to throw an exception but forgot to write the throw keyword. As a result, this case (active_connection == nullptr) can't be handled as expected.
Neither Firebird, nor PostgreSQL triggered any warnings of this type.
Assigning demerit points. 2 demerit points are given to MySQL.
The following example is taken from Firebird.
class Message
{
....
void createBuffer(Firebird::IMessageMetadata* aMeta)
{
unsigned l = aMeta->getMessageLength(&statusWrapper);
check(&statusWrapper);
buffer = new unsigned char[l];
}
....
~Message()
{
delete buffer;
....
}
.....
unsigned char* buffer;
....
};
PVS-Studio warning: V611 The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] buffer;'. Check lines: 101, 237. message.h 101
Block of memory for the buffer (pointed to by the buffer pointer, a member of class Message) is allocated in a special method called createBuffer by using the new[] operator, in accordance with the standard. However, the class destructor deallocates the block of memory by using the delete operator instead of delete[].
No errors of this type were found in MySQL and PostgreSQL.
Assigning demerit points. 2 demerit points go to Firebird.
Summing up the demerit points, we get the following:
Remember: the fewer points, the better. And if you ask me (a person with a wicked taste), I'd prefer... MySQL! It has the most interesting bugs and it's the leader, which makes it a perfect choice for analysis!
Firebird and PostgreSQL are trickier. On the one hand, even a one-point margin counts; on the other hand, it's quite a small difference, especially because that point was given for a V768 warning of the medium-certainty level... But then again, the codebase of PostgreSQL is way larger, yet it issued four hundred warnings at its automatically generated code...
Anyway, to figure out which of the two projects, Firebird or PostgreSQL, is better, we'd have to do a more thorough comparison. For now, I put them on one podium place so no one is offended. Maybe one day we'll compare them again more carefully, but it will be quite a different story...
So, the code-quality rankings are as follows:
Please remember that any review or comparison, including this one, is subjective. Different approaches may produce different results (though it is mostly true for Firebird and PostgreSQL, but not for MySQL).
So what about static analysis? I hope you are convinced now that it is useful for detecting defects of various types. Want to find out if your codebase has any of those bugs? Then it's the right time to try PVS-Studio! You write perfectly clean code? Then why not check your colleagues' code? ;)
0