>
>
Leo Tolstoy and static code analysis

Andrey Karpov
Articles: 674

Leo Tolstoy and static code analysis

This time we checked Apache HTTP Server with PVS-Studio. As we had expected, we found errors there. The errors are few. We expected this either.

Other developers come across this situation too while testing PVS-Studio on their projects. Unfortunately, the first conclusion you would like to draw on seeing just a few errors is that a tool like this is useless to you. I have just invented a good analogy to show why it is not so.

Imagine the following situation. We have a work by Leo Tolstoy, for instance, "War and Peace". We want to test the mechanism of spelling check integrated into Microsoft Word on this book. We launch analysis and find just a few real errors and a lot of false reports. Besides, Leo Tolstoy was apt to write long sentences. Microsoft Word, being unable to understand the text, will underline a lot of fragments with green supposing that there are punctuation mistakes. We are looking at this all and turning our nose up at it. We don't like that Word considers many names incorrect. We don't like the punctuation analysis. We conclude that spelling check in Word is an absolutely useless thing.

Let's find out what is wrong about this way of studying the tool.

We forget that people HAVE ALREADY WORKED on errors in the book. And they have spent much time and effort on that. Leo Tolstoy was rewriting his drafts and eliminating mistakes. The editor was working turning the text into the book. Something was fixed in the next editions.

Of course, the program will not find all the mistakes in the book. People can do it better. But the program does it much faster and therefore greatly reduces waste of human efforts. The true advantage of this tool lies in its regular use: you have just written a paragraph, noticed an underlined mistake and fixed it instantly. Now, while rereading the text, a person can concentrate on the contents and subtle mistakes instead of fixing trivial misprints.

Remarks on the issue of false reports sound unconvincing too. You only once need to add an unknown word to the dictionary and it will no longer bother the text's author. Well, and for the rest obviously false warnings you can click the right mouse button and select "ignore".

I think it's unreasonable to argue on the usefulness of text checking mechanisms embedded in such programs as Word, Thunderbird and TortoiseSVN. We use them and don't worry that they find few mistakes in the book "War and Peace ".

There is a similar situation with static analysis of software source code. It's meaningless to check a project like Apache HTTP Server and try to estimate a probable profit to be drawn from analysis. The code of Apache HTTP Server is old from the viewpoint of programming. As a project, Apache HTTP Server has existed for more than 15 years. The code's quality is confirmed by a huge amount of projects based on it. Of course, a lot of errors were fixed due to wide use of the code by plenty of programmers and a long development time.

It's logical to suppose that many of the errors found earlier could have been found much easier if static analysis had been used. The analogy with Word is complete in this case. Regular search of errors is much more useful than one-time search.

Like in the Word editor, false reports in PVS-Studio can be easily eliminated: you just have to click on a false message and hide it (a special comment will be automatically added into the corresponding code fragment).

Let's make the final comparison by the parameter of error detection speed. The Word editor can underline mistakes at once. The PVS-Studio analyzer does it after compiling the files. This can't be done earlier because the syntax is too complicated to analyze incomplete code. It's enough, however. We may consider PVS-Studio as a tool to get more compiler-generated warnings. By the way, PVS-Studio can do it in various Visual Studio versions: 2005, 2008, 2010. So you are welcome to try the incremental analysis.

All in all, I find the analogy with Word absolute. If somebody disagrees, we can continue the discussion in comments or via e-mail (karpov [@] viva64.com).

I have formulated the main idea I wanted to share with you so urgently. However, I would disappoint the readers if I didn't write some examples of errors found in Apache HTTP Server. So, let's speak a bit on the errors detected.

An example of unnecessary sizeof().

PSECURITY_ATTRIBUTES GetNullACL(void)
{
  PSECURITY_ATTRIBUTES sa;

  sa  = (PSECURITY_ATTRIBUTES) LocalAlloc(LPTR,
           sizeof(SECURITY_ATTRIBUTES));
  sa->nLength = sizeof(sizeof(SECURITY_ATTRIBUTES));
  ...
}

PVS-Studio's diagnostic message is the following: V568 It's odd that the argument of sizeof() operator is the 'sizeof (SECURITY_ATTRIBUTES)' expression. libhttpd util_win32.c 115

The size of the SECURITY_ATTRIBUTES structure is set incorrectly. As a result, correct handling of this structure is impossible.

However, I suspect that most errors I will cite here are located in code that gets control very rarely. That's why you shouldn't worry too much. The next sample is alike. The error is in the function responsible for error handling.

apr_size_t ap_regerror(int errcode, const ap_regex_t *preg,
  char *errbuf, apr_size_t errbuf_size)
{
  ...
  apr_snprintf(errbuf, sizeof errbuf,
               "%s%s%-6d", message, addmessage, 
                 (int)preg->re_erroffset);
  ...
}

PVS-Studio's diagnostic message is the following: V579 The apr_snprintf function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the second argument. libhttpd util_pcre.c 85

It means that it is the pointer size which is passed into a function instead of the buffer size. As a result, the error message is formulated incorrectly. Actually the message's length will take 3 or 7 characters.

There are also classic Copy-Paste errors.

static int magic_rsl_to_request(request_rec *r)
{
  ...
  if (state == rsl_subtype || state == rsl_encoding ||
      state == rsl_encoding) {
  ...
}

PVS-Studio's diagnostic message is the following: V501 There are identical sub-expressions 'state == rsl_encoding' to the left and to the right of the '||' operator. mod_mime_magic mod_mime_magic.c 787

The comparison really must look this way:

if (state == rsl_subtype || state == rsl_separator ||
    state == rsl_encoding) {

There are errors that will occur only in Windows operating system.

typedef UINT_PTR SOCKET;
static unsigned int __stdcall win9x_accept(void * dummy)
{
  SOCKET csd;
  ...
  do {
      clen = sizeof(sa_client);
      csd = accept(nsd, (struct sockaddr *) &sa_client, &clen);
  } while (csd < 0 && APR_STATUS_IS_EINTR(apr_get_netos_error()));
  ...
}

PVS-Studio's diagnostic message is the following: V547 Expression 'csd < 0' is always false. Unsigned type value is never < 0. libhttpd child.c 404

The accept function in Visual Studio header files returns a value that has the unsigned SOCKET type. That's why the check 'csd < 0' is invalid since its result is always false. The returned values must be explicitly compared to different constants, for instance, SOCKET_ERROR.

There is a sort of errors that can be referred to potential vulnerabilities. I don't know if it is possible, at least in theory, to use them for attack. Still, I want to give a couple of such samples here.

typedef  size_t apr_size_t;
APU_DECLARE(apr_status_t) apr_memcache_getp(...)
{
  ...
  apr_size_t len = 0;
  ...
  len = atoi(length);
  ...
  if (len < 0) {
    ...
  }
  else {
    ...  
  }
}

PVS-Studio's diagnostic message is the following: V547 Expression 'len < 0' is always false. Unsigned type value is never < 0. aprutil apr_memcache.c 814

The "len < 0" condition in this code is always false since the 'len' variable gas an unsigned type. Correspondingly, if we send a string with a negative number to input, a failure is likely to occur.

Apache also has much code where various buffers with data are being cleared. Most likely, it is determined by the safety purposes. But clearing itself often fails because of the following error:

#define MEMSET_BZERO(p,l)       memset((p), 0, (l))

void apr__SHA256_Final(sha2_byte digest[], SHA256_CTX* context) {
  ...
  MEMSET_BZERO(context, sizeof(context));
  ...
}

PVS-Studio's diagnostic message is the following: V512 A call of the 'memset' function will lead to underflow of the buffer '(context)'. apr sha2.c 560

Only the first bytes in the buffer are being cleared because the sizeof() operator returns the pointer size instead of the size of the data buffer. I've found at least 6 such errors.

The rest errors are boring, so I won't write about them. Thank you for your attention. Come and try PVS-Studio. We still give keys to developers of free open-source projects and those who feel a strong urge to try the analyzer at its full. Write to us.