To get a trial key
fill out the form below
Team License (a basic version)
Enterprise License (an extended version)
* By clicking this button you agree to our Privacy Policy statement

Request our prices
New License
License Renewal
--Select currency--
USD
EUR
GBP
RUB
* By clicking this button you agree to our Privacy Policy statement

Free PVS-Studio license for Microsoft MVP specialists
* By clicking this button you agree to our Privacy Policy statement

To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

Message submitted.

Your message has been sent. We will email you at


If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

>
>
>
Wanna Play a Detective? Find the Bug in…

Wanna Play a Detective? Find the Bug in a Function from Midnight Commander

Feb 07 2019
Author:

In this article, we invite you to try to find a bug in a very simple function from the GNU Midnight Commander project. Why? For no particular reason. Just for fun. Well, okay, it's a lie. We actually wanted to show you yet another bug that a human reviewer has a hard time finding and the static code analyzer PVS-Studio can catch without effort.

0610_Detective_and_MC/image1.png

A user sent us an email the other day, asking why he was getting a warning on the function EatWhitespace (see the code below). This question is not as trivial as it might seem. Try to figure out for yourself what's wrong with this code.

static int
EatWhitespace (FILE * InFile)
  /* ----------------------------------------------------------------------- **
   * Scan past whitespace (see ctype(3C)) and return the first non-whitespace
   * character, or newline, or EOF.
   *
   *  Input:  InFile  - Input source.
   *
   *  Output: The next non-whitespace character in the input stream.
   *
   *  Notes:  Because the config files use a line-oriented grammar, we
   *          explicitly exclude the newline character from the list of
   *          whitespace characters.
   *        - Note that both EOF (-1) and the nul character ('\0') are
   *          considered end-of-file markers.
   *
   * ----------------------------------------------------------------------- **
   */
{
    int c;

    for (c = getc (InFile); isspace (c) && ('\n' != c); c = getc (InFile))
        ;
    return (c);
}                               /* EatWhitespace */

As you can see, EatWhitespace is a tiny function; its body is even smaller than the comment on it :). Now, let's check a few details.

Here's the description of the function getc:

int getc ( FILE * stream );

Returns the character currently pointed to by the internal file position indicator of the specified stream. The internal file position indicator is then advanced to the next character. If the stream is at the end-of-file when called, the function returns EOF and sets the end-of-file indicator for the stream. If a read error occurs, the function returns EOF and sets the error indicator for the stream (ferror).

And here's the description of the function isspace:

int isspace( int ch );

Checks if the given character is a whitespace character as classified by the currently installed C locale. In the default locale, the whitespace characters are the following:

  • space (0x20, ' ');
  • form feed (0x0c, '\f');
  • line feed LF (0x0a, '\n');
  • carriage return CR (0x0d, '\r');
  • horizontal tab (0x09, '\t');
  • vertical tab (0x0b, '\v').

Return value. Non-zero value if the character is a whitespace character; zero otherwise.

The EatWhitespace function is expected to skip all whitespace characters except line feed '\n'. The function will also stop reading from the file when it encounters End of file (EOF).

Now that you know all that, try to find the bug!

The two unicorns below will make sure you don't accidentally peek at the comment.

0610_Detective_and_MC/image2.png

Figure 1. Time for bug searching. The unicorns are waiting.

Still no luck?

Well, you see, it's because we have lied to you about isspace. Bwa-ha-ha! It's not a standard function at all - it's a custom macro. Yeah, we're baddies and we got you confused.

0610_Detective_and_MC/image3.png

Figure 2. Unicorn confusing readers about isspace.

It's not us or our unicorn to blame, of course. The fault for all the confusion lies with the authors of the GNU Midnight Commander project, who made their own implementation of isspace in the file charset.h:

#ifdef isspace
#undef isspace
#endif
....
#define isspace(c) ((c)==' ' || (c) == '\t')

With this macro, the authors confused other developers. The code was written under the assumption that isspace is a standard function, which considers carriage return (0x0d, '\r') a whitespace character.

The custom macro, in its turn, treats only space and tab characters as whitespace characters. Let's substitute that macro and see what happens.

for (c = getc (InFile);
     ((c)==' ' || (c) == '\t') && ('\n' != c);
     c = getc (InFile))

The ('\n' != c) subexpression is unnecessary (redundant) since it will always evaluate to true. That's what PVS-Studio warns you about by outputting the warning:

V560 A part of conditional expression is always true: ('\n' != c). params.c 136.

To make it clear, let's examine 3 possible outcomes:

  • End of file reached. EOF is not a space or tab character. The ('\n' != c) subexpression is not evaluated due to short-circuit evaluation. The loop terminates.
  • The function has read some character that's not a space or tab character. The ('\n' != c) subexpression is not evaluated due to short-circuit evaluation. The loop terminates.
  • The function has read a space or horizontal tab character. The ('\n' != c) subexpression is evaluated, but its result is always true.

In other words, the code above is equivalent to the following:

for (c = getc (InFile); c==' ' || c == '\t'; c = getc (InFile))

We have found that it doesn't work the desired way. Now let's see what the implications are.

A developer, who wrote the call of isspace in the body of the EatWhitespace function expected that the standard function would be called. That's why they added the condition preventing the LF character ('\n') from being treated as a whitespace character.

It means that, besides space and horizontal tab characters, they were planning to skip form feed and vertical tab characters as well.

What's more remarkable is that they wanted the carriage return character (0x0d, '\r') to be skipped too. It doesn't happen though - the loop terminates when encountering this character. The program will end up behaving unexpectedly if newlines are represented by the CR+LF sequence, which is the type used in some non-UNIX systems such as Microsoft Windows.

For more details about the historical reasons for using LF or CR+LF as newline characters, see the Wikipedia page "Newline".

The EatWhitespace function was meant to process files in the same way, whether they used LF or CR+LF as newline characters. But it fails in the case of CR+LF. In other words, if your file is from the Windows world, you're in trouble :).

While this might not be a serious bug, especially considering that GNU Midnight Commander is used in UNIX-like operating systems, where LF (0x0a, '\n') is used as a newline character, trifles like that still tend to lead to annoying problems with compatibility of data prepared on Linux and Windows.

What makes this bug interesting is that you are almost sure to overlook it while carrying out standard code review. The specifics of the macro's implementation are easy to forget, and some project authors may not know them at all. It's a very vivid example of how static code analysis contributes to code review and other bug detection techniques.

Overriding standard functions is a bad practice. By the way, we discussed a similar case of the #define sprintf std::printf macro in the recent article "Appreciate Static Code Analysis".

A better solution would have been to give the macro a unique name, for example, is_space_or_tab. This would have helped to avoid all the confusion.

Perhaps the standard isspace function was too slow and the programmer created a faster version, sufficient for their needs. But they still shouldn't have done it that way. A safer solution would be to define isspace so that you would get non-compilable code, while the desired functionality could be implemented as a macro with a unique name.

Thanks for reading. Don't hesitate to download PVS-Studio and try it with your projects. As a reminder, we now support Java too.

Popular related articles
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
Appreciate Static Code Analysis!

Date: Oct 16 2017

Author: Andrey Karpov

I am really astonished by the capabilities of static code analysis even though I am one of the developers of PVS-Studio analyzer myself. The tool surprised me the other day as it turned out to be sma…
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…

Comments (0)

Next comments
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →
Accept