Anton Tokarev

Mar 19 2015

Tags:

#Cpp

Analyzing Vim by PVS-Studio in GNU/Linux

Mar 19 2015

Author: Anton Tokarev

A few words about Vim for a start
Now about the check itself
An excessive check
A strange memset
A strange loop
Scopes
An error with signed and unsigned types in one expression
A pointer initialized to NULL and not changed anywhere while still being used
Unsafe use of realloc
A few false positives
Poorly written code where only the first byte of a structure is filled
The poor practice of using short names
A strange assignment of i in a condition
Conclusion

You probably thought that this would be another article about how we checked one more open-source project. But this article is actually not as much about the check itself, as it is about the practice of using the PVS-Studio analyzer in the fully GNU/Linux environment. It's not by chance that we chose the Vim project for the check, for it had also contributed to the fulfillment of this task.

A few words about Vim for a start

Vim (http://www.vim.org/) is a cross-platform free text editor with a 30-year history, a successor of the vi editor, coming from the world of Unix systems.

Vim is widely used in administration and development tasks, and is the default text editor in many GNU/Linux distributions. What distinguishes it from other text editors is that it is designed to be used with the keyboard only, its text interface, and rich extension capabilities through a system of Vim Script plugins.

Now about the check itself

One of the ways to analyze projects under Linux is to integrate the analyzer into the build system, for example GNU Make. It is this method that we chose to check Vim. For each compiler call, we added an analyzer call into the make-file. To make it more convenient, we wrapped this call into the Make variable in the following way:

#PVS Studio vars
PVS_CFLAGS = $(ALL_CFLAGS)
PVS_INCFLAGS = -I$(srcdir)
PVS_STUDIO = ~/PVS-Studio/PVS-Studio -cfg \
    ~/PVS-Studio/PVS-Studio_vim.cfg --source-file \
    $< --cl-params $(PVS_CFLAGS) -c $(PVS_INCFLAGS) $<

Then we built the project in the usual way through the make command (if you wish, you can add a separate target for analysis, for example ".analysis"). Besides the built project itself, the output also included a raw analysis log.

Note. When a project is built in parallel, the analyzer can run in parallel too. Every running instance of the analyzer adds its own portion of diagnostic messages into the log. So keep in mind that the analyzer doesn't clear the file with the raw log. Therefore, you need to delete the log of the previous check manually before running a new analysis.

It's hardly possible to work with the raw log because it contains plenty of duplicate messages (when one .h-file is included into several .cpp-files). After making changes in the analysis parameters, by editing the configuration file, you need to re-launch the analysis to apply these changes, which significantly increases the total analysis time for large projects. You have to do this even if you just wanted, for example, to turn off messages for files in a certain folder. To solve this issue, we wrote a log-parsing utility in C++ to processes PVS-Studio's raw log, remove duplicate messages, apply filters specified in its options file to the messages, and output the warnings in one of the supported formats. The utility is very fast (complete log parsing takes no more than 2-3 seconds, even with huge projects), which allows the user to quickly and easily change any analysis parameters, and get a new list of diagnostic messages.

If necessary, you can add other output formats. By default, the tool currently supports two of them: xml and the so called errorfile. So far as I know, it doesn't have any official name; this format is the one that many Linux programs use to output their messages, for example grep, gcc compilation errors, and so on. And it was also this format that we made use of for our task.

Unlike Windows, where the majority of developers use Visual Studio, the GNU/Linux world offers a variety of IDE's, text editors and other programs, each of which has its own followers. There's no prominent preference or single opinion among programmers regarding which tool to use, so everyone chooses tools to their liking. Nevertheless, when you do project analysis, you need to not only get messages, but also have a mechanism to conveniently work with them, as is provided by PVS-Studio's integration with Visual Studio. The error message format described above is sort of standard for Linux programs, and most editors and IDE's support it to a certain degree, though in most cases this support only allows reading compiler messages from stderr when building the project. And in our case, it is much more convenient to take the analyzer's messages from a file prepared beforehand.

This is where the Vim editor came in handy. Of course, we could develop a corresponding plugin for any of the other tools, but Vim appeared to provide this option by default.

Figure 1 - Running Vim with analysis log.

You just need to run the vim -q <errorfile> command after the analyzer and the log-processing utility are finished with their job, after which the editor will open, where you should run a command to create a buffer with errors, for example :cw 20. And now we've got a comfortable environment to work with the analyzer's messages, and perform code navigation. Yes, I had to spend a few hours on studying Vim itself, for I had never worked in it before, and the basics of its usage are very different to more traditional text editors. However, I can say at last that I do like how comfortable it is to work with, and that now I count it among useful and powerful tools, instead of treating it as some mysterious alien thing. Therefore, I didn't have to think long which project to choose for analysis - surely it had to be Vim itself. Its code proved to be of a very high quality and I found no obvious bugs in it (though the coding style is somewhat arguable in certain places, but I think that has more to do with the project age than anything). Nevertheless, I still found a few fragments which should be reviewed. Let's take a closer look at them.

An excessive check

    if (ptr == NULL)
    {
        if (compl_leader != NULL)
            ptr = compl_leader;
        else
            return;  /* nothing to do */
    }
    if (compl_orig_text != NULL)
    {
        p = compl_orig_text;
        for (len = 0; p[len] != NUL && p[len] == ptr[len]; ++len)
        ;
#ifdef FEAT_MBYTE
        if (len > 0)
            len -= (*mb_head_off)(p, p + len);
#endif
        for (p += len; *p != NUL; mb_ptr_adv(p))
            AppendCharToRedobuff(K_BS);
    }
    else
        len = 0;
    if (ptr != NULL)
        AppendToRedobuffLit(ptr + len, -1);

PVS-Studio's diagnostic message: V595 (1) The 'ptr' pointer was utilized before it was verified against nullptr. Check lines: 3922, 3933.

The ptr pointer is already checked for NULL earlier in the code and assigned the comp_leader pointer, which is surely non-null if that check evaluates to true. So the second check is not necessary.

A strange memset

/*
* If requested, store and reset the global values controlling
* the exception handling (used when debugging). Otherwise avoid
* clear it to a bogus compiler warning when the optimizer
* uses inline functions...
*/
if (flags & DOCMD_EXCRESET)
  save_dbg_stuff(&debug_saved);
else
  vim_memset(&debug_saved, 0, 1);

where debug_saved is a structure object

struct dbg_stuff
{
    int        trylevel;
    int        force_abort;
    except_T    *caught_stack;
    char_u    *vv_exception;
    char_u    *vv_throwpoint;
    int        did_emsg;
    int        got_int;
    int        did_throw;
    int        need_rethrow;
    int        check_cstack;
    except_T    *current_exception;
};

PVS-Studio's diagnostic message: V512 (1) A call of the 'memset' function will lead to underflow of the buffer '& debug_saved'.

It's hard to say why the programmer would like to clear only the first byte of the structure. If it is used as a flag, it should be defined as a separate field of the structure (union will do too).

A strange loop

/* check for out-of-memory */
for (i = 0; i < num_names; ++i)
{
  if (names[i] == NULL)
  {
    for (i = 0; i < num_names; ++i)
      vim_free(names[i]);
    num_names = 0;
  }
}

PVS-Studio's diagnostic message: V535 (1) The variable 'i' is being used for this loop and for the outer loop. Check lines: 1893, 1897.

Both in the external and internal loops, one and the same counter i is used to iterate through one and the same array. Yes, the very first triggering of the if (names[i] == NULL) condition will prevent execution of the next step of this loop, but a programmer not familiar with this code will have to think it over for a while to figure out the logic of this code, while its odd style provokes some doubt if the author really meant this behavior. In other words, although there is no bug here, the code still smells a bit. I think the 'break' operator would suit better to terminate the loop.

Scopes

char_u *p, *old;
//...
{
    char_u        buffer[BUFLEN + 1];
    //...
    for (p = buffer; p < buffer + len; p += l)
    //...

PVS-Studio's diagnostic message: V507 (2) Pointer to local array 'buffer' is stored outside the scope of this array. Such a pointer will become invalid.

There are lots of fragments like this in Vim's code (another example of the issue with the coding style). The p pointer declared in the very beginning of the function (sometimes even with the global scope), is used to store a pointer to an array that exists only in a smaller scope, and will be deleted after leaving its code block. If I got it right after brief examination, the p pointer is used only when assigned a new value after leaving the buffer scope, but there's a risk of missing it in some places. I can't understand why one would choose to do it like that, instead of simply declaring another variable inside the buffer scope (can it be just for the sake of saving space on the stack?). This code is very difficult to read and maintain.

An error with signed and unsigned types in one expression

for (cu = 1; cu <= 255; cu++)
    if (VIM_ISDIGIT(cu))
        regc(cu);

where

#define VIM_ISDIGIT(c) ((unsigned)(c) - '0' < 10)

PVS-Studio's diagnostic message: V658 (2) A value is being subtracted from the unsigned variable. This can result in an overflow. In such a case, the '<' comparison operation can potentially behave unexpectedly. Consider inspecting the '(unsigned)(cu) - '0' < 10' expression.

This code looks rather like a dirty hacking trick. When evaluating the ((unsigned)(c) - '0' < 10) expression, the subtraction operation will evaluate to an unsigned value, while comparing both parts of the expression will also be cast to the unsigned type. Therefore, when the cu variable is smaller than the numerical value 0, an overflow will occur. In this particular case, the code works fine, and fulfills its purpose (to check if a character is a numeral), but I don't think one should use tricks like that when it's not really necessary. The loop could have been set to start iterating with '0', without the type conversion to unsigned.

A pointer initialized to NULL and not changed anywhere while still being used

char_u    *retval = NULL;
//...
if (round == 2)
  vim_strncpy(retval, s, len); //first use of retval
//...
if (retval == NULL)
{

PVS-Studio's diagnostic message: V595 (1) The 'retval' pointer was utilized before it was verified against nullptr. Check lines: 7903, 7907.

Now, this does look like a bug. The analyzer warns us about an excessive check, but the real problem is actually about quite a different thing. The retval pointer is initialized to 0, and I haven't found a single line in this function where its value changes. At the same time, it is used multiple times as a target for strncpy. After that the programmer suddenly decides to check it for NULL.

Unsafe use of realloc

/* TODO: check for vim_realloc() returning NULL. */
l->t = vim_realloc(l->t, newlen * sizeof(nfa_thread_T));

PVS-Studio's diagnostic message V701 (2) realloc() possible leak: when realloc() fails in allocating memory, original pointer 'l->t' is lost. Consider assigning realloc() to a temporary pointer.

It's a very frequent bug in many projects which is described in detail in the message text. Luckily, as suggested by the comment, it will be fixed soon. In all the rest of Vim's code, realloc is used correctly.

A few false positives

if (ireg_icombine && len == 0)
{
  /* If \Z was present, then ignore composing characters.
   * When ignoring the base character this always matches. */
   if (len == 0 && sta->c != curc)
     result = FAIL;

V560 (2) A part of conditional expression is always true: len == 0.

V571 (2) Recurring check. The 'len == 0' condition was already verified in line 6032.

if (VIsual_active)
{
  if (VIsual_active
      && (VIsual_mode != wp->w_old_visual_mode
      || type == INVERTED_ALL))

V571 (2) Recurring check. The 'VIsual_active' condition was already verified in line 1515.

There are a few other fragments with similar checks. They are not of much interest to us, and do not affect the code in most cases, but some of them still may contain logic errors, so these fragments should be reviewed.

Poorly written code where only the first byte of a structure is filled

#ifdef FEAT_TAG_BINS
  /* This is only to avoid a compiler warning for using search_info
  * uninitialised. */
  vim_memset(&search_info, 0, (size_t)1);
#endif

V512 (1) A call of the 'memset' function will lead to underflow of the buffer '& search_info'.

It is explained in the comment why the programmer did this, but this is quite a strange method indeed. There are much neater ways to avoid the compiler's warning.

The poor practice of using short names

extern char *UP, *BC, PC;

PVS-Studio's diagnostic message: V707 (2) Giving short names to global variables is considered to be bad practice. It is suggested to rename 'UP', 'BC', 'PC' variables.

This practice is not rare in Vim. Many variables have 1- or 2-character names, often with a large scope, and in this particular case it's even global. Add functions occupying 500+ code lines and you get code which is very hard to read.

A strange assignment of i in a condition

int i = 2; /* index in s[] just after <Esc>[ or CSI */
//...
if (n >= 8 && t_colors >= 16
    && ((s[0] == ESC && s[1] == '[')
        || (s[0] == CSI && (i = 1) == 1))
    && s[i] != NUL
    && (STRCMP(s + i + 1, "%p1%dm") == 0
    || STRCMP(s + i + 1, "%dm") == 0)
    && (s[i] == '3' || s[i] == '4'))

PVS-Studio's diagnostic message: V560 (2) A part of conditional expression is always true: (i = 1) == 1.

I can't say for sure if this is a bug or just an odd way to assign a one to i. But one shouldn't write it that way for sure.

Conclusion

To sum up, I'd like you to note that it has now become feasible, and pretty comfortable, to analyze projects with PVS-Studio under GNU Linux without using a Windows machine. Among other things, it was made possible thanks to Vim, which made it the first candidate to undergo such a check.

#Cpp