>
>
>
PVS-Studio Probes into Linux' Innards (…

Andrey Karpov
Articles: 673

Sviatoslav Razmyslov
Articles: 90

PVS-Studio Probes into Linux' Innards (3.18.1)

For the sake of advertisement, we decided to analyze the Linux kernel with our static code analyzer. The difficulty of this task makes it especially interesting. Linux' source codes have been checked, and are still checked, by a number of different tools. So finding anything new was hardly probable. However, if we succeeded, it would be a nice advertisement for the PVS-Studio analyzer's capabilities.

What was checked

The Linux kernel was taken from the The Linux Kernel Archives site. We checked the Latest Stable Kernel 3.18.1.

At the time of writing this article, the kernel version 3.19-rc1 has been already released. Unfortunately, analyzing a project and writing an article takes quite a bit of work and time, so we have to settle for a check of a slightly obsolete version.

Here's my reply to those who may argue that we ought to have checked the latest version available.

  • We regularly check a number of projects, and we have lots of other tasks to fulfill besides free analysis of projects. This is why we absolutely cannot start all over, just because a new version was released. In doing so, we'd risk never publishing anything at all :).
  • 99% of all errors we found are still there. So you can still rely on this article to make the Linux kernel's code a bit better.
  • The purpose of this article is to advertise PVS-Studio. If we can find errors in version X of some project, then we can certainly find something in version Y too. Our checks are quite superficial (as we are not familiar with the project code) and their goal is to help us gather material for promo articles like this. What can really benefit a project is purchasing a PVS-Studio license, and regular use of the tool by the project authors.

How the check was done

We used the PVS-Studio static code analyzer version 5.21 to check the kernel.

For the check of the Linux Kernel, we took the Ubuntu-14.04 distribution, on which a lot of detailed guides are available, explaining how to configure and build the kernel. The analyzer checks preprocessed files that need to be obtained for successfully compilable files, so building a project is one of the most important analysis stages.

We then wrote a small utility in C++ that could save a command line, the current folder, and environment variables for each of the running compiler processes. Those of you familiar with PVS-Studio products, will immediately recall the PVS-Studio Standalone utility, that allows one to check any project under Windows. We use WinAPI to address processes in this utility, so we only had to rewrite this monitoring mechanism for Linux, while the rest of the code, dealing with preprocessing launch and analysis, was completely ported. So a Linux kernel check was only a matter of time.

A few words about security for a start

It happened somehow that people grew to treat the PVS-Studio analyzer solely as a tool to detect errors, and no one cares that it can also detect certain type of vulnerabilities. It's our own fault of course, and we have to improve the situation.

You see, messages generated by PVS-Studio can be treated in different ways. For instance, an issue can be both a typo and a vulnerability at the same time. It all depends on how you look at it.

I want you to take a look at a few warnings generated by PVS-Studio when analyzing Linux. It's not that I mean to say the analyzer found true vulnerabilities in Linux, but the warnings cited below could well do that.

Dangerous use of the memcmp() function

static unsigned char eprom_try_esi(
  struct atm_dev *dev, unsigned short cmd,
  int offset, int swap)
{
  unsigned char buf[ZEPROM_SIZE];
  struct zatm_dev *zatm_dev;
  int i;

  zatm_dev = ZATM_DEV(dev);
  for (i = 0; i < ZEPROM_SIZE; i += 2) {
    eprom_set(zatm_dev,ZEPROM_CS,cmd); /* select EPROM */
    eprom_put_bits(zatm_dev,ZEPROM_CMD_READ,ZEPROM_CMD_LEN,cmd);
    eprom_put_bits(zatm_dev,i >> 1,ZEPROM_ADDR_LEN,cmd);
    eprom_get_byte(zatm_dev,buf+i+swap,cmd);
    eprom_get_byte(zatm_dev,buf+i+1-swap,cmd);
    eprom_set(zatm_dev,0,cmd); /* deselect EPROM */
  }
  memcpy(dev->esi,buf+offset,ESI_LEN);
  return memcmp(dev->esi,"\0\0\0\0\0",ESI_LEN);
}

PVS-Studio's diagnostic message: V642 Saving the 'memcmp' function result inside the 'unsigned char' type variable is inappropriate. The significant bits could be lost breaking the program's logic. zatm.c 1168

Notice the 'return' operator at the very end of the function body.

The 'memcmp' function returns the following values of the 'int' type:

  • < 0 - buf1 less than buf2;
  • 0 - buf1 identical to buf2;
  • > 0 - buf1 greater than buf2;

Notice the following:

  • "> 0" means any number, not 1;
  • "< 0" is not necessarily -1.

There may be different return values: -100, 2, 3, 100, 256, 1024, 5555, and so on. It means that this result cannot be cast to the 'unsigned char' type (this is the type returned by the function).

Implicit type conversion may result in truncating significant bits, which will break program execution logic.

What is dangerous about such errors, is that the return value may depend on the architecture and implementation of a particular function on the given architecture. For example, a program may work well in the 32-bit version, but fail in the 64-bit.

So what does this mean? Just an incorrect check of something related to EPROM. It's an error of course, but what does it have to do with a vulnerability?

It means that the V642 diagnostic can reveal a vulnerability too! You don't believe me? OK, here is an identical piece of code from MySQL/MariaDB.

typedef char my_bool;
...
my_bool check(...) {
  return memcmp(...);
}

It was not PVS-Studio that had found this issue; but it well could have.

This error caused a severe vulnerability in MySQL/MariaDB up to versions 5.1.61, 5.2.11, 5.3.5, 5.5.22. The point about this, is that when a new MySQL /MariaDB user logs in, the token (SHA of the password and hash) is calculated and compared to the expected value by the 'memcmp' function. On some platforms, the return value may fall out of the [-128..127] range, so in 1 case out of 256, the procedure of comparing the hash to the expected value always returns 'true' regardless of the hash. As a result, an intruder can use a simple bash-command to gain root access to the vulnerable MySQL server, even if he doesn't know the password. This vulnerability was caused by the code fragment cited above, found in the 'sql/password.c' file.

Now let's get back to Linux. Here's another dangerous code fragment:

void sci_controller_power_control_queue_insert(....)
{
  ....
  for (i = 0; i < SCI_MAX_PHYS; i++) {
    u8 other;
    current_phy = &ihost->phys[i];
  
    other = memcmp(current_phy->frame_rcvd.iaf.sas_addr,
                   iphy->frame_rcvd.iaf.sas_addr,
                   sizeof(current_phy->frame_rcvd.iaf.sas_addr));

    if (current_phy->sm.current_state_id == SCI_PHY_READY &&
        current_phy->protocol == SAS_PROTOCOL_SSP &&
        other == 0) {
      sci_phy_consume_power_handler(iphy);
      break;
    }
  }
  ....
}

PVS-Studio's diagnostic message: V642 Saving the 'memcmp' function result inside the 'unsigned char' type variable is inappropriate. The significant bits could be lost, breaking the program's logic. host.c 1846

The return result of the memcmp() function is saved into the other variable of the unsigned char type. I don't think we're dealing with any vulnerability here, but the SCSI controller's work is in danger.

Here are a couple of other fragments of this kind:

  • V642 Saving the 'memcmp' function result inside the 'unsigned char' type variable is inappropriate. The significant bits could be lost breaking the program's logic. zatm.c 1168
  • V642 Saving the 'memcmp' function result inside the 'unsigned char' type variable is inappropriate. The significant bits could be lost breaking the program's logic. host.c 1789

Dangerous use of the memset() function

We continue searching for dangerous issues. Now let's check functions that clear private data. These are usually various encryption functions. Unfortunately, memory clearing is not always done correctly, and you risk getting quite an unpleasant result. To learn more about these unpleasant results, see the article "Overwriting memory - why?".

Let's take a look at a sample of incorrect code:

static int crypt_iv_tcw_whitening(....)
{
  ....
  u8 buf[TCW_WHITENING_SIZE];
  ....
  out:
  memset(buf, 0, sizeof(buf));
  return r;
}

PVS-Studio's diagnostic message: V597 The compiler could delete the 'memset' function call, which is used to flush 'buf' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. dm-crypt.c 708

All looks fine at first glance. The crypt_iv_tcw_whitening() function allocates a temporary buffer on the stack, encrypts something, and then clears the buffer with private data by calling the memset() function. However, the call of the memset() function will actually be deleted by the compiler in the course of optimization. From the viewpoint of the C/C++ language, the buffer is not used in any way after it has been cleared. Which means it's not necessary to clear it.

At the same time, this issue is very easy to miss. It can hardly be covered by unit testing; the debugger won't let you see it either (the call of the memset function will be there in the debug-version).

I want to draw your attention to this idea: this is not a "theoretically possible behavior" of the compiler, but rather, very much a real one. Compilers do tend to remove memset() function calls. To learn more about that, see the description of the V597 diagnostic.

In this particular example, PVS-Studio gives somewhat inappropriate recommendations about using the RtlSecureZeroMemory() function - but it's because it is oriented towards Windows. There's no such function in Linux of course, but the main point is to warn the user, while picking the necessary analogous function is not difficult at all.

Another similar example:

static int sha384_ssse3_final(struct shash_desc *desc, u8 *hash)
{
  u8 D[SHA512_DIGEST_SIZE];

  sha512_ssse3_final(desc, D);

  memcpy(hash, D, SHA384_DIGEST_SIZE);
  memset(D, 0, SHA512_DIGEST_SIZE);

  return 0;
}

PVS-Studio's diagnostic message: V597 The compiler could delete the 'memset' function call, which is used to flush 'D' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. sha512_ssse3_glue.c 222

Below is an example of code, where 4 buffers at once may fail to be cleared: keydvt_out, keydvt_in, ccm_n, mic. The code is taken from the security.c file (lines 525 - 528).

int wusb_dev_4way_handshake(....)
{
  ....
  struct aes_ccm_nonce ccm_n;
  u8 mic[8];
  struct wusb_keydvt_in keydvt_in;
  struct wusb_keydvt_out keydvt_out;
  ....
error_dev_update_address:
error_wusbhc_set_gtk:
error_wusbhc_set_ptk:
error_hs3:
error_hs2:
error_hs1:
  memset(hs, 0, 3*sizeof(hs[0]));
  memset(&keydvt_out, 0, sizeof(keydvt_out));
  memset(&keydvt_in, 0, sizeof(keydvt_in));
  memset(&ccm_n, 0, sizeof(ccm_n));
  memset(mic, 0, sizeof(mic));
  if (result < 0)
    wusb_dev_set_encryption(usb_dev, 0);
error_dev_set_encryption:
  kfree(hs);
error_kzalloc:
  return result;
  ....
}

And finally the last example of a password left "hanging about" in memory:

int
E_md4hash(const unsigned char *passwd, unsigned char *p16,
  const struct nls_table *codepage)
{
  int rc;
  int len;
  __le16 wpwd[129];

  /* Password cannot be longer than 128 characters */
  if (passwd) /* Password must be converted to NT unicode */
    len = cifs_strtoUTF16(wpwd, passwd, 128, codepage);
  else {
    len = 0;
    *wpwd = 0; /* Ensure string is null terminated */
  }

  rc = mdfour(p16, (unsigned char *) wpwd, len * sizeof(__le16));
  memset(wpwd, 0, 129 * sizeof(__le16));

  return rc;
}

PVS-Studio's diagnostic message: V597 The compiler could delete the 'memset' function call, which is used to flush 'wpwd' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. smbencrypt.c 224

Let's stop here. See the following files for 3 more bad memset() calls:

  • sha256_ssse3_glue.c 214
  • dev-sysfs.c 104
  • qp.c 143

Dangerous checks

The PVS-Studio analyzer includes the V595 diagnostic rule to detect issues when a pointer is first dereferenced, and then checked for NULL. Sometimes there's nothing tricky about this diagnostic. Let's examine the following simple case:

static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n)
{
  struct net *net = sock_net(skb->sk);
  struct nlattr *tca[TCA_ACT_MAX + 1];
  u32 portid = skb ? NETLINK_CB(skb).portid : 0;
  ....
}

PVS-Studio's diagnostic message: V595 The 'skb' pointer was utilized before it was verified against nullptr. Check lines: 949, 951. act_api.c 949

It's simple here. If the 'skb' pointer is null, we're in trouble. The pointer is dereferenced in the first line.

It should be noted, that it's not because of an unchecked pointer being dereferenced that the analyzer is angry with this code. There would be too many false positives that way. After all, it is impossible for a function argument to equal 0 sometimes, isn't it? And the check might have well been done somewhere before.

So, the logic of this diagnostic is different. PVS-Studio treats code as dangerous if a pointer is first dereferenced, and then checked. If there is a check for a pointer, then the programmer assumes it may equal 0. Therefore, a warning should be generated.

We're done with this simple example. But it was not that, which we were actually interested in.

Let's now pass on to a more complicated case related to compiler-driven optimizations.

static int podhd_try_init(struct usb_interface *interface,
        struct usb_line6_podhd *podhd)
{
  int err;
  struct usb_line6 *line6 = &podhd->line6;

  if ((interface == NULL) || (podhd == NULL))
    return -ENODEV;
  ....
}

PVS-Studio's diagnostic message: V595 The 'podhd' pointer was utilized before it was verified against nullptr. Check lines: 96, 98. podhd.c 96

This is an example of code people would start arguing about, claiming everything's alright. Their line of thought is the following.

Let the podhd pointer be equal to NULL. The &podhd->line6 expression doesn't look neat. But there's no error here. There's no memory addressing; it's just the address of one of the class members being calculated here. True, the value of the 'line6' pointer is incorrect - it points to "nowhere". But this pointer is not used, is it? An incorrect address was calculated, so what? There's a check a bit further in the code, so if 'podhd' is null, the function will terminate. The 'line6' pointer is not used anywhere, that's why no error will occur in reality.

Ladies and gentlemen, you are mistaken! You still can't do it that way. Don't be lazy about fixing code like this.

Here's the optimizing compiler's line of thought; the pointer is dereferenced here: podhd->line6. Aha, the programmer knows what he's doing. Then the pointer is surely not null here. Nice, I'll remember that.

And then the compiler stumbles across the following check:

if ((interface == NULL) || (podhd == NULL))
  return -ENODEV;

What does it now do? It optimizes it. It believes that the 'podhd' pointer doesn't equal zero. That's why it will reduce the check to the following code:

if ((interface == NULL))
  return -ENODEV;

Just like with memset(), working with the debug-version won't let you know that this check will be absent from the code, which makes this issue especially difficult to find.

As a result, if you pass a null pointer to the function, it will continue working instead of returning the (-ENODEV) status. The consequences of it are hard to predict.

The point here, is that the compiler can delete an important pointer check from a poorly written code. That is, there are functions which only pretend to check pointers. But in fact they will handle null pointers. I don't know if it can be exploited in any way, but I assume issues like this can be treated as potential vulnerabilities.

Another similar example:

int wpa_set_keys(struct vnt_private *pDevice, void *ctx,
     bool fcpfkernel) __must_hold(&pDevice->lock)
{
  ....
  if (is_broadcast_ether_addr(&param->addr[0]) ||
      (param->addr == NULL)) {
  ....
}

PVS-Studio's diagnostic message: V713 The pointer param->addr was utilized in the logical expression before it was verified against nullptr in the same logical expression. wpactl.c 333

While carrying out optimization, the compiler can reduce the check to the following code:

if (is_broadcast_ether_addr(&param->addr[0]))

The Linux kernel is large, so I got over 200 V595 warnings from the analyzer. Shame on me, I felt too lazy to look through all of them, and only picked one example for the article. All the other suspicious fragments are left for the developers to investigate. Here's the complete list: Linux-V595.txt.

True, not all of these warnings reveal genuine errors; in many cases, a pointer is absolutely sure not to be null. However, this list still ought to be checked. I bet there are a couple dozen genuine errors there.

Suspicious fragments

Perhaps not all the code fragments described in this article really contain errors. But they are quite strange and suspicious, and worthy of investigation by the developers.

Incorrect logic conditions

void b43legacy_phy_set_antenna_diversity(....)
{
  ....
  if (phy->rev >= 2) {
    b43legacy_phy_write(
      dev, 0x0461, b43legacy_phy_read(dev, 0x0461) | 0x0010);
    ....
  } else if (phy->rev >= 6)
    b43legacy_phy_write(dev, 0x049B, 0x00DC);
  ....
}

PVS-Studio's diagnostic message: V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { ... } else if (A < 2) { ... }. Check lines: 2147, 2162. phy.c 2162

The second condition will never be true. Let's simplify the code to make it clearer:

if ( A >= 2)
  X();
else if ( A >= 6)
  Y();

As you can see, there's no such value in the 'A' variable that could trigger the call of the Y() function.

Now let's examine other similar cases. They don't need to be commented on.

static int __init scsi_debug_init(void)
{
  ....
  if (scsi_debug_dev_size_mb >= 16)
    sdebug_heads = 32;
  else if (scsi_debug_dev_size_mb >= 256)
   sdebug_heads = 64;
  ....
}

PVS-Studio's diagnostic message: V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { ... } else if (A < 2) { ... }. Check lines: 3858, 3860. scsi_debug.c 3860

static ssize_t ad5933_store(....)
{
  ....
  /* 2x, 4x handling, see datasheet */
  if (val > 511)
    val = (val >> 1) | (1 << 9);
  else if (val > 1022)
    val = (val >> 2) | (3 << 9);
  ....
}

PVS-Studio's diagnostic message: V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { ... } else if (A < 2) { ... }. Check lines: 439, 441. ad5933.c 441

There are a couple of other issues of this kind, which I won't cite here in order to keep the article short:

  • V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { ... } else if (A < 2) { ... }. Check lines: 1417, 1422. bnx2i_hwi.c 1422
  • V695 Range intersections are possible within conditional expressions. Example: if (A < 5) { ... } else if (A < 2) { ... }. Check lines: 4815, 4831. stv090x.c 4831

Now let's examine another type of suspicious condition.

static int dgap_parsefile(char **in)
{
  ....
  int module_type = 0;
  ....
  module_type = dgap_gettok(in);
  if (module_type == 0 || module_type != PORTS ||
      module_type != MODEM) {
    pr_err("failed to set a type of module");
    return -1;
  }
  ....
}

PVS-Studio's diagnostic message: V590 Consider inspecting the 'module_type == 0 || module_type != 68' expression. The expression is excessive or contains a misprint. dgap.c 6733

I'm not familiar with the code, and I have no ideas on what this check should look like, so I won't make any comments on it. Here's another of the same kind:

  • V590 Consider inspecting the 'conc_type == 0 || conc_type != 65' expression. The expression is excessive or contains a misprint. dgap.c 6692

"Red-eye"

While studying the analyzer's messages, I came across a function named name_msi_vectors(). Although it is short, you absolutely don't feel like reading it. This is probably why it contains a very suspicious line.

static void name_msi_vectors(struct ipr_ioa_cfg *ioa_cfg)
{
  int vec_idx, n = sizeof(ioa_cfg->vectors_info[0].desc) - 1;

  for (vec_idx = 0; vec_idx < ioa_cfg->nvectors; vec_idx++) {
    snprintf(ioa_cfg->vectors_info[vec_idx].desc, n,
       "host%d-%d", ioa_cfg->host->host_no, vec_idx);
    ioa_cfg->vectors_info[vec_idx].
      desc[strlen(ioa_cfg->vectors_info[vec_idx].desc)] = 0;
  }
}

PVS-Studio's diagnostic message: V692 An inappropriate attempt to append a null character to a string. To determine the length of a string by 'strlen' function correctly, a string ending with a null terminator should be used in the first place. ipr.c 9409

It is the last line which I find strange:

ioa_cfg->vectors_info[vec_idx].
  desc[strlen(ioa_cfg->vectors_info[vec_idx].desc)] = 0;

Now I will omit it and you'll notice at once that something's not right here:

S[strlen(S)] = 0;

This statement is absolutely pointless. Zero will be written where it already is. I suspect the programmer wanted something else to happen.

Endless wait

static int ql_wait_for_drvr_lock(struct ql3_adapter *qdev)
{
  int i = 0;

  while (i < 10) {
    if (i)
      ssleep(1);

    if (ql_sem_lock(qdev,
        QL_DRVR_SEM_MASK,
        (QL_RESOURCE_BITS_BASE_CODE | (qdev->mac_index)
         * 2) << 1)) {
      netdev_printk(KERN_DEBUG, qdev->ndev,
              "driver lock acquired\n");
      return 1;
    }
  }

  netdev_err(qdev->ndev,
             "Timed out waiting for driver lock...\n");
  return 0;
}

PVS-Studio's diagnostic message: V654 The condition 'i < 10' of loop is always true. qla3xxx.c 149

The function is trying to lock the driver. If it fails, it waits for 1 second and tries again. There are total 10 attempts to make.

This number, however, will actually become infinite. The reason is that the 'i' variable is not incremented anywhere.

Incorrect error message

static int find_boot_record(struct NFTLrecord *nftl)
{
  ....
  if ((ret = nftl_read_oob(mtd, block * nftl->EraseSize +
                           SECTORSIZE + 8, 8, &retlen,
                           (char *)&h1) < 0) ) {
    printk(KERN_WARNING "ANAND header found at 0x%x in mtd%d, "
           "but OOB data read failed (err %d)\n",
           block * nftl->EraseSize, nftl->mbd.mtd->index, ret);
    continue;
  ....
}

PVS-Studio's diagnostic message: V593 Consider reviewing the expression of the 'A = B < C' kind. The expression is calculated as following: 'A = (B < C)'. nftlmount.c 92

Should an error occur, the function must print the information about it; including the error code. But it is actually (err 0) or (err 1) that will be printed instead of the real code error.

The reason is that the programmer made a mess of the operation precedence. He wanted to put the return result of the nftl_read_oob() function into the 'ret' variable at first; then he wanted to compare this variable to 0, and if (ret < 0) then get the error message printed.

In reality, it all works quite the other way. At first, the result of the nftl_read_oob() function is compared to 0. The comparison result is value 0 or 1. This value will be written into the 'ret' variable.

Thus, if the nftl_read_oob() function has returned a negative number, then ret == 1. The message will be printed alright, but it will be incorrect.

As you can see, additional parentheses are used in the condition. It's not known whether they were used to suppress the compiler warning about assignment inside 'if', or to explicitly specify the operation sequence. If the latter was meant, then we're dealing with a typo - a closing parenthesis is put in the wrong place. The correct way to write this code is as follows:

if ((ret = nftl_read_oob(mtd, block * nftl->EraseSize +
                         SECTORSIZE + 8, 8, &retlen,
                         (char *)&h1)) < 0 ) {

Potential typo

int wl12xx_acx_config_hangover(struct wl1271 *wl)
{
  ....
  acx->recover_time = cpu_to_le32(conf->recover_time);
  acx->hangover_period = conf->hangover_period;
  acx->dynamic_mode = conf->dynamic_mode;
  acx->early_termination_mode = conf->early_termination_mode;
  acx->max_period = conf->max_period;
  acx->min_period = conf->min_period;
  acx->increase_delta = conf->increase_delta;
  acx->decrease_delta = conf->decrease_delta;
  acx->quiet_time = conf->quiet_time;
  acx->increase_time = conf->increase_time;
  acx->window_size = acx->window_size;         // <=
  ....
}

PVS-Studio's diagnostic message: V570 The 'acx->window_size' variable is assigned to itself. acx.c 1728

All the fields of one structure are copied into fields of another structure, save for one:

acx->window_size = acx->window_size;

Is it an error? Correct code? I don't know.

Suspicious octal number

static const struct XGI330_LCDDataDesStruct2
  XGI_LVDSNoScalingDesData[] = {
  {0,  648,  448,  405,  96, 2}, /* 00 (320x200,320x400,
                                        640x200,640x400) */
  {0,  648,  448,  355,  96, 2}, /* 01 (320x350,640x350) */
  {0,  648,  448,  405,  96, 2}, /* 02 (360x400,720x400) */
  {0,  648,  448,  355,  96, 2}, /* 03 (720x350) */
  {0,  648,    1,  483,  96, 2}, /* 04 (640x480x60Hz) */
  {0,  840,  627,  600, 128, 4}, /* 05 (800x600x60Hz) */
  {0, 1048,  805,  770, 136, 6}, /* 06 (1024x768x60Hz) */
  {0, 1328,    0, 1025, 112, 3}, /* 07 (1280x1024x60Hz) */
  {0, 1438,    0, 1051, 112, 3}, /* 08 (1400x1050x60Hz)*/
  {0, 1664,    0, 1201, 192, 3}, /* 09 (1600x1200x60Hz) */
  {0, 1328,    0, 0771, 112, 6}  /* 0A (1280x768x60Hz) */
                  ^^^^
                  ^^^^
};

PVS-Studio's diagnostic message: V536 Be advised that the utilized constant value is represented by an octal form. Oct: 0771, Dec: 505. vb_table.h 1379

All the numbers in this structure are defined in decimal format. And suddenly there is one octal number: 0771. The analyzer didn't like it. Nor did I.

I suspect the programmer wrote this zero just for the column to look neatly even. But the value is obviously incorrect then.

Suspicious line

static void sig_ind(PLCI *plci)
{
  ....
  byte SS_Ind[] =
    "\x05\x02\x00\x02\x00\x00"; /* Hold_Ind struct*/
  byte CF_Ind[] =
    "\x09\x02\x00\x06\x00\x00\x00\x00\x00\x00";
  byte Interr_Err_Ind[] =
    "\x0a\x02\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00";
  byte CONF_Ind[] =
    "\x09\x16\x00\x06\x00\x00\0x00\0x00\0x00\0x00";
                              ^^^^^^^^^^^^^^^^^^^
  ....
}

PVS-Studio's diagnostic message: V638 A terminal null is present inside a string. The '\0x00' characters were encountered. Probably meant: '\x00'. message.c 4883

The arrays contain some magic numbers. What I don't like is the contents of the CONF_Ind[] array. It contains nulls together with the "x00" text. I think it's a typo, and actually this line should look as follows:

byte CONF_Ind[] =
  "\x09\x16\x00\x06\x00\x00\x00\x00\x00\x00";

That is, '0' before 'x' is excessive and was added by mistake. It results in the "x00" values being interpreted as text, not as character codes.

Suspicious code formatting

static int grip_xt_read_packet(....)
{
  ....
  if ((u ^ v) & 1) {
    buf = (buf << 1) | (u >> 1);
    t = strobe;
    i++;
  } else

  if ((((u ^ v) & (v ^ w)) >> 1) & ~(u | v | w) & 1) {
  ....
}

PVS-Studio's diagnostic message: V705 It is possible that 'else' block was forgotten or commented out, thus altering the program's operation logics. grip.c 152

I don't think there's an error here. But the code is terribly formatted - that's why I decided to include it in the article. Perhaps it should be checked just in case.

Undefined behavior in shift operations

static s32 snto32(__u32 value, unsigned n)
{
  switch (n) {
  case 8:  return ((__s8)value);
  case 16: return ((__s16)value);
  case 32: return ((__s32)value);
  }
  return value & (1 << (n - 1)) ? value | (-1 << n) : value;
}

PVS-Studio's diagnostic message: V610 Undefined behavior. Check the shift operator '<<. The left operand '-1' is negative. hid-core.c 1016

Shifting negative numbers causes undefined behavior. I wrote a lot on that and won't dwell on it now. Those unfamiliar with the issue, see the article "Wade not in unknown waters. Part three (about shift operators)".

I can anticipate objections like, "but it works!"

Well, it probably does. But I don't think the Linux kernel is the kind of software where one can rely on such an approach. The code should be rewritten.

There are quite a lot of shifts like that, so I collected them all in one file: Linux-V610.txt.

Mess with enum

There are two enum's in the code:

enum iscsi_param {
  ....
  ISCSI_PARAM_CONN_PORT,
  ISCSI_PARAM_CONN_ADDRESS,        // <=
  ....
};

enum iscsi_host_param {
  ISCSI_HOST_PARAM_HWADDRESS,
  ISCSI_HOST_PARAM_INITIATOR_NAME,
  ISCSI_HOST_PARAM_NETDEV_NAME,
  ISCSI_HOST_PARAM_IPADDRESS,       // <=
  ISCSI_HOST_PARAM_PORT_STATE,
  ISCSI_HOST_PARAM_PORT_SPEED,
  ISCSI_HOST_PARAM_MAX,
};

Note the constants ISCSI_PARAM_CONN_ADDRESS, and ISCSI_HOST_PARAM_IPADDRESS; they have similar names, and this is what I feel to be the source of the mess.

Take a look at the following code fragment:

int iscsi_conn_get_addr_param(
  struct sockaddr_storage *addr,
  enum iscsi_param param, char *buf)
{
  ....
  switch (param) {
  case ISCSI_PARAM_CONN_ADDRESS:
  case ISCSI_HOST_PARAM_IPADDRESS:        // <=
  ....
  case ISCSI_PARAM_CONN_PORT:
  case ISCSI_PARAM_LOCAL_PORT:
  ....
  default:
    return -EINVAL;
  }

  return len;
}

PVS-Studio's diagnostic message: V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. libiscsi.c 3501

The ISCSI_HOST_PARAM_IPADDRESS constant doesn't relate to enum iscsi_param. This is most likely a typo, and the ISCSI_PARAM_CONN_ADDRESS constant must be used instead.

Other similar PVS-Studio's messages:

  • V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. svm.c 1360
  • V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. vmx.c 2690
  • V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. request.c 2842
  • V556 The values of different enum types are compared: switch(ENUM_TYPE_A) { case ENUM_TYPE_B: ... }. request.c 2868

Strange loop

I can't show you the code fragment for this as it is pretty large, and I don't know how to reduce and make it nicely formatted. So here is a pseudo-code instead.

void pvr2_encoder_cmd ()
{
  do {
    ....
    if (A) break;
    ....
    if (B) break;
    ....
    if (C) continue;
    ....
    if (E) break;
    ....
  } while(0);
}

The loop is executed once. I suspect the programmer chose to implement it that way in order to avoid using the goto operator. If something goes wrong, the 'break' operator is called, and the program starts executing operators after the loop.

What embarrasses me is that there is the 'continue' operator instead of 'break' in one case. At the same time, it works as if it were 'break'. Let me explain the point.

Here's what the standard have to say about this:

§6.6.2 in the standard: "The continue statement (...) causes control to pass to the loop-continuation portion of the smallest enclosing iteration-statement, that is, to the end of the loop." (Not to the beginning.)

Thus, the (0) condition will be checked after calling the 'continue' operator, and the loop will terminate as the condition is false.

There are 2 possible explanations.

  • The code is correct. The 'continue' operator is indeed meant to terminate the loop. If this is the case, I recommend replacing it with 'break' for the sake of uniformity, and in order not to confuse developers who will maintain the code in the future.
  • The 'continue' operator is meant to resume the loop. Then the code is incorrect, and should be rewritten.

Copy-Paste error

void dm_change_dynamic_initgain_thresh(
  struct net_device *dev, u32 dm_type, u32 dm_value)
{
  ....
  if (dm_type == DIG_TYPE_THRESH_HIGH)
  {
    dm_digtable.rssi_high_thresh = dm_value;
  }
  else if (dm_type == DIG_TYPE_THRESH_LOW)
  {
    dm_digtable.rssi_low_thresh = dm_value;
  }
  else if (dm_type == DIG_TYPE_THRESH_HIGHPWR_HIGH)      // <=
  {                                                      // <=
    dm_digtable.rssi_high_power_highthresh = dm_value;   // <=
  }                                                      // <=
  else if (dm_type == DIG_TYPE_THRESH_HIGHPWR_HIGH)      // <=
  {                                                      // <=
    dm_digtable.rssi_high_power_highthresh = dm_value;   // <=
  }                                                      // <=
  ....
}

PVS-Studio's diagnostic message: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 1755, 1759. r8192U_dm.c 1755

The code was written through the Copy-Paste technique, and in one place the programmer forgot to replace:

  • DIG_TYPE_THRESH_HIGHPWR_HIGH with DIG_TYPE_THRESH_HIGHPWR_LOW
  • rssi_high_power_highthresh with rssi_high_power_lowthresh

Also, I'd like the developers to pay attention to the following fragments:

  • V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 1670, 1672. rtl_dm.c 1670
  • V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 530, 533. ioctl.c 530

Re-initialization

There are strange fragments where a variable is assigned different values twice on end. I guess these places should be examined.

static int saa7164_vbi_fmt(struct file *file, void *priv,
                           struct v4l2_format *f)
{
  /* ntsc */
  f->fmt.vbi.samples_per_line = 1600;           // <=
  f->fmt.vbi.samples_per_line = 1440;           // <=
  f->fmt.vbi.sampling_rate = 27000000;
  f->fmt.vbi.sample_format = V4L2_PIX_FMT_GREY;
  f->fmt.vbi.offset = 0;
  f->fmt.vbi.flags = 0;
  f->fmt.vbi.start[0] = 10;
  f->fmt.vbi.count[0] = 18;
  f->fmt.vbi.start[1] = 263 + 10 + 1;
  f->fmt.vbi.count[1] = 18;
  return 0;
}

PVS-Studio's diagnostic message: V519 The 'f->fmt.vbi.samples_per_line' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1001, 1002. saa7164-vbi.c 1002

static int saa7164_vbi_buffers_alloc(struct saa7164_port *port)
{
  ....
  /* Init and establish defaults */
  params->samplesperline = 1440;
  params->numberoflines = 12;                           // <=
  params->numberoflines = 18;                           // <=
  params->pitch = 1600;                                 // <=
  params->pitch = 1440;                                 // <=
  params->numpagetables = 2 +
    ((params->numberoflines * params->pitch) / PAGE_SIZE);
  params->bitspersample = 8;
   ....
}

PVS-Studio's diagnostic messages:

  • V519 The 'params->numberoflines' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 118, 119. saa7164-vbi.c 119
  • V519 The 'params->pitch' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 120, 121. saa7164-vbi.c 121

Conclusion

Errors can be found in any large project. The Linux kernel is no exception. However, running a static analyzer for occasional checks, is not the right way of using it. True, they can help you write a promo article like this, but they are of little use to the project.

Use static analysis regularly, and it will help you save plenty of time by detecting a number of errors almost as soon as they have been introduced into the code. Protect your project from bugs with a static analyzer!

Anyone interested is welcome to try PVS-Studio on their projects. The analyzer runs under Windows. If you want to use it in development of large Linux applications, write to us, and we'll discuss possible options for drawing a contract on adapting PVS-Studio for your projects and tasks.