Evgeniy Ovsyannikov

Sep 03 2020

Tags:

#Cpp

Checking QEMU using PVS-Studio

Sep 03 2020

Author: Evgeniy Ovsyannikov

About the analysis
Conclusion

QEMU is a rather well-known application for emulation. Static analysis can help developers of complex projects such as QEMU catch errors at early stages and generally improve quality and reliability of a project. In this article, we will check the source code of the QEMU application for potential vulnerabilities and errors using the PVS-Studio static analysis tool.

QEMU is free software designed to emulate the hardware of various platforms. It allows you to run applications and operating systems on hardware platforms that are different from target ones. For example, it is possible to run an application written for MIPS for the x86 architecture. QEMU also supports emulation of various peripherals, such as video cards, usb, etc. The project is quite complex and worthy of attention. Such projects are of interest in terms of static analysis, so we decided to scan its code using PVS-Studio.

About the analysis

The source code of the project can be obtained from the mirror on github. The project is quite large and can be compiled for various platforms. For easier code checking, let's use the PVS-Studio compilation monitoring system. This system is designed for very simple integration of static analysis into almost any build platform. The system is based on tracking compiler calls during the build and allows you to collect all the information for later files analysis. In other words, we just run the build, PVS-Studio collects the necessary information, and then we run the analysis - everything is simple. Details can be found by the link above.

After checking, the analyzer found a lot of potential problems. As for diagnostics related to general analysis, we got: 1940 diagnostics of High level, 1996 - Medium level, 9596 - Low level. After viewing all the warnings, I decided to focus on the diagnostics of the High level of certainty. There were quite a few such warnings (1940), but most of them are either of the same type or are associated with repeated use of a suspicious macro. For example, let's take a look at the g_new macro.

#define g_new(struct_type, n_structs)
                        _G_NEW (struct_type, n_structs, malloc)

#define _G_NEW(struct_type, n_structs, func)       \
  (struct_type *) (G_GNUC_EXTENSION ({             \
    gsize __n = (gsize) (n_structs);               \
    gsize __s = sizeof (struct_type);              \
    gpointer __p;                                  \
    if (__s == 1)                                  \
      __p = g_##func (__n);                        \
    else if (__builtin_constant_p (__n) &&         \
             (__s == 0 || __n <= G_MAXSIZE / __s)) \
      __p = g_##func (__n * __s);                  \
    else                                           \
      __p = g_##func##_n (__n, __s);               \
    __p;                                           \
  }))

For each use of this macro, the analyzer issues the V773 warning (Visibility scope of the '__p' pointer was exited without releasing the memory. A memory leak is possible). The g_new macro is defined in the glib library, it uses the _g_new macro, and this macro in turn uses another G_GNUC_EXTENSION macro that tells the GCC compiler to skip warnings about non-standard code. It is this non-standard code that triggers the analyzer's warning. Just look as the last but one line of code. In fact, the macro is valid. There were 848 warnings of this type, which means that almost half of the warnings occur in just one single place in the code.

All these unnecessary warnings can be easily removed using the analyzer settings. However, this particular case, which occurred when writing the article, is the reason for our team to slightly refine the logic of the analyzer for such situations.

Thus, a large number of warnings doesn't always indicate poor code quality. However, there are some really suspicious places. Well, let's get down to reviewing the warnings.

Warning N1

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 2395, 2397. megasas.c 2395

#define MEGASAS_MAX_SGE 128             /* Firmware limit */
....
static void megasas_scsi_realize(PCIDevice *dev, Error **errp)
{
  ....
  if (s->fw_sge >= MEGASAS_MAX_SGE - MFI_PASS_FRAME_SIZE) {
    ....
  } else if (s->fw_sge >= 128 - MFI_PASS_FRAME_SIZE) {
    ....
  }
  ....
}

Any use of "magic" numbers in the code is always suspicious. There are two conditions here, and at a first glance, they seem different, but if you look at the value of the MEGASAS_MAX_SGE macro, it turns out that the conditions duplicate each other. Most likely, there is a typo and a different number should be written instead of 128. Sure, this is the problem with all "magic" numbers, one can easily mistype them. Using macros and constants will help a developer a lot in this case.

Warning N2

V523 The 'then' statement is equivalent to the 'else' statement. cp0_helper.c 383

target_ulong helper_mftc0_cause(CPUMIPSState *env)
{
  ....
  CPUMIPSState *other = mips_cpu_map_tc(env, &other_tc);

  if (other_tc == other->current_tc) {
    tccause = other->CP0_Cause;
  } else {
    tccause = other->CP0_Cause;
  }
  ....
}

In the code above, then and else bodies of the if statement are identical. Most likely, it's copy-paste. The author just copied the body of then branch, and forgot to fix it. As far as I can see, env should have been used instead of the other object. Fixing this suspicious place can look like this:

if (other_tc == other->current_tc) {
  tccause = other->CP0_Cause;
} else {
  tccause = env->CP0_Cause;
}

Only the developers of this code can clearly say how it should actually be. Another similar fragment:

V523 The 'then' statement is equivalent to the 'else' statement. translate.c 641

Warning N3

V547 Expression 'ret < 0' is always false. qcow2-cluster.c 1557

static int handle_dependencies(....)
{
  ....
  if (end <= old_start || start >= old_end) {
    ....
  } else {

    if (bytes == 0 && *m) {
      ....
      return 0;           // <= 3
    }

    if (bytes == 0) {
      ....
      return -EAGAIN;     // <= 4
    }
  ....
  }
  return 0;               // <= 5
}

int qcow2_alloc_cluster_offset(BlockDriverState *bs, ....)
{
  ....
  ret = handle_dependencies(bs, start, &cur_bytes, m);
  if (ret == -EAGAIN) {   // <= 2
    ....
  } else if (ret < 0) {   // <= 1
    ....
  }
}

Here, the analyzer found that the condition (comment 1) will never be met. The value of the ret variable is initialized by the result of executing the handle_dependencies function. This function returns only 0 or -EAGAIN (comments 3, 4, 5). Just above, in the first condition, we checked the value of ret for -EAGAIN (comment 2), so the result of executing the expression ret < 0 will always be false. It is possible that the handle_dependencies function used to return other values, but then, as a result of refactoring, for example, the behavior changed. Here one just has to complete the refactoring. Similar warnings:

V547 Expression is always false. qcow2.c 1070
V547 Expression 's->state != MIGRATION_STATUS_COLO' is always false. colo.c 595
V547 Expression 's->metadata_entries.present & 0x20' is always false. vhdx.c 769

Warning N4

V557 Array overrun is possible. The 'dwc2_glbreg_read' function processes value '[0..63]'. Inspect the third argument. Check lines: 667, 1040. hcd-dwc2.c 667

#define HSOTG_REG(x) (x)                                             // <= 5
....
struct DWC2State {
  ....
#define DWC2_GLBREG_SIZE    0x70
  uint32_t glbreg[DWC2_GLBREG_SIZE / sizeof(uint32_t)];              // <= 1
  ....
}
....
static uint64_t dwc2_glbreg_read(void *ptr, hwaddr addr, int index,
                                 unsigned size)
{
  ....
  val = s->glbreg[index];                                            // <= 2
  ....
}
static uint64_t dwc2_hsotg_read(void *ptr, hwaddr addr, unsigned size)
{
  ....
  switch (addr) {
    case HSOTG_REG(0x000) ... HSOTG_REG(0x0fc):                      // <= 4
        val = dwc2_glbreg_read(ptr, addr,
                              (addr - HSOTG_REG(0x000)) >> 2, size); // <= 3
    ....
  }
  ....
}

This code has a potential problem - an index outside the array bounds. The DWC2State structure defines a glbreg array consisting of 28 elements (comment 1). In the dwc2_glbreg_read function, our array is accessed by index (comment 2). Now note that the function dwc2_glbreg_read is passed the expression (addr - HSOTG_REG(0x000)) >> 2 (comment 3) as an index, which can take a value in the range [0..63]. To make sure of it, pay attention to comments 4 and 5. Perhaps, the range of values from comment 4 has to be fixed.

More similar warnings:

V557 Array overrun is possible. The 'dwc2_hreg0_read' function processes value '[0..63]'. Inspect the third argument. Check lines: 814, 1050. hcd-dwc2.c 814
V557 Array overrun is possible. The 'dwc2_hreg1_read' function processes value '[0..191]'. Inspect the third argument. Check lines: 927, 1053. hcd-dwc2.c 927
V557 Array overrun is possible. The 'dwc2_pcgreg_read' function processes value '[0..127]'. Inspect the third argument. Check lines: 1012, 1060. hcd-dwc2.c 1012

Warning N5

V575 The 'strerror_s' function processes '0' elements. Inspect the second argument. commands-win32.c 1642

void qmp_guest_set_time(bool has_time, int64_t time_ns, 
                        Error **errp)
{
  ....
  if (GetLastError() != 0) {
    strerror_s((LPTSTR) & msg_buffer, 0, errno);
    ....
  }
}

The strerror_s function returns the text description of the system error code. Its signature looks like this:

errno_t strerror_s( char *buf, rsize_t bufsz, errno_t errnum );

The first parameter is a pointer to the buffer where the text description will be copied, the second parameter is the buffer size, and the third parameter - the error code. The code passes 0 as the buffer size, which is clearly an incorrect value. By the way, it is possible to find out in advance how many bytes to allocate: one just needs to call strerrorlen_s, which returns the length of the error text description. This value can be used to allocate a buffer of sufficient size.

Warning N6

V595 The 'blen2p' pointer was utilized before it was verified against nullptr. Check lines: 103, 106. dsound_template.h 103

static int glue (
    ....
    DWORD *blen1p,
    DWORD *blen2p,
    int entire,
    dsound *s
    )
{
  ....
  dolog("DirectSound returned misaligned buffer %ld %ld\n",
        *blen1p, *blen2p);                         // <= 1
  glue(.... p2p ? *p2p : NULL, *blen1p,
                            blen2p ? *blen2p : 0); // <= 2
....
}

In this code, the value of the blen2p argument is first used (comment 1), and then checked for nullptr (comment 2). This extremely suspicious place looks as if one just forgot to insert a check before the first use (comment 1). As a correction option, one can just add a check:

dolog("DirectSound returned misaligned buffer %ld %ld\n",
      *blen1p, blen2p ? *blen2p : 0);

There is also a question about the blen1p argument. It can probably also be a null pointer, and you will also need to add a check here. A few more similar warnings:

V595 The 'ref' pointer was utilized before it was verified against nullptr. Check lines: 2191, 2193. uri.c 2191
V595 The 'cmdline' pointer was utilized before it was verified against nullptr. Check lines: 420, 425. qemu-io.c 420
V595 The 'dp' pointer was utilized before it was verified against nullptr. Check lines: 288, 294. onenand.c 288
V595 The 'omap_lcd' pointer was utilized before it was verified against nullptr. Check lines: 81, 87. omap_lcdc.c 81

Warning N7

V597 The compiler could delete the 'memset' function call, which is used to flush 'op_info' object. The RtlSecureZeroMemory() function should be used to erase the private data. virtio-crypto.c 354

static void virtio_crypto_free_request(VirtIOCryptoReq *req)
{
  if (req) {
    if (req->flags == CRYPTODEV_BACKEND_ALG_SYM) {
      ....
      /* Zeroize and free request data structure */
      memset(op_info, 0, sizeof(*op_info) + max_len); // <= 1
      g_free(op_info);
    }
    g_free(req);
  }
}

In this code fragment, the memset function is called for the op_info object (comment 1). After that, op_info is immediately deleted. In other words, after clearing, this object isn't modified anywhere else. This is exactly the case when the compiler can delete the memset call during optimization. To avoid this potential behavior, you can use special functions that the compiler never deletes. See also the article "Safe clearing of private data".

Warning N8

V610 Unspecified behavior. Check the shift operator '>>'. The left operand is negative ('number' = [-32768..2147483647]). cris.c 2111

static void
print_with_operands (const struct cris_opcode *opcodep,
         unsigned int insn,
         unsigned char *buffer,
         bfd_vma addr,
         disassemble_info *info,
         const struct cris_opcode *prefix_opcodep,
         unsigned int prefix_insn,
         unsigned char *prefix_buffer,
         bfd_boolean with_reg_prefix)
{
  ....
  int32_t number;
  ....
  if (signedp && number > 127)
    number -= 256;            // <= 1
  ....
  if (signedp && number > 32767)
    number -= 65536;          // <= 2
  ....
  unsigned int highbyte = (number >> 24) & 0xff;
  ....
}

Since the number variable can have a negative value, a bitwise shift to the right is an unspecified behavior. To make sure that the variable in question can take a negative value, look at comments 1 and 2. To eliminate differences in the behavior of your code on different platforms, you should avoid such cases.

More warnings:

V610 Undefined behavior. Check the shift operator '<<'. The left operand is negative ('(hclk_div - 1)' = [-1..15]). aspeed_smc.c 1041
V610 Undefined behavior. Check the shift operator '<<'. The left operand '(target_long) - 1' is negative. exec-vary.c 99
V610 Undefined behavior. Check the shift operator '<<'. The left operand is negative ('hex2nib(words[3][i * 2 + 2])' = [-1..15]). qtest.c 561

There are also several warnings of the same type, the difference is that the left operand is -1.

V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. hppa.c 2702

int print_insn_hppa (bfd_vma memaddr, disassemble_info *info)
{
  ....
  disp = (-1 << 10) | imm10;
  ....
}

Other similar warnings:

V610 Undefined behavior. Check the shift operator '<<'. The left operand '-1' is negative. hppa.c 2718
V610 Undefined behavior. Check the shift operator '<<'. The left operand '-0x8000' is negative. fmopl.c 1022
V610 Undefined behavior. Check the shift operator '<<'. The left operand '(intptr_t) - 1' is negative. sve_helper.c 889

Warning N9

V616 The 'TIMER_NONE' named constant with the value of 0 is used in the bitwise operation. sys_helper.c 179

#define HELPER(name) ....

enum {
  TIMER_NONE = (0 << 30),        // <= 1
  ....
}

void HELPER(mtspr)(CPUOpenRISCState *env, ....)
{
  ....
  if (env->ttmr & TIMER_NONE) {  // <= 2
    ....
  }
}

You can easily make sure that the value of the TIMER_NONE macro is zero (comment 1). This macro is then used in a bitwise operation, the result of which is always 0. As a result, the body of the conditional if statement if (env->ttmr & TIMER_NONE) will never be executed.

Warning N10

V629 Consider inspecting the 'n << 9' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. qemu-img.c 1839

#define BDRV_SECTOR_BITS   9
static int coroutine_fn convert_co_read(ImgConvertState *s, 
                  int64_t sector_num, int nb_sectors, uint8_t *buf)
{
  uint64_t single_read_until = 0;
  int n;
  ....
  while (nb_sectors > 0) {
    ....
    uint64_t offset;
    ....
    single_read_until = offset + (n << BDRV_SECTOR_BITS);
    ....
  }
  ....
}

In this code fragment, the n variable of the 32-bit signed type is shifted, then this 32-bit signed result is expanded to a 64-bit signed type. After that this result is added to the offset unsigned 64-bit variable as an unsigned type. Let's assume that at the time of executing the expression, the variable n has some significant high 9 bits. We perform a 9-bit shift operation (BDRV_SECTOR_BITS), and this, in turn, is undefined behavior, then we can get the set bit in the highest order as a result. Let me quickly remind you that this bit in the signed type is responsible for the sign, so the result can become negative. Since the n variable is of the signed type, the extension will take the sign into account. Further, the result is added to the offset variable. From these considerations, it is not difficult to see that the result of executing an expression may differ from the intended one. One possible solution is to replace the type of the n variable with a 64-bit unsigned type, i.e. uint64_t.

Here are other similar warnings:

V629 Consider inspecting the '1 << refcount_order' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. qcow2.c 3204
V629 Consider inspecting the 's->cluster_size << 3' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. qcow2-bitmap.c 283
V629 Consider inspecting the 'i << s->cluster_bits' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. qcow2-cluster.c 983
V629 Consider inspecting the expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. vhdx.c 1145
V629 Consider inspecting the 'delta << 2' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. mips.c 4341

Warning N11

V634 The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. nand.c 310

static void nand_command(NANDFlashState *s)
{
  ....
  s->addr &= (1ull << s->addrlen * 8) - 1;
  ....
}

This fragment is simply suspicious. It is not clear what the developer wanted to do first: shift or multiplication. Even if there is no error here, one still needs to look at the code again and put the parentheses correctly. This is just one of the places that developers should check out to make sure that their algorithm is correct. Other such fragments:

V634 The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. exynos4210_mct.c 449
V634 The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. exynos4210_mct.c 1235
V634 The priority of the '*' operation is higher than that of the '<<' operation. It's possible that parentheses should be used in the expression. exynos4210_mct.c 1264

Warning N12

V646 Consider inspecting the application's logic. It's possible that 'else' keyword is missing. pl181.c 400

static void pl181_write(void *opaque, hwaddr offset,
                        uint64_t value, unsigned size)
{
  ....
  if (s->cmd & PL181_CMD_ENABLE) {
    if (s->cmd & PL181_CMD_INTERRUPT) {
      ....
    } if (s->cmd & PL181_CMD_PENDING) { // <= else if
      ....
    } else {
      ....
    }
    ....
  }
  ....
}

In this code, judging by the formatting, the use of else if instead of if seems most attractive. Perhaps the author forgot to add else here. This way, the fragment can be fixed as follows:

} else if (s->cmd & PL181_CMD_PENDING) { // <= else if

However, there is a chance that this code is all right, and there is incorrect formatting of the program text, which is confusing. Then the correct code could look like this:

if (s->cmd & PL181_CMD_INTERRUPT) {
  ....
}
if (s->cmd & PL181_CMD_PENDING) { // <= if
  ....
} else {
  ....
}

Warning N13

V773 The function was exited without releasing the 'rule' pointer. A memory leak is possible. blkdebug.c 218

static int add_rule(void *opaque, QemuOpts *opts, Error **errp)
{
  ....
  struct BlkdebugRule *rule;
  ....
  rule = g_malloc0(sizeof(*rule));                   // <= 1
  ....
  if (local_error) {
    error_propagate(errp, local_error);
    return -1;                                       // <= 2
  }
  ....
  /* Add the rule */
  QLIST_INSERT_HEAD(&s->rules[event], rule, next);   // <= 3
  ....
}

In this code, the rule object is allocated (comment 1) and added to the list for later use (comment 3), but in case of an error, the function returns without deleting the previously created rule object (comment 2). The error just has to be handled correctly: one can delete the previously created object, otherwise there will be a memory leak.

Warning N14

V781 The value of the 'ix' index is checked after it was used. Perhaps there is a mistake in program logic. uri.c 2110

char *uri_resolve_relative(const char *uri, const char *base)
{
  ....
  ix = pos;
  if ((ref->path[ix] == '/') && (ix > 0)) {
  ....
}

Here, the analyzer detected a potential array index out of bounds. First, the ref->path array element is read by the ix index, and then ix is checked for correctness (ix > 0). The right solution here is to reverse these actions:

if ((ix > 0) && (ref->path[ix] == '/')) {

There were several such places:

V781 The value of the 'ix' index is checked after it was used. Perhaps there is a mistake in program logic. uri.c 2112
V781 The value of the 'offset' index is checked after it was used. Perhaps there is a mistake in program logic. keymaps.c 125
V781 The value of the 'quality' variable is checked after it was used. Perhaps there is a mistake in program logic. Check lines: 326, 335. vnc-enc-tight.c 326
V781 The value of the 'i' index is checked after it was used. Perhaps there is a mistake in program logic. mem_helper.c 1929

Warning N15

V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. cadence_gem.c 1486

typedef struct CadenceGEMState {
  ....
  uint32_t regs_ro[CADENCE_GEM_MAXREG];
}
....
static void gem_write(void *opaque, hwaddr offset, uint64_t val,
        unsigned size)
{
  ....
  val &= ~(s->regs_ro[offset]);
  ....
}

This code performs a bitwise operation with objects of different types. The left operand is the val argument that has a 64-bit unsigned type. The right operand is the received value of the array element s->regs_ro by the offset index, which has a 32-bit unsigned type. The result of the operation in the right side (~(s->regs_ro[offset])) is a 32-bit unsigned type. Before bitwise multiplication it will expand into the 64-bit type with zeros, that is, after evaluating the entire expression, all the higher bits of the val variable will be reset to zero. These places always look dubious. Here we can only recommend that developers review this code again. More similar fragments:

V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. xlnx-zynq-devcfg.c 199
V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. soc_dma.c 214
V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. fpu_helper.c 418

Warning N16

V1046 Unsafe usage of the 'bool' and 'unsigned int' types together in the operation '&='. helper.c 10821

static inline uint32_t extract32(uint32_t value, int start, int length);
....
static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                          ARMMMUIdx mmu_idx)
{
  ....
  bool epd, hpd;
  ....
  hpd &= extract32(tcr, 6, 1);
}

In this code snippet, a bitwise AND operation is performed with the hpd variable, which has the bool type, and with the result of executing the extract32 function, which is of the uint32_t type. Since the bit value of a boolean variable can only be 0 or 1, the result of the expression will always be false if the lowest bit returned by the extract32 function is zero. Let's consider this case using the example. Let's assume that the hpd value is true, and the function returns the value 2. So in the binary representation, the operation will look like 01 & 10 = 0, and the result of the expression will be false. Most likely, the programmer wanted to set the true value if the function returns something other than zero. Apparently, one has to fix the code so that the result of executing the function is cast to the bool type, for example, like this:

hpd = hpd && (bool)extract32(tcr, 6, 1);

Conclusion

As you can see, the analyzer found a lot of hinky places. It is possible that these potential problems so far do not manifest themselves in any way, but their presence can not but worry, since they are able to reveal themselves at the most unexpected moment. It's better to view all iffy places in advance and tweak them than keep fixing an endless flow of bugs. Obviously, for complex projects like this, static analysis can bring significant benefits, especially if you organize regular checks of the project.