>
>
>
About embedded again: searching for bug…

George Gribkov
Articles: 20

About embedded again: searching for bugs in the Embox project

Embox is a cross-platform, multi-tasking real-time operating system for embedded systems. It is designed to work with limited computing resources and allows you to run Linux-based applications on microcontrollers without using Linux itself. Certainly, the same as other applications, Embox couldn't escape from bugs. This article is devoted to the analysis of errors found in the code of the Embox project.

A few months ago, I already wrote an article about checking FreeRTOS, another OS for embedded systems. I did not find errors in it then, but I found them in libraries added by the guys from Amazon when developing their own version of FreeRTOS.

The article that you are reading at the moment, in some way continues the topic of the previous one. We often received requests to check FreeRTOS, and we did it. This time, there were no requests to check a specific project, but I began to receive emails and comments from embedded developers who liked the previous review, and wanted more of them.

Well, the new publication of the column "PVS-Studio Embedded" is completed and is right in front of you. Enjoy reading!

The analysis procedure

The analysis was carried out using PVS-Studio - the static code analyzer for C, C++, C#, and Java. Before the analysis, the project needs to be built - this way we will be sure that the project code is working, and we will also give the analyzer the opportunity to collect the built information that can be useful for better code checking.

The instructions in the official Embox repository offer the ability to build under different systems (Arch Linux, macOS, Debian) and using Docker. I decided to add some variety to my life - to build and analyze the project under Debian, which I've recently installed on my virtual machine.

The build went smoothly. Now I had to move on to the analysis. Debian is one of the Linux-based systems supported by PVS-Studio. A convenient way to check projects under Linux is to trace compiler runs. This is a special mode in which the analyzer collects all the necessary information about the build so that you can then start the analysis with one click. All I had to do was:

1) Download and install PVS-Studio;

2) Launch the build tracking by going to the folder with Embox and typing in the terminal

pvs-studio-analyzer analyze -- make

3) After waiting for the build to complete, run the command:

pvs-studio-analyzer analyze -o /path/to/output.log

4) Convert the raw report to any convenient format The analyzer comes with a special utility PlogConverter, with which you can do this. For example, the command to convert the report to task list (for viewing, for example, in QtCreator) will look like this:

plog-converter -t tasklist -o /path/to/output.tasks /path/to/project

And that's it! It took me no more than 15 minutes to complete these steps. The report is ready, now you can view the errors. So let's get going!

Strange loop

One of the errors found by the analyzer was the strange while loop:

int main(int argc, char **argv) {
  ....

  while (dp.skip != 0 ) {
    n_read = read(ifd, tbuf, dp.bs);
    if (n_read < 0) {
      err = -errno;
      goto out_cmd;
    }
    if (n_read == 0) {
      goto out_cmd;
    }

    dp.skip --;
  } while (dp.skip != 0);       // <=

  do {
    n_read = read(ifd, tbuf, dp.bs);
    if (n_read < 0) {
      err = -errno;
      break;
    }

    if (n_read == 0) {
      break;
    }

    ....

    dp.count --;
  } while (dp.count != 0);
  ....
}

PVS-Studio warning: V715 The 'while' operator has empty body. Suspicious pattern detected: 'while (expr) {...} while (dp.skip != 0) ;'. dd.c 225

Hm. A weird loop indeed. The expression while (dp.skip != 0) is written twice, once right above the loop, and the second time - just below it. In fact, now these are two different loops: one contains expressions in curly braces, and the second one is empty. In this case, the second loop will never be executed.

Below is a do ... while loop with a similar condition, which leads me to think: the strange loop was originally meant as do ... while, but something went wrong. I think, this piece of code most likely contains a logical error.

Memory leaks

Yes, they also sneaked in a plug.

int krename(const char *oldpath, const char *newpath) {
  
  char *newpatharg, *oldpatharg;

  ....

  oldpatharg =
    calloc(strlen(oldpath) + diritemlen + 2, sizeof(char));
  newpatharg =
    calloc(strlen(newpath) + diritemlen + 2, sizeof(char));
  if (NULL == oldpatharg || NULL == newpatharg) {
    SET_ERRNO(ENOMEM);
    return -1;
  }

  ....
}

PVS-Studio warnings:

  • V773 The function was exited without releasing the 'newpatharg' pointer. A memory leak is possible. kfsop.c 611
  • V773 The function was exited without releasing the 'oldpatharg' pointer. A memory leak is possible. kfsop.c 611

The function creates the local variables newpatharg and oldpatharg inside itself. These pointers are assigned the addresses of new memory locations allocated internally using calloc. If a problem occurs while allocating memory, calloc returns a null pointer.

What if only one block of memory can be allocated? The function will crash without any memory being freed. The fragment that happened to be allocated will remain in memory without any opportunity to access it again and free it for further use.

Another example of a memory leak, a more illustrative one:

static int block_dev_test(....) {
  int8_t *read_buf, *write_buf;
  
  ....

  read_buf = malloc(blk_sz * m_blocks);
  write_buf = malloc(blk_sz * m_blocks);

  if (read_buf == NULL || write_buf == NULL) {
    printf("Failed to allocate memory for buffer!\n");

    if (read_buf != NULL) {
      free(read_buf);
    }

    if (write_buf != NULL) {
      free(write_buf);
    }

    return -ENOMEM;
  }

  if (s_block >= blocks) {
    printf("Starting block should be less than number of blocks\n");
    return -EINVAL;            // <=
  }

  ....
}

PVS-Studio warnings:

  • V773 The function was exited without releasing the 'read_buf' pointer. A memory leak is possible. block_dev_test.c 195
  • V773 The function was exited without releasing the 'write_buf' pointer. A memory leak is possible. block_dev_test.c 195

Here the programmer has shown neatness and correctly processed the case in which only one piece of memory was allocated. Processed correctly ... and literally in the next expression made another mistake.

Thanks to a correctly written check, we can be sure that at the time the return -EINVAL expression is executed, we will definitely have memory allocated for both read_buf and write_buf. Thus, with such a return from the function, we will have two leaks at once.

I think that getting a memory leak on an embedded device can be more painful than on a classic PC. In conditions when resources are severely limited, you need to monitor them especially carefully.

Pointers mishandling

The following erroneous code is concise and simple enough:

static int scsi_write(struct block_dev *bdev, char *buffer,
size_t count, blkno_t blkno) {
  struct scsi_dev *sdev;
  int blksize;

  ....

  sdev = bdev->privdata;
  blksize = sdev->blk_size; // <=

  if (!sdev) {              // <=
    return -ENODEV;
  }

  ....
}

PVS-Studio warning: V595 The 'sdev' pointer was utilized before it was verified against nullptr. Check lines: 116, 118. scsi_disk.c 116

The sdev pointer is dereferenced just before it is checked for NULL. It is logical to assume that if someone wrote such a check, then this pointer may be null. In this case, we have the potential dereferencing of the null pointer in the line blksize = sdev->blk_size.

The error is that the check is not located where it is needed. It should have come after the line"sdev = bdev->privdata;", but before the line "blksize = sdev->blk_size;". Then potential accessing by the null address could be avoided.

PVS-Studio found two more errors in the following code:

void xdrrec_create(....)
{
  char *buff;

  ....

  buff = (char *)malloc(sendsz + recvsz);
  assert(buff != NULL);

  ....

  xs->extra.rec.in_base = xs->extra.rec.in_curr = buff;
  xs->extra.rec.in_boundry 
    = xs->extra.rec.in_base + recvsz;                    // <=

  ....
  xs->extra.rec.out_base
    = xs->extra.rec.out_hdr = buff + recvsz;             // <= 
  xs->extra.rec.out_curr 
    = xs->extra.rec.out_hdr + sizeof(union xdrrec_hdr);

  ....
}

PVS-Studio warnings:

  • V769 The 'xs->extra.rec.in_base' pointer in the 'xs->extra.rec.in_base + recvsz' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 56, 48. xdr_rec.c 56
  • V769 The 'buff' pointer in the 'buff + recvsz' expression could be nullptr. In such case, resulting value will be senseless and it should not be used. Check lines: 61, 48. xdr_rec.c 61

The buf pointer is initialized with malloc, and then its value is used to initialize other pointers. The malloc function can return a null pointer, and this should always be checked. One would think, that there is the assert checking buf for NULL, and everything should work fine.

But not so fast! The fact is that asserts are used for debugging, and when building the project in the Release configuration, this assert will be deleted. It turns out that when working in Debug, the program will work correctly, and when building in Release, the null pointer will get further.

Using NULL in arithmetic operations is incorrect, because the result of such an operation will not make any sense, and you can't use such a result. This is what the analyzer warns us about.

Someone may object that the absence of the check after malloc/realloc/calloc is not crucial. Meaning that, at the first access by a null pointer, a signal / exception will occur and nothing scary will happen. In practice, everything is much more complicated. If the lack of the check does not seem dangerous to you, I suggest that you check out the article "Why it is important to check what the malloc function returned".

Incorrect handling of arrays

The following error is very similar to the example before last:

int fat_read_filename(struct fat_file_info *fi,
                      void *p_scratch,
                      char *name) {
  int offt = 1;

  ....

  offt = strlen(name);
  while (name[offt - 1] == ' ' && offt > 0) { // <=
    name[--offt] = '\0';
  }
  log_debug("name(%s)", name);

  return DFS_OK;
}

PVS-Studio warning: V781 The value of the 'offt' index is checked after it was used. Perhaps there is a mistake in program logic. fat_common.c 1813

The offt variable is first used inside the indexing operation, and only then it is checked that its value is greater than zero. But what happens if name turns out to be an empty string? The strlen() function will return 0, followed by epic shooting yourself in the foot. The program will access by a negative index, which will lead to undefined behavior. Anything can happen, including a program crash. Not good at all!

Suspicious conditions

Just can't do without them! We find such errors literally in every project that we check.

int index_descriptor_cloexec_set(int fd, int cloexec) {
  struct idesc_table *it;

  it = task_resource_idesc_table(task_self());
  assert(it);

  if (cloexec | FD_CLOEXEC) {
    idesc_cloexec_set(it->idesc_table[fd]);
  } else {
    idesc_cloexec_clear(it->idesc_table[fd]);
  }
  return 0;
}

PVS-Studio warning: V617 Consider inspecting the condition. The '0x0010' argument of the '|' bitwise operation contains a non-zero value. index_descriptor.c 55

In order to get where the error hides, let's look at the definition of the FD_CLOEXEC constant:

#define FD_CLOEXEC 0x0010

It turns out that there is always a nonzero constant in the expression if (cloexec | FD_CLOEXEC) to the right of the bitwise "or". The result of such an operation will always be a nonzero number. Thus, this expression will always be equivalent to the if(true) expression, and we will always process only the then branch of the if statement.

I suspect that this macro constant is used to pre-configure the Embox OS, but even if so this always true condition looks strange. Perhaps authors wanted to use the & operator, but made a typo.

Integer division

The following error relates to one feature of the C language:

#define SBSIZE    1024

static int ext2fs_format(struct block_dev *bdev, void *priv) {
  size_t dev_bsize;
  float dev_factor;

  ....

  dev_size = block_dev_size(bdev);
  dev_bsize = block_dev_block_size(bdev);
  dev_factor = SBSIZE / dev_bsize;            // <=

  ext2_dflt_sb(&sb, dev_size, dev_factor);
  ext2_dflt_gd(&sb, &gd);

  ....
}

PVS-Studio warning: V636 The '1024 / dev_bsize' expression was implicitly cast from 'int' type to 'float' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. ext2.c 777

This feature is as follows: if we divide two integer values, then the result of the division will be integer as well. Thus, division will occur without a remainder, or, in other words, the fractional part will be discarded from the division result.

Sometimes programmers forget about it, and errors like this come out. The SBSIZE constant and the dev_bsize variable are of the integer type (int and size_t, respectively). Therefore, the result of the SBSIZE / dev_bsize expression will also be of the integer type.

But hold on. The dev_factor variable is of the float type! Obviously, the programmer expected to get a fractional division result. This can be further verified if you pay attention to the further use of this variable. For example, the ext2_dflt_sb function, where dev_factor is passed as the third parameter, has the following signature:

static void ext2_dflt_sb(struct ext2sb *sb, size_t dev_size, float dev_factor);

Similarly, in other places where the dev_factor variable is used: everything indicates that a floating-point number is expected.

To correct this error, one just has to cast one of the division operands to the floating-point type. For example:

dev_factor = float(SBSIZE) / dev_bsize;

Then the result of the division will be a fractional number.

Unchecked input data

The following error is related to the use of unchecked data received from outside of the program.

int main(int argc, char **argv) {
  int ret;
  char text[SMTP_TEXT_LEN + 1];

  ....

  if (NULL == fgets(&text[0], sizeof text - 2, /* for \r\n */
      stdin)) { ret = -EIO; goto error; }
    text[strlen(&text[0]) - 1] = '\0'; /* remove \n */    // <=

  ....
}

PVS-Studio warning: V1010 Unchecked tainted data is used in index: 'strlen(& text[0])'. sendmail.c 102

Let's start with considering what exactly the fgets function returns. In case of successful reading of a string, the function returns a pointer to this string. In case if end-of-file is read before at least one element, or an input error occurs, the fgets function returns NULL.

Thus, the expression NULL == fgets(....) checks if the input received is correct. But there is one detail. If you pass a null terminal as the first character to be read (this can be done, for example, by pressing Ctrl + 2 in the Legacy mode of the Windows command line), the fgets function takes it into account without returning NULL. In doing so, there will be only one element in the string supposed for writing which is \0'.

What will happen next? The expression strlen(&text[0]) will return 0. As a result, we get a call by a negative index:

text[ 0 - 1 ] = '\0';

As a result, we can crash the program by simply passing the line termination character to the input. It is rather sloppy and it could potentially be used to attack systems that are using Embox.

My colleague who was developing this diagnostic rule even made a recording of an example of such an attack on the NcFTP project:

I recommend checking out if you still do not believe that it might happen :)

The analyzer also found two more places with the same error:

  • V1010 Unchecked tainted data is used in index: 'strlen(& from[0])'. sendmail.c 55
  • V1010 Unchecked tainted data is used in index: 'strlen(& to[0])'. sendmail.c 65

MISRA

MISRA is a set of guidelines and rules for writing secure C and C++ code for highly dependable embedded systems. In some way, this is a set of guidelines, following which you will be able to get rid of so called "code smells" and also protect your program from vulnerabilities.

MISRA is used where human lives depend on the quality of your embedded system: in the medical, automotive, aircraft and military industries.

PVS-Studio has an extensive set of diagnostic rules that allow you to check your code for compliance with MISRA C and MISRA C++ standards. By default, the mode with these diagnostics is turned off, but since we are looking for errors in a project for embedded systems, I simply could not do without MISRA.

Here is what I managed to find:

/* find and read symlink file */
static int ext2_read_symlink(struct nas *nas,
                             uint32_t parent_inumber,
                             const char **cp) {
  char namebuf[MAXPATHLEN + 1];

  ....

  *cp = namebuf;              // <=
  if (*namebuf != '/') {
    inumber = parent_inumber;
  } else {
    inumber = (uint32_t) EXT2_ROOTINO;
  }
  rc = ext2_read_inode(nas, inumber);

  return rc;
}

PVS-Studio warning: V2548 [MISRA C 18.6] Address of the local array 'namebuf' should not be stored outside the scope of this array. ext2.c 298

The analyzer detected a suspicious assignment that could potentially lead to undefined behavior.

Let's take a closer look at the code. Here, namebuf is an array created in the local scope of the function, and the cp pointer is passed to the function by pointer.

According to C syntax, the name of the array is a pointer to the first element in the memory area in which the array is stored. It turns out that the expression *cp = namebuf will assign the address of the array namebuf to the variable pointed by cp. Since cp is passed to the function by pointer, a change in the value that it points to will affect the place where the function was called.

It turns out that after the ext2_read_symlink function completes its work, its third parameter will indicate the area that the namebuf array once occupied.

There is only one slight hitch: since namebuf is an array reserved on the stack, it will be deleted when the function exits. Thus, a pointer that exists outside the function will point to the freed part of memory.

What will be at that address? No one can say for sure. It is possible that for some time the contents of the array will continue to be in memory, or it is possible that the program will immediately replace this area with something else. In general, accessing such an address will return an undefined value, and using such a value is a gross error.

The analyzer also found another error with the same warning:

  • V2548 [MISRA C 18.6] Address of the local variable 'dst_haddr' should not be stored outside the scope of this variable. net_tx.c 82

Conclusion

I liked working with the Embox project. Despite the fact that I did not cite all the found errors in the article, the total number of warnings was relatively small, and in general, the project code is of high quality. Therefore, I express my gratitude to the developers, as well as to those who contributed to the project on behalf of the community. You did great!

On this occasion, let me send my best to the developers. Hope that it's not very cold in St. Petersburg right now :)

At this point, my article comes to an end. I hope you enjoyed reading it, and you found something new for yourself.

If you are interested in PVS-Studio and would like to independently check a project using it, download and try it. This will take no more than 15 minutes.