Yuri Minaev

Oct 23 2018

Tags:

#Cpp #Embedded

PVS-Studio Now Supports GNU Arm Embedded Toolchain

Oct 23 2018

Author: Yuri Minaev

Introduction
Mbed OS
Issues
Memory leaks
Memset
An unconditional 'return' operator in a loop
Mistakes in conditions
Ownerless expression
Conclusion

Embedded systems have been around for a long time. It is paramount that they should be stable and reliable, and fixing bugs in them is extremely costly. That's why embedded developers benefit greatly from regular use of specialized code quality control tools. This article is going to tell you about support for GNU Arm Embedded Toolchain in the PVS-Studio analyzer, and demonstrate some code issues found in the Mbed OS project.

Introduction

The PVS-Studio analyzer already supports several commercial compilers aimed to embedded systems, for example:

Now, another developer tool joins them - GNU Embedded Toolchain.

GNU Embedded Toolchain - is a compiler collection developed by the Arm company and based on GNU Compiler Collection. Officially released in 2012 for the first time, it has been evolving since together with GCC.

The main purpose of GNU Embedded Toolchain is generation of code targeted to bare metal, that is, code that is meant to work on the CPU directly, without an operating system. The package includes C and C++ compilers, an assembler, GNU Binutils, and the Newlib library. All the components are open-source; they are distributed under the GNU GPL license. You can download pre-built toolchain versions for Window's, Linus, and macOS from the official website.

Mbed OS

In order to test the analyzer, one needs a lot of source code. Usually, this is not an issue, but when dealing with embedded development, targeted primarily to IoT devices, finding enough big projects can be problematic. Fortunately, we were able to solve this issue by using specialized embedded operating systems, which are, in most cases, open-source. We'll talk about one of them further.

Although the main goal of this article is to tell you about GNU Embedded Toolchain support, it is difficult to say a lot on this topic. In addition, our readers are most certainly waiting to see some interesting bugs and errors, so let us not keep them waiting. Let's run the analyzer against the Mbed OS project instead. It is an open-source operating system, for which Arm takes part in development.

Official website: https://www.mbed.com/

Source code: https://github.com/ARMmbed/mbed-os

Mbed OS wasn't picked by accident, here's how its developers describe it:

Arm Mbed OS is an open source embedded operating system designed specifically for the "things" in the Internet of Things. It includes all the features you need to develop a connected product based on an Arm Cortex-M microcontroller, including security, connectivity, an RTOS and drivers for sensors and I/O devices.

It looks like a perfect project for GNU Embedded Toolchain, especially considering Arm's participation in its development. Now, I must tell you in advance that I didn't have a goal to find as many issues in a specific project as possible, so I'll describe the analysis results briefly.

Issues

The PVS-Studio run against Mbed OS's source code resulted in 693 warnings, 86 of which had the high priority. Many of them are far from being interesting, so I won't describe them all. For example, there was a lot of V547 warnings (Expression is always true/false) coming from similar code snippets. Of course, there's a way to tweak the analyzer in order to greatly reduce the number of false or simply uninteresting messages, but this wasn't related to my goal. If you'd like to see an example of such tweaking, refer to the Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives article.

For this article, I selected several interesting issues, just to demonstrate how the analyzer works.

Memory leaks

Let's begin with one class of errors often encountered in C and C++ - memory leaks.

Analyzer warning: V773 CWE-401 The function was exited without releasing the 'read_buf' pointer. A memory leak is possible. cfstore_test.c 565

int32_t cfstore_test_init_1(void)
{
   ....
  read_buf = (char*) malloc(max_len);
  if(read_buf == NULL) {
    CFSTORE_ERRLOG(....);
    return ret;
  }
  ....
  while(node->key_name != NULL)
  {
    ....
    ret = drv->Create(....);
    if(ret < ARM_DRIVER_OK){
      CFSTORE_ERRLOG(....);
      return ret;              // <=
    }
  ....
  free(read_buf);
  return ret;
}

This is a classic mistake related to dynamic memory manipulations. A buffer allocated with malloc is used only inside the function, and is released before the function exits. The problem is that this doesn't happen if the function returns prematurely. Also, pay attention to similar code in the two if blocks. It looks like the programmer copied the upper code fragment, and simply forgot to add a free call.

Here's another example similar to the previous one.

Analyzer warning: V773 CWE-401 The function was exited without releasing the 'interface' pointer. A memory leak is possible. nanostackemacinterface.cpp 204

nsapi_error_t Nanostack::add_ethernet_interface(
    EMAC &emac,
    bool default_if,
    Nanostack::EthernetInterface **interface_out,
    const uint8_t *mac_addr)
{
  ....
  Nanostack::EthernetInterface *interface;
  interface = new (nothrow) Nanostack::EthernetInterface(*single_phy);
  if (!interface) {
    return NSAPI_ERROR_NO_MEMORY;
  }

  nsapi_error_t err = interface->initialize();
  if (err) {
    return err;              // <=
  }

  *interface_out = interface;
  return NSAPI_ERROR_OK;
}

The pointer to allocated memory is returned via an out-parameter, but this doesn't happen if the call to the initialize method fails - in this case, a memory leak occurs because the interface local variable leaves its scope, and the pointer simply gets lost. A delete call should've been here, or at least, the address stored in the interface variable should've been returned in any case, so the caller could release the memory.

Memset

Using the memset function often means bugs. You can see examples of those in the article "The most dangerous function in the C/C++ world".

Let's check this warning out:

V575 CWE-628 The 'memset' function processes '0' elements. Inspect the third argument. mbed_error.c 282

mbed_error_status_t mbed_clear_all_errors(void)
{
    ....
    //Clear the error and context capturing buffer
    memset(&last_error_ctx, sizeof(mbed_error_ctx), 0);
    //reset error count to 0
    error_count = 0;
    ....
}

The intention here was to zero the memory occupied by the last_error_ctx structure, but the programmer placed the second and third arguments in a wrong order. As a result, exactly 0 bytes are getting filled with the sizeof(mbed_error_ctx) value.

Here's a similar warning which occurs a hundred lines above:

V575 CWE-628 The 'memset' function processes '0' elements. Inspect the third argument. mbed_error.c 123

An unconditional 'return' operator in a loop

Analyzer warning: V612 CWE-670 An unconditional 'return' within a loop. thread_network_data_storage.c 2348

bool thread_nd_service_anycast_address_mapping_from_network_data (
          thread_network_data_cache_entry_t *networkDataList,
          uint16_t *rlocAddress,
          uint8_t S_id)
{
  ns_list_foreach(thread_network_data_service_cache_entry_t,
                  curService, &networkDataList->service_list) {
    // Go through all services
    if (curService->S_id != S_id) {
      continue;
    }
    ns_list_foreach(thread_network_data_service_server_entry_t,
                    curServiceServer, &curService->server_list) {
      *rlocAddress = curServiceServer->router_id;
      return true;                     // <=
    }
  }
  return false;
}

In this code snippet, ns_list_foreach is a macro which expands to a for operator. The internal loop performs a single iteration at most because of a return call right after the line, which initializes the function's out-parameter. This code might work as planned, however, the internal loop looks quite weird in this context. Most likely, initialization of rlocAddress and a subsequent return should occur on some condition. It is also possible that the internal loop is redundant.

Mistakes in conditions

Like I said in the intro, there were many uninteresting V547's, so I checked them briefly. Only a couple of cases were worth looking at.

https://www.viva64.com/ru/w/v547/V547 CWE-570 Expression 'pcb->state == LISTEN' is always false. lwip_tcp.c 689

enum tcp_state {
  CLOSED      = 0,
  LISTEN      = 1,
  ....
};

struct tcp_pcb *
tcp_listen_with_backlog_and_err(struct tcp_pcb *pcb, u8_t backlog, err_t *err)
{
  ....
  LWIP_ERROR("tcp_listen: pcb already connected",
             pcb->state == CLOSED,
             res = ERR_CLSD; goto done);

  /* already listening? */
  if (pcb->state == LISTEN) {               // <=
    lpcb = (struct tcp_pcb_listen*)pcb;
    res = ERR_ALREADY;
    goto done;
  }
  ....
}

The analyzer thinks that the pcb->state == LISTEN condition is always false. Let's see why it does so.

Before the if operator, there's a call to LWIP_ERROR, which is a macro behaving in a way similar to assert. It is defined as follows:

#define LWIP_ERROR(message, expression, handler) do { if (!(expression)) { \
  LWIP_PLATFORM_ERROR(message); handler;}} while(0)

If the condition is false, the macro reports an error and executes whatever is passed to it via the handler argument. In the current code snippet, we have an unconditional goto.

This example checks the condition 'pcb->state == CLOSED', that is, a jump to the done label only occurs when pcb->state has any other value. The if operator after the LWIP_ERROR call checks whether pcb->state equals LISTEN - a condition which is never true because state in this line can only be equal to CLOSED.

One more warning related to conditions: V517 CWE-570 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 62, 65. libdhcpv6_server.c 62

static void libdhcpv6_address_generate(....)
{
  ....
  if (entry->linkType == DHCPV6_DUID_HARDWARE_EUI64_TYPE) // <=
  {
    memcpy(ptr, entry->linkId, 8);
   *ptr ^= 2;
  }
  else if (entry->linkType == DHCPV6_DUID_HARDWARE_EUI64_TYPE)// <=
  {
    *ptr++  = entry->linkId[0] ^ 2;
    *ptr++  = entry->linkId[1];
  ....
  }
}

Here, if and else if exactly check the same condition, which makes code inside the else if block unreachable. Bugs like this are often related to 'copy-paste' method of programming.

Ownerless expression

Let's take a look at a funny code snippet.

Analyzer warning: V607 Ownerless expression '& discover_response_tlv'. thread_discovery.c 562

static int thread_discovery_response_send(
                        thread_discovery_class_t *class,
                        thread_discovery_response_msg_t *msg_buffers)
{
  ....
  thread_extension_discover_response_tlv_write(
             &discover_response_tlv, class->version,
             linkConfiguration->securityPolicy);
  ....
}

Now, let's check the definition of the thread_extension_discover_response_tlv_write macro:

#define thread_extension_discover_response_tlv_write \
( data, version, extension_bit)\
(data)

The macro expands to its data argument, a call to it inside the thread_discovery_response_send function turns into the (&discover_response_tlv) expression after preprocessing.

I have no further comments. It is possible, that there's no mistake here, but such code always makes me look similar to the above picture :).

Conclusion

The list of compilers supported in PVS-Studio has been expanded. If you have a project meant to be built with GNU Arm Embedded Toolchain, I suggest that you try checking it with our analyzer. A demo version is available here. Also, note that we have a free license available, which will suite some small development projects quite well.

#Cpp #Embedded