>
>
Checking the code of Zephyr operating s…

Andrey Karpov
Articles: 674

Checking the code of Zephyr operating system

Some time ago we announced PVS-Studio's new feature that enabled it to integrate into PlatformIO. Naturally, our team kept in touch with the PlatformIO team while working on that feature, and they suggested that we check the real-time operating system Zephyr to see if we could find any interesting bugs in its code. We thought it was a good idea, and so here's this article about the check results.

PlatformIO

Before proceeding with the main topic of this article, I'd like to mention PlatformIO to the developers of embedded systems - it can make their life a bit easier. PlatformIO is a cross-platform tool for microcontroller programming. The core of PlatformIO is a command-line tool, however it is recommended to use it as a plugin for Visual Studio Code. It supports a large number of modern microchips, and boards based on them. It can automatically download suitable build systems. The site has a large collection of libraries for managing plug-in electronic components. There is support for several static code analyzers, including PVS-Studio.

PVS-Studio

PVS-Studio is not much known in the world of embedded systems yet, so here's a brief overview of our tool in case you haven't heard of it. Our regular readers may skip over to the next section.

PVS-Studio is a static code analyzer that can detect bugs and potential vulnerabilities in the code of programs written in C, C++, C#, and Java. As for C and C++, the following compilers are supported:

  • Windows. Visual Studio 2010-2019 C, C++, C++/CLI, C++/CX (WinRT)
  • Windows. IAR Embedded Workbench, C/C++ Compiler for ARM C, C++
  • Windows. QNX Momentics, QCC C, C++
  • Windows/Linux. Keil µVision, DS-MDK, ARM Compiler 5/6 C, C++
  • Windows/Linux. Texas Instruments Code Composer Studio, ARM Code Generation Tools C, C++
  • Windows/Linux/macOS. GNU Arm Embedded Toolchain, Arm Embedded GCC compiler, C, C++
  • Windows/Linux/macOS. Clang C, C++
  • Linux/macOS. GCC C, C++
  • Windows. MinGW C, C++

The analyzer uses its own warning classification system, but you can also have warnings issued according to the coding standards CWE, SEI CERT, and MISRA.

PVS-Studio can be quickly adopted and put to regular use even in a large legacy project. This is achieved thanks to a special mechanism of mass warning suppression. It makes the analyzer treat all existing warnings as technical debt and hide them, thus allowing you to focus on the warnings produced only on newly written or modified code. This enables the team to start using the analyzer in their everyday work, getting back every now and then to address the technical debt and gradually eliminate it.

PVS-Studio allows many other use scenarios. For instance, you can run it as a plugin for SonarQube. It can also integrate with such systems as Travis CI, CircleCI, GitLab CI/CD, and so on. A detailed description of PVS-Studio is outside the scope of this article, so please refer to the following article, which offers many useful links and answers to many questions: "Why you should choose the PVS-Studio static analyzer to integrate into your development process".

Zephyr

While working on PVS-Studio's integration into PlatformIO, we kept in touch with the PlatformIO team, and they suggested that we check Zephyr, a project from the embedded software world. We liked the idea, and that's how this article appeared.

Zephyr is a small real-time operating system for connected, resource-constrained and embedded devices (with an emphasis on microcontrollers) supporting multiple architectures and released under the Apache License 2.0. Supported platforms: ARM (Cortex-M0, Cortex-M3, Cortex-M4, Cortex-M23, Cortex-M33, Cortex-R4, Cortex-R5, Cortex-A53), x86, x86-64, ARC, RISC-V, Nios II, Xtensa.

Here are some of its distinguishing features:

  • Single address space. Combines application-specific code with a custom kernel to create a monolithic image that gets loaded and executed on a system's hardware.
  • Highly configurable / Modular for flexibility. Allows an application to incorporate only the capabilities it needs as it needs them, and to specify their quantity and size.
  • Compile-time resource definition. Reduces code size and increases performance for resource-limited systems.
  • Minimal error checking. The same as the previous one, with complete debugging information provided during testing if needed.
  • A number of services for development: multi-threading, interrupt, inter-thread synchronization, memory allocation, power management, and many other services.

One of the curious things about Zephyr is that Synopsys is involved into its development. In 2014, Synopsys bought Coverity, the creator of a static analyzer of the same name.

Hence it's only natural that Zephyr is being checked with Coverity from the very beginning. This tool is a leader among analyzers, which helped guarantee the high quality of Zephyr's source code.

The quality of Zephyr's code

If you ask me, Zephyr is a high-quality project. These are the reasons why I think so:

  • PVS-Studio produced 122 general-purpose warnings of the High level and 367 warnings of the Medium level. That's not much, considering the total number of C/C++ files checked – 560. The kernel is checked through samples checking. The total estimate I got for the project was 7810 C/C++ files and 10075 header files, which means the check covered only part of the project. But then again, I didn't aim at checking the entire code base, and the number of warnings I got still corresponds to a low warning density.
  • Many of the warnings turned out to be false positives or "semi-false positives". What I mean by the latter is explained below.
  • I used the SourceMonitor utility to scan Zephyr's source code, and according to its statistics, comments make 48% of the code. That's quite a bit, and, as my practice proves, it means the developers really care about their code's quality and readability.
  • The project is checked with the Coverity static analyzer. This must explain why PVS-Studio – while having found some bugs – still hasn't performed as impressively as it sometimes does when analyzing other projects.

Taking all this into account, I conclude that the project's authors care about their code's quality and reliability. Now let's look at some of the warnings issued by the PVS-Studio analyzer (version 7.06).

"Semi-false" warnings

Since the project's code deals with low-level functionality, it's written in a specific way and uses a lot of conditional compilation (#ifdef). This leads to a large number of warnings that don't point at genuine bugs yet can't be viewed as outright false. This can be best explained with examples.

"Semi-false" positives: example 1

static struct char_framebuffer char_fb;

int cfb_framebuffer_invert(struct device *dev)
{
  struct char_framebuffer *fb = &char_fb;

  if (!fb || !fb->buf) {
    return -1;
  }

  fb->inverted = !fb->inverted;

  return 0;
}

PVS-Studio diagnostic message: V560 A part of conditional expression is always false: !fb. cfb.c 188

Obtaining the address of a static variable always yields a non-null pointer, so the fb pointer is never equal to zero and, therefore, the check is not needed.

Yet it's obviously not a bug but simply a redundant, harmless check. Besides, the compiler will optimize it away when building a Release version, so this check wouldn't even cause any slowdown.

That's what I call "semi-false" positives. Technically, the analyzer is totally correct in pointing out that redundant check, and it's better to remove it. On the other hand, minor issues like that are too petty and plain even to mention here.

"Semi-false" positives: example 2

int hex2char(u8_t x, char *c)
{
  if (x <= 9) {
    *c = x + '0';
  } else if (x >= 10 && x <= 15) {
    *c = x - 10 + 'a';
  } else {
    return -EINVAL;
  }
  return 0;
}

PVS-Studio diagnostic message: V560 A part of conditional expression is always true: x >= 10. hex.c 31

Again, the analyzer is technically correct by pointing out an always-true conditional subexpression. If the x variable is not less than or equal to 9, then it naturally will be always greater than or equal to 10. So the code can be simplified:

} else if (x <= 15) {

Again, the redundant check is not a true, harmful bug but just a decoration.

"Semi-false" positives: example 3, more complicated case

First let's look at the possible implementations of the CHECKIF macro:

#if defined(CONFIG_ASSERT_ON_ERRORS)
#define CHECKIF(expr) \
  __ASSERT_NO_MSG(!(expr));   \
  if (0)
#elif defined(CONFIG_NO_RUNTIME_CHECKS)
#define CHECKIF(...) \
  if (0)
#else
#define CHECKIF(expr) \
  if (expr)
#endif

Depending on the compilation mode, the check will be either executed or skipped. In our case, when analyzing the project with PVS-Studio, the following implementation was selected:

#define CHECKIF(expr) \
  if (expr)

Let's see where we get from here.

int k_queue_append_list(struct k_queue *queue, void *head, void *tail)
{
  CHECKIF(head == NULL || tail == NULL) {
    return -EINVAL;
  }

  k_spinlock_key_t key = k_spin_lock(&queue->lock);
  struct k_thread *thread = NULL;
  if (head != NULL) {
    thread = z_unpend_first_thread(&queue->wait_q);
  }
  ....
}

PVS-Studio diagnostic message: V547 [CWE-571] Expression 'head != NULL' is always true. queue.c 244

The analyzer believes the (head != NULL) check is always true. That's correct. If the head pointer is equal to NULL, the check at the beginning of the function will have it exit sooner:

CHECKIF(head == NULL || tail == NULL) {
  return -EINVAL;
}

As a reminder, this is what the macro expands into in this implementation:

if (head == NULL || tail == NULL) {
  return -EINVAL;
}

So again, PVS-Studio is technically right and its warning is to the point. But you can't just remove the check because it's still needed. Should the other scenario be selected, the macro would expand as follows:

if (0) {
  return -EINVAL;
}

Now you want the redundant check. Sure, the analyzer wouldn't produce the warning in that case, but it does in this one, where we deal with the Debug version.

I hope it's clear now where "semi-false" positives come from. They aren't a problem, though. PVS-Studio provides a handful of false warning suppression mechanisms, which are described in detail in the documentation.

Relevant warnings

Were there any interesting warnings then? Yes, there were, and we are going to take a look at some bugs of different types. But I'd like to make two statements first:

  • Static analysis is not about one-time checks like this. The correct use strategy is to regularly run the analyzer on the project, which is actually exactly how Coverity is used in the development of Zephyr. And that's how adopting PVS-Studio or any other analyzer allows you to detect even more bugs and thus make them cheaper to fix at earlier stages.
  • While writing this article, I didn't aim at finding as many bugs as possible and I could have well missed many of the bugs or erroneously discarded them as "semi-false" positives. If Zephyr's authors are reading this, I recommend that you check the project and study the analysis report on your own. Since the project is open-source and available on GitHub, you can use a free PVS-Studio licensing option.

Fragment 1, typo

static void gen_prov_ack(struct prov_rx *rx, struct net_buf_simple *buf)
{
  ....
  if (link.tx.cb && link.tx.cb) {
    link.tx.cb(0, link.tx.cb_data);
  }
  ....
}

PVS-Studio diagnostic message: V501 [CWE-571] There are identical sub-expressions to the left and to the right of the '&&' operator: link.tx.cb && link.tx.cb pb_adv.c 377

The link.tx.cb variable is checked twice. This must be a typo, with link.tx.cb_data being the second variable to be checked instead.

Fragment 2, buffer overflow

Let's take a look at the net_hostname_get function, which will be used further.

#if defined(CONFIG_NET_HOSTNAME_ENABLE)
const char *net_hostname_get(void);
#else
static inline const char *net_hostname_get(void)
{
  return "zephyr";
}
#endif

In my case, the #else branch implementation was selected at the preprocessing stage, i.e. the preprocessed file will contain the following implementation of the function:

static inline const char *net_hostname_get(void)
{
  return "zephyr";
}

The function returns a pointer to an array of 7 bytes (including the terminating null character at the end of the string).

Now, here's the code where the overflow occurs.

static int do_net_init(void)
{
  ....
  (void)memcpy(hostname, net_hostname_get(), MAX_HOSTNAME_LEN);
  ....
}

PVS-Studio diagnostic message: V512 [CWE-119] A call of the 'memcpy' function will lead to the 'net_hostname_get()' buffer becoming out of range. log_backend_net.c 114

After the preprocessing, MAX_HOSTNAME_LEN expands as follows:

(void)memcpy(hostname, net_hostname_get(),
    sizeof("xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx"));

Therefore, when copying the data, the program will end up accessing memory beyond the string literal's bounds. How exactly it's going to affect the execution is hard to tell because this is undefined behavior.

Fragment 3, potential buffer overflow

int do_write_op_json(struct lwm2m_message *msg)
{
  u8_t value[TOKEN_BUF_LEN];
  u8_t base_name[MAX_RESOURCE_LEN];
  u8_t full_name[MAX_RESOURCE_LEN];
  ....
  /* combine base_name + name */
  snprintf(full_name, TOKEN_BUF_LEN, "%s%s", base_name, value);
  ....
}

PVS-Studio diagnostic message: V512 [CWE-119] A call of the 'snprintf' function will lead to overflow of the buffer 'full_name'. lwm2m_rw_json.c 826

Substituting the macros' values leads us to the following:

u8_t value[64];
u8_t base_name[20];
u8_t full_name[20];
....
snprintf(full_name, 64, "%s%s", base_name, value);

Only 20 bytes are allocated for the full_name buffer, which is where the string is formed, while the parts that the string is formed from are stored in two buffers 20 and 64 bytes long. In addition, the constant 64 passed to the snprintf function and intended to prevent the overflow is obviously a bit too large!

This code won't necessarily end up with a buffer overflow. Perhaps the developers always get away with it because the substrings are always too short. But overall, this code has no protection against an overflow, which is a classic security weakness CWE-119.

Fragment 4, expression is always true

static int keys_set(const char *name, size_t len_rd, settings_read_cb read_cb,
                    void *cb_arg)
{
  ....
  size_t len;
  ....
  len = read_cb(cb_arg, val, sizeof(val));
  if (len < 0) {
    BT_ERR("Failed to read value (err %zu)", len);
    return -EINVAL;
  }
  ....
}

PVS-Studio diagnostic message: V547 [CWE-570] Expression 'len < 0' is always false. Unsigned type value is never < 0. keys.c 312

The len variable is unsigned, which means it can't be less than 0. Therefore, the error state isn't handled in any way. Elsewhere, the value returned by the read_cb function is stored in a variable of type int or ssize_t. For example:

static inline int mesh_x_set(....)
{
 ssize_t len;
 len = read_cb(cb_arg, out, read_len);
 if (len < 0) {
 ....
}

Note. Actually, the read_cb function doesn't look fine at all. Look at its declaration:

static u8_t read_cb(const struct bt_gatt_attr *attr, void *user_data)

The type u8_t is actually unsigned char.

The function always returns only positive values of type unsigned char. If we store such a value into a signed variable of type int or ssize_t, it will still be a positive value. It means error state checks don't work in other cases either. But I didn't dig too deep into this issue.

Fragment 5, something weird

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    ((u8_t *)mntpt)[strlen(mntpt)] = '\0';
    memcpy(cpy_mntpt, mntpt, strlen(mntpt));
  }
  return cpy_mntpt;
}

PVS-Studio diagnostic message: V575 [CWE-628] The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null. shell.c 427

The developer was trying to make a function similar to strdup but failed.

The warning says the memcpy function copies a string but fails to copy the terminating null character, which is a very strange behavior.

You may think the copying of the terminating null takes place in the following line:

((u8_t *)mntpt)[strlen(mntpt)] = '\0';

But that's wrong! It's a typo that causes the terminating null to get copied into itself! Note that the target array is mntpt, not cpy_mntpt. As a result, the mntpt_prepare function returns a non-terminated string.

This is what should be written instead:

((u8_t *)cpy_mntpt)[strlen(mntpt)] = '\0';

But I still can't see the reason for such a complicated implementation! This code can be reduced to the following:

static char *mntpt_prepare(char *mntpt)
{
  char *cpy_mntpt;

  cpy_mntpt = k_malloc(strlen(mntpt) + 1);
  if (cpy_mntpt) {
    strcpy(cpy_mntpt, mntpt);
  }
  return cpy_mntpt;
}

Fragment 6, pointer dereferencing before check

int bt_mesh_model_publish(struct bt_mesh_model *model)
{
  ....
  struct bt_mesh_model_pub *pub = model->pub;
  ....
  struct bt_mesh_msg_ctx ctx = {
    .send_rel = pub->send_rel,
  };
  ....
  if (!pub) {
    return -ENOTSUP;
  }
  ....
}

PVS-Studio diagnostic message: V595 [CWE-476] The 'pub' pointer was utilized before it was verified against nullptr. Check lines: 708, 719. access.c 708

This is a very common bug pattern. The pointer is first dereferenced to initialize a struct member:

.send_rel = pub->send_rel,

And only then is it checked for null.

Fragments 7-9, pointer dereferencing before check

int net_tcp_accept(struct net_context *context, net_tcp_accept_cb_t cb,
                   void *user_data)
{
  ....
  struct tcp *conn = context->tcp;
  ....
  conn->accept_cb = cb;

  if (!conn || conn->state != TCP_LISTEN) {
    return -EINVAL;
  }
  ....
}

PVS-Studio diagnostic message: V595 [CWE-476] The 'conn' pointer was utilized before it was verified against nullptr. Check lines: 1071, 1073. tcp2.c 1071

This case is the same as the previous one. No comments needed.

Two more errors like that:

  • V595 [CWE-476] The 'context->tcp' pointer was utilized before it was verified against nullptr. Check lines: 1512, 1518. tcp.c 1512
  • V595 [CWE-476] The 'fsm' pointer was utilized before it was verified against nullptr. Check lines: 365, 382. fsm.c 365

Fragment 10, incorrect check

static int x509_get_subject_alt_name( unsigned char **p,
                                      const unsigned char *end,
                                      mbedtls_x509_sequence *subject_alt_name)
{
  ....
    while( *p < end )
    {
        if( ( end - *p ) < 1 )
            return( MBEDTLS_ERR_X509_INVALID_EXTENSIONS +
                    MBEDTLS_ERR_ASN1_OUT_OF_DATA );
    ....
  }
  ....
}

PVS-Studio diagnostic message: V547 [CWE-570] Expression '(end - * p) < 1' is always false. x509_crt.c 635

Look closely at these conditions:

  • *p < end
  • (end - *p) < 1

They are mutually opposite.

If (*p < end), then (end - *p) will always yield the value 1 or larger. Something is wrong with this code, but I have no idea how it should be fixed.

Fragment 11, unreachable code

uint32_t lv_disp_get_inactive_time(const lv_disp_t * disp)
{
    if(!disp) disp = lv_disp_get_default();
    if(!disp) {
        LV_LOG_WARN("lv_disp_get_inactive_time: no display registered");
        return 0;
    }

    if(disp) return lv_tick_elaps(disp->last_activity_time);

    lv_disp_t * d;
    uint32_t t = UINT32_MAX;
    d          = lv_disp_get_next(NULL);
    while(d) {
        t = LV_MATH_MIN(t, lv_tick_elaps(d->last_activity_time));
        d = lv_disp_get_next(d);
    }

    return t;
}

PVS-Studio diagnostic message: V547 [CWE-571] Expression 'disp' is always true. lv_disp.c 148

The function returns if disp is a null pointer. This is followed by an opposite check – whether the disp pointer is non-null (which is always true) – and the function returns all the same.

Because of this logic, part of the code in the function's body will never get control.

Fragment 12, strange return value

static size_t put_end_tlv(struct lwm2m_output_context *out, u16_t mark_pos,
        u8_t *writer_flags, u8_t writer_flag,
        int tlv_type, int tlv_id)
{
  struct tlv_out_formatter_data *fd;
  struct oma_tlv tlv;
  u32_t len = 0U;

  fd = engine_get_out_user_data(out);
  if (!fd) {
    return 0;
  }

  *writer_flags &= ~writer_flag;

  len = out->out_cpkt->offset - mark_pos;

  /* use stored location */
  fd->mark_pos = mark_pos;

  /* set instance length */
  tlv_setup(&tlv, tlv_type, tlv_id, len);
  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return 0;
}

PVS-Studio diagnostic message: V1001 The 'len' variable is assigned but is not used by the end of the function. lwm2m_rw_oma_tlv.c 338

The function has two return statements both of which return 0. It's strange for a function to return 0 in any case. And it's strange that the len variable is never used after it has been assigned a value. I strongly suspect that the programmer actually meant to write the following code:

  len = oma_tlv_put(&tlv, out, NULL, true) - tlv.length;
  return len;
}

Fragments 13-16, synchronization error

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    return -ESPIPE;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

PVS-Studio diagnostic message: V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 620, 549. nvs.c 620

The function may return without unlocking the mutex. As far as I understand, the following should be written instead:

static int nvs_startup(struct nvs_fs *fs)
{
  ....
  k_mutex_lock(&fs->nvs_lock, K_FOREVER);
  ....
  if (fs->ate_wra == fs->data_wra && last_ate.len) {
    rc = -ESPIPE;
    goto end;
  }
  ....
end:
  k_mutex_unlock(&fs->nvs_lock);
  return rc;
}

Three more bugs of this type:

  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 574, 549. nvs.c 574
  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 908, 890. net_context.c 908
  • V1020 The function exited without calling the 'k_mutex_unlock' function. Check lines: 1194, 1189. shell.c 1194

Conclusion

I hope you enjoyed reading this article. Be sure to check our blog for more project checks and other interesting articles.

Use static analyzers to eliminate tons of bugs and potential vulnerabilities at the earlier coding stage. Early bug detection is especially crucial to embedded systems, where updates are expensive and time-consuming.

I also encourage you to go and check your own projects with PVS-Studio. For detailed information about how to do that see the article "How to quickly check out interesting warnings given by the PVS-Studio analyzer for C and C++ code?".