Webinar: Evaluation - 05.12
We have long wanted to write an article with the idea that unit tests are cool. But we shouldn't forget that they can also contain errors. Recently we have checked the DPDK project, whose tests demonstrate this aspect very well. So let's see how typical errors in unit tests look like, and how static code analysis detects them.
Any programmer realizes that unit tests created to detect errors may themselves contain errors. Most often, these errors lead to the fact that tests do not actually perform the intended checks.
Basically, it's no secret, but this topic is not often discussed. There is no malicious intent or special silence, but there are two reasons why this happens.
Firstly, errors in unit tests are not so critical. Just because a test contains an error does not mean that there is an error in the code it is checking. A broken test can be compared to its absence. The lack of a test is bad, but it's not a tragedy either.
I'm not saying that errors in unit tests are not important at all. If there are no tests for an application code, a real bug may not be timely detected. However, you should agree that there is a very low probability that a terrible vulnerability is hiding in an application because of that very test that doesn't work. Most likely, after finding a bug in tests, a programmer will simply fix it and nothing will change. Such bugs are not for articles or conference talks.
Secondly, programmers don't really know how to detect such errors. App users don't report about them as they have nothing to do with unit tests. Manual testing is not the option either. And it's not very clear how exactly to test unit tests. People are not going to write unit tests for unit tests. And who will write unit tests for unit tests testing unit tests? :)
Still you can (and should) review unit tests code. But it is not a silver bullet. Especially since the code of unit tests is monotonous, it's hard to focus on checking it. One can sound a note of warning, but there is no reason for a big discussion.
Can we just ignore the problem of errors in unit tests? Some single errors are not a big deal. But in general we should definitely check unit tests for errors. The thing is, there are a lot of them out there. Due to the fact that tests are not checked, bugs remain alive and well in them. They build nests there :). So, it's still rational to do something to reduce their number.
Hopefully I made it clear why the topic is neglected, and why we still would like to have fewer errors in tests. So how do we reduce their number?
Dynamic and static code analyzers are good helpers in this case.
Dynamic code analysis. One can occasionally run unit-tests together with a dynamic code analyzer. Doing this on a regular basis is likely to be troublesome, as tests can run dozens of times slower in this mode. A compromise might be night or weekend checks.
These runs have two benefits. First, developers will be open to the idea of covering as many application code as possible while writing unit tests. That is, to look into as many branches of algorithms as possible. Accordingly, a large percentage of the code will be executed under the control of a dynamic analyzer, which will help identify errors there. Secondly, the analyzer will check the unit tests themselves.
But things are not that rosy, though. Most errors in unit tests are uncatchable for dynamic analyzers. Further, you will see from the examples that some errors do not violate something (they are not memory leaks or array overruns). Tests just don't test everything they're supposed to.
Static code analysis. Static analyzers complement unit tests very well. For example, they are good at finding bugs that are hidden in error handlers and bugs that are hard to get to in unit tests. It's a tricky task to write a test that simulates a connection failure when reading a file over the network and checks that the program correctly handles this situation.
Static analyzers are also great at finding errors in unit tests themselves. They study the tests much more closely than people do when reviewing the code. Unlike humans, this code is not boring or tiring for them :)
I've long wanted to demonstrate how static analysis helps improve the quality of unit tests. I planned to sit down, get thoughts together and collect examples from our last few articles, but somehow I couldn't get my hands on it. And here comes the good fortune – the DPDK project, that we've checked to write another note in our blog.
The Data Plane Development Kit (DPDK) is a set of open source libraries that accelerate packet processing by allowing networking hardware to communicate directly with applications, bypassing the Linux kernel.
Its unit tests turned out to have enough errors to write an article. Actually, it is the article you are reading right now. Check out examples of bugs in unit tests and how skillfully you can search for them using PVS-Studio.
Note. I'll cover other errors from the DPDK project that do not relate to tests in a separate article.
Not every function contains test in its name. So why did I decide that all the code below refers to tests? Simple answer: files with this code locate in the test* folders.
#define MAX_PKT_BURST (512)
static int
test_activebackup_rx_burst(void)
{
....
int i, j, burst_size = 17;
....
for (i = 0; i < test_params->bonding_member_count; i++) {
/* Generate test bursts of packets to transmit */
TEST_ASSERT_EQUAL(generate_test_burst(
&gen_pkt_burst[0], burst_size, 0, 1, 0, 0, 0),
burst_size, "burst generation failed");
....
/* free mbufs */
for (i = 0; i < MAX_PKT_BURST; i++) {
if (rx_pkt_burst[i] != NULL) {
rte_pktmbuf_free(rx_pkt_burst[i]);
rx_pkt_burst[i] = NULL;
}
}
/* reset bonding device stats */
rte_eth_stats_reset(test_params->bonding_port_id);
}
....
}
PVS-Studio warning:
V535 The variable 'i' is being used for this loop and for the outer loop. Check lines: 2239, 2291. test_link_bonding.c 2291
Two loops use the same variable i as a counter:
for (i = 0; i < test_params->bonding_member_count; i++) {
....
for (i = 0; i < MAX_PKT_BURST; i++) {
If this has been the case of an infinite loop, the error in the test would have been noticed and corrected immediately. So it's not an infinite loop. Why?
Apparently, the value of the MAX_PKT_BURST constant (equal to 512) is always greater than or equal to the value of the test_params->bonding_member_count variable. The outer loop stops immediately after completing only one iteration.
Thus, the test only works a little bit :)
By the way, this error is copied and can be found in the same file below: V535 The variable 'i' is being used for this loop and for the outer loop. Check lines: 4340, 4390. test_link_bonding.c 4390
static int
test_set_primary_member(void)
{
....
TEST_ASSERT_SUCCESS(memcmp(&read_mac_addr, &read_mac_addr,
sizeof(read_mac_addr)),
"bonding port mac address not set to that of primary port\n");
....
}
PVS-Studio warning:
V549 The first argument of 'memcmp' function is equal to the second argument. test_link_bonding.c 795
The memcmp function compares the same memory buffer to itself. Naturally, the function will always return 0 (two memory blocks contain identical data). An obvious typo due to carelessness or haste.
This unit test checks nothing.
Apparently, one of the arguments must be the expected_mac_addr pointer. I concluded this because I found this memcmp call nearby:
TEST_ASSERT_SUCCESS(memcmp(expected_mac_addr, &read_mac_addr,
sizeof(read_mac_addr)),
"bonding port mac address not set to that of primary port\n");
First, let's look at how the rte_ipv6_get_next_ext function is declared.
/**
* Parse next IPv6 header extension
* ....
* @return
* next protocol number if proto is an IPv6 extension, -EINVAL otherwise
*/
static inline int
rte_ipv6_get_next_ext(const uint8_t *p, int proto, size_t *ext_len);
Note that the function returns a negative value (-EINVAL, namely -22) in case of an error.
Now let's see how this function is used in tests.
test_vector_payload_populate(....)
{
....
int proto;
....
proto = hdr->proto;
p += sizeof(struct rte_ipv6_hdr);
while (proto != IPPROTO_FRAGMENT &&
(proto = rte_ipv6_get_next_ext(p, proto, &ext_len) >= 0)) // <=
p += ext_len;
/* Found fragment header, update the frag offset */
if (proto == IPPROTO_FRAGMENT) {
....
}
PVS-Studio warning:
V593 Consider reviewing the expression of the 'A = B >= C' kind. The expression is calculated as following: 'A = (B >= C)'. test_security_inline_proto_vectors.h 523
The programmer wanted:
To do this, the author wrote the expression:
(proto = rte_ipv6_get_next_ext(p, proto, &ext_len) >= 0)
But it works differently. The priority of the comparison operation >= is higher than that of the assignment operator =. As a result, the result of the check, i.e. 0 or 1, will be placed into the proto variable. Next, the unit test works with incorrect values and checks the wrong thing :).
It seems that the programmer knew about the priority of operations and used additional parentheses. Then the author made a typo and put the right parenthesis incorrectly. Correct version of this code:
(proto = rte_ipv6_get_next_ext(p, proto, &ext_len)) >= 0
static inline void
evdev_set_conf_values(struct rte_event_dev_config *dev_conf,
struct rte_event_dev_info *info)
{
memset(dev_conf, 0, sizeof(struct rte_event_dev_config));
dev_conf->dequeue_timeout_ns = info->min_dequeue_timeout_ns;
dev_conf->nb_event_ports = NB_TEST_PORTS;
dev_conf->nb_event_queues = NB_TEST_QUEUES;
dev_conf->nb_event_queue_flows = info->max_event_queue_flows;
dev_conf->nb_event_port_dequeue_depth = info->max_event_port_dequeue_depth;
dev_conf->nb_event_port_enqueue_depth = info->max_event_port_enqueue_depth;
dev_conf->nb_event_port_enqueue_depth = info->max_event_port_enqueue_depth;
dev_conf->nb_events_limit = info->max_num_events;
}
PVS-Studio warning:
V519 The 'dev_conf->nb_event_port_enqueue_depth' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1166, 1168. test_event_crypto_adapter.c 1168
The analyzer noticed this repeated code:
dev_conf->nb_event_port_enqueue_depth = info->max_event_port_enqueue_depth;
dev_conf->nb_event_port_enqueue_depth = info->max_event_port_enqueue_depth;
Let's take a look at the structure rte_event_dev_config. I commented which data members are assigned some values.
struct rte_event_dev_config {
uint32_t dequeue_timeout_ns; // There is an assignment
int32_t nb_events_limit; // There is an assignment
uint8_t nb_event_queues; // There is an assignment
uint8_t nb_event_ports; // There is an assignment
uint32_t nb_event_queue_flows; // There is an assignment
uint32_t nb_event_port_dequeue_depth; // There is an assignment
uint32_t nb_event_port_enqueue_depth; // There is an assignment twice
uint32_t event_dev_cfg; //
uint8_t nb_single_link_event_port_queues; //
};
At the beginning, the entire structure is filled with zeros (see call of the memset function). Then all data members except the last two are initialized with new values. The nb_event_port_enqueue_depth data member is assigned a value twice. It's all very suspicious.
Perhaps the second assignment line is just unnecessary. Or maybe one forgot to initialize another data member. Or one even forgot to initialize two data members. I'm not sure how exactly it should look like, since I am not familiar with the project code, but there's something fishy here.
Naughty Copy-Paste programming.
static int
test_tls_record_proto_all(const struct tls_record_test_flags *flags)
{
....
if (flags->zero_len &&
((flags->content_type == TLS_RECORD_TEST_CONTENT_TYPE_HANDSHAKE) ||
(flags->content_type == TLS_RECORD_TEST_CONTENT_TYPE_HANDSHAKE) ||
(flags->content_type == TLS_RECORD_TEST_CONTENT_TYPE_HANDSHAKE))) {
if (ret == TEST_SUCCESS)
return TEST_FAILED;
goto skip_decrypt;
}
....
}
PVS-Studio warning:
V501 There are identical sub-expressions '(flags->content_type == TLS_RECORD_TEST_CONTENT_TYPE_HANDSHAKE)' to the left and to the right of the '||' operator. test_cryptodev.c 12173
It's all simple here. We compare a variable with the same constant twice. The unit test does not check a certain case. One of the constants must obviously be different.
It reminds me of my own article: ''Zero, one, two, Freddy's coming for you''. Except it's about 6 here.
static int
test_missing_c_flag(void)
{
....
if (rte_lcore_is_enabled(0) && rte_lcore_is_enabled(1) &&
rte_lcore_is_enabled(2) && rte_lcore_is_enabled(3) &&
rte_lcore_is_enabled(3) && rte_lcore_is_enabled(5) &&
rte_lcore_is_enabled(4) && rte_lcore_is_enabled(7) &&
launch_proc(argv29) != 0) {
....
}
PVS-Studio warning:
V501 There are identical sub-expressions 'rte_lcore_is_enabled(3)' to the left and to the right of the '&&' operator. test_eal_flags.c 678
The programmer started with calling the function rte_lcore_is_enabled, successively specifying the constants: 0, 1, 2, 3.
And then our programmer went wild: 3, 5, 4, 7. This might have happened because of the rush or distraction.
This bug is hard to spot on a code review. It just looks like some different numbers. Seems like it's all correct. Reading of such code is boring. As a result, the unit test will not check the function call for value 6.
It's good to have the PVS-Studio analyzer, which is not lazy to look closely at each line and constant :)
int
port_meter_policy_add(portid_t port_id, uint32_t policy_id,
const struct rte_flow_action *actions)
{
....
for (i = 0; i < RTE_COLORS; i++) {
for (act_n = 0, start = act;
act->type != RTE_FLOW_ACTION_TYPE_END; act++)
act_n++;
if (act_n && act->type == RTE_FLOW_ACTION_TYPE_END)
policy.actions[i] = start;
else
policy.actions[i] = NULL;
act++;
}
....
}
PVS-Studio warning:
V560 A part of conditional expression is always true: act->type == RTE_FLOW_ACTION_TYPE_END. config.c 2249
The check act->type == RTE_FLOW_ACTION_TYPE_END makes no sense because the loop above will stop only if this condition is met.
The check is redundant, or a wrong condition is checked. And for an attentive reader, I have a promo code dpdk_6 here for a monthly trial. Have fun hunting for bugs in the code.
static int
comp_names_to_index(struct context *ctx, const struct token *token,
unsigned int ent, char *buf, unsigned int size,
const char *const names[], size_t names_size)
{
RTE_SET_USED(ctx);
RTE_SET_USED(token);
if (!buf)
return names_size;
if (names[ent] && ent < names_size)
return rte_strscpy(buf, names[ent], size);
return -1;
}
PVS-Studio warning:
V781 The value of the 'ent' index is checked after it was used. Perhaps there is a mistake in program logic. cmdline_flow.c 12870
The function is passed an array of names, the size of this array, and the index of the name to be handled. There is a check in the function that the index is not out of array bounds. There is another check that the name is not empty: the pointer to the name is non-null.
But the order of these checks is messed up.
if (names[ent] && ent < names_size)
The pointer to the name is extracted from the array BEFORE checking the index. As a result, array overrun becomes possible.
For a unit test, such an error is not terrible. But actually, that's a big fail!
I will cite the whole function to make it clear why I call the code strange.
static int
parse_lcore_dma(struct test_configure *test_case, const char *value)
{
struct lcore_dma_map_t *lcore_dma_map;
char *input, *addrs;
char *ptrs[2];
char *start, *end, *substr;
uint16_t lcore_id;
int ret = 0;
if (test_case == NULL || value == NULL)
return -1;
input = strndup(value, strlen(value) + 1);
if (input == NULL)
return -1;
addrs = input;
while (*addrs == '\0')
addrs++;
if (*addrs == '\0') {
fprintf(stderr, "No input DMA addresses\n");
ret = -1;
goto out;
}
....
}
PVS-Studio warning:
V547 Expression '* addrs == '\0'' is always false. main.c 234
The loop ends only when a non-zero character is found.
while (*addrs == '\0')
addrs++;
if (*addrs == '\0') {
Therefore, a follow-up check does not make sense. The pointer will always point not to terminal null.
This code looks reckless in the whole. The addrs pointer points to a copy of the value string obtained by calling the strndup function.
What's the point of trying to skip all null characters in the loop?
while (*addrs == '\0')
addrs++;
If the string is empty (with '\0' written at the beginning), the array will be overridden.
If the string is non-empty, the loop will stop immediately without performing any iterations.
Maybe the author of code wanted to write something like this?
while (*addrs != '\0')
addrs++;
No, it doesn't make sense either.
I suspect that the loop is redundant here, and this fragment should be written like this:
input = strndup(value, strlen(value) + 1);
if (input == NULL)
return -1;
addrs = input;
if (*addrs == '\0') {
fprintf(stderr, "No input DMA addresses\n");
ret = -1;
goto out;
}
#define MAX_NUM 1 << 20
static int
test_align(void)
{
....
for (p = 1; p <= MAX_NUM / 2; p++) { // <=
for (i = 1; i <= MAX_NUM / 2; i++) { // <=
val = RTE_ALIGN_MUL_CEIL(i, p);
if (val % p != 0 || val < i)
FAIL_ALIGN("RTE_ALIGN_MUL_CEIL", i, p);
val = RTE_ALIGN_MUL_FLOOR(i, p);
if (val % p != 0 || val > i)
FAIL_ALIGN("RTE_ALIGN_MUL_FLOOR", i, p);
val = RTE_ALIGN_MUL_NEAR(i, p);
if (val % p != 0 || ((val != RTE_ALIGN_MUL_CEIL(i, p))
& (val != RTE_ALIGN_MUL_FLOOR(i, p))))
FAIL_ALIGN("RTE_ALIGN_MUL_NEAR", i, p);
}
}
....
}
PVS-Studio warnings:
I don't like macros for various reasons. One of them is that they severely disrupt the perception of the program. Here is a fitting example: individually, the code looks fine, but together it becomes a mess.
A block of code from unit tests seems to be OK.
The MAX_NUM macro also looks fine at first glance:
#define MAX_NUM 1 << 20
However, when substituting into the loop condition, we get the following expression:
1 << 20 / 2
The priority of the division operation is higher than that of the shift operation. This results in a shift of 10 bits rather than 20: 1 << 10.
Let's count how many options are iterated in loops due to an error:
(1 << (20 / 2)) * (1 << (20 / 2)) = 1024 * 1024 = 1048576
This is how many options should be iterated according to the programmer's idea:
((1 << 20) / 2) * ((1 << 20) / 2) = 524288 * 524288 = 274877906944
Total, the unit test performs 274877906944/1048576 = 262144 times fewer checks.
When reviewing the code, you may notice such an error but there is no guarantee. You need to clearly look at how the macro is written, make substitutions in your head, and remember about the priority of operations. A static analyzer is a decent helper in detecting such errors.
To fix the code, one should add parentheses to the macro:
#define MAX_NUM (1 << 20)
If we write in C++, it is better to use constexpr. And you are welcome to read the article "Design and evolution of constexpr in C++".
By the way, if one fixes this bug, there may be another problem with this unit test. Or rather, it will turn out to be too slow. And someone will have to rewrite it.
In the next article, I will review the remaining bugs that I spotted when checking the DPDK. In the meantime, don't hesitate to download PVS-Studio and see if there is something interesting in the unit tests of your projects. If you find something worthwhile, let me know in comments.
0