Breaking down bugs in TDengine to master refactoring, part 2: stack-consuming macro

Get ready for code smells, classic errors, and typos when checking the TDengine project using PVS-Studio. Much of this is avoidable if we design code carefully from the beginning, keep the logic simple, and stay away from macros. Let's break down some code snippets and find ways to refactor them to leave bugs no room there.

In the previous post, we had a good feed of the sausage code. In this article, I'll use a new example to show why macros are dangerous and why I dislike them.

If you can do without a macro, don't use it. I'm certainly not into copy-paste programming instead of using macros :) But when you can use another language construct that suggests approximately the same functional and compact code, do it.

Look at this sorting function in the TDengine project:

void mndSortVnodeGid(SVgObj *pVgroup) {
  for (int32_t i = 0; i < pVgroup->replica; ++i) {
    for (int32_t j = 0; j < pVgroup->replica - 1 - i; ++j) {
      if (pVgroup->vnodeGid[j].dnodeId > pVgroup->vnodeGid[j + 1].dnodeId) {
        TSWAP(pVgroup->vnodeGid[j], pVgroup->vnodeGid[j + 1]);
      }
    }
  }
}

Do you see the problem?

Perhaps someone will say that using such an inefficient algorithm as bubble sort is a mistake. No, I'm not talking about the performance issue. However, bubble sort also adds to it.

So, back to the hidden surprise. It's hard to notice it on refactoring, as everything looks logical. Nevertheless, PVS-Studio analyzer issues the following warning:

V505 [CWE-770, CERT-MEM05-C] The 'alloca' function is used inside the loop. This can quickly overflow stack. mndVgroup.c 807

Why? It seems to simply change the contents of two objects. Nothing looks suspicious:

TSWAP(pVgroup->vnodeGid[j], pVgroup->vnodeGid[j + 1]);

Most likely, no one raised a red flag when reviewing this code snippet. Well, since we are breaking it down here, no one actually did it :)

Let's look into another file to see what TSWAP is. That's where we'll learn why this code is bad. It's a macro with a trap inside. Devs use a temporary buffer to exchange two strings. It's allocated on the stack using the alloca function.

#define TSWAP(a, b)                  \
  do {                               \
    char *__tmp = (char*)alloca(sizeof(a)); \
    (void)memcpy(__tmp, &(a), sizeof(a));  \
    (void)memcpy(&(a), &(b), sizeof(a));   \
    (void)memcpy(&(b), __tmp, sizeof(a));  \
  } while (0)

Description of the alloca function on the Linux manual page.

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.

The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.

Most likely, the author wanted to enhance the program performance and allocated the temporary memory buffer on the stack instead of on the heap. It's faster. Plus, it's easier since you don't have to worry about checking the received pointer for NULL and writing code to free memory. The memory on the stack is freed on its own when the function terminates. Anyway, the developer wanted to do a good job, but it turned out badly.

Two weaknesses—the algorithm and the macro—found each other. Even though it may not be a problem right now, it's definitely a potential landmine for the future. Let's dig deeper in it.

The author sorts structures of the SVnodeGid type:

typedef struct {
  int32_t    dnodeId;
  ESyncState syncState;
  int64_t    syncTerm;
  bool       syncRestore;
  bool       syncCanRead;
  int64_t    roleTimeMs;
  int64_t    startTimeMs;
  ESyncRole  nodeRole;
  int32_t    learnerProgress;
} SVnodeGid;

This structure size is 48 bytes. It's quite large, and only the small size of the sorted array saves the program from crashing:

#define TSDB_MAX_REPLICA               5
#define TSDB_MAX_LEARNER_REPLICA       10
....
typedef struct {
  ....
  SVnodeGid vnodeGid[TSDB_MAX_REPLICA + TSDB_MAX_LEARNER_REPLICA];
  ....
} SVgObj;

Now the array consists of only 15 elements. Therefore, inefficient sorting and memory allocation on the stack will almost certainly remain unnoticed.

In the worst case, bubble sort is executed in (N - 1)*N/2 elements swaps. So, in this case, the TSWAP macro will be executed (15-1)*15/2 = 105 times, taking 105 * 48 = 5040 bytes on the stack.

It's not a very big deal. Therefore, this code might have never caused stack exhaustion problems. As you may guess, this macro can consume memory on the stack if the vnodeGid array size increases.

Will it ever happen? Who knows. A good example is the Ariane 4 rocket—its creators didn't envision their code to be reused in Ariane 5, which had different technical specifications. We covered this case in more detail in the article: "A space error: 370.000.000 $ for an integer overflow."

Imagine someone needs to increase the array to at least 100 elements. Then in the worst case, ((100-1)*100/2)*48 = 237,600 bytes will be allocated on the stack. This may be the last straw to break the camel's back. Plus, the function requires a varying amount of memory depending on the input data. The program may successfully pass all test scenarios, but fail when dealing with user data.

Let's revamp the code.

At first, I wanted to reduce the size of the SVnodeGid structure by rearranging data members (see the approach from the "Lesson 23. Pattern 15. Growth of structures' sizes"). In this case, it won't work—the number of alignment bytes will remain the same, but will be placed at the end of the structure instead of at the beginning. At least, now you may learn more about this approach at the link.

Let's get rid of the macro. It can and should be replaced by a function. This eliminates the risk of running out of stack space at the worst possible moment. In C++, there is std::swap—end of story. But we are dealing with C. I suggest the following universal swap function for two elements:

void tswap(void *a, void *b, size_t size) 
{
  char *__tmp = (char*)alloca(size);
  (void)memcpy(__tmp, a, size);
  (void)memcpy(a, b, size);
  (void)memcpy(b, __tmp, size);
}

I can't call it perfect, but it's better. Some might argue that the code with a macro is more efficient — no need to call a function.

I disagree with this point.

Compared to copying structures back and forth, a function call is insignificant, so it's not worth considering.
Modern optimizing compilers may well inline a function body to a loop on their own. A compiler is very likely to make the code as efficient without our assistance.
It's not the code that we should optimize. An efficient sorting algorithm will do a lot more good. Optimization with a macro is definitely much less efficient.

The fixed code looks as follows:

void tswap(void *a, void *b, size_t size) 
{
  char *__tmp = (char*)alloca(size);
  (void)memcpy(__tmp, a, size);
  (void)memcpy(a, b, size);
  (void)memcpy(b, __tmp, size);
}

void mndSortVnodeGid(SVgObj *pVgroup) {
  for (int32_t i = 0; i < pVgroup->replica; ++i) {
    for (int32_t j = 0; j < pVgroup->replica - 1 - i; ++j) {
      if (pVgroup->vnodeGid[j].dnodeId > pVgroup->vnodeGid[j + 1].dnodeId) {
        tswap(&pVgroup->vnodeGid[j],
              &pVgroup->vnodeGid[j + 1], sizeof(SVnodeGid);
      }
    }
  }
}

Now we have to pass the structure size. Though the temporary buffer allocated in tswap is freed every time, which eliminates possible stack exhaustion.

I think, we've fixed the main problem, but bubble sort still performs unnecessary data copies. For example, in the worst case, it performs N/2 times as many swaps as selection sort. There are other more efficient algorithms. However, sorting algorithms discussion is beyond this article scope, especially in the case of a 15-elements-array.

However, I'd rather use the qsort function instead of writing a sorting algorithm myself. Here are the reasons:

The fewer reinvented wheels — the better.
Standard library implementations have already been tested many times and may also contain various optimizations.
If we use standard algorithms, we don't need our own macro or function to rearrange the elements.
We've ensured that performance won't drop in case the elements' number in the array increases. The qsort's implementation is definitely better than a self-made bubble.

int CmpSVnodeGid(const void *a, const void *b)
{
  int32_t aId = ((const SVnodeGid *)(a))->dnodeId;
  int32_t bId = ((const SVnodeGid *)(b))->dnodeId;
  if (aId < bId) return -1;
  if (aId > bId) return 1;
  return 0;
}

void mndSortVnodeGid(SVgObj *pVgroup) {
  qsort(pVgroup->vnodeGid, pVgroup->replica,
        sizeof(SVnodeGid), CmpSVnodeGid);
}

Looks nice! Efficient and scalable code, no stack-consuming, no need to use a macro. The great power of reusing off-the-shelf standard solutions.

Learn more: