Breaking down bugs in TDengine to master refactoring, part 3: price of laziness

31 Mar 2025

Author: Andrey Karpov

Get ready for code smells, classic errors, and typos when checking the TDengine project using PVS-Studio. Developers could've prevented many of them if they had carefully designed the code in the first place, kept the logic simple, and avoided macros. Let's break down some of these errors and figure out how refactoring can help eliminate them completely.

This time, we'll talk about writing code using the copy-paste method. On the one hand, programmers know that copying code and modifying it later can lead to errors and typos. On the other hand, writing similar code from scratch every time is tedious and inefficient. The key is finding the right balance—something that's hard to define and only truly comes with experience.

I've never advocated for completely abandoning code copying. It sounds nice in theory, but it's unrealistic. However, once you see the following code, I think you'll agree that this is a case where copy-paste has gone way beyond reasonable limits.

This fragment from the TDengine project brings back memories of the first note about the sausage code.

Even shortening the code lines doesn't help here, so it's still difficult to spot an error.

int32_t mndSplitVgroup(SMnode *pMnode, SRpcMsg *pReq,
                       SDbObj *pDb, SVgObj *pVgroup)
{
  ....
  mInfo("vgId:%d, vgroup info after split, replica:"
        "%d hashrange:[%u, %u] vnode:0 dnode:%d",
        newVg1.vgId, newVg1.replica,
        newVg1.hashBegin, newVg1.hashEnd, newVg1.vnodeGid[0].dnodeId);
  for (int32_t i = 0; i < newVg1.replica; ++i) {
    mInfo("vgId:%d, vnode:%d dnode:%d",
          newVg1.vgId, i, newVg1.vnodeGid[i].dnodeId);
  }
  mInfo("vgId:%d, vgroup info after split, replica:%d hashrange:"
        "[%u, %u] vnode:0 dnode:%d", newVg2.vgId, newVg2.replica,
        newVg2.hashBegin, newVg2.hashEnd, newVg2.vnodeGid[0].dnodeId);
  for (int32_t i = 0; i < newVg1.replica; ++i) {
    mInfo("vgId:%d, vnode:%d dnode:%d",
          newVg2.vgId, i, newVg2.vnodeGid[i].dnodeId);
  }
  ....
}

Can you see the error? Okay, don't strain your eyes.

The PVS-Studio warning:

V778 Two similar code fragments were found. Perhaps, this is a typo and 'newVg2' variable should be used instead of 'newVg1'. mndVgroup.c 3270

The error lies in the second loop:

for (int32_t i = 0; i < newVg1.replica; ++i) {

It's a copy of the first loop where developers forgot to replace newVg1 with newVg2. Due to this error, either some information from the newVg2 object won't be printed, or there'll be an array overrun.

We can trace the bug's origin. First, developers wrote the code to print the values of the newVg1 object. Then, they repeated it for newVg2, but instead of writing it properly, they took the lazy route—copying the existing code and just replacing every 1 with a 2.

And as we know, where there's one and two, Freddy's coming for you :)

The issue is that in this code block...

mInfo("vgId:%d, vgroup info after split, replica:"
      "%d hashrange:[%u, %u] vnode:0 dnode:%d",
      newVg1.vgId, newVg1.replica,
      newVg1.hashBegin, newVg1.hashEnd, newVg1.vnodeGid[0].dnodeId);
for (int32_t i = 0; i < newVg1.replica; ++i) {
  mInfo("vgId:%d, vnode:%d dnode:%d",
        newVg1.vgId, i, newVg1.vnodeGid[i].dnodeId);
}

...we need to change 1 to 2 eight times! This is a lot. Some things are easy to overlook, and that's what actually happened.

Time-saving (or laziness) in this case leads to the error, which may take much more time to find and fix. How much time did the programmer save? Five minutes.

What are the hypothetical consequences? Let's say a user encounters an array overrun and submits a core dump. This sets off a ripple effect: communicating with tech support, creating an issue for a bug, finding and fixing it, testing everything, rebuilding, releasing an updated version, and so on. This process alone can take hours—even in an optimistic scenario, where the user's report immediately points to where the bug is hiding. Sometimes tracking down such defects can take significantly longer as developers struggle to understand what's crashing, how, and why it's happening on the user's side. The result is wasted resources and a tarnished reputation.

All in all, copy-paste is like a knife—a dangerous thing—yet we'll continue to use both knives and copy-paste. The real question is: where do we set the limit?

There's no strict rule on how many lines or replacements should trigger the decision to stop copying the code block but write a function—it takes a certain level of intuition.

In this case, it's clear that we need to write a function. There are simply too many lines and replacements. Let's refactor the code:

void PrintVGroupInfo(const SVgObj *vg)
{
  mInfo("vgId:%d, vgroup info after split, replica:"
        "%d hashrange:[%u, %u] vnode:0 dnode:%d",
        vg->vgId, vg->replica,
        vg->hashBegin, vg->hashEnd, vg->vnodeGid[0].dnodeId);
  for (int32_t i = 0; i < vg->replica; ++i) {
    mInfo("vgId:%d, vnode:%d dnode:%d",
          vg->vgId, i, vg->vnodeGid[i].dnodeId);
  }
}
....
int32_t mndSplitVgroup(SMnode *pMnode, SRpcMsg *pReq,
                       SDbObj *pDb, SVgObj *pVgroup)
{
  ....
  PrintVGroupInfo(&newVg1);
  PrintVGroupInfo(&newVg2);
  ....
}

Profit:

There's just no room for a typo now, which is the most important part.
The overly long mndSplitVgroup function has been shortened, making the code more compact overall.

The conclusion. Keep your code clean. Beautiful code tends to be higher quality and more secure. In this case, the code with the function looks cleaner—take this as a sign to make the change.

Additional links: