Alexey Smolskas

May 29 2024

Tags:

#Cpp #Compiler

What errors are lurking in LLVM code?

May 29 2024

Author: Alexey Smolskas

LLVM is an open-source project with a pretty large code base. The acme in terms of code quality, considering its size and open-source nature. After all, it's the developers of compiler tools who know best about language features and their proper use. Their top-notch code is always a challenge for our analyzer, and we always accept it with pleasure.

A couple of months ago, LLVM released version 18. It's time to once again ensure its code quality. If interested, you can read our articles about previous checks here and there.

These checks are always very special for us because static analyzers operate almost the same way as compilers do when they analyze code. Compilers also leverage static analysis to issue warnings. They're almost cousins, though. However, each of them is good at their own thing. This article is proof of that. The Clang compiler, a part of LLVM, compiled our analyzer and got it working. We even have an article about the switch from MSVC to it. In return, our analyzer detected errors in the compiler. Isn't that proof of synergy?

The checked project version is LLVM 18.1.0.

Fragment N1

Here's an example of how the logical error can occur in a blanket of conditions and lead to an unreachable code.

if (Tok->is(tok::hash)) {
  // Start of a macro expansion.
  First = Tok;
  Tok = Next;
  if (Tok)
    Tok = Tok->getNextNonComment();
} else if (Tok->is(tok::hashhash)) {
  // Concatenation. Skip.
  Tok = Next;
  if (Tok)
    Tok = Tok->getNextNonComment();
} else if (Keywords.isVerilogQualifier(*Tok) ||
           Keywords.isVerilogIdentifier(*Tok)) {
  First = Tok;
  Tok = Next;
  // The name may have dots like `interface_foo.modport_foo`.
  while (Tok && Tok->isOneOf(tok::period, tok::coloncolon) &&
         (Tok = Tok->getNextNonComment())) {
    if (Keywords.isVerilogIdentifier(*Tok))
      Tok = Tok->getNextNonComment();
  }
} else if (!Next) {
  Tok = nullptr;
} else if (Tok->is(tok::l_paren)) {
  // Make sure the parenthesized list is a drive strength. Otherwise the
  // statement may be a module instantiation in which case we have already
  // found the instance name.
  if (Next->isOneOf(
          Keywords.kw_highz0, Keywords.kw_highz1, Keywords.kw_large,
          Keywords.kw_medium, Keywords.kw_pull0, Keywords.kw_pull1,
          Keywords.kw_small, Keywords.kw_strong0, Keywords.kw_strong1,
          Keywords.kw_supply0, Keywords.kw_supply1, Keywords.kw_weak0,
          Keywords.kw_weak1)) {
    Tok->setType(TT_VerilogStrength);
    Tok = Tok->MatchingParen;
    if (Tok) {
      Tok->setType(TT_VerilogStrength);
      Tok = Tok->getNextNonComment();
    }
  } else {
    break;
  }
} else if (Tok->is(tok::hash)) {
  if (Next->is(tok::l_paren))
    Next = Next->MatchingParen;
  if (Next)
    Tok = Next->getNextNonComment();
}

The PVS-Studio warning:

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 3016, 3058. TokenAnnotator.cpp

Let's take a look more precisely. In the first condition, we have the Tok->is(tok::hash) check. In the last condition, we have the same one in the else if statement. However, the token we're working with doesn't change. So, in the last else if, the code will never be executed. It's that critical here because these conditions contain different code.

if (Tok->is(tok::hash)) {
  // Start of a macro expansion.
  First = Tok;
  Tok = Next;
  if (Tok)
    Tok = Tok->getNextNonComment();
} else
....
else if (Tok->is(tok::hash)) {
  if (Next->is(tok::l_paren))
    Next = Next->MatchingParen;
  if (Next)
    Tok = Next->getNextNonComment();
}

It might be better if we change the statement to switch, and it might help developers notice the error. It's a matter of taste, though.

Fragment N2

This code snippet is pretty riveting, and I can't help but share it with you:

assert(bArgs.size() == reduc.size() + needsUniv ? 1 : 0);

The analyzer warnings:

V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '+' operator. LoopEmitter.cpp 983
V502 Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '+' operator. LoopEmitter.cpp 1039

There are two warnings because the code fragment occurs in two different places.

The analyzer indicates that developers may have used the ternary operator incorrectly. Let's get into it.

First, let's refresh on the operator precedence. Operators are listed in descending order of precedence: the + operator, the == operator, the ternary operator. I should note that the needsUniv variable is of the bool type.

So, the results of reduc.size() and needsUniv will be added up first. However, needsUniv implicitly casts to the size_t type. Then, the addition result will be compared to the result of bArgs.size(). Then the ternary operator will be executed, and it will return either 1 or 0.

That's kind of odd. I think developers probably meant to write such code:

assert(bArgs.size() == reduc.size() + (needsUniv ? 1 : 0));

In such a case, the ternary operator will be executed first and will return either 1 or 0. Then this value will be added up to the result of reduc.size() and compared to the result of bArgs.size().

A curious fact: in the first and second cases, we'll get the same result.

A more curious fact: devs could have written the code like this:

assert(bArgs.size() == reduc.size() + needsUniv)

Here, the result would be the same but without the redundant ternary operator.

All in all, it's an peculiar case when the code is written incorrectly but still works.

Fragment N3

Here's a rather interesting use of the postfix increment. Watch out for the second argument of the custom Printf function called on the strm object:

static void DumpTargetInfo(uint32_t target_idx, Target *target,
                           const char *prefix_cstr,
                           bool show_stopped_process_status, Stream &strm) 
{
  ....
  uint32_t properties = 0;
  if (target_arch.IsValid()) 
  {
    strm.Printf("%sarch=", properties++ > 0 ? ", " : " ( ");
    target_arch.DumpTriple(strm.AsRawOstream());
    properties++;
  }
}

The analyzer warning:

V547 Expression 'properties ++ > 0' is always false. CommandObjectTarget.cpp:100

As far as I can say, developers may have intended to compare properties against zero and avoid incrementing the properties variable and superfluous code, so they decided to use the postfix increment right away.

Thus, the previous value of the variable would have been compared, but it'd still be incremented by 1. However, it's unclear why devs decided to increment it further.

Then we gain the insight that the ternary operator isn't needed here at all. This snippet might have been in a loop at some point. Feel free to share your guesses in the comments.

Fragments N4-8

Let's take a look at the following code and the PVS-Studio warning:

bool areStatementsIdentical(const Stmt *FirstStmt, const Stmt *SecondStmt,
                            const ASTContext &Context, bool Canonical) 
{
  ....
 
  if (FirstStmt->getStmtClass() != FirstStmt->getStmtClass())
    return false;
  ....
}

The PVS-Studio warning:

V501 There are identical sub-expressions 'FirstStmt->getStmtClass()' to the left and to the right of the '!=' operator. ASTUtils.cpp:99

It may seem like a drop in the bucket. Mixing FirstStmt and SecondStmt up? Whatever.

Then we encounter the following code:

static bool sameFunctionParameterTypeLists(Sema &S,
                                           const OverloadCandidate &Cand1,
                                           const OverloadCandidate &Cand2) {
  if (!Cand1.Function || !Cand2.Function)
    return false;

  FunctionDecl *Fn1 = Cand1.Function;
  FunctionDecl *Fn2 = Cand2.Function;

  if (Fn1->isVariadic() != Fn1->isVariadic())
    return false;
  ....
}

The analyzer warning:

V501 There are identical sub-expressions to the left and to the right of the '!=' operator: Fn1->isVariadic() != Fn1->isVariadic(). SemaOverload.cpp:10190

We think, "Well, it's wrong again. Whatever."

Then we see the code like this:

if (G1->Rank < G1->Rank)
  G1->Group = G2;
else {
  G2->Group = G1;
}

The analyzer warning:

V501 There are identical sub-expressions to the left and to the right of the '<' operator: G1->Rank < G1->Rank. SCCIterator.h:285

We're starting to have doubts.

And then the code snippet jumps out:

ValueBoundsConstraintSet::areOverlappingSlices(MLIRContext *ctx,
                                               HyperrectangularSlice slice1,
                                               HyperrectangularSlice slice2) {
  assert(slice1.getMixedOffsets().size() == slice1.getMixedOffsets().size()
      && "expected slices of same rank");
  assert(slice1.getMixedSizes().size()   == slice1.getMixedSizes().size() 
      && "expected slices of same rank");
  assert(slice1.getMixedStrides().size() == slice1.getMixedStrides().size() 
      && "expected slices of same rank");
  ....
}

The analyzer warnings:

V501 There are identical sub-expressions 'slice1.getMixedOffsets().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:581
V501 There are identical sub-expressions 'slice1.getMixedSizes().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:583
V501 There are identical sub-expressions 'slice1.getMixedStrides().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:585

Here's another one almost the same:

ValueBoundsConstraintSet::areEquivalentSlices(MLIRContext *ctx,
                                              HyperrectangularSlice slice1,
                                              HyperrectangularSlice slice2) {
  assert(slice1.getMixedOffsets().size() == slice1.getMixedOffsets().size() 
      && "expected slices of same rank");
  assert(slice1.getMixedSizes().size()   == slice1.getMixedSizes().size() 
      && "expected slices of same rank");
  assert(slice1.getMixedStrides().size() == slice1.getMixedStrides().size() 
      && "expected slices of same rank");
  ....
}

The PVS-Studio warnings:

V501 There are identical sub-expressions 'slice1.getMixedOffsets().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:646
V501 There are identical sub-expressions 'slice1.getMixedSizes().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:648
V501 There are identical sub-expressions 'slice1.getMixedStrides().size()' to the left and to the right of the '==' operator. ValueBoundsOpInterface.cpp:650

We wonder, "Wow, how many bugs can quietly live in code?" These errors can cause something to fall off or not operate as intended.

By the way, when I see all these Fn1, G1, slice1, my colleague's article comes to mind: "Zero, one, two, Freddy's coming for you".

I've saved these similar warnings for the last:

V501 There are identical sub-expressions 'EltRange.getEnd() >= Range.getEnd()' to the left and to the right of the '||' operator. HTMLLogger.cpp:421
V501 There are identical sub-expressions 'SrcExpr.get()->containsErrors()' to the left and to the right of the '||' operator. SemaCast.cpp:2938
V501 There are identical sub-expressions 'ND->getDeclContext()' to the left and to the right of the '!=' operator. SemaDeclCXX.cpp:4391
V501 There are identical sub-expressions 'DepType != OMPC_DOACROSS_source' to the left and to the right of the '&&' operator. SemaOpenMP.cpp:24348
V501 There are identical sub-expressions '!OldMethod->isStatic()' to the left and to the right of the '&&' operator. SemaOverload.cpp:1425
V501 There are identical sub-expressions 'lldb::eTypeClassUnion' to the left and to the right of the '|' operator. JSONUtils.cpp:139
V501 There are identical sub-expressions to the left and to the right of the '&&' operator: !BFI &&!BFI. JumpThreading.cpp:2531
V501 There are identical sub-expressions 'BI->isConditional()' to the left and to the right of the '&&' operator. VPlanHCFGBuilder.cpp:401
V501 There are identical sub-expressions to the left and to the right of the '==' operator: getNumRows() == getNumRows(). Simplex.cpp:108

Fragment N9

Here's another interesting code snippet. Try to figure out for yourself where the error lies:

const Expr *CGOpenMPRuntime::getNumTeamsExprForTargetDirective(
    CodeGenFunction &CGF, const OMPExecutableDirective &D, int32_t &MinTeamsVal,
    int32_t &MaxTeamsVal) 
{
  ....
  if (isOpenMPParallelDirective(NestedDir->getDirectiveKind()) ||
          isOpenMPSimdDirective(NestedDir->getDirectiveKind())) {
    MinTeamsVal = MaxTeamsVal = 1;
    return nullptr;
  }
  MinTeamsVal = MaxTeamsVal = 1;
  return nullptr;
  ....
}

Now let's see what the analyzer issues.

The analyzer warning:

V523 The 'then' statement is equivalent to the subsequent code fragment. CGOpenMPRuntime.cpp:6040, 6036

It shows that the code after if is exactly the same as in the then branch. Therefore, either the check or the code part after it is unnecessary.

Fragment N10

The following snippet looks that weird:

explicit MapLattice(Container C) { C = std::move(C); }

The PVS-Studio warning:

V570 The 'C' variable is assigned to itself. MapLattice.h:52

It's the reason, guess. Inside the body of the MapLattice class constructor, we can see the shadowing of a non-static field. The field has the same name as the parameter. In this fragment, devs forgot to explicitly set this to the left of the assignment operator.

Just a small fix and the code will operate as it should:

explicit MapLattice(Container C) { this->C = std::move(C); }

Although, IMHO, it'd be much neater to use the constructor initialization list:

explicit MapLattice(Container C) : C { std::move(C) } {};

In this case, there is no shadowing because of the name lookup rules (click, click).

Overall, you can add a prefix or postfix to the names of private fields. It makes it easier to distinguish them from parameters in code.

Fragment N11

Do you ever feel like you've forgotten something? Like, you can't remember if you've brought your keys with you or not. You grope your pockets but can't find them.

So, in the following fragment, it looks like devs have forgotten to use the function result:

ScalarEvolution::getRangeRefIter(const SCEV *S,
                                 ScalarEvolution::RangeSignHint SignHint) 
{
  ....
  for (const SCEV *P : reverse(drop_begin(WorkList))) {
    getRangeRef(P, SignHint);
    ....
  }
  ....
}

The analyzer warning:

V530 The return value of function 'getRangeRef' is required to be utilized. ScalarEvolution.cpp:6587

You may notice that the result of getRangeRef isn't used.

Here's the function signature, which validates that there is an operation result:

const ConstantRange &getRangeRef(const SCEV *S, RangeSignHint Hint,
                                 unsigned Depth = 0);

Further, we can see that the result of the signature is used throughout the code.

However, it's not that simple. Sometimes your senses mislead you, and it turns out that the keys are right in your hand all along.

The code snippet has the comment:

// Use getRangeRef to compute ranges for items in the worklist in reverse
// order. This will force ranges for earlier operands to be computed before
// their users in most cases.

Developers may have written this code deliberately. However, if we use the get function in this way and make it multipurpose, it's a bad programming practice. It may mislead other developers.

Thus, the analyzer formally turns out to be right, but in fact there is no error here. In such cases, we have ways to suppress certain warnings. For example, using a comment in the code:

for (const SCEV *P : reverse(drop_begin(WorkList))) {
    getRangeRef(P, SignHint); //-V530

Fragment N12

A quest for the most attentive ones: find three differences between the code in the if and else branches. Okay, find at least one:

case OptionParser::eOptionalArgument:
  if (OptionParser::GetOptionArgument() != nullptr) {
    option_element_vector.push_back(OptionArgElement(
        opt_defs_index,
        FindOriginalIndex(dummy_vec[OptionParser::GetOptionIndex() - 2],
                          args),
        FindOriginalIndex(dummy_vec[OptionParser::GetOptionIndex() - 1],
                          args)));
  } else {
    option_element_vector.push_back(OptionArgElement(
        opt_defs_index,
        FindOriginalIndex(dummy_vec[OptionParser::GetOptionIndex() - 2],
                          args),
        FindOriginalIndex(dummy_vec[OptionParser::GetOptionIndex() - 1],
                          args)));
  }

And you're absolutely right. They aren't any.

The analyzer warning:

V523 The 'then' statement is equivalent to the 'else' statement. Options.cpp 1212

Conclusion

All good things come to an end. But that's not about this article. The big project begets the big article, or even better, two articles. I've decided to go the second way. Next, we'll dive into some pretty serious bugs (spoiler: they're related to UB).

Even the pros make mistakes—and not just in code. What to say about ordinary users. However, mistakes are no reason to get frustrated. Mistakes are just an opportunity to get better and grow. You can tell this to your team lead every time something breaks in the prod.

To avoid any errors in the code, you may try new methods of searching for them. For example, you may use dynamic and static code analysis tools in addition to tests and code review.

Wondering what errors are lurking in your code? Check your project for free!

#Cpp #Compiler