Andrey Karpov

Oct 27 2020

Tags:

#Cpp #Compiler

Checking Clang 11 with PVS-Studio

Oct 27 2020

Author: Andrey Karpov

Time for rechecking Clang
11 suspicious code snippets
Conclusion
Further reading on checking compilers with PVS-Studio

Every now and then, we have to write articles about how we've checked another fresh version of some compiler. That's not really much fun. However, as practice shows, if we stop doing that for a while, folks start doubting whether PVS-Studio is worth its title of a good catcher of bugs and vulnerabilities. What if the new compiler can do that too? Sure, compilers evolve, but so does PVS-Studio – and it proves, again and again, its ability to catch bugs even in high-quality projects such as compilers.

Time for rechecking Clang

To tell you the truth, I wrote this article based on the earlier post "Checking the GCC 10 Compiler with PVS-Studio". So if some paragraphs seem familiar, it's because you've already read them before :).

It's no secret that compilers employ their own built-in static code analyzers, and those are developing as well. That's why we write articles every now and then to show that our static analyzer, PVS-Studio, can find bugs even inside compilers and that we are worth our salt :).

In fact, you can't compare classic static analyzers with compilers. Static analyzers not only detect bugs in source code but also involve a highly developed infrastructure. For one thing, it includes integration with such systems as SonarQube, Azure DevOps, Travis CI, CircleCI, GitLab CI/CD, Jenkins, and Visual Studio. It includes mechanisms for mass suppression of warnings, which allows you to start using PVS-Studio right off even in a large project. It includes sending notifications by email. And so on and so forth. But the first question developers will still ask is, "Can your PVS-Studio find anything that compilers can't?" And that means we are doomed to write articles about how we check the compilers themselves over and over again.

Let's get back to Clang. There's no need to dwell on the subject and tell you what the project is about. Actually, we checked not only the code of Clang 11 itself but also the code of the LLVM 11 library it's based on. From this article's viewpoint, it doesn't matter if a defect was found in the compiler's or the library's code.

I found the code of Clang/LLVM much clearer than that of GCC. At least it's not teeming with all those awful macros and it extensively employs C++'s modern features.

Even so, the project is still big enough to make examining the analysis report tedious without prior customization. What mostly gets in the way is "semi-false" positives. By "semi-false" positives I mean cases where the analyzer is technically correct to point out certain issues but those warnings are of no practical use. For example, a lot of such warnings refer to unit tests and generated code.

Here's an example for unit tests:

Spaces.SpacesInParentheses = false;               // <=
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("Type *A = ( Type * )P;", Spaces);
verifyFormat("Type *A = ( vector<Type *, int *> )P;", Spaces);
verifyFormat("x = ( int32 )y;", Spaces);
verifyFormat("int a = ( int )(2.0f);", Spaces);
verifyFormat("#define AA(X) sizeof((( X * )NULL)->a)", Spaces);
verifyFormat("my_int a = ( my_int )sizeof(int);", Spaces);
verifyFormat("#define x (( int )-1)", Spaces);

// Run the first set of tests again with:
Spaces.SpacesInParentheses = false;               // <=
Spaces.SpaceInEmptyParentheses = true;
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("call(x, y, z);", Spaces);
verifyFormat("call( );", Spaces);

The analyzer warns us that the variables are assigned the same values they already have:

V1048 The 'Spaces.SpacesInParentheses' variable was assigned the same value. FormatTest.cpp 11554
V1048 The 'Spaces.SpacesInCStyleCastParentheses' variable was assigned the same value. FormatTest.cpp 11556

Technically, this warning is to the point and the snippet needs simplifying or fixing. But it's also clear that this code is fine as it is and there's no point fixing anything in it.

Here's another example: the analyzer outputs a ton of warnings on the autogenerated file Options.inc. Look at the "wall" of code it contains:

This bulk of code triggers a flood of warnings:

V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 26
V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 27
V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 28
and so on – one warning per line...

Yet all that is not a big deal. It can be solved by excluding irrelevant files from analysis, marking certain macros and functions, suppressing certain diagnostic types, and so on. Yes, it can, but it's not a very interesting job to do when you write an article. That's why I did the same thing as in the article about checking the GCC compiler: I kept reading the report until I collected 11 interesting examples to include in the article. Why 11? I just thought that since it was the 11-th version of Clang, I needed 11 examples :).

11 suspicious code snippets

Snippet 1, modulo operation on 1

This is a cool one! I like bugs like that!

void Act() override {
  ....
  // If the value type is a vector, and we allow vector select, then in 50%
  // of the cases generate a vector select.
  if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 1)) {
    unsigned NumElem =
        cast<FixedVectorType>(Val0->getType())->getNumElements();
    CondTy = FixedVectorType::get(CondTy, NumElem);
  }
  ....
}

PVS-Studio diagnostic message: V1063 The modulo by 1 operation is meaningless. The result will always be zero. llvm-stress.cpp 631

The programmer is using a modulo operation to get a random value of either 0 or 1. But the value 1 seems to confuse developers and make them write the classic anti-pattern in which the modulo operation is performed on 1 instead of 2. The X % 1 operation is meaningless as it always evaluates to 0. This is the fixed version:

if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 2)) {

The recently added V1063 diagnostic is awfully simple, but, as you can see, it works perfectly.

We know that compiler developers keep an eye on our work and borrow our ideas. That's totally fine. It's nice to know that PVS-Studio is the driving force behind progress. Let's see how much it will take for a similar diagnostic to appear in Clang and GCC :).

Snippet 2, a typo in a condition

class ReturnValueSlot {
  ....
  bool isNull() const { return !Addr.isValid(); }
  ....
};

static bool haveSameParameterTypes(ASTContext &Context, const FunctionDecl *F1,
                                   const FunctionDecl *F2, unsigned NumParams) {
  ....
  unsigned I1 = 0, I2 = 0;
  for (unsigned I = 0; I != NumParams; ++I) {
    QualType T1 = NextParam(F1, I1, I == 0);
    QualType T2 = NextParam(F2, I2, I == 0);
    if (!T1.isNull() && !T1.isNull() && !Context.hasSameUnqualifiedType(T1, T2))
      return false;
  }
  return true;
}

PVS-Studio diagnostic message: V501 There are identical sub-expressions to the left and to the right of the '&&' operator: !T1.isNull() && !T1.isNull() SemaOverload.cpp 9493

The !T1.isNull() check is performed twice. This is obviously a typo; the second part of the condition must check the T2 variable.

Snippet 3, potential array-index-out-of-bounds

std::vector<Decl *> DeclsLoaded;

SourceLocation ASTReader::getSourceLocationForDeclID(GlobalDeclID ID) {
  ....
  unsigned Index = ID - NUM_PREDEF_DECL_IDS;

  if (Index > DeclsLoaded.size()) {
    Error("declaration ID out-of-range for AST file");
    return SourceLocation();
  }

  if (Decl *D = DeclsLoaded[Index])
    return D->getLocation();
  ....
}

PVS-Studio diagnostic message: V557 Array overrun is possible. The 'Index' index is pointing beyond array bound. ASTReader.cpp 7318

Suppose the array stores one element and the value of the Index variable is also 1. Then the (1 > 1) condition is false, and, therefore, the array will be indexed beyond its bounds. Here's the correct check:

if (Index >= DeclsLoaded.size()) {

Snippet 4, argument evaluation order

void IHexELFBuilder::addDataSections() {
  ....
  uint32_t SecNo = 1;
  ....
  Section = &Obj->addSection<OwnedDataSection>(
      ".sec" + std::to_string(SecNo++), RecAddr,
      ELF::SHF_ALLOC | ELF::SHF_WRITE, SecNo);
  ....
}

PVS-Studio diagnostic message: V567 Unspecified behavior. The order of argument evaluation is not defined for 'addSection' function. Consider inspecting the 'SecNo' variable. Object.cpp 1223

Note that the SecNo argument is used twice, getting incremented in the meanwhile. The problem is, you can't tell in what exact order the arguments will be evaluated. The result will, therefore, vary depending on the compiler version or compilation parameters.

Here's a synthetic example to illustrate this point:

#include <cstdio>
int main()
{
  int i = 1;
  printf("%d, %d\n", i, i++);
  return 0;
}

Depending on the compiler, this code may output either "1, 1" or "2, 1". I ran it on Compiler Explorer and got the following outputs:

when compiled with Clang 11.0.0, the program outputs 1, 1.
when compiled with GCC 10.2, the program outputs 2, 1.

Interestingly, this simple case causes Clang to issue a warning:

<source>:6:26: warning:
unsequenced modification and access to 'i' [-Wunsequenced]
printf("%d, %d\n", i, i++);

For some reason, though, this warning wasn't issued on the real code. Either it's disabled as a not very practical one or that case is too complicated for the compiler to cope with.

Snippet 5, a strange duplicate check

template <class ELFT>
void GNUStyle<ELFT>::printVersionSymbolSection(const ELFFile<ELFT> *Obj,
                                               const Elf_Shdr *Sec) {

  ....
  Expected<StringRef> NameOrErr =
      this->dumper()->getSymbolVersionByIndex(Ndx, IsDefault);
  if (!NameOrErr) {
    if (!NameOrErr) {
      unsigned SecNdx = Sec - &cantFail(Obj->sections()).front();
      this->reportUniqueWarning(createError(
          "unable to get a version for entry " + Twine(I) +
          " of SHT_GNU_versym section with index " + Twine(SecNdx) + ": " +
          toString(NameOrErr.takeError())));
    }
    Versions.emplace_back("<corrupt>");
    continue;
  }
  ....
}

PVS-Studio diagnostic message: V571 Recurring check. The 'if (!NameOrErr)' condition was already verified in line 4666. ELFDumper.cpp 4667

The second check is a clone of the first and is, therefore, redundant. Maybe it could be safely removed. But what's more likely is that it contains a typo and was meant to check some other variable.

Snippet 6, potential null pointer dereference

void RewriteObjCFragileABI::RewriteObjCClassMetaData(
  ObjCImplementationDecl *IDecl, std::string &Result)
{
  ObjCInterfaceDecl *CDecl = IDecl->getClassInterface();

  if (CDecl->isImplicitInterfaceDecl()) {
    RewriteObjCInternalStruct(CDecl, Result);
  }

  unsigned NumIvars = !IDecl->ivar_empty()
  ? IDecl->ivar_size()
  : (CDecl ? CDecl->ivar_size() : 0);
  ....
}

PVS-Studio diagnostic message: V595 The 'CDecl' pointer was utilized before it was verified against nullptr. Check lines: 5275, 5284. RewriteObjC.cpp 5275

When performing the first check, the developer never hesitates to dereference the CDecl pointer:

if (CDecl->isImplicitInterfaceDecl())

But if you look at the code a few lines further, it becomes clear that the pointer can be null:

(CDecl ? CDecl->ivar_size() : 0)

The first check was probably meant to look like this:

if (CDecl && CDecl->isImplicitInterfaceDecl())

Snippet 7, potential null pointer dereference

bool
Sema::InstantiateClass(....)
{
  ....
  NamedDecl *ND = dyn_cast<NamedDecl>(I->NewDecl);
  CXXRecordDecl *ThisContext =
      dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext());
  CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                              ND && ND->isCXXInstanceMember());
  ....
}

PVS-Studio diagnostic message: V595 The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 2803, 2805. SemaTemplateInstantiate.cpp 2803

This error is similar to the previous one. It's dangerous to dereference a pointer without a prior check when its value is acquired using a dynamic type cast. More than that, subsequent code confirms that such a check is needed.

Snippet 8, a function keeping running despite an error state

bool VerifyObject(llvm::yaml::Node &N,
                  std::map<std::string, std::string> Expected) {
  ....
  auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
  if (!V) {
    ADD_FAILURE() << KS << " is not a string";
    Match = false;
  }
  std::string VS = V->getValue(Tmp).str();
  ....
}

PVS-Studio diagnostic message: V1004 The 'V' pointer was used unsafely after it was verified against nullptr. Check lines: 61, 65. TraceTests.cpp 65

The V pointer may be a null pointer. This is obviously an error state, which is even reported with an error message. But the function will just keep running as if nothing happened and will end up dereferencing that very null pointer. The programmer probably wanted the function to stop at this point, in which case it should be fixed as follows:

auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
if (!V) {
  ADD_FAILURE() << KS << " is not a string";
  Match = false;
  return false;
}
std::string VS = V->getValue(Tmp).str();

Snippet 9, a typo

const char *tools::SplitDebugName(const ArgList &Args, const InputInfo &Input,
                                  const InputInfo &Output) {
  if (Arg *A = Args.getLastArg(options::OPT_gsplit_dwarf_EQ))
    if (StringRef(A->getValue()) == "single")
      return Args.MakeArgString(Output.getFilename());

  Arg *FinalOutput = Args.getLastArg(options::OPT_o);
  if (FinalOutput && Args.hasArg(options::OPT_c)) {
    SmallString<128> T(FinalOutput->getValue());
    llvm::sys::path::replace_extension(T, "dwo");
    return Args.MakeArgString(T);
  } else {
    // Use the compilation dir.
    SmallString<128> T(
        Args.getLastArgValue(options::OPT_fdebug_compilation_dir));
    SmallString<128> F(llvm::sys::path::stem(Input.getBaseInput()));
    llvm::sys::path::replace_extension(F, "dwo");
    T += F;
    return Args.MakeArgString(F);       // <=
  }
}

PVS-Studio diagnostic message: V1001 The 'T' variable is assigned but is not used by the end of the function. CommonArgs.cpp 873

Look at the last lines of the function. The local variable T changes but isn't used in any way. This must be a typo and the function should probably end as follows:

T += F;
return Args.MakeArgString(T);

Snippet 10, zero as the divisor

typedef int32_t si_int;
typedef uint32_t su_int;

typedef union {
  du_int all;
  struct {
#if _YUGA_LITTLE_ENDIAN
    su_int low;
    su_int high;
#else
    su_int high;
    su_int low;
#endif // _YUGA_LITTLE_ENDIAN
  } s;
} udwords;

COMPILER_RT_ABI du_int __udivmoddi4(du_int a, du_int b, du_int *rem) {
  ....
  if (d.s.low == 0) {
    if (d.s.high == 0) {
      // K X
      // ---
      // 0 0
      if (rem)
        *rem = n.s.high % d.s.low;
      return n.s.high / d.s.low;
    }
  ....
}

PVS-Studio diagnostic messages:

V609 Mod by zero. Denominator 'd.s.low' == 0. udivmoddi4.c 61
V609 Divide by zero. Denominator 'd.s.low' == 0. udivmoddi4.c 62

I don't know if this is a bug or some tricky contraption, but the code does look strange. It has two regular integer variables, one of which is divided by the other. But the interesting part is that the division operation takes place only if both variables are zeroes. What task is it supposed to accomplish?

Snippet 11, copy-paste

bool MallocChecker::mayFreeAnyEscapedMemoryOrIsModeledExplicitly(....)
{
  ....
  StringRef FName = II->getName();
  ....
  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }

  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }
  ....
}

PVS-Studio diagnostic message: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 3108, 3113. MallocChecker.cpp 3113

A code fragment was cloned but never modified afterwards. This clone should either be removed or modified to perform some useful check.

Conclusion

Remember that you can use this free-license option to check open-source projects. We provide other ways to use PVS-Studio for free as well, some of them even allowing analysis of proprietary code. See the full list of options here: "Ways to get a free PVS-Studio license". Thank you for reading!