Andrey Karpov

Nov 20 2024

Tags:

#Cpp #Compiler

19 errors in LLVM 19

Nov 20 2024

Author: Andrey Karpov

Previous LLVM checks
Found errors
Conclusions
P.S. Webinars on parsing C++ code

The PVS-Studio static analyzer can find errors even in such a high-quality and well-tested project as LLVM. To make sure it's not empty words, we occasionally check the project and post notes about the checks just like this one.

Previous LLVM checks

LLVM is a software infrastructure project for creating compilers and their associated utilities. It's a set of high-level language compilers, optimization system, interpretation and compilation into machine code. It is written in a modern version of C++ language. The project is used by a huge number of users and is well tested.

Once in a while we check this project to demonstrate what PVS-Studio is capable of. Previous posts:

These articles demonstrate well the usefulness of the static code analysis methodology. However, this has nothing to do with improving the quality and reliability of a software project. Static analyzers should be used regularly during the development process to help detect bugs early on.

If I understand correctly, LLVM is already regularly checked with Coverity Scan Static Analysis and Clang Static Analyzer. PVS-Studio would look great next to the above tools :) That would be a horse of a different colour!

Found errors

When writing the article, I wanted to share some juicy errors with you. At the same time, I didn't want to snow you under with the number of warnings. Anyway, it's hard to study the warnings issued on unfamiliar code, so I decided to stop by writing out 19 errors. 19 seems to be the right number for the 19th version of the project. Of course, we can detect a way more errors.

Bug 1. Misplaced closing parenthesis

First, let's take a look at the prototype of the isStoreUsed function. Note that the last argument is optional.

bool isStoreUsed(const FrameIndexEntry &StoreFIE, ExprIterator Candidates,
                 bool IncludeLocalAccesses = true) const;

There are places where this function is used correctly:

if (SRU.isStoreUsed(*FIEX,
    Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB),
    /*IncludeLocalAccesses=*/false)) {

But at some point, the programmer's hand slipped and the parenthesis happened to be in the wrong place. Well, no one is immune to such mistakes.

void CalleeSavedAnalysis::analyzeSaves() {
  ....
  // If this stack position is accessed in another function, we are
  // probably dealing with a parameter passed in a stack -- do not mess
  // with it
  if (SRU.isStoreUsed(*FIE,
                      Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB)),
      /*IncludeLocalAccesses=*/false) {
    BlacklistedRegs.set(FIE->RegOrImm);
    CalleeSaved.reset(FIE->RegOrImm);
    Prev = &Inst;
    continue;
  }
  ....
}

The PVS-Studio analyzer generates two warnings for this fragment:

V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. ShrinkWrapping.cpp 80
V639 Consider inspecting the expression for 'isStoreUsed' function call. It is possible that one of the closing ')' parentheses was positioned incorrectly. ShrinkWrapping.cpp 80

Delightful error. The code compiles because of the last optional argument of the function. It is actually equivalent to this:

SRU.isStoreUsed(*FIE,
                Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB));
if (/*IncludeLocalAccesses=*/false) {

It is not so easy to spot such an error on a code review. The PVS-Studio analyzer can be a good helper in detecting such typos.

Bug 2. Confusion in lowerBound/upperBound names

FailureOr<ConstantOrScalableBound>
ScalableValueBoundsConstraintSet::computeScalableBound(....)
{
  ....
  AffineMap bound = [&] {
    if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
        lowerBound[0] == lowerBound[0]) {                               // <=
      return lowerBound[0];
    } else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
      return lowerBound[0];
    } else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
      return upperBound[0];
    }
    return AffineMap{};
  }();
  ....
}

PVS-Studio warning:

V501 There are identical sub-expressions to the left and to the right of the '==' operator: lowerBound[0] == lowerBound[0] ScalableValueBoundsConstraintSet.cpp 110

A simple typo when working with similar array names. I'm not sure how to describe it more, so we'll move on to the next one.

Bug 3. 0, 1, 2

For those who haven't read it, check out the note "Zero, one, two, Freddy's coming for you".

Value buildTernaryFn(TernaryFn ternaryFn, Value arg0, Value arg1,
                     Value arg2) {
  bool headBool =
    isInteger(arg0) && arg0.getType().getIntOrFloatBitWidth() == 1;
  bool tailFloatingPoint =
    isFloatingPoint(arg0) && isFloatingPoint(arg1) && isFloatingPoint(arg2);
  bool tailInteger = isInteger(arg0) && isInteger(arg1) && isInteger(arg1);
  OpBuilder::InsertionGuard g(builder);
  builder.setInsertionPointToEnd(&block);
  ....
}

PVS-Studio warning:

V501 There are identical sub-expressions 'isInteger(arg1)' to the left and to the right of the '&&' operator. LinalgOps.cpp 502

The last call to the function reuses arg1:

bool tailInteger = isInteger(arg0) && isInteger(arg1) && isInteger(arg1);

The code is hard to read, which is the reason for the typo. Perhaps one can avoid the error by writing the code in a more "sparse" and aligned way:

bool tailFloatingPoint =    isFloatingPoint(arg0)
                         && isFloatingPoint(arg1)
                         && isFloatingPoint(arg2);
bool tailInteger =    isInteger(arg0)
                   && isInteger(arg1)
                   && isInteger(arg1);

In case of tabular formatting, the error will be easier to notice during code review.

Bug 4-6. More similar typos

StringRef ARMAsmParser::splitMnemonic(StringRef Mnemonic, ....) {
  ....
  if (isMnemonicVPTPredicable(Mnemonic, ExtraToken) && Mnemonic != "vmovlt" &&
      Mnemonic != "vshllt" && Mnemonic != "vrshrnt" && Mnemonic != "vshrnt" &&
      Mnemonic != "vqrshrunt" && Mnemonic != "vqshrunt" &&
      Mnemonic != "vqrshrnt" && Mnemonic != "vqshrnt" && Mnemonic != "vmullt" &&
      Mnemonic != "vqmovnt" /*** <= ***/ && Mnemonic != "vqmovunt" &&
      Mnemonic != "vqmovnt" /*** <= ***/ && Mnemonic != "vmovnt" &&
      Mnemonic != "vqdmullt" && Mnemonic != "vpnot" &&
      Mnemonic != "vcvtt" && Mnemonic != "vcvt") {
  ....
}

PVS-Studio warning:

V501 There are identical sub-expressions 'Mnemonic != "vqmovnt"' to the left and to the right of the '&&' operator. ARMAsmParser.cpp 6633

template <typename T>
LIBC_NAMESPACE::cpp::enable_if_t<LIBC_NAMESPACE::cpp::is_floating_point_v<T>,
                                 bool>
ValuesEqual(T x1, T x2) {
  LIBC_NAMESPACE::fputil::FPBits<T> bits1(x1);
  LIBC_NAMESPACE::fputil::FPBits<T> bits2(x2);
  // If either is NaN, we want both to be NaN.
  if (bits1.is_nan() || bits2.is_nan())
    return bits2.is_nan() && bits2.is_nan();         // <=

  // For all other values, we want the values to be bitwise equal.
  return bits1.uintval() == bits2.uintval();
}

PVS-Studio warning:

V501 There are identical sub-expressions to the left and to the right of the '&&' operator: bits2.is_nan() && bits2.is_nan() Compare.h 23

bool OpenACCClauseWithVarList::classof(const OpenACCClause *C) {
  return OpenACCPrivateClause::classof(C) ||
         OpenACCFirstPrivateClause::classof(C) ||
         OpenACCDevicePtrClause::classof(C) ||        // <=
         OpenACCDevicePtrClause::classof(C) ||        // <=
         OpenACCAttachClause::classof(C) || OpenACCNoCreateClause::classof(C) ||
         OpenACCPresentClause::classof(C) || OpenACCCopyClause::classof(C) ||
         OpenACCCopyInClause::classof(C) || OpenACCCopyOutClause::classof(C) ||
         OpenACCReductionClause::classof(C) || OpenACCCreateClause::classof(C);
}

PVS-Studio warning:

V501 There are identical sub-expressions 'OpenACCDevicePtrClause::classof(C)' to the left and to the right of the '||' operator. OpenACCClause.cpp 31

It seems simple when these bugs have already been found. But at the same time, no one noticed them before the analyzer. Love static code analysis!

Bug 7. Pointless pointer check

isl_size isl_local_space_var_offset(....)
{
  isl_space *space;

  space = isl_local_space_peek_space(ls);
  if (space < 0)
    return isl_size_error;
  ....
}

PVS-Studio warning:

V503 This is a nonsensical comparison: pointer < 0. isl_local_space.c 257

Checking the pointer to see if it is negative makes no sense. The behavior of such an operation is unspecified. There should be either a comparison for nullptr, or another check at all.

Bug 8. Dangerous shift of literal 1

static Expected<uint64_t>
GeneratePerfEventConfigValue(bool enable_tsc,
                             std::optional<uint64_t> psb_period) {
  uint64_t config = 0;
  ....
  if (Expected<uint32_t> offset = ReadIntelPTConfigFile(
          kTSCBitOffsetFile, IntelPTConfigFileType::BitOffset))
    config |= 1 << *offset;
  ....
}

PVS-Studio warning:

V629 Consider inspecting the '1 << * offset' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. IntelPTSingleBufferTrace.cpp 153

The analyzer generated a number of V629 warnings for certain expressions. In them, a value of int type is shifted and the result is used together with 64-bit values. The code fragment above is not necessarily incorrect but is very suspicious. As well as other fragments pointed by the analyzer.

The mask is formed in the 64-bit config variable. But this shifts literal 1, which is of type int. Consequently, it will not be possible to correctly set the high bits in the config if needed. Perhaps this code works because high bits do not need to be set yet. Anyway, it is more correct to write it this way:

config |= (uint64_t)(1) << *offset;
// or this way
config |= 1ULL << *offset;

Bug 9. Incorrect use of unique_ptr

std::unique_ptr<OptionDefinition> m_options_definition_up;

Status SetOptionsFromArray(StructuredData::Dictionary &options) {
  Status error;
  m_num_options = options.GetSize();
  m_options_definition_up.reset(new OptionDefinition[m_num_options]);
  ....
}

PVS-Studio warning:

V554 Incorrect use of unique_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. CommandObjectCommands.cpp 1384

The m_options_definition_up smart pointer is meant to store a single element. But a pointer to an array is placed there. Here is the result - undefined behavior at the time of object destruction. Read more here "Why do arrays have to be deleted via delete[] in C++".

To fix the bug, one just needs to add [] when declaring a smart pointer:

std::unique_ptr<OptionDefinition []> m_options_definition_up;

Bug 10. Incorrect index check

std::vector<lldb::addr_t> m_indexed_isa_cache;

bool AppleObjCRuntimeV2::NonPointerISACache::EvaluateNonPointerISA(
    ObjCISA isa, ObjCISA &ret_isa) {
  ....
  uintptr_t index = (isa & m_objc_debug_indexed_isa_index_mask) >>
                     m_objc_debug_indexed_isa_index_shift;
  ....
  // If the index is still out of range then this isn't a pointer.
  if (index > m_indexed_isa_cache.size())
    return false;

  LLDB_LOGF(log, "AOCRT::NPI Evaluate(ret_isa = 0x%" PRIx64 ")",
            (uint64_t)m_indexed_isa_cache[index]);

  ret_isa = m_indexed_isa_cache[index];
  ....
}

PVS-Studio warnings:

V557 Array overrun is possible. The 'index' index is pointing beyond array bound. AppleObjCRuntimeV2.cpp 3284
V557 Array overrun is possible. The 'index' index is pointing beyond array bound. AppleObjCRuntimeV2.cpp 3287

Classic antipattern: using an incorrect operator to check an index for array overrun. The function execution is aborted if the index is greater than the array size:

if (index > m_indexed_isa_cache.size())
  return false;

This is incorrect because the array elements are numbered from zero. If it turns out that index is equal to m_indexed_isa_cache.size(), an array overrun will occur (Off-by-one Error).

Correct code with >=:

if (index >= m_indexed_isa_cache.size())
  return false;

Here is a tricky thing. Since I encounter this antipattern often, I've got used to it. Once I almost overlooked an outstanding bug in a DPDK project because of it. Rather, I lost freshness of the vision and referred the bug to that common antipattern. In fact, everything turned out to be much more mind-bending than usually—"Most striking error I found with PVS-Studio in 2024".

Bug 11. The variable is assigned to itself

template <typename Key, typename ElementLattice> class MapLattice {
  using Container = llvm::DenseMap<Key, ElementLattice>;
  Container C;
public:
  ....
  explicit MapLattice(Container C) { C = std::move(C); }
  ....
}

PVS-Studio warning:

V570 The 'C' variable is assigned to itself. MapLattice.h 52

A class data member and a function parameter have the same names. Such naming has gone awry. The constructor doesn't write the required data to the container. Options to fix the code:

explicit MapLattice(Container C) { this->C = std::move(C); }

// Or use another variable name
explicit MapLattice(Container src) { C = std::move(src); }

// It is even more elegant to use the initialization list of the constructor
explicit MapLattice(Container C) : C { std::move(C) } {};

P.S. Reading this note's draft, my colleague said that we have already described that error in the previous article. But here it is. I wish edits after our checks could be made quicker. But in fact it's usually a looong wait :)

Bug 12-13. Potential memory leaks

std::unique_ptr<Module> parseMIR(....) {
  ....
  MachineModuleInfoWrapperPass *MMIWP =
    new MachineModuleInfoWrapperPass(&TM);
  if (MIR->parseMachineFunctions(*M, MMIWP->getMMI()))
    return nullptr;
  PM.add(MMIWP);

  return M;
}

PVS-Studio warning:

V773 The function was exited without releasing the 'MMIWP' pointer. A memory leak is possible. LiveIntervalTest.cpp 74

Suppose, the parseMachineFunctions function returns an error status. Then no delete operator will be called for the MMIWP pointer when exiting the parseMIR function.

Similar case:

Value *LLParser::PerFunctionState::getVal(const std::string &Name, Type *Ty,
                                          LocTy Loc) {
  ....
  Value *FwdVal;
  if (Ty->isLabelTy()) {
    FwdVal = BasicBlock::Create(F.getContext(), Name, &F);
  } else {
    FwdVal = new Argument(Ty, Name);
  }
  if (FwdVal->getName() != Name) {
    P.error(Loc, "name is too long which can result in name collisions, "
                 "consider making the name shorter or "
                 "increasing -non-global-value-max-name-size");
    return nullptr;
  }

  ForwardRefVals[Name] = std::make_pair(FwdVal, Loc);
  return FwdVal;
}

PVS-Studio warning:

V773 The function was exited without releasing the 'FwdVal' pointer. A memory leak is possible. LLParser.cpp 3625

In general, it's strange to see manual memory handling in LLVM. Errors are attracted to such code like bees to honey...

Similar errors... open icon

V773 The function was exited without releasing the 'symbol_vendor' pointer. A memory leak is possible. SymbolVendorWasm.cpp 112
V773 The function was exited without releasing the 'symbol_vendor' pointer. A memory leak is possible. SymbolVendorELF.cpp 114
V773 The function was exited without releasing the 'dynamic_checkers' pointer. A memory leak is possible. ClangExpressionParser.cpp 1432
V773 The function was exited without releasing the 'ctx' pointer. A memory leak is possible. Driver.cpp 189
There were other warnings, but I didn't look into them as they are of the same-type (boring). The above example is enough for this article.

Bug 14. || and && seem to be messed up

Error LinuxKernelRewriter::readORCTables() {
  ....
  MCInst *Inst = BF->getInstructionAtOffset(Offset);
  if (!Inst) {
    Inst = BF->getInstructionContainingOffset(Offset);
    if (Inst || BC.MIB->hasAnnotation(*Inst, "AltInst"))    // <=
      continue;

    return createStringError(
      errc::executable_format_error,
      "no instruction at address 0x%" PRIx64 " in .orc_unwind_ip", IP);
  }
  ....
}

PVS-Studio warning:

V522 Dereferencing of the null pointer 'Inst' might take place. LinuxKernelRewriter.cpp 583

If it comes to evaluating the right operand of the logic OR, the BF pointer is nullptr and it is safely dereferenced. The check makes sense if we replace || with &&:

if (Inst && BC.MIB->hasAnnotation(*Inst, "AltInst"))
  continue;

Bug 15-16. Code bloat

There are quite heavy objects in header files, such as these:

static constexpr llvm::StringLiteral kNoRegister("%noreg");

PVS-Studio warning:

V1043 A global object variable 'kNoRegister' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. LlvmState.h 28

static const std::vector<int64_t> InstructionsMappingShape{
    1, NumberOfInterferences, ModelMaxSupportedInstructionCount};

PVS-Studio warning:

V1043 A global object variable 'InstructionsMappingShape' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. MLRegAllocEvictAdvisor.h 82

Due to static, linking will be erroneous, as each translation unit will have its own unique object. The problem is the bloat of the compiled code. These variables will be independently present IN EVERY compiled .cpp file where these header files are included.

More details about the case are given in the chapter "Terrible tip N34. Need to add a constant instance of the class everywhere? It's more convenient to declare it in the header file" from the book about antipatterns.

Of course, it's not a real bug, as the code will work. However, it is still better to replace static with inline.

Just a couple more cases. open icon

V1043 A global object variable 'MBBFrequencyShape' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. MLRegAllocEvictAdvisor.h 91
V1043 A global object variable 'InstructionsShape' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. MLRegAllocEvictAdvisor.h 80

Bug 17-18. Potential null pointer dereference

TaskData *Parent{nullptr};

TaskData *Init(TaskData *parent, int taskType) {
  TaskType = taskType;
  Parent = parent;
  Team = Parent->Team;                    // <= (1)
  BarrierIndex = Parent->BarrierIndex;    // <= (2)
  if (Parent != nullptr) {                // <= (3)
    Parent->RefCount++;
  ....
}

PVS-Studio warning:

V595 The 'Parent' pointer was utilized before it was verified against nullptr. Check lines: 543, 545. ompt-tsan.cpp 543

The Parent pointer is dereferenced in (1) and (2), although the check (3) tells us and the analyzer that it can be null.

Now the case is a bit more complicated when two functions interact.

ScheduleDAGSDNodes *createDefaultScheduler(SelectionDAGISel *IS,
                                           CodeGenOptLevel OptLevel) {
  const TargetLowering *TLI = IS->TLI;
  const TargetSubtargetInfo &ST = IS->MF->getSubtarget();
  ....
}

Note that in the function shown, the IS pointer is dereferenced without a preliminary check. Now let's see how this function is called.

struct ForceCodegenLinking {
  ForceCodegenLinking() {
    ....
    (void)llvm::createDefaultScheduler(nullptr,
                                       llvm::CodeGenOptLevel::Default);
    ....
  }
};

PVS-Studio warning:

V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createDefaultScheduler' function. Inspect the first argument. Check lines: 'SelectionDAGISel.cpp:285', 'LinkAllCodegenComponents.h:48'. SelectionDAGISel.cpp 285

P.S. It turns out that we also have already described this error (fragment N3).

Other V595 warnings. open icon

V595 The 'archer_flags' pointer was utilized before it was verified against nullptr. Check lines: 1225, 1233. ompt-tsan.cpp 1225
V595 The 'N' pointer was utilized before it was verified against nullptr. Check lines: 814, 824. ValueMapper.cpp 814
V595 The 'BPI' pointer was utilized before it was verified against nullptr. Check lines: 2284, 2296. JumpThreading.cpp 2284
V595 The 'COMDATSymbol' pointer was utilized before it was verified against nullptr. Check lines: 700, 703. MCContext.cpp 700
V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createVLIWDAGScheduler' function. Inspect the first argument. Check lines: 'ScheduleDAGVLIW.cpp:270', 'LinkAllCodegenComponents.h:50'. ScheduleDAGVLIW.cpp 270
V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createHybridListDAGScheduler' function. Inspect the first argument. Check lines: 'ScheduleDAGRRList.cpp:3175', 'LinkAllCodegenComponents.h:44'. ScheduleDAGRRList.cpp 3175
V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createSourceListDAGScheduler' function. Inspect the first argument. Check lines: 'ScheduleDAGRRList.cpp:3161', 'LinkAllCodegenComponents.h:42'. ScheduleDAGRRList.cpp 3161
V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createBURRListDAGScheduler' function. Inspect the first argument. Check lines: 'ScheduleDAGRRList.cpp:3147', 'LinkAllCodegenComponents.h:40'. ScheduleDAGRRList.cpp 3147
And so on...

Bug 19. Indirect error detection

Let's finish with a curious case when the analyzer made a mistake in diagnosing what was going on but still pointed to a real error.

DiagnosedSilenceableFailure
transform::ConvertToLoopsOp::apply(transform::TransformRewriter &rewriter,
                                   transform::TransformResults &results,
                                   transform::TransformState &state) {
  SmallVector<Operation *> loops;
  for (Operation *target : state.getPayloadOps(getTarget())) {
    auto tilingOp = dyn_cast<TilingInterface>(*target);
    if (!target) {
      DiagnosedSilenceableFailure diag =
          emitSilenceableError()
          << "expected the payload to implement TilingInterface";
      diag.attachNote(target->getLoc()) << "payload op";
      return diag;
    }
  ....
}

The analyzer issued two warnings at once, but they are irrelevant:

V522 Dereferencing of the null pointer 'target' might take place. LinalgTransformOps.cpp 2219
V595 The 'target' pointer was utilized before it was verified against nullptr. Check lines: 2214, 2215. LinalgTransformOps.cpp 2214

Let me distill the essence for you:

auto tilingOp = dyn_cast<TilingInterface>(*target);
if (!target)
  Failure(target->getLoc());

The analyzer is worried that the target null pointer is dereferenced. In fact, this pointer will never be null, so the condition will never be met.

The error is that the wrong pointer is checked. Correct code version:

auto tilingOp = dyn_cast<TilingInterface>(*target);
if (!tilingOp)
  Failure(target->getLoc());

We get a non-null target pointer. Then comes an attempt to dynamically cast it to a different type. If this fails, the tilingOp pointer will be nullptr. In this case, the non-null target pointer will be used when generating some kind of a failure message.

Conclusions

What did I want to say by writing this article?

No one is immune to mistakes, not even highly skilled developers like contributors to the LLVM project.
PVS-Studio is able to find errors even in compilers and can significantly improve code quality and reliability.
Static analyzers should be used regularly, not occasionally. An analogy is looking through compiler warnings.

Here you can download and try a trial full-featured version of PVS-Studio. For open-source projects, there is a free licensing option.

P.S. Webinars on parsing C++ code

If you are interested in C and C++ compiler development topics, you are invited to watch recordings of two webinars:

Yuri Minaev. Parsing C++. In this webinar, we will discuss grammars in C++ and how they work. We will talk about different kinds of parsers and why C++ is difficult to parse. We will also share some tricks to avoid extreme slowdown.
Yuri Minaev. C++ semantics. In this talk on the C++ semantics, we will take a look at symbols and name resolution. We will discuss different kinds of lookups, scope importing, overload resolution, as well as templates and their specifics.

If you like them, don't forget to check out our website once in a while - we look forward to dive into other noteworthy topics.

#Cpp #Compiler