Nikita Terentev

Nov 28 2025

Tags:

#Cpp #Compiler

Open wide: Inspecting LLVM 21 with static analysis

Nov 28 2025

Author: Nikita Terentev

Intro
Analysis results
Outro

It has been over a year since the last check of the LLVM project with PVS-Studio, and in that time, two releases have come out. So, it's a good time to get back and analyze the fresh LLVM 21.

Intro

Although pretty much all C++ developers have at least heard of LLVM, we'll make a brief disclaimer. LLVM is a massive open-source umbrella project that contains a huge set of various tools for writing optimizing compilers of any complexity.

We checked only the 253d18d032bb commit from the main branch without Clang, libc, lldb, and other components, and focused on High- and Medium-severity errors detected by the general-purpose diagnostic rules.

The commit links are provided to help verify and fix detected errors. They aren't intended to compromise or undermine the work of the contributors.

Well, let's get started!

Analysis results

Fragment N1

The PVS-Studio warning: V501 There are identical sub-expressions '!Subtarget.hasZeroCycleRegMoveFPR64()' to the left and to the right of the '&&' operator. AArch64InstrInfo.cpp 5430

void AArch64InstrInfo::copyPhysReg(....) const {
  ....
  if (Subtarget.hasZeroCycleRegMoveFPR128() &&
      !Subtarget.hasZeroCycleRegMoveFPR64() &&
      !Subtarget.hasZeroCycleRegMoveFPR64() && Subtarget.isNeonAvailable())
  ....
}

Let's start our investigation with a git blame. We can see that along with this suspicious fragment in the commit, the devs added four identical checks:

if (Subtarget.hasZeroCycleRegMoveFPR128() &&
    !Subtarget.hasZeroCycleRegMoveFPR64() &&
    !Subtarget.hasZeroCycleRegMoveFPR32() && ....

Upon closer inspection, we can notice the following in the blame:

if (Subtarget.hasZeroCycleRegMoveFPR128() &&
    !Subtarget.hasZeroCycleRegMoveFPR64() &&
    !Subtarget.hasZeroCycleRegMoveFPR32() && Subtarget.isNeonAvailable() {
  /*
   * new code
   */
  if (Subtarget.hasZeroCycleRegMoveFPR64() && 
     !Subtarget.hasZeroCycleRegMoveFPR32())
  /* 
   * existing code
   */
}

This looks like a classic copy-paste error, but, before jumping to conclusions, let's check whether hasZeroCycleRegMoveFPR64 has side effects that could cause it to be called twice. After all, when dealing with a project as large and intricate as LLVM, it's important to be extra cautious. The implementation lies in the AArch64GenSubtargetInfo.inc file generated by the internal TableGen DSL:

GET_SUBTARGETINFO_MACRO(HasZeroCycleRegMoveFPR64,
                        false,
                        hasZeroCycleRegMoveFPR64)

The macro is defined in the nearby AArch64Subtarget.h file:

class AArch64Subtarget final : public AArch64GenSubtargetInfo {
// Bool members corresponding to the SubtargetFeatures defined in tablegen
#define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER)\
  bool ATTRIBUTE = DEFAULT;
#include "AArch64GenSubtargetInfo.inc"

// Getters for SubtargetFeatures defined in tablegen
#define GET_SUBTARGETINFO_MACRO(ATTRIBUTE, DEFAULT, GETTER)\
  bool GETTER() const { return ATTRIBUTE; }
#include "AArch64GenSubtargetInfo.inc"
....
}

As we can see, the macro generates DSL getters and setters, which means that we just need to replace the second hasZeroCycleRegMoveFPR64 with hasZeroCycleRegMoveFPR32.

Fragment N2

The PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: PeekArg.getValNo() == PeekArg.getValNo() PPCISelLowering.cpp 7865

SDValue PPCTargetLowering::LowerCall_AIX(....) const {
  ....
  for (unsigned I = 0, E = ArgLocs.size(); I != E;) {
    ....
    CCValAssign &GPR1 = VA;
    ....
    if (I != E) {
      // If only 1 GPR was available, there will only be one custom GPR and
      // the argument will also pass in memory.
      CCValAssign &PeekArg = ArgLocs[I];
      if (PeekArg.isRegLoc() && PeekArg.getValNo() == PeekArg.getValNo()) // <=
      {
        assert(PeekArg.needsCustom() && "A second custom GPR is expected.");
        CCValAssign &GPR2 = ArgLocs[I++]; 
        RegsToPass.push_back(std::make_pair(GPR2.getLocReg(),
          DAG.getZExtOrTrunc(ArgAsInt, dl, MVT::i32)));
      }
    }
  ....
}

We're assuming for now that it's yet another victim of copy-paste. Let's check whether getValNo have any side effects:

class CCValAssign{
  ....
  unsigned ValNo;
  unsigned getValNo() const { return ValNo; } 
}

Nothing strange here, though. Take a look at the last commit:

CCValAssign &GPR1 = VA;
....
assert(I != E && "A second custom GPR is expected!");
CCValAssign &GPR2 = ArgLocs[I++];
assert(GPR2.isRegLoc() && GPR2.getValNo() == GPR1.getValNo() &&
       GPR2.needsCustom() && "A second custom GPR is expected!");
RegsToPass.push_back(
  std::make_pair(GPR2.getLocReg(),
                 DAG.getZExtOrTrunc(ArgAsInt, dl, MVT::i32)));

The idea is clear: an exceptional case that was previously guarded by an assert was redesigned into a regular branch. The commit text also points to that.

This patch implements the caller side of placing function call arguments

in stack memory. This removes the current limitation where LLVM on AIX

will report fatal error when arguments can't be contained in registers.

It can be noted that, in addition to the error found, there's another strange assignment:

CCValAssign &PeekArg = ArgLocs[I];
....
CCValAssign &GPR2 = ArgLocs[I++]; // here PeekArg == GPR2

Maybe the developer would like to write something like that:

if (I != E) {
  CCValAssign &GPR2 = ArgLocs[I];
  if (GPR2.isRegLoc() && PeekArg.getValNo() == GPR1.getValNo()) 
  {
    assert(PeekArg.needsCustom() && "A second custom GPR is expected.");
    I++;
    RegsToPass.push_back(std::make_pair(
             GPR2.getLocReg(), DAG.getZExtOrTrunc(ArgAsInt, dl, MVT::i32)));
  }
}

But for clarity, developers broke out PeekArg from GPR2 to show that, unlike the previous unconditional code, the argument now needs to be "peeked" first. And during the copy-paste process, GPR1 was accidentally dropped from the condition.

The corrected if likely should be:

if (PeekArg.isRegLoc() && PeekArg.getValNo() == GPR1.getValNo())

Interestingly, before migrating to GitHub, LLVM had a separate code review platform, and the commit included a link to this website. There, we can see that manual review can't always salvage the situation:

Fragment N3

The PVS-Studio warning: V557 Array overrun is possible. The value of 'regIdx' index could reach 31. VEAsmParser.cpp 696

static const MCPhysReg MISCRegs[31] = { .... };

static bool MorphToMISCReg(VEOperand &Op) {
  const auto *ConstExpr = dyn_cast<MCConstantExpr>(Op.getImm());
  if (!ConstExpr)
    return false;
  unsigned regIdx = ConstExpr->getValue();
  if (regIdx > 31 || MISCRegs[regIdx] == VE::NoRegister)
    return false;
  Op.Kind = k_Register;
  Op.Reg.RegNum = MISCRegs[regIdx];
  return true;
}

As we know, undefined behavior is out of compilers' scope. However, it doesn't mean that their codebase is immune to UB. Here we can see a classic UB scenario, triggered only if regIdx == 31. Probably, it seems impossible or very unlikely, so the error has been unnoticed. But who knows what tomorrow will bring?

The only question left is where the error lies: is there an incorrect number in the condition, or has an element been forgotten when adding to the array?

The function is located in the Vector Engine Assembly Language parser's code and, apparently, checks whether the SMIR instruction operand is valid. Take a look at the documentation:

SMIR — Save Miscellaneous Register
smir %sx, I
smir %sx, %usrcc
smir %sx, %psw
smir %sx, %sar
smir %sx, %pmmr
smir %sx, %pmcrMM
smir %sx, %pmcNN

I = 0 – 2, 7 – 11, 16 - 30
MM = 0 – 3
NN = 0 - 14

We're interested in the first option, smir %sx, I, where I = 0 – 2, 7 – 11, 16 – 30 are valid. Let's compare it with the array:

static const MCPhysReg MISCRegs[31] = {
    VE::USRCC,      VE::PSW,        VE::SAR,        VE::NoRegister,
    VE::NoRegister, VE::NoRegister, VE::NoRegister, VE::PMMR,
    VE::PMCR0,      VE::PMCR1,      VE::PMCR2,      VE::PMCR3,
    VE::NoRegister, VE::NoRegister, VE::NoRegister, VE::NoRegister,
    VE::PMC0,       VE::PMC1,       VE::PMC2,       VE::PMC3,
    VE::PMC4,       VE::PMC5,       VE::PMC6,       VE::PMC7,
    VE::PMC8,       VE::PMC9,       VE::PMC10,      VE::PMC11,
    VE::PMC12,      VE::PMC13,      VE::PMC14};

Everything matches, so there can be no ambiguity: regIdx > 31 should be replaced with regIdx > 30, or, even better, regIdx >= std::size(MISCRegs).

Interestingly, a similar function was added in the same commit, where the check is correct:

static const unsigned MiscRegDecoderTable[] = {....}; // <= the same 31 element

static DecodeStatus DecodeMISCRegisterClass(MCInst &Inst, unsigned RegNo,
                                            uint64_t Address,
                                            const void *Decoder) {
  if (RegNo > 30)
    return MCDisassembler::Fail;
  unsigned Reg = MiscRegDecoderTable[RegNo];
  if (Reg == VE::NoRegister)
    return MCDisassembler::Fail;
  Inst.addOperand(MCOperand::createReg(Reg));
  return MCDisassembler::Success;
}

Maybe the explicit number 31 in the MISCRegs array declaration may mislead developers.

Fragment N4

The PVS-Studio warning: V519 The 'FeaturesValue' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1190, 1192. LVCodeViewReader.cpp 1192

Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
  ....
  Expected<SubtargetFeatures> Features = Obj.getFeatures();
  SubtargetFeatures FeaturesValue;
  if (!Features) {
    consumeError(Features.takeError());
    FeaturesValue = SubtargetFeatures();
  }
  FeaturesValue = *Features;
  ....
  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
}

It looks like a dead store, but maybe the first assignment is intentionally performed for side effects? Let's examine it:

class SubtargetFeatures {
  std::vector<std::string> Features;
  ....
  SubtargetFeatures::SubtargetFeatures(StringRef Initial = "") {
    // Break up string into separate features
    Split(this->Features, Initial);
  }
}

The Split function splits the string (in our case, empty) by commas and stores it in an internal vector.

This could be a minor suboptimality, if not for the behavior of the Expected<T>::takeError function:

template <class T> class [[nodiscard]] Expected {
  ....
  /// Take ownership of the stored error.
  /// After calling this the Expected<T> is in an indeterminate state that can
  /// only be safely destructed. No further calls (beside the destructor) should
  /// be made on the Expected<T> value.
  Error takeError() {
  #if LLVM_ENABLE_ABI_BREAKING_CHECKS
    Unchecked = false;
  #endif
    return HasError ? Error(std::move(*getErrorStorage())) : Error::success();
  }
}

Thanks to the detailed comments in LLVM, which feel like there are just as many as there is code. They can help track the path to a potential UB.

Let's first look at what will be called when the if statement is checked:

/// Return false if there is an error.
explicit operator bool() {
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
  Unchecked = HasError;
#endif
  return !HasError;
}

If Features contains an error, we'll end up in the then branch:

Error LVCodeViewReader::loadTargetInfo(const ObjectFile &Obj) {
  ....
  if (!Features) {
    consumeError(Features.takeError());
    FeaturesValue = SubtargetFeatures();
  }
  FeaturesValue = *Features;
  ....
  return loadGenericTargetInfo(TT.str(), FeaturesValue.getString(), CPU);
}

Next, the ownership of the internal Features data member is passed. The code then proceeds to dereference it, resulting in undefined behavior:

reference operator*() const {
  assertIsChecked();
  return *getStorage();
}

Okay, there's still a last line of defense—assertIsChecked—but unfortunately, it's only used for check after significant changes to the ABI:

void assertIsChecked() const {
/* Define to enable checks that alter the LLVM C++ ABI */
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
  if (LLVM_UNLIKELY(Unchecked))
    fatalUncheckedExpected();
#endif
}

The developers added this file to the commit along with many modified code fragments—45 files and 7,000 lines. However, the high-level, abstract description in the message doesn't make the implementation details any clearer. Very likely, we can say that developers just forgot to add the else branch, and the code should look like this:

if (!Features) {
  consumeError(Features.takeError());
  FeaturesValue = SubtargetFeatures();
}
else
  FeaturesValue = *Features;

Here, we just fixed the error; however, there's also room for further improvements, for example, the first FeaturesValue assignment is still redundant because the constructor is called twice by default. However, it helps find more serious potential vulnerability.

The code could be changed like that:

Expected<SubtargetFeatures> Features = Obj.getFeatures();
SubtargetFeatures FeaturesValue;
if (!Features) {
  consumeError(Features.takeError());
} else {
  FeaturesValue = *Features;
}

We can go beyond and enhance readability, for example, by writing as follows:

SubtargetFeatures FeaturesValue;
if (Expected<SubtargetFeatures> Features = Obj.getFeatures())
  FeaturesValue = std::move(*Features);
else
  consumeError(Features.takeError());

Fragment N5

The PVS-Studio warning: V609 Divide by zero. Denominator range [0..255]. RISCVCustomBehaviour.cpp 268

unsigned RISCVInstrumentManager::getSchedClassID(....) const {
  ....
  // getBaseInfo works with (Opcode, LMUL, 0) if no SEW instrument,
  // or (Opcode, LMUL, SEW) if SEW instrument is active,
  // and depends on LMUL and SEW,
  // or (Opcode, LMUL, 0) if does not depend on SEW.
  uint8_t SEW = SI ? SI->getSEW() : 0;

  if (const auto *VXMO = RISCV::getVXMemOpInfo(Opcode)) {
    // Calculate the expected index EMUL. For indexed operations,
    // the DataEEW and DataEMUL are equal to
    // SEW and LMUL, respectively.
    unsigned IndexEMUL = ((1 << VXMO->Log2IdxEEW) * LMUL) / SEW;
    ....
  }
  else ....
  ....
}

The comment with the SEW variable lets us know that it's equal to 0. It's still a valid scenario, however, it makes the immediate and fearless potential division by zero very suspicious. The collection of undefined behavior cases was expanded with a third copy.

The git blame investigation shows that the if and its then branch, where the division takes place, appeared only after the SEW calculation. Before that, the else branch was the sole, unconditional execution path.

As we can see, in this original path, SEW == 0 is OK:

// Check if it depends on LMUL and SEW
const auto *RVV = RISCVVInversePseudosTable::getBaseInfo(Opcode, LMUL, SEW);
// Check if it depends only on LMUL
if (!RVV)
  RVV = RISCVVInversePseudosTable::getBaseInfo(Opcode, LMUL, 0);

We won't show the getBaseInfo code here, but you can trust (or check!) that there are no dangerous zero-related operations—just a large bunch of various data member comparisons. It remains to check that the conditions are incompatible when SEW == 0 and when we divide by it. It means that the following can't happen:

!SI && RISCV::getVXMemOpInfo(Opcode)

We open getVXMemOpInfo and again end up in the TableGen file (RISCVGenSearchableTables.inc):

const VXMemOpInfo *getVXMemOpInfo(unsigned BaseInstr) {
  struct KeyType {
    unsigned BaseInstr;
  };
  KeyType Key = {BaseInstr};
  struct Comp {
    bool operator()(const VXMemOpInfo &LHS,
                    const KeyType &RHS) const {
      if (LHS.BaseInstr < RHS.BaseInstr)
        return true;
      if (LHS.BaseInstr > RHS.BaseInstr)
        return false;
      return false;
    }
  };
  auto Table = ArrayRef(RISCVBaseVXMemOpTable);
  auto Idx = std::lower_bound(Table.begin(),
                              Table.end(),
                              Key,
                              Comp());
  if (Idx == Table.end() ||
      Key.BaseInstr != Idx->BaseInstr)
    return nullptr;

  return &*Idx;
}

It looks bulky, but the essence is clear: the operation code is checked whether it's located within the RISCVBaseVXMemOpTable table that seems to represent RISC-V indexed vector operations:

constexpr VXMemOpInfo RISCVBaseVXMemOpTable[] = {
  { 0x4, 0x1, 0x0, 0x0, VLOXEI16_V },     // 0
  { 0x5, 0x1, 0x0, 0x0, VLOXEI32_V },     // 1
  ....
  { 0x5, 0x0, 0x1, 0x8, VSUXSEG8EI32_V }, // 98
  { 0x3, 0x0, 0x1, 0x8, VSUXSEG8EI8_V },  // 99
 };

Now let's see what if SI == nullptr:

// Unpack all possible RISC-V instruments from IVec.
RISCVLMULInstrument *LI = nullptr;
RISCVSEWInstrument *SI = nullptr;
for (auto &I : IVec) {
  if (I->getDesc() == RISCVLMULInstrument::DESC_NAME)
    LI = static_cast<RISCVLMULInstrument *>(I);
  else if (I->getDesc() == RISCVSEWInstrument::DESC_NAME)
    SI = static_cast<RISCVSEWInstrument *>(I);
}

So, UB occurs if the vector operation and a set of instruments without RISCVSEWInstrument are sent as arguments at the same time.

Based on the observations and RISC-V documentation, such a scenario seems impossible, but only backend developers can say for sure—and most likely only after careful analysis. Indeed, it'd be better to fix a potentially serious error either by direct checking via if (SEW != 0), or, if this scenario is truly impossible, using assert. The developers did it in another else branch of the same check chain (inside getEEWAndEMUL):

assert(SEW >= 8 && "Unexpected SEW value")

Fragment N6

The PVS-Studio warning: V779 Unreachable code detected. It is possible that an error is present. VEISelDAGToDAG.cpp 321

bool VEDAGToDAGISel::SelectInlineAsmMemoryOperand(
    const SDValue &Op,
    InlineAsm::ConstraintCode ConstraintID,
    std::vector<SDValue> &OutOps) {
  ....
  case InlineAsm::ConstraintCode::m:
    // Try to match ADDRri since reg+imm style is safe
    // for all VE instructions with a memory operand.
    if (selectADDRri(Op, Op0, Op1)) {
      OutOps.push_back(Op0);
      OutOps.push_back(Op1);
      return false;
    }
    // Otherwise, require the address to be in a register and immediate 0.
    OutOps.push_back(Op);
    OutOps.push_back(CurDAG->getTargetConstant(0, SDLoc(Op), MVT::i32));
    return false;
  ....
}

At first glance, it's unclear why the analyzer issues the warning about unreachable code, but once we look at the innards of the selectADDRri function, everything falls into place:

bool VEDAGToDAGISel::selectADDRri(SDValue Addr,
                                  SDValue &Base,
                                  SDValue &Offset) {
  if (matchADDRri(Addr, Base, Offset))
    return true;

  Base = Addr;
  Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
  return true;
}

Let's look at git blame. We can see the return value of selectADDRri used to be more varied (only returns are left):

bool VEDAGToDAGISel::SelectADDRri(....) {
  if (....) {
   ....
    return true;
  }
  if (....)
    return false;

  if (....) {
      ....
      return true;
  }
  ....
  return true;
}

Subsequently, this body was moved to the matchADDRri function, and selectADDRri received its current one.

It seems that it's all quite simple: there's a slightly unsuccessful refactoring, in which the function logic was changed without checking where it was used. However, git blame showed the opposite: on April 6, 2020, the selectADDRri body was modified, and two years later, on August 23, 2022, the same developer used it in if. One might assume they knew exactly what they were doing... if not for the comments. For example, why do they write "Otherwise" when they already know the code path is unreachable?

// Try to match ADDRri since reg+imm style is safe
// for all VE instructions with a memory operand.
if (selectADDRri(Op, Op0, Op1)) {
  ....
  return false;
}
// Otherwise, require the address to be in a register and immediate 0.

These same comments provide a hint: "try to match." What if matchADDRri should have been called instead of selectADDRri?

Let's substitute and compare:

if (matchADDRri(Op, Op0, Op1)) {
    OutOps.push_back(Op0);
    OutOps.push_back(Op1);
    return false;
  }
  OutOps.push_back(Op);
  OutOps.push_back(CurDAG->getTargetConstant(0, SDLoc(Op), MVT::i32));

If matchADDRri returns true, Op1 and Op2 are placed in the vector. Otherwise, we put Op and the 0 constant.

What will be in selectADDRri if we replace the names of the formal parameters with the actual ones?

bool VEDAGToDAGISel::selectADDRri(SDValue Op,
                                  SDValue &Op1,
                                  SDValue &Op2) {
  if (matchADDRri(Op, Op1, Op2))
    return true;

  Op1 = Op;
  Op2 = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
  return true;
}

In this case, if matchADDRri returns true, Op and Op2 contain the values that this function returns. If it returns false, there will be Op and the 0 constant. Does it remind you of anything? Perhaps the developer has inspired by the algorithm in selectADDRri and accidentally inserted its name instead of matchADDRri at the end.

Fragment N7

The PVS-Studio warnings: V560 A part of conditional expression is always false: Opc == ARM::MVE_VORR. ARMBaseInstrInfo.cpp 752

void ARMBaseInstrInfo::copyPhysReg(....) const {
  ....
  unsigned Opc = 0;
  if (SPRDest && SPRSrc)
    Opc = ARM::VMOVS;
  else if (GPRDest && SPRSrc)
    Opc = ARM::VMOVRS;
  else if (SPRDest && GPRSrc)
    Opc = ARM::VMOVSR;
  else if (ARM::DPRRegClass.contains(DestReg, SrcReg) && Subtarget.hasFP64())
    Opc = ARM::VMOVD;
  else if (ARM::QPRRegClass.contains(DestReg, SrcReg))
    Opc = Subtarget.hasNEON() ? ARM::VORRq : ARM::MQPRCopy;

  if (Opc) {
    MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(Opc), DestReg);
    MIB.addReg(SrcReg, getKillRegState(KillSrc));
    if (Opc == ARM::VORRq || Opc == ARM::MVE_VORR)  // <=
      MIB.addReg(SrcReg, getKillRegState(KillSrc));
    if (Opc == ARM::MVE_VORR)                       // <=
      addUnpredicatedMveVpredROp(MIB, DestReg);
    else if (Opc != ARM::MQPRCopy)
      MIB.add(predOps(ARMCC::AL));
    return;
  }
  ....
}

Here's another refactoring victim: Opc can't take the ARM::MVE_VORR value, but there's a comparison with it. Let's check for equality:

ARM::MVE_VORR == 1464
ARM::MQPRCopy == 385

Now let's glimpse git blame. According to the commit text and code, developers decided to introduce a new high-level instruction, MQPRCopy, which expands into one of two lower-level instructions. They changed two things:

Nothing serious happened: it's just a dead code that the compiler will throw away. However, there are bizarre things that stand out among many other similar warnings. For example, the function uses ARM::MVE_VORR in the code below.

if (ARM::QQPRRegClass.contains(DestReg, SrcReg)) {
  Opc = Subtarget.hasNEON() ? ARM::VORRq : ARM::MVE_VORR;
  ....
}

Probably only the code author or another contributor can say exactly how to fix it: simply remove useless conditions, replace with ARM::MQPRCopy, or make another, more complex edit.

Fragment N8

The PVS-Studio warning: V590 Consider inspecting this expression. The expression is excessive or contains a misprint. DwarfDebug.cpp 485

void DwarfDebug::addSubprogramNames(....) {
  if (getAccelTableKind() != AccelTableKind::Apple &&
      NameTableKind != DICompileUnit::DebugNameTableKind::Apple &&
      NameTableKind == DICompileUnit::DebugNameTableKind::None)
    return;
  ....
}

Let's look at another refactoring episode. As usual, look at git blame: the line with DebugNameTableKind::Apple was added separately. Initially, the code was as follows:

if (getAccelTableKind() != AccelTableKind::Apple &&
    CU.getNameTableKind() == DICompileUnit::DebugNameTableKind::None)
  return;

In another function, the developer added a similar segment, but it looks like neater:

void DwarfDebug::addAccelNameImpl(....) {
  ....
  if (getAccelTableKind() != AccelTableKind::Apple &&
      CU.getNameTableKind() != DICompileUnit::DebugNameTableKind::Apple &&
      CU.getNameTableKind() != DICompileUnit::DebugNameTableKind::Default)
    return;
  ....
}

It remains to determine how serious the issue is: is it just an excessive check or a logical error?

As demonstrated, both code fragments mean roughly the same thing, except that AccelTableKind is related to command-line parameters, while DebugNameTableKind is related to a kind of debug data embedded in LLVM IR and scoped to each compilation unit.

There's not much information available, but perhaps the commit text sheds some light:

D118754 added a new DICompileUnit::DebugNameTableKind for "Apple".... This creates a problem in the if statements changed, whereby for 'non Apple AccelTableKind' we emit empty tables for any'DebugNameTableKind' that is not 'Default'. We should consider the newly added kind here too.

Judging by the comment, the developer primarily fixed the addAccelNameImpl function and just overlooked that the == operator is used instead of != in addSubprogramNames. Most likely, the redundant condition should just be removed.

Fragment N9

The PVS-Studio warning: V547 Expression 'Value' is always false. LoongArchAsmBackend.cpp 139

static void fixupLeb128(MCContext &Ctx, const MCFixup &Fixup, uint8_t *Data,
                        uint64_t Value) {
  unsigned I;
  for (I = 0; Value; ++I, Value >>= 7)
    Data[I] |= uint8_t(Value & 0x7f);
  if (Value)
    Ctx.reportError(Fixup.getLoc(), "Invalid uleb128 value!");
}

The snowball of refactoring keeps growing. Initially, the code was as follows:

static void fixupLeb128(MCContext &Ctx, const MCFixup &Fixup,
                        MutableArrayRef<char> Data, uint64_t Value) {
  unsigned I;
  for (I = 0; I != Data.size() && Value; ++I, Value >>= 7)
    Data[I] |= uint8_t(Value & 0x7f);
  if (Value)
    Ctx.reportError(Fixup.getLoc(), "Invalid uleb128 value!");
}

Among all the code fragments analyzed in this article, this one is the most recent—it's dated on August 2, 2025. After reading the story behind it, we can see that the developer was initially fixing something completely different. Then, they spot the sanitizer problems and switched to redo applyFixup across multiple backends. Higher up the stack, in the MCAssembler::layout function, an assert appeared. It enables passing the Data parameter as a pointer without worrying about length checks:

assert(mc::isRelocRelocation(Fixup.getKind()) 
    || Fixup.getOffset() <= F.getFixedSize())

Due to such a huge unplanned rework, no joke, the developer ended up with 19 (!!!) backends. Yet, in one of them, a dead check emerged.

The GitHub commit is pleasantly green, filled with a huge bunch of labels for every touched backend. Although we couldn't find why Lanai stands out with a darker shade among all the others.

As for how to fix the remaining issue: should the if block be deleted, adjusted, or should the function's logic be rewritten entirely? Only the developer can say for sure. We hope they (or other contributors) will pay attention to this.

Fragment N10

The PVS-Studio warning: V547 Expression 'IsLeaf' is always false. PPCInstrInfo.cpp 419

bool PPCInstrInfo::getFMAPatterns(....) const {
  ....
  auto IsReassociableFMA = 
   [&](const MachineInstr &Instr, int16_t &AddOpIdx,
       int16_t &MulOpIdx, bool IsLeaf) {
     ....
     if (IsLeaf)                                                  // <=
       return true;
     
     AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];
     
     const MachineOperand &OpAdd = Instr.getOperand(AddOpIdx);
     MachineInstr *MIAdd = MRI->getUniqueVRegDef(OpAdd.getReg());
     // If 'add' operand's def is not in current block,
     // don't do ILP related opt.
     if (!MIAdd || MIAdd->getParent() != MBB)
       return false;
     
     // If this is not Leaf FMA Instr, its 'add' operand should only have
     // one use as this fma will be changed later.
     return IsLeaf ? true : MRI->hasOneNonDBGUse(OpAdd.getReg());  // <=
   }
  ....
}

It's a classic case of...yeah, refactoring. The commit points out that the early exit check was more intricate:

if (IsAdd && IsLeaf)

During refactoring, the developer separated the initial lambda into two dedicated ones: one handling addition and another handling Fused Multiply-Add (FMA). This change made part of the old IsAdd to become redundant, but the end of the function was overlooked and not refactored.

Given the function's logic, the final return statement should most likely be replaced with:

return MRI->hasOneNonDBGUse(OpAdd.getReg());

Fragment N11

The PVS-Studio warning: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 5820, 5823. PPCInstrInfo.cpp 5823

bool PPCInstrInfo::getMemOperandWithOffsetWidth(....){
  if (!LdSt.mayLoadOrStore() || LdSt.getNumExplicitOperands() != 3)
    return false;

  // Handle only loads/stores with base register
  // followed by immediate offset.
  if (!LdSt.getOperand(1).isImm() ||
      (!LdSt.getOperand(2).isReg() && !LdSt.getOperand(2).isFI()))
    return false;
  if (!LdSt.getOperand(1).isImm() ||
      (!LdSt.getOperand(2).isReg() && !LdSt.getOperand(2).isFI()))
    return false;
  ....
}

The analyzer issued another warning for the PPCInstrInfo.cpp file from the previous code snippet, but it has a more complex background. Trying to figure out how this happened naturally makes my head dizzy. There was the commit with a change to the if statement and its edit, a commit revert due to sanitizer, a re-merge, etc.—all on the same day. The code went wild. For example, at one point, another check was doubled:

bool PPCInstrInfo::getMemOperandWithOffsetWidth(....) const {
  if (!LdSt.mayLoadOrStore() || LdSt.getNumExplicitOperands() != 3) // <=
    return false;

  // Handle only loads/stores with base register
  // followed by immediate offset.
  if (LdSt.getNumExplicitOperands() != 3)                           // <=
    return false;
  if (!LdSt.getOperand(1).isImm() ||
      (!LdSt.getOperand(2).isReg() && !LdSt.getOperand(2).isFI()))
    return false;
  ....
}

Let's make a quick relaxing test to check for side effects:

bool isImm() const { return OpKind == MO_Immediate;  }
bool isReg() const { return OpKind == MO_Register;   }
bool isFI()  const { return OpKind == MO_FrameIndex; }

LLVM is huge and complex but is nevertheless actively being developed, partly due to its easy acceptance of pull requests. For example, developers aren't required to run the entire titanic volume of tests before submitting a pull request. Commit rollbacks, known as "revert to green", are encouraged and considered a normal development practice. However, it can lead to unexpected consequences caused by corrupted merges.

Fragment N12

The PVS-Studio warning: V547 Expression 'Opcode == Instruction::FDiv' is always false. GISelValueTracking.cpp 1451

void GISelValueTracking::computeKnownFPClass(....) {
  ....
  unsigned Opcode = MI.getOpcode(); // variable is never changed
  ....
  switch(Opcode){
  ....
  case TargetOpcode::G_FDIV:   // == 188
  case TargetOpcode::G_FREM: { // == 189
    ....
    if (LHS == RHS) 
      if (Opcode == TargetOpcode::G_FDIV) { 
      ....
      }
    ....
    if (Opcode == Instruction::FDiv) { // == 21
    // Only 0/0, Inf/Inf produce NaN.
    ....
    }
  }
  ....
}

The last few warnings were fairly harmless. Now, let's move on to more significant logical errors. Unlike previous fragments, this entire titanic switch (almost 1,000 lines of code!) was added at once, and only afterward did developers patch small pieces outside the case branch we're looking at.

By the way, there's a similar oddity above:

case TargetOpcode::G_FADD:        // == 183
case TargetOpcode::G_STRICT_FADD: // == 277
case TargetOpcode::G_FSUB:        // == 184
case TargetOpcode::G_STRICT_FSUB: // == 278
  .....
  if (Opcode == Instruction::FAdd) { // == 14
  ....
  }
  ....
}

Comparisons between two different enums look odd. Yet, apart from these two places, the Instruction values aren't used anywhere else, whereas TargetOpcode is used not only in the case statements, but in various checks and comparisons. It seems to be a copy-paste error: the first comparison should involve TargetOpcode::G_FDIV, and the second one should use TargetOpcode::G_FADD.

Fragment N13

The PVS-Studio warning: V560 A part of conditional expression is always false: Defs.count(R) > 1. HexagonMCChecker.cpp 612

bool HexagonMCChecker::checkRegisters() {
    ....
    if (isLoopRegister(R) && Defs.count(R) > 1 &&
        (HexagonMCInstrInfo::isInnerLoop(MCB) ||
         HexagonMCInstrInfo::isOuterLoop(MCB))) {
      // Error out for definitions of loop registers at the end of a loop.
      reportError("loop-setup and some branch instructions "
                  "cannot be in the same packet");
      return false;
    }
    ....
}

At first glance, this looks like perfectly ordinary code until we realize thatDefs is a DenseMap:

/// Return 1 if the specified key is in the map, 0 otherwise.
size_type count(const_arg_type_t<KeyT> Val) const {
  return contains(Val) ? 1 : 0;
}

As we can see, error logging will never be executed, and, moreover, execution will always continue with potentially incorrect behavior.

Here's what git blame reports: the code celebrates its tenth anniversary in November. One might think its meaning had changed over time. But no, history confirms that Defs has always been DenseMap, and its count function only returned 0 or 1 back then. If we search for other similar checks, we see that the count result is used as a binary value everywhere.

Here are a few examples:

if (Defs.count(Reg))
if (Defs.count(Reg) && !MO.isDead())
if (Uses.count(DstReg) || Defs.count(SrcReg))

It seems the developers were trying to makeDenseMap behave similarly to standard library containers, where std::map and std::multimap have a common count interface. In the case of std::multimap, it may return more than one match. The commit text confirms it: developers changed the return type of the count member function from bool to size_type:

The count() function for STL datatypes returns unsigned, even where it's only 1/0 result like std::set.

Here's an explanation from the LLVM code review website:

If we want our container-like types to be containers, count should return size_type.

As a result, either a different container should be used, or the correct check should look something like this:

if (isLoopRegister(R) && Defs.count(R) &&
        (HexagonMCInstrInfo::isInnerLoop(MCB) ||
         HexagonMCInstrInfo::isOuterLoop(MCB)))

Fragment N14

The PVS-Studio warning: V560 A part of conditional expression is always false: AVX10Ver >= 2. Host.cpp 2177

StringMap<bool> sys::getHostCPUFeatures() {
  unsigned EAX = 0, EBX = 0, ECX = 0, EDX = 0;
  ....
  bool HasLeaf24 = MaxLevel >= 0x24
                && !getX86CpuIDAndInfo(0x24, &EAX, &EBX, &ECX, &EDX);

  int AVX10Ver = HasLeaf24 && (EBX & 0xff);
  Features["avx10.1"] = HasAVX10 && AVX10Ver >= 1;
  Features["avx10.2"] = HasAVX10 && AVX10Ver >= 2;
  return Features;
}

While the last couple of warnings could be summed up as "False, false, and nothing but false", this one is slightly more nuanced. In the HasLeaf24 && (EBX & 0xff) expression, both operands of && are first converted to the bool type, the logical AND is evaluated, and the result is expanded back to the int type. The output is either 0 or 1, and the AVX10Ver >= 2 expression always evaluates to false.

Looking at history, the previous iteration of the code was something like this:

bool HasLeaf24 = MaxLevel >= 0x24
              && !getX86CpuIDAndInfo(0x24, &EAX, &EBX, &ECX, &EDX);
Features["avx10.1-512"] = Features["avx10.1-256"]
                       && HasLeaf24
                       && ((EBX >> 18) & 1);

Let's take a look at what the function does in general. It collects the supported processor features in Features, i.e., the values of the CPUID instruction.

Open the Intel Advanced Vector Extension 10.1 specification for CPUID:

CPUID.(EAX=24H, ECX=00H):EBX[bit 18]
If 1, indicates that 512-bit vector support is present.

Everything matches the original version, but in the Advanced Vector Extension 10.2 specification, which the developer added support for, it's already different:

CPUID.(EAX=0x07,ECX=0x01):EDX.AVX10[19]
  and (CPUID.(EAX=0x24,ECX=0x0):EBX[7:0] >= 2)

The value range changed from one bit to eight: from zero to seven in EBX, which actually ends up in EBX & 0xff and no longer fits in bool.

According to the specification, the correct code should be, for example, as follows:

int AVX10Ver = EBX & 0xff;
Features["avx10.1"] = HasAVX10 && HasLeaf24 && AVX10Ver >= 1;
Features["avx10.2"] = HasAVX10 && HasLeaf24 && AVX10Ver >= 2;

Fragment N15

The PVS-Studio warning: V547 Expression 'i != e' is always true. MachineFunction.cpp 1444

void MachineJumpTableInfo::print(raw_ostream &OS) const {
  if (JumpTables.empty()) return;

  OS << "Jump Tables:\n";
  
  for (unsigned i = 0, e = JumpTables.size(); i != e; ++i) {
    OS << printJumpTableEntryReference(i) << ':';
    for (const MachineBasicBlock *MBB : JumpTables[i].MBBs)
      OS << ' ' << printMBBReference(*MBB);
    if (i != e)
      OS << '\n';
  }

  OS << '\n';
}

This warning is related to printer functions. Initially, the loop outputs values in a single line:

for (unsigned i = 0, e = JumpTables.size(); i != e; ++i) {
    OS << printJumpTableEntryReference(i) << ": ";
    for (unsigned j = 0, f = JumpTables[i].MBBs.size(); j != f; ++j)
      OS << ' ' << printMBBReference(*JumpTables[i].MBBs[j]);
}

OS << '\n';

Maybe the developer supposed if (i != e) to prevent two consecutive line breaks after the last line, but the condition is wrong and doesn't work as expected. In this case, the fixed code should be something like this:

void MachineJumpTableInfo::print(raw_ostream &OS) const {
  if (JumpTables.empty()) return;

  OS << "Jump Tables:\n";
  
  for (unsigned i = 0, e = JumpTables.size(); i != e; ++i) {
    OS << printJumpTableEntryReference(i) << ':';
    for (const MachineBasicBlock *MBB : JumpTables[i].MBBs)
      OS << ' ' << printMBBReference(*MBB);
    OS << '\n';
  }
}

Fragment N16

The PVS-Studio warning: V547 Expression is always true. SystemZAsmPrinter.cpp 1324

static void emitPPA1Flags(....){
  ....
  auto Flags1 = PPA1Flag1(0);
  Flags1 |= PPA1Flag1::DSA64Bit;
  if (VarArg) Flags1 |= PPA1Flag1::VarArg;

  .... // Flags1 hasn't changed

  if ((Flags1 & PPA1Flag1::DSA64Bit) == PPA1Flag1::DSA64Bit)
    OutStreamer->AddComment("  Bit 0: 1 = 64-bit DSA");
  else
    OutStreamer->AddComment("  Bit 0: 0 = 32-bit DSA");
  ....
}

Let's get back to the printer code. This is a peculiar check because if you set the bits in the value and then check their presence using the bitwise AND operation, of course, those bits will still be set. Meanwhile, all the other flags in this function are validated conditionally, based on the parameters passed in.

static void emitPPA1Flags(std::unique_ptr<MCStreamer> &OutStreamer,
                          bool VarArg,
                          bool StackProtector,
                          bool FPRMask,
                          bool VRMask,
                          bool EHBlock,
                          bool HasName) {
  ....
  auto Flags1 = PPA1Flag1(0);
  Flags1 |= PPA1Flag1::DSA64Bit;
  if (VarArg)
    Flags1 |= PPA1Flag1::VarArg;
  
  if (StackProtector)
    Flags2 |= PPA1Flag2::STACKPROTECTOR;
  
  // SavedGPRMask, SavedFPRMask, and SavedVRMask are precomputed in.
  if (FPRMask)
    Flags3 |= PPA1Flag3::FPRMask; // Add emit FPR mask flag.
  
  if (VRMask)
    Flags4 |= PPA1Flag4::VRMask; // Add emit VR mask flag.
  
  if (EHBlock)
    Flags4 |= PPA1Flag4::EHBlock; // Add optional EH block.
  
  if (HasName)
    Flags4 |= PPA1Flag4::ProcedureNamePresent; // Add optional name block.
  ....
}

One might assume that the check was made for consistency with other flags. But then the else branch makes the whole thing even stranger. The commit message gives us a hint:

Add the PPA1 to SystemZAsmPrinter.

Okay, let's refer to the IBM documentation:

PPA1 flag 1 offset X'08
'0.......'B  GPR Save area is 32 bit.  // <=
'1.......'B  GPR Save area is 64 bit.  // <=
'.0......'B  Reserved.
'..0.....'B  Own exception model.
'..1.....'B  Inherited exception model.
'...0....'B  tiny PPA3.
'...1....'B  full PPA3.
'....0...'B  Do Not call member for Exit DSA event.
'....1...'B  Call member for Exit DSA event.
'.....0..'B  Do Not treat as PL/I style exit DSA.
'.....1..'B  Treat as PL/I style exit DSA.
'......0.'B  This is not a Special linkage routine.
'......1.'B  This is a Special linkage routine.
'.......0'B  Not a Vararg routine.
'.......1'B  Vararg routine.

It confirms that a 32-bit version is indeed possible, but only the developer can clarify what behavior was intended here.

Fragment N17

The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 1995

void MachineInstr::print(....) const {
  ....
  if (MCSymbol *PreInstrSymbol = getPreInstrSymbol()) {
    if (!FirstOp) {
      FirstOp = false;
      OS << ',';
    }
    OS << " pre-instr-symbol ";
    MachineOperand::printSymbol(OS, *PreInstrSymbol);
  }
  ....
}

There's the final chunk of printer issues. This time, the problem is with a redundant assignment, not output. There are five such warnings in this function. If we trace the history, we see that the original pattern looked like this:

if (!FirstOp) {
  OS << ',';
}

Then, the first block of this type was added to include an assignment:

if (!FirstOp) {
    FirstOp = false;
    OS << ',';
}

Then, similar fragments began to appear, most likely copied and pasted by different developers. As a result, we now have five redundant lines that do nothing and shouldn't be there at all.

Other warnings in the file:

V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 2003
V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 2011
V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 2019
V1048 The 'FirstOp' variable was assigned the same value. MachineInstr.cpp 2027

Fragment N18

The PVS-Studio warning: V547 Expression 'AllSame' is always true. SimplifyCFG.cpp 1914

bool SimplifyCFGOpt::hoistCommonCodeFromSuccessors(....) {
  ....
  bool AllSame = none_of(....);
  if (!AllSame)
    return false;
  if (AllSame) { .... }
  ....
}

Debugging a compiler is a very challenging endeavor, that's why the LLVM developers often write various checks and asserts. However, some of them make you question your sanity and perception of reality. This is one of them. No, this is neither a merge issue nor a refactoring error—it was added at once.

Fragment N19

The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true. LVDWARFReader.cpp 398

LVScope *LVDWARFReader::processOneDie(....) {
  ....
  if (const DWARFAbbreviationDeclaration *AbbrevDecl =
            TheDIE.getAbbreviationDeclarationPtr())
    if (AbbrevDecl)
      for (const DWARFAbbreviationDeclaration::AttributeSpec &AttrSpec :
           AbbrevDecl->attributes())
        processOneAttribute(TheDIE, &CurrentEndOffset, AttrSpec);
  ....
}

Here's another paranormal check. The inner if is redundant because when execution reaches it, AbbrevDecl always be non-zero anyway.

Fragment N20

The PVS-Studio warning: V791 The initial value of the index in the nested loop equals 'I'. Perhaps, 'I + 1' should be used instead. LoopUnrollAndJam.cpp 793

static bool checkDependencies(....){
  ....
  for (size_t I = 0; I < NumInsts; ++I) {
    for (size_t J = I; J < NumInsts; ++J) {
      if (!checkDependency(CurrentLoadsAndStores[I],
                           CurrentLoadsAndStores[J],
                           LoopDepth, CurLoopDepth, true, DI))
      return false;
      ....
  }
  ....
}

And finally, for a change of pace, let's look at a small optimization warning. In the comparison function, we can see that it immediately returns true when the pointer arguments are equal:

static bool checkDependency(Instruction *Src,
                            Instruction *Dst,
                            ....) {
  if (Src == Dst)
    return true;
  ....
}

As the analyzer correctly pointed out, since the comparison function early and has no side effects, there's no point in starting the inner loop at I instead of I + 1.

Outro

As we've seen today, even seasoned developers as LLVM contributors sometimes make mistakes: both logical and copy-paste errors, not to mention the remnants of incorrect automatic merges. It's okay! That's perfectly normal; even mighty dragons need to maintain good code hygiene. And if you want help with that, remember that you can use the free PVS-Studio version for open-source projects.

#Cpp #Compiler