The PVS-Studio static analyzer can find errors even in such a high-quality and well-tested project as LLVM. To make sure it's not empty words, we occasionally check the project and post notes about the checks just like this one.
LLVM is a software infrastructure project for creating compilers and their associated utilities. It's a set of high-level language compilers, optimization system, interpretation and compilation into machine code. It is written in a modern version of C++ language. The project is used by a huge number of users and is well tested.
Once in a while we check this project to demonstrate what PVS-Studio is capable of. Previous posts:
These articles demonstrate well the usefulness of the static code analysis methodology. However, this has nothing to do with improving the quality and reliability of a software project. Static analyzers should be used regularly during the development process to help detect bugs early on.
If I understand correctly, LLVM is already regularly checked with Coverity Scan Static Analysis and Clang Static Analyzer. PVS-Studio would look great next to the above tools :) That would be a horse of a different colour!
When writing the article, I wanted to share some juicy errors with you. At the same time, I didn't want to snow you under with the number of warnings. Anyway, it's hard to study the warnings issued on unfamiliar code, so I decided to stop by writing out 19 errors. 19 seems to be the right number for the 19th version of the project. Of course, we can detect a way more errors.
First, let's take a look at the prototype of the isStoreUsed function. Note that the last argument is optional.
bool isStoreUsed(const FrameIndexEntry &StoreFIE, ExprIterator Candidates,
bool IncludeLocalAccesses = true) const;
There are places where this function is used correctly:
if (SRU.isStoreUsed(*FIEX,
Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB),
/*IncludeLocalAccesses=*/false)) {
But at some point, the programmer's hand slipped and the parenthesis happened to be in the wrong place. Well, no one is immune to such mistakes.
void CalleeSavedAnalysis::analyzeSaves() {
....
// If this stack position is accessed in another function, we are
// probably dealing with a parameter passed in a stack -- do not mess
// with it
if (SRU.isStoreUsed(*FIE,
Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB)),
/*IncludeLocalAccesses=*/false) {
BlacklistedRegs.set(FIE->RegOrImm);
CalleeSaved.reset(FIE->RegOrImm);
Prev = &Inst;
continue;
}
....
}
The PVS-Studio analyzer generates two warnings for this fragment:
Delightful error. The code compiles because of the last optional argument of the function. It is actually equivalent to this:
SRU.isStoreUsed(*FIE,
Prev ? SRU.expr_begin(*Prev) : SRU.expr_begin(BB));
if (/*IncludeLocalAccesses=*/false) {
It is not so easy to spot such an error on a code review. The PVS-Studio analyzer can be a good helper in detecting such typos.
FailureOr<ConstantOrScalableBound>
ScalableValueBoundsConstraintSet::computeScalableBound(....)
{
....
AffineMap bound = [&] {
if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
lowerBound[0] == lowerBound[0]) { // <=
return lowerBound[0];
} else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
return lowerBound[0];
} else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
return upperBound[0];
}
return AffineMap{};
}();
....
}
PVS-Studio warning:
V501 There are identical sub-expressions to the left and to the right of the '==' operator: lowerBound[0] == lowerBound[0] ScalableValueBoundsConstraintSet.cpp 110
A simple typo when working with similar array names. I'm not sure how to describe it more, so we'll move on to the next one.
For those who haven't read it, check out the note "Zero, one, two, Freddy's coming for you".
Value buildTernaryFn(TernaryFn ternaryFn, Value arg0, Value arg1,
Value arg2) {
bool headBool =
isInteger(arg0) && arg0.getType().getIntOrFloatBitWidth() == 1;
bool tailFloatingPoint =
isFloatingPoint(arg0) && isFloatingPoint(arg1) && isFloatingPoint(arg2);
bool tailInteger = isInteger(arg0) && isInteger(arg1) && isInteger(arg1);
OpBuilder::InsertionGuard g(builder);
builder.setInsertionPointToEnd(&block);
....
}
PVS-Studio warning:
V501 There are identical sub-expressions 'isInteger(arg1)' to the left and to the right of the '&&' operator. LinalgOps.cpp 502
The last call to the function reuses arg1:
bool tailInteger = isInteger(arg0) && isInteger(arg1) && isInteger(arg1);
The code is hard to read, which is the reason for the typo. Perhaps one can avoid the error by writing the code in a more "sparse" and aligned way:
bool tailFloatingPoint = isFloatingPoint(arg0)
&& isFloatingPoint(arg1)
&& isFloatingPoint(arg2);
bool tailInteger = isInteger(arg0)
&& isInteger(arg1)
&& isInteger(arg1);
In case of tabular formatting, the error will be easier to notice during code review.
StringRef ARMAsmParser::splitMnemonic(StringRef Mnemonic, ....) {
....
if (isMnemonicVPTPredicable(Mnemonic, ExtraToken) && Mnemonic != "vmovlt" &&
Mnemonic != "vshllt" && Mnemonic != "vrshrnt" && Mnemonic != "vshrnt" &&
Mnemonic != "vqrshrunt" && Mnemonic != "vqshrunt" &&
Mnemonic != "vqrshrnt" && Mnemonic != "vqshrnt" && Mnemonic != "vmullt" &&
Mnemonic != "vqmovnt" /*** <= ***/ && Mnemonic != "vqmovunt" &&
Mnemonic != "vqmovnt" /*** <= ***/ && Mnemonic != "vmovnt" &&
Mnemonic != "vqdmullt" && Mnemonic != "vpnot" &&
Mnemonic != "vcvtt" && Mnemonic != "vcvt") {
....
}
PVS-Studio warning:
V501 There are identical sub-expressions 'Mnemonic != "vqmovnt"' to the left and to the right of the '&&' operator. ARMAsmParser.cpp 6633
template <typename T>
LIBC_NAMESPACE::cpp::enable_if_t<LIBC_NAMESPACE::cpp::is_floating_point_v<T>,
bool>
ValuesEqual(T x1, T x2) {
LIBC_NAMESPACE::fputil::FPBits<T> bits1(x1);
LIBC_NAMESPACE::fputil::FPBits<T> bits2(x2);
// If either is NaN, we want both to be NaN.
if (bits1.is_nan() || bits2.is_nan())
return bits2.is_nan() && bits2.is_nan(); // <=
// For all other values, we want the values to be bitwise equal.
return bits1.uintval() == bits2.uintval();
}
PVS-Studio warning:
V501 There are identical sub-expressions to the left and to the right of the '&&' operator: bits2.is_nan() && bits2.is_nan() Compare.h 23
bool OpenACCClauseWithVarList::classof(const OpenACCClause *C) {
return OpenACCPrivateClause::classof(C) ||
OpenACCFirstPrivateClause::classof(C) ||
OpenACCDevicePtrClause::classof(C) || // <=
OpenACCDevicePtrClause::classof(C) || // <=
OpenACCAttachClause::classof(C) || OpenACCNoCreateClause::classof(C) ||
OpenACCPresentClause::classof(C) || OpenACCCopyClause::classof(C) ||
OpenACCCopyInClause::classof(C) || OpenACCCopyOutClause::classof(C) ||
OpenACCReductionClause::classof(C) || OpenACCCreateClause::classof(C);
}
PVS-Studio warning:
V501 There are identical sub-expressions 'OpenACCDevicePtrClause::classof(C)' to the left and to the right of the '||' operator. OpenACCClause.cpp 31
It seems simple when these bugs have already been found. But at the same time, no one noticed them before the analyzer. Love static code analysis!
isl_size isl_local_space_var_offset(....)
{
isl_space *space;
space = isl_local_space_peek_space(ls);
if (space < 0)
return isl_size_error;
....
}
PVS-Studio warning:
V503 This is a nonsensical comparison: pointer < 0. isl_local_space.c 257
Checking the pointer to see if it is negative makes no sense. The behavior of such an operation is unspecified. There should be either a comparison for nullptr, or another check at all.
static Expected<uint64_t>
GeneratePerfEventConfigValue(bool enable_tsc,
std::optional<uint64_t> psb_period) {
uint64_t config = 0;
....
if (Expected<uint32_t> offset = ReadIntelPTConfigFile(
kTSCBitOffsetFile, IntelPTConfigFileType::BitOffset))
config |= 1 << *offset;
....
}
PVS-Studio warning:
V629 Consider inspecting the '1 << * offset' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. IntelPTSingleBufferTrace.cpp 153
The analyzer generated a number of V629 warnings for certain expressions. In them, a value of int type is shifted and the result is used together with 64-bit values. The code fragment above is not necessarily incorrect but is very suspicious. As well as other fragments pointed by the analyzer.
The mask is formed in the 64-bit config variable. But this shifts literal 1, which is of type int. Consequently, it will not be possible to correctly set the high bits in the config if needed. Perhaps this code works because high bits do not need to be set yet. Anyway, it is more correct to write it this way:
config |= (uint64_t)(1) << *offset;
// or this way
config |= 1ULL << *offset;
std::unique_ptr<OptionDefinition> m_options_definition_up;
Status SetOptionsFromArray(StructuredData::Dictionary &options) {
Status error;
m_num_options = options.GetSize();
m_options_definition_up.reset(new OptionDefinition[m_num_options]);
....
}
PVS-Studio warning:
V554 Incorrect use of unique_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. CommandObjectCommands.cpp 1384
The m_options_definition_up smart pointer is meant to store a single element. But a pointer to an array is placed there. Here is the result - undefined behavior at the time of object destruction. Read more here "Why do arrays have to be deleted via delete[] in C++".
To fix the bug, one just needs to add [] when declaring a smart pointer:
std::unique_ptr<OptionDefinition []> m_options_definition_up;
std::vector<lldb::addr_t> m_indexed_isa_cache;
bool AppleObjCRuntimeV2::NonPointerISACache::EvaluateNonPointerISA(
ObjCISA isa, ObjCISA &ret_isa) {
....
uintptr_t index = (isa & m_objc_debug_indexed_isa_index_mask) >>
m_objc_debug_indexed_isa_index_shift;
....
// If the index is still out of range then this isn't a pointer.
if (index > m_indexed_isa_cache.size())
return false;
LLDB_LOGF(log, "AOCRT::NPI Evaluate(ret_isa = 0x%" PRIx64 ")",
(uint64_t)m_indexed_isa_cache[index]);
ret_isa = m_indexed_isa_cache[index];
....
}
PVS-Studio warnings:
Classic antipattern: using an incorrect operator to check an index for array overrun. The function execution is aborted if the index is greater than the array size:
if (index > m_indexed_isa_cache.size())
return false;
This is incorrect because the array elements are numbered from zero. If it turns out that index is equal to m_indexed_isa_cache.size(), an array overrun will occur (Off-by-one Error).
Correct code with >=:
if (index >= m_indexed_isa_cache.size())
return false;
Here is a tricky thing. Since I encounter this antipattern often, I've got used to it. Once I almost overlooked an outstanding bug in a DPDK project because of it. Rather, I lost freshness of the vision and referred the bug to that common antipattern. In fact, everything turned out to be much more mind-bending than usually—"Most striking error I found with PVS-Studio in 2024".
template <typename Key, typename ElementLattice> class MapLattice {
using Container = llvm::DenseMap<Key, ElementLattice>;
Container C;
public:
....
explicit MapLattice(Container C) { C = std::move(C); }
....
}
PVS-Studio warning:
V570 The 'C' variable is assigned to itself. MapLattice.h 52
A class data member and a function parameter have the same names. Such naming has gone awry. The constructor doesn't write the required data to the container. Options to fix the code:
explicit MapLattice(Container C) { this->C = std::move(C); }
// Or use another variable name
explicit MapLattice(Container src) { C = std::move(src); }
// It is even more elegant to use the initialization list of the constructor
explicit MapLattice(Container C) : C { std::move(C) } {};
P.S. Reading this note's draft, my colleague said that we have already described that error in the previous article. But here it is. I wish edits after our checks could be made quicker. But in fact it's usually a looong wait :)
std::unique_ptr<Module> parseMIR(....) {
....
MachineModuleInfoWrapperPass *MMIWP =
new MachineModuleInfoWrapperPass(&TM);
if (MIR->parseMachineFunctions(*M, MMIWP->getMMI()))
return nullptr;
PM.add(MMIWP);
return M;
}
PVS-Studio warning:
V773 The function was exited without releasing the 'MMIWP' pointer. A memory leak is possible. LiveIntervalTest.cpp 74
Suppose, the parseMachineFunctions function returns an error status. Then no delete operator will be called for the MMIWP pointer when exiting the parseMIR function.
Similar case:
Value *LLParser::PerFunctionState::getVal(const std::string &Name, Type *Ty,
LocTy Loc) {
....
Value *FwdVal;
if (Ty->isLabelTy()) {
FwdVal = BasicBlock::Create(F.getContext(), Name, &F);
} else {
FwdVal = new Argument(Ty, Name);
}
if (FwdVal->getName() != Name) {
P.error(Loc, "name is too long which can result in name collisions, "
"consider making the name shorter or "
"increasing -non-global-value-max-name-size");
return nullptr;
}
ForwardRefVals[Name] = std::make_pair(FwdVal, Loc);
return FwdVal;
}
PVS-Studio warning:
V773 The function was exited without releasing the 'FwdVal' pointer. A memory leak is possible. LLParser.cpp 3625
In general, it's strange to see manual memory handling in LLVM. Errors are attracted to such code like bees to honey...
Error LinuxKernelRewriter::readORCTables() {
....
MCInst *Inst = BF->getInstructionAtOffset(Offset);
if (!Inst) {
Inst = BF->getInstructionContainingOffset(Offset);
if (Inst || BC.MIB->hasAnnotation(*Inst, "AltInst")) // <=
continue;
return createStringError(
errc::executable_format_error,
"no instruction at address 0x%" PRIx64 " in .orc_unwind_ip", IP);
}
....
}
PVS-Studio warning:
V522 Dereferencing of the null pointer 'Inst' might take place. LinuxKernelRewriter.cpp 583
If it comes to evaluating the right operand of the logic OR, the BF pointer is nullptr and it is safely dereferenced. The check makes sense if we replace || with &&:
if (Inst && BC.MIB->hasAnnotation(*Inst, "AltInst"))
continue;
There are quite heavy objects in header files, such as these:
static constexpr llvm::StringLiteral kNoRegister("%noreg");
PVS-Studio warning:
V1043 A global object variable 'kNoRegister' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. LlvmState.h 28
static const std::vector<int64_t> InstructionsMappingShape{
1, NumberOfInterferences, ModelMaxSupportedInstructionCount};
PVS-Studio warning:
V1043 A global object variable 'InstructionsMappingShape' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. MLRegAllocEvictAdvisor.h 82
Due to static, linking will be erroneous, as each translation unit will have its own unique object. The problem is the bloat of the compiled code. These variables will be independently present IN EVERY compiled .cpp file where these header files are included.
More details about the case are given in the chapter "Terrible tip N34. Need to add a constant instance of the class everywhere? It's more convenient to declare it in the header file" from the book about antipatterns.
Of course, it's not a real bug, as the code will work. However, it is still better to replace static with inline.
TaskData *Parent{nullptr};
TaskData *Init(TaskData *parent, int taskType) {
TaskType = taskType;
Parent = parent;
Team = Parent->Team; // <= (1)
BarrierIndex = Parent->BarrierIndex; // <= (2)
if (Parent != nullptr) { // <= (3)
Parent->RefCount++;
....
}
PVS-Studio warning:
V595 The 'Parent' pointer was utilized before it was verified against nullptr. Check lines: 543, 545. ompt-tsan.cpp 543
The Parent pointer is dereferenced in (1) and (2), although the check (3) tells us and the analyzer that it can be null.
Now the case is a bit more complicated when two functions interact.
ScheduleDAGSDNodes *createDefaultScheduler(SelectionDAGISel *IS,
CodeGenOptLevel OptLevel) {
const TargetLowering *TLI = IS->TLI;
const TargetSubtargetInfo &ST = IS->MF->getSubtarget();
....
}
Note that in the function shown, the IS pointer is dereferenced without a preliminary check. Now let's see how this function is called.
struct ForceCodegenLinking {
ForceCodegenLinking() {
....
(void)llvm::createDefaultScheduler(nullptr,
llvm::CodeGenOptLevel::Default);
....
}
};
PVS-Studio warning:
V522 Dereferencing of the null pointer might take place. The null pointer is passed into 'createDefaultScheduler' function. Inspect the first argument. Check lines: 'SelectionDAGISel.cpp:285', 'LinkAllCodegenComponents.h:48'. SelectionDAGISel.cpp 285
P.S. It turns out that we also have already described this error (fragment N3).
Let's finish with a curious case when the analyzer made a mistake in diagnosing what was going on but still pointed to a real error.
DiagnosedSilenceableFailure
transform::ConvertToLoopsOp::apply(transform::TransformRewriter &rewriter,
transform::TransformResults &results,
transform::TransformState &state) {
SmallVector<Operation *> loops;
for (Operation *target : state.getPayloadOps(getTarget())) {
auto tilingOp = dyn_cast<TilingInterface>(*target);
if (!target) {
DiagnosedSilenceableFailure diag =
emitSilenceableError()
<< "expected the payload to implement TilingInterface";
diag.attachNote(target->getLoc()) << "payload op";
return diag;
}
....
}
The analyzer issued two warnings at once, but they are irrelevant:
Let me distill the essence for you:
auto tilingOp = dyn_cast<TilingInterface>(*target);
if (!target)
Failure(target->getLoc());
The analyzer is worried that the target null pointer is dereferenced. In fact, this pointer will never be null, so the condition will never be met.
The error is that the wrong pointer is checked. Correct code version:
auto tilingOp = dyn_cast<TilingInterface>(*target);
if (!tilingOp)
Failure(target->getLoc());
We get a non-null target pointer. Then comes an attempt to dynamically cast it to a different type. If this fails, the tilingOp pointer will be nullptr. In this case, the non-null target pointer will be used when generating some kind of a failure message.
What did I want to say by writing this article?
Here you can download and try a trial full-featured version of PVS-Studio. For open-source projects, there is a free licensing option.
If you are interested in C and C++ compiler development topics, you are invited to watch recordings of two webinars:
If you like them, don't forget to check out our website once in a while - we look forward to dive into other noteworthy topics.