It's been two years since we last checked the code of the LLVM project with PVS-Studio, so let's see if PVS-Studio is still the leader among tools for detecting bugs and security weaknesses. We'll do that by scanning the LLVM 8.0.0 release for new bugs.
Frankly, I didn't feel like writing this article. It's not much fun talking about a project that we have already checked more than once (1, 2, 3). I'd prefer something new instead, but I had no choice.
Every time a new version of LLVM is released or Clang Static Analyzer is updated, we get emails reading along these lines:
Hey, the new version of Clang Static Analyzer got new diagnostics! PVS-Studio seems to be getting less relevant. Clang can detect more bugs than before and is now catching up with PVS-Studio. What do you say?
To that I'd gladly respond:
We haven't been lazing around either! We've significantly increased PVS-Studio's capabilities, so no worries - we are still the best.
But that's a bad answer, I'm afraid. It offers no proof, and that's why I'm writing this article. So, I've checked LLVM one more time and found tons of bugs of all kinds. The ones I liked the most are discussed below. Clang Static Analyzer can't detect these bugs (or makes detecting them very troublesome), but we can. And, by the way, it took me only one evening to write all those bugs down.
The article, though, took me several weeks to complete. I just couldn't bring myself to put the gathered material into text :).
By the way, if you wonder what techniques PVS-Studio employs to detect bugs and vulnerabilities, take a look at this post.
As I already said, the last of the many checks of LLVM was done two years ago, and the bugs found then were fixed by the authors. This article will show a new portion of errors. How come there are new bugs at all? There are three reasons:
Snippet No. 1: Copy-Paste
static bool ShouldUpgradeX86Intrinsic(Function *F, StringRef Name) {
if (Name == "addcarryx.u32" || // Added in 8.0
....
Name == "avx512.mask.cvtps2pd.128" || // Added in 7.0
Name == "avx512.mask.cvtps2pd.256" || // Added in 7.0
Name == "avx512.cvtusi2sd" || // Added in 7.0
Name.startswith("avx512.mask.permvar.") || // Added in 7.0 // <=
Name.startswith("avx512.mask.permvar.") || // Added in 7.0 // <=
Name == "sse2.pmulu.dq" || // Added in 7.0
Name == "sse41.pmuldq" || // Added in 7.0
Name == "avx2.pmulu.dq" || // Added in 7.0
....
}
PVS-Studio diagnostic message: V501 [CWE-570] There are identical sub-expressions 'Name.startswith("avx512.mask.permvar.")' to the left and to the right of the '||' operator. AutoUpgrade.cpp 73
The code checks for the "avx512.mask.permvar." prefix twice. The second condition was obviously meant to check something else, but the programmer forgot to change the copied line.
Snippet No. 2: Typo
enum CXNameRefFlags {
CXNameRange_WantQualifier = 0x1,
CXNameRange_WantTemplateArgs = 0x2,
CXNameRange_WantSinglePiece = 0x4
};
void AnnotateTokensWorker::HandlePostPonedChildCursor(
CXCursor Cursor, unsigned StartTokenIndex) {
const auto flags = CXNameRange_WantQualifier | CXNameRange_WantQualifier;
....
}
PVS-Studio diagnostic message: V501 There are identical sub-expressions 'CXNameRange_WantQualifier' to the left and to the right of the '|' operator. CIndex.cpp 7245
The named constant CXNameRange_WantQualifier is used twice because of a typo.
Snippet No. 3: Confusion over operator precedence
int PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
....
if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian() ? 1 : 0)
return 0;
....
}
PVS-Studio diagnostic message: V502 [CWE-783] Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. PPCTargetTransformInfo.cpp 404
I find this bug very cute. Yes, I know that I have a strange taste :).
As dictated by operator precedence, the original expression is evaluated as follows:
(ISD == ISD::EXTRACT_VECTOR_ELT && (Index == ST->isLittleEndian())) ? 1 : 0
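From the practical point of view, though, the ?: part is meaningless here, since the condition can be reduced to:
(ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian())
The programmer obviously meant the ternary operator to apply only to ST->isLittleEndian(), so that Index would be compared with 0 or 1. Because isLittleEndian() returns bool, which converts to exactly 0 or 1, the current code happens to compute the same result, but only by luck: the expression as written says something different from what was intended. To fix the code, the ternary operator should be enclosed in parentheses: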
if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == (ST->isLittleEndian() ? 1 : 0))
The ternary operator is actually very tricky and may lead to logic errors. Use it carefully and don't hesitate to put additional parentheses around it. This subject is discussed in more detail here, in the section "Beware of the ?: operator and enclose it in parentheses".
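Here is a tiny self-contained illustration of the same precedence rule. It's a sketch with made-up function names (offsetWithoutParens and offsetWithParens are hypothetical, not LLVM code). Because + binds tighter than ?:, the version without parentheses turns the whole sum into the condition.

```cpp
#include <cassert>

// Hypothetical helper: '+' binds tighter than '?:', so this parses as
// (Base + Flag) ? 1 : 0 and can only ever return 0 or 1.
int offsetWithoutParens(int Base, bool Flag) {
  return Base + Flag ? 1 : 0;
}

// With parentheses, the ternary only selects the addend, as intended.
int offsetWithParens(int Base, bool Flag) {
  return Base + (Flag ? 1 : 0);
}
```

For example, offsetWithoutParens(10, true) yields 1, while offsetWithParens(10, true) yields 11. Most compilers warn about the first form under -Wparentheses, which is one more reason to keep warnings enabled.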
Snippets No. 4, 5: Null pointer
Init *TGParser::ParseValue(Record *CurRec, RecTy *ItemType, IDParseMode Mode) {
....
TypedInit *LHS = dyn_cast<TypedInit>(Result);
....
LHS = dyn_cast<TypedInit>(
UnOpInit::get(UnOpInit::CAST, LHS, StringRecTy::get())
->Fold(CurRec));
if (!LHS) {
Error(PasteLoc, Twine("can't cast '") + LHS->getAsString() +
"' to string");
return nullptr;
}
....
}
PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'LHS' might take place. TGParser.cpp 2152
If the LHS pointer happens to be null, the program is expected to report an error. Instead, it will dereference that very null pointer: LHS->getAsString().
It's quite typical for error handlers to contain bugs, because developers rarely test them properly. Static analyzers check all reachable code no matter how often it's actually executed. This is a good example of how static analysis complements other testing and protection techniques.
A similar faulty handler for the RHS pointer is found a bit further: V522 [CWE-476] Dereferencing of the null pointer 'RHS' might take place. TGParser.cpp 2186
Snippet No. 6: Using a pointer after a move
static Expected<bool>
ExtractBlocks(....)
{
....
std::unique_ptr<Module> ProgClone = CloneModule(BD.getProgram(), VMap);
....
BD.setNewProgram(std::move(ProgClone)); // <=
MiscompiledFunctions.clear();
for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {
Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first); // <=
assert(NewF && "Function not found??");
MiscompiledFunctions.push_back(NewF);
}
....
}
PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'ProgClone' might take place. Miscompilation.cpp 601
The smart pointer ProgClone first releases the object ownership:
BD.setNewProgram(std::move(ProgClone));
In fact, ProgClone has become a null pointer - so, technically, a null pointer gets dereferenced a bit further:
Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);
At first glance, it may seem that this won't happen because the loop never executes. The MiscompiledFunctions container is cleared right before it:
MiscompiledFunctions.clear();
But take a closer look at the loop condition:
for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {
The loop iterates over a different container, MisCompFunctions, whose size isn't affected by that clear() call. The two names are easy to confuse. As long as MisCompFunctions isn't empty, the loop does run, and ProgClone, which no longer owns anything, gets dereferenced on the very first iteration. One possible fix is to look the functions up through BD.getProgram(), which now owns the cloned module.
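To see why the moved-from pointer can't be used, here's a minimal sketch. Module and Driver below are hypothetical stand-ins, not LLVM's real classes. The sketch shows that a std::unique_ptr is null after its contents have been moved out. It also shows one way to keep working with the object: take a non-owning raw pointer before the move, while the new owner keeps the object alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for LLVM's Module and BugDriver (illustration only).
struct Module {
  std::string Name;
};

struct Driver {
  std::unique_ptr<Module> Program;
  // Takes ownership, like BD.setNewProgram(std::move(ProgClone)) in the snippet.
  void setNewProgram(std::unique_ptr<Module> M) { Program = std::move(M); }
};

std::string moveThenUse(Driver &D) {
  std::unique_ptr<Module> Clone(new Module{"clone"});
  Module *Raw = Clone.get();          // non-owning pointer taken BEFORE the move
  D.setNewProgram(std::move(Clone));
  assert(Clone == nullptr);           // a moved-from unique_ptr is always null
  // Clone->Name here would dereference null; Raw stays valid because
  // D.Program now owns the object and keeps it alive.
  return Raw->Name;
}
```

The raw pointer is only safe for as long as the new owner (here, D.Program) keeps the object alive. It must not be deleted or stored beyond that.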
Snippet No. 7: Using a pointer after a move
static Expected<bool> TestOptimizer(BugDriver &BD, std::unique_ptr<Module> Test,
std::unique_ptr<Module> Safe) {
outs() << " Optimizing functions being tested: ";
std::unique_ptr<Module> Optimized =
BD.runPassesOn(Test.get(), BD.getPassesToRun());
if (!Optimized) {
errs() << " Error running this sequence of passes"
<< " on the input program!\n";
BD.setNewProgram(std::move(Test)); // <=
BD.EmitProgressBitcode(*Test, "pass-error", false); // <=
if (Error E = BD.debugOptimizerCrash())
return std::move(E);
return false;
}
....
}
PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Test' might take place. Miscompilation.cpp 709
This one is similar to the previous case. The object's contents are first moved, and then the object is used as if nothing had happened. This kind of error has become more and more common since move semantics were added to C++. That's what I like about this language! You are given new ways to shoot yourself in the foot, which means PVS-Studio will always have work to do :).
Snippet No. 8: Null pointer
void FunctionDumper::dump(const PDBSymbolTypeFunctionArg &Symbol) {
uint32_t TypeId = Symbol.getTypeId();
auto Type = Symbol.getSession().getSymbolById(TypeId);
if (Type)
Printer << "<unknown-type>";
else
Type->dump(*this);
}
PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Type' might take place. PrettyFunctionDumper.cpp 233
Just like error handlers, functions that print debug data rarely get adequate test coverage, and this is one example of that. Instead of helping the user solve their problem, this function waits for someone to fix it.
Fixed code:
if (Type)
Type->dump(*this);
else
Printer << "<unknown-type>";
Snippet No. 9: Null pointer
void SearchableTableEmitter::collectTableEntries(
GenericTable &Table, const std::vector<Record *> &Items) {
....
RecTy *Ty = resolveTypes(Field.RecType, TI->getType());
if (!Ty) // <=
PrintFatalError(Twine("Field '") + Field.Name + "' of table '" +
Table.Name + "' has incompatible type: " +
Ty->getAsString() + " vs. " + // <=
TI->getType()->getAsString());
....
}
PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Ty' might take place. SearchableTableEmitter.cpp 614
I don't think you need any comments on this one.
Snippet No. 10: Typo
bool FormatTokenLexer::tryMergeCSharpNullConditionals() {
....
auto &Identifier = *(Tokens.end() - 2);
auto &Question = *(Tokens.end() - 1);
....
Identifier->ColumnWidth += Question->ColumnWidth;
Identifier->Type = Identifier->Type; // <=
Tokens.erase(Tokens.end() - 1);
return true;
}
PVS-Studio diagnostic message: V570 The 'Identifier->Type' variable is assigned to itself. FormatTokenLexer.cpp 249
Assigning a variable to itself is a meaningless operation. The programmer must have meant to do the following:
Identifier->Type = Question->Type;
Snippet No. 11: Suspicious break
void SystemZOperand::print(raw_ostream &OS) const {
switch (Kind) {
break;
case KindToken:
OS << "Token:" << getToken();
break;
case KindReg:
OS << "Reg:" << SystemZInstPrinter::getRegisterName(getReg());
break;
....
}
PVS-Studio diagnostic message: V622 [CWE-478] Consider inspecting the 'switch' statement. It's possible that the first 'case' operator is missing. SystemZAsmParser.cpp 652
There is a very suspicious break statement at the beginning. Shouldn't there be something else here?
Snippet No. 12: Checking a pointer after dereferencing
InlineCost AMDGPUInliner::getInlineCost(CallSite CS) {
Function *Callee = CS.getCalledFunction();
Function *Caller = CS.getCaller();
TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);
if (!Callee || Callee->isDeclaration())
return llvm::InlineCost::getNever("undefined callee");
....
}
PVS-Studio diagnostic message: V595 [CWE-476] The 'Callee' pointer was utilized before it was verified against nullptr. Check lines: 172, 174. AMDGPUInline.cpp 172
The Callee pointer is first dereferenced when the getTTI function is called.
And then it turns out that the pointer should be checked for nullptr:
if (!Callee || Callee->isDeclaration())
Too late...
Snippets No. 13 - No....: Checking a pointer after dereferencing
The previous example isn't unique. The same problem is found in this snippet:
static Value *optimizeDoubleFP(CallInst *CI, IRBuilder<> &B,
bool isBinary, bool isPrecise = false) {
....
Function *CalleeFn = CI->getCalledFunction();
StringRef CalleeNm = CalleeFn->getName(); // <=
AttributeList CalleeAt = CalleeFn->getAttributes();
if (CalleeFn && !CalleeFn->isIntrinsic()) { // <=
....
}
PVS-Studio diagnostic message: V595 [CWE-476] The 'CalleeFn' pointer was utilized before it was verified against nullptr. Check lines: 1079, 1081. SimplifyLibCalls.cpp 1079
And this one:
void Sema::InstantiateAttrs(const MultiLevelTemplateArgumentList &TemplateArgs,
const Decl *Tmpl, Decl *New,
LateInstantiatedAttrVec *LateAttrs,
LocalInstantiationScope *OuterMostScope) {
....
NamedDecl *ND = dyn_cast<NamedDecl>(New);
CXXRecordDecl *ThisContext =
dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext()); // <=
CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
ND && ND->isCXXInstanceMember()); // <=
....
}
PVS-Studio diagnostic message: V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 532, 534. SemaTemplateInstantiateDecl.cpp 532
And here:
Then I lost interest in tracking V595 warnings, so I can't tell you if there are other bugs of this type besides the ones shown above. I bet there are.
Snippets No. 17, 18: Suspicious shift
static inline bool processLogicalImmediate(uint64_t Imm, unsigned RegSize,
uint64_t &Encoding) {
....
unsigned Size = RegSize;
....
uint64_t NImms = ~(Size-1) << 1;
....
}
PVS-Studio diagnostic message: V629 [CWE-190] Consider inspecting the '~(Size - 1) << 1' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 260
This code might actually be correct, but it does look strange and needs examining.
Suppose the Size variable has the value 16; then the NImms variable is expected to get the following value:
1111111111111111111111111111111111111111111111111111111111100000
But in reality it will get the value:
0000000000000000000000000000000011111111111111111111111111100000
This happens because all the calculations are done in the 32-bit unsigned type. Only then is the result implicitly converted to uint64_t, with the 32 most significant bits zeroed out.
The problem can be fixed as follows:
uint64_t NImms = ~static_cast<uint64_t>(Size-1) << 1;
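A quick way to convince yourself is to compare the two forms directly. This sketch assumes unsigned is 32 bits wide, as on all mainstream platforms. The names nimmsNarrow and nimmsWide are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shape of the original code: '~' and '<<' are evaluated on a 32-bit unsigned,
// and only the final result is zero-extended to uint64_t.
uint64_t nimmsNarrow(unsigned Size) {
  return ~(Size - 1) << 1;
}

// Suggested fix: widen first, so '~' and '<<' operate on all 64 bits.
uint64_t nimmsWide(unsigned Size) {
  return ~static_cast<uint64_t>(Size - 1) << 1;
}
```

For Size == 16, nimmsNarrow returns 0x00000000FFFFFFE0 while nimmsWide returns 0xFFFFFFFFFFFFFFE0, matching the two bit patterns shown earlier.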
Another bug of this type: V629 [CWE-190] Consider inspecting the 'Immr << 6' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 269
Snippet No. 19: Missing keyword else?
void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands) {
....
if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {
// VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.
// Skip it.
continue;
} if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) { // <=
Op.addRegWithFPInputModsOperands(Inst, 2);
} else if (Op.isDPPCtrl()) {
Op.addImmOperands(Inst, 1);
} else if (Op.isImm()) {
// Handle optional arguments
OptionalIdx[Op.getImmTy()] = I;
} else {
llvm_unreachable("Invalid operand type");
}
....
}
PVS-Studio diagnostic message: V646 [CWE-670] Consider inspecting the application's logic. It's possible that 'else' keyword is missing. AMDGPUAsmParser.cpp 5655
This one is not a bug. Since the then block of the first if statement ends with continue, it doesn't matter if it has the else keyword or not. The behavior will be the same in any case. However, the missing else makes the code less readable and, therefore, potentially dangerous. If continue disappears one day, the behavior will change drastically. I strongly recommend adding else.
Snippet No. 20: Four identical typos
LLVM_DUMP_METHOD void Symbol::dump(raw_ostream &OS) const {
std::string Result;
if (isUndefined())
Result += "(undef) ";
if (isWeakDefined())
Result += "(weak-def) ";
if (isWeakReferenced())
Result += "(weak-ref) ";
if (isThreadLocalValue())
Result += "(tlv) ";
switch (Kind) {
case SymbolKind::GlobalSymbol:
Result + Name.str(); // <=
break;
case SymbolKind::ObjectiveCClass:
Result + "(ObjC Class) " + Name.str(); // <=
break;
case SymbolKind::ObjectiveCClassEHType:
Result + "(ObjC Class EH) " + Name.str(); // <=
break;
case SymbolKind::ObjectiveCInstanceVariable:
Result + "(ObjC IVar) " + Name.str(); // <=
break;
}
OS << Result;
}
PVS-Studio diagnostic messages:
The programmer accidentally used the + operator instead of += and ended up with four meaningless constructs.
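Here's roughly what the intended code might look like. This is a reduced, hypothetical version of the function: the enum and function name are made up, and std::string stands in for LLVM's string types. The point is simply that += modifies Result, while a bare Result + ... builds a temporary string and throws it away.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduced version of the dump logic (not the real LLVM types).
enum class SymbolKind { GlobalSymbol, ObjectiveCClass };

std::string describe(SymbolKind Kind, const std::string &Name) {
  std::string Result;
  switch (Kind) {
  case SymbolKind::GlobalSymbol:
    Result += Name;                    // '+=' appends to Result in place
    break;
  case SymbolKind::ObjectiveCClass:
    Result += "(ObjC Class) " + Name;  // 'Result + ...' alone would be discarded
    break;
  }
  return Result;
}
```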
Snippet No. 21: Undefined behavior
static void getReqFeatures(std::map<StringRef, int> &FeaturesMap,
const std::vector<Record *> &ReqFeatures) {
for (auto &R : ReqFeatures) {
StringRef AsmCondString = R->getValueAsString("AssemblerCondString");
SmallVector<StringRef, 4> Ops;
SplitString(AsmCondString, Ops, ",");
assert(!Ops.empty() && "AssemblerCondString cannot be empty");
for (auto &Op : Ops) {
assert(!Op.empty() && "Empty operator");
if (FeaturesMap.find(Op) == FeaturesMap.end())
FeaturesMap[Op] = FeaturesMap.size();
}
}
}
Try to spot the bug on your own before reading the answer.
PVS-Studio diagnostic message: V708 [CWE-758] Dangerous construction is used: 'FeaturesMap[Op] = FeaturesMap.size()', where 'FeaturesMap' is of 'map' class. This may lead to undefined behavior. RISCVCompressInstEmitter.cpp 490
The faulty line is this one:
FeaturesMap[Op] = FeaturesMap.size();
If Op hasn't been found, the program creates a new element in the map and assigns it the total number of elements in the map. The problem is that you don't know whether size() will be called before or after the new element is added, so the new entry may end up with either of two different values.
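One way to make the order explicit is to read the size before anything gets inserted. In the sketch below, std::string stands in for StringRef and addFeature is a hypothetical helper. Note that C++17 sequences the right operand of an assignment before the left one, so the original line is well-defined there. LLVM 8, however, was compiled as C++11, where the order of the two calls is unspecified.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: gives each new feature the next free index.
// emplace's arguments are evaluated before the call, so size() is read
// before any insertion; if Op is already present, its entry is left untouched.
void addFeature(std::map<std::string, int> &FeaturesMap, const std::string &Op) {
  FeaturesMap.emplace(Op, static_cast<int>(FeaturesMap.size()));
}
```

Calling it with "a", "b", "a", "c" gives the indices 0, 1 and 2, with the repeated "a" keeping its original index.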
Snippets No. 22 - No. 24: Duplicate assignments
Error MachOObjectFile::checkSymbolTable() const {
....
} else {
MachO::nlist STE = getSymbolTableEntry(SymDRI);
NType = STE.n_type; // <=
NType = STE.n_type; // <=
NSect = STE.n_sect;
NDesc = STE.n_desc;
NStrx = STE.n_strx;
NValue = STE.n_value;
}
....
}
PVS-Studio diagnostic message: V519 [CWE-563] The 'NType' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1663, 1664. MachOObjectFile.cpp 1664
I don't think it's a true error - rather a duplicate assignment. But it's still a defect.
Two other cases:
Snippets No. 25 - No. 27: More duplicate assignments
These ones deal with slightly different versions of duplicate assignments.
bool Vectorizer::vectorizeLoadChain(
ArrayRef<Instruction *> Chain,
SmallPtrSet<Instruction *, 16> *InstructionsProcessed) {
....
unsigned Alignment = getAlignment(L0);
....
unsigned NewAlign = getOrEnforceKnownAlignment(L0->getPointerOperand(),
StackAdjustedAlignment,
DL, L0, nullptr, &DT);
if (NewAlign != 0)
Alignment = NewAlign;
Alignment = NewAlign;
....
}
PVS-Studio diagnostic message: V519 [CWE-563] The 'Alignment' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1158, 1160. LoadStoreVectorizer.cpp 1160
This is a very strange snippet, and it probably contains a logic error. The Alignment variable is first assigned the value based on the condition, and then it is assigned the value once again, but without any prior check.
Similar defects:
Snippet No. 28: Always true condition
static int readPrefixes(struct InternalInstruction* insn) {
....
uint8_t byte = 0;
uint8_t nextByte;
....
if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
nextByte == 0xc6 || nextByte == 0xc7)) {
insn->xAcquireRelease = true;
if (nextByte != 0x90) // PAUSE instruction support // <=
break;
}
....
}
PVS-Studio diagnostic message: V547 [CWE-571] Expression 'nextByte != 0x90' is always true. X86DisassemblerDecoder.cpp 379
The check doesn't make sense. The nextByte variable can't be equal to 0x90 here, because the enclosing condition already guarantees that it is 0x88, 0x89, 0xc6 or 0xc7. This must be some logic error.
Snippets No. 29 - No....: Always true/false conditions
There are many warnings about an entire condition (V547) or part of a condition (V560) being always true or false. Rather than genuine bugs, these are often simply bad code, the effects of macro expansion, and so on. That said, all such warnings should still be checked because some of them may point at genuine logic errors. For example, the following snippet doesn't look right:
static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,
uint64_t Address, const void *Decoder) {
DecodeStatus S = MCDisassembler::Success;
if (RegNo > 13)
return MCDisassembler::Fail;
if ((RegNo & 1) || RegNo == 0xe)
S = MCDisassembler::SoftFail;
....
}
PVS-Studio diagnostic message: V560 [CWE-570] A part of conditional expression is always false: RegNo == 0xe. ARMDisassembler.cpp 939
The 0xE constant is the decimal number 14. The check RegNo == 0xe doesn't make sense because if RegNo > 13, the function will return.
I saw a lot of other V547 and V560 warnings, but, like with V595, I didn't feel excited about checking them since I already had enough material for an article :). So, no figures for the total number of bugs of this type in LLVM.
Here's an example to illustrate why checking those warnings is boring. The analyzer is totally correct when issuing a warning on the following code. But it's still not a bug.
bool UnwrappedLineParser::parseBracedList(bool ContinueOnSemicolons,
tok::TokenKind ClosingBraceKind) {
bool HasError = false;
....
HasError = true;
if (!ContinueOnSemicolons)
return !HasError;
....
}
PVS-Studio diagnostic message: V547 [CWE-570] Expression '!HasError' is always false. UnwrappedLineParser.cpp 1635
Snippet No. 30: Suspicious return
static bool
isImplicitlyDef(MachineRegisterInfo &MRI, unsigned Reg) {
for (MachineRegisterInfo::def_instr_iterator It = MRI.def_instr_begin(Reg),
E = MRI.def_instr_end(); It != E; ++It) {
return (*It).isImplicitDef();
}
....
}
PVS-Studio diagnostic message: V612 [CWE-670] An unconditional 'return' within a loop. R600OptimizeVectorRegisters.cpp 63
It's either a bug or a specific coding technique meant to communicate some idea to fellow programmers. To me it doesn't tell anything except that it's a very suspicious piece of code. Please don't write code like that :).
Feeling tired? OK, it's time to make some tea or coffee.
I think 30 examples is enough for existing diagnostics. Now let's see if we can find anything interesting with the new diagnostics, which were added after the previous check. Over the last two years, the C++ analyzer module was extended with 66 new diagnostics.
Snippet No. 31: Unreachable code
Error CtorDtorRunner::run() {
....
if (auto CtorDtorMap =
ES.lookup(JITDylibSearchList({{&JD, true}}), std::move(Names),
NoDependenciesToRegister, true))
{
....
return Error::success();
} else
return CtorDtorMap.takeError();
CtorDtorsByPriority.clear();
return Error::success();
}
PVS-Studio diagnostic message: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. ExecutionUtils.cpp 146
As you can see, both branches of the if statement end with a return statement, which means the CtorDtorsByPriority container will never be cleared.
Snippet No. 32: Unreachable code
bool LLParser::ParseSummaryEntry() {
....
switch (Lex.getKind()) {
case lltok::kw_gv:
return ParseGVEntry(SummaryID);
case lltok::kw_module:
return ParseModuleEntry(SummaryID);
case lltok::kw_typeid:
return ParseTypeIdEntry(SummaryID); // <=
break; // <=
default:
return Error(Lex.getLoc(), "unexpected summary kind");
}
Lex.setIgnoreColonInIdentifiers(false); // <=
return false;
}
PVS-Studio diagnostic message: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. LLParser.cpp 835
This one is interesting. Take a look at this part first:
return ParseTypeIdEntry(SummaryID);
break;
There seems to be nothing strange about this code; the break statement is unnecessary and can be safely removed. But it's not that simple.
The warning is triggered by the following lines:
Lex.setIgnoreColonInIdentifiers(false);
return false;
Indeed, this code is unreachable. All the case labels of the switch statement end with a return, and the meaningless lone break doesn't look that harmless anymore! What if one of the branches was meant to end with a break rather than return?
Snippet No. 33: Accidental clearing of the most significant bits
unsigned getStubAlignment() override {
if (Arch == Triple::systemz)
return 8;
else
return 1;
}
Expected<unsigned>
RuntimeDyldImpl::emitSection(const ObjectFile &Obj,
const SectionRef &Section,
bool IsCode) {
....
uint64_t DataSize = Section.getSize();
....
if (StubBufSize > 0)
DataSize &= ~(getStubAlignment() - 1);
....
}
PVS-Studio diagnostic message: V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. RuntimeDyld.cpp 815
Note that the getStubAlignment function returns an unsigned value. Let's see how the expression will evaluate, assuming that the function will return the value 8:
~(getStubAlignment() - 1)
~(8u-1)
0xFFFFFFF8u
Note now that the DataSize variable's type is 64-bit unsigned. So it turns out that executing the operation DataSize & 0xFFFFFFF8 will result in clearing all 32 most significant bits of the value. I don't think the programmer wanted that. Perhaps they meant it to be DataSize & 0xFFFFFFFFFFFFFFF8u.
To fix the error, the code should be rewritten like this:
DataSize &= ~(static_cast<uint64_t>(getStubAlignment()) - 1);
Or like this:
DataSize &= ~(getStubAlignment() - 1ULL);
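Here's the difference in action. The sketch assumes a 32-bit unsigned and a section size above 4 GiB. stubAlignment is a hypothetical stand-in for getStubAlignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for getStubAlignment(): returns unsigned, as in the snippet.
unsigned stubAlignment() { return 8; }

// ~(8u - 1) is the 32-bit value 0xFFFFFFF8; zero-extended to 64 bits,
// it also clears the upper 32 bits of DataSize.
uint64_t alignDownNarrow(uint64_t DataSize) {
  DataSize &= ~(stubAlignment() - 1);
  return DataSize;
}

// Subtracting 1ULL makes the whole mask 64 bits wide (0xFFFFFFFFFFFFFFF8).
uint64_t alignDownWide(uint64_t DataSize) {
  DataSize &= ~(stubAlignment() - 1ULL);
  return DataSize;
}
```

For DataSize == 0x100000009, the narrow version yields 0x8, losing the upper half. The wide version yields 0x100000008.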
Snippet No. 34: Bad explicit type conversion
template <typename T>
void scaleShuffleMask(int Scale, ArrayRef<T> Mask,
SmallVectorImpl<T> &ScaledMask) {
assert(0 < Scale && "Unexpected scaling factor");
int NumElts = Mask.size();
ScaledMask.assign(static_cast<size_t>(NumElts * Scale), -1);
....
}
PVS-Studio diagnostic message: V1028 [CWE-190] Possible overflow. Consider casting operands of the 'NumElts * Scale' operator to the 'size_t' type, not the result. X86ISelLowering.h 1577
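Explicit type conversion is used here to avoid an overflow when multiplying variables of type int. It doesn't help, though. The multiplication is performed in int first, and only then is the 32-bit result converted to size_t. If NumElts * Scale exceeds INT_MAX, the overflow (which is undefined behavior for signed int) has already happened by the time the cast is applied.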
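Here's a sketch of the fix the diagnostic suggests, which casts the operands instead of the result. scaledCount is a hypothetical helper. Both arguments are assumed to be non-negative, since the snippet asserts Scale > 0 and NumElts comes from a container size. The large-value check assumes a 64-bit size_t.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: convert each operand before multiplying, so the
// product is computed in size_t instead of overflowing int first.
std::size_t scaledCount(int NumElts, int Scale) {
  return static_cast<std::size_t>(NumElts) * static_cast<std::size_t>(Scale);
}
```

With a 64-bit size_t, scaledCount(70000, 70000) gives 4900000000, a product that would overflow if computed in int.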
Snippet No. 35: Bad copy-paste
Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {
....
if (!match(Op0, m_PosZeroFP()) && isKnownNeverNaN(Op0, &TLI)) {
I.setOperand(0, ConstantFP::getNullValue(Op0->getType()));
return &I;
}
if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
I.setOperand(1, ConstantFP::getNullValue(Op0->getType())); // <=
return &I;
}
....
}
PVS-Studio diagnostic message: V778 [CWE-682] Two similar code fragments were found. Perhaps, this is a typo and 'Op1' variable should be used instead of 'Op0'. InstCombineCompares.cpp 5507
This new cool diagnostic detects situations where a code fragment is written using copy-paste, with all the names changed save one.
Note that all Op0's except one were changed to Op1 in the second block. The code should probably look like this:
if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
I.setOperand(1, ConstantFP::getNullValue(Op1->getType()));
return &I;
}
Snippet No. 36: Variables mixed up
struct Status {
unsigned Mask;
unsigned Mode;
Status() : Mask(0), Mode(0){};
Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
Mode &= Mask;
};
....
};
PVS-Studio diagnostic message: V1001 [CWE-563] The 'Mode' variable is assigned but is not used by the end of the function. SIModeRegister.cpp 48
It's very dangerous to have the same names for function arguments as for class members because you risk mixing them up. What you see here is an example of that. The following expression is meaningless:
Mode &= Mask;
The argument is changed but never used after that. This snippet should probably look like this:
Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
this->Mode &= Mask;
};
Snippet No. 37: Variables mixed up
class SectionBase {
....
uint64_t Size = 0;
....
};
class SymbolTableSection : public SectionBase {
....
};
void SymbolTableSection::addSymbol(Twine Name, uint8_t Bind, uint8_t Type,
SectionBase *DefinedIn, uint64_t Value,
uint8_t Visibility, uint16_t Shndx,
uint64_t Size) {
....
Sym.Value = Value;
Sym.Visibility = Visibility;
Sym.Size = Size;
Sym.Index = Symbols.size();
Symbols.emplace_back(llvm::make_unique<Symbol>(Sym));
Size += this->EntrySize;
}
PVS-Studio diagnostic message: V1001 [CWE-563] The 'Size' variable is assigned but is not used by the end of the function. Object.cpp 424
This one is similar to the previous example. Correct version:
this->Size += this->EntrySize;
Snippets No. 38 - No. 47: Missing pointer check
We looked at a few examples of the V595 warning a bit earlier. It detects situations where a pointer is first dereferenced and only then checked. The new diagnostic V1004 is the opposite of that, and it detects tons of errors too. It looks for pointers that have already been checked against nullptr but are later dereferenced without such a check. Here are a few errors of this type found in LLVM's code.
int getGEPCost(Type *PointeeType, const Value *Ptr,
ArrayRef<const Value *> Operands) {
....
if (Ptr != nullptr) { // <=
assert(....);
BaseGV = dyn_cast<GlobalValue>(Ptr->stripPointerCasts());
}
bool HasBaseReg = (BaseGV == nullptr);
auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType()); // <=
....
}
PVS-Studio diagnostic message: V1004 [CWE-476] The 'Ptr' pointer was used unsafely after it was verified against nullptr. Check lines: 729, 738. TargetTransformInfoImpl.h 738
Ptr can be nullptr, which is indicated by the check:
if (Ptr != nullptr)
However, the same pointer is dereferenced without such a check a bit further:
auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());
Another similar case.
llvm::DISubprogram *CGDebugInfo::getFunctionFwdDeclOrStub(GlobalDecl GD,
bool Stub) {
....
auto *FD = dyn_cast<FunctionDecl>(GD.getDecl());
SmallVector<QualType, 16> ArgTypes;
if (FD) // <=
for (const ParmVarDecl *Parm : FD->parameters())
ArgTypes.push_back(Parm->getType());
CallingConv CC = FD->getType()->castAs<FunctionType>()->getCallConv(); // <=
....
}
PVS-Studio diagnostic message: V1004 [CWE-476] The 'FD' pointer was used unsafely after it was verified against nullptr. Check lines: 3228, 3231. CGDebugInfo.cpp 3231
Note the FD pointer. This error is straightforward, so no comments on this one.
One more here:
static void computePolynomialFromPointer(Value &Ptr, Polynomial &Result,
Value *&BasePtr,
const DataLayout &DL) {
PointerType *PtrTy = dyn_cast<PointerType>(Ptr.getType());
if (!PtrTy) { // <=
Result = Polynomial();
BasePtr = nullptr;
}
unsigned PointerBits =
DL.getIndexSizeInBits(PtrTy->getPointerAddressSpace()); // <=
....
}
PVS-Studio diagnostic message: V1004 [CWE-476] The 'PtrTy' pointer was used unsafely after it was verified against nullptr. Check lines: 960, 965. InterleavedLoadCombinePass.cpp 965
How do you avoid errors like that? Be very careful when reviewing your code and check it regularly with PVS-Studio.
I don't think we should examine other examples of this type, so here's just a list of the warnings:
Snippets No. 48 - No. 60: Not critical but still a defect (potential memory leak)
std::unique_ptr<IRMutator> createISelMutator() {
....
std::vector<std::unique_ptr<IRMutationStrategy>> Strategies;
Strategies.emplace_back(
new InjectorIRStrategy(InjectorIRStrategy::getDefaultOps()));
....
}
PVS-Studio diagnostic message: V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 58
You can't simply write xxx.push_back(new X) to append an element to a container of type std::vector<std::unique_ptr<X>>, because there is no implicit conversion from X* to std::unique_ptr<X>.
The popular solution is to write xxx.emplace_back(new X) because it compiles: the emplace_back method constructs the element directly from its arguments and can therefore use explicit constructors.
But that solution isn't safe. If the vector is full, memory will be reallocated. This operation may fail and throw a std::bad_alloc exception. In that case, the pointer will be lost and the program won't be able to delete the object it created.
A safer solution is to create the unique_ptr first. It owns the object from the start, so if the reallocation fails, its destructor deletes the object:
xxx.push_back(std::unique_ptr<X>(new X))
The C++14 standard allows you to use 'std::make_unique':
xxx.push_back(std::make_unique<X>())
This type of defect has no effect in LLVM. Compilation will simply terminate if memory allocation fails. That said, it may be quite critical in applications with a long uptime, which can't simply terminate when a memory allocation failure occurs.
So, even though this code isn't dangerous to LLVM, I thought I should still tell you about this bug pattern and the fact that PVS-Studio can now detect it.
Other similar cases:
I wrote down 60 warnings and stopped at that. Did PVS-Studio find any other bugs in LLVM? Yes, it did. But as I was writing down the examples, night fell, so I decided to knock off.
I hope you enjoyed reading this article and it encouraged you to try the PVS-Studio analyzer for yourself.
Visit this page to download the analyzer and get a trial key.
Most importantly, use static analysis regularly. One-time checks, like those that we do to popularize static analysis and promote PVS-Studio, aren't the normal scenario.
Good luck with improving your code's quality and reliability!