Commercial static analyzers perform deeper and fuller code analysis compared to compilers. Let's see what PVS-Studio found in the source code of the LLVM 13.0.0 project.
Compilers developers constantly improve their products and built-in code analyzers. Some IDEs such as Visual Studio and CLion provide enhanced built-in analyzers. So, developers ask an obvious question – is it reasonable to use additional solutions to control code quality? Or it's enough to use built-in tools of a modern compiler or IDE?
Developing a project, you should use the minimum required applications. Therefore, if you use a program with up-to-the-mark mechanisms of code analysis, it's enough – you don't need to add additional utilities to the pipeline.
That's how we do it in PVS-Studio. Occasionally, our users ask us whether we provide analysis better than some other compiler or its analyzer. Usually, the number of such questions increases with a new compiler release.
In theory, there are some proper answers to these questions. Here they are:
But that's not what you wanted to hear, yeah? :). It seems that we want to evade the question. This article is a way to answer. Our team checks compilers showing the product's capabilities.
Today we check the latest LLVM 13.0.0 release. Of course, our readers and we are not interested in LLVM. We're going to evaluate the power of PVS-Studio in comparison with the Clang compiler, Clang Static Analyzer, and Clang-tidy. LLVM developers use these programs to build and check the project. If we find some errors, you'll see the benefits of introducing PVS-Studio in the development process.
Previously, we checked LLVM 11. Click here if you're wondering to know more.
It's more convenient to view PVS-Studio warnings in an IDE. I had Visual Studio 2019 on my computer. So, I used it. And little left to do:
In fact, that's not so easy. If you don't want to receive a large number of false or banal (within the project) warnings, you need to preconfigure the analyzer. I don't mind receiving such warnings, since I need to find some exciting bugs worthy of an article. And that's all.
If you want to use the analyzer regularly – you need to preconfigure it. Also, it's better to start with declaring all warnings a technical debt and hide them. Then, you can handle new warnings, gradually eliminating technical debt. Here you can find this approach described in detail.
We have lots of articles explaining how to set up and introduce the analyzer. Let's stick to the main topic. Curious to know what we found? Let's figure it out.
I spent an evening viewing the log and wrote out engaging warnings. Sure you can find far more errors. However, the fact that skimming through the report, you can fix 20 errors, demonstrates the analyzer in a favorable light.
PVS-Studio is, and always has been good at detecting typos. You can easily spot them in code snippets described in the article. During code reviews, programmers fail to find typos and then get angry detecting them after debugging :).
It's easy to come up with rules for detecting typos. But it is much more difficult to implement them. You have to strike a balance between useful warnings and false positives. The Clang compiler and related analyzers have diagnostics to identify various types of errors that I describe below. But since they did not help – our analyzer has better diagnostics.
uint64_t uval;
....
bool DWARFFormValue::extractValue(const DWARFDataExtractor &Data,
uint64_t *OffsetPtr, dwarf::FormParams FP,
const DWARFContext *Ctx,
const DWARFUnit *CU) {
....
case DW_FORM_LLVM_addrx_offset:
Value.uval = Data.getULEB128(OffsetPtr, &Err) << 32;
Value.uval = Data.getU32(OffsetPtr, &Err);
break;
....
}
The PVS-Studio warning: V519 [CWE-563, CERT-MSC13-C] The 'Value.uval' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 334, 335. DWARFFormValue.cpp 335
It makes no sense to write different values one by one to the same variable. This is exactly what the analyzer warns us about. The code author made a typo, forgetting to add '|'. This code should create one 64-bit value from two 32-bit values. The correct code looks as follows:
case DW_FORM_LLVM_addrx_offset:
Value.uval = Data.getULEB128(OffsetPtr, &Err) << 32;
Value.uval |= Data.getU32(OffsetPtr, &Err);
break;
In the ExecutorAddress class, we need to implement operators of the same type. I'm pretty sure the programmer used copy-paste. Don't you think it's boring to write the following code without copy-paste?
class ExecutorAddress {
....
ExecutorAddress &operator++() {
++Addr;
return *this;
}
ExecutorAddress &operator--() {
--Addr;
return *this;
}
ExecutorAddress operator++(int) { return ExecutorAddress(Addr++); }
ExecutorAddress operator--(int) { return ExecutorAddress(Addr++); }
ExecutorAddress &operator+=(const ExecutorAddrDiff Delta) {
Addr += Delta.getValue();
return *this;
}
ExecutorAddress &operator-=(const ExecutorAddrDiff Delta) {
Addr -= Delta.getValue();
return *this;
}
....
private:
uint64_t Addr = 0;
}
Unfortunately, the faster you write the code – the higher is the probability of forgetting to replace something in the copied code. It's tedious to write and check such code. That's why it "attracts" errors.
Fortunately, static analyzers work hard and can't get tired :). PVS-Studio completes code reviews. It offers to pay attention to these two functions:
ExecutorAddress operator++(int) { return ExecutorAddress(Addr++); }
ExecutorAddress operator--(int) { return ExecutorAddress(Addr++); }
The PVS-Studio warning: V524 It is odd that the body of '--' function is fully equivalent to the body of '++' function. ExecutorAddress.h 104
A striking error: the programmer forgot to replace the ++ operator with -- in the right part of the copied line.
bool operator==(const BDVState &Other) const {
return OriginalValue == OriginalValue && BaseValue == Other.BaseValue &&
Status == Other.Status;
}
V501 [CWE-571] There are identical sub-expressions to the left and to the right of the '==' operator: OriginalValue == OriginalValue RewriteStatepointsForGC.cpp 758
A classic error! I covered this topic in another long article - "The evil within the comparison functions".
To reduce the number of such errors, I recommend using table-style formatting when handling operations of the same type. Here's how I would write this function:
bool operator==(const BDVState &Other) const {
return
OriginalValue == OriginalValue
&& BaseValue == Other.BaseValue
&& Status == Other.Status;
}
The code is longer, but it helps the programmer to notice the typo during code review. However, you still may fail to notice an error. To be on the safe side, it's better to use an enhanced analyzer.
Considering the previous example, you may think that I'm exaggerating because it is a random blooper. Unfortunately, comparison functions tend to typos. Let's take a look at another example.
bool TypeInfer::EnforceSmallerThan(TypeSetByHwMode &Small,
TypeSetByHwMode &Big) {
....
if (Small.empty())
Changed |= EnforceAny(Small);
if (Big.empty())
Changed |= EnforceAny(Big);
assert(Small.hasDefault() && Big.hasDefault());
SmallVector<unsigned, 4> Modes;
union_modes(Small, Big, Modes);
for (unsigned M : Modes) {
TypeSetByHwMode::SetType &S = Small.get(M);
TypeSetByHwMode::SetType &B = Big.get(M);
if (any_of(S, isIntegerOrPtr) && any_of(S, isIntegerOrPtr)) {
auto NotInt = [](MVT VT) { return !isIntegerOrPtr(VT); };
Changed |= berase_if(S, NotInt);
Changed |= berase_if(B, NotInt);
} else if (any_of(S, isFloatingPoint) && any_of(B, isFloatingPoint)) {
auto NotFP = [](MVT VT) { return !isFloatingPoint(VT); };
Changed |= berase_if(S, NotFP);
Changed |= berase_if(B, NotFP);
} else if (S.empty() || B.empty()) {
Changed = !S.empty() || !B.empty();
S.clear();
B.clear();
} else {
TP.error("Incompatible types");
return Changed;
}
....
}
Why don't you try to find the typo before I show you the error? Here's a picture to hide the answer.
Here's the issue:
if (any_of(S, isIntegerOrPtr) && any_of(S, isIntegerOrPtr))
The PVS-Studio warning: V501 [CWE-571] There are identical sub-expressions 'any_of(S, isIntegerOrPtr)' to the left and to the right of the '&&' operator. CodeGenDAGPatterns.cpp 479
The correct version:
if (any_of(S, isIntegerOrPtr) && any_of(B, isIntegerOrPtr))
LegalizerHelper::LegalizeResult LegalizerHelper::lowerRotate(MachineInstr &MI) {
Register Dst = MI.getOperand(0).getReg();
Register Src = MI.getOperand(1).getReg();
Register Amt = MI.getOperand(2).getReg();
LLT DstTy = MRI.getType(Dst);
LLT SrcTy = MRI.getType(Dst);
LLT AmtTy = MRI.getType(Amt);
....
}
The PVS-Studio warning: V656 [CWE-665] Variables 'DstTy', 'SrcTy' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'MRI.getType(Dst)' expression. Check lines: 5953, 5954. LegalizerHelper.cpp 5954
As I mentioned earlier, formatting the code with a table helps to protect code from typos. Yep, it helps, but you can't be sure for 100%. This is beautiful code, resembling a table. But it still contains an error.
It seems that the programmer used copy-paste for the following line:
LLT DstTy = MRI.getType(Dst);
But they replaced Dst by Src only in one place:
LLT SrcTy = MRI.getType(Dst);
The correct code looks as follows:
LLT DstTy = MRI.getType(Dst);
LLT SrcTy = MRI.getType(Src);
LLT AmtTy = MRI.getType(Amt);
One does not simply write code in C or C++ without accidentally dereferencing a null pointer somewhere :). LLVM has such cases too. It's boring and tedious to study warnings about null pointers. I've looked through these warnings. I guess I could find much more of them.
void DwarfCompileUnit::addLabelAddress(DIE &Die, dwarf::Attribute Attribute,
const MCSymbol *Label) {
....
if (Label)
DD->addArangeLabel(SymbolCU(this, Label));
bool UseAddrOffsetFormOrExpressions =
DD->useAddrOffsetForm() || DD->useAddrOffsetExpressions();
const MCSymbol *Base = nullptr;
if (Label->isInSection() && UseAddrOffsetFormOrExpressions)
Base = DD->getSectionLabel(&Label->getSection());
....
}
The PVS-Studio warning: V1004 [CWE-476, CERT-EXP34-C] The 'Label' pointer was used unsafely after it was verified against nullptr. Check lines: 74, 81. DwarfCompileUnit.cpp 81
The "if (Label)" check tells us and the analyzer that the Label pointer can be null. But then this pointer is dereferenced without any verification:
if (Label->isInSection() && UseAddrOffsetFormOrExpressions)
Better not to do so.
static bool HandleUse(....)
{
....
if (Pat->isLeaf()) {
DefInit *DI = dyn_cast<DefInit>(Pat->getLeafValue());
if (!DI)
I.error("Input $" + Pat->getName() + " must be an identifier!");
Rec = DI->getDef();
}
....
}
The PVS-Studio warning: V1004 [CWE-476, CERT-EXP34-C] The 'DI' pointer was used unsafely after it was verified against nullptr. Check lines: 3349, 3351. CodeGenDAGPatterns.cpp 3351
The DI pointer is checked, but then it is immediately dereferenced without checking. The question arises: is this an error? If the DI pointer is null, the error function that can generate an exception is called. Let's take a look at this function:
void TreePattern::error(const Twine &Msg) {
if (HasError)
return;
dump();
PrintError(TheRecord->getLoc(), "In " + TheRecord->getName() + ": " + Msg);
HasError = true;
}
Nope, this function does not throw an exception and does not terminate the program.
Right after logging an error state, null pointer dereference happens.
The project has a few more similar errors. There's no point to consider them separately:
Error DWARFDebugLine::LineTable::parse(...., raw_ostream *OS, bool Verbose) {
assert((OS || !Verbose) && "cannot have verbose output without stream");
....
auto EmitRow = [&] {
if (!TombstonedAddress) {
if (Verbose) {
*OS << "\n";
OS->indent(12);
}
if (OS)
State.Row.dump(*OS);
State.appendRowToMatrix();
}
};
....
}
The PVS-Studio warning: V595 [CWE-476, CERT-EXP12-C] The 'OS' pointer was utilized before it was verified against nullptr. Check lines: 791, 793. DWARFDebugLine.cpp 791
The "if (OS)" check indicates that the OS pointer can be null. However, this pointer can be already dereferenced without prior verification.
The code block starts with assert that protects against null pointers. However, this is not enough, since, in the release build, the assert macro is expanded in an empty string.
It's better to make the code safer:
auto EmitRow = [&] {
if (!TombstonedAddress) {
if (OS)
{
if (Verbose) {
*OS << "\n";
OS->indent(12);
}
State.Row.dump(*OS);
}
State.appendRowToMatrix();
}
};
LLVM developers sometimes think that small enums are represented by a single byte. That is, sizeof(enum) == sizeof(char). It's quite dangerous to do that. According to the C++ standard, the sizes of types are implementation-defined.
enum class FunctionKinds { ENTRY, EXIT, TAIL, LOG_ARGS_ENTER, CUSTOM_EVENT };
....
static Error loadObj(....) {
....
auto Kind = Extractor.getU8(&OffsetPtr);
static constexpr SledEntry::FunctionKinds Kinds[] = {
SledEntry::FunctionKinds::ENTRY, SledEntry::FunctionKinds::EXIT,
SledEntry::FunctionKinds::TAIL,
SledEntry::FunctionKinds::LOG_ARGS_ENTER,
SledEntry::FunctionKinds::CUSTOM_EVENT};
if (Kind >= sizeof(Kinds))
return errorCodeToError(
std::make_error_code(std::errc::executable_format_error));
Entry.Kind = Kinds[Kind];
....
}
The PVS-Studio warning: V557 [CWE-125, CERT-ARR30-C] Array overrun is possible. The value of 'Kind' index could reach 19. InstrumentationMap.cpp 196
The warning requires explanation. Data flow analysis processes this code:
if (Kind >= sizeof(Kinds))
return errorCodeToError(...);
As a result, if the condition is not met, the Kind variable further has the [0..19] value.
Why 19 and not 4? Look at how the FunctionKinds type is declared. According to the C++11 standard, scoped enums have an underlying type of int by default. The analyzer knows that the Visual C++ on the x64 platform was used, which means that the enumeration will be represented by 4 bytes. You can see it yourself if you write this test program:
int main()
{
enum class FunctionKinds { ENTRY, EXIT, TAIL, LOG_ARGS_ENTER, CUSTOM_EVENT };
static constexpr FunctionKinds Kinds[] = {
FunctionKinds::ENTRY, FunctionKinds::EXIT, FunctionKinds::TAIL,
FunctionKinds::LOG_ARGS_ENTER, FunctionKinds::CUSTOM_EVENT
};
std::cout << sizeof(Kinds) << std::endl;
return 0;
}
We build the program with the Visual C++ compiler, run it and see the number "20".
It turns out that our code is not protected from the protection against array index out of bounds. To fix the code, you need to compare Kind not with the size of the array in bytes, but with the number of array elements.
The correct check:
if (Kind >= sizeof(Kinds) / sizeof(Kinds[0]))
return errorCodeToError(....);
enum CondCode {
// Opcode N U L G E Intuitive operation
SETFALSE, // 0 0 0 0 Always false (always folded)
SETOEQ, // 0 0 0 1 True if ordered and equal
....
SETCC_INVALID // Marker value.
};
static void InitCmpLibcallCCs(ISD::CondCode *CCs) {
memset(CCs, ISD::SETCC_INVALID, sizeof(ISD::CondCode)*RTLIB::UNKNOWN_LIBCALL);
....
}
The PVS-Studio warning: V575 [CWE-628, CERT-EXP37-C] The 'memset' function processes the pointer to enum type. Inspect the first argument. TargetLoweringBase.cpp 662
The code runs only if you are lucky and the elements of the CondCode enumeration are represented by one byte.
The memset function fills an array of bytes. The SETCC_INVALID value is written to each byte. If enum is represented by 4 bytes, as happens with Visual C++ assembly, the array is filled with meaningless values. These values are equal to the result of repeating the constant in each of the 4 bytes:
SETCC_INVALID << 24 | SETCC_INVALID << 16 | SETCC_INVALID << 8 | SETCC_INVALID
The correct way to fill the array:
std::fill(CCs, CCs + RTLIB::UNKNOWN_LIBCALL, ISD::SETCC_INVALID);
Expected<std::pair<JITTargetAddress, Edge::Kind>>
EHFrameEdgeFixer::readEncodedPointer(uint8_t PointerEncoding,
JITTargetAddress PointerFieldAddress,
BinaryStreamReader &RecordReader) {
.....
Edge::Kind PointerEdgeKind;
switch (EffectiveType) {
case DW_EH_PE_udata4: {
....
PointerEdgeKind = Delta32;
break;
}
case DW_EH_PE_udata8: {
....
PointerEdgeKind = Delta64;
break;
}
case DW_EH_PE_sdata4: {
....
PointerEdgeKind = Delta32;
break;
}
case DW_EH_PE_sdata8: {
....
PointerEdgeKind = Delta64;
break;
}
}
if (PointerEdgeKind == Edge::Invalid)
return make_error<JITLinkError>(
"Unspported edge kind for encoded pointer at " +
formatv("{0:x}", PointerFieldAddress));
return std::make_pair(Addr, Delta64);
}
The PVS-Studio warning: V614 [CWE-457, CERT-EXP53-CPP] Potentially uninitialized variable 'PointerEdgeKind' used. EHFrameSupport.cpp 704
The PointerEdgeKind variable may remain uninitialized after executing the switch block. However, if the variable has not been initialized, then it is expected to be equal to the named Edge::invalid constant.
You should initialize it with this constant immediately when declaring a variable:
Edge::Kind PointerEdgeKind = Edge::Invalid;
Another such error: V614 [CWE-457, CERT-EXP53-CPP] Potentially uninitialized variable 'RESULT' used. llvm-rtdyld.cpp 998
At the beginning, consider the auxiliary report_fatal_error function:
void llvm::report_fatal_error(const Twine &Reason, bool GenCrashDiag) {
....
abort();
}
The important thing here is that it terminates the program by calling the abort function. That is, report_fatal_error is the noreturn function.
There is also an intermediate function, the call of which we discuss further:
void llvm::report_fatal_error(const char *Reason, bool GenCrashDiag) {
report_fatal_error(Twine(Reason), GenCrashDiag);
}
Note. The GenCrashDiag argument is optional:
__declspec(noreturn) void report_fatal_error(const char *reason,
bool gen_crash_diag = true);
By the way, it just struck me – we could not consider the body of the function. The annotation of the __declspec(noreturn) function states that it does not return control. But I decided to leave it as it is to explain the situation as detailed as possible.
Let's get to the point. Take a look at this code snippet:
int AMDGPUCFGStructurizer::improveSimpleJumpintoIf(....)
{
....
if (LandBlkHasOtherPred) {
report_fatal_error("Extra register needed to handle CFG");
Register CmpResReg =
HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
report_fatal_error("Extra compare instruction needed to handle CFG");
insertCondBranchBefore(LandBlk, I, R600::IF_PREDICATE_SET,
CmpResReg, DebugLoc());
}
....
}
The PVS-Studio warning: V779 [CWE-561, CERT-MSC12-C] Unreachable code detected. It is possible that an error is present. AMDILCFGStructurizer.cpp 1286
Note that after calling the report_fatal_error function, the program is still trying to do something. All these actions don't make sense anymore.
I guess the code author did not plan to terminate the program but wanted to report an error. Perhaps a programmer must use some other function to log information about the issue.
uint64_t WinCOFFObjectWriter::writeObject(MCAssembler &Asm,
const MCAsmLayout &Layout) {
....
if (EmitAddrsigSection) {
auto Frag = new MCDataFragment(AddrsigSection);
Frag->setLayoutOrder(0);
raw_svector_ostream OS(Frag->getContents());
for (const MCSymbol *S : AddrsigSyms) {
if (!S->isTemporary()) {
encodeULEB128(S->getIndex(), OS);
continue;
}
MCSection *TargetSection = &S->getSection();
assert(SectionMap.find(TargetSection) != SectionMap.end() &&
"Section must already have been defined in "
"executePostLayoutBinding!");
encodeULEB128(SectionMap[TargetSection]->Symbol->getIndex(), OS);
}
}
....
}
The PVS-Studio warning: V773 [CWE-401, CERT-MEM31-C, CERT-MEM51-CPP] Visibility scope of the 'Frag' pointer was exited without releasing the memory. A memory leak is possible. WinCOFFObjectWriter.cpp 1116
Perhaps, I'm wrong, and it's not an error. But I don't understand where and how the object referenced by the Frag pointer can be deleted. I agree with the analyzer: it looks like a memory leak.
A similar case: V773 [CWE-401, CERT-MEM31-C, CERT-MEM51-CPP] Visibility scope of the 'Frag' pointer was exited without releasing the memory. A memory leak is possible. WinCOFFObjectWriter.cpp 1130
In this section, you can view code fragments that attracted my attention. However, I can't call them bugs. It resembles redundant and unsuccessful code. Now I'm going to explain it to you.
static uint16_t toSecMapFlags(uint32_t Flags) {
uint16_t Ret = 0;
if (Flags & COFF::IMAGE_SCN_MEM_READ)
Ret |= static_cast<uint16_t>(OMFSegDescFlags::Read);
if (Flags & COFF::IMAGE_SCN_MEM_WRITE)
Ret |= static_cast<uint16_t>(OMFSegDescFlags::Write);
if (Flags & COFF::IMAGE_SCN_MEM_EXECUTE)
Ret |= static_cast<uint16_t>(OMFSegDescFlags::Execute);
if (Flags & COFF::IMAGE_SCN_MEM_EXECUTE)
Ret |= static_cast<uint16_t>(OMFSegDescFlags::Execute);
if (!(Flags & COFF::IMAGE_SCN_MEM_16BIT))
Ret |= static_cast<uint16_t>(OMFSegDescFlags::AddressIs32Bit);
....
}
The PVS-Studio warning: V581 [CWE-670] The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 335, 337. DbiStreamBuilder.cpp 337
This fragment is repeated twice:
if (Flags & COFF::IMAGE_SCN_MEM_EXECUTE)
Ret |= static_cast<uint16_t>(OMFSegDescFlags::Execute);
I think that this is a random redundant code, and it's better to delete it. However, this may be a real error if a programmer intended to run other checks and perform some other actions in the second block.
std::string pathname_;
....
void FilePath::Normalize() {
if (pathname_.c_str() == nullptr) {
pathname_ = "";
return;
}
....
}
The PVS-Studio warning: V547 [CWE-570] Expression 'pathname_.c_str() == nullptr' is always false. gtest-filepath.cc 349
If we delete the function implementation, nothing will change. It does nothing. It looks like an artifact of several consecutive refactorings.
raw_ostream &raw_ostream::write_escaped(StringRef Str,
bool UseHexEscapes) {
....
*this << hexdigit((c >> 4 & 0xF));
*this << hexdigit((c >> 0) & 0xF);
....
}
The PVS-Studio warning: V592 The expression was enclosed by parentheses twice: '((c >> 4 & 0xF))'.V592 The expression was enclosed by parentheses twice: '((c >> 4 & 0xF))'. One pair of parentheses is unnecessary or misprint is present. raw_ostream.cpp 188
The first line has double parentheses. This redundancy indicates that a programmer wanted to write the expression in a different way. Indeed, the next line demonstrates the way they wanted to write it. Parentheses were used to make it easier to read the expression.
Programmers wanted to write the following code:
*this << hexdigit((c >> 4) & 0xF);
*this << hexdigit((c >> 0) & 0xF);
Although the parenthesis is in the wrong place, it is not an error. Anyway, the shift precedence (>>) is higher than the binary AND (&). Everything is calculated correctly.
template <class ELFT>
void ELFState<ELFT>::writeSectionContent(
Elf_Shdr &SHeader, const ELFYAML::StackSizesSection &Section,
ContiguousBlobAccumulator &CBA) {
if (!Section.Entries)
return;
if (!Section.Entries)
return;
....
}
The PVS-Studio warning: V581 [CWE-670] The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 1380, 1383. ELFEmitter.cpp 1383
It looks like unsuccessful merge of two code branches, which caused duplicate lines. It's not an error, but it's worth removing the duplicate.
Here are more similar fragments with code duplicates:
PVS-Studio is still a worthy solution for developers. It has produced and continues to produce deeper and more diverse code analysis compared to compilers and free tools.
Since PVS-Studio is able to find errors even in such well-tested applications as compilers, it makes sense to see what it can find in your projects :). I suggest to try the trial version of the analyzer right away. Thank you for your attention.
0