Webinar: C++ semantics - 06.11
Static analysis helps developers catch errors early and improve code quality and reliability. This article examines some potential vulnerabilities and errors PVS-Studio found in the STP library's source code.
The STP project's description is concise and contains many complex terms. So it's not immediately clear what the library is for. My guess is the project was intended to test bit vector or array limitations. However, I cannot come up with scenarios when developers need such tests. It's okay though - we are not here to discuss mathematical laws, we are here to check the STP project for coding errors. STP is an open-source library licensed under MIT. It is written in C++. STP is a component for tools that look for errors in source code. The library uses the CMake build system, so the project was easy to build and check for errors. The code is written in C++ 14 and contains 65 thousand lines.
Language |
Files |
Empty lines |
Comments |
Code |
---|---|---|---|---|
C++ |
68 |
4732 |
4714 |
27569 |
C |
44 |
3961 |
5855 |
25680 |
C/C++ Header |
89 |
3171 |
5031 |
8571 |
yacc |
3 |
323 |
303 |
3083 |
lex |
3 |
84 |
81 |
571 |
CMake |
15 |
74 |
323 |
319 |
Perl |
1 |
23 |
33 |
106 |
Total amount |
233 |
12469 |
16340 |
65899 |
The project is small and high-quality, so the errors were few. However, it's still important to examine and correct them.
First, let's inspect errors that cause resource leaks and program crashes.
Warning #1
c_interface.cpp:1808: V773 The function was exited without closing the file referenced by the 'cvcin' handle. A resource leak is possible.
Expr vc_parseExpr(VC vc, const char* infile)
{
extern FILE *cvcin, *smtin;
cvcin = fopen(infile, "r"); // <=
if (cvcin == NULL)
{
fprintf(stderr, "STP: Error: cannot open %s\n", infile);
stp::FatalError("Cannot open file");
return 0;
}
CONSTANTBV::ErrCode c = CONSTANTBV::BitVector_Boot();
if (0 != c)
{
cout << CONSTANTBV::BitVector_Error(c) << endl;
return 0; // <=
}
....
return output; // <=
}
In the code above, the analyzer discovered the cvcin file descriptor leak. The fopen function opens the file and then there is no fclose function call that would close the file. If cvcin == NULL, the program exits with an error: file not found. But if the code reaches the second conditional block, the cvcin descriptor is lost. The corrected code:
Expr vc_parseExpr(VC vc, const char* infile)
{
extern FILE *cvcin, *smtin;
cvcin = fopen(infile, "r");
if (cvcin == NULL)
{
....
stp::FatalError("Cannot open file");
return 0;
}
CONSTANTBV::ErrCode c = CONSTANTBV::BitVector_Boot();
if (0 != c)
{
cout << CONSTANTBV::BitVector_Error(c) << endl;
fclose(cvcin); // <=
return 0;
}
....
if (b->UserFlags.smtlib1_parser_flag)
{
smtin = cvcin;
cvcin = NULL; // <=
....
}
....
if(smtin != NULL)
fclose(smtin); // <=
else
fclose(cvcin); // <=
return output;
}
This solution is not ideal. If an exception is thrown between the fopen and fclose calls - or if one introduces another exit point into the function - the fclose method will not be called. To solve this problem, you can use the RAII (Resource Acquisition Is Initialization) idiom. C++ implements this through the use of destructors. Alternatively, you can use unique_ptr:
template<typename T>
using DeletedPtr = std::unique_ptr<T, std::function<void(T*)>>;
Expr vc_parseExpr(VC vc, const char* infile)
{
DeletedPtr<FILE> cvcin(fopen(infile, "r"),
[](FILE* f)
{
fclose(f);
});
....
if (!cvcin)
{
....
stp::FatalError("Cannot open file");
return 0;
}
....
}
Warning #2
MutableASTNode.h:269: V505 The 'alloca' function is used inside the loop. This can quickly overflow stack.
The analyzer detected that the for-loop calls the alloca function. Since the alloca function uses the stack memory, the function's multiple calls from inside the loop can unexpectedly lead to stack overflow.
static void getDisjointExtractVariables(....)
{
const int size = all.size();
for (int i = size - 1; i >= 0; i--)
{
....
// TODO remove alloca
bool* found = (bool*)alloca(sizeof(bool) * node.GetValueWidth());
for (size_t j = 0; j < node.GetValueWidth(); j++)
found[j] = false;
....
}
}
The alloca function allocates a memory block on the stack. The memory is released after the for-loop exits. Even if the found variable is declared inside the loop, the memory allocated for it will not be released at the end of each iteration. Such code is not necessarily an error. This depends on the stack's size, the allocated memory's volume and the number of iterations. In this case we can see a comment that the developer intended to remove alloca - or, maybe, to replace it with dynamic allocation. You can use dynamic allocation to fix the code above, but this approach has its disadvantages.
const int size = all.size();
for (int i = size - 1; i >= 0; i--)
{
....
// TODO remove alloca
bool* found = (bool*)calloc(sizeof(bool), node.GetValueWidth());
....
free(found);
}
There are a few more warnings like this in the code:
bool changed = true;
while (changed)
{
changed = false;
signed* columnH = (signed*)alloca(sizeof(signed) * bitWidth);//(1)
signed* columnL = (signed*)alloca(sizeof(signed) * bitWidth);//(2)
signed* sumH = (signed*)alloca(sizeof(signed) * bitWidth); //(3)
signed* sumL = (signed*)alloca(sizeof(signed) * bitWidth); //(4)
....
// working with 'changed';
....
}
Warning #3
STPManager.cpp:549: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 543, 549.
The analyzer found that two if-statements follow each other and have the same condition statements. This code is redundant or incorrect.
bool STPMgr::VarSeenInTerm(const ASTNode& var, const ASTNode& term)
{
if (READ == term.GetKind() && WRITE == term[0].GetKind()
/*&& !GetRemoveWritesFlag()*/)
{
return false; // <=
}
if (READ == term.GetKind() && WRITE == term[0].GetKind()
/*&& GetRemoveWritesFlag()*/)
{
return true; // <= (unreachable statement)
}
....
}
The duplicate if-statements contain code blocks that have opposite meaning. The commented-out code inside the blocks was likely important. If one removes it, the second check becomes unnecessary. However, there's a chance the developer intended to write term[1] in the second expression:
if (READ == term.GetKind())
{
if(WRITE == term[0].GetKind())
return false;
if(WRITE == term[1].GetKind()) // <=
return true;
}
Warning #4
FixedBits.h:194: V524 It is odd that the body of 'minimum_numberOfTrailingZeroes' function is fully equivalent to the body of 'minimum_trailingOne' function.
unsigned minimum_numberOfTrailingZeroes() // <=
{
unsigned i = 0;
for (; i < getWidth(); i++)
{
if (!isFixed(i) || getValue(i))
break;
}
return i;
}
unsigned minimum_trailingOne() // <=
{
unsigned i = 0;
for (; i < getWidth(); i++)
{
if (!isFixed(i) || getValue(i))
break;
}
return i;
}
This warning means that the analyzer found two functions with identical bodies. Such code is not an error in itself, but is a reason to look closer. Since we are not the ones developing this library, we can only assume an error. Function names mean different things. If this is not an error, it makes sense to rewrite the code for clarity.
unsigned minimum_numberOfTrailingZeroes()
{
unsigned i = 0;
for (; i < getWidth(); i++)
{
if (!isFixed(i) || getValue(i))
break;
}
return i;
}
unsigned minimum_trailingOne
{
return minimum_numberOfTrailingZeroes();
}
Now it's clearer what developer meant. By rewriting the code we've also lowered the chance that someone may change only one function and cause an error.
There are more warnings like this one:
Warning #5
UnsignedIntervalAnalysis.cpp:276: V547 Expression 'bottomChanged' is always false.
UnsignedInterval* UnsignedIntervalAnalysis::visit(....)
{
....
if (bottomChanged) // might have been zero. // <=
{
if (CONSTANTBV::BitVector_Lexicompare(result->minV, c1Min) > 0)
{
CONSTANTBV::BitVector_Copy(result->minV,
c1Min); //c1 should still be 1
}
if (CONSTANTBV::BitVector_Lexicompare(result->maxV, c1Min) < 0)
{
CONSTANTBV::BitVector_Copy(result->maxV,
c1Min); //c1 should still be 1
}
}
}
The analyzer discovered that bottomChanged is always false. Maybe this is correct. However, if you inspect the code above, you might suspect that something is wrong there.
UnsignedInterval* UnsignedIntervalAnalysis::visit(....)
{
switch(n.GetCind())
{
....
case BVDIV:
{
....
bool bottomChanged = false;
if (CONSTANTBV::BitVector_is_empty(c1->minV)) // <= (1)
{
if (CONSTANTBV::BitVector_is_empty(c1->maxV))
{
....
break; // result is [1111..111, 11...11111] // <= (2)
}
bottomChanged = true; // <= (3)
CONSTANTBV::BitVector_Destroy(c1Min);
break; // TODO fix so that it can run-on.
}
....
if (bottomChanged). // <= (4)
{
.... //// <= (unreachable statement)
}
break;
}
}
}
The if (bottomChanged) expression is inside the switch statement's body. When bottomChanged is set to true (see label 2), the current execution branch exits. As a result, if the code reaches label 4, bottomChanged is always false.
The analyzer issued quite a few similar warnings:
Not all errors become evident immediately after someone made a mistake. They often lay low until someone alters the code - or the execution flow reaches some secret corner. Fixing these errors early saves a lot of time in the future.
Warning #6
This example does not contain an error. However, an error may occur if one refactors the code or changes its logic.
Dependencies.h:151: V711 It is dangerous to create a local variable within a loop with a same name as a variable controlling this loop.
The analyzer discovered a situation, where an iterator contains a loop:
void print() const
{
auto it = dependents.begin(); // <=
for (/**/; it != dependents.end(); it++)
{
cout << (it->first).GetNodeNum();
const set<ASTNode>* dep = it->second;
set<ASTNode>::iterator it = dep->begin(); // <=
while (it != dep->end())
{
cout << " " << (*it).GetNodeNum();
it++;
}
cout << endl;
}
}
If you accidentally move it++ to the end of the loop, the program will work incorrectly. A more reliable approach is to rename the internal iterator or to use the for-loop:
void print() const
{
for (const auto &depnt : dependents)
{
cout << (depnt.first).GetNodeNum();
const set<ASTNode>* dep = depnt.second;
for (const auto &inDep : dep)
{
cout << " " << inDep.GetNodeNum();
}
cout << endl;
}
}
Warning #7
AssortedPrinters.cpp:93: V688 The 'ListOfDeclaredVars' function argument possesses the same name as one of the class members, which can result in a confusion.
void STPMgr::printVarDeclsToStream(ostream& os, ASTNodeSet& ListOfDeclaredVars)
{
for (ASTNodeSet::iterator i = ListOfDeclaredVars.begin(),
iend = ListOfDeclaredVars.end();
{
....
}
}
Here's a similar warning. The ListOfDeclaredVars variable replaces a class member with the same name:
class STPMgr
{
....
// For printing purposes
// Used just by the CVC parser.
ASTVec ListOfDeclaredVars;
....
}
This code is correct, but may confuse developers who access it. This situation is better avoided and the local variable - renamed.
Below are a few code fragments where the analyzer found opportunities to improve performance or readability.
Warning #8
SimplifyingNodeFactory.cpp:1379: V560 A part of conditional expression is always true: children.size() == 2.
ASTNode SimplifyingNodeFactory::CreateTerm(....)
{
if (children.size() == 2) // <=(1)
{
if (children.size() == 2 && children[0] == children[1]) // <=(2)
{
result = bm.CreateZeroConst(width);
}
else if (children.size() == 2 && // <=(3)
children[1] == bm.CreateZeroConst(width))
{
result = children[0];
}
else
{
result = NodeFactory::CreateTerm(
BVPLUS, width, children[0],
NodeFactory::CreateTerm(BVUMINUS, width, children[1]));
}
}
}
Label 1 points to where the container size is checked. There is no need to do this again in conditions 2 and 3. The code is currently correct - but only because the 2nd and the 3d conditions are written with the AND operator. This may change in the future. Below is the fixed code:
ASTNode SimplifyingNodeFactory::CreateTerm(....)
{
if (children.size() == 2) // <= (1)
{
if (children[0] == children[1]) // <= (2)
....
else if (children[1] == bm.CreateZeroConst(width))
....
else
....
}
}
Warning #9
FixedBits.cpp:405: warning: V560 A part of conditional expression is always true: i < width.
void FixedBits::fromUnsigned(unsigned val)
{
for (unsigned i = 0; i < width; i++)
{
if (i < width && i < sizeof(unsigned) * 8) // <=
{
setFixed(i, true);
setValue(i, (val & (1 << i)));
}
else if (i < width) // <=
{
setFixed(i, true);
setValue(i, false);
}
else // The unsigned value is bigger than the bitwidth of this.
{ // so it can't be represented.
if (val & (1 << i)) // <= (unreachable statement)
{
stp::FatalError(LOCATION "Cant be represented.");
}
}
}
}
The loop counter starts at 0, counts up to - but does not reach - width. Thus, the condition i < width is always true. Here's how I fixed the code:
void FixedBits::fromUnsigned(unsigned val)
{
for (unsigned i = 0; i < width; i++)
{
setFixed(i, true);
if (i < sizeof(unsigned) * 8)
setValue(i, (val & (1 << i)));
else
setValue(i, false);
}
}
Warning #10
cpp_interface.cpp:151: V669 The 'strval' argument is a non-constant reference. The analyzer is unable to determine the position at which this argument is being modified. It is possible that the function contains an error.
ASTNode Cpp_interface::CreateBVConst(string& strval,
int base,
int bit_width)
{
return bm.CreateBVConst(strval, base, bit_width);
}
The analyzer found that the strval parameter was passed to the function by reference, but was not modified anywhere. Then take a look at the bm.CreateBVConst function. The strval parameter is passed by value:
ASTNode STPMgr::CreateBVConst(string strval,
int base,
int bit_width)
{
....
}
This may indicate an error, but most likely, the strval parameter should be a reference to a constant. Inside the STPMgr::CreateBVConst function body, strval is not modified either. This allows us to pass the string by reference and to remove the unnecessary copy:
ASTNode Cpp_interface::CreateBVConst(const string& strval,
int base,
int bit_width)
{
return bm.CreateBVConst(strval, base, bit_width);
}
ASTNode STPMgr::CreateBVConst(const string& strval,
int base,
int bit_width)
{
if (bit_width <= 0)
{
FatalError("Bit width of constant must be greater than 0");
}
assert(bit_width > 0);
return charToASTNode((unsigned char*)strval.c_str(), base,
bit_width);
}
The charToASTNode function does not modify the string either. If you were to accept the fix, you would need to work with this too.
Due to sick days, quarantines and winter holidays, I am posting this article a couple of months later than I originally intended. So it is possible that STP library's authors already fixed some of the errors I described. Either way, this article aims to demonstrate the analyzer's capabilities rather than list as many errors as possible. Remember that static code analyzers are most beneficial when used regularly. Such approach allows you to fix errors before they become critical.
The article shows that the PVS-Studio analyzer found many problematic code snippets inside the STP library's code. These potential problems may not manifest themselves in any way so far, but the fact that they exist is worrisome. They will always be there, waiting - and may inflict some damage when you least expect it. Finding errors while writing code is much better than fixing an endless stream of bugs before the release. To try the PVS-Studio static analyzer on your project, you can follow this link.
0