Webinar: Evaluation - 05.12
Examining the project code with the help of a static analyzer, sometimes we wonder how the error appeared and why no one noticed it. Would you like to see such an example? If so, welcome to read the article!
You're reading a classic article where I check an open-source project with the help of the PVS-Studio static analyzer. We regularly post such articles to popularize the static code analysis methodology. Moreover, we strive to improve the quality of checked projects since eventually they contain less bugs.
But I should warn you: this is an unnatural way of using the code analyzer! The analyzer is useful when used regularly and helps detect errors at the early stage. Fixing an error during the development is much cheaper than fixing it at the testing stage or when the user reports about it. This is described in detail in the article: "Introduce static analysis in the process, don't just search for bugs with it".
When writing an article about project check, the authors have another task. They should look through the list of warnings and select some fascinating bugs that show the benefits of static analysis. Such type of articles is useful for learning purposes but have nothing to do with enhancing the development process.
To make it clear, I'd make an analogy between analyzers and compilers.
The analyzer and the compiler warnings are basically the same thing. However, specialized analyzers perform more code checks and use different technologies to detect complex error patterns.
It seems unwise to develop a project with compiler warnings disabled. If you enable warnings every couple month only, this won't be effective:
The same thing applies to warnings of static code analyzers. It's better not to run such tools from time to time but embed them in the development process.
I guess our regular readers are surprised by so much theory — sorry for that. A new generation of programmers grows up, and they are ready to step on the same rake. Therefore, it's useful to retell such stories over and over again.
FreeCAD is a general-purpose parametric 3D computer-aided design (CAD) modeler. This open-source software application is based on C, C++, and Python. PVS-Studio doesn't support Python at the moment, so I will only check the C and C++ parts. Here is the link to the project code.
It's not the first time we've written an article about the FreeCAD project. We wrote the previous one in 2015. It's been 8 years — quite a long time ago. Seems like a lot might have changed in the project by now, and there's no need to read the previous article, but just in case, here it is.
Let's take a look at the very special bug I mentioned earlier.
QGVPage* QGIView::getQGVPage(TechDraw::DrawView* dView)
{
ViewProviderPage* vpp = getViewProviderPage(dView);
if (!vpp) {
return vpp->getQGVPage();
}
return nullptr;
}
The PVS-Studio warning: V522 [CWE-476, CERT-EXP34-C] Dereferencing of the null pointer 'vpp' might take place. QGIView.cpp 592
The author of the code made a typo while writing conditions:
It's plain and clear. However, I've found this case intriguing for several reasons.
The first reason for meditation. In my previous article, I described an almost identical error! Here is this article: "Simple, yet easy-to-miss errors in code". And here is the code:
if (!SomeGlobalPointer)
{
delete[] SomeGlobalPointer;
}
Well, we've just wondered how developers make such errors, and here we go again. Starting to read the bug report for FreeCAD, I suddenly stumble across an almost identical bug. Guess I catch some kind of vibe :).
P.S. Actually, there's nothing unusual here. What impressed me is that this case shows the Baader–Meinhof phenomenon at work. It occurs when something you know or just learned suddenly seems to appear everywhere.
The second reason for meditation. I wonder how such an error even appeared! The thing is, there's an identical function nearby. And it contains a correct condition! See it for yourself:
QGVPage* QGIView::getQGVPage(TechDraw::DrawView* dView)
{
ViewProviderPage* vpp = getViewProviderPage(dView);
if (!vpp) {
return vpp->getQGVPage();
}
return nullptr;
}
QGSPage* QGIView::getQGSPage(TechDraw::DrawView* dView)
{
ViewProviderPage* vpp = getViewProviderPage(dView);
if (vpp) {
return vpp->getQGSPage();
}
return nullptr;
}
I don't know how that happened.
It's very likely that a developer copy-pastes the second function. But something doesn't add up... It's strange that a developer has fixed the error in the second function, but left it in the first one. I don't believe in that scenario.
After thinking about it, I came up with the following hypothesis. Maybe both functions had an error. Then, someone noticed that the getQGSPage function wasn't working correctly, and then fixed it.
However, this idea was not proven. First, the change history didn't contain any place where both functions were incorrect. The incorrect and correct function appear in the commit at the same time. A developer refactored the code to reduce the number of duplicate code.
Second, these functions are often used equally and occasionally combined. It's strange to notice a bug in the behavior of one function and not notice it in another.
Mystery!
The third reason for meditation. We don't know how an error like this will manifest itself. If the pointer is null, dereferencing it will cause undefined behavior. The condition helps the compiler in recognizing that the pointer is null. I wonder what it will do and what code it will generate.
I know it makes no sense to predict how the compiler will behave and how undefined behavior will manifest itself. Any assumptions will be wrong. By the way, here's a good article on this topic: "Falsehoods programmers believe about undefined behavior".
Nevertheless, I wondered if I could write similar code that would run as correct code due to undefined behavior. Or at least the code that won't crash. This could partly explain why the bug hasn't been noticed yet.
To be honest, I wasn't very good at it. The code was very unstable. The slightest change to the code greatly affected its behavior.
However, it may also happen that the code won't crush. Here's an example of such code.
If the code is compiled in unoptimized mode (-O0), it will be terminated:
Program terminated with signal: SIGSEGV
If we build it with optimization (-O2), it outputs garbage:
1033569176
But it's not crashing. Anyway, I enjoyed experimenting with it. If you wish, you can continue to experiment and share your observations.
A classic typo: calling the hasText function for the second time
void SheetTableView::contextMenuEvent(QContextMenuEvent*)
{
const QMimeData* mimeData = QApplication::clipboard()->mimeData();
....
actionPaste->setEnabled(
mimeData && (mimeData->hasText() || mimeData->hasText()));
....
}
The PVS-Studio warning: V501 [CWE-570] There are identical sub-expressions 'mimeData->hasText()' to the left and to the right of the '||' operator. SheetTableView.cpp 1115
The code double-checks that the object can return plain text. I can't say for sure how to write the code correctly. I would guess that the hasHtml function should be called instead of one of the hasText functions.
A typo in the last line
void Document::handleChildren3D(ViewProvider* viewProvider, bool deleting)
{
....
std::vector<App::DocumentObject*> children = viewProvider->claimChildren3D();
SoGroup* childGroup = viewProvider->getChildRoot();
SoGroup* frontGroup = viewProvider->getFrontRoot();
SoGroup* backGroup = viewProvider->getFrontRoot();
if (deleting ||
childGroup->getNumChildren() != static_cast<int>(children.size())) {
....
}
The analyzer indicates an anomaly in the code with two warnings at the same time:
I think we are facing a demonstration of the last line effect: the error is most likely to be made at the end of code lines/blocks of the same type.
The code contains three lines that are very similar to each other. Most likely, a developer copy-pasted this code fragment and accidentally duplicated the second line:
SoGroup* frontGroup = viewProvider->getFrontRoot();
The developers need to replace "front" with "back" in the copied line. There should have been two such replacements in the line, but only one was made.
SoGroup* backGroup = viewProvider->getFrontRoot(); // incorrect
SoGroup* backGroup = viewProvider->getBackRoot(); // correct
These typos are hard to find during the code reviews, and static analysis can become a very good assistant.
Freddy's time
What's Freddy got to do with it? This is a reference to the article: "Zero, one, two, Freddy's coming for you". Typos are attracted to code when variable names contain 0, 1, or 2. That's exactly such a case.
TopoDS_Edge GeometryUtils::asCircle(TopoDS_Edge occEdge, bool& arc) { .... }
bool GeometryMatcher::compareBSplines(TopoDS_Edge &edge1, TopoDS_Edge &edge2)
{
....
BRepAdaptor_Curve adapt1(edge1);
BRepAdaptor_Curve adapt2(edge2);
bool isArc1(false);
bool isArc2(false);
TopoDS_Edge circleEdge1;
TopoDS_Edge circleEdge2;
try {
circleEdge1 = GeometryUtils::asCircle(edge1, isArc1);
circleEdge2 = GeometryUtils::asCircle(edge2, isArc1);
}
catch (Base::RuntimeError&) {
Base::Console().Error(....);
return false;
}
if (!isArc1 && !isArc2) {
return compareCircles(circleEdge1, circleEdge2);
}
if (isArc1 && isArc2) {
return compareCircleArcs(circleEdge1, circleEdge2);
}
return false;
}
The PVS-Studio warning: V537 [CWE-682] Consider reviewing the correctness of 'isArc1' item's usage. GeometryMatcher.cpp 242
It's a large piece of code. But that's OK, we'll discuss it bit by bit. Let's start with the asCircle function.
TopoDS_Edge GeometryUtils::asCircle(TopoDS_Edge occEdge, bool& arc) { .... }
Note that the function receives the second argument by reference to return the edge type information through it.
Now look at the following code:
bool isArc1(false);
bool isArc2(false);
....
circleEdge1 = GeometryUtils::asCircle(edge1, isArc1);
circleEdge2 = GeometryUtils::asCircle(edge2, isArc1);
A call to the asCircle function should have initialized 4 variables:
A typo causes reusing of the isArc1 variable in the second line. As a result, the isArc1 variable is initialized with the wrong value, and the isArc2 variable remains always equal to false.
So, further checks are pointless (they don't work as intended):
if (!isArc1 && !isArc2) {
....
if (isArc1 && isArc2) {
Test your attention
Try to find a bug in this code yourself. Just for fun.
void QGIViewPart::highlightMoved(QGIHighlight* highlight, QPointF newPos)
{
std::string highlightName = highlight->getFeatureName();
App::Document* doc = getViewObject()->getDocument();
App::DocumentObject* docObj = doc->getObject(highlightName.c_str());
auto detail = dynamic_cast<DrawViewDetail*>(docObj);
auto oldAnchor = detail->AnchorPoint.getValue();
if (detail) {
Base::Vector3d delta = Rez::appX(DrawUtil::toVector3d(newPos)) /
getViewObject()->getScale();
delta = DrawUtil::invertY(delta);
detail->AnchorPoint.setValue(oldAnchor + delta);
}
}
I put a picture here, so you don't look at the answer right away. Our unicorn is waiting for your answer.
Indeed, the error isn't very complicated, and I think you've found it. However, such a search is quite boring. That's why such "monolith-like" code is hard to check during code reviews. The point is that the detail pointer is dereferenced until it's checked.
auto oldAnchor = detail->AnchorPoint.getValue();
if (detail) {
Sometimes the analyzer reports some issues in two ways. This is just such a case:
If you want to fix the error, just move the declaration and initialization of the oldAnchor variable inside the condition. Here is the fixed code:
if (detail) {
auto oldAnchor = detail->AnchorPoint.getValue();
Base::Vector3d delta = Rez::appX(DrawUtil::toVector3d(newPos)) /
getViewObject()->getScale();
delta = DrawUtil::invertY(delta);
detail->AnchorPoint.setValue(oldAnchor + delta);
}
Dangerous constructor
TaskTransformedParameters::TaskTransformedParameters(
ViewProviderTransformed *TransformedView, QWidget *parent)
: TaskBox(...., TransformedView->menuName, true, parent) // <=
, proxy(nullptr)
, TransformedView(TransformedView)
, parentTask(nullptr)
, insideMultiTransform(false)
, blockUpdate(false)
{
selectionMode = none;
if (TransformedView) { // <=
Gui::Document* doc = TransformedView->getDocument();
this->attachDocument(doc);
}
....
}
The PVS-Studio warning: V664 [CWE-476, CERT-EXP34-C] The 'TransformedView' pointer is being dereferenced on the initialization list before it is verified against null inside the body of the constructor function. Check lines: 57, 66. TaskTransformedParameters.cpp 57
A TransformedView pointer is passed to the constructor. It's dereferenced when constructing the base class:
: TaskBox(...., TransformedView->menuName, ....)
This is dangerous because this pointer may be null. Of course, the analyzer doesn't react to every dereference of an unchecked pointer as dangerous. If it does, there will be a lot of false positives (explanation). The analyzer infers that the pointer is null, because it finds this check in the construct body:
if (TransformedView)
The wrong pointer is checked
void SectionCut::startCutting(bool isInitial)
{
....
App::PropertyLinkList* CutLinkList =
dynamic_cast<App::PropertyLinkList*>(
CutCompoundBF->getPropertyByName("Objects"));
if (!CutCompoundBF) {
Base::Console().Error((std::string("SectionCut error: ")
+ std::string(CompoundName)
+ std::string(" could not be added\n")).c_str());
return;
}
CutLinkList->setValue(ObjectsListLinks);
....
}
The analyzer again issues two different warnings to notify developers about anomalies:
The first warning is implicit. It points to these lines of code:
.... CutCompoundBF->getPropertyByName("Objects"));
if (!CutCompoundBF) {
It's strange to dereference a pointer at the beginning and then check it for equality with nullptr.
The second warning immediately shows us that we might check another variable instead of the CutCompoundBF variable.
Indeed, the CutLinkList variable should be checked in the condition. Then the anomalies will disappear. Here is the fixed code:
App::PropertyLinkList* CutLinkList =
dynamic_cast<App::PropertyLinkList*>(
CutCompoundBF->getPropertyByName("Objects"));
if (!CutLinkList) {
Base::Console().Error(....);
return;
}
CutLinkList->setValue(ObjectsListLinks);
To optimize the program performance, it would be better to use profilers. Static analyzers can't predict which code will waste the most CPU time. It depends on the input data, and static analyzers don't know about that.
Nevertheless, we can micro-optimize something in the code using a static analyzer. However, there is no guarantee that these optimizations will provide higher performance of the program. At least it won't get any worse either.
Here are some examples of what warning is in terms of micro-optimizations.
For example, the FreeCAD project has a lot of redundant string conversions. Here's one of them:
void TaskAttacher::onRefName(const QString& text, unsigned idx)
{
std::string subElement;
....
std::vector<std::string> refnames = pcAttach->Support.getSubValues();
....
refnames[idx] = subElement.c_str();
....
}
The PVS-Studio warning: V811 Decreased performance. Excessive type casting: string -> char * -> string. Consider inspecting the 'subElement.c_str()' expression. TaskAttacher.cpp 672
It's inefficient to copy a std::string from a C string (null-terminated string). We have to reread the C string to find out its length and allocate the necessary memory buffer. Although the subElement contains the length information.
In general, you can write shorter code that will operate faster:
refnames[idx] = subElement;
There are still many redundant variables in the project. They are created and initialized, but are not used anywhere. This is most likely a side effect of multiple iterations of code changes and refactoring. Suddenly, the variable has become needless for the developers, but they forget to remove it from the function.
Here is the example:
void TaskFilletParameters::apply()
{
std::string name = getDressUpView()->getObject()->getNameInDocument();
ui->filletRadius->apply();
if(ui->listWidgetReferences->count() == 0)
Base::Console().Warning(
tr("Empty fillet created !\n").toStdString().c_str());
}
The PVS-Studio warning: V808 'name' object of 'basic_string' type was created but was not utilized. TaskFilletParameters.cpp 186
The name variable is obviously redundant. However, there is no guarantee that the compiler will delete it at the optimization stage. Strings are complex classes that allocate memory. Memory allocation is a side effect that changes the external state. The compiler has difficulties to understand whether all this complex construct is redundant or not.
Starting with C++14, the compiler can perform the optimization and delete a variable. However, this is an optional optimization. New compilers perform it, but we should keep in mind that the code complexity plays an important role.
Anyway, it makes no sense to clutter the program code with unnecessary strings.
Here is another example:
void DlgRevertToBackupConfigImp::showEvent(QShowEvent* event)
{
....
for (const auto& backup : backups) {
auto filename = backup.filename().string();
auto modification_date =
QDateTime::fromSecsSinceEpoch(fs::last_write_time(backup));
auto item = new QListWidgetItem(QLocale().toString(modification_date));
item->setData(Qt::UserRole, QString::fromStdString(backup.string()));
ui->listWidget->addItem(item);
}
....
}
The PVS-Studio warning: V808 'filename' object of 'basic_string' type was created but was not utilized. DlgRevertToBackupConfigImp.cpp 79
Here, the filename variable is needlessly copied many times and destroyed in the loop.
P.S. By the way, sometimes when we're reviewing code with such "lost" variables, we suddenly find out that the code contains an error and the variable should have actually been used. I can't say it's a common situation, but it does happen.
Indeed, I didn't describe all the errors and dangerous code snippets in this article. I was glancing through the warnings until I had enough content for an article. If the project developers or someone else wish to thoroughly check a project, they can run the analysis themselves. We provide a free license for the most of open-source projects.
As I said at the beginning of the article, it's better to integrate analysis into the development process rather than do one-off checks. When you first start using the analyzer, there might be many warnings and it can be daunting. There's really nothing to worry about. The solution already exists. I invite you to read the note "How to introduce a static code analyzer in a legacy project and not to discourage the team".
So, let's break down a few more warnings.
Alignment: walking on thin ice
template <int N>
void TRational<N>::ConvertTo (double& rdValue) const
{
assert(m_kDenom != 0);
if (m_kNumer == 0)
{
rdValue = 0.0;
return;
}
unsigned int auiResult[2];
....
rdValue = *(double*)auiResult;
....
}
The PVS-Studio warning: V1032 [CWE-843, CERT-EXP36-C] The pointer 'auiResult' is cast to a more strictly aligned pointer type. Wm4TRational.inl 742
An array of the two elements of the unsigned int type is used as a variable of the double type. The unsigned int type is aligned to a 4-byte boundary and the double type is aligned to an 8-byte boundary. If the array is not aligned to an 8-byte boundary, undefined behavior will occur.
This code violates the strict aliasing rule, which causes undefined behavior. However, if we're lucky, this code may work. Since there aren't other variables in front of the array, there is a high probability that it will be 8 bytes aligned. Therefore, everything will work as the programmer expects. At least for now.
However, such delicate operability is easy to break. Declare another variable before the array, and that's enough.
template <int N>
void TRational<N>::ConvertTo (double& rdValue) const
{
assert(m_kDenom != 0);
if (m_kNumer == 0)
{
rdValue = 0.0;
return;
}
int fooooo; // Variable is added
unsigned int auiResult[2]; // Alignment is not multiple of 8 bytes
....
rdValue = *(double*)auiResult;
....
}
If you are interested in the topic, read this scary and educational bedtime story: "A bug story: data alignment on x86".
Okay, how to make it right?
Before C++20, you could use memcpy. Don't be afraid: the compiler perfectly optimizes the code. Starting with C++20, use std::bit_cast. If you wish to find something useful, you can subscribe to our newsletter.
Unsafe array index check
First, let's look at the getIndexFromName auxiliary function.
int DrawUtil::getIndexFromName(const std::string& geomName)
{
....
if (boost::regex_search(begin, end, what, re, flags)) {
return int(std::stoi(what.str()));
} else {
ErrorMsg << "getIndexFromName: malformed geometry name - " << geomName;
throw Base::ValueError(ErrorMsg.str());
}
}
The function converts a string fragment into a number (index). Now let's go ahead through the code that uses this function.
TechDraw::VertexPtr DrawViewPart::getVertex(std::string vertexName) const
{
const std::vector<TechDraw::VertexPtr>
allVertex(DrawViewPart::getVertexGeometry());
size_t iTarget = DrawUtil::getIndexFromName(vertexName);
if (allVertex.empty()) {
//should not happen
throw Base::IndexError("DVP::getVertex - No vertices found.");
}
if (iTarget > allVertex.size()) { // <=
//should not happen
throw Base::IndexError("DVP::getVertex - Vertex not found.");
}
return allVertex.at(iTarget); // <=
}
The PVS-Studio warning: V557 [CWE-119, CERT-ARR30-C] Array overrun is possible. The 'iTarget' index is pointing beyond array bound. DrawViewPart.cpp 809
Technically, the index values should always be correct. However, developers protected the code just in case. Unfortunately, there is a slight error. The bound value isn't handled correctly:
if (iTarget > allVertex.size()) {
Suppose there are 100 elements in the vector. In this case, the maximum correct index should be equal to the value 99, instead of 100. This is how a correct check should look like:
if (iTarget >= allVertex.size()) {
Risk of infinite loop when reading a file
inline uint32 readUint32 ( istream &is ) {
static const int buf_len = sizeof ( uint32 ) ;
unsigned char buf [ buf_len ] ;
int rsf = 0 ;
std::streampos original_pos = is.tellg() ;
while ( rsf < buf_len && !is.eof() ) {
is.read ( reinterpret_cast< char * >( buf ) + rsf, buf_len - rsf ) ;
rsf += static_cast< int >( is.gcount () ) ;
}
....
}
Before we take a look at the error, let's turn our attention to this line:
static const int buf_len = sizeof ( uint32 ) ;
The sizeof(uint32) string is the compile-time const. It's quite possible that the developer wanted to give a constant a neat name. The developer is also worried that it would be computed every time when entered the function, so they declared it as static.
On the other hand, static may cause additional overhead costs. The constant can be stored in the data section and computed at the runtime when the function is first entered.
Much better is to use constexpr (C++11):
constexpr size_t buf_len = sizeof ( uint32 );
Then the constant will have a name, and there won't be unnecessary actions. The compiler will simply set the constant in the right places.
Now back to the loop.
while ( rsf < buf_len && !is.eof() )
The PVS-Studio warning: V663 [CWE-834] Infinite loop is possible. The 'cin.eof()' condition is insufficient to break from the loop. Consider adding the 'cin.fail()' function call to the conditional expression. zipheadio.h 84
If there is some failure in reading the file, the loop can become infinite. The end of the file hasn't been reached yet, but nothing is being read. The fail function is used to check this situation.
The way how the code is fixed depends on what behavior we want. If the reading option is enough, we can write such a condition to stop the loop:
while ( rsf < buf_len && !is.eof() && !is.fail() ) {
However, such a record isn't very nice. Instead, you can use the good function as an option:
while ( rsf < buf_len && is.good() ) {
Note that this is just one way to avoid an infinite loop if you can't read the entire file. That doesn't mean that the code is correctly fixed.
It all depends on how a program is designed and how it should react to the incapability to read the entire file. Some program may continue to handle some part of the data, another may generate an error message, and so on.
If you wish to interact with the input stream and find out that something went wrong while reading, there is a whole set of functions. You can find a good table on this topic on the cppreference website.
As you can see, there are various ways of how to implement reading from a stream and how to handle the reading errors.
For the code under consideration, reading only a part of the data doesn't look like a correct scenario. Therefore, here's how the code can be changed:
while ( rsf < buf_len
&& is.read(reinterpret_cast< char * >( buf ) + rsf, buf_len - rsf))
{
rsf += static_cast< int >( is.gcount () ) ;
}
if (is.fail())
throw Error("Unable to read file.");
This time I decided not to write a special conclusion. Here's one thing I'd say — static analysis is useful. That's all :).
So, let's get to the point. If you haven't evaluated PVS-Studio yet, I suggest you download and install it now. Try to check your code or the code of your colleagues. You may find many interesting things.
Lifehack. Try the "10 most interesting warnings" mode to get acquainted with the analyzer.
0