Webinar: Evaluation - 05.12
We often check large projects because it's easier to find bugs there. What if we try PVS-Studio on a small project? In this article we analyze Blend2D — a library for vector 2D graphics. Let's look at what we found.
It's no secret that large projects have fascinating errors. It's not just "the larger the codebase is – the more errors we can find". It's also a known fact that the density of errors grows along with the codebase. That's why we love checking large projects — to treat you with a variety of "yummy" and tricky errors and typos. Besides, it's always interesting to search through a huge project with lots of dependencies, legacy code, and other stuff.
Today I'm moving away from this tradition. I decided to take a small project and see what PVS-Studio can find there. I chose Blend2D — branch master, commit c484790.
Blend2D is a 2D vector graphics engine. This small library written in C++ contains about 70,000 lines of code:
---------------------------------------------------------------------
Language files blank comment code
---------------------------------------------------------------------
C++ 97 12924 9481 43372
C/C++ Header 137 8305 12971 25225
This library allows you to create 2D images. To achieve high performance, the library developers used multithreaded rendering and a self-written rasterizer. Blend2D provides C and C++ API. You can read more about the project and capabilities of this library on the website. Now let's proceed to the errors that PVS-Studio found in the Blend2D source code.
V547 Expression 'h == 0' is always false. jpegcodec.cpp 252
BLResult blJpegDecoderImplProcessMarker(....) noexcept {
uint32_t h = blMemReadU16uBE(p + 1);
// ....
if (h == 0)
return blTraceError(BL_ERROR_JPEG_UNSUPPORTED_FEATURE);
// ....
impl->delayedHeight = (h == 0); // <=
// ....
}
In this code fragment, the result of the blMemReadU16uBE function call is assigned to the h variable. Then if the h == 0 check is true, we exit from the function's body. During initialization impl->delayedHeight, the h variable has non-zero value. Thus, impl->delayedHeight is false.
V557 [CERT-ARR30-C] Array overrun is possible. The '3' index is pointing beyond array bound. geometry_p.h 552
static BL_INLINE bool blIsCubicFlat(const BLPoint p[3], double f) {
if (p[3] == p[0]) {
// ....
}
// ....
}
In the signature of the blIsCubicFlat function, the p variable is declared as an array of 3 elements. Then, p[3] is calculated in the body of the blMemReadU16uBE function.
Declaring the const BLPoint p[3] argument in the function's signature equals declaring const BLPoint *p. The specified size is a hint to the developer. The compiler doesn't use the size in any way. Thus, array index out of bounds happens only if we pass an array of 3 or fewer elements to the function. If blIsCubicFlat receives an array of 4 elements or more, there is no array index out of bounds and the code works in a defined way. I looked at the blIsCubicFlat function call and realized that the array of 4 elements is passed to this function. This means that there's a mistake in the function's signature — a typo in the value of the array size.
V792 The '_isTagged' function located to the right of the operator '&' will be called regardless of the value of the left operand. Perhaps, it is better to use '&&'. style.h 209
BL_NODISCARD BL_INLINE bool isObject() const noexcept
{
return (data.type > BL_STYLE_TYPE_SOLID) & _isTagged();
}
Here the analyzer suggests using the && logical operator instead of bitwise &. The thing is, when we use bitwise &, both of its arguments are calculated regardless of what values are obtained. For example, if (data.type > BL_STYLE_TYPE_SOLID) is false, bitwise & returns 0 for any value of the second argument. However, the _isTagged function is called anyway.
If (data.type > BL_STYLE_TYPE_SOLID) is false, then the result of the && logical operator is also 0, regardless of the second argument. Here the _isTagged function is not called.
The only question is, do we want to call the _isTagged function always or only when it is necessary to calculate the result? This function may have some side effects, which we may want to use regardless of the calculation. To answer this question, I looked at the _isTagged function code:
BL_NODISCARD BL_INLINE bool _isTagged(uint32_t styleType) const noexcept {
As you see from the function's signature, _isTagged has the const modifier. This means that the function has no side effects.
Thus, using logical && instead of bitwise & in this code fragment allows us to avoid an unnecessary function call and reduces the program's execution time.
V595 [CERT-EXP12-C] The '_threadPool' pointer was utilized before it was verified against nullptr. Check lines: 158, 164. rasterworkermanager.cpp 158
class BLRasterWorkerManager {
public:
BLThreadPool* _threadPool;
uint32_t _workerCount;
// ....
}
// ....
void BLRasterWorkerManager::reset() noexcept {
// ....
if (_workerCount) {
// ....
_threadPool->releaseThreads(_workerThreads, _workerCount);
_workerCount = 0;
// ....
}
if (_threadPool) {
_threadPool->release();
_threadPool = nullptr;
}
// ....
}
The _threadPool pointer is dereferenced and then it's checked for nullptr. The question is: is it an error or just a redundant check? Let's try to figure it out.
When I examined the code, I realized the check was indeed redundant. We can simplify the code a bit. The following invariant is executed for the BLRasterWorkerManage class: the _threadPool pointer is null only when the _workerCount field equals 0.
Besides the reset method, fields workerCount and _threadPool are modified in two places: in the constructor and in the init method. Let's start with the constructor:
BL_INLINE BLRasterWorkerManager() noexcept
: // ....
_threadPool(nullptr),
// ....
_workerCount(0),
// ....
{}
Everything is easy here: we assign 0 to the _workerCount field, and nullptr to the _threadPool pointer. Invariant is obviously executed.
Not so easy with the init method:
BLResult BLRasterWorkerManager::init(....) noexcept {
// ....
uint32_t workerCount = threadCount - 1;
// ....
if (workerCount) {
// ....
BLThreadPool* threadPool = nullptr;
if (initFlags & BL_CONTEXT_CREATE_FLAG_ISOLATED_THREAD_POOL) {
threadPool = blThreadPoolCreate();
if (!threadPool)
return blTraceError(BL_ERROR_OUT_OF_MEMORY);
}
else {
threadPool = blThreadPoolGlobal();
}
// ....
uint32_t n = threadPool->acquireThreads(workerThreads,
workerCount, acquireThreadFlags, &reason);
// ....
if (!n) {
threadPool->release();
threadPool = nullptr;
// ....
}
// ....
_threadPool = threadPool;
// ....
_workerCount = n;
}
else {
// ....
}
}
First, we calculate the value of the workerCount local variable. Don't confuse it with the _workerCount field! If the variable's value is 0, then the else branch is executed. In this branch, both fields remain unchanged. So, we'll look only at the case where workerCount is not equal to 0 and the then branch is executed. In this case, first, the threadPool pointer (not _threadPool!) becomes equal to 0. Then, depending on a condition, this pointer is initialized by the result of calling either blThreadPoolCreate or blThreadPoolGlobal. If it's the blThreadPoolCreate function and it returns nullptr, then the no-return blTraceError function is called. We are not interested in the further execution. The blThreadPoolGlobal function looks like this:
static BLWrap<BLInternalThreadPool> blGlobalThreadPool;
BLThreadPool* blThreadPoolGlobal() noexcept { return &blGlobalThreadPool; }
This means that the blThreadPoolGlobal function returns a non-null pointer. Consequently, either we lose control over the code, or the threadPool pointer is not null. Let's go further:
uint32_t n = threadPool->acquireThreads(workerThreads, workerCount,
acquireThreadFlags, &reason);
Here, the value of the threads acquired is written to the n variable. The value may or may not be zero.
If n equals 0, the threadPool pointer is nulled. The _threadPool pointer is also nulled, the _workerCount field is assigned the value of the n variable — 0. As a result: _threadPool = nullptr, _workerCount = 0. In this case, the invariant is true.
Now let's assume n is not 0. In this case, the threadPool pointer remains non-null and its value is written to the _threadPool pointer. The _workerCount field is assigned non-zero value of n. As a result: _threadPool is not equal to nullptr; _workerCount is not equal to 0. In this case the invariant is also true.
So, the invariant is really true. We can use it and say that checks (_workerCount) and (_threadPool) are always both true or both false. So, we can simplify the code by combining two checks into one. Like that, for example:
void BLRasterWorkerManager::reset() noexcept {
// ....
if (_workerCount) {
assert(_threadPool);
for (uint32_t i = 0; i < _workerCount; i++)
_workDataStorage[i]->~BLRasterWorkData();
_threadPool->releaseThreads(_workerThreads, _workerCount);
_workerCount = 0;
_workerThreads = nullptr;
_workDataStorage = nullptr;
_threadPool->release();
_threadPool = nullptr;
}
// ....
}
V573 [CERT-EXP53-CPP] Uninitialized variable 'n' was used. The variable was used to initialize itself. pixelconverter.cpp 2210
static BLResult BL_CDECL bl_convert_multi_step(...., uint32_t w, ....)
{
for (uint32_t y = h; y; y--) {
uint32_t i = w;
workOpt.origin.x = baseOriginX;
dstData = dstLine;
srcData = srcLine;
while (i) {
uint32_t n = blMin(n, intermediatePixelCount);
srcToIntermediate(&ctx->first, intermediateData, 0,
srcData, srcStride, n, 1, nullptr);
intermediateToDst(&ctx->second, dstData, dstStride,
intermediateData, 0, n, 1, &workOpt);
dstData += n * dstBytesPerPixel;
srcData += n * srcBytesPerPixel;
workOpt.origin.x += int(n);
i -= n;
}
}
The following line triggered the analyzer:
uint32_t n = blMin(n, intermediatePixelCount);.
Agree, it's quite strange to declare a variable and use its uninitialized value. Looks like the developer wanted to write something like this:
uint32_t n = blMin(i, intermediatePixelCount);.
This looks better — the i variable is modified in the loop and is also used in the condition of breaking the loop.
V547 Expression 'x >= 5' is always true. pngcodec.cpp 588
static void blPngDeinterlaceBits(....) noexcept {
// ....
uint32_t x = w;
// ....
switch (n) {
case 2: {
// ....
if (x <= 4) break;
if (x >= 5) b = uint32_t(*d5++);
// ....
}
// ....
}
// ....
}
Let's assume that the value of the n variable is 2 and we go to the corresponding switch branch. If the value of the x variable is less than 5, the loop breaks. This means that check x >= 5 is always true.
It's hard to say where the error is. Maybe this check is redundant and we need to remove it. Maybe the developer intended to compare x with another value. Here's one of the possible fixes:
static void blPngDeinterlaceBits(....) noexcept {
....
uint32_t x = w;
....
switch (n) {
case 2: {
// ....
if (x <= 4) break;
b = uint32_t(*d5++);
// ....
}
// ....
}
// ....
}
V524 It is odd that the body of 'end' function is fully equivalent to the body of 'begin' function. string.h 258
class BLString : public BLStringCore
{
public:
// ....
BL_NODISCARD
BL_INLINE const char* begin() const noexcept
{
return impl->data + impl->size;
}
BL_NODISCARD
BL_INLINE const char* end() const noexcept
{
return impl->data + impl->size;
}
// ....
}
Obviously, a copy-paste error. When a developer implemented the begin method, they copied the end method and forgot to change the method's body. Corrected version:
BL_NODISCARD BL_INLINE const char* begin() const noexcept
{
return impl->data;
}
I suppose the readers have a question: "Wait, how did it happen? We usually write code from top to bottom. Why do you claim that the end method was copied and renamed into begin, and not vice versa?" This question is quite logical, so I present a small investigation of this warning.
First, the BLString has the data method. It looks like this:
BL_NODISCARD
BL_INLINE const char* data() const noexcept { return impl->data; }
And look at how many times it's used:
At the same time the begin method is not used at all:
Second, I found the following comment before the begin method:
//! Returns a pointer to the beginning of string data (iterator compatibility)
Now when we found all the evidence, let me tell you what happened.
The BLString class had the data and end methods. Everything was great. But then the Blend2D developers thought about iterator compatibility. In particular, they wanted to make the following fragment work:
BLString str;
for( auto symb : str ) { .... }
The BLString class needed to have methods begin and end. So, the developers wrote the missing begin method. It's more logical to copy the data method. It does the same thing as begin. But when developers support iterator compatibility, they don't think about the data method at all. This method has nothing to do with it. Developers think about the end method. They need it for iterator compatibility, and it's already implemented. So why not copy it? They did copy it, they forgot to change the body, and they got an error.
What does it lead to? Most likely, the begin method is not called directly, the data method is used instead. At the same time, the range-based for loop (the example above) still doesn't work. The code is compiled but does not iterate through the string.
V523 The 'then' statement is equivalent to the 'else' statement. pixelconverter.cpp 1215
template<typename PixelAccess, bool AlwaysUnaligned>
static BLResult BL_CDECL bl_convert_argb32_from_prgb_any(....)
{
for (uint32_t y = h; y != 0; y--) {
if (!AlwaysUnaligned && blIsAligned(srcData, PixelAccess::kSize))
{
for (uint32_t i = w; i != 0; i--) {
uint32_t pix = PixelAccess::fetchA(srcData);
uint32_t r = (((pix >> rShift) & rMask) * rScale) >> 16;
uint32_t g = (((pix >> gShift) & gMask) * gScale) >> 8;
uint32_t b = (((pix >> bShift) & bMask) * bScale) >> 8;
uint32_t a = (((pix >> aShift) & aMask) * aScale) >> 24;
BLPixelOps::unpremultiply_rgb_8bit(r, g, b, a);
blMemWriteU32a(dstData, (a << 24) | (r << 16) | (g << 8) | b);
dstData += 4;
srcData += PixelAccess::kSize;
}
}
else {
for (uint32_t i = w; i != 0; i--) {
uint32_t pix = PixelAccess::fetchA(srcData);
uint32_t r = (((pix >> rShift) & rMask) * rScale) >> 16;
uint32_t g = (((pix >> gShift) & gMask) * gScale) >> 8;
uint32_t b = (((pix >> bShift) & bMask) * bScale) >> 8;
uint32_t a = (((pix >> aShift) & aMask) * aScale) >> 24;
BLPixelOps::unpremultiply_rgb_8bit(r, g, b, a);
blMemWriteU32a(dstData, (a << 24) | (r << 16) | (g << 8) | b);
dstData += 4;
srcData += PixelAccess::kSize;
}
}
// ....
}
}
Another example of a copy-paste error. In this code fragment, branches else and then are completely identical. Obviously, the developer forgot to change the code on one of the branches, but I can't offer any fix here.
V1044 Loop break conditions do not depend on the number of iterations. otcmap.cpp 59
#if defined(__GNUC__)
#define BL_LIKELY(...) __builtin_expect(!!(__VA_ARGS__), 1)
#define BL_UNLIKELY(...) __builtin_expect(!!(__VA_ARGS__), 0)
#else
#define BL_LIKELY(...) (__VA_ARGS__)
#define BL_UNLIKELY(...) (__VA_ARGS__)
#endif
....
static BLResult BL_CDECL mapTextToGlyphsFormat0(....) noexcept {
// ....
uint32_t* ptr = content;
uint32_t* end = content + count;
// ....
while (ptr != end) {
uint32_t codePoint = content[0];
uint32_t glyphId = codePoint < 256
? uint32_t(glyphIdArray[codePoint].value())
: uint32_t(0);
content[0] = glyphId;
if (BL_UNLIKELY(glyphId == 0)) {
if (!undefinedCount)
state->undefinedFirst = (size_t)(ptr - content);
undefinedCount++;
}
}
// ....
}
This code fragment may cause looping. Variables ptr and end don't change within the loop. If condition ptr != end is true, we get an infinite loop. Looks like the developer forgot to add the ptr pointer increment. We can fix the code like this:
while (ptr != end) {
uint32_t codePoint = content[0];
uint32_t glyphId = codePoint < 256
? uint32_t(glyphIdArray[codePoint].value())
: uint32_t(0);
content[0] = glyphId;
if (BL_UNLIKELY(glyphId == 0)) {
if (!undefinedCount)
state->undefinedFirst = (size_t)(ptr - content);
undefinedCount++;
}
++ptr;
}
The analyzer issued another warning for this loop:
V776 Potentially infinite loop. The variable in the loop exit condition 'ptr != end' does not change its value between iterations. otcmap.cpp 59
Of course, this project doesn't have as many errors as large projects with about a million code lines. But we expected that.
However, this project has some impressive errors. What does this mean?
First, even small projects have errors. Which means, we need to find them and fix them :)
Second, a small codebase is not a guarantee that all errors will be found during code review. Sometimes developers miss an error after reading the code several times.
But static analysis tools don't miss them. A static analyzer is ready to search for errors in code at any time of the day. It doesn't need to rest. And most importantly — its all-seeing eye spies every typo in code!
If you are interested in static analysis and PVS-Studio - it's high time to try it. Just download a free version of the analyzer. Thank you for reading!
0