Webinar: Evaluation - 05.12
We recently found out that the source code of the Captain Blood game (formerly known as Age of Pirates: Captain Blood) has been opened. We couldn't pass it up so we checked the game's quality with the help of PVS-Studio. Will Captain Blood's legendary ship sink into the sea abyss because of the detected bugs? Let's find out!
Captain Blood is an action-adventure game made by 1C Company and based on the novels by Rafael Sabatini. The game follows the adventures of Captain Peter Blood, the main character of Sabatini's novel. The action takes place in New England. The protagonist of the game is an Irish doctor who has been unjustly convicted of treason against the king and exiled to the island of Barbados. Peter Blood managed to escape from the island by capturing a Spanish ship. Thus, he became the most famous pirate of that time.
The project was never released and was closed in 2010. This happened because 1C and Playlogic were unable to decide who owns the rights to the game. There were rumors that the final build of the game had been irretrievably lost. Recently, some news came out proving the opposite. A working build of the game appeared on a popular torrent tracker, and a walkthrough of the game was posted on YouTube. Later, the the game's source code was posted on GitHub under the GPLv3 license.
So, after we got learned more about the project, let's discuss the most interesting warnings that PVS-Studio found when analyzing it.
Warning N1
V1075 The function expects the file to be opened for writing, but it was opened for reading. Debugger.cpp 172
We can congratulate our teammate Sergey Larin on the first "trophy" warning of his diagnostic rule (it was released in PVS-Studio 7.15). According to this diagnostic rule, data is written into files that are opened for reading and vice versa. Honestly, we didn't even expect that this warning would ever be issued. Here is this problematic function:
void appDebuger::CopyDump(const char * src, DWORD from, const char * dst)
{
FILE * file = fopen(src, "rb");
if(file)
{
fseek(file, 0, SEEK_END);
DWORD size = ftell(file);
fseek(file, from, SEEK_SET);
if(size > from)
{
FILE * to = fopen(dst, "a+b");
if(to)
{
char buf[128];
while(from < size)
{
DWORD s = size - from;
if(s > sizeof(buf)) s = sizeof(buf);
memset(buf, ' ', sizeof(buf));
fread(buf, s, 1, file);
fwrite(buf, s, 1, file); // <=
from += s;
}
fclose(to);
}
}
fclose(file);
}
}
So, the function should read data from a file at the src path, starting at the from position, and write it to another file at the dst path. The file variable is responsible for the source file, the to variable is responsible for the resulting file.
Unfortunately, we can see reading from the file variable and writing to the file variable in the code. We can fix it this way:
fwrite(buf, s, 1, to);
Warning N2
V1065 [CWE-682] Expression can be simplified, check 'bi' and similar operands. PreviewAnimation.cpp 146
PreviewAnimation::PreviewAnimation(....)
{
// ....
const int bw = 30;
const int bh = 30;
const int bi = 3;
const int cn = 9;
bar.r.w = 7 + (bw + bi)*cn + 7 - bi + 1 + (bw + bi) + 7 - bi + 1;
// ....
}
That's one of my favorite diagnostic rules! It indicates that an expression can be simplified by removing the bi variable:
bar.r.w = 7 + (bw + bi)*cn + 7 - bi + 1 + bw + 7 + 1;
Maybe there should be another variable instead of one of the bi. Or the (bw + bi) expression was supposed to be multiplied by something. I wonder how the game animation would change...
I would like to mention that it's a bad practice to give constants such names (bw, bh, bi, cn). They can make the code confusing to the person reading it. It would be better to make the names of the variables more meaningful or leave a comment to identify them.
Warning N3
V570 [CWE-480] The 'ldMaxChars' variable is assigned to itself. StringAttr.cpp 198
void StringAttribute::LoadFromFile (....)
{
int ldMinChars, ldMaxChars;
// ....
minChars = ldMinChars;
ldMaxChars = ldMaxChars;
// ....
}
class StringAttribute : public BaseAttribute
{
int minChars;
int maxChars;
}
The usual assignment of a variable to itself takes place here. Let's look at the definition of the StringAttribute class to see what should be in place of the variable where the assignment occurs. Here we can see the similar maxChars data member. Obviously, the developer made a mistake and actually should have assigned the ldMaxChars value to the variable. Here's the fixed version of the code:
void StringAttribute::LoadFromFile (....)
{
int ldMinChars, ldMaxChars;
// ....
minChars = ldMinChars;
maxChars = ldMaxChars;
// ....
}
Warning N4
V781 [CWE-20, CERT-API00-C] The value of the 'start' index is checked after it was used. Perhaps there is a mistake in program logic. coreloader.h 136
__forceinline void GetBootParams(const char * bootIni)
{
long start = 0;
// ....
if (bootIni[start] && start > 0)
{
// ....
}
// ....
}
In this code fragment, the developer first used the start value to index over the bootIni array, and only then checked it for more than zero. This check may be pointless, then it would be better to remove it. The check can make sense, then it should be interchanged with the array subscripting:
if (start > 0 && bootIni[start])
{
// ....
}
Otherwise, the developer runs the risk of crashing the game. In fact, quite a few warnings of this type are issued for this code. Here are a few that are worth paying attention:
Warning N5
V700 [CWE-480] Consider inspecting the 'mView = mView = chr.Render().GetView()' expression. It is odd that variable is initialized through itself. CharacterItems.cpp 705
void CharacterItems::DrawFlares(float dltTime, const Matrix & toWorld)
{
// ....
Matrix mView = mView = chr.Render().GetView();
// ....
}
Here the mView variable is assigned to itself, that's pretty weird. There must have been a typo. Anyway, this redundant assignment can be removed:
void CharacterItems::DrawFlares(float dltTime, const Matrix & toWorld)
{
// ....
Matrix mView = chr.Render().GetView();
// ....
}
Warning N6
V654 [CWE-834] The condition 'n >= 0' of loop is always true. resourceselect.cpp 42
typedef unsigned int dword;
TResourceSelectorWindow::TResourceSelectorWindow ()
{
string PakPath;
// ....
miss->EditorGetPackPath(PakPath);
//....
for (dword n = PakPath.Size()-1; n >= 0; n--)
{
if (PakPath[n] == '\\')
{
PakPath.Delete(0, n+1);
break;
}
}
// ....
}
In this case, we see a potentially infinite loop, if there is no \ in the string. The n variable is of unsigned type, which means that it will always be greater than or equal to 0.
But in fact, there will be no infinite loop, because when the n variable overflows, there will be a failure. Overflowing of an unsigned variable does not lead to undefined behavior, but accessing beyond the array boundaries will. Actually, an access violation is likely to occur.
Here the developer has written an algorithm, that searches for the first occurrence of \ from the end. Then the algorithm removes a substring from the beginning to this character inclusively. The best way to fix this code is to use rbegin and rend, but there's a nuance. We are dealing with a self-written string (a std::string analog with extra functionality), that has no iterators. Then, to fix the code, we can take a signed variable twice the size (int64_t) and run a loop with it.
for (int64_t n = PakPath.Size() - 1; n >= 0; n--)
{
if (PakPath[n] == '\\')
{
PakPath.Delete(0, n+1);
break;
}
}
However, this will not work if you port the application to a 64-bit system, where the Size function returns a value of the uint64_t type. In this case, using iterators would be a good way to write correct code. For example, here's what the code in C++14 would look like if the self-written string still doesn't have iterators, but at least has functions to access the underlying buffer:
auto r_begin = std::make_reverse_iterator(PakPath.GetBuffer() + PakPath.Size());
auto r_end = std::make_reverse_iterator(PakPath.GetBuffer());
while (r_begin != r_end )
{
if (*r_begin == '\\')
{
PakPath.Delete(0, std::distance(r_begin, r_end));
break;
}
++r_begin;
}
Warning N7
V611 [CWE-762, CERT-MEM51-CPP] The memory was allocated using 'new T[]' operator but was released using the 'delete' operator. Consider inspecting this code. It's probably better to use 'delete [] ldName;'. ColorAttr.cpp 198
void ColorAttribute::LoadFromFile (....)
{
DWORD slen = 0;
// ....
char* ldName = NEW char[slen+1];
// ....
delete ldName;
}
Here the developer has cleared the dynamically allocated memory, but in the wrong way. Instead of calling the operator delete[] to release previously allocated memory by the operator new[], the developer called a simple delete. Here is the fixed version:
void ColorAttribute::LoadFromFile (....)
{
DWORD slen = 0;
// ....
char* ldName = NEW char[slen+1];
// ....
delete ldName[];
}
This error is discussed in more detail in the article written by my colleague: "Why do arrays have to be deleted via delete[] in C++".
Warning N8
V607 Ownerless expression 'isSetPause ? 1 : 0'. SoundsEngine.cpp 741
void SoundsEngine::SetPause(bool isSetPause)
{
//....
isSetPause ? 1 : 0;
//....
}
The analyzer detects dead code. I've noticed that the SoundsEngine class contains the similar data member — isPause, but it has the long type. The code might have done a conversion of the isSetPause parameter from bool to long type earlier, but then it was refactored. This line is harmless and can be removed.
Warning N9
V583 [CWE-783] The '?:' operator, regardless of its conditional expression, always returns one and the same value: 0xffff0000. NoFatality.cpp 63
void _cdecl NoFatality::Draw(float dltTime, long level)
{
//....
for (....)
{
Render().DrawBox(...., IsActive() ? 0xffff0000 : 0xffff0000);
}
//....
}
As we can see, the second and the third operands of the conditional operator are identical. So, the second argument of the DrawBox function is always the 0xffff0000 constant. It's likely that a different value should have been passed in one of the branches, otherwise the code can be simplified.
Warning N10
V523 [CWE-691] The 'then' statement is equivalent to the 'else' statement. TaskViewer.cpp 298
void TaskViewer::SetText(const char * rawText)
{
// ....
if (rawText[0] == '#')
{
// Let's replace the identifier with a real string
// then
str = rawText;
}
else
{
str = rawText;
}
// ....
}
In this code fragment, the analyzer has warned us that the if statement has identical then and else branches. So, regardless the condition, the same code fragment is executed. This may not be the behavior the developer expected.
Warning N11
V614 [CWE-457, CERT-EXP53-CPP] Uninitialized variable 'color.c' used. Color.h 1268
mathinline dword mathcall Color::GetDword() const
{
DColor color;
color.r = (byte)(r * 255.0f);
color.g = (byte)(g * 255.0f);
color.b = (byte)(b * 255.0f);
color.a = (byte)(a * 255.0f);
return color.c;
}
Let's find out what is DColor:
class DColor
{
public:
union
{
#ifndef _XBOX
struct
{
///Blue
unsigned char b;
///Green
unsigned char g;
///Red
unsigned char r;
///Transparency
unsigned char a;
};
#else
struct
{
///Transparency
unsigned char a;
///Red
unsigned char r;
///Green
unsigned char g;
///Blue
unsigned char b;
};
#endif
union
{
///Packed color
dword c;
///Packed color
dword color;
};
};
The four RGBA components are written byte-by-byte to the required data members, and then returned from the function as value of the dword type (the 4-byte unsigned type). If this code were written in the C language, it would be harmless and the developer would get the expected behavior. In C++ reading the union inactive data member results in undefined behavior. This is because C++ works with objects that should start their lifetime. Only one data member can be active at any given time. The lifetime of the inactive data member (the last union data member) has not started.
Some compilers allow such code to be written as a non-standard compiler extension, and in most cases this code works as expected. However, please note that this code is non-portable to other compilers. This code may work correctly on the current compiler and platform, but if we change a tiny little thing, it may break.
So, how do we write code in C++ that would work the same way always and everywhere? There are several ways to do it. We can keep the four RGBA components in four data members and (if necessary) get their combination as a dword using bitwise operators:
class DColor
{
public:
///Blue
uint8_t b;
///Green
uint8_t g;
///Red
uint8_t r;
///Transparency
uint8_t a;
dword getAsDword() const noexcept
{
#ifndef _XBOX
return (static_cast<uint32_t>(b) << 0)
| (static_cast<uint32_t>(g) << 8)
| (static_cast<uint32_t>(r) << 16)
| (static_cast<uint32_t>(a) << 24);
#else
return (static_cast<uint32_t>(a) << 0)
| (static_cast<uint32_t>(r) << 8)
| (static_cast<uint32_t>(g) << 16)
| (static_cast<uint32_t>(b) << 24);
#endif
}
};
There is another way. We can use a 4-byte array that can be converted to a dword with the memcpy function or std::bit_cast since C++20:
class DColor
{
private:
uint8_t m_components[4];
public:
DColor(uint8_t r, uint8_t g, uint8_t b, uint8_t a) noexcept;
DColor(dword comp) noexcept; // To construct from dword
DColor& operator=(dword) noexcept; // To be assigned to dword
uint8_t& R() noexcept;
uint8_t& G() noexcept;
uint8_t& B() noexcept;
uint8_t& A() noexcept;
dword getAsDword() const noexcept
{
#if __cplusplus >= 202002L && __has_cpp_attribute(__cpp_lib_bit_cast)
return std::bit_cast<dword>(m_components);
#else
dword result;
memcpy(&result, m_components, sizeof(m_components));
return result;
#endif
}
};
My colleague, Andrey Karpov, made a comment on this. The use of conditional compilation directives makes the code less comprehensible. The nesting is broken and then we have to guess what the code is about by going through various code fragments in our mind.
We need to write code not to make it as short as possible, but to make it easier to read and understand. I would make two different implementations of the function here:
#if __cplusplus >= 202002L && __has_cpp_attribute(__cpp_lib_bit_cast)
dword getAsDword() const noexcept
{
return std::bit_cast<dword>(m_components);
}
#else
dword getAsDword() const noexcept
{
dword result;
memcpy(&result, m_components, sizeof(m_components));
return result;
}
#endif
The code is a little longer, but it's much more readable that way.
If we are dealing with long functions, this will not work with conditional compilation fragments. In this case, we can enclose these conditional compilation fragments in separate functions, which are implemented differently in different cases. It means that we need to separate the general code from the compilation conditions.
Warning N12
V579 [CWE-687, CERT-ARR01-C] The memset function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. Debugger.cpp 282
void appDebuger::debuggerMakeThreadDump(....)
{
CONTEXT ct;
memset (&ct, 0, sizeof(&ct));
// ....
}
typedef struct DECLSPEC_NOINITALL _CONTEXT
{
DWORD ContextFlags;
// ....
DWORD SegCs; // MUST BE SANITIZED
DWORD EFlags; // MUST BE SANITIZED
DWORD Esp;
DWORD SegSs;
BYTE ExtendedRegisters[MAXIMUM_SUPPORTED_EXTENSION];
} CONTEXT;
The developer intended to initialize all structure data members with zeros. The size of the ct object is 716 bytes, and they passed the size of the pointer – 4 bytes. We can fix the memset call this way:
memset (&ct, 0, sizeof(ct));
P.S. Oddly enough, developers are still using this structure initialization pattern with the memset function, although they could write this way:
CONTEXT ct {};
Warning N13
I would like to finish listing the errors with a few amusing typos:
void CharacterLegs::Invalidate()
{
lleg.th = ani->FindBone("Bedro_left_joint" ,true);
lleg.kn = ani->FindBone("Golen_left_joint" ,true);
lleg.fo = ani->FindBone("Foolt_left_joint" ,true);
lleg.to = ani->FindBone("Foolt_left_joint_2",true);
rleg.th = ani->FindBone("Bedro_right_joint" ,true);
rleg.kn = ani->FindBone("Golen_right_joint" ,true);
rleg.fo = ani->FindBone( "Foot_right_joint" ,true);
rleg.to = ani->FindBone( "Foot_right_joint_2",true);
if ( lleg.th >= 0 && lleg.kn >= 0
&& lleg.fo >= 0 && lleg.to >= 0
&& rleg.th >= 0 && rleg.kn >= 0
&& lleg.fo >= 0 && lleg.to >= 0 )
{
// ....
}
// ....
}
Here the analyzer found identical subexpressions in the condition of the if statement — lleg.to >= 0 and lleg.fo >= 0. As you can see, the developer intended to check all the data members but made a typo. Instead of two identical subexpressions there must be checks for other variables:
if ( lleg.th >= 0 && lleg.kn >= 0
&& rleg.th >= 0 && rleg.kn >= 0
&& lleg.fo >= 0 && lleg.to >= 0
&& rleg.fo >= 0 && rleg.to >= 0)
{
// ....
}
The analyzer found various kinds of errors in the project. I have described the Captain Blood's thirteen in this article, yo-ho-ho! You can find other bugs with the PVS-Studio analyzer as well. That's a good incentive to give the analyzer a try (if you have not yet). And it would be even more compelling to see what you can find in your own code or that of your teammates :)
I'd like to thank people who made it possible to finally see the game, as well as those who developed it and supported the game with reviews, articles and discussions. Well, whether the "Arabella" would fo down or not, let the readers decide for themselves!
0