Developing and playing games can be an incredibly engaging, addictive, and satisfying activity. But nothing spoils the gameplay experience like a tricky hidden bug. So, today we'll be looking at an open-source game engine—Godot Engine. Let's see how good it is, and if it's ready to give us unforgettable emotions of creating and playing games.
Godot is a versatile 2D and 3D game engine designed to power all kinds of projects. You can use it to create games or applications and then release them on desktop, mobile, or web platforms.
The engine is still young, but it's already gaining traction among those who appreciate open-source code, freeware, and good extensibility. Godot is quite beginner-friendly, so it's popular with novice developers.
Games like 1000 Days to Escape, City Game Studio: Your Game Dev Adventure Begins, Precipice, and others have been developed using the engine.
The version of Godot Engine that we used for the analysis is 4.2.2.
By the way, we've already checked Godot Engine in 2018. You can read the previous article here.
The first thing we'd like to start with after looking at the report is the macro issues in the project. Parameters aren't enclosed in parentheses. Let's look at a few examples of how this can shoot you in the foot.
Fragments N1-N2
#define HAS_WARNING(flag) (warning_flags & flag)
We need this macro to check whether a certain warning flag is set or not.
The warning_flags variable is a bitmask of the uint_32t type. This means that its value is 32 bits, where each bit is 1 if the flag is set and 0 if it isn't. The macro is used in conditional statements, where it's implicitly converted to the bool type. To understand how it works, we'll take a simplified version with 8 bits instead of 32.
Let's say we have an X flag that corresponds to the 4th bit in the mask and it's currently set. Then, the value of the warning_flags variable in the binary system looks like this:
00001000
Now let's say that we decided to use our macro to check if the X flag is set.
We pass the flag variable with the 00001000 value to the macro, and the bitwise AND results in a non-zero value that is converted to bool with a value to true.
Now let's say we want to check the Y flag that corresponds to the third bit with the same value of the warning_flags variable. We pass the variable with the 00000100 value to the macro, and the bitwise AND results in a zero value that is converted to bool with the value false.
Everything seems great. What can possibly go wrong? But what if someone wants to check if one of those flags is set? They can call the macro as follows:
if (HAS_WARNING(flags::X | flags::Y)) ....
Then, the result of the operation will always be true, even if none of the flags is set. Why does this happen? Let's pretend we're a preprocessor and just substitute the expression passed to the macro:
if (warning_flags & flags::X | flags::Y) ....
Now we'll look at the operator precedence table:
Precedence |
Operator |
Description |
Associativity |
---|---|---|---|
.... |
.... |
.... |
.... |
11 |
a & b |
Bitwise AND |
Left-to-right |
.... |
.... |
.... |
.... |
13 |
| |
Bitwise OR |
Left-to-right |
Operators are listed from top to bottom in the descending precedence order. This means that the expression is processed as follows:
if (( warning_flags & flags::X ) | flags::Y) ....
Let's say that the X and Y flags we're interested in aren't set in warning_flags. Then, the first bitwise AND operation returns 0, and the Y flag is set to it. We get an always true check.
So, the analyzer issues the following warning for this macro:
V1003 The macro 'HAS_WARNING' is a dangerous expression. The parameter 'flag' must be surrounded by parentheses. shader_language.cpp 40
As the message says, to fix it we just need to enclose the macro parameter in parentheses:
#define HAS_WARNING(flag) (warning_flags & (flag))
Here's another example of a dangerous macro:
#define IS_SAME_ROW(i, row) (i / current_columns == row)
The analyzer warning:
V1003 The macro 'IS_SAME_ROW' is a dangerous expression. The parameters 'i', 'row' must be surrounded by parentheses. item_list.cpp 643
If we pass an expression to the macro instead of a single variable, like this:
IS_SAME_ROW(current + 1, row)
Then, as a result of preprocessing substitution, we get the following:
(current + 1 / current_columns == row)
The calculation order here isn't what you'd expect.
To protect against such situations, simply wrap the macro parameters in parentheses:
#define IS_SAME_ROW(i, row) ((i) / current_columns == (row))
Fragment N3
Now let's look at the following condition:
const auto hint_r = ShaderLanguage::ShaderNode::Uniform::HINT_ROUGHNESS_R;
const auto hint_gray = ShaderLanguage::ShaderNode::Uniform::HINT_ROUGHNESS_GRAY;
if (tex->detect_roughness_callback
&& ( p_texture_uniforms[i].hint >= hint_r
|| p_texture_uniforms[i].hint <= hint_gray))
{
....
}
This condition is always true (except when the tex->detect_roughness_callback pointer is null).
To understand why this is the case, we need to look at enum Hint in the Uniform structure:
struct Uniform
{
....
enum Hint
{
HINT_NONE,
HINT_RANGE,
HINT_SOURCE_COLOR,
HINT_NORMAL,
HINT_ROUGHNESS_NORMAL,
HINT_ROUGHNESS_R,
HINT_ROUGHNESS_G,
HINT_ROUGHNESS_B,
HINT_ROUGHNESS_A,
HINT_ROUGHNESS_GRAY,
HINT_DEFAULT_BLACK,
HINT_DEFAULT_WHITE,
HINT_DEFAULT_TRANSPARENT,
HINT_ANISOTROPY,
HINT_SCREEN_TEXTURE,
HINT_NORMAL_ROUGHNESS_TEXTURE,
HINT_DEPTH_TEXTURE,
HINT_MAX
};
....
};
Under the hood of such enum is an integer type, and the HINT_ROUGHNESS_R and HINT_ROUGHNESS_GRAY values correspond to the 5 and 9 numbers.
Based on the above, the condition checks that p_texture_uniforms[i].hint >= 5 or p_texture_uniforms[i].hint <= 9. This means that any value of p_texture_uniforms[i].hint passes these checks. This is what the PVS-Studio analyzer warns about:
V547 Expression is always true. material_storage.cpp 929
In reality, the programmer wanted to check if p_texture_uniforms[i].hint was in the range of 5 to 9. To do this, they had to use the logical "AND":
if (tex->detect_roughness_callback
&& ( p_texture_uniforms[i].hint >= hint_r
&& p_texture_uniforms[i].hint <= hint_gray))
{
....
}
Here's a similar warning:
Fragment N4
Try to find the error here:
Error FontFile::load_bitmap_font(const String &p_path)
{
if (kpk.x >= 0x80 && kpk.x <= 0xFF)
{
kpk.x = _oem_to_unicode[encoding][kpk.x - 0x80];
} else if (kpk.x > 0xFF){
WARN_PRINT(vformat("Invalid BMFont OEM character %x
(should be 0x00-0xFF).", kpk.x));
kpk.x = 0x00;
}
if (kpk.y >= 0x80 && kpk.y <= 0xFF)
{
kpk.y = _oem_to_unicode[encoding][kpk.y - 0x80];
} else if (kpk.y > 0xFF){
WARN_PRINT(vformat("Invalid BMFont OEM character %x
(should be 0x00-0xFF).", kpk.x));
kpk.y = 0x00;
}
....
}
The analyzer warning:
V778 Two similar code fragments were found. Perhaps, this is a typo and 'y' variable should be used instead of 'x'. font.cpp 1970
So, PVS-Studio has found an error that occurred when copying a code fragment. Let's take a closer look at conditional blocks. They're identical except that in the first case all operations are performed on kpk.x, while in the second case they are performed on kpk.y.
However, the second condition got an error because of a copy-paste operation. Take a look at the WARN_PRINT call: if kpk.y > 0xFF, the kpk.x character is displayed when the warning is issued, not kpk.y. Searching for an error based on logs is more difficult :)
P.S.: the code really shouldn't have been multiplied like that. You can clearly see that the two code blocks differ only in the applied field. Putting the code in a function and calling it twice for different fields would be a better option:
Error FontFile::load_bitmap_font(const String &p_path)
{
constexpr auto check = [](auto &ch)
{
if (ch >= 0x80 && ch <= 0xFF)
{
auto res = _oem_to_unicode[encoding][ch - 0x80];
ch = res;
}
else if (ch > 0xFF)
{
WARN_PRINT(vformat("Invalid BMFont OEM character %x
(should be 0x00-0xFF).",ch));
ch = 0x00;
}
};
check(kpk.x);
check(kpk.y);
....
}
Fragment N5
Here are more conditions, but they're nested:
void GridMapEditor::_mesh_library_palette_input(const Ref<InputEvent> &p_ie)
{
const Ref<InputEventMouseButton> mb = p_ie;
// Zoom in/out using Ctrl + mouse wheel
if (mb.is_valid() && mb->is_pressed() && mb->is_command_or_control_pressed())
{
if (mb->is_pressed() && mb->get_button_index() == MouseButton::WHEEL_UP)
{
size_slider->set_value(size_slider->get_value() + 0.2);
}
....
}
}
The analyzer warning:
V571 Recurring check. The 'mb->is_pressed()' condition was already verified in line 837. grid_map_editor_plugin.cpp 838
This fragment contains a redundant check in the nested if statement. The mb->is_pressed() expression has already been checked above. Maybe it's a double check (it's common in GUIs), but then the developers should've added a comment about it. Or maybe something else should've been checked.
Here are similar warnings:
Fragment N6
And this is a classic case of dereferencing the pointer before checking it:
void GridMapEditor::_update_cursor_transform()
{
cursor_transform = Transform3D();
cursor_transform.origin = cursor_origin;
cursor_transform.basis = node->get_basis_with_orthogonal_index(cursor_rot);
cursor_transform.basis *= node->get_cell_scale();
cursor_transform = node->get_global_transform() * cursor_transform;
if (selected_palette >= 0)
{
if (node && !node->get_mesh_library().is_null())
{
cursor_transform *= node->get_mesh_library()
->get_item_mesh_transform(selected_palette);
}
}
....
}
The analyzer warning:
V595 The 'node' pointer was utilized before it was verified against nullptr. Check lines: 246, 251. grid_map_editor_plugin.cpp 246
It's rather strange to dereference a pointer and then check it a few lines down. Perhaps the dereferencing appeared in the code later than the check, and the developer didn't notice the check below.
Similar warnings:
Fragment N7
template <class T, class U = uint32_t,
bool force_trivial = false, bool tight = false>
class LocalVector
{
....
public:
operator Vector<T>() const
{
Vector<T> ret;
ret.resize(size());
T *w = ret.ptrw();
memcpy(w, data, sizeof(T) * count);
return ret;
}
....
};
The analyzer warning:
V780 Instantiation of LocalVector < AnimationCompressionDataState >: The object 'w' of a non-passive (non-PDS) type cannot be copied using the memcpy function. local_vector.h 280
This is an interesting fragment. The LocalVector class template has a conversion operator to the Vector class. In this conversion, the contents of the current vector need to be copied to the new one. To do this, the devs used the memcpy function.
It's all good as long as the T template type is trivially copyable. However, the analyzer has detected various specializations of LocalVector, which has this property violated. As an example, let's look at the LocalVector<AnimationCompressionDataState> specialization:
struct AnimationCompressionDataState
{
uint32_t components = 3;
LocalVector<uint8_t> data; // Committed packets.
struct PacketData
{
int32_t data[3] = { 0, 0, 0 };
uint32_t frame = 0;
};
float split_tolerance = 1.5;
LocalVector<PacketData> temp_packets;
// used for rollback if the new frame does not fit
int32_t validated_packet_count = -1;
....
};
The AnimationCompressionDataState class contains a LocalVector that is not trivially copyable.
For this case, there's a note in the memcpy documentation: "If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined".
We can easily fix the code by replacing the memcpy call with std::uninitialized_copy:
operator Vector<T>() const
{
Vector<T> ret;
ret.resize(size());
T *w = ret.ptrw();
std::uninitialized_copy(data, data + count, w);
return ret;
}
The PVS-Studio analyzer has detected 38 more dangerous specializations, but I won't give the full list, of course:
Fragment N8
Here's a possible violation of the program logic:
Dictionary GDScriptSyntaxHighlighter::_get_line_syntax_highlighting_impl
(int p_line)
{
const String &str = text_edit->get_line(p_line);
....
if ( is_digit(str[non_op])
|| ( str[non_op] == '.'
&& non_op < line_length
&& is_digit(str[non_op + 1]) ) )
{
in_number = true;
}
....
}
The analyzer warning:
V781 The value of the 'non_op' index is checked after it was used. Perhaps there is a mistake in program logic. gdscript_highlighter.cpp 370
The non_op value is first used as an index when accessing characters in the string. Only then the value is checked if it's less than the length.
Take a look at the access to the string after the check. If non_op < line_length, it doesn't mean that (non_op + 1) < line_length. So, an index out of bounds can happen in str[non_op + 1]. Especially since there are no null-terminated strings under the String hood.
A correct check should look like this:
if ( is_digit(str[non_op])
|| ( str[non_op] == '.'
&& non_op + 1 < line_length
&& is_digit(str[non_op + 1]) ) )
{
in_number = true;
}
Fragment N9
struct Particles
{
....
int amount = 0;
....
};
void ParticlesStorage::_particles_update_instance_buffer(
Particles *particles,
const Vector3 &p_axis,
const Vector3 &p_up_axis)
{
....
uint32_t lifetime_split = ....;
// Offset VBO so you render starting at the newest particle.
if (particles->amount - lifetime_split > 0)
{
....
}
....
}
The analyzer warning:
V555 The expression 'particles->amount - lifetime_split > 0' will work as 'particles->amount != lifetime_split'. particles_storage.cpp 959
The example is interesting because, despite the not-so-correct warning issued by the analyzer, it shows what the developers should pay attention to.
If the difference between two unsigned variables is greater than zero, then the expression is semantically equal to particles->amount != lifetime_split. The condition is considered false only if the variables are equal. If the left operand is less than the right one, a wrap-around occurs, and the resulting expression will be greater than zero. If the left operand is greater than the right one, the difference will be greater than zero.
However, the other thing to note here is that both variables have the same rank, but different signedness. The standard requires a compiler to perform implicit conversions before performing the subtraction. In this case, the unsigned 32-bit int is the common type. This can also add a few surprises if the left operand has a negative number in it.
The most correct check when signed and unsigned type expressions of the same rank are involved looks as follows:
if (particles->amount >= 0 && particles->amount > lifetime_split)
In fact, we've reinvented std::cmp_greater, which was introduced in C++20. Starting from this version of the standard, we can write concise code:
if (std::cmp_greater(particles->amount, lifetime_split))
Fragment N10
void AnimationNodeStateMachineEditor::_delete_tree_draw()
{
TreeItem *item = delete_tree->get_next_selected(nullptr);
while (item)
{
delete_window->get_cancel_button()->set_disabled(false);
return;
}
delete_window->get_cancel_button()->set_disabled(true);
}
The analyzer warning:
V1044 Loop break conditions do not depend on the number of iterations. animation_state_machine_editor.cpp 693
The while loop takes exactly one iteration. It looks very much like a pattern where we only need to take the first element from the container. We can do this using the for loop:
for (auto &&item : items)
{
DoSomething(item);
break;
}
So, we don't need to check if the container contains the first element. IMHO, the code is rather confusing because you expect an unknown finite number of iterations from loops.
In the fragment above, however, the while loop makes absolutely no sense. The simple if statement would've been enough:
void AnimationNodeStateMachineEditor::_delete_tree_draw()
{
TreeItem *item = delete_tree->get_next_selected(nullptr);
if (item)
{
delete_window->get_cancel_button()->set_disabled(false);
return;
}
delete_window->get_cancel_button()->set_disabled(true);
}
Fragment N11
static const char *script_list[][2] = {
....
{ "Myanmar / Burmese", "Mymr" },
{ "Nag Mundari", "Nagm" },
{ "Nandinagari", "Nand" },
....
}
The reader may ask, "What's wrong here?" We wouldn't have figured it out ourselves if it weren't for the V1076 diagnostic rule. Interestingly enough, this is the first warning we've written out. The diagnostic rule checks the program code for invisible characters. Such characters are like backdoors that a programmer may not see because of the text display settings in the development environment, but the compiler sees and correctly processes them.
The analyzer warning:
V1076 Code contains invisible characters that may alter its logic. Consider enabling the display of invisible characters in the code editor. locales.h 1114
Let's take a closer look at the next line:
{ "Nag Mundari", "Nagm" },
It contains the backdoors with an invisible character. If we use a hex editor, we can see the following:
There are 3 bytes sandwiched between the double quotation mark and the N character: E2, 80, and 8B. They correspond to the ZERO WIDTH SPACE (U+200B) Unicode character.
The strings from the script_list array that contains the "infected" string literal go into the TranslationServer::script_map hash table. The key of the hash table is the second string literal of the pair, and the value is the first one. So, the backdoored string literal goes into the hash table as a value, and the hash search isn't broken.
Next, we can look at where the value from the hash table could potentially go. I've found a few places:
Most likely, the backdoor was added accidentally by copying the name from some website. We can simply delete this character from the string literal.
Fragment N12
void MeshStorage::update_mesh_instances()
{
....
uint64_t mask = RS::ARRAY_FORMAT_VERTEX | RS::ARRAY_FORMAT_NORMAL
| RS::ARRAY_FORMAT_VERTEX;
....
}
The analyzer warning:
This is a strange bitmask initialization. RS::ARRAY_FORMAT_VERTEX is written twice there, although the developers may have wanted to write a different flag.
Here's a similar warning:
Fragment N13
void Image::initialize_data(int p_width, int p_height, bool p_use_mipmaps,
Format p_format, const Vector<uint8_t> &p_data)
{
....
ERR_FAIL_COND_MSG(p_width > MAX_WIDTH, "The Image width specified (" +
itos(p_width) +
" pixels) cannot be greater than " +
itos(MAX_WIDTH) +
" pixels.");
ERR_FAIL_COND_MSG(p_height > MAX_HEIGHT, "The Image height specified (" +
itos(p_height) +
" pixels) cannot be greater than " +
itos(MAX_HEIGHT) +
" pixels.");
ERR_FAIL_COND_MSG(p_width * p_height > MAX_PIXELS,
"Too many pixels for image, maximum is " + itos(MAX_PIXELS));
....
}
The analyzer warning:
V1083 Signed integer overflow is possible in 'p_width * p_height' arithmetic expression. This leads to undefined behavior. Left operand is in range '[0x1..0x1000000]', right operand is in range '[0x1..0x1000000]'. image.cpp 2200
So, we have the p_width and p_height variables of the int type. The maximum value that a 4-byte int can store is 2'147'483'647.
First, it is checked that p_width <= MAX_WIDTH, where MAX_WIDTH == 16'777'216. Then, it is checked that p_height <= MAX_HEIGHT, where MAX_HEIGHT == 16,777,216. The third check is that the product of p_width * p_height <= MAX_PIXELS.
Let's look at the case when p_width == p_height && p_width == 16'777'216. The result of multiplying these numbers is 281'474'976'710'656. To display the result, an 8-byte number is required, so there's a sign overflow. As we know, this leads to undefined behavior in C and C++.
If there are no helper functions that check for an overflow, the simplest fix may look like this:
ERR_FAIL_COND_MSG((int64_t) p_width * (int64_t) p_height > (int64_t) MAX_PIXELS,
"Too many pixels for image, maximum is " + itos(MAX_PIXELS));
Fragment N14
void RemoteDebugger::debug(....)
{
....
mutex.lock();
while (is_peer_connected())
{
mutex.unlock();
....
}
send_message("debug_exit", Array());
if (Thread::get_caller_id() == Thread::get_main_id())
{
if (mouse_mode != Input::MOUSE_MODE_VISIBLE)
{
Input::get_singleton()->set_mouse_mode(mouse_mode);
}
}
else
{
MutexLock mutex_lock(mutex);
messages.erase(Thread::get_caller_id());
}
}
V1020 The function exited without calling the 'mutex.unlock' function. Check lines: 556, 438. remote_debugger.cpp 556
This is an extremely interesting code fragment with multithreading. The PVS-Studio analyzer has detected that a mutex may not be unlocked on some execution paths. Let's get into it.
We'll start by seeing what type of mutex is being used:
class RemoteDebugger : public EngineDebugger
{
....
private:
// Make handlers and send_message thread safe.
Mutex mutex;
....
};
We'll dig a little deeper into what this Mutex is:
template <class StdMutexT>
class MutexImpl
{
friend class MutexLock<MutexImpl<StdMutexT>>;
using StdMutexType = StdMutexT;
mutable StdMutexT mutex;
public:
_ALWAYS_INLINE_ void lock() const { mutex.lock(); }
_ALWAYS_INLINE_ void unlock() const { mutex.unlock(); }
_ALWAYS_INLINE_ bool try_lock() const { return mutex.try_lock(); }
};
// Recursive, for general use
using Mutex = MutexImpl<THREADING_NAMESPACE::recursive_mutex>;
So, we actually have a recursive mutex, not a regular one. It's used with a custom RAII wrapper:
template <class MutexT>
class MutexLock
{
friend class ConditionVariable;
std::unique_lock<typename MutexT::StdMutexType> lock;
public:
_ALWAYS_INLINE_ explicit MutexLock(const MutexT &p_mutex)
: lock(p_mutex.mutex) {}
};
Developers use RemoteDebugger::mutex with RAII wrappers almost all the time. Here's just a couple of fragments: [1], [2], [3], .....
However, at one point something went wrong. The analyzer pointed to a fragment where the mutex is handled manually. So, we have several code execution options:
If we examine calls to the is_peer_connected function in the code base ([1], [2], [3], ....), then we can see that they occur locked by RemoteDebugger::mutex in all cases. Apparently, in this case, the programmer needed locking as well, but they implemented it manually.
Based on these assumptions, we can fix the code as follows:
void RemoteDebugger::debug(....)
{
....
const auto is_peer_connected_sync = [this]
{
MutexLock _ { mutex };
return is_peer_connected();
};
while (is_peer_connected_sync())
{
....
}
....
}
I can't guarantee that the fix is absolutely correct, as only the Godot developers know what should happen here. Although, at least we got rid of the potential undefined behavior associated with unlocking the mutex on every loop iteration.
Errors in code vary from simple to complex, from visible to invisible. In order not to ruin the enjoyment and experience of the product, it must be constantly cleaned of bugs and errors. Static and dynamic analyzers handle this task well.
Getting started with such solutions is easier than you might think. For example, you can get a trial version of the PVS-Studio analyzer here. There are also a number of terms and conditions for its free use.
Thanks for reading and have a great day!
0