Welcome to the second part of our Nau Engine trilogy, where we delve into key aspects of optimization and performance enhancement. Our goal is to discover issues that could impact the performance and stability of games powered by Nau Engine.
In the first part, we focused on the engine's functionality, highlighting three key error blocks: memory issues, copy-paste, and logic errors. Today we talk about performance: it also plays an important role in development. Now, let's shift our focus to the analysis results with PVS-Studio.
Fragment N1
std::vector<AnimationTargetData> m_targets;
void AnimationComponent::addAnimationTarget(IAnimationTarget::Ptr target)
{
if (target)
{
if (auto* nauObject = target->as<scene::NauObject*>())
{
....
m_targets.push_back(AnimationTargetData(std::move(wrapper), nullptr));
}
else
{
m_targets.push_back(AnimationTargetData(std::move(target), nullptr));
}
}
}
The PVS-Studio warning: V823 Decreased performance. Object may be created in-place in the 'm_targets' container. Consider replacing methods: 'push_back' -> 'emplace_back'. animation_component.cpp 180
In the code, push_back
is used to add objects to the m_targets
vector, which leads to the creation of a temporary object that is then copied or moved into a container. This causes a superfluous call to the move/copy constructor. To eliminate intermediate steps and create the object directly within the container, it'd be better to replace push_back
with emplace_back
. The last one passes constructor arguments directly to the vector, creating the object "in-place" without extra overhead.
Fragment N2
void writeContainerHeader(....)
{
....
const eastl::string contentLength = eastl::to_string(serializedData.size());
Vector<eastl::tuple<eastl::string, eastl::string>> httpHeader = {
{"NauContent-Kind", eastl::string(kind)},
{"Content-Type", "application/json"},
{"Content-Length", std::move(contentLength)}
};
....
}
The PVS-Studio warning: V833 Passing the const-qualified object 'contentLength' to the 'std::move' function disables move semantics. nau_container.cpp 68
The contentLength
object of the eastl::string
type is declared with the const
qualifier. Developers might have intended to move it into the httpHeader
vector via std::move
. Unfortunately, the move operation doesn't occur because the eastl::string
constructor overload with the rvalue reference couldn't be selected. Instead, the overload resolution favors the copy constructor.
The fix is quite straightforward: remove the const
qualifier from contentLength
, allowing the object to be moved.
But that's not all—there's another implicit string copy in this code. When httpHeader
is declared, the std::initializer_list
constructor is called. Since std::initializer_list
is a lightweight proxy object over the const T
array, the compiler effectively rewrites the code as follows:
void writeContainerHeader(....)
{
....
eastl::string contentLength = eastl::to_string(serializedData.size());
const eastl::tuple<eastl::string, eastl::string> backing_array[] {
{"NauContent-Kind", eastl::string(kind)},
{"Content-Type", "application/json"},
{"Content-Length", std::move(contentLength)}
};
Vector<eastl::tuple<eastl::string, eastl::string>> httpHeader {
std::initializer_list { backing_array }
};
....
}
Based on this, constant tuples will be copied into the vector, leading to superfluous string copies within the constant array. It'd be better to use emplace_back
calls instead of std::initializer_list
:
void writeContainerHeader(....)
{
....
eastl::string contentLength = eastl::to_string(serializedData.size());
Vector<eastl::tuple<eastl::string, eastl::string>> httpHeader;
httpHeader.reserve(3);
httpHeader.emplace_back("NauContent-Kind", eastl::string(kind));
httpHeader.emplace_back("Content-Type", "application/json");
httpHeader.emplace_back("Content-Length", std::move(contentLength));
}
Such code will indeed be more high-performance, though at the expense of brevity.
Fragment N3
inline const ComponentInfo getEntityComponentInfo(....) const
{
EntityComponentRef ref = getEntityComponentRef(eid, cid);
if (ref.isNull())
return ComponentInfo("<invalid>", eastl::move(ref));
return ComponentInfo(dataComponents
.getComponentNameById(ref.getComponentId()),
eastl::move(ref));
}
The PVS-Studio warning: V839 The 'EntityManager::getEntityComponentInfo' function returns a constant value. This may interfere with move semantics. entityManager.h 1855
The getEntityComponentInfo
function returns objects of the const ComponentInfo
type. However, it's not recommended to return const-qualified objects from functions (see CppCoreGuidelines F.49). Before C++17, it could lead to redundant copies when the object is initialized with the function result.
The fix this, we can just remove const
from the return type.
inline ComponentInfo getEntityComponentInfo(....) const
{
....
}
Here are other cases:
Fragment N4
eastl::unique_ptr<uint8_t[]> convertData(....)
{
switch (format)
{
case cocos2d::backend::PixelFormat::RGBA4444:
{
....
eastl::unique_ptr<uint8_t[]> newData{ new uint8_t[....] };
....
return std::move(newData);
}
....
}
The PVS-Studio warning: V828 Decreased performance. Moving a local object in a return statement prevents copy elision. texture_nau.cpp 78
In this code snippet, the function returns the object of the eastl::unique_ptr<uint8_t[]>
type. When the return type matches the type of the return value—and that value is a local variable—the compiler can optimize the process in one of the following ways:
When std::move
is used, the expression in the return
statement no longer matches the return type, which prevents the compiler from applying NRVO. This leads to superfluous overhead, as the compiler is forced to use move/copy operations instead of the more efficient NRVO.
To avoid this pessimization caused by the std::move(newData)
expression, the std::move
call should be removed.
Fragment N5
void configureVirtualFileSystem()
{
....
fs::path currentPath = fs::current_path();
auto& props = getServiceProvider().get<GlobalProperties>();
if (auto contentPath = props.getValue<eastl::string>("contentPath");
contentPath)
{
const std::filesystem::path path = contentPath->c_str();
#ifdef NAU_PACKAGE_BUILD
for (const auto& entry : fs::directory_iterator(path))
{
if ( entry.is_regular_file()
&& entry.path().extension() == ".assets")
{
const auto& filePath = entry.path();
auto assetPackFS = io::createAssetPackFileSystem(
strings::toU8StringView(filePath.string())
);
vfs.mount("/packs", assetPackFS).ignore();
assetDb.addAssetDB("packs/assets_database/database.db");
}
}
#else
auto contentFs = io::createNativeFileSystem(path.string());
vfs.mount("/content", std::move(contentFs)).ignore();
auto assetDbPath = path.parent_path() / "assets_database";
vfs.mount("/assets_db", std::move(contentFs)).ignore();
assetDb.addAssetDB("assets_db/database.db");
#endif
}
}
The PVS-Studio warning: V808 'currentPath' object of 'path' type was created but was not utilized. default_application_delegate.cpp 101
The analyzer has detected that the currentPath
variable created to store the current path via fs::current_path()
is never used within the configureVirtualFileSystem()
function. This suggests that after refactoring, the variable will be redundant and can be safely removed without impacting the function's logic.
Fragment N6
class ThreadPoolExecutor final : public Executor,
public IRuntimeComponent
{
public:
ThreadPoolExecutor(std::optional<size_t> threadsCount)
{
const size_t maxThreads = threadsCount ? *threadsCount
: getDefaultThreadsCount();
m_threads.reserve(maxThreads);
for (size_t i = 0; i < maxThreads; ++i)
{
m_threads.emplace_back([](ThreadPoolExecutor& executor,
size_t threadIndex)
{
threading::setThisThreadName(
std::format("Nau Pool-{}", threadIndex + 1)
);
executor.threadWork();
}, std::ref(*this), i);
}
RuntimeObjectRegistration{nau::Ptr<>{this}}.setAutoRemove();
void();
}
....
};
The PVS-Studio warning: V607 Ownerless expression 'void ()'. thread_pool_executor.cpp 60
The constructor of the ThreadPoolExecutor
class creates worker threads. At the end, as a cherry on the cake, the constructor body is decorated with the void();
construct that serves no practical purpose. How it got there is unclear; it might be a leftover from incomplete refactoring or an abandoned design choice.
Fragment N7
struct ScheduledArchetypeComponentTrack
{
....
ScheduledArchetypeComponentTrack() {}
....
};
The PVS-Studio warning: V832 It's better to use '= default;' syntax instead of empty constructor body. ecsQueryInternal.h 35
The ScheduledArchetypeComponentTrack
structure has a user-defined constructor. However, it'd be better to declare it as default, making the class trivially constructible. This allows the compiler to generate more optimized code. Moreover, some algorithms may select a more efficient strategy (benchmark) when working with trivially default constructible objects.
Here is the enhanced code:
struct ScheduledArchetypeComponentTrack
{
....
ScheduledArchetypeComponentTrack() = default;
....
};
Here are other cases:
Fragment N8
if (!this->hasComponent(depConstString.hash)
&& (!optional || (optional && !can_skip_optional)))
{
....
}
The PVS-Studio warning: V728 An excessive check can be simplified. The '||' operator is surrounded by opposite expressions '!optional' and 'optional'. dataComponent.cpp 283
In this case, we can simplify the (!optional || (optional && !can_skip_optional))
expression. The right-hand side of the operator ||
is only evaluated if optional
is contextually converted to true
. In this case, the left operand of the operator &&
will always be true
, making the check redundant. Hence, developers could simplify this code and enhance its readability in the following way:
if (!this->hasComponent(depConstString.hash)
&& (!optional || !can_skip_optional))
{
....
}
Fragment N9
// performance_profiling.h
static const nau::PerfTagFlag NAU_PERFTAGS = ....;
The PVS-Studio warning: V1043 A global object variable 'NAU_PERFTAGS' is declared in the header. Multiple copies of it will be created in all translation units that include this header file. performance_profiling.h 23
Declaring constants in the header file may seem straightforward heterogeneous, but the devil lies in the details. In C++, objects declared as const
within a namespace—including the global namespace—have internal linkage. This means that when the header file is included in multiple translation units, each unit gets its copy, potentially increasing the executable size.
To avoid this issue, you may consider using one of the following approaches.
Before C++17: Split the declaration and definition of constant. In the header file, declare the constant using the extern
specifier, and define the actual constant in the implementation file.
// performance_profiling.h
extern const nau::PerfTagFlag NAU_PERFTAGS; // Declaration
// performance_profiling.cpp
const nau::PerfTagFlag NAU_PERFTAGS = ....; // Definition
Starting with C++17: Declare the constant in the header file using the inline
specifier. This ensures that only one constant object exists across the entire program.
// performance_profiling.h
inline static const nau::PerfTagFlag NAU_PERFTAGS = ...;
Fragment N10
std::string Paths::getAssetsPath() const
{
if (m_paths.find("assets") == m_paths.end())
{
return "";
}
return m_paths.find("assets")->second;
}
The PVS-Studio warning: V838 Temporary object is constructed during the call of the 'find' function. Consider using an ordered associative container with efficient lookup to avoid construction of temporary objects. file_system.cpp 421
Let's take a look at how the Paths::m_paths
data member is declared:
class SHARED_API Paths
{
....
private:
....
std::map<std::string, std::string> m_paths;
};
The Paths::m_paths
data member is a sorted associative container. Its member function, std::map::find
, expects an argument of the same type as the key. This means that the string literal will be converted into std::string
, potentially entailing dynamic memory allocations.
Such a search is called homogeneous, when the parameter type of the search functions matches the key type within the container. Since C++14, heterogeneous search has been introduced for sorted associative containers. This means that the parameter type may not match the key type within the container. This can boost performance by eliminating the need for an additional conversion.
To enable heterogeneous search, developers could pass std::less<>
specialization as the third template argument to the std::map
:
class SHARED_API Paths
{
....
private:
....
std::map<std::string, std::string, std::less<>> m_paths;
};
In addition to this optimization, we may also notice that the object by the "assets"
key is searched twice in the container, which looks very suboptimal. Here's the possible fix:
std::string Paths::getAssetsPath() const
{
auto it = m_paths.find("assets");
if (it == m_paths.end()) return {};
return it->second;
}
Fragment N11
class DeferredRT
{
....
protected:
....
StereoMode m_stereoMode = StereoMode::MonoOrMultipass;
int m_numRt = 0, m_width = 0, m_height = 0;
nau::string m_name;
d3d::SamplerHandle m_defaultSampler;
ResizableResPtrTex m_mrts[MAX_NUM_MRT] = {};
ResizableResPtrTex m_depth;
bool m_useResolvedDepth = false;
};
DeferredRT::DeferredRT(const char* name,
int w, int h,
StereoMode stereoMode,
unsigned msaaFlag,
int numRT,
const unsigned texFmt[MAX_NUM_MRT],
uint32_t depthFmt)
: m_stereoMode(stereoMode)
{
m_name = name;
....
}
The PVS-Studio warning: V818 It is more efficient to use an initialization list 'm_name(name)' rather than an assignment operator. deferredRT.cpp 117
In the DeferredRT
class, the m_name
data member is initialized within the constructor body. However, this approach isn't as efficient as it could be. When the control flow enters the constructor body, the member initializer list has already been executed. If the data member is unspecified in it, it'll be initialized by default. For a string initialized by default, the following typically happens:
Subsequently, the operator =
is called in the constructor body, which effectively discards the result of the default initialization. To fix it, just initialize the data member in the member initializer list of the constructor:
DeferredRT::DeferredRT(const char* name,
int w, int h,
StereoMode stereoMode,
unsigned msaaFlag,
int numRT,
const unsigned texFmt[MAX_NUM_MRT],
uint32_t depthFmt)
: m_stereoMode(stereoMode)
, m_name(name)
{
....
}
Here are other cases:
Wrap-up
We've explored the main approaches that developers can use to enhance Nau Engine performance. Identifying and addressing these issues is a crucial step toward building efficient, stable, and engaging games. In the next article, we'll talk about common errors in class design.
Thank you for reading :)
0