One of the mechanisms of static analysis is method annotations of popular libraries. Annotations provide more information about functions during errors detecting. CARLA is an impressive open-source project in C++ that helped us implement this mechanism to our analyzer. Subsequently, the simulator became a test-target for the improved PVS-Studio static analyzer.
CARLA is an open-source simulator for autonomous driving research. CARLA has been developed from the ground up to support development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites and environmental conditions.
The project is cross-platform and contains almost 78,000 lines of C++ code. In the project repository, we also found code written in Python, XML, YAML, DOS Batch, CMake and other languages.
Static code analysis is the process of detecting errors and defects in a software's source code. Static analysis can be viewed as an automated code review process. One of the technologies used in static analysis is function annotations of popular libraries. The developer studies the documentation of such functions and notes facts useful for analysis. During the program check, the analyzer takes these facts from the annotations. This approach allows the analysis to be carried out with higher accuracy.
The result of checking projects - a report with warnings. In PVS-Studio, you can open the report in text editor or in the analyzer utility. It's possible to open reports in software development tools, such as Visual Studio or CLion, but it requires the use of appropriate plugins. Further the article will show you the top 10 errors found in the CARLA project. You can also test your skills and try detecting them yourself.
To manage the build process in Unreal Engine, use their custom build system - Unreal Build Tool. Therefore, the analysis of projects written on the Unreal Engine is performed in a special way. There are two options for checking UE projects:
CARLA uses a modified Unreal Engine 4 kernel, which is also available on GitHub. However, both the original and modified kernel have private access. Building on Windows consists of two stages: building the engine and building the project itself. We will see how to analyze both.
You can build Unreal Engine 4 in 8 steps.
To check the engine, integrate static analysis into the Unreal Build Tool assembly system. To perform the analysis and get the check results, you need to perform the following steps.
As a result, instead of the project building or rebuilding, PVS-Studio performs the source code analysis. Now let's build the CARLA simulator itself.
The project does not generate a solution. This doesn't allow us to integrate into the Unreal Build Tool. So, let's check the project through compiler monitoring. There are two ways to do this:
Both utilities are already in the C:\Program Files (x86)\PVS-Studio folder after installing PVS-Studio. Let's use the second option - C and C++ Compiler Monitoring UI IDE. To start build process, follow the steps:
To view the analyzer warnings easily, you can use Visual Studio. Open the folder with the CARLA repository and download the report. It may be useful to filter warnings issued on kernel files, autogenerated files and included library files. To do this, perform a few more actions:
Now let's study the analyzer warnings in Visual Studio. Let's start with warnings issued on CARLA simulator code and their own libraries.
We will view the errors found in the CARLA source files a little later. The point is, we needed to check this project for another task. Before testing the simulator, we slightly modified the PVS-Studio kernel so that it collects statistics of Unreal Engine 4 method calls. This data can now help us with annotating.
Annotation is performed in two stages:
At the next check of the project, information about the annotated methods that you encounter in the code will be obtained both from function signatures and annotations.
For example, an annotation may suggest that:
Which annotation would be the most useful? It's a good question. Let's find out in the comments below.
Annotations not only help to detect new errors, but also allow you to exclude some false positives.
What did we need the CARLA simulator for? To take and annotate all the Unreal Engine 4 functions is a very large-scale task. It requires a lot of time. Someday, maybe, we will power through it, but now we decided to start small and see the results. In order not to take 200 random engine functions, we decided to identify the most popular ones. We found a couple of large projects. They are rather outdated Unreal Tournament game and the currently supported CARLA simulator. The simulator in C++ suited us for the following reasons:
So, we selected the projects. We successfully completed the build and checked the projects. What's next? Now we need to collect statistics on functions calls of the game engine. How to do that - that is the question. Fortunately, we have the analyzer source code at hand. The analyzer builds a parse tree and allows us to find function calls with all the necessary information. So, it was enough to write something similar to a new diagnostic. The function suited us if two conditions were met:
If both conditions were met, information was recorded in a separate file. All we had to do was run the analysis with a modified kernel. After the analysis, we received a log of functions. Then we applied some simple formulas in Excel and converted the statistics to the following form:
We decided that for a start it is enough to annotate all the functions that we encountered more than 10 times. There were about 200 of them. Since developers don't really like to document code, we had to study the implementation of each Unreal Engine 4 function in the source code to annotate it. As an example, here is an annotation of the ConstructUFunction function:
C_"void ConstructUFunction(UFunction*& OutFunction, \
const FFunctionParams& Params)"
ADD(HAVE_STATE | RET_SKIP | F_ARG_ALLOC,
"UE4CodeGen_Private",
nullptr,
"ConstructUFunction",
ALLOC_ARG, SKIP);
The F_ARG_ALLOC flag means that the function allocates the resource and gives it back through one of its parameters. The ALLOC_ARG flag indicates that a pointer to the allocated resource is returned through the first parameter of the function, namely OutFunction. The SKIP flag says that the second argument of the function is not special and uninteresting for us.
After we annotated all the functions, we double-checked the CARLA simulator and the version of the engine the simulator uses. As expected, some of the false positives disappeared and several new warnings appeared.
New warning N1
V611 The memory was allocated using 'new' operator but was released using the 'free' function. Consider inspecting operation logics behind the 'Allocation' variable. Check lines: 1746, 1786. BulkData2.cpp 1746
void FBulkDataAllocation::SetMemoryMappedData(
FBulkDataBase* Owner,
IMappedFileHandle* MappedHandle,
IMappedFileRegion* MappedRegion)
{
....
FOwnedBulkDataPtr* Ptr
= new FOwnedBulkDataPtr(MappedHandle, MappedRegion); // <=
Owner->SetRuntimeBulkDataFlags(BULKDATA_DataIsMemoryMapped);
Allocation = Ptr; // <=
}
void FBulkDataAllocation::Free(FBulkDataBase* Owner)
{
if (!Owner->IsDataMemoryMapped())
{
FMemory::Free(Allocation); // <=
Allocation = nullptr;
}
else { .... }
}
An object of the FOwnedBulkDataPtrtype is created using the new operator and released using the Free function. This last function calls std::free. This can lead to undefined behavior. The triggering appeared after we annotated the FMemory::Free function.
C_"static void Free(void* Original)"
ADD(HAVE_STATE_DONT_MODIFY_VARS | RET_SKIP,
nullptr,
"FMemory",
"Free",
POINTER_TO_FREE);
New warning N2
V530 The return value of function 'CalcCacheValueSize' is required to be utilized. MemoryDerivedDataBackend.cpp 135
void FMemoryDerivedDataBackend::PutCachedData(
const TCHAR* CacheKey,
TArrayView<const uint8> InData,
bool bPutEvenIfExists)
{
....
FString Key(CacheKey);
....
FCacheValue* Val = new FCacheValue(InData);
int32 CacheValueSize = CalcCacheValueSize(Key, *Val);
// check if we haven't exceeded the MaxCacheSize
if ( MaxCacheSize > 0
&& (CurrentCacheSize + CacheValueSize) > MaxCacheSize)
{
....
}
else
{
COOK_STAT(Timer.AddHit(InData.Num()));
CacheItems.Add(Key, Val);
CalcCacheValueSize(Key, *Val); // <=
CurrentCacheSize += CacheValueSize;
}
}
The return value of the CalcCacheValueSize method was not used. According to the analyzer, calling this method with no return value is senseless. Analyzer has information about the signatures of the CalcCacheValueSize method and its implementation, that's why it realized that the function has no state. Neither arguments, nor class properties, nor any other variables change. This became clear since annotated methods were used inside the CalcCacheValueSize function. A senseless function call may indicate a possible error in the program logic.
New warning N3
V630 The 'Malloc' function is used to allocate memory for an array of objects which are classes containing constructors. UnrealNames.cpp 639
class alignas(PLATFORM_CACHE_LINE_SIZE) FNamePoolShardBase : FNoncopyable
{
public:
void Initialize(FNameEntryAllocator& InEntries)
{
LLM_SCOPE(ELLMTag::FName);
Entries = &InEntries;
Slots = (FNameSlot*)FMemory::Malloc(
FNamePoolInitialSlotsPerShard * sizeof(FNameSlot), alignof(FNameSlot));
memset(Slots, 0, FNamePoolInitialSlotsPerShard * sizeof(FNameSlot));
CapacityMask = FNamePoolInitialSlotsPerShard - 1;
}
....
}
The FNameSlot type objects are created without existing constructor call. The annotation of the Malloc function gives a hint. The annotation states that the Malloc function only allocates memory, and the size of the allocated memory block is specified in the first argument. This code fragment is suspicious and may lead to errors.
Thus, the Unreal Engine method annotations allows you to detect new errors. And now let's look at the check results of the CARLA simulator.
Warning N1
V522 Dereferencing of the null pointer 'CarlaActor' might take place. CarlaServer.cpp 1652
void FCarlaServer::FPimpl::BindActions()
{
....
FCarlaActor* CarlaActor = Episode->FindCarlaActor(ActorId);
if (CarlaActor)
{
return RespondError("get_light_boxes",
ECarlaServerResponse::ActorNotFound,
" Actor Id: " + FString::FromInt(ActorId));
}
if (CarlaActor->IsDormant())
{
return RespondError("get_light_boxes",
ECarlaServerResponse::FunctionNotAvailiableWhenDormant,
" Actor Id: " + FString::FromInt(ActorId));
}
else { .... }
....
}
One lost exclamation mark - and the function completely changes its behavior. Now, if CarlaActor is valid, an error is thrown. And if it is nullptr, the function leads to undefined behavior, which may be an abnormal program termination.
Warning N2
The analyzer issued a similar warning in another function.
V522 Dereferencing of the null pointer 'HISMCompPtr' might take place. ProceduralBuilding.cpp 32
UHierarchicalInstancedStaticMeshComponent* AProceduralBuilding::GetHISMComp(
const UStaticMesh* SM)
{
....
UHierarchicalInstancedStaticMeshComponent** HISMCompPtr =
HISMComps.Find(SMName);
if (HISMCompPtr) return *HISMCompPtr;
UHierarchicalInstancedStaticMeshComponent* HISMComp = *HISMCompPtr;
// If it doesn't exist, create the component
HISMComp = NewObject<UHierarchicalInstancedStaticMeshComponent>(this,
FName(*FString::Printf(TEXT("HISMComp_%d"), HISMComps.Num())));
HISMComp->SetupAttachment(RootComponent);
HISMComp->RegisterComponent();
....
}
When the search for SMName in HISMComps is a success, the GetHISMComp method returns the found element. Otherwise, the HISMCompPtr contains null pointer and dereference occurs. This causes undefined behavior. Most likely, initialization in the HISMComp definition was unnecessary. Immediately after, HISMComp receives new value.
Warning N3
V547 Expression 'm_trail == 0' is always false. unpack.hpp 699
std::size_t m_trail;
....
inline int context::execute(const char* data, std::size_t len,
std::size_t& off)
{
....
case MSGPACK_CS_EXT_8: {
uint8_t tmp;
load<uint8_t>(tmp, n);
m_trail = tmp + 1;
if(m_trail == 0) {
unpack_ext(m_user, n, m_trail, obj);
int ret = push_proc(obj, off);
if (ret != 0) return ret;
}
else {
m_cs = MSGPACK_ACS_EXT_VALUE;
fixed_trail_again = true;
}
} break;
....
}
The tmp variable has the uint8_t type, which means its value ranges from 0 to 255. The m_trail variable is in the range from 1 to 256 because of integer promotion of the tmp variable. Since the m_trail in the condition cannot equal 0, instructions in the condition body are never executed. Such code can be redundant or not corresponding to the author's intents. It needs checking.
The analyzer found several more similar code fragments:
Warning N4
A very similar situation occurred in another function.
V547 Expression '(uint8) WheelLocation >= 0' is always true. Unsigned type value is always >= 0. CARLAWheeledVehicle.cpp 510
float ACarlaWheeledVehicle::GetWheelSteerAngle(
EVehicleWheelLocation WheelLocation) {
check((uint8)WheelLocation >= 0)
check((uint8)WheelLocation < 4)
....
}
Some check function takes the bool type value as its argument. The function throws an exception if the false value is passed. In the first check, the expression always has the true value, since the uint8 type has a range from 0 to 255. Probably, there is a typo in the check contents. The exact same check is in 524 line.
Warning N5
V547 Expression 'rounds > 1' is always true. CarlaExporter.cpp 137
void FCarlaExporterModule::PluginButtonClicked()
{
....
int rounds;
rounds = 5;
....
for (int round = 0; round < rounds; ++round)
{
for (UObject* SelectedObject : BP_Actors)
{
....
// check to export in this round or not
if (rounds > 1) // <=
{
if (areaType == AreaType::BLOCK && round != 0)
continue;
else if (areaType == AreaType::ROAD && round != 1)
continue;
else if (areaType == AreaType::GRASS && round != 2)
continue;
else if (areaType == AreaType::SIDEWALK && round != 3)
continue;
else if (areaType == AreaType::CROSSWALK && round != 4)
continue;
}
....
}
}
}
It's clearly a typo. Instead of round a developer wrote rounds. It's easy to make a mistake in one letter, especially at the end of the tough workday. We are all human and we get tired. But a static code analyzer is a program, and it always works with the same vigilance. So, it's good to have such a tool at hand. Let me dilute the continuous code with a picture with simulator graphics.
Warning N6
V612 An unconditional 'return' within a loop. EndPoint.h 84
static inline auto make_address(const std::string &address) {
....
boost::asio::ip::tcp::resolver::iterator iter = resolver.resolve(query);
boost::asio::ip::tcp::resolver::iterator end;
while (iter != end)
{
boost::asio::ip::tcp::endpoint endpoint = *iter++;
return endpoint.address();
}
return boost::asio::ip::make_address(address);
}
The while loop, the condition, the iterator increment - all that shows that the instructions in the block must be executed more than once. However, due to return, only one iteration is performed. Surely there must be another logic here, otherwise the loop can be eliminated.
Warning N7
V794 The assignment operator should be protected from the case of 'this == &other'. cpp11_zone.hpp 92
struct finalizer_array
{
void call() {
finalizer* fin = m_tail;
for(; fin != m_array; --fin) (*(fin-1))();
}
~finalizer_array() {
call();
::free(m_array);
}
finalizer_array& operator=(finalizer_array&& other) noexcept
{
this->~finalizer_array(); // <=
new (this) finalizer_array(std::move(other));
return *this;
}
finalizer_array(finalizer_array&& other) noexcept
: m_tail(other.m_tail), m_end(other.m_end), m_array(other.m_array)
{
other.m_tail = MSGPACK_NULLPTR;
other.m_end = MSGPACK_NULLPTR;
other.m_array = MSGPACK_NULLPTR;
}
....
finalizer* m_tail;
finalizer* m_end;
finalizer* m_array;
}
The analyzer detected an overloaded assignment operator, where this == &other lacks a check. Calling a destructor via this pointer results in the loss of other data. Subsequently, the assignment operator returns a copy of the cleaned object. The analyzer issued several more warnings that could be potential errors:
Warning N8
V1030 The 'signals' variable is used after it was moved. MapBuilder.cpp 926
void MapBuilder::CreateController(....,
const std::set<road::SignId>&& signals)
{
....
// Add the signals owned by the controller
controller_pair.first->second->_signals = std::move(signals);
// Add ContId to the signal owned by this Controller
auto& signals_map = _map_data._signals;
for(auto signal: signals) { // <=
auto it = signals_map.find(signal);
if(it != signals_map.end()) {
it->second->_controllers.insert(signal);
}
}
}
The signals container will become empty after moving, and the range-based for loop will not execute. One of the right approaches would be to use controller_pair.first->second->_signals:
for (auto signal: controller_pair.first->second->_signals)
However, it would be correct, except for one thing. The signals container has a const specifier, which means it cannot be moved. Instead, it is copied, and therefore the program logically works correctly. A developer who wanted to optimize the code was able to confuse both himself and the analyzer. Kudos to him for this code. For the V1030 diagnostic fine-tuning, we will take this situation into account. Maybe we will write a new diagnostic.
Warning N9
V1061 Extending the 'std' namespace may result in undefined behavior. Waypoint.cpp 11
Let's look at two code snippets from the Waypoint.h and Waypoint.cpp files:
// Waypoint.h
namespace std {
template <>
struct hash<carla::road::element::Waypoint> {
using argument_type = carla::road::element::Waypoint;
using result_type = uint64_t;
result_type operator()(const argument_type& waypoint) const;
};
} // namespace std
// Waypoint.cpp
namespace std {
using WaypointHash = hash<carla::road::element::Waypoint>; // <=
WaypointHash::result_type WaypointHash::operator()(
const argument_type &waypoint) const
{
WaypointHash::result_type seed = 0u;
boost::hash_combine(seed, waypoint.road_id);
boost::hash_combine(seed, waypoint.section_id);
boost::hash_combine(seed, waypoint.lane_id);
boost::hash_combine(seed,
static_cast<float>(std::floor(waypoint.s * 200.0)));
return seed;
}
} // namespace std
In the header file, the developer extends the std namespace by declaring the explicit template specialization of the hash class in order to work with the carla::road::element::Waypoint type. In the file Waypoint.cpp, the developer adds the WaypointHash alias and the definition of the operator() function to the std namespace.
The C++ standard forbids extending the std namespace. The contents of the 'std' namespace are defined solely by the C++ Standards Committee and changed depending on the C++ language version. Modifying namespace's content may result in undefined behavior. However, adding an explicit or partial template specialization, as in the Waypoint.h file, is an exception. The V1061 diagnostic says that the definition of the operator() function in the Waypoint.cpp file is permitted, but the alias declaration in the std namespace is prohibited.
Actually, it is not necessary to extend the std namespace this way. It is enough to add the std::hash template specialization for a user type outside of std (yes, it is possible):
// Waypoint.h
// Not inside namespace "std"
template <>
struct std::hash<carla::road::element::Waypoint> {....};
// Waypoint.cpp
// Not inside namespace "std"
using WaypointHash = std::hash<CARLA::road::element::Waypoint>;
WaypointHash::result_type WaypointHash::operator()(
const WaypointHash::argument_type& waypoint) const {....}
Warning N10
I left one interesting error for last. I encourage you to find it yourself. Unlike the others, this error is from the engine of Unreal Engine 4 game itself.
virtual void visit(ir_variable *var)
{
....
const bool bBuiltinVariable = (var->name &&
strncmp(var->name, "gl_", 3) == 0);
if (bBuiltinVariable && ShaderTarget == vertex_shader &&
strncmp(var->name, "gl_InstanceID", 13) == 0)
{
bUsesInstanceID = true;
}
if (bBuiltinVariable &&
var->centroid == 0 && (var->interpolation == 0 ||
strncmp(var->name, "gl_Layer", 3) == 0) &&
var->invariant == 0 && var->origin_upper_left == 0 &&
var->pixel_center_integer == 0)
{
// Don't emit builtin GL variable declarations.
needs_semicolon = false;
}
else if (scope_depth == 0 && var->mode == ir_var_temporary)
{
global_instructions.push_tail(new(mem_ctx) global_ir(var));
needs_semicolon = false;
}
else {....}
....
}
Here are two hints for you:
V666 Consider inspecting third argument of the function 'strncmp'. It is possible that the value does not correspond with the length of a string which was passed with the second argument. GlslBackend.cpp 943
Error in the strncmp function call:
strncmp(var->name, "gl_Layer", 3)
As the third argument of the function the number of characters to compare is passed, and as the second one - a string literal. The analyzer database has an annotation of the standard strncmp function, which says that the number of characters should probably match the the string literal length. In addition, for earlier calls of the strncmp function, the number of characters did coincide with the length of the string literal. However, in the code snippet above, the function compares only part of the string. The check of
strncmp(var->name, "gl_Layer", 3) == 0
is senseless, since bBuiltinVariable already contains the result of the same check:
strncmp(var->name, "gl_", 3) == 0
Most likely, the function call should've looked like this:
strncmp(var->name, "gl_Layer", 8)
The CARLA simulator is not only an entertaining and useful Unreal Engine 4 project, but it's also a high-quality product. The use of static analysis decreases the time spent on application development and debugging, and function annotations help perform more accurate analysis. We thank authors of this wonderful project for the opportunity to study the source code.
You can read more about static analysis in video game development and view the top 10 software bugs here.
Like other C++ software tools, static code analyzers never stay still for long and are continuously evolving. You may find interesting our latest article on C++ tools evolution. Check it out!