There aren't many things left in modern C++ that don't fit the "Don't pay for what you don't use" paradigm. One of them is dynamic_cast. In this article, we'll find out what's wrong with it, and after that — try to find an alternative.
Let's refresh our memory about the C++ basics. If you find that part boring, you can always skip it.
So, suppose we have the following class hierarchy:
struct Shape
{
virtual void Draw() const = 0;
};
struct Circle : Shape
{
void Draw() const override;
};
struct Triangle : Shape
{
void Draw() const override;
};
struct Rectangle : Shape
{
void Draw() const override;
};
According to the object-oriented paradigm, we can safely perform upcasting —expand a pointer to the child class object and turn it into a pointer to the base class:
#include <memory>
void foo(const Shape *);
void bar()
{
std::unique_ptr<Shape> circle = std::make_unique<Circle>();
std::unique_ptr<Shape> triangle = std::make_unique<Triangle>();
std::unique_ptr<Shape> rect = std::make_unique<Rectangle>();
foo(circle.get());
foo(triangle.get());
foo(rect.get());
}
Since the derived classes (Circle, Triangle, and Rectangle) are guaranteed to have an instance of the Shape base class, the compiler can perform such transformation without any additional actions.
What if we now want to turn the pointer to the base class into the pointer to the the derived class? Any class derived from Shape can hide beneath the pointer to it. We need to somehow understand at runtime what's actually stored under the pointer. And here comes the runtime type identification (RTTI).
RTTI is a special mechanism that allows us to determine the data type of a variable at runtime. RTTI is used every time you use the dynamic_cast and typeid operators. The C++ standard does not determine how exactly RTTI is implemented, and its implementation is "delegated" to the application binary interface (ABI). Most compilers store information in a virtual table (vtable). If you need details, you can look at the vtable implementation on the x86_64 platform.
To correctly downcast from the pointer to a base class to the pointer to the child class, let's use the dynamic_cast operator:
void Visit(const Shape *ptr)
{
if (auto circle = dynamic_cast<const Circle *>(ptr))
{
// do smth with circle
}
else if (auto triangle = dynamic_cast<const Triangle *>(ptr))
{
// do smth with triangle
}
else if (auto rect = dynamic_cast<const Rectangle *>(ptr))
{
// do smth with rect
}
}
Of course, dynamic_cast is not so common in user code, typeid is used even lesser. Someone may think that such operators are a sign of poor application design. But what about applications where these operations are forced measure and their number may be up to millions? Does looking into vtable drastically slow down the program? That's what we're going to talk about.
Static analyzers, as well as compilers, work with source code via its intermediate representation. Abstract syntax tree (AST) is one of them. In AST, nodes represent various language constructs.
A convenient way to implement AST in code is via class hierarchy. In the Clang compiler, for example, tree nodes are derived from such classes as Decl (interface for function and type declarations) and Stmt (interface for constructions).
Look at the example. Let's assume we have the synthetic code block which looks as follows:
{
for (size_t i = 0; i < 10; ++i) ;
if (true) ;
}
According to the grammar of C and C++, this is a compound statement (block) that contains the for loop and the if branch. If we try to implement this in the object-oriented paradigm, we'll have the following entities:
// Base class for all type of statements
struct Stmt { /* .... */ };
// Base class for all type of expressions
struct Expr { /* .... */ };
// Base class for all types of declarations
struct Decl { /* .... */ };
struct IfStmt : Stmt
{
const Expr& GetCondition() const noexcept;
const Stmt& GetBody() const noexcept;
const Stmt* GetElse() const noexcept;
// ....
};
struct ForStmt : Stmt
{
const Decl* GetInit() const noexcept;
const Expr* GetCondition() const noexcept;
const Expr* GetPostBodyExpr() const noexcept;
const Stmt& GetBody() const noexcept;
// ....
};
using StatementList = ....;
struct CompoundStmt : Stmt
{
auto begin() const noexcept { return stmts.begin(); }
auto end() const noexcept { return stmts.end(); }
private:
StatementList stmts;
// ....
};
We'll put all constructs located the block in a container (for example, std::vector). Since there are way more than one construct, we'll work with them via pointers to the Stmt base class. Now we can iterate over all constructs:
void foo(const CompoundStmt *compStmt)
{
for (auto stmt : compStmt)
{
// do smth
}
}
And finally, call our custom logic for every specialized node:
void Visit(const IfStmt &);
void Visit(const ForStmt &);
Here we face a problem: we need to know at runtime what construct lies under the pointer to Stmt and call the necessary overload:
void foo(const Stmt *stmt)
{
for (auto stmt : compStmt)
{
if (auto ifStmt = dynamic_cast<const IfStmt *>(stmt))
{
Visit(*ifStmt);
}
else if (auto forStmt = dynamic_cast<const ForStmt *>(stmt))
{
Visit(*forStmt);
}
}
}
However, there's a small detail because of which the code above may not compile. If the Stmt base class is not polymorphic, then dynamic_cast won't work its magic. To fix this issue, we need to add at least one virtual function. Like this:
struct Stmt { virtual void Dummy(); /* .... */ };
Now everything's fine and the code works in most cases. Until it turns out that RTTI is off in the project and there are reasons for it. For example, in GCC/Clang we can do it by passing the -fno-rtti flag; In MSVC — via /GR-. Note that disabling RTTI dies not affect the exception mechanism and dynamic dispatch.
By the way, while I was writing the article, I found an interesting survey on isocpp.org. Out of 2058 respondents, 14% said that they partly disable RTTI on their programs, and 18% completely disable it.
In conclusion, dynamic_cast is bad because:
The visitor design pattern can be one of the solutions. It eliminates the need to perform dynamic_cast at cost of one-two virtual function calls. However, let's see how we can do the other way.
The recipe is simple. We'll implement a field with the type information in the parent class and check it ourselves. This is how it looks:
struct Stmt
{
enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, .... };
const Type m_kind;
Stmt(Type kind) : m_kind { kind } { /* .... */ }
/* .... */
};
struct IfStmt : Stmt
{
IfStmt() : Stmt { Stmt::Type::IfStmt } { /* .... */ }
/* .... */
};
struct ForStmt : Stmt
{
ForStmt() : Stmt { Stmt::Type::ForStmt } { /* .... */ }
/* .... */
};
Then when creating an object of the child class, we'll write the class type in the m_kind field. Now we can check the type like this:
void foo(const Stmt *stmt)
{
if (stmt->m_kind == Stmt::Type::IfStmt)
{
auto ifStmt = static_cast<const IfStmt *>(stmt)
Visit(*ifStmt);
}
else if (stmt->m_kind == Stmt::Type::ForStmt)
{
auto forStmt = static_cast<const ForStmt *>(stmt)
Visit(*forStmt);
}
}
The pros of this approach are as follows:
However, there are cons as well:
After looking at the code, you may wonder: "Where is the analog of dynamic_cast here?" Of course, writing checks like that is not convenient. To fix it, let's add a bit of abstraction. Let's look at how it is implemented in PVS-Studio and LLVM.
Let's go back to our example:
struct Stmt
{
enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, .... };
Stmt(Type kind) : m_kind { kind } { /* .... */ }
Type Kind() const { return m_kind; }
private:
const Type m_kind;
/* .... */
};
Let's implement the IsA function template which will accept the pointer to an object and the intended type. Inside, this function will check for a null pointer and the needed kind.
template <typename T, typename Kind>
bool IsA(T *p, Kind kind) noexcept
{
return p && p->Kind() == kind;
}
Next, we'll implement a couple of struct templates without definition:
template <Stmt::Type K>
struct type_from_kind;
template <typename T>
struct kind_from_type;
The templates are needed to further specialize structs for each derived class. Thus, we will link the tree node type and the enumeration constant located within the class.
template <> struct type_from_kind<Stmt::IfStmt>
{
using type = IfStmt;
};
template <> struct kind_from_type<IfStmt>
{
static constexpr auto value = Stmt::Type::IfStmt;
};
Of course, writing such specializations for each class is not very great. Here the "macro magic" comes to the rescue.
#define MAKE_STMT_TRAITS(t, k) \
template <> struct type_from_kind<k> \
{ \
using type = t; \
}; \
template <> struct kind_from_type<t> \
{ \
static constexpr auto value = k; \
};
Then the definition of child classes will look like this:
struct IfStmt : Stmt { /* .... */ };
MAKE_STMT_TRAITS(IfStmt, Stmt::IfStmt)
struct ForStmt : Stmt { /* .... */ };
MAKE_STMT_TRAITS(ForStmt, Stmt::ForStmt)
Now we can implement our own dyn_cast that encapsulates static_cast with a check for the desired type via the IsA function:
template <typename To, typename From>
requires std::is_pointer_v<To> && std::convertible_to<From, To>
auto dyn_cast(From *p) noexcept
{
using ResultType = std::remove_cvref_t<std::remove_pointer_t<To>>;
return IsA(p, kind_from_type<ResultType>::value)
? static_cast<To>(p)
: nullptr;
}
Thanks to this approach, we can achieve consistency with the way when we just wrote dynamic_cast:
void foo(const Stmt *stmt)
{
for (auto stmt : compStmt)
{
if (auto ifStmt = dyn_cast<const IfStmt *>(stmt))
{
Visit(*ifStmt);
}
else if (auto forStmt = dyn_cast<const ForStmt *>(stmt))
{
Visit(*forStmt);
}
}
}
Now let's inspect another approach.
Let's go back to the Stmt struct:
struct Stmt
{
enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, .... };
Stmt(Type kind) : m_kind { kind } { /* .... */ }
Type Kind() const { return m_kind; }
private:
const Type m_kind;
/* .... */
};
We implement classof, a static member function, in each descendant class. It takes the pointer to the base class and checks it for a match with kind of the child class:
struct IfStmt : Stmt
{
static bool classof(const Stmt *stmt) noexcept
{
return stmt->Kind() == Stmt::Type::IfStmt;
}
/* .... */
};
struct ForStmt : Stmt
{
static bool classof(const Stmt *stmt) noexcept
{
return stmt->Kind() == Stmt::Type::ForStmt;
}
/* .... */
};
Next, in the IsA function we need to simply call classof from the type that the first template parameter passed to us:
template <typename To, typename From>
bool IsA(From *p) noexcept
{
using ResultType = std::remove_cvref_t<
std::remove_pointer_t< std::remove_cv_ref_t<To> >
>;
return ResultType::classof(p);
}
Then our dyn_cast will perform the same IsA check, as it the first method. Depending on whether it matches or not, either static_cast is performed to the desired type or a null pointer is returned:
template <typename To, typename From>
requires std::is_pointer_v<To> && std::convertible_to<From *, To>
auto dyn_cast(From *p) noexcept
{
using res_type = std::remove_cv_t <
std::remove_pointer_t< std::remove_cvref_t<To> >
>;
return IsA<res_type>(p) ? static_cast<To>(p) : nullptr;
}
When I talked about benchmarks at the conference, I mentioned only synthetic tests. I suspected this wouldn't be enough, and one of my teammates also noted this. At the conference I was told that without knowing the performance gain of a given option, we can't rush and rewrite the code according to it. In this article, I decided to make amends and give statistics on how much the analyzer core will slow down if dynamic_cast is returned.
A synthetic example looks like this (link to Quick C++ Benchmark).
#include <memory>
#include <vector>
#include <random>
struct Stmt_RTTI { virtual ~Stmt_RTTI() = default; };
struct IfStmt_RTTI : Stmt_RTTI { };
struct ForStmt_RTTI : Stmt_RTTI { };
struct Stmt_WithEnum
{
enum Type { IfStmt, ForStmt, DoStmt, WhileStmt };
Stmt_WithEnum(Type kind) : m_kind { kind } { }
virtual ~Stmt_WithEnum() = default;
Type Kind() const { return m_kind; }
private:
const Type m_kind;
};
namespace Solution1
{
template <Stmt_WithEnum::Type K>
struct type_from_kind;
template <typename T>
struct kind_from_type;
#define MAKE_STMT_TRAITS(t, k) \
template <> struct Solution1::type_from_kind<k> \
{ \
using type = t; \
}; \
template <> struct Solution1::kind_from_type<t> \
{ \
static constexpr auto value = k; \
};
template <typename T, typename Kind, typename ...Kinds>
bool IsA(T stmt, Kind kind, Kinds ...kinds) noexcept
{
return stmt &&
((stmt->Kind() == kind) || ... || (stmt->Kind() == kinds));
}
template <typename To, typename From>
requires std::is_pointer_v<To>
&& requires { static_cast<To>(std::declval<From *>()); }
auto dyn_cast(From *p) noexcept
{
using ResultType = std::remove_cvref_t<
std::remove_pointer_t< std::remove_cvref_t<To> >
>;
return IsA(p, kind_from_type<ResultType>::value)
? static_cast<To>(p)
: nullptr;
}
}
struct IfStmt_WithEnum : Stmt_WithEnum
{
IfStmt_WithEnum() : Stmt_WithEnum { Stmt_WithEnum::IfStmt } {}
static bool classof(const Stmt_WithEnum *p) noexcept
{
return p && p->Kind() == Stmt_WithEnum::IfStmt;
}
};
MAKE_STMT_TRAITS(IfStmt_WithEnum, Stmt_WithEnum::IfStmt)
struct ForStmt_WithEnum : Stmt_WithEnum
{
ForStmt_WithEnum() : Stmt_WithEnum { Stmt_WithEnum::ForStmt } {}
static bool classof(const Stmt_WithEnum *p) noexcept
{
return p && p->Kind() == Stmt_WithEnum::ForStmt;
}
};
MAKE_STMT_TRAITS(ForStmt_WithEnum, Stmt_WithEnum::ForStmt)
namespace Solution2
{
template <typename To, typename From>
bool IsA(From *p) noexcept
{
using ResultType = std::remove_cvref_t<
std::remove_pointer_t< std::remove_cvref_t<To> >
>;
return ResultType::classof(p);
}
template <typename To, typename From>
requires std::is_pointer_v<To>
&& requires { static_cast<To>(std::declval<From *>()); }
auto dyn_cast(From *p) noexcept
{
using ResultType = std::remove_cvref_t<
std::remove_pointer_t< std::remove_cvref_t<To> >
>;
return IsA<ResultType>(p) ? static_cast<To>(p) : nullptr;
}
}
std::unique_ptr<Stmt_RTTI> factory_1()
{
static std::mt19937_64 Generator { 0 };
std::uniform_int_distribution d { 0, 1 };
switch (d(Generator))
{
case 0:
return std::make_unique<IfStmt_RTTI>();
case 1:
return std::make_unique<ForStmt_RTTI>();
}
std::terminate();
}
std::unique_ptr<Stmt_WithEnum> factory_2()
{
static std::mt19937_64 Generator { 0 };
std::uniform_int_distribution d { 0, 1 };
switch (d(Generator))
{
case 0:
return std::make_unique<IfStmt_WithEnum>();
case 1:
return std::make_unique<ForStmt_WithEnum>();
}
std::terminate();
}
static void StmtRTTI_Benchmark(benchmark::State& state) {
std::vector<std::unique_ptr<Stmt_RTTI>> vec;
const auto size = 1'000'000u;
vec.reserve(size);
for (size_t i = 0; i < size; ++i)
{
vec.push_back(factory_1());
}
// Code inside this loop is measured repeatedly
for (auto _ : state)
{
for (const auto &stmt : vec)
{
if (auto ifStmt = dynamic_cast<const IfStmt_RTTI *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(ifStmt);
}
else if (auto forStmt = dynamic_cast<const ForStmt_RTTI *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(forStmt);
}
}
}
}
// Register the function as a benchmark
BENCHMARK(StmtRTTI_Benchmark);
static void StmtWithEnum_Benchmark_1(benchmark::State& state) {
std::vector<std::unique_ptr<Stmt_WithEnum>> vec;
const auto size = 1'000'000u;
vec.reserve(size);
for (size_t i = 0; i < size; ++i)
{
vec.push_back(factory_2());
}
// Code inside this loop is measured repeatedly
for (auto _ : state)
{
for (const auto &stmt : vec)
{
if (auto ifStmt =
Solution1::dyn_cast<const IfStmt_WithEnum *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(ifStmt);
}
else if (auto forStmt =
Solution1::dyn_cast<const ForStmt_WithEnum *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(forStmt);
}
}
}
}
BENCHMARK(StmtWithEnum_Benchmark_1);
static void StmtWithEnum_Benchmark_2(benchmark::State& state)
{
std::vector<std::unique_ptr<Stmt_WithEnum>> vec;
const auto size = 1'000'000u;
vec.reserve(size);
for (size_t i = 0; i < size; ++i)
{
vec.push_back(factory_2());
}
// Code inside this loop is measured repeatedly
for (auto _ : state)
{
for (const auto &stmt : vec)
{
if (auto ifStmt = Solution2::dyn_cast<
const IfStmt_WithEnum *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(ifStmt);
}
else if (auto forStmt =
Solution2::dyn_cast<const ForStmt_WithEnum *>(stmt.get()))
{
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(forStmt);
}
}
}
}
BENCHMARK(StmtWithEnum_Benchmark_2);
Results:
As you can see, the dynamic_cast option in 3-4 times slower than the two other options.
However, let's not rush to conclusions just yet. We need to see how PVS-Studio behaves on real projects.
Of course, we want to answer the question: does it give the analyzer any performance gain? Obviously, the analyzer spends most of its time not in pointer conversion functions, but, for example, in diagnostic rules.
We will take measurements on SelfTester – this is our own utility that takes a pool of 123 test projects and analyzes them with PVS-Studio. The utility simultaneously runs the analysis of 4 projects, each project is checked in 8 threads. We use real open source projects as test projects. As a result, an analyzer warning log is generated for each project. This log is compared with the reference log for the same project. After that, SelfTester creates a log comparison report in a form that is convenient for the developer to understand.
We ran SelfTester on a machine with the following configuration:
For the benchmark, we ran the analysis on projects twice. The first time we ran the tool with a core where our dyn_cast analog was changed to dynamic_cast. The second time — with our optimized option. Both times we ran the analysis on the up-to date version of the PVS-Studio core.
Results of measurements on 15 projects from the pool:
Project |
The project size, KLOC |
Time: dynamic_cast |
Time: check + static_cast |
---|---|---|---|
SObjectizer |
17.2 |
0:01:10 |
0:00:56 |
StrongDC |
102 |
0:00:46 |
0:00:41 |
Notepad++ |
111 |
0:02:48 |
0:02:55 |
WinMerge |
172 |
0:11:02 |
0:09:48 |
db_10 |
213 |
0:36:58 |
0:32:20 |
pcsx2 |
302 |
0:04:44 |
0:04:57 |
dosbox |
302 |
0:05:54 |
0:04:12 |
CamStudio |
327 |
0:09:49 |
0:08:34 |
Shareaza |
400 |
0:40:32 |
0:36:17 |
mpc-hc |
872 |
0:40:46 |
0:34:30 |
QtParts |
1361 |
0:03:31 |
0:02:35 |
miranda32 |
1811 |
0:32:03 |
0:27:07 |
awesome-hpp |
2196 |
0:12:01 |
0:11:37 |
ffdshow |
2213 |
0:36:23 |
0:34:08 |
Freeswitch |
3690 |
0:37:33 |
0:33:30 |
Then we tried to eliminate the statistical error a bit and repeated the experiment:
Project |
The project size, KLOC |
Time: dynamic_cast |
Time: check + static_cast |
---|---|---|---|
SObjectizer |
17.2 |
0:01:04 |
0:00:51 |
StrongDC |
102 |
0:00:49 |
0:00:47 |
Notepad++ |
111 |
0:03:11 |
0:02:58 |
WinMerge |
172 |
0:11:23 |
0:09:44 |
db_10 |
213 |
0:37:38 |
0:32:50 |
pcsx2 |
302 |
0:04:49 |
0:04:35 |
dosbox |
302 |
0:06:05 |
0:05:20 |
CamStudio |
327 |
0:10:19 |
0:09:15 |
Shareaza |
400 |
0:40:48 |
0:36:39 |
mpc-hc |
872 |
0:37:37 |
0:34:18 |
QtParts |
1361 |
0:03:52 |
0:03:05 |
miranda32 |
1811 |
0:33:04 |
0:31:28 |
awesome-hpp |
2196 |
0:12:09 |
0:11:18 |
ffdshow |
2213 |
0:39:32 |
0:36:04 |
Freeswitch |
3690 |
0:36:54 |
0:32:16 |
From benchmarks we can see that the larger the project's size, the stronger dynamic_cast affects the final time of the analysis. And for small projects that are quickly analyzed, the change is not so significant.
As a result, we can really say that for some projects, disabling RTTI can lead to a significant performance gain. After all, Clang compiles with -fno-rtti for a reason. However, don't get too hung up on this and fight RTTI everywhere. Sometimes the convenience of writing code can be more important than the performance gain (especially if it is only hypothetical).