Vladislav Stolyarov

Oct 13 2022

Tags:

#Cpp #Knowledge

Is there life without RTTI or How we wrote our own dynamic_cast

Oct 13 2022

Author: Vladislav Stolyarov

About dynamic_cast
Issue
Writing our own dynamic_cast
Benchmarks
- Synthetic example
- Real benchmark
Results

There aren't many things left in modern C++ that don't fit the "Don't pay for what you don't use" paradigm. One of them is dynamic_cast. In this article, we'll find out what's wrong with it, and after that — try to find an alternative.

About dynamic_cast

Let's refresh our memory about the C++ basics. If you find that part boring, you can always skip it.

So, suppose we have the following class hierarchy:

struct Shape
{
  virtual void Draw() const = 0;
};

struct Circle : Shape
{
  void Draw() const override;
};

struct Triangle : Shape
{
  void Draw() const override;
};

struct Rectangle : Shape
{
  void Draw() const override;
};

According to the object-oriented paradigm, we can safely perform upcasting —expand a pointer to the child class object and turn it into a pointer to the base class:

#include <memory>

void foo(const Shape *);

void bar()
{
  std::unique_ptr<Shape>   circle = std::make_unique<Circle>();
  std::unique_ptr<Shape> triangle = std::make_unique<Triangle>();
  std::unique_ptr<Shape>     rect = std::make_unique<Rectangle>();

  foo(circle.get());
  foo(triangle.get());
  foo(rect.get());
}

Since the derived classes (Circle, Triangle, and Rectangle) are guaranteed to have an instance of the Shape base class, the compiler can perform such transformation without any additional actions.

What if we now want to turn the pointer to the base class into the pointer to the the derived class? Any class derived from Shape can hide beneath the pointer to it. We need to somehow understand at runtime what's actually stored under the pointer. And here comes the runtime type identification (RTTI).

RTTI is a special mechanism that allows us to determine the data type of a variable at runtime. RTTI is used every time you use the dynamic_cast and typeid operators. The C++ standard does not determine how exactly RTTI is implemented, and its implementation is "delegated" to the application binary interface (ABI). Most compilers store information in a virtual table (vtable). If you need details, you can look at the vtable implementation on the x86_64 platform.

To correctly downcast from the pointer to a base class to the pointer to the child class, let's use the dynamic_cast operator:

void Visit(const Shape *ptr)
{
  if (auto circle = dynamic_cast<const Circle *>(ptr))
  {
    // do smth with circle
  }
  else if (auto triangle = dynamic_cast<const Triangle *>(ptr))
  {
    // do smth with triangle
  }
  else if (auto rect = dynamic_cast<const Rectangle *>(ptr))
  {
    // do smth with rect
  }
}

Of course, dynamic_cast is not so common in user code, typeid is used even lesser. Someone may think that such operators are a sign of poor application design. But what about applications where these operations are forced measure and their number may be up to millions? Does looking into vtable drastically slow down the program? That's what we're going to talk about.

Issue

Static analyzers, as well as compilers, work with source code via its intermediate representation. Abstract syntax tree (AST) is one of them. In AST, nodes represent various language constructs.

A convenient way to implement AST in code is via class hierarchy. In the Clang compiler, for example, tree nodes are derived from such classes as Decl (interface for function and type declarations) and Stmt (interface for constructions).

Look at the example. Let's assume we have the synthetic code block which looks as follows:

{
  for (size_t i = 0; i < 10; ++i) ;
  if (true) ;
}

According to the grammar of C and C++, this is a compound statement (block) that contains the for loop and the if branch. If we try to implement this in the object-oriented paradigm, we'll have the following entities:

// Base class for all type of statements
struct Stmt { /* .... */ };

// Base class for all type of expressions
struct Expr { /* .... */ };

// Base class for all types of declarations
struct Decl { /* .... */ };

struct IfStmt : Stmt
{
  const Expr& GetCondition() const noexcept;
  const Stmt& GetBody()      const noexcept;
  const Stmt* GetElse()      const noexcept;

  // ....
};

struct ForStmt : Stmt
{
  const Decl* GetInit()         const noexcept;
  const Expr* GetCondition()    const noexcept;
  const Expr* GetPostBodyExpr() const noexcept;
  const Stmt& GetBody()         const noexcept;

  // ....
};

using StatementList = ....;

struct CompoundStmt : Stmt
{
  auto begin() const noexcept { return stmts.begin(); }
  auto end() const noexcept { return stmts.end(); }
  
private:
  StatementList stmts;

  // ....
};

We'll put all constructs located the block in a container (for example, std::vector). Since there are way more than one construct, we'll work with them via pointers to the Stmt base class. Now we can iterate over all constructs:

void foo(const CompoundStmt *compStmt)
{
  for (auto stmt : compStmt)
  {
    // do smth
  }
}

And finally, call our custom logic for every specialized node:

void Visit(const IfStmt &);
void Visit(const ForStmt &);

Here we face a problem: we need to know at runtime what construct lies under the pointer to Stmt and call the necessary overload:

void foo(const Stmt *stmt)
{
  for (auto stmt : compStmt)
  {
    if (auto ifStmt = dynamic_cast<const IfStmt *>(stmt))
    {
      Visit(*ifStmt);
    }
    else if (auto forStmt = dynamic_cast<const ForStmt *>(stmt))
    {
      Visit(*forStmt);
    }
  }
}

However, there's a small detail because of which the code above may not compile. If the Stmt base class is not polymorphic, then dynamic_cast won't work its magic. To fix this issue, we need to add at least one virtual function. Like this:

struct Stmt { virtual void Dummy(); /* .... */ };

Now everything's fine and the code works in most cases. Until it turns out that RTTI is off in the project and there are reasons for it. For example, in GCC/Clang we can do it by passing the -fno-rtti flag; In MSVC — via /GR-. Note that disabling RTTI dies not affect the exception mechanism and dynamic dispatch.

By the way, while I was writing the article, I found an interesting survey on isocpp.org. Out of 2058 respondents, 14% said that they partly disable RTTI on their programs, and 18% completely disable it.

In conclusion, dynamic_cast is bad because:

It work only on polymorphic types.
It doesn't work with the -fno-rtti flag.
ABI-specific information about types is required for its work.
Downcasting works via the vtable view, and this is a slow operation.

The visitor design pattern can be one of the solutions. It eliminates the need to perform dynamic_cast at cost of one-two virtual function calls. However, let's see how we can do the other way.

Writing our own dynamic_cast

Set the stage

The recipe is simple. We'll implement a field with the type information in the parent class and check it ourselves. This is how it looks:

struct Stmt
{ 
  enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, ....  };
  const Type m_kind;
  Stmt(Type kind) : m_kind { kind } { /* .... */ }
  /* .... */
};

struct IfStmt : Stmt
{
  IfStmt() : Stmt { Stmt::Type::IfStmt } { /* .... */ }
  /* .... */
};

struct ForStmt : Stmt
{
  ForStmt() : Stmt { Stmt::Type::ForStmt } { /* .... */ }
  /* .... */
};

Then when creating an object of the child class, we'll write the class type in the m_kind field. Now we can check the type like this:

void foo(const Stmt *stmt)
{
  if (stmt->m_kind == Stmt::Type::IfStmt)
  {
    auto ifStmt = static_cast<const IfStmt *>(stmt)
    Visit(*ifStmt);
  }
  else if (stmt->m_kind == Stmt::Type::ForStmt)
  {
    auto forStmt = static_cast<const ForStmt *>(stmt)
    Visit(*forStmt);
  }
}

The pros of this approach are as follows:

Polymorphism of classes is now optional.
The code can be compiled without RTTI.
We control the size of the type information. We also know the additional information about the class — it's the size of enumeration.
Quick check and conversion via static_cast in theory is quicker than dynamic_cast.

However, there are cons as well:

We write more code.
With multiple inheritance, the logic of checks will not be that simple.
static_cast can't work with virtual inheritance, and the code won't compile. But how many people use virtual inheritance in 2022?

After looking at the code, you may wonder: "Where is the analog of dynamic_cast here?" Of course, writing checks like that is not convenient. To fix it, let's add a bit of abstraction. Let's look at how it is implemented in PVS-Studio and LLVM.

Make it neat. The PVS-Studio approach

Let's go back to our example:

struct Stmt
{ 
  enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, ....  };
  Stmt(Type kind) : m_kind { kind } { /* .... */ }
  Type Kind() const { return m_kind; }
private:
  const Type m_kind;
  /* .... */
};

Let's implement the IsA function template which will accept the pointer to an object and the intended type. Inside, this function will check for a null pointer and the needed kind.

template <typename T, typename Kind>
bool IsA(T *p, Kind kind) noexcept
{
  return p && p->Kind() == kind;
}

Next, we'll implement a couple of struct templates without definition:

template <Stmt::Type K>
struct type_from_kind;

template <typename T>
struct kind_from_type;

The templates are needed to further specialize structs for each derived class. Thus, we will link the tree node type and the enumeration constant located within the class.

template <> struct type_from_kind<Stmt::IfStmt>
{
  using type = IfStmt;
};

template <> struct kind_from_type<IfStmt>
{
  static constexpr auto value = Stmt::Type::IfStmt;
};

Of course, writing such specializations for each class is not very great. Here the "macro magic" comes to the rescue.

#define MAKE_STMT_TRAITS(t, k) \
  template <> struct type_from_kind<k> \
  { \
    using type = t; \
  }; \

  template <> struct kind_from_type<t> \
  { \
    static constexpr auto value = k; \
  };

Then the definition of child classes will look like this:

struct IfStmt : Stmt { /* .... */ };
MAKE_STMT_TRAITS(IfStmt, Stmt::IfStmt)

struct ForStmt : Stmt { /* .... */ };
MAKE_STMT_TRAITS(ForStmt, Stmt::ForStmt)

Now we can implement our own dyn_cast that encapsulates static_cast with a check for the desired type via the IsA function:

template <typename To, typename From>
  requires std::is_pointer_v<To> && std::convertible_to<From, To>
auto dyn_cast(From *p) noexcept
{
  using ResultType = std::remove_cvref_t<std::remove_pointer_t<To>>;
  return IsA(p, kind_from_type<ResultType>::value)
           ? static_cast<To>(p)
           : nullptr;
}

Thanks to this approach, we can achieve consistency with the way when we just wrote dynamic_cast:

void foo(const Stmt *stmt)
{
  for (auto stmt : compStmt)
  {
    if (auto ifStmt = dyn_cast<const IfStmt *>(stmt))
    {
      Visit(*ifStmt);
    }
    else if (auto forStmt = dyn_cast<const ForStmt *>(stmt))
    {
      Visit(*forStmt);
    }
  }
}

Now let's inspect another approach.

Make it neat. The LLVM approach

Let's go back to the Stmt struct:

struct Stmt
{ 
  enum Type { IfStmt, ForStmt, DoStmt, WhileStmt, ....  };
  Stmt(Type kind) : m_kind { kind } { /* .... */ }
  Type Kind() const { return m_kind; }
private:
  const Type m_kind;
  /* .... */
};

We implement classof, a static member function, in each descendant class. It takes the pointer to the base class and checks it for a match with kind of the child class:

struct IfStmt : Stmt
{
  static bool classof(const Stmt *stmt) noexcept
  {
    return stmt->Kind() == Stmt::Type::IfStmt;
  }
  /* .... */
};

struct ForStmt : Stmt
{
  static bool classof(const Stmt *stmt) noexcept
  {
    return stmt->Kind() == Stmt::Type::ForStmt;
  }
  /* .... */
};

Next, in the IsA function we need to simply call classof from the type that the first template parameter passed to us:

template <typename To, typename From>
bool IsA(From *p) noexcept
{
  using ResultType = std::remove_cvref_t<
    std::remove_pointer_t< std::remove_cv_ref_t<To> >
  >;

  return ResultType::classof(p);
}

Then our dyn_cast will perform the same IsA check, as it the first method. Depending on whether it matches or not, either static_cast is performed to the desired type or a null pointer is returned:

template <typename To, typename From>
  requires std::is_pointer_v<To> && std::convertible_to<From *, To>
auto dyn_cast(From *p) noexcept
{
  using res_type = std::remove_cv_t < 
    std::remove_pointer_t< std::remove_cvref_t<To> >
  >;
 
  return IsA<res_type>(p) ? static_cast<To>(p) : nullptr;
}

Benchmarks

When I talked about benchmarks at the conference, I mentioned only synthetic tests. I suspected this wouldn't be enough, and one of my teammates also noted this. At the conference I was told that without knowing the performance gain of a given option, we can't rush and rewrite the code according to it. In this article, I decided to make amends and give statistics on how much the analyzer core will slow down if dynamic_cast is returned.

Synthetic example

A synthetic example looks like this (link to Quick C++ Benchmark).

We define the Stmt_RTTI struct from which the IfStmt_RTTI and ForStmt_RTTI structs will be inherited. We will later convert them using dynamic_cast.
Let's define the struct of Stmt_WithEnum from which IfStmt_WithEnum and ForStmt_WithEnum will be inherited. We will convert them by checking with the IsA function, implemented in the two ways described above, and by converting using static_cast.
We form a vector of 1'000'000 smart pointers to the type Stmt_RTTI / Stmt_WithEnum, initialize them pseudo-randomly with inheritors.
Iterate over the vector and convert to one of the child types.

Benchmark code open icon

#include <memory>
#include <vector>
#include <random>

struct Stmt_RTTI { virtual ~Stmt_RTTI() = default; };

struct IfStmt_RTTI : Stmt_RTTI { };
struct ForStmt_RTTI : Stmt_RTTI { };

struct Stmt_WithEnum
{ 
  enum Type { IfStmt, ForStmt, DoStmt, WhileStmt };
  Stmt_WithEnum(Type kind) : m_kind { kind } { }
  virtual ~Stmt_WithEnum() = default;
  Type Kind() const { return m_kind; }

private:
  const Type m_kind;
};

namespace Solution1
{
  template <Stmt_WithEnum::Type K>
  struct type_from_kind;

  template <typename T>
  struct kind_from_type;

#define MAKE_STMT_TRAITS(t, k) \
  template <> struct Solution1::type_from_kind<k> \
  { \
    using type = t; \
  }; \
  template <> struct Solution1::kind_from_type<t> \
  { \
    static constexpr auto value = k; \
  };

  template <typename T, typename Kind, typename ...Kinds>
  bool IsA(T stmt, Kind kind, Kinds ...kinds) noexcept
  {
    return stmt &&
          ((stmt->Kind() == kind) || ... || (stmt->Kind() == kinds));
  }

  template <typename To, typename From>
       requires std::is_pointer_v<To>
    && requires { static_cast<To>(std::declval<From *>()); }
  auto dyn_cast(From *p) noexcept
  {
    using ResultType = std::remove_cvref_t<
      std::remove_pointer_t< std::remove_cvref_t<To> >
    >;

    return IsA(p, kind_from_type<ResultType>::value)
             ? static_cast<To>(p)
             : nullptr;
  }
}

struct IfStmt_WithEnum : Stmt_WithEnum
{
  IfStmt_WithEnum() : Stmt_WithEnum { Stmt_WithEnum::IfStmt } {} 

  static bool classof(const Stmt_WithEnum *p) noexcept
  {
    return p && p->Kind() == Stmt_WithEnum::IfStmt;
  }
};

MAKE_STMT_TRAITS(IfStmt_WithEnum, Stmt_WithEnum::IfStmt)

struct ForStmt_WithEnum : Stmt_WithEnum
{
  ForStmt_WithEnum() : Stmt_WithEnum { Stmt_WithEnum::ForStmt } {}

  static bool classof(const Stmt_WithEnum *p) noexcept
  {
    return p && p->Kind() == Stmt_WithEnum::ForStmt;
  }
};

MAKE_STMT_TRAITS(ForStmt_WithEnum, Stmt_WithEnum::ForStmt)

namespace Solution2
{
  template <typename To, typename From>
  bool IsA(From *p) noexcept
  {
    using ResultType = std::remove_cvref_t<
      std::remove_pointer_t< std::remove_cvref_t<To> >
    >;

    return ResultType::classof(p);
  }

  template <typename To, typename From>
       requires std::is_pointer_v<To>
    && requires { static_cast<To>(std::declval<From *>()); }
  auto dyn_cast(From *p) noexcept
  {
    using ResultType = std::remove_cvref_t<
      std::remove_pointer_t< std::remove_cvref_t<To> >
    >;

    return IsA<ResultType>(p) ? static_cast<To>(p) : nullptr;
  }
}

std::unique_ptr<Stmt_RTTI> factory_1()
{
  static std::mt19937_64 Generator { 0 };
  std::uniform_int_distribution d { 0, 1 };
  switch (d(Generator))
  {
  case 0:
    return std::make_unique<IfStmt_RTTI>();
  case 1:
    return std::make_unique<ForStmt_RTTI>();
  }

  std::terminate();
}

std::unique_ptr<Stmt_WithEnum> factory_2()
{
  static std::mt19937_64 Generator { 0 };
  std::uniform_int_distribution d { 0, 1 };
  switch (d(Generator))
  {
  case 0:
    return std::make_unique<IfStmt_WithEnum>();
  case 1:
    return std::make_unique<ForStmt_WithEnum>();
  }

  std::terminate();
}

static void StmtRTTI_Benchmark(benchmark::State& state) {
  std::vector<std::unique_ptr<Stmt_RTTI>> vec;
  const auto size = 1'000'000u;
  
  vec.reserve(size);
  for (size_t i = 0; i < size; ++i)
  {
    vec.push_back(factory_1());
  }

  // Code inside this loop is measured repeatedly
  for (auto _ : state)
  {
    for (const auto &stmt : vec)
    {
      if (auto ifStmt = dynamic_cast<const IfStmt_RTTI *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(ifStmt);
      }
      else if (auto forStmt = dynamic_cast<const ForStmt_RTTI *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(forStmt);
      }
    }
  }
}
// Register the function as a benchmark
BENCHMARK(StmtRTTI_Benchmark);

static void StmtWithEnum_Benchmark_1(benchmark::State& state) {
  std::vector<std::unique_ptr<Stmt_WithEnum>> vec;
  const auto size = 1'000'000u;
  
  vec.reserve(size);
  for (size_t i = 0; i < size; ++i)
  {
    vec.push_back(factory_2());
  }

  // Code inside this loop is measured repeatedly
  for (auto _ : state)
  {
    for (const auto &stmt : vec)
    {
      if (auto ifStmt = 
                 Solution1::dyn_cast<const IfStmt_WithEnum *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(ifStmt);
      }
      else if (auto forStmt = 
                 Solution1::dyn_cast<const ForStmt_WithEnum *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(forStmt);
      }
    }
  }
}
BENCHMARK(StmtWithEnum_Benchmark_1);

static void StmtWithEnum_Benchmark_2(benchmark::State& state) 
{
  std::vector<std::unique_ptr<Stmt_WithEnum>> vec;
  const auto size = 1'000'000u;
  
  vec.reserve(size);
  for (size_t i = 0; i < size; ++i)
  {
    vec.push_back(factory_2());
  }

  // Code inside this loop is measured repeatedly
  for (auto _ : state)
  {
    for (const auto &stmt : vec)
    {
      if (auto ifStmt = Solution2::dyn_cast<
                                     const IfStmt_WithEnum *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(ifStmt);
      }
      else if (auto forStmt = 
                 Solution2::dyn_cast<const ForStmt_WithEnum *>(stmt.get()))
      {
        // Make sure the variable is not optimized away by compiler
        benchmark::DoNotOptimize(forStmt);
      }
    }
  }
}
BENCHMARK(StmtWithEnum_Benchmark_2);

Results:

As you can see, the dynamic_cast option in 3-4 times slower than the two other options.

However, let's not rush to conclusions just yet. We need to see how PVS-Studio behaves on real projects.

Real benchmark

Of course, we want to answer the question: does it give the analyzer any performance gain? Obviously, the analyzer spends most of its time not in pointer conversion functions, but, for example, in diagnostic rules.

To measure, we changed our optimized conversion operations only in the function that performs conversions between the nodes of the tree and its specializations. Note that it's not the only entity that carries such functionality.

We will take measurements on SelfTester – this is our own utility that takes a pool of 123 test projects and analyzes them with PVS-Studio. The utility simultaneously runs the analysis of 4 projects, each project is checked in 8 threads. We use real open source projects as test projects. As a result, an analyzer warning log is generated for each project. This log is compared with the reference log for the same project. After that, SelfTester creates a log comparison report in a form that is convenient for the developer to understand.

We ran SelfTester on a machine with the following configuration:

Intel Core i7-9700k;
32 GB RAM;
Samsung SSD 970 EVO Plus 500GB as a system disk;
WDC WD10EZEX-22MFCA0, SelfTester is located here.

For the benchmark, we ran the analysis on projects twice. The first time we ran the tool with a core where our dyn_cast analog was changed to dynamic_cast. The second time — with our optimized option. Both times we ran the analysis on the up-to date version of the PVS-Studio core.

Results of measurements on 15 projects from the pool:

Project	The project size, KLOC	Time: dynamic_cast	Time: check + static_cast
SObjectizer	17.2	0:01:10	0:00:56
StrongDC	102	0:00:46	0:00:41
Notepad++	111	0:02:48	0:02:55
WinMerge	172	0:11:02	0:09:48
db_10	213	0:36:58	0:32:20
pcsx2	302	0:04:44	0:04:57
dosbox	302	0:05:54	0:04:12
CamStudio	327	0:09:49	0:08:34
Shareaza	400	0:40:32	0:36:17
mpc-hc	872	0:40:46	0:34:30
QtParts	1361	0:03:31	0:02:35
miranda32	1811	0:32:03	0:27:07
awesome-hpp	2196	0:12:01	0:11:37
ffdshow	2213	0:36:23	0:34:08
Freeswitch	3690	0:37:33	0:33:30

Then we tried to eliminate the statistical error a bit and repeated the experiment:

Project	The project size, KLOC	Time: dynamic_cast	Time: check + static_cast
SObjectizer	17.2	0:01:04	0:00:51
StrongDC	102	0:00:49	0:00:47
Notepad++	111	0:03:11	0:02:58
WinMerge	172	0:11:23	0:09:44
db_10	213	0:37:38	0:32:50
pcsx2	302	0:04:49	0:04:35
dosbox	302	0:06:05	0:05:20
CamStudio	327	0:10:19	0:09:15
Shareaza	400	0:40:48	0:36:39
mpc-hc	872	0:37:37	0:34:18
QtParts	1361	0:03:52	0:03:05
miranda32	1811	0:33:04	0:31:28
awesome-hpp	2196	0:12:09	0:11:18
ffdshow	2213	0:39:32	0:36:04
Freeswitch	3690	0:36:54	0:32:16

From benchmarks we can see that the larger the project's size, the stronger dynamic_cast affects the final time of the analysis. And for small projects that are quickly analyzed, the change is not so significant.

Results

As a result, we can really say that for some projects, disabling RTTI can lead to a significant performance gain. After all, Clang compiles with -fno-rtti for a reason. However, don't get too hung up on this and fight RTTI everywhere. Sometimes the convenience of writing code can be more important than the performance gain (especially if it is only hypothetical).

#Cpp #Knowledge