Webinar: Evaluation - 05.12
We have several smart pointers in C++ – 'std::unique_ptr', 'std::shared_ptr', 'std::weak_ptr'.
There are also non-standard smart pointers, for example in boost: intrusive_ptr, local_shared_ptr.
We published and translated this article with the copyright holder's permission. The author is Evgeny Shulgin, (email – izaronplatz@gmail.com). The article was originally published on Habr.
In this article we are discussing a new smart pointer type – static_ptr. It is most similar to std::unique_ptr without dynamic allocations.
std::unique_ptr<T> is a pointer that wraps a non-smart T* pointer. Every C++ developer has probably used this class.
The most popular reason to use this pointer is dynamic polymorphism.
If at a compiling stage we don't "know" the class of the object we will create in a certain execution point, so we won't know the value by which we should increment the stack pointer. Therefore, we can't create this object on the stack — we can only create it in the heap.
Suppose we have a polymorphic class IEngine and its children TSteamEngine, TRocketEngine, TEtherEngine. The object of "some IEngine child known at runtime" is std::unique_ptr<IEngine> in most cases. So, the memory for the object is allocated in the heap.
Figure 1. std::unique_ptr<IEngine> with the objects of different size
The heap allocations are for large objects (std::vector with lots of elements, etc.), while the stack is better for small objects.
On Linux, to get the stack size for a process, you can run the following method:
ulimit -s
It displays a low size by default. My systems had 8192 KiB = 8 MiB. While the heap memory enables you to consume gigabytes.
The allocation of too many small objects causes memory fragmentation and affects the CPU cache. You can use memory pool to prevent this.
How can we create an object similar to std::unique_ptr but completely on the stack?
The C++ library contains std::aligned_storage, which reserves raw memory on the stack. We can use this memory and placement new to make an object of the required class T. But don't forget to ensure that the memory size is not less than sizeof(T).
So, with the overhead of only a few unused bytes on the stack, we can create objects of the derived class.
I had intended to create a stack-only version of std::unique_ptr<T>, so I looked for ready-made implementations. The idea seemed to be staring me in the face.
After I thought up the words like stack_ptr, static_ptr, etc. and searched them on GitHub, I finally found a reasonable implementation in the ceph project, in ceph/static_ptr.h. I also discovered some useful ideas there. However, this class is not commonly used in the project, and the implementation has some significant blunders.
The implementation may look as follows: there is a buffer for an object (in the form of std::aligned_storage); and some data that allows us to properly handle the object: to call the destructor of the exact type that static_ptr currently contains.
Figure 2. sp::static_ptr<IEngine> with objects of different size (32-byte buffer)
In this chapter we discuss the step-by-step implementation and lots of its nuances.
I decided to put the static_ptr class inside namespace sp (from static pointer).
Implementations of containers, smart pointers and other things are generally some of the hardest programs on C++, because you should consider things that nobody checks in projects.
Suppose we want to call a move constructor to move bytes from one memory region to another. We could write this as follows:
template <typename T>
struct move_constructer
{
static void call(T *lhs, T *rhs)
{
new (lhs) T(std::move(*rhs));
}
};
// call `move_constructer<T>::call(dst, src);
But what to do, if the T class does not contain a move constructor?
If there is a chance that the T type has a move assignment operator, we can use it. Otherwise, we have to "break" the compilation.
The newer the C++ standard is, the easier it is to write code for these things. We get the following code (compiled in C++17):
template <typename T>
struct move_constructer
{
static void call(T *lhs, T *rhs)
{
if constexpr (std::is_move_constructible_v<T>)
{
new (lhs) T(std::move(*rhs));
}
else if constexpr ( std::is_default_constructible_v<T>
&& std::is_move_assignable_v<T>)
{
new (lhs) T();
*lhs = std::move(*rhs);
}
else
{
[]<bool flag = false>()
{
static_assert(flag, "move constructor disabled");
}();
}
}
};
(on the 10th line a static_assert compilation "break" occurs with a hack)
However, it is better to use the noexcept specifier when it is possible. In C++20, we get such code as simple as possible for now:
template <typename T>
struct move_constructer
{
static void call(T *lhs, T *rhs)
noexcept (std::is_nothrow_move_constructible_v<T>)
requires (std::is_move_constructible_v<T>)
{
new (lhs) T(std::move(*rhs));
}
static void call(T *lhs, T *rhs)
noexcept ( std::is_nothrow_default_constructible_v<T>
&& std::is_nothrow_move_assignable_v<T>)
requires ( !std::is_move_constructible_v<T>
&& std::is_default_constructible_v<T>
&& std::is_move_assignable_v<T>)
{
new (lhs) T();
*lhs = std::move(*rhs);
}
We can create the move_assigner structure in a similar way. We could also make copy_constructer and copy_assigner, but our implementation doesn't require them. In static_ptr, the copy constructor and copy assignment operator will be deleted (as in unique_ptr).
Although static_ptr can store any object, it is better to "know" the exact type of the object static_ptr contains. For example, this would help us call the destructor of this particular object and do other things.
Here's what I've come up with after a few tries— we need to use the ops structure:
struct ops
{
using binary_func = void(*)(void *dst, void *src);
using unary_func = void(*)(void *dst);
binary_func move_construct_func;
binary_func move_assign_func;
unary_func destruct_func;
};
And a couple of auxiliary functions to cast void* to T*...
template <typename T, typename Functor>
void call_typed_func(void *dst, void *src)
{
Functor::call(static_cast<T*>(dst), static_cast<T*>(src));
}
template <typename T>
void destruct_func(void *dst)
{
static_cast<T*>(dst)->~T();
}
And now we can set each T type to have our own copy of ops:
template <typename T>
static constexpr ops ops_for
{
.move_construct_func = &call_typed_func<T, move_constructer<T>>,
.move_assign_func = &call_typed_func<T, move_assigner<T>>,
.destruct_func = &destruct_func<T>,
};
using ops_ptr = const ops *;
Now static_ptr stores a reference to the ops_for<T>, where T is the class of the object. And static_ptr contains this object.
We can't copy static_ptr - we can only move it to another static_ptr. To choose a move we need to determine the type of both static_ptr objects.
Here's the implementation of the following method:
// moving objects using ops
static void move_construct(void *dst_buf, ops_ptr &dst_ops,
void *src_buf, ops_ptr &src_ops)
{
if (!src_ops && !dst_ops)
{
// both object are nullptr_t, do nothing
return;
}
else if (src_ops == dst_ops)
{
// objects have the same type, make move
(*src_ops->move_assign_func)(dst_buf, src_buf);
(*src_ops->destruct_func)(src_buf);
src_ops = nullptr;
}
else
{
// objects have different type
// delete the old object
if (dst_ops)
{
(*dst_ops->destruct_func)(dst_buf);
dst_ops = nullptr;
}
// construct the new object
if (src_ops)
{
(*src_ops->move_construct_func)(dst_buf, src_buf);
(*src_ops->destruct_func)(src_buf);
}
dst_ops = src_ops;
src_ops = nullptr;
}
}
Now we need to determine the default buffer size and the alignment, because std::aligned_storage requires these two values.
Obviously, the alignment of the derived class may surpass the alignment of the base class. Therefore, the alignment should be as maximum as possible. The std::max_align_t type helps us to do the following:
static constexpr std::size_t align = alignof(std::max_align_t);
My systems set it to 16, but some non-standard values are also possible.
And the memory from the malloc heap is also aligned to the maximum possible value by default.
The default buffer size can be set to 16 bytes or to sizeof(T), so we need to choose the one whose value will be larger.
template <typename T>
struct static_ptr_traits
{
static constexpr std::size_t buffer_size =
std::max(static_cast<std::size_t>(16), sizeof(T));
};
Obviously, we should add a specialization for our custom type, so objects of all derived classes could be stored. It's better to define a macro. It allows us to write code faster. We can create the macro to specify the buffer size for some class:
#define STATIC_PTR_BUFFER_SIZE(Tp, size) \
namespace sp \
{ \
template<> struct static_ptr_traits<Tp> \
{ \
static constexpr std::size_t buffer_size = size; \
}; \
}
// example:
STATIC_PTR_BUFFER_SIZE(IEngine, 1024)
However, this is not enough for the target size to be "inherited" by all children. We can add one more macro using the std::is_base_of class template:
#define STATIC_PTR_INHERITED_BUFFER_SIZE(Tp, size) \
namespace sp \
{ \
template <typename T> requires std::is_base_of_v<Tp, T> \
struct static_ptr_traits<T> \
{ \
static constexpr std::size_t buffer_size = size; \
}; \
}
// example:
STATIC_PTR_INHERITED_BUFFER_SIZE(IEngine, 1024)
Now we implement the class. It contains only two fields — a reference to ops and the buffer for an object:
template <typename Base> requires(!std::is_void_v<Base>)
class static_ptr
{
private:
static constexpr std::size_t buffer_size =
static_ptr_traits<Base>::buffer_size;
static constexpr std::size_t align = alignof(std::max_align_t);
// Struct for calling object's operators
// equals to `nullptr` when `buf_` contains no object
// equals to `ops_for<T>` when `buf_` contains a `T` object
ops_ptr ops_;
// Storage for underlying `T` object
// this is mutable so that `operator*` and `get()` can
// be marked const
mutable std::aligned_storage_t<buffer_size, align> buf_;
// ...
First, we implement the reset function, which deletes the object. This function is commonly used:
// destruct the underlying object
void reset() noexcept(std::is_nothrow_destructible_v<Base>)
{
if (ops_)
{
(ops_->destruct_func)(&buf_);
ops_ = nullptr;
}
}
Next, we implement basic constructors in the same way as std::unique_ptr:
// operators, ctors, dtor
static_ptr() noexcept : ops_ { nullptr } {}
static_ptr(std::nullptr_t) noexcept : ops_ { nullptr } {}
static_ptr& operator=(std::nullptr_t)
noexcept(std::is_nothrow_destructible_v<Base>)
{
reset();
return *this;
}
Now we implement move constructor and move assignment operator:
static_ptr(static_ptr &&rhs) : ops_ { nullptr }
{
move_construct(&buf_, ops_, &rhs.buf_, rhs.ops_);
}
static_ptr& operator=(static_ptr &&rhs)
{
move_construct(&buf_, ops_, &rhs.buf_, rhs.ops_);
return *this;
}
It is better, if we can accept static_ptr of other types. The other type should fit in the buffer and should be inherited from the current type:
template <typename Derived>
struct derived_class_check
{
static constexpr bool ok = sizeof(Derived) <= buffer_size
&& std::is_base_of_v<Base, Derived>;
};
We have to declare all instantiations as a 'friend' class:
// support static_ptr's conversions of different types
template <typename T> friend class static_ptr;
Then we need to rewrite the previous two functions as follows:
template <typename Derived = Base>
static_ptr(static_ptr<Derived> &&rhs)
requires(derived_class_check<Derived>::ok)
: ops_ { nullptr }
{
move_construct(&buf_, ops_, &rhs.buf_, rhs.ops_);
}
template <typename Derived = Base>
static_ptr& operator=(static_ptr<Derived> &&rhs)
requires(derived_class_check<Derived>::ok)
{
move_construct(&buf_, ops_, &rhs.buf_, rhs.ops_);
return *this;
}
The copy constructor is deleted:
static_ptr(const static_ptr &) = delete;
static_ptr& operator=(const static_ptr &) = delete;
The destructor destroys the object in the buffer:
~static_ptr()
{
reset();
}
To create an object in the buffer in-place, we implement the emplace function. The existing object will be destructed, a new one will be constructed in the buffer, and the pointer to ops will be updated:
// in-place (re)initialization
template <typename Derived = Base, typename ...Args>
Derived& emplace(Args &&...args)
noexcept(std::is_nothrow_constructible_v<Derived, Args...>)
requires(derived_class_check<Derived>::ok)
{
reset();
Derived* derived = new (&buf_) Derived(std::forward<Args>(args)...);
ops_ = &ops_for<Derived>;
return *derived;
}
Then we implement assessors functions in the same way as those that the std::unique_ptr contains:
// accessors
Base* get() noexcept
{
return ops_ ? reinterpret_cast<Base*>(&buf_) : nullptr;
}
const Base* get() const noexcept
{
return ops_ ? reinterpret_cast<const Base*>(&buf_) : nullptr;
}
Base& operator*() noexcept { return *get(); }
const Base& operator*() const noexcept { return *get(); }
Base* operator&() noexcept { return get(); }
const Base* operator&() const noexcept { return get(); }
Base* operator->() noexcept { return get(); }
const Base* operator->() const noexcept { return get(); }
operator bool() const noexcept { return ops_; }
};
And finally we implement the sp::make_static function similar to the std::make_unique and std::make_shared functions:
template <typename T, class ...Args>
static static_ptr<T> make_static(Args &&...args)
{
static_ptr<T> ptr;
ptr.emplace(std::forward<Args>(args)...);
return ptr;
}
Code is available on GitHub!
It's easier than you would think! I have written unit tests that have the lifetime of the inside objects of static_ptr.
This test also contains typical scenarios for static_ptr and behavior of the objects inside static_ptr objects.
For benchmarks I used the google/benchmark library. You can find the code in the repository.
I described two scenarios, each of them checks the std::unique_ptr and sp::static_ptr class templates:
In the first scenario, the sp::static_ptr should benefit from no dynamic allocation. In the second scenario, the sp::static_ptr should benefit from memory locality. Although, it is obvious that compilers are smart and can fix "bad" scenarios, depending on the optimization flags.
Let's run benchmark in Debug:
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
SingleUniquePointer 207 ns 207 ns 3244590
SingleStaticPointer 39.1 ns 39.1 ns 17474886
IteratingOverUniquePointers 3368 ns 3367 ns 204196
IteratingOverStaticPointers 1716 ns 1716 ns 397344
--------------------------------------------------------------------------------
And now, in Release:
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
SingleUniquePointer 14.5 ns 14.5 ns 47421573
SingleStaticPointer 3.57 ns 3.57 ns 197401957
IteratingOverUniquePointers 198 ns 198 ns 3573888
IteratingOverStaticPointers 195 ns 195 ns 3627462
--------------------------------------------------------------------------------
So, the sp::static_ptr implementation that is a stack-only analog of std::unique_ptr gains in performance.
0