To get a trial key
fill out the form below
Team License (standard version)
Enterprise License (extended version)
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Request our prices
New License
License Renewal
--Select currency--
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
To get the licence for your open-source project, please fill out this form
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
I am interested to try it on the platforms:
* By clicking this button you agree to our Privacy Policy statement

** This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Message submitted.

Your message has been sent. We will email you at

If you haven't received our response, please do the following:
check your Spam/Junk folder and click the "Not Spam" button for our message.
This way, you won't miss messages from our team in the future.

Interview with Anatoliy Kuznetsov, the …

Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library

Nov 08 2009

In this article, Anatoliy Kuznetsov answers the questions and tells us about the open BitMagic C++ Library.


While regularly looking through the Internet-resources related to the sphere of 64-bit programming, I often came across mentioning about BitMagic C++ Library and that it had gained a lot of benefits from using 64-bits. I decided to communicate with the library's author and offer him to tell us in an interview about his research and developments.

The questions are asked by: Andrey Karpov - "Program Verification Systems" company's worker developing PVS-Studio tool for verification of modern C++ applications.

The answers are given by: Anatoliy Kuznetsov - chief software engineer in NCBI; developer of the open library BitMagic C++ Library.

Hello, Anatoliy. Please, tell us about yourself. What projects are you involved in?

Hello Andrey,

I am chief software engineer, at present I am working in the team of searching and visualizing bio-molecular information in NCBI (National Center for Biotechnology Information). Besides my major activity, I am the chief developer and architect of the open library BitMagic C++ Library.

By education I am planning engineer, a graduate of the Lobachevskiy University in Nizhniy Novgorod.

What is BitMagic?

BitMagic was developed as a universal template library for working with compressed bit vectors. The library solves several tasks:

  • Provides a bit container which is really compatible with STL by ideology. It means that the container must support iterators, memory allocators and interact with algorithms and other STL containers.
  • The library can efficiently operate very long and sparse vectors.
  • Provides a possibility of serialization of vectors for further writing them into databases or sending by net.
  • A developer is provided with a set of algorithms for implementing set-theory operations and calculating distances and similarity metrics in multidimensional binary spaces.
  • Much consideration is given to optimization for the popular calculation acceleration systems, such as SSE.

In case of what tasks to be solved can BitMagic be of most interest for developers?

The library turned out to be rather universal and perhaps it wouldn't be easy to list all the possible ways to use it. At present, the library is of most interest in the following spheres:

  • Building of bit and inverted indexes for full-text search systems, acceleration of relational algebra operations (AND, OR, JOIN etc).
  • Development of non-standard extensions and indexes for existing databases (Oracle Cartridges, MS SQL extended stored procedures). As a rule, such extensions help integrate scientific, geographic and other non-standard data into the database.
  • Development of data mining algorithms.
  • Development of in-memory indexes and databases.
  • Development of systems of precise access differentiation with a large number of objects (security enhanced databases with differentiation of access to separate fields and columns).
  • Task management systems (on the computation cluster), systems of real-time tracing of task states, storage of task states described as Finite State Machines.
  • Tasks of representing and storage of strongly connected graphs.

What can you tell about the history of creating BitMagic library? What prompted you to create it?

For a long time, I and my colleagues had been working with the tasks related to large databases, analysis and visualization systems. The very first working version demonstrating bit vectors' abilities was shown by Maxim Shemanaryov (he is the developer of a wonderful 2D vector graphics library Antigrain Geometry: Then, some ideas of equivalent representation of sets were described by Koen Van Damm, an engineer from Europe who was working on the parsers of programming languages for verifying complex systems. There were other sources as well. I decided to systematize it all somehow and present in the form of a library suitable for multiple use in various projects.

What are the conditions of BitMagic library's distribution? Where can one download it?

The library is free for commercial and non-commercial use and is available in the form of source texts. The only restriction is the demand of mentioning the library and its authors when using it in the finite product.

You can see the materials here:

Am I right supposing that BitMagic gains significant advantages after being compiled in the 64-bit version?

Really, the library uses a series of optimization methods accelerating work in 64-bit systems or systems with SIMD commands (128-bit SSE2).

Here are the factors accelerating execution of algorithms:

  • a wide machine word (logical operations are performed over a wide word);
  • the programmer (and the compiler) has access to additional registers and lack of registers is not so crucial (there is such a disadvantage in x86 architecture);
  • memory alignment often accelerates operation (128-bit alignment of addresses provides a good result);
  • and of course the possibility to place more objects and data being processed in the memory of one program. This is a great plus of the 64-bit version clear to everyone.

At present, the quickest operation is available when using 128-bit SSE2 optimization in a 64-bit program. This mode combines the double number of x86 registers and the wide machine word to perform logical operations.

64-bit systems and programs are going through a real Renaissance. Migration of programs on 64-bits will be faster than moving from 16 to 32. Appearance of 64-bit versions of Windows on mass market and available toolkits (like the one your company is developing) will stimulate this process. In the environment of constant growth of systems' complexity and the size of code used in them, such a toolkit as PVS-Studio is a good help as it reduces efforts and forces release of products.

Tell us about the compression methods used in BitMagic, please

The current 3.6.0 version of the library uses several compression methods.

  • "Bitvectors" in memory are split into blocks. If a block is not occupied or is occupied fully, it is not allocated. That is, the programmer can set bits in a range very far from zero. Setting of bit 100,000,000 doesn't lead to an explosion in memory consumption which is often characteristic of vectors with two-dimensional linear model.
  • Blocks in memory can have an equivalent representation in the form of areas - gaps. Actually, this is a kind of RLE coding. Unlike RLE, our library doesn't lose the ability to execute logical operations or access random bits.
  • When serializing "bitvectors", a set of other methods is used: conversion into lists of integer numbers (representing nulls or ones) and list coding by Elias Gamma Coding method. When using these methods, we do lose the ability of random bit access but it is not so crucial for writing on the disk in comparison with the reduction of costs on storage and input-output.

Could you give some code examples demonstrating the use of BitMagic library?

One of the examples simply creates 2 vectors, initializes them and performs the logical operation AND. Further, the class enumerator is used for iteration and printing of the values saved in the vector.

#include <iostream>
#include "bm.h"
using namespace std;
int main(void)
    bm::bvector<>   bv;    
    bv[10] = true; bv[100] = true; bv[10000] = true;
    bm::bvector<>   bv2(bv);    
    bv2[10000] = false;
    bv &= bv2;
    bm::bvector<>::enumerator en = bv.first();
    bm::bvector<>::enumerator en_end = bv.end();
    for (; en < en_end; ++en) {
        cout << *en << endl;
    return 0;

The next example demonstrates serialization of vectors and use of compression mode.

#include <stdlib.h>
#include <iostream>
#include "bm.h"
#include "bmserial.h"
using namespace std;
// This procedure creates very dense bitvector.
// The resulting set will consists mostly from ON (1) bits
// interrupted with small gaps of 0 bits.
void fill_bvector(bm::bvector<>* bv)
    for (unsigned i = 0; i < MAX_VALUE; ++i) {
        if (rand() % 2500) {
void print_statistics(const bm::bvector<>& bv)
    bm::bvector<>::statistics st;
    cout << "Bits count:" << bv.count() << endl;
    cout << "Bit blocks:" << st.bit_blocks << endl;
    cout << "GAP blocks:" << st.gap_blocks << endl;
    cout << "Memory used:"<< st.memory_used << endl;
    cout << "Max.serialize mem.:" << 
            st.max_serialize_mem << endl << endl;;
unsigned char* serialize_bvector(
  bm::serializer<bm::bvector<> >& bvs, 
  bm::bvector<>& bv)
    // It is reccomended to optimize 
    // vector before serialization.
    bm::bvector<>::statistics st;
    cout << "Bits count:" << bv.count() << endl;
    cout << "Bit blocks:" << st.bit_blocks << endl;
    cout << "GAP blocks:" << st.gap_blocks << endl;
    cout << "Memory used:"<< st.memory_used << endl;
    cout << "Max.serialize mem.:" << 
             st.max_serialize_mem << endl;
    // Allocate serialization buffer.
    unsigned char*  buf = 
        new unsigned char[st.max_serialize_mem];
    // Serialization to memory.
    unsigned len = bvs.serialize(bv, buf, 0);
    cout << "Serialized size:" << len << endl << endl;
    return buf;
int main(void)
    bm::bvector<>   bv1;    
    bm::bvector<>   bv2;
   //  set DGAP compression mode ON
    // Prepare a serializer class 
    // for best performance it is best 
    // to create serilizer once and reuse it
    // (saves a lot of memory allocations)
    bm::serializer<bm::bvector<> > bvs;
    // next settings provide lowest serilized size 
    unsigned char* buf1 = serialize_bvector(bvs, bv1);
    unsigned char* buf2 = serialize_bvector(bvs, bv2);
    // Serialized bvectors (buf1 and buf2) now ready to be
    // saved to a database, file or send over a network.
    // ...
    // Deserialization.
    bm::bvector<>  bv3;
    // As a result of desrialization bv3 
    // will contain all bits from
    // bv1 and bv3:
    //   bv3 = bv1 OR bv2
    bm::deserialize(bv3, buf1);
    bm::deserialize(bv3, buf2);
    // After a complex operation 
    // we can try to optimize bv3.
    delete [] buf1;
    delete [] buf2;
    return 0;

What are your plans on developing BitMagic library?

We wish to implement some new vector compression methods with the ability of parallel data procession.

Due to mass release of Intel Core i5-i7-i9, it is rational to release the library's version for SSE 4.2. Intel company added some interesting features which can be efficiently used. The most interesting is the hardware support of bit number calculation (Population Count).

We are experimenting with nVidia CUDA and other GPGPU. Graphics cards allow you to perform integer and logical operations today - and their resources can be used for algorithms of working with sets and compression.

Popular related articles
The Evil within the Comparison Functions

Date: May 19 2017

Author: Andrey Karpov

Perhaps, readers remember my article titled "Last line effect". It describes a pattern I've once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want t…
PVS-Studio for Java

Date: Jan 17 2019

Author: Andrey Karpov

In the seventh version of the PVS-Studio static analyzer, we added support of the Java language. It's time for a brief story of how we've started making support of the Java language, how far we've co…
Free PVS-Studio for those who develops open source projects

Date: Dec 22 2018

Author: Andrey Karpov

On the New 2019 year's eve, a PVS-Studio team decided to make a nice gift for all contributors of open-source projects hosted on GitHub, GitLab or Bitbucket. They are given free usage of PVS-Studio s…
The Last Line Effect

Date: May 31 2014

Author: Andrey Karpov

I have studied many errors caused by the use of the Copy-Paste method, and can assure you that programmers most often tend to make mistakes in the last fragment of a homogeneous code block. I have ne…
Characteristics of PVS-Studio Analyzer by the Example of EFL Core Libraries, 10-15% of False Positives

Date: Jul 31 2017

Author: Andrey Karpov

After I wrote quite a big article about the analysis of the Tizen OS code, I received a large number of questions concerning the percentage of false positives and the density of errors (how many erro…
Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities

Date: Nov 21 2018

Author: Andrey Karpov

A brief description of technologies used in the PVS-Studio tool, which let us effectively detect a large number of error patterns and potential vulnerabilities. The article describes the implementati…
The Ultimate Question of Programming, Refactoring, and Everything

Date: Apr 14 2016

Author: Andrey Karpov

Yes, you've guessed correctly - the answer is "42". In this article you will find 42 recommendations about coding in C++ that can help a programmer avoid a lot of errors, save time and effort. The au…
The way static analyzers fight against false positives, and why they do it

Date: Mar 20 2017

Author: Andrey Karpov

In my previous article I wrote that I don't like the approach of evaluating the efficiency of static analyzers with the help of synthetic tests. In that article, I give the example of a code fragment…
Static analysis as part of the development process in Unreal Engine

Date: Jun 27 2017

Author: Andrey Karpov

Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in th…
How PVS-Studio Proved to Be More Attentive Than Three and a Half Programmers

Date: Oct 22 2018

Author: Andrey Karpov

Just like other static analyzers, PVS-Studio often produces false positives. What you are about to read is a short story where I'll tell you how PVS-Studio proved, just one more time, to be more atte…

Comments (0)

Next comments

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
This website uses cookies and other technology to provide you a more personalized experience. By continuing the view of our web-pages you accept the terms of using these files. If you don't want your personal data to be processed, please, leave this site.
Learn More →