Annotating functions and types in JSON format

Quick start
Ways to register the annotation file
Structure of the annotation file
Type annotations
Function annotations
JSON Schema
Examples

The user annotation mechanism lets you mark up types and functions in JSON format to provide the analyzer with additional information. With this information, the analyzer can find more errors in code. The mechanism works only for C and C++.

Quick start

Let's say the project must forbid calls to a certain function because they're unwanted:

void DeprecatedFunction(); // should not be used

void foo()
{
  DeprecatedFunction(); // unwanted call site
}

To make the analyzer issue the V2016 warning where this function is called, create a special JSON file with the following contents:

{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "DeprecatedFunction",
      "attributes": [ "dangerous" ]
    }
  ]
}

After that, we only need to register the file (other methods are listed below):

//V_PVS_ANNOTATIONS /path/to/project/annotations.json

void DeprecatedFunction();

void foo()
{
  DeprecatedFunction(); // <= V2016 will be issued here
}

Ways to register the annotation file

We can register JSON annotation files in the following ways:

Way N1. Write a special comment in the source code or in the diagnostic rules configuration file (.pvsconfig):

//V_PVS_ANNOTATIONS /path/to/annotations.json

If a relative path is specified, it's resolved relative to the directory that contains the file with the comment.

Way N2. Pass the --annotation-file (-A) flag to the pvs-studio-analyzer utility:

pvs-studio-analyzer --annotation-file=/path/to/annotations.json

Note. Multiple annotation files can be registered. Each file requires a separate flag or comment.

Structure of the annotation file

The file content is a JSON object consisting of two mandatory fields: version and annotations.

The version field takes an integer value and specifies the version of the mechanism. The markup file may be processed differently depending on this value. Currently, only one value is supported: 1.

The annotations field is an array of "annotation" objects:

{
  "version": 1,
  "annotations":
  [
    {
      ...
    },
    {
      ...
    }
  ]
}

Annotations can be of two types:

type annotations;
function annotations;

If an annotation is declared directly in the annotations array, it's a top-level annotation. Otherwise, it is a nested annotation.

Type annotations

The type annotation object consists of the following fields.

The "type" field

The mandatory field. It takes a string with one of the values: "record", "class", "struct", or "union". The last three options are aliases for "record" and have been added for convenience.

The "name" field

The mandatory field. It takes a string with the fully qualified name of an entity. The analyzer searches for this entity starting from the global scope. If the entity is in the global scope, the leading "::" can be omitted.

The "members" field

The optional field. An array of nested annotation objects.

The "attributes" field

The optional field. An array of strings that specifies the properties of an entity. Attributes available for type annotations are as follows:

Smart pointers

"unique_ptr" — the type has the std::unique_ptr interface;
"shared_ptr" — the type has the std::shared_ptr interface;
"auto_ptr" — the type has the std::auto_ptr interface;

Containers

"string" — the type has the std::basic_string interface;
"string_view" — the type has the std::basic_string_view interface;
"array" — the type has the std::array interface;
"vector" — the type has the std::vector interface;
"map" — the type has the std::map interface;
"set" — the type has the std::set interface;
"list" — the type has the std::list interface;
"unordered" — in combination with "set" or "map" sets the type interface to std::unordered_set or std::unordered_map, respectively.
"multi" — in combination with "set" or "map" sets the type interface to std::multiset or std::multimap, respectively. If "unordered" is included, the type is given the std::unordered_multiset or std::unordered_multimap semantics.
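The container attributes are meant to be combined. As a sketch, assume a hypothetical user type TagIndex that should be treated like std::unordered_multimap: it gets "map", "unordered", and "multi" together. A real project type would typically be a custom container; here it simply inherits the standard interface to keep the example short. The annotation is shown in the leading comment.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

/* Hypothetical annotation for the type below:
{
  "version": 1,
  "annotations": [
    {
      "type": "class",
      "name": "TagIndex",
      "attributes": [ "map", "unordered", "multi" ]
    }
  ]
}
*/

// A hypothetical user type with the std::unordered_multimap interface.
// "map" + "unordered" + "multi" describes exactly that interface.
class TagIndex : public std::unordered_multimap<std::string, int>
{
};
```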

Other types

"nullable" — the type has the semantics of a nullable type. Objects of these types can have one of two states: "valid" or "invalid". Accessing an object in the "invalid" state results in an error. Pointers and std::optional are examples of such types.

Semantics

"cheap_to_copy" — an object of the type can be passed to a function by copy with zero overhead;
"expensive_to_copy" — an object of the type should be passed to a function only by pointer/reference;
"copy_on_write" — the type has the copy-on-write semantics.
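To illustrate the copy semantics attributes, here's a sketch with two hypothetical types: Point, which is cheap to copy, and Buffer, which owns a potentially large heap allocation. The functions show the passing conventions the attributes describe (by value for "cheap_to_copy", by reference for "expensive_to_copy"). Exactly which diagnostics the analyzer issues when the convention is violated isn't specified on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/* Hypothetical annotation for the two types below:
{
  "version": 1,
  "annotations": [
    { "type": "struct", "name": "Point",  "attributes": [ "cheap_to_copy" ] },
    { "type": "class",  "name": "Buffer", "attributes": [ "expensive_to_copy" ] }
  ]
}
*/

// A small value type: copying it costs about as much as copying two ints.
struct Point
{
  int x;
  int y;
};

// Owns a heap allocation that may be large, so every copy duplicates the data.
class Buffer
{
public:
  explicit Buffer(std::size_t size) : data_(size, 0) {}
  std::size_t Size() const { return data_.size(); }

private:
  std::vector<unsigned char> data_;
};

// Point is passed by value, which "cheap_to_copy" permits.
int ManhattanLength(Point p)
{
  return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

// Buffer is passed by const reference, as "expensive_to_copy" requires.
std::size_t BufferSize(const Buffer &buf)
{
  return buf.Size();
}
```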

Function annotations

The function annotation object consists of the following fields:

The "type" field

The mandatory field. It takes the string "function". For nested function annotations (in the members field of type annotations), the value "ctor" is also available. It indicates that a constructor of the user type is being annotated.

The "name" field

It takes a string with a function name. The field is mandatory if type is "function"; if type is "ctor", the field should be omitted. The analyzer searches for the annotated entity by this name, starting from the global scope.

For top-level annotations, the fully qualified name is specified. For nested annotations, the unqualified name is specified.

If the function is in the global scope, the scope resolution operator ("::") at the beginning of the name can be omitted.

The "params" field

The optional field. An array of objects that describes formal parameters. Together with name, this field defines the function signature that the analyzer uses to match the annotation to a declaration in the code. For member functions, the analyzer also takes the qualifiers field into account.

Each object contains the following fields:

"type" (mandatory) — the type of a formal parameter as a string. For example, the first formal parameter of the memset function has the void * type, so that's what should be written in the string. Wildcard characters make it possible to skip unimportant parameters and cover several function overloads with a single annotation:
- The "*" character matches zero or more parameters of any type. It must be the last element in the list of parameters.
- The "?" character matches exactly one parameter of any type.
"attributes" (optional) — an array of strings that specifies the properties of a parameter. Possible parameter attributes are described below.

If the annotation is to be applied to all overloads regardless of the parameters, the field can be omitted:

// Code
void foo();      // dangerous
void foo(int);   // dangerous
void foo(float); // dangerous

// Annotation
{
  ....
  "type": "function",
  "name": "foo",
  "attributes": [ "dangerous" ]
  ....
}

If an overload that takes no parameters is needed, specify an empty array explicitly:

// Code
void foo();      // dangerous
void foo(int);   // ok
void foo(float); // ok

// Annotation
{
  ....
  "type": "function",
  "name": "foo",
  "attributes": [ "dangerous" ],
  "params": []
  ....
}
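The matching rules for the "params" field can be summarized as a small predicate. The sketch below is a hypothetical model written only from the rules stated above (ParamsMatch is an illustration name, not part of PVS-Studio). It compares type spellings as plain strings, whereas the analyzer presumably compares actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the documented "params" matching rules;
// this is NOT the analyzer's actual implementation.
//  - std::nullopt : "params" omitted -> every overload matches
//  - empty vector : "params": []      -> only the parameterless overload matches
//  - "?"          : exactly one parameter of any type
//  - "*"          : zero or more parameters of any type (must be last)
bool ParamsMatch(const std::optional<std::vector<std::string>> &pattern,
                 const std::vector<std::string> &signature)
{
  if (!pattern)
    return true;

  const std::vector<std::string> &pat = *pattern;
  for (std::size_t i = 0; i < pat.size(); ++i)
  {
    if (pat[i] == "*")
      return i + 1 == pat.size(); // "*" is only valid in the last position
    if (i >= signature.size())
      return false;               // the declaration has fewer parameters
    if (pat[i] != "?" && pat[i] != signature[i])
      return false;               // exact type mismatch
  }
  return pat.size() == signature.size(); // no unmatched trailing parameters
}
```

For instance, the pattern [ "int", "*" ] matches foo(int) and foo(int, double, char) but not foo(float).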

Possible parameter attribute values

#	Attribute name	Attribute description
1	immutable	The function does not modify the passed argument. For example, the printf function has side effects (printing to stdout) but does not modify the passed arguments.
2	not_null	It is valid only for nullable-type parameters. An argument in the "valid" state must be passed to the function.
3	unique_arg	The arguments passed should be different. For example, it doesn't make sense to pass two identical arguments to std::swap.
4	format_arg	The parameter denotes a format string. The analyzer checks the arguments according to the printf format specification.
5	pointer_to_free	The function releases the memory this pointer points to by calling free. The pointer can be null.
6	pointer_to_gfree	The function releases the memory this pointer points to by calling g_free. The pointer can be null.
7	pointer_to_delete	The function releases the memory this pointer points to by calling 'operator delete'. The pointer can be null.
8	pointer_to_delete[]	The function releases the memory this pointer points to by calling 'operator delete[]'. The pointer can be null.
9	pointer_to_unmap	The function releases the memory this pointer points to by calling 'munmap'. The pointer can be null.
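As an illustration of the parameter attributes, here's a sketch with a hypothetical pair of functions: CopyString duplicates a C string with malloc, and ReleaseString frees it. Both names are made up for this example. With the markup in the leading comment, the analyzer can presumably treat ReleaseString the way it treats free (for example, when tracking double frees), though the exact set of checks isn't specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

/* Hypothetical annotation for the two functions below:
{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "CopyString",
      "params": [ { "type": "const char *", "attributes": [ "not_null", "immutable" ] } ]
    },
    {
      "type": "function",
      "name": "ReleaseString",
      "params": [ { "type": "char *", "attributes": [ "pointer_to_free" ] } ]
    }
  ]
}
*/

// Allocates a copy of src with malloc; src must not be null and is not modified.
char *CopyString(const char *src)
{
  std::size_t len = std::strlen(src) + 1; // include the terminating '\0'
  char *dst = static_cast<char *>(std::malloc(len));
  if (dst != nullptr)
    std::memcpy(dst, src, len);
  return dst;
}

// Releases memory with free; like free itself, it accepts a null pointer.
void ReleaseString(char *ptr)
{
  std::free(ptr);
}
```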

The "returns" field

The optional field. An object in which only the attributes field (an array of strings) can be used to specify the attributes of the return value.

Possible attribute values of the returned result

#	Attribute name	Attribute description
1	not_null	The function always returns a nullable-type object in the "valid" state.
2	maybe_null	The function may return a nullable-type object in the "invalid" state. The returned object must be checked before it's dereferenced.
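Here's a sketch of a function whose result fits "maybe_null": the hypothetical FindFirst returns a pointer to a matching element or nullptr. The annotation in the leading comment uses the returns field. ValueOrDefault shows the kind of check the attribute asks for before dereferencing; both function names are illustrative only.

```cpp
#include <cassert>
#include <vector>

/* Hypothetical annotation for the function below:
{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "FindFirst",
      "returns": { "attributes": [ "maybe_null" ] }
    }
  ]
}
*/

// Returns a pointer to the first element equal to value, or nullptr if there is none.
const int *FindFirst(const std::vector<int> &values, int value)
{
  for (const int &v : values)
    if (v == value)
      return &v;
  return nullptr;
}

// The result is checked before it's dereferenced, as "maybe_null" requires.
int ValueOrDefault(const std::vector<int> &values, int value, int fallback)
{
  const int *found = FindFirst(values, value);
  return found != nullptr ? *found : fallback;
}
```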

The "template_params" field

The optional field. An array of strings that enables specifying the template parameters of the function. The field is required when template parameters are used in a function signature:

// Code
template <typename T1, class T2>
void MySwap(T1 &lhs, T2 &rhs);

// Annotation
{
  ....
  "type": "function",
  "template_params": [ "typename T1", "class T2" ],
  "name": "MySwap",
  "params": [
    { "type": "T1 &", "attributes": [ "unique_arg" ] },
    { "type": "T2 &", "attributes": [ "unique_arg" ] }
  ]
  ....
}

The "qualifiers" field

The optional field. It restricts the annotation to a member function with a specific set of cvref qualifiers. It's available only for nested annotations with the type field set to "function". Together with name and params, this field defines the signature of the non-static member function that the analyzer uses to match the annotation to a declaration in the code. The field takes an array of strings with the following possible values: "const", "volatile", "&", or "&&".

Example:

// Code
struct Foo
{
  void Bar();                // don't need to annotate this overload
  void Bar() const;          // want to annotate this overload
  void Bar() const volatile; // and this one
};

// Annotation
{
  ....
  "type": "record",
  "name": "Foo",
  "members": [
    {
      "type": "function",
      "name": "Bar",
      "qualifiers": [ "const" ]
    },
    {
      "type": "function",
      "name": "Bar",
      "qualifiers": [ "const", "volatile" ]
    }
  ]
  ....
}

If the annotation is to be applied to all qualified and unqualified versions, the field should be omitted:

// Code
struct Foo
{
  void Bar();       // want to annotate this overload
  void Bar() const; // and this one
};

// Annotation
{
  ....
  "type": "record",
  "name": "Foo",
  "members": [
    {
      "type": "function",
      "name": "Bar"
    }
  ]
  ....
}

If the annotation is to be applied only to the unqualified version, the field value should be an empty array:

// Code
struct Foo
{
  void Bar();       // want to annotate this overload
  void Bar() const; // but NOT this one
};

// Annotation
{
  ....
  "type": "record",
  "name": "Foo",
  "members": [
    {
      "type": "function",
      "name": "Bar",
      "qualifiers": []
    }
  ]
  ....
}
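The three cases for the "qualifiers" field can be modeled by a small predicate. The sketch below is hypothetical (QualifiersMatch is an illustration name, not the analyzer's code). It assumes that the order of strings in the array doesn't matter, since the field describes a set of qualifiers; the page doesn't state this explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the documented "qualifiers" matching rules;
// this is NOT the analyzer's actual implementation.
//  - std::nullopt : field omitted      -> qualified and unqualified versions match
//  - empty vector : "qualifiers": []   -> only the unqualified version matches
//  - otherwise    : the declaration must have exactly this set of qualifiers
bool QualifiersMatch(const std::optional<std::vector<std::string>> &pattern,
                     std::vector<std::string> declared)
{
  if (!pattern)
    return true;

  std::vector<std::string> expected = *pattern;
  // Assumption: [ "const", "volatile" ] and [ "volatile", "const" ] are equivalent.
  std::sort(expected.begin(), expected.end());
  std::sort(declared.begin(), declared.end());
  return expected == declared;
}
```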

The "attributes" field

The optional field. It's an array of strings that sets the properties of an entity.

Possible function and constructor attributes

#	Attribute name	Attribute description	Note
1	pure	The function is pure.	A function is pure when it has no side effects, does not modify the passed arguments, and the result of the function is the same when it's called with the same set of arguments.
2	noreturn	The function does not return control to the caller function.
3	nodiscard	The result of the function should be used.
4	nullable_uninitialized	A custom nullable-type constructor initializes an object in the "invalid" state.
5	nullable_initialized	The custom nullable-type constructor initializes the object in the "valid" state.
6	nullable_checker	The function checks the state of the user nullable type. If the function returns true, the object is considered to be in the "valid" state; otherwise, it is "invalid". The result of the function must be implicitly convertible to bool.
7	nullable_getter	The function accesses the internal data of the user nullable type. The object must be in the "valid" state.
8	dangerous	The function is dangerous, and the code must not call it.	It can also be used to mark a function as deprecated.

The table below shows which attributes are applicable to which kinds of function annotations:

#	Attribute	Free function	Constructor	Member function
1	pure	✓	✕	✓
2	noreturn	✓	✕	✓
3	nodiscard	✓	✓	✓
4	nullable_uninitialized	✕	✓	✓
5	nullable_initialized	✕	✓	✓
6	nullable_checker	✕	✕	✓
7	nullable_getter	✕	✕	✓
8	dangerous	✓	✓	✓
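If you generate or review annotation files with your own tooling, the applicability table can be encoded as a small lookup. The sketch below is hypothetical: Kind, Row, kApplicability, and IsApplicable are illustration names, not part of PVS-Studio, and the analyzer's own validation may behave differently. The rows mirror the table above.

```cpp
#include <cassert>
#include <string_view>

// Kinds of function annotations, matching the columns of the table above.
enum class Kind { FreeFunction, Constructor, MemberFunction };

struct Row
{
  std::string_view attribute;
  bool free_function;
  bool constructor;
  bool member_function;
};

// One row per attribute, copied from the applicability table.
constexpr Row kApplicability[] = {
  { "pure",                   true,  false, true },
  { "noreturn",               true,  false, true },
  { "nodiscard",              true,  true,  true },
  { "nullable_uninitialized", false, true,  true },
  { "nullable_initialized",   false, true,  true },
  { "nullable_checker",       false, false, true },
  { "nullable_getter",        false, false, true },
  { "dangerous",              true,  true,  true },
};

// Returns true if the attribute may be used for this kind of function annotation;
// unknown attribute names are reported as not applicable.
constexpr bool IsApplicable(std::string_view attribute, Kind kind)
{
  for (const Row &row : kApplicability)
  {
    if (row.attribute != attribute)
      continue;
    if (kind == Kind::FreeFunction)
      return row.free_function;
    if (kind == Kind::Constructor)
      return row.constructor;
    return row.member_function;
  }
  return false;
}

// The lookup also works at compile time.
static_assert(!IsApplicable("pure", Kind::Constructor));
```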

JSON Schema

The JSON Schema is bundled with the distribution and is also available online.

Examples

How to annotate a user nullable type

Let's say there is a user nullable type as follows:

constexpr struct MyNullopt { /* .... */ } my_nullopt;

template <typename T>
class MyOptional
{
public:
  MyOptional();
  MyOptional(MyNullopt);

  template <typename U>
  MyOptional(U &&val);

public:
  bool HasValue() const;

  T& Value();
  const T& Value() const;

private:
  /* implementation */
};

Code notes:

The default constructor and the constructor of the MyNullopt type initialize the object in the "invalid" state.
The constructor template that takes a parameter of the U&& type initializes the object in the "valid" state.
The HasValue member function checks the state of a nullable-type object. If the object is in the "valid" state, true is returned; otherwise, false. The function does not change the state of a nullable-type object.
The Value overloads return the underlying object. They don't change the state of a nullable-type object.

Then the annotation of the class and its member functions looks as follows:

{
  "version": 1,
  "annotations": [
    {
      "type": "class",
      "name": "MyOptional",
      "attributes": [ "nullable" ],
      "members": [
        {
          "type": "ctor",
          "attributes": [ "nullable_uninitialized" ],
          "params": []
        },
        {
          "type": "ctor",
          "attributes": [ "nullable_uninitialized" ],
          "params": [
            {
              "type": "MyNullopt"
            }
          ]
        },
        {
          "type": "ctor",
          "template_params": [ "typename U" ],
          "attributes": [ "nullable_initialized" ],
          "params": [
            {
              "type": "U &&"
            }
          ]
        },
        {
          "type": "function",
          "name": "HasValue",
          "attributes": [ "nullable_checker", "pure", "nodiscard" ]
        },
        {
          "type": "function",
          "name": "Value",
          "attributes": [ "nullable_getter", "nodiscard" ]
        }
      ]
    }
  ]
}
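The annotation above describes only declarations; the analyzer doesn't need the implementation. To make the "valid" and "invalid" states concrete, here's a minimal implementation sketch of the same interface, assuming std::optional as storage (the original leaves the implementation out). The defaulted second template parameter on the converting constructor is an addition of this sketch: it keeps the template from hijacking copies of MyOptional and MyNullopt arguments.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>
#include <utility>

constexpr struct MyNullopt { } my_nullopt{};

template <typename T>
class MyOptional
{
public:
  // Both constructors leave the object in the "invalid" state (no value stored).
  MyOptional() = default;
  MyOptional(MyNullopt) {}

  // Puts the object in the "valid" state by storing a T constructed from val.
  template <typename U,
            typename = std::enable_if_t<!std::is_same_v<std::decay_t<U>, MyOptional> &&
                                        !std::is_same_v<std::decay_t<U>, MyNullopt>>>
  MyOptional(U &&val) : value_(std::forward<U>(val)) {}

  // The "nullable_checker": true means "valid", false means "invalid".
  bool HasValue() const { return value_.has_value(); }

  // The "nullable_getter" overloads: they require the "valid" state, and
  // std::optional::value throws std::bad_optional_access otherwise.
  T &Value() { return value_.value(); }
  const T &Value() const { return value_.value(); }

private:
  std::optional<T> value_;
};
```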

How to add an "always valid" contract for a nullable-type function parameter

Suppose we have the following code:

namespace Foo
{
  template <typename CharT>
  size_t my_strlen(const CharT *ptr);
}

The Foo::my_strlen function has the following properties:

The first parameter must always be non-null, i.e., in the "valid" state.
The function is pure and does not modify anything.

Then the function annotation looks as follows:

{
  "version": 1,
  "annotations":
  [
    {
      "type": "function",
      "name": "Foo::my_strlen",
      "attributes": [ "pure" ],
      "template_params": [ "typename CharT" ],
      "params": [
        {
          "type": "const CharT *",
          "attributes": [ "not_null" ]
        }
      ]
    }
  ]
}

How to mark up a user-formatted I/O function

Let's say there is the Foo::LogAtExit function:

namespace Foo
{
  void LogAtExit(const char *, ...);
}

It's known that:

It takes a format string as its first parameter according to the printf specification. The argument must not be null.
Starting with the second parameter, the arguments matching the format string are passed.
The function does not modify the passed arguments.
The function does not return control after it is called.

The analyzer can check whether the passed arguments match the format string. It can also determine that the code after the function call is unreachable. To do this, we need to mark up the function as follows:

{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "Foo::LogAtExit",
      "attributes": [ "noreturn" ],
      "params": [
        {
          "type": "const char *",
          "attributes" : [ "format_arg", "not_null", "immutable" ]
        },
        {
          "type": "...",
          "attributes": [ "immutable" ]
        }
      ]
    }
  ]
}

How to use a wildcard character to annotate multiple overloads

Suppose that, in the previous example, a programmer added some overloads to the Foo::LogAtExit function:

namespace Foo
{
  void LogAtExit(const     char *fmt, ...);
  void LogAtExit(const  char8_t *fmt, ...);
  void LogAtExit(const  wchar_t *fmt, ...);
  void LogAtExit(const char16_t *fmt, ...);
  void LogAtExit(const char32_t *fmt, ...);
}

In this case, there's no need to annotate each overload separately. A single annotation with a wildcard character is enough:

{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "Foo::LogAtExit",
      "attributes": [ "noreturn" ],
      "params": [
        {
          "type": "?",
          "attributes" : [ "format_arg", "not_null", "immutable" ]
        },
        {
          "type": "...",
          "attributes": [ "immutable" ]
        }
      ]
    }
  ]
}

How to mark a function as dangerous (or deprecated)

Suppose there are two overloads of the Foo::Bar function:

namespace Foo
{
  void Bar(int i);
  void Bar(double d);
}

We need to forbid the first overload. To do this, mark up the function as follows:

{
  "version": 1,
  "annotations": [
    {
      "type": "function",
      "name": "Foo::Bar",
      "attributes": [ "dangerous" ],
      "params": [
        {
          "type": "int"
        }
      ]
    }
  ]
}