Annotating C and C++ entities in JSON format
- Quick start
- Ways to register the annotation file
- Structure of the annotation file
- Type annotations
- Function annotations
- JSON Schema
- Examples
- How to annotate user nullable type
- How to add an "always valid" contract for the nullable-type function parameter
- How to mark up a user-formatted I/O function
- How to use a wildcard character to annotate multiple overloads
- How to mark a function as dangerous (or deprecated)
- How to mark a function as a source/sink of tainted data
The user annotation mechanism lets you mark up types and functions in JSON format to give the analyzer additional information. With this information, the analyzer can find more errors in code. The mechanism works only for C and C++.
Quick start
Suppose a project needs to forbid calls to an unwanted function:
void DeprecatedFunction(); // should not be used
void foo()
{
DeprecatedFunction(); // unwanted call site
}
For the analyzer to issue the V2016 warning wherever this function is called, create a JSON file with the following contents:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "DeprecatedFunction",
"attributes": [ "dangerous" ]
}
]
}
After that, register the file. One option is a special comment in the source code (all the options are described in the documentation on registering the annotation file):
//V_PVS_ANNOTATIONS, language: cpp, path: %path/to/annotations.json%
void DeprecatedFunction();
void foo()
{
DeprecatedFunction(); // <= V2016 will be issued here
}
Note. The V2016 diagnostic rule is disabled by default. In order for the analyzer to issue warnings, enable the diagnostic rule in the settings.
Ways to register the annotation file
You can learn more about how to enable the annotation file in this documentation.
Structure of the annotation file
The file content is a JSON object consisting of two mandatory fields: version and annotations.
The version field takes an integer-type value and specifies the version of the mechanism. Depending on the value, the markup file can be processed differently. Currently, only one value is supported — 1.
The annotations field is an array of "annotation" objects:
{
"version": 1,
"annotations":
[
{
...
},
{
...
}
]
}
Annotations can be of two types:
- type annotations;
- function annotations.
If an annotation is declared directly in the annotations array, it's a top-level annotation. Otherwise, it is a nested annotation.
Type annotations
The type annotation object consists of the following fields.
The "type" field
The mandatory field. It takes a string with one of the values: "record", "class", "struct", or "union". The last three options are aliases for "record" and have been added for convenience.
The "name" field
The mandatory field. It takes a string with the fully qualified name of an entity. The analyzer searches for this entity starting from the global scope. If the entity is in the global scope, the leading "::" can be omitted.
The "members" field
The optional field. An array of nested annotation objects.
The "attributes" field
The optional field. An array of strings that specifies the properties of an entity. Attributes available for type annotations are as follows:
Smart pointers
- "unique_ptr" — the type has the std::unique_ptr interface;
- "shared_ptr" — the type has the std::shared_ptr interface;
- "auto_ptr" — the type has the std::auto_ptr interface;
Containers
- "string" — the type has the std::basic_string interface;
- "string_view" — the type has the std::basic_string_view interface;
- "array" — the type has the std::array interface;
- "vector" — the type has the std::vector interface;
- "map" — the type has the std::map interface;
- "set" — the type has the std::set interface;
- "list" — the type has the std::list interface;
- "unordered" — in combination with "set" or "map" sets the type interface to std::unordered_set or std::unordered_map, respectively.
- "multi" — in combination with "set" or "map" sets the type interface to std::multiset or std::multimap, respectively. If "unordered" is included, the type is given the std::unordered_multiset or std::unordered_multimap semantics.
Other types
- "nullable" — the type has the semantics of a nullable type. Objects of these types can have one of two states: "valid" or "invalid". Accessing an object in the "invalid" state results in an error. Pointers and std::optional are examples of such types.
Semantics
- "cheap_to_copy" — an object of the type can be passed to a function by copy with zero overhead;
- "expensive_to_copy" — an object of the type should be passed to a function only by pointer/reference;
- "copy_on_write" — the type has the copy-on-write semantics.
Function annotations
The function annotation object consists of the following fields:
The "type" field
The mandatory field. It takes the string "function". For nested function annotations (in the members field of type annotations), the "ctor" value is also available. It indicates that a constructor of a custom type is being annotated.
The "name" field
It takes a string with a function name. The field is mandatory if type has the "function" value, otherwise it should be omitted. The analyzer searches for the annotated entity by this name, starting from the global scope.
For top-level annotations, the fully qualified name is specified. For nested annotations, the unqualified name is specified.
If the function is in the global scope, the scope resolution operator ("::") at the beginning of the name can be omitted.
The "params" field
The optional field. An array of objects that describes formal parameters. Along with name, this field specifies the signature of the function by which the analyzer compares the annotation with its declaration in the program code. In the case of member functions, the analyzer also considers the qualifiers field.
Each object contains the following fields:
- "type" (mandatory) — a type of a formal parameter as a string. For example, the first formal parameter of the memset function has the void * type. That's what should be written in the string. It's possible to omit unnecessary parameters and annotate several function overloads with a single annotation. For this purpose, use a wildcard character:
  - The "*" character stands for zero or more parameters of any type. It must be the last entry in the list of parameters.
  - The "?" character stands for exactly one parameter of any type.
- "attributes" (optional) — an array of strings that specifies the properties of a parameter. Possible parameter attributes are described below.
- "constraint" (optional) — an object that contains the data about the parameter constraints. If the analyzer detects the possible violation of constraints, a user gets the V1108 warning. The possible object fields are described further in the documentation.
If the annotation is to be applied to all overloads regardless of the parameters, the field can be omitted:
// Code
void foo(); // dangerous
void foo(int); // dangerous
void foo(float); // dangerous
// Annotation
{
....
"type": "function",
"name": "foo",
"attributes": [ "dangerous" ]
....
}
If an overload that takes no parameters is needed, specify an empty array explicitly:
// Code
void foo(); // dangerous
void foo(int); // ok
void foo(float); // ok
// Annotation
{
....
"type": "function",
"name": "foo",
"attributes": [ "dangerous" ],
"params": []
....
}
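The matching rules for "params" described above (an exact type string, "?" for one parameter, a trailing "*" for any remaining parameters, and an empty array for a parameterless overload) can be sketched in C++. This is only an illustration of the rules as stated in this section, not the analyzer's actual algorithm: the MatchesParams helper is hypothetical, and it compares type strings literally, whereas the analyzer presumably compares real types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the analyzer: returns true if the list of
// declared parameter types matches the "params" pattern of an annotation.
//   "?" stands for exactly one parameter of any type;
//   "*" stands for zero or more parameters and must be the last entry.
bool MatchesParams(const std::vector<std::string> &pattern,
                   const std::vector<std::string> &declared)
{
  for (std::size_t i = 0; i < pattern.size(); ++i)
  {
    if (pattern[i] == "*")
    {
      assert(i + 1 == pattern.size() && "\"*\" must be the last entry");
      return true; // the remaining declared parameters may be anything
    }
    if (i >= declared.size())
      return false; // the pattern expects more parameters than declared
    if (pattern[i] != "?" && pattern[i] != declared[i])
      return false; // concrete types are compared as plain strings here
  }
  // Without a trailing "*", the parameter counts must be equal.
  return pattern.size() == declared.size();
}
```

With this sketch, an empty pattern matches only `void foo()`, while the pattern `{ "?", "*" }` matches every overload that has at least one parameter.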
Possible parameter attribute values
# | Attribute name | Attribute description |
---|---|---|
1 | immutable | It indicates to the analyzer that the passed argument has not been modified after the function call. For example, the printf function has side effects (printing to stdout) but does not modify passed arguments. |
2 | not_null | It is valid only for nullable-type parameters. An argument in the "valid" state should be passed to the function. |
3 | unique_arg | The arguments passed should be different. For example, it doesn't make sense to pass two identical arguments to std::swap. |
4 | format_arg | The parameter denotes a format string. The analyzer checks the arguments according to the printf format specification. |
5 | pointer_to_free | A pointer by which memory is released in the function by using free. The pointer can be null. |
6 | pointer_to_gfree | A pointer by which memory is released in the function by using g_free. The pointer can be null. |
7 | pointer_to_delete | A pointer by which memory is released in the function by using 'operator delete'. The pointer can be null. |
8 | pointer_to_delete[] | A pointer by which memory is released in the function by using 'operator delete[]'. The pointer can be null. |
9 | pointer_to_unmap | A pointer by which memory is released in the function by using 'munmap'. The pointer can be null. |
10 | taint_source | Data returned via a parameter is from a tainted source. |
11 | taint_sink | Data passed via a parameter can lead to vulnerability exploitation if it is obtained from a tainted source. |
Possible fields of parameter constraints
All constraint fields are optional. A list of fields that set certain conditions of constraints is provided below.
Here are the fields that set the list of allowed and disallowed values of the parameter:
- The allowed field is an array of strings. It sets the list of allowed integral values that the function parameter can receive. Values that are not on this list are disallowed by default.
- The disallowed field is an array of strings. It sets the list of disallowed integral values that the function parameter can receive. Values that are not on this list are allowed by default.
Each string in the array is an interval from the minimum to the maximum bound, inclusive. An interval is written in the "x..y" format, where 'x' and 'y' are the left and right bounds, respectively. Either bound can be omitted: "x.." denotes the interval from 'x' to plus infinity, and "..y" denotes the interval from minus infinity to 'y'.
Here are examples of intervals:
- "0..10" is a string that sets the interval from 0 to 10, inclusively.
- "..10" is a string that sets the interval from minus infinity to 10, inclusively.
- "0.." is a string that sets the interval from 0 to plus infinity.
An array can contain multiple intervals. When the analyzer reads the intervals, it normalizes all intervals in the array. The process merges overlapping and adjacent intervals, placing them in ascending order.
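The parsing and normalization steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the analyzer's implementation: the Interval type, the ParseInterval and Normalize functions, and the use of the long long limits as infinity markers are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <string>
#include <vector>

// Hypothetical representation of one "x..y" interval; both bounds inclusive.
struct Interval
{
  long long lo;
  long long hi;
};

// Assumption: the limits of long long serve as "infinity" markers.
constexpr long long kMinusInf = std::numeric_limits<long long>::min();
constexpr long long kPlusInf  = std::numeric_limits<long long>::max();

// Parses "x..y", "x.." or "..y". An omitted bound becomes an infinity marker.
Interval ParseInterval(const std::string &text)
{
  const std::string::size_type dots = text.find("..");
  assert(dots != std::string::npos && "expected the \"x..y\" format");
  const std::string left  = text.substr(0, dots);
  const std::string right = text.substr(dots + 2);
  Interval result { left.empty()  ? kMinusInf : std::stoll(left),
                    right.empty() ? kPlusInf  : std::stoll(right) };
  assert(result.lo <= result.hi);
  return result;
}

// Sorts intervals in ascending order and merges overlapping or adjacent ones.
std::vector<Interval> Normalize(std::vector<Interval> intervals)
{
  std::sort(intervals.begin(), intervals.end(),
            [](const Interval &a, const Interval &b) { return a.lo < b.lo; });
  std::vector<Interval> merged;
  for (const Interval &cur : intervals)
  {
    // Adjacent means cur.lo == back.hi + 1; the first check avoids overflow.
    if (!merged.empty()
        && (merged.back().hi == kPlusInf || cur.lo <= merged.back().hi + 1))
      merged.back().hi = std::max(merged.back().hi, cur.hi);
    else
      merged.push_back(cur);
  }
  return merged;
}
```

For example, normalizing "5..10", "0..3", "4..4", and "20.." yields two intervals: "0..10" and "20..".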
If the allowed and disallowed fields are set at the same time, the analyzer subtracts the "disallowed" intervals from "allowed" to obtain a set of allowed values. If the values in the disallowed field completely cover the values in the allowed field, the analyzer issues the V019 warning.
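The subtraction of "disallowed" intervals from "allowed" ones can be sketched as below. This is an illustrative sketch only: the Interval type and the Subtract function are hypothetical, and both inputs are assumed to be normalized already (sorted, non-overlapping). An empty result corresponds to the case where the disallowed values completely cover the allowed ones.

```cpp
#include <cassert>
#include <vector>

// Hypothetical representation of one interval; both bounds are inclusive.
struct Interval
{
  long long lo;
  long long hi;
};

// Returns allowed minus disallowed. Both inputs are assumed to be normalized:
// sorted in ascending order and non-overlapping.
std::vector<Interval> Subtract(const std::vector<Interval> &allowed,
                               const std::vector<Interval> &disallowed)
{
  std::vector<Interval> result;
  for (Interval cur : allowed)
  {
    assert(cur.lo <= cur.hi);
    bool exhausted = false;
    for (const Interval &cut : disallowed)
    {
      if (cut.hi < cur.lo) continue; // cut lies entirely to the left
      if (cut.lo > cur.hi) break;    // cut lies entirely to the right
      if (cut.lo > cur.lo)
        result.push_back({ cur.lo, cut.lo - 1 }); // keep the part before cut
      if (cut.hi >= cur.hi) { exhausted = true; break; }
      cur.lo = cut.hi + 1;           // continue with the part after cut
    }
    if (!exhausted)
      result.push_back(cur);
  }
  return result;
}
```

For example, with "0..10" allowed and "3..5" disallowed, the resulting allowed set is "0..2" and "6..10".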
The "returns" field
The optional field. An object in which only the attributes field (an array of strings) can be used to specify the attributes of the return value.
Possible attribute values of the returned result
# | Attribute name | Attribute description |
---|---|---|
1 | not_null | The function always returns an object of a nullable-type in the "valid" state. |
2 | maybe_null | The function may return an object of a nullable-type in the "invalid" state. An object should be checked before dereferencing. |
3 | taint_source | The function may return data from a tainted source. |
The "template_params" field
The optional field. An array of strings that enables specifying the template parameters of the function. The field is required when template parameters are used in a function signature:
// Code
template <typename T1, class T2>
void MySwap(T1 &lhs, T2 &rhs);
// Annotation
{
....
"template_params": [ "typename T1", "class T2" ],
"name": "MySwap",
"params": [
{ "type": "T1 &", "attributes": [ "unique_arg" ] },
{ "type": "T2 &", "attributes": [ "unique_arg" ] }
]
....
}
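To show what the "unique_arg" attribute on both parameters is meant to catch, here is a small sketch with a possible body for MySwap. The function body and the SwapDemo function are assumptions made for illustration; the original only declares MySwap.

```cpp
#include <cassert>
#include <utility>

// Possible body for the MySwap template declared above (an assumption).
template <typename T1, class T2>
void MySwap(T1 &lhs, T2 &rhs)
{
  T1 tmp = std::move(lhs);
  lhs = std::move(rhs);
  rhs = std::move(tmp);
}

// Hypothetical call sites.
void SwapDemo()
{
  int a = 1, b = 2;

  MySwap(a, b); // distinct arguments: the values are exchanged
  assert(a == 2 && b == 1);

  MySwap(a, a); // the same argument twice: the call has no effect, which is
                // the kind of call the "unique_arg" attribute describes
  assert(a == 2);
}
```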
The "qualifiers" field
The optional field. It enables us to apply the annotation only to a member function with a specific set of cvref qualifiers. It's available only for nested annotations that have the type field set to "function". Along with name and params, the field specifies the signature of the non-static member function by which the analyzer compares the annotation with its declaration in the program code. The field takes an array of strings with the following possible values: "const", "volatile", "&", or "&&".
Example:
// Code
struct Foo
{
void Bar(); // don't need to annotate this overload
void Bar() const; // want to annotate this overload
void Bar() const volatile; // and this one
};
// Annotation
{
....
"type": "record",
"name": "Foo",
"members": [
{
"type": "function",
"name": "Bar",
"qualifiers": [ "const" ]
},
{
"type": "function",
"name": "Bar",
"qualifiers": [ "const", "volatile" ]
}
]
....
}
If the annotation is to be applied to all qualified and unqualified versions, the field should be omitted:
// Code
struct Foo
{
void Bar(); // want to annotate this overload
void Bar() const; // and this one
};
// Annotation
{
....
"type": "record",
"name": "Foo",
"members": [
{
"type": "function",
"name": "Bar",
}
]
....
}
If the annotation is to be applied only to the unqualified version, the field value should be an empty array:
// Code
struct Foo
{
void Bar(); // want to annotate this overload
void Bar() const; // but NOT this one
};
// Annotation
{
....
"type": "record",
"name": "Foo",
"members": [
{
"type": "function",
"name": "Bar",
"qualifiers": []
}
]
....
}
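Which overload a call resolves to depends on the cv-qualification of the object the member function is called on, so it helps to see the three variants of Bar side by side. In the sketch below, the string return values are added only to make the selected overload observable; the original declarations return void. That the analyzer ties an annotation to the overload selected this way is an assumption.

```cpp
#include <cassert>
#include <string>

struct Foo
{
  std::string Bar()                { return "Bar()"; }
  std::string Bar() const          { return "Bar() const"; }
  std::string Bar() const volatile { return "Bar() const volatile"; }
};

// Hypothetical check of which overload each call selects.
void CheckQualifierSelection()
{
  Foo plain;
  const Foo constant{};
  const volatile Foo cv{};

  assert(plain.Bar() == "Bar()");             // matches "qualifiers": []
  assert(constant.Bar() == "Bar() const");    // matches [ "const" ]
  assert(cv.Bar() == "Bar() const volatile"); // matches [ "const", "volatile" ]
}
```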
The "attributes" field
The optional field. It's an array of strings that sets the properties of an entity.
Possible function and constructor attributes
# | Attribute name | Attribute description | Note |
---|---|---|---|
1 | pure | The function is pure. | A function is pure when it has no side effects, does not modify the passed arguments, and the result of the function is the same when it's called with the same set of arguments. |
2 | noreturn | The function does not return control to the caller function. | |
3 | nodiscard | The result of the function should be used. | |
4 | nullable_uninitialized | A custom nullable-type member function puts the object in the "invalid" state. | |
5 | nullable_initialized | A custom nullable-type member function puts the object in the "valid" state. | |
6 | nullable_checker | The function checks the state of the user nullable type. If the function returns true, the object is considered to be in a "valid" state; if not, it is "invalid". The result of the function is to be implicitly converted to the bool type. | |
7 | nullable_getter | The function performs access to internal data of the user nullable type. The object must be in the "valid" state. | |
8 | dangerous | The function is marked as dangerous, and the program code must not contain its call. | It can also be used to mark a function as deprecated. In order for the analyzer to issue warnings, enable the V2016 diagnostic rule in the settings. |
An applicability table of different attributes with function annotations is below:
# | Attribute | Free function | Constructor | Member function |
---|---|---|---|---|
1 | pure | ✓ | ✕ | ✓ |
2 | noreturn | ✓ | ✕ | ✓ |
3 | nodiscard | ✓ | ✓ | ✓ |
4 | nullable_uninitialized | ✕ | ✓ | ✓ |
5 | nullable_initialized | ✕ | ✓ | ✓ |
6 | nullable_checker | ✕ | ✕ | ✓ |
7 | nullable_getter | ✕ | ✕ | ✓ |
8 | dangerous | ✓ | ✕ | ✓ |
JSON Schema
JSON Schema comes bundled with the distribution or is available at the link.
Examples
How to annotate user nullable type
Let's say there is a user nullable type as follows:
constexpr struct MyNullopt { /* .... */ } my_nullopt;
template <typename T>
class MyOptional
{
public:
MyOptional();
MyOptional(MyNullopt);
template <typename U>
MyOptional(U &&val);
public:
bool HasValue() const;
T& Value();
const T& Value() const;
private:
/* implementation */
};
Code notes:
- The default constructor and the constructor that takes MyNullopt initialize the object in the "invalid" state.
- The constructor template that takes a parameter of the U&& type initializes the object in the "valid" state.
- The HasValue member function checks the state of a nullable-type object. If the object is in the "valid" state, true is returned; otherwise, false. The function does not change the state of a nullable-type object.
- Overloads of Value member functions return the underlying object. Functions do not change the state of a nullable-type object.
Then the annotation of the class and its member functions looks as follows:
{
"version": 1,
"annotations": [
{
"type": "class",
"name": "MyOptional",
"attributes": [ "nullable" ],
"members": [
{
"type": "ctor",
"attributes": [ "nullable_uninitialized" ]
},
{
"type": "ctor",
"attributes": [ "nullable_uninitialized" ],
"params": [
{
"type": "MyNullopt"
}
]
},
{
"type": "ctor",
"template_params": [ "typename U" ],
"attributes": [ "nullable_initialized" ],
"params": [
{
"type": "U &&val"
}
]
},
{
"type": "function",
"name": "HasValue",
"attributes": [ "nullable_checker", "pure", "nodiscard" ]
},
{
"type": "function",
"name": "Value",
"attributes": [ "nullable_getter", "nodiscard" ]
}
]
}
]
}
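To make the annotated states concrete, here is a minimal stand-in for MyOptional with a checked and an unchecked access. The storage strategy (a flag plus a default-constructed T), the run-time asserts in Value, and the ReadOrDefault and ReadUnchecked functions are assumptions made for this sketch. Which diagnostic the analyzer reports for the unchecked access is not stated in this section.

```cpp
#include <cassert>
#include <utility>

struct MyNullopt {};
constexpr MyNullopt my_nullopt{};

// Minimal stand-in for the MyOptional interface shown above.
// Assumption: T is default-constructible, so plain storage is enough here.
template <typename T>
class MyOptional
{
public:
  MyOptional() : m_has(false), m_val() {}          // nullable_uninitialized
  MyOptional(MyNullopt) : m_has(false), m_val() {} // nullable_uninitialized
  template <typename U>
  MyOptional(U &&val)                              // nullable_initialized
    : m_has(true), m_val(std::forward<U>(val)) {}

public:
  bool HasValue() const { return m_has; }          // nullable_checker
  T& Value() { assert(m_has); return m_val; }      // nullable_getter
  const T& Value() const { assert(m_has); return m_val; }

private:
  bool m_has;
  T m_val;
};

// Checked access: Value() is reached only in the "valid" state.
int ReadOrDefault(const MyOptional<int> &opt, int fallback)
{
  return opt.HasValue() ? opt.Value() : fallback;
}

// Unchecked access: the object may be in the "invalid" state here, which is
// the situation the "nullable_getter" attribute lets the analyzer reason about.
int ReadUnchecked(const MyOptional<int> &opt)
{
  return opt.Value();
}
```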
How to add an "always valid" contract for the nullable-type function parameter
Consider the following code:
namespace Foo
{
template <typename CharT>
size_t my_strlen(const CharT *ptr);
}
The Foo::my_strlen function has the following properties:
- The first parameter must always be non-null, i.e., in the "valid" state.
- The function is pure and does not modify anything.
Then the function annotation looks as follows:
{
"version": 1,
"annotations":
[
{
"type": "function",
"name": "Foo::my_strlen",
"attributes": [ "pure" ],
"template_params": [ "typename CharT" ],
"params": [
{
"type": "const CharT *",
"attributes": [ "not_null" ]
}
]
}
]
}
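For illustration, here is a possible body for Foo::my_strlen that honors the annotated contract, plus a caller that checks its pointer before the call. The function body, the run-time assert, and the SafeLength helper are assumptions made for this sketch; the original only declares the function.

```cpp
#include <cassert>
#include <cstddef>

namespace Foo
{
  // Possible implementation (an assumption): ptr must not be null,
  // and nothing is modified.
  template <typename CharT>
  std::size_t my_strlen(const CharT *ptr)
  {
    assert(ptr != nullptr); // the "not_null" contract, checked at run time
    std::size_t len = 0;
    while (ptr[len] != CharT())
      ++len;
    return len;
  }
}

// Hypothetical caller: name may be null, so it is checked before the call.
// Passing name to Foo::my_strlen without this check is the kind of call
// the "not_null" attribute describes as a contract violation.
std::size_t SafeLength(const char *name)
{
  return name != nullptr ? Foo::my_strlen(name) : 0;
}
```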
How to mark up a user-formatted I/O function
Let's say there is the Foo::LogAtError function:
namespace Foo
{
void LogAtError(const char *, ...);
}
It's known that:
- It takes a format string as its first parameter according to the printf specification. The argument must not be null.
- The arguments matching the format string, starting with the second one, are passed.
- The function does not modify the passed arguments.
- The function does not return control after it is called.
The analyzer can check if the passed arguments match the format string. Also, it can determine that the code is unreachable after calling the function. To do this, we need to mark up the function as follows:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "Foo::LogAtError",
"attributes": [ "noreturn" ],
"params": [
{
"type": "const char *",
"attributes" : [ "format_arg", "not_null", "immutable" ]
},
{
"type": "...",
"attributes": [ "immutable" ]
}
]
}
]
}
How to use a wildcard character to annotate multiple overloads
Suppose there is a similar function, Foo::LogAtExit, that has several overloads:
namespace Foo
{
void LogAtExit(const char *fmt, ...);
void LogAtExit(const char8_t *fmt, ...);
void LogAtExit(const wchar_t *fmt, ...);
void LogAtExit(const char16_t *fmt, ...);
void LogAtExit(const char32_t *fmt, ...);
}
In this case, it's not necessary to write annotations for all overloads. One, using the wildcard character, is enough:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "Foo::LogAtExit",
"attributes": [ "noreturn" ],
"params": [
{
"type": "?",
"attributes" : [ "format_arg", "not_null", "immutable" ]
},
{
"type": "...",
"attributes": [ "immutable" ]
}
]
}
]
}
How to mark a function as dangerous (or deprecated)
Suppose there are two overloads of the Foo::Bar function:
namespace Foo
{
void Bar(int i);
void Bar(double d);
}
We need to forbid the first overload. To do this, mark up the function as follows:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "Foo::Bar",
"attributes": [ "dangerous" ],
"params": [
{
"type": "int"
}
]
}
]
}
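Because only the overload that takes int is annotated, it matters which overload a given call resolves to. The sketch below shows this with stand-in bodies; the string return values are added only to make the selected overload observable (the original functions return void). That the analyzer reports calls according to the resolved overload is an assumption.

```cpp
#include <cassert>
#include <string>

namespace Foo
{
  // Stand-in bodies that report which overload was selected.
  std::string Bar(int)    { return "Bar(int)"; }    // annotated as dangerous
  std::string Bar(double) { return "Bar(double)"; } // not annotated
}

// Hypothetical check of standard overload resolution for several arguments.
void CheckOverloadSelection()
{
  assert(Foo::Bar(1)    == "Bar(int)");    // exact match
  assert(Foo::Bar('a')  == "Bar(int)");    // char is promoted to int
  assert(Foo::Bar(2.0)  == "Bar(double)"); // exact match
  assert(Foo::Bar(1.5f) == "Bar(double)"); // float is promoted to double
}
```

Under this assumption, the first two calls would be affected by the annotation, and the last two would not.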
How to mark a function as a source/sink of tainted data
Let's say there is a function that returns external data via the out parameter and return value.
std::string ReadStrFromStream(std::istream &input, std::string &str)
{
....
input >> str;
return str;
....
}
To mark the function as a source of tainted data, do the following:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "ReadStrFromStream",
"params": [
{
"type": "std::istream &input"
},
{
"type": "std::string &str",
"attributes": [ "taint_source" ]
}
],
"returns": { "attributes": [ "taint_source" ] }
}
]
}
Let's assume there is a function where some vulnerability can be exploited if tainted data is put into it.
void DoSomethingWithData(std::string &str)
{
.... // Some vulnerability
}
To mark the function as a sink of tainted data, add the following annotation:
{
"version": 1,
"annotations": [
{
"type": "function",
"name": "DoSomethingWithData",
"params": [ { "type": "std::string &str",
"attributes": [ "taint_sink" ] }]
}
]
}
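The two annotations describe the two ends of a data flow. The sketch below puts them together. The function bodies, the IsAlphanumeric check, the Process function, and the g_sinkCalls counter are all assumptions made for illustration. Whether the analyzer treats such a check as making the data safe is not covered in this section.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Possible body for the annotated source: fills str and also returns it.
std::string ReadStrFromStream(std::istream &input, std::string &str)
{
  input >> str;
  return str;
}

int g_sinkCalls = 0; // counts sink calls, only to make the sketch observable

// Stand-in for the annotated sink.
void DoSomethingWithData(std::string &str)
{
  assert(!str.empty()); // this stand-in relies on validated input
  ++g_sinkCalls;
}

// Hypothetical validation step; what counts as safe depends on the sink.
bool IsAlphanumeric(const std::string &str)
{
  for (unsigned char ch : str)
    if (!std::isalnum(ch))
      return false;
  return !str.empty();
}

// Hypothetical flow from the source to the sink.
bool Process(std::istream &input)
{
  std::string data;
  ReadStrFromStream(input, data); // data comes from a tainted source
  if (!IsAlphanumeric(data))
    return false;                 // rejected: the data never reaches the sink
  DoSomethingWithData(data);      // reached only after the check
  return true;
}
```

Calling DoSomethingWithData(data) right after ReadStrFromStream, without any check in between, is the flow that the "taint_source" and "taint_sink" attributes are meant to describe.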