Mikhail Gelvikh

Oct 04 2024

Tags:

#StaticAnalysis

User annotations for PVS-Studio


Why does the analyzer need user code annotation?
Case in point
Facilitate your workflow
Conclusion

How often does your static analyzer struggle to pick up on the nuances of your source code? Probably more often than you'd like. In this article, our team shares how we've dealt with it. Let us introduce you to our new user annotation mechanism!

Why does the analyzer need user code annotation?

This is a good question. It seems like the analyzer should be able to scan the entire source code without any hints, and we agree: if the analyzer can handle something by itself, it's better not to burden the user with it. That's why the analyzer already knows about the most frequently used libraries and patterns. Unfortunately, it's not always possible to deduce every fact about the code reliably and automatically.

Author's note. Of course, we have relevant tasks in our TODOs: to tackle the halting problem, to analyze code comments on ~~kludges~~ minor architecture solutions, and to read developers' minds. For obvious reasons, progress on these is moving extremely slowly.

In reality, we have to make a few compromises to boost the analysis. Manual annotation is the easiest and most reliable way to "introduce" your code to the analyzer.

For example, one of the most frequent requests is to handle nullable classes correctly (various wrapper classes, std::optional, etc.). To do that, the analyzer needs to deduce the following facts:

a class stores some resource or a reference to it;
to access the resource, the wrapper state should be checked.

The issue gets worse if:

there are functions that can change the wrapper state;
there are different initialization options;
there are non-standard functions that are used for checking (something other than operator bool, the IsValid variations, and so on);
there are functions that we can call without checking.

Deducing all of this takes a lot of logic, heuristics, and analysis time, and the result still isn't guaranteed. Instead, we can just tell the analyzer directly what we want!
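To make the list above concrete, here is a minimal sketch of a wrapper that combines several of these complications. The Connection class and all of its members are hypothetical and exist only for illustration; they don't come from PVS-Studio or any real library.

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper used only to illustrate the list above.
class Connection
{
public:
  // Two initialization options: "empty" and "open".
  Connection() = default;
  explicit Connection(std::string host)
    : host_(std::move(host)), open_(true) {}

  // Non-standard checker: neither operator bool nor an IsValid variation.
  bool IsAlive() const { return open_; }

  // Changes the wrapper state: after Close() the object is "empty" again.
  void Close() { open_ = false; }

  // Can be called without any check, whatever the state is.
  const char *KindName() const { return "tcp"; }

  // Accesses the resource: only meaningful after IsAlive() returned true.
  const std::string &Host() const { return host_; }

private:
  std::string host_;
  bool open_ = false;
};

// Correct use: the state is checked before the resource is accessed.
std::string DescribeHost(const Connection &connection)
{
  return connection.IsAlive() ? connection.Host() : std::string("<closed>");
}
```

Nothing in the declaration of IsAlive, Close, or Host tells a tool how they relate to each other. A human learns it from the names and comments, while an analyzer would have to guess.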

Starting with versions 7.31 (for C and C++) and 7.32 (for C#), PVS-Studio lets you manually annotate functions and classes via separate JSON files.

You could ask why we put the annotations in separate JSON files when we could have placed them directly in the code. After all, it's more convenient to keep the code and its annotations together, isn't it? Yes, we agree, but as always, there are reasons for this:

many customers use third-party libraries and components in their work, the code of which they can't modify;
different teams often want to use different annotations for the same code;
we already have minimal annotations in code.

Of course, we plan to enhance the annotation mechanism in code, but that's another story. We've dug up many interesting insights while designing this feature, so let us share them with you! Stay tuned for updates :)

Case in point

Even now, the new custom annotation system enables developers to set the following parameters:

For types:

similarity to standard classes (for example, if the class has an interface similar to one of the standard containers);
semantics (cheap-to-copy, copy-on-write, etc.);
other properties (the nullable type).

For functions:

Function properties:
- the function does not return control;
- the function is declared as deprecated;
- the function is checked for purity;
- the function result should be used, etc.
The function parameter properties:
- should differ from other parameters;
- the nullable object should be valid;
- is a source/sink of taint data, etc.
Parameter constraints:
- allow and disallow the passing of certain integers.
Return value properties:
- taint data;
- the valid nullable object;
- etc.
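Several of these properties have counterparts among the standard C++ attributes, which may help to picture what they mean. The sketch below is only an analogy: every function in it is hypothetical, and it uses plain C++17 attributes rather than the PVS-Studio annotation format. The JSON annotations are most useful exactly where you can't add such attributes, for instance in third-party code.

```cpp
#include <stdexcept>
#include <string>

// All functions below are hypothetical and only illustrate the properties.

// "Does not return control": the function always leaves by throwing.
[[noreturn]] inline void Fail(const std::string &message)
{
  throw std::runtime_error(message);
}

// "Declared as deprecated": callers are expected to migrate to Square.
[[deprecated("use Square instead")]] inline int OldSquare(int x)
{
  return x * x;
}

// Pure (no side effects, depends only on the argument), and its result
// should be used: calling it and discarding the value does nothing.
[[nodiscard]] inline int Square(int x)
{
  return x * x;
}

// Parameter constraint: only integers in the range [0, 100] are allowed.
// Standard C++ has no attribute for this, so it is checked at run time here.
[[nodiscard]] inline int ToPermille(int percent)
{
  if (percent < 0 || percent > 100)
    Fail("percent is out of range");
  return percent * 10;
}
```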

Let's see what the annotation of a nullable class template looks like. It's not the simplest case, but it shows the power of the annotation mechanism.

Let's say we have some code that looks like this:

constexpr struct MyNullopt { /* .... */ } my_nullopt;

template <typename T>
class MyOptional
{
public:
  MyOptional();
  MyOptional(MyNullopt);

  template <typename U>
  MyOptional(U &&val);

public:
  bool HasValue() const;

  T& Value();
  const T& Value() const;

private:
  /* implementation */
};

Here are some clarifications on the code:

The default and MyNullopt constructors initialize the object in the "invalid" state.
The constructor template that takes a parameter of the U&& type initializes the object in the "valid" state.
The HasValue member function checks the object state. If the object is "valid", true is returned; otherwise, false. The function doesn't change the object state.
The Value member functions return the underlying object but don't change the object state.

Now we can write the following annotation:

{
  "version": 2,
  "annotations": [
    {
      "type": "class",
      "name": "MyOptional",
      "attributes": [ "nullable" ],
      "members": [
        {
          "type": "ctor",
          "attributes": [ "nullable_uninitialized" ]
        },
        {
          "type": "ctor",
          "attributes": [ "nullable_uninitialized" ],
          "params": [
            {
              "type": "MyNullopt"
            }
          ]
        },
        {
          "type": "ctor",
          "template_params": [ "typename U" ],
          "attributes": [ "nullable_initialized" ],
          "params": [
            {
              "type": "U &&val"
            }
          ]
        },
        {
          "type": "function",
          "name": "HasValue",
          "attributes": [ "nullable_checker", "pure", "nodiscard" ]
        },
        {
          "type": "function",
          "name": "Value",
          "attributes": [ "nullable_getter", "nodiscard" ]
        }
      ]
    }
  ]
}

Now the analyzer can warn us about dangerous uses of the nullable type.
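As an illustration, here is a sketch of the kind of code this annotation targets. The class body is our own simplified stand-in (the article shows only the interface), and throwing from Value() on an empty object is an assumption made so that the problem is observable. UnsafeLength and SafeLength are hypothetical helpers. The exact diagnostic number and message text the analyzer issues are not reproduced here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

struct MyNullopt {};
constexpr MyNullopt my_nullopt{};

// Simplified stand-in for MyOptional; the implementation is an assumption.
template <typename T>
class MyOptional
{
public:
  MyOptional() = default;   // "invalid" state
  MyOptional(MyNullopt) {}  // "invalid" state

  template <typename U>
  MyOptional(U &&val)       // "valid" state
    : value_(std::forward<U>(val)), hasValue_(true) {}

  bool HasValue() const { return hasValue_; }

  T &Value() { Check(); return value_; }
  const T &Value() const { Check(); return value_; }

private:
  // Assumption: accessing an empty object throws instead of being undefined.
  void Check() const
  {
    if (!hasValue_)
      throw std::logic_error("MyOptional is empty");
  }

  T value_{};
  bool hasValue_ = false;
};

// Dangerous: the getter is called without calling the checker first.
std::size_t UnsafeLength(const MyOptional<std::string> &name)
{
  return name.Value().size();
}

// Safe: the state is checked before the value is accessed.
std::size_t SafeLength(const MyOptional<std::string> &name)
{
  return name.HasValue() ? name.Value().size() : 0;
}
```

With the "nullable_checker" and "nullable_getter" attributes in place, the analyzer knows that a call like the one in UnsafeLength accesses the value without a preceding HasValue check, while SafeLength follows the expected pattern.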

Facilitate your workflow

We acknowledge that keeping annotations in separate files has its drawbacks, so we also offer a few enhancements to mitigate them.

Ready-to-use examples

To help you understand how to use the user annotation mechanism, we've prepared a list of examples for the most common scenarios, such as marking up formatted output functions (C++) and marking functions as dangerous or deprecated (C++). You can learn more in the documentation section.

JSON schemas

We support the versioning of the JSON schemas for each available language. These schemas let modern text editors and IDEs validate, suggest possible values, and show hints while editing.

When you write your own annotation file, add the $schema field to it and set its value to the schema for the required language. For example, the value for the C++ analyzer looks like this:

{
  "version": 2,
  "$schema": "https://files.pvs-studio.com/media/custom_annotations/v2/cpp-annotations.schema.json",
  "annotations": [
    { .... }
  ]
}

This enables Visual Studio Code to provide hints when you create annotations.

You can find the up-to-date list of languages available for annotation and their schemas in the documentation.

Analyzer warnings

Not all issues can be detected by validating against the JSON schemas. So, we created a special diagnostic rule, V019, that reports when something is wrong: missing annotation files, parsing errors, annotation errors, etc.

Make it easy

We've designed the annotation mechanism so that you can write as little as possible. For example, we've simplified the selection of function overloads in C++.

If you want the annotation to apply to all functions with this name, you can omit the params field when describing the function.

// Code
void foo();      // dangerous
void foo(int);   // dangerous
void foo(float); // dangerous

// Annotation
{
  ....
  "type": "function",
  "name": "foo",
  "attributes": [ "dangerous" ]
  ....
}

If you need to annotate only the overload that takes no parameters, explicitly specify an empty params field:

// Code
void foo();      // dangerous
void foo(int);   // ok
void foo(float); // ok

// Annotation
{
  ....
  "type": "function",
  "name": "foo",
  "attributes": [ "dangerous" ],
  "params": []
  ....
}

Make it flexible

You can also use wildcard characters. For instance, they let you skip function parameters that aren't relevant to the annotation. The following wildcard characters are currently available:

The "*" character replaces zero or more parameters of any type;
The "?" character replaces exactly one parameter of any type.
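The overload selection rules and the wildcard semantics described above can be summed up in a short sketch. It only models the rules as stated in this article and is not the analyzer's actual implementation: the Params alias and the Matches and MatchFrom functions are hypothetical, and parameter types are compared as plain strings, which is a deliberate simplification. The code requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: a parameter list is a list of type spellings.
using Params = std::vector<std::string>;

// Matches pattern[pi..] against actual[ai..].
bool MatchFrom(const Params &pattern, std::size_t pi,
               const Params &actual, std::size_t ai)
{
  if (pi == pattern.size())
    return ai == actual.size();  // both lists must end together

  const std::string &p = pattern[pi];
  if (p == "*")
  {
    // "*" replaces zero or more parameters: try every possible length.
    for (std::size_t next = ai; next <= actual.size(); ++next)
      if (MatchFrom(pattern, pi + 1, actual, next))
        return true;
    return false;
  }

  if (ai == actual.size())
    return false;                // pattern expects one more parameter

  // "?" replaces exactly one parameter; anything else must be spelled the same.
  if (p == "?" || p == actual[ai])
    return MatchFrom(pattern, pi + 1, actual, ai + 1);
  return false;
}

// An omitted "params" field (std::nullopt) selects every overload;
// an empty list selects only the overload without parameters.
bool Matches(const std::optional<Params> &pattern, const Params &actual)
{
  if (!pattern)
    return true;
  return MatchFrom(*pattern, 0, actual, 0);
}
```

Under this model, the pattern { "?", "..." } matches any overload that takes one parameter of any type followed by a variadic part, whereas an empty pattern matches only foo().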

Take a look at how you can apply this, for example, to mark up formatted I/O functions. Let's say you have a set of functions for outputting text:

namespace Foo
{
  void LogAtExit(const     char *fmt, ...);
  void LogAtExit(const  char8_t *fmt, ...);
  void LogAtExit(const  wchar_t *fmt, ...);
  void LogAtExit(const char16_t *fmt, ...);
  void LogAtExit(const char32_t *fmt, ...);
}

There's no need to annotate each function in this case. Just write one annotation and replace the varying type of the first parameter with a wildcard character:

{
  "version": 2,
  "annotations": [
    {
      "type": "function",
      "name": "Foo::LogAtExit",
      "attributes": [ "noreturn" ],
      "params": [
        {
          "type": "?",
          "attributes": [ "format_arg", "not_null", "immutable" ]
        },
        {
          "type": "...",
          "attributes": [ "immutable" ]
        }
      ]
    }
  ]
}

Conclusion

We're sure that our new user annotation mechanism simplifies developers' workflows and boosts the accuracy of the static code analysis! We'd love for you to experience it firsthand and see how it works. To get started, just download the analyzer using the link. A picture is worth a thousand words, as they say :)
