Artem Rovenskii

Aug 08 2022

Tags:

#CSharp #StaticAnalysis

PVS-Studio's data flow analysis untangles more and more related variables

Aug 08 2022

Author: Artem Rovenskii

What are related variables?
Synthetic examples
How do we test this?
Has it got better?
- Additional
- Missing
Conclusion

This is the second article about related variables and how the PVS-Studio analyzer deals with them. This time, let's talk about how we enhanced the mechanism we created, and what problems of the analyzer users this upgrade solved. We will also take a look at examples of related variables in real projects.

0976_Related_variables_upgrade/image1.png

What are related variables?

When we talk about related variables, we mean cases when the value of one variable depends on the value of another one. For example, the result of a comparison of one variable to null can be written to a boolean variable.

var variable = GetPotentialNull();
bool flag = variable != null;

In this case, the check of flag will at the same time be the check of variable.

Variables can be related in many ways. We will review several examples of such relationships below. Now let's think — how might such relationships between variables confuse the static analyzer?

The thing is that PVS-Studio uses the data flow analysis technology to track possible expression values. If the variable in the condition is checked for not being null, the analyzer understands — the variable is definitely not storing a null reference in the then branch.

The situation is much more complicated when checking the value of one variable implies an implicit check of another one. If data flow analysis cannot handle such checks correctly, the analyzer will make incorrect assumptions about possible variable values. This leads to false positives.

A false positive is a term that refers to the analyzer's warning issued for error-free code. Related variables are one of the causes of false positives.

Here's an example. First, the variable is checked for not being null. After that, the variable is dereferenced.

public void Test()
{
  var variable = GetPotentialNull();
  bool check = variable != null;
  if (check)
  {
    _ = variable.GetHashCode(); // <=
  }
}

If the analyzer issues a warning on the commented line, this warning is a false positive.

False positives make it difficult to read the analyzer's report. In some cases, such warnings persuade developers to add unnecessary or incorrect code fixes. You can learn more about false positives and the need to battle against them in the following article.

And that's not all yet! If the analyzer does not have information about a variable's value, it may fail to issue a warning. Therefore, the bug will not be detected early enough :(.

In April, we posted an article on related variables and how PVS-Studio supports them. The post turned out to be a long read. My teammate described some of the analyzer's mechanisms and showed examples of variable relationships. He also told the story why we decided to support related variables and the results of our work. So, why don't you read the previous article first so that you have the full picture of what's going on in this one?

This time we decided to support cases of relationships formed between variables with the help of the ternary operator and the if...else construction. And, if you are now reading this article, we were able to do that :).

Synthetic examples

Why is it difficult for the analyzer to handle related variables? The easiest way to figure this out is to look at synthetic code examples. A little later we will look at code examples from real projects.

public void TestRelations(bool condition)
{
  object variable = condition ? "notNull" : GetPotentialNull();
  if (condition)
    _ = variable.GetHashCode();
}

The method that can return null:

private static string GetPotentialNull()
{
  return random.NextDouble() > 0.5 ? "str" : null;
}

Previously, PVS-Studio issued a false positive about the potential dereference of a null reference in the if statement's body. It is obvious that if condition is true, variable has a value other than null. That thing is obvious to us, but not to the analyzer. We did a couple of fixes and now the analyzer understands that condition is related to variable.

From the analyzer's point of view, the variable value depends on the condition value:

if condition == true, variable is definitely not null;
if condition == false, then a null reference may potentially be written to variable.

Thus, when the analyzer gets the value of condition, it also gets the value of variable. In this example, the analyzer gets values when it proceeds to the conditional expression. The condition variable in the if-branch is true, which means variable is definitely not null.

The next problem were the relationships that appeared due to the if statement. Here's a simple case:

public void TestRelations2(bool condition)
{
  object variable;
  if (condition)
    variable = "notNull";
  else
    variable = GetPotentialNull();

  if (condition)
    _ = variable.GetHashCode();
}

PVS-Studio issued a warning that the null reference could be dereferenced. The idea here is the same as in example with the ternary operator. In the second if, variable is not null. Now PVS-Studio supports this type of variable relationships.

How do we test this?

We test the work of the analyzer not only on synthetic code, but on real code as well. For this purpose, we use a special set of open-source projects. The testing process includes several stages:

we analyze these projects with stable version of the analyzer and generate reports for each project;
then we add fixes to the analyzer code, and generate new reports;
next, we compare new and old reports and generate diff files.

As a result, we get a report with two types of records: missing — a warning disappeared, additional — a new warning appeared.

Let me point out that each warning (new one or one that disappeared) has to be reviewed. When skimming through the results, I asked myself almost every time the analyzer issued a warning: is this a good warning? Was it supposed to disappear or appear? How did the analyzer understand what is what?

Has it got better?

We wanted to "teach" the analyzer deal with related variables to minimize the number of false-positives. However, the new enhancement helped not only to remove false-positive warnings, but also to add good ones. The PVS-Studio analyzer now detects more related variables and finds even more potential bugs. Developers may not have thought of the relationships between variables, or understood them, or simply could not notice them. Developers edit their own code. However, sometimes they debug other people's code. Fixing one code line can cause issues in project because variables are related in some code fragment. Static analysis comes to the rescue in this case.

Let's not waste time and get to the point.

Additional

First, let's take look at warnings that appeared when PVS-Studio supported new related variables.

Issue 1

The first warning in question was issued for the SpaceEngineers project code.

public bool RemovePilot()
{
  bool usePilotOriginalWorld = false;
  ....
  Vector3D? allowedPosition = null;
  if (!usePilotOriginalWorld)
  {
    allowedPosition = FindFreeNeighbourPosition();

    if (!allowedPosition.HasValue)
      allowedPosition = PositionComp.GetPosition();
  }

  RemovePilotFromSeat(m_pilot);
  EndShootAll();

  if (usePilotOriginalWorld || allowedPosition.HasValue)  // <=
  {
    ....
  }
}

V3022 Expression 'usePilotOriginalWorld || allowedPosition.HasValue' is always true. MyCockpit.cs 666

The analyzer message says that the usePilotOriginalWorld || allowedPosition.HasValue expression always has the true value. Let's figure out why this is so.

Look a little higher up the code. We see that if the usePilotOriginalWorld variable is false, the return value of the FindFreeNeighbourPosition method is assigned to the allowedPosition variable. The method returns a nullable struct.

If so, two options are possible:

allowedPosition.HasValue is true;
allowedPosition.HasValue is false. In this case, the result of the GetPosition method call is assigned to allowedPosition. The method returns the usual struct, so HasValue of allowedPosition will definitely be true.

The GetPosition method:

public Vector3D GetPosition()
{
  return this.m_worldMatrix.Translation;
}

Thus, if the usePilotOriginalWorld variable is false, the nullable struct will always be written to allowedPosition. And the HasValue property of the struct will be true.

There are two options:

if usePilotOriginalWorld is true, the condition is true;
if usePilotOriginalWorld is false, allowedPosition.HasValue returns true and the condition is also true.

By the way, the analyzer issued another warning for the same method.

if (usePilotOriginalWorld || allowedPosition.HasValue)
{
  ....
  return true;
}
return false;    // <=

V3142 Unreachable code detected. It is possible that an error is present. MyCockpit.cs 728

Now the analyzer knows that this condition is always true. There is a return operator at the end of the condition. Therefore, return false is unreachable code. Is this really what the developer intended?

Issue 2

Another new warning appeared in a report for the... PVS-Studio project. Yes, we upgraded the mechanism and found the defect in our own product — thanks to night tests. During night tests, the PVS-Studio is looking for errors in PVS-Studio.

private static bool? IsTrivialProperty_internal(....)
{
  AssignmentExpressionSyntax setBody = null;
  if (!checkOnlyRead)
  {
    var setBodyFirst = setAccessorBody?.ChildNodes().FirstOrDefault();
    setBody = ....;
    if (setBody == null)
      return false;
    ....
  }

  getValue = ....;

  try
  {
    if (checkOnlyRead)
    {
      return IsTrivialGetterField(model, ref getValue, maybeTrue);
    }
    else
    {
      ExpressionSyntax setValue = setBody?.Left.SkipParenthesize();    // <=
      ....
    }
  } 
  catch (ArgumentException)
  {....}
}

V3022 Expression 'setBody' is always not null. The operator '?.' is excessive. TypeUtils.cs 309

The analyzer's warning says that at the time of receiving the value of the Left property, the setBody variable is never null. Let's see why.

If we are in the else branch, checkOnlyRead has the false value. Let's go a little higher up the code to the very first if. As you can see, if the checkOnlyRead value is false, setBody == null is checked. If setBody == null expression is true, the exit from the method occurs, and the execution thread will not reach the next if. Therefore, if checkOnlyRead is false, the setBody variable cannot be null.

Thus, the '?.' operator is excessive here and needs to be removed. And we removed it :).

Issue 3

This warning that appeared in the Umbraco project got me to thinking. At first I even thought it was a false positive.

private PublishResult CommitDocumentChangesInternal(....)
{
  ....
  if (unpublishing)
  {
    ....                
    if (content.Published)
    {
      unpublishResult = StrategyCanUnpublish(....);
      if (unpublishResult.Success)
      {
        unpublishResult = StrategyUnpublish(....);
      }
      else{....}
    } 
    else
    {
      throw new InvalidOperationException("Concurrency collision.");
    }
  }
  ....
  if (unpublishing)
  {
    if (unpublishResult?.Success ?? false)                       // <=
    {
      ....
    }
    ....
  }
  ....
}

V3022 Expression 'unpublishResult' is always not null. The operator '?.' is excessive. ContentService.cs 1553

The analyzer considers the operator '?.' redundant. Why? The Success property is only accessed when the unpublishing variable is true. Let's see how the method's code would be executed in this case.

A little higher up the code, we see the same condition — we know it is supposed to be true. We stumble upon if (content.Published) in this condition. Let's assume that the property will return true, because otherwise we will get an exception. In this condition, the unpublishResult local variable is assigned with the method's return value in two cases. Both calls always return values other than null.

The StrategyCanUnpublish method:

private PublishResult StrategyCanUnpublish(....)
{
  if (scope.Notifications.PublishCancelable(....)
  {
    ....
    return new PublishResult(....);
  }
  return new PublishResult(....);
}

The StrategyUnpublish method:

private PublishResult StrategyUnpublish(....)
{
  var attempt = new PublishResult(....);
  if (attempt.Success == false)
  {
    return attempt;
  }
  ....
  return attempt;
}

It turns out that if the unpublishing variable is true, two options are possible:

an exception is thrown;
a value other than null is assigned to the unpublishResult variable.

So the property can be accessed without checking for null. Well, I hope no one is confused.

Did you notice that the '??' operator in the same fragment doesn't make sense either? The analyzer issued the message:

V3022 Expression 'unpublishResult?.Success' is always not null. The operator '??' is excessive. ContentService.cs 1553

Missing

The following false positives disappeared after we supported related variables.

Issue 1

The first example is a code fragment from the Unity project:

public void DoGUI(....)
{
  using (var iter = fetchData ? new ProfilerFrameDataIterator() : null)
  {
    int threadCount = fetchData ? iter.GetThreadCount(frameIndex) : 0; // <=
    iter?.SetRoot(frameIndex, 0);
    ....
  }
}

V3095 The 'iter' object was used before it was verified against null. Check lines: 2442, 2443. ProfilerTimelineGUI.cs 2442

PVS-Studio used to generate a warning saying that iter is being used first and then it is checked for null on the next line. Now the analyzer understands that the iter variable is definitely not null in the ternary operator's then branch. The thing is that iter is null only when the fetchData variable is false, and dereference is performed only if fetchData == true.

Issue 2

The following false positive issued on PascalABC.NET also disappeared:

private void ConvertTypeHeader(ICommonTypeNode value)
{
  ....
  TypeInfo ti = helper.GetTypeReference(value);
  bool not_exist = ti == null;
  ....
  if (not_exist)
  {
    ti = helper.AddType(value, tb);
  }
  if (value.type_special_kind == type_special_kind.array_wrapper)
  {
    ti.is_arr = true;        // <=
  }
  ....
}

V3080 Possible null dereference. Consider inspecting 'ti'. NETGenerator.cs 2391

The analyzer issued a warning about the potential dereference of the null reference. The warning disappeared, by the way, not because we supported new types of related variables that I described on synthetics examples above. My colleague described this type of relationship in the last article on related variables. So why is the warning only missing now? Well, it's simple — we slightly updated the general mechanism, and now analyzer can "understand" such relationships between variables.

There is the if (not_exist) check before the code line that triggered the analyzer. If the variable is true, ti is assigned with the return value of the AddType method.

public TypeInfo AddType(ITypeNode type, TypeBuilder tb)
{
  TypeInfo ti = new TypeInfo(tb);
  defs[type] = ti;
  return ti;
}

As we can see, this method does not return null.

I shortened this piece of code and now it is easy to understand. However, the source code fragment is separated by a large number of lines. A large number of code lines makes it difficult to see the relationship between variables — even for those who wrote the code. This false positive can confuse the programmer. It may even provoke the programmer into making real errors in code. This is how covering the relationships between variables can make the user lives easier.

Issue 3

I will combine the following two warnings issued for code of the PascalABC.NET project into one — it is better to review them together.

public common_type_node instance(....)
{
  class_definition cl_def = tc.type_dec.type_def as class_definition;
  template_type_name ttn = tc.type_dec.type_name as template_type_name;
  if (!tc.is_synonym)
  {
   if (cl_def == null)
   {
     throw new CompilerInternalError(....);
   }
   if (cl_def.template_args == null || cl_def.template_args.idents == null)
   {
     throw new CompilerInternalError(....);
   }
  }
  else
  {
    if (ttn == null)                                               // <=
    {
      throw new CompilerInternalError("No template name.");
    }
  }

  List<SyntaxTree.ident> template_formals = (tc.is_synonym) ?
    ttn.template_args.idents : cl_def.template_args.idents;        // <=
  
  if (template_formals.Count != ttn.template_args.idents.Count)
  {
    ....
  }
}

First let's look at the false positive that disappeared after the improvements.

V3125 The 'ttn' object was used after it was verified against null. Check lines: 18887, 18880. syntax_tree_visitor.cs 18887

The PVS-Studio analyzer noticed that the variable is checked for null first and then used without such a check. The ttn dereference occurs if the ternary operator's condition is true, i.e, tc.is_synonym has the true value. Above we see that there is the if construct where the !tc.is_synonim expression is checked.

In this case, tc.is_synonym has the true value — the control flow will proceed to the else branch. In the else branch, ttn is checked for null equality. If the ttn == null expression is true, an exception will be thrown — the execution thread will not reach the line where ttn is dereferenced.

The opposite occurs with cl_def. In this case, tc.is_synonym should be false. It turns out that both variables are dereferenced only in cases when they are not null.

The analyzer issued another warning that was no longer a false positive. And this new warning appeared one line below the last warning.

if (template_formals.Count != ttn.template_args.idents.Count)
{
  ....
}

V3125 The 'ttn' object was used after it was verified against null. Check lines: 18888, 18880. syntax_tree_visitor.cs 18888

This time the analyzer issued the same warning, but for a different code fragment because now PVS-Studio takes into account the relationships between variables and knows that the dereference of ttn in the ternary operator is safe. However, the next call to the ttn may cause an exception, since the call is performed unconditionally. The situation seems suspicious.

You may ask: "Why wasn't this warning issued before? As I mentioned above, instead of this particular warning, the analyzer issued a warning about the situation in the ternary operator. There is no point in issuing a bunch of warnings about the potential dereferencing of the same variable.

Conclusion

The main aim of the PVS-Studio development team is to enhance the analyzer and minimize the number of false positives. We are striving to improve the experience of using PVS-Studio and try to cover as many relationships between variables as possible. And we will continue to work in this direction.

If your projects contain many related variables, try out the new version of PVS-Studio and see how it can handle them. The new analyzer version is already available for download on our website.

Have a clean code!

#CSharp #StaticAnalysis