Valery Komarov

Sep 23 2021

Tags:

#CSharp #Knowledge

Creating Roslyn API-based static analyzer for C#

Sep 23 2021

Author: Valery Komarov

Static analyzers: what are they and why do we need them?
Creating an analyzer based on Visual Studio templates
Working with Roslyn
- Syntax tree
- Semantic model and symbols
Conclusion

After you read this article, you'll have the knowledge to create your own static analyzer for C#. With the help of the analyzer, you can find potential errors and vulnerabilities in the source code of your own and other projects. Are you intrigued? Well, let's get started.

0867_AnalyzingCodeWithRoslyn_API/image1.png

First, we will make your own static analyzer from the Visual Studio templates, without going deeper into the Roslyn API. This allows you to quickly get a working application and at least roughly see what analyzers can do.

And after that, we'll take a closer look at Roslyn API, as well as various tools that allow you to perform deeper and more complex analysis.

Static analyzers: what are they and why do we need them?

I'm sure that many developers have some mistakes that they or their friends often make when writing code. Most likely you would like to have a tool that detects such errors without your participation. This tool is called a static analyzer.

A static analyzer is an automatic tool that searches for potential errors and vulnerabilities in a program's source code without launching the app directly.

However, what if the existing analyzers can't find what you want? The answer is simple — you create your own utility or even an entire analyzer. C# developers are very lucky. Thanks to Roslyn they can create their own static analyzer. This is exactly what this article is about.

Creating an analyzer based on Visual Studio templates

All our further static analyzer development will be based on the .NET Compiler Platform aka Roslyn. Thanks to the capabilities this platform provides, we can use C# to create our own static analysis tools. Here, the word 'static' means that the analyzed code doesn't need to be executed.

Since our analyzer is based on Roslyn, we should install the .NET Compiler Platform SDK for Visual Studio. One of the ways to do so is to open the Visual Studio Installer and select 'Visual Studio extension development' in the 'Workloads' tab.

0867_AnalyzingCodeWithRoslyn_API/image2.png

After we install the necessary toolset, we can start creating the analyzer.

Open Visual Studio, click on 'Create a new project', select C#. Specify Windows as the platform and select Roslyn as the project type. After this we should see three projects templates. We are interested in two: 'Analyzer with Code Fix (.NET Standard)' and 'Standalone Code Analysis Tool'.

0867_AnalyzingCodeWithRoslyn_API/image3.png

Let's study each of the templates.

Description of the "Analyzer with Code Fix (.NET Standard)" project and an example of its use

After we create a new project with the 'Analyzer with Code Fix (.NET Standard)' template, we get a solution with five projects inside.

0867_AnalyzingCodeWithRoslyn_API/image4.png

Now we pay our full attention to the first project called TestAnalyzer. The main work on the analyzer is performed in this exact project. Open the TestAnalyzerAnalyzer.cs file. It already contains an example of a simple rule for a static analyzer. The rule searches through all type(class) names in source code. If a type's name has lowercase characters, the rule underlines it with a green wavy line. Besides, if you hover the cursor on the type name marked with a wavy line, you see a familiar light bulb symbol. It offers to automatically correct the type name and bring all the characters to uppercase:

0867_AnalyzingCodeWithRoslyn_API/image5.png

The easiest way to see it is to launch a new VS instance, which already has our sample diagnostic rule. You can use the same approach for debugging. To do this, mark TestAnalyzer.vsix as a startup project and launch the application. After that, a so-called experimental Visual Studio instance window will open. A new diagnostic rule is already added in this VS instance. It is integrated with the installed VSIX extension that has the name of our test analyzer.

0867_AnalyzingCodeWithRoslyn_API/image6.png

Next, we create a new console project in the running VS instance. In this project, we see that the Program class name is underlined with a green wavy line. This is the work of our diagnostic rule, as the class name contains lowercase characters.

Create an analyzer based on the "Standalone Code Analysis Tool" project template

Now, let's create a new project of the 'Standalone Code Analysis Tool' type. In fact, it's a project of an ordinary console application with links to the necessary DLLs for analysis:

Microsoft.CodeAnalysis.CSharp.Analyzers.dll;
Microsoft.CodeAnalysis.Analyzers.dll;
Microsoft.CodeAnalysis.Workspaces.MSBuild.dll;
etc.

We can delete all methods except Main, from the Program.cs file.

Let's write the analyzer in such a way that it can find if statements, in which true and false branches are identical. Would you say that no one makes such mistakes? Surprisingly, this is a fairly common pattern. Look at the list of similar errors found in open source projects.

Let's say we are not satisfied if code contains a fragment like this:

public static void MyFunc1(int count)
{
  if (count > 100)
  {
    Console.WriteLine("Hello world!");
  }
  else
  {
    Console.WriteLine("Hello world!");
  }
}

So, we make the analyzer write the line number and the full path to the source file into the log file. Let's move on to writing code:

static void Main(string[] args)
{
  if (args.Length != 2)
    return;

  string solutionPath = args[0];
  string logPath = args[1];

  StringBuilder warnings = new StringBuilder();

  const string warningMessageFormat =
    "'if' with equal 'then' and 'else' blocks is found in file {0} at line {1}";

  MSBuildLocator.RegisterDefaults();
  using (var workspace = MSBuildWorkspace.Create())
  {
    Project currProject = GetProjectFromSolution(solutionPath, workspace);
    foreach (var document in currProject.Documents)
    {
      var tree = document.GetSyntaxTreeAsync().Result;
      var ifStatementNodes = tree.GetRoot()
                                 .DescendantNodesAndSelf()
                                 .OfType<IfStatementSyntax>();

      foreach (var ifStatement in ifStatementNodes)
      {
        if (ApplyRule(ifStatement))
        {
          int lineNumber = ifStatement.GetLocation()
                                      .GetLineSpan()
                                      .StartLinePosition.Line + 1;

          warnings.AppendLine(String.Format(warningMessageFormat,
                                            document.FilePath,
                                            lineNumber));
        }
      }
    }

    if (warnings.Length != 0)
      File.AppendAllText(logPath, warnings.ToString());
  }
}

In our case, we use a console application and not a plugin for VS. Thus, we need to specify the path to the solution file, which we are going to analyze. In order to get the solution, we use the MSBuildWorkspace class and the OpenSolutionAsync method. In its turn, the Solution class contains the Projects property, which stores the project entities. In my case, I created a new solution with a single console application project. Therefore, to get the project entity, I wrote the following method:

static Project GetProjectFromSolution(String solutionPath, 
                                      MSBuildWorkspace workspace)
{
  Solution currSolution = workspace.OpenSolutionAsync(solutionPath)
                                   .Result;
  return currSolution.Projects.Single();
}

When reviewing the 'Analyzer with Code Fix' project template, we did not change the provided template code. Now, we want to write a rule according to which our analyzer would work. In this regard, it is necessary to clarify several theoretical points.

Roslyn itself stores source file representations as trees. Look at the following code example:

if (number > 0)
{

}

Roslyn presents it as a tree with the following structure:

0867_AnalyzingCodeWithRoslyn_API/image7.png

The tree nodes are blue in the picture. We will work with them specifically. In Roslyn, such trees are represented as the SyntaxTree object types. As you can see in the picture, the tree nodes differ and each of them is represented by its own type. For example, the IfStatement node is represented by the IfStatementSyntax class object. All the nodes in their inheritance hierarchy originate from the SyntaxNode class. And only then they add some specific properties and methods to the ones they've inherited from the SyntaxNode class. For example, the IfStatementSyntax contains the Condition property. Condition, in turn, is a node of the ExpressionSyntax type. This order is natural for an object that represents the conditional if construction.

When we work with the necessary tree nodes, we can create logic for rules, according to which our static analyzer will work. For example, in order to determine in which IfStatement operators the true and false branches are completely identical, you need to do the following:

Look through all the IfStatementSyntax type tree nodes;
When visiting a node, get the Statement property value of the IfStatementSyntax type object and save the value to the thenBody variable;
IfStatementSyntax has the Else property. Get its value and save it to the elseBody variable;
The Microsoft.CodeAnalysis.CSharp.dll assembly has the SyntaxFactory class, which contains the AreEquivalent method. Pass the thenBody and elseBody variables to this method and the let the AreEquivalent method compare objects in those variables.

Based on the algorithm described above, you can write the ApplyRule method:

static bool ApplyRule(IfStatementSyntax ifStatement)
{
  if (ifStatement?.Else == null)
    return false;

  StatementSyntax thenBody = ifStatement.Statement;
  StatementSyntax elseBody = ifStatement.Else.Statement;

  return SyntaxFactory.AreEquivalent(thenBody, elseBody);
}

As a result we were able to write a rule that would allow us to no longer worry about copy-paste errors in if-else branches.

Which project type to choose for writing your own static analyzer?

In my opinion, you should base your choice on what you want to get from the analyzer.

If you write a static analyzer that should monitor compliance with the code style your company requires, then use a project like 'Analyzer with Code Fix'. Your analyzer will be conveniently integrated into the VS environment as an extension. Developers will see the results of its work right when writing code. Besides, with API from Roslyn, you can turn on hints (how to change code) and even automatic correction.

If you plan to use the analyzer as a separate application and not as a plugin, choose the 'Standalone Code Analysis Tool' project. Let's say that you want to incorporate the analyzer into your CI process and test projects on a separate server. Another advantage — the analyzer in the form of the extension for VS exists inside the 32-bit devenv.exe process. This process can use only a limited amount of memory. The analyzer as a separate application isn't afraid of such restrictions. However, Microsoft promises to make Visual Studio 2022 64-bit. If you make your analyzer for this IDE version, these restrictions on memory consumption shouldn't affect you.

The information in this this article can help you quickly write your own static analyzer that will solve your problems. What if you want not just to solve your problems, but detect a wide range of code defects? Then you have to spend your time and energy on learning how to use static flow analysis, symbolic calculations, method annotation, and so on. Only after that your analyzer will be able to compete with the paid ones and be useful for a large number of developers. If you don't want to spend that much time on this, you can use one of the existing analyzers. There is a variety of them, both paid and free. Here's a list of tools for static code analysis. If you want to see what these tools can do, read the article 'Top 10 bugs found in C# projects in 2020'.

Besides, don't forget that such analyzers provide some part of their functionality via additional extensions for various IDEs. It's convenient if plugin allows you to launch the analyzer within the IDE. You do not need to collapse the editor and launch a separate application. The plugin can also allow you to view the analysis results inside the IDE.

Working with Roslyn

We have inspected the templates that Visual Studio provides to create a new static code analyzer. Now let's take a closer look at Roslyn API so we can use it efficiently and correctly. The syntax tree is the first thing we need to get acquainted with.

Syntax tree

A compilation object uses the source code to build a syntax tree for each .cs file. You can see one of the trees in the Syntax Visualizer window. If you have .NET Compiler Platform SDK for Visual Studio, you can find this window in View -> Other Windows -> Syntax Visualizer.

0867_AnalyzingCodeWithRoslyn_API/image9.png

This is a very useful tool. It is especially useful for those who are just getting started with the tree structure and the element types represented in it. When moving through code in the Visual Studio editor, Syntax Visualizer goes to the corresponding tree element of the code fragment and highlights it. The Syntax Visualizer window also shows some properties for the currently selected element. For example, in the screenshot above, we see a specific type MethodDeclarationSyntax for the MethodDeclaration highlighted element.

For more visualization, you can select an element in the Syntax Visualizer window and invoke this element's context menu. As a result, you get a window that visualizes the syntax tree built for the selected element:

0867_AnalyzingCodeWithRoslyn_API/image10.png

If you don't see this element in the context menu, install DGML editor. You can do it via the Visual Studio Installer. Open the Visual Studio Installer and choose More -> Modify next to the desired VS instance. Then, go to Individual Component -> Code tools -> DGML editor.

However, this tool has its disadvantages:

If the Syntax Visualizer window is empty even though you chose the necessary code, then add and erase a space. After this manipulation the Syntax Visualizer window updates its contents and shows a tree for the selected code.
This window uses lots of resources, so unless you really need it, close it when working with large source code files.

Earlier in this article, we've mentioned a tree that Roslyn builds for C# code:

if (number > 0)
{

}

0867_AnalyzingCodeWithRoslyn_API/image12.png

This picture shows that the tree consists of elements represented by four colors. We can divide all tree elements into three groups:

Blue — syntax tree nodes;
Green — syntax tokens;
White and grey — syntax trivia. It contains additional syntax information.

Let's take a closer look at every group.

Syntax nodes

Syntax nodes represent syntactic constructions: declarations, operators, expressions, etc. When a tool analyzes the code, the main work falls on the node processing. The SyntaxNode abstract class is the basic node type. Every node that represents a particular language construction has a type, inherited from SyntaxNode. It defines a number of properties that simplify working with the tree. Here are some types along with their corresponding language constructs:

IfStatementSyntax — the if statement;
InvocationExpressionSyntax — the method call;
ReturnStatementSyntax – the return operator;
MemberAccessExpressionSyntax — access to class/structure members

For example, the IfStatementSyntax class has a functionality that was inherited from the SyntaxNode class and has other useful properties, such as Condition, Statement and Else. The Condition node represents the operator condition; the Statement node represents the body of the if statement; and the Else node represents the else block.

The SyntaxNode abstract class provides the developer with methods that are common for all nodes. Some of them are listed below:

ChildNodes gets a sequence of nodes that are children of the current one.
DescendantNodes gets a sequence of all descendant nodes.
Contains determines whether the node, that was passed as an argument, is a descendant of the current node.
IsKind takes the SyntaxKind enumeration element as a parameter and returns a boolean value. You can call IsKind for a tree node. This method checks that the node type that you passed matches the node type from which IsKind was called.

Besides, a number of properties are defined in the class. One of the most commonly used among them is Parent, which contains a reference to the parent node.

Creating a diagnostic rule with CSharpSyntaxWalker

When creating a rule based on the "Standalone Code Analysis Tool" project template, we got nodes of the IfStatementSyntax type. Then we worked with them by accessing the tree root and executing a LINQ query that selects nodes necessary for our analysis. A more elegant solution is to use the CSharpSyntaxWalker class. CSharpSyntaxWalker is an abstract class. When we call the Visit method, the class traverses the node and its descendant nodes, which are passed to Visit. CSharpSyntaxWalker performs depth-first traversal. For each node encountered, it calls the Visit method corresponding to the node type. For example, for an instance of the ClassDeclarationSyntax type it calls the VisitClassDeclaration method that takes the node of this type as a parameter. In our case, we need to create a class inherited from the CSharpSyntaxWalker. Then we override the method, which is called when CSharpSyntaxWalker visits a particular C# construct.

public class IfWalker : CSharpSyntaxWalker
{
  public StringBuilder Warnings { get; } = new StringBuilder();

  const string warningMessageFormat = 
    "'if' with equal 'then' and 'else' blocks is found in file {0} at line {1}";

  static bool ApplyRule(IfStatementSyntax ifStatement)
  {
    if (ifStatement.Else == null)
      return false;

    StatementSyntax thenBody = ifStatement.Statement;
    StatementSyntax elseBody = ifStatement.Else.Statement;

    return SyntaxFactory.AreEquivalent(thenBody, elseBody);
  }

  public override void VisitIfStatement(IfStatementSyntax node)
  {
    if (ApplyRule(node))
    {
      int lineNumber = node.GetLocation()
                           .GetLineSpan()
                           .StartLinePosition.Line + 1;

      warnings.AppendLine(String.Format(warningMessageFormat, 
                                        node.SyntaxTree.FilePath, 
                                        lineNumber));
    }
    base.VisitIfStatement(node);

  }
}

Note that the overridden VisitIfStatement method internally calls the base.VisitIfStatement method. This is necessary because the basic implementations of the Visit methods initiate child nodes's traversal. If you want to stop it, then don't call the basic implementation of this method when overriding the method.

Let's create a method, that uses our IfWalker class instance to start tree traversal:

public static void StartWalker(IfWalker ifWalker, SyntaxNode syntaxNode)
{
   ifWalker.Warnings.Clear();
   ifWalker.Visit(syntaxNode);
}

This is how the Main method looks like in this case:

static void Main(string[] args)
{
  string solutionPath = @"D:\Test\TestApp.sln";
  string logPath = @"D:\Test\warnings.txt";

  MSBuildLocator.RegisterDefaults();
  usng (var workspace = MSBuildWorkspace.Create())
  {
    Project project = GetProjectFromSolution(solutionPath, workspace);

    foreach (var document in project.Documents)
    {    
      var tree = document.GetSyntaxTreeAsync().Result;
      var ifWalker = new IfWalker();
      StartWalker(ifWalker, tree.GetRoot());

      var warnings = ifWalker.Warnings;
      if (warnings.Length != 0)
        File.AppendAllText(logPath, warnings.ToString());
    }
  }
}

It's up to you to choose which approach is best for you to get nodes for the analysis. You can write a LINQ query. You can override the methods of the CSharpSyntaxWalker class that are called when CSharpSyntaxWalker visits certain C# nodes. Your choice depends only on what's most suitable your task. I think traversal methods from the CSharpSyntaxWalker should be overridden if we plan to add a lot of diagnostic rules to the analyzer. If your utility is simple and aims at processing a specific node type, you can use a LINQ query to collect all the necessary C# nodes.

Syntax tokens

Syntax tokens are language grammar terminals. Syntax tokens are elements that are not further analyzed — identifiers, keywords, special characters. We barely work with them during the analysis. During the analysis, we use tokens to obtain their textual representation or to check the token type. Tokens are the tree leaves, they don't have child nodes. Besides, tokens are instances of the SyntaxToken structure, i.e. they are not inherited from SyntaxNode. However, tokens, just like nodes, may have syntax trivia. We'll get back to it in one of the article sections.

The main properties of the SyntaxToken are:

RawKind - a numeric representation of the token's SyntaxKind enumeration element;
Value - the token's object representation. For example, if a token represents a numeric literal of the int type, then Value returns an object of the int type with a corresponding value.
Text - a text representation of a token.

Creating a diagnostic rule that analyzes syntax tokens

Let's create a simple diagnostic rule that uses syntax tokes. This rule is triggered if a method name doesn't start with a capital letter:

class Program
{
  const string warningMessageFormat =
    "Method name '{0}' does not start with capital letter " + 
    "in file {1} at {2} line";

  static void Main(string[] args)
  {
    if (args.Length != 2)
      return;

    string solutionPath = args[0];
    string logPath = args[1];

    StringBuilder warnings = new StringBuilder();

    MSBuildLocator.RegisterDefaults();
    using (var workspace = MSBuildWorkspace.Create())
    {
      Project project = GetProjectFromSolution(solutionPath, workspace);

      foreach (var document in project.Documents)
      {
        var tree = document.GetSyntaxTreeAsync().Result;

        var methods = tree.GetRoot()
                          .DescendantNodes()
                          .OfType<MethodDeclarationSyntax>();

        foreach (var method in methods)
        {
          if (ApplyRule(method, out var methodName))
          {
            int lineNumber = method.Identifier
                                   .GetLocation()
                                   .GetLineSpan()
                                   .StartLinePosition.Line + 1;

            warnings.AppendLine(String.Format(warningMessageFormat, 
                                              methodName, 
                                              document.FilePath, 
                                              lineNumber));
          }
        }
      }
    }

    if (warnings.Length != 0)
        File.WriteAllText(logPath, warnings.ToString());
  }

  static bool ApplyRule(MethodDeclarationSyntax node, out string methodName)
  {
    methodName = node.Identifier.Text;
    return methodName.Length != 0 && !char.IsUpper(methodName[0]);
  }
}

In this rule, the Identifier property of the MethodDeclarationSyntax class determines whether a method name doesn't start with a capital letter. This property stores a token that checks the first character of its text representation.

Syntax trivia

Syntax trivia (additional syntactic information) includes the following tree elements: comments, preprocessor directives, various formatting elements (spaces, newline characters). These tree nodes are not descendants of the SyntaxNode class. The syntax trivia elements don't go into the IL code. However, they are represented in the syntax tree. Thanks to this, you can get completely identical source code from the existing tree, along with all the elements contained in all instances of the SyntaxTrivia structure. This tree feature is called full fidelity. The syntax trivia elements always belong to a token. There are Leading trivia and Trailing trivia. Leading trivia is additional syntactic information that precedes the token. Trailing trivia is additional syntactic info that follows the token. All elements of the additional syntactic information are of the SyntaxTrivia type. If you want to determine what exactly the element is, use the SyntaxKind enumeration along with the Kind and IsKind methods:

Look at the following code:

#if NETCOREAPP3_1
  b = 10;
#endif
  //Comment1
  a = b;

Here's what the directed syntax graph looks like for the code above:

0867_AnalyzingCodeWithRoslyn_API/image14.png

You can see that the 'a' token includes such syntax trivia as the preprocessor directives #if NETCOREAPP3_1 and #endif, the text itself inside these directives, the space and end-of-line characters, as well as a one-line comment. The '=' token has only one syntax trivia element attached to it. It's the space character. And the ';' token corresponds to the end-of-line character.

Use of syntax trivia in comment analysis

In addition to diagnostic rules based on tree node analysis, you can also create rules that analyze syntax trivia elements. Let's imagine that a company issued a new coding requirement: do not write the comments that are longer than 130 characters. We decided to check our project for such "forbidden" comments. We use a simple analyzer that parses syntax trivia elements. The code structure of this rule is almost identical to the rule that we created on the base of the "Standalone Code Analysis Tool" project template. But now, since we need comments, we call the DescendantTrivia method instead of calling the DescendantNodes method. After that we choose only those SyntaxTrivia, whose type is either SingleLineCommentTrivia, or MultiLineCommentTrivia, or SingleLineDocumentationCommentTrivia:

....
var comTriv = tree.GetRoot().DescendantTrivia()                                 
                  .Where(n =>   n.IsKind(SyntaxKind.SingleLineCommentTrivia)
                             || n.IsKind(SyntaxKind.
                                         SingleLineDocumentationCommentTrivia)
                             || n.IsKind(SyntaxKind.MultiLineCommentTrivia));
....

We also added the new SingleLineCommentFormatMessage and MultiLineCommentFormatMessage format messages for single-line and multi-line comments:

const string PleaseBreakUpMessage = "Please, break up it on several lines.";

string SingleLineCommentFormatMessage = 
    "Length of a comment at line {0} in file {1} exceeds {2} characters. "
  + PleaseBreakUpMessage;

string MultiLineCommentFormatMessage = 
    "Multiline comment or XML comment at line {0} in file {1} contains "
  + "individual lines that exceeds {2} characters." 
  + PleaseBreakUpMessage;

The last thing that we changed was the ApplyRule method:

void ApplyRule(SyntaxTrivia commentTrivia, StringBuilder warnings)
{
  const int MaxCommentLength = 130;

  const string PleaseBreakUpMessage = ....;

  string SingleLineCommentFormatMessage = ....;

  string MultiLineCommentFormatMessage = ....;

  switch (commentTrivia.Kind())
  {
    case SyntaxKind.SingleLineCommentTrivia:
    case SyntaxKind.SingleLineDocumentationCommentTrivia:
      {
        if (commentTrivia.ToString().Length > MaxCommentLength)
        {
          int line = commentTrivia.GetLocation().GetLineSpan()
                                  .StartLinePosition.Line + 1;

          string filePath = commentTrivia.SyntaxTree.FilePath;
          var message = String.Format(SingleLineCommentFormatMessage,
                                      line,
                                      filePath,
                                      MaxCommentLength);
          warnings.AppendLine(message);
        }
        break;
      }
    case SyntaxKind.MultiLineCommentTrivia:
      {
        var listStr = commentTrivia.ToString()
                                   .Split(new string[] { Environment.NewLine },
                                          StringSplitOptions.RemoveEmptyEntries
                                          );

        foreach (string str in listStr)
        {
          if (str.Length > MaxCommentLength)
          {
            int line = commentTrivia.GetLocation().GetLineSpan()
                                    .StartLinePosition.Line + 1;

            string filePath = commentTrivia.SyntaxTree.FilePath;
            var message = String.Format(MultiLineCommentFormatMessage,
                                        line,
                                        filePath,
                                        MaxCommentLength);

            warnings.AppendLine(message);          
          }
        }
        break;
      }  
  }
}

Now the ApplyRule method checks that single-line comments do not exceed 130 characters. In the case of multi-line comments, this method checks each comment line individually. If the condition is met, we add the corresponding message to warnings.

As a result, the Main method, which was designed to search for comments where strings exceed 130 characters, has the following code:

static void Main(string[] args)
{
  string solutionPath = @"D:\Test\TestForTrivia.sln";
  string logPath = @"D:\Test\warnings.txt";

  MSBuildLocator.RegisterDefaults();
  using (var workspace = MSBuildWorkspace.Create())
  {
    StringBuilder warnings = new StringBuilder();
    Project project = GetProjectFromSolution(solutionPath, workspace);

    foreach (var document in project.Documents)
    {
      var tree = document.GetSyntaxTreeAsync().Result;
      var comTriv = tree.GetRoot()
                        .DescendantTrivia()
                        .Where(n =>    
                                 n.IsKind(SyntaxKind.SingleLineCommentTrivia)
                              || n.IsKind( SyntaxKind
                                          .SingleLineDocumentationCommentTrivia)
                              || n.IsKind(SyntaxKind.MultiLineCommentTrivia));

      foreach (var commentTrivia in comTriv)
          ApplyRule(commentTrivia, warnings);
    }

    if (warnings.Length != 0)
      File.AppendAllText(logPath, warnings.ToString());
  }
}

Beside comments, you can also write a rule that searches for preprocessor directives. You can use the same IsKind method to determine the contents of the preprocessor directives.

methodDeclaration.DescendantTrivia()
                 .Any(trivia => trivia.IsKind(SyntaxKind.IfDirectiveTrivia));

Semantic model and symbols

In the examples above, we used syntactic trees and traversed their elements to analyzer projects. In many cases, traversing a syntax tree with CsharpSyntaxWalker is insufficient — we need to use additional methods. And here comes the semantic model. A compilation uses a syntax tree to obtain an object of the SemanticModel type. The Compilation.GetSemanticModel is used to do this. It takes an object of the SyntaxTree type as a required parameter.

A semantic model provides information about various entities: methods, local variables, fields, properties, etc. You need to compile your project without errors in order to obtain a correct semantic model.

So, to get a semantic model, we need an instance of the Compilation class. One of the ways to get a compilation object is to call the GetCompilationAsync method for the Project class instance. Earlier in this article we described how to get and use an instance of this class.

Compilation compilation = project.GetCompilationAsync().Result;

If you want to get a semantic model, call the GetSemanticModel method for the compilation object and pass an object of the SyntaxTree type:

SemanticModel model = compilation.GetSemanticModel(tree);

Another way to get a semantic model is to call the Create method from the CSharpCompilation class. We'll use this method in examples further in this article.

A semantic model provides access to the so-called symbols. They, in turn, allow you to get the information about the entity itself (be it a property, method, or something else). This information is necessary for the analysis. We can divide symbols into two categories:

symbols for getting information about the entity itself;
symbols for getting information about the entity type.

Every symbol contains the information about the type and namespace, where a particular element is defined. We can find out exactly where an element was defined: in the source code that you have access to, or in an external library. Besides, you can get information about whether the analyzed element is static, virtual, etc. All this information is provided through the ISymbol base interface functionality.

Let's use the following situation as an example. Suppose, for the analysis, you need to determine if a called method was overridden. In other words, you need to determine if the called method was marked by the override modifier during the declaration. In this case, we need a symbol:

static void Main(string[] args)
{
  string codeStr =
    @"
    using System;

    public class ParentClass
    {
      virtual public void Mehtod1()
      {
        Console.WriteLine(""Hello from Parent"");
      }
    }

    public class ChildClass: ParentClass
    {
      public override void Method1()
      {
        Console.WriteLine(""Hello from Child"");
      }
    }

    class Program
    {
      static void Main(string[] args)
        {
          ChildClass childClass = new ChildClass();
          childClass.Mehtod1();
        }
    }";

  static SemanticModel GetSemanticModelFromCodeString(string codeString)
  {
    SyntaxTree tree = SyntaxFactory.ParseSyntaxTree(codeStr);

    var msCorLibLocation = typeof(object).Assembly.Location;
    var msCorLib = MetadataReference.CreateFromFile(msCorLibLocation);

    var compilation = CSharpCompilation.Create("MyCompilation",
      syntaxTrees: new[] { tree }, references: new[] { msCorLib });

    return compilation.GetSemanticModel(tree);
  }

  var model = GetSemanticModelFromCodeString(codeStr);

  var methodInvocSyntax = model.SyntaxTree.GetRoot()
                               .DescendantNodes()
                               .OfType<InvocationExpressionSyntax>();

  foreach (var methodInvocation in methodInvocSyntax)
  {
    var methodSymbol = model.GetSymbolInfo(methodInvocation).Symbol;
    if (methodSymbol.IsOverride)
    {
      //Apply your additional logic for analyzing method.
    }
  }
}

The GetSemanticModelFromCodeString method parses codeStr passed as the codeString parameter and gets a syntax tree for it. After that an object of the CSharpCompilation type is created. This object is a result of compiling a syntax tree, that was obtained from the codeStr. We call the CSharpCompilation.Create method to run compilation. An array of syntax trees (source code to be compiled) and links to libraries are passed to this method. To compile codeStr, you need a reference only to the C# base class library - mscorlib.dll. After that, a semantic model object is returned via the CSharpCompilation.GetSemanticModel method call. A semantic model is used to get the SymbolInfo structure for the node corresponding to the method call. We have the semantic model object returned by CSharpCompilation.GetSemanticModel. This object's GetSymbolInfo method is called, with the node passed to it as a parameter. After we get SymbolInfo, we call its Symbol property. This property returns the symbol object, which contains the semantic information about the node passed to the GetSymbolInfo method. When we get the symbol, we can refer to its IsOverride property and determine if the method was obtained via the override modifier.

Some readers may suggest another way to determine whether a method is overridden – without using the semantic model:

....
var methodDeclarsSyntax = model.SyntaxTree.GetRoot()
                               .DescendantNodes()
                               .OfType<MethodDeclarationSyntax>();
....
foreach(var methodDeclaration in methodDeclarsSyntax)
{
  var modifiers = methodDeclaration.Modifiers;
  bool isOverriden =  
    modifiers.Any(modifier => modifier.IsKind(SyntaxKind.OverrideKeyword));
}

This way also works, but not in all cases. For example, if the method isn't declared in the source file for which the syntax tree was obtained, we cannot get a declaration for the necessary method. A more indicative case is when the called method was declared in an external library: in this scenario successful analysis cannot do without the semantic model.

Obtaining object information. Specifying symbol type

There are a number of derived types, from which we can get more specific information about an object. Such interfaces include IFieldSymbol, IPropertySymbol, IMethodSymbol and others. If we cast the ISymbol object to a more specific interface, we'll get access to properties specific to this interface.

For example, if we use the cast to IFieldSymbol, we can refer to the IsConst field and find out whether the node is a constant field. And if we use the IMethodSymbol interface, we can find out whether the method returns any value.

For symbols the semantic model defines the Kind property, which returns the elements of the SymbolKind enumeration. With this property we can find out what we are currently working with: a local object, a field, an assembly, etc. Also, in most cases, the value of the Kind property corresponds to a specific symbol type. This exact feature is used in the following code:

static void Main(string[] args)
{
  string codeStr =
    @"
    public class MyClass
    {
      public string MyProperty { get; }
    }

    class Program
    {
      static void Main(string[] args)
      {
        MyClass myClass = new MyClass();
        myClass.MyProperty;
      }
    }";

  ....
  var model = GetSemanticModelFromCodeString(codeStr);

  var propertyAccessSyntax = model.SyntaxTree.GetRoot().DescendantNodes()
                                  .OfType<MemberAccessExpressionSyntax>()
                                  .First();

  var symbol = model.GetSymbolInfo(propertyAccessSyntax).Symbol;
  if (symbol.Kind == SymbolKind.Property)
  {
    var pSymbol = (IPropertySymbol)symbol;

    var isReadOnly = pSymbol.IsReadOnly; //true
    var type = pSymbol.Type;             // System.String
  }
}

After we cast a symbol to IPropertySymbol, we can access properties that help to obtain additional information. Again, a simple example: MyProperty is accessed in the same source file where its declaration is located. This means you can obtain information, that the property doesn't have a setter, without using a semantic model. If the property is declared in another file or library, then the use of the semantic model is inevitable.

Obtaining object type information

When you need to obtain object type information for an object represented by a node, you can use the ITypeSymbol interface. To get it, call the GetTypeInfo method for an object of the SemanticModel type. This method returns the TypeInfo structure, that contains 2 important properties:

ConvertedType returns information about the type of the expression after the compiler performs an implicit cast. If there was no cast, the value returned is the same as that returned by the Type property;
Type returns the type of the expression represented in the node. If it is impossible to get the type of the expression, the null value is returned. If the type cannot be determined due to some error, the IErrorTypeSymbol interface is returned.

Here is an example of how you get the type of a property that is assigned a value:

static void Main(string[] args)
{  
  string codeStr =
    @"
    public class MyClass
    {
      public string MyProperty { get; set; }
    
      public MyClass(string value)
      {
        MyProperty = value;
      }
    }";
  ....

  var model = GetSemanticModelFromCodeString(codeStr);

  var assignmentExpr = model.SyntaxTree.GetRoot().DescendantNodes()
                            .OfType<AssignmentExpressionSyntax>()
                            .First();

  ExpressionSyntax left = assignmentExpr.Left;

  ITypeSymbol typeOfMyProperty = model.GetTypeInfo(left).Type;
}

If you use the ITypeSymbol interface, returned by these properties, you can get all the information about the necessary type. This information is extracted by accessing the properties, some of which are listed below:

AllInterfaces is a list of all interfaces a type implements. The interfaces implemented by base types are also taken into account;
BaseType is the base type;
Interfaces is a list of interfaces implemented directly by this type;
IsAnonymousType is information about whether a type is anonymous.

Some comments on the use of the semantic model

Accessing the semantic model during the analysis has its price. Tree traversal operations are faster than obtaining a semantic model. Therefore, if you want to get different symbols for nodes belonging to the same syntax tree, you need to get the semantic model only once. Then, if necessary, refer to the same instance of the SemanticModel class.

As additional information about using the semantic model, I also recommend using the following resources:

Learn Roslyn Now: Part 7 Introducing the Semantic Model is a great learning blog with examples of how to use Roslyn;
Introduction to Roslyn. Using static analysis tools for development is a good introduction to the general principles of Roslyn-based static analysis.

Conclusion

Well, I think the information presented here is enough to start an in-depth study of the capabilities of Roslyn. You can even write a simple – or maybe complex – static analyzer. Undoubtedly, to create serious tools, you need to take into account many different nuances and learn much more about both static analysis in general and Roslyn. This article, I hope, will be an excellent assistant at the beginning of your journey.

For a more detailed study of the Roslyn API, I advise you to study the documentation on the Microsoft website. If you want to improve, fix or study the source code of this API, then welcome to its GitHub repository. Believe me, there is still a lot to improve and fix in its API. For example, here is one article: "We check the source code of Roslyn". There we checked the Roslyn API source code with the help of the PVS-Studio static analyzer and found a lot of errors.

#CSharp #Knowledge

Tags:

#CSharp #Knowledge

Fill out the form in 2 simple steps below:

Your contact information:

Desired license type:

Creating Roslyn API-based static analyzer for C#

Static analyzers: what are they and why do we need them?

Creating an analyzer based on Visual Studio templates

Description of the "Analyzer with Code Fix (.NET Standard)" project and an example of its use

Create an analyzer based on the "Standalone Code Analysis Tool" project template

Which project type to choose for writing your own static analyzer?

Working with Roslyn

Syntax tree

Syntax nodes

Creating a diagnostic rule with CSharpSyntaxWalker

Syntax tokens

Creating a diagnostic rule that analyzes syntax tokens

Syntax trivia

Use of syntax trivia in comment analysis

Semantic model and symbols

Obtaining object information. Specifying symbol type

Obtaining object type information

Some comments on the use of the semantic model

Conclusion

Comments (0)

Want to try PVS‑Studio for free?

Achievements

PVS-Studio

Licensing

Company