Webinar: C++ semantics - 06.11
Many may have noticed that the PVS-Studio C# analyser uses Roslyn (.NET compiler platform) to obtain input data. Therefore, when we came across the project "Roslyn Analyzers" from Microsoft, checking it with PVS-Studio became inevitable. The project is an extension for Visual Studio, contains analytics of errors, style, and code complexity. Knowing the features of Roslyn allowed us to better understand, what Roslyn Analyzers' developers wanted to implement. So in our opinion, the check turned out to be quite engaging for our team.
The source code of Roslyn Analyzers can be downloaded from this repository. The repository also contains usage guidelines and a full description of its functionality. To check the code I used the PVS-Studio static code analyser, version 7.03.
This article is not intended to compare analysers. Well, for a number of reasons, we don't even want to write such articles. Both analysers are good in their own way and find different errors. So this is the article on the errors found in Roslyn Analyzers.
At the same time, we checked the code of PVS-Studio using Roslyn Analyzers. Nothing remarkable was found, so there is nothing to write on this topic. Of the useful, we only had recommendations to replace the equality operator (==) with Equals. In addition, we found several false positives and added exceptions to our analyser for similar patterns.
I think I must note the high quality of the Roslyn Analyzers' code. The PVS-Studio analyser issued only 31 warnings (of High certainty level) and 67 warnings (Medium certainty level) for its code per 400,000 lines of code.
It may be hard to read an article without previous experience of working with Roslyn. So I'll be doing small italic inserts explaining the platform features. Skip these places if you understand the code. If you want to understand the essence of Roslyn in depth, you're welcome to read the article: Introduction to Roslyn. Some of the inserts are copied right from this article.
PVS-Studio warning: V3127 Two similar code fragments were found. Perhaps, this is a typo and 'leadingTrivia' variable should be used instead of 'trailingTrivia' UseLiteralsWhereAppropriate.Fixer.cs 76
private async Task<Document> ToConstantDeclarationAsync(...)
{
....
if (leadingTrivia.Count == 0 && trailingTrivia.Count == 0)
{
leadingTrivia = leadingTrivia.AddRange(modifier.LeadingTrivia);
trailingTrivia = trailingTrivia.AddRange(modifier.TrailingTrivia);
}
else
{
trailingTrivia = trailingTrivia.AddRange(modifier.LeadingTrivia); // <=
trailingTrivia = trailingTrivia.AddRange(modifier.TrailingTrivia); // <=
....//here Trivia and trailingTrivia are handled
}
....
}
Trivia (additional syntax information) are those elements of the tree, which won't be compiled into IL-code. These include elements of formatting (spaces, line feed characters),comments, preprocessor directives. Are located in the tree with connection to other nods. The binding can be before the node - LeadingTrivia, or after - TrailingTrivia.
This code checks the number of elements in leadingTrivia and trailingTrivia arrays. If there are no elements - they are added in local leadingTrivia and trailingTrivia arrays. If there are elements in arrays - they are all added only in trailingTrivia (which was noticed by our analyser).
Perhaps, in the else branch the code author copied handling of the trailingTrivia array, but forgot to change the array for leadingTrivia, the same as it was made in another if branch.
On the other hand, in doing so, both lines of code would be the same and could be removed from the condition. So it's not very clear, but something's wrong with the code.
PVS-Studio warning: V3001 There are identical sub-expressions 'data1.IsReachableBlockData' to the left and to the right of the '==' operator. AnalysisEntityBasedPredicateAnalysisData.cs 39
protected AnalysisEntityBasedPredicateAnalysisData(....)
: base(....)
{
Debug.Assert(data1.IsReachableBlockData == data1.IsReachableBlockData);
....
}
Here in the condition the variable is compared to itself, which clearly doesn't make sense. In any case, in addition to editing this code, I suggest that developers of Roslyn Analyzers implement an analogue of our V3001 diagnostic (on the comparison of identical sub-expressions).
PVS-Studio warning: V3080 Possible null dereference of method return value. Consider inspecting: GetCandidateReferencedSymbols(...). SyntaxNodeHelper.cs 78
public static IEnumerable<IMethodSymbol> GetCandidateCalleeMethodSymbols(
SyntaxNode node, SemanticModel semanticModel)
{
foreach (ISymbol symbol in GetCandidateReferencedSymbols(
node, semanticModel))
{
if (symbol != null && symbol.Kind == SymbolKind.Method)
{
yield return (IMethodSymbol)symbol;
}
}
}
If we consider the method GetCandidateReferencedSymbols, we can see that it can return the null value:
public static IEnumerable<ISymbol> GetCandidateReferencedSymbols(
SyntaxNode node, SemanticModel semanticModel)
{
if (node == null)
{
return null;
}
return semanticModel.GetSymbolInfo(node).CandidateSymbols;
}
ISymbol is the base interface of the symbol, which provides methods that are common for all the objects, regardless of what they are - fields, properties or something else.
Indeed, if the node isn't assigned, null can get into the enumeration, resulting in NullReferenceException. The code can be fixed either by throwing an exception right from the method GetCandidateReferencedSymbols, or by adding a check after getting a value from it. I suggest that we choose the second, safer way:
public static IEnumerable<IMethodSymbol> GetCandidateCalleeMethodSymbols(
SyntaxNode node, SemanticModel semanticModel)
{
var candidateReferencedSymbols = GetCandidateReferencedSymbols(...);
if(candidateReferencedSymbols != null)
{
foreach (ISymbol symbol in candidateReferencedSymbols)
{
if (symbol != null && symbol.Kind == SymbolKind.Method)
yield return (IMethodSymbol)symbol;
}
}
}
PVS-Studio warning: V3125 The 'valueClauseName' object was used after it was verified against null. Check lines: 2320, 2318. DiagnosticAnalyzer.cs 2320
private SuppDiagReturnSymbolInfo SuppDiagReturnSymbol(....)
{
....
var valueClauseName = valueClauseMemberAccess.Name as IdentifierNameSyntax;
if (valueClauseName == null
|| valueClauseName.Identifier.Text != "Create")
{
ReportDiagnostic(context,
SuppDiagReturnValueRule,
valueClauseName.GetLocation(), // <=
propertyDeclaration.Identifier.Text);
return result;
}
....
}
MemberAccessExpressionSyntax is a class that reflects access to a method, property or a field of a certain element. The class has two properties: Expression (left part) and Name (right part).
The analyser noticed dereference right after checking for null. The best option is to get NullReferenceException. But those who are familiar with Roslyn might ask: what's the error? For trivial examples of fields or properties, Name will definitely always be IdentifierNameSyntax. As soon as the generic method is called, the type will become GenericNameSyntax, which cannot be cast to IdentifierNameSyntax. I'm not sure if this method can handle the call of the generic method, but I would foresee this case if I were at the developers' place.
PVS-Studio warning: V3080 Possible null dereference. Consider inspecting 'oldIdName'. CodeFixProvider.cs 1476
It's a pretty large method. No panic. You can scroll it, I'll describe the important points below.
private async Task<Document> IdDeclTypeAsync(....)
{
....
ExpressionSyntax oldIdName = null;
foreach (MemberDeclarationSyntax memberSyntax in members)
{
var fieldDeclaration = memberSyntax as FieldDeclarationSyntax;
if (fieldDeclaration == null)
continue;
if (fieldDeclaration.Declaration.Type is IdentifierNameSyntax fieldType
&& fieldType.Identifier.Text == "DiagnosticDescriptor")
{
....
for (int i = 0; i < ruleArgumentList.Arguments.Count; i++)
{
ArgumentSyntax currentArg = ruleArgumentList.Arguments[i];
string currentArgName = currentArg.NameColon.Name.Identifier.Text;
if (currentArgName == "id")
{
oldIdName = currentArg.Expression;
break;
}
}
continue;
}
....
}
var newRule = rule.ReplaceNode(oldIdName.Ancestors() // <=
.OfType<ArgumentSyntax>()
.First(), newArg);
...
}
So what's going on here: oldIdName is initialized by a null reference. The following conditions must be met to assign an object to oldIdName:
If the conditions are not favourable, NullReferenceException will be thrown when attempting to obtain Ancestors. That is, either the method crashes when calling it or the developer is confident that a declaration of this field will be in the method. For example, these conditions have been checked earlier. Or this is the method created by a code generator. In any case, this code is quite vulnerable to changes.
Ways to remedy this situations depend on what function had to the executed. It's worth adding the oldIdName check and exit, or, for example, throw an exception.
PVS-Studio warning: V3095 The 'rule' object was used before it was verified against null. Check lines: 2180, 2181. CodeFixProvider.cs 2180
internal static string GetFirstRuleName(ClassDeclarationSyntax declaration)
{
SyntaxList<MemberDeclarationSyntax> members = declaration.Members;
FieldDeclarationSyntax rule = null;
foreach (MemberDeclarationSyntax member in members)
{
rule = member as FieldDeclarationSyntax;
var ruleType = rule.Declaration.Type as IdentifierNameSyntax; // <=
if (rule != null
&& ruleType != null
&& ruleType.Identifier.Text == "DiagnosticDescriptor")
{break;}
rule = null;
}
....
}
ClassDeclarationSyntax is a class presentation in Roslyn. The property Members contains nodes of all class elements (field, property, methods, other classes and structures).
I even double-checked the behavior of Members when I saw this code. The developer was confident, that the first declaration would be a field's declaration. But in Members, elements are written in the order of their declaration in the class. The order of declarations doesn't change. So may be we'll try to get the declaration type from a non-existent field. In this case, NullRefenceException will be thrown. The developer was aware that there might not be a field and added the check... but later than it should be.
When editing the code, I'd rewrite the method using Linq.
internal static string GetFirstRuleName(ClassDeclarationSyntax declaration)
{
SyntaxList<MemberDeclarationSyntax> members = declaration.Members;
FieldDeclarationSyntax rule =
members.OfType<FieldDeclarationSyntax>()
.FirstOrDefault(x =>(x.Declaration.Type as IdentifierNameSyntax)?
.Identifier.Text == "DiagnosticDescriptor");
....
}
It looks a bit worse, but conveys the essence better.
PVS-Studio warning: V3137 The 'sourceOrigins' variable is assigned but is not used by the end of the function. TaintedDataAnalysis.TaintedDataOperationVisitor.cs 328
public override TaintedDataAbstractValue VisitArrayInitializer(
IArrayInitializerOperation operation,
object argument)
{
HashSet<SymbolAccess> sourceOrigins = null;
...
if (baseAbstractValue.Kind == TaintedDataAbstractValueKind.Tainted)
{
sourceOrigins = new HashSet<SymbolAccess>(...);
}
....
}
Actually, there is nothing to add to the message of the analyser. The field is really no longer used below in the method. No conditional compilation directives, no returns by ref. Not a single reference... it is not clear what this creature is for.
PVS-Studio warning: V3080 Possible null dereference. Consider inspecting 'methodDeclaration'. DiagnosticAnalyzer.cs 506
private bool CheckIfStatementAnalysis(...
IMethodSymbol analysisMethodSymbol)
{
var methodDeclaration = AnalysisGetStatements(analysisMethodSymbol)
as MethodDeclarationSyntax;
var body = methodDeclaration.Body as BlockSyntax;
if (body == null)
{ return false; }
....
}
The analyser warns that the AnalysisGetStatements method can return null. Let's have a look at it.
private MethodDeclarationSyntax AnalysisGetStatements(
IMethodSymbol
analysisMethodSymbol)
{
MethodDeclarationSyntax result = null;
if (analysisMethodSymbol == null)
{
return result;
}
var methodDeclaration = analysisMethodSymbol
.DeclaringSyntaxReferences[0]
.GetSyntax() as MethodDeclarationSyntax;
if (methodDeclaration == null)
{
return result;
}
return methodDeclaration;
}
MethodDeclarationSyntax is a representation of a method declaration in Roslyn. Although it is not essential here - just for the sake of satisfying possible curiosity.
If I get it right, a new entity is created here. The value of this variable doesn't change, but the variable is returned from the function twice. There is a feeling that the code is not finished.
PVS-Studio warning: V3125 The 'ifStatement' object was used after it was verified against null. Check lines: 788, 773. CodeFixProvider.cs 788
private async Task<Document> TriviaCountIncorrectAsync(
MethodDeclarationSyntax declaration)
{
SyntaxGenerator generator = SyntaxGenerator.GetGenerator(document);
....
var ifStatement = declaration.Body.Statements[2] as IfStatementSyntax;
if (ifStatement != null)
{
....
}
....
var oldBlock = ifStatement.Statement as BlockSyntax;
....
}
IfStatementSyntax is a representation of if condition in Roslyn. I'll highlight two properties - Condition, Statement. They contain representations of entry conditions and executable code when executing the condition.
If in Statement the code is in curly braces {}, the type this node will be BlockSyntax. This way, it's possible to get an array of expressions from it through the property Statements.
The analyzer triggered for ifStatement dereference without a check. Note that the needed check took place earlier along the code. I'd say, it's quite dangerous to cast the IfStatementSyntax.Statement type to BlockSyntax without checking. The fact of the matter is that the condition can be written in two ways:
if (true)
{
var A = b;
}
or as follows:
if (true)
var A = b;
When omitting curly brackets, Statement won't be of the BlockSyntax type, it'll be ExpressionStatementSyntax.
On the other hand, getting ifStatement looks as follows: declaration.Body.Statements[2], without checking the length of the Statements array. So developers are sure that there will be a condition. Perhaps, the clue to this method is in getting generator, even though it has nothing to do with ifStatement. Anyway, I think that the check is necessary, at least for a more meaningful exception.
PVS-Studio warning: V3139 Two or more case-branches perform the same actions. CodeMetricsAnalyzer.cs 251
static bool isApplicableByDefault(string ruleId, SymbolKind symbolKind)
{
switch (ruleId)
{
....
case CA1505RuleId:
switch (symbolKind)
{
case SymbolKind.NamedType:
case SymbolKind.Method:
case SymbolKind.Field:
case SymbolKind.Property:
case SymbolKind.Event:
return true;
default:
return false;
}
case CA1506RuleId:
switch (symbolKind)
{
case SymbolKind.NamedType:
case SymbolKind.Method:
case SymbolKind.Field:
case SymbolKind.Property:
case SymbolKind.Event:
return true;
default:
return false;
}
default:
throw new NotImplementedException();
}
}
Perhaps different behavior was meant for 1505 and 1506 rules. This way, we found a real error. But there is a chance that it's made intentionally to change the behavior later. Or may be the developer forgot for a second that conditions could be grouped.
Let's suggest that the code works correctly and the analyser complains only about the code style. Though we don't have diagnostics for bad style. This way, the best option to get rid of a warning, and a Copy-Paste error in code is to group the conditions:
static bool isApplicableByDefault(string ruleId, SymbolKind symbolKind)
{
switch (ruleId)
{
....
case CA1505RuleId:
case CA1506RuleId:
switch (symbolKind)
{
case SymbolKind.NamedType:
case SymbolKind.Method:
case SymbolKind.Field:
case SymbolKind.Property:
case SymbolKind.Event:
return true;
default:
return false;
}
default:
throw new NotImplementedException();
}
}
PVS-Studio warning: V3105 The 'lastField' variable was used after it was assigned through null-conditional operator. NullReferenceException is possible. UseLiteralsWhereAppropriate.cs 63
A curious case: in fact, this warning is a false positive, but when delving into the code I found another potential error.
public override void Initialize(AnalysisContext analysisContext)
{
var fieldInitializer = saContext.Operation as IFieldInitializerOperation;
analysisContext.RegisterOperationAction(saContext =>
{
var lastField = fieldInitializer?.InitializedFields.LastOrDefault();
var fieldInitializerValue = fieldInitializer?.Value;
if (fieldInitializerValue == null || lastField.IsConst ...)
return;
}
....
}
IFieldInitializerOperation interface of a field declaration. InitializedFields enables to get all declarations in case of overriding the field in an derived class, for example. Very rarely an array can be empty and most likely it is a compilation error.
This code checks condition in a way, which is tricky for our analyser in terms of its current level of development. The connection between lastField and fieldInitializerValue isn't obvious to the analyser and the warning is incorrect.
The check fieldInitializerValue == null checks lastField as well. Since we've initiated the check - let's pay attention to the call LastOrDefault. For reference types, the method might return null. The type InitializedFields - ImmutableArray<IFieldSymbol>. A developer uses the LastOrDefault method. But in case if the list of initialized fields doesn't contain a single character, we'll get a general exception NullReferenceException. I suggest using Last to get a more meaningful exception.
Roslyn Analyzers takes a curious approach to unit tests. Methods store long string literals, which contain classes for checking a certain diagnostic. I think, writing such code isn't convenient, since IntelliSence doesn't work inside literals.
I would suggest our approach instead: creating classes for each diagnostic rule. Further these classes are added in resources as files and are retrieved in tests for using specific diagnostics.
We have at least two classes for each diagnostic, with false and correct warnings (yes, special hodgie code is written there). No, we don't have vacancies of hodgie coders :). Unit tests traverse files by certain rules and notify if errors were found in false ones and there are no errors in good ones. When analyzing our unit test base, we can get more than 10,000 warnings. Sure, Roslyn Analyzers' tests might be located in a separate repository. It is also possible that a fundamentally different approach is used there. I haven't studied the insights of Roslyn Analyzers in more details.
At the moment, Roslyn Analyzers isn't the largest project from all open source static code analysers. One of the main goals of the project is usage of its diagnostics for writing own ones. In this regard, its high code quality gets even more important. I hope our article helped to make the project a little better.
For those who are choosing what static analyser to use for own project, I'd suggest using several ones. Various analysers complement each other. If the price of making an error in your project is high, its better to be insured by all possible means. However, we shouldn't forget that analysers should be up-to-date. Adding outdated analysers in a project can make it even worse, as it can give a false sense of security.
0