Posting of projects sources by Microsoft is a good reason to perform their analysis. This time is no exception and today we will look at suspicious places, found in Infer.NET code. Down with summary - get to the point!
Infer.NET is a machine learning system developed by Microsoft specialists. Project source code has become recently available on GitHub which gave rise to its check. More details about the project can be found, for example, here.
The project was checked by the PVS-Studio 6.26 static code analyzer. Let me remind you that PVS-Studio is searching for errors in C\C++\C# (and soon Java) code under Windows, Linux, macOS. C# code is analyzed only under Windows so far. You can download and try the analyzer on your project.
The check itself was quite simple and hassle-free. Before the check I downloaded the project sources from GitHub, restored the required packages (dependencies) and ensured that the project was successfully built. This is required so that the analyzer could have access to all needed information to perform a full-fledged analysis. After building in a couple of clicks, I ran the solution analysis through the PVS-Studio plugin for Visual Studio.
By the way, this is not the first project from Microsoft, checked using PVS-Studio - there were also others: Roslyn, MSBuild, PowerShell, CoreFX and others.
Note. If you or your friends are interested in Java code analysis, you can write to our support by selecting "I want the analyze Java". There is no public beta-version yet, but it's about to appear very soon. Someone in a secret lab (next door) people are actively working on it.
Nevertheless, enough of philosophical conversations - let's look at problems in code.
I suggest finding the error yourself - it's a completely possible task. I promise no burns in line with what was in the article "Top 10 Bugs in the C++ Projects of 2017". So, take your time to read the analyzer warning, given after the code fragment.
private void MergeParallelTransitions()
{
....
if ( transition1.DestinationStateIndex ==
transition2.DestinationStateIndex
&& transition1.Group ==
transition2.Group)
{
if (transition1.IsEpsilon && transition2.IsEpsilon)
{
....
}
else if (!transition1.IsEpsilon && !transition2.IsEpsilon)
{
....
if (double.IsInfinity(transition1.Weight.Value) &&
double.IsInfinity(transition1.Weight.Value))
{
newElementDistribution.SetToSum(
1.0, transition1.ElementDistribution,
1.0, transition2.ElementDistribution);
}
else
{
newElementDistribution.SetToSum(
transition1.Weight.Value, transition1.ElementDistribution,
transition2.Weight.Value, transition2.ElementDistribution);
}
....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'double.IsInfinity(transition1.Weight.Value)' to the left and to the right of the '&&' operator. Runtime Automaton.Simplification.cs 479
As you can see from the source code fragment, the method is working with a couple of variables - transition1 and transition2. The use of similar names is sometimes justifiable, but it is worth remembering that in such a case, the likelihood of accidentally making a mistake somewhere with the name increases.
So it happened when checking the numbers on infinity (double.IsInfinity). Due to the error the value of one and the same variable transition1.Weight.Value was checked twice. The variable transition2.Weight.Value in the second subexpression had to become a checked value.
Another similar suspicious code.
internal MethodBase ToMethodInternal(IMethodReference imr)
{
....
bf |= BindingFlags.Public
| BindingFlags.NonPublic
| BindingFlags.Public
| BindingFlags.Instance;
....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'BindingFlags.Public' to the left and to the right of the '|' operator. Compiler CodeBuilder.cs 194
When forming a bf variable value, the enumerator BindingFlags.Public is used twice. Either this code contains a redudant flag setting operation, or instead of second use of BindingFlags.Public another enumerator has to take place here.
By the way, this code is written in a single line in source codes. It seems to me that if it is formatted in a tabular style (as here), it's easier to find a problem.
Let's move on. I'm citing the whole method body and suggest again finding an error (or errors) yourself.
private void ForEachPrefix(IExpression expr,
Action<IExpression> action)
{
// This method must be kept consistent with GetTargets.
if (expr is IArrayIndexerExpression)
ForEachPrefix(((IArrayIndexerExpression)expr).Target,
action);
else if (expr is IAddressOutExpression)
ForEachPrefix(((IAddressOutExpression)expr).Expression,
action);
else if (expr is IPropertyReferenceExpression)
ForEachPrefix(((IPropertyReferenceExpression)expr).Target,
action);
else if (expr is IFieldReferenceExpression)
{
IExpression target = ((IFieldReferenceExpression)expr).Target;
if (!(target is IThisReferenceExpression))
ForEachPrefix(target, action);
}
else if (expr is ICastExpression)
ForEachPrefix(((ICastExpression)expr).Expression,
action);
else if (expr is IPropertyIndexerExpression)
ForEachPrefix(((IPropertyIndexerExpression)expr).Target,
action);
else if (expr is IEventReferenceExpression)
ForEachPrefix(((IEventReferenceExpression)expr).Target,
action);
else if (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
else if (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
else if (expr is IMethodInvokeExpression)
ForEachPrefix(((IMethodInvokeExpression)expr).Method,
action);
else if (expr is IMethodReferenceExpression)
ForEachPrefix(((IMethodReferenceExpression)expr).Target,
action);
else if (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
else if (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
else if (expr is IDelegateInvokeExpression)
ForEachPrefix(((IDelegateInvokeExpression)expr).Target,
action);
action(expr);
}
Found it? Let's check it!
PVS-Studio warnings:
Let's simplify the code so that problems could become more apparent.
private void ForEachPrefix(IExpression expr,
Action<IExpression> action)
{
if (....)
....
else if (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
else if (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action);
....
else if (expr is IUnaryExpression)
ForEachPrefix(((IUnaryExpression)expr).Expression,
action);
else if (expr is IAddressReferenceExpression)
ForEachPrefix(((IAddressReferenceExpression)expr).Expression,
action)
....
}
Conditional expressions and then-branches of several if statements are duplicated. Perhaps, this code was written by copy-paste method, which led to a problem. Now it turns out that then-branches of duplicates will never be executed because:
Since then-branches contain the same actions, now it looks like redundant code, which is confusing. Perhaps, there is a different kind of problem here - other checks had to be run instead of duplicates.
Let's continue.
public int Compare(Pair<int, int> x, Pair<int, int> y)
{
if (x.First < y.First)
{
if (x.Second >= y.Second)
{
// y strictly contains x
return 1;
}
else
{
// No containment - order by left bound
return 1;
}
}
else if (x.First > y.First)
{
if (x.Second <= y.Second)
{
// x strictly contains y
return -1;
}
else
{
// No containment - order by left bound
return -1;
}
}
....
}
PVS-Studio warnings:
The code looks very suspicious, because it contains two conditional statements with identical bodies of then and else-branches. Probably in both cases, it's worth returning different values. On the other hand, if it is conceived behavior it will be useful to remove redundant conditional statements.
I came across some more interesting loops. The example is given below:
private static Set<StochasticityPattern>
IntersectPatterns(IEnumerable<StochasticityPattern> patterns)
{
Set<StochasticityPattern> result
= new Set<StochasticityPattern>();
result.AddRange(patterns);
bool changed;
do
{
int count = result.Count;
AddIntersections(result);
changed = (result.Count != count);
break;
} while (changed);
return result;
}
PVS-Studio warning: V3020 An unconditional 'break' within a loop. Compiler DefaultFactorManager.cs 474
Due to unconditional break statement, exactly one loop iteration is executed, and a control changed variable is not even used. Generally speaking, the code looks strange and suspicious.
The same method (exact copy) took place in another class. Corresponding analyzer warning: V3020 An unconditional 'break' within a loop. Visualizers.Windows FactorManagerView.cs 350
By the way, I stumbled upon an unconditional continue statement in a loop (the analyzer found it by the same diagnostic), but above that there was the comment stating that it was a special temporary solution:
// TEMPORARY
continue;
Let me remind you that there were no such comments next to the unconditional break statement.
Let's move on.
internal static DependencyInformation GetDependencyInfo(....)
{
....
IExpression resultIndex = null;
....
if (resultIndex != null)
{
if (parameter.IsDefined(
typeof(SkipIfMatchingIndexIsUniformAttribute), false))
{
if (resultIndex == null)
throw new InferCompilerException(
parameter.Name
+ " has SkipIfMatchingIndexIsUniformAttribute but "
+ StringUtil.MethodNameToString(method)
+ " has no resultIndex parameter");
....
}
....
}
....
}
PVS-Studio warning: V3022 Expression 'resultIndex == null' is always false. Compiler FactorManager.cs 382
I'd like to note straight away that between the declaration and the given check, the value of the resultIndex variable might change. However, between checks resultIndex != null and resultIndex == null the value can not change. Therefore, the result of the expression resultIndex == null will always be false, and thus an exception will never be generated.
I hope that you have the interest to independent search for bugs even without my suggestions to find the problem, but just in case, I'll suggest doing it once again. The method code is small, I'll cite it entirely.
public static Tuple<int, string> ComputeMovieGenre(int offset,
string feature)
{
string[] genres = feature.Split('|');
if (genres.Length < 1 && genres.Length > 3)
{
throw
new ArgumentException(string.Format(
"Movies should have between 1 and 3 genres; given {0}.",
genres.Length));
}
double value = 1.0 / genres.Length;
var result
= new StringBuilder(
string.Format(
"{0}:{1}",
offset + MovieGenreBuckets[genres[0]],
value));
for (int i = 1; i < genres.Length; ++i)
{
result.Append(
string.Format(
"|{0}:{1}",
offset + MovieGenreBuckets[genres[i].Trim()],
value));
}
return
new Tuple<int, string>(MovieGenreBucketCount, result.ToString());
}
Let's see what is happening here. The input string is parsed by the character '|'. If the length of the array does not match the expected one, an exception has to be generated. Wait a second ... genres.Length < 1 && genres.Length > 3 ? Since there is no number that would suit both range of values required by the expression ([int.MinValue..1) and (3..int.MaxValue]), the result of the expression will always be false. Therefore, this check protects from nothing and an expected exception will not be thrown.
This is what the analyzer prevents us about: V3022 Expression 'genres.Length < 1 && genres.Length > 3' is always false. Probably the '||' operator should be used here. Evaluator Features.cs 242
I came across a suspicious division operation.
public static void CreateTrueThetaAndPhi(....)
{
....
double expectedRepeatOfTopicInDoc
= averageDocLength / numUniqueTopicsPerDoc;
....
int cnt = Poisson.Sample(expectedRepeatOfTopicInDoc);
....
}
PVS-Studio warning: V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. LDA Utilities.cs 74
Here's what suspicious in this place: integer division is executed (variables averageDocLength and numUniqueTopicsPerDoc are of the int type), but the result is written in a variable of the double type. Which begs the question: was it made deliberately or was division of real numbers implied? If the variable expectedRepeatOfTopicInDoc was of the int type, this would disallow possible issues.
In other places the method Poisson.Sample, the argument of which is a suspicious variable expectedRepeatOfTopicInDoc, is used, for example, as described below.
int numUniqueWordsPerTopic
= Poisson.Sample((double)averageWordsPerTopic);
averageWordsPerTopic is of the int type which is cast to double in the place of its using.
And here's another place of using:
double expectedRepeatOfWordInTopic
= ((double)numDocs) * averageDocLength / numUniqueWordsPerTopic;
....
int cnt = Poisson.Sample(expectedRepeatOfWordInTopic);
Note that variables have the same names as in the original example, only for the initialization of expectedRepeatOfWordInTopic division of real numbers is used (due to an explicit numDocs casting to the double type).
Overall, the starting source code fragment mentioned above highlighted by the analyzer with a warning is worth looking at.
Let's leave reflections on whether fix this or not to code authors (they know better), and we'll go further. To the next suspicious division.
public static NonconjugateGaussian BAverageLogarithm(....)
{
....
double v_opt = 2 / 3 * (Math.Log(mx * mz / Ex2 / 2) - m);
if (v_opt != v)
{
....
}
....
}
PVS-Studio warning: V3041 The expression was implicitly cast from 'int' type to 'double' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. Runtime ProductExp.cs 137
The analyzer again found a suspicious operation of integer division, as 2 and 3 are integer numeric literals, and the result of the expression 2 / 3 will be 0. As a result, the expression looks as follows:
double v_opt = 0 * expr;
You must admit, it's a bit strange. Several times I was getting back to this warning, trying to find a trick without seeking to add it to the article. The method is filled with math and formulas (the dismantling of which, frankly speaking, was not vey captivating), there's much to expect from here. Besides, I try to be as much as skeptical as possible to the warnings, which I include in the article and describe them only having preliminary deeply studied them.
Then it dawned on me-why do you need such a multiplier as 0, written as 2 / 3? Therefore, this place is, in any case, worth looking at.
public static void
WriteAttribute(TextWriter writer,
string name,
object defaultValue,
object value,
Func<object, string> converter = null)
{
if ( defaultValue == null && value == null
|| value.Equals(defaultValue))
{
return;
}
string stringValue = converter == null ? value.ToString() :
converter(value);
writer.Write($"{name}=\"{stringValue}\" ");
}
PVS-Studio warning: V3080 Possible null dereference. Consider inspecting 'value'. Compiler WriteHelpers.cs 78
Quite a fair analyzer warning based on a condition. Null reference dereferencing might occur in the expression value.Equals(defaultValue), if value == null. Since this expression is the right operand of the operator ||, for it's evaluation the left operand must have false value, and for this purpose it is enough to at least one of the variables defaultValue \ value to be not equal to null. In the end, if defaultValue != null, and value == null:
Let's see another case:
public FeatureParameterDistribution(
GaussianMatrix traitFeatureWeightDistribution,
GaussianArray biasFeatureWeightDistribution)
{
Debug.Assert(
(traitFeatureWeightDistribution == null &&
biasFeatureWeightDistribution == null)
||
traitFeatureWeightDistribution.All(
w => w != null
&& w.Count == biasFeatureWeightDistribution.Count),
"The provided distributions should be valid
and consistent in the number of features.");
....
}
PVS-Studio warning: V3080 Possible null dereference. Consider inspecting 'traitFeatureWeightDistribution'. Recommender FeatureParameterDistribution.cs 65
Let's omit extra strings, having left only the logic of evaluating boolean value to make it easier to sort out:
(traitFeatureWeightDistribution == null &&
biasFeatureWeightDistribution == null)
||
traitFeatureWeightDistribution.All(
w => w != null
&& w.Count == biasFeatureWeightDistribution.Count)
Again, the right operand of the operator || is evaluated only if the result of evaluating the left one is false. The left operand can take the false value, including when traitFeatureWeightDistribution == null and biasFeatureWeightDistribution != null. Then the right operand of the operator || will be evaluated, and calling traitFeatureWeightDistribution.All will lead to throwing of ArgumentNullException.
Another interesting piece of code:
public static double GetQuantile(double probability,
double[] quantiles)
{
....
int n = quantiles.Length;
if (quantiles == null)
throw new ArgumentNullException(nameof(quantiles));
if (n == 0)
throw new ArgumentException("quantiles array is empty",
nameof(quantiles));
....
}
PVS-Studio warning: V3095 The 'quantiles' object was used before it was verified against null. Check lines: 91, 92. Runtime OuterQuantiles.cs 91
Note that the quantiles.Length property is accessed, and then quantiles is checked for equality to null. In the end, if quantiles == null, the method will throw an exception, but the incorrect one and in a wrong place. Probably, the lines were inverted.
If you have successfully coped yourself with finding errors listed earlier, I suggest making yourself a cup of coffee and trying to repeat a heroic deed, having found an error in the below method. To make it a bit more interesting, I will cite the method code entirely.
(Click on the image to enlarge it)
Okay, okay, it was a joke (or you did it?!). Let's make the task simpler:
if (sample.Precision < 0)
{
precisionIsBetween = true;
lowerBound = -1.0 / v;
upperBound = -mean.Precision;
}
else if (sample.Precision < -mean.Precision)
{
precisionIsBetween = true;
lowerBound = 0;
upperBound = -mean.Precision;
}
else
{
// in this case, the precision should NOT be in this interval.
precisionIsBetween = false;
lowerBound = -mean.Precision;
lowerBound = -1.0 / v;
}
Is it better? The analyzer issued the following warning on this code: V3008 The 'lowerBound' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 324, 323. Runtime GaussianOp.cs 324
Indeed, in the last else-branch the value of the lowerBound variable is assigned twice in a row. Apparently (judging from the code above), the upperBound variable should participate in one of the assignments.
Let's move on.
private void WriteAucMatrix(....)
{
....
for (int c = 0; c < classLabelCount; c++)
{
int labelWidth = labels[c].Length;
columnWidths[c + 1] =
labelWidth > MaxLabelWidth ? MaxLabelWidth : labelWidth;
for (int r = 0; r < classLabelCount; r++)
{
int countWidth = MaxValueWidth;
if (countWidth > columnWidths[c + 1])
{
columnWidths[c + 1] = countWidth;
}
}
....
}
PVS-Studio warning: V3081 The 'r' counter is not used inside a nested loop. Consider inspecting usage of 'c' counter. CommandLine ClassifierEvaluationModule.cs 459
Note that the inner loop counter - r is not used in the body of this loop. Because of this, it turns out that in all iterations of the inner loop the same operations with the same elements are executed - in indexes the counter of external loop (c) is also used, not the one of the inner loop (r).
Let's see other interesting issues.
public RegexpFormattingSettings(
bool putOptionalInSquareBrackets,
bool showAnyElementAsQuestionMark,
bool ignoreElementDistributionDetails,
int truncationLength,
bool escapeCharacters,
bool useLazyQuantifier)
{
this.PutOptionalInSquareBrackets = putOptionalInSquareBrackets;
this.ShowAnyElementAsQuestionMark = showAnyElementAsQuestionMark;
this.IgnoreElementDistributionDetails =
ignoreElementDistributionDetails;
this.TruncationLength = truncationLength;
this.EscapeCharacters = escapeCharacters;
}
PVS-Studio warning: V3117 Constructor parameter 'useLazyQuantifier' is not used. Runtime RegexpFormattingSettings.cs 38
In the constructor one parameter is not used - useLazyQuantifier. It looks particularly suspicious in light that in a class, a property is defined with an appropriate name and type - UseLazyQuantifier. Apparently, one forgot to carry its initialization through the corresponding parameter.
I also met several potentially dangerous event handlers. An example of one of them is given below:
public class RecommenderRun
{
....
public event EventHandler Started;
....
public void Execute()
{
// Report that the run has been started
if (this.Started != null)
{
this.Started(this, EventArgs.Empty);
}
....
}
....
}
PVS-Studio warning: V3083 Unsafe invocation of event 'Started', NullReferenceException is possible. Consider assigning event to a local variable before invoking it. Evaluator RecommenderRun.cs 115
The fact of the matter is that between the check for null inequality and handler calling, event unsubscription can occur if, at the time between testing for null and calling the event handlers, the event will not have subscribers, an exception NullReferenceException will be thrown. To avoid such problems, you can, for example, retain a reference to the chain of delegates into a local variable, or use the '?.' operator to invoke handlers.
Apart from the above code fragment, 35 other such places were found.
By the way, 785 V3024 warnings took place. The V3024 warning is issued when comparing real numbers with using of operators '!=' or '=='. I will not dwell on why such comparisons are not always correct. More about this is written in the documentation, there is also a link to Stack Overflow.
Taking into account the fact that formulas and calculations were often met, those warnings could be important even being placed at the 3rd level (as they are hardly relevant in all projects).
If you are sure in these warnings' irrelevance, you can remove them almost with one click, reducing the total number of analyzer triggerings.
Somehow, it so happens that I haven't written an article for long about projects checks, so I was pleased to be involved in this process again. I hope you learnt something new\useful from this article, or at least read it with interest.
I wish developers quick fixing of problem places and I'd like to remind that making mistakes is OK, as we are humans. That's why we need extra tools like static analyzers to find what was missed by a person, right? Anyway, good luck with your project and thank you for your work!
In addition, remember that the maximum use from the static analyzer is gotten when its regular use.
All the best!
0