As we continue developing PVS-Studio static code analyzer, we often have to check large open-source projects by renowned developers. The fact that even such projects contain a certain amount of bugs adds even more sense and weight to our work. Unfortunately, everybody makes mistakes. No matter how carefully you control your code's quality, there is just no way to avoid "human error". As long as software is developed by humans, analysis tools like PVS-Studio will remain relevant and needed. Today, we are going to discuss errors found in the source code of MSBuild, which is, unfortunately, not perfect either.
Microsoft Build Engine (MSBuild) is a platform by Microsoft for building applications. It is usually used together with Microsoft Visual Studio but does not depend on it. MSBuild provides an XML schema for project files (*.csproj, *.vbproj, *.vcxproj) that controls how the build platform processes and builds software. The tool is shipped as part of the .NET platform and is written in C#.
Microsoft provides MSBuild source files for free upload to GitHub. Taking into account the high development standards applied by Microsoft, finding bugs in MSBuild might be a hard task even for top-quality static analyzers. But success comes with tenacity. With the help of PVS-Studio, version 6.07, we have checked the source code of MSBuild project, and here's what we have found.
MSBuild consists of 14 projects, which include a total of 1256 source files in C#. That makes approximately 440,000 LOC.
PVS-Studio issued 262 warnings for this project. The general analysis statistics with differentiation of warnings across severity levels are shown in the following chart:
As you can see from the chart, the tool issued 73 high-level warnings, 107 medium-level warnings, and 82 low-level warnings. We will focus on the High and Medium levels, as they contain potentially dangerous constructs and genuine bugs, while Low-level warnings, though dealing with errors as well, often turn out to be false positives, so we don't usually discuss them in our articles.
The analysis of MSBuild has revealed that about 45% of the High- and Medium-level warnings point to incorrect code (81 errors), while the rest warnings simply refer to constructs that PVS-Studio finds suspicious, and false positives rather than real errors. Some of the warnings were triggered by unit tests or code with comments about potentially dangerous constructs used to check for exceptions. In any case, the remaining warnings need to be examined by the developers as they are the only people who have the full knowledge of the code and can reliably estimate if the warnings are correct or not.
Based on these data, the PVS-Studio ratio of High- and Medium-level errors to 1 KLOC (i.e. error density) is only 0.184 (errors per 1 KLOC). This figure is not something to be surprised at in case of Microsoft projects and is a sign of the high quality of MSBuild's code.
Now let's discuss the analysis results in detail. Note also that the job of examining all the warnings issued for this project is beyond the scope of our article, so we'll only talk about the most interesting defects, classifying them into groups.
Incorrect null check
PVS-Studio diagnostic message: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'name'. AssemblyRemapping.cs 64
This is probably a classical error: we see it in nearly every project we check. It has to do with casting a variable to a different type using the as operator and testing the same variable, instead of the resulting one, for null:
AssemblyNameExtension name = obj as AssemblyNameExtension;
if (obj == null) // <=
{
return false;
}
Instead, it is the name variable that should be checked:
AssemblyNameExtension name = obj as AssemblyNameExtension;
if (name == null)
{
return false;
}
Late null check
PVS-Studio diagnostic message: V3095 The 'diskRoots' object was used before it was verified against null. Check lines: 2656, 2659. ToolLocationHelper.cs 2656
Note the diskRoots parameter. It is an instance of the List class and can have a value of null. However, the corresponding check is done only in the second if block, after the diskRoots variable has already been used for inserting values into a list:
private static void ExtractSdkDiskRootsFromEnvironment
(List<string> diskRoots, string directoryRoots)
{
if (!String.IsNullOrEmpty(directoryRoots))
{
....
diskRoots.AddRange(splitRoots); // <=
}
if (diskRoots != null) // <=
....
}
There are 8 more potentially dangerous constructs like that in MSBuild:
Wrong assumption about string length
PVS-Studio diagnostic message: V3057 The 'Substring' function could receive the '-1' value while non-negative value is expected. Inspect the second argument. Utilities.cs 328
For the if block to execute, there must be a string consisting of one or more characters, while inside that block, the programmer attempts to get a substring from the original string. Obviously, the second parameter of the Substring method will be negative for a one-character string, so the method will throw an ArgumentOutOfRangeException:
if (toolsVersionList.Length > 0)
{
toolsVersionList = toolsVersionList.Substring(0,
toolsVersionList.Length - 2);
}
This is what a safe version of this code could look like:
if (toolsVersionList.Length > 1)
{
toolsVersionList = toolsVersionList.Substring(0,
toolsVersionList.Length - 2);
}
Other similar errors:
Type conversion with the loss of significance
PVS-Studio diagnostic message: V3041 The expression was implicitly cast from 'long' type to 'float' type. Consider utilizing an explicit type cast to avoid the loss of a fractional part. An example: double A = (double)(X) / Y;. CommunicationsUtilities.cs 593
The variables now and s_lastLoggedTicks are of long type. They take part in some computations that should yield a value of float type. However, since the division operation is done over values of type long and only then is the resulting value cast to type float, it will result in the loss of precision:
float millisecondsSinceLastLog =
(float)((now - s_lastLoggedTicks)/10000L);
Fixed code:
float millisecondsSinceLastLog =
(float)(now - s_lastLoggedTicks)/10000;
Always be careful with computations where integer and floating-point values are used together.
Method that always returns true
PVS-Studio diagnostic message: V3009 It's odd that this method always returns one and the same value of 'true'. ComReference.cs 304
The GetTypeLibNameForITypeLib method returns true no matter what the conditions are:
internal static bool GetTypeLibNameForITypeLib(....)
{
....
if (typeLib2 == null)
{
....
return true; // <=
}
....
try
{
if (data == null || ...)
{
....
return true; // <=
}
....
}
catch (COMException ex)
{
....
return true; // <=
}
return true; // <=
}
At the same time, the value of type bool returned by the GetTypeLibNameForITypeLib method is checked in the caller. Effects of such behavior are unpredictable. This code needs to be refactored and fixed.
Meaningless comparison
PVS-Studio diagnostic message: V3022 Expression 'itemsAndMetadataFound.Metadata.Values.Count > 0' is always true. Evaluator.cs 1752
After the itemsAndMetadataFound.Metadata.Values.Count > 0 expression is evaluated in the external if block, the same check, this time pointless, is done inside that block:
if (itemsAndMetadataFound.Metadata != null &&
itemsAndMetadataFound.Metadata.Values.Count > 0)
{
....
if (itemsAndMetadataFound.Metadata.Values.Count > 0) // <=
{
needToProcessItemsIndividually = true;
}
....
}
In addition, MSBuild contains 7 more issues of this kind:
Mutually exclusive comparisons
PVS-Studio diagnostic message: V3011 Two opposite conditions were encountered. The second condition is always false. Check lines: 2840, 2838. XMake.cs 2840
For the if block to execute, the logger variable must have the null value. However, this variable is again tested for null inside that block, in the VerifyThrow method. So, that second check will always be false:
if (logger == null)
{
InitializationException.VerifyThrow(logger != null, // <=
"LoggerNotFoundError", unquotedParameter);
}
I'm not sure what this code should actually look like, but certainly not like that. Perhaps the if operator is not necessary at all.
Unused arguments in string formatting methods
PVS-Studio diagnostic message: V3025 Incorrect format. A different number of format items is expected while calling 'WriteLine' function. Arguments not used: 1st. Scheduler.cs 2216
The error is lurking in the second line. The programmer must have written it by copying the first line (the infamous copy-paste) and changing the first parameter in the copied code, but they forgot to remove the second parameter, _schedulingData.EventTime.Ticks, which was no more necessary:
file.WriteLine("Scheduler state at timestamp {0}:",
_schedulingData.EventTime.Ticks);
file.WriteLine("------------------------------------------------",
_schedulingData.EventTime.Ticks); // <=
So, the method WriteLine(string format, object arg0) is overridden incorrectly in the second line.
Other similar defects:
Unused parameter
PVS-Studio diagnostic message: V3061 Parameter 'numericValue' is always rewritten in method body before being used. NodePacketTranslator.cs 320
The list of the method's formal parameters includes variable numericValue whose value is never used as it is immediately replaced with a new value:
public void TranslateEnum<T>(ref T value, int numericValue)
{
numericValue = _reader.ReadInt32(); // <=
Type enumType = value.GetType();
value = (T)Enum.ToObject(enumType, numericValue);
}
Perhaps the programmer did some refactoring, but changing the method's signature (unlike its body) was not possible. Otherwise, it is better to fix the method:
public void TranslateEnum<T>(ref T value)
{
int numericValue = _reader.ReadInt32();
Type enumType = value.GetType();
value = (T)Enum.ToObject(enumType, numericValue);
}
Another similar warning:
Redundant assignment
PVS-Studio diagnostic message: V3005 The '_nextProjectId' variable is assigned to itself. LoggingService.cs 325
The analyzer detected a construct with an extra assignment to field _nextProjectId. The result of the MaxCPUCount + 2 expression is added to the value of _nextProjectId, and then the resulting value is assigned to the same field using the += operator. After that, this value is again assigned to the _nextProjectId field:
public int NextProjectId
{
get
{
lock (_lockObject)
{
_nextProjectId = _nextProjectId += MaxCPUCount + 2; // <=
return _nextProjectId;
}
}
}
There is no error in this code, but it does look odd. The construct should be simplified:
public int NextProjectId
{
get
{
lock (_lockObject)
{
_nextProjectId += MaxCPUCount + 2;
return _nextProjectId;
}
}
}
As a conclusion, I'd like to say that even such high-quality projects as MSBuild could benefit quite a lot from regular checks of their source code for potential and actual errors by static analyzers like PVS-Studio.
Feel free to use the demo version of PVS-Studio analyzer to check this project and take a look at the warnings we have discussed, as well as to check your own projects.
0