Webinar: Parsing C++ - 10.10
It's very interesting to check large projects. As a rule, we do manage to find unusual and peculiar errors, and tell people about them. Also, it's a great way to test our analyzer and improve all its different aspects. I've long been waiting to check 'Mono'; and finally, I got the opportunity. I should say that this check really proved its worth as I was able to find a lot of entertaining things. This article is about the bugs we found, and several nuances which arose during the check.
Mono is a project for creating a full-fledged implementation of .NET Framework which is free, and open source. The main developer of Mono - Xamarin Corporation, previously Novell.
Mono is a set of tools, including a C# compiler, the implementation environment .NET-mono (with JIT support) and mint (without JIT support), a debugger, a set of libraries, including the implementations of WinForms, ADO.NET and ASP.NET, also the compilers smcs (to create applications for Moonlight), and vbc (for applications, written on VB.NET).
In the scope of this project there are also bindings for the graphics library GTK+ to the .NET platform.
The source code is available from the repository on GitHub. The number of lines of code for analysis from the repository, downloaded from GitHub was about 3.6 million (excluding empty lines). Such a large codebase looks very attractive - errors must definitely be hiding somewhere there. On the other hand, the analysis of such a large project would be useful for the analyzer itself, as it will serve as a great stress-test.
The analysis tool - PVS-Studio static code analyzer. At this point the analyzer has more than 100 diagnostic rules, each of them is described in the documentation, containing information about the error, possible consequences and ways to correct it. Since the time of release, we managed to check a large number of various projects written in C#, such as Roslyn, Xamarin.Forms, Space Engineers, CoreFX, Code Contracts, and others (you can have a look at the full list via this link)
The analyzer itself is available via this link. The trial version should be enough to estimate the full value of this tool. If you are interested in this tool, you can write to us and we will provide a key for a closer acquaintance with the tool, and help you to set it up.
I also want to note that in the article, there were no errors from the files containing any mentions of Microsoft corporation. It is done mostly to avoid duplicating these errors, with others described in other articles. In any case, we've got enough material.
As always, this article doesn't contain all errors, as it would make it too large. I've tried to pick the most interesting fragments, but a lot of them still remained outside the scope of this article. Don't think that I want to blame the authors of 'Mono' for something. The number is so big, due to the project size, which is logical. Nevertheless, it would be great to fix the ones found, and to avoid new bugs getting into the code. Implementation of static analysis would be of great help in this. More details can be found in the corresponding section.
In an ideal world, checking a project and writing an article is carried out according to the following scenario: find a project -> build it -> run the analyzer on it -> find a decent amount of bugs -> write an article. Everyone is happy: we put a tick next to the checked project, people are reading a new article, the developers learned about bugs in the code, the author is praised for a good job.
Unfortunately, our world is not perfect. Quite often problems occur at various stages of this process. If there is a detailed manual of how to build the project, or it can be done ourselves - great! Then we can safely proceed with checking the project and writing an article. Otherwise we have a massive headache. That's exactly what happened with 'Mono'. The solution net_4_x.sln, combining C# projects doesn't get compiled "from the box" (i.e. right after downloading it from the repository). One of the building scripts was working incorrectly (there was a wrong path (possibly due to the fact that the hierarchy of directories was changed over time)), but fixing the path was also of no help.
Of course, I didn't want to give up; so I experimented with the compilation, even in my free time. But it didn't bring much result. Finally, having spent quite a number of hours on it, we decided to write the article "as is".
From time to time I state in the articles that the project should be compiled for proper analysis - with all the dependencies, without any bugs, and so on. As a rule, I try doing it this way; but there are always exceptions to the rule, as in this case, for example.
Of course, it's a bad idea to check an uncompiled project for several reasons:
Nevertheless, there were many suspicious fragments, some of which will be described below.
Lately, we try providing detailed statistics about the checked project: the total number of warnings, the number of false positives, and real mistakes.
Unfortunately, this time I can't bring such statistics. Firstly, there is a lot of code, as well as warnings. If the number of warnings to be analyzed is a few dozen, then they can be viewed and given a rough estimate. When the number of warnings is several hundred, then the analysis task becomes something far from trivial.
Secondly, this statistic may vary for a fully compiled project: the number can either increase or decrease. The analyzer may get more semantic information in a compiled project, which means that it can perform a more in-depth analysis (false positives will disappear, new warnings will be displayed). For those who are interested in how the semantic information affects the analysis, and which principles rely in its work, I suggest reading the article "Introduction to Roslyn. Using static analysis tools for program development".http://www.viva64.com/en/b/0399/
But you probably can't wait to see which interesting things can be found in the code project? Well, let's look at some code fragments.
The same subexpressions within a single expression
This is one of the most widespread mistakes; there are plenty of reasons for this. This could be copy-paste, similar variable names, overuse of IntelliSense, and simple inattentiveness. The programmer was distracted for a second - and so he made a mistake.
public int ExactInference (TypeSpec u, TypeSpec v)
{
....
var ac_u = (ArrayContainer) u;
var ac_v = (ArrayContainer) v;
....
var ga_u = u.TypeArguments;
var ga_v = v.TypeArguments;
....
if (u.TypeArguments.Length != u.TypeArguments.Length) // <=
return 0;
....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'u.TypeArguments.Length' to the left and to the right of the '!=' operator. generic.cs 3135
Now, when the method code can not be simplified, it won't be hard to notice an error in the if statement - the parameter v, not u, should be used as an instance of the TypeSpec type. Perhaps the error was due to the fact that the characters u and v look quite the similar, and it's easy to confuse them if the person isn't focused on this expression.
The rest of the code was given to underline that these parameters are usually used together,
if (u.TypeArguments.Length != v.TypeArguments.Length)
return 0;
A case that is also of interest:
bool BetterFunction (....)
{
....
int j = 0;
....
if (!candidate_params &&
!candidate_pd.FixedParameters [j - j].HasDefaultValue) { // <=
return true;
}
....
if (!candidate_pd.FixedParameters [j - 1].HasDefaultValue &&
best_pd.FixedParameters [j - 1].HasDefaultValue)
return true;
if (candidate_pd.FixedParameters [j - 1].HasDefaultValue &&
best_pd.HasParams)
return true;
....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'j' to the left and to the right of the '-' operator. ecore.cs 4832
The programmer made a mistake having written an expression j - j in one of the expressions for index evaluation. Thus there will be an access to the first element of the array. If it is exactly what is needed here, it would be more logical to use an integer literal, equal to 0. Other accesses by the index to this array: j - 1 prove the fact that it's a bug. Again I can suppose that the bug wasn't noticed because of some similarity in the characters j and 1, so it can go unnoticed, when looking quickly through the code.
Note by a colleague Andrey Karpov. When I was reading the draft of this article, I was about to make a mark that Sergey sited a wrong code fragment. I was looking at the code and didn't see the error. Only when I started reading the description, I got the idea. I confirm that this typo is very difficult to notice. Our PVS-Studio is awesome!
Let's continue blowing our minds:
internal void SetRequestLine (string req)
{
....
if ((ic >= 'A' && ic <= 'Z') ||
(ic > 32 && c < 127 && c != '(' && c != ')' && c != '<' &&
c != '<' && c != '>' && c != '@' && c != ',' && c != ';' &&
c != ':' && c != '\\' && c != '"' && c != '/' && c != '[' &&
c != ']' && c != '?' && c != '=' && c != '{' && c != '}'))
continue;
....
}
PVS-Studio warning: V3001 There are identical sub-expressions 'c != '<'' to the left and to the right of the '&&' operator. HttpListenerRequest.cs 99
The subexpression c != '<' is written twice within the expression. This is probably just an extra comparison.
protected virtual bool ShouldSerializeLinkHoverColor ()
{
return grid_style.LinkHoverColor != grid_style.LinkHoverColor;
}
PVS-Studio warning: V3001 There are identical sub-expressions 'grid_style.LinkHoverColor' to the left and to the right of the '!=' operator. DataGrid.cs 2225
I didn't have to simplify the code to make the error more obvious. Two similar subexpressions are involved in the comparison - grid_style.LinkHoverColor.
I.e., the code was probably meant to be like this:
protected virtual bool ShouldSerializeLinkHoverColor ()
{
return grid_style.LinkHoverColor != default_style.LinkHoverColor;
}
Why this way? In the code above there is a number of methods, where various properties of grid_style are compared with the properties of the object default_style. But in the latter case, the programmer let his guard down and made a mistake. Hmm... a last line effect?
Well, these errors are just classic:
static bool AreEqual (VisualStyleElement value1,
VisualStyleElement value2)
{
return
value1.ClassName == value1.ClassName && // <=
value1.Part == value2.Part &&
value1.State == value2.State;
}
PVS-Studio warning: V3001 There are identical sub-expressions 'value1.ClassName' to the left and to the right of the '==' operator. ThemeVisualStyles.cs 2141
The subexpression value1.ClassName was accidentally compared with itself. Of course, in the second case the object value2 should be used.
I think if we use table formatting for the alignment of the code, the mistake will be more difficult to notice. It's good way to prevent such typos from getting to the code:
static bool AreEqual (VisualStyleElement value1,
VisualStyleElement value2)
{
return
value1.ClassName == value1.ClassName
&& value1.Part == value2.Part
&& value1.State == value2.State;
}
The code, formatted in such a way is much easier to read, and makes it easier to notice that there is something wrong with one of the columns. See chapter 13 from the book "The Ultimate Question of Programming, Refactoring, and Everything" for more details.
The other suspicious fragments, detected by a diagnostic rule V3001 are given in the file.
Similar conditions in the construction 'else if'
enum TitleStyle {
None = 0,
Normal = 1,
Tool = 2
}
internal TitleStyle title_style;
public Point MenuOrigin {
get {
....
if (this.title_style == TitleStyle.Normal) { // <=
pt.Y += caption_height;
} else if (this.title_style == TitleStyle.Normal) { // <=
pt.Y += tool_caption_height;
}
....
}
PVS-Studio warning: V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 597, 599. Hwnd.cs 597
The same expression this.title_style == TitleStyle.Normal is checked twice. Apparently, this code has an error. In spite of the value of the expression, given above, the expression pt.Y += tool_caption_height will never be executed. I can suggest that in the second case the programmer intended to compare the field title_style with the constant TitleStyle.Tool.
The same bodies 'if-then' and 'if-else'
public static void DrawText (....)
{
....
if (showNonPrint)
TextRenderer.DrawTextInternal (g, text, font,
new Rectangle (new Point ((int)x, (int)y), max_size), color,
TextFormatFlags.NoPadding | TextFormatFlags.NoPrefix, false);
else
TextRenderer.DrawTextInternal (g, text, font,
new Rectangle (new Point ((int)x, (int)y), max_size), color,
TextFormatFlags.NoPadding | TextFormatFlags.NoPrefix, false);
....
}
PVS-Studio warning: V3004 The 'then' statement is equivalent to the 'else' statement. System.Windows.Forms-net_4_x TextBoxTextRenderer.cs 79
The static method DrawTextInternal of the TextRenderer class with the same arguments will be called regardless of the value of the variable showNonPrint. It is possible that the mistake was made due to the use of copy-paste. The method call was copied, but the arguments remained forgotten.
The return value of a method is not used
public override object ConvertTo(.... object value,
Type destinationType)
{
....
if (destinationType == typeof(string)) {
if (value == null) {
return String.Empty;
}
else {
value.ToString();
}
}
....
}
PVS-Studio warning: V3010 The return value of function 'ToString' is required to be utilized. ColumnTypeConverter.cs 91
It's quite an interesting error, with apparently far-reaching consequences. You can see from the code that there is a type check, and if the type is string, then there is verification against null. Then starts the most interesting part; if the value reference has a null value, then the empty string returns, otherwise... Most likely, it was expected that the program would return a string object representation, but there is no return statement. Therefore, the return value of the method ToString() will not be used in any way, and the ConvertTo method will be executed further on. Thus, due to the forgotten return statement, the whole logic of the program has been changed. I assume that the correct version of the code should look like this:
if (value == null) {
return String.Empty;
}
else {
return value.ToString();
}
You will find out later the bug that we mean here
Usually I simplify the methods, so that the mistake is easier to see. Let's play a game this time. Find an error in the following code fragment. To make it more interesting, I will not tell you the type of error, and won't simplify the code (I already give here only part of the method).
You can click on the picture to enlarge it.
Well, how's it going? For some reason I think that most people haven't even tried. But I won't tease you anymore.
PVS-Studio warning: V3012 The '?:' operator, regardless of its conditional expression, always returns one and the same value: Color.FromArgb (150, 179, 225). ProfessionalColorTable.cs 258
Here it is, the unfortunate ternary operator:
button_pressed_highlight = use_system_colors ?
Color.FromArgb (150, 179, 225) :
Color.FromArgb (150, 179, 225);
Regardless of the value of the variable use_system_colors, the object button_pressed_highlight will be assigned the same value. If you think that such errors can sometimes be difficult to track, I suggest looking at the whole file (ProfessionalColorTable.cs) and understand that such errors are not just hard to keep track of yourself - it is simply impossible.
There were quite a number of similar fragments (as many as 32), which makes me doubt that it is a real bug, but some intended action. Nevertheless, the code looks strange, so I would suggest rechecking it. Even if this is not an error, but expected logic, it would be much easier to use simple assignment, rather than to write strange confusing ternary operators. The other V3012 warnings are given in the file.
Using a counter of a different loop
public override bool Equals (object obj)
{
if (obj == null)
return false;
PermissionSet ps = (obj as PermissionSet);
if (ps == null)
return false;
if (state != ps.state)
return false;
if (list.Count != ps.Count)
return false;
for (int i=0; i < list.Count; i++) {
bool found = false;
for (int j=0; i < ps.list.Count; j++) {
if (list [i].Equals (ps.list [j])) {
found = true;
break;
}
}
if (!found)
return false;
}
return true;
}
PVS-Studio warning: V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' corlib-net_4_x PermissionSet.cs 607
The exit condition from the nested loop i < ps.list.Count looks suspicious. The variable j works here as a loop counter, but in the exit condition the variable i is used as a counter of an outer loop.
The intention of the author of the code is quite understandable - to check that the collections contain the same elements. But if an element from the list collection isn't on the ps.list, then the exit from the nested loop won't be done with the help of a break operator. At the same time the variable i doesn't change inside this loop, i.e. the expression i < ps.list.Count will always have a true value. As a result, the loop will be executed until there is a collection index out of bound (because of the constant increment of the j counter).
Verification against null of a wrong reference after its casting with the usage of an as operator
It turned out that this is a typical error for C#. We find it in almost every project that we write an article about. As a rule, V3019 detects cases of the following kind:
var derived = base as Derived;
if (base == null) {
// do something
}
// work with 'derived' object
The check base == null will save only if base really has a null value, and then it doesn't matter if we can do the casting or not. Apparently, the check of the derived reference was meant here. Then, if base != null, and the program failed to do the casting, but further on there is handling with members of the derived object, we'll get an exception of NullReferenceException type.
Moral: If you use this pattern, make sure that you verify a proper reference against null.
But this is all theory. Let's see what we managed to find in practice:
public override bool Equals (object o)
{
UrlMembershipCondition umc = (o as UrlMembershipCondition);
if (o == null)
return false;
....
return (String.Compare (u, 0, umc.Url, ....) == 0); // <=
}
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'umc'. UrlMembershipCondition.cs 111
This pattern is just the same as the one described above. If the type of the object o isn't compatible with the type UrlMembershipCondition, and at the same time the o object isn't null, then upon the attempt to access the property umc.Url, we'll have the exception NullReferenceException.
Thus, to fix the error, we need to correct the check:
if (umc == null)
return false;
Take a look at another bungle:
static bool QSortArrange (.... ref object v0, int hi,
ref object v1, ....)
{
IComparable cmp;
....
cmp = v1 as IComparable;
if (v1 == null || cmp.CompareTo (v0) < 0) {
....
}
....
}
PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'v1', 'cmp'. Array.cs 1487
This situation is similar to that described above. The only difference - in the case of unsuccessful behavior, the exception NullReferenceException will be generated right away - right during the check of the expression.
The situation is quite the same in several other fragments, so, I'll provide 12 more warnings in a text file.
Unconditional exception throw
public void ReadEmptyContent(XmlReader r, string name)
{
....
for (r.MoveToContent();
r.NodeType != XmlNodeType.EndElement;
r.MoveToContent())
{
if (r.NamespaceURI != DbmlNamespace)
r.Skip();
throw UnexpectedItemError(r); // <=
}
....
}
PVS-Studio warning: V3020 An unconditional 'throw' within a loop. System.Data.Linq-net_4_x XmlMappingSource.cs 180
During the first iteration we'll get the exception UnexpectedItemError generated. At least, it looks strange. By the way, Visual Studio highlights an object r in the section where there is a change in the loop counter, with a hint about unreachable code. Perhaps, the author of the code just didn't use Visual Studio or didn't notice the warnings, so the bug remained in the code.
Suspicious 'if' statements
Quite often we see errors when there are two similar 'if' statements in the method, and the value of the objects used in the conditional expressions of these statements aren't changed. If any of these conditional expressions is true, then the body of the method body will be exited. Thus, the second 'if' will never be executed. Let's look at a code fragment, which contains just such an error:
public int LastIndexOfAny (char [] anyOf, int startIndex, int count)
{
....
if (this.m_stringLength == 0)
return -1;
....
if (this.m_stringLength == 0)
return -1;
....
}
PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless corlib-net_4_x String.cs 287
The method execution will never reach the second if statement given in this fragment, because if this.m_stringLength == 0, then the exit will be done upon the execution of the first conditional statement. We could justify the code, if the value of the field m_stringLength changed, but it's not so.
The consequences of the bug depend on the reason why it appeared:
An example of a more serious case can be seen in the following code fragment (click on the image to enlarge):
Of course, it's not hard to find an error in this code. Just kidding, of course it's not easy. Not for the analyzer. Let's use our good old method of simplifying the code, to see the bug more clearly:
private PaperKind GetPaperKind (int width, int height)
{
....
if (width == 1100 && height == 1700)
return PaperKind.Standard11x17;
....
if (width == 1100 && height == 1700)
return PaperKind.Tabloid;
....
}
PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless System.Drawing-net_4_x PrintingServicesUnix.cs 744
If the expression width == 1100 && height == 1700 is true, then only the first if statement will be executed. However, the values returned by this expression in case it's true, are different, so we can't just say that the second if statement is redundant. Moreover, perhaps there should be another expression in its condition. Obviously, the workflow of the program is damaged.
Finally, I would like to look at another piece of code with this error:
private void SerializeCore (SerializationStore store,
object value, bool absolute)
{
if (value == null)
throw new ArgumentNullException ("value");
if (store == null)
throw new ArgumentNullException ("store");
CodeDomSerializationStore codeDomStore =
store as CodeDomSerializationStore;
if (store == null)
throw new InvalidOperationException ("store type unsupported");
codeDomStore.AddObject (value, absolute);
}
PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless System.Design-plaindesign-net_4_x CodeDomComponentSerializationService.cs 562
This warning has a lot in common with V3019 warning, as we have the pattern of the verification against null after the casting with using the as operator of a wrong reference. No matter which warning is issued - the bug is quite evident.
There were other similar warnings:
Suspicious format strings
V3025 diagnostic rule detects wrong format strings. This is also the type of error, which we find in many projects we check. There are usually situations of two kinds:
In the first case, an exception of the type FormatException will be thrown, in the second case, the unused arguments will simply be ignored. Anyway, such fragments are worth reviewing and fixing.
Of course, I wouldn't speak about this diagnostic rule, if there were no similar errors found.
static IMessageSink GetClientChannelSinkChain(string url, ....)
{
....
if (url != null)
{
string msg = String.Format (
"Cannot create channel sink to connect to URL {0}.
An appropriate channel has probably not been registered.",
url);
throw new RemotingException (msg);
}
else
{
string msg = String.Format (
"Cannot create channel sink to connect to the remote object.
An appropriate channel has probably not been registered.",
url);
throw new RemotingException (msg);
}
....
}
PVS-Studio warning: V3025 Incorrect format. A different number of format items is expected while calling 'Format' function. Arguments not used: url. corlib-net_4_x RemotingServices.cs 700
I want to draw your attention to the second format string. It is a string literal, which does not provide the substitution of arguments (unlike the format string above). However, the Format method takes an url object as the second argument. It follows from the above that the url object will simply be ignored upon the forming of a new string, and information about it will not get into the text of the exception.
In C# 6.0, interpolated strings were added, which in some cases will help avoid the problems associated with the use of format strings, including incorrect number of arguments.
Let's look at one more erroneous code fragment:
public override string ToString ()
{
return string.Format ("ListViewSubItem {{0}}", text);
}
PVS-Studio warning: V3025 Incorrect format. A different number of format items is expected while calling 'Format' function. Arguments not used: text. System.Windows.Forms-net_4_x ListViewItem.cs 1287
Judging by the format string we can draw the conclusion that text in the curly brackets should be written in the result string. In fact, the resulting string will be like this:
"ListViewSubItem {{0}}"
To fix this bug, we could use interpolated strings to rewrite the method:
public override string ToString ()
{
return $"ListViewSubItem {{{text}}}",;
}
Or, staying faithful to the String.Format method, we should add a curly bracket on each side. Then the format string would look as follows:
"ListViewSubItem {{{0}}}"
Here is the last fragment with a format string. As always, the most interesting thing is served as a dessert:
void ReadEntropy ()
{
if (reader.IsEmptyElement)
throw new XmlException (
String.Format ("WS-Trust Entropy element is empty.{2}",
LineInfo ()));
....
}
PVS-Studio warning: V3025 Incorrect format. A different number of format items is expected while calling 'Format' function. Format items not used: {2}. Arguments not used: 1st. System.ServiceModel-net_4_x WSTrustMessageConverters.cs 147
I have no idea how a formatting element with the index '2' got into the format string, but it leads to quite an amusing error. It was meant to throw an exception with some text that is made by a format string. And an exception will be thrown. An exception of FormatException type, because the current format string requires 3 arguments (because it is the third that is needed), and only one is presented.
If the programmer confused only the number of the requested argument (during refactoring, for example), then this bug will be easy to fix:
"WS-Trust Entropy element is empty.{0}"
Other suspicious fragments detected by the rule V3025 are given in this file.
Access by null reference
private bool IsContractMethod (string methodName,
Method m,
out TypeNode genericArgument)
{
....
return m.Name != null && m.Name == methodName &&
(m.DeclaringType.Equals (this.ContractClass) // <=
|| (m.Parameters != null &&
m.Parameters.Count == 3 &&
m.DeclaringType != null && // <=
m.DeclaringType.Name != ContractClassName));
}
PVS-Studio warning: V3027 The variable 'm.DeclaringType' was utilized in the logical expression before it was verified against null in the same logical expression. Mono.CodeContracts-net_4_x ContractNodes.cs 211
Before accessing the Name property of the DeclaringType property, the programmer decided to play it safe and verify the DeclaringType property against null so that he does not access a null reference accidentally. The wish to do so is understandable and quite legitimate. The only thing is that it won't have any effect, because later in the code we see that the instance method Equals for the DeclaringType property, which means that if DeclaringType == null, we'll get an exception of the type NullReferenceException. To solve this problem, we can move the verification against null higher in the code, or use a null-conditional operator ('?.') which is available in C# 6.0.
Another case.
Node leftSentinel;
....
IEnumerator<T> GetInternalEnumerator ()
{
Node curr = leftSentinel;
while ((curr = curr.Nexts [0]) != rightSentinel && curr != null) {
....
}
}
PVS-Studio warning: V3027 The variable 'curr' was utilized in the logical expression before it was verified against null in the same logical expression. Mono.Parallel-net_4_x ConcurrentSkipList.cs 306
Again, the same situation. If curr == null, then we'll have the exit from the loop. If curr was originally null (at the moment of the execution of the code leftSentinel == null), we'll get the exception NullReferenceException again.
Redundant check
From time to time we see expressions of the following kind or similar to them:
!aa || (aa && bb)
They can be simplified to an expression of the following kind:
!aa || bb
In some cases you get some performance gain (albeit minor), but also the second option is easier to read when it is logically equivalent to the first (if subexpression aa does not change between the calls).
It is possible that instead of aa there was supposed to be another sub-expression:
!aa || (cc && bb)
Then we are talking about a real error. Anyway, in PVS-Studio there is a nice diagnostic rule, V3031, which detects cases like this one. Let's have a look at several more code fragments that were found with its help:
public void Emit(OpCode opc)
{
Debug.Assert(opc != OpCodes.Ret || (
opc == OpCodes.Ret && stackHeight <= 1));
....
}
PVS-Studio warning: V3031 An excessive check can be simplified. The '||' operator is surrounded by opposite expressions. mcs-net_4_x ILGenerator.cs 456
Redundant code. Accessing the object opc doesn't change its value, so this expression can be simplified:
Debug.Assert(opc != OpCodes.Ret || stackHeight <= 1));
Another code fragment:
public bool Validate (bool checkAutoValidate)
{
if ((checkAutoValidate && (AutoValidate != AutoValidate.Disable)) ||
!checkAutoValidate)
return Validate ();
return true;
}
PVS-Studio warning: V3031 An excessive check can be simplified. The '||' operator is surrounded by opposite expressions. System.Windows.Forms-net_4_x ContainerControl.cs 506
This situation is similar to the previous one. The expression can be easily and painlessly simplified like this:
!checkAutoValidate || (AutoValidate != AutoValidate.Disable)
Some of the warnings that I've selected are given in the file.
Code formatting that doesn't comply with the program logic
When creating diagnostic rules like the V3033, we were debating how relevant they are. The thing is that the diagnostics, related to the code formatting are quite peculiar, as most of the editors/development environments (the very Visual Studio) already formats the code as it is being written. Therefore, the probability of making such a mistake is quite small. I rarely see errors of this kind, but there were a couple of them in 'Mono'.
public bool this [ string header ] {
set {
....
if (value)
if (!fields.Contains (header))
fields.Add (header, true);
else
fields.Remove (header);
}
}
PVS-Studio warning: V3033 It is possible that this 'else' branch must apply to the previous 'if' statement. HttpCacheVaryByHeaders.cs 159
The code is formatted in such a way that it may seem that else refers to the first if statement. But it doesn't matter for the compiler how the code is formatted, because it will interpret this fragment in its own way, connecting else with the second if statement, as it should be. An interesting bug. The code aligned according to the given logic should be like this:
if (value)
if (!fields.Contains (header))
fields.Add (header, true);
else
fields.Remove (header);
A similar warning appeared again: V3033 It is possible that this 'else' branch must apply to the previous 'if' statement. HttpCacheVaryByParams.cs 102
One more diagnostic rule can be referred to this category - V3043.
Incorrect code:
public void yyerror (string message, string[] expected) {
....
for (int n = 0; n < expected.Length; ++ n)
ErrorOutput.Write (" "+expected[n]);
ErrorOutput.WriteLine ();
....
}
PVS-Studio warning: V3043 The code's operational logic does not correspond with its formatting. The statement is indented to the right, but it is always executed. It is possible that curly brackets are missing. cs-parser.cs 175
Judging by the code formatting (and forgetting about the programming rules), we can think that both method calls (Write and Writeline) refer to the for statement. In fact, only Write method will be called in the loop. There is definitely something wrong with this code! If the programmer really meant such logic (it may seem to be logical indeed - the elements get displayed, after which an empty string is inserted), why do we need formatting which is really misleading? On the other hand, it's hard to understand the true logic of the statement right away. It is for a reason that programmers stick to particular formatting styles.
private string BuildParameters ()
{
....
if (result.Length > 0)
result.Append (", ");
if (p.Direction == TdsParameterDirection.InputOutput) // <=
result.Append (String.Format("{0}={0} output",
p.ParameterName));
else
result.Append (FormatParameter (p));
....
}
PVS-Studio warning: V3043 The code's operational logic does not correspond with its formatting. The statement is indented to the right, but it is always executed. It is possible that curly brackets are missing. Tds50.cs 379
Then the second if statement is in no way related to the first one. Why should we mislead people working with this code?
public void Restore ()
{
while (saved_count < objects.Count)
objects.Remove (objects.Last ().Key);
referenced.Remove (objects.Last ().Key);
saved_count = 0;
referenced.RemoveRange (saved_referenced_count,
referenced.Count - saved_referenced_count);
saved_referenced_count = 0;
}
PVS-Studio warning: V3043 The code's operational logic does not correspond with its formatting. The statement is indented to the right, but it is always executed. It is possible that curly brackets are missing. XamlNameResolver.cs 81
Apparently, it was planned to remove objects from the collections and referenced values corresponding to a specific key. At the same time the programmer forgot the curly brackets, as a result only one value will be removed from the referenced collection. What is more interesting - putting curly brackets won't be enough here, as in this case during every loop iteration, from the referenced collection the object will be removed not by the key that was used during the removal from the objects collection. This happens due to the fact that at the time of calling the Remove method on the referenced collection, the objects collection will be changed, and thus the Last method will return a different item.
There were more error warnings related to the error in formatting, that don't comply with the program logic. Here are some of them:
Casting an object to its type/checking the object's compatibility with its type
V3051 diagnostic rule is responsible for such situations. As a rule, it finds the redundant code like this:
String str;
String str2 = str as String;
or
String str;
if (str is String)
But sometimes we see far more interesting cases.
Let's look at the following code fragment:
public string GenerateHttpGetMessage (Port port,
OperationBinding obin,
Operation oper,
OperationMessage msg)
{
....
MimeXmlBinding mxb =
(MimeXmlBinding) obin.Output
.Extensions
.Find (typeof(MimeXmlBinding))
as MimeXmlBinding;
if (mxb == null) return req;
....
}
PVS-Studio warning: V3051 An excessive type cast. The object is already of the 'MimeXmlBinding' type. SampleGenerator.cs 232
It may seem that there is nothing bad about superfluous casting. A little below we see that mxb is verified against null, so if the type is not compatible - it's okay. But nothing of the kind. The method Find returns an instance of the Object type, after which it is explicitly cast to the MimeXmlBinding type, and only after it is cast to the same type using the as operator. However, an explicit cast operator, if the argument has an incompatible type, does not return null (unlike the as operator), and throws the exception of InvalidCastException type. In the end, the check mxb == null will not help if the types are cast incorrectly.
The remaining warnings are not so interesting (e.g. excessive casting), so I'll give them a list in a text file.
The parameter is rewritten to the body of the method before it is used
The mere fact that a method parameter is immediately overwritten looks suspicious. Because the value received by the method isn't used in any way, but gets lost/ignored. Ideally, we should fix the method signature, and make the parameter as a local variable. The truth is, this approach is not always possible. For example, when you implement an interface or virtual functions.
internal static int SetErrorInfo (int dwReserved,
IErrorInfo errorInfo)
{
int retVal = 0;
errorInfo = null;
....
retVal = _SetErrorInfo (dwReserved, errorInfo);
....
}
PVS-Studio warning: V3061 Parameter 'errorInfo' is always rewritten in method body before being used. corlib-net_4_x Marshal.cs 1552
The value, received as an errorInfo parameter is not used in any way. The parameter is zeroed immediately and then passed to a method. In this case it would be logical to make errorInfo a local variable (if it is possible to change the method signature).
The other fragments are quite similar, so I'll put them in a list again in the following file.
Incorrect initialization of static members
class ResXResourceWriter : IResourceWriter, IDisposable
{
....
public static readonly string ResourceSchema = schema;
....
static string schema = ....;
....
}
PVS-Studio warning: V3070 Uninitialized variable 'schema' is used when initializing the 'ResourceSchema' variable. ResXResourceWriter.cs 59
The programmer wanted to set the value of a public static field ResourceSchema, read-only, equal to another static field - schema. We couldn't go without an error here. At the moment of the ResourceSchema initialization, the field schema will be initialized by the default value (in this case - null). It is unlikely that the developer meant for this.
Erroneous initialization of a static field decorated with [ThreadStatic] attribute
It's a rare and interesting bug. Let's look at the following code fragment:
static class Profiler
{
[ThreadStatic]
private static Stopwatch timer = new Stopwatch();
....
}
PVS-Studio warning: V3089 Initializer of a field marked by [ThreadStatic] attribute will be called once on the first accessing thread. The field will have default value on different threads. System.Data.Linq-net_4_x Profiler.cs 16
Decoration of a field by an attribute [ThreadStatic] means that in each thread, the value of this field will be unique. Looks simple. But the thing is that such fields cannot be initialized, either at the declaration, or in a static constructor. This is a great way to shoot yourself in the foot (or even both feet) and receive an error that will be very hard to catch.
In fact, if the initialization is performed during the declaration, the files will be initialized with the value only of the first thread that accesses it. For the other threads the field will have the default value (in this case - null, because Stopwatch - is a reference type). A static constructor will be also called only once, when accessing from the first thread. Consequently, in the remaining threads the file will be initialized to the default value.
The error is fairly complex, so I strongly recommend reading the documentation for the diagnostic rule to prevent such situations, and not spend precious time on debugging.
Intersecting ranges
public void EmitLong (long l)
{
if (l >= int.MinValue && l <= int.MaxValue) {
EmitIntConstant (unchecked ((int) l));
ig.Emit (OpCodes.Conv_I8);
} else if (l >= 0 && l <= uint.MaxValue) {
EmitIntConstant (unchecked ((int) l));
ig.Emit (OpCodes.Conv_U8);
} else {
ig.Emit (OpCodes.Ldc_I8, l);
}
}
PVS-Studio warning: V3092 Range intersections are possible within conditional expressions. Example: if (A > 0 && A < 5) { ... } else if (A > 3 && A < 9) { ... }. mcs-net_4_x codegen.cs 742
The analyzer found suspicious intersection in the expressions:
The range [0, Int32.MaxValue] is common for both of these expressions, so if the variable l has a value within this range, then we'll have the first condition true, despite the fact that the second could also be true.
Similar warnings:
Accessing the collection item by a constant index, carried out inside the loop
It's common practice to give short names to the counters. There is nothing wrong with calling a loop counter i or j - it's clear what this variable is (except in cases where the counters require more meaningful names). But there are times when not the loop counter, but a numeric literal is used as an index. It is clear that this is done deliberately, but sometimes it indicates an error. The diagnostic rule V3102 looks for a similar erroneous fragments.
public void ConvertGlobalAttributes (
TypeContainer member,
NamespaceContainer currentNamespace,
bool isGlobal)
{
var member_explicit_targets = member.ValidAttributeTargets;
for (int i = 0; i < Attrs.Count; ++i) {
var attr = Attrs[0];
if (attr.ExplicitTarget == null)
continue;
....
}
}
PVS-Studio warning: V3102 Suspicious access to element of 'Attrs' object by a constant index inside a loop. mcs-net_4_x attribute.cs 1272
On each iteration of the loop the variable attr is initialized with the same value - Attrs[0]. Further on it gets handled (the properties get called, it is passed to a method). I doubt that the programmer intended to work with the same value during all the loop iterations, so I suppose, the correct initialization should be like this:
var attr = Attrs[i];
There were similar errors in two more fragments:
Unsafe locks
We often see the code lock(this) or lock(typeof(....)) in the projects we check. This is not the best way to lock, as it can cause deadlocks. But let's go through this step by step. First, let's have a look at the dangerous code:
public RegistryKey Ensure (....)
{
lock (typeof (KeyHandler)){
....
}
}
PVS-Studio warning: V3090 Unsafe locking on a type. All instances of a type will have the same 'Type' object. corlib-net_4_x UnixRegistryApi.cs 245
A common problem with the possible deadlock is the fact that the lock is carried out on the same object. Therefore, the general tip, which will help to get rid of these problems is to use an object inaccessible from outside as a lock object - a local variable or a private field of the class.
What is the problem with the typeof operator? This operator always returns an instance of the Type type, the same for the same argument, therefore, the rule described above gets violated, and we get the problem of locking on the same object.
There were several fragments like this in the code:
The situation with the method GetType() isn't much different from the one described above:
void ConfigureHttpChannel (HttpContext context)
{
lock (GetType())
{
....
}
}
PVS-Studio warning: V3090 Unsafe locking on a type. All instances of a type will have the same 'Type' object. System.Runtime.Remoting-net_4_x HttpRemotingHandlerFactory.cs 61
GetType() method also returns an instance of the Type type, so if the lock is implemented somewhere using the GetType() method or the operator typeof, for the object of the same type - we'll have a possibility of a deadlock.
Now I would like to consider the locking on objects of the String type, because the situation gets way more interesting here and it gets much easier to make an error and harder to detect it.
const string Profiles_SettingsPropertyCollection =
"Profiles.SettingsPropertyCollection";
....
static void InitProperties ()
{
....
lock (Profiles_SettingsPropertyCollection) {
if (_properties == null)
_properties = properties;
}
}
PVS-Studio warning: V3090 Unsafe locking on an object of type 'String'. System.Web-net_4_x ProfileBase.cs 95
The main point remains the same - locking on the same object, but the details are more interesting. The access to the objects of the String type can be received from a different application domain (it is connected with the mechanism of string internment). Imagine what it is like to debug a deadlock that appeared because of the fact that in different application domains, the programmer used the same string as the lock object. The tip is very short - don't use the objects of the String type (and the Thread, too). You can find the description of these and other problems, connected with the usage of synchronization mechanism in the documentation for the diagnostic rule V3090.
Incorrect comparison
public bool Equals (CounterSample other)
{
return
rawValue == other.rawValue &&
baseValue == other.counterFrequency &&
counterFrequency == other.counterFrequency &&
systemFrequency == other.systemFrequency &&
timeStamp == other.timeStamp &&
timeStamp100nSec == other.timeStamp100nSec &&
counterTimeStamp == other.counterTimeStamp &&
counterType == other.counterType;
}
PVS-Studio warning: V3112 An abnormality within similar comparisons. It is possible that a typo is present inside the expression 'baseValue == other.counterFrequency'. System-net_4_x CounterSample.cs 139
I didn't the change the code formatting deliberately, instead copying the original version. The idea of the method is clear and there are no problems in understanding it - the programmer compares the field of the current object and a field of an object that was received as an argument. But even such a seemingly simple method contains an error. I am sure that if the code formatting was done better, it would have been easier to notice it. Here is the same code fragment, but now I formatted it:
public bool Equals (CounterSample other)
{
return
rawValue == other.rawValue &&
baseValue == other.counterFrequency && // <=
counterFrequency == other.counterFrequency && // <=
systemFrequency == other.systemFrequency &&
timeStamp == other.timeStamp &&
timeStamp100nSec == other.timeStamp100nSec &&
counterTimeStamp == other.counterTimeStamp &&
counterType == other.counterType;
}
Having formatted the code like this, it's much easier to notice the error - it's obvious that the same field is compared with different fields of a current object. As a result, the logic is broken.
It is clearly seen in this fragment that the formatting plays an important role! Even if the formatting style makes no difference to the compiler, it is important for the programmers, as it helps avoid trivial bugs and simplifies understanding of the code and its logic.
Ok, we have found the errors. Plenty of errors. If we count the errors that are described in the article, and those given in the files, then it's more than 167 bugs! I would like to note that these aren't all the erroneous/suspicious fragments - I have just chosen these ones for the article (which wasn't a problem). The majority of errors are left beyond the scope of this article, because it is already quite long.
There might arise a reasonable question - how to fix all of them? How to integrate a static analyzer into the project?
It is unlikely that there will be a separate team that will only be fixing bugs (although it can be our team, as it was with the Unreal Engine). It would be correct to note all the found bugs and gradually fix them, trying not to make new ones.
To simplify the first task there is a mechanism of mass warning suppression. This will help to differentiate the old and new errors, tracking the number of fixed and newly created bugs only in the new code. "Old" bugs are fixed separately.
Incremental analysis mode is designed to solve the second problem. This mode starts the analysis on the developer's machine immediately after compilation, allowing you to detect fresh errors and correct them before they get into the version control system (VCS).
Still, there will be errors that will get to the VCS. To detect the bugs as quickly as possible after they get to the VCS, it will be a good idea to implement static analysis into nightly builds, and use the analysis results. In which way? For example, you can notify the person responsible for the project, as well as the programmer who allowed this error to get into the repository.
It is important to correct such errors without delay, so that they don't get cluttered with more code, and the task does not become even more complicated.
Using the tips given above (integration of static analysis on the build server and on the machines of the developers) will reduce the cost of fixing errors, because when the process is properly organized, they will be fixed as soon as possible (before reaching the testers and certainly not getting to the release versions).
You may find more details about the integration of static analysis into the development process in the article "What is a quick way to integrate static analysis in a big project?".
There were comments to one of the articles: "You write that you check open-source products with your analyzer, but in fact, you are checking your analyzer!". There is some truth in that.
Indeed, we are constantly working on improvement of our analyzer, thus checking projects helps us to make it better - correct false positives, teach the analyzer to find new bugs, and so on. This part of the analysis usually remains beyond the scope of the article, as it is mostly interesting to the analyzer developers.
Still, we check open projects and most importantly, find real bugs there. What's more, we don't just find them, but inform the developers and all those who are interested in it. It's up to a person how to use this information further on. I suppose, the open-source community benefits from these checks. Not so long ago, we had a significant event: We found over 10000 bugs in various open source projects!
All in all, our main goal is to popularize the static analysis methodology in general, and the PVS-Studio analyzer in particular. Our articles are a great way to show how to use this methodology.
Try PVS-Studio on your project: http://www.viva64.com/en/pvs-studio/
Of course, we are a little sad that we couldn't check the compiled project, so that there wasn't a full and deep analysis. On the other hand, there was enough material, so the developers should think about ways to fix them and integrate static analysis into their development process. We are always glad to help with that.
0