To assess the quality of PVS-Studio C# diagnostics, we test it on a large number of software projects. Since projects are written by different programmers from different teams and companies, we have to deal with different coding styles, shorthand notations, and simply different language features. In this article, I will give an overview of some of the features offered by the wonderful C# language, as well as the issues that one may run into when writing in this language.
A little note.
This article was mostly written for the sake of curiosity and describes those things that were of interest to me personally.
As we all know, a property is a pair of functions - accessor and mutator - designed for writing or reading the value of a field. At least, things used to be that way before the release of C# version 3.0. In its traditional form, a property used to look like this:
class A
{
int index;
public int Index
{
get { return index; }
set { index = value; }
}
}
Years went by, and both the language standards and properties have acquired a number of new mechanisms.
So, here we go. The C# 3.0 standard brought us the well-known feature that allowed you to omit the field; that is, to declare a property in the following way:
class A
{
public int Index { get; set; }
}
The idea was pushed even further in C# 6.0 by allowing programmers to omit "set" as well:
class A
{
public int Index { get; }
}
Before C# 6.0, an auto-property declared this way could not be given a value at all: the C# 3.0 to 5.0 compilers reject a class auto-property that has no set accessor. Now it has in fact become equivalent to a readonly field, i.e. the value of such a property can be assigned only in the constructor or in an initializer at the declaration.
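As a minimal sketch of the get-only form (C# 6.0 or later assumed; the constructor parameter and the Demo class are illustrative, not part of the original example), the property can be assigned in the constructor but nowhere else:

```csharp
using System;

class A
{
    // Get-only auto-property: the compiler backs it with a hidden readonly field.
    public int Index { get; }

    public A(int index)
    {
        Index = index; // allowed: assignment inside the constructor
    }

    // public void Reset() { Index = 0; } // would not compile: the property is read-only
}

static class Demo
{
    static void Main()
    {
        var a = new A(42);
        Console.WriteLine(a.Index); // 42
    }
}
```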
Properties and fields can be initialized in different ways. For example, like this:
class A
{
public List<int> Numbers { get; } = new List<int>();
}
Or like this:
class A
{
public List<int> Numbers = new List<int>();
}
One more version:
class A
{
public List<int> Numbers => new List<int>();
}
In the last case, though, you will be unpleasantly surprised. You see, what we have actually created there is the following property:
class A
{
public List<int> Numbers { get { return new List<int>(); } }
}
That is, an attempt to fill Numbers with values will inevitably fail; you'll be getting a new list every time.
A a = new A();
a.Numbers.Add(10);
a.Numbers.Add(20);
a.Numbers.Add(30);
So be careful with shorthand notations: they can sometimes lead to a long bug hunt.
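To make the difference visible, here is a small sketch that puts both forms side by side. The property names Stored and Fresh are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;

class A
{
    // Initializer: runs once, every read returns the same list.
    public List<int> Stored { get; } = new List<int>();

    // Expression-bodied getter: the expression runs on every read.
    public List<int> Fresh => new List<int>();
}

static class Demo
{
    static void Main()
    {
        var a = new A();
        a.Stored.Add(10);
        a.Fresh.Add(10); // added to a temporary list that is discarded right away

        Console.WriteLine(a.Stored.Count);                     // 1
        Console.WriteLine(a.Fresh.Count);                      // 0
        Console.WriteLine(ReferenceEquals(a.Fresh, a.Fresh));  // False
    }
}
```

The only visible difference in the declarations is `{ get; } =` versus `=>`, which is easy to miss in a review.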
These are not all the interesting features of properties. As I have already said, a property is a pair of functions, and in C# nothing prevents you from changing the parameters of functions.
For example, the following code compiles successfully and even executes:
class A
{
int index;
public int Index
{
get { return index; }
set {
value = 20;
index = value; }
}
}
static void Main(string[] args)
{
A a = new A();
a.Index = 10;
Console.WriteLine(a.Index);
}
However, the program will always output the number "20", but never "10".
You may wonder why one would need to assign the value 20 to value? Well, it appears to make sense. To explain this point, however, we'll have to set our discussion of properties aside for a while and talk about the @ prefix. This prefix allows you to declare variables that resemble keywords in spelling, for example @this, @operator and so on. At the same time, you are not prohibited from inserting this character wherever you please, for example:
class A
{
public int index;
public void CopyIndex(A @this)
{
this.@index = @this.index;
}
}
static void Main(string[] args)
{
A a = new A();
@a.@index = 10;
a.@CopyIndex(new A() { @index = 20 });
Console.WriteLine(a.index);
}
The output, as everywhere in this article, is the number "20", but never "10".
The @ prefix is actually required in one place only: in the parameter name @this of the CopyIndex function. Everywhere else it is just redundant code that also hurts readability.
Now that we know all that, let's get back to properties and take a look at the following class:
class A
{
int value;
public int Value
{
get { return @value; }
set { @value = value; }
}
public A()
{
value = 5;
}
}
You may think that the value field of class A will change in the Value property, but it won't, and the following code will output 5, not 10.
static void Main(string[] args)
{
A a = new A();
a.Value = 10;
Console.WriteLine(a.Value);
}
This behavior is the result of a mismatch between @value in get and @value in set. In get, @value is nothing more than a field of the A class. In set, however, @value is the parameter of the set function. Thus we just assign value to itself and never touch the value field of the A class.
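One possible way to repair the setter, sketched here as an assumption about what the author of such a class most likely intended, is to qualify the field with this so that the two names no longer collide:

```csharp
using System;

class A
{
    int value;

    public int Value
    {
        get { return value; }        // no parameter in get, so this is the field
        set { this.value = value; }  // left: the field, right: the setter's parameter
    }

    public A()
    {
        value = 5;
    }
}

static class Demo
{
    static void Main()
    {
        var a = new A();
        a.Value = 10;
        Console.WriteLine(a.Value); // 10
    }
}
```

Renaming the field (for example to _value, a hypothetical name) would remove the ambiguity altogether.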
Let's first recall different methods of how arrays can be initialized:
string[] test1 = new string[] { "1", "2", "3" };
string[] test2 = new[] { "1", "2", "3" };
string[] test3 = { "1", "2", "3" };
string[,] test4 = { { "11", "12" },
{ "21", "22" },
{ "31", "32" } };
Lists are simpler and there is only one variant of initialization:
List<string> test2 = new List<string>(){ "1", "2", "3" };
Now, what about dictionaries?
Dictionary<string, int> test =
new Dictionary<string, int>() { { "a-a", 1 },
{ "b-b", 2 },
{ "c-c", 3 } };
This one I saw for the first time, so this section is written mainly because of it:
Dictionary<string, int> test =
new Dictionary<string, int>() {
["a-a"] = 1,
["b-b"] = 2,
["c-c"] = 3
};
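The two dictionary forms are not fully interchangeable. The sketch below (method names WithIndexers and WithAdd are invented for the illustration) shows the difference on a repeated key: the brace form calls Add for every entry, while the bracket form assigns through the indexer.

```csharp
using System;
using System.Collections.Generic;

static class Demo
{
    // Index initializer: each entry is an indexer assignment, so a repeated key is overwritten.
    public static Dictionary<string, int> WithIndexers()
    {
        return new Dictionary<string, int>
        {
            ["a-a"] = 1,
            ["a-a"] = 2
        };
    }

    // Collection initializer: each entry is an Add call, so a repeated key throws.
    public static Dictionary<string, int> WithAdd()
    {
        return new Dictionary<string, int>
        {
            { "a-a", 1 },
            { "a-a", 2 }
        };
    }

    static void Main()
    {
        Console.WriteLine(WithIndexers()["a-a"]); // 2

        try
        {
            WithAdd();
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key rejected by Add");
        }
    }
}
```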
LINQ queries are in themselves a convenient feature: you make a sequence of necessary samples and get the required information at the output. Let's first discuss a couple of nice tricks that may not occur to you until you see them. Let's start with a basic example:
void Foo(List<int> numbers1, List<int> numbers2) {
var selection1 = numbers1.Where(index => index > 10);
var selection2 = numbers2.Where(index => index > 10);
}
As you can easily see, the code above contains several identical checks, so it would be better to enclose them in a separate "function":
void Foo(List<int> numbers1, List<int> numbers2) {
Func<int, bool> whereFunc = index => index > 10;
var selection1 = numbers1.Where(index => whereFunc(index));
var selection2 = numbers2.Where(index => whereFunc(index));
}
It looks better now; if functions are large, it's better still. The whereFunc call, however, looks somewhat untidy. Well, it's not a problem either:
void Foo(List<int> numbers1, List<int> numbers2) {
Func<int, bool> whereFunc = index => index > 10;
var selection1 = numbers1.Where(whereFunc);
var selection2 = numbers2.Where(whereFunc);
}
Now the code does look compact and neat.
Now let's talk about the specifics of LINQ-query execution. For example, the following code line won't trigger immediate sampling of data from the numbers1 collection.
IEnumerable<int> selection = numbers1.Where(whereFunc);
Sampling will start only when the sequence is enumerated, for example when it is converted into a List<int> collection:
List<int> listNumbers = selection.ToList();
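A short sketch makes the deferred behavior observable. The contents of numbers1 and the Run helper are assumptions made for the example; the point is that changing the source after the query is built still affects the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Demo
{
    // Returns the snapshot size and the size of the lazy query after a later change.
    public static int[] Run()
    {
        var numbers1 = new List<int> { 5, 15 };
        Func<int, bool> whereFunc = index => index > 10;

        // Nothing is filtered yet: selection only describes the query.
        IEnumerable<int> selection = numbers1.Where(whereFunc);

        numbers1.Add(25);                           // changes the source before enumeration
        List<int> listNumbers = selection.ToList(); // the filter runs here: 15, 25

        numbers1.Add(35);                           // the snapshot does not see this...
        return new[] { listNumbers.Count, selection.Count() }; // ...the lazy query does
    }

    static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // 2
        Console.WriteLine(counts[1]); // 3
    }
}
```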
This nuance may cause a captured variable to be used after its value has changed. Here's a simple example. Suppose we need function Foo to return only those elements of the "{ 1, 2, 3, 4, 5 }" array whose numerical values are less than the current element's index. In other words, we need it to output the following:
0 :
1 :
2 : 1
3 : 1, 2
4 : 1, 2, 3
Our function will have the following signature:
static Dictionary<int, IEnumerable<int>> Foo(int[] numbers)
{ .... }
And this is how we will call it:
foreach (KeyValuePair<int, IEnumerable<int>> subArray in
Foo(new[] { 1, 2, 3, 4, 5 }))
Console.WriteLine(string.Format("{0} : {1}",
subArray.Key,
string.Join(", ", subArray.Value)));
It doesn't seem to be difficult. Now let's write the LINQ-based implementation itself. This is what it will look like:
static Dictionary<int, IEnumerable<int>> Foo(int[] numbers)
{
var result = new Dictionary<int, IEnumerable<int>>();
for (int i = 0; i < numbers.Length; i++)
result[i] = numbers.Where(index => index < i);
return result;
}
Very easy, isn't it? We just "make" samples from the numbers array one by one.
However, what the program will output in the console is the following:
0 : 1, 2, 3, 4
1 : 1, 2, 3, 4
2 : 1, 2, 3, 4
3 : 1, 2, 3, 4
4 : 1, 2, 3, 4
The problem with our code has to do with the closure in the lambda expression index => index < i. The lambda captures the variable i itself, not the value it had when the LINQ query was formed. Because the lambda was not called until string.Join(", ", subArray.Value) enumerated the sequence, the loop had already finished by then and i was equal to 5, which resulted in incorrect output.
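One common way to get the intended result, sketched here rather than taken from the original code, is to copy the loop counter into a variable declared inside the loop body. The name limit is made up for the example; each iteration gets its own copy, so each lambda captures a different variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Demo
{
    public static Dictionary<int, IEnumerable<int>> Foo(int[] numbers)
    {
        var result = new Dictionary<int, IEnumerable<int>>();
        for (int i = 0; i < numbers.Length; i++)
        {
            int limit = i; // a fresh variable per iteration; the lambda captures this copy
            result[i] = numbers.Where(number => number < limit);
        }
        return result;
    }

    static void Main()
    {
        foreach (var subArray in Foo(new[] { 1, 2, 3, 4, 5 }))
            Console.WriteLine("{0} : {1}", subArray.Key, string.Join(", ", subArray.Value));
    }
}
```

Another option is to force evaluation inside the loop with ToList(), at the cost of storing every intermediate list.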
The C++ language is famous for its hacks, workarounds, and other kludges - the series of XXX_cast functions alone counts for a lot. It is commonly believed that C# doesn't have any such things. Well, it's not quite true...
Here are a few keywords, for a start: __makeref, __reftype, and __refvalue.
These words are unknown to IntelliSense, nor will you find any official MSDN entries on them.
So what are these wonder words?
__makeref takes an object and returns some "reference" to it as an object of type TypedReference. And as for the words __reftype and __refvalue, they are used, respectively, to find out the type and the value of the object referred to by this "reference".
Consider the following example:
struct A { public int Index { get; set; } }
static void Main(string[] args)
{
A a = new A();
a.Index = 10;
TypedReference reference = __makeref(a);
Type typeRef = __reftype(reference);
Console.WriteLine(typeRef); //=> ConsoleApplication23.Program+A
A valueRef = __refvalue(reference, A);
Console.WriteLine(valueRef.Index); //=> 10
}
Well, we could do this "stunt" using more common syntax:
static void Main(string[] args)
{
A a = new A();
a.Index = 10;
dynamic dynam = a;
Console.WriteLine(dynam.GetType());
A valuDynam = (A)dynam;
Console.WriteLine(valuDynam.Index);
}
The dynamic keyword allows us to both use fewer lines and avoid questions like "What's that?" and "How does it work?" that programmers not familiar with those words may ask. That's fine, but here's a somewhat different scenario where dynamic doesn't look that great compared to TypedReference.
static void Main(string[] args)
{
// a is declared as in the previous examples
A a = new A();
a.Index = 10;
TypedReference reference = __makeref(a);
SetVal(reference);
Console.WriteLine(__refvalue(reference, A).Index);
}
static void SetVal(TypedReference reference)
{
__refvalue(reference, A) = new A() { Index = 20 };
}
This code prints the number "20" to the console. Sure, we could pass dynamic into the function using ref, and it would work just as well.
static void Main(string[] args)
{
// a is declared as in the previous examples
A a = new A();
a.Index = 10;
dynamic dynam = a;
SetVal(ref dynam);
Console.WriteLine(((A)dynam).Index);
}
static void SetVal(ref dynamic dynam)
{
dynam = new A() { Index = 20 };
}
Nevertheless, I find the version with TypedReference better, especially when you need to pass the information on and on through other functions.
There is one more wonder word, __arglist, which allows you to declare a variadic function whose parameters can also be of any type.
static void Main(string[] args)
{
Foo(__arglist(1, 2.0, "3", new A[0]));
}
public static void Foo(__arglist)
{
ArgIterator iterator = new ArgIterator(__arglist);
while (iterator.GetRemainingCount() > 0)
{
TypedReference typedReference =
iterator.GetNextArg();
Console.WriteLine("{0} / {1}",
TypedReference.ToObject(typedReference),
TypedReference.GetTargetType(typedReference));
}
}
It is strange that the foreach statement can't be used out of the box to iterate through the argument list, and that an element of the list can't be accessed directly. So it's not as cool as in C++ or JavaScript with its arguments :)
function sum() {
....
for(var i=0; i < arguments.length; i++)
s += arguments[i]
}
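For comparison, the documented C# way to write a variadic function is the params keyword, which hands the arguments over as an ordinary array, so both foreach and indexing work. The following is only a sketch; Sum and Print are illustrative names, and with params object[] value types are boxed.

```csharp
using System;

static class Demo
{
    // params collects the trailing arguments into an ordinary array.
    public static int Sum(params int[] values)
    {
        int s = 0;
        foreach (int v in values) // values[i] would work just as well
            s += v;
        return s;
    }

    // With object[], the arguments may be of any type.
    public static void Print(params object[] args)
    {
        foreach (object arg in args)
            Console.WriteLine("{0} / {1}", arg, arg.GetType());
    }

    static void Main()
    {
        Console.WriteLine(Sum(1, 2, 3)); // 6
        Print(1, 2.0, "3");
    }
}
```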
To sum it up, I'd like to say that C++ and C# are highly flexible languages as far as their grammar goes, and that's why they are convenient to use on the one hand, but don't protect you from typos on the other. There is an established belief that in C# it's impossible to make the kind of mistakes that are made in C++, but it's just not true. This article demonstrates rather interesting language features, but the bulk of errors in C# has nothing to do with them; instead, they typically occur when writing ordinary conditions, as in the Infragistics project. For example:
public bool IsValid
{
get {
var valid =
double.IsNaN(Latitude) || double.IsNaN(Latitude) ||
this.Weather.DateTime == Weather.DateTimeInitial;
return valid;
}
}
V3001 There are identical sub-expressions 'double.IsNaN(Latitude)' to the left and to the right of the '||' operator. WeatherStation.cs 25
It is at points like this that human attention tends to weaken, which later makes you waste a huge amount of time trying to track down "God-knows-what–God-knows-where". So don't miss the chance to protect yourself from bugs with the help of the PVS-Studio static code analyzer.