Sergey Vasiliev

Oct 30 2020

Tags:

#CSharp #Knowledge

Check how you remember nullable value types. Let's peek under the hood

Investigation
- Simple way
- Interesting way
Conclusion

Recently nullable reference types have become trendy. Meanwhile, the good old nullable value types are still here and actively used. How well do you remember the nuances of working with them? Let's jog your memory or test your knowledge by reading this article. Examples of C# and IL code, references to the CLI specification, and CoreCLR code are provided. Let's start with an interesting case.

Note. If you are interested in nullable reference types, you can read several articles by my colleagues: "Nullable Reference types in C# 8.0 and static analysis", "Nullable Reference will not protect you, and here is the proof".

Take a look at the sample code below and answer what will be output to the console. And, just as importantly, why. Let's agree right away that you will answer as is: without compiler hints, documentation, reading literature, or anything like that. :)

static void NullableTest()
{
  int? a = null;
  object aObj = a;

  int? b = new int?();
  object bObj = b;

  Console.WriteLine(Object.ReferenceEquals(aObj, bObj)); // True or False?
}

Well, let's do some thinking. Let's take a few main lines of thought that I think may arise.

1. Assume that int? is a reference type.

Let's suppose that int? is a reference type. In this case, null will be stored in a, and it will also be stored in aObj after assignment. A reference to an object will be stored in b. It will also be stored in bObj after assignment. As a result, Object.ReferenceEquals will take null and a non-null reference to the object as arguments, so...

Needless to say, the answer is False!

2. Assume that int? is a value type.

Or maybe you doubt that int? is a reference type? And you are sure of this, despite the int? a = null expression? Well, let's go from the other side and start from the fact that int? is a value type.

In this case, the expression int? a = null looks a bit strange, but let's assume that C# got some extra syntactic sugar. Turns out, a stores an object. So does b. When initializing aObj and bObj variables, objects stored in a and b will be boxed, resulting in different references being stored in aObj and bObj. So, in the end, Object.ReferenceEquals takes references to different objects as arguments, therefore...

Needless to say, the answer is False!

3. We assume that here we use Nullable<T>.

Let's say you didn't like the options above. Because you know perfectly well that there is no int?, but there is a value type Nullable<T>, and in this case Nullable<int> will be used. You also realize that a and b will actually hold identical values. With that, you remember that storing values in aObj and bObj will result in boxing. In the end, we'll get references to different objects. Since Object.ReferenceEquals gets references to different objects...

Needless to say, the answer is False!

4. ;)

For those who started from value types - if a suspicion crept into your mind about comparing references, you can view the documentation for Object.ReferenceEquals at docs.microsoft.com. In particular, it also touches on the topic of value types and boxing/unboxing. It describes the case when instances of value types are passed directly to the method, whereas we did the boxing separately, but the main point is the same.

When comparing value types, if objA and objB are value types, they are boxed before they are passed to the ReferenceEquals method. This means that if both objA and objB represent the same instance of a value type, the ReferenceEquals method nevertheless returns false, as the following example shows.
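
The example from the documentation is not reproduced here. A minimal sketch of the behavior the quote describes might look like this (the class and method names are illustrative assumptions, not taken from the documentation):

```csharp
using System;

public static class ReferenceEqualsDemo
{
    // Passing an int where object is expected boxes it. Each argument
    // gets its own box, so the two references point to different objects.
    public static bool SameValuePassedTwice()
    {
        int number = 3;
        return Object.ReferenceEquals(number, number);
    }

    public static void Main()
    {
        Console.WriteLine(SameValuePassedTwice()); // False
    }
}
```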

Here we could have ended the article, but the thing is that... the correct answer is True.

Well, let's figure it out.

Investigation

There are two ways - simple and interesting.

Simple way

int? is Nullable<int>. Open the documentation on Nullable<T> and look at the "Boxing and Unboxing" section. That's all: the behavior is described there. But if you want more details, welcome to the interesting path. ;)

Interesting way

Documentation alone won't be enough on this path. It describes the behavior, but does not answer the question 'why?'

What are actually int? and null in the given context? Why does it work like this? Are there different commands used in the IL code or not? Is behavior different at the CLR level? Is it another kind of magic?

Let's start by analyzing the int? entity to recall the basics, and gradually get to the initial case analysis. Since C# is a rather "sugary" language, we will sometimes refer to the IL code to get to the bottom of things (yes, C# documentation is not our cup of tea today).

int?, Nullable<T>

Here we will look at the basics of nullable value types in general: what they are, what they are compiled into in IL, etc. The answer to the question from the case at the very beginning of the article is discussed in the next section.

Let's look at the following code fragment:

int? aVal = null;
int? bVal = new int?();
Nullable<int> cVal = null;
Nullable<int> dVal = new Nullable<int>();

Although the initialization of these variables looks different in C#, the same IL code will be generated for all of them.

.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              valuetype [System.Runtime]System.Nullable`1<int32> V_1,
              valuetype [System.Runtime]System.Nullable`1<int32> V_2,
              valuetype [System.Runtime]System.Nullable`1<int32> V_3)

// aVal
ldloca.s V_0
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// bVal
ldloca.s V_1
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// cVal
ldloca.s V_2
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// dVal
ldloca.s V_3
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

As you can see, in C# everything is heartily flavored with syntactic sugar for our greater good. But in fact:

int? is a value type.
int? is the same as Nullable<int>. The IL code works with Nullable<int32>
int? aVal = null is the same as Nullable<int> aVal = new Nullable<int>(). In IL, this is compiled to an initobj instruction that performs default initialization by the loaded address.
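
These points can also be checked from C# itself, without reading IL. The sketch below is only an illustration (the NullableTypeFacts class and its method names are made up for this example):

```csharp
using System;

public static class NullableTypeFacts
{
    // int? and Nullable<int> are two spellings of one and the same type.
    public static bool IsSameType() => typeof(int?) == typeof(Nullable<int>);

    // Nullable<T> is a struct, so int? is a value type.
    public static bool IsValueType() => typeof(int?).IsValueType;

    // "int? a = null" is default initialization: HasValue is simply false.
    public static bool DefaultHasValue()
    {
        int? a = null;
        return a.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(IsSameType());      // True
        Console.WriteLine(IsValueType());     // True
        Console.WriteLine(DefaultHasValue()); // False
    }
}
```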

Let's consider this code:

int? aVal = 62;

We're done with the default initialization - we saw the related IL code above. What happens here when we want to initialize aVal with the value 62?

Look at the IL code:

.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0)
ldloca.s   V_0
ldc.i4.s   62
call       instance void valuetype 
           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

Again, nothing complicated - the address of aVal is pushed onto the evaluation stack, followed by the value 62. After that, the constructor with the signature Nullable<T>(T) is called. In other words, the following two statements are completely identical:

int? aVal = 62;
Nullable<int> bVal = new Nullable<int>(62);

You can also see this after checking out the IL code again:

// int? aVal;
// Nullable<int> bVal;
.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              valuetype [System.Runtime]System.Nullable`1<int32> V_1)

// aVal = 62
ldloca.s   V_0
ldc.i4.s   62
call       instance void valuetype
                           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

// bVal = new Nullable<int>(62)
ldloca.s   V_1
ldc.i4.s   62
call       instance void valuetype
                           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

And what about the checks? What does this code represent?

bool IsDefault(int? value) => value == null;

That's right, for better understanding, we will again refer to the corresponding IL code.

.method private hidebysig instance bool
IsDefault(valuetype [System.Runtime]System.Nullable`1<int32> 'value')
cil managed
{
  .maxstack  8
  ldarga.s   'value'
  call       instance bool valuetype 
             [System.Runtime]System.Nullable`1<int32>::get_HasValue()
  ldc.i4.0
  ceq
  ret
}

As you may have guessed, there is actually no null - all that happens is accessing the Nullable<T>.HasValue property. In other words, the same logic in C# can be written more explicitly in terms of the entities used, as follows.

bool IsDefaultVerbose(Nullable<int> value) => !value.HasValue;

IL code:

.method private hidebysig instance bool 
IsDefaultVerbose(valuetype [System.Runtime]System.Nullable`1<int32> 'value')
cil managed
{
  .maxstack  8
  ldarga.s   'value'
  call       instance bool valuetype 
             [System.Runtime]System.Nullable`1<int32>::get_HasValue()
  ldc.i4.0
  ceq
  ret
}

Let's recap.

Nullable value types are implemented using the Nullable<T> type;
int? is actually a constructed type of the unbound generic value type Nullable<T>;
int? a = null is the initialization of an object of Nullable<int> type with the default value, no null is actually present here;
if (a == null) - again, there is no null, there is a call of the Nullable<T>.HasValue property.
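
As a quick sanity check of the last point, the two checks can be run side by side. This is just a sketch: the methods are made static here for convenience, and the NullCheckDemo class is an assumption of this example.

```csharp
using System;

public static class NullCheckDemo
{
    // The "sugary" form: looks like a comparison with null.
    public static bool IsDefault(int? value) => value == null;

    // The explicit form: what the comparison actually boils down to.
    public static bool IsDefaultVerbose(Nullable<int> value) => !value.HasValue;

    public static void Main()
    {
        foreach (int? sample in new int?[] { null, 0, 62 })
        {
            // Both columns always match: True True, False False, False False.
            Console.WriteLine($"{IsDefault(sample)} {IsDefaultVerbose(sample)}");
        }
    }
}
```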

The source code of the Nullable<T> type can be viewed, for example, on GitHub in the dotnet/runtime repository - a direct link to the source code file. There's not much code there, so check it out just for kicks. From there, you can learn (or recall) the following facts.

For convenience, the Nullable<T> type defines:

implicit conversion operator from T to Nullable<T>;
explicit conversion operator from Nullable<T> to T.
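
A small sketch of how these two operators behave in practice. The ConversionDemo class and its helpers are hypothetical names used only for illustration:

```csharp
using System;

public static class ConversionDemo
{
    // Implicit conversion T -> Nullable<T>: no cast is needed.
    public static int? Wrap(int value) => value;

    // Explicit conversion Nullable<T> -> T: a cast is required.
    public static int Unwrap(int? value) => (int)value;

    // The explicit conversion throws if there is no value to extract.
    public static bool UnwrapThrows(int? value)
    {
        try
        {
            _ = (int)value;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int? wrapped = Wrap(62);
        Console.WriteLine(wrapped.HasValue);   // True
        Console.WriteLine(Unwrap(wrapped));    // 62
        Console.WriteLine(UnwrapThrows(null)); // True
    }
}
```

The conversion to T is explicit for a reason: it can fail at run time, while the conversion from T always succeeds.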

The main logic of work is implemented by two fields (and corresponding properties):

T value - the value itself, the wrapper over which is Nullable<T>;
bool hasValue - the flag indicating "whether the wrapper contains a value". It's in quotation marks, since in fact Nullable<T> always contains a value of type T.
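
The remark about the quotation marks can be illustrated with GetValueOrDefault, which does not throw even when HasValue is false. A minimal sketch (the StoredValueDemo class is an illustrative assumption):

```csharp
using System;

public static class StoredValueDemo
{
    // An "empty" Nullable<int> still carries an int: the default one.
    public static int StoredValueOfEmpty()
    {
        int? empty = null;
        return empty.GetValueOrDefault();
    }

    public static void Main()
    {
        int? empty = null;
        Console.WriteLine(empty.HasValue);        // False
        Console.WriteLine(StoredValueOfEmpty());  // 0
    }
}
```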

Now that we've refreshed our memory about nullable value types, let's see what's going on with the boxing.

Nullable<T> boxing

Let me remind you that when boxing an object of a value type, a new object will be created on the heap. The following code snippet illustrates this behavior:

int aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));

The result of comparing references is expected to be false. This is because 2 boxing operations create 2 objects, whose references are stored in obj1 and obj2.

Now let's change int to Nullable<int>.

Nullable<int> aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));

The result is expectedly false.

And now, instead of 62, we write the default value.

Nullable<int> aVal = new Nullable<int>();
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));

Aaand... the result is unexpectedly true. One might think that we still have the same 2 boxing operations, two created objects, and references to two different objects, but the result is true!

Yeah, it's probably sugar again, and something has changed at the IL code level! Let's see.

Example N1.

C# code:

int aVal = 62;
object aObj = aVal;

IL code:

.locals init (int32 V_0,
              object V_1)

// aVal = 62
ldc.i4.s   62
stloc.0

// aVal boxing
ldloc.0
box        [System.Runtime]System.Int32

// saving the received reference in aObj
stloc.1

Example N2.

C# code:

Nullable<int> aVal = 62;
object aObj = aVal;

IL code:

.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              object V_1)

// aVal = new Nullable<int>(62)
ldloca.s   V_0
ldc.i4.s   62
call       instance void
           valuetype [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

// aVal boxing
ldloc.0
box        valuetype [System.Runtime]System.Nullable`1<int32>

// saving the received reference in aObj
stloc.1

Example N3.

C# code:

Nullable<int> aVal = new Nullable<int>();
object aObj = aVal;

IL code:

.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              object V_1)

// aVal = new Nullable<int>()
ldloca.s   V_0
initobj    valuetype [System.Runtime]System.Nullable`1<int32>

// aVal boxing
ldloc.0
box        valuetype [System.Runtime]System.Nullable`1<int32>

// saving the received reference in aObj
stloc.1

As we can see, in all cases boxing happens in the same way - values of local variables are pushed onto the evaluation stack (ldloc instruction). After that the boxing itself occurs by calling the box command, which specifies what type we will be boxing.

Next we refer to Common Language Infrastructure specification, see the description of the box command, and find an interesting note regarding nullable types:

If typeTok is a value type, the box instruction converts val to its boxed form. ... If it is a nullable type, this is done by inspecting val's HasValue property; if it is false, a null reference is pushed onto the stack; otherwise, the result of boxing val's Value property is pushed onto the stack.

This leads to several conclusions that dot the i's:

the state of the Nullable<T> object is taken into account (the HasValue flag we discussed earlier is checked). If Nullable<T> does not contain a value (HasValue - false), the result of boxing is null;
if Nullable<T> contains a value (HasValue - true), it is not a Nullable<T> object that is boxed, but an instance of type T that is stored in the value field of type Nullable<T>;
specific logic for handling Nullable<T> boxing is not implemented at the C# level or even at the IL level - it is implemented in the CLR.
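
The first two conclusions are easy to observe from C#. The following sketch boxes a Nullable<int> with and without a value and inspects the result (the BoxedNullableDemo class and the Box helper are names invented for this example):

```csharp
using System;

public static class BoxedNullableDemo
{
    // Returning int? as object makes the compiler emit the box instruction
    // for Nullable<int>.
    public static object Box(int? value) => value;

    public static void Main()
    {
        object boxedValue = Box(62);
        object boxedEmpty = Box(null);

        // The box contains an int, not a Nullable<int>.
        Console.WriteLine(boxedValue.GetType()); // System.Int32
        Console.WriteLine(boxedValue is int);    // True

        // Without a value, boxing produces a null reference.
        Console.WriteLine(boxedEmpty == null);   // True
    }
}
```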

Let's go back to the examples with Nullable<T> that we touched upon above.

First:

Nullable<int> aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));

The state of the instance before the boxing:

T -> int;
value -> 62;
hasValue -> true.

The value 62 is boxed twice. As we remember, in this case, instances of the int type are boxed, not Nullable<int>. So 2 new objects are created and 2 references to different objects are obtained; comparing them yields false.

Second:

Nullable<int> aVal = new Nullable<int>();
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));

The state of the instance before the boxing:

T -> int;
value -> default (in this case, 0 - a default value for int);
hasValue -> false.

Since hasValue is false, no objects are created. The boxing operation returns null, which is stored in the variables obj1 and obj2. Comparing these values is expected to return true.

In the original example, which was at the very beginning of the article, exactly the same thing happens:

static void NullableTest()
{
  int? a = null;       // default value of Nullable<int>
  object aObj = a;     // null

  int? b = new int?(); // default value of Nullable<int>
  object bObj = b;     // null

  Console.WriteLine(Object.ReferenceEquals(aObj, bObj)); // null == null
}
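
To see both cases in one run, the comparison can be wrapped into a small helper. This is a sketch for experimenting, with BoxingComparison and BoxTwiceAndCompare being made-up names:

```csharp
using System;

public static class BoxingComparison
{
    // Boxes the same Nullable<int> twice and compares the resulting references.
    public static bool BoxTwiceAndCompare(int? value)
    {
        object first = value;
        object second = value;
        return Object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxTwiceAndCompare(62));         // False: two boxed ints
        Console.WriteLine(BoxTwiceAndCompare(null));       // True: null and null
        Console.WriteLine(BoxTwiceAndCompare(new int?())); // True: null and null
    }
}
```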

Out of curiosity, let's look at the CoreCLR source code from the dotnet/runtime repository mentioned earlier. We are interested in the file object.cpp, specifically, the Nullable::Box method with the logic we need:

OBJECTREF Nullable::Box(void* srcPtr, MethodTable* nullableMT)
{
  CONTRACTL
  {
    THROWS;
    GC_TRIGGERS;
    MODE_COOPERATIVE;
  }
  CONTRACTL_END;

  FAULT_NOT_FATAL();      // FIX_NOW: why do we need this?

  Nullable* src = (Nullable*) srcPtr;

  _ASSERTE(IsNullableType(nullableMT));
  // We better have a concrete instantiation, 
  // or our field offset asserts are not useful
  _ASSERTE(!nullableMT->ContainsGenericVariables());

  if (!*src->HasValueAddr(nullableMT))
    return NULL;

  OBJECTREF obj = 0;
  GCPROTECT_BEGININTERIOR (src);
  MethodTable* argMT = nullableMT->GetInstantiation()[0].AsMethodTable();
  obj = argMT->Allocate();
  CopyValueClass(obj->UnBox(), src->ValueAddr(nullableMT), argMT);
  GCPROTECT_END ();

  return obj;
}

Here we have everything we discussed earlier. If we don't store the value, we return NULL:

if (!*src->HasValueAddr(nullableMT))
    return NULL;

Otherwise we initiate the boxing:

OBJECTREF obj = 0;
GCPROTECT_BEGININTERIOR (src);
MethodTable* argMT = nullableMT->GetInstantiation()[0].AsMethodTable();
obj = argMT->Allocate();
CopyValueClass(obj->UnBox(), src->ValueAddr(nullableMT), argMT);
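
For readers who are more comfortable with C# than with the runtime's C++, the same two branches can be modeled in managed code. Note that this is only a hypothetical analogue written for this explanation (NullableBoxModel and BoxLike do not exist in the runtime); the real work is done by the native method shown above.

```csharp
using System;

public static class NullableBoxModel
{
    // Hypothetical managed model of the native logic, not the runtime's code.
    public static object BoxLike<T>(T? source) where T : struct
    {
        // No value stored: the result of "boxing" is a null reference.
        if (!source.HasValue)
            return null;

        // Otherwise the underlying T is boxed, not the Nullable<T> wrapper.
        T payload = source.GetValueOrDefault();
        return payload;
    }

    public static void Main()
    {
        Console.WriteLine(BoxLike<int>(null) == null); // True
        Console.WriteLine(BoxLike<int>(62).GetType()); // System.Int32
    }
}
```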

Conclusion

You're welcome to show the example from the beginning of the article to your colleagues and friends just for kicks. Will they give the correct answer and justify it? If not, share this article with them. If they do it - well, kudos to them!

I hope it was a small but exciting adventure. :)

P.S. Someone might have a question: how did we happen to dig that deep in this topic? We were writing a new diagnostic rule in PVS-Studio related to Object.ReferenceEquals working with arguments, one of which is represented by a value type. Suddenly it turned out that with Nullable<T> there is an unexpected subtlety in the behavior when boxing. We looked at the IL code - there was nothing special about the box. Checked out the CLI specification - and gotcha! The case promised to be rather exceptional and noteworthy, so here's the article right in front of you.

P.P.S. By the way, recently, I have been spending more time on Twitter where I post some interesting code snippets and retweet some news in the .NET world and so on. Feel free to look through it and follow me if you want (link to the profile).
