C# has a low barrier to entry and forgives a lot. Seriously, you may not understand how things work under the hood and still write code without worrying about it. Over time, though, you still have to deal with various nuances. Today, we'll look at one such subtle aspect: handling enumerations.
It's hard to find a developer who hasn't encountered enumerations. However, anyone can make an error when using them. It is more likely if:
Besides, in practice the problems below may not be an issue for your application. However, if such code executes repeatedly (say, tens of millions of times) and starts causing trouble, you'll already know what you're dealing with.
Note. All the research we will be doing below has been done for .NET Framework. It's an important comment. We'll talk about .NET a bit later.
I encountered this problem not long ago when I was dealing with various optimizations of the C# PVS-Studio analyzer. Yes, we already had one article on this subject, but I think there will be more.
During this process, I was fixing various places in code. As practice has shown, even small edits can boost performance if made in the app's bottlenecks.
At some point, based on the profiling results, I got to the VariableAnnotation class. We'll consider a simplified version of it:
enum OriginType
{
  Field,
  Parameter,
  Property,
  ....
}

class VariableAnnotation<T> where T : Enum
{
  public T Type { get; }

  public SyntaxNode OriginatingNode { get; }

  public VariableAnnotation(SyntaxNode originatingNode, T type)
  {
    OriginatingNode = originatingNode;
    Type = type;
  }

  public override bool Equals(object obj)
  {
    if (obj is null)
      return false;

    if (obj is not VariableAnnotation<T> other)
      return false;

    return Enum.Equals(this.Type, other.Type)
        && this.OriginatingNode == other.OriginatingNode;
  }

  public override int GetHashCode()
  {
    return this.OriginatingNode.GetHashCode()
         ^ this.Type.GetHashCode();
  }
}
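Now let's write two simple methods: one compares VariableAnnotation<OriginType> instances in a loop, and the other gets an instance's hash code in a loop.
Corresponding methods: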
static void EqualsTest()
{
  var ann1 = new VariableAnnotation<OriginType>(new SyntaxNode(),
                                                OriginType.Parameter);
  var ann2 = new VariableAnnotation<OriginType>(new SyntaxNode(),
                                                OriginType.Parameter);

  while (true)
  {
    var eq = Enum.Equals(ann1, ann2);
  }
}

static void GetHashCodeTest()
{
  var ann = new VariableAnnotation<OriginType>(new SyntaxNode(),
                                               OriginType.Parameter);

  while (true)
  {
    var hashCode = ann.GetHashCode();
  }
}
If you run either of these methods and watch the application as it runs, you'll notice something nasty: it puts pressure on the GC.
For example, this can be seen in the Visual Studio "Diagnostic Tools" window.
Process Hacker on the ".NET performance" tab of process information also shows this.
The examples above clearly point to two culprits: the Enum.Equals call and the Enum.GetHashCode call.
Let's deal with them one-by-one.
Here's the code we'll investigate next:
static void EnumEqTest(OriginType originLhs, OriginType originRhs)
{
  while (true)
  {
    var eq = Enum.Equals(originLhs, originRhs);
  }
}
The first thing experts will notice is that there is no static Enum.Equals method. The IDE will help here, by the way. In this case, the call resolves to the static Object.Equals(object objA, object objB) method, which Enum inherits.
The IDE itself drops a hint about this:
We work with instances of a value type, whereas the method takes reference-type arguments. Therefore, boxing takes place before the method call. By the way, if you look into the IL code, you can find the box instructions:
.method private hidebysig static void
EnumEqTest(valuetype EnumArticle.Program/OriginType originLhs,
           valuetype EnumArticle.Program/OriginType originRhs) cil managed
{
  // Code size 20 (0x14)
  .maxstack 8
  IL_0000: ldarg.0
  IL_0001: box EnumArticle.Program/OriginType
  IL_0006: ldarg.1
  IL_0007: box EnumArticle.Program/OriginType
  IL_000c: call bool [mscorlib]System.Object::Equals(object, object)
  IL_0011: pop
  IL_0012: br.s IL_0000
}
Here we clearly see the call to the System.Object::Equals(object, object) method. The box instruction, which boxes the arguments, is there as well (IL_0001, IL_0007).
Since we box objects only to call the method, the corresponding references are not saved anywhere. Hence, the boxed objects will be cleaned up during garbage collection.
Note. Someone may say — everyone can see that Enum.Equals == Object.Equals. Look, even IDE highlights this. The answer is no, no, and again no. The simplest proof is that such code was written. And I'm sure some developers use a similar way of comparison. As for "obviousness", very often people fall into the trap of thinking that if something is obvious to them, it's obvious to everyone. That's not the case.
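To make the point concrete, here is a minimal sketch of what the call really binds to. The OtherKind enum and the EnumEqualsBindingDemo class are hypothetical names introduced only for this illustration; OriginType is a shortened copy of the enum from the article.

```csharp
using System;

enum OriginType { Field, Parameter, Property }

// Hypothetical second enum, used only to show that the call accepts any objects.
enum OtherKind { First, Second }

static class EnumEqualsBindingDemo
{
    // Written as Enum.Equals, but there is no static Enum.Equals(a, b).
    // The call binds to the inherited static Object.Equals(object, object),
    // so both arguments are boxed.
    public static bool ViaStaticEquals(OriginType lhs, OriginType rhs)
        => Enum.Equals(lhs, rhs);

    // This compiles only because both parameters are of type object.
    // The boxed values have different runtime types, so the result is false,
    // even though both underlying values are 0.
    public static bool DifferentEnumTypes()
        => Enum.Equals(OriginType.Field, OtherKind.First);

    // Direct comparison of the underlying values, no boxing.
    public static bool ViaOperator(OriginType lhs, OriginType rhs)
        => lhs == rhs;
}
```

The second method is the telling one: with '==', comparing OriginType to OtherKind would not compile at all, while the Object.Equals route quietly accepts it.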
If we change the Enum.Equals call (in fact — Object.Equals) to compare through '==', we get rid of unnecessary boxing:
var eq = originLhs == originRhs;
However, we should remember that the generic code version (the VariableAnnotation type was generic) will not compile:
static void EnumEq<T>(T originLhs, T originRhs) where T : Enum
{
  while (true)
  {
    // error CS0019: Operator '==' cannot be applied
    // to operands of type 'T' and 'T'
    var eq = originLhs == originRhs;
  }
}
Calls of instance Enum.Equals and Enum.CompareTo methods will not work out for us—they entail boxing.
The way out can be the generic EqualityComparer<T> type. For example, one can safely use the default comparer. The code will roughly look as follows:
static void EnumEq<T>(T originLhs, T originRhs) where T : Enum
{
  while (true)
  {
    var eq = EqualityComparer<T>.Default.Equals(originLhs, originRhs);
  }
}
The EqualityComparer<T>.Equals(T x, T y) method receives arguments of the generic type and therefore does not require boxing before the call. No boxing happens inside the method either.
The box instructions are gone from the IL code:
.method private hidebysig static void
EnumEq<([mscorlib]System.Enum) T>(!!T originLhs,
                                  !!T originRhs) cil managed
{
  // Code size 15 (0xf)
  .maxstack 8
  IL_0000: call
    class [mscorlib]System.Collections.Generic.EqualityComparer`1<!0>
    class [mscorlib]System.Collections.Generic.EqualityComparer`1<!!T>
      ::get_Default()
  IL_0005: ldarg.0
  IL_0006: ldarg.1
  IL_0007: callvirt
    instance bool class
    [mscorlib]System.Collections.Generic.EqualityComparer`1<!!T>::Equals(!0, !0)
  IL_000c: pop
  IL_000d: br.s IL_0000
}
Visual Studio profiler doesn't capture any garbage collection events in this code.
Process Hacker indicates the same thing.
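If you don't have a profiler at hand, the same difference can be observed from code. Below is a rough sketch, not the measurement setup used for this article: it counts generation 0 collections around a bounded loop with GC.CollectionCount. The GcPressureProbe class and its method names are assumptions made for this example, and the exact numbers depend on the runtime and GC settings.

```csharp
using System;
using System.Collections.Generic;

enum OriginType { Field, Parameter, Property }

static class GcPressureProbe
{
    // Number of generation 0 collections that occurred while 'action' ran.
    public static int CountGen0Collections(Action action)
    {
        int before = GC.CollectionCount(0);
        action();
        return GC.CollectionCount(0) - before;
    }

    // Enum.Equals(a, b) is really Object.Equals(object, object):
    // both arguments are boxed on every iteration.
    public static bool CompareBoxed(OriginType lhs, OriginType rhs, int iterations)
    {
        bool eq = false;
        for (int i = 0; i < iterations; i++)
            eq = Enum.Equals(lhs, rhs);
        return eq;
    }

    // Generic comparison through the default comparer.
    public static bool CompareGeneric<T>(T lhs, T rhs, int iterations) where T : Enum
    {
        bool eq = false;
        for (int i = 0; i < iterations; i++)
            eq = EqualityComparer<T>.Default.Equals(lhs, rhs);
        return eq;
    }

    // Prints the collection counts for both ways of comparing.
    public static void Report(int iterations)
    {
        int boxed = CountGen0Collections(
            () => CompareBoxed(OriginType.Parameter, OriginType.Parameter, iterations));
        int generic = CountGen0Collections(
            () => CompareGeneric(OriginType.Parameter, OriginType.Parameter, iterations));

        Console.WriteLine($"Object.Equals:       {boxed} gen 0 collections");
        Console.WriteLine($"EqualityComparer<T>: {generic} gen 0 collections");
    }
}
```

Calling GcPressureProbe.Report with a few tens of millions of iterations should be enough to see whether the two loops differ on your runtime.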
You might become interested in how EqualityComparer<T> really works on the inside. As for me, I got curious. The source code of this type is available, for example, at referencesource.microsoft.com.
Now consider what is going on with the Enum.GetHashCode method. Let's start with the following code:
static void EnumGetHashCode(OriginType origin)
{
  while (true)
  {
    var hashCode = origin.GetHashCode();
  }
}
You may be surprised by what is happening here: boxing and, as a result, GC pressure. The profiler and Process Hacker signal this again.
So why not indulge yourself and get nostalgic? Let's compile this code via Visual Studio 2010. We'll get the IL code like this:
.method private hidebysig static void EnumGetHashCode(valuetype
  EnumArticleVS2010.Program/OriginType origin) cil managed
{
  // Code size 14 (0xe)
  .maxstack 8
  IL_0000: ldarg.0
  IL_0001: box EnumArticleVS2010.Program/OriginType
  IL_0006: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
  IL_000b: pop
  IL_000c: br.s IL_0000
}
Everything seems to be expected: the box command is in the right place (IL_0001). This answers the question where the boxing and the GC pressure come from.
Let's return to the modern world and compile the code in Visual Studio 2019. We get the following IL code:
.method private hidebysig static void
EnumGetHashCode(valuetype EnumArticle.Program/OriginType origin) cil managed
{
  // Code size 16 (0x10)
  .maxstack 8
  IL_0000: ldarga.s origin
  IL_0002: constrained. EnumArticle.Program/OriginType
  IL_0008: callvirt instance int32 [mscorlib]System.Object::GetHashCode()
  IL_000d: pop
  IL_000e: br.s IL_0000
}
Suddenly, the box command disappeared (just like a pencil in "The Dark Knight"). Yet the boxing and the GC pressure remained. At this point I decided to check out the Enum.GetHashCode() implementation at referencesource.microsoft.com.
[System.Security.SecuritySafeCritical]
public override unsafe int GetHashCode()
{
  // Avoid boxing by inlining GetValue()
  // return GetValue().GetHashCode();

  fixed (void* pValue = &JitHelpers.GetPinningHelper(this).m_data)
  {
    switch (InternalGetCorElementType())
    {
      case CorElementType.I1:
        return (*(sbyte*)pValue).GetHashCode();
      case CorElementType.U1:
        return (*(byte*)pValue).GetHashCode();
      case CorElementType.Boolean:
        return (*(bool*)pValue).GetHashCode();
      ....
      default:
        Contract.Assert(false, "Invalid primitive type");
        return 0;
    }
  }
}
The most intriguing part here is the comment "Avoid boxing...". Something doesn't add up...
There should be no boxing, and there's no box instruction in the IL code either. Yet memory is still allocated in the managed heap, and garbage collection events still occur.
Let's consult the CIL specification to better understand the IL code. Here is the method call again, so that you have it right in front of your eyes:
ldarga.s origin
constrained. EnumArticle.Program/OriginType
callvirt instance int32 [mscorlib]System.Object::GetHashCode()
As for the ldarga.s instruction, it's all simple. The address of the method argument is loaded to the evaluation stack.
Next comes the constrained. prefix. Prefix format:
constrained. thisType
Stack transition:
..., ptr, arg1, ... argN -> ..., ptr, arg1, ... argN
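Depending on what thisType is, the way the ptr managed pointer is handled differs. (1) If thisType is a reference type, ptr is dereferenced and passed as the this pointer to the callvirt of the method. (2) If thisType is a value type that implements the method, ptr is passed unmodified as the this pointer to a call of the method implemented by thisType. (3) If thisType is a value type that does not implement the method, ptr is dereferenced, boxed, and passed as the this pointer to the callvirt of the method.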
As noted in the specification, the latter case is only possible when the method is declared in System.Object, System.ValueType, or System.Enum and is not overridden in the child type.
The second case in the list above makes it possible to avoid boxing when a method is called. But we are facing the third case. GetHashCode is overridden in System.Enum, and System.Enum is the base type for OriginType. However, the enumeration itself does not override the methods from System.Enum. This is why boxing happens when they are called.
I'd like to emphasize that this is relevant for any value types. If you don't override the base method, the object will be boxed to call it.
struct MyStructBoxing
{
  private int _field;
}

struct MyStructNoBoxing
{
  private int _field;

  public override int GetHashCode()
  {
    return _field;
  }
}

static void TestStructs(MyStructBoxing myStructBoxing,
                        MyStructNoBoxing myStructNoBoxing)
{
  while (true)
  {
    var hashCode1 = myStructBoxing.GetHashCode();   // boxing
    var hashCode2 = myStructNoBoxing.GetHashCode(); // no boxing
  }
}
But let's go back to enumerations. We can't override a method in an enumeration. So what can we do with them?
The System.Collections.Generic.EqualityComparer<T> type that I have mentioned before may be really helpful here. This type contains the generic GetHashCode method - public abstract int GetHashCode(T obj):
var hashCode = EqualityComparer<OriginType>.Default.GetHashCode(_origin);
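Putting both fixes together, the simplified VariableAnnotation class from the beginning of the article could look roughly like this. Treat it as a sketch rather than the analyzer's actual code: the empty SyntaxNode class is a stand-in added only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

enum OriginType { Field, Parameter, Property }

// Stand-in for the real syntax node type (an assumption for this sketch).
sealed class SyntaxNode { }

class VariableAnnotation<T> where T : Enum
{
    public T Type { get; }

    public SyntaxNode OriginatingNode { get; }

    public VariableAnnotation(SyntaxNode originatingNode, T type)
    {
        OriginatingNode = originatingNode;
        Type = type;
    }

    public override bool Equals(object obj)
    {
        if (!(obj is VariableAnnotation<T> other))
            return false;

        // Generic comparer instead of Enum.Equals(a, b), which is in fact
        // Object.Equals(object, object) and boxes both arguments.
        return EqualityComparer<T>.Default.Equals(Type, other.Type)
            && OriginatingNode == other.OriginatingNode;
    }

    public override int GetHashCode()
    {
        // Generic comparer instead of Type.GetHashCode(), which goes through
        // the constrained call discussed above.
        return OriginatingNode.GetHashCode()
             ^ EqualityComparer<T>.Default.GetHashCode(Type);
    }
}
```

The public surface of the class stays the same. Only the two calls that touch the enum value go through EqualityComparer<T>.Default.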
As I said earlier, everything said above was relevant to the .NET Framework. Let's see how things are going in .NET, shall we?
Let's start with the Enum.Equals(a, b) call. As expected, boxing is present. No surprise here, as we still need to call the Object.Equals(object, object) method. So it's not worth comparing enumeration elements in this way anyway.
Speaking about the Enum.Equals instance method, the argument still has to be boxed.
As for Enum.GetHashCode, this is where a nice surprise was waiting for me!
Let's recall the code example:
static void GetHashCodeTest(OriginType origin)
{
  while (true)
  {
    var hashCode = origin.GetHashCode();
  }
}
Let me remind you that when you run this code in .NET Framework, new temporary objects are created because of boxing. The result is additional GC pressure.
But nothing similar happens when using .NET (and .NET Core)! No temporary objects, no GC pressure.
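To check this on your own runtime, you can measure allocations directly. The sketch below is only an illustration: HashCodeAllocationProbe is a made-up name, and the code assumes that the target framework exposes GC.GetAllocatedBytesForCurrentThread and RuntimeInformation.FrameworkDescription (to my knowledge, .NET Framework 4.8 and recent .NET versions do).

```csharp
using System;
using System.Runtime.InteropServices;

enum OriginType { Field, Parameter, Property }

static class HashCodeAllocationProbe
{
    // The loop result is stored here so the calls are not optimized away.
    public static int Sink;

    // Bytes allocated on the current thread by 'iterations' GetHashCode calls.
    public static long MeasureAllocatedBytes(OriginType origin, int iterations)
    {
        int sink = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();

        for (int i = 0; i < iterations; i++)
            sink ^= origin.GetHashCode();

        long after = GC.GetAllocatedBytesForCurrentThread();
        Sink = sink;
        return after - before;
    }

    // Prints the runtime name together with the measured allocation size.
    public static void Report(int iterations)
    {
        long bytes = MeasureAllocatedBytes(OriginType.Parameter, iterations);
        Console.WriteLine(
            $"{RuntimeInformation.FrameworkDescription}: {bytes} bytes allocated");
    }
}
```

Running HashCodeAllocationProbe.Report under both target frameworks gives a quick way to compare them without attaching a profiler.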
Okay, we kind of dealt with the boxing issue. Let's move on to the performance question. At the same time, we'll compare the speed of the same code for .NET Framework and .NET.
All the code for the compared methods is the same. There will be two differences: how we compare enumeration elements and how we get hash codes.
Description of comparison ways used in methods:
Execution times are compared below for .NET Framework 4.8 and .NET 5.
I'm thrilled with the results of EqualityComparer<T> on .NET 5. As for the performance, we got about the same time as in direct comparison of enumeration items. Kudos to Microsoft! When you update the target framework/runtime, you get optimization out of the box without changing C# code.
Description of ways to get hash code used in methods:
The first and the last points are clear now. The second and third are hash code hacks inspired by the Enum.GetHashCode and Int32.GetHashCode implementations. They are fragile with respect to changes of the underlying type and not very obvious. I don't encourage writing code like this. Yet I added them to the tests for the sake of interest.
Execution times are compared below for .NET Framework 4.8 and .NET 5.
We've got two pieces of good news at once:
C# is cool. You can code in it for years and not know about the nuances of basic things: why out parameters can remain uninitialized; why the result of boxing a nullable value can be null; why boxing happens when you call GetHashCode for enumerations. And when you have to deal with something like this, getting to the bottom of it can be extremely engaging. I get a kick out of that, and I hope you do as well.
As usual, consider subscribing to my Twitter so you don't miss out on anything noteworthy.