Webinar: C++ semantics - 06.11
New version of .NET enhanced the performance of the Min, Max, Average and Sum methods for arrays and lists. How much do you think their execution speed has increased? 2x or 5x? No, they got even faster. Let's see how that was achieved.
LINQ (Language-Integrated Query) is a simple and convenient query language. It allows you to express complex operations in a simple way. Almost every .NET developer uses LINQ. However, this simplicity of use comes at a price of the execution speed and extra memory allocation. In most situations, it has no significant effect. However, in cases when performance is critical, these limitations may be pretty unpleasant.
So, the recent update has enhanced the performance of the following methods: Enumerable.Max, Enumerable.Min, Enumerable.Average, Enumerable.Sum.
Let's see how their performance increased using the following benchmark:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;
using System.Linq;
[MemoryDiagnoser(displayGenColumns: false)]
public partial class Program
{
static void Main(string[] args) =>
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
[Params (10, 10000)]
public int Size { get; set; }
private IEnumerable<int> items;
[GlobalSetup]
public void Setup()
{
items = Enumerable.Range(1, Size).ToArray();
}
[Benchmark]
public int Min() => items.Min();
[Benchmark]
public int Max() => items.Max();
[Benchmark]
public double Average() => items.Average();
[Benchmark]
public int Sum() => items.Sum();
}
Benchmark results:
Method |
Runtime |
Size |
Mean |
Ratio |
Allocated |
---|---|---|---|---|---|
Min |
.NET 6.0 |
10 |
75.491 ns |
1.00 |
32 B |
Min |
.NET 7.0 |
10 |
7.749 ns |
0.10 |
- |
Max |
.NET 6.0 |
10 |
71.128 ns |
1.00 |
32 B |
Max |
.NET 7.0 |
10 |
6.493 ns |
0.09 |
- |
Average |
.NET 6.0 |
10 |
68.963 ns |
1.00 |
32 B |
Average |
.NET 7.0 |
10 |
7.315 ns |
0.11 |
- |
Sum |
.NET 6.0 |
10 |
69.509 ns |
1.00 |
32 B |
Sum |
.NET 7.0 |
10 |
9.058 ns |
0.13 |
- |
Min |
.NET 6.0 |
10000 |
61,567.392 ns |
1.00 |
32 B |
Min |
.NET 7.0 |
10000 |
2,967.947 ns |
0.05 |
- |
Max |
.NET 6.0 |
10000 |
56,106.592 ns |
1.00 |
32 B |
Max |
.NET 7.0 |
10000 |
2,948.302 ns |
0.05 |
- |
Average |
.NET 6.0 |
10000 |
52,803.907 ns |
1.00 |
32 B |
Average |
.NET 7.0 |
10000 |
2,967.810 ns |
0.06 |
- |
Sum |
.NET 6.0 |
10000 |
52,732.121 ns |
1.00 |
32 B |
Sum |
.NET 7.0 |
10000 |
5,897.220 ns |
0.11 |
- |
The results show that the execution time for finding the minimum element of an array has generally decreased. In 10 times for small arrays and 20 times for arrays containing 10,000 elements. Similarly, for other methods (except for finding the sum, the difference between the sizes of the collections did not affect the results much).
It is also worth noting that in .NET 7 no additional memory is allocated when methods are called.
Let's check how these methods work with List<T>.
Method |
Runtime |
Size |
Mean |
Ratio |
Allocated |
---|---|---|---|---|---|
Min |
.NET 6.0 |
10 |
122.554 ns |
1.00 |
40 B |
Min |
.NET 7.0 |
10 |
8.995 ns |
0.07 |
- |
Max |
.NET 6.0 |
10 |
115.135 ns |
1.00 |
40 B |
Max |
.NET 7.0 |
10 |
9.171 ns |
0.08 |
- |
Average |
.NET 6.0 |
10 |
110.825 ns |
1.00 |
40 B |
Average |
.NET 7.0 |
10 |
8.163 ns |
0.07 |
- |
Sum |
.NET 6.0 |
10 |
113.812 ns |
1.00 |
40 B |
Sum |
.NET 7.0 |
10 |
13.197 ns |
0.12 |
- |
Min |
.NET 6.0 |
10000 |
91,529.841 ns |
1.00 |
40 B |
Min |
.NET 7.0 |
10000 |
2,941.226 ns |
0.03 |
- |
Max |
.NET 6.0 |
10000 |
84,565.787 ns |
1.00 |
40 B |
Max |
.NET 7.0 |
10000 |
2,957.451 ns |
0.03 |
- |
Average |
.NET 6.0 |
10000 |
81,205.103 ns |
1.00 |
40 B |
Average |
.NET 7.0 |
10000 |
2,959.882 ns |
0.04 |
- |
Sum |
.NET 6.0 |
10000 |
81,857.576 ns |
1.00 |
40 B |
Sum |
.NET 7.0 |
10000 |
5,783.370 ns |
0.07 |
- |
In .NET 6, all operations on arrays much faster than on lists. The same is true for small collections in .NET 7. As the number of elements increases, lists are equal to arrays in performance.
According to the test results, lists' performance increased by 31 times.
Let's take a closer look at the implementation of the Min method.
That's how the Min method is implemented in .NET 6:
public static int Min(this IEnumerable<int> source)
{
if (source == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.source);
}
int value;
using (IEnumerator<int> e = source.GetEnumerator())
{
if (!e.MoveNext())
{
ThrowHelper.ThrowNoElementsException();
}
value = e.Current;
while (e.MoveNext())
{
int x = e.Current;
if (x < value)
{
value = x;
}
}
}
return value;
}
The method is quite simple. We get the IEnumerable<int> collection, take the collection element and use the MoveNext method to get the next element. We compare them, save the one that is smaller, and repeat until the end of the collection.
The new version of the Min method is different:
public static int Min(this IEnumerable<int> source) => MinInteger(source);
The MinInteger method is applied to a collection of integers. Let's examine it in more details.
private static T MinInteger<T>(this IEnumerable<T> source)
where T : struct, IBinaryInteger<T>
{
T value;
if (source.TryGetSpan(out ReadOnlySpan<T> span))
{
if (Vector.IsHardwareAccelerated &&
span.Length >= Vector<T>.Count * 2)
{
.... // Optimized implementation
return ....;
}
}
.... //Implementation as in .NET 6
}
Primarily, we try to get the object of the ReadOnlySpan<T> type from the provided collection. If we fail to get the ReadOnlySpan collection representation, then the code branch (that matches the implementation of the Min method in .NET 6) is executed. In our case, we can get ReadOnlySpan, since we use arrays and lists.
But what is ReadOnlySpan actually? The Span and ReadOnlySpan types provide safe representation of a continuous managed and unmanaged memory area. The structure of the Span type is defined as a ref struct. This means that it can only be placed on the stack, which makes it possible to avoid allocating additional memory and improves data performance.
The Span type has also undergone some changes in the new version of C#. Since C# 11 introduced the option to create reference fields in a ref struct, the internal representation of Span<T> has changed. Previously, a special internal type — ByReference<T>, was used to store a reference to the beginning of the memory area, but there was no security check in it. Now ref fields are used for this purpose. It provides a more secure memory operation. You can read more about other changes in C# 11 in the article.
But let's get back to the Min method. When you get ReadOnlySpan, the method tries to perform a vector search using the Vector<T> type. To do this, the following condition should be met:
if (Vector.IsHardwareAccelerated && span.Length >= Vector<T>.Count * 2)
The first part of the condition checks whether the Vector.IsHardwareAccelerated property returns true. Let's take a look at the implementation of this property.
public static bool IsHardwareAccelerated
{
[Intrinsic]
get => false;
}
The [Intrinsic] attribute is applied to the getter. The attribute indicates that the value returned by IsHardwareAccelerated can be replaced JIT. The property returns true if hardware acceleration can be applied to operations on vectors through built-in JIT support, otherwise false is returned. To enable hardware acceleration, you need to run the build for the x64 platform with the Release configuration or build the project for AnyCPU with the "Prefer 32-bit" setting disabled.
To fulfill the second part of the condition, the size of Span should be at least 2 times the size of the vector.
How is the size of this vector calculated?
The Vector<T> data type is used in the new implementation of the Min method to create a vector. This type is a wrapper for three other types: Vector64<T>, Vector128<T> and Vector256<T>. They contain vectorized data of the corresponding length. Vector128<T> can store 16 8-bit, 8 16-bit, 4 32-bit or 2 64-bit values. The type to be used is selected depending on whether it is supported by the processor or not.
Thus, if the Vector128<T> type is used when executing the method, then the Span type obtained from the array or list must contain 8 or more elements of the int type.
If all conditions are met, the method uses the advantages of the ReadOnlySpan and Vector types to optimize finding the minimum:
private static T MinInteger<T>(this IEnumerable<T> source)
where T : struct, IBinaryInteger<T>
{
....
if (Vector.IsHardwareAccelerated && span.Length >= Vector<T>.Count * 2)
{
var mins = new Vector<T>(span);
index = Vector<T>.Count;
do
{
mins = Vector.Min(mins, new Vector<T>(span.Slice(index)));
index += Vector<T>.Count;
}
while (index + Vector<T>.Count <= span.Length);
value = mins[0];
for (int i = 1; i < Vector<T>.Count; i++)
{
if (mins[i] < value)
{
value = mins[i];
}
}
....
}
Let's explore how the vectorized implementation of finding a minimum works. A vector is created from the first N Span elements (N depends on the type of vector; for Vector128<int> it is 4 elements). This vector is compared with a vector of the following N elements in Vector.Min. The resulting vector contains the minimum value of each pair of elements of the two given vectors. Then the next part of Span is taken, and the search continues. You only need to find the minimum value among those stored in the resulting vector. The advantage of using a vector is that all operations on its elements occur simultaneously.
An example of using Span and vectorization to optimize the Min method is shown above. But what about the other methods? You can meet similar features for the Max, Average, Sum methods. The optimized versions of these methods are available for arrays and lists of int, long, float, double and decimal types.
Microsoft developers used Span and hardware acceleration to work with vectors. As a result, the .NET community brought visible enhancement to LINQ performance. Now we can use advanced methods in cases where the speed of code execution, and allocated memory are critical.
Actually, .NET 7 got a few other performance improvements. You can read about them in .NET Blog.
0