Webinar: Parsing C++ - 10.10
List is one of the most used collections in C#. Let's inspect the features of List and look at how some of its parts are implemented.
This article is about List<T> from the System.Collections.Generic namespace. More specifically, the article describes the internal implementation and some features of List<T>. This is the most used collection in C#, and it's not just my opinion — Andrew Troelsen, Philip Japikse, and Jon Skeet wrote the same in their books. And that's understandable: List<T> is easy to work with. It's quite flexible and thus covers most of the daily tasks for developers. It also has a large number of helpful methods. The collection's abilities are expanded even further with the help of LINQ.
The source code of the List<T> class is available on GitHub. This means we can look at its implementation. Let's analyze the most important aspects.
The List<T> class is a sequential and dynamically resizable list of items. Under the hood, List<T> is based on an array.
The List<T> class has 3 main fields:
The list is dynamically resizable. What happens when we add an item to the list?
public void Add(T item)
{
_version++;
T[] array = _items;
int size = _size;
if ((uint)size < (uint)array.Length)
{
_size = size + 1;
array[size] = item;
}
else
{
AddWithResize(item);
}
}
First, the value of the _version field is increased by 1. We'll inspect why this happens a bit later in this article. After that, two local variables are created — array with the elements of the T type and size of the int type. The values of the appropriate fields are assigned to these variables. Next, if an array still has space for one element, then the array element is changed by the size + 1 index. If an array doesn't have any more space left, then the AddWithResize method is called.
private void AddWithResize(T item)
{
Debug.Assert(_size == _items.Length);
int size = _size;
Grow(size + 1);
_size = size + 1;
_items[size] = item;
}
Here, the Grow method is called to increase the internal array's current size. Then, the same actions are performed as in the Add method, to add an element if there's available space left.
Let's look closely at the Grow method:
private void Grow(int capacity)
{
....
int newcapacity = _items.Length == 0 ? DefaultCapacity : 2 * _items.Length;
if ((uint)newcapacity > Array.MaxLength) newcapacity = Array.MaxLength;
if (newcapacity < capacity) newcapacity = capacity;
Capacity = newcapacity;
}
The algorithm of the Grow method is as follows:
Why do we need the _version field, the value of which changed in the Add method? As mentioned above, this field allows you to track the list version. This field's value is checked when the list is enumerated. For example, let's look at the ForEach method:
public void ForEach(Action<T> action)
{
....
int version = _version;
for (int i = 0; i < _size; i++)
{
if (version != _version)
{
break;
}
action(_items[i]);
}
if (version != _version)
ThrowHelper
.ThrowInvalidOperationException_InvalidOperation_EnumFailedVersion();
}
Before enumeration is started, the value of the _version field is saved to a variable. If during the traversal the list is changed, then the traversal is stopped and System.InvalidOperationException is thrown. The _version field is similarly tracked in List<T>.Enumerator. Therefore, if we change the list when traversing it in foreach, it will also result in an exception thrown.
List<T> has a constructor that takes a number (initial capacity) as a first argument.
List<int> list = new List<int>(8);
If a developer knows in advance what list size they need, they can set it. This eliminates unnecessary copying operations and memory allocation for a new array when a new item is added.
By the way, we can also manage the size of the internal array using the Capacity property:
list.Capacity = 8;
Let's look at the code of this property:
public int Capacity
{
get => _items.Length;
set
{
if (value < _size)
{
ThrowHelper.ThrowArgumentOutOfRangeException(....);
}
if (value != _items.Length)
{
if (value > 0)
{
T[] newItems = new T[value];
if (_size > 0)
{
Array.Copy(_items, newItems, _size);
}
_items = newItems;
}
else
{
_items = s_emptyArray;
}
}
}
}
The get accessor returns the _items.Length value, i.e. the length of the internal array.
The set accessor works as follows:
Insert
The Insert method allows us to insert an item to the collection only within the collection's range. If the number of items in the collection is equal to the size of the internal array, then the array's capacity is increased via the Grow(_size + 1) method. If we try to insert an item by an index that's larger than list.Count, System.ArgumentOutOfRangeException is thrown.
List<string> list = new List<string>() { "1", "2"};
list.Insert(1, "10"); // OK
list.Insert(2, "15"); // OK
list.Insert(10, 12); // throw exception
Such behavior remains even with explicit management of the internal array's size.
Look at the example:
List<string> list = new List<string>() { "1", "2"};
list.Capacity = 8;
list.Insert(3, "3");
The Capacity property is assigned 8, and the internal array is resized. However this doesn't allow us to insert an item to a position greater than list.Count. As a result of executing the code, an exception is thrown.
Clear
This method clears the collection. As a result of this operation the Count property will be 0. Items of the reference type get the default value. If the collection items are structures and have fields of reference type, these fields also get the default value. Note that the size of the internal array remains unchanged. If before the Clear call the Capacity property was equal to 8, then after Clear the array size remains equal to 8. To clear the memory allocated for the array, we need to call TrimExcess after Clear.
TrimExcess
This method makes the size of the internal array equal to the number of items in the list. We should use it when we know that no more items will be added to the collection.
list.Clear();
list.TrimExcess();
Sort and OrderBy
There are several differences between these two methods:
List<T> is generic. This means that when we create a list, we must specify what type of objects it works with.
List<string> list = new List<string>();
Jeffrey Richter in his book "CLR via C#" describes the following advantages of generics:
The same book (the beginning of the 12th chapter) has a good example of comparing List<T> and ArrayList, its non-generic analogue. The essence of this test is in adding an item to the list and assigning the same item to a variable 10 million times.
Below is an example of testing ArrayList with a value type:
public void ValueTypeArrayList()
{
ArrayList a = new ArrayList();
for (Int32 n = 0; n < _count; n++)
{
a.Add(n);
Int32 x = (Int32)a[n];
}
}
Testing was performed with objects of value (Int32) and reference (String) types.
After rewriting the code given in the book and testing it with BenchmarkDotNet I got the following results:
We can see from the results that the List<T> algorithm works with Int32 much faster than ArrayList with Int32. 13 times faster! Besides, the memory is allocated 4 times less with List<T>.
Due to the fact that many packing operations are performed during ArrayList operation, the number of garbage collections also increases. And obtaining an item requires unpacking. All of that leads to the decreased performance.
The difference with reference types is insignificant, since there are no packing and unpacking operations, and these operations are heavy. Judging by the code, a small difference in speed appears because of the type conversion operation.
As mentioned above, if a developer knows the list size in advance, they can specify it.
Let's do a small test.
public void ListWithoutCapacity()
{
for (int i = 0; i < Count; i++)
{
List<int> list = new List<int>();
for (int j = 0; j < Length; j++)
{
list.Add(j);
}
}
}
Here, 150 000 items are added to list. Let's perform this operation 1000 times. Then, let's compare the performance with the same method but with the specified capacity, which is equal to the number of addition operations.
The results show that the time spent on executing the method without capacity is 2 times more than with a pre-set one. Also, the memory is allocated 4 times more in this case. Such actions eliminate 17 unnecessary copy operations on each iteration of the external loop.
Let's take three options for determining that the list is not empty:
After testing, we get the following results for the list of 1 500 000 items:
The fastest option is accessing the Count property, since it just returns the value of the _size field.
The Count method tries to convert the initial collection to ICollection. If the conversion is successful, the method returns the value of the Count property. If the conversion failed, we have to traverse the whole collection to count the number of elements. Luckily, List<T> has this interface.
The Any method returns true if it finds at least one element in the collection.
We can safely say that List<T> is more user-friendly version of the array. For example, it's more convenient to work with a list when the number of sequence elements is unknown in advance.
C# has many more collections that help developers in their work. Some of them are more specific, some of them are less. Hope this article helps you too and makes your understanding of lists a little better :).
0