Artem Rovenskii

Oct 04 2022

Tags:

#CSharp #Knowledge

List in C#: implementation and features

Oct 04 2022

Author: Artem Rovenskii

Introduction
Inside List<T>
A bit about performance
Conclusion

List is one of the most used collections in C#. Let's inspect the features of List and look at how some of its parts are implemented.

Introduction

This article is about List<T> from the System.Collections.Generic namespace. More specifically, the article describes the internal implementation and some features of List<T>. This is the most used collection in C#, and it's not just my opinion — Andrew Troelsen, Philip Japikse, and Jon Skeet wrote the same in their books. And that's understandable: List<T> is easy to work with. It's quite flexible and thus covers most of the daily tasks for developers. It also has a large number of helpful methods. The collection's abilities are expanded even further with the help of LINQ.

Inside List<T>

The source code of the List<T> class is available on GitHub. This means we can look at its implementation. Let's analyze the most important aspects.

The List<T> class is a sequential and dynamically resizable list of items. Under the hood, List<T> is based on an array.

The List<T> class has 3 main fields:

T[] _items is an internal array. The list is built on the base of this array.
int _size stores information about the number of items in the list.
int _version stores the version of the collection.

How to add an item to the list

The list is dynamically resizable. What happens when we add an item to the list?

public void Add(T item)
{
  _version++;
  T[] array = _items;
  int size = _size;
  if ((uint)size < (uint)array.Length)
  {
    _size = size + 1;
    array[size] = item;
  }
  else
  {
    AddWithResize(item);
  }
}

First, the value of the _version field is increased by 1. We'll inspect why this happens a bit later in this article. After that, two local variables are created — array with the elements of the T type and size of the int type. The values of the appropriate fields are assigned to these variables. Next, if an array still has space for one element, then the array element is changed by the size + 1 index. If an array doesn't have any more space left, then the AddWithResize method is called.

private void AddWithResize(T item)
{
  Debug.Assert(_size == _items.Length);
  int size = _size;
  Grow(size + 1);
  _size = size + 1;
  _items[size] = item;
}

Here, the Grow method is called to increase the internal array's current size. Then, the same actions are performed as in the Add method, to add an element if there's available space left.

Let's look closely at the Grow method:

private void Grow(int capacity)
{
  ....

  int newcapacity = _items.Length == 0 ? DefaultCapacity : 2 * _items.Length;

  if ((uint)newcapacity > Array.MaxLength) newcapacity = Array.MaxLength;

  if (newcapacity < capacity) newcapacity = capacity;

  Capacity = newcapacity;
}

The algorithm of the Grow method is as follows:

If an internal array is empty, then the capacity of the list is 4, otherwise double the length of the array;
if the new capacity value is greater than the maximum possible length of the array, then the capacity will be equal to Array.MaxLength;
if the new value of the collection's capacity is less than the current one, then the new capacity will be equal to the current one;
finally, newcapacity is written to the Capacity property.

Why do we need the _version field?

Why do we need the _version field, the value of which changed in the Add method? As mentioned above, this field allows you to track the list version. This field's value is checked when the list is enumerated. For example, let's look at the ForEach method:

public void ForEach(Action<T> action)
{
  ....
  int version = _version;

  for (int i = 0; i < _size; i++)
  {
    if (version != _version)
    {
      break;
    }
    action(_items[i]);
  }

  if (version != _version)
    ThrowHelper
      .ThrowInvalidOperationException_InvalidOperation_EnumFailedVersion();
}

Before enumeration is started, the value of the _version field is saved to a variable. If during the traversal the list is changed, then the traversal is stopped and System.InvalidOperationException is thrown. The _version field is similarly tracked in List<T>.Enumerator. Therefore, if we change the list when traversing it in foreach, it will also result in an exception thrown.

Capacity

List<T> has a constructor that takes a number (initial capacity) as a first argument.

List<int> list = new List<int>(8);

If a developer knows in advance what list size they need, they can set it. This eliminates unnecessary copying operations and memory allocation for a new array when a new item is added.

By the way, we can also manage the size of the internal array using the Capacity property:

list.Capacity = 8;

Let's look at the code of this property:

public int Capacity
{
  get => _items.Length;
  set
  {
    if (value < _size)
    {
      ThrowHelper.ThrowArgumentOutOfRangeException(....);
    }

    if (value != _items.Length)
    {
      if (value > 0)
      {
        T[] newItems = new T[value];
        if (_size > 0)
        {
          Array.Copy(_items, newItems, _size);
        }
        _items = newItems;
      }
      else
      {
        _items = s_emptyArray;
      }
    }
  }
}

The get accessor returns the _items.Length value, i.e. the length of the internal array.

The set accessor works as follows:

if value is less than the number of items in the collection, an exception is thrown.
if value is not equal to the length of the internal array and value is greater than 0, a new array with a capacity equal to value is created;
if the number of items in the list is greater than 0, then the items from the old array are copied to the new one;
if value is 0, then an empty array is assigned to the field, which is an internal array.

Other features of the List<T> methods

Insert

The Insert method allows us to insert an item to the collection only within the collection's range. If the number of items in the collection is equal to the size of the internal array, then the array's capacity is increased via the Grow(_size + 1) method. If we try to insert an item by an index that's larger than list.Count, System.ArgumentOutOfRangeException is thrown.

List<string> list = new List<string>() { "1", "2"};
list.Insert(1, "10"); // OK
list.Insert(2, "15"); // OK
list.Insert(10, 12); // throw exception

Such behavior remains even with explicit management of the internal array's size.

Look at the example:

List<string> list = new List<string>() { "1", "2"};
list.Capacity = 8;
list.Insert(3, "3");

The Capacity property is assigned 8, and the internal array is resized. However this doesn't allow us to insert an item to a position greater than list.Count. As a result of executing the code, an exception is thrown.

Clear

This method clears the collection. As a result of this operation the Count property will be 0. Items of the reference type get the default value. If the collection items are structures and have fields of reference type, these fields also get the default value. Note that the size of the internal array remains unchanged. If before the Clear call the Capacity property was equal to 8, then after Clear the array size remains equal to 8. To clear the memory allocated for the array, we need to call TrimExcess after Clear.

TrimExcess

This method makes the size of the internal array equal to the number of items in the list. We should use it when we know that no more items will be added to the collection.

list.Clear();
list.TrimExcess();

Sort and OrderBy

There are several differences between these two methods:

the Sort method belongs to the List<T> class, while OrderBy is an extension method from LINQ;
the Sort method modifies the initial collection, while OrderBy returns the sorted copy with the IOrderedEnumerable<TSource> type;
the OrderBy method performs stable sorting, Sort does not. If we use the Sort method, the equivalent items may be reordered.

A bit about performance

List<T> versus ArrayList

List<T> is generic. This means that when we create a list, we must specify what type of objects it works with.

List<string> list = new List<string>();

Jeffrey Richter in his book "CLR via C#" describes the following advantages of generics:

source code protection;
type safety;
cleaner code;
better performance.

The same book (the beginning of the 12th chapter) has a good example of comparing List<T> and ArrayList, its non-generic analogue. The essence of this test is in adding an item to the list and assigning the same item to a variable 10 million times.

Below is an example of testing ArrayList with a value type:

public void ValueTypeArrayList()
{
  ArrayList a = new ArrayList();
  for (Int32 n = 0; n < _count; n++)
  {
    a.Add(n);
    Int32 x = (Int32)a[n];
  }
}

Testing was performed with objects of value (Int32) and reference (String) types.

After rewriting the code given in the book and testing it with BenchmarkDotNet I got the following results:

We can see from the results that the List<T> algorithm works with Int32 much faster than ArrayList with Int32. 13 times faster! Besides, the memory is allocated 4 times less with List<T>.

Due to the fact that many packing operations are performed during ArrayList operation, the number of garbage collections also increases. And obtaining an item requires unpacking. All of that leads to the decreased performance.

The difference with reference types is insignificant, since there are no packing and unpacking operations, and these operations are heavy. Judging by the code, a small difference in speed appears because of the type conversion operation.

Advantages of assigning value to Capacity

As mentioned above, if a developer knows the list size in advance, they can specify it.

Let's do a small test.

public void ListWithoutCapacity()
{
  for (int i = 0; i < Count; i++)
  {
    List<int> list = new List<int>();
    for (int j = 0; j < Length; j++)
    {
      list.Add(j);
    }
  }
}

Here, 150 000 items are added to list. Let's perform this operation 1000 times. Then, let's compare the performance with the same method but with the specified capacity, which is equal to the number of addition operations.

The results show that the time spent on executing the method without capacity is 2 times more than with a pre-set one. Also, the memory is allocated 4 times more in this case. Such actions eliminate 17 unnecessary copy operations on each iteration of the external loop.

What is the fastest way to determine that there are items in the list?

Let's take three options for determining that the list is not empty:

use the Count method from LINQ and compare the result with 0;
use the Count property and compare the result with 0;
use the Any extension method from LINQ.

After testing, we get the following results for the list of 1 500 000 items:

The fastest option is accessing the Count property, since it just returns the value of the _size field.

The Count method tries to convert the initial collection to ICollection. If the conversion is successful, the method returns the value of the Count property. If the conversion failed, we have to traverse the whole collection to count the number of elements. Luckily, List<T> has this interface.

The Any method returns true if it finds at least one element in the collection.

Conclusion

We can safely say that List<T> is more user-friendly version of the array. For example, it's more convenient to work with a list when the number of sequence elements is unknown in advance.

C# has many more collections that help developers in their work. Some of them are more specific, some of them are less. Hope this article helps you too and makes your understanding of lists a little better :).

#CSharp #Knowledge

Tags:

#CSharp #Knowledge

Fill out the form in 2 simple steps below:

Your contact information:

Desired license type:

List in C#: implementation and features

Introduction

Inside List<T>

How to add an item to the list

Why do we need the _version field?

Capacity

Other features of the List<T> methods

A bit about performance

List<T> versus ArrayList

Advantages of assigning value to Capacity

What is the fastest way to determine that there are items in the list?

Conclusion

Comments (0)

Want to try PVS‑Studio for free?

Achievements

PVS-Studio

Licensing

Company