Nikita Lipilin

Mar 04 2021

Tags:

#CSharp #Knowledge

What is yield and how does it work in C#?


Why you need yield
How to use yield
When do I use yield?
Limitations
So how exactly does this work?
Conclusion

C# capabilities keep expanding from year to year. New features enrich software development. However, their advantages may not always be obvious. Take the good old yield, for example. To some developers, especially beginners, it's like magic - inexplicable, but intriguing. This article shows how yield works and what this peculiar word hides. Have fun reading!


Why you need yield

The yield keyword is used to build generators of element sequences. These generators do not create collections. Instead, the sequence stores the current state - and moves on to the next state on command. Thus, memory requirements are minimal and do not depend on the number of elements. It's not hard to guess that generated sequences can be infinite.

In the simplest scenario, the generator stores the current element and contains a set of commands that must be executed to get a new element. This is often much more convenient than creating a collection and storing all of its elements.

While there is nothing wrong with writing a class to implement the generator's behavior, yield simplifies creating such generators significantly. You do not have to create new classes - everything works already.
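To make the comparison concrete, here is a minimal sketch of the same countdown sequence written twice: once as a hand-written enumerator class and once with yield. The names CountdownEnumerator, Countdown and YieldDemo are made up for this illustration and are not part of any library.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written generator: we store the state and implement the interface ourselves.
sealed class CountdownEnumerator : IEnumerator<int>
{
  private int _next;

  public CountdownEnumerator(int start) => _next = start;

  public int Current { get; private set; }
  object IEnumerator.Current => Current;

  public bool MoveNext()
  {
    if (_next < 0)
      return false;

    Current = _next--;
    return true;
  }

  public void Reset() => throw new NotSupportedException();
  public void Dispose() { }
}

static class YieldDemo
{
  // The same sequence with yield: the compiler writes the class for us.
  public static IEnumerator<int> Countdown(int start)
  {
    for (int i = start; i >= 0; i--)
      yield return i;
  }

  static void Main()
  {
    IEnumerator<int> manual = new CountdownEnumerator(2);
    IEnumerator<int> generated = Countdown(2);

    while (manual.MoveNext() && generated.MoveNext())
      Console.WriteLine($"{manual.Current} {generated.Current}"); // "2 2", "1 1", "0 0"
  }
}
```

Both objects produce 2, 1, 0. The yield version needs four lines where the manual one needs a whole class.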

I must point out here that yield is not a feature available exclusively in C#. However, while the concept is the same, different languages may implement and use yield differently. So here's one more reminder: this article talks about yield only in the context of C#.

How to use yield

A standard case

To begin, create a method that generates the sequence you need. The only limitation here is that the method must return one of the following types:

IEnumerable
IEnumerable<T>
IEnumerator
IEnumerator<T>

Though you can use yield in methods, properties and operators, to simplify this article I'll review only methods.

Take a look at this simple yield method:

static IEnumerator<int> GetInts()
{
  Console.WriteLine("first");
  yield return 1;

  Console.WriteLine("second");
  yield return 2;
}

static void Main()
{
  IEnumerator<int> intsEnumerator = GetInts(); // print nothing
  Console.WriteLine("...");                    // print "..."

  intsEnumerator.MoveNext();                   // print "first"
  Console.WriteLine(intsEnumerator.Current);   // print 1
}

When the GetInts function is called, it returns an object that implements IEnumerator<int>. Then the method exits before it can reach any other code.

The MoveNext method's first call executes the code inside GetInts - until the first yield return. The value specified in the yield return is assigned to the Current property.

Thus, this code's first output is "...", then "first", and at the end "1" - a value from the Current property.

The next time you call MoveNext, the method's execution will pick up where it left off. The console will display the "second" message, and 2 will be written to the Current property.

Calling MoveNext for the third time resumes the GetInts method from the point where it was suspended. Since the GetInts method contains no more code, the third MoveNext call returns false. Further MoveNext calls have no effect and also return false.

If you call the GetInts method once more, it will return a new object that will allow you to start generating new elements.

Local variables, fields, and properties

Local variables initialized inside yield methods retain their values between MoveNext calls. For example:

IEnumerator<double> GetNumbers()
{
  string stringToPrint = "moveNext";
  Console.WriteLine(stringToPrint);  // print "moveNext"
  yield return 0;
  Console.WriteLine(stringToPrint);  // print "moveNext"
  stringToPrint = "anotherStr";
  yield return 1;
  Console.WriteLine(stringToPrint);  // print "anotherStr"
}

If you use the GetNumbers method to create a new generator, the first two times you call the generator's MoveNext method, the output will be "moveNext". The MoveNext method's third call will print "anotherStr". This is predictable and logical.

However, working with fields and properties may not be as simple. For example:

string message = "message1";

IEnumerator<int> GetNumbers()
{
  Console.WriteLine(message);
  yield return 0;
  Console.WriteLine(message);
  yield return 1;
  Console.WriteLine(message);
}
void Method()
{
  var generator = GetNumbers();
  generator.MoveNext(); // print "message1"
  generator.MoveNext(); // print "message1"
  message = "message2";
  generator.MoveNext(); // print "message2"
}

In the code sample above, the GetNumbers method accesses and uses the message field. The field value changes while the sequence is being generated - and this change affects the sequence generation logic.

A similar thing happens with properties: if a property value changes, this may affect the generated sequence.

yield break

Aside from yield return, C# offers you another statement - yield break. It allows you to stop sequence generation - that is, exit the generator for good. If the MoveNext method executes yield break, it returns false. No changes to fields or properties can make the generator work again. However, if the method that uses yield is called for the second time, it's a completely different story, because a new generator object is created. That generator has not encountered yield break yet.

Let's take a look at a sample generator that uses yield break:

IEnumerator<int> GenerateMultiplicationTable(int maxValue)
{
  for (int i = 2; i <= 10; i++)
  {
    for (int j = 2; j <= 10; j++)
    {
      int result = i * j;

      if (result > maxValue)
        yield break;

      yield return result;
    }
  }
}

The GenerateMultiplicationTable method multiplies numbers from 2 to 10 by each other and returns a sequence that contains the results. If the numbers' product exceeds a defined limit (the maxValue parameter), the sequence generation stops. This generator exhibits this behavior thanks to yield break.
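To see what yield break does in practice, you can drive the generator with a small limit. This is a minimal sketch: the generator is the one shown above, made static and placed in a hypothetical MultiplicationDemo class so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

static class MultiplicationDemo
{
  // Same generator as above, made static for a self-contained example.
  public static IEnumerator<int> GenerateMultiplicationTable(int maxValue)
  {
    for (int i = 2; i <= 10; i++)
    {
      for (int j = 2; j <= 10; j++)
      {
        int result = i * j;

        if (result > maxValue)
          yield break; // ends the whole sequence, not just the inner loop

        yield return result;
      }
    }
  }

  static void Main()
  {
    IEnumerator<int> table = GenerateMultiplicationTable(10);

    while (table.MoveNext())
      Console.WriteLine(table.Current); // 4, 6, 8, 10

    Console.WriteLine(table.MoveNext()); // False
  }
}
```

Note that 3 * 2 = 6 and 3 * 3 = 9 never appear, even though they do not exceed the limit: once 2 * 6 = 12 triggers yield break, the generator is finished for good.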

Returning IEnumerable

As I mentioned at the beginning, a method that uses yield can return IEnumerable, that is, a sequence itself instead of the sequence's iterator. An IEnumerable type object often proves to be more convenient, because the IEnumerable interface provides many extension methods, and also supports the foreach loop.

Note. If a method's return type is IEnumerable, the returned object implements both IEnumerable and IEnumerator. However, it's a bad idea to cast an IEnumerable type object to IEnumerator :). Why? I'll explain later when we get under the hood of this system.

For now, let's take a look at this example:

void PrintFibonacci()
{
  Console.WriteLine("Fibonacci numbers:");

  foreach (int number in GetFibonacci(5))
  {
    Console.WriteLine(number);
  }
}

IEnumerable<int> GetFibonacci(int maxValue)
{
  int previous = 0;
  int current = 1;

  while (current <= maxValue)
  {
    yield return current;

    int newCurrent = previous + current;
    previous = current;
    current = newCurrent;
  }
}

The GetFibonacci method returns the Fibonacci sequence whose first two elements equal 1, and stops once the next element would exceed maxValue. Since the method's return type is IEnumerable, the PrintFibonacci method can use the foreach loop to traverse the elements inside the sequence.

Note that each time PrintFibonacci iterates through the IEnumerable sequence, the GetFibonacci function executes from the beginning. Here's why this happens. The foreach loop uses the GetEnumerator method to traverse elements inside the sequence. Every new GetEnumerator call returns an object that iterates through the sequence elements from the very beginning. For example:

int _rangeStart;
int _rangeEnd;

void TestIEnumerableYield()
{
  IEnumerable<int> polymorphRange = GetRange();

  _rangeStart = 0;
  _rangeEnd = 3;

  Console.WriteLine(string.Join(' ', polymorphRange)); // 0 1 2 3

  _rangeStart = 5;
  _rangeEnd = 7;

  Console.WriteLine(string.Join(' ', polymorphRange)); // 5 6 7
}

IEnumerable<int> GetRange()
{
  for (int i = _rangeStart; i <= _rangeEnd; i++)
  {
    yield return i;
  }
}

At the first string.Join call, the function iterates through the IEnumerable type object for the first time, and as a result the GetRange method is executed. You could achieve a similar result by writing a foreach loop. Then the _rangeStart and _rangeEnd fields are set to new values and - behold - we get a different result from iterating through the very same IEnumerable type object!

If you are familiar with LINQ, such behavior may not seem so unusual - after all, the results of LINQ queries are processed the same way. Less experienced developers, however, may be stumped by this phenomenon. Remembering that in some scenarios IEnumerable objects and LINQ queries deliver such results will save you a lot of time in the future.

Aside from repeated queries being able to produce unexpected results, there is another problem. All operations done to initialize elements will be repeated. This can have a negative effect on the application's performance.
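A common way to avoid both problems is to materialize the sequence once, for example with LINQ's ToList. The sketch below counts how many elements the generator actually produces. CachingDemo, GetSquares and the Evaluations counter are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CachingDemo
{
  public static int Evaluations; // counts how many elements were generated

  public static IEnumerable<int> GetSquares(int count)
  {
    for (int i = 1; i <= count; i++)
    {
      Evaluations++;
      yield return i * i;
    }
  }

  static void Main()
  {
    IEnumerable<int> lazy = GetSquares(3);
    Console.WriteLine(lazy.Sum());  // 14
    Console.WriteLine(lazy.Sum());  // 14, but the generator ran again
    Console.WriteLine(Evaluations); // 6

    Evaluations = 0;
    List<int> cached = GetSquares(3).ToList(); // the generator runs once here
    Console.WriteLine(cached.Sum()); // 14
    Console.WriteLine(cached.Sum()); // 14
    Console.WriteLine(Evaluations);  // 3
  }
}
```

The trade-off is memory: the list stores every element, so this only makes sense for finite sequences of a reasonable size.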

When do I use yield?

You can use yield everywhere in your app or nowhere at all. This depends on the particular case and particular project. Aside from the obvious use cases, this construction can help you simulate parallel method execution. The Unity game engine often employs this approach.
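The "simulated parallel execution" idea can be sketched without any game engine: each yield return marks a point where a method gives up control, and a simple scheduler resumes the methods in turn. This is only an illustration of the principle and not Unity's actual coroutine API; Worker, RunInterleaved and InterleavingDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class InterleavingDemo
{
  // Each worker performs one step of work per MoveNext call.
  public static IEnumerator<string> Worker(string name, int steps)
  {
    for (int i = 1; i <= steps; i++)
      yield return $"{name}: step {i}";
  }

  // Round-robin scheduler: resumes each unfinished worker in turn.
  public static List<string> RunInterleaved(params IEnumerator<string>[] workers)
  {
    var log = new List<string>();
    var queue = new Queue<IEnumerator<string>>(workers);

    while (queue.Count > 0)
    {
      IEnumerator<string> worker = queue.Dequeue();

      if (worker.MoveNext())
      {
        log.Add(worker.Current);
        queue.Enqueue(worker); // not finished yet, schedule it again
      }
    }

    return log;
  }

  static void Main()
  {
    foreach (string line in RunInterleaved(Worker("A", 2), Worker("B", 1)))
      Console.WriteLine(line); // "A: step 1", "B: step 1", "A: step 2"
  }
}
```

Everything runs on a single thread. The workers only appear to run side by side because each one is suspended at its yield return until the scheduler calls MoveNext again.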

As a rule, you do not need yield for simple element filtering or to transform elements from an existing collection - LINQ can handle this in most cases. However, yield allows you to generate sequences of elements that do not belong to any collection. For example, when working with a tree, you may need a function that traverses a particular node's ancestors:

public IEnumerable<SyntaxNode> EnumerateAncestors(SyntaxNode node)
{
  // Start from the parent and stop before a null parent is yielded.
  node = node?.Parent;

  while (node != null)
  {
    yield return node;
    node = node.Parent;
  }
}

The EnumerateAncestors method allows you to traverse ancestors starting from the closest one. You do not need to create collections, and you can stop element generation at any moment - for example when the function finds a specific ancestor. If you have ideas on how to implement this behavior without yield (and your code is at least somewhat concise), I'm always looking forward to your comments below :).
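Here is a self-contained sketch of how such a traversal might be consumed. The Node class is a hypothetical stand-in for SyntaxNode, and this version of EnumerateAncestors checks the parent before yielding, so the sequence never contains null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node used only for this illustration.
sealed class Node
{
  public string Name { get; }
  public Node Parent { get; }

  public Node(string name, Node parent = null)
  {
    Name = name;
    Parent = parent;
  }
}

static class AncestorsDemo
{
  public static IEnumerable<Node> EnumerateAncestors(Node node)
  {
    for (Node current = node?.Parent; current != null; current = current.Parent)
      yield return current;
  }

  static void Main()
  {
    var root = new Node("root");
    var branch = new Node("branch", root);
    var leaf = new Node("leaf", branch);

    foreach (Node ancestor in EnumerateAncestors(leaf))
    {
      Console.WriteLine(ancestor.Name); // prints "branch" only

      if (ancestor.Name == "branch")
        break; // "root" is never generated
    }
  }
}
```

Because the sequence is lazy, breaking out of the loop means the remaining ancestors are never visited at all.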

Limitations

Despite its many advantages and possible use cases, the yield statement has a number of limitations related to its internal implementation. I clarify some of them in the next section, which explores how the yield statement's magic works. For now, let's just take a look at the list of those restrictions:

although the IEnumerator interface contains the Reset method, objects returned by yield methods do not support it. If you call such an object's Reset method, a NotSupportedException is thrown. Be careful with this: do not pass a generator object to methods that might call its Reset method;
you cannot use yield in anonymous methods or lambda-expressions;
you cannot use yield in methods that contain unsafe code;
you cannot use the yield return statement inside the try-catch block. However, this limitation does not apply to try statements inside try-finally blocks. You can use yield break in try statements inside both try-catch and try-finally blocks.
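Two of these points are easy to check in a small program: Reset throws, and yield return is allowed inside a try block that only has a finally clause. The sketch below assumes nothing beyond the standard library; LimitationsDemo, GetInts and ResetThrows are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class LimitationsDemo
{
  public static IEnumerator<int> GetInts()
  {
    try
    {
      yield return 1; // allowed: the try block has only a finally clause
      yield return 2;
    }
    finally
    {
      Console.WriteLine("cleanup");
    }
  }

  // Returns true if calling Reset on the generator throws NotSupportedException.
  public static bool ResetThrows(IEnumerator<int> generator)
  {
    try
    {
      generator.Reset();
      return false;
    }
    catch (NotSupportedException)
    {
      return true;
    }
  }

  static void Main()
  {
    IEnumerator<int> ints = GetInts();
    ints.MoveNext();

    Console.WriteLine(ResetThrows(ints)); // True
    ints.Dispose(); // runs the pending finally block and prints "cleanup"
  }
}
```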

So how exactly does this work?

Let's use the dotPeek utility to see what yield statements look like under the hood. Below is the GetFibonacci function that generates the Fibonacci sequence until the maxValue limitation is reached:

IEnumerable<int> GetFibonacci(int maxValue)
{
  int previous = 0;
  int current = 1;

  while (current <= maxValue)
  {
    yield return current;

    int newCurrent = previous + current;
    previous = current;
    current = newCurrent;
  }
}

Let's enable the 'Show compiler-generated code' setting and decompile the application with dotPeek. What does the GetFibonacci method really look like?

Well, something like this:

[IteratorStateMachine(typeof(Program.<GetFibonacci>d__1))]
private IEnumerable<int> GetFibonacci(int maxValue)
{
  <GetFibonacci>d__1 getFibonacciD1 = new <GetFibonacci>d__1(-2);
  getFibonacciD1.<>4__this = this;
  getFibonacciD1.<>3__maxValue = maxValue;
  return (IEnumerable<int>)getFibonacciD1;
}

Almost nothing like the original method, right? Not to mention that the code looks a little strange. Well, let's take a crack at it.

First, we'll translate the whole thing into a language we can understand (no, not IL):

[IteratorStateMachine(typeof(GetFibonacci_generator))]
private IEnumerable<int> GetFibonacci(int maxValue)
{
  GetFibonacci_generator generator = new GetFibonacci_generator(-2);
  generator.forThis = this;
  generator.param_maxValue = maxValue;
  return generator;
}

This code is the same, but the names are easier on the eyes, and excessive code structures are eliminated. Also, the C# compiler has no problem understanding this code, in comparison to the code listed earlier. This is the code format I use from now on in the article. If you want to see what this code looks like as-is, grab dotPeek (or even better - ildasm) and go ahead :).

This code creates a special object. The object stores a reference to the current instance (this) and the maxValue parameter value. '-2' is passed to the constructor - as we see further, this is the generator's starting state.

The compiler created the generator class automatically, and all the logic we put into the function is implemented there. Now we can take a look at what this class contains.

Let's start with the declaration:

class GetFibonacci_generator : IEnumerable<int>,
                               IEnumerable,
                               IEnumerator<int>,
                               IEnumerator,
                               IDisposable

Nothing unexpected, really... Except for IDisposable that came out of nowhere! It may also seem odd that the class implements IEnumerator, even though the GetFibonacci method returns IEnumerable<int>. Let's figure out what happened.

Here's the constructor:

public GetFibonacci_generator(int startState)
{
  state = startState;
  initialThreadId = Environment.CurrentManagedThreadId;
}

The state field stores the '-2' startState value passed to the generator at the initialization. The initialThreadId field stores the ID of the thread where the object was created. I'll explain the purpose of these fields later. Now let's take a look at the GetEnumerator implementation:

IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
  GetFibonacci_generator generator;
  
  if (state == -2 && initialThreadId == Environment.CurrentManagedThreadId)
  {
    state = 0;
    generator = this;
  }
  else
  {
    generator = new GetFibonacci_generator(0);
    generator.forThis = forThis;
  }
  
  generator.local_maxValue = param_maxValue;
  
  return generator;
}

See how when certain conditions are met, the method returns the same object instead of a new one? This peculiarity might seem quite unexpected. The following code fragment confirms it:

IEnumerable<int> enumerable = prog.GetFibonacci(5);
IEnumerator<int> enumerator = enumerable.GetEnumerator();

Console.WriteLine(enumerable == enumerator);

This code's output is 'True'. Who would have thought? :)

At the GetEnumerator method call, the returned object's state field is set to '0'. This is an important step.
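The thread ID check in the condition can be observed as well. In the sketch below (ThreadDemo and GetInts are hypothetical names), GetEnumerator is first called from another thread and then from the thread that created the object. The printed results follow from the generated GetEnumerator code shown above, which is an implementation detail of the compiler rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ThreadDemo
{
  public static IEnumerable<int> GetInts()
  {
    yield return 1;
  }

  static void Main()
  {
    IEnumerable<int> enumerable = GetInts();

    IEnumerator<int> fromOtherThread = null;
    var thread = new Thread(() => fromOtherThread = enumerable.GetEnumerator());
    thread.Start();
    thread.Join();

    // The call came from a different thread, so a separate object was created.
    Console.WriteLine(ReferenceEquals(enumerable, fromOtherThread)); // False

    // The original object's state is still -2, so the creating thread gets 'this'.
    IEnumerator<int> fromThisThread = enumerable.GetEnumerator();
    Console.WriteLine(ReferenceEquals(enumerable, fromThisThread)); // True
  }
}
```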

After the conditional statement, another meaningful assignment takes place:

generator.local_maxValue = param_maxValue

Take another look at the GetFibonacci method (or, to be exact, at what the compiler transformed it into). See how the maxValue parameter is recorded into the param_maxValue field? It is also recorded to the local_maxValue field.

At first glance, it may seem unclear why the generator uses two fields - param_maxValue and local_maxValue - to store the maxValue parameter. I'll clarify the mechanics of this further on in this article. Right now, let's take a look at the MoveNext method:

bool IEnumerator.MoveNext()
{
  switch (state)
  {
    case 0:
      state = -1;
      local_previous = 0;
      local_current = 1;
      break;
    case 1:
      state = -1;
      local_newCurrent = local_previous + local_current;
      local_previous = local_current;
      local_current = local_newCurrent;
      break;
    default:
      return false;
  }
  
  if (local_current > local_maxValue)
    return false;
  
  _current = local_current;
  state = 1;
  
  return true;
}

This method implements all logic we programmed into the GetFibonacci method. Before MoveNext exits, it writes the current result into the _current field. This is the value we get when we access the sequence generator's Current property.

If the sequence generation must be stopped (in this case when local_current > local_maxValue), the generator's state remains equal to '-1'. When the generator's state field value is '-1', the generator exits - MoveNext does not do anything and returns false.

Note that when MoveNext returns false, the _current field value (as well as the Current property value) remains unchanged.

Tricks with type casting

Previously we discussed that when you create a new generator, the '-2' value is recorded to the state field. But take a look at the code. If state = -2, then MoveNext does not perform any actions and returns false. Essentially, the generator does not work. Luckily, the GetEnumerator method call replaces the -2 state with 0. What about calling MoveNext without calling GetEnumerator? Is this possible?

The GetFibonacci method's return type is IEnumerable, thus, there is no access to the MoveNext method. Nevertheless, the returned object implements both IEnumerable and IEnumerator - so you can use type casting. In this case the developer does not need GetEnumerator and can call the generator's MoveNext. However, all calls will return false. Thus, though you may be able to 'cheat' the system, this hardly benefits you in any way.

Conclusion. When a yield method returns an IEnumerable type object, this object implements both IEnumerable and IEnumerator. Casting this object to IEnumerator produces a generator that is useless until the GetEnumerator method is called. At the same time, if a generator seems 'dead', it may suddenly start working after the GetEnumerator method call. The code below demonstrates this behavior:

IEnumerable<int> enumerable = GetFibonacci(5);
IEnumerator<int> deadEnumerator = (IEnumerator<int>)enumerable;

for (int i = 0; i < 5; ++i)
{
  if (deadEnumerator.MoveNext())
  {
    Console.WriteLine(deadEnumerator.Current);
  }
  else
  {
    Console.WriteLine("Sorry, your enumerator is dead :(");
  }
}

IEnumerator<int> enumerator = enumerable.GetEnumerator();
Console.WriteLine(deadEnumerator == enumerator);

for (int i = 0; i < 5; ++i)
{
  if (deadEnumerator.MoveNext())
  {
    Console.WriteLine(deadEnumerator.Current);
  }
  else
  {
    Console.WriteLine("Sorry, your enumerator is dead :(");
  }
}

What do you think the console will display after the code above is executed? Hint: The code produces the Fibonacci sequence's first five elements - 1, 1, 2, 3, 5.

We have just reviewed a case of casting to IEnumerator. Is it possible to play around with casting to IEnumerable?

Obviously, an object returned by GetEnumerator's first call can be cast to IEnumerable and will work as expected. Take a look at this example:

IEnumerable<int> enumerable = GetInts(0);                     
IEnumerator<int> firstEnumerator = enumerable.GetEnumerator();
IEnumerable<int> firstConverted = (IEnumerable<int>)firstEnumerator;

Console.WriteLine(enumerable == firstEnumerator);
Console.WriteLine(firstConverted == firstEnumerator);
Console.WriteLine(firstConverted == enumerable);

The code above prints three 'True' entries in the console window, because all three references point to the same object. Here, casting does not bring any surprises: it produces a reference to an existing (and, therefore, correctly working) object.

What about a different scenario? For example, GetEnumerator is called for the second time or in a different thread - and the value it returns is cast to IEnumerable. Take a look at this sample yield method:

IEnumerable<string> RepeatLowerString(string someString)
{
  someString.ToLower();

  while (true)
  {
    yield return someString;
  }
}

At a first glance the RepeatLowerString method receives a string as a parameter, converts it to lowercase and returns it indefinitely.

Have you noticed something odd in the code above? The RepeatLowerString method, contrary to what you may expect, generates a sequence of references to the unchanged someString string.

This happens because the ToLower method creates a new string and does not modify the original string. It is not too important in our case, but in real software such mistakes lead to sad consequences and they are worth fighting against. An incorrect ToLower method call may not seem significant. However, sometimes a function is called incorrectly somewhere in a large pile of code - and that error is almost impossible to track down.

If the project is large, its developers often use a static code analyzer. A static code analyzer is an application that can quickly detect many code bugs. For example, a static code analyzer could scan the RepeatLowerString method and find that error I described earlier. However, the analyzer is definitely not limited to detecting "meaningless calls" - it covers an extensive list of problems.

I recommend that you use a static analyzer on your projects. The PVS-Studio tool is a good choice. It checks projects written in C#, C, C++, and Java and detects a wide variety of problems in source code. Interested? You can read more about PVS-Studio on its official website and get the analyzer's free trial version.

Meanwhile, I fixed the RepeatLowerString method:

IEnumerable<string> RepeatLowerString(string someString)
{
  string lower = someString.ToLower();

  while (true)
  {
    yield return lower;
  }
}

Now let's experiment with casting to IEnumerable:

IEnumerable<string> enumerable = RepeatLowerString("MyString");
IEnumerator<string> firstEnumerator = enumerable.GetEnumerator();

IEnumerator<string> secondEnumerator = enumerable.GetEnumerator();
var secondConverted = (IEnumerable<string>)secondEnumerator;

var magicEnumerator = secondConverted.GetEnumerator();

for (int i = 0; i < 5; i++)
{
  magicEnumerator.MoveNext();
  Console.WriteLine(magicEnumerator.Current);
}

What will the console display after this code is executed?


Nothing! All this masterful formation will crash with NullReferenceException. Didn't expect this?

Maybe not. But now we already have enough information to explain this behavior. Let's walk through the example step-by-step.

The exception was thrown when magicEnumerator.MoveNext() called the ToLower method. ToLower is called for the someString parameter. Inside the generator, this parameter is represented by two fields: param_someString and local_someString:

public string param_someString;
private string local_someString;

Note that the MoveNext method (where the exception was thrown) uses the local_someString field:

bool IEnumerator.MoveNext()
{
  switch (this.state)
  {
    case 0:
      this.state = -1;
      this.local_lower = this.local_someString.ToLower();
      break;
    case 1:
      this.state = -1;
      break;
    default:
      return false;
  }
  this._current = this.local_lower;
  this.state = 1;
  return true;
}

The null value was recorded into the local_someString field. But where did this value come from?

When GetEnumerator is called, the value from param_someString is always written to the local_someString field of the returned object:

IEnumerator<string> IEnumerable<string>.GetEnumerator()
{
  RepeatLowerString_generator generator;
  
  if (state == -2 && initialThreadId == Environment.CurrentManagedThreadId)
  {
    state = 0;
    generator = this;
  }
  else
  {
    generator = new RepeatLowerString_generator(0);
    generator.forThis = forThis;
  }
  
  generator.local_someString = param_someString;
  
  return generator;
}

Is that where null came from? Yes it is. But how did null end up in this field? Let's take one more look at the code snippet:

IEnumerable<string> enumerable = RepeatLowerString("MyString");
IEnumerator<string> firstEnumerator = enumerable.GetEnumerator();

IEnumerator<string> secondEnumerator = enumerable.GetEnumerator();
var secondConverted = (IEnumerable<string>)secondEnumerator;

var magicEnumerator = secondConverted.GetEnumerator();

for (int i = 0; i < 5; i++)
{
  magicEnumerator.MoveNext(); // NRE
  Console.WriteLine(magicEnumerator.Current);
}

The second time GetEnumerator is called, we get a new object that has a correct value in the local_someString field. Does the GetEnumerator method also set the new object's param_someString value? Sadly, no. So this field keeps the default value - that is, that very null.

And then the param_someString field is used to set local_someString for the magicEnumerator object! And the exception is thrown exactly when the MoveNext method attempts to call local_someString.ToLower().

Conclusion. If GetEnumerator returns something other than this, the resulting object cannot fulfill the role of IEnumerable. Such object's param_* fields will not have values necessary for correct operation. This peculiarity does not affect yield methods that do not require any parameters. For example:

IEnumerable<int> GetPositive()
{
  int i = 0;
  
  while (true)
    yield return ++i;
}

The GetPositive method returns an ascending sequence of positive numbers, starting with 1. Now take a look at the GetPositive method use example:

IEnumerable<int> enumerable = GetPositive();
IEnumerator<int> firstEnumerator = enumerable.GetEnumerator();

IEnumerator<int> secondEnumerator = enumerable.GetEnumerator();
var secondConverted = (IEnumerable<int>)secondEnumerator;

IEnumerator<int> magicEnumerator = secondConverted.GetEnumerator();

for (int i = 0; i < 5; i++)
{
  magicEnumerator.MoveNext();
  Console.WriteLine(magicEnumerator.Current);
}

This code works correctly and displays numbers 1 through 5 on the screen. But don't do this. No, really :).

2 fields for one parameter

When reviewing the generated class, you may have an inevitable question: why does this class have two fields to store the parameter value instead of one? By this time, you may have guessed what is happening here, but just in case, let's take a closer look.

Here's another yield method:

IEnumerable<int> GetInts(int i)
{
  while (true)
  {
    yield return i++;
  }
}

This is a simple method that produces an ascending sequence of integers, starting with i that is passed as a parameter. The created generator's MoveNext method looks something like this:

bool IEnumerator.MoveNext()
{
  switch (this.state)
  {
    case 0:
      this.state = -1;
      break;
    case 1:
      this.state = -1;
      break;
    default:
      return false;
  }
  this._current = this.local_i++;
  this.state = 1;
  return true;
}

Look closely. The important part is, the local_i field's value is incremented every time MoveNext is called. This field's initial value was set at the GetEnumerator method's call. The value is retrieved from the second field - in this case, param_i:

IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
  GetInts_generator generator;
  
  if (   state == -2 
      && initialThreadId == Environment.CurrentManagedThreadId)
  {
    state = 0;
    generator = this;
  }
  else
  {
    generator = new GetInts_generator(0);
    generator.forThis = forThis;
  }
  
  generator.local_i = param_i;
  
  return generator;
}

The GetInts yield method's call sets the param_i field's value:

[IteratorStateMachine(typeof(GetInts_generator))]
private IEnumerable<int> GetInts(int i)
{
  GetInts_generator generator = new GetInts_generator(-2);
  generator.forThis = this;
  generator.param_i = i;
  return generator;
}

After this the param_i value never changes. Why do we need the param_i field here? Why not, for example, assign the value straight to local_i?

The GetInts yield method we listed earlier returns IEnumerable type objects. For this type of objects you can call GetEnumerator several times. As we know, at the first call the generator returns itself. Keeping this thought in mind, let's take a look at the following code:

IEnumerable<int> enumerable = GetInts(0);
// enumerable.param_i = 0

IEnumerator<int> firstEnumerator = enumerable.GetEnumerator(); 
// firstEnumerator.local_i = enumerable.param_i

Console.WriteLine(enumerable == firstEnumerator); // True

firstEnumerator.MoveNext(); 
// firstEnumerator.local_i++
firstEnumerator.MoveNext(); 
// firstEnumerator.local_i++

IEnumerator<int> secondEnumerator = enumerable.GetEnumerator(); 
// secondEnumerator.local_i = ?

In the first line, GetInts is called, and it returns the enumerable generator. The '0' argument we passed to the GetInts method is written to the generator's param_i field. Then we get firstEnumerator. This is the very same object as enumerable. At the GetEnumerator method's call, an IEnumerator type object is returned. This object's local_i field is assigned the value from the enumerable object's param_i field.

Then the MoveNext method is called a couple of times. This leads to changes in the local_i value - both for firstEnumerator and enumerable, because these references point to the same object.

At the end of the code snippet, the second IEnumerator is acquired. What do you think the value of its local_i field is at initialization? Obviously, the value is the same as the one passed to the GetInts yield method initially.

This is exactly the value that the param_i field stores. No matter how the local_i value changes with MoveNext calls, the param_i field remains unchanged. As we saw earlier, the param_i field's value is written to the local_i field of the object that the GetEnumerator method returns.

Conclusion. Objects that the GetEnumerator method returns are to an extent independent of each other. To start generating sequences, they use parameters passed at the yield method's call. This is possible thanks to storing the original parameter in an additional field.
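This independence is easy to verify. The sketch below reuses the GetInts method from this section, made static and wrapped in a hypothetical IndependenceDemo class, and takes two enumerators from the same IEnumerable object.

```csharp
using System;
using System.Collections.Generic;

static class IndependenceDemo
{
  public static IEnumerable<int> GetInts(int i)
  {
    while (true)
      yield return i++;
  }

  static void Main()
  {
    IEnumerable<int> enumerable = GetInts(10);

    IEnumerator<int> first = enumerable.GetEnumerator();
    first.MoveNext();
    first.MoveNext();
    Console.WriteLine(first.Current);  // 11

    IEnumerator<int> second = enumerable.GetEnumerator();
    second.MoveNext();
    Console.WriteLine(second.Current); // 10: starts from the original argument

    first.MoveNext();
    Console.WriteLine(first.Current);  // 12: unaffected by the second enumerator
  }
}
```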

Returning an IEnumerator object

Above we reviewed a few features of generators whose classes are based on yield methods that return IEnumerable. All of them are in some way connected to the fact that the generator class implements both IEnumerator and IEnumerable. Everything is much simpler with classes generated based on methods that return IEnumerator, because such generator classes do not implement IEnumerable. Consequently, the type casting tricks we discussed earlier will not work anymore. Below I list the main features that distinguish classes generated for a yield method that returns IEnumerator from classes generated for a yield method that returns IEnumerable:

no GetEnumerator method;
no initialThreadId field;
the use of one field to store parameter values instead of two.

Aside from this, there is a slight difference in how the generator classes are created. You may remember when a generator class is created for the yield method that returns IEnumerable, a '-2' value is recorded to the state field and the value is changed only when GetEnumerator is called. When state is '-2', the MoveNext method does not do anything and returns false.

If a generator is created for a method that returns IEnumerator, it does not have a GetEnumerator method. This is why '0' is written to the state field right after the object is instantiated.
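The difference between the two kinds of generator classes shows up in a simple type check. In this sketch, ReturnTypeDemo, AsEnumerator and AsEnumerable are hypothetical names, and the printed values reflect the compiler behavior described in this article, which is an implementation detail.

```csharp
using System;
using System.Collections.Generic;

static class ReturnTypeDemo
{
  public static IEnumerator<int> AsEnumerator()
  {
    yield return 1;
  }

  public static IEnumerable<int> AsEnumerable()
  {
    yield return 1;
  }

  static void Main()
  {
    object enumerator = AsEnumerator();
    object enumerable = AsEnumerable();

    Console.WriteLine(enumerator is IEnumerable<int>); // False
    Console.WriteLine(enumerable is IEnumerator<int>); // True

    // State is 0 right away, so MoveNext works without GetEnumerator.
    Console.WriteLine(((IEnumerator<int>)enumerator).MoveNext()); // True

    // State is still -2, so the cast object produces nothing yet.
    Console.WriteLine(((IEnumerator<int>)enumerable).MoveNext()); // False
  }
}
```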

Why the generator implements Dispose

The generator is forced to implement Dispose, because IEnumerator<T> derives from IDisposable. In most cases the generator's Dispose method is empty. However, sometimes Dispose contains code. These cases involve the using statement.

Take a look at the code fragments below:

using (var disposableVar = CreateDisposableObject())
{
  ....
}

using var disposableVar = CreateDisposableObject();
....

This code ensures the Dispose method is called for the disposableVar object - either when the using block exits (first example), or when the enclosing scope exits (second example). You can read more about using in the official documentation.

The using statement inside the yield method affects the generator class the compiler creates. In particular, the generator's Dispose method can now call Dispose for objects declared in using blocks. Dispose is also called if an exception is thrown during execution - this is the using statement's expected behavior.

As you might guess, the generator's Dispose method makes Dispose calls for the corresponding fields. These fields represent the local variables declared with using inside the original yield method.

Let's take a look at the example below:

static IEnumerable<string> GetLines(string path)
{
  using (var reader = new StreamReader(path))
  {
    while (!reader.EndOfStream)
      yield return reader.ReadLine();
  }
}

This method returns an object that reads a file line by line. The using block does not affect the contents of the GetEnumerator method, but it leads to a new method appearing:

private void Finally1()
{
  this.state = -1;
  if (this.local_reader == null)
    return;
  this.local_reader.Dispose();
}

After the generator's Dispose is called, this method assigns the state field a value ('-1') that forces MoveNext to do nothing and return false.

There may be more than one such finally method. If a yield method contains several using blocks, more finally methods are added, and the structure of the MoveNext and Dispose methods becomes more complex. Here's what the Dispose method looks like in this simple case:

void IDisposable.Dispose()
{
  switch (this.state)
  {
    case -3:
    case 1:
      try
      {
      }
      finally
      {
        this.Finally1();
      }
      break;
  }
}

At first glance, the structure looks unnecessarily complicated. However, it starts to make sense once the original method becomes more complex and includes several using statements. If this sounds interesting to you, I suggest you experiment with this yourself :).

Calling the generator's Dispose method makes sense if you need to stop sequence generation and free used resources. There may be other cases when this call and inheritance from IDisposable are handy. If you have ideas about what these scenarios may be, please share them in the comments below.
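As a starting point for such an experiment, here is a small sketch with two nested using blocks. The NamedResource class and the Nested method are hypothetical; each resource only records its name in a log when it is disposed, so you can see which resources the generator's Dispose releases depending on where the sequence was stopped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records its name when it is disposed.
public sealed class NamedResource : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public NamedResource(string name, List<string> log)
    {
        _name = name;
        _log = log;
    }

    public void Dispose() => _log.Add(_name);
}

public static class NestedUsingDemo
{
    public static IEnumerable<int> Nested(List<string> log)
    {
        using (new NamedResource("outer", log))
        {
            yield return 1;
            using (new NamedResource("inner", log))
            {
                yield return 2;
            }
        }
    }

    // Advances the generator the given number of steps, then disposes it.
    public static List<string> DisposeAfter(int steps)
    {
        var log = new List<string>();
        IEnumerator<int> enumerator = Nested(log).GetEnumerator();
        for (int i = 0; i < steps; i++)
            enumerator.MoveNext();
        enumerator.Dispose();
        return log;
    }

    public static void Main()
    {
        // Nothing has been created yet, so there is nothing to dispose.
        Console.WriteLine(DisposeAfter(0).Count);                // 0
        // Stopped inside the outer block only.
        Console.WriteLine(string.Join(", ", DisposeAfter(1)));   // outer
        // Stopped inside both blocks: the inner one is released first.
        Console.WriteLine(string.Join(", ", DisposeAfter(2)));   // inner, outer
    }
}
```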
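The most common way this happens is leaving a foreach loop early: foreach disposes the enumerator when the loop exits. Below is a rough sketch, assuming a hypothetical TrackedResource class that only remembers whether it was disposed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that remembers whether Dispose was called.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose() => IsDisposed = true;
}

public static class EarlyExitDemo
{
    // An infinite sequence that holds a resource while it is being generated.
    public static IEnumerable<int> Numbers(TrackedResource resource)
    {
        using (resource)
        {
            for (int i = 0; ; i++)
                yield return i;
        }
    }

    public static void Main()
    {
        var resource = new TrackedResource();

        foreach (int number in Numbers(resource))
        {
            if (number == 2)
                break;  // foreach calls Dispose on the generator here
        }

        Console.WriteLine(resource.IsDisposed);  // True
    }
}
```

If you work with the enumerator manually instead of using foreach, the Dispose call is your responsibility.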

Now let's take a quick look at MoveNext:

bool IEnumerator.MoveNext()
{
  try
  {
    switch (this.state)
    {
      case 0:
        this.state = -1;
        this.local_reader = new StreamReader(this.local_path);
        this.state = -3;
        break;
      case 1:
        this.state = -3;
        break;
      default:
        return false;
    }
    if (!this.local_reader.EndOfStream)
    {
      this._current = this.local_reader.ReadLine();
      this.state = 1;
      return true;
    }
    this.Finally1();
    this.local_reader = null;
    return false;
  }
  fault
  {
    Dispose();
  }
}

This is the code the compiler generates when the yield method contains a using statement. Take a look at the fault block. In fact, at the time I am writing this article, C# does not support this kind of construct. However, it exists in IL code. Here's how it works in the simplest case: if an exception is thrown in the try block, the steps from the fault block are performed; unlike finally, the fault block is skipped when the try block completes normally. Although, I suppose, everything is not that simple here. What do you think? Please share your thoughts about the fault block features in the comments below :).

Thus, you can be sure that Dispose is called for all variables declared through using, and exactly when needed. Errors do not affect this behavior.
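To see this in action, here is a small sketch in which the yield method fails halfway through. The TrackedResource class and the FailOnSecondStep method are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that remembers whether Dispose was called.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose() => IsDisposed = true;
}

public static class FaultDemo
{
    // Yields one element, then fails while still inside the using block.
    public static IEnumerable<int> FailOnSecondStep(TrackedResource resource)
    {
        using (resource)
        {
            yield return 1;
            throw new InvalidOperationException("Simulated failure");
        }
    }

    public static void Main()
    {
        var resource = new TrackedResource();
        IEnumerator<int> enumerator = FailOnSecondStep(resource).GetEnumerator();

        enumerator.MoveNext();                   // produces 1
        Console.WriteLine(resource.IsDisposed);  // False

        try
        {
            enumerator.MoveNext();               // the exception is thrown here
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Exception caught");
        }

        // The resource was released even though nobody called Dispose explicitly.
        Console.WriteLine(resource.IsDisposed);  // True
    }
}
```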

Do not call Reset!

Finally, let's make sure that the Reset method in the generator class really does throw an exception.

[DebuggerHidden]
void IEnumerator.Reset()
{
  throw new NotSupportedException();
}

It's all clear here - we can see NotSupportedException. Consequently, remember to pass the generator only to methods that do not call Reset, or to methods that handle this exception correctly.
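A quick check, as a sketch with a hypothetical OneTwo method, shows what a caller has to be prepared for:

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    // Hypothetical yield method used only for this check.
    public static IEnumerable<int> OneTwo()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if calling Reset on the generator throws NotSupportedException.
    public static bool ResetThrows()
    {
        IEnumerator<int> enumerator = OneTwo().GetEnumerator();
        try
        {
            enumerator.Reset();
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
        finally
        {
            enumerator.Dispose();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ResetThrows());  // True
    }
}
```

If you need to walk the sequence again, call GetEnumerator on the IEnumerable once more instead of resetting the existing enumerator.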

Conclusion

In this article I tried to gather information on yield in C# and to break it down for you into as many chunks as possible. I examined various cases: from the simplest samples - to methods with loops and branches. I inspected cases when yield is convenient and when there's no need for it. I even 'looked under the hood', deepening your understanding of the code and helping you understand its magic.

The 'Limitations' section mentioned that you cannot use yield return inside try-catch blocks. Now that you know what yield methods really are, you can ponder upon this and other limitations. If you want someone else to do it, you can click here and here.

Methods that use yield can really simplify your life sometimes. Behind this magic there is an entire class the compiler generated, which is why I recommend you use the yield feature only when it is significantly more convenient than, for example, LINQ. It is also important to differentiate between the cases when 'lazy execution' is handy - and when it's better to just stick elements into a good old List and not worry :).

If you liked my article, subscribe to my Twitter account. Every once in a while, I write about fascinating features I find when coding - or announce useful articles on various topics.

Well, that's it for today. Thank you for reading!
