Monday 9 April 2012

ABC of C# Iterator Pattern


Introduction
This alternative tip aims to give beginners more relevant background on iterators, including why one should bother with them at all.
Let's start with that: why care what the iterator pattern is? You most likely use the iterator pattern in your everyday work, perhaps without being aware of it:
IList<string> names = new List<string>() { "Himanshu", "Hetal", "Viral" };

foreach (string name in names)
{
    Console.WriteLine("Name : {0}", name);
}

The iterator tells the foreach loop in what sequence you get the elements.

Using the Code

A class that can be used in a foreach loop must provide an IEnumerator<T> GetEnumerator() { ... } method. The compiler looks for a method with exactly this name. This method defines the sequence in which the elements are returned.

Some classes provide only the non-generic IEnumerator GetEnumerator() { ... } method. This dates from the days before generics existed, e.g. non-generic collections such as Array or ArrayList provide only that "old-fashioned" iterator method.

Behind the scenes, the foreach loop

foreach (string name in names) { ... }

translates into:

Explicit generic version                        Explicit non-generic version

using (var it = names.GetEnumerator())          var it = names.GetEnumerator();
while (it.MoveNext())                           while (it.MoveNext())
{                                               {
    string name = it.Current;                       string name = (string)it.Current;
    ....                                            ....
}                                               }

The two explicit iterator calls can be combined into one:

var it = names.GetEnumerator();
using (it as IDisposable)
while (it.MoveNext())
{
    string name = it.Current;
    ....
}
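
To see that the explicit form behaves like foreach, here is a minimal sketch. The names ForeachExpansionDemo, WithForeach and WithExplicitEnumerator are made up for this illustration, and the code the compiler really generates differs in detail (for example, it uses try/finally rather than a using statement).

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansionDemo
{
    // Collects the names with an ordinary foreach loop.
    public static List<string> WithForeach(IEnumerable<string> names)
    {
        var result = new List<string>();
        foreach (string name in names)
        {
            result.Add(name);
        }
        return result;
    }

    // Collects the names by driving the enumerator by hand,
    // roughly the way a foreach loop is expanded.
    public static List<string> WithExplicitEnumerator(IEnumerable<string> names)
    {
        var result = new List<string>();
        using (IEnumerator<string> it = names.GetEnumerator())
        {
            while (it.MoveNext())
            {
                string name = it.Current;
                result.Add(name);
            }
        }
        return result;
    }

    public static void Main()
    {
        IList<string> names = new List<string> { "Himanshu", "Hetal", "Viral" };
        Console.WriteLine(string.Join(", ", WithForeach(names)));
        Console.WriteLine(string.Join(", ", WithExplicitEnumerator(names)));
    }
}
```

Both calls should print the same three names in the same order.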

So, the core of the C# implementation of the iterator pattern is the GetEnumerator() method. But what are these IEnumerator/IEnumerator<T> interfaces?

What’s an iterator?

An iterator provides a means to iterate (i.e. loop) over some items. The sequence of elements is given by the implementations of the IEnumerator/IEnumerator<T> interfaces:

namespace System.Collections
{
    public interface IEnumerator
    {
        object Current { get; }
        bool MoveNext();
        void Reset();
    }
}
namespace System.Collections.Generic
{
    public interface IEnumerator<out T> : IDisposable, IEnumerator
    {
        T Current { get; }
    }
}

The pattern is essentially defined by MoveNext() and Current. The semantics are that you must first call MoveNext() to get to the first element. If MoveNext() returns false, there are no more elements. Current returns the current element. You are not supposed to call Current if the preceding MoveNext() returned false.
MoveNext() advances to the next element in the sequence, whatever that sequence is, e.g. from first to last, sorted by some criterion, random, etc.
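
As an illustration of these semantics, here is a minimal hand-written enumerator. CountdownEnumerator is a hypothetical type invented for this sketch (it is not part of the framework); it yields start, start - 1, ..., 1.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical example type: counts down from 'start' to 1.
public sealed class CountdownEnumerator : IEnumerator<int>
{
    private readonly int _start;
    private int _next;     // value the next successful MoveNext() will expose
    private int _current;  // value exposed by Current

    public CountdownEnumerator(int start)
    {
        _start = start;
        Reset();
    }

    // Only meaningful after MoveNext() has returned true.
    public int Current => _current;

    // The non-generic interface requires an object-typed Current as well.
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        if (_next < 1)
        {
            return false; // no more elements
        }
        _current = _next;
        _next--;
        return true;
    }

    public void Reset()
    {
        _next = _start;
        _current = 0;
    }

    // Nothing to release in this simple example.
    public void Dispose() { }
}

public static class CountdownDemo
{
    public static void Main()
    {
        using (var it = new CountdownEnumerator(3))
        {
            while (it.MoveNext())
            {
                Console.WriteLine(it.Current);
            }
        }
    }
}
```

Note that the loop calls MoveNext() before it reads Current for the first time, exactly as described above.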
You now know how to apply the iterator pattern (e.g. in a foreach loop) and that this is possible for all classes that provide the above-mentioned GetEnumerator() method (the iterator).

What is IEnumerable/IEnumerable<> for?

These interfaces are quite simple:

namespace System.Collections
{
    public interface IEnumerable
    {
        IEnumerator GetEnumerator();
    }
}

namespace System.Collections.Generic
{
    public interface IEnumerable<out T> : IEnumerable
    {
        IEnumerator<T> GetEnumerator();
    }
}
So, the easy answer: they provide an iterator (one "old-fashioned", one with generics).
A class that implements one of these interfaces provides an iterator implementation. Furthermore, an instance of such a class can be used wherever one of these interfaces is required.
Note: it is not required to implement these interfaces to have an iterator: a class can provide its own GetEnumerator() method without implementing them. But in that case, an instance of the class cannot be passed to a method that expects an IEnumerable<T>.
E.g. there is a List<T> constructor that takes an IEnumerable<T> and initializes the list's content from that iterator.
namespace System.Collections.Generic
{
    public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
    {
        ...
        public List(IEnumerable<T> collection);
        ...
    }
}
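
The difference between merely having a GetEnumerator() method and implementing IEnumerable<T> can be sketched as follows. PatternOnlyNames, EnumerableNames and InterfaceDemo are hypothetical types invented for this illustration; both collection classes simply hand out the enumerator of an inner List<string>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: has GetEnumerator() but does NOT implement IEnumerable<T>.
public class PatternOnlyNames
{
    private readonly List<string> _names = new List<string> { "Himanshu", "Hetal" };

    public IEnumerator<string> GetEnumerator() => _names.GetEnumerator();
}

// Hypothetical type: same content, but implements IEnumerable<string>.
public class EnumerableNames : IEnumerable<string>
{
    private readonly List<string> _names = new List<string> { "Himanshu", "Hetal" };

    public IEnumerator<string> GetEnumerator() => _names.GetEnumerator();

    // The non-generic interface is implemented explicitly and forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class InterfaceDemo
{
    // foreach works here although PatternOnlyNames implements no interface.
    public static int CountWithForeach(PatternOnlyNames names)
    {
        int count = 0;
        foreach (string name in names)
        {
            if (name != null) count++;
        }
        return count;
    }

    // The List<T> constructor needs a real IEnumerable<T>.
    public static List<string> CopyToList(IEnumerable<string> names)
    {
        return new List<string>(names);
    }

    public static void Main()
    {
        Console.WriteLine(CountWithForeach(new PatternOnlyNames()));
        Console.WriteLine(CopyToList(new EnumerableNames()).Count);

        // The following line would not compile, because PatternOnlyNames
        // is not an IEnumerable<string>:
        // var bad = CopyToList(new PatternOnlyNames());
    }
}
```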
If you now look at the LINQ extension methods: many of them are based on IEnumerable<T>, thus extending any iterator class with new functions that often return yet another iterator. E.g.
namespace System.Linq
{
    public static class Enumerable
    {

        ...
        public static IEnumerable<TResult> Select<TSource, TResult>(
            this IEnumerable<TSource> source, Func<TSource, TResult> selector);

        ...
        public static IEnumerable<TSource> Where<TSource>(
            this IEnumerable<TSource> source, Func<TSource, bool> predicate);

        ...
    }
}
This is used as:
    List<string> list = ...;
    var query = list.Where(s => s.Length > 2).Select(s => s);
    foreach (string s in query)
    {
       ...
    }
And again, the C# language provides an alternative way to express this (one could say a simpler one):
    List<string> list = ...;
    var query = from s in list where s.Length > 2 select s;
    foreach (string s in query)
    {
       ...
    }
This is LINQ (Language Integrated Query): extension methods that can be expressed in the form from ... in ... where ... select (to show some of the LINQ keywords). Please note that you can always write a LINQ expression as a chain of extension methods.
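
The equivalence of the two notations can be checked with a small sketch. LinqDemo, its two methods and the sample strings are assumptions made for this illustration only; Where, Select and ToList are the standard System.Linq extension methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    // The filter written as a chain of extension methods.
    public static List<string> WithMethodChain(IEnumerable<string> source)
    {
        return source.Where(s => s.Length > 2).Select(s => s).ToList();
    }

    // The same filter written in query syntax.
    public static List<string> WithQuerySyntax(IEnumerable<string> source)
    {
        return (from s in source where s.Length > 2 select s).ToList();
    }

    public static void Main()
    {
        // Sample data made up for this example.
        var list = new List<string> { "ab", "abc", "a", "abcd" };

        Console.WriteLine(string.Join(", ", WithMethodChain(list)));
        Console.WriteLine(string.Join(", ", WithQuerySyntax(list)));
    }
}
```

Both lines should list only the strings longer than two characters, in their original order.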
So, now you know the benefits of the IEnumerable<T> interfaces and where and how they are used.







