
Monday, 9 April 2012

ABC of C# Iterator Pattern


Introduction
The aim of this alternative tip is to give beginners more relevant information, including why the heck one should bother about iterators at all.
Let's start with that: why bother about what the iterator pattern is? You most likely use the iterator pattern in your everyday work, maybe without being aware of it:
IList<string> names = new List<string>() { "Himanshu", "Hetal", "Viral" };

foreach (string name in names)
{
    Console.WriteLine("Name : {0}", name);
}

The iterator tells the foreach loop in what sequence you get the elements.

Using the Code

A class that can be used in a foreach loop must provide a GetEnumerator() method, typically IEnumerator<T> GetEnumerator() { ... }. No keyword marks it; the compiler simply looks for a method with exactly this name. This method defines in what sequence the elements are returned.
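Here is a minimal sketch of such a class, using only the standard library. The hypothetical NameBag type does not implement IEnumerable<T>, yet foreach accepts it because it has a public GetEnumerator() method. Its iterator (written with yield return) hands out the names in reverse insertion order, which shows that this method alone decides the sequence. Iterate is a made-up helper that collects the foreach output.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical collection: it does NOT implement IEnumerable<T>.
class NameBag
{
    private readonly List<string> names = new List<string>();

    public void Add(string name) => names.Add(name);

    // foreach finds this method by its name; the iterator block below
    // decides the order: the last name added is returned first.
    public IEnumerator<string> GetEnumerator()
    {
        for (int i = names.Count - 1; i >= 0; i--)
            yield return names[i];
    }
}

class Program
{
    // Hypothetical helper: collects whatever foreach produces.
    public static List<string> Iterate(NameBag bag)
    {
        var result = new List<string>();
        foreach (string name in bag)
            result.Add(name);
        return result;
    }

    static void Main()
    {
        var bag = new NameBag();
        bag.Add("Himanshu");
        bag.Add("Hetal");
        bag.Add("Viral");

        foreach (string name in Iterate(bag))
            Console.WriteLine("Name : {0}", name); // Viral, Hetal, Himanshu
    }
}
```

Because NameBag does not implement IEnumerable, a collection initializer such as new NameBag { "Viral" } would not compile here, which is why the sketch calls Add explicitly.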

Some classes may also provide the non-generic IEnumerator GetEnumerator() { ... } method. This dates from the days before generics existed: the non-generic collections, such as ArrayList or Hashtable, provide only that "old-fashioned" iterator method.

Behind the scenes, the foreach loop

foreach (string name in names) { ... }

translates into:

Explicit generic version:

using (var it = names.GetEnumerator())
while (it.MoveNext())
{
    string name = it.Current;
    ...
}

Explicit non-generic version:

var it = names.GetEnumerator();
while (it.MoveNext())
{
    string name = (string)it.Current;
    ...
}

The two explicit versions can be combined into one:

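var it = names.GetEnumerator();
using (it as IDisposable)      // disposes the iterator if it implements IDisposable
while (it.MoveNext())
{
    string name = (string)it.Current;   // the cast is needed for the non-generic case
    ...
}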
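To check the expansion against a real foreach, the following sketch walks the same list once with foreach and once with the hand-written MoveNext()/Current loop, including the using block that disposes the enumerator. The helper names WithForeach and ByHand are made up for this example; the loop is a simplified picture of what the compiler generates, not its exact output.

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static readonly IList<string> names = new List<string> { "Himanshu", "Hetal", "Viral" };

    // Hypothetical helper: the ordinary foreach version.
    public static List<string> WithForeach()
    {
        var result = new List<string>();
        foreach (string name in names)
            result.Add(name);
        return result;
    }

    // Hypothetical helper: roughly what the compiler does for the foreach above.
    public static List<string> ByHand()
    {
        var result = new List<string>();
        using (IEnumerator<string> it = names.GetEnumerator())
        {
            while (it.MoveNext())
            {
                string name = it.Current;
                result.Add(name);
            }
        } // Dispose() is called here, just as foreach does
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", WithForeach())); // Himanshu, Hetal, Viral
        Console.WriteLine(string.Join(", ", ByHand()));      // Himanshu, Hetal, Viral
    }
}
```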

So, the core of the C# implementation of the Iterator Pattern is the GetEnumerator() method. What, then, are these IEnumerator/IEnumerator<T> interfaces?

What’s an iterator?

An iterator provides a means to iterate (i.e. loop) over some items. The sequence of elements is given by the implementations of the IEnumerator/IEnumerator<T> interfaces:

namespace System.Collections
{
    public interface IEnumerator
    {
        object Current { get; }
        bool MoveNext();
        void Reset();
    }
}
namespace System.Collections.Generic
{
    public interface IEnumerator<out T> : IDisposable, IEnumerator
    {
        T Current { get; }
    }
}

The pattern is basically given by MoveNext() and Current. The semantics are that one first has to call MoveNext() to get to the first element. If MoveNext() returns false, there are no more elements. Current returns the current element. You are not supposed to call Current if the preceding MoveNext() returned false.
MoveNext() advances to the next element in the sequence, whatever that sequence is: from first to last, sorted by some criterion, random, etc.
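To see these semantics in code, here is a minimal hand-written enumerator; it is a sketch, not production code. The hypothetical CountDown class produces n, n-1, ..., 1: it starts positioned before the first element, MoveNext() advances and reports whether an element exists, and Current is only meaningful after MoveNext() returned true. Drain is a made-up helper.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator producing n, n-1, ..., 1.
class CountDown : IEnumerator<int>
{
    private readonly int start;
    private int current;

    public CountDown(int n)
    {
        start = n;
        current = n + 1; // positioned *before* the first element
    }

    public bool MoveNext()
    {
        if (current <= 1)
            return false; // no more elements; Current must not be used now
        current--;
        return true;
    }

    public int Current => current;
    object IEnumerator.Current => Current; // non-generic version boxes the int

    public void Reset() => current = start + 1;
    public void Dispose() { } // nothing to release in this sketch
}

class Program
{
    // Hypothetical helper: the MoveNext()/Current loop from the text.
    public static List<int> Drain(IEnumerator<int> it)
    {
        var result = new List<int>();
        while (it.MoveNext())      // MoveNext() comes before the first Current
            result.Add(it.Current);
        return result;
    }

    static void Main()
    {
        using (var it = new CountDown(3))
            Console.WriteLine(string.Join(", ", Drain(it))); // 3, 2, 1
    }
}
```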
You now know how to apply the iterator pattern (e.g. in a foreach loop) and that this is possible for all classes that provide the above-mentioned GetEnumerator() method (the iterator).
What is IEnumerable/IEnumerable<T> for?
These interfaces are quite simple:
namespace System.Collections
{
    public interface IEnumerable
    {
        IEnumerator GetEnumerator();
    }
}

namespace System.Collections.Generic
{
    public interface IEnumerable<out T> : IEnumerable
    {
        IEnumerator<T> GetEnumerator();
    }
}
So, easy answer: they provide an iterator (one "old-fashioned", one with generics).
A class that implements one of these interfaces provides an iterator implementation. Furthermore, an instance of such a class can be used wherever one of these interfaces is needed.
Note: it is not required to implement this interface to have an iterator: a class can provide its own GetEnumerator() method without implementing the interface. But in that case, one cannot pass an instance of the class to a method that expects an IEnumerable<T>.
E.g. there is a List<T> constructor that takes an IEnumerable<T> to initialize its content from that iterator.
namespace System.Collections.Generic
{
    public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
    {
        ...
        public List(IEnumerable<T> collection);
        ...
    }
}
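The difference matters in practice. In the sketch below, the hypothetical Weekdays class implements IEnumerable<string> (the generic GetEnumerator() plus the non-generic one inherited from IEnumerable), so an instance can be handed straight to the List<T>(IEnumerable<T>) constructor. A class that only had a public GetEnumerator() method would still work in foreach but would not compile as an argument here. Copy is a made-up helper name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence type implementing IEnumerable<string>.
class Weekdays : IEnumerable<string>
{
    public IEnumerator<string> GetEnumerator()
    {
        yield return "Mon";
        yield return "Tue";
        yield return "Wed";
        yield return "Thu";
        yield return "Fri";
    }

    // Required by the non-generic IEnumerable base interface;
    // it simply forwards to the generic version.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

class Program
{
    // Uses the List<T>(IEnumerable<T>) constructor to copy the sequence.
    public static List<string> Copy() => new List<string>(new Weekdays());

    static void Main()
    {
        List<string> days = Copy();
        Console.WriteLine(days.Count);              // 5
        Console.WriteLine(string.Join(", ", days)); // Mon, Tue, Wed, Thu, Fri
    }
}
```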
If you now look at the LINQ extension methods: many of them are based on IEnumerable<T>, thus extending any iterator class with new functions that often return yet another iterator, e.g.:
namespace System.Linq
{
    public static class Enumerable
    {
        ...
        public static IEnumerable<TResult> Select<TSource, TResult>(
            this IEnumerable<TSource> source, Func<TSource, TResult> selector);

        ...
        public static IEnumerable<TSource> Where<TSource>(
            this IEnumerable<TSource> source, Func<TSource, bool> predicate);

        ...
    }
}
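To see how such an extension method turns any IEnumerable<T> into another iterator, here is a simplified, hypothetical Filter method with the same shape as Where. It is only a sketch: the real Enumerable.Where also validates its arguments eagerly and has optimized paths for some collection types. LongNames is a made-up helper.

```csharp
using System;
using System.Collections.Generic;

static class MyExtensions
{
    // Hypothetical Where-like extension method: because it extends
    // IEnumerable<T>, it is available on any sequence (List<T>, arrays, ...).
    public static IEnumerable<T> Filter<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item; // hands out matching items one at a time
        }
    }
}

class Program
{
    // Hypothetical helper: names longer than five characters.
    public static List<string> LongNames(IEnumerable<string> names) =>
        new List<string>(names.Filter(s => s.Length > 5));

    static void Main()
    {
        string[] names = { "Himanshu", "Hetal", "Viral", "Varsha" };
        foreach (string s in LongNames(names))
            Console.WriteLine(s); // Himanshu, Varsha
    }
}
```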
This is used as:
    List<string> list = ...;
    var query = list.Where(s => s.Length > 2).Select(s => s);
    foreach (string s in query)
    {
       ...
    }
And again, the C# language provides an alternative way to express this (one could say a simpler one):
    List<string> list = ...;
    var query = from s in list where s.Length > 2 select s;
    foreach (string s in query)
    {
       ...
    }
This is LINQ (Language Integrated Query): extension methods that can be expressed in the form from ... in ... where ... select (to show some of the LINQ keywords). Please note that you can always write a LINQ query expression as a chain of extension method calls, as shown above.
So, now you know the benefits of the IEnumerable/IEnumerable<T> interfaces and where and how they are used.
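A quick way to convince yourself that both forms are equivalent is to run them side by side. The sketch below assumes some sample strings and builds the query once with query syntax and once with the Where/Select chain; since the compiler translates query syntax into calls to these extension methods, both produce the same elements in the same order. ToList() is only used here to materialize the results for printing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Assumed sample data.
    static readonly List<string> list = new List<string> { "C#", "LINQ", "foreach", "it" };

    // Query syntax.
    public static List<string> QuerySyntax() =>
        (from s in list where s.Length > 2 select s).ToList();

    // The same query as a chain of extension method calls.
    public static List<string> MethodSyntax() =>
        list.Where(s => s.Length > 2).Select(s => s).ToList();

    static void Main()
    {
        Console.WriteLine(string.Join(", ", QuerySyntax()));  // LINQ, foreach
        Console.WriteLine(string.Join(", ", MethodSyntax())); // LINQ, foreach
    }
}
```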








Tuesday, 18 October 2011

Query Execution in LinQ


In LINQ, queries have two different execution behaviors: immediate and deferred. In this article, we take a quick look at how deferred query execution and immediate query execution work in LINQ.
Deferred Query Execution
To understand deferred query execution, let’s take the following example, which declares some employees and then queries all employees with Age > 28:

using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var empList = new List<Employee>(
            new Employee[]
            {
                new Employee { ID = 1, Name = "Himanshu", Age = 30 },
                new Employee { ID = 2, Name = "Rahul", Age = 35 },
                new Employee { ID = 3, Name = "Hetal", Age = 26 },
                new Employee { ID = 4, Name = "Varsha", Age = 28 },
            });
        var lst = from e in empList
                  where e.Age > 28 //<= query seems to be executed here
                  select new { e.Name };
        foreach (var emp in lst)
            Console.WriteLine(emp.Name);
        Console.ReadLine();
    }
}
OUTPUT: Himanshu, Rahul
Looking at the query shown above, it appears that the query is executed at the point the arrow points to. However, that’s not true. The query is actually executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.
Now, how do we prove that the query was not executed when the query variable was created? It’s simple: just add another Employee to the list after the query variable is created:
static void Main(string[] args)
{
    var empList = new List<Employee>(
        new Employee[]
        {
            new Employee { ID = 1, Name = "Himanshu", Age = 30 },
            new Employee { ID = 2, Name = "Rahul", Age = 35 },
            new Employee { ID = 3, Name = "Hetal", Age = 26 },
            new Employee { ID = 4, Name = "Varsha", Age = 28 },
        });
    var lst = from e in empList
              where e.Age > 28 //<= query variable
              select new { e.Name };
    empList.Add(new Employee { ID = 5, Name = "Tarun", Age = 39 }); //<= new employee added after the query variable
    foreach (var emp in lst)
        Console.WriteLine(emp.Name);
    Console.ReadLine();
}
Notice that we add a new Employee after the query variable is created. Had the query been executed when the query variable was created, the result would be the same as before, i.e. only two employees would meet the criterion Age > 28. However, the output is not the same:
OUTPUT: Himanshu, Rahul, Tarun.
What happened is that the execution of the query was deferred until the query variable was iterated over in the foreach loop. This lets you run the same query as often as you want, each time against the current state of the data source. For example, when fetching information from a database that other applications update frequently, enumerating the query again gives you the latest data rather than an old snapshot.
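Another way to watch deferred execution happen is to put a side effect into the predicate. In this sketch, the log list and the integer ages (mirroring the employees above) are assumptions for illustration, and Run is a made-up method name. Nothing is logged by the predicate when the query is defined; it runs only while the query is enumerated, which is why the value added afterwards is picked up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var ages = new List<int> { 30, 35, 26, 28 };

        // Defining the query does not run the predicate yet.
        IEnumerable<int> query = ages.Where(a =>
        {
            log.Add("checked " + a);
            return a > 28;
        });
        log.Add("query defined");

        ages.Add(39); // added after the query was defined

        foreach (int a in query) // the predicate runs here, once per element
            log.Add("result " + a);

        return log;
    }

    static void Main()
    {
        // First line printed is "query defined", then the checks and results.
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```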

Immediate Query Execution
You can also force a query to execute immediately, which is useful for caching query results. Let us say we want to display the number of employees that match a criterion:
static void Main(string[] args)
{
    var empList = new List<Employee>(
        new Employee[]
        {
            new Employee { ID = 1, Name = "Himanshu", Age = 30 },
            new Employee { ID = 2, Name = "Rahul", Age = 35 },
            new Employee { ID = 3, Name = "Hetal", Age = 26 },
            new Employee { ID = 4, Name = "Varsha", Age = 28 },
        });
    var lst = (from e in empList
               where e.Age > 28
               select e).Count(); //<= immediate execution
    empList.Add(new Employee { ID = 5, Name = "Tarun", Age = 39 });
    Console.WriteLine("Total employees whose age is > 28 are {0}", lst);
    Console.ReadLine();
}
In the query shown above, in order to count the elements that match the condition, the query must be executed, and this happens automatically when Count() is called. So adding a new employee after the query variable is declared has no effect here, because the query has already been executed. The output will be 2 instead of 3.
The basic difference between deferred and immediate execution is that queries producing a sequence of values (e.g. where/select) are deferred, whereas queries returning a single value are executed immediately; examples are Count(), Average(), Max(), etc. You can also force immediate execution of a sequence-producing query by calling ToList() or ToArray(), which caches its results.
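The caching idea can be sketched with ToList(): calling it runs the query once and copies the matching elements into a new list, so later changes to the source are not reflected, while a deferred query still sees them. The integer ages below mirror the sample employees of this post, and CountBoth is a made-up method name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static (int deferred, int cached) CountBoth()
    {
        var ages = new List<int> { 30, 35, 26, 28 };

        IEnumerable<int> deferred = ages.Where(a => a > 28); // not executed yet
        List<int> cached = ages.Where(a => a > 28).ToList(); // executed now: 30, 35

        ages.Add(39); // Tarun joins after both queries were created

        // Count() enumerates the deferred query now, so it sees 39;
        // the cached list was filled before the Add and does not.
        return (deferred.Count(), cached.Count);
    }

    static void Main()
    {
        var (deferred, cached) = CountBoth();
        Console.WriteLine("deferred: {0}, cached: {1}", deferred, cached); // deferred: 3, cached: 2
    }
}
```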