Collections






Collections

Arrays have some obvious limitations. The most obvious is that they are fixed size. If you only allocate space for 10 items initially, once you'd like to add the 11th item you have to manually increase the size. This entails allocating an entirely new array of larger capacity, copying the old contents over, appending the new item, and updating all of the appropriate references to point to the new array. Another example is when you need to insert or remove an item from the middle of an array. How would you do that with a fixed size array? Well, it's less than straightforward than you'd like and entails copying array contents around quite a bit. These situations are very common, and shouldn't be hard.

Of course, you could write your own routines to do the above. But it's less than enjoyable to write and rewrite these algorithms time and time again when you really want to be focusing on adding value to your application. Thankfully, the .NET Framework offers a set of collections APIs. This includes not only support dynamically sized array-like sequences of objects but also dictionaries that associate keys with values (often called hash tables, or associative arrays), stacks and queues, collections that store their contents in a sorted state, and more. This section will walk through all of the most common collection types available in the Framework.

The .NET Framework's collection APIs fall into two major categories: those that take advantage of the CLR's support for generics and those that do not. These live in System.Collections.Generic and System.Collections, respectively, and are covered in this section in that order. The .NET Framework initially shipped without support for generics, and System.Collections is a holdover from those days. Just about all of the functionality available there is also available in the generic-based collections, albeit with more power and type safety.

Nevertheless, there is still quite a bit of code out there that uses the System.Collections APIs, so it's a good idea to understand and learn as much as you can about them. Before generics-based collections were introduced, adopting collections (e.g., ArrayList) over arrays required giving up some level of type safety for the sake of dynamic resizing. Now this sacrifice is no longer necessary.

Generic Collections

The collections in System.Collections.Generic enable you to use generics to constrain what type of item may be stored inside of an instance, increasing type safety without reducing the generality of the algorithms these types offer. For example, you can create and use an instance of a list that will only accept and deal with Strings. Alternatively, you can create your own class for which each instance will only store Strings with very little coding effort required. As you will see in this section, you can also mix these together. This namespace is new in the .NET Framework version 2.0. Most type we will discuss here reside in the mscorlib.dll assembly, although a few can be found in System.dll.

In this section, we'll first take a look at the common base classes that unify all generic collections, followed by the prebuilt collection classes available in the Framework. Lastly, we'll discuss how to go about creating your own collection classes by taking advantage of the rich type hierarchy available. For details on the CLR type system's first class support for generics, please refer back to Chapter 2.

The Usefulness of Generics

As a brief illustration of the benefits of having generic collections, consider the following set of examples. This should help you to understand and realize the power that generics brings to the table. Some of the operations displayed will be foreign right now but will make much more sense as you read further through this chapter.

Consider what we would write in version 1.x of the Framework, using weakly typed collections:

using System;
using System.Collections;

// ...

IList myIntList = new ArrayList();

myIntList.Add(55);
myIntList.Add("Uh oh");
myIntList.Add(10);

foreach (int i in myIntList)
    Console.WriteLine(i);

Some important things to notice about this sample are as follows: An instance of ArrayList is created whose intention is to store only integers, although — other than naming the variable myIntList — this intent isn't captured in a way that anybody (including the runtime) can detect. The Add operation takes an item as input and stores it in the collection.

But IList accepts anything of type Object because it is meant to be a general purpose class; thus, the Add("Uh oh") succeeds. Later on, when the foreach statement iterates through the contents, it expects to find only integers. When it encounters the mistakenly added String, an InvalidCastException is thrown at runtime. This happens several lines from the origin of the problem and can be very difficult to debug (seldom is it as blatant as the example above). Another implication of accepting Object is that value types must be boxed before being Added. Similarly, items must be unboxed as they are taken out of the list; this is hidden in the foreach loop above. This is sure to impact the performance of our code, especially if we're adding and removing lots of values.

This problem leaves developers with no choice but to either accept these risks, try to mitigate them by checking for valid types whenever examining a collection's contents (perhaps using Debug.Asserts), or go so far as to create a custom strongly typed collection that rejects anything not of the expected type. System.Collections.Specialized.StringCollection is an example of the latter that ships with the Framework. You can consider these the complacent, defensive, and offensive approaches, respectively. In most cases, the latter technique still caused problems at runtime, but they occur when somebody tries to put the wrong thing into a collection, rather than when innocently reading its contents. This can substantially improve the debuggability of your programs.

With generics, however, this problem goes away entirely. Consider this similar example:

using System;
using System.Collections.Generic;

IList<int> myIntList = new List<int>();

myIntList.Add(55);
myIntList.Add("Uh oh");
myIntList.Add(10);

foreach (int i in myIntList)
    Console.WriteLine(i);

First, notice that when we construct the list, we specify right there that it should only accept integers. We do this by using the "list of T" type, List<T>, supplying int as the type argument for its type parameter T. The compiler and runtime both enforce that only integers are used with this instance. And because generics are built right into the type system, adding value types to a list with a type parameter that is a value type does not incur the same boxing penalty that the weakly typed collections do. (The same goes for removing and unboxing.) Most importantly, though, the myIntList.Add("Uh oh") line will actually fail to even compile. This means that we never have to deal with the tricky situation above, where somebody accidentally put an instance of the wrong type into a collection.

The problems and solutions above are very reminiscent of C++'s Standard Template Library (STL), a standardized rich framework of collection APIs. The types in STL use C++ Templates, which are very similar to generics, seeking to provide parametric polymorphism. The main difference is generics' runtime type system support versus Templates, which are implemented entirely during precompilation. But STL accomplishes the same goal of providing general-purpose collection algorithms with a strong level of type safety. In fact, it's undeniable that the .NET Framework's generics collections were heavily influenced by STL. In parallel with development of the .NET Framework generics classes, the Visual C++ team developed a managed version of the C++ STL APIs that integrate seamlessly with generics, called STL.NET. Please refer to the "Further Reading" section for more information.

Base Interfaces

All generic collection classes implement a core set of basic interfaces that enable common operations independent of the algorithms used by an implementation. This allows generalized algorithms to operate on collections of things. It is useful to understand the relationships between these types when trying to understand the concrete classes available. Often you can effectively deal with collections just in terms of their interfaces, worrying about the implementing type only when instantiating a new collection instance.

Figure illustrates the type hierarchy containing these common base interfaces.

Image from book
Figure: Generic collections base interface hierarchy.
Simple Collections (ICollection<T>)

The ICollection<T> interface is used to represent a simple collection of items, each of type T. For example, an ICollection<string> contains a collection of string references. This interface exposes the capability to access and modify the contents of a collection and to determine its length. It also derives from IEnumerable<T>, meaning that any implementations of ICollection<T> will also supply a GetEnumerator method that returns an enumerator, using which you may walk its contents. (More on that below.)

namespace System.Collections.Generic
{
    public interface ICollection<T> : IEnumerable<T>
    {
        // Properties
        int Count { get; }
        bool IsReadOnly { get; }

        // Methods
        void Add(T item);
        void Clear();
        bool Contains(T item);
        void CopyTo(T[] array, int arrayIndex);
        IEnumerator<T> GetEnumerator(); // Inherited from IEnumerable<T>
        IEnumerator GetEnumerator();    // Inherited from IEnumerable<T>
        bool Remove(T item);
    }
}

The Add method adds an item at an unspecified location of the collection, and Remove removes the (first occurrence of the) specified item from the list. Remove returns true if it successfully removed the item or false otherwise (e.g., if it were unable to locate it within its contents). The Clear method will wipe out the entire contents of the target collection. The IsReadOnly property indicates whether the instance is read-only or not; if it is, expect all of the methods just mentioned to fail via thrown exceptions at runtime. Specific implementation details, such as how the Add method places the item into its contents (e.g., at the beginning, end, at a random point in the middle, etc.) are not provided by this interface. For these details, please refer to the concrete type you are using, detailed later in this chapter.

ICollection<T> also provides a couple properties with which to read information about a collection instance. The Count property will retrieve the number of items currently stored. If you want to access these contents, you'll have to either use the supplied enumerator or copy the contents to a target array using the CopyTo method. Because this interface doesn't make guarantees about how an implementation will store its contents, you cannot access individual elements by a numeric index. Lastly, Contains will search for the specified item in its contents and return either true or false to indicate whether it was found.

This code illustrates some of these operations working together:

ICollection<int> myCollection = new Collection<int>();

// First, add a few elements to the collection:
myCollection.Add(105);
myCollection.Add(232);

myCollection.Add(350);

// ...And then delete one:
myCollection.Remove(232);

// Search for some specific elements:
Console.WriteLine("Contains {0}? {1}", 105, myCollection.Contains(105));
Console.WriteLine("Contains {0}? {1}", 232, myCollection.Contains(232));

// Enumerate the collection's contents:
foreach (int i in myCollection)
    Console.WriteLine(i);

// Lastly, copy the contents to an array so that we may iterate that:
int[] myArray = new int[myCollection.Count];
myCollection.CopyTo(myArray, 0);
for (int i = 0; i < myArray.Length; i++)
    Console.WriteLine(myArray[i]);

The result of executing this code is that myCollection.Contains(105) returns true but myCollection .Contains(232) returns false (since we removed it with the myCollection.Remove(232) call shortly after adding it). The program prints out the contents of the collection first, followed by the array. Both contain the same thing: the numbers 105 and 350.

Indexable Collections (IList<T>)

IList<T> interface derives from ICollection<T> and adds a few members to support adding, removing, and accessing contents using a 0-based numerical index. This means that an IList<T> can be used wherever an ICollection<T> was expected but acts much like an array. Of course, the big difference between an IList<T> and an array is that lists support dynamic resizing.

namespace System.Collections.Generic
{
    public interface IList<T> : ICollection<T>
    {
        // Note: Members inherited from ICollection<T> have been omitted.

        // Properties
        T this[int index] { get; set; }

        // Methods
        int IndexOf(T item);
        void Insert(int index, T item);
        void RemoveAt(int index);
    }
}

A couple additional ways to modify a collection are introduced by this interface. First, Insert places an item into the list at the specified index. RemoveAt similarly removes the item at the specified index. You may also access a list's contents based on an index. An indexer is available that takes an index into the collection and returns the item located at that index. The IndexOf method returns the integer index of the specified index in the list, or -1 if it wasn't found in the collection.

This code sample illustrates the use of these new members:

IList<double> myList = new List<double>();

// First, we add, insert, and remove some items:
myList.Add(10.54);
myList.Add(209.2234);
myList.Insert(1, 39.999);
myList.Add(438.2);
myList.Remove(10.54);

// Then, we print some specific element indexes:
Console.WriteLine("IndexOf {0} = {1}", 209.2234, myList.IndexOf(209.2234));
Console.WriteLine("IndexOf {0} = {1}", 10.54, myList.IndexOf(10.54));

// Lastly, we enumerate the list using Count and IList<T>'s indexer:
for (int i = 0; i < myList.Count; i++)
    Console.WriteLine(myList[i]);

We set up our list by adding two numbers (10.54 and 209.2234), inserting 39.999 between them, adding 438.2 at the end, and then removing the number 10.54. We then locate the index of 209.2234, which ends up being 1 since we removed the previously first element 10.54 but still have the 39.999 just before 209.2234. We also look for 10.54, which results in -1 because we removed it from the collection. Lastly, we loop through the contents of the list, accessing its contents with the indexer, a very convenient way to access a list much as you would an array.

Dictionaries (IDictionary<TKey,TValue>)

A dictionary is a container with a collection of key and value associations. Aside from dictionary, this data structure is often called an associative array, map, or hash table, the latter of which is actually the name of a common implementation technique for creating dictionaries. With IDictionary<TKey, TValue>, the type parameters TKey and TValue represent the type of the keys and values it can store, respectively. So, for example, an IDictionary<string, int> is a dictionary that contains string keys that map to integer values.

namespace System.Collections.Generic
{
    // Note: Members inherited from ICollection<T> have been omitted.

    public interface IDictionary<TKey, TValue> :
        ICollection<KeyValuePair<TKey, TValue>>
    {
        // Note: Members inherited from ICollection<...> have been omitted.

        // Properties
        TValue this[TKey key] { get; set; }
        ICollection<TKey> Keys { get; }
        ICollection<TValue> Values { get; }

        // Methods
        void Add(TKey key, TValue value);
        bool ContainsKey(TKey key);
        bool Remove(TKey key);

        bool TryGetValue(TKey key, out TValue value);
    }

    [Serializable, StructLayout(LayoutKind.Sequential)]
    public struct KeyValuePair<TKey, TValue>
    {
       public TKey Key;
       public TValue Value;
       public KeyValuePair(TKey key, TValue value);
       public override string ToString();
    }
}

Each key-value association is represented using a KeyValuePair object. Thus, this type is really just a collection of KeyValuePair<TKey,TValue>'s. KeyValuePair is a simple, lightweight value type that provides two public fields: Key of type TKey and Value of type TValue. IDictionary adds some convenience operations to abstract away the fact that it is really just a faade over a collection of key-value pairs. Most of the time you are able to deal with the key and value types directly rather than pairs.

Both the Add method and index-based setter create an association between a key and value, usually resulting in a new KeyValuePair (although this is implementation-specific). The Add(KeyValuePair<K,V>) overload inherited from ICollection can be used in cases where you already have or wish to manually construct a key-value pair yourself. ContainsKey just indicates whether any value association exists for the provided key. The Remove method removes a value association based on the key, returning a bool to communicate whether anything was found and removed or not. As previously mentioned, a read-write indexer is also available, which uses the key as the index and enables you to set or retrieve the value associated with it. Lastly, the Keys and Values properties expose collections of the full set of keys and values currently stored in the dictionary.

This example code illustrates some of the above operations:

IDictionary<string, decimal> salaryMap = new Dictionary<string, decimal>();

// Add some entries into the dictionary:
salaryMap.Add("Sean", 62250.5M);
salaryMap.Add("Wolf", 16000.0M);
salaryMap.Add("Jamie", 32900.99M);

// Now, remove one:
salaryMap.Remove("Wolf");

// Check whether certain keys exist in the map:
Console.WriteLine(salaryMap.ContainsKey("Sean"));  // Prints 'True'
Console.WriteLine(salaryMap.ContainsKey("Steve"));  // Prints 'False'

// Retrieve some values from the map:
Console.WriteLine("{0:C}", salaryMap["Sean"]); // Prints '$62,250.50'
Console.WriteLine("{0:C}", salaryMap["Jamie"]); // Prints '$32,900.99'

// Now just iterate over the map and add up the values:
decimal total = 0.0M;
foreach (decimal d in salaryMap.Values)
    total += d;
Console.WriteLine("{0:C}", total); // Prints '$95,151.49'

Notice that we walk over the Values collection in this example. Say that you wanted to loop over the key-value pairs themselves, printing them out to the console. A common (but usually very inefficient) way to do this is as follows:

foreach (string key in salaryMap.Keys)
    Console.WriteLine("{0} == {1}", key, salaryMap[key]);

Consider what's actually happening here. We have to do an extra lookup within the loop in order to access a key's associated value (i.e., salaryMap[key]). Given that the dictionary is usually producing its Keys based on the actual associations themselves, you're causing it to walk the list multiple times. IDictionary<TKey,TValue> derives from ICollection<KeyValuePair<TKey,TValue>>, meaning that you can use the KeyValuePair type itself during iteration. We can substantially improve the above code as follows:

foreach (KeyValuePair<string, decimal> kvp in salaryMap)
    Console.WriteLine("{0} == {1}", kvp.Key, kvp.Value);

This prints the same output but does so in a more efficient manner.

Enumerators (IEnumerable<T> and IEnumerator<T>)

The general concept of an enumerator is that of a type whose sole purpose is to advance through and read another collection's contents. Enumerators do not provide write capabilities. This type can be viewed as a cursor that advances over each individual element in a collection, one at a time.

IEnumerable<T> represents a type whose contents can be enumerated, while IEnumerator<T> is the type responsible for performing the actual enumeration:

using System;
using System.Collections;

namespace System.Collections.Generic
{
    public interface IEnumerable<T> : IEnumerable
    {
        // Methods
        IEnumerator<T> GetEnumerator();
        IEnumerator GetEnumerator(); // Inherited from IEnumerable
    }

    public interface IEnumerator<T> : IDisposable, IEnumerator
    {
        // Properties
        T Current { get; }
        object Current { get; }      // Inherited from IEnumerator

        // Methods
        void Dispose();              // Inherited from IDisposable
        bool MoveNext();             // Inherited from IEnumerator
        void Reset();                // Inherited from IEnumerator
    }
}

Upon instantiation, an enumerator becomes dependent on a collection. IEnumerable<T> is a very simple interface but is implemented by every generic collection class in the Framework (as well as arrays). As depicted in Figure, it is even derived from by the other base interfaces in this section: ICollection<T>, IList<T>, and IDictionary<K,V>. Enumerator construction is almost always handled by the collection type itself.

Note

Note that implementing IEnumerable<T> consists of implementing the IEnumerable interface too. This means that you will have to supply two GetEnumerator methods: one that returns an IEnumerator<T> and another that returns IEnumerator. This is for backward compatibility with the nongeneric collections. Ordinarily, you can implement this as a one-liner that just performs a return ((IEnumerable<T>)this).GetEnumerator().

An enumerator can become invalid after construction should the collection change underneath it, in which case it must throw an InvalidOperationException. This could happen if another thread modifies a collection while you're enumerating its contents, for example.

Walking an Enumerator's Contents

Both the C# foreach and VB ForEach constructs rely on enumerators to access the contents of any enumerable collection. So when you write the following code:

IEnumerable<string> enumerable = /*...*/;
foreach (string s in enumerable)
    Console.WriteLine(s);

The C# compiler will actually emit a call to enumerable's GetEnumator method for you and will use the resulting enumerator to step through its contents. You can also use the same members used by the foreach construct to manually traverse a collection. For example, the above code would actually look something like this if you were to use the resulting IEnumerator<T> directly:

IEnumerable<string> enumerable = /*...*/;
IEnumerator<string> enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
    string s = enumerator.Current;
    Console.WriteLine(s);
}

This is not wildly different, but the foreach example is more readable and less manual work.

The MoveNext method simply advances the position of the current enumerator, returning true if it was able to advance, or false if it has come to the end of the contents. Notice that when the enumerator is initially constructed, it is positioned before the first element. You must invoke MoveNext once before trying to access the first element. The Current property returns the element that is currently positioned at, of type T.

C# Iterators (Language Feature)

If you are interested in creating an IEnumerator<T>, C# provides a powerful and flexible iterator construct. To create an iterator, simply create a method that either returns either IEnumerable<T> (or just IEnumerable, the weakly typed equivalent) or IEnumerator<T>, and generates a sequence of values using the yield statement. The C# compiler will go ahead and create the underlying IEnumerator<T> type for you:

using System.Collections;
using System.Collections.Generic;

class DoublerContainer : IEnumerable<int>
{
    private List<int> myList = new List<int>();

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int i in myList)
            yield return i * 2;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable<int>)this).GetEnumerator();
    }
}

Rather than manually creating a new IEnumerator<T> class, the C# compiler does it for us. The result is an enumerator that calculates the next value each time a call to MoveNext is made, passing the next value and waiting to be called again via a call to yield. When the method returns, it indicates to the caller that they have reached the end of the collection.

Iterators are quite complex under the hood. They perform a state machine transformation on the code so that we get the appearance of yielding values. This is much like coroutines or continuations in programming environments like Lua and Ruby but is really just a compiler trick. Each time we yield a value, the method is saving state to a closure object such that it can pick up where it left off next. The stack is not maintained, however, as is the case with coroutines. While the resulting code is quite lengthy — too lengthy to copy and paste into this book — I encourage you to disassemble the output from compiling the above example; it's impressive!

Note that you don't have to use this technique just for straightforward enumeration of collections. You can even create types that produce and consume things, or that perform incremental calculations of algorithms. Consider Listing 6-1, which incrementally calculates the Fibonacci series using iterators.

Listing 6-1: Calculation of the Fibonacci series using iterators
Image from book
IEnumerable<int> Fibonacci(int max)
{
    int i1 = 1;
    int i2 = 1;

    while (i1 <= max)
    {
        yield return i1;
        int t = i1;
        i1 = i2;
        i2 = t + i2;
    }
}

Image from book

To print out the sequence of numbers this generates, you could use the following code:

foreach (int i in Fibonacci(21))
    Console.WriteLine(i);

This is of course a pretty esoteric example but consider if the method generated continuous stock quotes, the latest weather forecast, or some other useful real-time data. The expressiveness of this construct is quite powerful indeed.

Collections Implementations

There are quite a few generics-based collection classes available in the Framework, each of which is appropriate for different scenarios. These types implement the various interfaces introduced in the previous section. The following text only highlights the implementation-specific details and does not go over the general interface methods. See above for more details on those.

Standard List (List<T>)

The List<T> class is the most commonly used implementation of IList<T>, providing an ordered, indexable collection of objects. It is a very basic dynamically sized list whose capacity automatically grows to accommodate new items.

When you construct a new list, you can optionally pass in its initial capacity as an argument using the List<T>(int capacity) constructor. This is usually a good idea. By default, when you use the default List<T>() constructor, your list will have a list with a very small initial capacity. This means that you'll likely end up unnecessarily resizing your array right away. The capacity of an List<T> grows exponentially when it has become full. In other words, if you try to add a new item for which it doesn't have space, its internal buffer will grow to twice its current value.

You can also create a new List<T> from any IEnumerable<T> object by using the List<T>(IEnumerable<T>) constructor. This allocates a new list into which it copies the contents of the target enumerable object. So for instance, you can easily create lists from other implementations of collections without having to manually copy their contents:

// Creating a list from an array:
List<int> intList = new List<int>(new int[] { 3, 5, 15, 1003, 25 });

// Creating a list from a dictionary:
IDictionary<string, DateTime> dictionary = /*...*/;
List<KeyValuePair<string, DateTime>> keyValueList =
    new List<KeyValuePair<string, DateTime>>(dictionary);

// Creating a list from a queue:
Queue <string> q = new Queue<string>(); //...
List<string> stringList = new List<string>(q);

// ...and so forth.

Of course, List<T> supports all of IList<T> and ICollection<T>'s methods, so you can Add, Remove, and index into the array. The AddRange(IEnumerable<T>) method takes an enumerable object and adds its contents into an existing list.

If you need to get a copy of a list that is read-only, for example to hand out a copy of a class's internal list, you can use the AsReadOnly() method. This returns you a collection that cannot be modified, represented by the ReadOnlyCollection<T> type (discussed later):

public ICollection<T> GetSomeList()
{
    List<int> list = /*...*/;
    ReadOnlyCollection<int> roList = list.AsReadOnly();
    return roList; // The caller cannot modify the contents of this list.
}

The type also provides a variety of methods such as IndexOf and LastIndexOf to obtain indexes to items within a list. Many operations provide overloads to target a specific range of a list's contents, for example GetRange, RemoveRange, and so on. These are identifiable because the last two parameters are both integers, index and count. Index is a 0-based index that specifies where to begin the operation, and count indicates the number of elements that should be affected.

List<T> also has methods like Reverse and Sort that actually modify the order in which a list's items are stored (in place). The default no-argument version of Sort will only function correctly if T implements IComparable. Otherwise, it requires an IComparer<T> or Comparison<T> to implement the comparison logic. Sorting and comparisons, including the IComparer<T> and Comparison<T> types, are described later in this chapter.

The BinarySearch method enables more efficient searching for large lists of sorted elements. Rather than a linear search through the contents — e.g., what IndexOf implements — a more intelligent algorithm is used. BinarySearch does not work properly on lists that have not been sorted due to the nature of the algorithm. It returns the index to the specified item in the list, or a negative number if it cannot be located. If you used custom sort logic, you must supply the same comparer algorithm that you used for sorting; otherwise, the search will not correctly navigate the list to locate the sought-after item.

Note

The reason that a BinarySearch(...) only works on sorted input is a byproduct of the binary search algorithm itself. It works by splitting a list in halves. It then compares the sought-after value against the midpoint of these halves. If the sought-after value is "less than" (or more appropriately, ordered before — that is Compare(soughtAfter, midpoint) is negative) — the search will continue the search in the left half (splitting that further, and continuing recursively). If the value is "greater than" it will do the same in the right half. If the two values are equal, the item has been found. As you can see, this algorithm will only work if the list it operations on has been sorted. Otherwise, the assumptions it makes about comparability and the structure of the data relative to the current position would be entirely wrong.

List<T> of course has a standard enumerator along with its indexer to enable traversal of its contents. In addition to those mechanisms, however, you can also use the ForEach(Action<T>) method. It accepts a delegate with the signature void(T) and executes it once for each item in the list. This example shows use of all three mechanisms to print out a list's contents:

List<string> myList = /*...*/;

// 'foreach' using an enumerator:
foreach (string s in myList)
    Console.WriteLine(s);

// Using the indexer:

for (int i = 0; i < myList.Count; i++)
    Console.WriteLine(myList[i]);

// Using the Action<T> delegate:
myList.ForEach(delegate(string s) { Console.WriteLine(s); });

The output for each approach is the same — a printout of all of myList's contents.

Functional Methods

Action<T> is one of a few functional delegates in the BCL. The others include Converter<TInput,TOutput> and Predicate<T>. They incorporate some popular concepts from functional-style programming, such as those found in LISP and Ruby, into the .NET Framework. C#'s support for anonymous delegates compliments this as a first class way of programming in the Framework. These types themselves are detailed further later in this section; for now, we will discuss how they interact with the List<T> type. As noted above, many of the following functions are also available for use on arrays as static methods on the Array type.

The List<T> method ConvertAll<TOutput>(Converter<T,TOutput> converter) returns a new List<TOutput>, which contains the result of applying the conversion function over each element in the target list. This is often referred to as a map operation. If you are not familiar with functional programming this approach can look and feel a bit unconventional. But it's very powerful once you are comfortable with it. This method takes a Converter delegate whose purpose is to, given an instance of type T, return an instance of type TOutput.

Consider this brief example as an illustration; we have a list of strings and would like to quickly parse each of them to construct a list of integers:

List<string> stringList = new List<string>(
    new string[] { "99", "182", "15" });
List<int> intList = stringList.ConvertAll<int>(Convert.ToInt32);

Notice how little code we had to write. There was no need to manually iterate through the list and perform conversions along the way (ConvertAll does that for us). This powerful construct enables us to easily reuse other methods that follow the pattern of taking input of one type and returning another, as shown in the above example where we take advantage of the preexisting Convert.ToInt32(string) method.

Of course, you can write your own conversion function, too. The above example will unfortunately result in an InvalidFormatException being thrown should we have a string in the wrong format. We might want to avoid this situation, and so we could write own method instead:

int ConvertStringToInt(string input)
{
    int result;
    if (!int.TryParse(input, out result))
        result = -1;
    return result;
}

This method tries to parse an integer from a string, returning -1 if it failed because the string was in the wrong format. We can then use that in our conversion to avoid its failing midway:

List<string> stringList = new List<string>(
    new string[] { "99", "182", "invalid", "15" });
List<int> intList = stringList.ConvertAll<int>(ConvertStringToInt);

Notice that the anonymous delegate syntax could very well be used here, too. For long methods that have the potential to be reused — like ConvertStringToInt above — it's usually a good idea to refactor it out into a standalone method. This is what the C# compiler does with anonymous delegates anyhow, so there isn't any performance benefit or drawback in either direction.

The Predicate<T> delegate represents a method that, given an instance of T, will return true or false. This is very useful when filtering a list for elements that satisfy specific criteria. Again, this is a very powerful construct but may take some time to get used to. For example, consider if you had a list of people and wanted to extract any who are between the ages of 20 and 30. Without predicates, you might write code that looks like this:

List<Person> allPeople = /*...*/;
List<Person> matches = new List<Person>();
foreach (Person p in allPeople)
{
    if (p.Age >= 20 && p.Age < 30)
        matches.Add(p);
}

It is quite obvious what's happening here, and it is easy to read, but it is nonetheless fairly boilerplate code to write. You just create a list to hold your results, iterate through an existing list, and add anything that satisfies the criteria. Well, this is precisely what FindAll(Predicate<T>) does:

List<Person> allPeople = /*...*/;
List<Person> matches = allPeople.FindAll(
    delegate(Person p) { return p.Age >= 20 && p.Age < 30; });

This code has the same result as the example above, albeit with a more concise syntax. It traverses the list's contents, applies the predicate against each element, and if the result is true adds the element to its result list. It then returns the list of all elements that matched. This construct makes querying a list of objects quite similar to the way you would query a database table.

There are several other predicate-based operations available on List<T>. Find(Predicate<T>) and FindLast(Predicate<T>) return the first element they find that matches the predicate, starting from the beginning of the list or the end, respectively. FindIndex(Predicate<T>) and FindLastIndex (Predicate<T>) are similar, except that they return the index of the matching element instead of the element itself. Two methods exist that return a bool to indicate that elements of the list satisfy a given predicate: Exists(Predicate<T>) just checks whether at least one item satisfying the predicate was found in the list, while TrueForAll(Predicate<T>) will return true only if the predicate is true for every element within the list. Lastly, using the RemoveAll(Predicate<T>) method, you can quickly remove every element from a list that satisfies the given predicate.

Standard Dictionary (Dictionary<TKey,TValue>)

This type is the standard implementation of IDictionary<TKey,TValue> and represents a collection of associated key-value pairs, stored as instances of KeyValuePair<TKey,TValue>. It also implements a variety of other weakly typed collection interfaces (outlined later in this chapter), meaning that you can interoperate with APIs that expect the older types.

Dictionary<TKey,TValue> uses a hash-table algorithm for its internal storage. You rarely need to be cognizant about this, but it does mean that your keys must be of a type that provide a suitable implementation of GetHashCode. Without doing so, this collection won't perform as efficiently as it might otherwise have, and you could even run into subtle problems. For example, if your type's GetHashCode changes after you've added it to the dictionary, your subsequent lookups will fail. Additionally, because of this algorithm used for filing new key-value pairs away, the ordering is not predictable. If you expect to read back keys from a dictionary in the same order in which you added them, or even sorted in some intuitive way, you will quickly find out that this isn't the case.

Note

A hash table is a well-known algorithm that uses individual keys' hash codes as indexes into a set of buckets. Buckets are allocated and split based on internal heuristics. Because an instance will not contain as many buckets as unique hash codes that could be returned, it will evenly group disparate things together into the same bucket. The less evenly distributed key hash codes are, the less efficient a hash table will be at both accessing and modifying its contents. This is one primary reason that it is important to follow the guidelines in Chapter 5 for implementations of GetHashCode.

There are some constructors also available for dictionary construction, other than its default constructor. The Dictionary<TKey,TValue>(int capacity) overload enables you to set the initial capacity of the dictionary, which avoids unnecessary resizes as a result of using a low default capacity (much as with lists, as noted above). Dictionary<TKey,TValue>(Dictionary<TKey,TValue> dictionary) is a standard copy constructor. There are several overrides that also take an IEqualityComparer <TKey>, allowing you to provide a custom hash-coding algorithm for the key's type instead of relying on TKey's implementation of GetHashCode.

The type also adds a few methods not found on the IDictionary interface itself. For example, you can Clear a dictionary's contents, and access the Count property, representing the number of key-value pairs it contains. TryGetValue(K key, out V value) returns a bool to indicate whether the specified key was found in the dictionary or not; if it was, the out parameter value is set to its associated value. Instead of writing the usual boilerplate code:

Dictionary<TKey, TValue> d = /*...*/;
TKey keyToLkup = /*...*/;
if (d.ContainsKey(keyToLkup))
{
    TValue val = d[keyToLkup];
    // Process the value...
}

you can write your code using TryGetValue:

Dictionary<TKey, TValue> d = /*...*/;
TKey keyToLkup = /*...*/;
TValue val = /*...*/;
if (d.TryGetValue(keyToLkup, out val))
{
    // Process the value...
}

There isn't a clear readability advantage between the two, but the TryGetValue version has the added advantage that it needn't look up the key twice in order to avoid a KeyNotFoundException.

Always-Sorted Dictionary (SortedDictionary<K,V>)

SortedDictionary<TKey,TValue> is almost identical to Dictionary<TKey,TValue>, namely a collection of associated key-value pairs of type KeyValuePair<TKey,TValue> and likewise implementing IDictionary<TKey,TValue>. As its name implies, however, this type actually makes the guarantee that its contents will always be stored and returned sorted by key. It does not use a hash-table algorithm for internal storage, instead using a balanced tree algorithm.

The sorting of keys can be controlled using a custom IComparer<TKey>; in fact, if TKey does not implement IComparable, passing an IComparer<TKey> is required (otherwise, an InvalidOperationException will be thrown at runtime). In the following example, String is used as the key type, which of course implements IComparable.

SortedDictionary<string, int> sd = new SortedDictionary<string, int>();

// Add a bunch of pairs, tracking the sequence in which we add them:
int sequence = 0;
sd.Add("some", sequence++);
sd.Add("interesting", sequence++);
sd.Add("words", sequence++);
sd.Add("to", sequence++);
sd.Add("be", sequence++);
sd.Add("sorted", sequence++);

// Now, iterate through the list:
foreach (KeyValuePair<string, int> kvp in sd)
    Console.WriteLine("{0} : {1}", kvp.Key, kvp.Value);

This just adds some string-based keys with a number that indicates the sequence in which they were added. When you run this code, it outputs the following:

be : 4
interesting : 1
some : 0
sorted : 5
to : 3
words : 2

SortedDictionary does not define many interesting members beyond what IDictionary and Dictionary already provide. Refer to the section comparisons later in this chapter for more details on supplying custom comparers for purposes of sorting.

Custom Collection Base Type (Collection<T>)

System.Collections.ObjectModel.Collection<T> was designed to be a starting point for custom implementations of IList<T>. List<T>, for example, was not explicitly designed for subclassing, and thus the experience of doing so leaves something to be desired. Collection<T> offers a well-defined set of extensibility points — protected virtual methods to which all public nonvirtual methods converge — greatly reducing the effort required to create a custom collection. It implements the various weakly typed interfaces for you. Furthermore, because it's a concrete class (i.e., not abstract) you can actually create and work with instances of it (although it is just a thin wrapper over a private List<T> field).

Note

Note that this type can be found in the System.dll assembly, not mscorlib.dll like most of the other generic collection types.

To create your own custom collection, you must subclass Collection<T> and do the following:

  • Decide which constructors you will support. Collection<T> comes with two: the default no-argument constructor and one which simply wraps an existing List<T>. Clearly, your custom collection might need more or less than this.

  • Customize behavior of methods that modify the collection. The core of this functionality is the InsertItem method. Various public methods such as Add and Insert call through to this function. Similarly, SetItem is called when setting a collection's contents via element index and RemoveItem is called when removing elements from the collection. Lastly, ClearItems is called whenever somebody invokes the public Clear method, removing all items from the underlying list. You needn't override any of these, or you can override just a few or all of them, depending on your needs.

For example, this class inherits from Collection<T> and notifies subscribes whenever an item is added to the list. For completeness, you might want to add notifications for insertion, removal, and clearing. This is left as an exercise to the reader:

using System;
using System.Collections.ObjectModel;

public class NotificationList<T> : Collection<T>
{
    public event EventHandler<ItemInsertedArgs<T>> ItemAdded;

    protected override void InsertItem(int index, T item)
    {
        EventHandler<ItemInsertedArgs<T>> handler = ItemAdded;
        if (handler != null)
        {
            handler(this, new ItemInsertedArgs<T>(index, item));
        }
        base.InsertItem(index, item);
    }
}

public class ItemInsertedArgs<T> : EventArgs
{
    public int Index;
    public T Item;

    public ItemInsertedArgs(int index, T item)
    {
        this.Index = index;
        this.Item = item;
    }
}

Somebody can simply subscribe to the NotificationList's ItemAdded event, and any method calls that end up invoking InsertItem will trip the event handler.

Read-Only Collection (ReadOnlyCollection<T>)

System.Collections.ObjectModel.ReadOnlyCollection<T> is a very simple implementation of IList<T> and represents a collection that cannot be modified. It actually hides those members from IList<T> and ICollection<T> (through private interface implementation), which can alter its contents. For example, Add can only be called by casting to either IList<T> or ICollection<T>. This will just result in a NotSupportedException being thrown at runtime.

This type is useful especially in public APIs where you would like to hand out a list that callers cannot modify. It is a good alternative returning an expensive-to-make copy, as ReadOnlyCollection<T> actually just thinly wraps an instance of IList<T> (it doesn't copy it). Note that, for example, List<T>'s AsReadOnly method returns a ReadOnlyCollection<T>.

LIFO Stack (Stack<T>)

A Stack<T> is a traditional last in, first out (LIFO) data structure (a.k.a. first in, last out). It derives from Collection<T>. All insertions and removals into the stack occur at the same "end" of the list, that is, the top a.k.a. the front. Whenever you remove an item, you will retrieve the most recent addition to the stack. You modify a stack by pushing items onto the top using the Push method. You then pop the topmost item off using the Pop method. Popping has the consequence of not only returning but also removing the top element. The Peek method enables you to take a look at the top element without actually removing it.

Figure depicts a stack.

Image from book
Figure: A stack. (a) Initial elements inserted—10, 15, 3, and 25, (b) pushing 99 onto the stack, (c) after the push, and (d) popping 99 off the top of the stack.

Because of the way stacks grow upward as you add items and shrink back down toward the bottom when removing them, they are good data structures for "remembering" things that must be dealt with later in the reverse order they were encountered. In fact, many algorithms rely on the idea of a stack (function calls on your physical stack are one example).

This simple C# example makes use of a Stack<T>:

Stack<int> s = new Stack<int>();

s.Push(10); // Contents: 10
s.Push(15); // Contents: 15, 10
s.Push(3);  // Contents: 3, 15, 10

Console.WriteLine(s.Peek()); // Prints '3'

Console.WriteLine(s.Pop()); // Prints '3'; Contents: 15, 10
Console.WriteLine(s.Pop()); // Prints '15'; Contents: 10
Console.WriteLine(s.Pop()); // Prints '10'; Contents: <empty>

This code simply illustrates that pushing and popping elements into and out of a stack does in fact occur in LIFO order.

FIFO Queue (Queue<T>)

Queue<T> is another classic data structure, where all elements are first in, first out (FIFO). This is sometimes called last in, last out. Similar to Stack<T>, Queue<T> is an implementation of Collection<T> with specialized semantics around insertion and deletion of elements. In plan words, items are inserted and removed from opposite "ends" of the queue. Specifically, the first item added to your queue will also be the first item removed, the second item added will be the second removed, and so forth. We say that the queue has a front and back (a.k.a. beginning and end), and that we enqueue elements onto the back using the Enqueue method, and dequeue elements from the front using the Dequeue method. As with Stack, you can Peek at the beginning element without actually removing it.

Figure illustrates a queue.

Image from book
Figure: A queue. (a) Initial elements inserted—10, 15, 3, and 25, (b) enqueueing 99 onto the front of the queue, (c) after the enqueue, and (d) dequeueing 10 off the back of the queue.

Queues are aptly named, as the structure's semantics map relatively straightforward to the general use of the word. They are good for situations in which you maintain the ordering of items inserted, and you consume them in the order in which they arrived. For example, using similar code from above, consider this snippet of code.

Queue<int> q = new Queue<int>();

q.Enqueue(10); // Contents: 10
q.Enqueue(15); // Contents: 15, 10
q.Enqueue(3);  // Contents: 3, 15, 10

Console.WriteLine(q.Peek()); // Prints '10'

Console.WriteLine(q.Dequeue()); // Prints '10'; Contents: 3, 15
Console.WriteLine(q.Dequeue()); // Prints '15'; Contents: 3
Console.WriteLine(q.Dequeue()); // Prints '3'; Contents: <empty>

This illustrates quite succinctly the difference between LIFO (stack) and FIFO (queue).

LinkedList<T>

The .NET Framework provides a doubly linked list class, yet another classic collection data structure. The LinkedList<T> class — found in the System.dll assembly — stores a collection of nodes of type LinkedListNode<T>, each of which is responsible for maintaining a pointer to the next element in line. That is, the list itself maintains only a pointer to front and back elements (a.k.a. head and tail) — represented as Front and Back properties on the list type — and each node has a next and previous pointers to the nodes surrounding it in the list.

A linked list doesn't allocate sequential storage for its elements as an array or traditional list does but rather uses pointers to chain individual elements together. This makes linked lists characteristically quick for insertion and deletion operations but slower for traversal. To access each element requires an additional pointer dereference. Furthermore, linked list elements point to other areas of memory, which can end up being far from the source node in memory (resulting in poor locality). Lastly, linked lists do not handle random access well, resulting in a longer seek time to locate specific elements within a list. This is because accessing an arbitrary node within a list requires that you start at either the front or back, and work inward, trying to locate it one node at a time. The average time to find an element grows linearly with respect to the size of your linked list, so for very large lists this quickly becomes horribly inefficient.

Refer to Figure for an illustration of a linked list.

Image from book
Figure: Illustration of a linked list with elements inserted—10, 15, 3, 25, and 99.

When working with LinkedListNode<T> instances, you will find that they contain three interesting properties: Next, Previous, and Value. Next and Previous return references to the neighboring nodes. Next will be null if you've reached end of a list, while Previous will similarly be null when at the beginning. Value accesses the actual value that this node wraps (of type T). So, for example, if you have a LinkedList<string>, then Value will return the string to which a node refers.

You can insert a node between two elements using the AddAfter(LinkedListNode<T>, ...) and AddBefore(LinkedListNode<T>, ...) methods. AddFirst and AddLast are simply shortcuts to list.AddBefore(list.First, ...) and list.AddAfter(list.Last, ...), respectively. Inserting nodes will patch up surrounding pointers for you. Remove takes a node out of a list and patches up its surrounding nodes so that their pointers are correct.

Consider this example code:

LinkedList<int> list = new LinkedList<int>();

list.AddFirst(10);             // Contents: ‡10
list.AddLast(15);              // Contents: ‡10‡15
list.AddLast(3);               // Contents: ‡10‡15‡3
list.AddLast(99);              // Contents: ‡10‡15‡3‡99
list.AddBefore(list.Last, 25); // Contents: ‡10‡15‡3‡25‡99

LinkedListNode<int> node = list.First;
while (node != null)
{
    Console.WriteLine(node.Value);
    node = node.Next;
}

This snippet creates the list illustrated in Figure by adding nodes in specific positions into the list. We then enumerate over the array by grabbing a pointer to its First node and walking through the chain using the Next property of each node. This code would go into an infinite loop for a circular linked list; that is, a linked list where the Tail.Next pointed to First and where First.Previous pointed to Last. The LinkedList<T> class does not maintain its internal state as a circular linked list, so this is not a problem.

Weakly Typed Collections

As noted above, before the generic-based collection APIs came onto the scene in 2.0, the only collections available were the weakly typed ones in System.Collections. You'll likely run into situations where older APIs expect instances of these types, and thus it doesn't hurt to have a cursory awareness of them. Moving forward you should make liberal use of the generics types, as the loss of typing necessary with the weakly typed collection APIs is quite problematic and should be avoided. Most of these types offer operations that are nearly identical to their strongly typed counterparts, so in the following section I will only point out where they differ.

Base Interfaces

Exactly as in the System.Collections.Generic base interfaces, there are IEnumerable, IEnumerator, ICollection, IList, and IDictionary interfaces (sans generic parameters). These have slight differences from the generic equivalents. Obviously, the most significant difference is that they deal in terms of System.Objects instead of types supplied via a generic argument.

There is also an IHashCodeProvider interface in System.Collections, enabling implementations to supply a custom hash code for objects. This is done by implementing the method int GetHashCode(object obj) and can be used with hash-code-based data structures where a type's GetHashCode implementation is insufficient. This functionality has been collapsed into the IEqualityProvider<T> interface in the generics collection library.

Implementations

Similarly to the collections interfaces, for nearly every implementation type in System.Collections there is a strongly typed equivalent in System.Collections.Generic.

In summary:

  • ArrayList is an implementation of IList and is very similar to List<T>, differing in very few areas not even meriting a mention.

  • Hashtable is a hash-table-based dictionary that implements the IDictionary interface. Its strongly typed counterpart is Dictionary<TKey,TValue>, and it similarly differs in very few aspects. Just as Dictionary<TKey,TValue> stores key-value pairs as instances of KeyValuePair<TKey,TValue>, Hashtable stores its entries as instances of the DictionaryEntry type.

  • Much like SortedDictionary<T>, which stores its contents in sorted order, it has a SortedList class. Despite its misleading name, this is actually an implementation of IDictionary and contains a collection of key-value pairs like Hashtable;

  • There are weakly typed versions of Stack and Queue, which are special LIFO and FIFO data structures, respectively. As expected, they are nearly identical to their generics-version counterparts.

  • Just as Collection<T> provides a convenient starting place for custom collection implementations, the weakly typed CollectionBase class services this same purpose. This type implements IList and provides a set of protected virtual methods that you can override. These are all of the form OnEvent and OnEventComplete, enabling you to perform actions before and after certain events (i.e., Clear, Insert, Remove, Set, or Validate) occur.

  • Lastly, the BitArray class is one without a counterpart in the generics APIs. It enables you to store a bit array, represented as a set of bools. There is a set of convenient operations that perform bitwise operations on its contents, such as And, Or, Xor, and Not. This type is used rather infrequently, and its constructors and various methods are self-explanatory. We won't go into further detail here.

As you can see, the step toward generics is one without nearly any signal loss when compared with the older weakly typed collections APIs.

Comparability

There are many ways to perform comparisons between objects of the same type. We've mentioned them throughout the above text in passing. For instance, the sorting algorithms on Array and List<T> require that comparability is well defined on a set of objects in order to proceed. Many types take care of encapsulating their own comparison algorithm via the IComparable<T> and IEquatable<T> interfaces. Sometimes, however, you must implement your own comparison algorithm "from the outside," which can be accomplished using a custom comparer. This section discusses the available means of performing comparisons and why you might choose one over the other.

Comparison Commonalities

Regardless of the specific comparable mechanism below that you choose, the contract remains the same: a compare method is one which, when given two objects (usually of the same type), returns an integer to indicate the proper ordering relative to each other. Specifically, this method must return:

  • 0 if the two objects are equal (i.e., sorting or compare order relative to each other is indifferent).

  • A negative integer if the first object should be ordered before the second, or

  • A positive integer if the first object should be ordered after the second.

This logic can take the form of a method that takes two objects as arguments, or alternatively one that assumes that the target of the invocation is the first while the second is supplied as a single argument.

Encapsulated Comparisons (IComparable<T> and IEquatable<T>)

System.IComparable<T> — and its weakly typed counterpart IComparable — enable you to supply a natural compare algorithm. A natural compare is the default comparison logic to be used to determine relative ordering between instances of a type. It will be used by various infrastructure components (e.g., collections) by default unless overridden by one of the mechanisms specified below. For example, a call to List<T>.Sort will use the IComparable<T> implementation to sort its contents (as will a call to Array.Sort).

All of the .NET Framework base primitives have the notion of comparability. This is because they are things like numbers and strings, for example, which have a widely accepted natural ordering. Therefore, they all implement IComparable<T> and IComparable. For complex data types, however, determining some logical natural ordering to use can be relatively subjective. As an example, given a Person type with properties such as Name, Age, BillingAddress, Company, and so on, which fields would you use for sorting? You could choose to sort by Name, but BillingAddress might be more common to some people. This is where a custom comparer (below) might make more sense.

Implementing these interfaces involves just a single method each:

namespace System
{
    public interface IComparable<T>
    {
        int CompareTo(T other);
    }
    public interface IComparable
    {

        int CompareTo(object obj);
    }
}

Both CompareTo methods simply implement the contract outlined above. They both take a single argument, the object to compare this against.

For strongly typed equality comparisons — which are usually defined on value types only (relying on default reference-equality semantics for reference types) — an interface IEquatable<T> is available:

namespace System
{
    public interface IEquatable<T>
    {
        bool Equals(T other);
    }
}

There is no weakly-typed equivalent because Object already offers a virtual Equals method with an Object parameter. Usually if you implement both IComparable<T> and IEquatable<T>, you can simply make Equals shunt to CompareTo, that is, as in return CompareTo(other) == 0.

Listing 6-2 shows an example of a Person type that implements each of these interfaces.

Listing 6-2: A type that implements comparison and equality interfaces
Image from book
class Person : IComparable<Person>, IEquatable<Person>, IComparable
{
    public string Name;
    public int Age;
    public string Company;

    // Implements IComparable<Person>.CompareTo:
    public int CompareTo(Person other)
    {
        if (other == null)
            return -1;
        return this.Name.CompareTo(other.Name);
    }

    // Implements IComparable.CompareTo:
    public int CompareTo(object obj)
    {
        Person p = obj as Person;
        return CompareTo(p);
    }

    // Implements IEquatable<Person>.Equals:
    public bool Equals(Person other)
    {
        return ((IComparable<Person>)this).CompareTo(other) == 0;
    }

    // Overrides Object.Equals:

    public override bool Equals(object obj)
    {
        Person p = obj as Person;
        return Equals(p);
    }
}
Image from book

In this example, we made the natural sorting of Person use the Name property. This is nice because when used with the Array.Sort method, for example, it will just work (assuming that the user wanted to sort by name). If the sort order is not what the user of the type desired, he or she can then use a custom comparer. This is detailed in the following section.

Custom Comparers (IComparer<T> and IEqualityComparer<T>)

The IComparer<T> and IComparer interfaces reside in the System.Collections.Generic and System.Collections namespaces, respectively, and both enable you to supply a compare algorithm. This can be used either to override a default IComparable implementation on a type or to supply one where none exists.

Once you have implemented the interfaces, you must instantiate an instance and pass it to the receiving method. They are quite simple to implement. And as you would expect, the Compare methods abides by the contract outlined above for comparison methods:

namespace System.Collections.Generic
{
    public interface IComparer<T>
    {
        int Compare(T x, T y);
    }
}

namespace System.Collections
{
    public interface IComparer
    {
        int Compare(object x, object y);
    }
}

This mechanism is useful, for example, in situations where you need to supply multiple reusable ways to compare the same type of objects. Your comparison algorithm can then depend on state during its comparison routine. For example, you could provide both ascending and descending versions available for several of a type's properties; an example of this technique is shown in Listing 6-3.

Listing 6-3: Custom comparer, ascending and descending
Image from book
enum SortDirection
{
    Ascending,
    Descending
}

class PersonAgeComparer : IComparer<Person>

{
    private int modifier;

    public PersonAgeComparer(SortDirection direction)
    {
        if (direction == SortDirection.Ascending)
            modifier = 1;
        else
            modifier = -1;
    }

    public int Compare(Person x, Person y)
    {
        return x.Age.CompareTo(y.Age) * modifier;
    }
}
Image from book

Similarly to comparison, an IEqualityComparer<T> class exists to enable you to supply equality algorithms, for example, for searching lists for an element with List<T>.IndexOf. It also provides a custom hash-code implementation for objects, which can likewise be used to override the function used when hashing keys for a hash table. There is a corresponding weakly typed IEqualityComparer class too:

class System.Collections.Generic
{
    public interface IEqualityComparer<T>
    {
        bool Equals(T x, T y);
        int GetHashCode(T obj);
    }
}

class System.Collections
{
    public interface IEqualityComparer
    {
        bool Equals(object x, object y);
        int GetHashCode(object obj);
    }
}
Delegate-Based Comparison (Comparison<T>)

This is a delegate whose signature is int xxx(T x, T y) and implementation contract are the same as in the comparisons detailed above. It allows for lightweight comparison methods without having to implement an entire interface (e.g., IComparer<T>) and is also great for situations where you can throw together a quick comparison routine using C#'s anonymous delegate syntax.

For example, say that we just wanted to quickly sort a list of people by their Company property. We could implement this using an IComparer<T>:

class PersonComparer : IComparer<Person>
{
    public int Compare(Person x, Person y)

    {
        return x.Company.CompareTo(y.Company);
    }
}

//...

List<Person> list = /*...*/;
List.Sort(new PersonComparer());

Or we could do it more concisely using the Comparison<T> delegate:

List<Person> list = /*...*/
list.Sort(delegate(Person x, Person y)
    { return x.Company.CompareTo(y.Company); });

Notice that the Comparison<T> type declaration for our anonymous delegate is inferred by the C# compiler. This code results in sorting the list's contents using each person's name. Being able to quickly throw together a delegate instead of implementing an entire interface is quite powerful.

Functional Delegate Types

As discussed briefly above, there is a set of functional-style delegate types now available in the System namespace. They have been deeply integrated into the collections APIs (i.e., List<T> instance methods and Array static methods) but can be used for any other situation where you intend to pass functions around as first-class constructs.

Action Functions (Action<T>)

This delegate's signature is void xxx(T obj) (that is, it returns void and accepts a single argument of type T). As its name implies, its intention is to take action based on its input. This can be something that modifies the contents of the parameter, or even something that interacts with other objects. You have seen it used in List<T>'s ForEach method, for example, enabling you to take an action for each item in its contents. However, it can be used anytime an action needs to be executed as a result of some condition. For example, consider the list class in Listing 6-4, enabling execution of an arbitrary action whenever something is added to it.

Listing 6-4: An action collection
Image from book
class ActionCollection<T> : Collection<T>
{
    private Action<T> action;

    public ActionCollection(Action<T> action)
    {
        this.action = action;
    }

    protected override void InsertItem(int index, T item)
    {
        action(item);

        base.InsertItem(index, item);
    }
}
Image from book

The ActionCollection<T> type provides an extensibility point without any further subclassing being necessary. For example, consider if we wanted to print out list additions to the console. Now we could do this with some simple code, as follows:

ActionCollection <string> ac = new ActionCollection<string>(
    delegate(string s) { Console.WriteLine(s); });
ac.Add("This will be printed to the console");

You could use similar techniques to trigger insertions or deletions from a database based on list modification, modify the state of an element before it gets inserted, or keep track of modifications in a transaction log of sorts, just to name a few examples.

Comparison Functions (Comparison<T>)

This delegate has a signature of int xxx(T x, T y) and is supposed to compare the two operands for ordering purposes. The method should follow the general comparison contract, as outlined further in the previous section on comparability. Specifically, if x and y are equal, the return value should be 0; if x should come before y, the result should be a negative number; or if x comes after y, the result should be a positive number. This delegate is most often used when sorting collections of objects. See the section on comparisons above for more details.

Conversion Functions (Converter<TInput,TOutput>)

A converter is a delegate with a signature of TOutput xxx(TInput from) and whose purpose is to convert an object of type TInput into an object of type TOutput. This was illustrated earlier in this chapter in the context of converting entire lists into lists of a differing type, in other words a map operation. The System.Convert class has a whole set of static out-of-the-box conversion methods available that match this delegate's signature.

This is not its only use, however. For example, you may want to enable general transformation to immutable objects by enabling a Converter<TInput,TOutput> to be passed in to an operation that then gets applied. There's also nothing to say that TInput and TOutput must be of a different type. When dealing with immutable types in particular, any transformation must actually make a new instance rather than changing the target, illustrating a case where TInput and TOutput will be the same.

For a similar example to ActionCollection<T> above, imagine a case where a list needs to convert each object before adding to its contents. Listing 6-5 illustrates a type that enables this.

Listing 6-5: A converter-based collection
Image from book
class ConverterCollection<T> : Collection<T>
{
    private Converter<T,T> convert;

    public ConverterCollection(Converter<T,T> convert)

    {
        this.convert= convert;
    }

    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, convert(item));
    }
}
Image from book

As an example of where this could be helpful, consider that you have a list of strings and would like to ensure anything added to it is stored in all uppercase. Well, using this new ConverterCollection<T> type, it is quite simple.

ConverterCollection<string> c = new ConverterCollection<string>(
    delegate (string s) { return s.ToUpper(); });
c.Add("Hello");
c.Add("World!");

Instead of adding items as they are supplied, the result of converting the item is actually stored. In this example, "HELLO" and "WORLD!" are inserted rather than the actual supplied values "Hello" and "World!" passed in by the user of the class. This has a variety of other interesting uses.

Predicate Functions (Predicate<T>)

A predicate is a function that, given an instance of type T, returns true or false. It is used primarily to check if input meets certain conditions and to take action based on the outcome. You will find it used quite heavily on the Array and List<T> classes to perform generic operations on a collection of things. It can be used for other purposes, however, in general-purpose algorithms that rely on filtering (e.g., query processors, data import and processing systems).

Further Reading

The Art of Computer Programming, Volumes 1-3, Third Edition; Donald E. Knuth; ISBN 0-201-89683-4, 0-201-89684-2, 0-201-89685-0; Addison-Wesley, 1997.

Introduction to Algorithms, Second Edition; Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein; ISBN 0-262-03293-7; MIT Press, 2001.

STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library, Second Edition; David R. Musser, Gillmer J. Derge, Atul Saini; ISBN 0-201-37923-6; Addison Wesley, 2001.

Professional .NET 2.0 Generics; Tod Golding; ISBN 0-764-55988-5; Wrox, 2005.



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows