Arrays






Arrays

An array is an ordered sequence of homogeneous elements. By homogenous, I mean that they all share some common base type; in some cases, this means System.Object. An array can store any type polymorphically compatible with its declared type; so, for example, an Object[] array can contain instances of any type that has Object in its ancestor hierarchy (which is to say any type, of course). An array has a fixed size — called its length — set at instantiation time. Its elements are indexable, enabling you to store and retrieve information using a numeric index within the array's range (an array's range is from 0 to its length - 1). For example, I can construct an array of Strings and subsequently index into it as follows (in C#):

string[] myStringArray = new string[10]; // 10-element array
myStringArray[0] = "A";
// ...
string zerothElement = myStringArray[0];
Console.WriteLine(zerothElement);

This has the effect of allocating a 10-element array. We then store the string "A" into the 0th element position. (The first element of arrays on the CLR is located in the 0th position, a carryover from C-style languages.) Later on, we read back the 0th element for purposes of printing to the console.

Each array has a rank which indicates its number of dimensions. Dimensions enable an array to store matrix-like sequences that are indexed through coordinates. For example, a one-dimensional array has a rank of one and contains a single sequence of elements indexed by a single number. An array of rank two is a matrix, and it is indexed by an row and column coordinate, each within the range of its respective dimension.

int[] oneDimension = new int[10];
oneDimension[5] = /*...*/;
int[,] twoDimension = new int[10,10];
twoDimension[2,5] = /*...*/;

In this snippet of code, oneDimension is a 10-element single dimension array. twoDimension, however, is a 10x10 2-dimensional array, with 100 total elements (10*10). To work with its elements, we must supply both the row and column of the element that we'd like. We discuss the concept of multidimensional arrays further below.

Single-Dimensional Arrays

Single-dimensional arrays are often referred to as vectors. In Chapter 3, you saw the various IL instructions used to work with arrays. This included newarr to instantiate a new array, ldelem and stelem to load and store elements in an array, and ldlen to access the length of an array. Languages, including C#, typically offer syntax to work with arrays. In the absence of that, the System.Array type can be used for dynamic array manipulation (discussed below).

As an example of such syntax, consider the following operations:

  • Instantiating a new array of type T with n elements: T[] array = new T[n]. For example, an array of 20 integers: int[] array = new int[20].

  • Obtaining the length (of type Int32) of an array: array.Length.

  • Accessing a specific element i of an array: int x = array[i], or, alternatively, setting element i of an array to a value x: array[i] = x.

  • Lastly, you can enumerate the contents using the IEnumerator<T>, for example as in IEnumerator<int> = array.GetEnumerator(). This can be used with C#'s foreach syntax, for example foreach (int x in array) { /*...*/ }.

Additional operations can be performed using the Array type.

Multidimensional Arrays

The CLR allows has first-class support for arrays of multiple dimensions, of which there are two types. This section is not applicable to single-dimension arrays.

Rectangular Arrays

The first type of multidimensional array is called a rectangular array. It gets its name from the way in which it is laid out in memory as contiguous rows and columns (which, when drawn on a piece of paper, looks like a rectangle). This makes them very efficient to access and offers good memory locality when working with them (all of the elements are stored together in memory).

Note

The analogy of rows and columns admittedly breaks down when dealing with arrays with a rank greater than two. It is actually quite rare to have arrays of such large ranks, but nonetheless the CLR supports them. Drawing them on paper is quite difficult.

For each dimension of a rectangular array, its length is the same as all other dimensions at the same depth. In other words, each row has the same number of columns. Rectangular arrays are the only true multidimensional type available on the CLR; jagged arrays give the appearance of being multidimensional through language syntax but aren't really (you will see why in a moment).

You can visualize a rectangular array as a rectangle for 2-dimensional arrays (a cube for 3-dimensional, and so forth), while a jagged array can vary in size per dimension. So, for example if you need to store a 2-dimensional array of five rows and eight columns, a rectangular array is perfect. If the number of columns varies per row however, a jagged array is a better choice. This point is illustrated in the diagrams Figure and Figure. The former figure depicts a 58 rectangular array.

Image from book
Figure: A 58 rectangular array.
Image from book
Figure: Illustration of a jagged array in memory.

You can see here that all rows have the same number of columns, and any individual slot can be accessed directly by using an index [row,column]. In other words, data is accessed in row-major order. It should be made clear that you cannot have a varying number of columns per row, a situation which jagged arrays do indeed support. To construct an array like that depicted in Figure, you can use the following C# declaration:

int[,] rectArray = new int[5,8];

This allocates a block of memory, each element of which is initialized to default(T), where T is the type of the array's elements. In this case, this is the integer 0. For arrays of reference types, each slot would contain the null constant. If you'd like to create an array with a specific set of contents, you can do so inline with C# as follows:

int[,] rectArray = new int[,] {
    { 10, 10, 1999 },
    { 11, 25, 2005 },
    { 06, 30, 9999 }
};

This code results in rectArray pointing to a 3x3 array, with 10 in position [0,0], 10 in [0,1], 1999 in [0,2], 11 in [1,0], and so forth. To loop through its contents, print them out, and verify that this is the case, you can use this code:

for (int i = 0; i < rectArray.GetLength(0); i++)
    for (int j = 0; j < rectArray.GetLength(1); j++)
        Console.WriteLine("[{0},{1}] = {2}", i, j, rectArray[i, j]);

We rely on the GetLength(int dimension) method, defined on the Array class, to provide an integer length of the specified dimension. Notice that we must use this rather than the ordinary Length property to get the semantics we desire. The Length property actually returns the array's total element count, which in the case of the array above, is 9 (3 rows * 3 columns).

Because rectangular arrays are simply a contiguous set of elements with a friendly syntax and set of methods to make working with them more intuitive, you can iterate over the raw underlying contents using the C# foreach statement:

foreach (int x in rectArray)
    Console.WriteLine(x);

This should simply highlight the fact that rectangular arrays are simple syntactic sugar.

Jagged Arrays

The second type of array is what is referred to as a jagged array. This is not a true multidimensional array because it's merely an array of array references (recursively ad infinitum). With jagged arrays individual dimensions at the same depth can have any arbitrary size regardless of other dimensions at the same depth. These can themselves also contain other arrays, recursively.

Jagged arrays always have a rank of 1 from the point of view of the CLR; the fact that the references to other arrays are present is opaque to the runtime. These other arrays can either be other jagged arrays of rank 1 (which can refer further to other arrays, and so on) or the leaf level arrays of rank 1 which contain the real elements. We say that jagged arrays have a conceptual rank (e.g., a 3-dimensional jagged array can be called of rank 3, even though it is just an array of rank 1 whose contents point to another jagged array of rank 1 whose contents point to an array of rank 1 that contains the elements). If you use the Array.Rank property, it will always report back a rank of 1 for jagged arrays.

Because jagged arrays are not contiguous blocks of memory, an actual element of an array will require n - 1 dereferences, where n is the conceptual rank of the array. In other words, to access element [0][5] in a 2-dimensional array, you will be required to traverse one reference to get to the array that contains the real elements, and then access the appropriate element index. Figure illustrates a jagged array data structure in memory.

Notice that the sizes of the second dimension change based on which single dimension index you are examining. This can be a 0-sized array or even null, as shown above in the second outer array index. This should really drive home the point that these are not multidimensional arrays in memory.

Interestingly, jagged arrays use a traditional syntax that most people are more familiar with. Thus, they often end up being used where rectangular arrays would have been more appropriate. You declare a jagged array in C# as follows:

int[][] jaggedArray = new int[5][];

Notice here that we specify only the size of the first dimension in the instantiation, not the second. By default, all of the first dimension's elements are set to null. To instantiate elements to compose the second dimension (e.g., as shown in Figure), the following code is necessary:

jaggedArray[0] = new int[8];
jaggedArray[1] = new int[2];
jaggedArray[2] = null; // This line is not necessary, null is the default...
jaggedArray[3] = new int[8];
jaggedArray[4] = new int[1];

Now all of the slots have been created, and each new array contains a set of elements equal to default(T), where T is the type of element the array stores. Like the rectangular array example above, this means that each element has been set to the integer 0. Having that null reference stuck in there makes it hard to walk through its contents reliably (you'd have to check for null during your foreach statement), so let's set index 2 to an empty array with jaggedArray[2] = new int[0] and illustrate how to traverse its contents:

jaggedArray[2] = new int[0];

// Walk each dimension of our array:
for (int i = 0; i < jaggedArray.Length; i++)
    for (int j = 0; j < jaggedArray[i].Length; j++)
        Console.WriteLine(jaggedArray[i][j]);

Of course, you could also use C#'s foreach statement to iterate through the contents. Notice that the syntax and effect are slightly different from what is shown above for rectangular arrays. You are responsible for manually traversing the intermediary arrays to get to the actual elements:

foreach (int[] a in jaggedArray)
    foreach (int x in a)
        Console.WriteLine(x);

You can also declare an array's contents inline during construction slightly differently than you would with a rectangular array:

int[][] jagged2 = new int[][] {
    new int[] { 10, 10, 1999 },
    new int[] { 11, 25 }
};

Notice that you must explicitly construct the inner arrays using typical array construction syntax. And, as is the case with all jagged arrays, your inline construction is permitted to specify a varying number of elements per inner array.

Base Class Library Support (System.Array)

An instance of any array implicitly derives from System.Array. When you work with arrays, the CLR will usually handle operations on them internally using special IL opcodes; however, certain categories of arrays and operations actually end up relying on the System.Array type to perform the work. Because concrete arrays subclass Array, there are several benefits that you gain from polymorphism, for example free interoperability with the weakly typed and generic collections APIs.

Collections Interoperability

First and foremost, Array gives all arrays the capability to integrate seamlessly with both the weakly typed and strongly typed generic collections discussed later on in this chapter. For example, because Array implements IList, these lines of code are perfectly valid:

int[] arrayOfInts = new int[] { 10, 99, 50 };
Array a = (Array)arrayOfInts;
IList list = (IList)arrayOfInts;
ICollection collection = (ICollection)arrayOfInts;
IEnumerable enumerable = (IEnumerable)arrayOfInts;

This means that you can pass in an array to any API that expects an IList, ICollection, or IEnumerable type. Likewise, you can also integrate with APIs that expect ICollection<T> or IEnumerable<T> types. Using these types avoids needlessly boxing elements of an array simply in order to pass the array as an IEnumerable, for example. This code is similarly valid:

int[] arrayOfInts = new int[] { 10, 99, 50 };
ICollection<int> collection = (ICollection<int>)arrayOfInts;
IEnumerable<int> enumerable = (IEnumerable<int>)arrayOfInts;

You'll likely notice that the Array class does not actually implement any generic collection interfaces. The CLR does, however, implement them on concrete array types, such as int[], for example.

Common Array-Based Tasks

There are a set of common operations that all arrays expose, as a result of specific methods inherited from Array Regardless of whether you have a reference to an Array, a string[], or an int[], for example, you can make use of them. In addition to these methods, the Array type makes a set of useful utility static methods available. We'll briefly take a look at these in this section. Most are very simple — for example, calling Clear to zero out an array's contents — and thus this discussion is not comprehensive.

Cloning an Array

The Array.Clone instance method will return you a new array (of type Object[], obviously requiring casting). It is initialized to reference the same contents as the target array. Any references to objects in the array will still point to the same objects on the GC heap. This is useful when handing callers of a public API a copy of an array where you do not want them to alter the contents behind your back. This is a relatively common yet subtle to discover problem. For example, consider this public method:

private readonly String[] monthConstants = new String[] {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
};

public String[] Months
{
    get { return monthConstants; }
}

The author of Months may think that because monthConstants is marked readonly, it's safe to hand out a reference to it. Unfortunately, this is wrong! readonly just indicates that, in this example, the reference monthConstants can't be updated to refer to a different array. This is the same problem any mutable data structure would have. A caller can still go ahead and change the contents of the array by setting individual elements:

String[] months = Months;
months[3] = "Not-April";

Now anybody who inspects the contents of monthConstants will see "Not-April" instead of "April" in the array's third index. There are two possible solutions to this problem. If your users need a mutable array and you simply don't want their updates to affect your private state, you can return a "tearoff" as follows:

public String[] Months
{
    get { return (String[])monthConstants.Clone(); }
}

Note that cloning isn't cheap. It allocates an entirely new array and has to copy the data elements from the original array to the new one. This could have the effect of impacting performance when used in tight loops, for example; this is especially problematic with properties, where the user of your API is likely to conclude that accessing properties is cheap enough to do, for example, inside a tight loop.

If the caller of your API really doesn't need to modify the array returned, then you should just use the static Array.AsReadOnly<T>(T[]) method. This returns a ReadOnlyCollection<T>, which cannot be modified:

public IEnumerable<String> Months
{
    get { return Array.AsReadOnly<String>(monthConstants); }
}

Notice that this approach requires that we change the return signature of the method. This demonstrates why it is important to get these things correct the first time around — if this mistake were made in a public reusable API, it would be difficult to change later on after people had taken dependencies on this method in their code. Furthermore, it's wise to consider caching the read-only collection, especially if the underlying monthConstants array will never change.

Copying an Array's Elements

There are several ways to copy elements from one array to another. The Array class has some static helper methods to do this in addition to two instance-based APIs. The static Copy methods take a sourceArray and destinationArray array references to indicate from and to where copying is to occur. The three-parameter overloads take a length parameter to indicate the total number of elements to copy, and the five-parameter overloads additionally take a sourceIndex parameter, indicating at what source position to begin the copying. By default, Copy will copy from the beginning of the source to the beginning of the target array. Alternatively, you can pass in a destinationIndex to control at what position elements are copied to in the destination. There is also an instance CopyTo method, which is just a shortcut to these static members, passing this as the source array.

In summary, the relevant methods are:

using System;
public class Array
{
    // Static copy methods
    public static void Copy(Array sourceArray, Array destinationArray,
        int length);
    public static void Copy(Array sourceArray, Array destinationArray,
        long length);
    public static void Copy(Array sourceArray, int sourceIndex,
        Array destinationArray, int destinationIndex, int length);
    public static void Copy(Array sourceArray, long sourceIndex,
        Array destinationArray, long destinationIndex, long length);

    // Instance copy methods
    public void CopyTo(Array array, int index);
    public void CopyTo(Array array, long index);
}

For example, given two arrays:

int[] srcArray = new int[] { 1, 2, 3 };
int[] destArray = new int[] { 4, 5, 6, 7, 8, 9 };

We can copy from source to destination in a number of ways. The first two lines are equivalent and overwrite the first three elements: 4, 5, and 6; the third line overwrites the last three elements: 7, 8, and 9; lastly, the fourth line overwrites the three elements starting at element 2 in the destination: 6, 7, and 8:

Array.Copy(srcArray, destArray, srcArray.Length);
srcArray.CopyTo(destArray, 0);
Array.Copy(srcArray, 3);
Array.Copy(srcArray, 0, destArray, 2, srcArray.Length);

The resulting destArrays in all four cases are equivalent to:

new int[] { 1, 2, 3, 7, 8, 9};
new int[] { 1, 2, 3, 7, 8, 9};
new int[] { 4, 5, 6, 1, 2, 3 };
new int[] { 4, 5, 1, 2, 3, 9 };

An ArgumentException will be thrown if the destination array does not have sufficient capacity to receive the copy. These methods also throw an ArgumentException if you attempt to use these on multidimensional arrays.

Dynamic Array Access

There are some methods available that enable you to work dynamically with arrays without using the language syntax exclusively. This can be useful in dynamic programming scenarios (discussed in Chapter 14). The static Array.CreateInstance method enables you to dynamically create an array instance at runtime. You supply a Type representing the type of element the resulting array may contain along with the desired length (or array of lengths if creating a multidimensional array).

An example of using this method to create a string array is:

String[] myArray = (String[])Array.CreateInstance(typeof(String), 15);

The result of this call is a 15-element array, which you can then further use as though it were created with a new String[15] statement.

The int GetLength(int dimension), int GetLowerBound(int dimension), int GetUpperBound(int dimension), object GetValue(...), and void SetValue(object value, ...) methods all enable you to further work with arrays without resorting to language support. There are a number of useful overrides for many of these, mostly for use with multidimensional arrays. They are relatively self-explanatory, and therefore we won't go into further detail here.

Static Array Helper Methods

There is a large set of convenient static helper methods on System.Array. The functions are quite similar to the helper methods available on the List<T> collections class, described further below. Instead of discussing nearly identical operations in multiple places, you should refer to the section on List<T> below to understand what methods are available. They will be mentioned briefly in passing here but not discussed in depth.

The BinarySearch method performs a typical binary search on a presorted array, returning the index where the sought after item was found. For large sorted arrays, this can be significantly faster than a linear search through the array. If the return value is negative, the item was not found in the array. Both weakly typed and strongly typed versions of this API are available (i.e., operating on Array and T[]). BinarySearch also offers overloads to enable custom comparisons, for example, by supplying an IComparer<T>. For instance:

int[] array = new int[] { 8, 2, 3, 4, 1, 3 };
Array.Sort(array);
if (Array.BinarySearch<int>(array, 9) >= 0) {
    // found
} else {
    // not found
}

The U[] ConvertAll<T,U>(T[], Converter<T,U>) method returns a new array that is the result of applying a conversion function to each element in the array. This is commonly referred to mapping or projection in other languages. T and U can be of the same or different type. For example, we might want to parse an array of strings into an array of doubles:

string[] strArray = new string[] { "75.3", "25.999", "105.25" };
double[] doubleArray = Array.ConvertAll<string, double>(
    strArray, Convert.ToDouble);

Similarly, we might just want to generate an array of a bunch of uppercased strings:

string[] strArray = new string[] { "Blah", "Foo", "Canonical" };
string[] upperArray = Array.ConvertAll<string, string>(
    strArray, delegate(string s) { return s.ToUpper(); });

You can reverse or sort the contents of an array with the Reverse and Sort methods. Both of these operations happen in place, meaning that the contents of the target array are actually modified rather than a new copy being returned. Reversal is relatively straightforward and does not need further discussion. However, I encourage you to refer to the sections on List<T> sorting and comparability found later in this chapter. The Exists, Find, FindAll, FindFirst, IndexOf, LastIndexOf, and TrueForAll methods all use predicates to locate items inside an array. Please refer to the section on List<T>'s support for identical methods for details.

Fixed Arrays

For interoperability purposes (see Chapter 11), it is sometimes helpful to lay out an array field inline within its enclosing data type rather than as a reference to a separate data type. We discussed in Chapter 3 some of the ways you can control a value type's layout. But because arrays are treated as ordinary reference type fields, they show up as a reference inside of values (rather than a sequence of bits that composes the array's contents, as with other value types, for example).

As of 2.0, you may mark an array as being fixed to mean that its size is specified at the point of declaration and will never change. The CLR responds by laying out its contents inside the value itself:

struct Foo
{
    public fixed char buffer[256];
}

The result is that unmanaged code may access array elements by offsetting right into the value instead of having to dereference the array pointer. Only 1-dimensional arrays declared on value types may be marked as fixed.



 Python   SQL   Java   php   Perl 
 game development   web development   internet   *nix   graphics   hardware 
 telecommunications   C++ 
 Flash   Active Directory   Windows