Assembly Loading

The process of actually loading an assembly into memory and preparing it for code execution by the CLR is more involved than you might think. Of course, from a user standpoint, the process is (usually) entirely transparent. Most of this information is merely an implementation detail and can be skimmed if not skipped entirely. But if you run into problems loading the correct binary, or even failure to load binaries altogether, this information will be valuable to you.

In the following paragraphs, we'll take a closer look at the assembly-loading process, which generally consists of three steps: binding, mapping, and loading. After that, we'll inspect static (early-bound) assembly loading, and then turn to the APIs that let you load assemblies dynamically. We'll touch a bit on CLR internals in the process.

Inside the Bind, Map, Load Process

A number of steps take place in order to determine what code to load, where to load it from, and what context it will be loaded into. A conceptual overview of the process is depicted in Figure. We will briefly discuss each step in the process further below.

Figure: An overview of the early-bound assembly load process.

A related part of binding is probing. Probing is the act of searching for the physical binary based on the version and location information discovered earlier in the load process. Roughly speaking, these four activities can be conceptually envisioned as follows:

  • Binding: Taking input from the user (when loading manually) or from dependency resolution, consulting system configuration and the Fusion subsystem, and using that information to determine the identity of the assembly to load.

  • Probing: Binding often relies on the Fusion subsystem to perform probing in order to locate an assembly against which to bind. Probing encapsulates much of the complexity of locating assemblies on your system so that the CLR loader doesn't have to.

  • Mapping: Once the identity of the assembly to load is determined, it must be read and mapped into memory. The physical bits are mapped into memory space to which the CLR has access.

  • Loading: The last phase of the process is to prepare the code loaded into memory for execution. This consists of verification, creation of data structures and services in the CLR, and finally executing any initialization code.

The remainder of this section describes the binding and probing process further. Mapping and loading are mostly implementation details that you seldom need to worry about.

Binding to an Assembly

The binding process accepts a variety of inputs, including either a fully or partially qualified assembly name, a file path, or a byte[] block of memory. It then uses this input to decide what bits must actually get loaded and from where. The case of the byte[] is quite simple: the hard work is already done, and we can simply move on to mapping it into memory, verifying its contents, and working directly with it. However, in the case of a strong name or partial name, there is a bit of work to do first.
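To see the difference between a fully and a partially qualified name, you can parse both with System.Reflection.AssemblyName and inspect which binding inputs each one actually supplies. This is a minimal sketch; the AssemblyNameDemo class and its Describe helper are hypothetical names, and the public key token used in the usage example is the one that appears in the mscorlib and System manifest references later in this chapter.

```csharp
using System;
using System.Reflection;

static class AssemblyNameDemo
{
    // Summarizes which binding inputs (version, public key token) a display name supplies.
    public static string Describe(string displayName)
    {
        AssemblyName name = new AssemblyName(displayName);

        // GetPublicKeyToken returns null when no token was given, and an empty
        // array for "PublicKeyToken=null"; treat both as "no token".
        byte[] token = name.GetPublicKeyToken();
        string tokenText = (token == null || token.Length == 0)
            ? "<none>"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        return string.Format("{0} | version={1} | token={2}",
            name.Name,
            name.Version == null ? "<none>" : name.Version.ToString(),
            tokenText);
    }
}
```

For example, Describe("System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089") yields "System | version=2.0.0.0 | token=b77a5c561934e089", while the partial name "System" yields "System | version=<none> | token=<none>". Only the former gives the binder enough information to consult policy and the GAC.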

The first step is to transform a name into a location. For assembly loads that don't specify version or key information, policy is not consulted. Loads from disk or a network location (e.g., Assembly.LoadFrom, described later) consult policy before fully loading the assembly if the assembly carries this information, which is determined by reading the assembly's manifest. For all other loads, no configuration or GAC searching is performed.

Policy and Configuration

Configuration is searched for a match, based on the version and key information supplied. This permits binding redirects to change the version information of the assembly being requested. Ordinarily, this is used to roll references to older assemblies forward to newer versions, for example to redirect older clients to binaries containing bug fixes, as in a service pack. Redirects are specified with a <bindingRedirect /> tag in a configuration file, either at the machine level (\Windows\Microsoft.NET\Framework\x.x.xxxx\CONFIG\) or in an application-specific configuration file (YourApp.exe.config). Of course, two versions of the same assembly can run side by side (SxS) in a single AppDomain, provided that policy doesn't roll references to the old one forward before it can even be loaded.
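If you want to check what policy does to a particular reference, AppDomain.ApplyPolicy takes an assembly display name and returns the display name after policy has been applied. The sketch below just wraps that call; PolicyDemo and EffectiveName are hypothetical names, and the result depends entirely on the machine and application configuration in effect (with no matching redirect, you would expect the input to come back unchanged).

```csharp
using System;

static class PolicyDemo
{
    // Returns the display name the runtime would bind to after version
    // policy (e.g., <bindingRedirect /> entries) has been applied.
    public static string EffectiveName(string requestedName)
    {
        return AppDomain.CurrentDomain.ApplyPolicy(requestedName);
    }
}
```

Printing the requested and effective names side by side is a quick way to confirm whether a redirect in YourApp.exe.config is actually being picked up.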

After policy has been applied, the GAC is searched for a match. If an exact version match is found in the GAC, neither probing nor remote locations (e.g., those specified in a codebase) are needed. Otherwise, if there are any <codeBase /> hints specified in the <assemblyBinding> <dependentAssembly /></assemblyBinding> configuration section, the runtime tries the locations specified there, again trying to avoid probing altogether. For strongly named assemblies, the codebase can refer to an Internet or intranet location; otherwise, it must refer to a directory relative to the application directory. Please refer to the SDK for detailed information about codebases.

Consulting the Cache

The next step in the bind process is to determine whether an already loaded assembly can be reused. The binder examines the AppDomain-local cache of previous bind activities for the context we're loading in (described further below). If the target assembly has already been loaded, the cached information saves the binder from consulting the (potentially costly) probing process. This history includes both successful and failed bindings. If an entry is found for an assembly with the same identity, the bind process is done and the already loaded code is simply reused. Otherwise, the binder proceeds with trying to find a suitable match.
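You can't observe the bind cache directly from managed code, but you can observe its effect: binding to the same identity twice in one AppDomain hands back the same Assembly object rather than loading a second copy. A minimal sketch, with BindCacheDemo and LoadsAreShared as hypothetical names:

```csharp
using System;
using System.Reflection;

static class BindCacheDemo
{
    // Loads the same assembly twice by display name and reports whether
    // both calls produced the very same Assembly instance.
    public static bool LoadsAreShared(string displayName)
    {
        Assembly first = Assembly.Load(displayName);
        Assembly second = Assembly.Load(displayName);
        return ReferenceEquals(first, second);
    }
}
```

Calling LoadsAreShared(typeof(object).Assembly.FullName) is a convenient check, because mscorlib is guaranteed to be loaded already.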

Assuming that neither a codebase nor a cached entry was found, the next step is to probe. Prior to 2.0, LoadFrom and LoadFile would bypass policy and probing altogether, but starting in 2.0, default binding policy overrides any attempts to load from a precise location. This means that if you load an assembly containing strong-name information from a local file or a UNC path, the loader first checks whether it can locate a copy with the same identity in the GAC or in the application base and private paths. Only if it fails to find such a copy will it load the precise file specified.

Probing for the Bits (Fusion)

Fusion is the code name for the CLR subsystem that takes responsibility for hunting down assemblies. When presented with the results of the bind, it searches for the physical file on disk; this is called probing. Probing is a simple stepwise search through a set of well-known locations, during which Fusion attempts to find an assembly that matches the information included in the reference. In the case of strong-name binding, this is the full assembly name, but in many cases it will just be a match on the simple assembly name.

Fusion starts its search in the directory specified by the AppDomain's BaseDirectory property. Specifically, it tries two permutations of the path; for references that do not have culture information, they are as follows:

  • <BaseDirectory>\<AssemblyName>.dll and <BaseDirectory>\<AssemblyName>\<AssemblyName>.dll

For references with culture information, the two paths are slightly different:

  • <BaseDirectory>\<Culture>\<AssemblyName>.dll and <BaseDirectory>\<Culture>\<AssemblyName>\<AssemblyName>.dll

If that fails, it looks in the AppDomain's private binary paths (set through AppDomainSetup.PrivateBinPath and exposed at run time as AppDomain.RelativeSearchPath). As above, it includes the <Culture> directory if the reference has culture information.
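The search order described above can be sketched as a pure function that lists candidate paths. This is a simplification under stated assumptions: ProbeOrder and Candidates are hypothetical names, only the .dll extension is tried (as in the text), private paths are treated as semicolon-separated subdirectories of the base directory, and policy, the GAC, and codebases are not modeled at all.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ProbeOrder
{
    // Lists candidate files in probe order: the application base first,
    // then each private bin path. Pass null or "" for culture-neutral references.
    public static List<string> Candidates(string baseDirectory,
        string privateBinPath, string assemblyName, string culture)
    {
        List<string> roots = new List<string>();
        roots.Add(baseDirectory);
        if (!string.IsNullOrEmpty(privateBinPath))
        {
            foreach (string dir in privateBinPath.Split(
                new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            {
                roots.Add(Path.Combine(baseDirectory, dir.Trim()));
            }
        }

        List<string> result = new List<string>();
        foreach (string root in roots)
        {
            // References with culture information probe a <Culture> subdirectory.
            string dir = string.IsNullOrEmpty(culture) ? root : Path.Combine(root, culture);

            // <dir>\<AssemblyName>.dll, then <dir>\<AssemblyName>\<AssemblyName>.dll
            result.Add(Path.Combine(dir, assemblyName + ".dll"));
            result.Add(Path.Combine(Path.Combine(dir, assemblyName), assemblyName + ".dll"));
        }
        return result;
    }
}
```

With a base directory of "app", private paths "bin;lib", and a reference to Foo without culture, this yields six candidates, starting with app\Foo.dll and app\Foo\Foo.dll.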

When constructing a new AppDomain, you may specify this information via the ApplicationBase and PrivateBinPath properties of an AppDomainSetup instance. Alternatively, you can use an application configuration file; for example, to specify private paths:

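<configuration>
    <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
            <!-- Semicolon-separated subdirectories of the application base (illustrative values) -->
            <probing privatePath="bin;lib" />
        </assemblyBinding>
    </runtime>
</configuration>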
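The programmatic equivalent uses AppDomainSetup when creating a domain. This sketch requires the .NET Framework AppDomain APIs; ProbingSetup, CreateProbingDomain, and the "bin" and "lib" directory names are hypothetical.

```csharp
using System;

static class ProbingSetup
{
    // Creates an AppDomain that probes the current application base plus
    // two private subdirectories (hypothetical names "bin" and "lib").
    public static AppDomain CreateProbingDomain()
    {
        AppDomainSetup setup = new AppDomainSetup();
        setup.ApplicationBase = AppDomain.CurrentDomain.BaseDirectory;
        // Semicolon-separated paths, interpreted relative to ApplicationBase.
        setup.PrivateBinPath = "bin;lib";
        // Passing null for Evidence uses the default security evidence.
        return AppDomain.CreateDomain("ProbingDomain", null, setup);
    }
}
```

Inside the new domain, BaseDirectory and RelativeSearchPath reflect these settings, and remember to call AppDomain.Unload when you are done with it.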

When probing is unable to locate an assembly, it will trigger the AppDomain.AssemblyResolve event, permitting user code to perform its own custom loading. If all else fails, a TypeLoadException is thrown (if the load process was invoked due to a reference to a type residing in a dependent assembly) or a FileNotFoundException (if the load process was invoked manually).

Debugging the Assembly Load Process

Because the CLR looks in a large number of locations for your assembly, some of which depend on configuration, there's quite a bit that can go wrong in the process. From not finding assemblies you thought it should find to picking up incorrect versions of assemblies (or even copies you didn't know existed), these issues can be frustrating to debug. And the root cause can be as small as a typo in your configuration file!

Luckily, the CLR is willing to log every step of its process for you to analyze in such situations. To turn on logging, just fire up the fuslogvw.exe program, located in your .NET Framework SDK's bin directory. Click on the Settings button to turn logging on or off. You'll likely want "Log bind failures to disk," although "Log all binds to disk" can be interesting to learn more about the probing process.

Note

Caution: turning on logging has a fairly significant performance overhead, so make sure that you remember to shut it off when you're done debugging.

Not only will you see detailed bind information in the exception details, but you can also review the precise steps Fusion took by using the fuslogvw.exe program to view log information.
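The bind details travel with the exception in the FileNotFoundException.FusionLog property, so you can capture them programmatically. A minimal sketch; BindDiagnostics and TryLoad are hypothetical names, and FusionLog will normally be null unless Fusion logging has been turned on as described above.

```csharp
using System;
using System.IO;
using System.Reflection;

static class BindDiagnostics
{
    // Tries to load an assembly by display name. On failure, hands back the
    // Fusion log attached to the exception (null if logging is disabled).
    public static bool TryLoad(string displayName, out string fusionLog)
    {
        fusionLog = null;
        try
        {
            Assembly.Load(displayName);
            return true;
        }
        catch (FileNotFoundException ex)
        {
            fusionLog = ex.FusionLog;
            return false;
        }
    }
}
```

Writing fusionLog to your own diagnostics log on failure can save a trip to fuslogvw.exe when a bind problem only shows up on a customer's machine.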

Custom Assembly Binding

If Fusion is unable to locate the requested assembly, an event is fired on the current AppDomain, giving you a chance to invoke custom binding behavior. Many enterprises prefer to store assemblies in a central location that Fusion doesn't know about, for example a UNC file share, a web service, or a database. In such cases, you can subscribe a handler to the AppDomain.AssemblyResolve event that locates and loads the assembly itself. Your handler is invoked before Fusion reports the failure.

This example demonstrates a custom binder that searches a database for an assembly that Fusion failed to load. It's oversimplified for illustrative purposes, but it should give you an idea of what's possible:

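using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Reflection;

// AssemblyResolve handler: looks up the requested assembly's bits in a database.
Assembly CustomResolver(object sender, ResolveEventArgs args)
{
    Assembly loaded = null;

    // "foo" is a placeholder connection string.
    using (SqlConnection cn = new SqlConnection("foo"))
    using (SqlCommand cmd = cn.CreateCommand())
    {
        cmd.CommandText =
            "SELECT bits FROM assembly_lkup WHERE name=@assemblyName";
        // args.Name is the full display name of the requested assembly.
        cmd.Parameters.AddWithValue("@assemblyName", args.Name);

        cn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            if (reader.Read())
            {
                // GetSqlBinary returns the column as SqlBinary; the plain
                // indexer (reader[0]) would return a byte[] instead.
                SqlBinary bin = reader.GetSqlBinary(0);
                loaded = Assembly.Load(bin.Value);
            }
        }
    }

    // Returning null tells the runtime that this handler could not resolve it.
    return loaded;
}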

To wire up this new handler, simply make a call to:

AppDomain.CurrentDomain.AssemblyResolve += CustomResolver;

Assuming that the ordinary attempts to load your assembly fail (which can be guaranteed by using some form of name-mangling scheme), your CustomResolver handler will be invoked. Your custom code then performs the database query and, assuming that it finds what it was looking for, turns the returned byte[] into an assembly using Assembly.Load.
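The same hook works for extending probing to a directory Fusion doesn't know about, such as a shared folder. This is a minimal sketch, not a production resolver: DirectoryResolver is a hypothetical class, and it matches on the simple name only, without checking the version or public key token that the request carries.

```csharp
using System;
using System.IO;
using System.Reflection;

sealed class DirectoryResolver
{
    private readonly string root;

    public DirectoryResolver(string root)
    {
        this.root = root;
    }

    // AssemblyResolve handler: looks for <SimpleName>.dll under 'root'.
    // Returning null lets the original bind failure stand.
    public Assembly Resolve(object sender, ResolveEventArgs args)
    {
        string simpleName = new AssemblyName(args.Name).Name;
        string candidate = Path.Combine(root, simpleName + ".dll");
        return File.Exists(candidate) ? Assembly.LoadFrom(candidate) : null;
    }
}
```

To use it, subscribe an instance's Resolve method, e.g. AppDomain.CurrentDomain.AssemblyResolve += new DirectoryResolver(@"\\server\share\assemblies").Resolve; where the share path is purely illustrative. Because the file is loaded by path, it lands in the LoadFrom context discussed next.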

Load Contexts

To further complicate matters, assembly loading behavior also depends on the context in which the load occurs. This primarily impacts how assembly dependencies are located. There are three logical contexts in the runtime. The context chosen depends on how the assembly load process is initiated, whether it's static or dynamic loading, and whether it's resolving dependencies for an already loaded assembly. Further, each AppDomain maintains a history of loads and binds — successes and failures — and ensures that a unique bind is only issued once.

  • Normal: If an assembly was found during the ordinary binding and probing process, it is loaded in the Normal context. This includes searching the GAC (for strongly named assemblies) and the AppDomain's application base and private binary paths. NGen assemblies are loaded only in the Normal context. Once an assembly has been loaded in this context, dependencies will be loaded in the same context (that is, they must exist in one of the locations specified above).

    The Normal context is useful because it often "does the right thing" automatically. It will not, however, resolve references from assemblies in the Normal context to assemblies that would be or have already been loaded in other contexts. Much of the time these will be GAC-loaded assemblies, which may only depend on other assemblies in the GAC, so this is ordinarily not a problem. But if an assembly loaded from the application directory depends on something loaded from a URL, the CLR won't be able to bind the two together. You might consider implementing custom binding using the AppDomain.AssemblyResolve event if this is a problem for you.

  • LoadFrom: If an assembly is loaded from a file path, UNC, or URL, it will be loaded in the LoadFrom context. A number of mechanisms use this context without you necessarily knowing about it. As a rule of thumb, whenever you supply path information for the load process, it is likely getting loaded in the LoadFrom context (e.g., dynamic loading using Assembly.LoadFrom, Assembly.LoadFile). This usually only happens in dynamic loading scenarios. Assemblies loaded in LoadFrom will resolve dependent assemblies in the Normal context, but not vice versa.

  • Anonymous: Any assembly that is loaded by a mechanism not discussed above falls into this third context. This generally means assemblies loaded with the Assembly.Load overload that takes a byte[], and dynamically generated assemblies (e.g., via Reflection.Emit). The usefulness of this context is that you can bypass the ordinary load and probing costs when you know you don't need them. Assemblies in the Anonymous context generally cannot be found by dependency resolution unless you implement custom binding logic.
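One way to see the LoadFrom and Anonymous contexts diverge is to load the same file both ways. This sketch assumes 'path' points to a valid assembly on disk; ContextDemo and LoadBothWays are hypothetical names. A byte[]-loaded assembly has no file behind it, so its Location is an empty string, and it is a separate Assembly object from the one loaded by path.

```csharp
using System;
using System.IO;
using System.Reflection;

static class ContextDemo
{
    // Loads the assembly at 'path' by path (LoadFrom context) and from raw
    // bytes (Anonymous context). Returns true if two distinct objects result.
    public static bool LoadBothWays(string path)
    {
        Assembly fromPath = Assembly.LoadFrom(path);
        Assembly fromBytes = Assembly.Load(File.ReadAllBytes(path));

        Console.WriteLine("LoadFrom location: " + fromPath.Location);
        // No file backs a byte[] load, so Location is empty.
        Console.WriteLine("byte[] location:   '" + fromBytes.Location + "'");

        return !ReferenceEquals(fromPath, fromBytes);
    }
}
```

The practical consequence is the one noted above: types from the two copies are distinct, and dependency resolution won't find the byte[] copy on its own.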

We will reference these contexts below when explaining the behavior of both early- and late-bound assembly loading.

Unloading Assemblies

Once an assembly gets loaded into an AppDomain, it remains until the AppDomain is unloaded. AppDomains can be unloaded via the AppDomain.Unload API. Even once the assembly is no longer in use — for example, no active instances of types — the assembly remains in memory. This is even true should the managed Assembly reference go out of scope and be reclaimed by the GC. This can actually be a problem with dynamically generated assemblies. We discuss AppDomains and their various capabilities in Chapter 10, although due to the tight relationship between assemblies and AppDomains we will briefly touch on some concepts here.

If you need to unload dynamically loaded assemblies at some point in your application's lifetime, consider loading them into individual AppDomains. This is particularly true when doing on-the-fly code generation with Reflection.Emit. You can accomplish this by first creating a new AppDomain, and then calling AppDomain.ExecuteAssembly or DoCallBack to execute code in the target AppDomain.

This code shows an example of isolating code in an assembly entirely within another AppDomain:

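// Set up a new AppDomain and execute an EXE inside of it. ExecuteAssembly
// does not return until the program's entry point returns:
AppDomain ad = AppDomain.CreateDomain("Isolated");
ad.ExecuteAssembly("program.exe");

// ...

// Elsewhere (e.g., on another thread), unload the AppDomain along with the
// domain-specific assemblies that were loaded into it:
AppDomain.Unload(ad);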

Be careful when using either mechanism. Accidentally referencing the metadata from the target assembly in another AppDomain, for example through Type or Assembly references, will cause the assembly to be loaded inside the referencing domain as well. You might do this, for example, if you invoked Load on the target AppDomain and then performed a DoCallBack invocation that referenced types from that assembly: AppDomain.Load returns an Assembly reference in the calling AppDomain, which causes the load to occur there too. If this happens, unloading the target AppDomain will not get rid of the copy loaded in the calling domain.
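A common way to avoid that leak is to do all of the reflection inside the target AppDomain through a MarshalByRefObject proxy, returning only simple data such as strings. The sketch below assumes the .NET Framework AppDomain APIs; RemoteLoader, ListExportedTypes, IsolationDemo, and InspectInIsolation are hypothetical names. Note that the assembly defining RemoteLoader itself is loaded into both domains; only the inspected assembly stays confined to the isolated one.

```csharp
using System;
using System.Reflection;

// Instances live in the target AppDomain; callers hold a proxy.
public sealed class RemoteLoader : MarshalByRefObject
{
    // Loads the assembly inside this object's domain and returns only strings,
    // so no Assembly or Type reference crosses back to the caller.
    public string[] ListExportedTypes(string assemblyPath)
    {
        Assembly a = Assembly.LoadFrom(assemblyPath);
        Type[] types = a.GetExportedTypes();
        string[] names = new string[types.Length];
        for (int i = 0; i < types.Length; i++)
            names[i] = types[i].FullName;
        return names;
    }
}

static class IsolationDemo
{
    public static string[] InspectInIsolation(string assemblyPath)
    {
        AppDomain ad = AppDomain.CreateDomain("Isolated");
        try
        {
            RemoteLoader loader = (RemoteLoader)ad.CreateInstanceAndUnwrap(
                typeof(RemoteLoader).Assembly.FullName,
                typeof(RemoteLoader).FullName);
            return loader.ListExportedTypes(assemblyPath);
        }
        finally
        {
            // The inspected assembly goes away with the domain.
            AppDomain.Unload(ad);
        }
    }
}
```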

Domain Neutrality

When an assembly is loaded, the process causing the load to occur has a chance to hint at whether the assembly should be loaded domain specific (a.k.a. domain bound) or domain neutral. Domain-specific assemblies are local to a single AppDomain, meaning that if two AppDomains in a process need to access the same code, there will actually be two copies of the assembly's internal data structures and native code loaded in the same process. Domain-neutral assemblies, on the other hand, can be shared across AppDomains, albeit at a slight cost. Either way, space for AppDomain-specific data (e.g., static variables) must still be allocated separately in each AppDomain.

Note

Note that metadata is always shared for read access across AppDomains in the same process. It would be silly to duplicate the same exact IL numerous times, for example. What is being referred to here is the native code and data structures that the JIT and CLR allocate in order to execute the assembly's code. There is a difference.

One important consequence of this design is that domain neutral assemblies can never be unloaded from the process! We'll first discuss how the specific versus neutral decision gets made (and by whom) and then take a look at the tradeoffs between the two.

Default Loading Policies

The default loading policy is to load mscorlib domain neutral, and everything else as domain specific. This makes sense because mscorlib will likely be used in every managed application or library written. But other hosts may choose to load things in a customized manner that makes more sense for them. There are roughly three options that the loader recognizes when hinting at loading policy:

  • Single domain: Only mscorlib is loaded domain neutral. Everything else is domain specific.

  • Multi domain: All assemblies are loaded domain neutral.

  • Multi domain host: All assemblies that are loaded from the GAC will be loaded domain neutral. Everything else is loaded as domain specific. Prior to 2.0, the meaning of this policy was slightly different: any assembly with a strong name loaded in the Normal context would be loaded as domain neutral.

ASP.NET, for example, chooses to use multidomain host loading. This makes sense, because often machine-wide libraries are shared among several web applications on the same machine. Before the change in 2.0 noted above, ASP.NET would have to recycle the aspnet_wp.exe process anytime a new strongly named assembly was deployed. With the new behavior, it will only have to recycle the process if you update an already loaded assembly in the GAC.

If you are writing your own host or are running an EXE-compiled application, you can provide an override or hint to control domain specificity of the code that is loaded by the executing program. We don't discuss hosting in any great depth in this book, so it doesn't make sense to drill deep into that right here. Changing behavior in hosting environments boils down to just passing the right flag (i.e., STARTUP_LOADER_OPTIMIZATION_*) as the flags argument when invoking CorBindToRuntimeEx. However, for ordinary managed EXEs, you can annotate your entrypoint method with the LoaderOptimizationAttribute. It accepts an enumeration value, three values of which map exactly to the policies outlined above. For example:

using System;

class LoaderTest
{
    [LoaderOptimization(LoaderOptimization.MultiDomain)]
    static int Main(string[] args)
    {
        /* ... */
        return 0;
    }
}

With this code snippet, all assemblies that get loaded by the program will end up in multidomain mode. That is, every assembly will get loaded as domain neutral. Be cautious when using this; changing policies such as this can cause subtle problems.

Specific vs. Neutral Tradeoffs

The primary downside to domain specific assembly loading is code duplication. Say that you have an application with 10 AppDomains. If each of those AppDomains used code from System.dll (quite reasonable due to the commonality of System.dll in managed programs), you'd have to load 10 copies of System.dll into your process. Some sharing of code and data structures does happen — for example, the CLR notices when the same assembly is multi-AppDomain and adjusts its internal data structures accordingly — but this would nevertheless hurt the working set of your program quite dramatically.

For example, consider this code:

using System;

class TestWorkingSet
{
    static void Main()
    {
        Console.WriteLine(Environment.WorkingSet);
        LoadSystem();
        Console.WriteLine(Environment.WorkingSet);
        LoadInAppDomains(10);
        Console.WriteLine(Environment.WorkingSet);
    }

    // Referencing System.Uri forces System.dll to be loaded into the
    // current AppDomain when this method is JIT-compiled.
    static void LoadSystem()
    {
        Type uriType = typeof(System.Uri);
    }

    // Creates n new AppDomains and loads System.dll into each of them.
    static void LoadInAppDomains(int n)
    {
        for (int i = 0; i < n; i++)
        {
            AppDomain ad = AppDomain.CreateDomain(i.ToString());
            ad.DoCallBack(LoadSystem);
        }
    }
}

Results will differ based on a variety of factors, but when I run this program my working set jumps from 3.5MB after the initial program loads to 4.25MB after System.dll gets loaded once, and nears 8MB after loading System.dll into 10 additional AppDomains. Clearly, this is a situation where System.dll would have been better off loaded as domain neutral. Sure enough, annotating the Main method with [LoaderOptimization(LoaderOptimization.MultiDomain)] reduces the working set in the 10-AppDomain case by about 2MB.

There are other performance benefits to loading as domain neutral too. For example, if an assembly is loaded as domain neutral, it gets initialized and prepared for execution by the engine only once, not once for each AppDomain that needs to use it (with caveats of allocating statics and initializing class constructors). Domain-specific assemblies need to do a non-trivial amount of work in order to prepare the metadata in one AppDomain for execution in another, including rejitting and generating all the CLR's runtime data structures for it.

Loading something domain neutral, however, can actually have negative impacts if not used carefully. First and foremost, it means that code can never get unloaded, even after it's no longer in use. Once something is loaded as domain neutral, you can't simply dump an AppDomain and expect all of the code that got loaded by it to go away. You have to dump the whole process. This can be problematic for dynamically created assemblies, as is the case with Reflection.Emit assemblies.

Because domain neutral assemblies are stored in cross-AppDomain memory space, things like statics and class constructors, which are meant to be local to an AppDomain, require special per-AppDomain storage regardless of loading policy (along with associated indirection and marshaling costs). Thus, you might actually notice a slowdown when accessing these types of members in domain neutral situations. Lastly, domain neutral assemblies must ensure that the transitive closure of their static dependencies will get loaded as domain neutral, too. For very fundamental assemblies such as mscorlib, which don't depend on anything managed, this is simple. For applications that reference a lot of external libraries, this can be a mistake and requires disciplined factoring.

Loading the CLR

An EXE's bootstrapping code must load the CLR into the process and hand off the metadata to be executed. It does so through the shim, that is, mscoree.dll. The shim is responsible for selecting either the workstation or server build of the CLR, found in mscorwks.dll and mscorsvr.dll, respectively. It decides where to look for these DLLs and which flavor to load based on a number of factors, including registry settings and whether you're on a uni- or multiprocessor machine. Please refer to Chapter 3 for more information on differences between these two builds. From there, other CLR DLLs will get pulled in as needed in order to execute your code. For example, mscorjit.dll will be used to JIT the IL (in the case of non-NGen assemblies).

There are several ways in which to start the load process. The EXE bootstrapper mentioned above simply uses a call to mscoree.dll!_CorExeMain in its Win32 entrypoint. This is entirely transparent for most developers; clicking on the EXE in Explorer will magically load the CLR and your program (thanks to the use of the PE file format). DLLs contain a tiny snippet of code that forwards to mscoree.dll!_CorDllMain.

The CLR also exposes a set of COM interfaces with which one may load the CLR into the process. This is how Internet Explorer and SQL Server 2005 in particular get the CLR loaded. This is done through the ICorRuntimeHost interface — called the hosting API — through a combination of executing the CorBindToRuntimeEx and Start functions. In addition to simply loading the CLR into the process, programs interacting with the CLR through the hosting interface may also override policies. These policies include things like mapping logical threads to physical units of execution, hooking all managed memory allocations and lock acquisitions and releases, controlling AppDomain loads, and much, much more. Please refer to the "Further Reading" section for follow-up information on the hosting APIs.

Static Assembly Loading

Early-bound assembly loading is the default process by which assemblies get loaded on the CLR. This is the case when the manifest for an assembly contains an AssemblyRef to another physical assembly. If you take a dependency on an assembly and call an exported method, for example, the code and metadata for that assembly must be loaded into the AppDomain so that the code can be JIT-compiled and executed. This is determined by the fact that the TypeRef or MethodRef contains a reference to the specific external AssemblyRef. If the assembly hasn't already been loaded into the AppDomain, the loader takes over, finds the correct assembly based on the dependency information in your program, loads the code, and returns control to execute the method.

Assemblies generally get loaded whenever code needs to be executed or metadata inspected. Notice that this is done lazily — or just in time, just like Win32's LoadLibrary — meaning that the dependencies aren't even resolved and loaded until the code or metadata for that assembly is needed by the runtime. Specifically, there are a number of events that can trigger this process, for example:

  • The EXE bootstrapper needs to load the primary assembly in order to execute it.

  • When an already loaded assembly makes a call to code in a dependent assembly, the type reference is resolved based on the assembly's manifest, and the physical assembly is loaded in order to execute the code.

  • Manually loading an assembly through the various System.Reflection.Assembly APIs.

  • Remoting a non-MarshalByRefObject instance across an AppDomain boundary and accessing any of its members will cause the assembly defining that type to be loaded in the receiving AppDomain.

  • Transferring a Reflection object (e.g., Type, Assembly) across an AppDomain boundary causes the assembly to be loaded in the receiving AppDomain if the assembly wasn't already loaded and is not domain neutral. We discuss domain neutrality a bit later in the chapter.
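You can watch lazy loading happen by subscribing to the AppDomain.AssemblyLoad event and only then running code that touches a type from another assembly. This is a sketch; LazyLoadDemo and UseUri are hypothetical names, and what you actually see depends on what was loaded beforehand (if System.dll was already pulled in by something else, no new entry appears).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

static class LazyLoadDemo
{
    // Simple names of assemblies loaded after Run subscribes.
    static readonly List<string> loaded = new List<string>();

    public static List<string> Run()
    {
        AppDomain.CurrentDomain.AssemblyLoad +=
            delegate(object sender, AssemblyLoadEventArgs e)
            {
                loaded.Add(e.LoadedAssembly.GetName().Name);
            };

        UseUri();   // first call: UseUri is JIT-compiled and System.Uri resolved
        Console.WriteLine("Loaded since subscribing: " + string.Join(", ", loaded.ToArray()));
        return loaded;
    }

    // NoInlining keeps this body, and its reference to System.Uri, out of Run,
    // so the reference is resolved only when UseUri is first compiled.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void UseUri()
    {
        Uri u = new Uri("http://www.bluebytesoftware.com/blog/");
        Console.WriteLine(u.Host);
    }
}
```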

Let's briefly look at how type references get resolved, and how this triggers the loading of an assembly.

Resolving Type References

In the metadata, all references to types and their members (e.g., methods, fields) are fully qualified names, also called TypeSpecs (or MethodSpecs or FieldSpecs). These specifications include the assembly name as a prefix. An associated AssemblyRef gives the CLR binding information about the assembly. For example, this snippet of textual IL shows a method calling into the mscorlib.dll and System.dll assemblies:

.method public hidebysig instance void TestMethod() cil managed
{
    .maxstack  2
    ldstr      "http://www.bluebytesoftware.com/blog/"
    newobj     instance void [System]System.Uri::.ctor(string)
    callvirt   instance string [mscorlib]System.Object::ToString()

    call       void [mscorlib]System.Console::WriteLine(string)
    ret
}

You can see that the System.Uri type and the reference to its .ctor(string) method are both prefixed with [System], and that System.Object::ToString() and System.Console::WriteLine(string) are both prefixed with [mscorlib]. When this program gets compiled into an assembly, if you take a look at its manifest you'll notice external declarations for both of these assemblies:

.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
    .ver 2:0:0:0
}
.assembly extern System
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
    .ver 2:0:0:0
}

Notice that we have the strong name (name, public key token, and version) of each external assembly that this was compiled against. Unless System.dll has already been loaded by this program, the newobj call to System.Uri's constructor will cause System.dll to get loaded, and the constructor to be jitted (if an NGen image wasn't found) and executed.
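The same AssemblyRef information is available through reflection via Assembly.GetReferencedAssemblies. This sketch prints one line per reference in roughly the shape of the .assembly extern blocks above; ManifestRefs and Dump are hypothetical names.

```csharp
using System;
using System.Reflection;

static class ManifestRefs
{
    // Prints each AssemblyRef in the assembly's manifest: simple name,
    // version, and public key token, as in the ".assembly extern" blocks.
    public static AssemblyName[] Dump(Assembly assembly)
    {
        AssemblyName[] refs = assembly.GetReferencedAssemblies();
        foreach (AssemblyName r in refs)
        {
            byte[] token = r.GetPublicKeyToken();
            string tokenText = (token == null || token.Length == 0)
                ? "null"
                : BitConverter.ToString(token).Replace("-", " ");
            Console.WriteLine(".assembly extern {0} (ver {1}, token {2})",
                r.Name, r.Version, tokenText);
        }
        return refs;
    }
}
```

Running Dump(typeof(SomeTypeInYourProgram).Assembly) against the program above should list entries for mscorlib and System.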

Dynamic Assembly Loading

In most cases, you don't have to deal with loading assembly images by hand. Your application will begin execution through the ordinary EXE bootstrap process — or perhaps it will be controlled by your host, for example Internet Explorer or SQL Server — and the runtime will then automatically resolve dependent assemblies while executing it. In this scenario, which is by far the most pervasive, you don't have to do anything to invoke the loading process. It just happens.

But there are scenarios in which you want to control assembly loading by hand. When you are doing late-bound, dynamic programming (described in detail in Chapter 14), for example, you will want to load assemblies based on information that might only be available at runtime. Consider a plug-in architecture in which assemblies conform to a standard interface and get installed into a configuration database. This database will contain the strong names of assemblies to load. Your program must query it and know how to load these in a late-bound fashion. These assemblies could implement a standard interface, yet the general plug-in host could remain totally ignorant of the implementation type.

For example, consider an abbreviated example of a plug-in host (please see the section "AppDomain Isolation" in Chapter 10 for more details and examples on such architectures):

interface IHostApplication { /* ... */ }
interface IPlugin {
    string Name { get; }
    void Start(IHostApplication app);
    void Shutdown();
}
void ShowPlugin(IHostApplication app, string key) {

    byte[] assemblyBytes = LoadPluginFromDb(key);
    Assembly a = Assembly.Load(assemblyBytes);
    IPlugin plugin = (IPlugin)a.CreateInstance(GetPluginTypeFromDb(key));
    plugin.Start(app);
}

The functionality described in this section helps you do things like this. Another example scenario is hooking into the default probing process to perform some custom assembly loading, as we saw above. This enables scenarios in which you deserialize assemblies from a central database (useful in enterprise deployment scenarios) or extend the ordinary probing process to look in alternative locations for the bits to load, for instance. Overriding the default binding behavior is a surprisingly common task.

The System.Reflection.Assembly type exposes a set of static Load* methods and overloads, offering a variety of ways to specify the name and location of the assembly to load. But the intent of each is the same: the method loads an assembly into the current AppDomain and returns a reference to an Assembly object exposing its metadata. Which method to use depends on what information you have about the assembly to load and the context into which you would like to load it. Load enables you to supply an AssemblyName, string, or byte[] for loading purposes. LoadFrom and LoadFile accept string-based paths to the assembly to be loaded. Each of these is briefly described below.

Default Loading (Assembly.Load)

When presented with an AssemblyName or string, Assembly.Load uses the default CLR loading behavior described above to locate and load your assembly. Roughly, it determines whether the assembly has already been loaded in the current AppDomain (and if so, simply reuses it), searches the known NGen images and the GAC, looks in the application's base directory (AppDomain.BaseDirectory), and searches any additional private paths configured for the AppDomain (e.g., AppDomain.SetupInformation.PrivateBinPath). This is a simplification of the process described earlier.
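To make the last two steps more concrete, here is a simplified sketch of the candidate paths probed for a culture-neutral assembly that has no codebase hint: the application base first, then each private path, trying name.dll and name\name.dll before the corresponding .exe forms. ProbeSketch and ProbeCandidates are illustrative names only, and the real probing logic also accounts for culture, codebase hints, and configuration, so treat this as an approximation rather than Fusion's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ProbeSketch
{
    // Simplified model of the probe order for a culture-neutral assembly:
    // all .dll candidates (appbase, then each private path) before all .exe ones.
    public static List<string> ProbeCandidates(
        string appBase, string[] privatePaths, string simpleName)
    {
        // Directories to search: the application base, then each private path
        // interpreted relative to it.
        List<string> roots = new List<string>();
        roots.Add(appBase);
        foreach (string p in privatePaths)
            roots.Add(Path.Combine(appBase, p));

        List<string> candidates = new List<string>();
        foreach (string ext in new string[] { ".dll", ".exe" })
        {
            foreach (string root in roots)
            {
                // <root>\<name><ext>
                candidates.Add(Path.Combine(root, simpleName + ext));
                // <root>\<name>\<name><ext>
                candidates.Add(Path.Combine(
                    Path.Combine(root, simpleName), simpleName + ext));
            }
        }
        return candidates;
    }
}
```

For example, ProbeCandidates(AppDomain.CurrentDomain.BaseDirectory, new string[] { "bin" }, "TestDll") lists the locations in the order they would be tried. Fusion's binding log (viewable with fuslogvw.exe) is the authoritative record of what was actually probed.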

For example, you might construct an AssemblyName instance and call Load as follows:

// Create the fully qualified assembly name. Name is the simple name of the
// assembly, without the .dll extension:
AssemblyName name = new AssemblyName();
name.Name = "TestDll";
name.Version = new Version("1.0.0.0");
name.CultureInfo = CultureInfo.InvariantCulture;

// Supply the public key (e.g., extracted with "sn -p"); KeyPair is only
// used for signing, not for binding:
name.SetPublicKey(File.ReadAllBytes("TestDll.pubkey"));

// Now perform the load:
Assembly a = Assembly.Load(name);

// Demonstrate creating an instance from the loaded assembly
// (CreateInstance expects the namespace-qualified type name):
IFoo foo = a.CreateInstance("Foo") as IFoo;
if (foo != null)
    foo.Bar("Test");

This small code snippet dynamically loads an assembly based on Fusion's default binding behavior. It uses an AssemblyName that contains a Name, Version, and Culture, and reads the assembly's public key from a key file stored on disk. The last bit simply demonstrates how one might go about dynamically creating an instance of a type that implements a known interface. The Assembly.CreateInstance method, among others, enables you to bind to and create instances of types dynamically.
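Load also accepts the assembly's display name as a plain string. The sketch below (AssemblyNameDemo and Describe are hypothetical names) parses such a string with the AssemblyName(string) constructor and reads back the pieces Fusion binds against. Note that the public key token is a compact 8-byte value derived from the full public key, not the key itself.

```csharp
using System;
using System.Reflection;

public static class AssemblyNameDemo
{
    // Parses a display name of the form accepted by Assembly.Load(string) and
    // returns "<simple name> <version> <public key token as lowercase hex>".
    public static string Describe(string displayName)
    {
        AssemblyName name = new AssemblyName(displayName);

        // GetPublicKeyToken returns null when no token was specified.
        byte[] token = name.GetPublicKeyToken();
        string hex = (token == null || token.Length == 0)
            ? "(none)"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        return name.Name + " " + name.Version + " " + hex;
    }
}
```

Passing the same string straight to Assembly.Load asks Fusion to bind to exactly that identity.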

Load's byte[] overload takes a raw byte representation of the assembly to load. In this case, the loader simply parses the block of memory as though it had read the PE file directly from disk itself. One possible use of this API was shown earlier, when I wrote a custom assembly binder that takes a blob of serialized data out of a SQL Server database and turns it into a living, breathing assembly.
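A minimal sketch of the byte[] path, assuming the image is read from a file as a stand-in for the database blob described above; ByteLoader and LoadFromBytes are hypothetical names. Because the loader only ever sees the bytes, it has no path from which to resolve the assembly's location-based dependencies.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ByteLoader
{
    // Reads an assembly image into memory and hands the raw bytes to
    // Assembly.Load(byte[]). Reading a file here is only a stand-in for
    // pulling a serialized blob out of a database.
    public static Assembly LoadFromBytes(string path)
    {
        byte[] image = File.ReadAllBytes(path);
        return Assembly.Load(image);
    }
}
```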

Using the Load method is generally preferred as it follows the standard assembly loading behavior. It basically does the right thing, and will reduce administrative and support costs down the road due to servicing and debugging challenges with the other variants. However, it cannot be used to load files from precise locations on disk that aren't known to the AppDomain or over the network (without using the byte array, that is). This is where LoadFrom and LoadFile become useful.

Loading from a Location (Assembly.LoadFrom, LoadFile)

Assembly.LoadFrom enables you to supply a string-based path specifying where the assembly to load resides. This can be a plain file path or a network UNC path, for example. LoadFrom essentially determines the assembly's full identity (by reading the file) and then invokes the Fusion load process for it. This ensures that it will find identity-equivalent copies of the same assembly.

One nice thing about LoadFrom is that it will successfully load any dependencies that reside at the same path as the primary assembly. Another benefit is that the Fusion loader will intercept LoadFrom calls when it determines it knows of an alternative copy with the same identity. This can also reduce duplication of assemblies in memory if the loader has seen the same assembly go through the Normal context. Unfortunately, LoadFrom will always skip existing NGen images for the assembly.

Similar to LoadFrom, but subtly different, is LoadFile. This method also takes a file path from which to load an assembly, but it bypasses Fusion's probing and loads the file at exactly the path you supply. Starting with 2.0, however, any policy and the GAC are consulted first to see whether a version of this assembly is found there.

Both of these mechanisms have drawbacks, most notably that loading the same assembly more than once can result in multiple distinct (and incompatible) in-memory representations. LoadFrom suffers from this less, thanks to its reliance on Fusion. But type instances created from two copies of an assembly loaded from separate paths are not considered compatible, potentially leading to unexpected InvalidCastExceptions if you try to share data between them. Duplicate copies can also lead to memory bloat, because previously loaded copies of the same assembly won't be reused as they are with Assembly.Load.
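When chasing one of these InvalidCastExceptions, it can help to list which assemblies are loaded more than once in the AppDomain. The sketch below is a hypothetical diagnostic (LoadDiagnostics and FindDuplicateLoads are made-up names) that groups AppDomain.GetAssemblies() by full name and reports any identity that appears more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class LoadDiagnostics
{
    // Returns the full names of assemblies loaded more than once in the current
    // AppDomain, e.g. once via Load and again via LoadFile or Load(byte[]).
    // Types from such copies are distinct, so casts between them fail.
    public static List<string> FindDuplicateLoads()
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (Assembly a in AppDomain.CurrentDomain.GetAssemblies())
        {
            int n;
            counts.TryGetValue(a.FullName, out n); // n stays 0 if not present
            counts[a.FullName] = n + 1;
        }

        List<string> duplicates = new List<string>();
        foreach (KeyValuePair<string, int> pair in counts)
        {
            if (pair.Value > 1)
                duplicates.Add(pair.Key);
        }
        return duplicates;
    }
}
```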

Reflection Loading (Assembly.ReflectionOnlyLoad, ReflectionOnlyLoadFrom)

There are two other variants on the assembly load process: ReflectionOnlyLoad and ReflectionOnlyLoadFrom. They have overloads similar to those of their Load and LoadFrom counterparts. But they differ from the other load mechanisms in two ways: the assembly is loaded only for inspection by the System.Reflection APIs, and, when you use ReflectionOnlyLoadFrom, Fusion will never intercept and redirect your request to open the assembly from the precise location you specified.

You can't create instances of types from reflection-only assemblies, nor can you execute any code inside of them. But reflection-only loading does cut down on the cost of loading the assembly, since only the metadata is needed and not the executable code itself. In many cases you might only want to analyze the metadata, in which case the code is an extra, unneeded tax. For more information on reflection, please refer to Chapter 14. To detect whether an assembly was loaded for reflection only, consult the ReflectionOnly property on the Assembly class.
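Code that receives Assembly objects from elsewhere can check the ReflectionOnly property before attempting late-bound instantiation. This is a small sketch with hypothetical names (LateBound, CreateIfExecutable) that fails early with a descriptive message instead of letting the activation attempt fail later.

```csharp
using System;
using System.Reflection;

public static class LateBound
{
    // Creates an instance of typeName from assembly a, refusing up front when
    // a was loaded into the reflection-only context, where code cannot run.
    public static object CreateIfExecutable(Assembly a, string typeName)
    {
        if (a.ReflectionOnly)
            throw new InvalidOperationException(
                a.FullName + " was loaded for reflection only; " +
                "its types cannot be instantiated.");

        // Returns null if the type is not found in the assembly.
        return a.CreateInstance(typeName);
    }
}
```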

Reflection-only loading does not automatically probe for and load dependent assemblies. So if you need to access detailed metadata for an assembly's dependencies, you must subscribe to the dependency resolution event and load them manually. For example, this code will throw an unhandled ReflectionTypeLoadException:

Assembly systemAssembly = Assembly.ReflectionOnlyLoadFrom(
    "C:\\Windows\\Microsoft.NET\\Framework\\v2.0.50727\\System.dll");
Type[] systemTypes = systemAssembly.GetTypes();

To fix this, you must hook the AppDomain's ReflectionOnlyAssemblyResolve event to perform the manual dependency resolution. This entails loading the dependency using ReflectionOnlyLoad* and returning it from inside your event handler. For example:

const string currentAssemblyKey = "CurrentReflectionAssemblyBase";

Assembly ReflectionOnlyLoadFrom(string assemblyPath)
{
    AppDomain currentAd = AppDomain.CurrentDomain;
    ResolveEventHandler customResolveHandler =
        new ResolveEventHandler(CustomReflectionOnlyResolver);
    currentAd.ReflectionOnlyAssemblyResolve += customResolveHandler;

    // Store the base directory from which we're loading in AppDomain-local
    // storage so that the resolver can find it
    currentAd.SetData(currentAssemblyKey,
        Path.GetDirectoryName(assemblyPath));

    try
    {
        // Now load the assembly, and force the dependencies to be resolved
        Assembly assembly = Assembly.ReflectionOnlyLoadFrom(assemblyPath);
        Type[] types = assembly.GetTypes();
        return assembly;
    }
    finally
    {
        // Lastly, reset the storage entry and remove our handler, even on failure
        currentAd.SetData(currentAssemblyKey, null);
        currentAd.ReflectionOnlyAssemblyResolve -= customResolveHandler;
    }
}

Assembly CustomReflectionOnlyResolver(object sender, ResolveEventArgs e)
{
    // Reuse the dependency if it is already in the reflection-only context
    foreach (Assembly loaded in
        AppDomain.CurrentDomain.ReflectionOnlyGetAssemblies())
    {
        if (loaded.FullName == e.Name)
            return loaded;
    }

    AssemblyName name = new AssemblyName(e.Name);
    string assemblyPath = Path.Combine(
        (string)AppDomain.CurrentDomain.GetData(currentAssemblyKey),
        name.Name + ".dll");

    if (File.Exists(assemblyPath))
    {
        // The dependency was found in the same directory as the base
        return Assembly.ReflectionOnlyLoadFrom(assemblyPath);
    }
    else
    {
        // Wasn't found on disk; try the GAC, which requires the full name
        return Assembly.ReflectionOnlyLoad(e.Name);
    }
}

Once the original code is changed to call our custom ReflectionOnlyLoadFrom helper (e.g., Assembly systemAssembly = ReflectionOnlyLoadFrom("%PATH_TO_FX%\\system.dll")), the assembly and all of its dependencies will be loaded in the reflection-only context. A list of the assemblies loaded in this context is available by calling ReflectionOnlyGetAssemblies() on the current AppDomain.

For example, this code enumerates all such assemblies:

Array.ForEach<Assembly>(
    AppDomain.CurrentDomain.ReflectionOnlyGetAssemblies(),
    delegate(Assembly a) { Console.WriteLine("* "+ a.FullName); });

The result of executing this immediately after loading System.dll in the fashion shown above is:

* System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
* System.Configuration, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
* System.Xml, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
* Microsoft.Vsa, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

Reflection-only assemblies can be unloaded only the same way as ordinary assemblies: by unloading the AppDomain that contains them. So you'll likely want to isolate code that inspects them in a separate AppDomain so that they don't hang around forever after you're done with them.

Reflection-Emit Assemblies

There is one last mechanism with which to instantiate an assembly inside an AppDomain: creating it from scratch. The System.Reflection.Emit APIs allow you to generate code at runtime inside a dynamic assembly, which can then be saved to disk, thrown away, or executed in memory. This works by interacting with an instance of the AssemblyBuilder class. Detailed coverage of Reflection.Emit can be found in Chapter 14.
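As a minimal illustration, the sketch below defines a run-only dynamic assembly containing one type whose static method returns 42, then bakes the type so it can be invoked. EmitSketch, "DynamicAnswer", and "Generated.Answer" are made-up names. It assumes the static AssemblyBuilder.DefineDynamicAssembly method, which arrived after 2.0 (in .NET Framework 4.5 and .NET Core); on 2.0 the equivalent entry point is AppDomain.CurrentDomain.DefineDynamicAssembly with the same two arguments.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Builds a transient, run-only dynamic assembly with one public class,
    // "Generated.Answer", exposing a static method Get() that returns 42.
    public static Type BuildAnswerType()
    {
        AssemblyName name = new AssemblyName("DynamicAnswer");
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            name, AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("DynamicAnswer");
        TypeBuilder tb = mb.DefineType("Generated.Answer",
            TypeAttributes.Public | TypeAttributes.Class);

        MethodBuilder get = tb.DefineMethod("Get",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int), Type.EmptyTypes);
        ILGenerator il = get.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42); // push the constant 42
        il.Emit(OpCodes.Ret);        // return it to the caller

        // Bake the type so it can be reflected over and invoked.
        return tb.CreateType();
    }
}
```

Invoking the generated method is then ordinary reflection, e.g. BuildAnswerType().GetMethod("Get").Invoke(null, null).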

Type Forwarding

One challenge when designing managed programs is cleanly factoring types and their dependencies, and versioning makes this process more challenging. For example, if in version 1.0 of your library you decide to depend on System.Windows.Forms.dll in a critical system component, it's hard to reverse that decision later on, even if you deem it appropriate. One solution might be to move all types that depend on UI into their own DLL. But high standards for application compatibility always trump the prospect of breaking changes, and moving a type out of an assembly would certainly be classified as one.

In 2.0, a new attribute, System.Runtime.CompilerServices.TypeForwardedToAttribute, has been added to remedy exactly this situation. You apply it at the assembly level in the original assembly, pointing to the type that now lives in an entirely separate assembly. It is a pseudo-custom attribute: rather than storing it as an ordinary custom attribute, the compiler records a forwarder in the assembly's ExportedType metadata, adds an AssemblyRef for the new assembly, and points the affected ExportedType's implementation token at that AssemblyRef.

For example, say that we shipped a version of our assembly TyFwdExample.dll:

using System;
using System.Windows.Forms;

public class Foo
{
    public void Baz()
    {
        // Some generally useful code...
    }
}

public class Bar
{
    public void Baz()
    {
        MessageBox.Show("Howdy, partner");
    }
}

And then one of our users took a dependency on the Foo and Bar types:

class Program
{
    static void Main()
    {
        Foo f = new Foo();
        f.Baz();
        Bar b = new Bar();
        b.Baz();
    }
}

Clearly moving Bar into a new assembly would break the old client unless the user recompiled the code using both assemblies in his or her list of references. But we can split the original code into two DLLs and use a type forwarder in the original to reference our new assembly.

For example, we could create two source files. First is the new DLL:

using System.Windows.Forms;

public class Bar
{
    public void Baz()
    {
        MessageBox.Show("Howdy, partner");
    }
}

And then we can modify the code for the old DLL like this:

using System;
using System.Runtime.CompilerServices;

[assembly:TypeForwardedTo(typeof(Bar))]

public class Foo
{
    public void Baz()
    {
        // Some generally useful code...
    }
}

Notice that we've added an assembly-level TypeForwardedToAttribute, passing in the type to which we are forwarding. Of course, when we recompile the old DLL we must specify that it references the new one (e.g., with csc's /reference option). If you inspect the resulting metadata (e.g., with ildasm.exe), you'll notice the new forwarder information.

Binary compatibility is preserved. Old clients needn't recompile and may keep their references to the original assembly. Yet the runtime will redirect all requests for the forwarded type through the old assembly to the new assembly. New applications can deliberately reference the assemblies they need, for example if they'd like to avoid the penalty of pulling in both assemblies.
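To observe the redirection from code, one option is to resolve an assembly-qualified type name and compare the assembly that was requested with the one that actually supplied the type. ForwardingProbe and ActualAssemblyOf are hypothetical names; for the example above, resolving "Bar, TyFwdExample" would be expected to report the new DLL rather than TyFwdExample.

```csharp
using System;

public static class ForwardingProbe
{
    // Resolves an assembly-qualified type name and returns the simple name of
    // the assembly that actually defines the type (or null if it can't be
    // resolved). If the named assembly only holds a forwarder, the two differ.
    public static string ActualAssemblyOf(string assemblyQualifiedName)
    {
        Type t = Type.GetType(assemblyQualifiedName, false);
        return t == null ? null : t.Assembly.GetName().Name;
    }
}
```

On .NET Core and later the same effect is visible without building anything: System.Runtime forwards System.Object to System.Private.CoreLib, so ActualAssemblyOf("System.Object, System.Runtime") reports the latter.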