POCO as a Lifestyle

What I also just said between the lines is that I'd really like to try to keep the main asset of my applications as free from infrastructure-related distractions as possible. The Plain Old Java Object (POJO) and Plain Old CLR Object (POCO) movement started out in Java land as a reaction against J2EE and its huge implications on applications, such as how it increased complexity in everything and made TDD close to impossible. Martin Fowler, Rebecca Parsons, and Josh MacKenzie coined the term POJO for describing a class that was free from the "dumb" code that is only needed by the execution environment. The classes should focus on the business problem at hand. Nothing else should be in the classes in the Domain Model.

Note

This movement is one of the main inspirations for lightweight containers for Java, such as Spring [Johnson J2EE Development without EJB].


In .NET land it has taken a while for Plain Old... to receive any attention, but it is now known as POCO.

POCO is a somewhat established term, but it's not very specific regarding persistence-related infrastructure. When I discussed this with Martin Fowler he said that perhaps Persistence Ignorance (PI) is a better and clearer description. I agree, so I'll change to that from now on in this chapter.

PI for Our Entities and Value Objects

So let's assume we want to use PI. What's it all about? Well, PI means clean, ordinary classes where you focus on the business problem at hand without adding stuff for infrastructure-related reasons. OK, that didn't say all that much. It's easier if we take a look at what PI is not. First, a simple litmus test is to see if you have a reference to any external infrastructure-related DLLs in your Domain Model. For example, if you use NHibernate as your O/R Mapper and have a reference to nhibernate.dll, it's a good sign that you have added code to your Domain Model that isn't really core, but more of a distraction.

What are those distractions? For instance, if you use a PI-based approach for persistent objects, there's no requirement to do any of the following:

  • Inherit from a certain base class (besides object)

  • Only instantiate via a provided factory

  • Use specially provided datatypes, such as for collections

  • Implement a specific interface

  • Provide specific constructors

  • Provide mandatory specific fields

  • Avoid certain constructs

There is at least one more, and one that is so obvious that I forgot. You shouldn't have to write database code such as calls to stored procedures in your Domain Model classes. But that was so obvious that I didn't write specifically about it.

Let's take a closer look at each of the other points.

Inherit from a Certain Base Class

With frameworks, a very common requirement for supporting persistence is that they require you to inherit from a certain base class provided by the framework.

The following might not look too bad:

public class Customer : PersistentObjectBase
{
    public string Name = string.Empty;
    ...

    public decimal CalculateDepth()
    ...
}

Well, it wasn't too bad, but it did carry some semantics and mechanisms that aren't optimal for you. For example, you have used the only inheritance possibility you have for your Customer class because .NET only has single inheritance. It's certainly arguable whether this is a big problem or not, though, because you can often design "around" it.

It's a worse problem if you have developed a Domain Model and now you would like to make it persistent. The inheritance requirement might very well require some changes to your Domain Model. It's pretty much the same when you start developing a Domain Model with TDD. You have the restriction from the beginning that you can't use inheritance and have to save that for the persistence requirement.

Something you should look out for is if the inheritance brings lots of public functionality to the subclass, which might make the consumer of the subclass have to wade through methods that aren't interesting to him.

It's also the case that things are usually not as clean as in the previous example. Most of the time PersistentObjectBase forces you to implement some of its methods, as in the Template Method pattern [GoF Design Patterns]. OK, this is still not a disaster, but it all adds up.
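To make that a bit more concrete, here is a minimal sketch of what such a base class might demand of you. Note that every member of PersistentObjectBase shown here (Load, IsLoaded, GetTableName, FillFromRow) is hypothetical and invented for illustration; none of it is taken from a real framework.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical framework base class; every member name is an assumption.
public abstract class PersistentObjectBase
{
    public bool IsLoaded { get; private set; }

    // Template Method: the framework owns the algorithm...
    public void Load(IDictionary<string, object> row)
    {
        if (row == null) throw new ArgumentNullException("row");
        FillFromRow(row);
        IsLoaded = true;
    }

    public string DescribeMapping()
    {
        return GetType().Name + " -> " + GetTableName();
    }

    // ...and the subclass is forced to supply the steps.
    protected abstract string GetTableName();
    protected abstract void FillFromRow(IDictionary<string, object> row);
}

public class Customer : PersistentObjectBase
{
    public string Name = string.Empty;

    // Infrastructure-only code that says nothing about the business problem.
    protected override string GetTableName()
    {
        return "Customers";
    }

    protected override void FillFromRow(IDictionary<string, object> row)
    {
        Name = (string)row["Name"];
    }
}
```

Two of the three members in Customer exist only to please the base class, and that is the kind of distraction that adds up.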

Note

This doesn't necessarily have to be a requirement, but can be seen as a convenience enabling you to get most, if not all, of the interface implementation that is required by the framework if the framework is of that kind of style. We will discuss this common requirement in a later section.

This is how it was done in the Valhalla framework that Christoffer Skjoldborg and I developed. But to be honest, in that case there was so much work that was taken care of by the base class called EntityBase that implementing the interfaces with custom code instead of inheriting from EntityBase was really just a theoretical option.


Only Instantiate via a Provided Factory

Don't get me wrong, I'm not in any way against using factories. Nevertheless, I'm not ecstatic at being forced to use them when it's not my own sound decision. This means, for instance, that instead of writing code like this:

Customer c = new Customer();

I have to write code like this:

Customer c = (Customer)PersistentObjectFactory.CreateInstance
    (typeof(Customer));

Note

I know, you think I did my best to be unfair by using extremely long names, but this isn't really any better, is it?

Customer c = (Customer)POF.CI(typeof(Customer));


Again, it's not a disaster, but it's not optimal in most cases. This code just looks a lot weirder than the first instantiation code, doesn't it? And what often happens is that code like this increases testing complexity.

Often one of the reasons for the mandatory use of a provided factory is that you will consequently get help with dirty checking. So your Domain Model classes will get subclassed dynamically, and in the subclass, a dirty-flag (or several) is maintained in the properties. The factory makes this transparent to the consumer so that it instantiates the subclass instead of the class the factory consumer asks for. Unfortunately, for this to work you will also have to make your properties virtual, and public fields can't be used (two more small details that lessen the PI-ness a little). (Well, you can use public fields, but they can't be "overridden" in the generated subclass, and that's a problem if the purpose of the subclass is to take care of dirty tracking, for example.)
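A minimal sketch of the mechanism could look like the following. The subclass is written by hand here as a stand-in for what a framework would generate at runtime, and the names CustomerDirtyTrackingProxy and IsDirty are assumptions made for the example. The factory only mimics the call shape used above; it is not a real product's API.

```csharp
using System;

public class Customer
{
    private string _name = string.Empty;

    // Must be virtual so that the generated subclass can intercept the setter.
    public virtual string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}

// Hand-written stand-in for the subclass a framework would generate at runtime.
public class CustomerDirtyTrackingProxy : Customer
{
    public bool IsDirty { get; private set; }

    public override string Name
    {
        get { return base.Name; }
        set
        {
            // Only flag the instance when the value actually changes.
            if (base.Name != value)
            {
                base.Name = value;
                IsDirty = true;
            }
        }
    }
}

// Hypothetical factory; a real one would emit the subclass dynamically.
public static class PersistentObjectFactory
{
    public static object CreateInstance(Type type)
    {
        if (type == typeof(Customer))
            return new CustomerDirtyTrackingProxy();
        return Activator.CreateInstance(type);
    }
}
```

The consumer asks for a Customer and works with a Customer, but what it actually holds is the subclass. If Name had been a public field instead of a virtual property, the subclass would have had nothing to override.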

Note

There are several different techniques when using Aspect-Oriented Programming (AOP) in .NET, where the runtime subclassing that we just discussed is probably the most commonly used. I've always seen having to declare your members as virtual in order to be able to intercept (or advise) them as a drawback, but Roger Johansson pointed something out to me. Assume you want to make it impossible to override a member and thereby avoid the extra work and responsibility of supporting subclassing. Then that decision should affect both ordinary subclassing and subclassing that is used for reasons of AOP. And if you make the member virtual, you are prepared for having it redefined, again both by ordinary subclassing and AOP-ish subclassing.

It makes sense, doesn't it?


Another common problem solved this way is the need for Lazy Load, but I'd like to use that as an example for the next section.

Use "Specially" Provided Datatypes, Such as Collections

It's not uncommon to have to use special datatypes for the collections in your Domain Model classes: special as in "not those you would have used if you could have chosen freely."

The most common reason for this requirement is probably for supporting Lazy Load [Fowler PoEAA], or rather implicit Lazy Load, so that you don't have to write code on your own for making it happen. (Lazy Load means that data is fetched just in time from the database.)

But the specific datatypes could also bring you other functionality, such as special delete handling so that as soon as you delete an instance from a collection the instance will be registered with the Unit of Work [Fowler PoEAA] for deletion as well. (Unit of Work is used for keeping track of what actions should be taken against the database at the end of the current logical unit of work.)

Note

Did you notice that I said that the specific datatypes could bring you functionality? Yep, I don't want to sound overly negative about NPI (Not-PI).


You could get help with bi-directionality so that you don't have to code it on your own. This is yet another example of something an AOP solution can take care of for you.
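To illustrate the implicit Lazy Load case, here is a minimal sketch of what a specially provided collection type might look like. The name LazyList and the loader delegate are assumptions for the example, not the API of any particular framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical framework-provided collection; name and loader delegate are assumptions.
public class LazyList<T> : IEnumerable<T>
{
    private readonly Func<IList<T>> _loader;
    private IList<T> _items;   // stays null until the first access

    public LazyList(Func<IList<T>> loader)
    {
        if (loader == null) throw new ArgumentNullException("loader");
        _loader = loader;
    }

    public bool IsLoaded
    {
        get { return _items != null; }
    }

    // Every member goes through here, so the fetch happens just in time.
    private IList<T> Items
    {
        get
        {
            if (_items == null)
                _items = _loader();
            return _items;
        }
    }

    public int Count { get { return Items.Count; } }
    public T this[int index] { get { return Items[index]; } }
    public void Add(T item) { Items.Add(item); }

    public IEnumerator<T> GetEnumerator() { return Items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

A Domain Model class would then have to declare its collection as this special type rather than as an ordinary list, which is exactly the kind of requirement this section is about. On the other hand, the consumer gets Lazy Load without writing a single line for it.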

Implement a Specific Interface

Yet another very regular requirement on Domain Model classes for being persistable is that they implement one or more infrastructure-provided interfaces.

This is naturally a smaller problem if there is very little code you have to write in order to implement the interface(s) and a bigger problem if the opposite is true.

One example of interface-based functionality could be to make it possible to fill the instance with values from the database without hitting setters (which might have specific code that you don't want to execute during reconstitution).

Another common example is to provide interfaces for optimized access to the state in the instances.
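As a minimal sketch of the first example, assume a hypothetical infrastructure interface called IReconstitutable with a single SetState() method (both names are invented here). The Discount rule is also just an assumption, there to show a setter with code you wouldn't want to run during reconstitution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical infrastructure interface; name and signature are assumptions.
public interface IReconstitutable
{
    void SetState(IDictionary<string, object> values);
}

public class Order : IReconstitutable
{
    private decimal _discount;

    public decimal Discount
    {
        get { return _discount; }
        set
        {
            // Business rule meant for consumers, not for reconstitution.
            if (value < 0m || value > 0.5m)
                throw new ArgumentOutOfRangeException("value");
            _discount = value;
        }
    }

    // Explicit implementation keeps the method off the ordinary public surface
    // of Order. It writes straight to the field and bypasses the setter.
    void IReconstitutable.SetState(IDictionary<string, object> values)
    {
        _discount = (decimal)values["Discount"];
    }
}
```

The amount of code is small in this case, but it is still code in the Domain Model class that exists only because the infrastructure asked for it.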

Provide Specific Constructors

Yet another way of providing values that reconstitute instances from the database is by requiring specific constructors, which are constructors that have nothing at all to do with the business problem at hand.

It might also be that a default constructor is needed so that the framework can instantiate Domain Model classes easily as the result of a Get operation against the database. Again, it's not a very dramatic problem, but a distraction nonetheless.

Provide Mandatory Specific Fields

Some infrastructure solutions require your Domain Model classes to provide specific fields, such as Guid-based Id-fields or int-based Version-fields. (With Guid-based Id-fields, I mean that the Id-fields are using Guids as the datatype.) That simplifies the infrastructure, but it might make your life as a Domain Model-developer a bit harder. At least if it affects your classes in a way you didn't want to.

Avoid Certain Constructs/Forced Usage of Certain Constructs

I have already mentioned that you might be forced to use virtual properties even if you don't really want to. It might also be that you have to avoid certain constructs, and a typical example of this is read-only fields. Read-only fields (those declared with the keyword readonly) can't be set from the outside (except via constructors), and being able to set them from the outside is what the infrastructure needs if the Domain Model classes are to be 100% PI.

Using a private field together with a get-only property is pretty close to a read-only field, but not exactly the same. It could be argued that a read-only field is the most intention-revealing solution.
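The difference between the two is easy to see side by side. This is only a sketch; the class names and the Renumber() method are made up for the comparison.

```csharp
using System;

public class OrderWithReadonlyField
{
    // Can only be assigned in the declaration or in a constructor;
    // the compiler enforces that, and the keyword says so to the reader.
    public readonly int OrderNumber;

    public OrderWithReadonlyField(int orderNumber)
    {
        OrderNumber = orderNumber;
    }
}

public class OrderWithGetOnlyProperty
{
    private int _orderNumber;

    public OrderWithGetOnlyProperty(int orderNumber)
    {
        _orderNumber = orderNumber;
    }

    // Read-only as seen from the consumer...
    public int OrderNumber
    {
        get { return _orderNumber; }
    }

    // ...but nothing stops code inside the class from reassigning the field.
    // The same method would not compile in the readonly version.
    internal void Renumber(int newNumber)
    {
        _orderNumber = newNumber;
    }
}
```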

Note

Something that has been discussed a lot is whether .NET attributes are a good or bad thing regarding decorating the Domain Model with information about how to persist the Domain Model.

My opinion is that such attributes can be a good thing and that they don't really decrease the PI level if they are seen as default information that can be overridden. I think the main problem is if they get so verbose that they distract the reader of the code.


PI or not PI?

PI or not PI: of course, it's not totally binary. There are some gray areas as well, but for now let's be happy if we get a feeling for the intention of PI rather than how to get to 100%. Anything extreme incurs high costs. We'll get back to this in Chapter 9, "Putting NHibernate into Action," when we discuss an infrastructure solution.

Note

What is an example of something completely binary in real life? Oh, one that I often remind my wife about is when she says "that woman was very pregnant."


Something we haven't touched on yet is that it also depends on at what point in "time" we evaluate whether we use PI or not.

Runtime Versus Compile Time PI

So far I have talked about PI in a timeless context, but it's probably most important at compile time and not as important at runtime. "What does that mean?" I hear you say? Well, assume that code is created for you, infrastructure-related code that you never have to deal with or even see yourself. This solution is probably better than if you have to maintain similar code by hand.

This whole subject is charged with feelings because it's controversial to execute something other than what you wrote yourself. The debugging experience might turn into a nightmare!

Note

Mark Burhop commented as follows:

Hmmm... This was the original argument against C++ from C programmers in the early 90s. "C++ sticks in new code I didn't write." "C++ hides what is really going on." I don't know that this argument holds much water anymore.


It's also harder to inject code at the byte level for .NET classes compared to Java. It's not supported by the framework, so you're on your own, which makes it a showstopper in most cases.

What is most often done instead is to use some alternative techniques, such as those I mentioned with runtime subclassing in combination with a provided factory, but it's not a big difference compared to injected code. Let's summarize by calling it emitting code.

The Cost for PI Entities/Value Objects

I guess one possible reaction to all this is "PI seems great, so why not use it all the time?" It's a law of nature (or at least software) that when everything seems neat and clean and great and without fault, then come the drawbacks. In this case, I think one such is overhead.

I did mention earlier in this chapter that speed is something you will sacrifice for a high level of PI-ness, at least for runtime PI, because you are then directed to use reflection, which is quite expensive. (If you think compile-time PI is good enough, you don't need to use reflection, but can go for an AOP solution instead and you can get a better performance story.)

You can easily prove with some operation in a tight loop that reading from and writing to fields/properties with reflection is orders of magnitude slower than accessing them in the ordinary way. Yet, is the cost too high? It obviously depends on the situation. You'll have to run tests to see how it applies to your own case. Don't forget that a jump to the database is very expensive compared to much of what you're doing in your Domain Model, yet at the same time, you aren't comparing apples and apples here. For instance, the comparison might not be between an ordinary read and a reflection-based read.
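If you want to try the tight loop yourself, a minimal sketch could look like this. The class ReflectionCostDemo and its members are made up for the example; the reflection calls themselves (Type.GetField() and FieldInfo.GetValue()) and Stopwatch are standard .NET. No timings are shown here because they vary a lot between machines and runtime versions.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public class Customer
{
    private string _name = string.Empty;

    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}

public static class ReflectionCostDemo
{
    // Looked up once; the per-read cost measured below is GetValue() only.
    private static readonly FieldInfo NameField =
        typeof(Customer).GetField("_name", BindingFlags.Instance | BindingFlags.NonPublic);

    // The ordinary way.
    public static string ReadDirect(Customer c)
    {
        return c.Name;
    }

    // The way an infrastructure might read state without any help from the class.
    public static string ReadViaReflection(Customer c)
    {
        return (string)NameField.GetValue(c);
    }

    public static TimeSpan Time(Func<Customer, string> read, Customer c, int iterations)
    {
        long total = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            total += read(c).Length;   // use the result so the read isn't skipped
        sw.Stop();
        if (total < 0) throw new InvalidOperationException();
        return sw.Elapsed;
    }
}
```

Call Time() once with ReflectionCostDemo.ReadDirect and once with ReflectionCostDemo.ReadViaReflection, using the same instance and a large iteration count, and compare the results. Both paths pay for the delegate call, so the difference you see comes from the read itself.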

A Typical Example Regarding Speed

Let's take an example to give you a better understanding of the whole thing. One common operation in a persistence framework is deciding whether or not an instance should be stored to the database at the end of a scenario. A common solution to this is to let the instance be responsible for signaling IsDirty if it is to be stored. Or better still, the instance could also signal itself to the Unit of Work when it gets dirty so that the Unit of Work will remember that when it's time to store changes.

But (you know there had to be a "but," right?) that requires some abuse of PI, unless you have paid with AOP.

Note

There are other drawbacks with this solution, such as it won't notice the change if it's done via reflection and therefore the instance changes won't get stored. This drawback was a bit twisted, though.


An alternative solution is not to signal anything at all, but let the infrastructure remember how the instances looked when fetched from the database. Then at store time compare how the instances look now to how they looked when read from the database.

Do you see that it's not just a comparison of one ordinary read to one reflection-based read, but they are totally different approaches, with totally different performance characteristics? To get a real feeling for it, you can set up a comparison yourself. Fetch one million instances from the database, modify one instance, and then measure the time difference for the store operation in both cases. I know, it was another twisted situation, but still something to think about.
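To show why the snapshot approach has such different characteristics, here is a minimal sketch of it. SnapshotTracker and its methods are assumptions made for the example, and only a single field is compared in order to keep it short.

```csharp
using System;
using System.Collections.Generic;

// A PI-style class: no IsDirty flag and no signaling to anybody.
public class Customer
{
    public int Id;
    public string Name = string.Empty;
}

// Hypothetical infrastructure helper; the names are assumptions.
public class SnapshotTracker
{
    private readonly List<Customer> _tracked = new List<Customer>();

    // Id -> Name as it looked when the instance was fetched.
    private readonly Dictionary<int, string> _originalNames =
        new Dictionary<int, string>();

    public void RegisterFetched(Customer c)
    {
        _tracked.Add(c);
        _originalNames[c.Id] = c.Name;
    }

    // Walks every fetched instance, so the cost follows the number of
    // instances fetched rather than the number of instances changed.
    public IList<Customer> FindChanged()
    {
        List<Customer> changed = new List<Customer>();
        foreach (Customer c in _tracked)
        {
            if (_originalNames[c.Id] != c.Name)
                changed.Add(c);
        }
        return changed;
    }
}
```

With a dirty flag that signals the Unit of Work, the store operation would already know about the one changed instance. Here it has to look at all of them to find it.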

Other Examples

That was something about the speed cost, but that's not all there is to it. Another cost I pointed out before was that you might get less functionality automatically if you try hard to use a high level of PI. I've already gone through many possible features you could get for free if you abandon some PI-ness, such as automatic bi-directional support and automatic implicit Lazy Load.

It's also the case that the dirty tracking isn't just about performance. The consumer might be very interested as well in using that information when painting the forms, for example, to know what buttons to enable.

So as usual, there's a tradeoff. In the case of PI versus non-PI, the tradeoff is overhead and less functionality versus distracting code in the core of your application that couples you to a certain infrastructure and also makes it harder to do TDD. There are pros and cons. That's reasonable, isn't it?

The Cost Conclusion

So the conclusion to all this is to be aware of the tradeoffs and choose carefully. For instance, if you get something you need alongside a drawback you can live with, don't be too religious about it!

That said, I'm currently in the pro-PI camp, mostly because of how nice it is for TDD and how clean and clear I can get my Entities and Value Objects.

I also think there's a huge difference when it comes to your preferred approach. If you like starting from code, you'll probably like PI a great deal. If you work in an integrated tool where you start with detailed design in UML, for example, and from there generate your Domain Model, PI is probably not that important for you at all.

But there's more to the Domain Model than Entities and Value Objects. What I'm thinking about are the Repositories. Strangely enough, very little has been said as far as PI for the Repositories goes.

PI for Our Repositories

I admit it: saying you use PI for Repositories as well is pushing it. This is because the purpose of Repositories is pretty much to give the consumer the illusion that the complete set of Domain Model instances is around, as long as you adhere to the protocol to go to the Repository to get the instances. The illusion is achieved by the Repositories talking to infrastructure in specific situations, and talking to infrastructure is not a very PI-ish thing to do.

For example, the Repositories need something to pull in order to get the infrastructure to work. This means that the assembly with the Repositories needs a reference to an infrastructure DLL. And this in its turn means that you have to choose between whether you want the Repositories in a separate DLL, separate from the Domain Model, or whether you want the Domain Model to reference an infrastructure DLL (but we will discuss a solution soon that will give you flexibility regarding this).

Problems Testing Repositories

It's also the case that when you want to test your Repositories, they are connected to the O/R Mapper and the database.

Note

Let's for the moment assume that we will use an O/R Mapper. We'll get back to a more thorough discussion about different options within a few chapters.


Suddenly this provides you with a pretty tough testing experience compared to when you test the Entities and Value Objects in isolation.

Of course, what you could do is mock your O/R Mapper. I haven't done that myself, but it feels a bit bad on the "bang for the bucks" rating. It's probably quite a lot of work compared to the return.

Problems Doing Small Scale Integration Testing

In previous chapters I haven't really shown any test code that focused on the Repositories at all. Most of the interesting tests should use the Domain Model. If not, it might be a sign that your Domain Model isn't as rich as it should be if you are going to get the most out of it.

That said, I did use Repositories in some tests, but really more as small integration tests to see that the cooperation between the consumer, the Entities, and the Repositories worked out as planned. As a matter of fact, that's one of the advantages Repositories have compared to other approaches for giving persistence capabilities to Domain Models, because it was easy to write Fake versions of the Repositories. The problem was that I wrote quite a lot of dumb code that has to be tossed away later on, or at least rewritten in another assembly where the Repositories aren't just Fake versions.

What also happened was that the semantics I got from the Fake versions weren't really "correct." For instance, don't you think the following seems strange?

[Test]
public void FakeRepositoryHaveIncorrectSemantics()
{
    OrderRepository r1 = new OrderRepository();
    OrderRepository r2 = new OrderRepository();

    Order o = new Order();

    r1.Add(o);
    //x is a placeholder for whatever performs the save;
    //save scenarios are discussed shortly.
    x.PersistAll();

    //This is fine:
    Assert.IsNotNull(r1.GetOrder(o.Id));
    //This is unexpected I think:
    Assert.IsNull(r2.GetOrder(o.Id));
}

Note

As the hawk-eyed reader saw, I decided to change AddOrder() to Add() since the last chapter.


I'm getting a bit ahead of myself in the previous code because we are going to discuss save scenarios shortly. Anyway, what I wanted to show was that the Fake versions of Repositories used so far don't work as expected. Even though I thought I had made all changes so far persistent with PersistAll(), only the first Repository instance could find the order, not the second Repository instance. You might wonder why I would like to write code like that, and it's a good question, but it's a pretty big misbehavior in my opinion.
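The reason for the misbehavior is easy to see in a minimal sketch of such a Fake. The int-based Id is an assumption for the example, and the save step is left out because it plays no part in the problem.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    public int Id;   // assumed to be an int for this sketch
}

// A Fake along the lines used so far: each instance owns its own list.
public class OrderRepository
{
    private readonly List<Order> _theOrders = new List<Order>();

    public void Add(Order o)
    {
        _theOrders.Add(o);
    }

    public Order GetOrder(int id)
    {
        foreach (Order o in _theOrders)
        {
            if (o.Id == id)
                return o;
        }
        return null;
    }
}
```

Because the "database" is a private list per Repository instance, nothing that is added via one instance can ever be found via another. A real database behind the Repositories would of course be shared.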

What we could do instead is mock each of the Repositories, to test out the cooperation with the Entities, Repositories, and consumer. This is pretty cheaply done, and it's also a good way of testing out the consumer and the Entities. However, the test value for the Repositories themselves isn't big, of course. We are kind of back to square one again, because what we want then is to mock out one step further, the O/R Mapper (if that's what is used for dealing with persistence), and we have already talked about that.

Earlier Approach

So it's good to have Repositories in the first place, especially when it comes to testability. Therefore I used to swallow the bitter pill and deal with this problem by creating an interface for each Repository and then creating two implementing classes, one for Fake and one for real infrastructure. It could look like this. First, an interface in the Domain Model assembly:

public interface ICustomerRepository
{
    Customer GetById(int id);
    IList GetByNamePattern(string namePattern);
    void Add(Customer c);
}

Then two classes (for example, FakeCustomerRepository and MyInfrastructureCustomerRepository) will be located in two different assemblies (but all in one namespace, that of the Domain Model, unless of course there are several partitions of the Domain Model). See Figure.

Two Repository assemblies


That means that the Domain Model itself won't be affected by the chosen infrastructure when it comes to the Repositories, which is nice if it doesn't cost anything.

But it does cost. It also means that I have to write two Repositories for each Aggregate root, and with totally different Repository code in each case.

Furthermore, it means that the production version of the Repositories lives in another assembly (and so do the Fake Repositories), even though I think Repositories are part of the Domain Model itself. "Two extra assemblies," you say, "That's no big deal." But for a large application where the Domain Model is partitioned into several different assemblies, you'll learn that typically it doesn't mean two extra assemblies for the Repositories, but rather the number of Domain Model assemblies multiplied by three. That is because each Domain Model assembly will have its own Repository assemblies.

Even though I think it's a negative aspect, it's not nearly as bad as my having the silly code in the Fake versions of the Repositories. That feels just bad.

A Better Solution?

The solution I decided to try out was creating an abstraction layer that I call NWorkspace [Nilsson NWorkspace]. It's a set of adapter interfaces, which I have written implementations for in the form of a Fake. The Fake is just two levels of hashtables, one set of hashtables for the persistent Entities (simulating a database) and one set of hashtables for the Unit of Work and the Identity Map. (The Identity Map keeps track of what identities, typically primary keys, are currently loaded.)

The other implementation I have written is for a specific O/R Mapper.

Note

When I use the name NWorkspace from now on, you should think about it as a "persistence abstraction layer." NWorkspace is just an example and not important in itself.


Thanks to that abstraction layer, I can move the Repositories back to the Domain Model, and I only need one Repository implementation per Aggregate root. The same Repository can work both against an O/R Mapper and against a Fake that won't persist to a database but only holds in-memory hashtables of the instances, with semantics similar to those in the O/R Mapper case. See Figure.

A single set of Repositories thanks to an abstraction layer


The Fake can also be serialized to/deserialized from files, which is great for creating very competent, realistic, and at the same time extremely refactoring-friendly early versions of your applications.

Another possibility that suddenly feels like it could be achieved easily (for a small abstraction layer API at least) could be to Mock the infrastructure instead of each of the Repositories. As a matter of fact, it won't be a matter of Mocking one infrastructure product, but all infrastructure products that at one time will have adapter implementations for the abstraction layer (if that happens, that is; it's probably not that likely that there will be other implementations than those two I wrote). So more to the point, what is then being Mocked is the abstraction layer.

It's still a stretch to talk about PI Repositories, but with this solution I can avoid a reference to the infrastructure in the Domain Model. That said, in real-world applications I have kept the Repositories in a separate assembly anyway. I think it clarifies the coupling, and it also makes some hacks easier to achieve, such as letting some Repository methods use raw SQL where that proves necessary (by using connection strings as markers for whether optimized code should be used or not).

However, instead of referring to the Persistence Framework, I have to refer to the NWorkspace DLL with the adapter interfaces, but that seems to be a big step in the right direction. It's also the case that there are little or no distractions in the Repositories; they are pretty "direct" (that is, if you find the NWorkspace API in any way decent).

So instead of writing a set of Repositories with code against an infrastructure vendor's API and another set of Repositories with dummy code, you write one set of Repositories against a (naïve) attempt at a standard API.

Note

I'm sorry for nagging, but I must say it again: It's the concept I'm after! My own implementation isn't important at all.


Let's find another term for describing those Repositories instead of calling them PI Repositories. What about single-set Repositories? OK, we have a term for now for describing when we build a single set of Repositories that can be used both in Fake scenarios and in scenarios with a database. What's probably more interesting than naming those Repositories is seeing them in action.

Some Code in a Single-Set Repository

To remind you what the code in a Fake version of a Repository could look like, here's a method from Chapter 5:

//OrderRepository, a Fake version
public Order GetOrder(int orderNumber)
{
    foreach (Order o in _theOrders)
    {
        if (o.OrderNumber == orderNumber)
            return o;
    }
    return null;
}

OK, that's not especially complex, but rather silly, code.

If we assume that the OrderNumber is an Identity Field [Fowler PoEAA] (Identity Field means a field that binds the row in the database to the instance in the Domain Model) of the Order, the code could look like this when we use the abstraction layer (_ws in the following code is an instance of IWorkspace, which in its turn is the main interface of the abstraction layer):

//OrderRepository, a single-set version
public Order GetOrder(int orderNumber)
{
    return (Order)_ws.GetById(typeof(Order), orderNumber);
}

Pretty simple and direct I think. And, again, that method is done now, both for Fake and for when real infrastructure is used!
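To give a feeling for what could sit behind _ws in the Fake case, here is a minimal sketch with two sets of two-level hashtables. Please note that only GetById() and PersistAll() are named in the text. MakePersistent(), the class FakeWorkspace, and the choice to let GetById() look among the pending instances first are assumptions made for this sketch and not the actual NWorkspace API.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the abstraction; MakePersistent is hypothetical.
public interface IWorkspace
{
    object GetById(Type type, object id);
    void MakePersistent(object instance, object id);
    void PersistAll();
}

public class FakeWorkspace : IWorkspace
{
    // Level one is keyed by type, level two by identity.
    // This set simulates the database...
    private readonly Dictionary<Type, Dictionary<object, object>> _stored =
        new Dictionary<Type, Dictionary<object, object>>();

    // ...and this set plays the role of the Unit of Work.
    private readonly Dictionary<Type, Dictionary<object, object>> _pending =
        new Dictionary<Type, Dictionary<object, object>>();

    private static Dictionary<object, object> TableFor(
        Dictionary<Type, Dictionary<object, object>> tables, Type type)
    {
        Dictionary<object, object> table;
        if (!tables.TryGetValue(type, out table))
        {
            table = new Dictionary<object, object>();
            tables[type] = table;
        }
        return table;
    }

    public void MakePersistent(object instance, object id)
    {
        TableFor(_pending, instance.GetType())[id] = instance;
    }

    public void PersistAll()
    {
        // Move everything registered so far into the simulated database.
        foreach (KeyValuePair<Type, Dictionary<object, object>> table in _pending)
            foreach (KeyValuePair<object, object> row in table.Value)
                TableFor(_stored, table.Key)[row.Key] = row.Value;
        _pending.Clear();
    }

    public object GetById(Type type, object id)
    {
        object found;
        if (TableFor(_pending, type).TryGetValue(id, out found))
            return found;
        TableFor(_stored, type).TryGetValue(id, out found);
        return found;   // null when the identity is unknown
    }
}

public class Order
{
    public int OrderNumber;
}

//OrderRepository, a single-set version
public class OrderRepository
{
    private readonly IWorkspace _ws;

    public OrderRepository(IWorkspace ws)
    {
        _ws = ws;
    }

    public void Add(Order o)
    {
        _ws.MakePersistent(o, o.OrderNumber);
    }

    public Order GetOrder(int orderNumber)
    {
        return (Order)_ws.GetById(typeof(Order), orderNumber);
    }
}
```

Two Repository instances that are handed the same workspace now find the same Order, because the state lives in the workspace and not in the Repositories. Swapping the Fake for an adapter on top of an O/R Mapper would not touch OrderRepository at all.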

The Cost for Single-Set Repositories

So I have yet another abstraction. Phew, there's getting to be quite a lot of them, don't you think? On the other hand, I believe each of them adds value.

Still, there's a cost, of course. The most obvious cost for the added abstraction layer is probably the translation at runtime that has to be done for the O/R Mapper you're using. In theory, the O/R Mapper could have a native implementation of the abstraction layer, but for that to happen such an abstraction layer must first be created and become really popular.

Then there's a cost for building the abstraction layer and the adapter for your specific O/R Mapper. That's the typical framework-related problem. It costs a lot for building the framework, but it can be used many times, if the framework ever becomes useful.

With some luck, there will be an adapter implementation for the infrastructure you are using and then the cost isn't yours, at least not the framework-building cost. There's more, though. You have to learn not only the infrastructure of your choice, but also the abstraction layer, and that can't be neglected.

Note

It was easier in the past as you only had to know a little about Cobol and files. Now you have to be an expert on C# or Java, Relational Databases, SQL, O/R Mappers, and so on, and so forth. If someone tries to make the whole thing simpler by adding yet another layer, that will tip the scales, especially for newcomers.


Yet another cost is, of course, that the abstraction layer will be kind of the least common denominator. You won't find all the power there that you can find in your infrastructure of choice. Sure, you can always bypass the abstraction layer, but that comes with a cost of complexity and external Repository code, and so on. So it's important to investigate whether your needs could be fulfilled with the abstraction layer to 30%, 60%, or 90%. If it's not a high percentage, it's questionable whether it's interesting at all.

OK, let's return to the consumer for a while and focus on save functionality for a change.


