Wednesday 29 December 2010

News time

News time, children!

My blogging activity has been low recently, but there are two pieces of news worth mentioning.

First, the repository move. I've been trying Mercurial for some time. It's time to admit I like it. If you are interested in following the code I write, from now on you'll have to look at my Mercurial repository at Bitbucket.

I know that Google Code has Mercurial support, but I simply preferred to go to another site. One advantage of using a distributed version control system is that I can move hosts any time I want, without losing anything. Anyway, my experience with Bitbucket has been very good so far.

Second and last, my blog URL has changed. You might already have noticed, but you're now at http://jplabs.bochi.it/.

Thanks for listening! And now, some music.

Friday 15 October 2010

A Tale of Singletons and Global State

Some weeks ago, I received an odd piece of feedback. I was talking to this guy and he said, "A fellow of mine read in your blog that you are in favor of singletons. Is that true?" I remembered that the post was ultimately about Design Patterns. I used some extrapolations of the Singleton Pattern to get to Dependency Injection. I never advocated singletons; I just used them as a very simple and well-known pattern to get to something bigger. I was so astonished by the question that I couldn't answer it. At the time, I was not able to state my real opinion. Today, I'll try to express myself more clearly.

According to Wikipedia, the singleton pattern is a design pattern used to restrict the instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system. In the usual implementation, there's a private constructor and a static property or method that always returns the same instance. There are a few big problems with such a class. One, it violates the single responsibility principle: a class should not be responsible for its own lifetime. Two, the singleton instance is attached to the global state.
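
For reference, the classic implementation looks roughly like the sketch below (the class name is made up for illustration):

public sealed class Configuration
{
    // the private constructor prevents anyone else from creating instances
    private Configuration() { }

    private static readonly Configuration instance = new Configuration();

    // the static property always hands out the same, process-wide instance
    public static Configuration Instance
    {
        get { return instance; }
    }
}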

It happens that both these problems can be eliminated and we still have a singleton. The creation of the singleton could be delegated to a factory. And the factory doesn't have to be attached to the global state. That's what differentiates a good singleton (one instance per factory) from a bad singleton (one instance per AppDomain or JVM).
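
To make that distinction concrete, here is a rough sketch of the "good" variant, with hypothetical type names. Nothing is static, so there is no global state; the instance lives and dies with its factory:

public interface IService { }

public class DefaultService : IService { }

// The factory controls the lifetime: one instance per factory,
// not one instance per AppDomain.
public class ServiceFactory
{
    private IService instance;

    public IService GetService()
    {
        if (instance == null) instance = new DefaultService();
        return instance;
    }
}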

I have seen my share of bad singletons in projects I've worked on. They caused me pain. Many times, I was forced to convert some of them into something nicer. I know how hard they are to test, and how they tend to create hidden dependencies. I don't like the bad singleton. Some people even say they are evil. Nevertheless, they are ubiquitous. Just for the sake of the example, let's consider the .NET framework. Here is a list of singletons (and static functions that access some form of global state) built into the framework:
System.AppDomain.CurrentDomain;
System.DateTime.Now;
System.Environment.*;
System.Globalization.CultureInfo.CurrentCulture;
System.IO.File.Open(...);
System.Random.Random(); //uses System.Environment.TickCount
System.Threading.SynchronizationContext.Current;
System.Threading.Thread.CurrentThread;
System.Threading.Tasks.Task.Factory;
System.Transactions.Transaction.Current;
System.Web.HttpContext.Current;
System.Windows.Application.Current;
System.Windows.DependencyProperty.Register(...);
System.Windows.EventManager.*;
System.Windows.Forms.Application.*;
System.Windows.Forms.MessageBox.Show(...);

Of course, .NET is not an exception. All the main platforms we use today have quite a few examples like those. Do you know any singleton-free programming platform? The only one I've heard of is Newspeak. Why aren't there more languages like that? Why do we use programming platforms so packed with vicious singletons and APIs that access global state? I don't have a complete answer to these questions, but one thing is certain. There must be a good reason for them to be there. The cost of removing all singletons probably doesn't pay off. In fact, I think it's close to being impossible.

Let's consider the computer's clock. In .NET, we can read the current time through the DateTime.Now property. Although it's not a classic singleton, it's completely equivalent to one. It could look like this: Clock.Instance.GetCurrentDateTime(). The real problem here is that we are accessing global state. Every time we read the value, it changes. Now, is that such a bad thing? Of course not. A clock is supposed to move without being told to. A stopped clock is useless. Would you blame the framework designers for coding a singleton in such a special case? I wouldn't. There's usually only one physical clock available in a machine anyway.

What about a logical clock? I needed one once. The application I came to work on had a rule engine. Many of the rules depended on the current date and time. For example, the customer would be eligible for a fee waiver only if his/her last payment had been due for less than a couple of days. No surprise, the team found out that rules like that were the hardest to test. The only way to fake the system time was to change the computer's clock. However, that affected the application in other unforeseen ways, so it was not a viable solution. Not being able to test all the rules, the original team did the best they could with the time that was given to them. They deployed the application without testing everything.

Nonetheless, the situation can always get worse. Another problem appeared. We realized that many users were working in a different time zone and that the application was using the machine's local time. It was supposed to use US Central Standard Time. Compensating the local time zone wouldn't fix it because some users had an incorrect time zone set. And the users could always fiddle with the local clock. In the end, we simply couldn't trust the system's time.

The solution? I created a logical clock that gets synced with a trusted server every now and then, completely independent of the local time. I just provided the rule engine with a proper clock, and the time zone bug was fixed. Since there's only one synced clock per rule engine (and one rule engine per application), the clock is a singleton. A good one, though. Good enough to have one additional benefit: we could now test the time-dependent rules easily. All we need to do is mock the clock.
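
The shape of that abstraction was roughly the sketch below (the names here are illustrative, not the original code):

public interface IClock
{
    DateTime UtcNow { get; }
}

// Production implementation; the real one was periodically synced with
// a trusted server instead of trusting the local machine's clock.
public class SyncedClock : IClock
{
    public DateTime UtcNow
    {
        get { return DateTime.UtcNow; }
    }
}

// Test implementation: the rules see whatever time the test dictates.
public class FakeClock : IClock
{
    public DateTime UtcNow { get; set; }
}

The rule engine receives an IClock when it's constructed, so a test can hand it a FakeClock set to whatever date the rule under test needs.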

So, what's my take on singletons? They are a necessary evil that has to be understood and handled with care. If you really need to create one, make sure it's a good one.

Thursday 9 September 2010

Tweaked Events at CodeProject

Two days ago, I published my first article at CodeProject: Tweaked Events (ex-"Custom Events"). Go check it out! :)

Wednesday 19 May 2010

On Design Patterns and Dependency Injection

Since I first heard of design patterns, there has been one aspect of them that puzzled me. They seem to be universally considered a good thing in themselves. There seems to be a common belief that the lack of patterns in a program is a bad sign. Or, conversely, that the ubiquitous presence of patterns in a codebase is a good sign.

I never agreed with anything like that. Ideas like that go against my instincts. More specifically, the instinct to avoid recurrent code patterns (in the dictionary sense of the word). They are bad for maintenance, to mention one thing. Well, the so-called design patterns are exactly that: recurrent code patterns. So, why are they so popular? The reason is that their basic idea is mostly misinterpreted.

A while back, I read a very enlightening text written by Mark Dominus entitled Design patterns of 1972. He starts by saying:

"Patterns" that are used recurringly in one language may be invisible or trivial in a different language.

And concludes affirming:

Patterns are signs of weakness in programming languages.

I agree with him. And so did most of the people who read and commented on the article. Ralph Johnson (one of the authors of the “Design Patterns” book) wrote a reply. And Mark wrote a reply to Ralph’s reply. I think there are some ego sparks here and there, but they generally agree with each other. If you are surprised by their agreement, you should definitely read their texts.

Creational Patterns and Dependency Injection

Despite their value, I will not simply quote other people’s opinions. I have a point to make, too. I’ve been reviewing the whole subject recently, and an idea came to my mind. It’s not very innovative, but it’s worth phrasing anyway.

All creational patterns (namely Abstract Factory, Builder, Factory Method, Prototype, and Singleton) are no more than specific applications of dependency injection.

I’ll try to build my case around the singleton pattern. It’s possibly the simplest pattern, and the easiest one to understand. To begin with, I do not want to paint it as a bad pattern. For many simple situations, it fits the problem perfectly. I see it as an extremely simplified solution to a possibly more complex problem. It has drawbacks, though. Some cases require something similar, but just a little bit more sophisticated. I’ll go over some possible modifications to the singleton pattern through the rest of this article.

Before moving on, a comment. I’ll use some C# in my examples below for a single reason: it’s the language I know best. I’m sure the general idea is valid for any language.

- Global State and Object Life Cycle

Everyone knows that global state is evil. The strongest argument against singletons is that they introduce global state into an application. Depending on global state has consequences.

Suppose you have a singleton being used in a multi-threaded application. A reasonable assumption is that the singleton object has to be thread-safe, i.e., it has to be able to be accessed by different threads without misbehaving. That’s not strictly necessary, though.

One alternative is having a not-so-singleton object with one instance per thread instead of one instance per process. This can be easily achieved with thread-local storage, which is supported by most programming languages. Even though these thread-local singletons are not suited to every problem, they are the easiest and safest way to make a singleton thread-safe.
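
In .NET 4, for example, System.Threading.ThreadLocal<T> makes this almost trivial. A minimal sketch, with a made-up class name:

public sealed class PerThreadService
{
    private PerThreadService() { }

    private static readonly ThreadLocal<PerThreadService> instance =
        new ThreadLocal<PerThreadService>(() => new PerThreadService());

    // Each thread that reads this property gets its own instance,
    // created lazily on the first access from that thread.
    public static PerThreadService Instance
    {
        get { return instance.Value; }
    }
}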

We can go deeper into the idea of multi-instance singletons (by the way, the term “singleton” might not be appropriate at this point, but that’s just a matter of naming). In many cases, what you really need is one instance per “context”, or “call”, or whatever. A nice .NET example is the System.Web.HttpContext class. It has a static property called Current that returns an HttpContext. This might not be commonly seen as a use of the singleton pattern, but it’s a very close variation. The way I see it, it’s basically a singleton that has one instance per HTTP request.

Controlling the life cycle of “singleton” instances in these complex situations can be too complicated, and too specific to still be considered a pattern. That’s not actually a problem. I’ll get back to the subject later.

- Abstraction and Decoupling

Now, I want to expose a different facet of the pattern under discussion. Usually (at least in my code), the exposed type of the singleton instance is more abstract than its actual concrete type. Something like the C# code below.

public static readonly IAbstractThing Instance = new ConcreteThing();

You may reasonably argue that this is not a good use of OO. If IAbstractThing were destined to have only one concrete implementation, then you wouldn’t even need the interface to start with. But, in real-world applications, that’s commonly not the case.

Exposing an abstract type opens up some possibilities. You could, for instance, delegate the responsibility of creating the instance to another component. By doing that, you achieve more than one advantage. You are now close to being able to create mock objects for unit testing. A secondary (or should I say primary?) advantage is better decoupling. The component that exposes the instance does not depend on any concrete implementation anymore, i.e., there will be one less dependency around.

A group of components that works this way (one component exposes an abstract object, and another creates the object) has dependency injection in it, even if no DI framework was used.
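
A bare-bones sketch of that arrangement, with hypothetical types and no framework involved:

public class Consumer
{
    private readonly IAbstractThing thing;

    // The concrete implementation is injected from the outside;
    // this class only knows about the abstraction.
    public Consumer(IAbstractThing thing)
    {
        this.thing = thing;
    }
}

public static class CompositionRoot
{
    // A separate component, near the application's entry point,
    // decides which concrete type to create and wires it in.
    public static Consumer Build()
    {
        return new Consumer(new ConcreteThing());
    }
}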

- Dependency Injection

Of course, the more sophisticated you get, the farther you get from the original pattern. All the intermediary variations are valid solutions to real problems. At each point, the decision of whether or not to improve the code is a matter of balancing the costs of development (including testing, support, etc.) against the actual advantages it will bring to the application.

The usual shortcut through such an evolutionary process is a framework. And there are a lot of stable and dependable DI frameworks free for anyone to use. They can provide much more powerful solutions than any of the creational design patterns, and at a lower cost. As I mentioned earlier, life cycle management might get very complex. Not with a good DI framework: most of them support a broad range of extensible life cycle styles. By the way, my current favorite is Ninject. If you are a .NET programmer, I recommend that you give it a try.
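
Just to give a flavor of it, registering the IAbstractThing from before with Ninject looks roughly like this (a sketch; check the docs of the version you use for the exact API):

public static class Bootstrapper
{
    public static IAbstractThing Resolve()
    {
        var kernel = new StandardKernel();

        // One instance per kernel: essentially the "good singleton" idea.
        kernel.Bind<IAbstractThing>().To<ConcreteThing>().InSingletonScope();

        // Other life cycles are one call away, e.g. InThreadScope() or
        // InRequestScope() for per-thread / per-web-request instances.
        return kernel.Get<IAbstractThing>();
    }
}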

Final Remarks

If you are building a non-trivial application and find yourself using a big mix of abstract factories, builders, prototypes, or whatever, be sure of one thing: you are in need of a good DI framework. Any language that has built-in DI, has a built-in DI library, or enables such a library to be built does not need to depend on “creational patterns”.

A similar statement can be made about other types of design patterns. Consider the strategy pattern, for example. It’s trivially implemented using first-class functions (C#’s anonymous methods). The observer pattern can be effortlessly implemented with C# events. The iterator pattern can be gracefully implemented with the IEnumerable interface and some help from the foreach keyword. Using the yield return keyword, it’s possible to represent complex object interactions elegantly with only a few lines of code.
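
To illustrate the first one: a strategy reduced to a delegate parameter, with no IStrategy interface or concrete strategy classes (a hypothetical example):

public static class Pricing
{
    // The "strategy" is just a first-class function passed in by the caller.
    public static decimal ApplyDiscount(decimal price, Func<decimal, decimal> discountStrategy)
    {
        return discountStrategy(price);
    }
}

// Usage: the concrete strategies are plain lambdas.
// var tenPercentOff = Pricing.ApplyDiscount(100m, p => p * 0.9m);
// var noDiscount    = Pricing.ApplyDiscount(100m, p => p);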

Please, don’t get me wrong. Design patterns are good in many ways. For one thing, they provide a common vocabulary to programmers going through similar problems. What I’m saying is that they are generally misunderstood and overrated. Their ultimate purpose is to become invisible. Do not think design patterns are timeless. They aren’t.

Monday 15 March 2010

Lock-free Synchronization in C# 4 Events

If you're keeping yourself up to date with .NET Framework 4.0 and C# 4, you might have read the "Events get a little overhaul in C# 4" series on Chris Burrows' blog. He explains some changes to the intrinsic implementation of events introduced in C# 4, which is in RC now. One important detail is that there are breaking changes included, so you'd better get familiar with them. If you haven't read his latest posts, please do it now.

(intermission)

Welcome back. If, like me, you are fond of details, you might've gotten curious about that "lock-free synchronization code" used to add (and remove) event handlers in C# 4. That's exactly what I intend to show here.

The first step is simple. With a little help from Reflector, I disassembled the code generated for a simple event. After some variable renaming, here's what I got:

private EventHandler dlgMyEvent;

public event EventHandler MyEvent
{
    add
    {
        EventHandler valueBeforeCombine;
        EventHandler valueBeforeAttribution = this.dlgMyEvent;
        do
        {
            valueBeforeCombine = valueBeforeAttribution;
            EventHandler newValue = (EventHandler) Delegate.Combine(valueBeforeCombine, value);
            valueBeforeAttribution = Interlocked.CompareExchange<EventHandler>(ref this.dlgMyEvent, newValue, valueBeforeCombine);
        }
        while (valueBeforeAttribution != valueBeforeCombine);
    }
    remove
    {
        // code omitted (it's too similar to 'add')
    }
}

This code isn't obvious (at least to me). It took me quite some time to figure out what was happening. Of course, variable names like handler2 and handler3 don't help much. I spared you the trouble and renamed them.

Looking carefully, the secret ingredient is Interlocked.CompareExchange. Since most people are not familiar with this function (I wasn't), let me explain it. An equivalent implementation would look like the code below. There's a very important difference, though: the real one runs as a single atomic operation.
public static T CompareExchange<T>(ref T location, T value, T comparand)
    where T : class // the real method constrains T to reference types
{
    // the real method does all of the following atomically
    var previousValue = location;
    if (location == comparand) location = value; // reference comparison
    return previousValue;
}

Now, we have everything needed to comprehend the idea behind the new event implementation. To make it crystal clear, let me spell it out in plain language. It goes like this:
  1. Copy this.dlgMyEvent into valueBeforeCombine;
  2. Create a new delegate called newValue by combining valueBeforeCombine with the supplied value;
  3. Atomically, verify that dlgMyEvent is equal to valueBeforeCombine, and overwrite dlgMyEvent with newValue;
  4. If the overwrite didn't happen (i.e., dlgMyEvent changed sometime between step 1 and 3), go back to step 1 and try everything again;
When I finally visualized this pattern, it felt like an epiphany. As a computer scientist, I'm ashamed of not having known it before. Now I know it's called compare-and-swap. Obviously, it can be used in other scenarios, but all this is very low-level coding. So, kids, don't try this at home. :)
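
Just to show the same loop outside of events, here is a sketch of a lock-free "keep the maximum" update built on Interlocked.CompareExchange (a made-up example, not something from the compiler output):

public class Statistics
{
    private int max;

    // The same compare-and-swap loop: read, compute a new value, and only
    // publish it if nobody changed the field in the meantime; otherwise retry.
    public void Report(int value)
    {
        int seen, current = max;
        do
        {
            seen = current;
            if (value <= seen) return; // the stored maximum is already large enough
            current = Interlocked.CompareExchange(ref max, value, seen);
        }
        while (current != seen);
    }
}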

One final comment. I decided to imitate MS and implemented a similar synchronization mechanism in my Custom Events. Here's the latest source download link.

Tuesday 12 January 2010

My Extension Method Collection

In addition to the small components I've shared on this blog, there are a lot of short functions that I've collected over time. Most of them are not interesting enough for a post of their own. Actually, I never seemed to find a proper place to store this stuff. Until recently.

I realized that most of these functions were already coded as extension methods. And the ones that weren't could be refactored to be. So, I started putting all of them in a single assembly called JpLabs.Extensions.
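
Just to give an idea of the flavor of the thing (this particular method may or may not be in the assembly; it's only an illustration):

public static class EnumerableExtensions
{
    // A typical candidate: a tiny helper that reads much better
    // as an extension method at the call site.
    public static bool IsNullOrEmpty<T>(this IEnumerable<T> source)
    {
        return source == null || !source.Any();
    }
}

// Usage:
// if (customers.IsNullOrEmpty()) return;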

Some of these methods were extracted from sources like this question at SO, but the majority were written by me (or a co-worker) for direct use in other projects.

The full source is available to download here. Feel free to use it as you see fit. All feedback is welcome.

Thursday 7 January 2010

Iterators and Argument Validation

Iterators were introduced in C# 2.0, so they are not new. Still, before LINQ was born, I think most C# programmers had never seen a yield statement. I'm not sure when I saw it for the first time, but I certainly wasn't an early adopter. I still remember the puzzlement I felt trying to understand how this stuff works.

If you have ever opened the Enumerable class in Reflector, you might have noticed that it has several nested, private, compiler-generated classes. These nested classes are generated whenever a class has iterator functions.

The point I want to make in this post is about argument validation. More than once, I've seen people make a simplification that, IMHO, should be considered a defect in production code. As an example, consider the code below (which I extracted from an answer on Stack Overflow).

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first, IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> selector
) {
    if (first == null) throw new ArgumentNullException("first");
    if (second == null) throw new ArgumentNullException("second");
    if (selector == null) throw new ArgumentNullException("selector");

    using (var enum1 = first.GetEnumerator())
    using (var enum2 = second.GetEnumerator())
        while (enum1.MoveNext() && enum2.MoveNext())
            yield return selector(enum1.Current, enum2.Current);
}

It's quite simple, and it might be hard to find any issues even after reading it carefully. It will even pass positive testing. The problem only appears when you pass null to any of the three parameters. As you may know, the body of an iterator function is not executed right after the call; it's executed only when the IEnumerable is enumerated. This behavior is called deferred execution. Because of that, when a null argument is passed, the ArgumentNullException won't be thrown until the returned value is enumerated. This is highly undesirable.

Fortunately, there's a very simple way to avoid the problem. As mentioned in the MSDN community content on this page, you should break the iterator function into two functions. Like this:
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
    this IEnumerable<TFirst> first, IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> selector
) {
    if (first == null) throw new ArgumentNullException("first");
    if (second == null) throw new ArgumentNullException("second");
    if (selector == null) throw new ArgumentNullException("selector");

    return ZipIterator(first, second, selector);
}

private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
    IEnumerable<TFirst> first, IEnumerable<TSecond> second, Func<TFirst, TSecond, TResult> selector
) {
    using (var enum1 = first.GetEnumerator())
    using (var enum2 = second.GetEnumerator())
        while (enum1.MoveNext() && enum2.MoveNext())
            yield return selector(enum1.Current, enum2.Current);
}

Notice the key points: only the second function is an iterator, and it's private. The only place where it should be called is in the first function, which now validates its arguments eagerly.

By the way, Zip is an extension method added in .NET 4.0. If you want to read more about Zip and the elusive behavior of exceptions in iterator functions, I recommend these posts:

- Bart's LINQ's new Zip operator at http://community.bartdesmet.net/blogs/bart/archive/2008/11/03/c-4-0-feature-focus-part-3-intermezzo-linq-s-new-zip-operator.aspx;

- Eric's Zip Me Up at http://blogs.msdn.com/ericlippert/archive/2009/05/07/zip-me-up.aspx;

- Eric's High maintenance at http://blogs.msdn.com/ericlippert/archive/2008/09/08/high-maintenance.aspx.