Thursday 27 November 2008

Dynamic Lambda Expressions Using An Isolated AppDomain

Background

In August, I submitted a post about Lambda Expressions and Differentiation. There, I told shortly about a dynamic compiler for lambda expressions using CodeDom. I've enhanced the compiler so that it now deserves a post of its own.

Introduction

I'll start with the original problem that I needed to solve. I wanted to make a console application that would accept lambda expressions wrote by the user, transform it in another lambda expression and show it. After some research, let me list some of the stuff that can or can't be done with .NET:

  1. One can write a lambda expressions to a easily readable string (that is similar to C# code, but it's not compilable);
  2. One can dynamically create lambda expressions using static functions of the Expression class;
  3. One cannot create (at least not directly) lambda expressions from a string containing valid C# code;
  4. One can dynamically compile code using CodeDOM.

Due to feature #1, it's effortless to show the transformed lambda expressions to the user. Feature #2 is essential to the transformations that I intended to do, which was finding derivatives (please, read the original post if you're interested).

A major problem came along when I realized the fact #3. So, the source lambda expressions in the first versions of my little application were hard coded. After I finished the derivation part, I began searching for options to generate lambda expressions from user inputted strings.

One alternative was to make a parser (or find on the web) that put together an expression using feature #2. The other alternative is to let the framework do it by dynamically compiling the code (feature #4). Since the second is promptly available and is much more reliable than any parser I could code, that's what I chose to use.

Dynamic Compilation

Before I continue, let me state that I'll not get into the details about using CodeDOM. Here is a much more detailed explanation. Also, although you can compile VB.NET too, I'll talk only about C# here, sorry.

Basically, it works like this. You need the full text of one or more source code files (including usings, namespaces, classes, and so forth). Then, through a CodeDomProvider, you can compile your code to an Assembly. The assembly can exist only in memory or it could be generated directly to a file. You can access and execute what is in the assembly by reflection.

For my particular case, where I wanted to generate lambda expressions from a string, CodeDOM is too generic. So, I made a wrapper around it that compiles a lambda expression code string by filling the template below. Then, using reflection I executed DynamicFunc and forwarded the returned LambdaExpression.
using $Using0$;
using $Using1$;
//...
using $UsingN$;
namespace JpLabs.DynamicCode.DynamicAssembly
{
public static class DynamicClass
{
public static LambdaExpression DynamicFunc() {
return (Expression<$DelegateType$>)( $LambdaExpresion$ );
}
}
}


This solution solved my problem, but it has some issues. First, it's slow. Actually, it is not nearly as fast as it could be. One could say that I'm using an elephant gun to kill a fox. It's possible to modify it to compile several expressions at once, but that was not the case. I needed to compiled expressions one by one as the user inputs it.

The bigger issue is that, each time an expression is compiled, an assembly is generated. Unlike object instances, the memory of loaded types and assemblies can't be release. In fact, there's one way to do that, it's by unloading the entire application domain.

Cross AppDomain Dynamic Compilation

One application might use multiple domains for several reasons. But I believe nearly that all of them have to deal with one typical problem. Code in one domain cannot access data in other domain. So, they have to communicate in alternative ways. The two ways that are most straightforward are serialization and remoting.

The data I wanted to cross between domains are lambda expresssions. Unfortunately, a LambdaExpression is not Serializable. And remoting is not an option too. I need to be able to unload the compiler domain, and by doing this, all remoting objects would be released and the proxies would cease to work.

I thought I had hit a impassable wall. But I found this article with a solution. It also uses multiple domains to dynamically compile code. If you read to this point, you will probably like to take a look there too.

The idea somewhat crazy. .NET reflection allows you to access the IL of a method. One can pack it into a Serializable class and send it to another domain. There, the method would be reconstructed using the IL. This can be accomplished with the semi-magic DynamicMethod class. To let you understand its wonderful features, I'll quote MSDN remarks about it:
You can use the DynamicMethod class to generate and execute a method at run time, without having to generate a dynamic assembly and a dynamic type to contain the method. The executable code created by the just-in-time (JIT) compiler is reclaimed when the DynamicMethod object is reclaimed. Dynamic methods are the most efficient way to generate and execute small amounts of code.

The input for a DynamicMethod is exactly what we have: the IL that came from the compiled code on the other domain. The output is a delegate to a newly created (and releasable) function. Since the function returns our so desired lambda expression, we just need to call it.

Now, let me summarize the steps involved in this lambda expression compiler:
  1. Create an AppDomain;
  2. Instantiate a compiler (actually, a wrapper around CodeDOM) on the new AppDomain;
  3. Compile the input string into an in-memory assembly using the template I've shown above;
  4. Unload the compiler domain (or keep it for reuse and unload it later);
  5. Extract the IL of the DynamicFunc method (please, refer to the template), and send it to the original domain;
  6. Employ a DynamicMethod to get a delegate from the compiled IL;
  7. Call the delegate and it shall return a LambdaExpression.

Conclusion

Here's an reliable way to dynamically compile code without requiring always more memory. To be honest, I did some improvements that I didn't mention above. To mention, I tried to make the compiler generic enough so that it could compile other stuff than lambda expression. I think you can simply extend the compiler to achieve that. Please, tell me if anyone of you get to the point of doing that. I'll even let you in a short future-possible-enhancements-list:
  • Extend compiler to compile other stuff than lambda expressions;
  • Improve the compiler so that it can compile a batch of code strips.

And, finally, the source code download link: here. Of course, this source includes a test project for the compiler. And, of course, the test project is my Differentiator.

Thanks!

Thursday 11 September 2008

LINQ Source Code (or LINQ for .NET 2.0)

Background

Recently, I had to make some enhancements in a large stabilised asp.net solution. As usual, I began testing some code in a new project as a small proof of concept. After I was done, I realized that the old code was made using VS 2005 (i.e., over .NET Framework 2.0) and my test code used a lot of LINQ (i.e., over .NET Framework 3.5). Since upgrading the solution was not allowed, my only option was to remove Linq from my code. Or maybe not.

Using VS 2008, it's possible to use almost all the features of C# 3.0 even in a project targetting the 2.0 framework. The exception is Linq, because it inherently references extension functions that are only present in the 3.5 framework. Since I needed only Where() and Select(), I simply coded them and it worked just fine. I saw it was possible to write code for all the Linq to Objects extension methods. Better yet, I could aquire it from a more trustful source.

Linq to Objects Source Code

It's publicly known that a good part of the .NET Framework source code is available for anyone to download. Unfortunately, the code for Linq to Objects was not released yet. The alternative was to disassemble the assembly using .NET Reflector (I hope MS will not sue me for this). Anyway, as the code was not genuinely authored by me, I took special care of not using my namespace (JpLabs) in it.

One thing that Reflector don't do is to simplify functions that use yield return. Instead, it shows what is actually compiled, i.e., one compiler-generated iterator class for each function. These classes use state machines to control the iteretion. If you want to see how such a class looks like, try using Reflector. If you're not familiarised to yield return, you might want to check this arcticle. I decompiled all the iterator functions in Linq to Objects manually.

To make this post short, I would like to say that, in addition to the existent Linq extension functions, I've included some of my own, like ForEach(Action action).

Download

Download the full source here.

[Update @2008.09.25]
I've just found out that there is a project with the exactly the same purpose as the one in this post. And the authors are much more reliable than me. Check out the LINQBridge from the Albahari brothers. As mine, their code is completely open and free for any use.

Also, I would like to suggest the excellent free book on Threading in C# on their web page. It's definitely the most complete and most readable source on this subject that I know of.

Thursday 14 August 2008

Lambda Expressions and Differentiation (continued)

Today, I made a presentation about the first article that I've posted on this blog. Since then I did some improvements on the code.

The one extra feature that I implemented was a parser for lambda expressions. One of the first things that I did when I started fiddling with lambda expression was trying to generate one from a string. To my disappointment, there's not such a feature included in the .NET Framework. So, I coded a humble one using CodeDom.

I'm sharing the last source code that I have. You're free to play with it with one condition. You have to share enhancements with me. :P

Also, I would like to let a challenge here. If you want to learn Lambda Expressions and don't have an idea to implement, try making code to simplify a polynomial function.

Finally, as always, please, let your comment here.

Friday 8 August 2008

WebSphere MQ API (or MQAL)

WebSphere MQ (or simply "MQ") is a messaging system made by IBM. If you've never used any message-oriented middleware, it might be a good idea to take a look at wikipedia first. Anyway, the basic problems that I had to solve in the component I'm posting are not specific to MOM systems. Thus, I hope you can extract some value from it in any case.

So, let me introduce you to the problem. About one year ago, I had to develop an application that extensively uses MQ to exchange messages with other applications. This technology was kind of new to me and I had to learn the basics. To oversimplify, the idea is to connect to a queue manager, then connect to a queue, and finally put (or get) a message to (or from) the queue. The way that this process is made using the MQ API for .NET seemed very awkward to me. Then, I decided to write a component to encapsulate it and provide a simpler interface.

Fisrt, I've tried to find a design pattern to fit the component in and I choosed to follow the way that database connections are handled in the .NET framework. Then, I've started to make some analogies. To start, a queue manager connection could work pretty much like a database connection. Both of them have parameters like server, port, user name, password, and database name / queue manager name. To summarize, here are some of the analogies I've made:
- A queue manager is like a database;
- A queue manager connection is like a database connection;
- A queue manager connection parameters is like a database connection string;
- A queue is like a table (or a view);
- A queue connection is like an database command;
- A message is like a table row;
- A put message action is like an execute insert command;
- A get message action is like an execute select and delete command;
- A browse message action is like an execute select command.

Ok, you must have gotten the general idea. I won't get in much detail here, but let me list some of the problems I had to solve in order to implement this:
- Translating a queue connection string into queue manager connection parameters. I solved this by coding a connection string parser using regular expressions;
- Implement queue transactions. This wasn't a requisite at the time, but MQ provides transactions and I felt like I had to offer a way to use them;
- Inherently use a queue connection pool under the covers. The tests that I've made using the MQ API showed that it doesn't had a connection pool. I would had to either make the user of my component keep connections for longer (in order to avoid opening and closing them too frequently) or code a connection pool and let the user enjoy all its benefits. This was pretty interesting to do and probably deserves a post of itself. For now I'll let you with the source code.

By the way, I didn't touched this code for awhile now and I see that I've let some items on a TODO list, so I won't say it's a final version. Anyway, it's running in a production environment somewhere for months and it have not failed once.

Feel free to download the source code here.

Tuesday 3 June 2008

Microsoft Parallel Extensions CTP released

In the past years, evolution of computing is turning to the grounds of high level concurrecy. Processors are becoming faster by being multi-core, and not by having higher frequencies. The time is coming so that even simple programs will be concurrent too. Parallel programs are harder to code, harder to test and can cause pseudo-random bugs that can drive programmers crazy. The best solution for this, as to many problems, is a dependable library and/or language support.

There are several types of problems that require specific solutions. Making a responsive UI, for instance. How many application have you seen that simply stay frozen while it's processing something? I've made some of them myself. And the programmer cannot be blamed, most projects don't have the time (and money) to invest on a responsive UI. The WPF framework solves this in a very neat way. Its interface engine is fully concurrent. But that's a subject for other post.

Yesterday, Microsoft released another CTP of its Parallel Extensions (PFX) library. You may get the details at their blog or at MSDN. In case you're not familiar with it, I'll give a short brief here. Basically, the library's comprehended by three parts:

- Task Parallel Library
Provides some imperative ways to express parallelism. The simpler one is the Invoke method. It receives a set of actions in the form of delegates and executes them concurrently, returning after all the actions have completed. If you have a loop in which iterations may run in parallel, you can use the For or the ForEach methods. When the problem requires a little more elaboration, there's the Task system. It's a set of classes that works somewhat like what the ThreadPool class does, but with much greater flexibility and control.

- PLINQ
It's a implementation of LINQ that allows anyone to easily make declarative parallel queries. All it's needed is to transform the IEnumerable<t> into a IParallelEnumerable<t> using the AsParallel() extention method. It's not optimized yet, but it certainly can boost many applications with almost no code change.

- Coordination Data Structures
It has a group of synchronization classes that enhance those that currently exists in the framework. Also, there are some thread-safe collection classes. Coincidently, they've created a ConcurrentQueue class that serves the same purpose as the one I've posted!

A detail that you can't avoid noticing in this framework is the extensive use of delegates. LINQ itself is based on delegates being passed to other functions. This type of construct is called a higher-order function. Another construct that is present in PFX is the continuation. Also, there are recurrent remarks in its documentation about side-effects and why they have to be avoided. All these characteristics are usually associated with functional langagues. No surprise some of the samples are in F#.

Functional programming has lived almost only at the academic world, but its future looks different. It holds a lot of interesting and useful concepts. The disturbing part is that very few programmers really know these concepts. Do you know any skilled/experienced/old programmer that can't understand OO programming? Well, can you really understand functional programming? I don't. At least not yet.

Friday 25 April 2008

A C# BlockingQueue

Ok, things didn't go as I planned, so I'll stop promising and start delivering. This time I'll share with you a code I've used before.

The problem is quite simple and is not new: exchange messages between threads. One possible solution is message queuing, but it's too bulky to use inside a single process. The simplest way to make threads communicate is through shared variables, but it's obviously too simple.

So, what kind of feature I was looking for? To begin with, it needed to be fast and light. Secondly, it had to work in a fire-and-forget fashion, i.e., I wanted a thread to be able to send a message (or task, job, whatever) to another thread and go on. Obviously, in order to such a thing to function, it would have to use some sort of internal buffer. So, it looked like what I needed was a thread-safe FIFO queue.

All right, I could simply use a Queue<t> and wrap its calls around a lock statement. Or, even easier, use a non-generic Queue and transform if by calling the Queue.Synchronized() method. In both ways, we'd get a simply synchronized thread-safe queue.

Is that all? Of course not. Suppose we have a producer-consumer problem to solve. Usually, when the buffer is empty, the consumer is supposed to idly wait for an new item. How could we accomplish that using a synched queue? If we try to dequeue from an empty Queue, a exception is thrown. We could check the item count before dequeueing, but there would be two steps involved and it wouldn't be thread-safe even if the Queue instance is synched. Another possibility is catching the queue-is-empty exception, which is a very lousy trick. Anyway, we wouldn't be able to avoid writing synchronization code and this is not what I wanted.

So, considering that you are introduced to the problem, let me show the wish list that I gathered in the form of an interface:


public interface IBlockingQueue
{
/// <summary>Max number of item in the queue (-1 means there's no limit)</summary>
int MaxQueueDepth { get; }

/// <summary>Returns false when there's something wrong, e.g., queue full</summary>
bool TryEnqueue(object item);

/// <summary>Returns false when there's no message to read</summary>
bool TryDequeue(out object item);

/// <summary>Returns false when a timeout or an abort happens</summary>
bool WaitEnqueue(object item, int millisecondsTimeout, FuncGetVolatileFlag ExternalAbortSignaled);

/// <summary>Returns false when a timeout or an abort happens</summary>
bool WaitDequeue(out object item, int millisecondsTimeout, FuncGetVolatileFlag ExternalAbortSignaled);
}

public delegate bool FuncGetVolatileFlag();

TryDequeue() and TryEnqueue are two simply synchronized methods to access the queue. The other two (WaitDequeue() and WaitEnqueue()) are much more interesting. The first one will try to dequeue and, if the queue is empty, will wait for a new item. The latter will try to enqueue and, if the queue is full, will wait until there's room for the new item.

But that's still not enough. A common problem in a multithreaded application is when you need to close it and there are some threads still running. There's several possible approaches on how to close them. The safer choice might be to wait for every thread to end, but it might take too long. One alternative is to terminate everyone, but it's very risky. I think the best pick is to ask somehow the threads to stop and wait until they do so. With my blocking queue, you can choose between any of the 3.

Let me explain how. As you can see, I've put two addicional parameters. Both have the same purpose: cancel the action and return the code execution to the client code. Whilst the fisrt is obviously a timeout, one might wonder what that other parameter is. It's a delegate of a type that accepts functions that like this:bool func(). Its name is ExternalAbortSignaled and its type name is FuncGetVolatileFlag, so I hope it's not hard to recognize how it's used.

In order to eliminate any doubts, I'll describe it. The idea is that, while waiting, the ExternalAbortSignaled delegate will be called in short intervals. If it returns true at any time, the waiting ends and no item is enqueued/dequeued. Also the function returns false. In that way, it's possible to provide the queue an anonymous method that merely returns the value of a boolean flag. Whenever the flag is set to true, the wait function will return promptly and the application as a whole will be able to respond faster.

As you may know, java has a built-in BlockingQueue already. Sadly, java has direct support for delegates. So, I think it would be hard to code a functionality like an abort flag in an elegant way.

Now, if you want to know what is the best way to implement the waiting synchronization code, I'll let the explanation to Joseph, which is much more eloquent than me. It might be a good idea to take a look at my source code too. The technique that I've chosen is through Wait and Pulse.

Some ideas for future enhancements that I've thought of are:
- TryPeek() and WaitPeek()
- Transactions

Source code download link at google code

[Updated at 2008-04-28]

Tuesday 19 February 2008

Hello, Again

I was on vacations until recently and I'm quite busy now, so it might take longer than I expected to write a new article. I guess it's a good thing to publish short posts once in a while.

Anyway, what I wanted to tell you is that a group that I'm part of is organizing an event here in Porto Alegre. I'll make a presentation about LINQ to Relational Data. Very much of it is 'inspired' in material from Mike Pizzo, but I'll show some stuff of my own.

Since I'm studing LINQ, I'm trying to prepare an interesting article about it. If you can't wait and want to get a deep dive into it, I can recomend a very exhaustive series of posts on LINQ to LDAP by Bart De Smet's.

See ya!

Monday 7 January 2008

Lambda Expressions and Higher-Order Functions

[This article has a continuation here]

First, I need to ask you not to be scare of the title I've chosen. Less than two months ago, I barely knew what the first name was and have never heard of the latter.

As you might know, the brand-new C# 3.0 has support to lambda expressions. You can imagine it as a less wordy way to describe anonymous delegates. A major difference between them is that a lambda expression doesn't necessarily needs to be compiled. In fact, it's just a representation of a function. In addition to being compiled, they can be serialized (if you want to pass it on), analyzed and even dynamically created at run-time. I'll be using the last two here.

Now, let me introduce the computational (as well as mathematical) concept of a higher-order funcion (aka operators in math). It's nothing more than a function that receives another function as a parameter and/or returns a function.

Why would a function receive another function as a parameter? If you can't imagine a good answer or are just not convinced there's one, I suggest that you read this very didactic article (this is also good). Anyway, you can implement such functions in c# 2.0 without problems. What I want to examine are functions that returns another functions. If you are not using c# 3.0 (or a functional language), you might find some difficulties. Basically, you would be limited to returning one of a set of pre-declared functions. You could try using closures or dynamic code generation to add some versatility, but it probably would be rather cumbersome.

You may be asking why would anyone need such higher-order function. I will let this question open and focus on how to implement it. I was searching for a good example and can't imagine a better one than the differentiation, i.e., the mathematical process of obtaining the derivative of a function.

So, I'll build a functional that receives a f(x) and return a f'(x). The declaration of this in c# would be like that:

static public Expression<Func<double,double>> Differentiate(this Expression<Func<double,double>> f)
{
//I will find the derivative of f with respect to the variable x (it's the 1st and sole parameter)
ParameterExpression x = f.Parameters[0];

//Diff is the function that actually does the differentiation
Expression derivative = Diff(f.Body, x);

//Build a new lambda expression with the derivative expression body
return Expression.Lambda<Func<double,double>>(derivative, f.Parameters);
}
*Func<double,double> is a delegate type for functions that receives a single double parameter and returns a double. Expression<Func<double,double>> is the type of a lambda expression that represents a Func<double,double> function.

As you can see I used a Diff function that actually calculates the derivative. How does it work? It needs to analyze the original expression and create a new expression that represents its derivative. Luckily enough, this calculation can be done recursively using a set a rules. Let's start with the sum rule: namely, derivatives of sums are equal to the sum of derivatives. Coding that in c# 3.0 is easier than you one might thought. Look:
static Expression Diff(Expression exp, ParameterExpression dx)
{
switch (exp.NodeType) {
case ExpressionType.Add: {
BinaryExpression f = (BinaryExpression)exp;
//Sum rule: [g(x) + h(x)]' == g'(x) + h'(x).
return Expression.Add( //Create an add expression
Diff(f.Left, dx), //g'(x) +
Diff(f.Right, dx) //h'(x)
);
}
/* other cases omitted ... */
}
}

In similar ways, we can implement the other recursive rules (such as the product rule, the quotient rule, the power rule, etc). However, any recursion need a stop condition. We'll have two stop conditions: for constants, which derivative is zero; and for the variable x, which derivative is one. The code looks like this:
case ExpressionType.Parameter: {
//be aware that f(x) = y is a constant with respect to x
return (exp == dx) ? Expression.Constant(1d) : Expression.Constant(0d);
}
case ExpressionType.Constant: {
return Expression.Constant(0d);
}

There's also the chain rule, that is quite trickier to implement. I did it with very little code, though. As you're not faint of heart, take a look at my full source.

And that's all for the day, folks. Please, comment.

Friday 4 January 2008

Hello World

Hi,

My name is João Paulo Bochi and I'm a senior programmer in Brazil. Welcome to my knowledge sharing blog.

I'll post mainly C# related stuff with tips and tricks that I learned. I plan to include topics about C#, LINQ, ASP.NET, AJAX, Concurrent Programming and some others I am (or might get) interested with.

Please, pay a visit soon.