lessCode.net

Wednesday, Nov 24, 2010

Extending the Gallio Automation Platform

I’ve been doing a lot recently with Gallio, and I have to say I like it. Gallio is an open source project that bills itself as an automation framework, but its most compelling use by far is as a unit testing platform. It supports every unit test framework I’ve ever heard of (including the framework that spawned it, MbUnit, which is now itself another extension to Gallio), and it has great tooling hooks that give you the flexibility to run tests from many different environments.

Gallio came onto my radar when my company had to choose whether to enhance or replace an in-house testing platform, written back when the available unit testing tools were not all that sophisticated. The application had evolved to become quite feature-rich, but the user interfaces (a Windows Forms application and a console variant) needed a makeover, and there were two schools of thought: invest resources in further enhancing and maintaining an internal tool, or revisit the external options now available to us.

We use NHibernate as a persistence layer, and have been able to benefit greatly from taking advantage of the improvements made to that open-source product, so I was keen to see if we could lean on the wider community again for our new unit testing framework.

I’d had a fair amount of experience with MSTest in a former life, but quickly dismissed it as an option when comparing it against the feature sets available in the latest versions of NUnit and MbUnit. Our in-house framework was clearly influenced by an early version of NUnit, and made significant use of parameterized tests, many of which were quite complex. Data-driven testing in MSTest didn’t seem to have progressed beyond the ADO.NET DataSource, whereas frameworks like MbUnit had started to provide quite powerful, generative capabilities for parameterizing test fixtures and test cases.
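
To make the contrast concrete, here’s a minimal sketch of the kind of row-based parameterized test MbUnit supports (the fixture and values are illustrative, not taken from our codebase):

using MbUnit.Framework;

[TestFixture]
public class FibonacciTests {
    // Each [Row] becomes its own test case, with its own pass/fail result.
    [Test]
    [Row(1, 1)]
    [Row(2, 1)]
    [Row(3, 2)]
    [Row(10, 55)]
    public void FibonacciOfNIsExpected(int n, int expected) {
        Assert.AreEqual(expected, Fibonacci(n));
    }

    private static int Fibonacci(int n) {
        int a = 0, b = 1;
        for (int i = 1; i < n; i++) {
            int t = a + b; a = b; b = t;
        }
        return b;
    }
}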

So one option on the table was to port (standardize, in reality) all of our existing tests to NUnit or MbUnit. This would allow us to run the tests in a number of ways, instead of just via the two tools we had built around our own framework. With my developer hat on, I really like the ability to run one or more tests directly from Visual Studio via ReSharper, and for the buildmaster in me, running and tracking those same tests from a continuous integration server like TeamCity is also important. We had neither of these capabilities with our existing platform.

Another factor in our deliberations became the feature set of the UI for our in-house tool. We had some feature requirements here that we would have to take with us going forward, so if we weren’t going to continue to maintain ours, we needed to find a replacement test runner UI that could be extended.

Enter Icarus, another extension to Gallio that provides a great Windows Forms UI. All by itself, it does everything you might need a standard test runner application to do, but that’s only the beginning: adding your own functionality to the UI is actually quite easy to achieve. I was able to add an entire test pane (an ElementHost-ed WPF panel, at that) with just a handful of source files and a few dozen lines of code, and we were up and running with a couple of core features we needed from our new test runner.

And that’s not the end of the story. Even if we wanted to use Gallio/Icarus going forward, we were still faced with the prospect of porting all of our existing unit tests to one of the many frameworks supported by Gallio (with NUnit and MbUnit being the two favourites). We really didn’t want to do this, and would probably have lived with a bifurcated testing architecture in which the existing tests stayed on our internal framework and any new tests were built for NUnit or MbUnit. This would have been less than ideal, but it probably would still have been worthwhile in order to avoid maintaining our own tools while watching the third-party tools advance without us.

As it turns out, we didn’t need to make that choice, because adding a whole new test runner framework to Gallio is as easy as extending the Icarus UI. By shamelessly cribbing from the existing Gallio adapters for NUnit and MbUnit, we were able to reuse significant parts of our in-house framework, build a new custom adapter around those, and run all of our existing unit tests alongside new NUnit tests in both the Icarus UI and the Gallio Echo command-line test runners. As an added bonus, since Gallio is also supported by ReSharper, we were now able to run our old tests directly from within Visual Studio, for free, something we had not been able to do with our platform. It took about two days to complete all of the custom adapter work.

I’m quite optimistic that we’ll be able to really enhance our unit testing practices by leveraging Gallio, and without the effort it would take to maintain a lot of complex internal code. The extensibility of Gallio and Icarus is really quite phenomenal – kudos to all those responsible.

Saturday, May 8, 2010

Did the Entity Framework team do this by design?

I’ve been playing around quite a bit recently with Entity Framework, in the context of a WCF RIA Services Silverlight application, and just today stumbled upon a quite elegant solution to a performance issue I had feared would not be easy to solve without writing a bunch of code.

The application in question is based around a SQL Server database containing a large number of high-resolution images stored in binary form, which are associated with other entities via a foreign key relationship, like this:

[Entity diagram: each Thing is referenced by zero or more Image rows via a foreign key]

Pretty straightforward. Each Thing can have zero or many Image entities associated with it. Now, let’s say we want to present a list of Thing entities to the user. With a standard WCF RIA Services domain service, we might implement the query method like this:

public IQueryable<Thing> GetThings() {
    return ObjectContext.Things.Include("Images").OrderBy(t => t.Title);
}

 

Unfortunately, this query will perform quite poorly if there are many Things referencing many large Images, because all the Images for all the Things will cross the wire down to the client. When I try this for a database containing a single Thing with four low-resolution Images, Fiddler says the following about the query:

[Fiddler capture: the response body includes the full binary data for all four Images]

If we had a large number of Thing entities, and the user never navigated to those entities to view their images, we’d be transferring a lot of images simply to discard them, unviewed. If we leave the Include("Images") extension out of the query, we won’t transfer the image data, but then the client will not be aware that there are in fact any Images associated with the Things, and we’d have to make subsequent queries back to the service to retrieve the image data separately.
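
For reference, that variant is just the same query method without the eager load; the tradeoff is that the Images collections look empty on the client:

public IQueryable<Thing> GetThings() {
    // No Include: nothing from the Images table is materialized or serialized.
    return ObjectContext.Things.OrderBy(t => t.Title);
}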

What we’d like to be able to do is include a collection of the image Ids in the query results that go to the client, but leave out the actual image bytes. Then, we can write a simple HttpHandler that’s capable of pulling a single image out of the database and serving it up as an image resource. At the same time we can also instruct the browser to cache these image resources, which will even further reduce our bandwidth consumption. Here’s what that handler might look like:

public class ImageHandler : IHttpHandler {
    #region IHttpHandler Members

    public bool IsReusable {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context) {
        Int32 id;

        if (context.Request.QueryString["id"] != null) {
            id = Convert.ToInt32(context.Request.QueryString["id"]);
        }
        else {
            throw new ArgumentException("No id specified");
        }

        // The using block disposes the bitmap; tell the browser it can cache
        // the image for a month so repeat views don't hit the handler at all.
        using (Bitmap bmp = ConvertToBitmap(GetImageBytes(id))) {
            context.Response.Cache.SetValidUntilExpires(true);
            context.Response.Cache.SetExpires(DateTime.Now.AddMonths(1));
            context.Response.Cache.SetCacheability(HttpCacheability.Public);
            bmp.Save(context.Response.OutputStream, ImageFormat.Jpeg);
        }
    }

    #endregion

    private Bitmap ConvertToBitmap(byte[] bmp) {
        if (bmp != null) {
            TypeConverter tc = TypeDescriptor.GetConverter(typeof(Bitmap));
            var b = (Bitmap) tc.ConvertFrom(bmp);
            return b;
        }
        return null;
    }

    private byte[] GetImageBytes(Int32 id) {
        // Load the image bytes via Entity Framework, disposing the context when done.
        using (var entities = new Entities()) {
            return entities.Images.Single(i => i.Id == id).Data;
        }
    }
}
Note that the handler queries Entity Framework on the server side to load the image bytes from the database, given the image Id that comes from the URL’s query string.
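
On the Silverlight client, displaying one of these images is then just a matter of pointing an Image control at the handler by Id. A rough sketch (the ImageHandler.ashx path and the control are assumptions; the handler would be registered under that path in web.config):

// Hypothetical client-side usage; the browser's cache does the rest.
var uri = new Uri(string.Format("ImageHandler.ashx?id={0}", image.Id), UriKind.Relative);
imageControl.Source = new BitmapImage(uri);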
 
So, back to the real problem. How do we avoid sending the image bytes down to the client when RIA Services queries for the Things and requests that the Images be Included?
 
One way to achieve this would be to remove the Data property from the Image entity in the entity model. This won’t, of course, affect the database, but since there is now no way to access the image bytes through the model, an Image will consist only of an Id. However, this means we’d have to change our handler’s GetImageBytes method to retrieve the image from the database with lower-level database calls, bypassing Entity Framework.
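
Concretely, GetImageBytes would have to drop down to something like raw ADO.NET (a sketch, using System.Data.SqlClient and System.Configuration; the connection string name is an assumption, and the table and column names are taken from the model above):

private byte[] GetImageBytes(Int32 id) {
    // Bypass the entity model and read the binary column directly.
    // "ImagesDb" is assumed to be a plain SQL connection string in web.config.
    var connectionString =
        ConfigurationManager.ConnectionStrings["ImagesDb"].ConnectionString;
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("SELECT Data FROM Images WHERE Id = @id", conn)) {
        cmd.Parameters.AddWithValue("@id", id);
        conn.Open();
        return (byte[])cmd.ExecuteScalar();
    }
}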

It seems like there’s no clean way to achieve what we want, but in fact there is. If you look at the Data property in the entity designer, you can see that its accessibility can be changed:

 

[Entity designer: the accessibility of the Data property’s getter and setter can be changed from Public]

By default, entity properties are Public, and RIA Services will dutifully serialize Public properties for us. But if we change the accessibility to Internal, RIA Services chooses not to, which makes sense. Since the property is Internal, it’s still visible to every class in the same assembly. Therefore, as long as our ImageHandler is part of the same project/application as the entity model and the domain service, it will still have access to the image bytes via Entity Framework, and the code above will work unmodified.
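
The effect on the generated entity is roughly this (a simplified sketch, not the actual designer-generated code, which also carries change-tracking plumbing):

public partial class Image : EntityObject {
    public int Id { get; set; }

    // Internal: usable by the domain service and ImageHandler in the same
    // assembly, but never serialized to the Silverlight client by RIA Services.
    internal byte[] Data { get; set; }
}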

After making this small change in the property editor (and regenerating the domain service), on the client side we no longer see a byte[] as a member of the Image entity:

[Generated client proxy: the Image entity no longer exposes a Data member]

When I now run my single-entity-with-four-images example, Fiddler gives us a much better result:

[Fiddler capture: the response now contains only the entity data, with no image bytes]

Friday, Jan 8, 2010

The Big Bang Development Model

Last summer, on the first hot day we had (there weren’t that many hot days last year in New York), I turned on my air conditioner to find that although the outdoor compressor unit and the indoor air handler both appeared to be working (fans spinning), there was no cold air to be felt anywhere in the house. We bought the house about three years ago, and at that time the outdoor unit was relatively new (maybe five years old). In the non-summer seasons we've spent in the house, we've always made sure to keep the compressor covered up so that rain, leaves and critters don't foul up the works, and I've even opened it up a couple of times to oil the fan and generally clean out whatever crud did accumulate in there, so I was surprised that the thing didn't last longer than eight years or so.

When the HVAC engineer came to check it out, he found that the local fuse that was installed inline with the compressor was not correctly rated (the original installer had chosen a 60A fuse; the compressor was rated at 40A), and that the compressor circuitry had burned out as a result. For the sake of the correct $5 part eight years ago, a new $4000 compressor was now required.

So this winter, when I turned up the thermostat on the first cold day and there was no heat, I readied the checkbook again. This furnace was the original equipment installed when the house was built in the ‘60s, so I felt sure that I’d need a replacement furnace. I was pleasantly surprised when this time the HVAC guy told me that a simple inexpensive part needed to be replaced (the flame sensor that shuts off the gas if the pilot light goes out). This was a great, modular design for a device that ensured that a full refit or replacement wasn’t required when a single component failed.

It occurred to me that I'd seen analogs of these two stories play out on software projects I've been involved with over the years. A single expedient choice (or a confluence of several such choices), each seemingly innocuous at the time, can turn into monstrous, expensive maintenance nightmares. Ward Cunningham originally equated this effect with that of (technical) debt [http://en.wikipedia.org/wiki/Technical_debt].

What makes the situation worse for software projects than in the air conditioner analogy is that continuous change over the life of a software project offers more and more opportunity for such bad choices, and each change becomes more and more expensive as the system grows more brittle, until all change becomes prohibitively expensive and the system is mothballed, its replacement is commissioned and a whole new expensive development project is begun. I think of this as a kind of Big Bang Development Model, and in my experience this has been the standard model in the finance industry.

For an in-house development shop, an argument can be made that this might not be so bad, although I wouldn't be one to make it – its success is highly dependent on your ability to retain the staff who “know where the bodies are buried”, which in turn is directly proportional to remuneration. If you're a vendor, Big Bang should not be an option - you need to hope that there's no other vendor waiting in the wings when your Big Bang happens.

Of course, today we try to mitigate the impact of bad choices with a combination of unit testing, iterative refactoring and abstraction, but all of this requires management vision, discipline, good governance and three or four of the right people (as opposed to several dozens or hundreds of the wrong ones). Those modern software engineering tools are also co-dependent: effective unit testing requires appropriate abstraction; fearless refactoring requires broad, automated testing; sensible abstractions can usually be refactored more easily than inappropriate ones when the need arises to improve them.

I'm not going to "refactor" my new $4000 compressor, but you can bet that I am going to use a $10 cover and a $5 can of oil.

Friday, Dec 4, 2009

Windows Mobile got me fired (or at least it could have done)

After resisting getting a so-called “smart”-phone for the longest time, making do with a pay-as-you-go Virgin Mobile phone (I wasn’t a heavy user), I recently stumped up for the T-Mobile HTC Touch Pro2. I was going to be traveling in Europe, and I also wanted to explore writing an app or two for the device (if I can ever find the time), so I convinced myself that the expense was justified. After three months or so, I think I should have gone for the iPhone 3GS.

Coincidentally, the week I got the HTC, my old alarm clock (a cheap Samsonite travel alarm) bit the dust, and so I started using the phone as a replacement (it was on the nightstand anyway, and the screen slides out and tilts up nicely as a display). I also picked up a wall charger so that it could charge overnight and I wouldn’t risk it running out of juice and not waking me up.

Over the last few weeks, however, the timekeeping on the device has started to become erratic. First it would lose time over the course of a day, to the tune of about 15 minutes or so. Then I’d notice that when I’d set it down and plug it in for the night, the time would just suddenly jump back by several hours, and I’d have to reset it. I guess this must have happened four or five times over the course of a couple of months, but it always woke me up on time.

Today I woke up at 8.30am to bright sunshine, my phone proclaiming it to be 4.15am. My 6.00am alarm was clearly useless, and I’m glad the 8am call I had scheduled wasn’t mandatory on my part.

A pretty gnarly bug for a business-oriented smart-phone.

Wednesday, Oct 7, 2009

Discount/Zero Curve Construction in F# – Part 4 (Core Math)

All that’s left to cover in this simplistic tour of discount curve construction is to fill in the implementations of the core mathematics underpinning the bootstrapping process.

computeDf simply calculates the next discount factor based on a previously computed one:

let computeDf fromDf toQuote =
    let dpDate, dpFactor = fromDf
    let qDate, qValue = toQuote
    (qDate, dpFactor * (1.0 / 
                        (1.0 + qValue * dayCountFraction { startDate = dpDate; 
                                                           endDate = qDate })))
 
For dayCountFraction we’ll assume an Actual/360 day count convention, though this could be generalized by passing the day-counting method as a function parameter to computeDf:
 
  let dayCountFraction period = double (period.endDate - period.startDate).Days / 360.0


findDf looks up a discount factor on the curve, for a given date, interpolating if necessary. Again, here tail recursion and pattern matching make this relatively clean:

  let rec findDf interpolate sampleDate =
    function
      // exact match
      (dpDate:Date, dpFactor:double) :: tail 
        when dpDate = sampleDate
        -> dpFactor
            
      // falls between two points - interpolate    
    | (highDate:Date, highFactor:double) :: (lowDate:Date, lowFactor:double) :: tail 
        when lowDate < sampleDate && sampleDate < highDate
        -> interpolate sampleDate (highDate, highFactor) (lowDate, lowFactor)
      
      // recurse      
    | head :: tail
        -> findDf interpolate sampleDate tail
      
      // falls outside the curve
    | [] 
        -> failwith "Outside the bounds of the discount curve"

logarithmic does logarithmic interpolation for a date that falls between two points on the discount curve. This function is passed as a value in the interpolate parameter to findDf above:

  let logarithmic (sampleDate:Date) highDp lowDp = 
    let (lowDate:Date), lowFactor = lowDp
    let (highDate:Date), highFactor = highDp
    lowFactor * ((highFactor / lowFactor) ** 
                 (double (sampleDate - lowDate).Days / double (highDate - lowDate).Days))

Newton’s Method is quite straightforward in F#:
 
let newton f df (guess:double) = guess - f guess / df guess

To recursively solve using Newton’s Method to a given accuracy:
 
  let rec solveNewton f df accuracy guess =
    let root = (newton f df guess)
    if abs(root - guess) < accuracy then root else solveNewton f df accuracy root

And all that remains are the functions that feed Newton: the price of the market swap and its first derivative. Note that this is certainly not the most efficient way to do this, because we recalculate the price in order to approximate the derivative with a finite-difference method:
 
  let deriv f x =
    let dx = (x + max (1e-6 * x) 1e-12)
    let fv = f x
    let dfv = f dx
    if (dx <= x) then
        (dfv - fv) / 1e-12
    else
        (dfv - fv) / (dx - x)

  let computeSwapDf dayCount spotDate swapQuote discountCurve swapSchedule (guessDf:double) =
    let qDate, qQuote = swapQuote
    let guessDiscountCurve = (qDate, guessDf) :: discountCurve 
    let spotDf = findDf logarithmic spotDate discountCurve
    let swapDf = findPeriodDf { startDate = spotDate; endDate = qDate } guessDiscountCurve
    let swapVal =
        let rec _computeSwapDf a spotDate qQuote guessDiscountCurve =
            function
              swapPeriod :: tail ->
                let couponDf = findPeriodDf { startDate = spotDate; endDate = swapPeriod.endDate } guessDiscountCurve
                _computeSwapDf (couponDf * (dayCount swapPeriod) * qQuote + a) spotDate qQuote guessDiscountCurve tail

            | [] -> a
        _computeSwapDf (-1.0) spotDate qQuote guessDiscountCurve swapSchedule
    spotDf * (swapVal + swapDf)

And finally, the zero coupon rates can be found with something like the following direct mapping, on the basis that there are 365 days in a year:
 
  let zeroCouponRates = discs
                      |> Seq.map (fun (d, f) 
                                    -> (d, 100.0 * -log(f) * 365.0 / double (d - curveDate).Days))


All in all, I think that a functional language like F# provides a much simpler means of coding these kinds of calculation than a typical C or C++ implementation. That’s notwithstanding performance considerations, which I’ve yet to study in any depth; but I have a hunch that with the advent of cloud computing and massively multicore hardware, this kind of work is going to become less and less about “straight-line” speed in clock cycles or operations, and more and more about the simplicity with which we can reason about such code in order to find inherent functional parallelism.
I’m also wondering about the benefits of functional programming when it comes to pricing certain derivative trades with payoff formulae that might be maintained by an end-user-trader or quantitative analyst without requiring anything we’d currently regard as “software engineering”. This question touches on the use of functional languages as Domain-Specific Languages (DSLs), and may be the topic of a future series of posts as time permits…