Monday, June 26, 2006

Generating C# from XML: A Mixed Blessing

I was recently confronted with a considerable legacy application (I use "legacy" the way Michael Feathers does: a legacy application is any application without tests) that at first seemed to have exploited .NET's lovely support for XML serialization: most of the classes in the domain model carried the [Serializable] attribute (an attribute that actually governs runtime serialization and that the XmlSerializer ignores). It turned out, however, that I could not simply instantiate an XmlSerializer for any of these classes, since they all had public properties of type System.Type (which already suggests that the developers who preceded me had some peculiar ideas about OO) or IDictionary, neither of which the XmlSerializer can handle. Now, of course, I could have changed the design only minimally, in the following fashion (the enum mapping is elided):

[Serializable]
public class LegacyClass {
  private Type valueType;

  // [NonSerialized] applies only to fields; to hide a property
  // from the XmlSerializer, the attribute is [XmlIgnore]
  [XmlIgnore]
  public Type ValueType {
    get { return valueType; }
    set { valueType = value; }
  }

  // serializable surrogate for ValueType
  public TypeEnum ValueTypeEnum {
    get { return TypeEnumFor(valueType); }  // mapping helpers elided
    set { valueType = TypeFor(value); }
  }
}
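With the Type-valued property hidden behind [XmlIgnore], the XmlSerializer can at least be instantiated for LegacyClass; the IDictionary properties would need the same treatment, each with a serializable surrogate of its own.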

Similarly, for the benefit of the runtime formatters, events have to be marked thus:

  [field:NonSerialized]
  public event BlahHandler Blah;

The field: qualifier causes the compiler to apply the attribute to the underlying multicast delegate field, which a formatter would otherwise try to serialize, dragging every subscriber along with it. An event is not serializable as such.

Still, after only half a day, I realized that my effort to leverage .NET's handy code generation for XML serialization was adding more cruft to an already crufty code base, that any unit tests I tried to create as motivation for my new code were compromised by the low quality of the code I was adding to, and that I would have to version these additions without knowing what effect they would have on the application as a whole...that I was driving blind, in other words. Our XML wizard advised me simply to write all my custom serialization code by hand (via the ISerializable interface)...and at that point, I knew I was getting into pissing-match territory. I simply don't feel any need to prove myself by writing this code. I'm perfectly happy accepting the output of the VS2003 xsd.exe utility. The only obstacle was that I didn't have an XML schema to work from! All I had was the very unorthodox custom serialization code already in place, which navigated a DOM, thereby forfeiting the performance advantage that custom serialization is supposed to provide.
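For reference, the generation step itself is a one-liner at the VS2003 command prompt (the schema file name here is hypothetical; /classes and /namespace can be abbreviated /c and /n):

  xsd.exe Domain.xsd /classes /namespace:Legacy.Mementos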

So I decided to use the memento pattern: treat the XML as a persistence layer; generate a .cs file from a schema that I would create incrementally, based on what I could learn about the object graph; and interpose a factory class that can populate the memento from the object graph, or vice versa. Now all my serialization code is in one place, and testable as such, and I need never break the existing code. All I have to do is deserialize the XML, serialize it again, load both documents into a DOM, and determine what was lost during the round trip (see the sketch below). Then I extend the XML schema, regenerate the .cs file for the memento, and repeat as necessary. I sidestep the existing deserialization code and keep extending my factory until it creates an object graph that works, or seems to work. This latter step is obviously the difficult one, since I don't really know what the application requires in order to work properly! Ultimately, my factory has to reverse-engineer the existing code, but I can avoid destabilizing that code by keeping the factory completely discrete, in a class of its own in a separate file, and by writing unit tests in a strict A/B fashion (i.e., comparing the object graph produced by my factory with the one produced by the legacy code).
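Concretely, the loss-detection step can start as bluntly as the following sketch (the memento type and file names are hypothetical; DomainMemento stands in for whatever xsd.exe generates, and a real comparison would walk the two DOMs node by node rather than eyeball them):

using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class RoundTripCheck {
  // Deserialize the legacy XML into the generated memento, serialize it
  // again, and load both documents so the losses can be inspected.
  public static void Compare(string path) {
    XmlSerializer serializer = new XmlSerializer(typeof(DomainMemento));
    DomainMemento memento;
    using (StreamReader reader = new StreamReader(path)) {
      memento = (DomainMemento) serializer.Deserialize(reader);
    }

    StringWriter buffer = new StringWriter();
    serializer.Serialize(buffer, memento);

    XmlDocument original = new XmlDocument();
    original.Load(path);
    XmlDocument roundTripped = new XmlDocument();
    roundTripped.LoadXml(buffer.ToString());
    // Whatever appears in original but not in roundTripped was lost in
    // the round trip; that is the next thing to add to the schema.
  }
}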

Then I dump the legacy code and open the champagne!

Tuesday, June 20, 2006

Running Fit via Visual Studio

Robert C. Martin has written, "My attitude is that every time I must fire up a debugger, I have failed. Perhaps I have failed to make my code so clear that I don't need a debugger to understand it. Perhaps I have failed to work in cycles that are so small that I don't need a debugger to find out what went wrong. Whatever the reason, when I am forced to use a debugger it means that I need to adjust my practices so that I can avoid using a debugger next time." I agree with him wholeheartedly. The necessity of understanding legacy code (i.e., code not covered by tests) is often given as a counterexample, but the more time I spend debugging code I didn't write, the more I'm moved to bring it under test, for which I've found FitNesse indispensable. Ward Cunningham has likewise recommended writing exploratory tests where many developers would be inclined to single-step through code they haven't seen before.

Nonetheless, I've felt the need this week to single-step through some legacy code that is exercised by tests I wrote in FitNesse. I willingly concede that this feeling is evidence of some kind of failure, perhaps my own and not just the legacy coders'. The appropriate FitNesse documentation was not entirely clear to me, perhaps because I needed to better understand how FitNesse works in the first place: it seems obvious now that the test runner has to fetch the HTML representing the tests from FitNesse, but I didn't quite grasp that at the time.

The application that actually needs to call my code is C:\FitNesse\dotnet\TestRunner.exe. To have Visual Studio launch TestRunner (single-stepping through your own fixtures, let alone TestRunner or the Fit code itself, takes a little more work), I opened the Property Pages for the project and changed the Debug Mode setting from "Project" to "Program," after which I could specify TestRunner.exe as the Start Application. I then set the Command Line Arguments to -debug -nopaths -v localhost 8080 SerializationSuite.RoundTripSuite Validation.dll, though not all of that is required. The host, port, and page are: my FitNesse server is listening on port 8080, and FitNesse has to provide the tests, which in this case live on the specified page (i.e., I normally see them at http://localhost:8080/SerializationSuite...). -debug and -v merely produce more console output, though that output may give you more insight into what, precisely, TestRunner is doing, and I specified -nopaths to avoid any potential collisions with paths set in FitNesse itself.

Validation.dll is the project output; in this case it contains my custom fixtures (a minimal example appears below). The Validation project references the code I want to bring under test, and those DLLs are copied to the project's output by default, so if I set the Working Directory to the project's bin\Debug directory, TestRunner can find all the DLLs it needs. fit.dll and fitLibrary.dll should already have been installed to the same directory as TestRunner.exe. At this point, I can freely hit F5, just like in the old days, when I was hunting dangling pointers in C++.
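For completeness, the fixtures in Validation.dll are nothing exotic. A minimal column fixture looks something like this (the class, column, and helper names are invented for illustration; the real fixtures delegate to the serialization code under test):

using fit;

public class RoundTripFixture : ColumnFixture {
  public string Xml;        // input column: a fragment of legacy XML

  // output column: Fit compares the return value against the expected cell
  public bool Survives() {
    return RoundTrips(Xml);
  }

  // stand-in for the real call into the serialization code under test
  private bool RoundTrips(string xml) {
    return xml != null && xml.Length > 0;
  }
}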