Monday, June 26, 2006

Generating C# from XML: A Mixed Blessing

I was recently confronted with a considerable legacy application (I use "legacy" the way Michael Feathers does: a legacy application is any application without tests) that at first seemed to have exploited .NET's lovely support for XML serialization: most of the classes in the domain model carried the [Serializable] attribute. That attribute, however, governs binary and SOAP formatting rather than XmlSerializer, and it turned out that I could not simply instantiate an XmlSerializer for any of these classes, since they all had public properties of type System.Type (which already suggests that the developers who preceded me had some peculiar ideas about OO) or IDictionary, neither of which XmlSerializer can handle. Now, of course, I could have worked around this with only a minimal change to the design, in the following fashion:

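[Serializable]
public class LegacyClass {
  // [NonSerialized] is legal only on fields; XmlSerializer honors [XmlIgnore].
  [XmlIgnore]
  public Type ValueType { ... }

  // Enum stand-in that XmlSerializer can read and write.
  public TypeEnum ValueTypeEnum {
    get { /* derive the enum value from the ValueType property */ }
    set { /* likewise, map the enum value back onto ValueType */ }
  }
}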
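To make the workaround concrete, here is a minimal runnable sketch of the same idea. The members of TypeEnum, the three supported types, and the ProxyDemo helper are all assumptions made up for illustration; the legacy code base presumably had its own mapping.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical enum; the real TypeEnum members are not given in the post.
public enum TypeEnum { String, Int32, DateTime }

public class LegacyClass
{
    private Type valueType = typeof(string);

    // XmlSerializer cannot serialize System.Type, so keep it out of the XML.
    [XmlIgnore]
    public Type ValueType
    {
        get { return valueType; }
        set { valueType = value; }
    }

    // Serializable proxy: reads and writes the same backing field.
    public TypeEnum ValueTypeEnum
    {
        get
        {
            if (valueType == typeof(int)) return TypeEnum.Int32;
            if (valueType == typeof(DateTime)) return TypeEnum.DateTime;
            return TypeEnum.String; // assumed fallback
        }
        set
        {
            switch (value)
            {
                case TypeEnum.Int32: valueType = typeof(int); break;
                case TypeEnum.DateTime: valueType = typeof(DateTime); break;
                default: valueType = typeof(string); break;
            }
        }
    }
}

public sealed class ProxyDemo
{
    private ProxyDemo() { }

    // Serializes to a string and back again, returning the rebuilt object.
    public static LegacyClass RoundTrip(LegacyClass source)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(LegacyClass));
        StringWriter writer = new StringWriter();
        serializer.Serialize(writer, source);
        StringReader reader = new StringReader(writer.ToString());
        return (LegacyClass)serializer.Deserialize(reader);
    }
}
```

Setting ValueType to typeof(int) and calling ProxyDemo.RoundTrip should hand back an object whose ValueType is again typeof(int), because only the enum proxy travels through the XML.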

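Events need similar treatment when a [Serializable] class goes through the binary or SOAP formatter (XmlSerializer looks only at public fields and properties, so it never sees them). They have to be marked thus: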

  [field:NonSerialized]
  public event BlahHandler Blah;

The field qualifier causes the compiler to apply the attribute to the underlying multicast delegate field. An event is not serializable as such.
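One way to see the effect of the field: target is to inspect the compiler-generated backing field through reflection. This is only a sketch: the BlahHandler signature and the Publisher class are invented here, and the lookup assumes the backing field shares the event's name, which is how the Microsoft C# compilers behave but is an implementation detail. Recent .NET versions may also flag IsNotSerialized with an obsoletion warning.

```csharp
using System;
using System.Reflection;

// Assumed signature; the post does not show how BlahHandler is declared.
public delegate void BlahHandler(object sender, EventArgs e);

[Serializable]
public class Publisher
{
    // "field:" moves the attribute from the event onto its backing delegate field.
    [field: NonSerialized]
    public event BlahHandler Blah;

    public void RaiseBlah()
    {
        BlahHandler handler = Blah;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

public sealed class EventFieldDemo
{
    private EventFieldDemo() { }

    // True when the private backing field carries the NotSerialized metadata flag.
    public static bool BackingFieldIsNonSerialized()
    {
        FieldInfo field = typeof(Publisher).GetField(
            "Blah", BindingFlags.Instance | BindingFlags.NonPublic);
        return field != null && field.IsNotSerialized;
    }
}
```

Without the field: target, the attribute would be aimed at the event itself, where [NonSerialized] is not permitted, and the compiler rejects it.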

Still, after only half a day, I realized that my effort to leverage .NET's handy code generation for XML serialization was adding more cruft to an already crufty code base. Any unit tests I tried to create as motivation for my new code were compromised by the low quality of the code I was adding to, and I would have to version these additions without knowing what effect they would have on the application as a whole. I was driving blind, in other words. Our XML wizard advised me simply to write all my own custom serialization code (via the ISerializable interface), and at that point I knew I was getting into pissing-match territory. I simply don't feel any need to prove myself by writing this code; I'm perfectly happy accepting the output from the VS2003 xsd.exe utility. The only obstacle was that I didn't have an XML schema to work from! All I had was the existing and very unorthodox custom serialization code, which navigated a DOM and thus gave up the performance advantage that custom serialization is supposed to bring.

So I decided to use the memento pattern: treat XML as a persistence layer, generate a .cs file from a schema that I would have to create incrementally, based on what I could learn about the object graph, and interpose a factory class that could populate the memento from the object graph or vice versa. Now all my serialization code is in one place, and testable as such, and I need never break the existing code. All I have to do is deserialize the existing XML into the memento, serialize the memento again, load the result into a DOM, and determine what was lost during the round trip. Then I extend the XML schema, regenerate the .cs file for the memento, and repeat as necessary. I sidestep the existing deserialization code, and continue extending my factory until it creates an object graph that works, or seems to work. This latter step is obviously the difficult one, since I don't really know what the application requires in order to work properly! Ultimately, my factory has to reverse-engineer the existing code, but I can avoid destabilizing that code by keeping the factory as a completely discrete class in a separate file, and by writing unit tests in a strict A/B fashion (i.e., comparing the object graph produced by my factory with the one produced by the legacy code).
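The shape of this arrangement can be sketched in a few dozen lines. Everything named here is hypothetical: LegacyWidget stands in for a real domain class, WidgetMemento is written by hand in roughly the public-field style xsd.exe emits rather than generated from an actual schema, and SameGraph is a deliberately naive version of the A/B comparison.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for a legacy domain class.
public class LegacyWidget
{
    public string Name;
    public Type ValueType; // the kind of member XmlSerializer rejects
}

// Memento: hand-written here; in the workflow above xsd.exe would generate it.
[XmlRoot("widget")]
public class WidgetMemento
{
    [XmlElement("name")]
    public string Name;

    [XmlElement("valueType")]
    public string ValueTypeName;
}

// Factory: the only code that knows both the legacy shape and the XML shape.
public sealed class WidgetFactory
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(WidgetMemento));

    private WidgetFactory() { }

    public static WidgetMemento ToMemento(LegacyWidget widget)
    {
        WidgetMemento memento = new WidgetMemento();
        memento.Name = widget.Name;
        if (widget.ValueType != null)
            memento.ValueTypeName = widget.ValueType.AssemblyQualifiedName;
        return memento;
    }

    public static LegacyWidget FromMemento(WidgetMemento memento)
    {
        LegacyWidget widget = new LegacyWidget();
        widget.Name = memento.Name;
        if (memento.ValueTypeName != null)
            widget.ValueType = Type.GetType(memento.ValueTypeName);
        return widget;
    }

    public static string Save(LegacyWidget widget)
    {
        StringWriter writer = new StringWriter();
        Serializer.Serialize(writer, ToMemento(widget));
        return writer.ToString();
    }

    public static LegacyWidget Load(string xml)
    {
        WidgetMemento memento =
            (WidgetMemento)Serializer.Deserialize(new StringReader(xml));
        return FromMemento(memento);
    }

    // A/B check: does the rebuilt graph match the one we started from?
    public static bool SameGraph(LegacyWidget a, LegacyWidget b)
    {
        return a.Name == b.Name && a.ValueType == b.ValueType;
    }
}
```

In a unit test, the "A" graph would come from the legacy deserialization code and the "B" graph from WidgetFactory.Load on the same XML; whenever SameGraph reports a mismatch, the schema and the factory grow by one more element.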

Then I dump the legacy code and open the champagne!
