Saturday, December 08, 2007

Eight Queens in F#

Man, I'm really going back to school here. The last time I did this problem was when I was learning Prolog. Let's say you're trying to place a queen in the first column. Prolog's select function will pick one of the rows, and create a branch to explore the remaining rows. If row 3 was selected, say, you don't have to write any code to concatenate [1..2] and [4..8], in order to pass that smaller set of unoccupied rows to the next iteration / recursion. Very handy! No loops, no list operations, etc.
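For comparison, here's a rough sketch (mine, written in F# rather than Prolog, and purely illustrative) of what select hands you for free: each element paired with the list that remains once it's removed, one pairing per branch. The solution below doesn't actually need it, because it just re-checks safety against the whole partial answer instead.

// Illustrative only: each element of the list, paired with the leftovers.
let rec selections l =
    match l with
    | [] -> []
    | x :: xs -> (x, xs) :: List.map (fun (y, ys) -> (y, x :: ys)) (selections xs)

// selections [1; 2; 3];;
// val it : (int * int list) list = [(1, [2; 3]); (2, [1; 3]); (3, [1; 2])]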

> #light;;
> type Position = int * int;;

type Position = int * int

> let safe p q =
- fst p <> fst q && snd p <> snd q && fst p + snd p <> fst q + snd q && fst p - snd p <> fst q - snd q;;

val safe : int * int -> int * int -> bool

> let all_safe p positions =
- Seq.for_all (safe p) positions;;

val all_safe : int * int -> #seq<int * int> -> bool

> all_safe (1, 2) [(1, 2)];;
val it : bool = false
> all_safe (1, 2) [(3, 3)];;
val it : bool = true

> let range = [1..8];;

val range : int list

> range;;
val it : int list = [1; 2; 3; 4; 5; 6; 7; 8]
> let queens size =
- let range = [1..size]
- let rec queens' columns answers =
- match columns with
- | [] -> answers
- | column::tail -> queens' tail [for answer in answers for position in [for row in range -> (column, row)] when all_safe position answer -> position::answer]
- queens' range [[]];;

val queens : int -> (int * int) list list

> queens 4;;
val it : (int * int) list list
= [[(4, 3); (3, 1); (2, 4); (1, 2)]; [(4, 2); (3, 4); (2, 1); (1, 3)]]
>

Implemented recursively, without the accumulator, it's:

> let queens size =
- let range = [1..size]
- let rec queens' columns =
- match columns with
- | column::tail -> [for answer in (queens' tail) for position in [for row in range -> (column, row)] when all_safe position answer -> position::answer]
- | _ -> [[]]
- queens' range;;

val queens : int -> (int * int) list list

> queens 4
- ;;
val it : (int * int) list list
= [[(1, 3); (2, 1); (3, 4); (4, 2)]; [(1, 2); (2, 4); (3, 1); (4, 3)]]
>

Friday, December 07, 2007

F#, finally!

I've been champing at the bit to actually write some F#, but haven't been willing to risk sneaking it into production at the office, and unable to put in extra time at home to learn it. I have read Robert Pickering's Foundations of F# while commuting, but that's not verifiable experience in functional programming. As to why I would care whether I learn functional programming or not, let me plug the FringeDC interest group, founded by the estimable Dr. Conrad Barski. Not only have I found the members to be charming and personable, but they also seem much more incisive about what programming is than most developers I meet. They're not, for the most part, doing enterprise-scale stuff, where decisions about programming languages, database vendors, and so on would probably be taken out of their hands. Unless they're Paul Graham, who got rich writing LISP. Good for him! Anyway, lots of smart people write Java, or even VB. Nonetheless, the kicks in the butt that I've gotten at FringeDC, as well as from graduate classes at George Mason U., have led me to feel that what would do me the most good, and what I would most enjoy, is something more fundamental, theoretical, than spending every weekend preparing for MCAD exams. Different strokes.

Alright, enough about that. Languages that offer a REPL seem really suited to scripting. Yes, my friends at FringeDC would probably do everything in Emacs Scheme, but...I just don't know how anymore. Which I'm not proud of. Visual Studio has actually spoiled me to the point where I wouldn't know how to run some XML through an XSLT from the command line. I would actually have to take time out to figure it out. OK, really enough. Here's my task: I've got some XML describing the expected return values from some linear programming models, and it's tacked on top of some older regression testing scripts that just list the files to be solved by particular solvers. The XML file is called knownSolutions.xml, and the scripts are called solveBatch.cplex.txt and solveBatch.lpSolve.txt. Here's what I want to do: load the XML file, and, when a particular solver is not supposed to solve a model, add a corresponding attribute to the element. I've changed the schema:
<xs:attribute name="SkipSolvers" type="solverTypeList" use="optional">
...
<xs:simpleType name="solverType">
<xs:restriction base="xs:string">
<xs:enumeration value="lpSolve" />
<xs:enumeration value="Cplex" />
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="solverTypeList">
<xs:list itemType="solverType" />
</xs:simpleType>
So if we have the XML:
<KnownSolutions>
<Model>
<File>TestModels\JasonCh3Net.xml</File>
<ObjectiveValue>196200</ObjectiveValue>
</Model>
</KnownSolutions>
...then I'd like to add attributes like so:
<Model SkipSolvers="lpSolve" ... />

So I need to load the XML from the F# console:

> open System.Xml;; // Import the assembly and the corresponding namespace.
// The F# console waits for two semicolons before it interprets
// your statement.

> let document = new XmlDocument();; // Infer the variable's type.

val document : XmlDocument

> document.Load @"knownSolutions.xml";;
val it : unit = () // The Load() method returns void, in other words.

Now at this point I've got to make some decisions about, you know, algorithms and data structures. I've got a tree—XML—and I've got a list of files that I can't assume are sorted. So why don't I read them in, filter out the empty lines, sort them, and then iterate across the XML's elements looking for files that aren't in the list for Cplex. Since I'm going to be looking things up in the list, I'll make a set out of it first. I should point out that, in contrast to imperative programming, we do not want to open the file and loop over the lines. We want to treat the file as a sequence of lines, no different than IEnumerable<string>. I say "we," but I mean "I": I just don't want to write nested loops anymore. But how do I deal with the EOF? I use a "generator" that has a special way of "terminating" the sequence of lines it reads. This is the "pipes and filters" idiom familiar to UNIX folks. F#'s Seq module provides a generate_using function that thoughtfully calls Dispose() on the object created at the beginning of the generation.
> let opener () = File.OpenText("something.txt");;

val opener : unit -> StreamReader
> opener;;
val it : (unit -> StreamReader) = <fun:clo@0_3>
What that means is that opener is a function with no arguments (if I'd left off the parentheses before the equals sign, F# would have assigned the return value of OpenText() to opener, which is not what I want) that knows how to return a StreamReader. generate_using calls opener, takes the StreamReader it produces, and passes it repeatedly to the actual generator, ergo:

> let liner (reader : StreamReader) =
- if reader.EndOfStream then None
- else Some(reader.ReadLine());;

val liner : StreamReader -> string option

option means "either a string, or nothing." Don't think of it as null; F# doesn't really have null. None will mark the end of the generated sequence. generate_using keeps sending the stream as an argument to the "closure" liner, and liner decides whether it's got something worth returning; if not, it signals to the next thing in line, "I'm done." I don't need to point out how parallelizable this "pipes and filters" idiom is; whatever is waiting for a line of text only needs one line to do its job. Anyway, now I'll filter out the empty strings and put the rest in a hashed set for fast lookup (I could also just remove the empty string from the hashed set after populating it). I'll create the filter separately, rather than inline:

> let filter s = (s <> "");;

val filter : string -> bool

The Seq.filter function requires a predicate that takes a sequence element and returns a Boolean value, in this case false if the string is empty, so that empty strings are dropped. In order to create a set that's initialized to include all the values in the generated sequence, I do:

> let lpSolveSet = HashSet<string>.Create(
- Seq.generate_using opener liner
- |> Seq.filter (fun s -> (s <> ""))
- );;

val lpSolveSet : HashSet<string>

Whew! I don't know if F#'s type inference would have let me leave out the <string>. Putting it together, as I suppose I could have done in 5 minutes if I were a PowerShell guru:

> Seq.generate_using (fun () -> File.OpenText @"solveBatch.lp_solve.txt")
- (fun reader -> if reader.EndOfStream then None else Some(reader.ReadLine()))
- |> Seq.filter (fun s -> (s <> ""))
- |> Seq.map (fun s -> lpSolveSet.Add(s));;
val it : seq<unit> = seq [null; null; null; null; ...]
> lpSolveSet;;
val it : HashSet<string>
= seq
["JasonCh3Net.xml"; "JohnCh4Net.xml"; "OffsetNet.xml";
"WithinClauseTestNet.xml"; ...]
>

In other words, generate a sequence of lines from a file, pass them through a filter that removes empty strings, and send the resulting sequence to the HashSet's constructor. I could also have piped the filtered sequence to a closure that would add each string to the set in turn.
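Something like this, say (a sketch of that variant, reusing the same pipeline; it relies on this HashSet's Add returning unit, which the seq<unit> output above suggests it does):

// The Seq.iter variant: same pipeline, but the side effect is explicit
// and nothing is accumulated along the way.
Seq.generate_using (fun () -> File.OpenText @"solveBatch.lp_solve.txt")
                   (fun reader -> if reader.EndOfStream then None else Some(reader.ReadLine()))
|> Seq.filter (fun s -> s <> "")
|> Seq.iter (fun s -> lpSolveSet.Add(s));;

Whatever. Now...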

> let models = document.SelectNodes "/KnownSolutions/Model";;

val models : XmlNodeList

> models;;
val it : XmlNodeList
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>

Oh, so it's a sequence! Now I'm rolling, so:

> Seq.hd models;;

Seq.hd models;;
-------^^^^^^^

stdin(131,7): error: FS0001: The type XmlNodeList is not compatible with the type seq<'a>
stopped due to error
Seq.hd ("head") is like LISP's car, btw. Hm, so XmlNodeList isn't a seq<'a> after all; it only implements the non-generic IEnumerable. Embarrassing. I scratched my head for a few minutes, then realized that I should use a sequence comprehension:

> { for o in models -> o :?> XmlElement };;
val it : seq<XmlElement>
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>
I apply the downcast operator to each element; now I have a strongly typed sequence, so F# can use type inference on the next element in the chain:

> let attrAdder (e : XmlElement) =
- if not (lpSolveSet.Contains(e.InnerText)) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr)
- end
- end;;

e.Attributes.Append(attr)
-----------------^^^^^^^^^^^^^

stdin(87,17): error: Type constraint mismatch. The type
XmlAttribute
is not compatible with type
unit.
The type XmlAttribute is not compatible with the type unit
stopped due to error
What? I scratched my head over this one. In the end, the error message makes perfect sense: because there's no "alternative," the "consequent" is not allowed to return a value, and XmlAttributeCollection.Append() does. So we have to pipe it into oblivion, hence:

> let attrAdder (e : XmlElement) =
- if not (lpSolveSet.Contains(e.InnerText)) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr) |> ignore
- end
- end;;

val attrAdder : XmlElement -> unit
So this is the little function object that will add the required attribute to the element if the lpSolve solver is not supposed to solve it. Then:

> { for o in models -> o :?> XmlElement } |> Seq.iter attrAdder;;
val it : unit = ()

And lo and behold, I get this:

> {for node in document.SelectNodes "/KnownSolutions/Model" -> node :?> XmlElement } |> Seq.iter attrAdder;;
System.ArgumentException: The named node is from a different document context.
at System.Xml.XmlAttributeCollection.Append(XmlAttribute node)
at FSI_0014.attrAdder(XmlElement e)
at FSI_0074.it@150.Invoke(XmlElement e@51_6)
at Microsoft.FSharp.Collections.Seq.iter@868.Invoke(IEnumerator`1 e@90_1)
at Microsoft.FSharp.Core.Operators.using[T,U](T ie, FastFunc`2 f)
at Microsoft.FSharp.Collections.Seq.iter[T,U](FastFunc`2 f, U ie)
at <StartupCode>.FSI_0074._main()
stopped due to error
After about an hour, I realized that the function object that's adding the attributes is a closure: it encapsulates a reference to a previous value of document, so the attribute is being created by one XmlDocument and appended to an element that belongs to another. That's what "closure" means, silly! I create a new closure that refers to the new XML document and rerun.
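Roughly, the rerun looked like this (a sketch, not a verbatim transcript; it assumes lpSolveSet and the freshly loaded document are still in scope):

// Rebuild the closure so that it captures the *current* document: then
// CreateAttribute() and Append() both refer to the same XmlDocument instance.
let attrAdder (e : XmlElement) =
    if not (lpSolveSet.Contains(e.InnerText)) then
        let attr = document.CreateAttribute("SkipSolvers")
        attr.Value <- "lpSolve"
        e.Attributes.Append(attr) |> ignore

{ for node in document.SelectNodes "/KnownSolutions/Model" -> node :?> XmlElement }
|> Seq.iter attrAdder;;

And I'm done, almost: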

> document.Save @"knownSolutionsImproved.xml";;

Friday, November 30, 2007

Now it's Chinese music...

Yep, I've listened to nothing but flamenco for several weeks. Then I wanted to see if YouTube had any clips of Sabicas, and saw that someone had posted a comment comparing him to Liu Fang. I don't quite see the comparison, except in terms of their virtuosity, but if you've been dying to hear a performance of "King Chu doffs his armor," here it is.

Saturday, November 17, 2007

The FBI in Movies, the FBI in Reality

In Broken, Richard Gid Powers suggests that the FBI has never actually had a mission, and many of its failures can be attributed to its flailing about in the effort to find one. J. Edgar Hoover's insistence on chasing a tiny number of radicals while completely ignoring the much more serious problem of organized crime was not simply due to his cynicism; it was also due to the confusion in the definition of a federal crime, and his unwillingness to embarrass himself by trying and failing to solve an actually difficult problem. It was also the beginning of the FBI's self-promotion through popular entertainment ("The Untouchables" and all that).

Now, I'm a sucker for crime shows, but there is a powerful whiff of bullshit in the depiction of the profiling of serial killers in such movies as The Silence of the Lambs (not to mention something really repulsive, and factually incorrect, in the depiction of them as evil geniuses in SotL, the contemptible Seven...I could go on), and that whiff emanates from John Douglas, the FBI's "eminent" profiler, the model for SotL's Jack Crawford. Malcolm Gladwell recently argued in The New Yorker that what these profilers do is not so different from the "cold readings" that psychics such as John Edward do, namely, spew enough predictions that some of them have to stick. Even about such simple, empirically verifiable aspects of a killer's identity as, oh, his age, his skin color, his intelligence...Douglas is usually wrong. Even if he were right, how would his colleagues go about looking for an unusually intelligent, unmarried white man whom no evidence links to the crime? Profiling's basis in research is rather weak; imprisoned serial killers are *not* reliable sources, and the more intelligent ones have ample opportunity to construct ex post facto justifications for details of the crime if that will keep some psychologist talking to them. Even if this were not so, "research" on imprisoned serial killers has not been conducted according to proper research protocols, and the findings would be impossible to replicate. Gladwell: "Not long ago, a group of psychologists at the University of Liverpool decided to test the FBI's assumptions [that a criminal typology can be deduced from crime-scene details]...When they looked at a sample of a hundred serial crimes, however, they couldn't find any support for the FBI's distinction [between 'organized' and 'disorganized' killings and therefore killers]. Crimes don't fall into one camp or the other. It turns out that they're almost always a mixture of a few key organized traits [emphasis mine—TN] and a random array of disorganized traits." Laurence Alison, one of the leaders of the Liverpool group and the author of The Forensic Psychologist's Casebook, puts it this way: "The whole business is a lot more complicated than the FBI imagines."

Unit-Testing XML, XSLT

OK, I posted my suggested design for an XML fixture to the MbUnit user's group. I ought to use this blog for self-promotion sometimes.

Monday, November 05, 2007

MbUnit vs. NUnit, the cage match

I've happily used NUnit for 4 years now, and never questioned the design. As you may know, NUnit diverges from the xUnit model by using metadata rather than inheritance to indicate test fixtures, their setup, and their tests: "With NUnit 2.0, we introduced the use of attributes to identify both fixtures and test cases. Use of attributes in this way was a natural outcome of their presence in .NET and gave us a way of identifying tests that was completely independent of both inheritance and naming conventions" (see here). There are times, however, when this new model doesn't quite serve. I have quite a few test fixtures that, as part of their setup, need to perform a similar set of operations: say, read a file, create some sort of data structures, run an expensive Solve() method, etc. I can do this by deriving my fixture classes from an abstract base class that spells out a "template pattern": the real fixture classes implement a handful of abstract methods or properties, and the base class's public method, marked with [TestFixtureSetUp] or [SetUp], calls them and so encapsulates the pattern. In this case, the derived class would implement public string ModelPath { get { return @"C:\Documents and Settings\...\model.xml"; } }, and the abstract class's fixture setup would call this.Load(this.ModelPath). You get the idea. Anyway, I'm not totally happy with this. I don't like the coupling within a class hierarchy that the template pattern entails; I end up with too many, only slightly different, ad hoc abstract base classes that no one else understands; and there are still things I can't do this way.
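Concretely, the arrangement looks something like this minimal sketch (the Model type and its Load(), Solve(), and ObjectiveValue members are made-up stand-ins, not code from the actual project):

using NUnit.Framework;

// Sketch of the template-pattern base fixture described above.
public abstract class SolvedModelFixtureBase
{
    protected Model model;

    // Each concrete fixture only has to say which file it wants solved.
    protected abstract string ModelPath { get; }

    [TestFixtureSetUp]
    public void LoadAndSolve()
    {
        // The expensive, shared setup lives here, once per fixture.
        this.model = Model.Load(this.ModelPath);
        this.model.Solve();
    }
}

[TestFixture]
public class JasonCh3NetFixture : SolvedModelFixtureBase
{
    protected override string ModelPath
    {
        get { return @"TestModels\JasonCh3Net.xml"; }
    }

    [Test]
    public void ObjectiveValueIsExpected()
    {
        Assert.AreEqual(196200.0, this.model.ObjectiveValue, 0.001);
    }
}

The expensive work happens once per fixture and each derived class states only what varies; the price is exactly the base-class coupling complained about above.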

Now, I have to admit that I haven't yet explored NUnit's extensibility. My current task is to write a rather complex XSLT involving three different Muenchian groupings, and I really wanted to be able to do it incrementally, in TDD fashion. I do sort of like stepping through the transform in Visual Studio's debugger, as that's taught me a lot about XML, but I need repeatable tests. XsltUnit hasn't been updated in four years, nor has XmlUnit. The former might be suitable for a real XMLer, but I'm not one, and I don't want to displace my difficulties onto a tool. XmlUnit doesn't handle namespaces, anyway, whereas I can't avoid them. Poking around further, I stumbled on MbUnit, which does include XML diffing. That turned out not to be what I wanted, but I did learn some interesting things. MbUnit still provides the expected attributes, including the ever-handy [ExpectedException(...)], but the whole test pattern is really described in terms of attributes. A test runner finds a test fixture's attributes, and has the appropriate invoker create the tests. This allows test methods to accept parameters, for example; the invoker simply uses reflection to call the method with an argument described by the method's attributes. You can find more information at Peli's Farm, and at the Advanced Unit Test project, but I wouldn't say that the documentation is stellar. Anyway, in order to begin to be more of a radical, and less of an early adopter, I downloaded the source code, went back and forth between it and my requirements, trying to figure out how to implement them.

So this is what I ended up with:


using System.Xml.XPath;
using MbUnit.Framework;
using Omega.NetworkModel.Tests;

namespace Omega.NetworkModel.Tests
{
    /* A fixture factory should load this document:

    <?xml version="1.0" encoding="utf-8"?>
    <Root>
      <Element Integer="1"></Element>
    </Root>

    */
    [MyXmlTestFixture("document.xml")]
    public class TestXmlWithoutTransformFixture
    {
        // A test invoker should somehow get its hands on the XPathNavigable,
        // compile this expression, and give me the strongly typed result.
        [MyXPathExpressionTest("/*")]
        public void DoSomething(XPathNodeIterator iterator)
        {
            Assert.AreEqual(1, iterator.Count);
        }

        // count() returns a double, it turns out. The invoker is invoking
        // this method via reflection, so if the XPath expression returns
        // something other than a double, there'll be an error.
        [MyXPathExpressionTest("count(//*)")]
        public void CountElements(double count)
        {
            Assert.AreEqual(2, count);
        }

        // I was surprised that the compiler does *not* expect XML-escaped stuff.
        [MyXPathExpressionTest("count(//*) > 0")]
        public void TryBoolean(bool hasElements)
        {
            Assert.IsTrue(hasElements);
        }
    }

    // The runner should load this document, and transform it. The
    // transformed result is then the basis of the tests.
    [MyXmlTestFixture("document.xml")]
    /*
    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <!-- The default template simply copies the text value. -->
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/">
        <xsl:apply-templates select="*|@*" />
      </xsl:template>
    </xsl:stylesheet>
    */
    [MyXsltTransform("test.xslt")]
    public class TestXmlAfterIdentityTransformFixture
    {
        [MyXPathExpressionTest("/*")]
        public void DoSomething(XPathNodeIterator iterator)
        {
            Assert.AreEqual(iterator.Current.NodeType, XPathNodeType.Root);
        }
    }

    // Transform, and then establish an evaluation context (i.e., namespace mappings!).
    [MyXmlTestFixture("document.xml")]
    [MyXsltTransform("test.xslt")]
    [MyXsltNamespaceMappingDecorator("tnt", "")]
    public class TestXmlAfterIdentityTransformAndNamespaceFixture
    {
        [MyXPathExpressionTest("/tnt:*")]
        public void DoSomething(XPathNodeIterator iterator)
        {
            Assert.AreEqual(iterator.Current.NodeType, XPathNodeType.Root);
        }
    }
}

Thursday, November 01, 2007

Open Spaces & Productivity


I write from a position of utter ignorance here, as for most of the last 5 years I've had quite unpleasant working situations. Nonetheless, I've often been very productive in my windowless basements, sometimes because of their unpleasantness: I had to work harder in order to eliminate them from my awareness. I wasn't always allowed an MP3 player and headphones, either, so it's not like I had any sort of private space. I worked for 15 months at the desk pictured at the right, and I was extremely productive. That doesn't mean that I didn't also hate it. The effort that I had to exert in order to shut out distractions eventually wore me out. Be all that as it may, I really just want to admit that I have no proper basis of comparison to the agile image of an open space of unimpeded communication. I'm not sure I want to share a keyboard with anyone for most of the day, but maybe my irritation will diminish now that I've moved out of the aforementioned dungeon.

At any rate, it is possible to study these things, and Michael Brill does just that. Brill is president of BOSTI Associates, "workplace planning and design consultant in Buffalo, N.Y., and founder of the School of Architecture at the University of Buffalo" (click here). This article contrasts his viewpoint to that of Franklin Becker, director of the International Workplace Studies Program at Cornell University. I don't have time to summarize right now (I just finished my coffee, so my "knife-sharpening" 20 minutes are over), but they both agree that cubicles are bad.

Friday, October 26, 2007

I'm a Libertarian Now!

It was probably inevitable that I would become a libertarian. When I was in graduate school, I was a Gramscian Marxist and a Freudian, as was only proper. Now I write software. Now I'm linking to George Mason University economics professor Tyler Cowen's Marginal Revolution blog. Nice that the guy also writes about art: Markets and Cultural Voices: Liberty vs. Power in the Lives of Mexican Amate Painters. To be honest, most of the libertarians I knew before I got back into software were angry middle-aged engineers, and it was hard not to interpret their convictions as of a piece with their middle-aged guy anger.

Now that I'm rockin' the GMU CS department, maybe I'll meet Cowen.

Sunday, October 21, 2007

Greeted as Liberators

Rory Stewart, who served as governor of Maysan province under the CPA, reviews several biographies of one of his forebears, Gertrude Bell: "The Queen of the Quagmire". Both the review and his book The Prince of the Marshes show a kind of modesty that might be impossible for any American writing about our occupation of Iraq. For Americans, it seems to be all about us. This flaw doesn't invalidate the writing of George Packer, for example, whose essay on our contemptible neglect of Iraqis who've chosen to help the occupiers by serving as translators I highly recommend. Yet Packer was a liberal hawk only a few years ago, and it's very difficult for him, writing now, not to attribute the obvious failure of our occupation (and of his arguments at the time) to the people carrying it out, rather than to a more fundamental cause. In other words, if the CPA had not put the obtuse Paul Bremer in charge, it might have worked.

Gertrude Bell is one of the architects of modern Iraq, if that's something she'd really want to claim credit for at this point. In contrast to the liberal hawk's regret that we didn't put the smartest people in charge, not to mention Edward Said's imputation of a fundamental bad faith, Stewart credits Bell and her colleagues with quite impressive knowledge of Arabic, of tribal power structures, of how to ride a camel, etc.: "Some suggest that the US failure in Iraq is due simply to lack of planning...They should consider Bell and her colleagues, such as Colonel Leachman or Bertram Thomas, a political officer on the Euphrates. All three were fluent and highly experienced Arabists, won medals from the Royal Geographical Society for their Arabian journeys, and were greatly admired for their political work. Thomas was driven from his office in Shatra by a tribal mob. Colonel Leachman, who was famed for being able to kill a tribesman dead in his own tent without a hand lifted against him, was shot in the back in Fallujah." Bell herself has to take the blame for agreeing to tack the Kurdish areas onto this new entity, at subsequent great cost to the Kurds.

It's too easy to hate the smirking fatuity of Donald Rumsfeld. But it's entirely beside the point. There is no objective measure of the depth of our understanding of Iraq. Who's to say that their self-understanding is any better than our understanding of them? The obstacle to building a new Iraq for the Iraqis is not that we don't understand them. The obstacle is that they're just not that into us. That's what it's like to be an occupier.

Friday, October 12, 2007

There's this music called flamenco...

...and on a whim I recently purchased a 12-volume set called The History of Flamenco. I had just read Fernanda Eberstadt's Little Money Street, and I felt I had to hear some. Or a lot. I've always loved the blues, and flamenco is like it in some ways: it's the music of a marginal group, much of its history is mysterious, purity of tone means little while emotional force means everything (in a wonderful phrase, a great flamenco singer is said to have "the voice that wounds"), and there's a violence in it that doesn't always seem metaphorical. On the other hand, flamenco has a much larger range of forms, and however much I may love Charlie Patton, I can't think of a blues guitar virtuoso who comes close to Sabicas. Anyway, I was just surfing for some information on La Paquera, one of whose tracks is included in the above anthology, and stumbled across this video clip of the same woman, now perhaps an abuela, attacking "Dolor de madre mia" with even more force than she had when she was young. I have no idea if this song is actually sad, although the title would suggest that. I was at work, but the hair on my arm stood up on hearing this, not just because of the emotion, which I can't identify, but because of the indomitable force of the music that bursts out of these old folks. Hunt around for some more clips from Carlos Saura's flamenco films. YouTube also has a lot of odd clips from old movies, in which some cantante in a suit strolls onto the set and lets loose (it doesn't hurt to have a young Paco de Lucia on guitar). For me it's like seeing Howlin' Wolf walk into a Doris Day movie and sing "Commit a Crime."

When I saw the first clip, I thought, "I could listen to this all day." Since then, a month ago, I have been. Here, I've done some of the work for you: Paco de Lucia with El Camaron de la Isla, Juanito Valderrama wandering into some weird dance routine, some younger singers and the great Tomatito tearing it up (also from Saura's film), and finally these two old men singing the slowest form, the martinete. Their intonation isn't perfect. Listen, Maria Callas's intonation wasn't perfect.

Ancient Technology

I'm usually skeptical when I watch some History Channel cheapie about ancient trans-Atlantic journeys or whatever, but the evidence in this case is incontrovertible: the ancient Greeks knew how to build clocks, with gears and all. Only two examples exist, but they're too sophisticated to have been the only ones of their kind. How could this knowledge have been lost? John Seabrook suggests a number of reasons: not many people would have possessed such knowledge, and if social upheavals separated them from the opportunity to use the knowledge, the clocks would have ended up as scrap, and melted down; both Greek and Roman culture respected individual bravery too much to commit itself to technologizing warfare; technology may have provided delight rather than profit. Moreover, the knowledge almost surely ended up with the Arabs, whence it returned to Europe in the 14th century, when clocks with gears suddenly (re)appear. It's easy to suspect that the West refuses to admit just how advanced Muslim civilization was, but something more peculiar is going on here: even classicists showed little interest in the Antikythera Mechanism until recently. It's as though we think that only we are capable of technology per se. The Greeks had some impressive mechanisms, but didn't put them to uses we respect.

Country music is actually pretty good!

Merle Haggard, at http://archive.salon.com/people/bc/2000/11/14/haggard/index2.html: "Look at the past 25 years -- we went downhill, and if people don't realize it, they don't have their fucking eyes on," says Haggard. "In 1960, when I came out of prison as an ex-convict, I had more freedom under parolee supervision than there's available to an average citizen in America right now. I mean, there was nobody going to throw you down on the side of the road spread-eagled, and look up your butt for a fucking marijuana cigarette. God almighty, what have we done to each other?"

Wednesday, October 10, 2007

I give up!

I started this blog about a year ago because I thought it was necessary to my self-promotion. In addition to going to work every day, and doing a good job, and attending conferences, and reading books about CS generally, whatever language I'm using (XSLT, F#, whatever), whatever platform I'm on (.NET for most of the last 4 years), etc., I need to blog about the cool thing I'm doing, and get some C# MVP finally to add me to his blogroll. Real life got in the way, though, as it will, esp. in the form of a new baby. I've had just enough time to do what needed to be done, and none left over to describe to the recruiters at Google who, one imagines, are reading all these blogs, to explain how well I did it. So I'm going to start using this blog the way most people do, i.e. to quote big chunks of stuff I read elsewhere, give shout-outs to my friends, post cute toddler photos, etc.

Anyway, the big chunk that I want to regurgitate undigested is from The New Yorker, May 14, 2007. James Surowiecki, whose The Wisdom of Crowds might have appealed to the same folks I see reading Radicals for Capitalism on the bus, argues that it might be better for the whole if a certain part, namely companies who own patents, makes less money. I suppose that's an instance of the radicalism of capitalism; emergent capitalist economies might require a certain amount of theft, just as ours did. I'm not an ideologue of open-source software; I actually don't care about Linux one way or another, just as I don't care about Microsoft. Anyway, "History suggests that after a certain point tougher intellectual property rules yield diminishing returns. Josh Lerner, a professor at Harvard Business School, looked at a hundred and fifty years of patenting, and found that strengthening patent laws had little effect on the number of innovations within a country. And, in the US, stronger patent protections for things like software have had little or no effect on the amount of innovation in the field. The benefits of stronger IP protection are even less convincing when it comes to copyright law: there's little evidence that writers and artists are made more productive or creative by the prospect of earning profits for seventy years after they die, and the historical record suggests only a tenuous connection between stronger IP laws and creative output."

One would expect that at some point China will have to have strong laws to prevent Chinese entrepreneurs from stealing from each other. You're certainly not going to read some ponderous clash-of-civilizations crap on my blog. Maybe China is culturally incapable of respecting the rule of law, in the way that's bred into the Anglo-Saxon bone. Maybe not. "The great irony is that the US economy in its early years was built in large part on a lax attitude toward intellectual property rights and enforcement [ditto for the Dutch and English economies in the two centuries before that]. As the historian Doron Ben-Atar shows in his book Trade Secrets, the Founders believed that a strict attitude toward patents and copyright would limit domestic innovation and make it harder for the US to expand its industrial base. American law did not protect the rights of foreign inventors or writers, and Secretary of the Treasury Alexander Hamilton, in his famous Report on Manufactures, of 1791, actively advocated the theft of technology and the luring of skilled workers from foreign countries. Among the beneficiaries...was the American textile industry, which flourished thanks to pirated technology."

Wednesday, March 14, 2007

I really hate Wes Anderson

From James Wolcott's blog: "When I was watching Wes Anderson's beyond-precious and unjustifiably proud-of-itself The Life Aquatic, there was a scene where a three-legged dog is whimpering on deck and Jeff Goldblum, playing the suave villain, asks Bill Murray the dog's name. Murray tells him and (off camera) we hear a loud smack as Goldblum hits the dog to stop its whimpering. I can't recall if Murray changed expression or not, since that wasn't a high priority for him here. The minor cruelty doled out to an animal for a cheap laugh was compounded when the dog was left behind at the end splashing in the water as the cast of overpaid actors vying to outhip each other departs."

Wolcott puts it better than I could. Jason Schwartzman gave Rushmore some real passion, and the film deserved the praise it got, but my jaw clenched while watching The Royal Tenenbaums. Not only was I both bored and irritated, Anjelica Huston looked bored and irritated, the usually ill-used Danny Glover looked bored and irritated, Gwyneth Paltrow looked bored and irritated (though that may just be Gwyneth)...only Gene Hackman didn't seem to be saying, "Dig how adorable I am in my ineffectuality." I know that Wes Anderson has already anticipated my belligerent response to his WASP privilege (check out the bowtie) by his irony, but that just makes me hate him more. Which, of course, lets him come out on top!

Would the aimless but amusing banter of his films be possible without the equally problematic Quentin Tarantino, who similarly dares you to disdain his shallowness? I saw Pulp Fiction on a college campus, surrounded by undergraduates (I was about 34 at the time), and I might have been the only person in the theater not to laugh when Marvin (the nigger, in case you've forgotten) gets his brains blown out! This gives Quentin the opportunity to appear in his own film and try to say the word "nigger" in good conscience, though he's so sweaty and uncomfortable doing it that ofay college students were not moved to emulate him, as far as I know. I realize that, having admitted that I was not enchanted by Pulp Fiction, I will never be considered cool, but that's what his film is for: to separate the cool from the uncool (cf. this). What I'm getting at is that Wes Anderson is sort of the upper-class Tarantino, complex in an infuriatingly shallow if not selfish way, and deliberate in his provocation, which is intended to make you lose your cool and leave him on top.

Friday, March 09, 2007

CruiseControl.NET's Filtered Trigger

I have to say that I found the documentation on this feature somewhat baffling, but that might just be me. We use Subversion here, and I'd like the build not to run if the only files committed since the last build are the ones that the last build created, e.g. the version.txt file, the generated CommonAssemblyInfo.cs into which the incremented build number is written, the backups of CruiseControl.NET's various .config and .xsl files, etc. Feel free to write me if my example doesn't help you out. The problem that cost me close to a day is that relative paths within the <pathFilter> blocks don't seem to be resolved as I expected. See my comments...


<cruisecontrol>
  <project name="Our Project 3.0">

    <workingDirectory>E:\Builds\Our Project 3.0</workingDirectory>
    <artifactDirectory>E:\Builds\Our Project 3.0\artifacts</artifactDirectory>
    <!--
      CruiseControl.NET doesn't handle spaces in paths consistently, so let's
      keep ourselves out of trouble.
    -->
    <webURL>http://localhost/ccnet/OurProject3.0</webURL>

    <!-- Check every 10 minutes. -->
    <triggers>
      <intervalTrigger seconds="600" />
    </triggers>

    <sourcecontrol type="filtered">
      <sourceControlProvider type="svn">
        <trunkUrl>svn://aMachine/omega/trunk/OurProject</trunkUrl>
        <workingDirectory>E:\Builds\Our Project 3.0</workingDirectory>
        <username>tnassar</username>
        <password></password>
      </sourceControlProvider>
      <inclusionFilters>
        <pathFilter>
          <!-- Note that this is the full pathname, and apparently has to be. -->
          <pattern>/trunk/Our Project/**/*.*</pattern>
        </pathFilter>
      </inclusionFilters>
      <exclusionFilters>
        <!-- Changes to these files should *not* cause a new build. -->
        <pathFilter>
          <pattern>/trunk/Our Project/version.txt</pattern>
        </pathFilter>
        <pathFilter>
          <!-- These are serialization assemblies generated during the build process,
               using xsd.exe.
          -->
          <pattern>/trunk/Our Project/Serialization/Bin/*.dll</pattern>
        </pathFilter>
        <pathFilter>
          <!-- The build number is boosted by a NAnt task. -->
          <pattern>/trunk/Our Project/Applications/CommonAssemblyInfo.cs</pattern>
        </pathFilter>
        <pathFilter>
          <pattern>/trunk/Our Project/Build/**/*</pattern>
        </pathFilter>
      </exclusionFilters>
    </sourcecontrol>
    <tasks>
      <!-- And so on... -->
    </tasks>
  </project>
</cruisecontrol>

Monday, February 05, 2007

Generating LOC from NAnt, Part D'oh!

At some point I'd like to make all the code available, and perhaps even contribute it to NAntContrib, but at the moment it's really butt-ugly. I'm glad that C# 2.0 has better support for sort-of-functional programming constructs, but this is unpleasantly verbose:

public static TreeSorter Sort(string[] paths)
{
    if (paths.Length == 0)
        return new TreeSorter("");
    List<string> pathsList = new List<string>(paths);
    pathsList.Sort();
    List<List<string>> splitPaths = pathsList.ConvertAll<List<string>>(delegate(string path)
    {
        return new List<string>(path.Split(System.IO.Path.DirectorySeparatorChar));
    });

    int index = 0;
    for (; index < splitPaths[0].Count; ++index)
    {
        if (!splitPaths.TrueForAll(delegate(List<string> input)
        {
            return index < input.Count && input[index] == splitPaths[0][index];
        }))
            break;
    }

    IEnumerable<string> commonPath = splitPaths[0].GetRange(0, index);
    IList<IEnumerable<string>> truncatedPaths = splitPaths.ConvertAll<IEnumerable<string>>(delegate(List<string> input)
    {
        return input.GetRange(index, input.Count - index);
    });
    return new TreeSorter(commonPath, truncatedPaths);
}

Saturday, February 03, 2007

Generating LOC from NAnt, Part 1

Last March my team took over a failing project for our most important client. The domain to which the software applied is very interesting, and very difficult, and I can't deny that the previous contractor had employed some smart people...at some point in the project's history! The code was a mess. Simply passing FxCop over it revealed that about 5% of it was dead code. After a few months of working with it, we realized that another significant part of it would only be called if certain events fired, and they never did. This is often not so easy to prove, unfortunately. Since I use ReSharper, I found it easier simply to delete unused methods, but the overuse of dynamic binding (!) in the code base meant that I could not be sure, without regression tests (which of course didn't exist until I wrote them) that certain classes were never instantiated. And so on.

Anyway, we began by simply counting the lines of text in the entire code base. This measure decreased daily, and the project manager finally suggested that we do this more systematically so that the client could cite these figures when trying to justify his additional expenditures on this already "working" code to his superiors. I want to make clear that we are not somehow charging for KLOC. Notwithstanding that we're doing this work for a gov't agency, we're not working under some burdensome methodology. The distinction between physical and logical LOC is not so important to us. We are supposed to be making the code more maintainable; a steady decrease in the relative LOC counts as success. I imposed the requirement on myself that the LOC metric be generated from CruiseControl.NET via NAnt. None of the tools that I could use in this way were entirely satisfactory, however. NDepend measured the LOC in files that didn't interest me (e.g., MyForm.Designer.cs); GeroneSoft's Code Counter likewise. Both utilities allow me to create include and exclude lists, but neither of them as neatly as NAnt! I would really like to be able to edit the .build file, quickly create another <fileset>, say for ".cs files generated by the Windows Forms Designer," measure the LOC in that set, and arrange for those results to be published by CruiseControl.NET.
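Something along these lines, say (an illustrative NAnt fragment, not from our actual .build file; the id values are made up):

<!-- Hand-written C#, minus the Designer-generated files. -->
<fileset id="loc.handwritten.cs">
  <include name="**/*.cs" />
  <exclude name="**/*.Designer.cs" />
  <exclude name="**/AssemblyInfo.cs" />
</fileset>
<!-- And the Designer-generated files by themselves. -->
<fileset id="loc.designer.cs">
  <include name="**/*.Designer.cs" />
</fileset>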

NAntContrib does supply a <codestats> task, which accepts a fileset as input. Then, unfortunately, it outputs the statistics as a flat list, which is not what I wanted. What I need is to turn that list into a tree again. This turns out to be unspeakably difficult without XSLT 2.0. Don't recommend Muenchian groupings, please! My initial thought that it would be neat to write this transform cost me at least a day. Then I set about writing a custom NAnt task in C#, but once again I was sidetracked by C#'s relatively poor support for list comprehensions. Moreover, one is discouraged by "performance considerations" from repeatedly splitting and rejoining strings, slicing lists, etc. That's what happens to you when you write too much C, as opposed to learning LISP at a good school. After about two wasted days, on the bus going home, I pulled out a pad of paper and wrote out some Python. I tried it out when I got home, and it worked right out of the box. Then, I admit with some embarrassment, I retrofitted the unit tests, and implemented the special functions such as __len__(), for the sake of the tests.

import os
import itertools
import unittest

class Node:
    def __init__(self, path = [], children = []):
        assert isinstance(path, list)
        assert isinstance(children, list)
        self.path = path
        self.children = {}
        for child in children:
            self.add(child)

    def __len__(self):
        return 1 + sum([len(child) for child in self.children.values()])

    def __getitem__(self, key):
        return self.children.__getitem__(key)

    def __setitem__(self, key, value):
        return self.children.__setitem__(key, value)

    def __iter__(self):
        yield self.path
        for child in self.children.values():
            for i in child:
                yield self.path + i

    def __repr__(self):
        return os.sep.join(self.path)

    def add(self, path):
        assert len(path), "Cannot add an empty path."
        if len(path) == 1:
            self.children[path[0]] = Node(path)
        elif self.children.has_key(path[0]):
            self.children[path[0]].add(path[1:])
        else:
            self.children[path[0]] = Node([path[0]], [path[1:]])

def allsame(seq):
    # An empty sequence is true because it has no mismatches.
    # A 1-length sequence should always be true, i.e., imap(...) will
    # return an empty list, of which False is not a member (duh!).
    # Otherwise, compare all elements to the first. The iteration
    # over the sequence should stop as soon as False becomes
    # a member of the output sequence.
    return not seq or False not in itertools.imap(lambda x: x == seq[0], itertools.islice(seq, 1, None))

def createNodeSorter(paths):
    assert paths, "Cannot create a base node for 0 paths"
    # Split the paths along the directory separator character.
    splitFiles = [f.split(os.sep) for f in paths]
    # Now "pivot" these lists in order to find out if they share a common
    # base path. Note that os.commonprefix() is character-based; it is
    # not suitable.
    zipped = zip(*splitFiles)
    # How many sequences consist of identical elements?
    # Concatenate those, and you've got the common base path.
    common = [f[0] for f in itertools.takewhile(allsame, zipped)]
    print "Common base path: " + str(common)
    # Now strip the common base path.
    splitFiles = [f[len(common):] for f in splitFiles if f != common]
    print "\tchildren: " + str(splitFiles)
    return Node(common, splitFiles)

class AllSameTest(unittest.TestCase):
    def testEmptySequence(self):
        self.assertTrue(allsame([]))

    def testSingleElement(self):
        self.assertTrue(allsame([1]))
        self.assertTrue(allsame(['test']))

    def testTwoIntegers(self):
        self.assertTrue(allsame([1, 1]))
        self.assertFalse(allsame([0, 1]))

    def testTwoStrings(self):
        self.assertTrue(allsame(['test', 'test']))
        self.assertFalse(allsame(['test', 'fail']))

class CreatorTest(unittest.TestCase):
    def testEmptyFileSet(self):
        self.assertRaises(AssertionError, lambda: createNodeSorter([]))

    def testSinglePath(self):
        n = createNodeSorter([r'D:\trunk\Projects\Build\NAnt\LineCounter\LineCount.cs'])
        self.assertEqual(1, len(n))

    def testTwoPaths(self):
        files = [
            r'D:\trunk\Projects\Build\NAnt\LineCounter\LineCount.cs',
            r'D:\trunk\Projects\Build\NAnt\LineCounter\LineCountCollection.cs']
        n = createNodeSorter(files)
        self.assertEqual(3, len(n))
        self.assertEqual(['D:', 'trunk', 'Projects', 'Build', 'NAnt', 'LineCounter'], n.path)

    def testOverlappingPaths(self):
        files = [
            r'D:\trunk\Projects\Build\NAnt\LineCounter',
            r'D:\trunk\Projects\Build\NAnt\LineCounter\LineCountCollection.cs']
        n = createNodeSorter(files)
        self.assertEqual(2, len(n))
        self.assertEqual(['D:', 'trunk', 'Projects', 'Build', 'NAnt', 'LineCounter'], n.path)

class NodeSorterTest(unittest.TestCase):
    def testBlankNode(self):
        n = Node()
        self.assertEqual('', repr(n))
        self.assertEqual(1, len(n))

    def testSingleChild(self):
        n = Node(['base'], [['child1']])
        self.assertEqual(2, len(n))
        self.assertEqual('base', `n`)

    def testNoChildren(self):
        n = Node(['C:'])
        self.assertEqual(1, len(n))

    def testInvalidPath(self):
        self.assertRaises(AssertionError, lambda: Node('base'))

    def testIteratorBlankNode(self):
        n = Node()
        nodes = [node for node in n]
        self.assertEqual([[]], nodes)

    def testIteratorNodeWithPath(self):
        n = Node(['C:', 'Program Files'])
        nodes = [node for node in n]
        self.assertEqual([['C:', 'Program Files']], nodes)

    def testIteratorWithOneChild(self):
        n = Node(['C:'], [['Program Files']])
        nodes = [node for node in n]
        self.assertEqual(['C:'], nodes[0])
        self.assertEqual(['C:', 'Program Files'], nodes[1])

    def testThreeLevels(self):
        n = Node(['C:'], [['Program Files', 'Adobe', 'Acrobat 7.0'], ['Program Files', "CruiseControl.NET"]])
        self.assertEqual(5, len(n))

    def testNoCommonBase(self):
        n = Node([], [['C:', 'Program Files', 'Adobe', 'Acrobat 7.0'], ['D:', 'Program Files', "CruiseControl.NET"]])
        self.assertEquals(8, len(n))

if __name__ == '__main__':
    unittest.main()

Friday, January 19, 2007

Boosting Your AssemblyInfo.cs Revision Number from NAnt

I know that others have blogged about this, esp. Scott Hanselman did, somewhere...I'm digressing already. I wanted the simplest possible solution: I wanted to leverage already existing NAnt or NAntContrib tasks, rather than write my own; I did not want to use the CruiseControl.NET label, since a new CC.NET installation starts from zero, as I found out after I hosed my own, and in any case I would want CC.NET ultimately to do no more than call a bunch of NAnt tasks, pass all the artifacts through some XSLT, and create the Web pages. What I ended up with was (1) a file called version.txt, containing only (at the moment) the text 3.0.0.1401; (2) a file called CommonAssemblyInfo.cs, including assembly-level metadata common to the entire application, e.g. [assembly: AssemblyVersionAttribute("3.0.0.1412")]; and (3) the following NAnt target:

<target name="createAsmInfo" description="Overwrite the rev. #" >
  <if test="${not property::exists('version.txt')}">
    <property name="version.txt" value="version.txt" />
  </if>
  <if test="${not property::exists('common.assembly.info')}">
    <property
      name="common.assembly.info"
      value="Applications/CommonAssemblyInfo.cs" />
  </if>
  <!-- I.e., increment the revision number and *not* the build number. -->
  <version path="${version.txt}"
           buildtype="NoIncrement"
           revisiontype="Increment" />
  <asminfo output="${common.assembly.info}" language="CSharp">
    <imports>
      <import namespace="System.Reflection" />
    </imports>
    <attributes>
      <attribute type="AssemblyVersionAttribute"
                 value="${buildnumber.major}.${buildnumber.minor}.${buildnumber.build}.${buildnumber.revision}" />
      <attribute type="AssemblyCopyrightAttribute" value="Copyright © 2006 MDi" />
      <attribute type="AssemblyCompanyAttribute" value="MegaDyne, Inc." />
      <attribute type="AssemblyProductAttribute" value="KillerApp" />
    </attributes>
  </asminfo>
  <!-- The svn task is inadequate. It doesn't handle commits! -->
  <exec program='${svn.exe}'
        commandline='commit --non-interactive --username=${svn.username} --password=${svn.password} -m "Updated by CruiseControl.NET" ' />
  <exec program='${svn.exe}' commandline='update' />
</target>


The <asminfo> task should really not overwrite existing metadata unless so instructed, but I haven't tested that yet. Note that I immediately commit the changed files back into the repository; this might be the wrong thing to do, unless the build succeeds. If it does fail, then CruiseControl.NET will be unable to update the source code from the repository; i.e., the files I've changed will be in conflict. This suggests that I should really do all my work from NAnt, but I haven't gotten around to that yet. A further problem is that version.txt and CommonAssemblyInfo.cs will look new to CC.NET the next time it checks, triggering another build. CC.NET offers a "filter trigger" for this sort of thing, but I found the documentation incomprehensible, so I haven't been able to use it yet.


The one additional step is to remove the common metadata from AssemblyInfo.cs files throughout the solution, and add CommonAssemblyInfo.cs as a link. This turned out to be somewhat tricky for me. I have an old habit of configuring Windows Explorer to open files with a single click; a more experienced developer once told me that he was "more productive" with this option; I fell for it, and now I'm stuck with this weird tic. Anyway, I would select "Add existing item" from a project's context menu, navigate to CommonAssemblyInfo.cs, click on it...and of course a copy would be added to the project. You won't have this problem! Anyway, I had to right-click on the file from the File Open dialog, select "Select" from the context menu, and only then would the little triangle be enabled.

That's pretty much it. You have to remember to change every project this way, but that's about it. Some (Scott Hanselman) prefer to edit the .csproj files themselves; I am probably revealing myself as a non-guru by this admission, but I hate doing that.