I've been champing at the bit to actually write some F#, but haven't been willing to risk sneaking it into production at the office, and unable to put in extra time at home to learn it. I have read Robert Pickering's
Foundations of F# while commuting, but that's not verifiable experience in functional programming. As to why I would care whether I learn functional programming or not, let me plug the
FringeDC interest group, founded by the estimable Dr. Conrad Barski. Not only have I found the members to be charming and personable, but they also seem much more incisive about what programming is than most developers I meet. They're not, for the most part, doing enterprise-scale stuff, where decisions about programming languages, database vendors, and so on would probably be taken out of their hands. Unless they're
Paul Graham, who got rich writing LISP. Good for him! Anyway, lots of smart people write Java, or even VB. Nonetheless, the kicks in the butt that I've gotten at FringeDC, as well as from graduate classes at George Mason U., have led me to feel that what would do me the most good, and what I would most enjoy, is something more fundamental, theoretical, than spending every weekend preparing for MCAD exams. Different strokes.
Alright, enough about that. Languages that offer a REPL seem really suited to scripting. Yes, my friends at FringeDC would probably do everything in Emacs Scheme, but...I just don't know how anymore. Which I'm not proud of. Visual Studio has actually spoiled me to the point where I wouldn't know how to run some XML through an XSLT from the command line. I would actually have to take time out to figure it out. OK, really enough. Here's my task: I've got some XML describing the expected return values from some linear programming models, and it's tacked on top of some older regression testing scripts that just list the files to be solved by particular solvers. The XML file is called
knownSolutions.xml, and the scripts are called
solveBatch.cplex.txt and
solveBatch.lpSolve.txt. Here's what I want to do: load the XML file, and, when a particular solver is not supposed to solve a model, add a corresponding attribute to the element. I've changed the schema:
<xs:attribute name="SkipSolvers" type="solverTypeList" use="optional">
...
<xs:simpleType name="solverType">
<xs:restriction base="xs:string">
<xs:enumeration value="lpSolve" />
<xs:enumeration value="Cplex" />
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="solverTypeList">
<xs:list itemType="solverType" />
</xs:simpleType>
So if we have the XML:
<KnownSolutions>
<Model skipsolvers="lpSolve">
<File>TestModels\JasonCh3Net.xml</File>
<ObjectiveValue>196200</ObjectiveValue>
</Model>
</KnownSolutions>
...then I'd like to add attributes like so:
<Model SkipSolvers="lpSolve" .../<
So I need to load the XML from the F# console:
> open System.Xml;; # Import the assembly and the corresponding namespace.
# The F# console waits for two semicolons before it interprets
# your statement.
> let document = new XmlDocument();; # Infer the variable's type.
val document : XmlDocument
> document.Load @"knownSolutions.xml";;
val it : unit = () # The Load() method returns void, in other words.
Now at this point I've got to make some decisions about, you know, algorithms and data structures. I've got a tree—XML—and I've got a list of files that I can't assume are sorted. So why don't I read them in, filter out the empty lines, sort them, and then iterate across the XML's elements looking for files that
aren't in the list for Cplex. Since I'm going to be looking things up in the list, I'll make a set out of it first. I should point out that, in contrast to imperative programming, we do
not want to open the file and loop over the lines. We want to treat the file as a sequence of lines, no different than
IEnumerable<string>. I say "we," but I mean "I": I just don't want to write nested loops anymore. But how do I deal with the EOF? I use a "generator" that has a special way of "terminating" the sequence of lines it reads. This is the "pipes and filters" idiom familier to UNIX folks. F#'s Seq class provides a
generate_using function that thoughtfully calls IDispose() on the object created at the beginning of the generation.
> let opener () = File.OpenText("something.txt");;
val opener : unit -> StreamReader
> > opener;;
val it : (unit -> StreamReader) = <fun:clo@0_3>
What that means is that opener is a function with no arguments (if I'd left off the parentheses before the equals sign, F# would assign the return value of
OpenText() to
opener, which is not what I want), that knows how to return a StreamReader.
generate_using takes that residual value (I don't think it's called a return value,
per se) and passes it repeatedly to the actual generator, ergo:
> let liner (reader : StreamReader) =
- if reader.EndOfStream then None
- else Some(reader.ReadLine());;
val liner : StreamReader -> string option
option means "either a string, or nothing." Don't think of it as null;
F# doesn't really have null. None will mark the end of the generated sequence. generate_using keeps sending the stream as an argument to the "closure" liner, and liner decides whether it's got something worth returning; if not, it signals to the next thing in line, "I'm done." I don't need to point out how parallelizable this "pipes and filters" idiom is; whatever is waiting for a line of text only needs one line to do its job. Anyway, now I'll filter out the empty strings, and put all of them in a hashed set for fast lookup (I could also just remove the empty string from the hashed set after populating it). I'll create the filter separately, rather than inline:
> let filter s = (s <> "");;
val filter : string -> bool
The static
Seq.filter method requires a function that takes a sequence element and returns a Boolean value, in this case false if the string is empty. In order to create a set that's initialized to include all the values in the generated sequence, I do:
> let lpSolveSet = HashSet<string>.Create(
- Seq.generate_using opener liner
- |> Seq.filter (fun s -> (s <> ""))
- );;
val lpSolveSet : HashSet<string>
Whew! I don't know if F#'s type inference would have let me leave out the <string>. Putting it together, as I suppose I could have done in 5 minutes if I were a PowerShell guru:
> Seq.generate_using (fun () -> File.OpenText @"solveBatch.lp_solve.txt")
- (fun reader -> if reader.EndOfStream then None else Some(reader.ReadLine()))
- |> Seq.filter (fun s -> (s <> ""))
- |> Seq.map (fun s -> lpSolveSet.Add(s));;
val it : seq<unit> = seq [null; null; null; null; ...]
> lpSolveSet;;
val it : HashSet<string>
= seq
["JasonCh3Net.xml"; "JohnCh4Net.xml"; "OffsetNet.xml";
"WithinClauseTestNet.xml"; ...]
>
In other words, generate a sequence of lines from a file , pass them through a filter that removes empty strings, and send the resulting sequence to the HashSet's constructor. I could also have piped the filtered sequence to a closure that would add each string to the set in turn. Whatever. Now...
> let models = document.SelectNodes "/KnownSolutions/Model";;
val models : XmlNodeList
> models;;
val it : XmlNodeList
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>
Oh, so it's a sequence! Now I'm rolling, so:
> Seq.hd models;;
Seq.hd models;;
-------^^^^^^^
stdin(131,7): error: FS0001: The type XmlNodeList is not compatible with the type seq<'a>
stopped due to error
Seq.hd ("head") is like LISP's
car, btw. Hm, it's not a sequence. Embarrassing. I scratched my head for a few minutes, then realized that I should use a list comprehension:
> { for o in models -> o :?> XmlElement };;
val it : seq
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>
I apply the downcast operator to each element; now I have a strongly typed sequence, so F# can use type inference on the next element in the chain:
> let attrAdder (e : XmlElement) =
- if not (lpSolveSet.Contains(e.InnerText)) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr)
- end
- end;;
e.Attributes.Append(attr)
-----------------^^^^^^^^^^^^^
stdin(87,17): error: Type constraint mismatch. The type
XmlAttribute
is not compatibile with type
unit.
The type XmlAttribute is not compatible with the type unit
stopped due to error
What? I scratched my head over this one. In the end, the error message makes perfect sense: because there's no "alternative," the "consequent" is not allowed to return a value, and XmlAttributeCollection.Append() does. So we have to pipe it into oblivion, hence:
> let attrAdder (e : XmlElement) =
- if not (lpSolveSet.Contains(e.InnerText)) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr) |> ignore
- end
- end;;
val attrAdder : XmlElement -> unit
So this is the little function object that will add the required attribute to the element if the lpSolve solver is not supposed to solve it. Then:
> { for o in models -> o :?> XmlElement } |> Seq.iter adder;;
val it : unit = ()
And lo and behold, I get this:
> {for node in document.SelectNodes "/KnownSolutions/Model" -> node :?> XmlElement } |> Seq.iter adder;;
System.ArgumentException: The named node is from a different document context.
at System.Xml.XmlAttributeCollection.Append(XmlAttribute node)
at FSI_0014.adder(XmlElement e)
at FSI_0074.it@150.Invoke(XmlElement e@51_6)
at Microsoft.FSharp.Collections.Seq.iter@868.Invoke(IEnumerator`1 e@90_1)
at Microsoft.FSharp.Core.Operators.using[T,U](T ie, FastFunc`2 f)
at Microsoft.FSharp.Collections.Seq.iter[T,U](FastFunc`2 f, U ie)
at <StartupCode>.FSI_0074._main()
stopped due to error
After about an hour, I realized that the function object that's adding the attributes is a
closure: that means that it encapsulates a reference to a previous value of
document. That's what "closure" means, silly! I create a new closure that refers to the new XML document, rerun, and I'm done, almost:
> document.Save @"knownSolutionsImproved.xml"