Alright, enough about that. Languages that offer a REPL seem really suited to scripting. Yes, my friends at FringeDC would probably do everything in Emacs Scheme, but...I just don't know how anymore. Which I'm not proud of. Visual Studio has actually spoiled me to the point where I wouldn't know how to run some XML through an XSLT from the command line. I would actually have to take time out to figure it out. OK, really enough. Here's my task: I've got some XML describing the expected return values from some linear programming models, and it's tacked on top of some older regression testing scripts that just list the files to be solved by particular solvers. The XML file is called knownSolutions.xml, and the scripts are called solveBatch.cplex.txt and solveBatch.lpSolve.txt. Here's what I want to do: load the XML file, and, when a particular solver is not supposed to solve a model, add a corresponding attribute to the element. I've changed the schema:
<xs:attribute name="SkipSolvers" type="solverTypeList" use="optional" />
...
<xs:simpleType name="solverType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="lpSolve" />
    <xs:enumeration value="Cplex" />
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="solverTypeList">
  <xs:list itemType="solverType" />
</xs:simpleType>
So if we have the XML:
<KnownSolutions>
  ...
  <Model>
    <File>TestModels\JasonCh3Net.xml</File>
    <ObjectiveValue>196200</ObjectiveValue>
  </Model>
</KnownSolutions>
...then I'd like to add attributes like so:
<Model SkipSolvers="lpSolve">
  <File>TestModels\JasonCh3Net.xml</File>
  <ObjectiveValue>196200</ObjectiveValue>
</Model>
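To check that the schema change behaves as intended, the XML can be validated against it with XmlReaderSettings and XmlSchemaSet. This is a sketch for a current F# release under loud assumptions: only the solverType, solverTypeList and SkipSolvers declarations come from the schema above; the KnownSolutions/Model/File/ObjectiveValue layout (including xs:double for ObjectiveValue) is guessed, and validate is a hypothetical helper name.

```fsharp
open System.IO
open System.Xml
open System.Xml.Schema

// Cut-down, HYPOTHETICAL schema: the solver types and the SkipSolvers attribute
// follow the post; the element structure around them is assumed.
let xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="solverType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="lpSolve" />
      <xs:enumeration value="Cplex" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="solverTypeList">
    <xs:list itemType="solverType" />
  </xs:simpleType>
  <xs:element name="KnownSolutions">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Model" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="File" type="xs:string" />
              <xs:element name="ObjectiveValue" type="xs:double" />
            </xs:sequence>
            <xs:attribute name="SkipSolvers" type="solverTypeList" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

// Returns the validation messages; an empty list means the document is valid.
let validate (xml : string) : string list =
    let schemas = XmlSchemaSet()
    schemas.Add(null, XmlReader.Create(new StringReader(xsd))) |> ignore
    let errors = ResizeArray<string>()
    let settings = XmlReaderSettings(ValidationType = ValidationType.Schema, Schemas = schemas)
    // Collect errors instead of letting the reader throw on the first one.
    settings.ValidationEventHandler.Add(fun e -> errors.Add e.Message)
    use reader = XmlReader.Create(new StringReader(xml), settings)
    while reader.Read() do ()
    List.ofSeq errors

let sample = """<KnownSolutions><Model SkipSolvers="lpSolve Cplex"><File>TestModels\JasonCh3Net.xml</File><ObjectiveValue>196200</ObjectiveValue></Model></KnownSolutions>"""
printfn "%A" (validate sample)
```

Because solverTypeList is an xs:list, the attribute holds whitespace-separated names, so "lpSolve Cplex" should pass while an unlisted name should be reported. XML names are case-sensitive, so a lowercase skipsolvers attribute should be flagged as undeclared too.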
So I need to load the XML from the F# console:
> open System.Xml;;                   // Open the namespace.
> open System.IO;;                    // File and StreamReader, used below.
> open System.Collections.Generic;;   // HashSet, used below.
                                      // The F# console waits for two semicolons
                                      // before it interprets your statement.
> let document = new XmlDocument();;  // The variable's type is inferred.
val document : XmlDocument
> document.Load @"knownSolutions.xml";;
val it : unit = ()                    // In other words, Load() returns void.
Now at this point I've got to make some decisions about, you know, algorithms and data structures. I've got a tree (XML) and I've got a list of files that I can't assume are sorted. So why don't I read them in, filter out the empty lines, sort them, and then iterate across the XML's elements looking for files that aren't in the list for lpSolve. Since I'm going to be looking things up in the list, I'll make a set out of it first (which makes the sorting unnecessary). I should point out that, in contrast to imperative programming, we do not want to open the file and loop over the lines. We want to treat the file as a sequence of lines, no different from an IEnumerable<string>. I say "we," but I mean "I": I just don't want to write nested loops anymore. But how do I deal with the EOF? I use a "generator" that has a special way of "terminating" the sequence of lines it reads. This is the "pipes and filters" idiom familiar to UNIX folks. F#'s Seq module provides a generate_using function that thoughtfully calls Dispose() on the object created at the beginning of the generation.
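Seq.generate_using comes from the early F# releases this session was typed into, and a current F# may not have it. Here is a hedged sketch of the same generator idea in present-day F#: readLines (a hypothetical name) is a sequence expression whose use binding disposes the reader when enumeration ends, and readAllLinesUnfold (also hypothetical) mirrors the Some/None "I'm done" protocol with Seq.unfold, whose generator returns Some(item, nextState) rather than just an item.

```fsharp
open System.IO

// A lazy sequence of lines. The `use` binding disposes the StreamReader when the
// enumeration finishes or is abandoned early, much as generate_using did.
let readLines (path : string) : seq<string> =
    seq {
        use reader = File.OpenText path
        while not reader.EndOfStream do
            yield reader.ReadLine()
    }

// The same Some/None idea with Seq.unfold: Some(line, state) continues, None stops.
// Seq.unfold disposes nothing itself, so this function owns the reader and forces
// the lines into a list before the `use` binding closes it.
let readAllLinesUnfold (path : string) : string list =
    use reader = File.OpenText path
    reader
    |> Seq.unfold (fun r -> if r.EndOfStream then None else Some(r.ReadLine(), r))
    |> List.ofSeq

// Usage on a throwaway file that contains a blank line.
let path = Path.GetTempFileName()
File.WriteAllLines(path, [| "JasonCh3Net.xml"; ""; "OffsetNet.xml" |])
let nonEmpty = readLines path |> Seq.filter (fun s -> s <> "") |> List.ofSeq
printfn "%A" nonEmpty
```

The sequence-expression form stays lazy, which keeps the pipes-and-filters shape; the unfold form has to materialize its result because the reader's lifetime is tied to the function call.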
> let opener () = File.OpenText("something.txt");;
val opener : unit -> StreamReader
> opener;;
val it : (unit -> StreamReader) = <fun:clo@0_3>
What that means is that opener is a function with no arguments (if I'd left off the parentheses before the equals sign, F# would assign the return value of OpenText() to opener, which is not what I want) that knows how to return a StreamReader. generate_using takes that residual value (I don't think it's called a return value, per se) and passes it repeatedly to the actual generator, ergo:
> let liner (reader : StreamReader) =
- if reader.EndOfStream then None
- else Some(reader.ReadLine());;
val liner : StreamReader -> string option
string option means "either a string, or nothing." Don't think of it as null; F#'s own types don't use null as a value (it exists mostly for .NET interop). None marks the end of the generated sequence. generate_using keeps passing the stream as an argument to the "closure" liner, and liner decides whether it's got something worth returning; if not, it signals to the next thing in line, "I'm done." I don't need to point out how parallelizable this "pipes and filters" idiom is; whatever is waiting for a line of text only needs one line to do its job. Anyway, now I'll filter out the empty strings and put the rest in a hashed set for fast lookup (I could also just remove the empty string from the hashed set after populating it). I'll create the filter separately, rather than inline:
> let filter s = (s <> "");;
val filter : string -> bool
The Seq.filter function takes a predicate: a function that receives a sequence element and returns a Boolean value, true to keep the element and false (here, for an empty string) to drop it. In order to create a set that's initialized to include all the values in the generated sequence, I do:
> let lpSolveSet = new HashSet<string>(
- Seq.generate_using opener liner
- |> Seq.filter filter
- );;
val lpSolveSet : HashSet<string>
Whew! I don't know if F#'s type inference would have let me leave out the <string>. Putting it together, as I suppose I could have done in 5 minutes if I were a PowerShell guru:
> let lpSolveSet = new HashSet<string>(
- Seq.generate_using (fun () -> File.OpenText @"solveBatch.lpSolve.txt")
- (fun reader -> if reader.EndOfStream then None else Some(reader.ReadLine()))
- |> Seq.filter (fun s -> (s <> ""))
- );;
val lpSolveSet : HashSet<string>
> lpSolveSet;;
val it : HashSet<string>
= seq
["JasonCh3Net.xml"; "JohnCh4Net.xml"; "OffsetNet.xml";
"WithinClauseTestNet.xml"; ...]
>
In other words, generate a sequence of lines from a file, pass them through a filter that removes empty strings, and send the resulting sequence to the HashSet's constructor. I could also have piped the filtered sequence to a closure that would add each string to the set in turn. Whatever. Now...
> let models = document.SelectNodes "/KnownSolutions/Model";;
val models : XmlNodeList
> models;;
val it : XmlNodeList
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>
Oh, so it's a sequence! Now I'm rolling, so:
> Seq.hd models;;
Seq.hd models;;
-------^^^^^^^
stdin(131,7): error: FS0001: The type XmlNodeList is not compatible with the type seq<'a>
stopped due to error
Seq.hd ("head") is like LISP's car, btw. Hm, it's not a sequence after all, at least not a seq<'a>: XmlNodeList only implements the non-generic IEnumerable. Embarrassing. I scratched my head for a few minutes, then realized that I should use a sequence comprehension:
> { for o in models -> o :?> XmlElement };;
val it : seq<XmlElement>
= seq
[seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]];
seq [seq [seq []]; seq [seq []]]; seq [seq [seq []]; seq [seq []]]; ...]
>
I apply the downcast operator to each element; now I have a strongly typed sequence, so F# can use type inference on the next element in the chain:
> let attrAdder (e : XmlElement) =
- if not (lpSolveSet.Contains(e.InnerText)) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr)
- end
- end;;
e.Attributes.Append(attr)
-----------------^^^^^^^^^^^^^
stdin(87,17): error: Type constraint mismatch. The type
XmlAttribute
is not compatibile with type
unit.
The type XmlAttribute is not compatible with the type unit
stopped due to error
What? I scratched my head over this one. In the end, the error message makes perfect sense: an if/then with no else (no "alternative") must have type unit, so the "consequent" is not allowed to return a value, and XmlAttributeCollection.Append() does: it returns the appended XmlAttribute. So we have to pipe it into oblivion with ignore, hence:
> let attrAdder (e : XmlElement) =
- // Look up the <File> name: e.InnerText would also include the ObjectiveValue text.
- if not (lpSolveSet.Contains(System.IO.Path.GetFileName((e.SelectSingleNode "File").InnerText))) then begin
- let attr = document.CreateAttribute("SkipSolvers") in begin
- attr.Value <- "lpSolve";
- e.Attributes.Append(attr) |> ignore
- end
- end;;
val attrAdder : XmlElement -> unit
So this is the little function object that will add the required attribute to the element if the lpSolve solver is not supposed to solve it. Then:
> { for o in models -> o :?> XmlElement } |> Seq.iter attrAdder;;
val it : unit = ()
And lo and behold, I get this:
> {for node in document.SelectNodes "/KnownSolutions/Model" -> node :?> XmlElement } |> Seq.iter attrAdder;;
System.ArgumentException: The named node is from a different document context.
at System.Xml.XmlAttributeCollection.Append(XmlAttribute node)
at FSI_0014.adder(XmlElement e)
at FSI_0074.it@150.Invoke(XmlElement e@51_6)
at Microsoft.FSharp.Collections.Seq.iter@868.Invoke(IEnumerator`1 e@90_1)
at Microsoft.FSharp.Core.Operators.using[T,U](T ie, FastFunc`2 f)
at Microsoft.FSharp.Collections.Seq.iter[T,U](FastFunc`2 f, U ie)
at <StartupCode>.FSI_0074._main()
stopped due to error
After about an hour, I realized that the function object that's adding the attributes is a closure: it holds a reference to a previous value of document, so CreateAttribute() makes the attribute in the old document, and Append() refuses to attach it to an element of the new one. That's what "closure" means, silly! I create a new closure that refers to the new XML document, rerun, and I'm done, almost:
> document.Save @"knownSolutionsImproved.xml";;
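Pulling the session together, here is a minimal sketch in present-day F#; it is not the script from this post. Assumptions, all hypothetical: the helper name markSkipped, the made-up two-model sample document (OtherNet.xml is invented), and two swapped-in choices: XmlElement.SetAttribute instead of CreateAttribute plus Attributes.Append, and matching on the file name inside File rather than on the Model's InnerText. Passing the document in as a parameter means there is no captured document to go stale.

```fsharp
open System
open System.Collections.Generic
open System.Xml

// Hypothetical helper: add `solver` to the SkipSolvers list of every Model whose
// <File> name is not in `solvable`. SetAttribute creates the attribute in the
// element's own document, so the "different document context" error can't occur.
let markSkipped (doc : XmlDocument) (solver : string) (solvable : HashSet<string>) : unit =
    let models =
        doc.SelectNodes "/KnownSolutions/Model"
        |> Seq.cast<XmlElement>   // XmlNodeList only implements non-generic IEnumerable
        |> List.ofSeq             // snapshot the nodes before modifying the document
    for model in models do
        // Assumes every Model has a <File> child; the batch scripts list bare file
        // names, so strip any directory prefix (either separator) before the lookup.
        let file = (model.SelectSingleNode "File").InnerText
        let name = file.Split([| '\\'; '/' |]) |> Array.last
        if not (solvable.Contains name) then
            // SkipSolvers is an xs:list, so append to existing entries without duplicates.
            let current = model.GetAttribute "SkipSolvers"   // "" when absent
            let items = current.Split([| ' ' |], StringSplitOptions.RemoveEmptyEntries)
            if not (Array.contains solver items) then
                model.SetAttribute("SkipSolvers", String.concat " " (Array.append items [| solver |]))

// Usage on a made-up two-model document.
let doc = XmlDocument()
doc.LoadXml """<KnownSolutions>
  <Model><File>TestModels\JasonCh3Net.xml</File><ObjectiveValue>196200</ObjectiveValue></Model>
  <Model><File>TestModels\OtherNet.xml</File><ObjectiveValue>1</ObjectiveValue></Model>
</KnownSolutions>"""
markSkipped doc "lpSolve" (HashSet<string>([ "JasonCh3Net.xml" ]))
printfn "%s" doc.OuterXml
```

A second pass such as markSkipped doc "Cplex" cplexSet, with a hypothetical cplexSet built from solveBatch.cplex.txt the same way as lpSolveSet, would extend the same lists, and doc.Save writes the result out as in the session.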