Wednesday, January 16, 2008

Huffman Coding, SICP Exercise 2.69

As a convenience, I defined Incidence as System.Collections.Generic.Dictionary. F#'s Map type
is actually a functor, and each insertion or removal returns a new instance. What I'd really like is something like the FindOrAdd() method available in the C5 dictionary implementations, but I can live without it. Anyway, I intend to fold a sequence of characters into a dictionary, for which purpose I'll first define the function that will do the folding:

> let count (d : Incidence) (c : char) =
- let n = ref 0
- // I need a "ref" variable to serve as an "out" parameter.
- if d.TryGetValue(c, n) then
- // Assign to a mutable property...
- d.[c] <- !n + 1
- else
- d.Add(c, 1)
- d;;
val count : Incidence -> char -> Incidence

I'm getting good at this:

> let d = "Tony Nassar, Software Engineer" |> Seq.fold count (new Incidence());;

val d : Incidence

> d;;
val it : Incidence
= dict
[('T', 1); ('o', 2); ('n', 3); ('y', 1); (' ', 3); ('N', 1); ('a', 3);
('s', 2); ('r', 3); (',', 1); ('S', 1); ('f', 1); ('t', 1); ('w', 1);
('e', 3); ('E', 1); ('g', 1); ('i', 1)]
>
I did not know until today that System.String implements IEnumerable! And, fortunately, F#'s type inference lets me pipe that sequence of characters into my count() function. There are two steps that I need to take now: I need to create a sequence sorted by frequency (actually, a mapping from frequency to character, where the frequencies don't have to be unique) and turn it into a tree. This took me a while. Unlike C#, F# only allows mutually recursive type definitions if they're linked with "and." Here I'm declaring HuffmanNode as a "discriminated union," where Leaf and Tree are implemented as tags (you can verify this by using Reflector to examine the compiled DLL), and a leaf consists of the character to be encoded and its frequency (the latter is necessary during the construction of the tree), and a tree (or any non-leaf node) is...well, it's kinda obvious. If this were C, and someone were using a union and a type code, I'd say "Yuck!" But since it's functional programming, and this is a very handy way to build up a tree, it's suddenly cool again.

> type HuffmanNode =
- | Leaf of HuffmanLeaf
- | Tree of HuffmanTree
- and HuffmanLeaf = { Count : int; Char : char }
- and HuffmanTree = { Count : int; Chars : CharSet; Left : HuffmanNode; Right : HuffmanNode };;

type HuffmanNode =
| Leaf of HuffmanLeaf
| Tree of HuffmanTree
and HuffmanLeaf = {Count: int;
Char: char;}
and HuffmanTree = {Count: int;
Chars: CharSet;
Left: HuffmanNode;
Right: HuffmanNode;}

>

No comments: