/dev/code: March 2005

Thursday, March 31, 2005

Fibonacci heap consolidate function

While redoing my fib heap code in OCAML, I figured out a nice tail-recursive way to do part of the heap consolidation function. The pseudo-code of the algorithm, as specified in CLRS is:


CONSOLIDATE(H)
  for i := 0 to d(n[H])
    fo A[i] := NIL

  for each node w in the root list of H
    do x <= w
       d := degree[x]
       while A[d] != NIL
         do y := A[d]
           if key[x] > key[y]
             then exchange x <-> y
           FIB-HEAP-LINK (H,y,x)
           A[d] := nil
           d := d + 1
       A[d] := x
  min[H] := nil
  for i := 0 to D(n[H])
    do if A[i] != NIL
         then add A[i] to the root list of H
              if min[H] = nil or key[A[i]] < key[min[H]]
                then min[H] := A[i]

I tend to split this into two pieces. The first part (the "for each node w ...") creates an array of root-level nodes, each with a different degree (number of children). If two nodes have the same degree, one becomes the child of the other - the one with the lower key being the parent. Once this array has been created, the second part wipes out the old root list and re-inserts the new nodes, adjusting the min pointer to point to the node with the lowest key.

The second part, in OCAML, looks like this:


let fh_consolidate2 heap node =
  match node with
      Empty -> ()
    | Node(n) ->
        if is_empty heap.min then
          heap.min <- node
        else
          (fh_splice_nodes heap.min node;
           if (heap.compare_func n.data (as_fh_node_rec heap.min).data) then
             heap.min <- node);;

I call fh_consolidate2 using an iterator (see my earlier blog on partial functions if you don't understand how I can pass (fh_consolidate2 heap) this way):


Array.iter (fh_consolidate2 heap) a;;

The first part is more interesting because it is a pain to write in OCAML. The LISP version looks like this:


    (do* ((num-roots (fib-heap-node-list-length (fib-heap-min heap)) (decf num-roots))
      (x (fib-heap-min heap) next)
      (d (fib-heap-node-degree x) (fib-heap-node-degree x))
      (next (fib-heap-node-right x) (fib-heap-node-right x)))
     ((<= num-roots 0))
      (do ((y (aref a d) (aref a d)))
      ((null y))
    (when (funcall (fib-heap-compare-func heap)
               (fib-heap-node-data y) (fib-heap-node-data x))
      (let ((temp x))
        (setf x y)
        (setf y temp)))
    (fib-heap-link x y)
    (setf (aref a d) nil)
    (incf d))
      (setf (aref a d) x))

As you can see, this is pretty much an imperitive Lisp version of the pseudo-code.
One observation I made was that since I am clearing out the root level of the heap, I can go ahead and do that as I iterate through the heap. I created a destructive foreach that removes nodes from the heap as it goes through. Since the list is circular, the loop is done when a node's right pointer points to itself (meaning it is the last one):


let destructive_foreach f node =
  match node with
      Empty -> ()
    | Node(n) ->
        let rec foreach_iter curr next =
          fh_remove_from_list curr;
          (f curr);
          if next != curr then
            (foreach_iter next next.right)
        in
          foreach_iter n n.right;;

Now that destructive_foreach handles the "for each node w in the root list of H", I just need a better way to store the nodes in the A array. The "while A[d] != NIL" loop is really just a loop that looks for an empty slot in A. I rewrote the loop as a tail recursive procedure like this:


  let rec put_in_slot x =
    let d = x.degree in
      if is_empty a.(d) then
        a.(d) <- Node(x)
      else
        (let y = (as_fh_node_rec a.(d)) in
           a.(d) <- Empty;
           if heap.compare_func x.data y.data then
             (fh_link x y; put_in_slot x)
           else
             (fh_link y x; put_in_slot y))

Basically, look at the degree of x, if a[d] is empty, put x in there. Otherwise, pull the item out of a[d] and call it y. Link x and y together and then look for a slot to put the new joined-together nodes. Now, the guts of the consolidate routine just look like this:


    destructive_foreach put_in_slot heap.min;
    heap.min <- Empty;
    Array.iter (fh_consolidate2 heap) a;;

# posted by Mark Wutka @ 9:26 AM 0 comments

Sunday, March 27, 2005

Followup on "Duh!"

One of the things that has been bothering me about my "Duh!" moment is that it seems like a bit of a cop-out. I shouldn't have to use a list to represent a null reference - I should be able to represent it using the Empty | Node of 'a fh_node_rec. I decided to go back and make the code work the way I originally thought it should, and except for a few lingering bugs with decreasing a key value, I have the heap working with Empty representing null.

Two things that helped me a lot were the definition of an is_empty function:


let is_empty node =
    match node with
        Empty -> true
     | x -> false;;

and as as_fh_node_rec that is essentially like a cast (there may be a built-in way to do this in OCaml, I just don't know about it):


let as_fh_node_rec node =
    match node with
        Empty -> failwith "Expected fh_node_rec, found empty"
       | Node(n) -> n;;

When I need to refer to an fh_node_rec as a Node, I can just do Node(x). The combination of these things made the solution a little more palatable, but it is still a bit of an annoyance.

# posted by Mark Wutka @ 11:43 AM 0 comments

Wednesday, March 23, 2005

DUH!

I had a "Duh!" moment this evening. A "Duh!" moment is an "Aha!" moment where you say "Aha! I am an idiot!". I have been translating my Fibonacci Heap code from Lisp to OCAML. In my implementation, a heap node has left and right pointers for storing nodes in a circular list, and also parent and child pointers. Since a pointer could be null, I defined a type that could be either Empty or an fh_node_rec object, like this:


type 'a fh_node_rec = {
    mutable data : 'a;
    mutable parent : 'a fh_node;
    mutable left : 'a fh_node_rec;
    mutable right : 'a fh_node_rec;
    mutable child : 'a fh_node;
    mutable degree : int;
    mutable mark : bool;
} and 'a fh_node = Empty | Node of 'a fh_node_rec;;

So, mistake number one was that the left and right pointers were a different type from the parent and child. This got really complicated when I started processing the lists. When handling the left and right nodes, I could work directly with fh_node_rec types, but for parent and child, I had to use match, like:


match x with
    Empty -> ()
  | Node(xn) -> (* do something with xh as an fh_node_rec *)

I ended up defining multiple functions, some that worked with fh_node and others that worked with fh_node_rec. I started to have trouble keeping track of whether something was an fh_node or an fh_node_rec. I really had trouble when writing the consolidate function. In fact, I never could get it right.

I came across a paper called Design Patterns in OCaml by Antonio Vicente and Earl Wagner that suggested using a list to represent items that may be null, using [] to represent null. That's when I just wanted to slap my forehead for not thinking of it before. I was able to define a single record type:


type 'a fh_node = {
    mutable data : 'a;
    mutable parent : 'a fh_node list;
    mutable left : 'a fh_node;
    mutable right : 'a fh_node;
    mutable child : 'a fh_node list;
    mutable degree : int;
    mutable mark : bool;
};;

I was able to eliminate all the duplicate functions and I got the consolidate function written. It was much easier to see if a node was equal to [] than it was to see if it was either an Empty type or an fh_node. Once I get the rest of the code finished and working, I'll post it on my web site.

# posted by Mark Wutka @ 7:57 PM 0 comments

Partial functions in OCAML

OCAML supports something called "partial functions". Suppose you define a function that takes two arguments and adds them together:


let adder x y = x + y;;

You can obviously call adder 2 3 and expect to get back 5. OCAML lets you call adder 2 and get back a function that adds 2 to whatever argument you pass it. For example:


let add2 = adder 2;;
let add3 = adder 3;;
Printf.print "%d %d\n" (add2 5) (add3 7);;

You would see "7 10" printed out.

I must admit, my first reaction to this was "Neat. So what?" I am starting to see the usefulness of it, however. In Scheme, if I want to loop through a list and add 5 to every member of the list, I can do something like this:


(map (lambda (x) (+ x 5)) '(1 2 3 4 5 6))

In OCAML, you could do this:


List.map (function x -> (x + 5)) [1;2;3;4;5];;

Since I have defined my adder function, though, I can do this instead:


List.map (adder 5) [1;2;3;4;5];;

I think the trick to taking advantage of partial functions may be to determine which arguments are likely to change and place those last. For example, if I have a function called heap_consolidate, I can define it as:


let heap_consolidate x heap = (* do something with x and the heap *)

or I can do:


let heap_consolidate heap x = (* do something with x and the heap *)

If I am going to be passing a lot of different x values for the same heap, especially if I want to iterate through a list or an array, I might want to make heap the first parameter so I can do something like this:


List.map (heap_consolidate myheap) myheap.root;;

If heap is the second argument, I would have to do:


List.map (function x -> heap_consolidate x myheap) myheap.root;;

While partial functions may seem error-prone, I don't think that is the case. If you accidentally omit a function argument, OCAML's type checker will notice that you are using a function type where it is expecting you to use something else. Without type checking, partial functions seem pretty dangerous.

# posted by Mark Wutka @ 6:04 AM 1 comments

Friday, March 18, 2005

OCAML and Type inference

I have been playing around with OCAML a bit. I was amazed at how well it did in the Great Computer Language Shootout, so I decided to see what it was like.

After working with Scheme and Lisp for a while, OCAML is really not that different. Where you might do (let ((x 1) (y 2)) (something)) in Lisp, you do:
let x=1 and y=2 in something;;

I really haven't gotten into the meat of the language yet - I've really just been playing with the CAML part and not the O. Any way, OCAML was my first encounter with type inference (that I can recall, at least). OCAML is strictly typed, but you don't usually have to indicate the type of a variable. The OCAML compiler figures it out. If a function can accept different types of variables, that's okay, too.

For example, here is a simple recursive function that removes all occurrences of x from a list:


let rec remove x l =
    match l with
        [] -> []
    | h::t -> if x == h then remove x t else h::(remove x t)

Because of the usage, l must be a list. If I try to pass anything else as the second parameter to remove, I will get an error at compile time. The x parameter can be any type, but it must match the type of elements in l. So, I can do this:
remove 3 [1;2;3;4;5];;
or
remove 'B' ['A','B','C','D'];;

I can't do:
remove 'A' [1;2;3;4;5];;

The interesting thing is that these errors get caught at compile time. I don't declare any types in my definition of remove, the compiler just figures out the type restrictions at compile time. Now, if I add a print statement to print out x as an integer value, I am now constraining x to be only an integer (and l to be only a list of integers):


let rec remove2 x l =
    match l with
        [] -> []
    | h::t -> if x == h then ((Printf.printf "Removed %d\n" x);remove2 x t)
        else h::(remove2 x t)

Now I can still do:
remove2 3 [1;2;3;4;5];;
but I now get an error if I try to do:
remove2 'B' ['A','B','C','D'];;

Last night I took a little Chicken Scheme program that I had written to solve the puzzle USA+USSR=PEACE and rewrote it in OCAML. It is a brute force program and isn't the most efficient way to do it, but it only took a few minutes to write:


(require-extension srfi-1 srfi-13 format)
(define usa "USA")
(define ussr "USSR")
(define peace "PEACE")
 
(define digit-pool '(0 1 2 3 4 5 6 7 8 9))
(define letter-pool '(#\U #\S #\A #\R #\P #\E #\C))
 
(define (word-to-number word letter-table)
  (let ((sum 0))
    (string-for-each
      (lambda (x) (set! sum (+ (* 10 sum) (cdr (assoc x letter-table))))) word)
    sum))
 
(define print-solution (lambda (letter-table)
  (format #t "~A + ~A = ~A~%"
          (word-to-number usa letter-table)
          (word-to-number ussr letter-table)
          (word-to-number peace letter-table))))
 
(define test-solution (lambda (letter-table)
  (when (= (+ (word-to-number usa letter-table)
              (word-to-number ussr letter-table))
           (word-to-number peace letter-table))
    (print-solution letter-table))))
 
(define try-letter (lambda (letters digits letter-table)
  (cond ((null? letters) (test-solution letter-table))
        (#t
         (let ((curr-letter (first letters))
               (curr-letter-pool (cdr letters)))
          (for-each (lambda (digit)
            (try-letter curr-letter-pool
                        (delete digit digits)
                        (alist-cons curr-letter digit letter-table)))
        digits))))))
   
(try-letter letter-pool digit-pool '())

I was able to just about translate the program without any major rewriting:


let usa = "USA";;
let ussr = "USSR";;
let peace = "PEACE";;
 
let digit_pool = [0;1;2;3;4;5;6;7;8;9];;
let letter_pool = ['U';'S';'A';'R';'P';'E';'C'];;
 
let word_to_number word letter_table =
    let rec rec_word_to_number pos num =
        if pos >= (String.length word) then num
            else rec_word_to_number (pos + 1) (num * 10 +
                (List.assoc word.[pos] letter_table)) in
    rec_word_to_number 0 0;;
 
let print_solution letter_table =
    Printf.printf "%d + %d = %d\n" (word_to_number usa letter_table)
        (word_to_number ussr letter_table)
        (word_to_number peace letter_table);;
 
let test_solution letter_table =
    if (word_to_number usa letter_table) + (word_to_number ussr letter_table)
        = (word_to_number peace letter_table) then
            print_solution letter_table;;
 
let rec remove x l =
    match l with
        [] -> []
    | h::t -> if x == h then remove x t else h::(remove x t)
 
let rec try_letter letters digits letter_table =
    match letters with
        [] -> (test_solution letter_table)
    | curr_letter::curr_letter_pool ->
            List.iter (function digit ->
                (try_letter curr_letter_pool (remove digit digits)
                    ((curr_letter,digit)::letter_table))) digits;;
 
try_letter letter_pool digit_pool [];;

Although the code is pretty similar, the OCAML version runs more than twice as fast as the Chicken Scheme version. I think that a lot of the speed difference is due to the fact that OCAML knows the type of everything ahead of time. The cool thing, though, is that I didn't have to tell OCAML about the types, it figured them out on its own.

# posted by Mark Wutka @ 6:20 AM 1 comments

Thursday, March 10, 2005

Communicating with the outside world

Many languages provide something a mechanism to call into external libraries. I thought I'd mention two ways this is done, one in Lisp and another in Python. The example I am using for Python uses the dynamic libraries package. There is also an excellent package called ctypes that works well under Windows, Linux and MacOS X.

I have a C library that works with Directed Acyclic Word Graphs, specifically, the type of file used to store words for the Hasbro Scrabble game. I use the DAWG for automated cryptogram solvers as well. The C library is available at http://www.wutka.com/download/dawg.c. If you want to play with this, the dictionary file is available at http://www.wutka.com/download/twl98.daw. The 3 functions that you would care about are:


int load_dawg(char *filename); // Load the DAWG into memory
int match(char *word); // Returns non-zero if WORD is a valid word in the dawg
int longest_match(char *word); // Returns the length of the longest word found at the beginning
                               // of the string passed in. (MARKFOOBAR would return 4, 
                               // since MARK is valid, but MARKF isn't)

Under Linux, I compiled this into a shared library with the following commands:


gcc -fPIC -c dawg.c
gcc -shared -Wl,-soname,libdawg.so.1 -o libdawg.so.1 dawg.o

Now, from Python, I can use dl.open and dl.call to invoke these functions:


import dl

dawglib=dl.open("libdawg.so.1")

def dawg_open(filename):
    return dawglib.call("load_dawg", filename)

def dawg_match(word):
    return dawglib.call("match", word)

def dawg_longest_match(word):
    return dawglib.call("longest_match", word)

dawg_open("twl98.daw")

print "FOO=%d" % (dawg_match("FOO"))
print "AA=%d" % (dawg_match("AA"))
print "A=%d" % (dawg_match("A"))
print "MARK=%d" % (dawg_match("MARK"))
print "longest PRINTFOOBAR=%d" % (dawg_longest_match("PRINTFOOBAR"))

As you can see, I wrapped these calls with convenience methods. I do the same thing in the Lisp code below, which uses the Universal Foreign Function Interface.


(uffi:load-foreign-library #p"libdawg.so.1" :module "dawg" :supporting-libraries '("c"))

(uffi:def-function ("load_dawg" dawg-load-c) ((name :cstring)) :module "dawg" :returning :int)
(uffi:def-function ("match" dawg-match-c) ((word :cstring)) :module "dawg" :returning :int)
(uffi:def-function ("longest_match" dawg-longest-match-c) ((word :cstring))
  :module "dawg" :returning :int)

(defun dawg-load (filename)
  (let* ((c-filename (uffi:convert-to-cstring filename))
        (result (dawg-load-c c-filename)))
    (uffi:free-cstring c-filename)
    result))

(defun dawg-match (word)
  (let* ((c-word (uffi:convert-to-cstring word))
        (result (dawg-match-c c-word)))
    (uffi:free-cstring c-word)
    result))

(defun dawg-longest-match (word)
  (let* ((c-word (uffi:convert-to-cstring word))
        (result (dawg-longest-match-c c-word)))
    (uffi:free-cstring c-word)
    result))

(defun run-test ()
  (dawg-load "twl98.daw")
  (format t "FOO=~A~%" (dawg-match "FOO"))
  (format t "AA=~A~%" (dawg-match "AA"))
  (format t "A=~A~%" (dawg-match "A"))
  (format t "MARK=~A~%" (dawg-match "MARK"))
  (format t "longest PRINTFOOBAR=~A~%" (dawg-longest-match "PRINTFOOBAR")))

Again, I defined convenience methods, mostly because I needed to allocate and free the C strings. It's really nice when a language lets you use external library without having to write any glue code in C or some other language.

# posted by Mark Wutka @ 5:30 PM 0 comments

Wednesday, March 09, 2005

Fun With Lisp

I have occasionally played with Lisp and Scheme in the past, but never long enough to get a good feel for the language. Lately, however, I have spent a significant amount of time working with Lisp and I am really beginning to appreciate the beauty of the language.

I have started a web page for Lisp stuff where I will put some of the interesting things I come up with. So far, it has an implementation of the Fibonacci Heap, which I can tell you is a bear to implement. I also have an implementation of Dijkstra's Shortest Path algorithm and some code to parse Tiger/Line files.

Why do I think Lisp is cool?

First of all, the syntax is extremely simple. Yes, there are a lot of parentheses, but I think the reason that they stand out so much is that, apart from quotes, there's not a lot of punctuation required. You don't have to end statements with a semi-colon, or put commas between list items, or put curly braces around statements. It is simple and direct.

First class functions are really cool, as well. Although they are not unique to Lisp, first class functions are not found in Java, which is my current day-job-language. By saying a function is "first class", it means that a function has the same status as other data items. You can pass a function as a parameter to another function. You can store functions in other data structures. You could, for example, iterate over a list and apply each function in the list to a particular piece of data. This would make it easier to apply business rules, for example. You can do this in Java with Interfaces, it's just a bit easier in Lisp.

There are lots of functions for dealing with data structures. For all the different containers that Java has, it really falls down on the job when it comes to actually using them. One of the things that always impressed me about Smalltalk was the things you could do with containers (Ruby has this as well, and Python's list comprehensions are close enough). In Lisp, for example, there is the (map) function, which in its simple form is just like iterating through a list with a for loop.
For example:

(mapc #'print '(1 2 3 4 5 6 7))
loops through a list and calls the (print) function for each item in the list. There are numerous (map) functions, some return no result, some return a list of the result of applying the function. Where the (map) functions excel beyond a simple loop, however, is that it can use multiple lists and invoke functions that take more than one argument. A simple example would be adding together corresponding elements in two lists:

(mapcar #'+ '(1 2 3 4 5) '(6 7 8 9 10))
which returns the list
(7 9 11 13 15).

There are other cool things, one last one I'd like to mention is that a Lisp environment tends to be highly interactive. You can write some functions and immediately interact with them from a Lisp prompt.

# posted by Mark Wutka @ 6:48 PM 3 comments

/dev/code