Association maps in mit-scheme
I recently some did some benchmarking of persistent data structures in mit-scheme for my UROP. There were a few questions we were interested in:
- For what association sizes does a fancier data structure beat out your plain old association list?
- What is the price of persistence? That is, how many times slower are persistent data structures as compared to your plain old hash table?
- What is the best persistent data structure?
These are by no means authoritative results; I still need to carefully comb through the harness and code for correctness. But they already have some interesting implications, so I thought I'd share. The implementations tested are:
- assoc, association lists
- hamt, hash array mapped tries (hand-implemented, using 26-bit fixnums)
- hash-table, hash tables
- prb-tree, persistent red-black trees (hand-implemented)
- wt-tree, weight-balanced trees
All implementations use eq? for key comparison.
Unsurprisingly, assoc beats out everyone else, since all it has to do is a simple cons. However, there are some strange spikes at regular intervals, which I am not sure of the origin; it might be the garbage collector kicking in.
Of course, you pay back the cheap updates in assoc with a linear lookup time; the story also holds true for weight-balanced trees, which have fast inserts but the slowest lookups.
The hamt really flies when the key isn't present, even beating out hash-tables until 15 elements or so.
Source code for running the benchmarks, our home-grown implementations, and graphing can be found at the scheme-hamt repository.