You've got to know where you've been to know where you're going [Some background on Charting's ordered multiple dictionary implementation]

Wednesday, May 27, 2009

As I've said before, one of our key goals for the March 09 release of Silverlight/WPF Charting was to improve the performance of key customer scenarios. I didn't go into much detail in the release notes, but one of the ways we accomplished this was to change some code that had been doing a linear search to use a binary search instead. (Example: Finding the high/low data point values as part of the process of setting the range of an axis.) If this optimization seems obvious and makes you think "golly, they should have done it that way in the first place", you're absolutely right. :) It's not that we didn't want to do this earlier; it was just that we didn't have the resources to do so...

What we hoped to take advantage of was some already-written-and-tested class implementing an ordered multi-dictionary (a kind of associative array) that could be dropped right into the Silverlight Toolkit source code and used without concern. After we found a suitable implementation, we established that it did, indeed, improve performance on non-trivial data sets in the way that we hoped. Unfortunately, something came up at the last minute and we decided not to proceed with the code we'd been using for the previous few weeks. That left us in kind of a funk because we didn't want to give up the performance improvements we'd already seen...

So I set aside some other tasks and dashed off a quick binary tree implementation to do what we needed and preserve the performance gains from faster searching. The resulting code for BinaryTree is part of the Charting source code and is included in the Silverlight Toolkit download. It's fairly simple and straightforward, though there are a few things worth calling out:

A binary tree (indexed by key, duplicates replace, values don't matter) doesn't follow quite the same semantics as an ordered multi-dictionary (indexed by key, duplicates do not replace, values matter). I needed to come up with an easy way to merge the two notions and the trick I came up with was to change the search function (named KeyAndValueComparison) from being key-based to being key-and-value-based. Simply by incorporating the value into the comparison, I pretty easily created a multi-dictionary (i.e., something that can store multiple values for the same key). What's more, it automatically clusters by key and orders the values under every key! The only remaining problem is what to do with key+value duplicates - and the answer is simple: store them all together in the tree. By relaxing the binary tree definition slightly to allow for same-valued nodes, it's easy enough to store identical nodes "beside" each other. The same search/remove logic still applies under the new rules - all that changes is a slight tweak to the add logic!
This implementation is just a simple binary tree, so there's no guarantee of balance like there is with more sophisticated algorithms. In fact, because the rules about where new nodes go are strict, there's no freedom when it comes time to place a new node. The only time there's any choice is when a key+value duplicate is added (or removed) - once the matching node in the tree is found, the new node could be added to its left or its right. In a feeble attempt to create balance, this BinaryTree implementation alternates left/right in these cases. Unfortunately, it's not likely to help much because key+value duplicates are typically not the primary scenario...
Because it was an easy performance win, this implementation uses an iterative add (vs. the traditional recursive one). The remove operation is somewhat more complex and follows the usual recursive approach - which can be problematic for certain inputs. :( Imagine the scenario of adding 1000 elements to a BinaryTree where the keys are already in numerical order. (By the way, this isn't as unlikely as it sounds; people frequently chart data that is already sorted!) In this case, the tree will "go linear" and all the nodes will lie in a line off to the right side. Conveniently, removing the same nodes in the same order (the common scenario for Charting!) is wicked fast - the node that gets removed is always the root node, which is both fast to find and fast to update. However, if one were to remove the nodes in reverse order, there's a good chance the stack would overflow instead. :( This is because the node that should be removed is at the end of a long chain of 1000 nodes, so the recursive remove calls pile up one per node and can exhaust the call stack. So please don't do that with this implementation!
One of the handy helper methods I ended up writing is Traverse, which does an iterative in-order walk of the entire tree and returns a sequence of items selected by the selection function from nodes that match the comparison function. Traverse makes it easy to build more sophisticated methods - GetKeys, GetValuesForKey, and GetValuesForAllKeys are all simple 1-liners thanks to the power of Traverse. Similarly, the GetExtreme method walks down the tree according to the successor function and returns some aspect of the most extreme node according to the selector function. MinimumKey, MaximumKey, MinimumValue, and MaximumValue are all simple 1-liners thanks to the flexibility of GetExtreme.
BinaryTree is a generic class and supports completely arbitrary key and value types. In order to avoid imposing unnecessary restrictions on the generic types (e.g., that they implement the IComparable interface), BinaryTree's constructor takes a Comparison<T> delegate for comparing keys and another for comparing values. As long as you can write code to compare two keys/values of the type you've chosen according to the necessary contract, BinaryTree is happy to work with them!
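
To make the points above a little more concrete, here is a minimal sketch of the same ideas in Python. It is not the Charting code (which is C#), and every name in it (OrderedMultiDict, add, traverse, get_extreme, and so on) is made up for illustration. It assumes comparison functions that return a negative, zero, or positive number, and its traverse simply filters every node rather than doing anything smarter. What it does show is the key-and-value comparison, the iterative add that alternates sides for exact duplicates, and how small the helper methods become once a traverse and a get_extreme exist.

```python
from typing import Any, Callable, Iterator, Optional


class Node:
    def __init__(self, key: Any, value: Any) -> None:
        self.key, self.value = key, value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class OrderedMultiDict:
    """Hypothetical unbalanced tree ordered by key, then value; duplicates kept."""

    def __init__(self, compare_keys: Callable[[Any, Any], int],
                 compare_values: Callable[[Any, Any], int]) -> None:
        self._compare_keys = compare_keys
        self._compare_values = compare_values
        self._root: Optional[Node] = None
        self._duplicate_goes_left = False  # flipped on every exact duplicate

    def _compare(self, key: Any, value: Any, node: Node) -> int:
        # Keys decide the order; values only break ties between equal keys.
        result = self._compare_keys(key, node.key)
        return result if result != 0 else self._compare_values(value, node.value)

    def add(self, key: Any, value: Any) -> None:
        new_node = Node(key, value)
        if self._root is None:
            self._root = new_node
            return
        node = self._root
        while True:  # iterative descent, no recursion
            result = self._compare(key, value, node)
            if result == 0:
                # Key+value duplicate: either side is legal, so alternate.
                go_left = self._duplicate_goes_left
                self._duplicate_goes_left = not go_left
            else:
                go_left = result < 0
            child = node.left if go_left else node.right
            if child is None:
                if go_left:
                    node.left = new_node
                else:
                    node.right = new_node
                return
            node = child

    def traverse(self, selector: Callable[[Node], Any],
                 matches: Callable[[Node], bool]) -> Iterator[Any]:
        # Iterative in-order walk using an explicit stack.
        stack: list = []
        node = self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            if matches(node):
                yield selector(node)
            node = node.right

    def get_values_for_key(self, key: Any) -> list:
        return list(self.traverse(lambda n: n.value,
                                  lambda n: self._compare_keys(key, n.key) == 0))

    def get_extreme(self, successor: Callable[[Node], Optional[Node]],
                    selector: Callable[[Node], Any]) -> Any:
        if self._root is None:
            raise ValueError("tree is empty")
        node = self._root
        while successor(node) is not None:
            node = successor(node)
        return selector(node)

    def minimum_key(self) -> Any:
        return self.get_extreme(lambda n: n.left, lambda n: n.key)

    def maximum_key(self) -> Any:
        return self.get_extreme(lambda n: n.right, lambda n: n.key)


def compare(a: Any, b: Any) -> int:
    return (a > b) - (a < b)


tree = OrderedMultiDict(compare, compare)
for key, value in [(2, "b"), (1, "z"), (2, "a"), (2, "b"), (3, "c"), (2, "b")]:
    tree.add(key, value)

print(tree.get_values_for_key(2))              # ['a', 'b', 'b', 'b']
print(tree.minimum_key(), tree.maximum_key())  # 1 3
```

Note how the three identical (2, "b") entries all survive and come back clustered under their key and in value order, which is the multi-dictionary behavior described above.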

So that's a bit of background on the BinaryTree class that's used by Silverlight/WPF Charting today. An unbalanced binary tree doesn't give the best performance in the world, but it was quick and easy to implement (under time pressure!) and gives a noticeable boost to many of the common scenarios we set out to improve.
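
If you'd like to see the "go linear" behavior for yourself, the following Python sketch (again, an illustration with made-up names, not the Charting code) builds a plain unbalanced tree from 1000 keys twice: once in sorted order and once shuffled. The height is measured level by level rather than recursively, precisely because a recursive routine needs one stack frame per level of the tree.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Iteratively insert key into an unbalanced tree and return the root."""
    new_node = Node(key)
    if root is None:
        return new_node
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new_node
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node
                return root
            node = node.right


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Number of levels in the tree, counted without recursion."""
    level = [] if root is None else [root]
    depth = 0
    while level:
        depth += 1
        level = [child for node in level
                 for child in (node.left, node.right) if child is not None]
    return depth


sorted_keys = list(range(1000))
shuffled_keys = sorted_keys[:]
random.Random(0).shuffle(shuffled_keys)  # fixed seed so runs are repeatable

sorted_height = height(build(sorted_keys))
shuffled_height = height(build(shuffled_keys))
print(sorted_height)    # 1000: every node hangs off the right of the previous one
print(shuffled_height)  # far smaller; the exact number depends on the shuffle
```

With sorted input the tree is as tall as it has nodes, so finding the last node costs as much as a linear search; with shuffled input the same 1000 keys fit in a small fraction of that height.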

And as it happens, this is all very relevant to the topic of my next post! Here's a hint; see if you can guess what it is... :)