Experilous

A few weeks ago I introduced you to what I felt was an impressive little data structure that can provide constant time complexity for the standard operations, stores data contiguously, and provides stable handles that remain valid across all operations other than removing the referenced items themselves.  Today I want to extend the basic data structure with some additional functionality which will be focused on iterating over the contiguous items in order.  While iterating, we may wish to modify the collection as we go (such as conditionally removing some items).  We may also wish to rearrange the items so that a simple linear traversal will also happen to hit the items in a particular order (such as sorting 3D objects from front to back to take advantage of a depth buffer and reduce fill-rate).

Here is the JavaScript code from last time, implementing the core data structure:

For this post, we won’t need to modify any of this code.  We’ll only be adding some new functions.  Let’s begin by supporting a linear traversal of all items.  There are various styles that could be used for this, but I’ll keep it simple and use basic integer indexes as direct offsets into the dense array.

What if, while traversing the collection using the above functions, we want to store a stable handle to one of the items?  The dense index is not guaranteed to remain stable, so we’ll need the sparse index.  This is an easy lookup.

What if we want to conditionally remove items as we traverse the full collection?  We could now use the above function getHandleAt(), in conjunction with remove() that we wrote last time.  Alternatively, we could write a removeAt() function that saves a bit of complexity:

While it is true that this function affects not on the item removed, but also the last item in the dense array, this doesn’t prevent us from using the removeAt() function in the middle of a traversal.  If we are traversing from front to back, we just need to not increment our index whenever we remove an item, since that index now references an object we have not yet seen, or it is one past the end of the array and our terminating condition will break us out of the loop.  If we are traversing from back to front, on the other hand, we know that the index now references an item that we have already seen, so we continue to decrement the index every time, whether or not we remove the current item.

Note also that we can call add() whenever we want, but only during a forward traversal, since it will always push the new item onto the back of the dense array.  Here are some example loops:

So linear traversal is pretty easy now, but what if we need a different order?  Just run a sort algorithm of your choice over the array, of course.  But the dense array and the sparse array need to remain synchronized, so it’s a little tougher than that.  Most sort algorithms can be abstracted with two binary operations:  compare and swap.  Comparison is dependent upon the objects stored, so we don’t need to focus on that here.  But the swap operation is definitely affected by the details of our data structure.  Fortunately it isn’t too complicated.

We could now use the above with any basic sort algorithm that accepted both a comparator and a swap function. (Unfortunately, they are usually written to take only a comparator, and presume that a trivial swap is sufficient.)

This is good enough for basic usage. However, when I chose to use insertion sort as my default, I realized that it was going to call swapAt() a lot, and in a way that was wasteful. In particular, the item being inserted into its correct position would be read and written repeatedly, shifting by one index each time, rather than being inserted directly into the target location. So I went ahead and wrote a sort function that manually kept the sparse and dense maps synchronized, rather than delegating that responsibility to swapAt().

With that function, you can now sort once and then traverse the collection in the desired order numerous times.  Keep in mind that remove() is not order-preserving.  Unless you can guarantee that you only remove from the end, you’ll probably need to call sort() again after performing one or more removals.  The same is true for add(), unless you can guarantee that all newly added items belong at the end at the time they are added.

Fortunately, some algorithms like insertion sort are quite efficient when a collection is mostly sorted, and performing just a few removals or additions with this data structure only slightly disrupts the order of items.  Even better, if you only need your data to be mostly sorted anyway (such as with the depth sorting example mentioned in the intro), you can operate for quite a while without resorting, and then only resort occasionally, whether at a fixed interval, after you’ve done a fixed number of additions or removals, or based on some other criteria of your choice.


No Comments

Leave a comment