Thursday, 14 April 2011

Balancing Deletions

Those who have known me for a long time are familiar with Rhys' First Law of Balanced Trees:

There is no such thing as a balanced tree delete algorithm.

Since it was first codified in 1989 the law has seen wide application: AVL trees, B-trees, red-black trees, you name it.  Open up any textbook on algorithms and you'll usually find something like this (Knuth, volume 3, section 6.2.4):

Exercises
...
7. [23] Design a deletion algorithm for B-trees.
"Algorithms in C++", Sedgewick, 1992 doesn't even mention deletion in the section on red-black trees.

Try it with your textbook.  Deletion is always left as an exercise for the reader.  I began to suspect that none of the authors had ever managed to implement deletion themselves.  But rather than admit ignorance they instead preferred to torture their students with mythical algorithms that were "too big to fit in the margin".

Over the years, I just gave up and added a boolean flag, "deleted", to every node and ignored the heap space cost.  Whenever someone has said "You're wrong, libfoobar implements it", I've looked and found only unreadable heuristic gobbledegook.  Nothing like a textbook treatment.  "Trust me!  It works!".  Sure it does Sonny.

Well, after 20-odd years, the law has finally been disproved.  Wikipedia's page on red-black trees actually includes code and a decent description of how deletion works.  After wading through the implementation and writing tons of unit tests, I can confirm that it does indeed work as advertised.

For the first time in my life I've actually implemented a balanced tree delete that actually works!  Woohoo!

I can now use red-black trees to do clause indexing for quickly locating a relevant clause amongst thousands (or even gabillions!) of Plang database facts.

Code here https://github.com/rweather/plang/blob/master/src/libplang/rbtree.c

References: Wikipedia, Literate Programs.

No comments:

Post a Comment