Ben Cantrick (mackys) wrote,
Ben Cantrick
mackys

  • Mood:

In-memory B-trees - faster (and smaller) than binary trees.


I found John-Mark Gurney’s B-tree via google search. It is well coded and full of clever ideas. The original version has small memory footprint, but it is not as fast as STL’s red-black tree. I studied this source codes and think I should be able to further optimize it. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even smaller memory than the original version. I think I am now at the position to say: at least for some applications, B-tree is a better ordered data structure than most of binary search trees.

The most attractive feature of B-tree is its small memory usage. A binary tree needs at least two pointers for each record, which amounts to 16N on a modern 64-bit systems. A B-tree only needs one pointer. Although in a B-tree each node may not be full, a sufficiently large B-tree should be at least 50% full by definition and in average around 75% full. On a 64-bit system, the extra memory is only 8N/0.75+K(1-1/0.75)=(10+0.3K)N, where K is the size of a key. In fact we can do even better as we do not need to allocate the null pointers in leaves. The practical memory overhead can be reduced to below (5+0.3K)N (in fact, as the majority of nodes in a B-tree are leaves, the factor 5 should be smaller in practice), far better than a binary search tree. On speed, no binary search tree with just two additional pointers can achieve the best performance. We usually need at least a pointer to the parent (AVL tree and standard red-black tree) or a random number (treap) to get good performance. B-tree is different. It is even faster than the standard red-black tree while still using (5+0.3K)N extra memory! People should definitely pay more attention to B-tree.

The modified B-tree is available here as a single C header file. Example is here. Currently, the APIs are not very friendly but are ready to use. In case you want to give a try. Note that you must make sure the key is absent in the tree before kb_put() and make sure the key is present in the tree before calling kb_del().
Tags: reddit
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 6 comments