Ben Cantrick (mackys) wrote,
Ben Cantrick

  • Mood:

Binary heaps - know 'em, love 'em.

One of the most neglected data structures in the CS repertoire is the heap. Heaps are an extremely simple but powerful structure that is used for maintaining a collection of values from which you want to be able to quickly remove the largest object quickly. This may sound trivial, but this turns out to be an incredibly useful structure. You can use this basic structure to implement a very fast sorting algorithm; to maintain a prioritized set of values/tasks; to manage chunks of memory; etc.

In a heap, what we want is to be able to add an element to the heap quickly, and to be able to remove the largest element quickly. We're not going to remove arbitrary elements of the heap: we only remove the largest. We can't query the heap - that is, we can't ask "Is X in the heap?" - all we can do is remove the largest value. And while providing these properties, we want to use as little memory as possible.

Now, I haven't yet defined what I mean by "quickly" in the above definition. There's a good reason for that: there are a set of tradeoffs that produce different solutions. For now, we'll settle for one version of "quickly": O(lg n) - that is, in the worst case, the amount of time it takes to either remove the largest value, or to add a new value, is proportional to the logarithm of the number of values in the heap. We can provide that kind of performance with a kind of heap called a binary heap.

This is seriously the best explanation of how heaps work that I've ever read. I've read two books on algorithms, including the book used at MIT, and none of them had nearly this good an explanation of how you insert and remove nodes from a heap.

He didn't quite have space to talk about how you can store a heap in an array for extreme storage efficiency, but you can and it's not hard either - "for node whose value is stored at index N: store the left child's value at index N*2+1, and the right child's value at index N*2+2".

He also didn't have time to talk about the O(n) method of building a heap from a random list of values, but that's cool. If you need performance so badly and/or are working on data sets so large that the difference between nlog(n) and n matters, you'll have time to look up the details.
Tags: reddit
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.