Log in

No account? Create an account
Adventures in Engineering
The wanderings of a modern ronin.

Ben Cantrick
  Date: 2008-05-22 00:17
  Subject:   More retro than a slide-rule: the Curta hand-crank mechanical adder.
  Music:Vernian Process - Queen Of The Delta





Amusing little steampunk calculator. You enter a number using the sliders around the outside, and then turn the crank. The input number is added to whatever number is in the result register on the top. Pulling the handle upwards and turning does subtraction instead of addition. The ring is used to clear the result register and/or the crank count. The band around the top of the body can be rotated, to pre-multiply the input number (and crank count) by powers of ten.

(Am I looking a gift horse in the mouth by saying that you should be able to add by turning the handle clockwise and subtract by turning it counter-clock?)
2 Comments | Post A Comment | | Link

Ben Cantrick
  Date: 2008-05-22 01:34
  Subject:   Heapsort, Quicksort, and Entropy
  Tags:  reddit

Quicksort and heapsort have been thoroughly compared by Paul Hsieh. He says `I suspected that heapsort should do better than its poor reputation and I think these results bear that out.' In his tests, the best compiler (for either heapsort or quicksort) produced a heapsort that was about 20% faster than quicksort, in total CPU time. The question I'd like to address, however, is, why Heapsort uses more comparisons than quicksort. Paul Hsieh says `what struck me is that I could not see really why heapsort is slower than quicksort. And I've not heard or read a credible explanation for this either.' I think there is a simple explanation, based on the idea of expected information content.

Heapsort is inefficient in its comparison count because it pulls items from the bottom of the heap and puts them on the top, allowing them to trickle down, exchanging places with bigger items. This always struck me as odd, putting a quite-likely-to-be-small individual up above a quite-likely-to-be-large individual, and seeing what happens. Why does heapsort do this? Could no-one think of an elegant way to promote one of the two sub-heap leaders to the top of the heap? Let's call that algorithm Fast Heapsort. It is not an in-place algorithm, but, just like Heapsort, it extracts the sorted items one at a time from the top of the heap. I evaluated the performance of Fast Heapsort on random permutations. Performance was measured solely on the basis of the number of binary comparisons required.

Post A Comment | | Link

Ben Cantrick
  Date: 2008-05-22 14:26
  Subject:   How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale
  Tags:  reddit

How do you structure your database using a distributed hash table like BigTable? The answer isn't what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don't. You - pause for dramatic effect - duplicate data instead of normalize it.

Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don't know how that decision was made, but it must have gone against every fiber in their relational bones. But Flickr’s reasoning was genius. To scale you need to partition. User data must spread across the shards. So where do comments belong in a scalable architecture?

From one world view comments logically belong to a relation binding comments and users together. But if your unit of scalability is the user shard there is no separate relation space. So you go against all your training and decide to duplicate the comments. Nerd heroism at its best. Let inductive rules derived from observation guide you rather than deductions from arbitrarily chosen first principles. Very Enlightenment era thinking. Voltaire would be proud.

In a relational world duplication is removed in order to prevent update anomalies. Error prevention is the driving force in relational modeling. Normalization is a kind of ethical system for data. What happens, for example, if a comment changes? Both copies of the comment must be updated. That leads to errors because who can remember where all the data is stored? A severe ethical violation may happen. Go directly to relational jail :-)

1 Comment | Post A Comment | | Link

May 2015