Ben Cantrick (mackys) wrote,
Ben Cantrick
mackys

How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale


How do you structure your database using a distributed hash table like BigTable? The answer isn't what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don't. You - pause for dramatic effect - duplicate data instead of normalize it.

Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don't know how that decision was made, but it must have gone against every fiber in their relational bones. But Flickr’s reasoning was genius. To scale you need to partition. User data must spread across the shards. So where do comments belong in a scalable architecture?

From one world view comments logically belong to a relation binding comments and users together. But if your unit of scalability is the user shard there is no separate relation space. So you go against all your training and decide to duplicate the comments. Nerd heroism at its best. Let inductive rules derived from observation guide you rather than deductions from arbitrarily chosen first principles. Very Enlightenment era thinking. Voltaire would be proud.

In a relational world duplication is removed in order to prevent update anomalies. Error prevention is the driving force in relational modeling. Normalization is a kind of ethical system for data. What happens, for example, if a comment changes? Both copies of the comment must be updated. That leads to errors because who can remember where all the data is stored? A severe ethical violation may happen. Go directly to relational jail :-)


http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale
Tags: reddit
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 1 comment