
Simon Willison’s Weblog

None of these scaling approaches are as fun and easy as developing for Rails. All the convenience methods and syntactical sugar that makes Rails such a pleasure for coders ends up being absolutely punishing, performance-wise.

Alex Payne, Twitter

15 comments

  1. One of the things I got from reading Cal's book is that the approaches that needed to be taken for scaling large didn't seem to map very well to any of the modern pack of frameworks - stuff like denormalisation and federating data across multiple databases. Unfortunately those problems are still rare enough that there isn't a community effort to solve them at the framework level yet.

    Simon Willison - 12th April 2007 14:53 - #

  2. Simon,
    I recently read Cal's book and thought the same thing.

    I know Google basically rejects using relational databases at web scale entirely, since you have to do all kinds of architectural hacks to work around the database-as-bottleneck.

    I just don't know how they do it. DBs exist because they solve lots of hard problems and put a nice abstraction in front of it... so how do I scale something without a DB?

    Anyway, I've got the scaling bug now. :)

    Jeremy Dunck - 12th April 2007 15:07 - #

  3. Jeremy: I heard J2EE scales :)

    Ryan Tomayko - 12th April 2007 15:09 - #

  4. Ryan,
    I hear frameworks are supposed to make things easier than assembly. :)

    I'm still using and contributing to Django.

    Jeremy Dunck - 12th April 2007 15:27 - #

  5. Jeremy, I suppose Google scales up with BigTable:

    http://labs.google.com/papers/bigtable.html

    Other interesting readings regarding this topic:

    http://bitworking.org/news/158/ETech-07-Summary-Part-2-MegaData
    http://bitworking.org/news/159/Megadata-Follow-up

    michele - 12th April 2007 16:20 - #

  6. I think the trick isn't so much abandoning relational databases as it is using them in a totally non-relational way. One of the first things I hear scalability people say is "avoid joins, denormalise whenever you have to". Similarly, partitioning is critical.

    The way WordPress.com (running on WordPress MU) scales is really educational. Every WordPress.com blog gets its own set of tables, with the only shared databases being the ones used for user accounts. This sounded completely nuts to me at first, but it actually makes a great deal of sense from a scalability point of view. It also means that MU isn't very different at all from regular WordPress, so improvements can be trivially shared between the two.

    Simon Willison - 12th April 2007 16:30 - #

  7. That's useful when your app is easily partitioned, but what does social software do?

    I mean, LJ apparently partitions on users, too, but surely there's a lot of overlap -- seeing a user's most recent comments (on other users' blogs), for example.

    Also, LJ's user clusters are balanced by an app-specific account mover so that active users don't overwhelm a fully-allocated cluster. I don't see how a framework can help there.

    I guess that's not so different from Rails' migrations (and Django's lack thereof). Honestly, migrations scare me. I wrote my own db_upgrade.py. It's simple, but it works for me. :)

    Jeremy Dunck - 12th April 2007 17:17 - #

  8. Completely agree - that's why Twitter is so fascinating, because basically all it does is deal with friends and messages spread across friend groups. If you solve scaling Twitter, you've solved the key to a huge class of social applications.

    It seems like it's something that would work better with a completely custom data layer - some kind of publish/subscribe message distributing kadoobry. Maybe you should be building something like Twitter on top of an architecture designed for high scale instant messaging - http://ejabberd.jabber.ru/ for example.

    Simon Willison - 12th April 2007 18:02 - #

  9. A birdy told me that twitter IM is using ejabberd, and still hit a scaling issue, since it wasn't designed to have bajillions of messages all delivered to the same user (twitter@twitter.com).

    I think people designing DBs figured this out a long time ago, and I/we ought to go learn postgres's code. ;-)

    Jeremy Dunck - 12th April 2007 19:51 - #

  10. But Flickr wasn't built with those techniques from day 1, either. First you build something people love, **then** you scale it. And Rails makes the first part much much easier.

    Also Rails doesn't **preclude** caching/federation/denormalisation, it just doesn't include it for free. (because scaling is a site specific issue)

    kellan - 12th April 2007 20:26 - #

  11. Kellan,
    Yeah, I'm not taking this as an argument against using a lightweight framework. Scaling my OMG-so-popular blog is surely an example of premature optimization, but it doesn't hurt to look at different cases where scaling was needed and try to find patterns that *can* be solved in frameworks.

    Jeremy Dunck - 12th April 2007 21:11 - #

  12. Scaling is hard; anyone who tells you it's not has never had to scale an application. You can only account for a limited set of problems until your site gets large enough that you start to see new trends.

    I think Django addresses caching very well by giving you various caching hooks (memory, disk-based, and memcache). No matter what language or framework you use, handling 11k of reads a second is way easier to scale than 11k of writes per second (or even 1k to 2k of writes per second).

    Sooner or later you have to look at your environment and code according to a model that fits your application. I think that using a container-based hosting provider may work very well for a few servers but not so much for these larger-than-normal applications. I personally don't care if it's a Ruby or a Python problem because I think the same problem exists with both, just at different volumes.

    Jeff Triplett - 13th April 2007 05:59 - #

  13. Simon, Twitter does use ejabberd as a delivery engine. And messages from within the last twenty-four hours aren't pulled from a database. Blaine even wrote a simplified version of map-reduce to make things scale.

    http://romeda.org/blog/2007/04/mapreduce-in-36-lines-of-ruby.html

    Scaling is hard, it requires custom shit, rails just like anything else is going to require you to break things open and recode stuff all the time, whether it's cold fusion, python, php, or ruby.

    rabble - 13th April 2007 08:17 - #

  14. One more contribution to this mini-thread:
    http://mooseyard.com/Jens/2007/04/twitter-rails-hammers-and-11000-nails-per-second/

    That discussion doesn't cover a lot of new ground, but it's a roundup of opinions for people interested in the idea of web site w/o DB.

    Jeremy Dunck - 19th April 2007 14:43 - #

  15. Michele,
    Thanks for the megadata links. I was already aware of BigTable. :)

    Jeremy Dunck - 19th April 2007 14:43 - #
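
The partition-on-users idea from comments 6 and 7 (each user's data lives in one cluster, with an app-specific mover to rebalance) can be sketched roughly as below. This is only an illustration, not how LiveJournal or WordPress.com actually implement it: the class name UserDirectory, its methods and the cluster names are all made up for the example, and plain in-memory dicts stand in for the real databases.

```python
from collections import defaultdict


class UserDirectory:
    """Hypothetical sketch: maps each user to the cluster holding that user's rows."""

    def __init__(self, cluster_names):
        # Each "cluster" is just a dict of user -> list of posts (a stand-in for a database).
        self.clusters = {name: defaultdict(list) for name in cluster_names}
        # The one small shared table: user -> cluster name.
        self.location = {}

    def cluster_for(self, user):
        # Unknown users are assigned to the cluster with the fewest users so far
        # (ties broken by cluster name).
        if user not in self.location:
            counts = {name: 0 for name in self.clusters}
            for name in self.location.values():
                counts[name] += 1
            self.location[user] = min(counts, key=lambda n: (counts[n], n))
        return self.location[user]

    def add_post(self, user, text):
        # Writes touch only the user's own cluster, so no cross-cluster joins.
        self.clusters[self.cluster_for(user)][user].append(text)

    def posts(self, user):
        return list(self.clusters[self.cluster_for(user)][user])

    def move(self, user, target):
        # The "account mover": copy the user's rows, then repoint the directory.
        source = self.cluster_for(user)
        if source == target:
            return
        self.clusters[target][user] = self.clusters[source].pop(user, [])
        self.location[user] = target


directory = UserDirectory(["cluster_a", "cluster_b"])
directory.add_post("simon", "avoid joins")
directory.add_post("jeremy", "migrations scare me")
directory.move("simon", "cluster_b")
```

The cross-user reads raised in comment 7 (a user's recent comments on other users' blogs) are exactly what this layout does not solve: they would need either a fan-out across clusters or a denormalised copy kept with the reading user.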
