Simon Willison on scaling

143 posts tagged “scaling”

2009

Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers.

# 2nd March 2009, 10:22 am / mysql, sharding, scaling, netlog, php, databases
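
The general idea of splitting data into many small logical shards and mapping those onto a smaller number of physical servers can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not Netlog's actual code. The shard count, the server names (`db00` to `db79`), the CRC32 hash and the `SHARD_TO_SERVER` lookup table are all hypothetical.

```python
import zlib

# Hypothetical numbers: many small logical shards spread across fewer servers.
NUM_SHARDS = 4096
SERVERS = [f"db{n:02d}" for n in range(80)]

# Directory mapping each logical shard to a physical server. Rebalancing means
# copying a shard's rows elsewhere and updating its entry here; the
# user -> shard mapping itself never changes.
SHARD_TO_SERVER = {shard: SERVERS[shard % len(SERVERS)] for shard in range(NUM_SHARDS)}


def shard_for_user(user_id: int) -> int:
    """Map a user to a logical shard using a stable, process-independent hash."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


def server_for_user(user_id: int) -> str:
    """Look up which physical database server currently holds the user's shard."""
    return SHARD_TO_SERVER[shard_for_user(user_id)]


if __name__ == "__main__":
    user = 123456
    print(user, "-> shard", shard_for_user(user), "on", server_for_user(user))
```

Using CRC32 rather than Python's built-in `hash()` matters here: string hashes are randomised per process, and the mapping has to come out identical on every web server.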

How FriendFeed uses MySQL to store schema-less data. The pain of altering or adding indexes on tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two-column tables. These can be created and dropped much more easily, and are continually populated by an offline index building process.

# 27th February 2009, 2:33 pm / mysql, friendfeed, databases, bret-taylor, scaling, sharding, python
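
Here's a rough sketch of the pattern: entities stored as opaque pickled blobs keyed by ID, plus a separate two-column index table that a background pass fills in. It uses SQLite from the standard library as a stand-in for MySQL so it runs anywhere. The table names, the `user_id` property and the `build_user_index()` function are my own hypothetical choices, not FriendFeed's schema. Only unpickle data you wrote yourself, since `pickle.loads` can execute arbitrary code.

```python
import pickle
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Schema-less storage: the whole object is one pickled blob.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")
# Denormalised two-column index for one property; cheap to create or drop.
conn.execute(
    "CREATE TABLE index_user_id ("
    " user_id TEXT NOT NULL, entity_id TEXT NOT NULL,"
    " PRIMARY KEY (user_id, entity_id))"
)


def save_entity(entity: dict) -> str:
    """Store an entity without touching any index tables."""
    entity_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO entities (id, body) VALUES (?, ?)",
        (entity_id, pickle.dumps(entity)),
    )
    conn.commit()
    return entity_id


def build_user_index() -> None:
    """Offline-style pass that (re)populates the user_id index from the blobs."""
    rows = conn.execute("SELECT id, body FROM entities").fetchall()
    for entity_id, body in rows:
        entity = pickle.loads(body)
        if "user_id" in entity:
            conn.execute(
                "INSERT OR IGNORE INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity_id),
            )
    conn.commit()


def entities_for_user(user_id: str) -> list:
    """Query through the index table, then unpickle the matching blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i"
        " JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.user_id = ?",
        (user_id,),
    ).fetchall()
    return [pickle.loads(body) for (body,) in rows]
```

Until the index pass runs, a freshly saved entity simply doesn't show up in indexed queries. That lag is the trade-off that makes adding a brand new index cheap.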

Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way. Lessons learned from Posterous. Some good advice in here, in particular “Memcache later: If you memcache first, you will never feel the pain and never learn how bad your database indexes and Rails queries are”. Also recommends using job queues for offline processing of anything that takes more than 200ms.

# 23rd February 2009, 8:28 am / message-queues, rails, scaling, posterous, memcache

Sharding Counters on Google App Engine. “While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second”. This article explains a technique for sharding writes across multiple counters in detail, including a way to keep a memcache counter updated at the same time for faster reads.

# 27th January 2009, 8:27 pm / memcache, caching, sharding, googleappengine, google, appengine, python, scaling
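
The core of the technique fits in a small class. This sketch simulates it in plain Python: a dictionary stands in for the datastore shard entities and an attribute stands in for the memcache value, so none of the names here are App Engine APIs. Writes go to a randomly chosen shard, which spreads the per-entity update rate; reads sum the shards once and then serve the cached total.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Counter whose writes are spread across several independent shards."""

    def __init__(self, num_shards: int = 20) -> None:
        self.num_shards = num_shards
        self.shards = defaultdict(int)  # stand-in for one datastore entity per shard
        self.cached_total = None        # stand-in for the memcache copy

    def increment(self, amount: int = 1) -> None:
        # Pick a shard at random so no single entity absorbs every write.
        shard = random.randrange(self.num_shards)
        self.shards[shard] += amount
        # Keep the cached value current, but only if it is already populated
        # (mirroring a cache increment that does nothing when the key is missing).
        if self.cached_total is not None:
            self.cached_total += amount

    def value(self) -> int:
        # Slow path: sum every shard, then cache the result for later reads.
        if self.cached_total is None:
            self.cached_total = sum(self.shards.values())
        return self.cached_total
```

With the quoted limit of about five updates a second per entity, 20 shards would support something like 100 increments a second in total.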

Project Voldemort. Yet Another “big, distributed, persistent, fault-tolerant hash table”—this time from LinkedIn, released under the Apache 2.0 license. The approach to consistency is interesting—instead of using distributed transactions, they use versioning and “resolve inconsistencies at read time”. It also uses consistent hashing (as seen in libketama) to select servers. The design document has lots more information.

# 17th January 2009, 7:45 pm / consistency, hashtable, keyvaluepairs, linkedin, open-source, scaling, versioning, voldemort, libketama
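
One common way to implement versioning that can “resolve inconsistencies at read time” is with vector clocks. Here's a minimal sketch of how a read might compare the versions returned by different replicas, discarding any that are strictly older and keeping concurrent ones for the application to reconcile. The function names and the plain-dict representation are my own; this is not Voldemort's API.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]  # node name -> number of writes coordinated by that node


def increment(clock: Clock, node: str) -> Clock:
    """Return a new clock recording one more write handled by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if clock `a` has seen every write that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def resolve(versions: List[Tuple[Clock, str]]) -> List[Tuple[Clock, str]]:
    """Drop versions that some other version strictly supersedes.

    Anything left over is a set of concurrent siblings that the caller must
    reconcile (for example by merging them and writing the result back).
    """
    survivors = []
    for clock, value in versions:
        superseded = any(
            descends(other, clock) and not descends(clock, other)
            for other, _ in versions
        )
        if not superseded:
            survivors.append((clock, value))
    return survivors
```

If one replica returns `{"A": 2}` and another `{"A": 1, "B": 1}`, neither clock descends from the other, so both values come back and the application decides how to merge them.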

New Gearman Server & Library in C, MySQL UDFs. Gearman, the job queue written for LiveJournal and now used by Digg and Yahoo!, has been rewritten in C. Looks like a good candidate for an easily configured lightweight message queue. Also includes hooks for writing MySQL functions that can interact with queues.

# 13th January 2009, 4:41 pm / mysql, message-queues, digg, gearman, livejournal, queues, scaling, yahoo, eric-day

MemcacheDB. A server that speaks the memcache protocol but uses Berkeley DB for reliable persistent storage. Speedy: 20,000 writes/second and 60,000+ reads/second. Includes a full replication mechanism (with custom memcache protocol commands) built on Berkeley DB’s own replication.

# 5th January 2009, 12:37 pm / replication, memcachedb, berkeleydb, memcache, scaling, keyvaluepairs

2008

Scaling memcached at Facebook. Fascinating techie details on how Facebook forked memcache to use UDP and increase performance from 50,000 requests a second to 200,000. Now running on 800 servers with 28 TB of memory, and their code is on GitHub. (They may scale like crazy, but they can’t put their blog entry title in the title element?)

# 13th December 2008, 10:08 am / scaling, memcached, facebook, udp

Spock Proxy. A MySQL Proxy fork (no Lua) that concentrates solely on sharding, by parsing incoming SQL statements and redirecting them across multiple databases. There are some limitations on the SQL that can be handled (no nested queries, joins across a maximum of two tables) but generally it looks pretty impressive.

# 11th December 2008, 9:49 am / spockproxy, mysql, sharding, databases, scaling

Facebook engineering notes on Scaling Out. Jason Sobel explains a couple of tricks Facebook use to deal with consistency between their California and Virginia data centres. The first is to hijack the MySQL replication stream to include information about memcached records to invalidate; the second is to use Layer 7 load balancers which inspect a “last modification time” cookie and send users to the masters in California if they have updated their profile in the past 20 seconds.

# 20th August 2008, 11:51 pm / facebook, scaling, memcached, mysql, replication, jason-sobel
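
The load balancer rule is simple enough to express as a function. In this sketch the cookie name `last_modified`, the Unix timestamp format and the returned labels are all hypothetical; only the 20 second window comes from the post. In Facebook's setup this decision lives in a Layer 7 load balancer rather than in application code.

```python
import time
from typing import Mapping, Optional

# Window from the post: users who wrote recently are pinned to the masters.
STICKY_WINDOW_SECONDS = 20


def choose_backend(cookies: Mapping[str, str], now: Optional[float] = None) -> str:
    """Route a request to the master region if the user wrote very recently.

    `cookies["last_modified"]` is a hypothetical cookie holding the Unix time
    of the user's last write; it would be set whenever they change their data.
    """
    now = time.time() if now is None else now
    raw = cookies.get("last_modified")
    if raw is not None:
        try:
            if now - float(raw) < STICKY_WINDOW_SECONDS:
                # Replicas may not have the write yet, so read from the master.
                return "california-master"
        except ValueError:
            pass  # Malformed cookie: fall through to the local replicas.
    return "local-replica"
```

The window only needs to be longer than typical replication lag; after that, the replicated data (and the memcached invalidations carried along with it) should have caught up.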

Dark Launches, Gradual Ramps and Isolation: Testing the Scalability of New Features on your Web Site. Smart advice from Dare Obasanjo that extends the “dark launch” idea illustrated by Facebook chat a few weeks ago.

# 29th June 2008, 2:22 pm / dare-obasanjo, scaling, darklaunches, gradualramps, isolation, systemarchitecture, facebook
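
A gradual ramp is often implemented by hashing each user into one of 100 buckets and enabling the feature for buckets below the current percentage. This is a generic sketch, not the mechanism Dare describes, and `rollout_bucket` and `in_rollout` are hypothetical names. Because the bucket depends only on the feature name and user ID, raising the percentage keeps everyone who already had the feature and adds new users on top.

```python
import hashlib


def rollout_bucket(feature: str, user_id: str) -> int:
    """Deterministically assign a user to a bucket in the range 0-99 for a feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(feature: str, user_id: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` buckets."""
    return rollout_bucket(feature, user_id) < percent
```

The same check can drive a dark launch: run the new back-end code path for users in the ramp but skip rendering any UI, so the servers see real load before anyone sees the feature.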

Dissecting today’s Internet traffic spikes (via) Theo Schlossnagle on how the increasing popularity of interest aggregation services such as Digg and Reddit results in traffic spikes that dwarf the old Slashdot effect, making the old rules of thumb for capacity planning irrelevant.

# 29th June 2008, 2:12 pm / slashdotting, reddit, digg, theo-schlossnagle, capacity-planning, scaling

Scoble writes something - 6,800 writes are kicked off, 1 for each follower. Michael Arrington replies - another 6,600 writes. Jason Calacanis jumps in - another 6,500 writes. Beyond the 19,900 writes, there's a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 followers are. [...] And here's the kicker: that giant processing and delivery effort - possibly a combined 100K disk IOs - was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

— Israel L'Heureux

# 23rd May 2008, 7:28 pm / scaling, twitter
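
The arithmetic in that quote falls out of a naive fan-out-on-write design, the model the quote assumes, where posting a message copies it into every follower's timeline. This toy in-memory version uses hypothetical names and invented follower IDs matching the quoted counts, and shows how three messages become 19,900 writes.

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of follower IDs
timelines = defaultdict(list)  # follower ID -> messages delivered to them


def post(author: str, message: str) -> int:
    """Deliver a message to every follower's timeline; return the write count."""
    recipients = followers[author]  # the "figure out who the followers are" lookup
    for follower in recipients:
        timelines[follower].append((author, message))  # one write per follower
    return len(recipients)


# Follower counts taken from the quote; the follower IDs themselves are invented.
for author, count in [("scoble", 6800), ("arrington", 6600), ("calacanis", 6500)]:
    followers[author] = {f"{author}-fan-{i}" for i in range(count)}

total_writes = sum(post(a, "hello") for a in ["scoble", "arrington", "calacanis"])
print(total_writes)  # 19900
```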

Engineering @ Facebook: Facebook Chat. The new Facebook Chat uses Comet (long polling with a hidden iframe) against a custom web / chat server written in Erlang, designed to handle a launch to all 70 million users at once. It was tested using a “dark launch” period where live pages simulated chat request traffic without showing any visible UI.

# 15th May 2008, 7:55 am / facebook, comet, javascript, erlang, darklaunch, scaling

Internet Asshattery, Armchair Scaling Experts Edition (via) Leonard says what needs to be said about the most recent case of Twitter scaling flame-bait.

# 25th April 2008, 11:19 pm / twitter, scaling, leonard-lin

Google App Engine. Write applications in Python using a WSGI-compatible application framework, then host them on Google’s highly scalable infrastructure. The most exciting part is probably the Datastore API, which provides external developers with access to Bigtable for the first time.

# 8th April 2008, 7:25 am / googleappengine, python, bigtable, google, scaling, virtualisation, wsgi

Consistent Hashing. Beautifully clear explanation of consistent hashing, a simple technique that allows you to add new caching servers to a cluster without re-hashing your keys and hence invalidating all of your caches.

# 18th March 2008, 1 am / caching, scaling, consistenthashing, hashing
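
Here's a minimal consistent hashing ring in Python, in the spirit of that explanation: each server is placed at many points on a ring of hash values, and a key belongs to the first server point at or after the key's own position, wrapping around at the end. The number of virtual points per server (`replicas=100`), the use of MD5 and the server names are my own choices.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: Iterable[str] = (), replicas: int = 100) -> None:
        self.replicas = replicas          # virtual points per server, smooths the load
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = ring_position(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def get(self, key: str) -> str:
        """Return the server owning the first point at or after the key's position."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[index % len(self._points)]]
```

Adding a server only moves the keys that now land on its new points, and every one of them moves to the new server. Compare `hash(key) % n`: going from three servers to four remaps about three quarters of all keys.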

The GigaOM Interview: Mark Zuckerberg. Some interesting titbits on Facebook’s architecture.

# 11th March 2008, 5:41 am / facebook, architecture, mark-zuckerberg, scaling

Two data streams for a happy website. Useful architectural concept for scaling: keep user-specific and generic data separate from the start, in recognition of their different caching and partitioning constraints.

# 4th March 2008, 4:40 am / caching, scaling

2007

Eventually Consistent. Werner Vogels explains the trade-offs involved in building scalable, highly-available data stores such as Amazon’s SimpleDB.

# 20th December 2007, 5:59 pm / eventuallyconsistent, simpledb, amazon, scaling, wernervogels

Techniques for safely consuming external HTTP on demand? I asked this question on programming.reddit.com yesterday and got some really insightful answers, including Joe Stump from Digg describing how Digg Images uses Danga’s Gearman worker queue.

# 15th December 2007, 12:29 pm / http, queue, workers, gearman, reddit, askreddit, joe-stump, digg, danga, scaling

NginxMemcachedModule. nginx can be set up to directly serve a URL from memcache if the corresponding cache key is set, and fall back to a backend application server otherwise. Application servers can then write directly to memcache when content needs to be cached or goes stale.

# 15th December 2007, 1:59 am / nginx, memcached, memcache, scaling, caching

What You Need To Know About Amazon SimpleDB. Amazon have finally launched the database component of their web service suite. It fits a bunch of current trends: key/value pairs, schemaless, built on top of Erlang. “Eventual consistency” is an interesting characteristic.

# 14th December 2007, 11:21 am / amazon, simpledb, web-services, schemaless, erlang, hashtables, scaling, databases, charles-ying

Client Side Load Balancing for Web 2.0 Applications (via) I recall that early versions of Netscape picked a random server from a hard-coded list each time a user clicked the “What’s New” button, back before server-side scaling techniques were well understood.

# 5th October 2007, 11:29 pm / scaling, load-balancing, sitepoint, digitalweb, leizhu, netscape
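
The Netscape-style approach boils down to choosing randomly from a hard-coded list. This sketch adds simple failover (try the remaining servers in random order if one is down), which is my own addition rather than anything attributed to Netscape; the server names and the `fetch` callable are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical hard-coded server list shipped with the client.
SERVERS = ["www1.example.com", "www2.example.com", "www3.example.com"]


def fetch_from_any(
    servers: Sequence[str],
    fetch: Callable[[str], T],
    rng: Optional[random.Random] = None,
) -> T:
    """Try servers in a random order, returning the first successful response."""
    if rng is None:
        rng = random.Random()
    candidates = list(servers)
    rng.shuffle(candidates)  # a random order spreads load across the servers
    last_error = None
    for server in candidates:
        try:
            return fetch(server)
        except OSError as exc:  # connection failures surface as OSError subclasses
            last_error = exc
    raise RuntimeError("all servers failed") from last_error
```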

Scale rails from one box to three, four and five. Excellent, concise run-down of what it takes to scale a web application. Most of the advice is easily portable to other frameworks.

# 30th July 2007, 1:40 pm / rails, scaling, courtenay

High Scalability (via) New blog about building scalable, reliable sites.

# 26th July 2007, 8:15 pm / cal-henderson, scaling, highscalability

YouTube Scalability Talk. Kyle Cordes’ notes on a Google Tech Talk on scaling YouTube by Cuong Do.

# 14th July 2007, 10:26 pm / youtube, scaling, kylecordes, google, googletechtalk, cuongdo

SlideShare: Webapps scalability. Lots of great presentations on scaling, from Twitter, Digg, Vox, LiveJournal, Last.fm and more.

# 4th July 2007, 12:53 am / slideshare, vox, twitter, digg, livejournal, sixapart, lastfm, scaling

SELECT * FROM everything, or why databases are awesome. I’m beginning to think that for scalable applications the thinner your ORM is the better—if you even use one at all.

# 22nd June 2007, 12:40 am / rails, orm, blaine-cook, scaling, twitter, databases, sql

iLike: Holy cow... 6mm users and growing 300k/day! (via) The Facebook platform offers a viral distribution mechanism for free. Downside: you have to double your capacity every few days.

# 13th June 2007, 9:02 am / marcandreessen, facebook, f8, ilike, scaling


Simon Willison’s Weblog
