Simon Willison on scaling

144 posts tagged “scaling”

2010

When should one switch from MySQL to Oracle or PostgreSQL?

When your own benchmarks prove that your application’s particular load characteristics will perform better on another database—and the difference is large enough that it’s worth the cost involved in retargeting your code. If that cost is high (and it probably will be) it may be worth paying for some expert consultants to ensure that your implementations against the different databases are properly optimised.

[... 102 words]

4:13 pm / 12th October 2010 / mysql, oracle, postgresql, scaling, quora

Django (web framework): Why did theonion.com stop using Drupal?

They wrote about their reasons in detail in a post to the Django sub-reddit a while ago: http://www.reddit.com/r/django/c...

[... 165 words]

5:40 pm / 25th August 2010 / django, drupal, scaling, quora

What is the largest production deployment of CouchDB for online use?

The BBC have a pretty big CouchDB cluster, which they use mostly as a replicated key-value store. It’s used by their new identity platform which includes customisation features for iPlayer.

[... 47 words]

9:23 am / 25th August 2010 / couchdb, databases, nosql, scaling, quora

What is the highest traffic website built on top of Django?

My best guess would be Disqus. Instagram are pretty enormous these days as well.

[... 31 words]

6:52 pm / 24th August 2010 / django, frameworks, python, scaling, quora

Three new features for reddit gold. Reddit’s experiments with a subscriber program are interesting to watch. 9,000 people signed up as subscribers without there being any benefit at all, and they’re now being rewarded with the ability to opt out of ads and access to computationally expensive features (like different ways of sorting their own user page) that wouldn’t scale for the entire user base.

# 20th July 2010, 5:54 pm / ads, reddit, scaling, subscriptions, recovered

What’s powering the Content API? The new Guardian Content API runs on Solr, scaled using EC2 and Solr replication and with a Scala web service layer sitting between Solr and the API’s end users.

# 24th May 2010, 2:08 pm / apis, contentapi, ec2, guardian, openplatform, scala, scaling, solr, recovered

twitter’s gizzard (via) Intriguing new open source project from Twitter. Gizzard is a sharding framework which provides a network service for partitioning data across arbitrary backend datastores, managing its own forwarding table to map key ranges to partitions and adding support for tree-based replication.

# 11th April 2010, 9:39 pm / twitter, scaling, gizzard, sharding

Notes from a production MongoDB deployment. Notes from running MongoDB for 8 months in production, with 664 million documents spread across 72 GB master and slave servers located in two different data centers.

# 28th February 2010, 11:05 pm / mongodb, sysadmin, scaling

Django Advent: Scaling Django. Mike Malone’s advice on scaling Django applications, including taking advantage of new features in 1.2.

# 26th February 2010, 7:22 pm / django, mike-malone, scaling

Search Engine Time Machine. Detailed explanation of how ElasticSearch provides high availability, through clever sharding and replication strategies and configurable gateways for long-term persistent storage.

# 17th February 2010, 10:32 pm / elasticsearch, highavailability, scaling, search

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported.

# 11th February 2010, 6:33 pm / search, scaling, rest, lucene, java, elasticsearch, json, http, sharding, solr

dogproxy. Another of my experiments with Node.js—this is a very simple HTTP proxy which addresses the dog pile effect (also known as the thundering herd) by watching out for multiple requests for a URL that is currently “in flight” and bundling them together.

# 3rd February 2010, 1:05 pm / scaling, node, nodejs, projects, javascript, dogproxy, dogpile, thunderingherd

2009

PostgreSQL 8.5alpha3 now available. “Hot Standby, allowing read-only connections during recovery, provides a built-in master-slave replication solution.” Woohoo!

# 23rd December 2009, 9:57 am / postgresql, scaling, replication, hotstandby, masterslave

Django | Multiple Databases. Russell just checked in the final patch developed from Alex Gaynor’s Summer of Code project to add multiple database support to Django. I’d link to the 21,000 line changeset but it crashed our Trac, so here’s the documentation instead.

# 22nd December 2009, 5:22 pm / django, multidb, russell-keith-magee, alex-gaynor, python, databases, scaling

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads.

# 28th October 2009, 9:02 am / scaling, postgresql, replication, hotstandby, databases

How We Made GitHub Fast. Detailed overview of the new GitHub architecture. It’s a lot more complicated than I would have expected—lots of moving parts are involved in ensuring they can scale horizontally when they need to. Interesting components include nginx, Unicorn, Rails, DRBD, HAProxy, Redis, Erlang, memcached, SSH, git and a bunch of interesting new open source projects produced by the GitHub team such as BERT/Ernie and ProxyMachine.

# 21st October 2009, 9:14 pm / github, scaling, nginx, unicorn, rails, drbd, haproxy, replication, redis, erlang, memcached, ssh, git, proxymachine, ruby, ernie

MichaelMoore.com in Django. A seriously impressive case study—a complete rebuild from the ground up completed in just five weeks using Django, Solr and Haystack for a high traffic site with a top 10,000 US Alexa ranking.

# 9th October 2009, 12:38 am / python, michael-moore, django, scaling

When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it's very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.

— Matt Brubeck

# 4th October 2009, 9:50 am / sql, nosql, replication, reliability, scaling, amazon, matt-brubeck

Ravelry. Tim Bray interviews Casey Forbes, the single engineer behind Ravelry, the knitting community that serves 10 million Rails requests a day using just seven physical servers, MySQL, Sphinx, memcached, nginx, haproxy, passenger and Tokyo Cabinet.

# 3rd September 2009, 6:50 pm / scaling, ravelry, tim-bray, caseyforbes, tokyocabinet, tokyotyrant, rails, mysql, sphinx-search, memcached, nginx, haproxy, passenger

Memcached 1.4.0 released. The big new feature is the (optional) binary protocol, which enables other features such as CAS-everywhere and efficient client-side replication. Maintainer Dustin Sallings has also released some useful sounding EC2 instances which automatically assign nearly all of their RAM to memcached on launch and shouldn’t need any further configuration.

# 17th July 2009, 10:26 pm / memcached, dustin-sallings, binary, cas, ec2, ami, caching, performance, scaling

Keyspace. Yet Another Key-Value Store—this one focuses on high availability, with one server in the cluster serving as master (and handling all writes), and the paxos algorithm handling replication and ensuring a new master can be elected should the existing master become unavailable. Clients can chose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL.

# 16th July 2009, 10:30 am / keyvaluepairs, keyspace, databases, agpl, berkeleydb, scaling, replication, paxos

Up and running with Cassandra. Twitter are beginning to use Cassandra, the open source branch of Facebook’s BigTable-like non-relational database. Evan Weaver explains how to get started with it, but warns that it’s not yet a good idea to trust data to it without having a full backup in an unrelated storage engine.

# 7th July 2009, 11:18 am / evanweaver, facebook, twitter, bigtable, cassandra, nonrelationaldatabases, scaling

uuidd.py. Neat implementation of an ID server from Mike Malone—it serves up incrementing integers over a socket (using Python’s asyncore for fast IO) and records state to a file only after every 10,000 IDs served, so most of the time it’s not reading or writing to disk at all. If the server crashes it doesn’t matter because it can start up again at an integer it’s sure hasn’t been used before.

# 25th May 2009, 9:34 pm / uuid, idserver, python, mike-malone, asyncore, scaling

TwitterAlikeExample—redis. Excellent example of how you design a moderately complex system against a scalable key-value store (in this case redis). Most “how to build Twitter” code examples fail to address the hard problem of scaling user inboxes, but this one tackles it head on.

# 21st May 2009, 11:14 pm / redis, keyvaluepairs, twitter, scaling

New Features for EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch. EC2 now fulfils the promise of “magic scaling in the cloud” out of the box—CloudWatch monitors performance of your EC2 instances without needing to install any monitoring software, Auto Scaling allows you to configure “scaling triggers” which start up new instances based on information from CloudWatch, and Elastic Load Balancing balances requests across all available instances.

# 18th May 2009, 10:07 am / cloudwatch, amazon, ec2, elasticloadbalancing, autoscaling, scaling, cloud-computing

peeping into memcached. “Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state”—you can then load the resulting data in to MySQL using LOAD LOCAL INFILE and analyse it using standard SQL queries.

# 20th April 2009, 6:35 pm / performance, scaling, memcached, introspection, peep, mysql, sql, evanweaver, twitter

Experiences deploying a large-scale infrastructure in Amazon EC2. “At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.”

# 10th April 2009, 9:43 am / openx, amazonec2, ec2, amazon, scaling, griggheorghiu

Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.”

# 3rd April 2009, 8:42 pm / joe-stump, databases, digg, iddb, memcachedb, mysql, scaling, sharding

Streams, affordances, Facebook, and rounding errors. I asked Kellan about scaling activity streams the other day. Here he suggests the best technique is not to promise a perfect stream (like Twitter does)—Facebook used to get away with 80% loss of update messages, but their new redesign has changed the contract with their users.

# 19th March 2009, 2:02 pm / facebook, activitystreams, scaling, twitter, kellan-elliott-mccrea

redis (via) An in-memory scalable key/value store but with an important difference: this one lets you perform list and set operations against keys, opening up a whole new set of possibilities for application development. It’s very young but already supports persistence to disk and master-slave replication.

# 15th March 2009, 1:32 pm / redis, keyvaluepairs, scaling, replication, masterslave, salvatore-sanfilippo

«« first « previous page 3 / 5 next » last »»

Simon Willison’s Weblog