Simon Willison’s Weblog

Posts tagged scaling in May

GitHub Issues search now supports nested queries and boolean operators: Here’s how we (re)built it. GitHub Issues got a significant search upgrade back in January. Deborah Digges provides some behind-the-scenes details about how it works and how they rolled it out.

The signature new feature is complex boolean logic: you can now search for things like is:issue state:open author:rileybroughten (type:Bug OR type:Epic), up to five levels of nesting deep.

Queries are parsed into an AST using the Ruby parslet PEG grammar library. The AST is then compiled into a nested Elasticsearch bool JSON query.
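To make that parse-then-compile pipeline concrete, here's a minimal Python sketch. It is not GitHub's Ruby code: the tiny grammar (field:value terms, AND/OR, parentheses, implicit AND between adjacent terms), the tuple-based AST and the field names are all my own assumptions, and it does no error handling for unbalanced parentheses.

```python
import re

# Hypothetical mini-grammar: field:value terms, AND/OR, and parentheses.
# Adjacent terms with no operator between them are treated as an implicit AND.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(query):
    """Parse a query string into a nested-tuple AST."""
    tokens = TOKEN.findall(query)
    pos = 0

    def parse_or():
        nonlocal pos
        nodes = [parse_and()]
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            nodes.append(parse_and())
        return nodes[0] if len(nodes) == 1 else ("or", nodes)

    def parse_and():
        nonlocal pos
        nodes = [parse_atom()]
        while pos < len(tokens) and tokens[pos] not in ("OR", ")"):
            if tokens[pos] == "AND":
                pos += 1
            nodes.append(parse_atom())
        return nodes[0] if len(nodes) == 1 else ("and", nodes)

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = parse_or()
            pos += 1  # skip the closing ")"
            return node
        field, _, value = token.partition(":")
        return ("term", field, value)

    return parse_or()


def compile_ast(node):
    """Compile the AST into an Elasticsearch-style bool query dict."""
    if node[0] == "term":
        return {"term": {node[1]: node[2]}}
    clauses = [compile_ast(child) for child in node[1]]
    if node[0] == "and":
        return {"bool": {"must": clauses}}
    return {"bool": {"should": clauses, "minimum_should_match": 1}}
```

Calling compile_ast(parse("is:issue state:open (type:Bug OR type:Epic)")) produces a bool query with three must clauses, the last of which is a nested bool with two should clauses.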

GitHub Issues search deals with around 2,000 queries a second so robust testing is extremely important! The team rolled it out invisibly to 1% of live traffic, running the new implementation via a queue and comparing the number of results returned against the old production code to spot any degradations.
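Here's a rough Python sketch of that kind of invisible rollout, assuming an in-process queue and hypothetical old_search / new_search callables that return lists of results. The real system is Ruby and will differ in every detail; this only illustrates the shape of the technique.

```python
import queue
import random

shadow_queue = queue.Queue()


def handle_search(query, old_search, sample_rate=0.01, rng=random.random):
    """Serve the user from the old code path; sample some queries for shadowing."""
    results = old_search(query)
    if rng() < sample_rate:
        # Only the query and the old result count are queued, so the
        # user-facing request never waits on the new implementation.
        shadow_queue.put((query, len(results)))
    return results


def drain_shadow_queue(new_search):
    """Worker step: run queued queries against the new code, collect mismatches."""
    mismatches = []
    while not shadow_queue.empty():
        query, old_count = shadow_queue.get()
        new_count = len(new_search(query))
        if new_count != old_count:
            mismatches.append((query, old_count, new_count))
    return mismatches
```

Users only ever see results from the old implementation, so a bug in the new code shows up as an entry in the mismatch list rather than as a broken search page.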

# 26th May 2025, 7:23 am / elasticsearch, github, ops, parsing, ruby, scaling, search, github-issues

Building, launching, and scaling ChatGPT Images (via) Gergely Orosz landed a fantastic deep dive interview with OpenAI's Sulman Choudhry (head of engineering, ChatGPT) and Srinivas Narayanan (VP of engineering, OpenAI) to talk about the launch back in March of ChatGPT images - their new image generation mode built on top of multi-modal GPT-4o.

The feature kept on having new viral spikes, including one that added one million new users in a single hour. They signed up 100 million new users in the first week after the feature's launch.

When this vertical growth spike started, most of our engineering teams didn't believe it. They assumed there must be something wrong with the metrics.

Under the hood the infrastructure is mostly Python and FastAPI! I hope they're sponsoring those projects (and Starlette, which is used by FastAPI under the hood.)

They're also using some C, and Temporal as a workflow engine. They addressed the early scaling challenge by adding an asynchronous queue to defer the load for their free users (resulting in longer generation times) at peak demand.
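The interview summary doesn't say how that queue is built, so treat the following as a guess at the general pattern rather than a description of OpenAI's system. In this Python sketch the class name, the tier labels and the capacity number are all hypothetical: paid requests always start immediately, while free requests wait in a queue whenever capacity is used up.

```python
from collections import deque


class ImageScheduler:
    """Hypothetical scheduler: paid requests run now, free ones may wait."""

    def __init__(self, capacity):
        self.capacity = capacity  # concurrent generations we can afford
        self.running = 0
        self.deferred = deque()  # free-tier requests waiting for capacity

    def submit(self, request_id, tier):
        """Return "started" or "deferred" for an incoming request."""
        if tier == "free" and self.running >= self.capacity:
            self.deferred.append(request_id)
            return "deferred"
        self.running += 1
        return "started"

    def finish(self):
        """Mark one generation done; start a deferred request if any wait.

        Returns the id of the deferred request that was started, or None.
        """
        self.running -= 1
        if self.deferred and self.running < self.capacity:
            self.running += 1
            return self.deferred.popleft()
        return None
```

The trade-off matches the one described above: at peak demand free users get longer generation times instead of errors.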

There are plenty more details tucked away behind the paywall, including an exclusive I've not been able to find anywhere else: OpenAI's core engineering principles.

  • Ship relentlessly - move quickly and continuously improve, without waiting for perfect conditions
  • Own the outcome - take full responsibility for products, end-to-end
  • Follow through - finish what is started and ensure the work lands fully

I tried getting o4-mini-high to track down a copy of those principles online and was delighted to see it either leak or hallucinate the URL to OpenAI's internal engineering handbook!

Thought for 1m 48s: Here are the two places where OpenAI's “Ship relentlessly — Own the outcome — Follow through” checklist shows up as its core engineering principles: 1. OpenAI Careers => Operating Principles On the Careers page under “Operating Principles,” OpenAI codifies how every team—and especially engineering—works together. While the page itself doesn’t list the bullets verbatim, it's the umbrella under which these three behavioural norms sit (Ship relentlessly, Own the outcome, Follow through). 2. OpenAI Internal Engineering Handbook These three tenets are drawn straight from OpenAI's internal Engineering Handbook (employee-only), at: https://handbook.openai.com/engineering/principles (This handbook page isn't publicly crawlable, but it's the definitive source for how OpenAI's engineers actually organize their day-to-day work.)

Gergely has a whole series of posts like this called Real World Engineering Challenges, including another one on ChatGPT a year ago.

# 13th May 2025, 11:52 pm / python, scaling, ai, openai, generative-ai, chatgpt, llms, gergely-orosz

[...] by default Heroku will spin up multiple dynos in different availability zones. It also has multiple routers in different zones so if one zone should go completely offline, having a second dyno will mean that your app can still serve traffic.

Richard Schneeman

# 16th May 2024, 5:44 am / scaling, heroku

Why does Django still not have support for multiple joins?

I don’t fully understand the question, but if you’re talking about doing a single join across multiple tables the Django ORM handles that just fine. Let’s say you want to get every BlogEntry written by a User who belongs to the Group with the name “admins”:

[... 67 words]

What’s powering the Content API? The new Guardian Content API runs on Solr, scaled using EC2 and Solr replication and with a Scala web service layer sitting between Solr and the API’s end users.

# 24th May 2010, 2:08 pm / apis, contentapi, ec2, guardian, openplatform, scala, scaling, solr, recovered

uuidd.py. Neat implementation of an ID server from Mike Malone—it serves up incrementing integers over a socket (using Python’s asyncore for fast IO) and records state to a file only after every 10,000 IDs served, so most of the time it’s not reading or writing to disk at all. If the server crashes it doesn’t matter because it can start up again at an integer it’s sure hasn’t been used before.
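The crash-safety trick is easy to sketch in Python. This is not Mike Malone's code: it leaves out the socket server entirely, the class and file layout are hypothetical, and it assumes one particular ordering (write the new ceiling to disk before serving any ID from that block), which is one way to guarantee a restart never reuses an ID.

```python
import os


class IdAllocator:
    """Hand out incrementing integers, persisting state once per block."""

    def __init__(self, path, block_size=10_000):
        self.path = path
        self.block_size = block_size
        # On startup, resume from the persisted ceiling: every ID below it
        # may already have been handed out, so none of them is reused.
        self.next_id = self._read_ceiling()
        self.ceiling = self.next_id

    def _read_ceiling(self):
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            return int(f.read().strip() or 0)

    def _reserve_block(self):
        # Write the new ceiling to disk before serving any ID from the block.
        self.ceiling = self.next_id + self.block_size
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            f.write(str(self.ceiling))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def next(self):
        if self.next_id >= self.ceiling:
            self._reserve_block()
        value = self.next_id
        self.next_id += 1
        return value
```

A crash wastes at most one block of unused IDs, which is a cheap price for touching the disk only once per 10,000 requests.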

# 25th May 2009, 9:34 pm / asyncore, idserver, mike-malone, python, scaling, uuid

TwitterAlikeExample—redis. Excellent example of how you design a moderately complex system against a scalable key-value store (in this case redis). Most “how to build Twitter” code examples fail to address the hard problem of scaling user inboxes, but this one tackles it head on.

# 21st May 2009, 11:14 pm / keyvaluepairs, redis, scaling, twitter

New Features for EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch. EC2 now fulfils the promise of “magic scaling in the cloud” out of the box—CloudWatch monitors performance of your EC2 instances without needing to install any monitoring software, Auto Scaling allows you to configure “scaling triggers” which start up new instances based on information from CloudWatch, and Elastic Load Balancing balances requests across all available instances.

# 18th May 2009, 10:07 am / amazon, autoscaling, cloud-computing, cloudwatch, ec2, elasticloadbalancing, scaling

Scoble writes something - 6,800 writes are kicked off, 1 for each follower. Michael Arrington replies - another 6,600 writes. Jason Calacanis jumps in - another 6,500 writes. Beyond the 19,900 writes, there's a lot of additional overhead too. You have to hit a DB to figure out who the 19,900 followers are. [...] And here's the kicker: that giant processing and delivery effort - possibly a combined 100K disk IOs - was caused by 3 users, each just sending one, tiny, 140 char message. How innocent it all seemed.

Isreal L'Heureux

# 23rd May 2008, 7:28 pm / scaling, twitter

Engineering @ Facebook: Facebook Chat. The new Facebook Chat uses Comet (long polling with a hidden iframe) against a custom web / chat server written in Erlang, designed to handle a launch to all 70 million users at once. It was tested using a “dark launch” period where live pages simulated chat request traffic without showing any visible UI.

# 15th May 2008, 7:55 am / comet, darklaunch, erlang, facebook, javascript, scaling

Rapid development serving 500,000 pages/hour (via) Curse Gaming are getting impressive performance out of Django.

# 24th May 2007, 4:11 pm / curse, cursegaming, django, performance, scaling

... Facebook has roughly 200 dedicated memcached servers in its production environment, plus a small number of others for development and so on. A few of those 200 are hot spares. They are all 16GB 4-core AMD64 boxes, just because that's where the price/performance sweet spot is for us right now.

Steve Grimm

# 3rd May 2007, 10:36 pm / facebook, memcached, scaling, steve-grimm

MintCache for Django. Caching scheme for Django that solves the dog-pile effect, where high traffic causes many processes to regenerate stale cached data at the same time.
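Here's a minimal Python sketch of the general idea, using an in-memory dict as a stand-in for memcached. It is not the MintCache snippet itself: the class, the grace parameter and the injectable clock are my own assumptions, and a real version would also need a hard expiry in the cache backend plus some thought about races between processes.

```python
import time


class MintCache:
    """In-memory stand-in for memcached with a soft "refresh" deadline."""

    def __init__(self, grace=30, clock=time.time):
        self.grace = grace  # seconds a stale value is re-served during a refresh
        self.clock = clock
        self.store = {}  # key -> (value, refresh_at)

    def set(self, key, value, timeout):
        self.store[key] = (value, self.clock() + timeout)

    def get(self, key):
        """Return the cached value, or None if the caller should regenerate."""
        entry = self.store.get(key)
        if entry is None:
            return None
        value, refresh_at = entry
        if self.clock() < refresh_at:
            return value
        # Stale: push the deadline forward so other callers keep getting the
        # old value, and tell only this caller to regenerate.
        self.store[key] = (value, self.clock() + self.grace)
        return None
```

The first caller to see a stale entry gets a miss and regenerates the data, while everyone else keeps being served the old value for the grace period instead of piling onto the database.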

# 2nd May 2007, 8:49 am / caching, django, dogpile, mintcache, scaling

The top 10 presentations on scaling websites: twitter, Flickr, Bloglines, Vox and more. I normally avoid linking to “top 10” lists on principle, but this one pulls together some great resources and adds extra context to each one.

# 1st May 2007, 1:51 pm / bloglines, flickr, peter-van-dijck, scaling, twitter, vox

Transcript of Bruce Sterling at Microsoft Corporation (via) Bruce Sterling on scaling up his annual SxSW party. I can’t believe I missed it this year.

# 22nd May 2004, 8:35 pm / bruce-sterling, microsoft, scaling, sxsw