Simon Willison on operations

5 items tagged “operations”

2019

Details of the Cloudflare outage on July 2, 2019 (via) Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular gradual rollout process (delightfully animal themed: DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step) so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible. The real value in the retro, though, is the extremely deep insight it provides into how Cloudflare organize, test and manage their changes. Really interesting stuff.
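For anyone who hasn't been bitten by a backtracking regex before, here is a minimal sketch of the general failure mode in Python. The pattern `(a+)+$` is a textbook illustration I picked for this example, not the rule from Cloudflare's WAF, and `time_match` is a hypothetical helper. On input that almost matches, the nested quantifier forces the engine to try an exponentially growing number of ways to split the run of `a` characters before giving up.

```python
import re
import time

# Illustrative patterns only (assumed for this sketch, not Cloudflare's rule).
NESTED = re.compile(r"(a+)+$")  # nested quantifier: prone to catastrophic backtracking
LINEAR = re.compile(r"a+$")     # accepts the same strings, without the nesting


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        # A run of "a" followed by "b" can never satisfy the trailing "$",
        # so a backtracking engine explores every way to split the run.
        text = "a" * n + "b"
        _, nested_time = time_match(NESTED, text)
        _, linear_time = time_match(LINEAR, text)
        print(f"n={n}: nested {nested_time:.5f}s, linear {linear_time:.5f}s")
```

The exact timings depend on the machine, but the nested pattern should get dramatically slower with each few extra characters while the linear one stays flat. Keep `n` small if you try larger values: the work roughly doubles with every additional character.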

# 12th July 2019, 5:36 pm / operations, regular-expressions, cloudflare

2011

The Virtues of Monitoring. Fantastic guide to the various levels of monitoring required for a modern web application.

# 13th January 2011, 4:26 am / monitoring, operations, sysadmin, recovered

2010

The operations team is the one place with access to data and traffic that is "real-time enough" to detect business issues before they manifest in significant monetary loss. Traffic anomalies, chargeback rates, visitor retention… all these translate into money. This is what ops does; they make things work; they make the business work. And they spend a lot more time trending, investigating and analyzing than they do replacing hard drives and network cards.

— Theo Schlossnagle

# 24th March 2010, 12:43 am / operations, theoschlossnagle

2009

Installing Django, Solr, Varnish and Supervisord with Buildout. Useful, detailed instructions... but I still think this stuff is Way Too Difficult at the moment. I’m a big fan of the idea of sites that are assembled from multiple smaller web services talking HTTP to each other, but ensuring all the moving parts stay running is massively more painful than just running Apache and MySQL.

# 7th June 2009, 1:54 pm / apache, bertrandmathieu, buildout, django, mysql, operations, python, rest, solr, supervisord, sysadmin, varnish

Google uncloaks once-secret server. Instead of a data centre wide UPS and redundant power supplies, each Google server has its own 12V battery. They live in standard shipping containers, each holding 1,160 servers.

# 2nd April 2009, 10:47 am / datacentres, google, operations, power, servers, ups

Simon Willison’s Weblog
