5 items tagged “operations”
Details of the Cloudflare outage on July 2, 2019 (via) Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular graduar rollout process (delightfully animal themed, named DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step) so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible—but the real value in the retro is that it provides an extremely deep insight into how Cloudflare organize, test and manage their changes. Really interesting stuff. # 12th July 2019, 5:36 pm
The operations team is the one place with access to data and traffic that is “real-time enough” to detect business issues before they manifest in significant monetary loss. Traffic anomalies, chargeback rates, visitor retention… all these translate into money. This is what ops does; they make things work; they make the business work. And they spend a lot more time trending, investigating and analyzing than they do replacing hard drives and network cards.
Installing Django, Solr, Varnish and Supervisord with Buildout. Useful, detailed instructions... but I still think this stuff is Way Too Difficult at the moment. I’m a big fan of the idea of sites that are assembled from multiple smaller web services talking HTTP to each other, but ensuring all the moving parts stay running is massively more painful than just running Apache and MySQL. # 7th June 2009, 1:54 pm