Simon Willison’s Weblog

Wednesday, 8th November 2017

How Sentry Receives 20 Billion Events Per Month While Preparing To Handle Twice That. RabbitMQ federation, nginx and HAProxy, Riak as a key/value store, data processing is still mainly Python with a little bit of Rust. As of July 2017 it’s all hosted on Google Cloud Platform. # 11:32 pm

For Redis 4.2 I’m moving Disque as a Redis module. To do this, Redis modules are getting a fully featured Cluster API. This means that it will be possible, for instance, to write a Redis module that orchestrates N Redis masters, using Raft or any other consensus algorithm, as a single distributed system. This will allow to also model strong guarantees easily.

Salvatore Sanfilippo # 10:51 pm

Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable). # 7:32 pm

TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter. # 5:57 pm

spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter. # 4:43 pm

Redis Streams and the Unified Log. In which Brandur Leach explores the new Kafka-style streams functionality coming to Redis 4.0, and shows an example of a robust at-least-once processing architecture built on a combination of Redis streams and PostgreSQL transactions. I really like the pattern of writing log records to a staging table in PostgreSQL first in order to bundle them up in the same transaction as the originating state change, then having a separate process read them from that table and publish them to Redis. # 4:37 pm

Redis streams aren’t exciting for their innovativeness, but rather that they bring building a unified log architecture within reach of a small and/or inexpensive app. Kafka is infamously difficult to configure and get running, and is expensive to operate once you do. [...] Redis on the other hand is probably already in your stack.

Brandur Leach # 4:23 pm

There’s also a huge population of “admins,” who as you might guess, administer an organization’s salesforce account & data. These folks often start out as office managers or other clerical types, who are handed this responsibility because nobody else wants to do it. Here’s where it gets interesting. Admin ➡️ WYSIWYG customizer ➡️ occasional coder ➡️ full time dev is a real pipeline into software development that folks often with just high school degrees are actually taking. This isn’t just a narrative pushed by salesforce marketing; I’m meeting these people. They say things like “I love salesforce, it changed my life” with disarming sincerity.

Sarah Mei # 11:56 am

ZEIT – 6x Faster Now Uploads with HTTP/2. Fantastic optimization write-up by Pranay Prakash. The Now deployment tool works by computing a hash for every local file in a project, then uploading just the ones that are missing. Pranay switched to uploading over HTTP/2 using the fetch-h2 library and got a 6x speedup for larger projects. # 1:04 am
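
The hash-then-upload-what's-missing idea can be sketched in a few lines. This is a rough illustration rather than how Now does it: the choice of SHA-256, the function names, and the idea that the server hands back a set of hashes it already has are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=65536):
    # Hash the file in chunks so large files are not read into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_project(root):
    # Map content hash -> path for every regular file under root.
    # Files with identical content collapse to one entry, so each distinct
    # blob would be uploaded at most once.
    root = Path(root)
    return {hash_file(p): p for p in sorted(root.rglob("*")) if p.is_file()}


def files_to_upload(local_hashes, remote_hashes):
    # remote_hashes is a hypothetical set of hashes the server already stores.
    return [
        path for digest, path in local_hashes.items()
        if digest not in remote_hashes
    ]
```

On a redeploy where most files are unchanged, files_to_upload returns only the handful of new or edited files, which is what keeps repeat deployments cheap regardless of the transport used for the uploads themselves.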