Simon Willison’s Weblog

On python, patrickmckenzie, funding, javascript, programminglanguages, ...

 

Recent entries

Porting my blog to Python 3 two days ago

This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.

Library authors like to use six to allow them to write code that supports both Python 2 and Python 3 at the same time… but my blog isn’t a library, so I used the 2to3 conversion tool that ships with Python instead.

And… it worked pretty well! I ran the following command from my project’s root directory:

2to3 -w -n blog/ config/ redirects/ feedstats/

The -w option causes the files to be over-written in place. Since everything is already in git, there was no reason to have 2to3 show my a diff without applying it. Likewise, the -n option tells 2to3 not to bother saving backups of the files it modifies.

Here’s the initial commit containing mostly the 2to3 changes.

Next step: run the tests! My test suite may be very thin, but it does at least check that the app can run its migrations, start up and serve a few basic pages without errors. One of my migrations was failing due to rogue bytestrings but that was an easy fix.

At this point I started to lean heavily on my continuous integration setup built on Travis CI. All of my Python 3 work took place in a branch, and all it took was a one line change to my .travis.yml for Travis to start running the tests for that branch using Python 3.

With the basic tests working, I made my first deploy to my Heroku staging instance—after first modifying my Heroku runtime.txt to tell it to use Python 3.6.2. My staging environment allowed me to sanity check that everything would work OK when deployed to Heroku.

At this point I got a bit lazy. The responsible thing to do would have been extensive manual testing plus systematic unit test coverage of core functionality. My blog is hardly a critical piece of infrastructure though, so I went with the faster option: put it all live and use Sentry to see if anything breaks.

This is where Heroku’s ability to deploy a specific branch came in handy: one click to deploy my python3 branch, keep an eye on Sentry (via push notifications from my private slack channel) and then one click to deploy my master branch again for an instant rollback in case of errors. Which I had to do instantly, because it turned out I had stored some data in Django’s cache using Python 2 pickle and was trying to read it back out again using Python 3.

I fixed that by bumping my cache VERSION setting and deployed again. This deploy lasted a few minute longer before Sentry started to fill up with encoding errors and I rolled it back again.

The single biggest difference between Python 2 and Python 3 is how strings are handled. Python 3 strings are unicode sequences. Learning to live in a world where strings are all unicode and byte strings are the rare, deliberate exceptions takes some getting used to.

The key challenge for my blog actually came from my custom markup handling template tags. 15 years ago I made the decision to store all of my blog entries as valid XHTML fragments. This meant I could use XML processors—back then in PHP, today Python’s ElementTree—to perform various transformations on my content.

ElementTree in Python 2 can only consume bytestrings. In Python 3 it expects unicode strings. Cleaning this up took a while, eventually inspiring me to refactor my custom template tags completely. In the process I realized that my blog templates were mostly written back before Django’s template language implemented autoescape (in Django 1.0), so my code was littered with unnecessary |escape and |safe filters. Those are all gone now.

Sentry lets you mark an exception as “resolved” when you think you’ve fixed it—if it occurs again after that it will be re-reported to your Slack channel and added back to the Sentry list of unresolved issues. Once Sentry was clear (especially given Googlebot had crawled my older pages) I could be pretty confident there were no more critical 500-causing errors.

That left logic errors, of which only one has cropped up so far: the “zero years ago” bug. Entries on my homepage include a relative date representation, e.g. “three days ago”. Python 3 changed how the divison operator works on integers3 / 2 == 1.5 where in Python 2 it gets truncated to 1. As a result, every entry on my homepage showed “zero years ago”. Thankfully this was a one-line fix.

All in all this process was much less painful than I expected. It took me longer to write this blog entry than it did to actually make the conversion (thanks to 2to3 doing most of the tedious work), and the combination of Travis CI, Sentry and Heroku allowed me to ship aggressively with the knowledge that I could promptly identify and resolve any issues that came up.

Next upgrade: Django 2.0!

How to set up world-class continuous deployment using free hosted tools six days ago

I’m going to describe a way to put together a world-class continuous deployment infrastructure for your side-project without spending any money.

With continuous deployment every code commit is tested against an automated test suite. If the tests pass it gets deployed directly to the production environment! How’s that for an incentive to write comprehensive tests?

Each of the tools I’m using offers a free tier which is easily enough to handle most side-projects. And once you outgrow those free plans, you can solve those limitations in exchange for money!

Here’s the magic combination:

Step one: Publish some code to GitHub with some tests

I’ll be using the code for my blog as an example. It’s a classic Django application, with a small (OK, tiny) suite of unit tests. The tests are run using the standard Django ./manage.py test command.

Writing a Django application with tests is outside the scope of this article. Thankfully the official Django tutorial covers testing in some detail.

Step two: Hook up Travis CI

Travis CI is an outstanding hosted platform for continuous integration. Given a small configuration file it can check out code from GitHub, set up an isolated test environment (including hefty dependencies like a PostgreSQL database server, Elasticsearch, Redis etc), run your test suite and report the resulting pass/fail grade back to GitHub.

It’s free for publicly hosted GitHub projects. If you want to test code in a private repository you’ll have to pay them some money.

Here’s my .travis.yml configuration file:

language: python

python:
  - 2.7

services: postgresql

addons:
  postgresql: "9.6"

install:
  - pip install -r requirements.txt

before_script:
  - psql -c "CREATE DATABASE travisci;" -U postgres
  - python manage.py migrate --noinput
  - python manage.py collectstatic

script:
  - python manage.py test

And here’s the resulting Travis CI dashboard.

The integration of Travis with GitHub runs deep. Once you’ve set up Travis, it will automatically test every push to every branch—driven by GitHub webhooks, so test runs are set off almost instantly. Travis will then report the test results back to GitHub, where they’ll show up in a bunch of different places— including these pleasing green ticks on the branches page:

GitHub branches page showing CI results

Travis will also run tests against any open pull requests. This is a great incentive to build new features in a pull request even if you aren’t using them for code review:

GitHub pull request showing CI results

Circle CI deserves a mention as an alternative to Travis. The two are close competitors and offer very similar feature sets, and Circle CI’s free plan allows up to 1,500 build minutes of private repositories per month.

Step 3: Deploy to Heroku and turn on continuous deployment

I’m a big fan of Heroku for side projects, because it means not having to worry about ongoing server-maintenance. I’ve lost several side-projects to entropy and software erosion—getting an initial VPS set up may be pretty simple, but a year later security patches need applying and the OS needs upgrading and the log files have filled up the disk and you’ve forgotten how you set everything up in the first place…

It turns out Heroku has basic support for continuous deployment baked in, and it’s trivially easy to set up. You can tell Heroku to deploy on every commit to GitHub, and then if you’ve attached a CI service like Travis that reports build health back you can check the box for “Wait for CI to pass before deploy”:

Heroku deployment settings for continuous deployment

Since small dynos on Heroku are free, you can even set up a separate Heroku app as a staging environment. I started my continuous integration adventure just deploying automatically to my staging instance, then switched over to deploying to production once I gained some confidence in how it all fitted together.

If you’re using continuous deployment with Heroku and Django, it’s a good idea to set up Heroku to automatically run your migrations for every deploy—otherwise you might merge a pull request with a model change and forget to run the migrations before the deploy goes out. You can do that using Heroku’s release phase feature, by adding the line release: python manage.py migrate --noinput to your Heroku Procfile (here’s mine).

Once you go beyond Heroku’s free tier things get much more powerful: Heroku Flow combines pipelines, review apps and their own CI solution to provide a comprehensive solution for much larger teams.

Step 4: Monitor errors with Sentry

If you’re going to move fast and break things, you need to know when things have broken. Sentry is a fantastic tool for collecting exceptions, aggregating them and spotting when something new crops up. It’s open source so you can host it yourself, but they also offer a robust hosted version with a free plan that can track up to 10,000 errors a month.

My favourite feature of Sentry is that it gives each exception it sees a “signature” based on a MD5 hash of its traceback. This means it can tell if errors are the same underlying issue or something different, and can hence de-dupe them and only alert you the first time it spots an error it has not seen before.

Notifications from Travis CI and GitHub in Slack

Sentry has integrations for most modern languages, but it’s particularly easy to use with Django. Just install raven and add few extra lines to your settings.py:

SENTRY_DSN = os.environ.get('SENTRY_DSN')
if SENTRY_DSN:
    INSTALLED_APPS += (
        'raven.contrib.django.raven_compat',
    )
    RAVEN_CONFIG = {
        'dsn': SENTRY_DSN,
        'release': os.environ.get('HEROKU_SLUG_COMMIT', ''),
    }

Here I’m using the Heroku pattern of keeping configuration in environment variables. SENTRY_DSN is provided by Sentry when you create your project there—you just have to add it as a Heroku config variable.

The HEROKU_SLUG_COMMIT line causes the currently deployed git commit hash to be fed to Sentry so that it knows what version of your code was running when it reports an error. To enable that variable, you’ll need to enable Dyno Metadata by running heroku labs:enable runtime-dyno-metadata against your application.

Step 5: Hook it all together with Slack

Would you like a push notification to your phone every time your site gets code committed / the tests pass or fail / a deploy goes out / a new error is detected? All of the above tools can report such things to Slack, and Slack’s free plan is easily enough to collect all of these notifications and push them to your phone via the free Slack iOS or Android apps.

Notifications from Travis CI and GitHub in Slack

Here are instructions for setting up Slack with GitHub, Travis CI, Heroku and Sentry.

Need more? Pay for it!

Having run much of this kind of infrastructure myself in the past I for one am delighted by the idea of outsourcing it, especially when the hosted options are of such high quality.

Each of these tools offers a free tier which is generous enough to work great for small side projects. As you start scaling up, you can start paying for them—that’s why they gave you a free tier in the first place.

Comments or suggestions? Join this thread on Hacker News.

Deploying an asynchronous Python microservice with Sanic and Zeit Now nine days ago

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP head request against a URL and get back the HTTP headers as JSON. One of our initial use-scase for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use-case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

At some point in the following decade json-head.appspot.com stopped working. Today I’m bringing it back, mainly as an excuse to try out the combination of Python 3.5 async, the Sanic microframework and Zeit’s brilliant Now deployment platform.

First, a demo. https://json-head.now.sh/?url=https://simonwillison.net/ returns the following:

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 18:37:52 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=dd0b71b4e89bbaca5b27fa06c0b95af4a1508006272; expires=Sun, 14-Oct-18 18:37:52 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "HIT",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adca70269a51e8f-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    }
]

Given a URL, json-head.now.sh performs an HTTP HEAD request and returns the resulting status code and the HTTP headers. Results are returned with the Access-Control-Allow-Origin: * header so you can call the API using fetch() or XMLHttpRequest from JavaScript running on any page.

Sanic and Python async/await

A key new feature added to Python 3.5 back in September 2015 was built-in syntactic support for coroutine control via the async/await statements. Python now has some serious credibility as a platform for asynchronous I/O (the concept that got me so excited about Node.js back in 2009). This has lead to an explosion of asynchronous innovation around the Python community.

json-head is the perfect application for async—it’s little more than a dumbed-down HTTP proxy, accepting incoming HTTP requests, making its own requests elsewhere and then returning the results.

Sanic is a Flask-like web framework built specifically to take advantage of async/await in Python 3.5. It’s designed for speed—built on top of uvloop, a Python wrapper for libuv (which itself was originally built to power Node.js). uvloop’s self-selected benchmarks are extremely impressive.

Zeit Now

To host this new microservice, I chose Zeit Now. It’s a truly beautiful piece of software design.

Now lets you treat deployments as immutable. Every time you deploy you get a brand new URL. You can then interact with your deployment directly, or point an existing alias to it if you want a persistent URL for your project.

Deployments are free, and deployed code stays available forever due to some clever engineering behind the scenes.

Best of all: deploying a project takes just a single command: type now and the code in your current directory will be deployed to their cloud and assigned a unique URL.

Now was originally built for Node.js projects, but last August Zeit added Docker support. If the directory you run it in contains a Dockerfile, running now will upload, build and run the corresponding image.

There’s just one thing missing: good examples of how to deploy Python projects to Now using Docker. I’m hoping this article can help fill that gap.

Here’s the complete Dockerfile I’m using for json-head:

FROM python:3
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8006
CMD ["python", "json_head.py"]

I’m using the official Docker Python image as a base, copying the current directory into the image, using pip install to install dependencies and then exposing port 8006 (for no reason other than that’s the port I use for local development environment) and running the json_head.py script. Now is smart enough to forward incoming HTTP traffic on port 80 to the port that was exposed by the container.

If you setup Now yourself (npm install -g now or use one of their installers) you can deploy my code directly from GitHub to your own instance with a single command:

$ now simonw/json-head
> Didn't find directory. Searching on GitHub...
> Deploying GitHub repository "simonw/json-head" under simonw
> Ready! https://simonw-json-head-xqkfgorgei.now.sh (copied to clipboard) [1s]
> Initializing…
> Building
> ▲ docker build
Sending build context to Docker daemon 7.168 kBkB
> Step 1 : FROM python:3
> 3: Pulling from library/python
> ... lots more stuff here ...

Initial implementation

Here’s my first working version of json-head using Sanic:

from sanic import Sanic
from sanic import response
import aiohttp

app = Sanic(__name__)

async def head(session, url):
    try:
        async with session.head(url) as response:
            return {
                'ok': True,
                'headers': dict(response.headers),
                'status': response.status,
                'url': url,
            }
    except Exception as e:
        return {
            'ok': False,
            'error': str(e),
            'url': url,
        }

@app.route('/')
async def handle_request(request):
    url = request.args.get('url')
    if url:
        async with aiohttp.ClientSession() as session:
            head_info = await head(session, url)
            return response.json(
                head_info,
                headers={
                    'Access-Control-Allow-Origin': '*'
                },
            )
    else:
        return response.html('Try /?url=xxx')

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8006)

This exact code is deployed at https://json-head-thlbstmwfi.now.sh/—since Now deployments are free, there’s no reason not to leave work-in-progress examples hosted as throwaway deployments.

In addition to Sanic, I’m also using the handy aiohttp asynchronous HTTP library—which features API design clearly inspired by my all-time favourite HTTP library, requests.

The key new pieces of syntax to understand in the above code are the async and await statements. async def is used to declare a function that acts as a coroutine. Coroutines need to be executed inside an event loop (which Sanic handles for us), but gain the ability to use the await statement.

The await statement is the real magic here: it suspends the current coroutine until the coroutine it is calling has finished executing. It is this that allows us to write asynchronous code without descending into a messy hell of callback functions.

Adding parallel requests

So far we haven’t really taken advantage of what async I/O can do—if every incoming HTTP request results in a single outgoing HTTP response then async may help us scale to serve more incoming requests at once but it’s not really giving us any new functionality.

Executing multiple outbound HTTP requests in parallel is a much more interesting use-case. Let’s add support for multiple ?url= parameters, such as the following:

https://json-head.now.sh/?url=https://simonwillison.net/&url=https://www.google.com/

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=ded486c1faaac166e8ae72a87979c02101508009729; expires=Sun, 14-Oct-18 19:35:29 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "EXPIRED",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adcfb671c862888-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    },
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Expires": "-1",
            "Cache-Control": "private, max-age=0",
            "Content-Type": "text/html; charset=ISO-8859-1",
            "P3P": "CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"",
            "Content-Encoding": "gzip",
            "Server": "gws",
            "X-XSS-Protection": "1; mode=block",
            "X-Frame-Options": "SAMEORIGIN",
            "Set-Cookie": "1P_JAR=2017-10-14-19; expires=Sat, 21-Oct-2017 19:35:29 GMT; path=/; domain=.google.com",
            "Alt-Svc": "quic=\":443\"; ma=2592000; v=\"39,38,37,35\"",
            "Transfer-Encoding": "chunked"
        },
        "status": 200,
        "url": "https://www.google.com/"
    }
]

We’re now accepting multiple URLs and executing multiple HEAD requests… but Python 3.5 async makes it easy to do this in parallel, so our overall request time should match that of the single longest HEAD request that we triggered.

Here’s an implementation that adds support for multiple, parallel outbound HTTP requests:

@app.route('/')
async def handle_request(request):
    urls = request.args.getlist('url')
    if urls:
        async with aiohttp.ClientSession() as session:
            head_infos = await asyncio.gather(*[
                head(session, url) for url in urls
            ])
            return response.json(
                head_infos,
                headers={'Access-Control-Allow-Origin': '*'},
            )
    else:
        return response.html(INDEX)

We’re using the asyncio module from the Python 3.5 standard library here—in particular the gather function. async.gather takes a list of coroutines and returns a future aggregating their results. This future will resolve (and return to a corresponding await statement) as soon as all of those coroutines have returned their values.

My final code for json-head can be found on GitHub. As I hope I’ve demonstrated, the combination of Python 3.5+, Sanic and Now makes deploying asynchronous Python microservices trivially easy.

Changelogs to help understand the fires in the North Bay 13 days ago

The situation in the counties north of San Francisco is horrifying right now. I’ve repurposed some of the tools I built to for the Irma Response project last month to collect and track some data that might be of use to anyone trying to understand what’s happening up there. I’m sharing these now in the hope that they might prove useful.

I’m scraping a number of sources relevant to the crisis, and making the data available in a repository on GitHub. Because it’s a git repository, changes to those sources are tracked automatically. The value I’m providing here isn’t so much the data itself, it’s the history of the data. If you need to see what has changed and when, my repository’s commit log should have the answers for you. Or maybe you’ll just want to occasionally hit refresh on this history of changes to srcity.org/610/Emergency-Information to see when they edited the information.

The sources I’m tracking right now are:

  1. The Santa Rosa Fire Department’s Emergency Information page. This is being maintained by hand so it’s not a great source of structured data, but it has key details like the location and availability of shelters and it’s useful to know what was changed and when. History of changes to that page.
  2. PG&E power outages. This is probably the highest quality dataset with the neatest commit messages. The commit history of these shows exactly when new outages are reported and how many customers were affected.
  3. Road Conditions in the County of Sonoma. If you want to understand how far the fire has spread, this is a useful source of data as it shows which roads have been closed due to fire or other reasons. History of changes.
  4. California Highway Patrol Incidents, extracted from a KML feed on quickmap.dot.ca.gov. Since these cover the whole state of California there’s a lot of stuff in here that isn’t directly relevant to the North Bay, but the incidents that mention fire still help tell the story of what’s been happening. History of changes.

The code for the scrapers can be found in north_bay.py. Please leave comments, feedback or suggestions on other useful potential sources of data in this GitHub issue.

Elsewhere

Yesterday

  • We made a sale for Appointment Reminder from someone whose only way of getting data into the system was to fax it to us. Guess the cheat. If you guess “CEO signs up for HelloFax, receives the fax, and types 600 patient names and phone numbers by hand” you have good instincts.

    Patrick McKenzie #

  • Getting Your First 10 Customers (via) There is so much good advice embedded in this article by Patrick McKenzie, and it constantly comes back to the theme of doing whatever it takes to get to your first ten paying customers. #
  • Can You Afford It?: Real-world Web Performance Budgets. Alex Russell’s magnum opus on web performance budgets in 2017. He proposes a baseline testing device equivalent to a $200 Android phone on a slow 3G network emulated at 400ms RTT/400Kbps transfer and encourages a goal of 5s time-to-interactive on first load and 2s TTI for subsequent views. This means around 130kb of gzipped JavaScript—challenging but not impossible with modern JavaScript frameworks. #
  • hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk. #

22nd October 2017

  • System Requirements For SQLite. Document describing the high level goals and objectives of SQLite. Like everything to do with SQLite this exhibits some incredibly well thought out software engineering. I particularly like “S80000: SQLite shall exhibit ductile failure characteristics“ where ductile is described in opposition to brittle: a ductile system begins showing signs of trouble well in advance of failure. #
  • Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them. #
  • SurviveJS - Webpack (via) Free online book about Webpack. I’ve read the first couple of chapters and it looks like a concise, well constructed guide to a key component of the modern JavaScript stack. #
  • It’s Not a Feature Problem—Avoiding Startup Tarpits (via) “When we turned on paid advertising for the first time the increase we had a sizable increase in signups. We always feared that a new user would just churn because of what we perceived as deficiencies in the product. While there were users who churned for that reason, it was never the nightmare scenario that we imagined.” #
  • I am currently documenting a language called Seenku, spoken by fewer than 15,000 people in the rolling hills of southwestern Burkina Faso in West Africa. Like Chinese, it is a tonal language, meaning the pitch on which a word is pronounced can radically alter its meaning. For instance, tsu can mean “thatch” when pronounced with an extra low pitch, but “hippopotamus” when pronounced with falling pitch. In fact, pitch plays such a huge role in Seenku that it can be “spoken” through music alone, most notably on the traditional xylophone.

    Laura McPherson #

  • Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block. #
  • Crossdressing, Compression, and Colliders: 'The First Photo on the Web'. TIL the first photo shared on the web was of Les Horribles Cernettes, an all-female comedy musical group at CERN who performed songs about particle physics. And Sir Tim Berners-Lee first met them when he played the dame in the CERN panto. #

21st October 2017

  • github-dashboard (via) Nice little self-contained example of a React app with no build step by Shing Lyu. #
  • Minimal React.js Without A Build Step. React is pretty dependent on a build phase, to handle things like JSX compilation. This is fine for most projects, but sometimes I just want to hot-link react and react-dom from a CDN and knock out a quick self-contained mini-application. Shing Lyu points out that this is much easier if you ditch JSX in favour of direct calls to React.createElement(). #
  • Goldberg’s machines are always described as useless and my machines are too. But they both made us enough money to live off, which is quite useful. Also making people laugh is useful, a lot more beneficial than many ‘serious’ advances in technology like yet another new computer operating system. My aunt Lis, who is very religious, describes my arcade as my ministry.

    Tim Hunkin #

20th October 2017

  • Serverless is a somewhat unhelpfully misleading term for “highly scalable stateless code”. All the times I’ve seen serverless stuff work really well it was workloads that were usually zero but occasionally 30k/sec without warning. I’ve run a company with that kind of workload and serverless stuff would have saved us a ton of money. Publishing to the [npm] registry could be done as a serverless app but there’s little benefit because we do not get huge spikes in publishing. We get huge spikes in *downloads* but serverless isn’t useful there because it’s a read-only case and very little processing is done. Serverless is a great solution to one type of problem. It’s very seldom the case that you can convert all your problems into that shape.

    Laurie Voss #

  • Hey Siri: An On-device DNN-powered Voice Trigger for Apple’s Personal Assistant (via) “The “Hey Siri” detector uses a Deep Neural Network (DNN) to convert the acoustic pattern of your voice at each instant into a probability distribution over speech sounds. It then uses a temporal integration process to compute a confidence score that the phrase you uttered was “Hey Siri”. If the score is high enough, Siri wakes up.” #
  • React is the new Dojo. In which Mikeal Rogers provides his perspective on the history of Dojo, the earliest break-out JavaScript framework, how jQuery eclipsed it and contemplates the same thing eventually happening to React. #

19th October 2017

  • Carbon (via) Beautiful little tool that you can paste source code into to generate an image of that code with syntax highlighting applied, ready to be tweeted or shared anywhere that lets you share an image. Built in Node and next.js, with image generation handled client-side by the dom-to-image JavaScript library which loads HTML into a SVG foreignObject (sadly not yet supported by Safari) and uses that to populate a canvas and produce a PNG. #
  • By cutting out a hundred voices or fewer, things and people that everybody talks about became things and people that nobody talks about. The internet is a technology for creating small ponds for us to all be big fish in. But you change your perspective just slightly, move over just an inch, and suddenly you get a sense of just how few people know about you or could possibly care.

    Fredrik deBoer #

  • Streaming Dataframes. This is some deep and brilliant magic: Matthew Rocklin’s Streamz Python library provides some elegant abstractions for consuming infinite streams of data and calculating cumulative averages and rolling reductions... and now he’s added an integration with jupyter that lets you embed bokeh graphs and pandas dataframe tables that continue to update in realtime as the stream continues! Check out the animated screenshots, this really is a phenomenal piece of work. #

18th October 2017

  • A Brief Intro to Docker for Djangonauts (via) This is great—a really clear introduction to both Docker and Docker Compose, aimed at Django developers. Includes line-by-line annotations of an example Dockerfile and docker-compose.yml. #
  • SRI Hash Generator. Handy utility for generating SRI hashes—just give it a URL and it will show you the script or link href block you need to use to safely embed that URL in your page with the correct SRI hash. #
  • Subresource Integrity. Now supported in Firefox 55, Chrome 49+ and Safari 11+. This makes me much more comfortable about hot-linking to JavaScript and CSS hosted by the various CDN providers, since it means that should they get breached any evil new scripts hosted at the same URL will be denied by modern browsers. #
  • Select Transform: JSON Template over JSON (via) A barrage of interesting ideas here. Having clients transmit up a JSON template which is then executed against data on the server and used to return exactly the data the client needs is just one of them (significant overlap with GraphQL there). #

16th October 2017

  • An interactive explanation of quadtrees (via) Neat explorable explanation of quadtrees, using interactives built on top of D3. #
  • TL;DR on the KRACK WPA2 stuff—you can repeatedly resend the 3rd packet in a WPA2 handshake and it’ll reset the key state, which leads to nonce reuse, which leads to trivial decryption with known plaintext. Can be easily leveraged to dump TCP SYN traffic and hijack connections.

    Graham Sutherland #

15th October 2017

  • Explorable Explanations. I’m fascinated by web articles and essays that embed interactive visualizations—taking advantage of the unique capabilities of the medium to help explain complex concepts. Explorable Explanations collects exactly these, under the banner of “learning through play”. They also gather tools and tutorials to help build more of them. #

13th October 2017

  • What's New In DevTools (Chrome 62). Some really neat stuff. Highlights include top-level “await” support in the console, the ability to take screenshots of specific HTML nodes, CSS grid highlighting and the ability to drop a .HAR file onto the network panel in order to view it as a waterfall. #
  • The Xi Text Engine CRDT (via) Xi is “a modern editor with a backend written in Rust”—an open-source text editor project from Google built on some very interesting computer science (Conflict-free Replicated Data Types). It’s a native editor with server-backed synchronization as a first-class concept. #
  • Whatever weird thing you imagine might happen, something weirder probably did happen. Reporters tried to keep up, but it was too strange. As Max Read put it in New York Magazine, Facebook is “like a four-dimensional object, we catch slices of it when it passes through the three-dimensional world we recognize.” No one can quite wrap their heads around what this thing has become, or all the things this thing has become.

    Alexis C. Madrigal #

  • How to Do Code Reviews Like a Human. Some very well thought out advice on giving constructive reviews, including simple but effective language tricks for keeping suggestions positive and helpful. #

12th October 2017

  • Exploding Git Repositories. Kate Murphy describes how git is vulnerable to a similar attack to the XML “billion laughs” recursive entity expansion attack—you can create a tiny git repository that acts as a “git bomb”, expanding 12 root objects to over a billion files using recursive blob references. #
  • Dead End Thrills. Duncan Harris Is a photographer who works in the medium of video game screen captures. #
  • How the CIA Staged Sham Academic Conferences to Thwart Iran’s Nuclear Program. “The importance of a conference may be measured not only by the number of Nobel Prize winners or Oxford dons it attracts, but by the number of spies. U.S. and foreign intelligence officers flock to conferences for the same reason that Army recruiters concentrate on low-income neighborhoods: They make the best hunting grounds. While a university campus may have only one or two professors of interest to an intelligence service, the right conference — on drone technology, perhaps, or ISIS — may have dozens.” #