Simon Willison’s Weblog

On webapps, funding, networking, speaking, conferences, ...

 

Recent entries

How to set up world-class continuous deployment using free hosted tools 15 hours ago

I’m going to describe a way to put together a world-class continuous deployment infrastructure for your side-project without spending any money.

With continuous deployment every code commit is tested against an automated test suite. If the tests pass it gets deployed directly to the production environment! How’s that for an incentive to write comprehensive tests?

Each of the tools I’m using offers a free tier which is easily enough to handle most side-projects. And once you outgrow those free plans, you can solve those limitations in exchange for money!

Here’s the magic combination: GitHub for hosting the code, Travis CI for running the tests, Heroku for hosting the application, Sentry for tracking errors and Slack for tying the notifications together.

Step one: Publish some code to GitHub with some tests

I’ll be using the code for my blog as an example. It’s a classic Django application, with a small (OK, tiny) suite of unit tests. The tests are run using the standard Django ./manage.py test command.

Writing a Django application with tests is outside the scope of this article. Thankfully the official Django tutorial covers testing in some detail.
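
Just to show the shape of what Travis CI will end up running, here’s a minimal sketch of a Django test - the URL and assertion are hypothetical placeholders, not lifted from my actual test suite:

from django.test import TestCase


class HomepageTest(TestCase):
    def test_homepage_returns_200(self):
        # self.client is Django's built-in test client;
        # '/' is a hypothetical URL used purely for this sketch
        response = self.client.get('/')
        self.assertEqual(response.status_code, 200)

Running ./manage.py test discovers and runs any TestCase subclasses shaped like this one.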

Step two: Hook up Travis CI

Travis CI is an outstanding hosted platform for continuous integration. Given a small configuration file it can check out code from GitHub, set up an isolated test environment (including hefty dependencies like a PostgreSQL database server, Elasticsearch, Redis etc), run your test suite and report the resulting pass/fail grade back to GitHub.

It’s free for publicly hosted GitHub projects. If you want to test code in a private repository you’ll have to pay them some money.

Here’s my .travis.yml configuration file:

language: python

python:
  - 2.7

services: postgresql

addons:
  postgresql: "9.6"

install:
  - pip install -r requirements.txt

before_script:
  - psql -c "CREATE DATABASE travisci;" -U postgres
  - python manage.py migrate --noinput
  - python manage.py collectstatic

script:
  - python manage.py test

And here’s the resulting Travis CI dashboard.

The integration of Travis with GitHub runs deep. Once you’ve set up Travis, it will automatically test every push to every branch - driven by GitHub webhooks, so test runs are set off almost instantly. Travis will then report the test results back to GitHub, where they’ll show up in a bunch of different places - including these pleasing green ticks on the branches page:

GitHub branches page showing CI results

Travis will also run tests against any open pull requests. This is a great incentive to build new features in a pull request even if you aren’t using them for code review:

GitHub pull request showing CI results

Circle CI deserves a mention as an alternative to Travis. The two are close competitors and offer very similar feature sets, and Circle CI’s free plan includes up to 1,500 build minutes per month for private repositories.

Step three: Deploy to Heroku and turn on continuous deployment

I’m a big fan of Heroku for side projects, because it means not having to worry about ongoing server-maintenance. I’ve lost several side-projects to entropy and software erosion - getting an initial VPS set up may be pretty simple, but a year later security patches need applying and the OS needs upgrading and the log files have filled up the disk and you’ve forgotten how you set everything up in the first place…

It turns out Heroku has basic support for continuous deployment baked in, and it’s trivially easy to set up. You can tell Heroku to deploy on every commit to GitHub, and then if you’ve attached a CI service like Travis that reports build health back you can check the box for “Wait for CI to pass before deploy”:

Heroku deployment settings for continuous deployment

Since small dynos on Heroku are free, you can even set up a separate Heroku app as a staging environment. I started my continuous integration adventure just deploying automatically to my staging instance, then switched over to deploying to production once I gained some confidence in how it all fitted together.

If you’re using continuous deployment with Heroku and Django, it’s a good idea to set up Heroku to automatically run your migrations for every deploy - otherwise you might merge a pull request with a model change and forget to run the migrations before the deploy goes out. You can do that using Heroku’s release phase feature, by adding the line release: python manage.py migrate --noinput to your Heroku Procfile (here’s mine).
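
As a rough sketch, a Procfile with the release phase enabled might look something like this - the web line is an assumption about your own setup (gunicorn serving a project called myblog); the release line is the only part this article actually needs:

web: gunicorn myblog.wsgi --log-file -
release: python manage.py migrate --noinput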

Once you go beyond Heroku’s free tier things get much more powerful: Heroku Flow combines pipelines, review apps and their own CI solution to provide a comprehensive solution for much larger teams.

Step four: Monitor errors with Sentry

If you’re going to move fast and break things, you need to know when things have broken. Sentry is a fantastic tool for collecting exceptions, aggregating them and spotting when something new crops up. It’s open source so you can host it yourself, but they also offer a robust hosted version with a free plan that can track up to 10,000 errors a month.

My favourite feature of Sentry is that it gives each exception it sees a “signature” based on an MD5 hash of its traceback. This means it can tell whether errors share the same underlying issue or are something new, and can hence de-dupe them and only alert you the first time it spots an error it has not seen before.
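
To illustrate the idea (this is a toy sketch of the concept, not Sentry’s actual grouping algorithm), hashing the formatted traceback gives you a stable key to group and de-dupe on:

import hashlib
import traceback

def error_signature(exc):
    # Toy illustration: format the exception's traceback and hash it,
    # so repeat occurrences of the same bug produce the same signature
    tb_text = ''.join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return hashlib.md5(tb_text.encode('utf-8')).hexdigest()

try:
    1 / 0
except ZeroDivisionError as e:
    print(error_signature(e))  # the same failure always hashes to the same value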


Sentry has integrations for most modern languages, but it’s particularly easy to use with Django. Just install raven and add a few extra lines to your settings.py:

SENTRY_DSN = os.environ.get('SENTRY_DSN')
if SENTRY_DSN:
    INSTALLED_APPS += (
        'raven.contrib.django.raven_compat',
    )
    RAVEN_CONFIG = {
        'dsn': SENTRY_DSN,
        'release': os.environ.get('HEROKU_SLUG_COMMIT', ''),
    }

Here I’m using the Heroku pattern of keeping configuration in environment variables. SENTRY_DSN is provided by Sentry when you create your project there - you just have to add it as a Heroku config variable.

The HEROKU_SLUG_COMMIT line causes the currently deployed git commit hash to be fed to Sentry so that it knows what version of your code was running when it reports an error. To enable that variable, you’ll need to enable Dyno Metadata by running heroku labs:enable runtime-dyno-metadata against your application.

Step five: Hook it all together with Slack

Would you like a push notification to your phone every time your site gets code committed / the tests pass or fail / a deploy goes out / a new error is detected? All of the above tools can report such things to Slack, and Slack’s free plan is easily enough to collect all of these notifications and push them to your phone via the free Slack iOS or Android apps.

Notifications from Travis CI and GitHub in Slack

Here are instructions for setting up Slack with GitHub, Travis CI, Heroku and Sentry.

Need more? Pay for it!

Having run much of this kind of infrastructure myself in the past I for one am delighted by the idea of outsourcing it, especially when the hosted options are of such high quality.

Each of these tools offers a free tier which is generous enough to work great for small side projects. As you start scaling up, you can start paying for them - that’s why they gave you a free tier in the first place.

Comments or suggestions? Join this thread on Hacker News.

Deploying an asynchronous Python microservice with Sanic and Zeit Now three days ago

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP HEAD request against a URL and get back the HTTP headers as JSON. One of our initial use-cases for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use-case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

At some point in the following decade json-head.appspot.com stopped working. Today I’m bringing it back, mainly as an excuse to try out the combination of Python 3.5 async, the Sanic microframework and Zeit’s brilliant Now deployment platform.

First, a demo. https://json-head.now.sh/?url=https://simonwillison.net/ returns the following:

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 18:37:52 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=dd0b71b4e89bbaca5b27fa06c0b95af4a1508006272; expires=Sun, 14-Oct-18 18:37:52 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "HIT",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adca70269a51e8f-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    }
]

Given a URL, json-head.now.sh performs an HTTP HEAD request and returns the resulting status code and the HTTP headers. Results are returned with the Access-Control-Allow-Origin: * header so you can call the API using fetch() or XMLHttpRequest from JavaScript running on any page.

Sanic and Python async/await

A key new feature added to Python 3.5 back in September 2015 was built-in syntactic support for coroutine control via the async/await statements. Python now has some serious credibility as a platform for asynchronous I/O (the concept that got me so excited about Node.js back in 2009). This has led to an explosion of asynchronous innovation around the Python community.

json-head is the perfect application for async - it’s little more than a dumbed-down HTTP proxy, accepting incoming HTTP requests, making its own requests elsewhere and then returning the results.

Sanic is a Flask-like web framework built specifically to take advantage of async/await in Python 3.5. It’s designed for speed - built on top of uvloop, a Python wrapper for libuv (which itself was originally built to power Node.js). uvloop’s self-selected benchmarks are extremely impressive.

Zeit Now

To host this new microservice, I chose Zeit Now. It’s a truly beautiful piece of software design.

Now lets you treat deployments as immutable. Every time you deploy you get a brand new URL. You can then interact with your deployment directly, or point an existing alias to it if you want a persistent URL for your project.

Deployments are free, and deployed code stays available forever due to some clever engineering behind the scenes.

Best of all: deploying a project takes just a single command: type now and the code in your current directory will be deployed to their cloud and assigned a unique URL.

Now was originally built for Node.js projects, but last August Zeit added Docker support. If the directory you run it in contains a Dockerfile, running now will upload, build and run the corresponding image.

There’s just one thing missing: good examples of how to deploy Python projects to Now using Docker. I’m hoping this article can help fill that gap.

Here’s the complete Dockerfile I’m using for json-head:

FROM python:3
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8006
CMD ["python", "json_head.py"]

I’m using the official Docker Python image as a base, copying the current directory into the image, using pip install to install dependencies, then exposing port 8006 (for no reason other than that’s the port I use for my local development environment) and running the json_head.py script. Now is smart enough to forward incoming HTTP traffic on port 80 to the port that was exposed by the container.

If you set up Now yourself (npm install -g now or use one of their installers) you can deploy my code directly from GitHub to your own instance with a single command:

$ now simonw/json-head
> Didn't find directory. Searching on GitHub...
> Deploying GitHub repository "simonw/json-head" under simonw
> Ready! https://simonw-json-head-xqkfgorgei.now.sh (copied to clipboard) [1s]
> Initializing…
> Building
> ▲ docker build
Sending build context to Docker daemon 7.168 kB
> Step 1 : FROM python:3
> 3: Pulling from library/python
> ... lots more stuff here ...

Initial implementation

Here’s my first working version of json-head using Sanic:

from sanic import Sanic
from sanic import response
import aiohttp

app = Sanic(__name__)

async def head(session, url):
    try:
        async with session.head(url) as response:
            return {
                'ok': True,
                'headers': dict(response.headers),
                'status': response.status,
                'url': url,
            }
    except Exception as e:
        return {
            'ok': False,
            'error': str(e),
            'url': url,
        }

@app.route('/')
async def handle_request(request):
    url = request.args.get('url')
    if url:
        async with aiohttp.ClientSession() as session:
            head_info = await head(session, url)
            return response.json(
                head_info,
                headers={
                    'Access-Control-Allow-Origin': '*'
                },
            )
    else:
        return response.html('Try /?url=xxx')

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8006)

This exact code is deployed at https://json-head-thlbstmwfi.now.sh/ - since Now deployments are free, there’s no reason not to leave work-in-progress examples hosted as throwaway deployments.

In addition to Sanic, I’m also using the handy aiohttp asynchronous HTTP library, whose API design is clearly inspired by my all-time favourite HTTP library, requests.

The key new pieces of syntax to understand in the above code are the async and await statements. async def is used to declare a function that acts as a coroutine. Coroutines need to be executed inside an event loop (which Sanic handles for us), but gain the ability to use the await statement.

The await statement is the real magic here: it suspends the current coroutine until the coroutine it is calling has finished executing. It is this that allows us to write asynchronous code without descending into a messy hell of callback functions.
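
Outside of a framework like Sanic you have to drive the event loop yourself. Here’s a tiny standalone sketch (plain asyncio, nothing from this project) of a coroutine being awaited and run:

import asyncio

async def pretend_fetch(url):
    # asyncio.sleep stands in for real I/O and yields control to the event loop
    await asyncio.sleep(0.1)
    return 'fetched ' + url

async def main():
    # await suspends main() here until pretend_fetch() has finished
    print(await pretend_fetch('https://example.com/'))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())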

Adding parallel requests

So far we haven’t really taken advantage of what async I/O can do - if every incoming HTTP request results in just one outgoing HTTP request then async may help us scale to serve more incoming requests at once, but it’s not really giving us any new functionality.

Executing multiple outbound HTTP requests in parallel is a much more interesting use-case. Let’s add support for multiple ?url= parameters, such as the following:

https://json-head.now.sh/?url=https://simonwillison.net/&url=https://www.google.com/

[
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Content-Type": "text/html; charset=utf-8",
            "Connection": "keep-alive",
            "Set-Cookie": "__cfduid=ded486c1faaac166e8ae72a87979c02101508009729; expires=Sun, 14-Oct-18 19:35:29 GMT; path=/; domain=.simonwillison.net; HttpOnly; Secure",
            "Cache-Control": "s-maxage=200",
            "X-Frame-Options": "SAMEORIGIN",
            "Via": "1.1 vegur",
            "CF-Cache-Status": "EXPIRED",
            "Vary": "Accept-Encoding",
            "Server": "cloudflare-nginx",
            "CF-RAY": "3adcfb671c862888-SJC",
            "Content-Encoding": "gzip"
        },
        "status": 200,
        "url": "https://simonwillison.net/"
    },
    {
        "ok": true,
        "headers": {
            "Date": "Sat, 14 Oct 2017 19:35:29 GMT",
            "Expires": "-1",
            "Cache-Control": "private, max-age=0",
            "Content-Type": "text/html; charset=ISO-8859-1",
            "P3P": "CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\"",
            "Content-Encoding": "gzip",
            "Server": "gws",
            "X-XSS-Protection": "1; mode=block",
            "X-Frame-Options": "SAMEORIGIN",
            "Set-Cookie": "1P_JAR=2017-10-14-19; expires=Sat, 21-Oct-2017 19:35:29 GMT; path=/; domain=.google.com",
            "Alt-Svc": "quic=\":443\"; ma=2592000; v=\"39,38,37,35\"",
            "Transfer-Encoding": "chunked"
        },
        "status": 200,
        "url": "https://www.google.com/"
    }
]

We’re now accepting multiple URLs and executing multiple HEAD requests… but Python 3.5 async makes it easy to do this in parallel, so our overall request time should match that of the single longest HEAD request that we triggered.

Here’s an implementation that adds support for multiple, parallel outbound HTTP requests:

import asyncio

@app.route('/')
async def handle_request(request):
    urls = request.args.getlist('url')
    if urls:
        async with aiohttp.ClientSession() as session:
            head_infos = await asyncio.gather(*[
                head(session, url) for url in urls
            ])
            return response.json(
                head_infos,
                headers={'Access-Control-Allow-Origin': '*'},
            )
    else:
        return response.html(INDEX)

We’re using the asyncio module from the Python 3.5 standard library here - in particular the gather function. asyncio.gather takes a list of coroutines and returns a future aggregating their results. This future will resolve (and return to a corresponding await statement) as soon as all of those coroutines have returned their values.
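
Here’s a standalone sketch of that behaviour, independent of Sanic and aiohttp: three coroutines with different delays run concurrently, and gather hands back their results in the original order once the slowest one has finished:

import asyncio

async def delayed(value, seconds):
    await asyncio.sleep(seconds)
    return value

async def main():
    # Total time is roughly 0.3s (the slowest coroutine), not 0.6s,
    # and the results come back in the order the coroutines were passed in
    results = await asyncio.gather(
        delayed('a', 0.3), delayed('b', 0.1), delayed('c', 0.2)
    )
    print(results)  # ['a', 'b', 'c']

loop = asyncio.get_event_loop()
loop.run_until_complete(main())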

My final code for json-head can be found on GitHub. As I hope I’ve demonstrated, the combination of Python 3.5+, Sanic and Now makes deploying asynchronous Python microservices trivially easy.

Changelogs to help understand the fires in the North Bay seven days ago

The situation in the counties north of San Francisco is horrifying right now. I’ve repurposed some of the tools I built for the Irma Response project last month to collect and track some data that might be of use to anyone trying to understand what’s happening up there. I’m sharing these now in the hope that they might prove useful.

I’m scraping a number of sources relevant to the crisis, and making the data available in a repository on GitHub. Because it’s a git repository, changes to those sources are tracked automatically. The value I’m providing here isn’t so much the data itself, it’s the history of the data. If you need to see what has changed and when, my repository’s commit log should have the answers for you. Or maybe you’ll just want to occasionally hit refresh on this history of changes to srcity.org/610/Emergency-Information to see when they edited the information.

The sources I’m tracking right now are:

  1. The Santa Rosa Fire Department’s Emergency Information page. This is being maintained by hand so it’s not a great source of structured data, but it has key details like the location and availability of shelters and it’s useful to know what was changed and when. History of changes to that page.
  2. PG&E power outages. This is probably the highest quality dataset with the neatest commit messages. The commit history of these shows exactly when new outages are reported and how many customers were affected.
  3. Road Conditions in the County of Sonoma. If you want to understand how far the fire has spread, this is a useful source of data as it shows which roads have been closed due to fire or other reasons. History of changes.
  4. California Highway Patrol Incidents, extracted from a KML feed on quickmap.dot.ca.gov. Since these cover the whole state of California there’s a lot of stuff in here that isn’t directly relevant to the North Bay, but the incidents that mention fire still help tell the story of what’s been happening. History of changes.

The code for the scrapers can be found in north_bay.py. Please leave comments, feedback or suggestions on other useful potential sources of data in this GitHub issue.
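
The underlying pattern is very simple. Here’s a rough sketch of it - this is not the actual north_bay.py code, and the URL, filename and commit message are hypothetical:

import subprocess
import requests

# Hypothetical example of the scrape-and-commit pattern
url = 'https://example.com/road-conditions.html'
html = requests.get(url).text

with open('road-conditions.html', 'w') as f:
    f.write(html)

# If the file content hasn't changed, git commit exits without creating a
# new commit, so the history only records genuine changes to the source
subprocess.call(['git', 'add', 'road-conditions.html'])
subprocess.call(['git', 'commit', '-m', 'Updated road-conditions.html'])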

Recovering missing content from the Internet Archive nine days ago

When I restored my blog last weekend I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.

Thank goodness then for the Wayback Machine at the Internet Archive! I tried some of the missing URLs there and found they had been captured and preserved. But how to get them back?

A quick search turned up wayback-machine-downloader, an open-source Ruby script that claims to be able to “Download an entire website from the Internet Archive Wayback Machine”. I gem installed it and tried it out, after some cargo cult incantations to work around some weird certificate errors I was seeing:

rvm osx-ssl-certs update all
gem update --system
gem install wayback_machine_downloader

wayback_machine_downloader http://simonwillison.net/

And it worked! I left it running overnight and came back to a folder containing 18,952 HTML files, neatly arranged in a directory structure that matched my site:

$ find . | more
.
./simonwillison.net
./simonwillison.net/2002
./simonwillison.net/2002/Aug
./simonwillison.net/2002/Aug/1
./simonwillison.net/2002/Aug/1/cetis
./simonwillison.net/2002/Aug/1/cetis/index.html
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial/index.html
...

I tarred them up into an archive and backed them up to Dropbox.

Next challenge: how to restore the missing content?

I’m a recent and enthusiastic adopter of Jupyter notebooks. As a huge fan of development in a REPL I’m shocked I was so late to this particular party. So I fired up Jupyter and used it to start playing with the data.

Here’s the final version of my notebook. I ended up with a script that did the following:

  • Load in the full list of paths from the tar archive, and filter for just the ones matching the /YYYY/Mon/DD/slug/ format used for my blog content.
  • Talk to my local Django development environment and load in the full list of actual content URLs represented in that database.
  • Calculate the difference between the two - those are the 213 items that need to be recovered.
  • For each of those 213 items, load the full HTML that had been saved by the Internet Archive and feed it into the BeautifulSoup HTML parsing library.
  • Detect if each one is an entry, a blogmark or a quotation. Scrape the key content out of each one based on the type.
  • Scrape the tags for each item, using this delightful one-liner: [a.text for a in soup.findAll('a', {'rel': 'tag'})] - see the sketch after this list.
  • Scrape the comments for each item separately. These were mostly spam, so I haven’t yet recovered these for publication (I need to do some aggressive spam filtering first). I have however stashed them in the database for later processing.
  • Write all of the scraped data out to a giant JSON file and upload it to a gist (a nice cheap way of giving it a URL).
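
To make the scraping step a little more concrete, here’s a simplified sketch of processing one archived page - the entry div class is a hypothetical placeholder for whatever my old templates actually used, but the rel="tag" one-liner is the real one from the list above:

from bs4 import BeautifulSoup

# Simplified sketch of parsing one archived page
with open('simonwillison.net/2002/Aug/1/cetis/index.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# The tag-scraping one-liner from the list above
tags = [a.text for a in soup.findAll('a', {'rel': 'tag'})]

# Pulling out the main content - the 'entry' class is a placeholder
entry = soup.find('div', {'class': 'entry'})
print(tags, entry.get_text()[:80] if entry else None)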

Having executed the above script, I now have a JSON file containing the parsed content for all of the missing items found in the Wayback Machine. All I needed then was a script which could take that JSON and turn it into records in the database. I implemented that as a custom Django management command and deployed it to Heroku.
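
The command itself is a standard Django management command. Here’s a heavily simplified sketch of its shape - the real code creates database records for each item, and the model handling is specific to my blog’s schema:

import json
from urllib.request import urlopen

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = 'Import blog content from a JSON file hosted at a URL'

    def add_arguments(self, parser):
        parser.add_argument('--url_to_json', type=str)
        parser.add_argument('--tag_with', type=str, default=None)

    def handle(self, *args, **options):
        items = json.loads(urlopen(options['url_to_json']).read().decode('utf-8'))
        for item in items:
            # In the real command this creates an entry, blogmark or quotation
            # record and applies the --tag_with tag; here we just report it
            self.stdout.write('Would import: %s' % repr(item)[:60])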

Last step: shell into a Heroku dyno (using heroku run bash) and run the following:

./manage.py import_blog_json \
    --url_to_json=https://gist.github.com/simonw/5a5bc1f58297d2c7d68dd7448a4d6614/raw/28d5d564ae3fe7165802967b0f9c4eff6091caf0/recovered-blog-content.json \
    --tag_with=recovered

The result: 213 recovered items (which I tagged with recovered so I could easily browse them). Including the most important entry on my whole site, my write-up of my wedding!

So thank you very much to the Internet Archive team, and thank you Hartator for your extremely useful wayback-machine-downloader tool.

Elsewhere

16th October 2017

  • An interactive explanation of quadtrees (via) Neat explorable explanation of quadtrees, using interactives built on top of D3. #
  • TL;DR on the KRACK WPA2 stuff - you can repeatedly resend the 3rd packet in a WPA2 handshake and it'll reset the key state, which leads to nonce reuse, which leads to trivial decryption with known plaintext. Can be easily leveraged to dump TCP SYN traffic and hijack connections.

    Graham Sutherland #

15th October 2017

  • Explorable Explanations. I’m fascinated by web articles and essays that embed interactive visualizations - taking advantage of the unique capabilities of the medium to help explain complex concepts. Explorable Explanations collects exactly these, under the banner of “learning through play”. They also gather tools and tutorials to help build more of them. #

13th October 2017

  • What's New In DevTools (Chrome 62). Some really neat stuff. Highlights include top-level "await" support in the console, the ability to take screenshots of specific HTML nodes, CSS grid highlighting and the ability to drop a .HAR file onto the network panel in order to view it as a waterfall. #
  • The Xi Text Engine CRDT (via) Xi is "a modern editor with a backend written in Rust" - an open-source text editor project from Google built on some very interesting computer science (Conflict-free Replicated Data Types). It's a native editor with server-backed synchronization as a first-class concept. #
  • Whatever weird thing you imagine might happen, something weirder probably did happen. Reporters tried to keep up, but it was too strange. As Max Read put it in New York Magazine, Facebook is “like a four-dimensional object, we catch slices of it when it passes through the three-dimensional world we recognize.” No one can quite wrap their heads around what this thing has become, or all the things this thing has become.

    Alexis C. Madrigal #

  • How to Do Code Reviews Like a Human. Some very well thought out advice on giving constructive reviews, including simple but effective language tricks for keeping suggestions positive and helpful. #

12th October 2017

  • Exploding Git Repositories. Kate Murphy describes how git is vulnerable to a similar attack to the XML "billion laughs" recursive entity expansion attack - you can create a tiny git repository that acts as a "git bomb", expanding 12 root objects to over a billion files using recursive blob references. #
  • Dead End Thrills. Duncan Harris is a photographer who works in the medium of video game screen captures. #
  • How the CIA Staged Sham Academic Conferences to Thwart Iran’s Nuclear Program. “The importance of a conference may be measured not only by the number of Nobel Prize winners or Oxford dons it attracts, but by the number of spies. U.S. and foreign intelligence officers flock to conferences for the same reason that Army recruiters concentrate on low-income neighborhoods: They make the best hunting grounds. While a university campus may have only one or two professors of interest to an intelligence service, the right conference — on drone technology, perhaps, or ISIS — may have dozens.” #

11th October 2017

  • Cypress (via) Promising looking new open source testing framework for full-blown web integration testing - a modern alternative to Selenium. I spent five minutes playing with the demo and was really impressed by it - especially their "time travel" feature which lets you hover over a passed test and see the state of the browser when each of those assertions was executed. #
  • My essential django package list. Insightful list of Django packages - many of which I hadn’t seen before - by Serafeim Papastefanos, each with a handy explanation of what it’s useful for and why. #
  • WPO Stats (via) “Case studies and experiments demonstrating the impact of web performance optimization (WPO) on user experience and business metrics.” #

10th October 2017

  • The Absurdly Underestimated Dangers of CSV Injection. This is horrifying. A plain old CSV file intended for import into Excel can embed formulas (a value prefixed with an equals symbol) which can execute system commands - with a big honking security prompt that most people will likely ignore. Even worse: they can embed IMPORTXML() functions that can silently leak data from the rest of the sheet to an external URL - and those will work against Google Sheets as well as Excel. #

9th October 2017

  • What happened: Hillary’s view. Lawrence Lessig’s review of Hillary Clinton’s book, including his thoughts on the failures of commercial media in covering the story of Russian interference during those crucial closing weeks before the election. #
  • I have come to the conclusion that the real heroes of ideas are not the people who have them – they are the people who buy them.

    David Gluckman #

8th October 2017

  • fd (via) "A simple, fast and user-friendly alternative to find." Written in Rust, with a saner default command-line syntax than the regular find command. A microbenchmark shows it running 7x faster. Install it on OS X using "brew install fd". #
  • Removing MediaWiki from SPA: Cool URIs don't change (via) Detailed write-up from Anna Shipman describing how she archived an old MediaWiki as static content using recursive wget and some cunning application of mod_rewrite. #
  • Generate dynamic titles for /search/ pages. Fun little enhancement to my faceted search implementation: I now generate dynamic titles for each search results page describing the search, e.g. "Blogmarks tagged security in Feb, 2005" or "“python” in quotations tagged ruby, python in 2007". #
  • Dillinger. I really like this online Markdown editor. It has source syntax highlighting, live previews of the generated HTML and it constantly syncs to localStorage so you won't lose your work if you accidentally shut your browser window. The code is also available open source on GitHub. #
  • C is a bit like Latin these days. We no longer write everything in it, but knowing it affords deeper knowledge of more-recent languages.

    Norman Wilson #

  • China Demonstrates Quantum Encryption By Hosting a Video Call. This reads like pure science fiction. “Pan’s team first established a connection and generated a secure key between a ground station in Xinglong and the Micius satellite as it passed overhead, orbiting about 500 kilometers above Earth. [...] Next, the Chinese team waited for Micius to pass over Vienna, where their collaborators at the Austria Academy of Sciences were waiting to also receive the key from the satellite. Then, with the keys in hand, the groups initiated a video conference and used those keys to encrypt the video data through a standard VPN protocol.” #
  • Facets. New open source visualization and data exploration tool from Google ("Disclaimer: This is not an official Google product", whatever that means). It's intended for visualizing machine learning datasets but it's obviously useful outside of ML as well - any time you need to understand a large dataset this looks like it could be extremely useful. Ships with example jupyter notebooks and an easy mechanism for embedding the Facets interactive UI directly inside a notebook cell. #

7th October 2017

  • Sanic. "Sanic is a Flask-like Python 3.5+ web server that's written to go fast [...] On top of being Flask-like, Sanic supports async request handlers. This means you can use the new shiny async/await syntax from Python 3.5, making your code non-blocking and speedy". #
  • uvloop: Blazing fast Python networking. “uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.” #

6th October 2017

  • SQL Fiddle demonstrating the PostgreSQL to_tsvector() function (via) SQL Fiddle is amazing - it's an interactive pastebin that lets you execute queries against MySQL, PostgreSQL, Oracle, SQLite & SQL Server, and then share both the input and the results by sending around the resulting URL. Here I'm using it to demonstrate that stripping tags before indexing documents in PostgreSQL is unnecessary because the to_tsvector() function already does that for you. #