Simon Willison’s Weblog

On genetics, cloudrun, rust, museums, python, ...

 

Recent entries

datasette-atom: Define an Atom feed using a custom SQL query three days ago

I’ve been having a ton of fun iterating on www.niche-museums.com. I put together some notes on how the site works last week, and I’ve been taking advantage of the Thanksgiving break to continue exploring ways in which Datasette can be used to quickly build database-backed static websites.

I post a new museum to the site every day, so it was inevitable that someone would ask for a feed. And here it is: an Atom feed for Niche Museums.

This means Niche Museums is effectively a blog now, which is fitting: it’s a universal truth that any sufficiently advanced backend technology will evolve to the point where it can power a blog with an Atom feed.

datasette-atom

I built the feed by wrapping up work on the first version of a new Datasette plugin: datasette-atom. It takes advantage of the register_output_renderer plugin hook, which was contributed by Russ Garrett back in May.

The idea with the plugin is to make it possible to construct an Atom feed from an arbitrary SQL query.

This is a really powerful ability. It means that a user with sufficient knowledge of SQL can subscribe to an arbitrary feed of data from any Datasette instance that is running the plugin.

Defining an Atom feed with a SQL query

The plugin works by requiring you to provide a SQL query that produces the following columns in its output:

atom_id, atom_title and atom_updated.

These correspond to the required entry elements defined by the Atom specification.

The plugin can then render the results of the query as an Atom feed.

You can also produce an atom_link column, which will become a link.

And finally, you can produce either an atom_content column which will be treated as text and used as the feed entry body, or an atom_content_html column which will be treated as HTML.

(The HTML from atom_content_html is sanitized through Mozilla’s Bleach library to ensure the plugin doesn’t act as an XSS vector.)

This means we can define a custom Atom feed by crafting a SQL query! Here’s the query I’m using on the Niche Museums website:

select
  'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id,
  name as atom_title,
  created as atom_updated,
  'https://www.niche-museums.com/browse/museums/' || id as atom_link,
  coalesce(
    '<img src="' || photo_url || '?w=800&amp;h=400&amp;fit=crop&amp;auto=compress">',
    ''
  ) || '<p>' || description || '</p>' as atom_content_html
from
  museums
order by
  created desc
limit
  15

I’m using a couple of extra tricks here.

The atom_id is defined as a tag: URI following this advice from Mark Pilgrim—since created is an ISO 8601 timestamp, substr(created, 0, 11) returns its YYYY-MM-DD component.

Not every museum has a photo, which means photo_url is sometimes null. In SQL, concatenating a null value to something else (using the || concatenation operator) produces another null. So this entire expression evaluates to null if photo_url is null:

'<img src="' || photo_url ||
'?w=800&amp;h=400&amp;fit=crop&amp;auto=compress">',

The coalesce(x, y, ...) function returns its first non-null argument. So coalesce('<img ...>' || photo_url || '...', '') returns the empty string if the photo is not available.
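
Here’s a quick way to see both behaviours for yourself, using Python’s sqlite3 module against an in-memory database (just an illustration, not part of the site):

import sqlite3

conn = sqlite3.connect(":memory:")

# Concatenating NULL into a string produces NULL...
print(conn.execute("select '<img src=\"' || NULL || '\">'").fetchone())
# (None,)

# ...and coalesce() falls back to its next argument, here the empty string
print(conn.execute("select coalesce('<img src=\"' || NULL || '\">', '')").fetchone())
# ('',)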

Deriving created/updated fields from Git history

For the atom feed to work, I need an atom_updated value. This should be a timestamp representing “the last time the entry was modified in a significant way”—so it’s actually more like a created timestamp for my museums website.

My museum data is defined in a YAML file—museums.yaml—which doesn’t include created and updated timestamps. So where can I get them from?

Since the YAML file is stored in the site’s GitHub repository, I’m deriving those timestamps from the git history. I repurposed code I wrote for my PG&E outages project for this—the full script is annotate_timestamps.py in the museums repo.

It works by looping through the entire history of the museums.yaml file comparing the list of museums in each version to the previous iteration. If a museum is new (it has an ID not seen before) we use the commit date as its created date. If the JSON serialization of the museum differs from the previous version we reset its updated date.
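
Here’s a simplified sketch of that loop (not the real annotate_timestamps.py; it assumes museums.yaml parses to a list of records with an id key):

import json
import subprocess
import yaml

created, updated, previous = {}, {}, {}

# Every commit that touched museums.yaml, oldest first: "hash iso-timestamp"
commits = subprocess.check_output(
    ["git", "log", "--reverse", "--format=%H %aI", "--", "museums.yaml"],
    text=True,
).splitlines()

for line in commits:
    sha, timestamp = line.split(" ", 1)
    content = subprocess.check_output(
        ["git", "show", "{}:museums.yaml".format(sha)], text=True
    )
    for museum in yaml.safe_load(content):
        serialized = json.dumps(museum, sort_keys=True, default=str)
        if museum["id"] not in created:
            # First time we have seen this ID: record its created date
            created[museum["id"]] = timestamp
            updated[museum["id"]] = timestamp
        elif serialized != previous.get(museum["id"]):
            # The record changed in this commit: reset its updated date
            updated[museum["id"]] = timestamp
        previous[museum["id"]] = serialized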

Then at the end the script uses sqlite-utils to update each record with the derived timestamps:

# updated/created are dicts of {"id": "timestamp"}
db = sqlite_utils.Database("browse.db")
for id, ts in created.items():
    db["museums"].update(id, {
        "created": ts,
        "updated": updated[id]
    }, alter=True)

The alter=True parameter to the .update() method causes sqlite-utils to automatically add any missing columns that are referenced in the update.
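
A tiny self-contained illustration of what alter=True does (separate from the site’s script):

import sqlite_utils

db = sqlite_utils.Database(memory=True)
db["museums"].insert({"id": 1, "name": "Example Museum"}, pk="id")

# The "created" column doesn't exist yet; alter=True adds it automatically
db["museums"].update(1, {"created": "2019-11-25T00:00:00"}, alter=True)

print(db["museums"].columns_dict)
# {'id': <class 'int'>, 'name': <class 'str'>, 'created': <class 'str'>}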

Defining a feed as a canned query

Now that we’ve defined the feed as a SQL query, we can assign it a more pleasing URL using Datasette’s canned queries feature.

I encoded the query as a JSON string using JSON Escape Text, then added it to the metadata.json configuration file for Niche Museums. I named the query feed, resulting in a URL of www.niche-museums.com/browse/feed.
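
For reference, canned queries in metadata.json are nested under the database name inside a "queries" block, and the query name becomes the URL path. Here’s a rough sketch of that structure built in Python, with the SQL abbreviated (the real query is the one shown earlier):

import json

feed_sql = (
    "select 'tag:niche-museums.com,' || substr(created, 0, 11) || ':' || id as atom_id, "
    "name as atom_title, created as atom_updated "
    "from museums order by created desc limit 15"
)

# databases -> "browse" -> queries -> "feed" produces the URL /browse/feed
metadata = {
    "databases": {
        "browse": {
            "queries": {
                "feed": {"sql": feed_sql}
            }
        }
    }
}

print(json.dumps(metadata, indent=4))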

There’s just one catch: Atom feeds should have a name. As a quick and nasty hack I allow the name to be set using ?_feed_name=Niche+Museums. I have an open issue to come up with a less nasty way of defining this.

Also this week

I added a simple search engine to Niche Museums! Here’s an example search for “model”: www.niche-museums.com/browse/search?q=model.

I used sqlite-utils to configure SQLite FTS (here’s the line that calls it in the CI build script), defined another canned query in metadata.json that executes the search query, and built a custom template to render the results page.
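
The sqlite-utils side of that looks roughly like this (a sketch, not the actual build script, assuming name and description are the indexed columns):

import sqlite_utils

db = sqlite_utils.Database("browse.db")

# Create a SQLite full-text search index for the museums table
db["museums"].enable_fts(["name", "description"])

# The canned search query can then join against the generated museums_fts table
rows = db.execute(
    """
    select museums.name
    from museums
    join museums_fts on museums.rowid = museums_fts.rowid
    where museums_fts match :q
    """,
    {"q": "model"},
).fetchall()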

I added press coverage to Niche Museums. Many of the listings now link to articles in the local media about them.

Museum pages now link to other nearby museums. Here’s the commit that implemented that feature—it works by embedding a new SQL query in the template page, using datasette-template-sql.

I identified a bug in yaml-to-sqlite and shipped a fix in version 0.3.

I added the following seven museums to the site:

Goal for the next week: make some progress on projects that aren’t related to niche museums!

niche-museums.com, powered by Datasette 10 days ago

I just released a major upgrade to my www.niche-museums.com website (launched last month).

Notably, the site is entirely powered by Datasette. It’s a heavily customized Datasette instance, making extensive use of custom templates and plugins.

It’s a really fun experiment. I’m essentially using Datasette as a weird twist on a static site generator—no moving parts since the database is immutable but there’s still stuff happening server-side to render the pages.

Continuous deployment

The site is entirely stateless and is published using Circle CI to a serverless hosting provider (currently Zeit Now v1, but I’ll probably move it to Google Cloud Run in the near future.)

The site content—46 museums and counting—lives in the museums.yaml file. I’ve been adding a new museum listing every day by editing the YAML file using Working Copy on my iPhone.

The build script runs automatically on every commit. It converts the YAML file into a SQLite database using my yaml-to-sqlite tool, then runs datasette publish now... to deploy the resulting database.

The full deployment command is as follows:

datasette publish now browse.db about.db \
    --token=$NOW_TOKEN \
    --alias=www.niche-museums.com \
    --name=niche-museums \
    --install=datasette-haversine \
    --install=datasette-pretty-json \
    --install=datasette-template-sql \
    --install=datasette-json-html \
    --install=datasette-cluster-map~=0.8 \
    --metadata=metadata.json \
    --template-dir=templates \
    --plugins-dir=plugins \
    --branch=master

There’s a lot going on here.

browse.db is the SQLite database file that was built by running yaml-to-sqlite.

about.db is an empty database built using sqlite3 about.db ''—more on this later.

The --alias= option tells Zeit Now to alias that URL to the resulting deployment. This is the single biggest feature that I’m missing from Google Cloud Run at the moment. It’s possible to point domains at deployments there but it’s not nearly as easy to script.

The --install= options tell datasette publish which plugins should be installed on the resulting instance.

--metadata=, --template-dir= and --plugins-dir= are the options that customize the instance.

--branch=master means we always deploy the latest master of Datasette directly from GitHub, ignoring the most recent release to PyPI. This isn’t strictly necessary here.

Customization

The site itself is built almost entirely using Datasette custom templates. I have four of them:

The about page uses a particularly devious hack.

Datasette doesn’t have an easy way to create additional custom pages at arbitrary URLs at the moment (without abusing the asgi_wrapper() hook, which is pretty low-level).

But... every attached database gets its own URL at /database-name.

So, to create the /about page I create an empty database called about.db using the sqlite3 about.db "" command. I serve that using Datasette, then create a custom template for that specific database using Datasette’s template naming conventions.

I’ll probably come up with a less grotesque way of doing this and bake it into Datasette in the future. For the moment this seems to work pretty well.

Plugins

The two key plugins here are datasette-haversine and datasette-template-sql.

datasette-haversine adds a custom SQL function to Datasette called haversine(), which calculates the haversine distance between two latitude/longitude points.

It’s used by the SQL query which finds the nearest museums to the user.

This is very inefficient—it’s essentially a brute-force approach which calculates that distance for every museum in the database and sorts them accordingly—but it will be years before I have enough museums listed for that to cause any kind of performance issue.
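
In pure Python the brute-force idea looks something like this (the plugin itself registers haversine() as a SQL function inside Datasette; the museums and coordinates below are made-up sample data):

import math

def haversine_miles(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points, in miles
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 3959 * 2 * math.asin(math.sqrt(a))

museums = [
    {"name": "Museum A", "latitude": 37.7612, "longitude": -122.4218},
    {"name": "Museum B", "latitude": 37.7946, "longitude": -122.4116},
]
user_lat, user_lon = 37.7749, -122.4194

# Calculate the distance to every museum, then sort by it
nearest = sorted(
    museums,
    key=lambda m: haversine_miles(user_lat, user_lon, m["latitude"], m["longitude"]),
)
print(nearest[0]["name"])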

datasette-template-sql is the new plugin I described last week, made possible by Datasette dropping Python 3.5 support. It allows SQL queries to be executed directly from templates. I’m using it here to run the queries that power the homepage.

I tried to get the site working just using code in the templates, but it got pretty messy. Instead, I took advantage of Datasette’s --plugins-dir option, which causes Datasette to treat all Python modules in a specific directory as plugins and attempt to load them.

index_vars.py is a single custom plugin that I’m bundling with the site. It uses the extra_template_vars() plugin hook to detect requests to the index page and inject some additional custom template variables based on values read from the querystring.
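
Here’s a stripped-down sketch of that pattern (not the real index_vars.py; the variable names are invented):

from datasette import hookimpl

@hookimpl
def extra_template_vars(template, request):
    # Only fire for the index page
    if template != "index.html":
        return {}
    # Pull values out of the query string and expose them to the template
    return {
        "q": request.args.get("q", ""),
        "sort": request.args.get("sort", "name"),
    }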

This ends up acting a little bit like a custom Django view function. It’s a slightly weird pattern but again it does the job—and helps me further explore the potential of Datasette as a tool for powering websites in addition to just providing an API.

Weeknotes

This post is standing in for my regular weeknotes, because it represents most of what I achieved this last week. A few other bits and pieces:

  • I’ve been exploring ways to enable CSV upload directly into a Datasette instance. I’m building a prototype of this on top of Starlette, because it has solid ASGI file upload support. This is currently a standalone web application but I’ll probably make it work as a Datasette ASGI plugin once I have something I like.
  • Shortcuts in iOS 13 got some very interesting new features, most importantly the ability to trigger shortcuts automatically on specific actions—including every time you open a specific app. I’ve been experimenting with using this to automatically copy data from my iPhone up to a custom web application—maybe this could help ingest notes and photos into Dogsheep.
  • Posted seven new museums to niche-museums.com:
  • I composed a devious SQL query for generating the markdown for the seven most recently added museums.

Weeknotes: datasette-template-sql 17 days ago

Last week I talked about wanting to take on a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).

Since Glitch now runs Python 3.7.5 my biggest reason for supporting 3.5 was gone, so I decided to make the upgrade.

Datasette 0.31 was the first version to drop support. Datasette 0.32 is the first to take advantage of it: I switched Datasette’s template rendering over to use Jinja’s async template support, which requires Python 3.6+.

This has exciting implications for the extra_template_vars plugin hook, which allows plugins to add extra variables (and functions) to the template scope.

Plugin authors can now add custom template functions that are defined with async def ... and make await calls within the body of the function. When the template is rendered, Jinja will automatically await those function calls.
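
Here’s a minimal sketch of what such a function can look like (an invented example, not taken from any real plugin):

import asyncio

from datasette import hookimpl

@hookimpl
def extra_template_vars():
    async def slow_shout(text):
        # Stand-in for something genuinely asynchronous, like a SQL query
        await asyncio.sleep(0.01)
        return text.upper()

    # Templates can call {{ slow_shout("hello") }} and Jinja will await it
    return {"slow_shout": slow_shout}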

I released a new plugin that takes advantage of this capability: datasette-template-sql. It lets you embed additional SQL queries directly in a custom Datasette template. For example:

{% for article in sql(
    "select headline, date, summary from articles order by date desc limit 5",
    "news"
) %}
    <h3>{{ article.headline }}</h3>
    <p class="date">{{ article.date }}</p>
    <p>{{ article.summary }}</p>
{% endfor %}

This new sql() function takes a SQL query and the optional name of the database to run the query against (in case you have more than one database file attached to your Datasette instance).

I’m really excited about this capability. I quipped about it on Twitter:

This is the great thing about having a plugin system: even if I’m not convinced this is the right technical direction for the core Datasette project, I can still expose this kind of feature in a plugin that people can opt into if they want to have this ability.

The official Datasette website is going to make extensive use of this plugin. I have an early prototype of that up and running now, which inspired me to release datasette-render-markdown 0.2 with a custom template function for rendering markdown directly:

{{ render_markdown(value) }}

Aside from the work on Python 3, my Datasette time this week has mostly involved ongoing refactors of both the query execution code and the core TableView.data() method. Hopefully these will unblock a flurry of interesting new functionality in the not too distant future.

Niche museums

This week’s new museums on www.niche-museums.com.

  • California State Capitol Dioramas in Sacramento
  • Zeppelin Museum in Friedrichshafen
  • Dai Loy Museum in Locke (Sacramento Delta)
  • Wallace Collection in London
  • Cookin’ in San Francisco
  • Ramen Museum in Yokohama
  • Cable Car Museum in San Francisco

Weeknotes: Python 3.7 on Glitch, datasette-render-markdown 24 days ago

Streaks is really working well for me. I’m at 12 days of commits to Datasette, 16 of posting a daily Niche Museum, 19 of actually reviewing my email inbox and 14 of guitar practice. I rewarded myself for that last one by purchasing an actual classical (as opposed to acoustic) guitar.

Datasette

One downside: since my aim is to land a commit to Datasette master every day, I’m incentivised to land small changes. I have a bunch of much larger Datasette projects in the works—I think my goal for the next week should be to land one of those. Contenders include:

I’m going to redefine my daily goal to include pushing in-progress work to Datasette branches in an attempt to escape that false incentive.

New datasette-csvs using Python 3.7 on Glitch

The main reason I’ve been strict about keeping Datasette compatible with Python 3.5 is that it was the only version supported by Glitch, and Glitch has become my favourite tool for getting people up and running with Datasette quickly.

There’s been a long running Glitch support thread requesting an upgrade, and last week it finally bore fruit. Projects on Glitch now get python3 pointing to Python 3.7.5 instead!

This actually broke my datasette-csvs project at first, because for some reason under Python 3.7 the Pandas dependency used by csvs-to-sqlite started taking up too much of the 200MB Glitch instance quota. I ended up working around this by switching over to using my sqlite-utils CLI tool instead, which has much lighter dependencies.

I’ve shared the new code for my Glitch project in the datasette-csvs repo on GitHub.

The one thing missing from sqlite-utils insert my.db mytable myfile.csv --csv right now is the ability to run it against multiple files at once—something csvs-to-sqlite handles really well. I ended up finally learning how to loop over files in bash and wrote the following install.sh shell script:

$ pip3 install -U -r requirements.txt --user && \
  mkdir -p .data && \
  rm .data/data.db || true && \
  for f in *.csv
    do
        sqlite-utils insert .data/data.db ${f%.*} $f --csv
    done

${f%.*} is the bash incantation for stripping off the file extension—so the above evaluates to this for each of the CSV files it finds in the root directory:

$ sqlite-utils insert .data/data.db trees trees.csv --csv

github-to-sqlite releases

I released github-to-sqlite 0.6 with a new sub-command:

$ github-to-sqlite releases github.db simonw/datasette

It grabs all of the releases for a repository using the GitHub releases API.

I’m using this for my personal Dogsheep instance, but I’m also planning to use this for the forthcoming Datasette website—I want to pull together all of the releases of all of the Datasette Ecosystem of projects in one place.

I decided to exercise my new bash while skills and write a script to run by cron once an hour which fetches all of my repos (from both my simonw account and my dogsheep GitHub organization) and then fetches their releases.

Since I don’t want to fetch releases for all 257 of my personal GitHub repos—just the repos which relate to Datasette—I started applying a new datasette-io topic (for datasette.io, my planned website domain) to the repos that I want to pull releases from.

Then I came up with this shell script monstrosity:

#!/bin/bash
# Fetch repos for simonw and dogsheep
github-to-sqlite repos github.db simonw dogsheep -a auth.json

# Fetch releases for the repos tagged 'datasette-io'
sqlite-utils github.db "
select full_name from repos where rowid in (
    select repos.rowid from repos, json_each(repos.topics) j
    where j.value = 'datasette-io'
)" --csv --no-headers | while read repo;
    do github-to-sqlite releases \
            github.db $(echo $repo | tr -d '\r') \
            -a auth.json;
        sleep 2;
    done;

Here’s an example of the database this produces, running on Cloud Run: https://github-to-sqlite-releases-j7hipcg4aq-uc.a.run.app

I’m using the ability of sqlite-utils to run a SQL query and return the results as CSV, but without the header row. Then I pipe the results through a while loop and use them to call the github-to-sqlite releases command against each repo.

I ran into a weird bug which turned out to be caused by the CSV output using \r\n which was fed into github-to-sqlite releases as simonw/datasette\r—I fixed that using $(echo $repo | tr -d '\r').

datasette-render-markdown

Now that I have a releases database table with all of the releases of my various packages I want to be able to browse them in one place. I fired up Datasette and realized that the most interesting information is in the body column, which contains markdown.

So I built a plugin for the render_cell plugin hook which safely renders markdown data as HTML. Here’s the full implementation of the plugin:

import bleach
import markdown
from datasette import hookimpl
import jinja2

ALLOWED_TAGS = [
    "a", "abbr", "acronym", "b", "blockquote", "code", "em",
    "i", "li", "ol", "strong", "ul", "pre", "p", "h1","h2",
    "h3", "h4", "h5", "h6",
]

@hookimpl()
def render_cell(value, column):
    if not isinstance(value, str):
        return None
    # Only convert to markdown if the column name ends in _markdown
    if not column.endswith("_markdown"):
        return None
    # Render it!
    html = bleach.linkify(
        bleach.clean(
            markdown.markdown(value, output_format="html5"),
            tags=ALLOWED_TAGS,
        )
    )
    return jinja2.Markup(html)

This first release of the plugin just looks for column names that end in _markdown and renders those. So the following SQL query does what I need:

select
  json_object("label", repos.full_name, "href", repos.html_url) as repo,
  json_object(
    "href",
    releases.html_url,
    "label",
    releases.name
  ) as release,
  substr(releases.published_at, 0, 11) as date,
  releases.body as body_markdown,
  releases.published_at
from
  releases
  join repos on repos.id = releases.repo
order by
  releases.published_at desc

It aliases releases.body to body_markdown to trigger the markdown rendering, and uses json_object(...) to cause datasette-json-html to render some links.

You can see the results here.

Releases SQL results

More museums

I added another 7 museums to www.niche-museums.com.

  • Dingles Fairground Heritage Centre
  • Ilfracombe Museum
  • Barometer World
  • La Galcante
  • Musée des Arts et Métiers
  • International Women’s Air & Space Museum
  • West Kern Oil Museum

Weeknotes: More releases, more museums one month ago

Lots of small releases this week.

Datasette

I released two bug fix releases for Datasette—0.30.1 and 0.30.2. Changelog here. My Dogsheep personal analytics project means I’m using Datasette for my own data analysis every day, which inspires me to fix small but annoying bugs much more aggressively.

I’ve also set myself a Streak goal to land a commit to Datasette every day.

I landed a tiny new feature to master yesterday: a ?column__notin=x,y,z filter, working as an inverse of the existing ?column__in=x,y,z filter. See issue #614 for details.

More Niche Museums

I’ve been keeping up my streak of adding at least one new museum to www.niche-museums.com every day. This week I added the Pirates Museum in Antananarivo, Madagascar, the David Rumsey Map Center at Stanford, Galerie de Paléontologie et d’Anatomie comparée in Paris, DEVIL-ish Little Things in Vancouver, Washington, Mardi Gras World in New Orleans, Environmental Volunteers EcoCenter in Palo Alto, the Evergreen Aviation & Space Museum (home of the Spruce Goose!) in McMinnville Oregon and Autoservicio Condorito in Mendoza.

Here’s that list of new museums with my photos of them (images rendered using datasette-json-html).

sqlite-transform

I released a new tiny CLI tool for manipulating SQLite databases. sqlite-transform lets you run transformation functions against the values in a specific column of a database. It currently has three sub-commands:

  • parsedate parses strings that look like dates and turns them into YYYY-MM-DD, so they can be sorted.
  • parsedatetime turns them into ISO YYYY-MM-DDTHH:mm:ss timestamps.
  • lambda is the most fun: it lets you provide a snippet of Python code which will be executed against each value to perform a custom transformation. More details in issue #2.

Here’s how to use it to wrap the values in a specific column, including importing the textwrap module from the Python standard library:

$ sqlite-transform lambda my.db mytable mycolumn \
    --code='"\n".join(textwrap.wrap(value, 10))' \
    --import=textwrap

Other releases

Dogsheep

I’ve been having a lot of fun creating new features for my personal Dogsheep analytics site. Many of these take the form of simple HTML added to the private homepage. Most recently I added the ability to search through the people who follow me on Twitter (an evolution of this technique from last year). That feature is entirely implemented as the following HTML form:

  <form action="/twitter/users" method="GET">
    <p>
      <input type="hidden" name="_where" value="id in (select follower_id from following where followed_id = 12497)">
      <input name="_search" type="search" placeholder="Search my Twitter followers"> <input type="submit" value="Search">
    </p>
  </form>

More tree data

I exported all 3.85 million 311 calls from the San Francisco data portal into a database, then extracted out the 80,000 calls that mention trees and loaded them into a separate Datasette instance. You can play with that here—it was the inspiration for creating the sqlite-transform tool because I needed a way to clean up the datetime columns.

Weeknotes: Niche Museums, Kepler, Trees and Streaks one month ago

Niche Museums

Every now and then someone will ask “so when are you going to build Museums Near Me then?”, based on my obsession with niche museums and websites like www.owlsnearme.com.

For my Strategic Communications course at Stanford last week I had to perform a midterm presentation—a six minute talk to convince my audience of something, accompanied by slides and a handout.

I chose “you should seek out and explore tiny museums” as my topic, and used it as an excuse to finally start the website!

www.niche-museums.com is the result. It’s a small but growing collection of niche museums (17 so far, mostly in the San Francisco Bay Area) complete with the all important blue “Use my location” button to see museums near you.

Naturally I built it on Datasette. I’ll be writing more about the implementation (and releasing the underlying code) soon. I also built a new plugin for it, datasette-haversine.

Mapping museums against Starbucks

I needed a way to emphasize quite how many tiny museums there are in the USA. I decided to do this with a visualization.

It turns out there are 15,891 branches of Starbucks in the USA… and at least 30,132 museums!

15,891 Starbucks

At least 30,132 museums!

I made these maps using a couple of sources.

All The Places is a crowdsourced scraper project which aims to build scrapers for every company that has a “store locator” area of their website. Starbucks has a store locator and All The Places have a scraper for it, so you can download GeoJSON of every Starbucks. I wrote a quick script to import that GeoJSON into Datasette using sqlite-utils.
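
A hypothetical version of that kind of quick script, using sqlite-utils (the file name, table name and property fields are assumptions):

import json

import sqlite_utils

db = sqlite_utils.Database("starbucks.db")
features = json.load(open("starbucks_us.geojson"))["features"]

# GeoJSON coordinates are [longitude, latitude]; flatten each feature's
# properties into a row alongside those two columns
db["starbucks"].insert_all(
    (
        {
            **feature["properties"],
            "longitude": feature["geometry"]["coordinates"][0],
            "latitude": feature["geometry"]["coordinates"][1],
        }
        for feature in features
    ),
    alter=True,  # tolerate properties that only appear on some records
)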

The Institute of Museum and Library Services is an independent agency of the federal government that supports museums and libraries across the country. They publish a dataset of Museums in the USA as a set of CSV files. I used csvs-to-sqlite to load those into Datasette, then ran a union query to combine the three files together.

So I have Datasette instances (with a CSV export feature) for both Starbucks and USA museums, with latitudes and longitudes for each.

Now how to turn that into a map?

I turned to my new favourite GIS tool, Kepler. Kepler is an open source GIS visualization tool released by Uber, based on WebGL. It’s astonishingly powerful and can be used directly in your browser by clicking the “Get Started” button on their website (which I assumed would take you to installation instructions, but no, it loads up the entire tool in your browser).

You can import millions of points of data into Kepler and it will visualize them for you directly. I used a Datasette query to export the CSVs, then loaded in my Starbucks CSV, exported an image, loaded in the Museums CSV as a separate colour and exported a second image. The whole project ended up taking about 15 minutes. Kepler is a great addition to the toolbelt!

Animating the PG&E outages

My PG&E outages scraper continues to record a snapshot of the PG&E outage map JSON every ten minutes. I’m posting updates to a thread on Twitter, but discovering Kepler inspired me to look at more sophisticated visualization options.

This tutorial by Giuseppe Macrì tipped me off to the fact that you can use Kepler to animate points against timestamps!

Here’s the result: a video animation showing how PG&E’s outages have evolved since the 5th of October:

Hayes Valley Trees

The city announced plans to cut down 27 ficus trees in our neighborhood in San Francisco. I’ve been working with Natalie to help a small group of citizens organize an appeal, and this weekend I helped run a survey of the affected trees (recording their exact locations in a CSV file) and then built www.hayes-valley-trees.com (source on GitHub) to link to from fliers attached to each affected tree.

It started out as a Datasette (running on Glitch) but since it’s only 27 data points I ended up freezing the data in a static JSON file to avoid having to tolerate any cold start times. The site is deployed as static assets on Zeit Now using their handy GitHub continuous deployment tool.

Streaks

It turns out I’m very motivated by streaks: I’m at 342 days for Duolingo Spanish and 603 days for an Apple Watch move streak. Could I apply this to other things in my life?

I asked on Twitter and was recommended the Streaks iOS app. It’s beautiful! I’m now tracking streaks for guitar practice, Duolingo, checking email, checking Slack, reading some books and adding a new museum to www.niche-museums.com (if I add one a day I can get from 17 museums today to 382 in a year!)

It seems to be working pretty well so far. I particularly like their iPhone widget.

Streaks widget

Elsewhere

Yesterday

  • Two malicious Python libraries caught stealing SSH and GPG keys. Nasty. Two typosquatting libraries were spotted on PyPI—targeting dateutil and jellyfish but with tricky variants of their names. They attempted to exfiltrate SSH and GPG keys and send them to a server identified by an IP address. npm has seen this kind of activity too—it’s important to consider this when installing packages. #

4th December 2019

3rd December 2019

  • Let’s agree that no matter what we call the situation that the humans who are elsewhere are at a professional disadvantage. There is a communication, culture, and context tax applied to the folks who are distributed. Your job as a leader is to actively invest in reducing that tax.

    Michael Lopp #

28th November 2019

  • In general, reviewers should favor approving a CL [code review] once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.

    Google Standard of Code Review #

21st November 2019

  • With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

    Hyrum's Law #

  • How Do You Remove Unused CSS From a Site? (via) Chris Coyier takes an exhaustive look at the current set of tools for automatically removing unused CSS, and finds that there’s no magic bullet but you can get OK results if you use them carefully. #

15th November 2019

  • datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates. #

14th November 2019

  • I have sometimes wondered how I would fare with a problem where the solution really isn’t in sight. I decided that I should give it a try before I get too old. I’m going to work on artificial general intelligence (AGI). I think it is possible, enormously valuable, and that I have a non-negligible chance of making a difference there, so by a Pascal’s Mugging sort of logic, I should be working on it.

    John Carmack #

12th November 2019

  • Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch supports Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build. #
  • My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be. #

7th November 2019

  • pinboard-to-sqlite (via) Jacob Kaplan-Moss just released the second Dogsheep tool that wasn’t written by me (after goodreads-to-sqlite by Tobias Kunze)—this one imports your Pinboard bookmarks. The repo includes a really clean minimal example of how to use GitHub actions to run tests and release packages to PyPI. #

6th November 2019

  • The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straight-forward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry. #
  • Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText() #