<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: urls</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/urls.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-04-21T18:48:18+00:00</updated><author><name>Simon Willison</name></author><entry><title>Decentralizing Schemes</title><link href="https://simonwillison.net/2025/Apr/21/decentralizing-schemes/#atom-tag" rel="alternate"/><published>2025-04-21T18:48:18+00:00</published><updated>2025-04-21T18:48:18+00:00</updated><id>https://simonwillison.net/2025/Apr/21/decentralizing-schemes/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.tbray.org/ongoing/When/202x/2025/04/16/Decentralized-Schemes"&gt;Decentralizing Schemes&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Tim Bray discusses the challenges faced by decentralized Mastodon in that shared URLs to posts don't take into account people accessing Mastodon via their own instances, which breaks replies/likes/shares etc unless you further copy and paste URLs around yourself.&lt;/p&gt;
&lt;p&gt;Tim proposes that the answer is URIs: a registered &lt;code&gt;fedi://mastodon.cloud/@timbray/109508984818551909&lt;/code&gt; scheme could allow Fediverse-aware software to step in and handle those URIs, similar to how &lt;code&gt;mailto:&lt;/code&gt; works.&lt;/p&gt;
&lt;p&gt;Bluesky have &lt;a href="https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml"&gt;registered&lt;/a&gt; &lt;code&gt;at:&lt;/code&gt; already, and there's also a &lt;code&gt;web+ap:&lt;/code&gt; prefix registered with the intent of covering ActivityPub, the protocol used by Mastodon.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/decentralisation"&gt;decentralisation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/social-media"&gt;social-media&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tim-bray"&gt;tim-bray&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mastodon"&gt;mastodon&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/bluesky"&gt;bluesky&lt;/a&gt;&lt;/p&gt;



</summary><category term="decentralisation"/><category term="social-media"/><category term="tim-bray"/><category term="urls"/><category term="mastodon"/><category term="bluesky"/></entry><entry><title>Note on 18th April 2025</title><link href="https://simonwillison.net/2025/Apr/18/link-to-things/#atom-tag" rel="alternate"/><published>2025-04-18T23:59:01+00:00</published><updated>2025-04-18T23:59:01+00:00</updated><id>https://simonwillison.net/2025/Apr/18/link-to-things/#atom-tag</id><summary type="html">
    &lt;p&gt;It frustrates me when support sites for online services fail to &lt;em&gt;link&lt;/em&gt; to the things they are talking about. Cloudflare's &lt;a href="https://developers.cloudflare.com/fundamentals/setup/find-account-and-zone-ids/"&gt;Find zone and account IDs&lt;/a&gt; page for example provides a four step process for finding my account ID that starts at the root of their dashboard, including a screenshot of where I should click.&lt;/p&gt;
&lt;p&gt;&lt;img alt="1. Log in to the Cloudflare dashboard. 2. Select your account and domain. 3. On the Overview page (the landing page for your domain), find the API section. Screenshot includes an Overview panel showing analytics" src="https://static.simonwillison.net/static/2025/cloudflare-docs.jpg" style="max-width: 100%"&gt;&lt;/p&gt;
&lt;p&gt;In Cloudflare's case it's harder to link to the correct dashboard page because the URL differs for different users, but that shouldn't be a show-stopper for getting this to work. Set up &lt;code&gt;dash.cloudflare.com/redirects/find-account-id&lt;/code&gt; and link to that!&lt;/p&gt;
&lt;p&gt;... I just noticed they &lt;em&gt;do&lt;/em&gt; have a mechanism like that which they use elsewhere. On the &lt;a href="https://developers.cloudflare.com/r2/api/tokens/"&gt;R2 authentication page&lt;/a&gt; they link to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;https://dash.cloudflare.com/?to=/:account/r2/api-tokens
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The "find account ID" flow presumably can't do the same thing because there is no single page displaying that information - it's shown in a sidebar on the page for each of your Cloudflare domains.&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/usability"&gt;usability&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;&lt;/p&gt;



</summary><category term="urls"/><category term="usability"/><category term="cloudflare"/></entry><entry><title>URL-addressable Pyodide Python environments</title><link href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#atom-tag" rel="alternate"/><published>2025-02-13T05:13:27+00:00</published><updated>2025-02-13T05:13:27+00:00</updated><id>https://simonwillison.net/2025/Feb/13/url-addressable-python/#atom-tag</id><summary type="html">
    &lt;p&gt;This evening I spotted &lt;a href="https://github.com/simonw/datasette/issues/2466"&gt;an obscure bug&lt;/a&gt; in &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt;, using &lt;a href="https://github.com/simonw/datasette-lite"&gt;Datasette Lite&lt;/a&gt;. I figure it's a good opportunity to highlight how useful it is to have a URL-addressable Python environment, powered by Pyodide and WebAssembly.&lt;/p&gt;
&lt;p&gt;Here's the page that helped me discover the bug:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&lt;a href="https://lite.datasette.io/?install=datasette-visible-internal-db&amp;amp;ref=1.0a17#/_internal/catalog_columns?_facet=database_name"&gt;https://lite.datasette.io/?install=datasette-visible-internal-db&amp;amp;ref=1.0a17#/_internal/catalog_columns?_facet=database_name&lt;/a&gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;To explain what's going on here, let's first review the individual components.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#datasette-lite"&gt;Datasette Lite&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#the-datasette-1-0-alphas"&gt;The Datasette 1.0 alphas&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#this-works-for-plugins-too"&gt;This works for plugins, too&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#datasette-visible-internal-db"&gt;datasette-visible-internal-db&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#spotting-the-bug"&gt;Spotting the bug&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#fixing-the-bug"&gt;Fixing the bug&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="https://simonwillison.net/2025/Feb/13/url-addressable-python/#url-addressable-steps-to-reproduce"&gt;URL-addressable Steps To Reproduce&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h4 id="datasette-lite"&gt;Datasette Lite&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://lite.datasette.io/"&gt;Datasette Lite&lt;/a&gt; is a version of &lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; that runs entirely in your browser. It runs on &lt;a href="https://pyodide.org/"&gt;Pyodide&lt;/a&gt;, which I think is still the most underappreciated project in the Python ecosystem.&lt;/p&gt;
&lt;p&gt;I built Datasette Lite &lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;almost three years ago&lt;/a&gt; as a weekend hack project to try and see if I could get Datasette - a server-side Python web application - to run entirely in the browser.&lt;/p&gt;
&lt;p&gt;I've added a bunch of features since then, &lt;a href="https://github.com/simonw/datasette-lite/blob/main/README.md"&gt;described in the README&lt;/a&gt; - most significantly the ability to load SQLite databases, CSV files, JSON files or Parquet files by passing a URL to a query string parameter.&lt;/p&gt;
&lt;p&gt;I built Datasette Lite almost as a joke, thinking nobody would want to wait for a full Python interpreter to download to their browser each time they wanted to explore some data. It turns out internet connections are fast these days and having a version of Datasette that needs a browser, GitHub Pages and &lt;em&gt;nothing else&lt;/em&gt; is actually extremely useful.&lt;/p&gt;
&lt;p&gt;Just the other day &lt;a href="https://bsky.app/profile/obtusatum.bsky.social/post/3lhyeuqmpns22"&gt;I saw Logan Williams&lt;/a&gt; of Bellingcat using it to share a better version of &lt;a href="https://www.commerce.senate.gov/2025/2/cruz-led-investigation-uncovers-2-billion-in-woke-dei-grants-at-nsf-releases-full-database"&gt;this Excel sheet&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The NSF grants that Ted Cruz has singled out for advancing "neo-Marxist class warfare propaganda," in Datasette-Lite: &lt;a href="https://lite.datasette.io/?url=https://data-house-lake.nyc3.cdn.digitaloceanspaces.com/cruz_nhs.db#/cruz_nhs/grants"&gt;lite.datasette.io?url=https://...&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let's look at that URL in full:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;https://lite.datasette.io/?url=https://data-house-lake.nyc3.cdn.digitaloceanspaces.com/cruz_nhs.db#/cruz_nhs/grants&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;?url=&lt;/code&gt; parameter there poins to a SQLite database file, hosted on DigitalOcean Spaces and served with the all-important &lt;code&gt;access-control-allow-origin: *&lt;/code&gt; header which allows Datasette Lite to load it across domains.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;#/cruz_nhs/grants&lt;/code&gt; part of the URL tells Datasette Lite which page to load when you visit the link.&lt;/p&gt;
&lt;p&gt;Anything after the &lt;code&gt;#&lt;/code&gt; in Datasette Lite is a URL that gets passed on to the WebAssembly-hosted Datasette instance. Any query string items before that can be used to affect the initial state of the Datasette instance, to import data or even to install additional plugins.&lt;/p&gt;
&lt;h4 id="the-datasette-1-0-alphas"&gt;The Datasette 1.0 alphas&lt;/h4&gt;

&lt;p&gt;I've shipped &lt;em&gt;a lot&lt;/em&gt; of Datasette alphas - the most recent is &lt;a href="https://docs.datasette.io/en/latest/changelog.html#a17-2025-02-06"&gt;Datasette 1.0a17&lt;/a&gt;. Those alphas get published to &lt;a href="https://pypi.org/"&gt;PyPI&lt;/a&gt;, which means they can be installed using &lt;code&gt;pip install datasette==1.0a17&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;A while back &lt;a href="https://github.com/simonw/datasette-lite/issues/75"&gt;I added the same ability&lt;/a&gt; to Datasette Lite itself. You can now pass &lt;code&gt;&amp;amp;ref=1.0a17&lt;/code&gt; to the Datasette Lite URL to load that specific version of Datasette.&lt;/p&gt;
&lt;p&gt;This works thanks to the magic of Pyodide's &lt;a href="https://micropip.pyodide.org/"&gt;micropip&lt;/a&gt; mechanism. Every time you load Datasette Lite in your browser it's actually using &lt;code&gt;micropip&lt;/code&gt; to install the packages it needs directly from PyPI. The code looks something like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-js"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;pyodide&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;loadPackage&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;'micropip'&lt;/span&gt;&lt;span class="pl-kos"&gt;,&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;&lt;span class="pl-c1"&gt;messageCallback&lt;/span&gt;: &lt;span class="pl-s1"&gt;log&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;datasetteToInstall&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'datasette'&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;let&lt;/span&gt; &lt;span class="pl-s1"&gt;pre&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'False'&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
&lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;settings&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;ref&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
  &lt;span class="pl-k"&gt;if&lt;/span&gt; &lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s1"&gt;settings&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;ref&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s"&gt;'pre'&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-s1"&gt;pre&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'True'&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt; &lt;span class="pl-k"&gt;else&lt;/span&gt; &lt;span class="pl-kos"&gt;{&lt;/span&gt;
    &lt;span class="pl-s1"&gt;datasetteToInstall&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;`datasette==&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;settings&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;ref&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;
  &lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-kos"&gt;}&lt;/span&gt;
&lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;self&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-c1"&gt;pyodide&lt;/span&gt;&lt;span class="pl-kos"&gt;.&lt;/span&gt;&lt;span class="pl-en"&gt;runPythonAsync&lt;/span&gt;&lt;span class="pl-kos"&gt;(&lt;/span&gt;&lt;span class="pl-s"&gt;`&lt;/span&gt;
&lt;span class="pl-s"&gt;import micropip&lt;/span&gt;
&lt;span class="pl-s"&gt;await micropip.install("&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;datasetteToInstall&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;", pre=&lt;span class="pl-s1"&gt;&lt;span class="pl-kos"&gt;${&lt;/span&gt;&lt;span class="pl-s1"&gt;pre&lt;/span&gt;&lt;span class="pl-kos"&gt;}&lt;/span&gt;&lt;/span&gt;)&lt;/span&gt;
&lt;span class="pl-s"&gt;`&lt;/span&gt;&lt;span class="pl-kos"&gt;)&lt;/span&gt;&lt;span class="pl-kos"&gt;;&lt;/span&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/datasette-lite/blob/main/webworker.js"&gt;Full code here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That &lt;code&gt;settings&lt;/code&gt; object has been passed to the Web Worker that loads Datasette, incorporating various query string parameters.&lt;/p&gt;
&lt;p&gt;This all means I can pass &lt;code&gt;?ref=1.0a17&lt;/code&gt; to Datasette Lite to load a specific version, or &lt;code&gt;?ref=pre&lt;/code&gt; to get the most recently released pre-release version.&lt;/p&gt;
&lt;h4 id="this-works-for-plugins-too"&gt;This works for plugins, too&lt;/h4&gt;
&lt;p&gt;Since loading extra packages from PyPI via &lt;code&gt;micropip&lt;/code&gt; is so easy, I went a step further and added plugin support.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;?install=&lt;/code&gt; parameter can be passed multiple times, each time specifying a Datasette plugin from PyPI that should be installed into the browser.&lt;/p&gt;
&lt;p&gt;The README includes &lt;a href="https://github.com/simonw/datasette-lite?tab=readme-ov-file#installing-plugins"&gt;a bunch of examples&lt;/a&gt; of this mechanism in action. Here's a fun one &lt;a href="https://lite.datasette.io/?install=datasette-mp3-audio&amp;amp;csv=https://gist.githubusercontent.com/simonw/0a30d52feeb3ff60f7d8636b0bde296b/raw/c078a9e5a0151331e2e46c04c1ebe7edc9f45e8c/scotrail-announcements.csv#/data/scotrail-announcements"&gt;that loads datasette-mp3-audio&lt;/a&gt; to provide inline MP3 playing widgets, originally created for my &lt;a href="https://simonwillison.net/2022/Aug/21/scotrail/"&gt;ScotRail audio announcements&lt;/a&gt; project.&lt;/p&gt;
&lt;p&gt;This only works for some plugins. They need to be pure Python wheels - getting plugins with compiled binary dependencies to work in Pyodide WebAssembly requires a whole set of steps that I haven't quite figured out.&lt;/p&gt;
&lt;p&gt;Frustratingly, it doesn't work for plugins that run their own JavaScript yet! I may need to rearchitect significant chunks of both Datasette and Datasette Lite to make that work.&lt;/p&gt;
&lt;p&gt;It's also worth noting that this is a remote code execution security hole. I don't think that's a problem here, because &lt;code&gt;lite.datasette.io&lt;/code&gt; is deliberately hosted on the subdomain of a domain that I &lt;em&gt;never&lt;/em&gt; intend to use cookies on. It's possible to vandalize the visual display of &lt;code&gt;lite.datasette.io&lt;/code&gt; but it shouldn't be possible to steal any private data or do any lasting damage.&lt;/p&gt;
&lt;h4 id="datasette-visible-internal-db"&gt;datasette-visible-internal-db&lt;/h4&gt;
&lt;p&gt;This evening's debugging exercise used a plugin called &lt;a href="https://pypi.org/project/datasette-visible-internal-db/"&gt;datasette-visible-internal-db&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Datasette's &lt;a href="https://docs.datasette.io/en/latest/internals.html#datasette-s-internal-database"&gt;internal database&lt;/a&gt; is an invisible SQLite database that sits at the heart of Datasette, tracking things like loaded metadata and the schemas of the currently attached tables.&lt;/p&gt;
&lt;p&gt;Being invisible means we can use it for features that shouldn't be visible to users - plugins that record API secrets or permissions or track comments or data import progress, for example.&lt;/p&gt;
&lt;p&gt;In Python code it's accessed like this:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;internal_db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_internal_database&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;As opposed to Datasette's other databases which are accessed like so:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_database&lt;/span&gt;(&lt;span class="pl-s"&gt;"my-database"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Sometimes, when hacking on Datasette, it's useful to be able to browse the internal database using the default Datasette UI.&lt;/p&gt;
&lt;p&gt;That's what &lt;code&gt;datasette-visible-internal-db&lt;/code&gt; does. The plugin implementation is &lt;a href="https://github.com/datasette/datasette-visible-internal-db/blob/759e7001f91d3076d9f42eddb03fbaf6d1c7b9bb/datasette_visible_internal_db.py"&gt;just five lines of code&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;

&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;hookimpl&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;startup&lt;/span&gt;(&lt;span class="pl-s1"&gt;datasette&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;get_internal_database&lt;/span&gt;()
    &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-c1"&gt;add_database&lt;/span&gt;(&lt;span class="pl-s1"&gt;db&lt;/span&gt;, &lt;span class="pl-s1"&gt;name&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"_internal"&lt;/span&gt;, &lt;span class="pl-s1"&gt;route&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"_internal"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;On startup the plugin grabs a reference to that internal database and then registers it using Datasette's &lt;a href="https://docs.datasette.io/en/latest/internals.html#add-database-db-name-none-route-none"&gt;add_database() method&lt;/a&gt;. That's all it takes to have it show up as a visible database on the &lt;code&gt;/_internal&lt;/code&gt; path within Datasette.&lt;/p&gt;
&lt;h4 id="spotting-the-bug"&gt;Spotting the bug&lt;/h4&gt;
&lt;p&gt;I was poking around with this today out of pure curiosity - I hadn't tried &lt;code&gt;?install=datasette-visible-internal-db&lt;/code&gt; with Datasette Lite before and I wanted to see if it worked.&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://lite.datasette.io/?install=datasette-visible-internal-db&amp;amp;ref=1.0a17#/_internal/catalog_columns?_facet=database_name"&gt;that URL from earlier&lt;/a&gt;, this time with commentary:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;https://lite.datasette.io/ // Datasette Lite
  ?install=datasette-visible-internal-db // Install the visible internal DB plugin
  &amp;amp;ref=1.0a17 // Load the 1.0a17 alpha release
  #/_internal/catalog_columns // Navigate to the /_internal/catalog_columns table page
  &amp;amp;_facet=database_name // Facet by database_name for good measure
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And this is what I saw:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2025/datasette-lite-bug.jpg" alt="Screenshot of Datasette Lite. catalog_columns table has 382 rows. database_name facet shows content 237, fixtures 97, _internal 48. A table shows columns for Link, database_name, table_name, cid and name - a red arrow points to a hyperlinked _internal in the database_name column." style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;This all looked good... until I clicked on that &lt;code&gt;_internal&lt;/code&gt; link in the &lt;code&gt;database_name&lt;/code&gt; column... and it took me to &lt;a href="https://lite.datasette.io/?install=datasette-visible-internal-db&amp;amp;ref=1.0a17#/_internal/databases/_internal"&gt;this /_internal/databases/_internal 404 page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Why was that a 404? Datasette introspects the SQLite table schema to identify foreign key relationships, then turns those into hyperlinks. The SQL schema for that &lt;code&gt;catalog_columns&lt;/code&gt; table (displayed at the bottom of the table page) looked like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;CREATE&lt;/span&gt; &lt;span class="pl-k"&gt;TABLE&lt;/span&gt; &lt;span class="pl-en"&gt;catalog_columns&lt;/span&gt; (
    database_name &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
    table_name &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
    cid &lt;span class="pl-k"&gt;INTEGER&lt;/span&gt;,
    name &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
    type &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;,
    &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;"&lt;/span&gt;notnull&lt;span class="pl-pds"&gt;"&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;INTEGER&lt;/span&gt;,
    default_value &lt;span class="pl-k"&gt;TEXT&lt;/span&gt;, &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; renamed from dflt_value&lt;/span&gt;
    is_pk &lt;span class="pl-k"&gt;INTEGER&lt;/span&gt;, &lt;span class="pl-c"&gt;&lt;span class="pl-c"&gt;--&lt;/span&gt; renamed from pk&lt;/span&gt;
    hidden &lt;span class="pl-k"&gt;INTEGER&lt;/span&gt;,
    &lt;span class="pl-k"&gt;PRIMARY KEY&lt;/span&gt; (database_name, table_name, name),
    &lt;span class="pl-k"&gt;FOREIGN KEY&lt;/span&gt; (database_name) &lt;span class="pl-k"&gt;REFERENCES&lt;/span&gt; databases(database_name),
    &lt;span class="pl-k"&gt;FOREIGN KEY&lt;/span&gt; (database_name, table_name) &lt;span class="pl-k"&gt;REFERENCES&lt;/span&gt; tables(database_name, table_name)
);&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Those foreign key references are a bug! I renamed the internal tables from &lt;code&gt;databases&lt;/code&gt; and &lt;code&gt;tables&lt;/code&gt; to &lt;code&gt;catalog_databases&lt;/code&gt; and &lt;code&gt;catalog_tables&lt;/code&gt; quite a while ago, but apparently forgot to update the references - and SQLite let me get away with it.&lt;/p&gt;
&lt;h4 id="fixing-the-bug"&gt;Fixing the bug&lt;/h4&gt;
&lt;p&gt;I fixed the bug &lt;a href="https://github.com/simonw/datasette/commit/e59fd0175708f2b14d4e3c08ea16631bda0aaed3"&gt;in this commit&lt;/a&gt;. As is often the case the most interesting part of the fix is &lt;a href="https://github.com/simonw/datasette/blob/e59fd0175708f2b14d4e3c08ea16631bda0aaed3/tests/test_internal_db.py#L65-L84"&gt;the accompanying test&lt;/a&gt;. I decided to use the introspection helpers in &lt;a href="https://sqlite-utils.datasette.io/"&gt;sqlite-utils&lt;/a&gt; to guard against every making another mistake like this again in the future:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;pytest&lt;/span&gt;.&lt;span class="pl-c1"&gt;mark&lt;/span&gt;.&lt;span class="pl-c1"&gt;asyncio&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;async&lt;/span&gt; &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_internal_foreign_key_references&lt;/span&gt;(&lt;span class="pl-s1"&gt;ds_client&lt;/span&gt;):
    &lt;span class="pl-s1"&gt;internal_db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-en"&gt;ensure_internal&lt;/span&gt;(&lt;span class="pl-s1"&gt;ds_client&lt;/span&gt;)
    &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;inner&lt;/span&gt;(&lt;span class="pl-s1"&gt;conn&lt;/span&gt;):
        &lt;span class="pl-s1"&gt;db&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;sqlite_utils&lt;/span&gt;.&lt;span class="pl-c1"&gt;Database&lt;/span&gt;(&lt;span class="pl-s1"&gt;conn&lt;/span&gt;)
        &lt;span class="pl-s1"&gt;table_names&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-c1"&gt;table_names&lt;/span&gt;()
        &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;.&lt;span class="pl-c1"&gt;tables&lt;/span&gt;:
            &lt;span class="pl-k"&gt;for&lt;/span&gt; &lt;span class="pl-s1"&gt;fk&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-c1"&gt;foreign_keys&lt;/span&gt;:
                &lt;span class="pl-s1"&gt;other_table&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;fk&lt;/span&gt;.&lt;span class="pl-c1"&gt;other_table&lt;/span&gt;
                &lt;span class="pl-s1"&gt;other_column&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;fk&lt;/span&gt;.&lt;span class="pl-c1"&gt;other_column&lt;/span&gt;
                &lt;span class="pl-s1"&gt;message&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;'Column "{}.{}" references other column "{}.{}" which does not exist'&lt;/span&gt;.&lt;span class="pl-c1"&gt;format&lt;/span&gt;(
                    &lt;span class="pl-s1"&gt;table&lt;/span&gt;.&lt;span class="pl-c1"&gt;name&lt;/span&gt;, &lt;span class="pl-s1"&gt;fk&lt;/span&gt;.&lt;span class="pl-c1"&gt;column&lt;/span&gt;, &lt;span class="pl-s1"&gt;other_table&lt;/span&gt;, &lt;span class="pl-s1"&gt;other_column&lt;/span&gt;
                )
                &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;other_table&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;table_names&lt;/span&gt;, &lt;span class="pl-s1"&gt;message&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s"&gt;" (bad table)"&lt;/span&gt;
                &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;other_column&lt;/span&gt; &lt;span class="pl-c1"&gt;in&lt;/span&gt; &lt;span class="pl-s1"&gt;db&lt;/span&gt;[&lt;span class="pl-s1"&gt;other_table&lt;/span&gt;].&lt;span class="pl-c1"&gt;columns_dict&lt;/span&gt;, (
                    &lt;span class="pl-s1"&gt;message&lt;/span&gt; &lt;span class="pl-c1"&gt;+&lt;/span&gt; &lt;span class="pl-s"&gt;" (bad column)"&lt;/span&gt;
                )
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;internal_db&lt;/span&gt;.&lt;span class="pl-c1"&gt;execute_fn&lt;/span&gt;(&lt;span class="pl-s1"&gt;inner&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;This uses Datasette's &lt;a href="https://docs.datasette.io/en/latest/internals.html#await-db-execute-fn-fn"&gt;await db.execute_fn()&lt;/a&gt; method, which lets you run Python code that accesses SQLite in a thread. That code can then use the blocking &lt;code&gt;sqlite-utils&lt;/code&gt; &lt;a href="https://sqlite-utils.datasette.io/en/stable/python-api.html#introspecting-tables-and-views"&gt;introspection methods&lt;/a&gt; - here I'm looping through every table in that internal database, looping through each tables &lt;code&gt;.foreign_keys&lt;/code&gt; and confirming that the &lt;code&gt;.other_table&lt;/code&gt; and &lt;code&gt;.other_column&lt;/code&gt; values reference a table and column that genuinely exist.&lt;/p&gt;
&lt;p&gt;I ran this test, watched it fail, then applied the fix and it passed.&lt;/p&gt;
&lt;h4 id="url-addressable-steps-to-reproduce"&gt;URL-addressable Steps To Reproduce&lt;/h4&gt;
&lt;p&gt;The idea I most wanted to highlight here is the enormous value provided by &lt;strong&gt;URL-addressable Steps To Reproduce&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Having good Steps To Reproduce is crucial for productively fixing bugs. Something you can &lt;em&gt;click on&lt;/em&gt; to see the bug is the most effective form of STR there is.&lt;/p&gt;
&lt;p&gt;Ideally, these URLs will continue to work long into the future.&lt;/p&gt;
&lt;p&gt;The great thing about a system like Datasette Lite is that everything is statically hosted files. The application itself is hosted on GitHub Pages, and it works by loading additional files from various different CDNs. The only dynamic aspect is cached lookups against the PyPI API, which I expect to stay stable for a long time to come.&lt;/p&gt;
&lt;p&gt;As a stable component of the Web platform &lt;a href="https://caniuse.com/wasm"&gt;for almost 8 years&lt;/a&gt; WebAssembly is clearly here to stay. I expect we'll be able to execute today's WASM code in browsers 20+ years from now.&lt;/p&gt;
&lt;p&gt;I'm confident that the patterns I've been exploring in Datasette Lite over the past few years could be just as valuable for other projects. Imagine demonstrating bugs in a Django application using a static WebAssembly build, archived forever as part of an issue tracking system.&lt;/p&gt;
&lt;p&gt;I think WebAssembly and Pyodide still have a great deal of untapped potential for the wider Python world.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pyodide"&gt;pyodide&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette-lite"&gt;datasette-lite&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="python"/><category term="urls"/><category term="datasette"/><category term="webassembly"/><category term="pyodide"/><category term="datasette-lite"/></entry><entry><title>New Django {% querystring %} template tag</title><link href="https://simonwillison.net/2024/Aug/13/django-querystring-template-tag/#atom-tag" rel="alternate"/><published>2024-08-13T18:03:49+00:00</published><updated>2024-08-13T18:03:49+00:00</updated><id>https://simonwillison.net/2024/Aug/13/django-querystring-template-tag/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.djangoproject.com/en/5.1/ref/templates/builtins/#std-templatetag-querystring"&gt;New Django {% querystring %} template tag&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Django 5.1 &lt;a href="https://www.djangoproject.com/weblog/2024/aug/07/django-51-released/"&gt;came out last week&lt;/a&gt; and includes a neat new template tag which solves a problem I've faced a bunch of times in the past.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{% querystring color="red" size="S" %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Adds &lt;code&gt;?color=red&amp;amp;size=S&lt;/code&gt; to the current URL - keeping any other existing parameters and replacing the current value for &lt;code&gt;color&lt;/code&gt; or &lt;code&gt;size&lt;/code&gt; if it's already set.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{% querystring color=None %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Removes the &lt;code&gt;?color=&lt;/code&gt; parameter if it is currently set.&lt;/p&gt;
&lt;p&gt;If the value passed is a list it will append &lt;code&gt;?color=red&amp;amp;color=blue&lt;/code&gt; for as many items as exist in the list.&lt;/p&gt;
&lt;p&gt;You can access values in variables and you can also assign the result to a new template variable rather than outputting it directly to the page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{% querystring page=page.next_page_number as next_page %}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Other things that caught my eye in Django 5.1:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.djangoproject.com/en/5.1/ref/databases/#postgresql-pool"&gt;PostgreSQL connection pools&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The new &lt;a href="https://docs.djangoproject.com/en/5.1/ref/middleware/#django.contrib.auth.middleware.LoginRequiredMiddleware"&gt;LoginRequiredMiddleware&lt;/a&gt; for making every page in an application require login.&lt;/li&gt;
&lt;li&gt;The SQLite database backend now accepts &lt;a href="https://docs.djangoproject.com/en/5.1/ref/databases/#sqlite-init-command"&gt;init_command&lt;/a&gt; for settings things like &lt;code&gt;PRAGMA cache_size=2000&lt;/code&gt; on new connections.&lt;/li&gt;
&lt;li&gt;SQLite can also be passed &lt;code&gt;"transaction_mode": "IMMEDIATE"&lt;/code&gt; to &lt;a href="https://docs.djangoproject.com/en/5.1/ref/databases/#sqlite-transaction-behavior"&gt;configure the behaviour of transactions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/postgresql"&gt;postgresql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="postgresql"/><category term="sqlite"/><category term="urls"/></entry><entry><title>trurl manipulates URLs</title><link href="https://simonwillison.net/2023/Apr/4/trurl-manipulates-urls/#atom-tag" rel="alternate"/><published>2023-04-04T22:08:13+00:00</published><updated>2023-04-04T22:08:13+00:00</updated><id>https://simonwillison.net/2023/Apr/4/trurl-manipulates-urls/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://daniel.haxx.se/blog/2023/04/03/introducing-trurl/"&gt;trurl manipulates URLs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Brand new command-line tool from curl creator Daniel Stenberg: The tr stands for translate or transpose, and the tool provides various mechanisms for normalizing URLs, adding query strings, changing the path or hostname and other similar modifications. I’ve tried designing APis for this kind of thing in the past—Datasette includes some clumsily named functions such as path_with_removed_args()—and it’s a deceptively deep set of problems.&lt;br&gt;.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/curl"&gt;curl&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/daniel-stenberg"&gt;daniel-stenberg&lt;/a&gt;&lt;/p&gt;



</summary><category term="curl"/><category term="urls"/><category term="daniel-stenberg"/></entry><entry><title>Why I invented "dash encoding", a new encoding scheme for URL paths</title><link href="https://simonwillison.net/2022/Mar/5/dash-encoding/#atom-tag" rel="alternate"/><published>2022-03-05T21:50:38+00:00</published><updated>2022-03-05T21:50:38+00:00</updated><id>https://simonwillison.net/2022/Mar/5/dash-encoding/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;a href="https://datasette.io/"&gt;Datasette&lt;/a&gt; now includes its own custom string encoding scheme, which I've called &lt;strong&gt;dash encoding&lt;/strong&gt;. I really didn't want to have to invent something new here, but unfortunately I think this is the best solution to my very particular problem. Some notes on how dash encoding works and why I created it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update 18th March 2022&lt;/strong&gt;: This turned out not to be the right idea for my project after all! I ended up settling on a &lt;a href="https://simonwillison.net/2022/Mar/19/weeknotes/#tilde-encoding"&gt;Tilde encoding&lt;/a&gt; scheme instead.&lt;/p&gt;

&lt;h4&gt;Table names and rows in URLs&lt;/h4&gt;
&lt;p&gt;I've put a lot of thought into the design of Datasette's URLs.&lt;/p&gt;
&lt;p&gt;Datasette exposes relational databases tables, as both web pages and a JSON API.&lt;/p&gt;
&lt;p&gt;Consider a database in a SQLite file called &lt;code&gt;legislators.db&lt;/code&gt;, containing a table called &lt;code&gt;legislator_terms&lt;/code&gt; (example from &lt;a href="https://datasette.io/tutorials/explore"&gt;this tutorial&lt;/a&gt;). The URL path to the web interface for that table will be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://congress-legislators.datasettes.com/legislators/legislator_terms"&gt;/legislators/legislator_terms&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the JSON API will be here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://congress-legislators.datasettes.com/legislators/legislator_terms.json"&gt;/legislators/legislator_terms.json&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(Worth noting that Datasette supports other formats here too - &lt;a href="https://docs.datasette.io/en/stable/csv_export.html"&gt;CSV&lt;/a&gt; by default, and plugins can add more formats such as &lt;a href="https://datasette.io/plugins/datasette-geojson"&gt;GeoJSON&lt;/a&gt; or &lt;a href="https://datasette.io/plugins/datasette-atom"&gt;Atom&lt;/a&gt; or &lt;a href="https://datasette.io/plugins/datasette-ics"&gt;iCal&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Datasette also provides pages (and APIs) for individual rows, identified by their primary key:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://congress-legislators.datasettes.com/legislators/legislator_terms/1"&gt;/legislators/legislator_terms/1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://congress-legislators.datasettes.com/legislators/legislator_terms/1.json"&gt;/legislators/legislator_terms/1.json&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For tables with compound primary keys, these pages can include the primary key values separated by commas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://latest.datasette.io/fixtures/compound_three_primary_keys/a,a,a"&gt;/fixtures/compound_three_primary_keys/a,a,a&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is all pretty straightforward so far. But now we get to the challenge: what if a table's name or a row's primary key contains a forward slash or a period character?&lt;/p&gt;
&lt;p&gt;This could break the URL scheme!&lt;/p&gt;
&lt;p&gt;SQLite table names are allowed to contain almost any character, and Datasette is designed to work with any existing SQLite database - so I can't guarantee that a table with one of those characters won't need to be handled.&lt;/p&gt;
&lt;p&gt;Consider a database with two tables - one called &lt;code&gt;legislator_terms&lt;/code&gt; and another called &lt;code&gt;legislator_terms/1&lt;/code&gt; - given the URL &lt;code&gt;/legislators/legislator_terms/1&lt;/code&gt; it's no longer clear if it refers to the table with that name or the row with primary key 1 in the other table!&lt;/p&gt;
&lt;p&gt;A similar problem exists for table names with as &lt;code&gt;legislators.csv&lt;/code&gt; - which end in a format. Or primary key string values that end in &lt;code&gt;.json&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Why URL encoding doesn't work here&lt;/h4&gt;
&lt;p&gt;Up until now, Datasette has solved this problem using &lt;a href="https://en.wikipedia.org/wiki/Percent-encoding"&gt;URL percent encoding&lt;/a&gt;. This provides a standard mechanism for encoding "special" characters in URLs.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;legislator_terms/1&lt;/code&gt; encodes to &lt;code&gt;legislator_terms%2F1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This should be enough to solve the problem. The URL to that weirdly named table can now be:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;/legislators/legislator_terms%2F1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;When routing the URL, the application can take this into account and identify that this it a table named &lt;code&gt;legislator_terms/1&lt;/code&gt;, as opposed to a request for the row with ID &lt;code&gt;1&lt;/code&gt; in the &lt;code&gt;legislator_terms&lt;/code&gt; table.&lt;/p&gt;
&lt;p&gt;There are two remaining problems.&lt;/p&gt;
&lt;p&gt;Firstly, the "." character is ignored by URL encoding, so we still can't tell the difference between &lt;code&gt;/db/table.json&lt;/code&gt; and a table called &lt;code&gt;table.json&lt;/code&gt;. I worked around this issue in Datasette by supporting an optional alternative &lt;code&gt;?_format=json&lt;/code&gt; parameter, but it's &lt;a href="https://github.com/simonw/datasette/issues/1439"&gt;messy and confusing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Much more seriously, it turns out there are numerous common pieces of web infrastructure that "helpfully" decode escaped characters in URLs before passing them on to the underlying web application!&lt;/p&gt;
&lt;p&gt;I first encountered this in the ASGI standard itself, which decoded characters in the &lt;code&gt;path&lt;/code&gt; field before they were passed to the rest of the application.I submitted &lt;a href="https://github.com/django/asgiref/issues/87#issuecomment-500168070"&gt;a PR&lt;/a&gt; adding &lt;code&gt;raw_path&lt;/code&gt; to ASGI precisely to work around this problem for Datasette.&lt;/p&gt;
&lt;p&gt;Over time though, the problem kept cropping up. Datasette aims to run on as many hosting platforms as possible. I've seen URL escaping applied at a higher level enough times now to be very suspicious of any load balancer or proxy or other web server mechanism that might end up executing between Datasette and the rest of the web.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Flask core maintainer David Lord &lt;a href="https://twitter.com/davidism/status/1500251083070787585"&gt;confirms on Twitter&lt;/a&gt; that this is a long-standing known problem:&lt;/p&gt;

&lt;blockquote cite="https://twitter.com/davidism/status/1500251083070787585"&gt;&lt;p&gt;This behavior in Apache/nginx/etc is why WSGI/ASGI can't specify "literal URL the user typed in", because anything in front of the app might modify slashes or anything else. So all the spec can provide is "decoded URL".&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;So, I need a way of encoding a table name that might include &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt; characters in a way that will survive some other layer of the stack decoding URL encoded strings in the URL path before Datasette gets to see them!&lt;/p&gt;
&lt;h4&gt;Introducing dash encoding&lt;/h4&gt;
&lt;p&gt;That's where dash encoding comes in. I tried to design the fastest, simplest encoding mechanism I could that would solve this very specific problem.&lt;/p&gt;
&lt;p&gt;Loose requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reversible - it's crucial to at any possible value survives a round-trip through the encoding&lt;/li&gt;
&lt;li&gt;Avoid changing the string at all if possible. Otherwise I could use something like base64, but I wanted to keep the name in the URL as close to readable as possible&lt;/li&gt;
&lt;li&gt;Survive interference by proxies and load balancer that might try to be helpful&lt;/li&gt;
&lt;li&gt;Fast to apply the transformation&lt;/li&gt;
&lt;li&gt;As simple as possible&lt;/li&gt;
&lt;li&gt;Easy to implement, including in languages other than Python&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dash encoding consists of three simple steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replace all single hyphen characters &lt;code&gt;-&lt;/code&gt; with two hyphens &lt;code&gt;--&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Replace any forward slash &lt;code&gt;/&lt;/code&gt; character with hyphen forward slash &lt;code&gt;-/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Replace any period character &lt;code&gt;.&lt;/code&gt; with hyphen period &lt;code&gt;-.&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To reverse the encoding, run those steps backwards.&lt;/p&gt;
&lt;p&gt;Here the Python implementation of this encoding scheme:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;dash_encode&lt;/span&gt;(&lt;span class="pl-s1"&gt;s&lt;/span&gt;: &lt;span class="pl-s1"&gt;str&lt;/span&gt;) &lt;span class="pl-c1"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;str&lt;/span&gt;:
     &lt;span class="pl-s"&gt;"Returns dash-encoded string - for example ``/foo/bar`` -&amp;gt; ``-/foo-/bar``"&lt;/span&gt;
     &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;s&lt;/span&gt;.&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"-"&lt;/span&gt;, &lt;span class="pl-s"&gt;"--"&lt;/span&gt;).&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"."&lt;/span&gt;, &lt;span class="pl-s"&gt;"-."&lt;/span&gt;).&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"/"&lt;/span&gt;, &lt;span class="pl-s"&gt;"-/"&lt;/span&gt;)

&lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;dash_decode&lt;/span&gt;(&lt;span class="pl-s1"&gt;s&lt;/span&gt;: &lt;span class="pl-s1"&gt;str&lt;/span&gt;) &lt;span class="pl-c1"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="pl-s1"&gt;str&lt;/span&gt;:
     &lt;span class="pl-s"&gt;"Decodes a dash-encoded string, so ``-/foo-/bar`` -&amp;gt; ``/foo/bar``"&lt;/span&gt;
     &lt;span class="pl-k"&gt;return&lt;/span&gt; &lt;span class="pl-s1"&gt;s&lt;/span&gt;.&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"-/"&lt;/span&gt;, &lt;span class="pl-s"&gt;"/"&lt;/span&gt;).&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"-."&lt;/span&gt;, &lt;span class="pl-s"&gt;"."&lt;/span&gt;).&lt;span class="pl-en"&gt;replace&lt;/span&gt;(&lt;span class="pl-s"&gt;"--"&lt;/span&gt;, &lt;span class="pl-s"&gt;"-"&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;And the pytest tests for it:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-en"&gt;@&lt;span class="pl-s1"&gt;pytest&lt;/span&gt;.&lt;span class="pl-s1"&gt;mark&lt;/span&gt;.&lt;span class="pl-en"&gt;parametrize&lt;/span&gt;(&lt;/span&gt;
&lt;span class="pl-en"&gt;     &lt;span class="pl-s"&gt;"original,expected"&lt;/span&gt;,&lt;/span&gt;
&lt;span class="pl-en"&gt;     (&lt;/span&gt;
&lt;span class="pl-en"&gt;         (&lt;span class="pl-s"&gt;"abc"&lt;/span&gt;, &lt;span class="pl-s"&gt;"abc"&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;         (&lt;span class="pl-s"&gt;"/foo/bar"&lt;/span&gt;, &lt;span class="pl-s"&gt;"-/foo-/bar"&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;         (&lt;span class="pl-s"&gt;"/-/bar"&lt;/span&gt;, &lt;span class="pl-s"&gt;"-/---/bar"&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;         (&lt;span class="pl-s"&gt;"-/db-/table---.csv-.csv"&lt;/span&gt;, &lt;span class="pl-s"&gt;"---/db---/table-------.csv---.csv"&lt;/span&gt;),&lt;/span&gt;
&lt;span class="pl-en"&gt;     ),&lt;/span&gt;
&lt;span class="pl-en"&gt; )&lt;/span&gt;
 &lt;span class="pl-k"&gt;def&lt;/span&gt; &lt;span class="pl-en"&gt;test_dash_encoding&lt;/span&gt;(&lt;span class="pl-s1"&gt;original&lt;/span&gt;, &lt;span class="pl-s1"&gt;expected&lt;/span&gt;):
     &lt;span class="pl-s1"&gt;actual&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s1"&gt;utils&lt;/span&gt;.&lt;span class="pl-en"&gt;dash_encode&lt;/span&gt;(&lt;span class="pl-s1"&gt;original&lt;/span&gt;)
     &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;actual&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;expected&lt;/span&gt;
     &lt;span class="pl-c"&gt;# And test round-trip&lt;/span&gt;
     &lt;span class="pl-k"&gt;assert&lt;/span&gt; &lt;span class="pl-s1"&gt;original&lt;/span&gt; &lt;span class="pl-c1"&gt;==&lt;/span&gt; &lt;span class="pl-s1"&gt;utils&lt;/span&gt;.&lt;span class="pl-en"&gt;dash_decode&lt;/span&gt;(&lt;span class="pl-s1"&gt;actual&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/simonw/datasette/commit/d1cb73180b4b5a07538380db76298618a5fc46b6"&gt;the full commit&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This meets my requirements.&lt;/p&gt;
&lt;h4&gt;Capturing these with a regular expression&lt;/h4&gt;
&lt;p&gt;There was one remaining challenge. Datasette uses regular expressions - inspired by Django - to route requests to the correct page.&lt;/p&gt;
&lt;p&gt;I wanted to use a regular expression to extract out dash encoded values, that could also distinguish them from &lt;code&gt;/&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; and &lt;code&gt;.&lt;/code&gt; characters that were not encoded in that way.&lt;/p&gt;
&lt;p&gt;Here's the pattern I came up with for strings matching this pattern:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;([^\/\-\.]*|(\-/)|(\-\.)|(\-\-))*&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Broken down:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[^\/\-\.]*&lt;/code&gt; means 0 or more characters that are NOT one of &lt;code&gt;.&lt;/code&gt; or &lt;code&gt;/&lt;/code&gt; or &lt;code&gt;-&lt;/code&gt; - since we don't care about those characters at all&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(\-/)&lt;/code&gt; means the explicit sequence &lt;code&gt;-/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(\-\.)&lt;/code&gt; means the explicit sequence &lt;code&gt;-.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;(\-\-)&lt;/code&gt; means the explicit sequence &lt;code&gt;--&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Those four are wrapped in a group combined with the &lt;code&gt;|&lt;/code&gt; or operator&lt;/li&gt;
&lt;li&gt;The group is then wrapped in a &lt;code&gt;(..)*&lt;/code&gt; - specifying that it can repeat as many times as you like&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A better way to break down this regular expression is visually, &lt;a href="https://www.debuggex.com/r/KYfCocdmuBHxHETv"&gt;using Debuggex&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/dash-encoding-regex.png" alt="A visualization of the regular expression, showing how it loops around the inner concept of none of those three characters or one of the three explicit character groupings." style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;p&gt;Combining this into the full regular expression that matches a &lt;code&gt;/database/table.format&lt;/code&gt; path is even messier, due to the need to add non-capturing group syntax &lt;code&gt;(?:..)&lt;/code&gt; and named groups &lt;code&gt;(?P&amp;lt;name&amp;gt;...)&lt;/code&gt; - it ends up looking like this:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;^/(?P&amp;lt;database&amp;gt;[^/]+)/(?P&amp;lt;table&amp;gt;(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)\.(?P&amp;lt;format&amp;gt;\w+)?$&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Visualized &lt;a href="https://www.debuggex.com/r/aTF6lx5JpaMN6UYz"&gt;with Debuggex&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/dash-encoding-regex-full.png" alt="The more complex regex visualized." style="max-width:100%;" /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Thanks to suggestions &lt;a href="https://twitter.com/dracos/status/1500236433809973248"&gt;from Matthew Somerville&lt;/a&gt; I simplified this further to:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;^/(?P&amp;lt;database&amp;gt;[^/]+)/(?P&amp;lt;table&amp;gt;[^\/\-\.]*|\-/|\-\.|\-\-)*(?P&amp;lt;format&amp;gt;\.\w+)?$&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2022/dash-encoding-regex-simpler.png" alt="This looks less complex in Debuggex" style="max-width:100%;" /&gt;&lt;/p&gt;

&lt;h4&gt;Next steps: implementation&lt;/h4&gt;
&lt;p&gt;I'm currently working on integrating it into Datasette in &lt;a href="https://github.com/simonw/datasette/pull/1648"&gt;this PR&lt;/a&gt;. The full history of my thinking around this problem can be found &lt;a href="https://github.com/simonw/datasette/issues/1439"&gt;in issue 1439&lt;/a&gt;, with comments stretching back to August last year!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/regular-expressions"&gt;regular-expressions&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="regular-expressions"/><category term="urls"/><category term="datasette"/></entry><entry><title>Datasette 0.51 (plus weeknotes)</title><link href="https://simonwillison.net/2020/Nov/1/datasette-0-51/#atom-tag" rel="alternate"/><published>2020-11-01T04:22:55+00:00</published><updated>2020-11-01T04:22:55+00:00</updated><id>https://simonwillison.net/2020/Nov/1/datasette-0-51/#atom-tag</id><summary type="html">
    &lt;p&gt;I shipped &lt;a href="https://docs.datasette.io/en/stable/changelog.html#v0-51"&gt;Datasette 0.51&lt;/a&gt; today, with a new visual design, plugin hooks for adding navigation options, better handling of binary data, URL building utility methods and better support for running Datasette behind a proxy. It's a lot of stuff! Here are the annotated release notes.&lt;/p&gt;
&lt;h4 id="weeknotes-57-new-design"&gt;New visual design&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Datasette is no longer white and grey with blue and purple links! &lt;a href="https://twitter.com/natbat"&gt;Natalie Downe&lt;/a&gt; has been working on a visual refresh, the first iteration of which is included in this release. (&lt;a href="https://github.com/simonw/datasette/pull/1056"&gt;#1056&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's about time Datasette grew beyond its clearly-designed-by-a-mostly-backend-engineer roots. Natalie has been helping me start adding some visual polish: we've started with an update to the colour scheme and will be continuing to iterate on the visual design as the project evolves towards the 1.0 release.&lt;/p&gt;
&lt;p&gt;The new design makes the navigation bar much more obvious, which is important for this release since the new navigation menu (tucked away behind a three-bar icon) is a key new feature.&lt;/p&gt;
&lt;p&gt;&lt;img alt="A screenshot of the new Datasette visual design, also showcasing two new menu icons" src="https://static.simonwillison.net/static/2020/datasette-0.51.png" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;h4 id="weeknotes-57-plugins-links"&gt;Plugins can now add links within Datasette&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;A number of existing Datasette plugins add new pages to the Datasette interface, providig tools for things like &lt;a href="https://github.com/simonw/datasette-upload-csvs"&gt;uploading CSVs&lt;/a&gt;, &lt;a href="https://github.com/simonw/datasette-edit-schema"&gt;editing table schemas&lt;/a&gt; or &lt;a href="https://github.com/simonw/datasette-configure-fts"&gt;configuring full-text search&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Plugins like this can now link to themselves from other parts of Datasette interface. The &lt;a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-menu-links"&gt;menu_links(datasette, actor)&lt;/a&gt; hook (&lt;a href="https://github.com/simonw/datasette/issues/1064"&gt;#1064&lt;/a&gt;) lets plugins add links to Datasette's new top-right application menu, and the &lt;a href="https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-table-actions"&gt;table_actions(datasette, actor, database, table)&lt;/a&gt; hook (&lt;a href="https://github.com/simonw/datasette/issues/1066"&gt;#1066&lt;/a&gt;) adds links to a new "table actions" menu on the table page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This feature has been a long time coming. I've been writing an increasing number of plugins that add new pages to Datasette, and so far the main way of using them has been to memorise and type in their URLs!&lt;/p&gt;
&lt;p&gt;The new navigation menu (which only displays if it has something in it) provides a global location to add new links. I've already released several plugin updates that take advantage of this.&lt;/p&gt;
&lt;p&gt;The new "table actions" menu imitates Datasette's existing column header menu icon - it's a cog. Clicking it opens a menu of actions relating to the current table.&lt;/p&gt;
&lt;p&gt;Want to see a demo?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The demo at &lt;a href="https://latest.datasette.io/"&gt;latest.datasette.io&lt;/a&gt; now includes some example plugins. To see the new table actions menu first &lt;a href="https://latest.datasette.io/login-as-root"&gt;sign into that demo as root&lt;/a&gt; and then visit the &lt;a href="https://latest.datasette.io/fixtures/facetable"&gt;facetable&lt;/a&gt; table to see the new cog icon menu at the top of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Here's an animated GIF demo showing the new menus in action.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Using the nav menu to upload CSVs, then the table actions menu to edit the table schema and rename latitude and longitude columns so that the Central Park Squirrel Census is displayed on a map!" src="https://static.simonwillison.net/static/2020/datasette-0.51-menus.gif" style="max-width:100%;" /&gt;&lt;/p&gt;
&lt;h4 id="weeknotes-57-binary-data"&gt;Binary data&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;SQLite tables can contain binary data in &lt;code&gt;BLOB&lt;/code&gt; columns. Datasette now provides links for users to download this data directly from Datasette, and uses those links to make binary data available from CSV exports. See &lt;a href="https://docs.datasette.io/en/stable/binary_data.html#binary"&gt;Binary data&lt;/a&gt; for more details. (&lt;a href="https://github.com/simonw/datasette/issues/1036"&gt;#1036&lt;/a&gt; and &lt;a href="https://github.com/simonw/datasette/issues/1034"&gt;#1034&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I spent a ton of time on this over the past few weeks. The initial impetus was a realization that Datasette CSV exports included ugly Python &lt;code&gt;b'\x15\x1c\x02\xc7\xad\x05\xfe'&lt;/code&gt; strings, which felt like the worst possible way to display binary in a CSV file, out of universally bad options.&lt;/p&gt;
&lt;p&gt;Datasette's main interface punted on binary entirely - it would show a &lt;code&gt;&amp;lt;Binary data: 7 bytes&amp;gt;&lt;/code&gt; label which didn't help much either.&lt;/p&gt;
&lt;p&gt;The only way to get at binary data stored in a Datasette instance was to request the JSON version and then manually decode the Base-64 value within it!&lt;/p&gt;
&lt;p&gt;This is now fixed: binary columns can be downloaded directly to your computer, using a new &lt;code&gt;.blob&lt;/code&gt; output renderer. The approach is described on &lt;a href="https://docs.datasette.io/en/stable/binary_data.html#binary"&gt;this new page in the documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Security was a major consideration when building this feature. Allowing the download of arbitrary byte payloads from a web server is dangerous business: it can easily result in XSS holes where HTML with dangerous &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; content can end up hosted on the primary domain.&lt;/p&gt;
&lt;p&gt;After some research, I decided to serve up binary content for download using the following headings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;content-type: application/binary
x-content-type-options: nosniff
content-disposition: attachment; filename="data-f30889.blob"
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;application/binary&lt;/code&gt; is a safer Content-Type option than the more common &lt;code&gt;application/octet-stream&lt;/code&gt;, according to Michal Zalewski's renowned web application security book The Tangled Web (&lt;a href="https://neexee.github.io/posts-en/the-tangled-web/"&gt;quoted here&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;x-content-type-options: nosniff&lt;/code&gt; disables the XSS-tastic content sniffing feature in older versions of Internet Explorer, where IE would helpfully guess that you intended to serve HTML based on the first few bytes of the response.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;content-disposition: attachment&lt;/code&gt; header causes the browser to show a "download this file" dialog, using the suggested filename.&lt;/p&gt;
&lt;p&gt;If you know of a reason that this isn't secure enough, please &lt;a href="https://twitter.com/simonw"&gt;let me know&lt;/a&gt;!&lt;/p&gt;
&lt;h4 id="weeknotes-57-url-building"&gt;URL building&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;The new &lt;a href="https://docs.datasette.io/en/stable/internals.html#internals-datasette-urls"&gt;datasette.urls&lt;/a&gt; family of methods can be used to generate URLs to key pages within the Datasette interface, both within custom templates and Datasette plugins. See &lt;a href="https://docs.datasette.io/en/stable/writing_plugins.html#writing-plugins-building-urls"&gt;Building URLs within plugins&lt;/a&gt; for more details. (&lt;a href="https://github.com/simonw/datasette/issues/904"&gt;#904&lt;/a&gt;)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Datasette's &lt;a href="https://docs.datasette.io/en/stable/config.html#config-base-url"&gt;base_url&lt;/a&gt; configuration setting was the forcing factor around this piece of work.&lt;/p&gt;
&lt;p&gt;It allows you to configure Datasette to serve content starting at a path other than &lt;code&gt;/&lt;/code&gt; - for example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette --config base_url:/path-to-datasette/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will serve all Datasette pages at locations starting with &lt;code&gt;/path-to-datasette/&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Why would you want to do this? It's useful if you are proxying traffic to Datasette from within the URL hierarchy of an existing website.&lt;/p&gt;
&lt;p&gt;The feature didn't work properly, and enough people care about it that I had a steady stream of bug reports. For 0.51 I gathered them all into a single giant &lt;a href="https://github.com/simonw/datasette/issues/1023"&gt;tracking issue&lt;/a&gt; and worked through them all one by one.&lt;/p&gt;
&lt;p&gt;It quickly became apparent that the key challenge was building URLs within Datasette - not just within HTML template pages, but also for things like HTTP redirects.&lt;/p&gt;
&lt;p&gt;Datasette itself needed to generate URLs that took the &lt;code&gt;base_url&lt;/code&gt; setting into account, but so do Datasette plugins. So I built &lt;a href="https://docs.datasette.io/en/stable/internals.html#datasette-urls"&gt;a new datasette.urls&lt;/a&gt; collection of helper methods and made them part of the documented internals API for plugins. The &lt;a href="https://docs.datasette.io/en/stable/writing_plugins.html#building-urls-within-plugins"&gt;Building URLs within plugins&lt;/a&gt; documentation shows how these should be used.&lt;/p&gt;
&lt;p&gt;I also added documentation on &lt;a href="https://docs.datasette.io/en/stable/deploying.html#deploying-proxy"&gt;Running Datasette behind a proxy&lt;/a&gt; with example configs (tested on my laptop) for both nginx and Apache.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://docs.datasette.io/en/stable/internals.html#datasette-client"&gt;datasette.client&lt;/a&gt; mechanism from Datasette 0.50 allows plugins to make calls to Datasette's internal JSON API without the overhead of an HTTP request. This is another place where plugins need to be able to construct valid URLs to internal Datasette pages.&lt;/p&gt;
&lt;p&gt;I added this example to the documentation showing how the two features can work together:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-s1"&gt;table_json&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; (
    &lt;span class="pl-k"&gt;await&lt;/span&gt; &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-s1"&gt;client&lt;/span&gt;.&lt;span class="pl-en"&gt;get&lt;/span&gt;(
        &lt;span class="pl-s1"&gt;datasette&lt;/span&gt;.&lt;span class="pl-s1"&gt;urls&lt;/span&gt;.&lt;span class="pl-en"&gt;table&lt;/span&gt;(&lt;span class="pl-s"&gt;"fixtures"&lt;/span&gt;, &lt;span class="pl-s"&gt;"facetable"&lt;/span&gt;, &lt;span class="pl-s1"&gt;format&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"json"&lt;/span&gt;)
    )
).&lt;span class="pl-en"&gt;json&lt;/span&gt;()&lt;/pre&gt;
&lt;p&gt;One final weird detail on this: Datasette now has various methods that automatically add the &lt;code&gt;base_url&lt;/code&gt; prefix to a URL. I got worried about what would happen if these were applied more than once (as above, where &lt;code&gt;datasette.urls.table()&lt;/code&gt; applies the prefix so does &lt;code&gt;datasette.client.get()&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;I fixed this using the same trick that Django and Jinja use to avoid appliying auto-escaping twice to content that will be displayed in HTML: the &lt;code&gt;datasette.urls&lt;/code&gt; methods actually return a &lt;code&gt;PrefixedUrlString&lt;/code&gt; object which is a subclass of &lt;code&gt;str&lt;/code&gt; that knows that the prefix has been applied! Code for that &lt;a href="https://github.com/simonw/datasette/blob/main/datasette/utils/__init__.py#L1015"&gt;lives here&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id="weeknotes-57-weeknotes-smaller-changes"&gt;Smaller changes&lt;/h4&gt;
&lt;p&gt;A few highlights from the "smaller changes" in Datasette 0.51:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Wide tables shown within Datasette now scroll horizontally (&lt;a href="https://github.com/simonw/datasette/issues/998"&gt;#998&lt;/a&gt;). This is achieved using a new &lt;code&gt;&amp;lt;div class="table-wrapper"&amp;gt;&lt;/code&gt; element which may impact the implementation of some plugins (for example &lt;a href="https://github.com/simonw/datasette-cluster-map/commit/fcb4abbe7df9071c5ab57defd39147de7145b34e"&gt;this change to datasette-cluster-map&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I think this is a big improvement: if your database table is too wide, it now scrolls horizontally on the page (rather than blowing the entire page out to a wider width). You can see that in action on the &lt;a href="https://global-power-plants.datasettes.com/global-power-plants/global-power-plants"&gt;global-power-plants demo&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href="https://docs.datasette.io/en/stable/authentication.html#permissions-debug-menu"&gt;debug-menu&lt;/a&gt; permission. (&lt;a href="https://github.com/simonw/datasette/issues/1068"&gt;#1068&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you are signed in as &lt;code&gt;root&lt;/code&gt; the new navigation menu links to a whole plethora of previously-undiscoverable Datasette debugging tools. This new permission controls the display of those items.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Link:&lt;/code&gt; HTTP header pagination. (&lt;a href="https://github.com/simonw/datasette/issues/1014"&gt;#1014&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Inspired by GitHub and WordPress, which both use the HTTP &lt;code&gt;Link&lt;/code&gt; header in this way. It's an optional extra though: Datasette will always offer in-JSON pagination information.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Edit SQL button on canned queries, (&lt;a href="https://github.com/simonw/datasette/issues/1019"&gt;#1019&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Suggested by Jacob Fenton &lt;a href="https://github.com/simonw/datasette/issues/1019"&gt;in this issue&lt;/a&gt;. The implementation had quite a few edge cases since there are certain categories of canned query that can't be executed as custom SQL by the user. See the &lt;a href="https://github.com/simonw/datasette/issues/1019#issuecomment-708128286"&gt;issue comments&lt;/a&gt; for details and a demo.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--load-extension=spatialite&lt;/code&gt; shortcut. (&lt;a href="https://github.com/simonw/datasette/issues/1028"&gt;#1028&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Inspired by &lt;a href="https://github.com/simonw/sqlite-utils/issues/136"&gt;a similar feature&lt;/a&gt; in &lt;code&gt;sqlite-utils&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;datasette -o&lt;/code&gt; option now opens the most relevant page. (&lt;a href="https://github.com/simonw/datasette/issues/976"&gt;#976&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a fun little feature. If your Datasette only loads a single database, and that database only has a single table (common if you've just run a single CSV import) then running this will open your browser directly to that table page:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;datasette data.db -o
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;datasette --cors&lt;/code&gt; option now enables access to &lt;code&gt;/database.db&lt;/code&gt; downloads. (&lt;a href="https://github.com/simonw/datasette/issues/1057"&gt;#1057&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;This was inspired by Mike Bostock's &lt;a href="https://observablehq.com/@mbostock/sqlite"&gt;Observable Notebook&lt;/a&gt; that uses the Emscripten-compiled &lt;a href="https://github.com/sql-js/sql.js"&gt;JavaScript version&lt;/a&gt; of SQLite to run queries against SQLite database files.&lt;/p&gt;
&lt;p&gt;It turned out you couldn't use that notebook against SQLite files hosted in Datasette because they weren't covered by Datasette's CORS option. Now they are!&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;New documentation on &lt;a href="https://docs.datasette.io/en/stable/writing_plugins.html#writing-plugins-designing-urls"&gt;Designing URLs for your plugin&lt;/a&gt;. (&lt;a href="https://github.com/simonw/datasette/issues/1053"&gt;#1053&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recommendations for plugin authors, inspired by a question &lt;a href="https://twitter.com/kanedr/status/1320653434895347713"&gt;from David Kane&lt;/a&gt; on Twitter. David has been building &lt;a href="https://github.com/drkane/datasette-reconcile"&gt;datasette-reconcile&lt;/a&gt;, a Datasette plugin that offers a reconciliation API endpoint that can be used with &lt;a href="https://openrefine.org/"&gt;OpenRefine&lt;/a&gt;. What a brilliant idea!&lt;/p&gt;
&lt;h4 id="weeknotes-57-datasette-edit-templates"&gt;datasette-edit-templates (almost)&lt;/h4&gt;
&lt;p&gt;Inspired by a conversation with &lt;a href="https://twitter.com/obra"&gt;Jesse Vincent&lt;/a&gt;, I also spent some time experimenting with the idea of a plugin that can load and edit templates from the database - which would turn a personal Datasette into a really fun interface hacking environment. I nearly got this working, and even shipped a preview of a &lt;code&gt;load_template()&lt;/code&gt; plugin hook in the &lt;a href="https://github.com/simonw/datasette/releases/tag/0.51a2"&gt;Datasette 0.51a2&lt;/a&gt; alpha... before crashing into a road block when I realized that it also needed to work with Jinja's &lt;code&gt;{% extends %}&lt;/code&gt; and &lt;code&gt;{% include %}&lt;/code&gt; template tags and loaders for those &lt;a href="https://github.com/pallets/jinja/issues/1304"&gt;don't currenty support async functions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In exploring this I also realized that my &lt;code&gt;load_template()&lt;/code&gt; plugin hook wasn't actually necessary - if I'm going to solve this problem with Jinja loaders I can do so using the existing &lt;a href="https://docs.datasette.io/en/stable/plugin_hooks.html#prepare-jinja2-environment-env"&gt;prepare_jinja2_environment(env)&lt;/a&gt; hook.&lt;/p&gt;
&lt;p&gt;My not-yet-functional prototype for this is caled &lt;a href="https://github.com/simonw/datasette-edit-templates"&gt;datasette-edit-templates&lt;/a&gt;. I'm pretty confident I can get it working against the old plugin hook with a little more work.&lt;/p&gt;
&lt;h4 id="weeknotes-57-other-weeknotes"&gt;Other weeknotes&lt;/h4&gt;
&lt;p&gt;Most of my time this week was spent on Datasette 0.51 - but I did find a little bit of time for other projects.&lt;/p&gt;
&lt;p&gt;I finished recording my talk for PyCon Argentina. It will air on November 20th.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/2.23"&gt;sqlite-utils 2.23&lt;/a&gt; is out, with a &lt;code&gt;.m2m()&lt;/code&gt; bug fix from Adam Wolf and the new ability to display progress bars when importing TSV and CSV files.&lt;/p&gt;
&lt;h4 id="weeknotes-57-releases"&gt;Releases this week&lt;/h4&gt;
&lt;p&gt;Several of these are updates to take advantage of the new navigation plugin hooks introduced in Datasette 0.51.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-configure-fts/releases/tag/1.1"&gt;datasette-configure-fts 1.1&lt;/a&gt; - 2020-11-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-graphql/releases/tag/1.1"&gt;datasette-graphql 1.1&lt;/a&gt; - 2020-11-01&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-edit-schema/releases/tag/0.4"&gt;datasette-edit-schema 0.4&lt;/a&gt; - 2020-10-31&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.6"&gt;datasette-upload-csvs 0.6&lt;/a&gt; - 2020-10-31&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/releases/tag/0.51"&gt;datasette 0.51&lt;/a&gt; - 2020-10-31&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/releases/tag/0.51a2"&gt;datasette 0.51a2&lt;/a&gt; - 2020-10-30&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-edit-schema/releases/tag/0.4a0"&gt;datasette-edit-schema 0.4a0&lt;/a&gt; - 2020-10-30&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette/releases/tag/0.51a1"&gt;datasette 0.51a1&lt;/a&gt; - 2020-10-30&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/datasette-render-markdown/releases/tag/1.2"&gt;datasette-render-markdown 1.2&lt;/a&gt; - 2020-10-28&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/simonw/sqlite-utils/releases/tag/2.23"&gt;sqlite-utils 2.23&lt;/a&gt; - 2020-10-28&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id="weeknotes-57-til"&gt;TIL this week&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/python_decorators-with-optional-arguments.md"&gt;Decorators with optional arguments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://til.simonwillison.net/til/til/javascript_dropdown-menu-with-details-summary.md"&gt;Dropdown menu with details summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/natalie-downe"&gt;natalie-downe&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/projects"&gt;projects&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xss"&gt;xss&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/datasette"&gt;datasette&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/weeknotes"&gt;weeknotes&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annotated-release-notes"&gt;annotated-release-notes&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="natalie-downe"/><category term="projects"/><category term="security"/><category term="urls"/><category term="xss"/><category term="datasette"/><category term="weeknotes"/><category term="annotated-release-notes"/></entry><entry><title>Microbrowsers are Everywhere</title><link href="https://simonwillison.net/2019/Dec/18/microbrowsers-are-everywhere/#atom-tag" rel="alternate"/><published>2019-12-18T08:32:19+00:00</published><updated>2019-12-18T08:32:19+00:00</updated><id>https://simonwillison.net/2019/Dec/18/microbrowsers-are-everywhere/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://24ways.org/2019/microbrowsers-are-everywhere/"&gt;Microbrowsers are Everywhere&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Colin Bendell introduces a new-to-me term, “microbrowsers”, to describe the user-agents which hit websites to generate unfurled link previews in messenger apps. Twitter and Facebook first popularized them, but today you’re likely getting far more preview-generating traffic from chat clients such as iMessage, WhatsApp and Slack (which won’t execute script and ignore cookies, and hence won’t show up in Google Analytics). Lots of great tips here—one example: if you provide three og:image meta tags iMessage will render them as a collage.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=21800346"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/24-ways"&gt;24-ways&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/metadata"&gt;metadata&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;&lt;/p&gt;



</summary><category term="24-ways"/><category term="metadata"/><category term="urls"/></entry><entry><title>Removing MediaWiki from SPA: Cool URIs don't change</title><link href="https://simonwillison.net/2017/Oct/8/cool-uris/#atom-tag" rel="alternate"/><published>2017-10-08T19:54:24+00:00</published><updated>2017-10-08T19:54:24+00:00</updated><id>https://simonwillison.net/2017/Oct/8/cool-uris/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.annashipman.co.uk/jfdi/removing-mediawiki-cool-uris.html"&gt;Removing MediaWiki from SPA: Cool URIs don&amp;#x27;t change&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Detailed write-up from Anna Shipman describing how she archived an old MediaWiki as static content using recursive wget and some cunning application of mod_rewrite.

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/SteveMarshall/status/917110425308803072"&gt;Steve Marshall&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/annashipman"&gt;annashipman&lt;/a&gt;&lt;/p&gt;



</summary><category term="urls"/><category term="annashipman"/></entry><entry><title>Recovering missing content from the Internet Archive</title><link href="https://simonwillison.net/2017/Oct/8/missing-content/#atom-tag" rel="alternate"/><published>2017-10-08T19:08:57+00:00</published><updated>2017-10-08T19:08:57+00:00</updated><id>https://simonwillison.net/2017/Oct/8/missing-content/#atom-tag</id><summary type="html">
    &lt;p&gt;When &lt;a href="https://simonwillison.net/2017/Oct/1/ship/"&gt;I restored my blog last weekend&lt;/a&gt; I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.&lt;/p&gt;
&lt;p&gt;Thank goodness then for &lt;a href="https://archive.org/web/"&gt;the Wayback Machine&lt;/a&gt; at the Internet Archive! I tried some of the missing URLs there and found they had been captured and preserved. But how to get them back?&lt;/p&gt;
&lt;p&gt;A quick search turned up &lt;a href="https://github.com/hartator/wayback-machine-downloader"&gt;wayback-machine-downloader&lt;/a&gt;, an open-source Ruby script that claims to be able to &lt;em&gt;Download an entire website from the Internet Archive Wayback Machine&lt;/em&gt;. I gem installed it and tried it out (after some cargo cult incantations to work around &lt;a href="https://rvm.io/support/fixing-broken-ssl-certificates"&gt;some weird certificate errors&lt;/a&gt; I was seeing)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;rvm osx-ssl-certs update all
gem update --system
gem install wayback_machine_downloader

wayback_machine_downloader http://simonwillison.net/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And it worked! I left it running overnight and came back to a folder containing 18,952 HTML files, neatly arranged in a directory structure that matched my site:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;$ find . | more
.
./simonwillison.net
./simonwillison.net/2002
./simonwillison.net/2002/Aug
./simonwillison.net/2002/Aug/1
./simonwillison.net/2002/Aug/1/cetis
./simonwillison.net/2002/Aug/1/cetis/index.html
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial
./simonwillison.net/2002/Aug/1/cssSelectorsTutorial/index.html
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tarred them up into an archive and backed them up to Dropbox.&lt;/p&gt;
&lt;p&gt;Next challenge: how to restore the missing content?&lt;/p&gt;
&lt;p&gt;I’m a recent and enthusiastic adopter of &lt;a href="https://jupyter-notebook.readthedocs.io/en/latest/notebook.html"&gt;Jupyter notebooks&lt;/a&gt;. As a huge fan of development in a REPL I’m shocked I was so late to this particular party. So I fired up Jupyter and used it to start playing with the data.&lt;/p&gt;
&lt;p&gt;Here’s &lt;a href="https://github.com/simonw/simonwillisonblog/blob/0d233afbf3bf4fbe8778b8e6e022616d73e11568/jupyter-notebooks/Recover%20content%20from%20the%20wayback%20machine.ipynb"&gt;the final version of my notebook&lt;/a&gt;. I ended up with a script that did the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load in the full list of paths from the tar archive, and filter for just the ones matching the /YYYY/Mon/DD/slug/ format used for my blog content&lt;/li&gt;
&lt;li&gt;Talk to my local Django development environment and load in the full list of actual content URLs represented in that database.&lt;/li&gt;
&lt;li&gt;Calculate the difference between the two - those are the 213 items that need to be recovered.&lt;/li&gt;
&lt;li&gt;For each of those 213 items, load the full HTML that had been saved by the Internet Archive and feed it into the &lt;a href="https://www.crummy.com/software/BeautifulSoup/"&gt;BeautifulSoup&lt;/a&gt; HTML parsing library.&lt;/li&gt;
&lt;li&gt;Detect if each one is an entry, a blogmark or a quotation. Scrape the key content out of each one based on the type.&lt;/li&gt;
&lt;li&gt;Scrape the tags for each item, using this delightful one-liner: &lt;code&gt;[a.text for a in soup.findAll('a', {'rel': 'tag'})]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Scrape the comments for each item separately. These were mostly spam, so I haven’t yet recovered these for publication (I need to do some aggressive spam filtering first). I have however stashed them in the database for later processing.&lt;/li&gt;
&lt;li&gt;Write all of the scraped data out to a giant JSON file and upload it to a gist (a nice cheap way of giving it a URL).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having executed the above script, I now have a JSON file containing the parsed content for all of the missing items found in the Wayback Machine. All I needed then was a script which could take that JSON and turn it into records in the database. I implemented that as &lt;a href="https://github.com/simonw/simonwillisonblog/blob/0d233afbf3bf4fbe8778b8e6e022616d73e11568/blog/management/commands/import_blog_json.py"&gt;a custom Django management command&lt;/a&gt; and deployed it to Heroku.&lt;/p&gt;
&lt;p&gt;Last step: shell into a Heroku dyno (using &lt;code&gt;heroku run bash&lt;/code&gt;) and run the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./manage.py import_blog_json \
    --url_to_json=https://gist.github.com/simonw/5a5bc1f58297d2c7d68dd7448a4d6614/raw/28d5d564ae3fe7165802967b0f9c4eff6091caf0/recovered-blog-content.json \
    --tag_with=recovered
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result: &lt;a href="https://simonwillison.net/tags/recovered/"&gt;213 recovered items&lt;/a&gt; (which I tagged with &lt;code&gt;recovered&lt;/code&gt; so I could easily browse them). Including the most important entry on my whole site, &lt;a href="https://simonwillison.net/2010/Jun/21/married/"&gt;my write-up of my wedding&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;So thank you very much to the &lt;a href="https://archive.org/"&gt;Internet Archive&lt;/a&gt; team, and thank you &lt;a href="https://twitter.com/Hartator"&gt;Hartator&lt;/a&gt; for your extremely useful wayback-machine-downloader tool.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/beautifulsoup"&gt;beautifulsoup&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/internet-archive"&gt;internet-archive&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jupyter"&gt;jupyter&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="beautifulsoup"/><category term="internet-archive"/><category term="urls"/><category term="jupyter"/></entry><entry><title>What do Twitter and Gawker think of hash-bangs URLs?</title><link href="https://simonwillison.net/2013/Dec/8/what-do-twitter-and/#atom-tag" rel="alternate"/><published>2013-12-08T17:45:00+00:00</published><updated>2013-12-08T17:45:00+00:00</updated><id>https://simonwillison.net/2013/Dec/8/what-do-twitter-and/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-do-Twitter-and-Gawker-think-of-hash-bangs-URLs/answer/Simon-Willison"&gt;What do Twitter and Gawker think of hash-bangs URLs?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As of December 2013 (and potentially much earlier, I don't have the exact dates) both Twitter and a Gawker have moved away from hash bang URLs, so my guess is they turned out not to be a good idea.&lt;/p&gt;

&lt;p&gt;Most browsers now support the HTML5 history API which allows the usage of proper URLs while still fetching new content using Ajax without triggering a full page request.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="twitter"/><category term="urls"/><category term="quora"/></entry><entry><title>How to find the URL of a page in an iframe? </title><link href="https://simonwillison.net/2013/Aug/26/how-to-find-the/#atom-tag" rel="alternate"/><published>2013-08-26T15:22:00+00:00</published><updated>2013-08-26T15:22:00+00:00</updated><id>https://simonwillison.net/2013/Aug/26/how-to-find-the/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-to-find-the-URL-of-a-page-in-an-iframe/answer/Simon-Willison"&gt;How to find the URL of a page in an iframe? &lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can't, as this would be a security and privacy violation. Imagine an evil website which loads up Google in a full page iframe and then tracks what the unsuspecting user searches for and clicks on.&lt;/p&gt;

&lt;p&gt;The only way you could achieve this is if the pages within the iframe were deliberately collaborating with you - for example, you open up a page in an iframe of &lt;span&gt;&lt;a href="http://www.example.org"&gt;www.example.org&lt;/a&gt;&lt;/span&gt;?tracker=specialcode and &lt;span&gt;&lt;a href="http://example.org"&gt;example.org&lt;/a&gt;&lt;/span&gt; has code which sends information on each subsequent page visited back to your server. Even then, the tracking would stop working once the user clicked a link to a page not hosted on &lt;span&gt;&lt;a href="http://example.org"&gt;example.org&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="urls"/><category term="quora"/></entry><entry><title>What is the most efficient way to lookup an object (e.g. a user) by only a string?</title><link href="https://simonwillison.net/2012/May/31/what-is-the-most/#atom-tag" rel="alternate"/><published>2012-05-31T17:27:00+00:00</published><updated>2012-05-31T17:27:00+00:00</updated><id>https://simonwillison.net/2012/May/31/what-is-the-most/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/What-is-the-most-efficient-way-to-lookup-an-object-e-g-a-user-by-only-a-string/answer/Simon-Willison"&gt;What is the most efficient way to lookup an object (e.g. a user) by only a string?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Yes - an index on a varchar column is exactly how you would implement this.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/mysql"&gt;mysql&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="mysql"/><category term="twitter"/><category term="urls"/><category term="quora"/></entry><entry><title>Is there an API that returns metadata for a given URL?</title><link href="https://simonwillison.net/2012/May/31/is-there-an-api/#atom-tag" rel="alternate"/><published>2012-05-31T16:01:00+00:00</published><updated>2012-05-31T16:01:00+00:00</updated><id>https://simonwillison.net/2012/May/31/is-there-an-api/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Is-there-an-API-that-returns-metadata-for-a-given-URL/answer/Simon-Willison"&gt;Is there an API that returns metadata for a given URL?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I suggest taking a look at &lt;span&gt;&lt;a href="http://embed.ly/"&gt;http://embed.ly/&lt;/a&gt;&lt;/span&gt; - it can take a huge range of URLs and turn them in to JSON metadata. Here's what it can do with a Wikipedia page: &lt;span&gt;&lt;a href="http://embed.ly/docs/explore/objectify?maxwidth=500&amp;amp;url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FWorld_War_II"&gt;http://embed.ly/docs/explore/obj...&lt;/a&gt;&lt;/span&gt; - and here's Google Maps URL (not as useful, but still some interesting metadata extracted) &lt;span&gt;&lt;a href="http://embed.ly/docs/explore/objectify?maxwidth=500&amp;amp;url=https%3A%2F%2Fmaps.google.co.uk%2F%3Fll%3D52.643063%2C-2.076416%26spn%3D0.026769%2C0.080338%26t%3Dm%26z%3D14&amp;amp;url=https%3A%2F%2Fmaps.google.co.uk%2F%3Fll%3D52.643063%2C-2.076416%26spn%3D0.026769%2C0.080338%26t%3Dm%26z%3D14"&gt;http://embed.ly/docs/explore/obj...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apis"&gt;apis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/web-services"&gt;web-services&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apis"/><category term="urls"/><category term="web-services"/><category term="quora"/></entry><entry><title>How did art.sy get a ".sy" url?</title><link href="https://simonwillison.net/2012/May/31/how-did-artsy-get/#atom-tag" rel="alternate"/><published>2012-05-31T11:00:00+00:00</published><updated>2012-05-31T11:00:00+00:00</updated><id>https://simonwillison.net/2012/May/31/how-did-artsy-get/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-did-art-sy-get-a-sy-url/answer/Simon-Willison"&gt;How did art.sy get a &amp;quot;.sy&amp;quot; url?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here's a generally useful tip: if you're interested in learning more about ANY top level domain, visit the Wikipedia page for it - which will be &lt;span&gt;&lt;a href="http://en.wikipedia.org/wiki/.sy"&gt;http://en.wikipedia.org/wiki/.sy&lt;/a&gt;&lt;/span&gt; in this case (just add the domain, complete with its dot prefix, directly after &lt;span&gt;&lt;a href="http://en.wikipedia.org/wiki/"&gt;en.wikipedia.org/wiki/&lt;/a&gt;&lt;/span&gt; ).&lt;/p&gt;

&lt;p&gt;The wikipedia page will tell you which country the domain refers to, who the registrar is, what restrictions there are on those domains and plenty more besides. In this case, .sy is Syria - not exactly the best country to be associating your brand with these days!&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/domains"&gt;domains&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/wikipedia"&gt;wikipedia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="domains"/><category term="urls"/><category term="wikipedia"/><category term="quora"/></entry><entry><title>Which sites have the best URL design?</title><link href="https://simonwillison.net/2012/May/31/which-sites-have-the/#atom-tag" rel="alternate"/><published>2012-05-31T09:08:00+00:00</published><updated>2012-05-31T09:08:00+00:00</updated><id>https://simonwillison.net/2012/May/31/which-sites-have-the/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Which-sites-have-the-best-URL-design/answer/Simon-Willison"&gt;Which sites have the best URL design?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;GitHub's URL design is fantastic - it's a virtually flawless mapping of Git semantics to URL space. Their basic URL structure is excellent, but they also have a bunch of neat URL hacks going on. Here are a few of my favourites:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;https://github.com/django/django/compare/4553f511557052d6f18811807ae6136f81fa86a3...master  - compare view between commits, branches or tags&lt;/li&gt;&lt;li&gt;&lt;span&gt;&lt;a href="https://github.com/django/django/blob/314d82a94e2d4270b6ec8aa4b3fc2e8ebc404f08/AUTHORS#L36-38"&gt;https://github.com/django/django...&lt;/a&gt;&lt;/span&gt; - Place #L36-38 at the end of the URL to highlight those lines of code&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;You can read more about GitHub's approach to URL design on Kyle Neath's blog here: &lt;span&gt;&lt;a href="http://warpspire.com/posts/url-design/"&gt;http://warpspire.com/posts/url-d...&lt;/a&gt;&lt;/span&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/design"&gt;design&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="design"/><category term="urls"/><category term="quora"/></entry><entry><title>When referring to our web site in publications (or Twitter or Facebook), when is it important to provide the full URL - http://www.mywebsite.com and when should you provide just the mywebsite.com?</title><link href="https://simonwillison.net/2012/May/31/when-referring-to-our/#atom-tag" rel="alternate"/><published>2012-05-31T09:07:00+00:00</published><updated>2012-05-31T09:07:00+00:00</updated><id>https://simonwillison.net/2012/May/31/when-referring-to-our/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/When-referring-to-our-web-site-in-publications-or-Twitter-or-Facebook-when-is-it-important-to-provide-the-full-URL-http-www-mywebsite-com-and-when-should-you-provide-just-the-mywebsite-com/answer/Simon-Willison"&gt;When referring to our web site in publications (or Twitter or Facebook), when is it important to provide the full URL - http://www.mywebsite.com and when should you provide just the mywebsite.com?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You have no control over how other publications refer to your site - if you're lucky, they might spell it correctly and check the link works before publishing (but I wouldn't bet on it). What you DO have control over is making sure you compensate for any mistakes they make.&lt;/p&gt;

&lt;p&gt;So, it's critical that accessing both the www. and non-www. versions of your site do the right thing. You should pick one of them as the version that works, while the other should send a 301 permanent redirect to the one that you picked. You should redirect ALL URLs, so if you've chosen to publish your site without the www. anyone who visits &lt;span&gt;&lt;a href="http://www.example.com/path/blah/?foo=bar"&gt;http://www.example.com/path/blah...&lt;/a&gt;&lt;/span&gt; is automatically redirected to &lt;span&gt;&lt;a href="http://example.com/path/blah/?foo=bar"&gt;http://example.com/path/blah/?fo...&lt;/a&gt;&lt;/span&gt; (with a 301 permanent redirect, not a 302).&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="urls"/><category term="quora"/></entry><entry><title>How did slashes become the standard path separators for URLs?</title><link href="https://simonwillison.net/2012/Feb/10/how-did-slashes-become/#atom-tag" rel="alternate"/><published>2012-02-10T14:51:00+00:00</published><updated>2012-02-10T14:51:00+00:00</updated><id>https://simonwillison.net/2012/Feb/10/how-did-slashes-become/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-did-slashes-become-the-standard-path-separators-for-URLs/answer/Simon-Willison"&gt;How did slashes become the standard path separators for URLs?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'm going to take an educated guess and say it's because of unix file system conventions. Early web servers mapped the URL to a path on disk inside the document root - this is still how most static sites work today.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/internet"&gt;internet&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/w3c"&gt;w3c&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="internet"/><category term="urls"/><category term="w3c"/><category term="quora"/></entry><entry><title>How do you find the new URL of a Tumblr that has moved?</title><link href="https://simonwillison.net/2012/Jan/30/how-do-you-find/#atom-tag" rel="alternate"/><published>2012-01-30T17:32:00+00:00</published><updated>2012-01-30T17:32:00+00:00</updated><id>https://simonwillison.net/2012/Jan/30/how-do-you-find/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-do-you-find-the-new-URL-of-a-Tumblr-that-has-moved/answer/Simon-Willison"&gt;How do you find the new URL of a Tumblr that has moved?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;One trick that might work is to look up the old tumble in the Google cache or on &lt;span&gt;&lt;a href="http://archive.org"&gt;archive.org&lt;/a&gt;&lt;/span&gt;, then copy and paste a unique search phrase from that page and run a Google search for:&lt;/p&gt;

&lt;p&gt;"unique phrase found on page" site:&lt;span&gt;&lt;a href="http://tumblr.com"&gt;tumblr.com&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;To hopefully find the new location of the tumblr.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tumblr"&gt;tumblr&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="urls"/><category term="quora"/><category term="tumblr"/></entry><entry><title>Quoting Reed Underwood</title><link href="https://simonwillison.net/2011/Feb/10/reed/#atom-tag" rel="alternate"/><published>2011-02-10T16:56:00+00:00</published><updated>2011-02-10T16:56:00+00:00</updated><id>https://simonwillison.net/2011/Feb/10/reed/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch#c1297353079.775017"&gt;&lt;p&gt;URLs are supposed to represent resources. A web app can be a resource, and there are techniques for managing state within those. Hashbangs might be one of these. But when large web properties are converting all their links to &lt;em&gt;articles&lt;/em&gt; and other &lt;em&gt;bits of text&lt;/em&gt; (tweets/twits/whatever) into these monstrosities, it’s not innovation. It’s a huge mistake that ought to be regretted now and will certainly be regretted in the future.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch#c1297353079.775017"&gt;Reed Underwood&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hashbanghell"&gt;hashbanghell&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="hashbanghell"/><category term="urls"/><category term="recovered"/></entry><entry><title>Quoting Tim Bray</title><link href="https://simonwillison.net/2011/Feb/10/ongoing/#atom-tag" rel="alternate"/><published>2011-02-10T06:00:00+00:00</published><updated>2011-02-10T06:00:00+00:00</updated><id>https://simonwillison.net/2011/Feb/10/ongoing/#atom-tag</id><summary type="html">
    &lt;blockquote cite="http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch"&gt;&lt;p&gt;Before events took this bad turn, the contract represented by a link was simple: “Here’s a string, send it off to a server and the server will figure out what it identifies and send you back a representation.” Now it’s along the lines of: “Here’s a string, save the hashbang, send the rest to the server, and rely on being able to run the code the server sends you to use the hashbang to generate the representation.” Do I need to explain why this is less robust and flexible? This is what we call “tight coupling” and I thought that anyone with a Computer Science degree ought to have been taught to avoid it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch"&gt;Tim Bray&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/hashbanghell"&gt;hashbanghell&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tim-bray"&gt;tim-bray&lt;/a&gt;&lt;/p&gt;



</summary><category term="hashbanghell"/><category term="javascript"/><category term="urls"/><category term="recovered"/><category term="tim-bray"/></entry><entry><title>Going Postel</title><link href="https://simonwillison.net/2011/Feb/9/adactio/#atom-tag" rel="alternate"/><published>2011-02-09T02:18:00+00:00</published><updated>2011-02-09T02:18:00+00:00</updated><id>https://simonwillison.net/2011/Feb/9/adactio/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://adactio.com/journal/4346/"&gt;Going Postel&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jeremy points out that one of the many disadvantages of publishing JavaScript dependent content on the Web is that a single typo can render your entire site unusable.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ajax"&gt;ajax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-keith"&gt;jeremy-keith&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gawker"&gt;gawker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hashbanghell"&gt;hashbanghell&lt;/a&gt;&lt;/p&gt;



</summary><category term="ajax"/><category term="jeremy-keith"/><category term="urls"/><category term="recovered"/><category term="gawker"/><category term="hashbanghell"/></entry><entry><title>Breaking the Web with hash-bangs</title><link href="https://simonwillison.net/2011/Feb/9/isolani/#atom-tag" rel="alternate"/><published>2011-02-09T02:17:00+00:00</published><updated>2011-02-09T02:17:00+00:00</updated><id>https://simonwillison.net/2011/Feb/9/isolani/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs"&gt;Breaking the Web with hash-bangs&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Mike Davies explains why Gawker’s new Ajax fragment-tastic redesign is a web architecture error of colossal proportions.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ajax"&gt;ajax&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gawker"&gt;gawker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hashbanghell"&gt;hashbanghell&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/mike-davies"&gt;mike-davies&lt;/a&gt;&lt;/p&gt;



</summary><category term="ajax"/><category term="urls"/><category term="recovered"/><category term="gawker"/><category term="hashbanghell"/><category term="mike-davies"/></entry><entry><title>Is there a way of tracking shortened URLs with Twitter streaming API?</title><link href="https://simonwillison.net/2011/Jan/21/is-there-a-way/#atom-tag" rel="alternate"/><published>2011-01-21T09:47:00+00:00</published><updated>2011-01-21T09:47:00+00:00</updated><id>https://simonwillison.net/2011/Jan/21/is-there-a-way/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Is-there-a-way-of-tracking-shortened-URLs-with-Twitter-streaming-API/answer/Simon-Willison"&gt;Is there a way of tracking shortened URLs with Twitter streaming API?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Think about it like this: the whole point of the Twitter streaming API is to get you the tweets as soon after they are posted as possible. If the API were to provide access to the lengthened URLs, it would have to delay emitting a Tweet on to the stream until a resolver had gone through each shortened URL in the tweet and checked to find what it redirects to. This would mean that the speed with which the streaming API could deal out tweets would be dependent on the speed of the third party servers that serve up the redirects. I doubt Twitter would ever want to implement this.&lt;/p&gt;

&lt;p&gt;I believe the Twitter search API may provide an index of the lengthened version of short URLs (the search on &lt;span&gt;&lt;a href="http://Twitter.com"&gt;Twitter.com&lt;/a&gt;&lt;/span&gt; quietly started doing that a month or so ago).&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="twitter"/><category term="urls"/><category term="quora"/></entry><entry><title>Getting Started - Google URL Shortener API</title><link href="https://simonwillison.net/2011/Jan/13/getting/#atom-tag" rel="alternate"/><published>2011-01-13T03:49:00+00:00</published><updated>2011-01-13T03:49:00+00:00</updated><id>https://simonwillison.net/2011/Jan/13/getting/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.google.com/apis/urlshortener/v1/getting_started.html#analytics"&gt;Getting Started - Google URL Shortener API&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The API for the goo.gl URL shortener is really nice—no API key required, easy to create a short URL and you can retrieve detailed stats breakdowns (similar to bit.ly) as JSON for any URL.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="urls"/><category term="recovered"/></entry><entry><title>Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?</title><link href="https://simonwillison.net/2011/Jan/1/could-browsers-be-made/#atom-tag" rel="alternate"/><published>2011-01-01T15:07:00+00:00</published><updated>2011-01-01T15:07:00+00:00</updated><id>https://simonwillison.net/2011/Jan/1/could-browsers-be-made/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Could-browsers-be-made-to-scroll-down-e-g-by-67-if-you-add-67-to-a-URL/answer/Simon-Willison"&gt;Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I'd say no.&lt;/p&gt;

&lt;p&gt;Presumably you want this so you can link to a particular point on a page - but the liquid nature of the Web means that you have no way of guaranteeing that the sentence 67% down the page for you is the same as the sentence 67% down the page for someone else (who may have a different browser width or font size).&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/browsers"&gt;browsers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/html"&gt;html&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="browsers"/><category term="html"/><category term="urls"/><category term="quora"/></entry><entry><title>URL Design</title><link href="https://simonwillison.net/2010/Dec/31/url/#atom-tag" rel="alternate"/><published>2010-12-31T10:03:00+00:00</published><updated>2010-12-31T10:03:00+00:00</updated><id>https://simonwillison.net/2010/Dec/31/url/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://warpspire.com/posts/url-design/"&gt;URL Design&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Thoughtful tips on modern URL design, from GitHub designer Kyle Neath. GitHub has the best designed URLs of any application I can think of.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="urls"/><category term="recovered"/></entry><entry><title>Is it a good idea to allocate URLs such as quora.com/username to users?</title><link href="https://simonwillison.net/2010/Dec/22/is-it-a-good/#atom-tag" rel="alternate"/><published>2010-12-22T15:17:00+00:00</published><updated>2010-12-22T15:17:00+00:00</updated><id>https://simonwillison.net/2010/Dec/22/is-it-a-good/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Is-it-a-good-idea-to-allocate-URLs-such-as-quora-com-username-to-users/answer/Simon-Willison"&gt;Is it a good idea to allocate URLs such as quora.com/username to users?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;There's an interesting discussion about this issue on this question: &lt;span&gt;&lt;a href="https://www.quora.com/How-do-sites-prevent-vanity-URLs-from-colliding-with-future-features"&gt;How do sites prevent vanity URLs from colliding with future features ?&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/seo"&gt;seo&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="seo"/><category term="urls"/><category term="quora"/></entry><entry><title>Spacelog: space exploration stories from the original transcripts</title><link href="https://simonwillison.net/2010/Dec/10/spacelog/#atom-tag" rel="alternate"/><published>2010-12-10T10:07:00+00:00</published><updated>2010-12-10T10:07:00+00:00</updated><id>https://simonwillison.net/2010/Dec/10/spacelog/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://spacelog.org/"&gt;Spacelog: space exploration stories from the original transcripts&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The product of the most recent /dev/fort outing—a beautiful, web-native interface for browsing the NASA transcripts from the Apollo 13 and Mercury 6 missions (more to come). Every key moment has a URL.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/devfort"&gt;devfort&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/space"&gt;space&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="devfort"/><category term="space"/><category term="urls"/><category term="recovered"/></entry><entry><title>Porting Flickr to YUI 3: Lessons in Performance (at YUIConf 2010)</title><link href="https://simonwillison.net/2010/Nov/10/porting/#atom-tag" rel="alternate"/><published>2010-11-10T18:33:00+00:00</published><updated>2010-11-10T18:33:00+00:00</updated><id>https://simonwillison.net/2010/Nov/10/porting/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://lanyrd.com/2010/yuiconf/spdm/"&gt;Porting Flickr to YUI 3: Lessons in Performance (at YUIConf 2010)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Some very interesting tips here. The new Flickr photo pages suffered from what I’ve been calling “Flash of Un-Behavioured Content”, where slow loading JavaScript results in poor behaviour from some UI controls. They started using “Action Queueing”, where a small JS stub ensures a loading indicator is shown for clicks on features that have not yet fully loaded. Also, it turns out some corporate firewalls (Sonicwall in particular) dislike URLs over 1600 characters, and filter out any URL with xxx in it.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/flickr"&gt;flickr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/javascript"&gt;javascript&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/urls"&gt;urls&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/yui"&gt;yui&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;&lt;/p&gt;



</summary><category term="flickr"/><category term="javascript"/><category term="urls"/><category term="yui"/><category term="recovered"/></entry></feed>