Simon Willison's Weblog - http://simonwillison.net/ - 2024-03-19T02:18:59+00:00 - Simon Willison
The Tokenizer Playground - 2024-03-19T02:18:59+00:00 - https://simonwillison.net/2024/Mar/19/the-tokenizer-playground/#atom-everything
<p><a href="https://huggingface.co/spaces/Xenova/the-tokenizer-playground">The Tokenizer Playground</a></p>
<p>I built a tool like this a while ago, but this one is much better: it provides an interface for experimenting with tokenizers from a wide range of model architectures, including Llama, Claude, Mistral and Grok-1 - all running in the browser using Transformers.js.</p>
<p>Via <a href="https://twitter.com/xenovacom/status/1769546095871287423">@xenovacom</a></p>
900 Sites, 125 million accounts, 1 vulnerability - 2024-03-18T18:53:23+00:00 - https://simonwillison.net/2024/Mar/18/firebase/#atom-everything
<p><a href="https://env.fail/posts/firewreck-1/">900 Sites, 125 million accounts, 1 vulnerability</a></p>
<p>Google's Firebase development platform encourages building applications (mobile and web) which talk directly to the underlying data store, reading and writing from "collections" with access protected by Firebase Security Rules.</p>
<p>Unsurprisingly, a lot of development teams make mistakes with these.</p>
<p>This post describes how a security research team built a scanner that found over 124 million unprotected records across 900 different applications, including huge amounts of PII: 106 million email addresses, 20 million passwords (many in plaintext) and 27 million instances of "Bank details, invoices, etc".</p>
<p>Most worrying of all, only 24% of the site owners they contacted shipped a fix for the misconfiguration.</p>
<p>Via <a href="https://news.ycombinator.com/item?id=39742422">Hacker News</a></p>
Quoting Geoffrey Litt - 2024-03-18T18:16:09+00:00 - https://simonwillison.net/2024/Mar/18/geoffrey-litt/#atom-everything
<blockquote cite="https://twitter.com/geoffreylitt/status/1769471002755338553"><p>It's hard to overstate the value of LLM support when coding for fun in an unfamiliar language. [...] This example is totally trivial in hindsight, but might have taken me a couple mins to figure out otherwise. This is a bigger deal than it seems! Papercuts add up fast and prevent flow. (A lot of being a senior engineer is just being proficient enough to avoid papercuts).</p></blockquote><p class="cite">— <a href="https://twitter.com/geoffreylitt/status/1769471002755338553">Geoffrey Litt</a></p>
Grok-1 code and model weights release - 2024-03-17T20:20:13+00:00 - https://simonwillison.net/2024/Mar/17/grok-1/#atom-everything
<p><a href="https://github.com/xai-org/grok">Grok-1 code and model weights release</a></p>
<p>xAI have released their Grok-1 model under an Apache 2 license (for both weights and code). It's distributed as a 318.24GB torrent file and likely requires 320GB of VRAM to run, so it needs some very hefty hardware.</p>
<p>The accompanying blog post (via link) says "Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023", and describes it as a "314B parameter Mixture-of-Experts model with 25% of the weights active on a given token".</p>
<p>There's very little information on what it was actually trained on; all we know is that it was "a large amount of text data, not fine-tuned for any particular task".</p>
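<p>To make those numbers concrete, here's some back-of-envelope arithmetic. It assumes the released weights are stored at 8 bits (1 byte) per parameter, which is consistent with the size of the torrent but is my assumption rather than something stated in the release. The point it illustrates: a Mixture-of-Experts model only uses 25% of its weights for any given token, but every expert still has to be held in memory.</p>

```python
# Figures quoted in the release announcement
TOTAL_PARAMS = 314e9      # "314B parameter Mixture-of-Experts model"
ACTIVE_FRACTION = 0.25    # "25% of the weights active on a given token"


def weight_memory_gb(params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


active_params = TOTAL_PARAMS * ACTIVE_FRACTION
print(f"Active parameters per token: {active_params / 1e9:.1f}B")

# Assumption: 1 byte per parameter (8-bit weights); 16-bit weights double it
print(f"8-bit weights:  {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
print(f"16-bit weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
```

<p>That works out to 78.5B active parameters per token but 314GB just for the weights at 8 bits. Activations and the attention cache come on top of that, so treat it as a lower bound.</p>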
<p>Via <a href="https://x.ai/blog/grok-os">Open Release of Grok-1</a></p>
Add ETag header for static responses - 2024-03-17T19:25:34+00:00 - https://simonwillison.net/2024/Mar/17/add-etag-header-for-static-responses/#atom-everything
<p><a href="https://github.com/simonw/datasette/pull/2306">Add ETag header for static responses</a></p>
<p>I've been procrastinating on adding better caching headers for static assets (JavaScript and CSS) served by Datasette for several years, because I've been wanting to implement the perfect solution that sets far-future cache headers on every asset and ensures the URLs change when they are updated.</p>
<p>Agustin Bacigalup just submitted the best kind of pull request: he observed that adding ETag support for static assets would side-step the complexity while adding much of the benefit, and implemented it along with tests.</p>
<p>It's a substantial performance improvement for any Datasette instance with a number of JavaScript plugins... like the ones we are building on Datasette Cloud. I'm just annoyed we didn't ship something like this sooner!</p>
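<p>Here's a minimal sketch of how content-based ETags work in general. This is not the code from the pull request: <code>make_etag()</code> and <code>static_response()</code> are hypothetical names, and deriving the tag from a truncated SHA-256 of the file contents is just one reasonable choice.</p>

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive an ETag from the file contents; ETag values are quoted strings."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def static_response(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a request for a static asset."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may list several ETags separated by commas
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if etag in client_tags or "*" in client_tags:
            # The client's cached copy is current, so no body is needed
            return 304, headers, b""
    return 200, headers, body
```

<p>The browser keeps the ETag from the first 200 response and sends it back in an <code>If-None-Match</code> header next time. If the file hasn't changed the server replies with an empty 304, so the browser still makes a request but skips the download. That's why ETags capture much of the benefit without needing far-future cache headers or versioned URLs.</p>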
How does SQLite store data? - 2024-03-17T18:47:53+00:00 - https://simonwillison.net/2024/Mar/17/how-does-sqlite-store-data/#atom-everything
<p><a href="https://michalpitr.substack.com/p/how-does-sqlite-store-data">How does SQLite store data?</a></p>
<p>Michal Pitr explores the design of the SQLite on-disk file format, as part of building an educational implementation of SQLite from scratch in Go.</p>
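<p>As a small taste of the format: every SQLite database starts with a 100-byte header, beginning with the string <code>SQLite format 3</code> followed by a NUL byte. The sketch below reads that magic string, the page size (a big-endian 2-byte integer at offset 16, where the value 1 means 65536) and the database size in pages (a 4-byte integer at offset 28), then cross-checks them against <code>PRAGMA</code> output. The function and file names are illustrative.</p>

```python
import os
import sqlite3
import struct
import tempfile


def read_header(path):
    """Read a few fields from the 100-byte header at the start of a SQLite file."""
    with open(path, "rb") as f:
        header = f.read(100)
    magic = header[:16]
    (page_size,) = struct.unpack(">H", header[16:18])  # big-endian uint16
    if page_size == 1:
        page_size = 65536  # 65536 doesn't fit in 2 bytes, so it is stored as 1
    (page_count,) = struct.unpack(">I", header[28:32])  # big-endian uint32
    return magic, page_size, page_count


path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("create table notes (id integer primary key, body text)")
conn.executemany("insert into notes (body) values (?)", [("hello",), ("world",)])
conn.commit()
expected = (
    conn.execute("pragma page_size").fetchone()[0],
    conn.execute("pragma page_count").fetchone()[0],
)
conn.close()

magic, page_size, page_count = read_header(path)
print(magic, page_size, page_count)
```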
Weeknotes: the aftermath of NICAR - 2024-03-16T18:36:12+00:00 - https://simonwillison.net/2024/Mar/16/weeknotes-the-aftermath-of-nicar/#atom-everything
<p><a href="https://schedules.ire.org/nicar-2024/index.html">NICAR</a> was fantastic this year. Alex and I ran <a href="https://github.com/datasette/nicar-2024-datasette">a successful workshop</a> on Datasette and Datasette Cloud, and I gave a lightning talk demonstrating two new GPT-4 powered Datasette plugins - <a href="https://datasette.io/plugins/datasette-enrichments-gpt">datasette-enrichments-gpt</a> and <a href="https://datasette.io/plugins/datasette-extract">datasette-extract</a>. I need to write more about the latter one: it enables populating tables from unstructured content (using a variant of <a href="https://til.simonwillison.net/gpt3/openai-python-functions-data-extraction">this technique</a>) and it's really effective. I got it working just in time for the conference.</p>
<p>I also solved the conference follow-up problem! I've long suffered from poor habits in dropping the ball on following up with people I meet at conferences. This time I used a trick I first learned at a YC demo day many years ago: if someone says they'd like to follow up, get out a calendar and book a future conversation with them right there on the spot.</p>
<p>I have a bunch of exciting conversations lined up over the next few weeks thanks to that, with a variety of different sizes of newsrooms who are either using or want to use Datasette.</p>
<h4 id="action-menus">Action menus in the Datasette 1.0 alphas</h4>
<p>I released two new Datasette 1.0 alphas in the run-up to NICAR: <a href="https://docs.datasette.io/en/latest/changelog.html#a12-2024-02-29">1.0a12</a> and <a href="https://docs.datasette.io/en/latest/changelog.html#changelog">1.0a13</a>.</p>
<p>The main theme of these two releases was improvements to Datasette's "action buttons".</p>
<p>Datasette plugins have long been able to register additional menu items that should be shown on the database and table pages. These were previously hidden behind a "cog" icon in the title of the page - once clicked it would reveal a menu of extra actions.</p>
<p>The cog wasn't discoverable enough, and felt too much like mystery meat navigation. I decided to turn it into a much clearer button.</p>
<p>Here's a GIF showing that new button in action across several different pages on Datasette Cloud (which has a bunch of plugins that use it):</p>
<p><img src="https://static.simonwillison.net/static/2024/action-buttons.gif" alt="Animation starts on the page for the content database. A database actions blue button is clicked, revealing a menu of items such as Upload CSVs and Execute SQL Write. On a table page the button is called Table actions and has options such as Delete table. Executing a SQL query shows a Query actions button with an option to Create SQL view from this query." style="max-width: 100%;" /></p>
<p>Prior to 1.0a12 Datasette had plugin hooks for just the database and table actions menus. I've added four more:</p>
<ul>
<li>
<a href="https://docs.datasette.io/en/latest/plugin_hooks.html#query-actions-datasette-actor-database-query-name-request-sql-params">query_actions()</a> for actions that apply to the query results page. (<a href="https://github.com/simonw/datasette/issues/2283">#2283</a>)</li>
<li>
<a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-view-actions">view_actions()</a> for actions that can be applied to a SQL view. (<a href="https://github.com/simonw/datasette/issues/2297">#2297</a>)</li>
<li>
<a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-row-actions">row_actions()</a> for actions that apply to the row page. (<a href="https://github.com/simonw/datasette/issues/2299">#2299</a>)</li>
<li>
<a href="https://docs.datasette.io/en/latest/plugin_hooks.html#plugin-hook-homepage-actions">homepage_actions()</a> for actions that apply to the instance homepage. (<a href="https://github.com/simonw/datasette/issues/2298">#2298</a>)</li>
</ul>
<p>Menu items can now also include an optional description, which is displayed below their label in the actions menu.</p>
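<p>Here's roughly what a plugin using one of the new hooks might look like. The <code>query_actions()</code> hook name and arguments come from the plugin hook documentation linked above (a hook implementation can accept any subset of them), and <code>description</code> is the new optional key. The "Explain this query" action itself is a made-up example, and the <code>ImportError</code> fallback exists only so the sketch can be imported without Datasette installed.</p>

```python
import urllib.parse

try:
    from datasette import hookimpl
except ImportError:
    # Fallback so this sketch can be imported without Datasette installed
    def hookimpl(func):
        return func


@hookimpl
def query_actions(datasette, database, sql):
    """Add an "Explain this query" item to the Query actions menu."""
    # Offering to EXPLAIN an EXPLAIN query would not be very useful
    if sql.strip().lower().startswith("explain"):
        return []
    explain_sql = "explain query plan " + sql
    return [
        {
            "href": datasette.urls.database(database)
            + "?"
            + urllib.parse.urlencode({"sql": explain_sql}),
            "label": "Explain this query",
            # Optional: displayed below the label in the actions menu
            "description": "See how SQLite plans to execute this query",
        }
    ]
```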
<h4 id="always-dns">It's always DNS</h4>
<p>This site was offline for 24 hours this week due to a DNS issue. Short version: while I've been paying close attention to the management of domains I've bought in the past few years (<a href="https://datasette.io/">datasette.io</a>, <a href="https://www.datasette.cloud/">datasette.cloud</a> etc) I hadn't been paying attention to <code>simonwillison.net</code>.</p>
<p>... until it turned out I had it on a registrar with an old email address that I no longer had access to, and the domain was switched into "parked" mode because I had failed to pay for renewal!</p>
<p>(I haven't confirmed this yet but I think I may have paid for a ten year renewal at some point, which gives you a full decade to lose track of how it's being paid for.)</p>
<p>I'll give credit to <a href="https://www.123-reg.co.uk/">123-reg</a> (these days a subsidiary of GoDaddy) - they have a <a href="https://www.123-reg.co.uk/support/domains/what-is-the-domain-recovery-period-and-how-can-i-restore-my-domain-names/">well documented domain recovery policy</a> and their support team got me back in control reasonably promptly - only slightly delayed by their UK-based account recovery team operating in a timezone separate from my own.</p>
<p>I registered <code>simonwillison.org</code> and configured that and <code>til.simonwillison.org</code> during the blackout, mainly because it turns out I refer back to my own written content a whole lot during my regular work! Once <code>.net</code> came back I <a href="https://til.simonwillison.net/cloudflare/redirect-whole-domain">set up redirects using Cloudflare</a>.</p>
<p>Thankfully I don't usually use my domain for my personal email, or sorting this out would have been a whole lot more painful.</p>
<p>The most inconvenient impact was Mastodon: I run my own instance at <a href="https://fedi.simonwillison.net/">fedi.simonwillison.net</a> (<a href="https://til.simonwillison.net/mastodon/custom-domain-mastodon">previously</a>) and losing DNS broke everything: both my ability to post and my ability to even read posts on my timeline.</p>
<h4 id="weeknotes-16-mar-blog-entries">Blog entries</h4>
<p>I published three articles since my last weeknotes:</p>
<ul>
<li><a href="https://simonwillison.net/2024/Mar/8/gpt-4-barrier/">The GPT-4 barrier has finally been broken</a></li>
<li><a href="https://simonwillison.net/2024/Mar/5/prompt-injection-jailbreaking/">Prompt injection and jailbreaking are not the same thing</a></li>
<li><a href="https://simonwillison.net/2024/Mar/3/interesting-ideas-in-observable-framework/">Interesting ideas in Observable Framework</a></li>
</ul>
<h4 id="weeknotes-16-mar-blog-releases">Releases</h4>
<p>I have released <em>so much stuff</em> recently. A lot of this was in preparation for NICAR - I wanted to polish all sorts of corners of Datasette Cloud, which is itself a huge bundle of pre-configured Datasette plugins. A lot of those plugins got a bump!</p>
<p>A few releases deserve a special mention:</p>
<ul>
<li>
<a href="https://datasette.io/plugins/datasette-extract">datasette-extract</a>, hinted at above, is a new plugin that enables tables in Datasette to be populated from unstructured data in pasted text or images.</li>
<li>
<a href="https://datasette.io/plugins/datasette-export-database">datasette-export-database</a> provides a way to export a current snapshot of a SQLite database from Datasette - something that previously wasn't safe to do for databases that were accepting writes. It works by kicking off a background process to use <code>VACUUM INTO</code> in SQLite to create a temporary file with a transactional snapshot of the database state, then lets the user download that file.</li>
<li>
<a href="https://github.com/simonw/llm-claude-3">llm-claude-3</a> provides access to the new Claude 3 models from my <a href="https://llm.datasette.io/">LLM</a> tool. These models are really exciting: Opus feels better than GPT-4 at most things I've thrown at it, and Haiku is both slightly cheaper than GPT-3.5 Turbo and provides image input support at the lowest price point I've seen anywhere.</li>
<li>
<a href="https://datasette.io/plugins/datasette-create-view">datasette-create-view</a> is a new plugin that helps you create a SQL view from a SQL query. I shipped the new <a href="https://docs.datasette.io/en/latest/plugin_hooks.html#query-actions-datasette-actor-database-query-name-request-sql-params">query_actions()</a> plugin hook to make this possible.</li>
</ul>
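<p>The core SQLite trick behind datasette-export-database can be sketched with Python's <code>sqlite3</code> module. This is not the plugin's code: it skips the background process and the download step, and <code>snapshot_database()</code> is a hypothetical helper. <code>VACUUM INTO</code> requires SQLite 3.27.0 or later.</p>

```python
import os
import sqlite3
import tempfile


def snapshot_database(conn, dest_path):
    """Write a consistent copy of conn's main database to dest_path.

    VACUUM INTO builds the copy inside a single read transaction, so
    concurrent writes can't leave it half-updated. dest_path must not
    already exist (an empty file is also allowed).
    """
    conn.execute("VACUUM INTO ?", (dest_path,))


workdir = tempfile.mkdtemp()
live = sqlite3.connect(os.path.join(workdir, "live.db"))
live.execute("create table events (id integer primary key, note text)")
live.executemany("insert into events (note) values (?)", [("one",), ("two",), ("three",)])
live.commit()  # VACUUM cannot run while this connection has an open transaction

snapshot_path = os.path.join(workdir, "snapshot.db")
snapshot_database(live, snapshot_path)

copy = sqlite3.connect(snapshot_path)
print(copy.execute("select count(*) from events").fetchone()[0])  # 3
```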
<p>Here's the full list of recent releases:</p>
<ul>
<li>
<strong><a href="https://github.com/simonw/datasette-packages/releases/tag/0.2.1">datasette-packages 0.2.1</a></strong> - 2024-03-16<br />Show a list of currently installed Python packages</li>
<li>
<strong><a href="https://github.com/datasette/datasette-export-database/releases/tag/0.2.1">datasette-export-database 0.2.1</a></strong> - 2024-03-16<br />Export a copy of a mutable SQLite database on demand</li>
<li>
<strong><a href="https://github.com/simonw/datasette-configure-fts/releases/tag/1.1.3">datasette-configure-fts 1.1.3</a></strong> - 2024-03-14<br />Datasette plugin for enabling full-text search against selected table columns</li>
<li>
<strong><a href="https://github.com/simonw/datasette-upload-csvs/releases/tag/0.9.1">datasette-upload-csvs 0.9.1</a></strong> - 2024-03-14<br />Datasette plugin for uploading CSV files and converting them to database tables</li>
<li>
<strong><a href="https://github.com/simonw/datasette-write/releases/tag/0.3.1">datasette-write 0.3.1</a></strong> - 2024-03-14<br />Datasette plugin providing a UI for executing SQL writes against the database</li>
<li>
<strong><a href="https://github.com/simonw/datasette-edit-schema/releases/tag/0.8a1">datasette-edit-schema 0.8a1</a></strong> - 2024-03-14<br />Datasette plugin for modifying table schemas</li>
<li>
<strong><a href="https://github.com/simonw/llm-claude-3/releases/tag/0.3">llm-claude-3 0.3</a></strong> - 2024-03-13<br />LLM plugin for interacting with the Claude 3 family of models</li>
<li>
<strong><a href="https://github.com/datasette/datasette-extract/releases/tag/0.1a3">datasette-extract 0.1a3</a></strong> - 2024-03-13<br />Import unstructured data (text and images) into structured tables</li>
<li>
<strong><a href="https://github.com/simonw/datasette/releases/tag/1.0a13">datasette 1.0a13</a></strong> - 2024-03-13<br />An open source multi-tool for exploring and publishing data</li>
<li>
<strong><a href="https://github.com/datasette/datasette-enrichments-quickjs/releases/tag/0.1a1">datasette-enrichments-quickjs 0.1a1</a></strong> - 2024-03-09<br />Enrich data with a custom JavaScript function</li>
<li>
<strong><a href="https://github.com/simonw/dclient/releases/tag/0.4">dclient 0.4</a></strong> - 2024-03-08<br />A client CLI utility for Datasette instances</li>
<li>
<strong><a href="https://github.com/simonw/datasette-saved-queries/releases/tag/0.2.2">datasette-saved-queries 0.2.2</a></strong> - 2024-03-07<br />Datasette plugin that lets users save and execute queries</li>
<li>
<strong><a href="https://github.com/datasette/datasette-create-view/releases/tag/0.1">datasette-create-view 0.1</a></strong> - 2024-03-07<br />Create a SQL view from a query</li>
<li>
<strong><a href="https://github.com/simonw/pypi-to-sqlite/releases/tag/0.2.3">pypi-to-sqlite 0.2.3</a></strong> - 2024-03-06<br />Load data about Python packages from PyPI into SQLite</li>
<li>
<strong><a href="https://github.com/datasette/datasette-uptime/releases/tag/0.1.1">datasette-uptime 0.1.1</a></strong> - 2024-03-06<br />Datasette plugin showing uptime at /-/uptime</li>
<li>
<strong><a href="https://github.com/datasette/datasette-sqlite-authorizer/releases/tag/0.2">datasette-sqlite-authorizer 0.2</a></strong> - 2024-03-05<br />Configure Datasette to block operations using the SQLite set_authorizer mechanism</li>
<li>
<strong><a href="https://github.com/datasette/datasette-sqlite-debug-authorizer/releases/tag/0.1.1">datasette-sqlite-debug-authorizer 0.1.1</a></strong> - 2024-03-05<br />Debug SQLite authorizer calls</li>
<li>
<strong><a href="https://github.com/simonw/datasette-expose-env/releases/tag/0.2">datasette-expose-env 0.2</a></strong> - 2024-03-03<br />Datasette plugin to expose selected environment variables at /-/env for debugging</li>
<li>
<strong><a href="https://github.com/datasette/datasette-tail/releases/tag/0.1a0">datasette-tail 0.1a0</a></strong> - 2024-03-01<br />Tools for tailing your database</li>
<li>
<strong><a href="https://github.com/datasette/datasette-column-sum/releases/tag/0.1a0">datasette-column-sum 0.1a0</a></strong> - 2024-03-01<br />Sum the values in numeric Datasette columns</li>
<li>
<strong><a href="https://github.com/simonw/datasette-schema-versions/releases/tag/0.3">datasette-schema-versions 0.3</a></strong> - 2024-03-01<br />Datasette plugin that shows the schema version of every attached database</li>
<li>
<strong><a href="https://github.com/datasette/datasette-studio/releases/tag/0.1a1">datasette-studio 0.1a1</a></strong> - 2024-02-29<br />Datasette pre-configured with useful plugins. Experimental alpha.</li>
<li>
<strong><a href="https://github.com/simonw/datasette-scale-to-zero/releases/tag/0.3.1">datasette-scale-to-zero 0.3.1</a></strong> - 2024-02-29<br />Quit Datasette if it has not received traffic for a specified time period</li>
<li>
<strong><a href="https://github.com/simonw/datasette-explain/releases/tag/0.2.1">datasette-explain 0.2.1</a></strong> - 2024-02-28<br />Explain and validate SQL queries as you type them into Datasette</li>
</ul>
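<p>For anyone unfamiliar with the authorizer mechanism behind datasette-sqlite-authorizer: SQLite calls a callback for each action while it compiles a statement, and the callback can allow, deny or ignore it. Python's <code>sqlite3</code> module exposes this as <code>Connection.set_authorizer()</code>. Here's a minimal sketch, not how the plugin itself is configured, that blocks <code>DROP TABLE</code> and <code>DROP VIEW</code>; <code>block_drops()</code> is a hypothetical name.</p>

```python
import sqlite3


def block_drops(action, arg1, arg2, db_name, source):
    """Authorizer callback, called by SQLite for each action while compiling SQL."""
    if action in (sqlite3.SQLITE_DROP_TABLE, sqlite3.SQLITE_DROP_VIEW):
        return sqlite3.SQLITE_DENY  # makes the whole statement fail
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("create table docs (id integer primary key, title text)")
conn.set_authorizer(block_drops)

try:
    conn.execute("drop table docs")
except sqlite3.DatabaseError as error:
    print("Blocked:", error)

# Other statements are still allowed
print(conn.execute("select count(*) from docs").fetchone()[0])
```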
<h4 id="weeknotes-16-mar-blog-tils">TILs</h4>
<ul>
<li>
<a href="https://til.simonwillison.net/cloudflare/redirect-whole-domain">Redirecting a whole domain with Cloudflare</a> - 2024-03-15</li>
<li>
<a href="https://til.simonwillison.net/sqlite/floating-point-seconds">SQLite timestamps with floating point seconds</a> - 2024-03-14</li>
<li>
<a href="https://til.simonwillison.net/google/gmail-compose-url">Generating URLs to a Gmail compose window</a> - 2024-03-13</li>
<li>
<a href="https://til.simonwillison.net/javascript/jsr-esbuild">Using packages from JSR with esbuild</a> - 2024-03-02</li>
</ul>
Quoting Leopold Aschenbrenner, OpenAI - 2024-03-16T15:23:55+00:00 - https://simonwillison.net/2024/Mar/16/leopold-aschenbrenner/#atom-everything
<blockquote cite="https://twitter.com/leopoldasch/status/1768868127138549841"><p>One year since GPT-4 release. Hope you all enjoyed some time to relax; it’ll have been the slowest 12 months of AI progress for quite some time to come.</p></blockquote><p class="cite">— <a href="https://twitter.com/leopoldasch/status/1768868127138549841">Leopold Aschenbrenner, OpenAI</a></p>
npm install everything, and the complete and utter chaos that follows - 2024-03-16T05:18:51+00:00 - https://simonwillison.net/2024/Mar/16/npm-install-everything/#atom-everything
<p><a href="https://boehs.org/node/npm-everything">npm install everything, and the complete and utter chaos that follows</a></p>
<p>Here's an experiment which went really badly wrong: a team of mostly-students decided to see if it was possible to install every package from npm (all 2.5 million of them) on the same machine. As part of that experiment they created and published their own npm package that depended on every other package in the registry.</p>
<p>Unfortunately, in response to the leftpad incident a few years ago npm had introduced a policy that a package cannot be removed from the registry if at least one other package lists it as a dependency. The new "everything" package inadvertently prevented all 2.5m packages - including many that no other package depended on - from ever being removed!</p>
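<p>A toy model of that rule shows why a single package that depends on everything is so disruptive. This is a simplification: npm's real unpublish policy has additional conditions, and the package names here are illustrative.</p>

```python
from collections import defaultdict


def build_dependents(registry):
    """Invert {package: [dependencies]} into {package: set of packages that depend on it}."""
    dependents = defaultdict(set)
    for package, dependencies in registry.items():
        for dependency in dependencies:
            dependents[dependency].add(package)
    return dependents


def can_unpublish(package, registry):
    """Simplified rule: removable only if no other package depends on it."""
    return not (build_dependents(registry)[package] - {package})


registry = {
    "left-pad": [],
    "lonely-lib": [],
    "some-app": ["left-pad"],
}
print(can_unpublish("lonely-lib", registry))  # True

# Publish a package that depends on every other package in the registry...
registry["everything"] = list(registry)
print(can_unpublish("lonely-lib", registry))  # False
```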
<p>Via <a href="https://lobste.rs/s/46dgy1/npm_install_everything_complete_utter">lobste.rs</a></p>
Phanpy - 2024-03-16T01:34:04+00:00 - https://simonwillison.net/2024/Mar/16/phanpy/#atom-everything
<p><a href="https://phanpy.social/">Phanpy</a></p>
<p>Phanpy is "a minimalistic opinionated Mastodon web client" by Chee Aun.</p>
<p>I think that description undersells it. It's beautifully crafted and designed and has a ton of innovative ideas - the way it displays threads and replies, the "Catch-up" beta feature, it's all a really thoughtful and fresh perspective on how Mastodon can work.</p>
<p>I love that all Mastodon servers (including my own dedicated instance) offer a CORS-enabled JSON API which directly supports building these kinds of alternative clients.</p>
<p>Building a full-featured client like this one is a huge amount of work, but building a much simpler client that just displays the user's incoming timeline could be a pretty great educational project for people who are looking to deepen their front-end development skills.</p>
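<p>As a starting point for that kind of project, here's a minimal sketch of reading the home timeline. A browser client would make the same request with <code>fetch()</code> thanks to CORS; Python is used here to keep these examples in one language. The <code>/api/v1/timelines/home</code> endpoint and the <code>content</code>, <code>account.acct</code> and <code>reblog</code> fields come from the Mastodon API, but the helper names are hypothetical and stripping HTML with a regular expression is a shortcut, not something to ship.</p>

```python
import html
import json
import re
import urllib.request


def fetch_home_timeline(instance, access_token, limit=20):
    """Fetch recent statuses from the authenticated user's home timeline."""
    request = urllib.request.Request(
        f"https://{instance}/api/v1/timelines/home?limit={limit}",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def format_status(status):
    """Render one status as a single line of plain text."""
    # Boosts carry the original post in the "reblog" field
    status = status.get("reblog") or status
    text = re.sub(r"<[^>]+>", " ", status["content"])  # crude tag stripping
    text = html.unescape(" ".join(text.split()))
    return f'@{status["account"]["acct"]}: {text}'
```

<p>Usage would look like <code>for status in fetch_home_timeline("fedi.simonwillison.net", token): print(format_status(status))</code>, where <code>token</code> is an OAuth access token with read access to your account.</p>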
Google Scholar search: "certainly, here is" -chatgpt -llm - 2024-03-15T13:43:58+00:00 - https://simonwillison.net/2024/Mar/15/certainly-here-is-google-scholar/#atom-everything
<p><a href="https://scholar.google.fr/scholar?hl=fr&as_sdt=0%2C5&as_ylo=2023&q=%22certainly%2C+here+is%22+-chatgpt+-llm&oq=%22certainly+here+is%22+-chatgpt+-llm">Google Scholar search: "certainly, here is" -chatgpt -llm</a></p>
<p>Searching Google Scholar for "certainly, here is" turns up a huge number of academic papers that include parts that were evidently written by ChatGPT - sections that start with "Certainly, here is a concise summary of the provided sections:" are a dead giveaway.</p>
<p>Via <a href="https://twitter.com/emollick/status/1768526138614186026">Ethan Mollick</a></p>
Advanced Topics in Reminders and To Do Lists - 2024-03-15T02:38:55+00:00 - https://simonwillison.net/2024/Mar/15/advanced-topics-in-reminders-and-to-do-lists/#atom-everything
<p><a href="https://fredbenenson.medium.com/advanced-topics-in-reminders-and-to-do-lists-c5edec286670">Advanced Topics in Reminders and To Do Lists</a></p>
<p>Fred Benenson's advanced guide to the Apple Reminders ecosystem. I live my life by Reminders - I particularly like that you can set them with Siri, so "Hey Siri, remind me to check the chickens made it to bed at 7pm every evening" sets up a recurring reminder without having to fiddle around in the UI. Fred has some useful tips here I hadn't seen before.</p>
How Figma’s databases team lived to tell the scale - 2024-03-14T21:23:37+00:00 - https://simonwillison.net/2024/Mar/14/how-figmas-databases-team-lived-to-tell-the-scale/#atom-everything
<p><a href="https://www.figma.com/blog/how-figmas-databases-team-lived-to-tell-the-scale/">How Figma’s databases team lived to tell the scale</a></p>
<p>The best kind of scaling war story:</p>
<p>"Figma’s database stack has grown almost 100x since 2020. [...] In 2020, we were running a single Postgres database hosted on AWS’s largest physical instance, and by the end of 2022, we had built out a distributed architecture with caching, read replicas, and a dozen vertically partitioned databases."</p>
<p>I like the concept of "colos", their internal name for sharded groups of related tables arranged such that those tables can be queried using joins.</p>
<p>Also smart: separating the migration into "logical sharding" - where queries all still run against a single database, even though they are logically routed as if the database was already sharded - followed by "physical sharding" where the data is actually copied to and served from the new database servers.</p>
<p>Logical sharding was implemented using PostgreSQL views, which can accept both reads and writes:</p>
<pre><code>CREATE VIEW table_shard1 AS SELECT * FROM table
WHERE hash(shard_key) &gt;= min_shard_range AND hash(shard_key) &lt; max_shard_range</code></pre>
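<p>The same idea can be tried out locally with SQLite. This is a rough analogue rather than Figma's setup: <code>shard_hash()</code> is a hypothetical stand-in for their <code>hash()</code>, registered as a Python function, and two views split a 0-99 hash space between two logical shards of a single database.</p>

```python
import sqlite3
import zlib


def shard_hash(key):
    """Hypothetical stand-in for hash(shard_key): map any key to 0..99."""
    return zlib.crc32(str(key).encode()) % 100


conn = sqlite3.connect(":memory:")
conn.create_function("shard_hash", 1, shard_hash, deterministic=True)
conn.execute("create table files (id integer primary key, owner_id integer, name text)")
conn.executemany(
    "insert into files (owner_id, name) values (?, ?)",
    [(owner, f"file-{owner}-{n}") for owner in range(20) for n in range(3)],
)

# One view per logical shard, each covering a half-open slice of the hash space
shard_ranges = {"files_shard1": (0, 50), "files_shard2": (50, 100)}
for view, (low, high) in shard_ranges.items():
    conn.execute(
        f"create view {view} as select * from files "
        f"where shard_hash(owner_id) >= {low} and shard_hash(owner_id) < {high}"
    )

# Reads go through the views just as they would against real shards
for view in shard_ranges:
    count = conn.execute(f"select count(*) from {view}").fetchone()[0]
    print(view, count)
```

<p>One difference to be aware of: simple PostgreSQL views like Figma's are automatically updatable, which is what lets them accept writes. SQLite views are read-only unless you add <code>INSTEAD OF</code> triggers, so this sketch only demonstrates the read side.</p>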
<p>The final piece of the puzzle was DBProxy, a custom PostgreSQL query proxy written in Go that can parse the query to an AST and use that to decide which shard the query should be sent to. Impressively it also has a scatter-gather mechanism, so "select * from table" can be sent to all shards at once and the results combined back together again.</p>
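<p>The routing decision DBProxy makes can be illustrated with a toy router. The real proxy parses each query into an AST to find the shard key; this sketch skips parsing entirely and has the caller pass the key explicitly. <code>ShardRouter</code> and <code>shard_for()</code> are hypothetical names.</p>

```python
import sqlite3
import zlib


def shard_for(key, num_shards):
    """Map a shard key to a shard index with a stable hash."""
    return zlib.crc32(str(key).encode()) % num_shards


class ShardRouter:
    """Toy query router: single shard if the shard key is known, else scatter-gather."""

    def __init__(self, connections):
        self.shards = connections

    def query(self, sql, params=(), shard_key=None):
        if shard_key is not None:
            shard = self.shards[shard_for(shard_key, len(self.shards))]
            return shard.execute(sql, params).fetchall()
        # No shard key: scatter the query to every shard, gather all the rows
        rows = []
        for shard in self.shards:
            rows.extend(shard.execute(sql, params).fetchall())
        return rows


shards = [sqlite3.connect(":memory:") for _ in range(3)]
for shard in shards:
    shard.execute("create table files (owner_id integer, name text)")

router = ShardRouter(shards)
for owner in range(10):
    router.query("insert into files values (?, ?)", (owner, f"file-{owner}"), shard_key=owner)

print(router.query("select name from files where owner_id = ?", (7,), shard_key=7))
# [('file-7',)]
per_shard_counts = router.query("select count(*) from files")
print(sum(count for (count,) in per_shard_counts))  # 10
```

<p>Note that scatter-gather returns one row per shard for an aggregate like <code>count(*)</code>, so the combine step has to know how to merge results. That's a big part of what makes a real proxy hard to build.</p>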
<p>Via <a href="https://news.ycombinator.com/item?id=39706968">Hacker News</a></p>
Lateral Thinking with Withered Technology - 2024-03-14T04:13:57+00:00 - https://simonwillison.net/2024/Mar/14/lateral-thinking-with-weathered-technology/#atom-everything
<p><a href="https://en.wikipedia.org/wiki/Gunpei_Yokoi#Lateral_Thinking_with_Withered_Technology">Lateral Thinking with Withered Technology</a></p>
<p>Gunpei Yokoi's product design philosophy at Nintendo ("Withered" is also sometimes translated as "Weathered"). Use "mature technology that can be mass-produced cheaply", then apply lateral thinking to find radical new ways to use it.</p>
<p>This has echoes for me of Dan McKinley's "Choose Boring Technology", which argues that in software projects you should default to a proven, stable stack so you can focus your innovation tokens on the problems that are unique to your project.</p>
Guidepup - 2024-03-14T04:07:49+00:00 - https://simonwillison.net/2024/Mar/14/guidepup/#atom-everything
<p><a href="https://github.com/guidepup/guidepup">Guidepup</a></p>
<p>I've been hoping to find something like this for years. Guidepup is "a screen reader driver for test automation" - you can use it to automate both VoiceOver on macOS and NVDA on Windows, and it can both drive the screen reader for automated tests and even produce a video at the end of the test.</p>
<p>Also available: @guidepup/playwright, providing integration with the Playwright browser automation testing framework.</p>
<p>I'd love to see open source JavaScript libraries both use something like this for their testing and publish videos of the tests to demonstrate how they work in these common screen readers.</p>
llm-claude-3 0.3 - 2024-03-13T21:18:28+00:00 - https://simonwillison.net/2024/Mar/13/llm-claude-3-03/#atom-everything
<p><a href="https://github.com/simonw/llm-claude-3/releases/tag/0.3">llm-claude-3 0.3</a></p>
<p>Anthropic released Claude 3 Haiku today, their least expensive model: $0.25/million tokens of input and $1.25/million tokens of output (GPT-3.5 Turbo is $0.50/$1.50). Unlike GPT-3.5, Haiku also supports image inputs.</p>
<p>I just released a minor update to my llm-claude-3 LLM plugin adding support for the new model.</p>
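<p>To put those prices in context, here's the arithmetic for a hypothetical workload of one million input tokens and 200,000 output tokens, using the per-million-token prices quoted above. The dictionary keys are just labels.</p>

```python
# USD per million tokens, as quoted above (March 2024)
PRICES = {
    "claude-3-haiku": {"input": 0.25, "output": 1.25},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}


def cost_usd(model, input_tokens, output_tokens):
    """Total cost of a workload in US dollars."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


for model in PRICES:
    print(model, cost_usd(model, 1_000_000, 200_000))
```

<p>That's $0.50 for Haiku against $0.80 for GPT-3.5 Turbo, 37.5% cheaper for this input-heavy mix. The gap narrows for output-heavy workloads, since the output prices are closer together.</p>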
Berkeley Function-Calling Leaderboard - 2024-03-13T17:26:18+00:00 - https://simonwillison.net/2024/Mar/13/berkeley-function-calling-leaderboard/#atom-everything
<p><a href="https://gorilla.cs.berkeley.edu/leaderboard.html">Berkeley Function-Calling Leaderboard</a></p>
<p>The team behind Berkeley's Gorilla OpenFunctions model - an Apache 2 licensed LLM trained to provide OpenAI-style structured JSON functions - also maintain a leaderboard of different function-calling models. Their own Gorilla model is the only non-proprietary model in the top ten.</p>
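<p>For anyone who hasn't used function calling: the application describes each function as a name, a description and a JSON Schema for its arguments; the model responds with the name of a function to call plus its arguments as a JSON string; and the application runs it. The leaderboard measures how reliably models produce those structured calls. Here's a sketch of the application side. <code>get_weather()</code> and the model output are hypothetical, and this is not Gorilla's API.</p>

```python
import json

# An OpenAI-style function description: name, description and a JSON Schema
WEATHER_FUNCTION = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def get_weather(city, unit="celsius"):
    # Hypothetical stand-in: a real version would call a weather service
    return {"city": city, "unit": unit, "temperature": 21}


FUNCTIONS = {"get_weather": (WEATHER_FUNCTION, get_weather)}


def dispatch(function_call):
    """Validate and run a call requested by the model."""
    schema, handler = FUNCTIONS[function_call["name"]]
    arguments = json.loads(function_call["arguments"])  # arrives as a JSON string
    missing = [name for name in schema["parameters"]["required"] if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return handler(**arguments)


# A hypothetical model response asking for a function call
model_output = {"name": "get_weather", "arguments": '{"city": "Berkeley"}'}
print(dispatch(model_output))
```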
Quoting Phillip Carter - 2024-03-13T15:02:48+00:00 - https://simonwillison.net/2024/Mar/13/phillip-carter/#atom-everything
<blockquote cite="https://twitter.com/_cartermp/status/1767923038404985115"><p>The talk track I've been using is that LLMs are easy to take to market, but hard to keep in the market long-term. All the hard stuff comes when you move past the demo and get exposure to real users.<br><br>And that's where you find that all the nice little things you got neatly working fall apart. And you need to prompt differently, do different retrieval, consider fine-tuning, redesign interaction, etc. People will treat this stuff differently from "normal" products, creating unique challenges.</p></blockquote><p class="cite">— <a href="https://twitter.com/_cartermp/status/1767923038404985115">Phillip Carter</a></p>
pywebview 5 - 2024-03-13T14:15:46+00:00 - https://simonwillison.net/2024/Mar/13/pywebview-5/#atom-everything
<p><a href="https://pywebview.flowrl.com/blog/pywebview5.html">pywebview 5</a></p>
<p>pywebview is a library for building desktop (and now Android) applications using Python, based on the idea of opening windows that use the operating system's built-in web view to render an interface for the user - styled such that the fact they run on HTML, CSS and JavaScript is mostly hidden from the end-user.</p>
<p>It's a bit like a much simpler version of Electron. Unlike Electron it doesn't bundle a full browser engine (Electron bundles Chromium), which reduces the size of the dependency a lot but does mean that cross-browser differences (quite rare these days) do come back into play.</p>
<p>I tried out their getting started example and it's very pleasant to use - import webview, create a window and then start the application loop running to display it.</p>
<p>You can register JavaScript functions that call back to Python, and you can execute JavaScript in a window from your Python code.</p>
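<p>Here's a sketch along the lines of that getting started example, extended to cover both directions: a <code>js_api</code> object whose methods JavaScript can call via <code>pywebview.api</code>, and <code>evaluate_js()</code> to run JavaScript from Python. The <code>Api</code> class, <code>greet()</code> method and HTML are illustrative choices, and details are worth double-checking against the pywebview 5 documentation.</p>

```python
HTML = """<!doctype html>
<html>
  <head><title>pywebview demo</title></head>
  <body>
    <p id="greeting">Waiting for Python...</p>
    <script>
      // pywebview fires this event once window.pywebview.api is available
      window.addEventListener("pywebviewready", async () => {
        const message = await pywebview.api.greet("world");
        document.getElementById("greeting").textContent = message;
      });
    </script>
  </body>
</html>"""


class Api:
    """Public methods on this object are callable from JavaScript."""

    def greet(self, name):
        return f"Hello, {name}!"


def on_start(window):
    # Called after the GUI loop starts; evaluate_js returns the JavaScript result
    print("Page title:", window.evaluate_js("document.title"))


def main():
    import webview  # pip install pywebview

    window = webview.create_window("Hello pywebview", html=HTML, js_api=Api())
    webview.start(on_start, window)
```

<p>Calling <code>main()</code> opens the window. Calls from JavaScript into Python return promises, hence the <code>await</code>.</p>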
<p>Via <a href="https://news.ycombinator.com/item?id=39665828">Show HN</a></p>
The Bing Cache thinks GPT-4.5 is coming - 2024-03-13T02:29:13+00:00 - https://simonwillison.net/2024/Mar/13/the-bing-cache-thinks-gpt-45-is-coming/#atom-everything
<p><a href="https://twitter.com/TheXeophon/status/1767586070047203680">The Bing Cache thinks GPT-4.5 is coming</a></p>
<p>I was able to replicate this myself earlier today: searching Bing (or, apparently, DuckDuckGo) for "openai announces gpt-4.5 turbo" would return a link to a 404 page at openai.com/blog/gpt-4-5-turbo, with a search result snippet that announced a 256,000 token context window and a knowledge cut-off of June 2024.</p>
<p>I thought the knowledge cut-off must have been a hallucination, but someone got a screenshot of it showing up in the search engine snippet which would suggest that it was real text that got captured in a cache somehow.</p>
<p>I guess this means we might see GPT-4.5 in June then? I have trouble believing that OpenAI would release a model in June with a June knowledge cut-off, given how much time they usually spend red-teaming their models before release.</p>
<p>Or maybe it was one of those glitches like when a newspaper accidentally publishes a pre-written obituary for someone who hasn't died yet - OpenAI may have had a draft post describing a model that doesn't exist yet and it accidentally got exposed to search crawlers.</p>
Astro DB - 2024-03-12T18:02:13+00:00 - https://simonwillison.net/2024/Mar/12/astro-db/#atom-everything
<p><a href="https://astro.build/db/">Astro DB</a></p>
<p>A new scale-to-zero hosted SQLite offering, described as "A fully-managed SQL database designed exclusively for Astro". It's built on top of LibSQL, the SQLite fork maintained by the Turso database team.</p>
<p>Astro DB encourages defining your tables with TypeScript, and querying them via the Drizzle ORM.</p>
<p>Running Astro locally uses a local SQLite database. Deploying to Astro Cloud switches you over to their DB product, where the free tier currently includes 1GB of storage, one billion row reads per month and one million row writes per month.</p>
<p>Astro itself is a "web framework for content-driven websites" - so hosted SQLite is a bit of an unexpected product from them, though it does broadly fit the ecosystem they are building.</p>
<p>This approach reminds me of how Deno K/V works - another local SQLite storage solution that offers a proprietary cloud hosted option for deployment.</p>
gh-116167: Allow disabling the GIL with PYTHON_GIL=0 or -X gil=02024-03-12T05:40:41+00:002024-03-12T05:40:41+00:00https://simonwillison.net/2024/Mar/12/allow-disabling-the-gil/#atom-everything
<p><a href="https://github.com/python/cpython/pull/116338">gh-116167: Allow disabling the GIL with PYTHON_GIL=0 or -X gil=0</a></p>
<p>Merged into python:main 14 hours ago. Looks like the first phase of Sam Gross's phenomenal effort to provide a GIL-free Python (here via an explicit opt-in) will ship in Python 3.13.</p>
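<p>A minimal sketch for checking whether the opt-in took effect, assuming Python 3.13: <code>sys._is_gil_enabled()</code> is a private helper added in 3.13, so the code falls back to "GIL on" for older versions. The CPU-bound thread timing is a hypothetical illustration of where a GIL-free build should differ, not something from the PR itself.</p>

```python
import os
import sys
import threading
import time


def gil_enabled() -> bool:
    """Report whether the GIL is active in this interpreter.

    sys._is_gil_enabled() only exists on Python 3.13+; older
    versions always have the GIL, so default to True there.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


def count_down(n: int) -> None:
    # Pure-Python CPU-bound loop: with the GIL, threads running this
    # take turns; on a free-threaded build they can run in parallel.
    while n > 0:
        n -= 1


def timed_threads(workers: int, n: int) -> float:
    """Run `workers` threads of count_down(n) and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("PYTHON_GIL =", os.environ.get("PYTHON_GIL"))
    print("GIL enabled:", gil_enabled())
    print(f"4 threads: {timed_threads(4, 2_000_000):.2f}s")
```

<p>I believe the environment variable and <code>-X</code> option only apply to a free-threaded build, so you would compare something like <code>PYTHON_GIL=0 python3.13t script.py</code> against <code>PYTHON_GIL=1 python3.13t script.py</code>. The <code>python3.13t</code> executable name is an assumption; it varies by installer.</p>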
Speedometer 3.0: The Best Way Yet to Measure Browser Performance2024-03-12T04:26:06+00:002024-03-12T04:26:06+00:00https://simonwillison.net/2024/Mar/12/speedometer-30/#atom-everything
<p><a href="https://webkit.org/blog/15131/speedometer-3-0-the-best-way-yet-to-measure-browser-performance/">Speedometer 3.0: The Best Way Yet to Measure Browser Performance</a></p>
<p>The new browser performance testing suite, released as a collaboration between Blink, Gecko, and WebKit. It's fun to run this in your browser and watch it rattle through 580 tests written using a wide variety of modern JavaScript frameworks and visualization libraries.</p>
NICAR 2024 Tipsheets & Audio2024-03-11T01:14:39+00:002024-03-11T01:14:39+00:00https://simonwillison.net/2024/Mar/11/nicar-2024-tipsheets-audio/#atom-everything
<p><a href="https://www.ire.org/training/conferences/nicar-2024/nicar24-tipsheets-audio/">NICAR 2024 Tipsheets & Audio</a></p>
<p>The NICAR data journalism conference was outstanding this year: ~1100 attendees, and every slot on the schedule had at least 2 sessions that I wanted to attend (and usually a lot more).</p>
<p>If you're interested in the intersection of data analysis and journalism it really should be a permanent fixture on your calendar. It's fantastic.</p>
<p>Here's the official collection of handouts (NICAR calls them tipsheets) and audio recordings from this year's event.</p>
S3 is files, but not a filesystem2024-03-10T11:47:34+00:002024-03-10T11:47:34+00:00https://simonwillison.net/2024/Mar/10/s3-is-not-a-filesystem/#atom-everything
<p><a href="https://calpaterson.com/s3.html">S3 is files, but not a filesystem</a></p>
<p>Cal Paterson helps some concepts click into place for me: S3 imitates a file system but has a number of critical missing features, the most important of which is the lack of partial updates. Any time you want to modify even a few bytes in a file you have to upload and overwrite the entire thing. Almost every database system is dependent on partial updates to function, which is why there are so few databases that can use S3 directly as a backend storage mechanism.</p>
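<p>To make the "no partial updates" point concrete, here's a toy sketch. The <code>ObjectStore</code> class is a hypothetical dict-backed stand-in for an S3-style store, not the boto3 or S3 API: changing two bytes means downloading, splicing and re-uploading the whole object, whereas a local file can be patched in place with <code>seek()</code> and <code>write()</code>.</p>

```python
import os
import tempfile


class ObjectStore:
    """Hypothetical stand-in for an S3-style store: whole objects only."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.bytes_uploaded = 0

    def put(self, key: str, body: bytes) -> None:
        # Every write replaces the entire object.
        self._objects[key] = bytes(body)
        self.bytes_uploaded += len(body)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def patch_object(store: ObjectStore, key: str, offset: int, data: bytes) -> None:
    # No partial update: fetch everything, splice, re-upload everything.
    body = bytearray(store.get(key))
    body[offset:offset + len(data)] = data
    store.put(key, bytes(body))


def patch_file(path: str, offset: int, data: bytes) -> None:
    # A real filesystem lets us overwrite just the bytes we changed.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


if __name__ == "__main__":
    store = ObjectStore()
    store.put("table.db", b"\x00" * 1_000_000)
    patch_object(store, "table.db", 10, b"hi")
    # 1 MB initial upload + another 1 MB to change two bytes
    print(store.bytes_uploaded)  # 2000000

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 1_000_000)
    patch_file(tmp.name, 10, b"hi")  # only the 2 changed bytes are written via the file API
    os.remove(tmp.name)
```

<p>A database page write maps naturally onto <code>patch_file</code>; on S3 every such write turns into the full <code>patch_object</code> round trip, which is why so few databases can use it directly as backend storage.</p>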
<p>Via <a href="https://lobste.rs/s/t9d5z4/s3_is_files_not_filesystem">Lobste.rs</a></p>
datasette/studio2024-03-10T03:03:42+00:002024-03-10T03:03:42+00:00https://simonwillison.net/2024/Mar/10/datasette-studio-on-codespaces/#atom-everything
<p><a href="https://github.com/datasette/studio">datasette/studio</a></p>
<p>I'm trying a new way to make Datasette available for small personal data manipulation projects, using GitHub Codespaces.</p>
<p>This repository is designed to be opened directly in Codespaces - detailed instructions in the README.</p>
<p>When the container starts it installs the datasette-studio family of plugins - including CSV upload, some enrichments and a few other useful features - then starts the server running and provides a big green button to click to access the server via GitHub's port forwarding mechanism.</p>
Quoting Ethan Mollick2024-03-09T03:55:00+00:002024-03-09T03:55:00+00:00https://simonwillison.net/2024/Mar/9/ethan-mollick/#atom-everything
<blockquote cite="https://twitter.com/emollick/status/1766303368211767601"><p>In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model.<br><br>Less than 5% has spent the required 10 hours to know how they tick.</p></blockquote><p class="cite">— <a href="https://twitter.com/emollick/status/1766303368211767601">Ethan Mollick</a>
Coroutines and web components2024-03-09T03:38:53+00:002024-03-09T03:38:53+00:00https://simonwillison.net/2024/Mar/9/coroutines-and-web-components/#atom-everything
<p><a href="https://lorenzofox.dev/posts/component-as-infinite-loop/">Coroutines and web components</a></p>
<p>I like using generators in Python but I rarely knowingly use them in JavaScript - I'm probably most exposed to them through Observable, which uses them extensively under the hood as a mostly hidden implementation detail.</p>
<p>Laurent Renard here shows some absolutely ingenious tricks with them as a way of building stateful Web Components.</p>
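<p>Laurent's examples are JavaScript Web Components, but the core idea translates to Python generators. Here's a rough analogue, assuming a toy "component" whose state lives in the local variables of an infinite generator loop and which receives events via <code>send()</code>. The <code>counter_component</code> name and the event strings are hypothetical.</p>

```python
from typing import Generator, Optional


def counter_component() -> Generator[str, Optional[str], None]:
    """A 'component' as an infinite loop: state lives in local variables."""
    count = 0
    # Initial render; pause here until the first event is sent in.
    event = yield f"Count: {count}"
    while True:
        if event == "click":
            count += 1
        elif event == "reset":
            count = 0
        # Yield the new rendering and pause until the next event arrives.
        event = yield f"Count: {count}"


if __name__ == "__main__":
    component = counter_component()
    print(next(component))          # Count: 0  (primes the generator)
    print(component.send("click"))  # Count: 1
    print(component.send("click"))  # Count: 2
    print(component.send("reset"))  # Count: 0
```

<p>The appeal is that there's no separate state object or <code>this.state</code> bookkeeping: the paused stack frame is the component's state.</p>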
<p>Via <a href="https://news.ycombinator.com/item?id=39646749">Hacker News</a></p>
The GPT-4 barrier has finally been broken2024-03-08T18:02:39+00:002024-03-08T18:02:39+00:00https://simonwillison.net/2024/Mar/8/gpt-4-barrier/#atom-everything
<p>Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of "vibes". Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks - and had been for more than a year.</p>
<p>Today that barrier has finally been smashed. We have four new models, all released to the public in the last four weeks, that are benchmarking near or even above GPT-4. And the all-important vibes are good, too!</p>
<p>Those models come from four different vendors.</p>
<ul>
<li>
<a href="https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/">Google Gemini 1.5</a>, February 15th. I wrote about this <a href="https://simonwillison.net/2024/Feb/21/gemini-pro-video/">the other week</a>: the signature feature is an incredible one million token context, nearly 8 times the length of GPT-4 Turbo. It can also process video, which it does by breaking it up into one frame per second - but you can fit a LOT of frames (258 tokens each) in a million tokens.</li>
<li>
<a href="https://mistral.ai/news/mistral-large/">Mistral Large</a>, February 26th. I have a big soft spot for Mistral given how exceptional their openly licensed models are - Mistral 7B runs on my iPhone, and Mixtral-8x7B is the best model I've successfully run on my laptop. Medium and Large are their two hosted but closed models, and while Large may not quite outperform GPT-4 it's clearly in the same class. I can't wait to see what they put out next.</li>
<li>
<a href="https://www.anthropic.com/news/claude-3-family">Claude 3 Opus</a>, March 4th. This is just a few days old and wow: the vibes on this one are <em>really</em> strong. People I know who evaluate LLMs closely are rating it as the first clear GPT-4 beater. I've switched to it as my default model for a bunch of things, most conclusively for code - I've had several experiences recently where a complex GPT-4 prompt that produced broken JavaScript gave me a perfect working answer when run through Opus instead (<a href="https://fedi.simonwillison.net/@simon/112057299607427949">recent example</a>). I also enjoyed Anthropic research engineer Amanda Askell's detailed <a href="https://simonwillison.net/2024/Mar/7/claude-3-system-prompt-explained/">breakdown of their system prompt</a>.</li>
<li>
<a href="https://inflection.ai/inflection-2-5">Inflection-2.5</a>, March 7th. This one came out of left field for me: Inflection make <a href="https://hello.pi.ai/">Pi</a>, a conversation-focused chat interface that felt a little gimmicky to me when I first tried it. Then just the other day they announced that their brand new 2.5 model benchmarks favorably against GPT-4, and Ethan Mollick - one of my favourite <a href="https://interconnected.org/home/2023/03/22/tuning">LLM sommeliers</a> - noted that it <a href="https://twitter.com/emollick/status/1765801629788647468">deserves more attention</a>.</li>
</ul>
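<p>A quick back-of-envelope check of the Gemini 1.5 numbers above, assuming a 1,000,000 token window, one frame per second at 258 tokens per frame, and GPT-4 Turbo's 128,000 token context. It ignores tokens for the prompt and the response, so treat the video length as an upper bound.</p>

```python
CONTEXT_TOKENS = 1_000_000
GPT4_TURBO_CONTEXT = 128_000
TOKENS_PER_FRAME = 258
FRAMES_PER_SECOND = 1

# How many GPT-4 Turbo contexts fit in one Gemini 1.5 context
ratio = CONTEXT_TOKENS / GPT4_TURBO_CONTEXT
# Whole frames that fit, and the video duration that represents
max_frames = CONTEXT_TOKENS // TOKENS_PER_FRAME
max_minutes = max_frames / FRAMES_PER_SECOND / 60

print(f"{ratio:.1f}x GPT-4 Turbo")   # 7.8x GPT-4 Turbo
print(f"{max_frames} frames")         # 3875 frames
print(f"~{max_minutes:.0f} minutes")  # ~65 minutes
```

<p>So "nearly 8 times" works out to 7.8x, and a million tokens holds roughly an hour of video at that frame rate.</p>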
<p>Not every one of these models is a clear GPT-4 beater, but every one of them is a contender. And like I said, a month ago we had none at all.</p>
<p>There are a couple of disappointments here.</p>
<p>Firstly, none of those models are openly licensed or weights-available. I imagine the resources they need to run would make them impractical for most people, but after a year that has seen enormous leaps forward in the openly licensed model category it's sad to see the very best models remain strictly proprietary.</p>
<p>And unless I've missed something, none of these models are being transparent about their training data. This also isn't surprising: the lawsuits have started flying now over training on unlicensed copyrighted data, and negative public sentiment continues to grow over the murky ethical ground on which these models are built.</p>
<p>It's still disappointing to me. While I'd love to see a model trained entirely on public domain or licensed content - and it feels like we should start to see some strong examples of that pretty soon - it's not clear to me that it's possible to build something that competes with GPT-4 without dipping deep into unlicensed content for the training. I'd love to be proved wrong on that!</p>
<p>In the absence of such a <a href="https://simonwillison.net/2022/Aug/29/stable-diffusion/#ai-vegan">vegan model</a> I'll take training transparency over what we are seeing today. I use these models a lot, and knowing how a model was trained is a powerful factor in helping decide which questions and tasks a model is likely suited for. Without training transparency we are all left reading tea leaves, sharing conspiracy theories and desperately trying to figure out the vibes.</p>
You can now train a 70b language model at home2024-03-08T10:47:53+00:002024-03-08T10:47:53+00:00https://simonwillison.net/2024/Mar/8/you-can-now-train-a-70b-language-model-at-home/#atom-everything
<p><a href="https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html">You can now train a 70b language model at home</a></p>
<p>Jeremy Howard and team: "Today, we’re releasing Answer.AI’s first project: a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090)."</p>
<p>This is about fine-tuning an existing model, not necessarily training one from scratch.</p>
<p>There are two tricks at play here. The first is QLoRA, which can be used to train quantized models despite the reduced precision usually preventing gradient descent from working correctly.</p>
<p>QLoRA can bring the memory requirements for a 70b model down to 35GB, but gaming GPUs aren't quite that big. The second trick is Meta's Fully Sharded Data Parallel or FSDP library, which can shard a model across GPUs. Two consumer 24GB GPUs can then handle the 70b training run.</p>
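<p>The 35GB figure is just 70 billion parameters at 4 bits each. Here's that arithmetic, with a 16-bit baseline for comparison and an even split across two 24GB cards. This is a sketch under simplifying assumptions: it counts only the quantized base weights (1 GB = 10<sup>9</sup> bytes) and assumes FSDP divides them evenly; the LoRA adapter weights, activations, gradients and optimizer state need memory too, so the real headroom is tighter.</p>

```python
def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for model weights alone: params * bits / 8 bytes, in GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 70e9  # a 70b model

for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(PARAMS, bits):.0f} GB")
# 16-bit weights: 140 GB
# 4-bit weights: 35 GB

# FSDP shards the weights, so each of two GPUs holds about half
per_gpu = weight_memory_gb(PARAMS, 4) / 2
print(f"per GPU: {per_gpu:.1f} GB of 24 GB")  # per GPU: 17.5 GB of 24 GB
```

<p>Neither trick alone gets there: 35GB still overflows a single 24GB card, and sharding 140GB of 16-bit weights across two cards would need 70GB each.</p>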
<p>Via <a href="https://twitter.com/jeremyphoward/status/1765868543235805232">@jeremyphoward</a></p>