Simon Willison’s Weblog

1,475 posts tagged “datasette”

Datasette is an open source tool for exploring and publishing data.

2023

Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor

I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.

[... 1,255 words]

Datasette: Gather feedback on new ?_extra= design. I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release. It’s a change to the default JSON format from the Datasette API: the new format is much slimmer, and can be expanded using a new ?_extra= query string parameter. I’m desperately keen on getting feedback on this change! This issue has more details and a call for feedback.

# 22nd March 2023, 11:14 pm / json, datasette
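
As a rough illustration of the ?_extra= idea, here is a small Python sketch that builds a table JSON URL and asks for extra keys. The host, database and table names are placeholders, the extra names (`count`, `columns`) are assumed for illustration, and repeating the parameter once per extra is also an assumption rather than something the post confirms.

```python
from urllib.parse import urlencode


def table_json_url(base_url, database, table, extras=()):
    """Build a Datasette-style table JSON URL, adding one _extra= pair per requested extra."""
    url = f"{base_url.rstrip('/')}/{database}/{table}.json"
    # Each requested extra becomes its own _extra=... pair in the query string
    query = urlencode([("_extra", name) for name in extras])
    return f"{url}?{query}" if query else url


# Hypothetical instance and extra names, for illustration only
print(table_json_url("https://example.com", "mydb", "mytable", ["count", "columns"]))
```

With no extras requested, the helper returns the bare `.json` URL, which corresponds to the slimmer default format described above.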

Release datasette-atom 0.9 — Datasette plugin that adds a .atom output format
Release datasette-simple-html 0.2 — Datasette SQL functions for very simple HTML operations
Release datasette 0.64.2 — An open source multi-tool for exploring and publishing data
Release datasette-simple-html 0.1 — Datasette SQL functions for very simple HTML operations
Release datasette-app 0.2.3 — The Datasette macOS application

Using Datasette in GitHub Codespaces. A new Datasette tutorial showing how it can be run inside GitHub Codespaces—GitHub’s browser-based development environments—in order to explore and analyze data. I’ve been using Codespaces to run tutorials recently and it’s absolutely fantastic, because it puts every tutorial attendee on a level playing field with respect to their development environments.

# 24th February 2023, 12:40 am / tutorials, datasette, github, github-codespaces

Release datasette-codespaces 0.1.1 — Conveniences for running Datasette on GitHub Codespaces
Release datasette-codespaces 0.1 — Conveniences for running Datasette on GitHub Codespaces

Analytics: Hacker News v.s. a tweet from Elon Musk

My post Bing: “I will not harm you unless you harm me first” really took off.

[... 817 words]

Release datasette-app-support 0.11.8 — Part of https://github.com/simonw/datasette-app
Release datasette-app-support 0.11.7 — Part of https://github.com/simonw/datasette-app

Introducing sqlite-vss: A SQLite Extension for Vector Search (via) This latest SQLite extension from Alex Garcia is possibly his best yet: it adds FAISS-powered vector similarity search directly to SQLite, enabling fast KNN similarity lookups against a virtual table that feels a lot like SQLite’s own built-in full text search feature. This write-up includes interactive demos using Datasette called from an Observable notebook, running similarity searches against an index of 200,000 news headlines and summaries in less than 50ms.

# 10th February 2023, 10:53 pm / vector-search, sqlite, datasette, observable, alex-garcia
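
To make "KNN similarity lookup" concrete, here is a brute-force sketch in plain Python. It is not the sqlite-vss API and does not use FAISS: it only shows what such a lookup computes, using made-up three-dimensional vectors and cosine distance as an assumed metric.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(query, vectors, k=3):
    """Brute-force k-nearest-neighbour search: score every row, keep the k closest."""
    scored = [(cosine_distance(query, vec), row_id) for row_id, vec in vectors.items()]
    scored.sort()
    return [row_id for _, row_id in scored[:k]]


# Toy "embeddings" keyed by row id (made-up numbers, not real headline data)
headlines = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
    4: [0.0, 0.0, 1.0],
}
print(knn([1.0, 0.05, 0.0], headlines, k=2))  # → [1, 2]
```

The point of a dedicated vector index is to make this kind of lookup fast at sizes where a Python loop over every row would not be.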

Weeknotes: A bunch of things I learned this week, plus datasette-explain

The Datasette table view refactor, JSON redesign and ?_extra= work continue this week, mainly in this ongoing pull request and this tracking issue.

[... 1,528 words]

Release datasette-explain 0.1a0 — Explain and validate SQL queries as you type them into Datasette

Making SQLite extensions pip install-able (via) Alex Garcia figured out how to bundle a compiled SQLite extension in a Python wheel (building different wheels for different platforms) and publish them to PyPI. This is a huge leap forward in terms of the usability of SQLite extensions, which have previously been pretty difficult to actually install and run. Alex also created Datasette plugins that depend on his packages, so you can now “datasette install datasette-sqlite-regex” (or datasette-sqlite-ulid, datasette-sqlite-fastrand, datasette-sqlite-jsonschema) to gain access to his custom SQLite extensions in your Datasette instance. It even works with “datasette publish --install” to deploy to Vercel, Fly.io and Cloud Run.

# 6th February 2023, 7:44 pm / sqlite, plugins, datasette, python, pip, alex-garcia
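
For context on what "install and run" involves, here is a minimal sketch of loading a compiled SQLite extension with Python's standard-library sqlite3 module. It does not use Alex's packages, and the file path is a hypothetical placeholder; the helper name `try_load_extension` is invented for this example.

```python
import sqlite3


def try_load_extension(conn, path):
    """Attempt to load a compiled SQLite extension; return True on success."""
    try:
        conn.enable_load_extension(True)
    except AttributeError:
        # Some Python builds ship sqlite3 without extension-loading support
        return False
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        # Raised when the file is missing or is not a loadable extension
        return False
    finally:
        # Turn extension loading back off once we are done
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
# Hypothetical path: a real one would point at a compiled .so/.dylib/.dll file
print(try_load_extension(conn, "/path/to/hypothetical_extension"))
```

Finding or building the right binary for that path on each platform is the step that publishing platform-specific wheels to PyPI takes care of.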

datasette-scraper, Big Local News and other weeknotes

In addition to exploring the new MusicCaps training and evaluation data I’ve been working on the big Datasette JSON refactor, and getting excited about a Datasette project that I didn’t work on at all.

[... 1,744 words]

datasette-scraper walkthrough on YouTube (via) datasette-scraper is Colin Dellow’s new plugin that turns Datasette into a powerful web scraping tool, with a web UI based on plugin-driven customizations to the Datasette interface. It’s really impressive, and this ten minute demo shows quite how much it is capable of: it can crawl sitemaps and fetch pages, caching them (using zstandard with optional custom dictionaries for extra compression) to speed up subsequent crawls... and you can add your own plugins to extract structured data from crawled pages and save it to a separate SQLite table!

# 29th January 2023, 5:23 am / scraping, datasette, plugins, colin-dellow

Examples of sites built using Datasette (via) I gave the examples page on the Datasette website a significant upgrade today: it now includes screenshots (taken using shot-scraper) of six projects chosen to illustrate the variety of problems Datasette can be used to tackle.

# 29th January 2023, 3:40 am / projects, shot-scraper, datasette

We’ve built many tools for publishing to the web - but I want to make the claim that we have underdeveloped the tools and platforms for publishing collections, indexes and small databases. It’s too hard to build these kinds of experiences, too hard to maintain them and a lack of collaborative tools.

Tom Critchlow

# 28th January 2023, 4:43 pm / datasette

Release datasette-render-markdown 2.1.1 — Datasette plugin for rendering Markdown

Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model

Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).

[... 1,323 words]

Release datasette-youtube-embed 0.1 — Turn YouTube URLs into embedded players in Datasette

datasette-granian (via) Granian is a new Python web server—similar to Gunicorn—written in Rust. I built a small plugin that adds a “datasette granian” command starting a Granian server that serves Datasette’s ASGI application, using the same pattern as my existing datasette-gunicorn plugin.

# 20th January 2023, 2:12 am / rust, datasette, asgi
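
For readers unfamiliar with ASGI, here is a minimal sketch of the interface that a server such as Granian or Gunicorn calls into. The toy `app` below stands in for Datasette's ASGI application, and `call_app` is a hypothetical stand-in for a server; neither is code from the plugin.

```python
import asyncio


async def app(scope, receive, send):
    """A minimal ASGI application: answer every HTTP request with plain text."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


async def call_app(asgi_app, path="/"):
    """Stand-in for a server: invoke the app once and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call_app(app))
print(messages[0]["status"], messages[1]["body"])
```

Because the application is just an async callable with this signature, any ASGI-capable server can serve it, which is why the same plugin pattern can be reused across servers.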

Release datasette-granian 0.1a0 — Run Datasette using the Granian HTTP server
Release datasette-faiss 0.2 — Maintain a FAISS index for specified Datasette tables

Datasette is my data hammer (via) Jeremia Kimelman—a data journalist at CalMatters in Sacramento—enthuses about how he uses Datasette as his default hammer for all kinds of data projects—in particular how much he appreciates Datasette’s focus on URLs. So nice to see this!

# 17th January 2023, 5:23 pm / data-journalism, datasette

Weeknotes: AI hacking and a SpatiaLite tutorial

Short weeknotes this time because the key things I worked on have already been covered here:

How to implement Q&A against your documentation with GPT3, embeddings and Datasette

If you’ve spent any time with GPT-3 or ChatGPT, you’ve likely thought about how useful it would be if you could point them at a specific, current collection of text or documentation and have it use that as part of its input for answering questions.

[... 3,447 words]