Simon Willison’s Weblog

1,473 posts tagged “datasette”

Datasette is an open source tool for exploring and publishing data.

2023

Release datasette 1.0a3 — An open source multi-tool for exploring and publishing data

Weeknotes: Plugins for LLM, sqlite-utils and Datasette

The principal theme for the past few weeks has been plugins.

[... 1,203 words]

Release datasette-upload-dbs 0.2 — Upload SQLite database files to Datasette
Release datasette-auth-tokens 0.4a0 — Datasette plugin for authenticating access using API tokens

Weeknotes: Self-hosted language models with LLM plugins, a new Datasette tutorial, a dozen package releases, a dozen TILs

A lot of stuff to cover from the past two and a half weeks.

[... 1,742 words]

Data analysis with SQLite and Python. I turned my 2hr45m workshop from PyCon into the latest official tutorial on the Datasette website. It includes an extensive handout which should be useful independently of the video itself.

# 2nd July 2023, 4:48 pm / speaking, sqlite, datasette, python, my-talks

Release datasette-render-markdown 2.2 — Datasette plugin for rendering Markdown
Release datasette-upload-csvs 0.8.3 — Datasette plugin for uploading CSV files and converting them to database tables
Release datasette-render-images 0.4 — Datasette plugin that renders binary blob images using data-uris

Weeknotes: Parquet in Datasette Lite, various talks, more LLM hacking

I’ve fallen a bit behind on my weeknotes. Here’s a catchup for the last few weeks.

[... 769 words]

Vector Search. Amjith Ramanujam provides a very thorough tutorial on implementing vector similarity search using SentenceTransformers embeddings (all-MiniLM-L6-v2) executed using sqlite-utils, then served via datasette-sqlite-vss and deployed using Fly.

# 2nd June 2023, 5:02 am / fly, sqlite, datasette, ai, embeddings, vector-search
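
The core idea behind vector similarity search can be sketched in a few lines of plain Python. This is an illustrative brute-force version, not the approach from the linked tutorial: it skips SentenceTransformers and datasette-sqlite-vss entirely, and the three-dimensional vectors and document names below are made up (real all-MiniLM-L6-v2 embeddings are much longer).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query, documents, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [
        (cosine_similarity(query, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings", invented for this example
documents = {
    "sqlite": [0.9, 0.1, 0.0],
    "datasette": [0.8, 0.3, 0.1],
    "cooking": [0.0, 0.2, 0.9],
}

results = top_matches([1.0, 0.2, 0.0], documents)
print(results)
```

A dedicated vector index does the same ranking without comparing the query against every stored vector, which matters once the table grows beyond a few thousand rows.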

Exploration de données avec Datasette. One of the great delights of open source development is seeing people run workshops on your project, even more so when they’re in a language other than English! Romain Clement presented this French workshop for the Python Grenoble meetup on 25th May 2023, using GitHub Codespaces as the environment. It’s pretty comprehensive, including a 300,000+ row example table which illustrates Datasette plugins such as datasette-cluster-map and datasette-leaflet-geojson.

# 27th May 2023, 12:36 am / tutorials, datasette, github-codespaces, leaflet

Release datasette-sqlite-url-lite 0.1 — A pure Python alternative to sqlite-url ready to be used in Datasette Lite

MMS Language Coverage in Datasette Lite. I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222.

# 22nd May 2023, 8:01 pm / facebook, datasette-lite, ai, datasette
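
One way to turn an HTML table into newline-delimited JSON, using only the Python standard library. This is a sketch under assumptions and not necessarily how the conversion above was done: the `TableParser` class, the `table_to_ndjson` helper and the two sample rows are all invented for illustration, and it assumes a simple table whose first row holds the column headers.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_ndjson(html):
    """Treat the first row as headers and emit one JSON object per line."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return "\n".join(json.dumps(dict(zip(header, row))) for row in body)


# Hypothetical sample input, not the real MMS language table
sample_html = """
<table>
  <tr><th>Language</th><th>Family</th></tr>
  <tr><td>Swahili</td><td>Niger-Congo</td></tr>
  <tr><td>Tagalog</td><td>Austronesian</td></tr>
</table>
"""

ndjson = table_to_ndjson(sample_html)
print(ndjson)
```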

Big Opportunities in Small Data

I gave an invited keynote at Citus Con 2023, the PostgreSQL conference. Below is the abstract, video, slides and links from the presentation.

[... 385 words]

Release datasette 0.64.3 — An open source multi-tool for exploring and publishing data
Release datasette-explain 0.1a2 — Explain and validate SQL queries as you type them into Datasette

Data analysis with SQLite and Python for PyCon 2023

I’m at PyCon 2023 in Salt Lake City this week.

[... 347 words]

What’s in the RedPajama-Data-1T LLM training set

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.

[... 1,077 words]

sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)

In between blogging about ChatGPT rhetoric, micro-benchmarking with ChatGPT Code Interpreter and Why prompt injection is an even bigger problem now, I managed to ship the beginnings of a new project: sqlite-history.

[... 1,680 words]
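
The general pattern of tracking changes with triggers can be sketched using Python's built-in `sqlite3` module. This is not the schema or trigger code that sqlite-history generates: the `people` and `people_history` tables, their columns and the trigger names are all hypothetical, chosen only to show the shape of the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one tracked table plus a shadow table that
# receives a row every time the tracked table changes.
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);

    CREATE TABLE people_history (
        id INTEGER,
        name TEXT,
        operation TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER people_insert AFTER INSERT ON people
    BEGIN
        INSERT INTO people_history (id, name, operation)
        VALUES (new.id, new.name, 'insert');
    END;

    CREATE TRIGGER people_update AFTER UPDATE ON people
    BEGIN
        INSERT INTO people_history (id, name, operation)
        VALUES (new.id, new.name, 'update');
    END;

    CREATE TRIGGER people_delete AFTER DELETE ON people
    BEGIN
        INSERT INTO people_history (id, name, operation)
        VALUES (old.id, old.name, 'delete');
    END;
    """
)

# Each of these statements fires one trigger
conn.execute("INSERT INTO people (name) VALUES ('Ada')")
conn.execute("UPDATE people SET name = 'Ada Lovelace' WHERE id = 1")
conn.execute("DELETE FROM people WHERE id = 1")

history = conn.execute(
    "SELECT id, name, operation FROM people_history ORDER BY rowid"
).fetchall()
print(history)
```

Because the triggers live inside the database file, the history is recorded no matter which application or tool performs the write.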

GitHub Accelerator: our first cohort. I’m participating in the first cohort of GitHub’s new open source accelerator program, with Datasette (and related projects). It’s a 10 week program with 20 projects working together “with an end goal of building durable streams of funding for their work”.

# 13th April 2023, 5:28 pm / open-source, personal-news, datasette, github

Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

I started publishing weeknotes in 2019 partly as a way to hold myself accountable but mainly as a way to encourage myself to write more.

[... 830 words]

Release datasette-explain 0.1a1 — Explain and validate SQL queries as you type them into Datasette

Semi-automating a Substack newsletter with an Observable notebook

I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I’ve mostly automated that, using an Observable Notebook to generate the HTML. Here’s how that system works.

[... 2,520 words]

Release datasette-no-truncate 0.1 — Tiny Datasette plugin to disable text truncation in table displays

I built a ChatGPT plugin to answer questions about data hosted in Datasette

Yesterday OpenAI announced support for ChatGPT plugins. It’s now possible to teach ChatGPT how to make calls out to external APIs and use the responses to help generate further answers in the current conversation.

[... 1,801 words]

Release datasette-chatgpt-plugin 0.1 — A Datasette plugin that turns a Datasette instance into a ChatGPT plugin
Release datasette-graphql 2.2 — Datasette plugin providing an automatic GraphQL API for your SQLite databases

Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor

I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.

[... 1,255 words]

Datasette: Gather feedback on new ?_extra= design. I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release. It’s a change to the default JSON format from the Datasette API—the new format is much slimmer, and can be expanded using a new ?_extra= query string parameter. I’m desperately keen on getting feedback on this change! This issue has more details and a call for feedback.

# 22nd March 2023, 11:14 pm / json, datasette
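
A rough sketch of what building such a request URL might look like from Python. Only the `?_extra=` parameter name comes from the post above. The host, database and table names are placeholders, the `count` value is an assumed example of an extra, and passing several extras as repeated parameters is also an assumption, so check the Datasette documentation for the extras it actually accepts.

```python
from urllib.parse import urlencode


def table_json_url(base_url, database, table, extras=()):
    """Build the JSON URL for a table, optionally asking for extra keys.

    Hypothetical helper: with no extras the URL requests the slim default
    format, and each requested extra is sent as its own _extra= parameter.
    """
    url = f"{base_url}/{database}/{table}.json"
    if extras:
        url += "?" + urlencode({"_extra": list(extras)}, doseq=True)
    return url


# Placeholder host, database and table names
slim_url = table_json_url("https://example.com", "mydb", "mytable")
expanded_url = table_json_url(
    "https://example.com", "mydb", "mytable", extras=["count"]
)
print(slim_url)
print(expanded_url)
```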