Simon Willison’s Weblog

303 items tagged “datasette”

2022

Deploying Python web apps as AWS Lambda functions. After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes. # 19th September 2022, 4:05 am

Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more

Despite distractions from AI I managed to make progress on a bunch of different projects this week, including new releases of s3-credentials and shot-scraper, a new datasette-edit-templates plugin and a small but neat improvement to Datasette Lite.

[... 1562 words]

Spevktator: OSINT analysis tool for VK. This is a really cool project that came out of a recent Bellingcat hackathon. Spevktator takes 67,000 posts from five popular Russian news channels on VK (a popular Russian social media platform) and makes them available in Datasette, along with automated translations to English, post sharing metrics and sentiment analysis scores. This README includes some detailed analysis of the data, plus a link to an Observable notebook that implements custom visualizations against queries run directly against the Datasette instance. # 5th September 2022, 8:48 pm

Weeknotes: Exploring the training data behind Stable Diffusion

Two weeks ago, the Stable Diffusion image generation model was released to the public. I wrote about this last week, in Stable Diffusion is a really big deal—a post which has since become one of the top ten results for “stable diffusion” on Google and shown up in all sorts of different places online.

[... 2886 words]

Building a searchable archive for the San Francisco Microscopical Society

The San Francisco Microscopical Society was founded in 1870 by a group of scientists dedicated to advancing the field of microscopy.

[... 1845 words]

Digitizing 55,000 pages of civic meetings (via) Philip James has been building public, searchable archives of city council meetings for various cities—Oakland and Alamedia so far—using my s3-ocr script to run Textract OCR against the PDFs of the minutes, and deploying them to Fly using Datasette. This is a really cool project, and very much the kind of thing I’ve been hoping to support with the tools I’ve been building. # 22nd August 2022, 4:26 pm

Analyzing ScotRail audio announcements with Datasette—from prototype to production

Scottish train operator ScotRail released a two-hour long MP3 file containing all of the components of its automated station announcements. Messing around with them is proving to be a huge amount of fun.

[... 4428 words]

The Datasette Newsletter: Datasette Lite, Datasette Tutorials, Datasette Cloud. It’s been quite a while since I’ve sent one of these out now—hoping to get this on to a more regular schedule. # 19th August 2022, 1:20 am

Plugin support for Datasette Lite

I’ve added a new feature to Datasette Lite, my distribution of Datasette that runs entirely in the browser using Python and SQLite compiled to WebAssembly. You can now install additional Datasette plugins by passing them in the URL.

[... 865 words]

Litestream backups for Datasette Cloud (and weeknotes)

My main focus this week has been adding robust backups to the forthcoming Datasette Cloud.

[... 1604 words]

datasette on Open Source Insights (via) Open Source Insights is “an experimental service developed and hosted by Google to help developers better understand the structure, security, and construction of open source software packages”. It calculates scores for packages using various automated heuristics. A JSON version of the resulting score card can be accessed using https://deps.dev/_/s/pypi/p/{package_name}/v/ # 11th August 2022, 1:06 am

Introducing sqlite-html: query, parse, and generate HTML in SQLite (via) Another brilliant SQLite extension module from Alex Garcia, this time written in Go. sqlite-html adds a whole family of functions to SQLite for parsing and constructing HTML strings, built on the Go goquery and cascadia libraries. Once again, Alex uses an Observable notebook to describe the new features, with embedded interactive examples that are backed by a Datasette instance running in Fly. # 3rd August 2022, 5:31 pm

Cleaning data with sqlite-utils and Datasette (via) I wrote a new tutorial for the Datasette website, showing how to use sqlite-utils to import a CSV file, clean up the resulting schema, fix date formats and extract some of the columns into a separate table. It’s accompanied by a ten minute video originally recorded for the HYTRADBOI conference. # 31st July 2022, 7:57 pm

Weeknotes: Joining the board of the Python Software Foundation

A few weeks ago I was elected to the board of directors for the Python Software Foundation.

[... 2081 words]

Weeknotes: Datasette, sqlite-utils, Datasette Desktop

A flurry of releases this week, including a new Datasette alpha and a fixed Datasette Desktop.

[... 1113 words]

Datasette Discord community (via) I started a Discord chat community for Datasette. 57 people have joined up already! # 15th July 2022, 3:17 am

sqlite-comprehend: run AWS entity extraction against content in a SQLite database

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

[... 1146 words]

Joining CSV files in your browser using Datasette Lite

I added a new feature to Datasette Lite—my version of Datasette that runs entirely in your browser using WebAssembly (previously): you can now use it to load one or more CSV files by URL, and then run SQL queries against them—including joins across data from multiple files.

[... 546 words]

Weeknotes: datasette-socrata, and the last 10%...

... takes 90% of the work. I continue to work towards a preview of the new Datasette Cloud, and keep finding new “just one more things” to delay inviting in users.

[... 1214 words]

Weeknotes: Datasette Cloud ready to preview

I made an absolute ton of progress building Datasette Cloud on Fly this week, and also had a bunch of fun playing with GPT-3.

[... 370 words]

A Datasette tutorial written by GPT-3

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

[... 1244 words]

Architecture Notes: Datasette (via) I was interviewed for the first edition of Architecture Notes—a new publication (website and newsletter) about software architecture created by Mahdi Yusuf. We covered a bunch of topics in detail: ASGI, SQLIte and asyncio, Baked Data, plugin hook design, Python in WebAssembly, Python in an Electron app and more. Mahdi also turned my scrappy diagrams into beautiful illustrations for the piece. # 27th May 2022, 3:20 pm

Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Hosting provider Fly released Fly Machines this week. I got an early preview and I’ve been working with it for a few days—it’s a fascinating new piece of technology. I’m using it to get my hosting service for Datasette ready for wider release.

[... 1005 words]

simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool. # 17th May 2022, 5:56 pm

Weeknotes: Datasette Lite, nogil Python, HYTRADBOI

My big project this week was Datasette Lite, a new way to run Datasette directly in a browser, powered by WebAssembly and Pyodide. I also continued my research into running SQL queries in parallel, described last week. Plus I spoke at HYTRADBOI.

[... 1434 words]

Datasette Lite: a server-side Python web application running in a browser

Datasette Lite is a new way to run Datasette: entirely in a browser, taking advantage of the incredible Pyodide project which provides Python compiled to WebAssembly plus a whole suite of useful extras.

[... 4800 words]

Automatically opening issues when tracked file content changes

I figured out a GitHub Actions pattern to keep track of a file published somewhere on the internet and automatically open a new repository issue any time the contents of that file changes.

[... 1151 words]

Weeknotes: Parallel SQL queries for Datasette, plus some middleware tricks

A promising new performance optimization for Datasette, plus new datasette-gzip and datasette-total-page-time plugins.

[... 1534 words]

Useful tricks with pip install URL and GitHub

The pip install command can accept a URL to a zip file or tarball. GitHub provides URLs that can create a zip file of any branch, tag or commit in any repository. Combining these is a really useful trick for maintaining Python packages.

[... 929 words]

Building a Covid sewage Twitter bot (and other weeknotes)

I built a new Twitter bot today: @covidsewage. It tweets a daily screenshot of the latest Covid sewage monitoring data published by Santa Clara county.

[... 1079 words]