Simon Willison’s Weblog

Items tagged projects, datasettecloud


AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

[... 6080 words]

Three major LLM releases in 24 hours (plus weeknotes)

I’m a bit behind on my weeknotes, so there’s a lot to cover here. But first... a review of the last 24 hours of Large Language Model news. All times are in US Pacific on April 9th 2024.

[... 1401 words]

Extracting data from unstructured text and images with Datasette and GPT-4 Turbo. Datasette Extract is a new Datasette plugin that uses GPT-4 Turbo (released to general availability today) and GPT-4 Vision to extract structured data from unstructured text and images.

I put together a video demo of the plugin in action today, and posted it to the Datasette Cloud blog along with screenshots and a tutorial describing how to use it. # 9th April 2024, 11:03 pm

Weeknotes: the aftermath of NICAR

NICAR was fantastic this year. Alex and I ran a successful workshop on Datasette and Datasette Cloud, and I gave a lightning talk demonstrating two new GPT-4 powered Datasette plugins—datasette-enrichments-gpt and datasette-extract. I need to write more about the latter one: it enables populating tables from unstructured content (using a variant of this technique) and it’s really effective. I got it working just in time for the conference.

[... 1430 words]

Weeknotes: Getting ready for NICAR

Next week is NICAR 2024 in Baltimore—the annual data journalism conference hosted by Investigative Reporters and Editors. I’m running a workshop on Datasette, and I plan to spend most of my time in the hallway track talking to people about Datasette, Datasette Cloud and how the Datasette ecosystem can best help support their work.

[... 1390 words]

dclient 0.3. dclient is my CLI utility for working with remote Datasette instances—in particular for authenticating with them and then running both read-only SQL queries and inserting data using the new Datasette write JSON API. I just picked up work on the project again after a six month gap—the insert command can now be used to constantly stream data directly to hosted Datasette instances such as Datasette Cloud. # 25th February 2024, 8:06 pm
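As a rough illustration of the kind of request an insert against the write JSON API involves, here is a minimal sketch using only the Python standard library. It is not dclient's own code. It assumes the `/<database>/<table>/-/insert` endpoint, the `{"rows": [...]}` body and the bearer-token header described in the Datasette 1.0 alpha documentation. The instance URL, database name, table name and token are all hypothetical, and the request is built but never sent.

```python
import json
import urllib.request


def build_insert_request(base_url, database, table, rows, token):
    """Build (but do not send) a POST request that inserts rows into a table."""
    # Assumed endpoint shape: /<database>/<table>/-/insert
    url = f"{base_url.rstrip('/')}/{database}/{table}/-/insert"
    # Assumed body shape: a JSON object with a "rows" list of row objects
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical instance URL, database, table and token, for illustration only.
request = build_insert_request(
    "https://example.datasette.cloud",
    "data",
    "events",
    [{"id": 1, "name": "first"}, {"id": 2, "name": "second"}],
    token="dstok_example",
)
print(request.get_method(), request.full_url)
# Actually sending it would be: urllib.request.urlopen(request)
```

Streaming data would amount to repeating a request like this for each new batch of rows. How dclient batches and authenticates in practice is not shown here.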

Weeknotes: datasette-test, datasette-build, PSF board retreat

I wrote about Page caching and custom templates in my last weeknotes. This week I wrapped up that work, modifying datasette-edit-templates to be compatible with the jinja2_environment_from_request() plugin hook. This means you can edit templates directly in Datasette itself and have those served either for the full instance or just for the instance when served from a specific domain (the Datasette Cloud case).

[... 757 words]

Annotate and explore your data with datasette-comments. New plugin for Datasette and Datasette Cloud: datasette-comments, providing tools for collaborating on data exploration with a team through posting comments on individual rows of data.

Alex Garcia built this for Datasette Cloud but as with almost all of our work there it’s also available as an open source Python package. # 30th November 2023, 9:59 pm

Weeknotes: DevDay, GitHub Universe, OpenAI chaos

Three weeks of conferences and Datasette Cloud work, four days of chaos for OpenAI.

[... 766 words]

Weeknotes: the Datasette Cloud API, a podcast appearance and more

Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.

[... 1243 words]

Weeknotes: Embeddings, more embeddings and Datasette Cloud

Since my last weeknotes, a flurry of activity. LLM has embeddings support now, and Datasette Cloud has driven some major improvements to the wider Datasette ecosystem.

[... 2427 words]

Datasette Cloud and the Datasette 1.0 alphas. I sent out the Datasette Newsletter for the first time in quite a while, with updates on Datasette Cloud, the Datasette 1.0 alphas, a note about the security vulnerability in those alphas and a summary of some of my research into combining LLMs with Datasette. # 22nd August 2023, 7:56 pm

Datasette Cloud, Datasette 1.0a3, llm-mlc and more

Datasette Cloud is now a significant step closer to general availability. The Datasette 1.0a3 alpha release is out, with a mostly finalized JSON format for 1.0. Plus new plugins for LLM and sqlite-utils and a flurry of things I’ve learned.

[... 1690 words]

Welcome to Datasette Cloud. We launched the Datasette Cloud blog today! The SaaS hosted version of Datasette is ready to start onboarding more users—this post describes what it can do so far and hints at what’s planned to come next. # 16th August 2023, 1:46 am

Designing a write API for Datasette

Building out Datasette Cloud has made one thing clear to me: Datasette needs a write API for ingesting new data into its attached SQLite databases.

[... 1493 words]

Weeknotes: Publishing data using Datasette Cloud

My initial preview releases of Datasette Cloud (the SaaS version of my Datasette open source project) have focused on private data collaboration.

[... 582 words]

Weeknotes: Datasette Cloud preview invitations

This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.

[... 713 words]

Weeknotes: Datasette Cloud ready to preview

I made an absolute ton of progress building Datasette Cloud on Fly this week, and also had a bunch of fun playing with GPT-3.

[... 370 words]

Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Hosting provider Fly released Fly Machines this week. I got an early preview and I’ve been working with it for a few days—it’s a fascinating new piece of technology. I’m using it to get my hosting service for Datasette ready for wider release.

[... 1005 words]

Weeknotes: Datasette column actions, plus three new plugins

A renewed emphasis on building out Datasette Cloud has produced three new plugins this week: datasette-dateutil, datasette-import-table and datasette-edit-schema, plus a major improvement to Datasette’s default interface for browsing tables.

[... 1093 words]

Weeknotes: California Protected Areas in Datasette

This week I built a geospatial search engine for protected areas in California, shipped datasette-graphql 1.0 and started working towards the next milestone for Datasette Cloud.

[... 1099 words]

Weeknotes: Hacking on 23 different projects

I wrote a lot of code this week: 184 commits over 23 repositories! I’ve also started falling for Zeit Now v2, having found workarounds for some of my biggest problems with it.

[... 901 words]

Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf

I’ve been preparing for the NICAR 2020 Data Journalism conference this week, which has led me into a flurry of activity across a plethora of different projects and plugins.

[... 834 words]

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 933 words]

Weeknotes: Shaving yaks for Datasette Cloud

I’ve been shaving a lot of yaks, but I’m finally ready for other people to start kicking the tires on the MVP of Datasette Cloud.

[... 392 words]

Weeknotes: datasette-auth-existing-cookies and datasette-sentry

Work on Datasette Cloud continues: I’m tantalizingly close to having an MVP I can start to invite people to try out.

[... 701 words]

Weeknotes: Datasette Cloud and zero downtime deployments

Yesterday’s piece on Tracking FARA by deploying a data API using GitHub Actions and Cloud Run was originally intended to be my weeknotes, but ended up getting a bit too involved.

[... 1431 words]