Entries

Large language models are having their Stable Diffusion moment

The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote at the time about how Stable Diffusion is a really big deal.

[... 1,815 words]

7:15 pm / 11th March 2023 / facebook, ai, gpt-3, openai, generative-ai, chatgpt, whisper, llama, local-llms, llms, llama-cpp, llm-release

ChatGPT couldn’t access the internet, even though it really looked like it could

A really common misconception about ChatGPT is that it can access URLs. I’ve seen many different examples of people pasting in a URL and asking for a summary, or asking it to make use of the content on that page in some way.

[... 1,745 words]

1:41 pm / 10th March 2023 / bing, ethics, ai, gpt-3, openai, generative-ai, chatgpt, gpt-4, llms, ai-ethics, hallucinations

Weeknotes: NICAR, and an appearance on KQED Forum

I spent most of this week at NICAR 2023, the data journalism conference hosted this year in Nashville, Tennessee.

[... 1,941 words]

10:46 pm / 7th March 2023 / bing, data-journalism, media, radio, ai, weeknotes, generative-ai, chatgpt, whisper, llms, nicar, podcast-appearances, macwhisper

Thoughts and impressions of AI-assisted search from Bing

It’s been a wild couple of weeks.

[... 1,763 words]

7:56 pm / 24th February 2023 / bing, ethics, ai, weeknotes, generative-ai, llms, ai-ethics, ai-assisted-search, ai-personality

In defense of prompt engineering

Prompt engineering as a discipline doesn’t get nearly the respect it deserves.

[... 924 words]

6:28 pm / 21st February 2023 / ai, prompt-engineering, prompt-injection, generative-ai, llms

I talked about Bing and tried to explain language models on live TV!

Yesterday evening I was interviewed by Natasha Zouves on NewsNation, on live TV (over Zoom).

[... 1,697 words]

4:53 pm / 19th February 2023 / bing, interviews, speaking, my-talks, ai, generative-ai, llms

Analytics: Hacker News v.s. a tweet from Elon Musk

My post Bing: “I will not harm you unless you harm me first” really took off.

[... 817 words]

10:11 pm / 17th February 2023 / analytics, bing, hacker-news, twitter, datasette, cloudflare

Bing: “I will not harm you unless you harm me first”

Last week, Microsoft announced the new AI-powered Bing: a search interface that incorporates a language model powered chatbot that can run searches for you and summarize the results, plus do all of the other fun things that engines like GPT-3 and ChatGPT have been demonstrating over the past few months: the ability to generate poetry, and jokes, and do creative writing, and so much more.

[... 4,922 words]

3:05 pm / 15th February 2023 / bing, ethics, microsoft, search, ai, gpt-3, openai, prompt-engineering, prompt-injection, generative-ai, llms, ai-ethics, ai-assisted-search, ai-personality, system-prompts

Weeknotes: A bunch of things I learned this week, plus datasette-explain

The Datasette table view refactor, JSON redesign and ?_extra= continues this week, mainly in this ongoing pull request and this tracking issue.

[... 1,528 words]

5:57 am / 9th February 2023 / plugins, projects, sqlite, datasette, webassembly, weeknotes, gpt-3, generative-ai

datasette-scraper, Big Local News and other weeknotes

In addition to exploring the new MusicCaps training and evaluation data I’ve been working on the big Datasette JSON refactor, and getting excited about a Datasette project that I didn’t work on at all.

[... 1,744 words]

2:52 am / 30th January 2023 / plugins, projects, datasette, weeknotes, shot-scraper, colin-dellow

Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model

Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).

[... 1,323 words]

9:34 pm / 27th January 2023 / ethics, google, projects, youtube, ai, datasette, generative-ai, training-data, ai-ethics

Weeknotes: AI hacking and a SpatiaLite tutorial

Short weeknotes this time because the key things I worked on have already been covered here:

7:45 pm / 15th January 2023 / spatialite, ai, datasette, weeknotes, gpt-3, openai, generative-ai, vector-search, llms

How to implement Q&A against your documentation with GPT3, embeddings and Datasette

If you’ve spent any time with GPT-3 or ChatGPT, you’ve likely thought about how useful it would be if you could point them at a specific, current collection of text or documentation and have them use that as part of their input for answering questions.

[... 3,447 words]

11:47 pm / 13th January 2023 / projects, search, sqlite, ai, datasette, gpt-3, generative-ai, vector-search, llms, embeddings, rag, ai-assisted-search

Datasette 0.64, with a warning about SpatiaLite

I released Datasette 0.64 this morning. This release is mainly a response to the realization that it’s not safe to run Datasette with the SpatiaLite extension loaded if that Datasette instance is configured to enable arbitrary SQL queries from untrusted users.

[... 675 words]

9:22 pm / 9th January 2023 / security, spatialite, datasette, annotated-release-notes

2022 in projects and blogging

In lieu of my regular weeknotes (I took two weeks off for the holidays) here’s a look back at 2022, mainly in terms of projects and things I’ve written about.

11:59 pm / 31st December 2022 / blogging, projects, weeknotes

Weeknotes: Datasette 0.63.3, datasette-ripgrep

We’re back in the UK to see family over Christmas (our first trip back since 2019). Here are a few notes from the past couple of weeks.

[... 801 words]

2:54 pm / 20th December 2022 / datasette, weeknotes, ripgrep

Datasette 1.0a2: Upserts and finely grained permissions

I’ve released the third alpha of Datasette 1.0. The 1.0a2 release introduces upsert support to the new JSON API and makes some major improvements to the Datasette permissions system.

[... 2,844 words]

5:58 pm / 15th December 2022 / api, permissions, projects, upsert, datasette, annotated-release-notes

Over-engineering Secret Santa with Python cryptography and Datasette

We’re doing a family Secret Santa this year, and we needed a way to randomly assign people to each other without anyone knowing who was assigned to whom.

[... 2,044 words]

2:03 am / 11th December 2022 / cryptography, glitch, projects, datasette, chatgpt, llms, ai-assisted-programming

Weeknotes: datasette-ephemeral-tables, datasette-export

Most of what I’ve been working on for the past week and a half is already documented:

[... 603 words]

10:27 pm / 5th December 2022 / plugins, datasette, weeknotes

AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code

I’m using this year’s Advent of Code to learn Rust—with the assistance of GitHub Copilot and OpenAI’s new ChatGPT.

[... 2,661 words]

9:11 pm / 5th December 2022 / education, github, projects, ai, rust, gpt-3, openai, generative-ai, chatgpt, github-copilot, llms, ai-assisted-programming, github-issues

A new AI game: Give me ideas for crimes to do

Less than a week ago OpenAI unleashed ChatGPT on the world, and it kicked off what feels like a seismic shift in many people’s understanding of the capabilities of large language models.

[... 1,069 words]

3:11 pm / 4th December 2022 / ai, gpt-3, openai, prompt-injection, generative-ai, chatgpt, github-copilot, llms

Datasette’s new JSON write API: The first alpha of Datasette 1.0

This week I published the first alpha release of Datasette 1.0, with a significant new feature: Datasette core now includes a JSON API for creating and dropping tables and inserting, updating and deleting data.

[... 2,817 words]

11:15 pm / 2nd December 2022 / apis, json, projects, datasette

Coping strategies for the serial project hoarder

I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled “Massively increase your productivity on personal projects with comprehensive documentation and automated tests”.

[... 3,865 words]

3:47 pm / 26th November 2022 / djangocon, documentation, productivity, my-talks, testing, annotated-talks, github-issues

Weeknotes: Implementing a write API, Mastodon distractions

Everything is so distracting at the moment. The ongoing Twitter catastrophe, the great migration (at least amongst most of the people I pay attention to) to Mastodon, the FTX calamity. It’s been very hard to focus!

[... 916 words]

4:57 am / 23rd November 2022 / datasette, weeknotes, mastodon

Tracking Mastodon user numbers over time with a bucket of tricks

Mastodon is definitely having a moment. User growth is skyrocketing as more and more people migrate over from Twitter.

[... 1,534 words]

7 am / 20th November 2022 / github, projects, datasette, observable, github-actions, git-scraping, git-history, s3-credentials, datasette-lite, mastodon, cors

Datasette is 5 today: a call for birthday presents

Five years ago today I published the first release of Datasette, in Datasette: instantly create and publish an API for your SQLite databases.

[... 548 words]

7:27 pm / 13th November 2022 / projects, datasette

Designing a write API for Datasette

Building out Datasette Cloud has made one thing clear to me: Datasette needs a write API for ingesting new data into its attached SQLite databases.

[... 1,493 words]

7:44 pm / 9th November 2022 / api-design, projects, datasette, weeknotes, datasette-cloud, sqlite-utils

Mastodon is just blogs

And that’s great. It’s also the return of Google Reader!

[... 1,560 words]

3:48 pm / 8th November 2022 / blogging, google-reader, twitter, mastodon, activitypub, fediverse, sidekiq

What to blog about

You should start a blog. Having your own little corner of the internet is good for the soul!

[... 520 words]

5:05 pm / 6th November 2022 / blogging, writing

It looks like I’m moving to Mastodon

Elon Musk laid off about half of Twitter this morning. There are many terrible stories emerging about how this went down, but one that particularly struck me was that he laid off the entire accessibility team. For me this feels like a microcosm of the whole situation. Twitter’s priorities are no longer even remotely aligned with my own.

[... 1,546 words]

5:32 am / 5th November 2022 / accessibility, twitter, mastodon, fediverse

Simon Willison’s Weblog