Simon Willison’s Weblog

Subscribe

Friday, 12th July 2024

Why The Atlantic signed a deal with OpenAI. Interesting conversation between Nilay Patel and The Atlantic CEO (and former journalist/editor) Nicholas Thompson about the relationship between media organizations and LLM companies like OpenAI.

On the impact of these deals on the ongoing New York Times lawsuit:

One of the ways that we [The Atlantic] can help the industry is by making deals and setting a market. I believe that us doing a deal with OpenAI makes it easier for us to make deals with the other large language model companies if those come about, I think it makes it easier for other journalistic companies to make deals with OpenAI and others, and I think it makes it more likely that The Times wins their lawsuit.

How could it help? Because deals like this establish a market value for training content, important for the fair use component of the legal argument.

# 2:35 pm / new-york-times, ai, openai, llms, nilay-patel, training-data

Fighting bots is fighting humans [...] remind you that "only allow humans to access" is just not an achievable goal. Any attempt at limiting bot access will inevitably allow some bots through and prevent some humans from accessing the site, and it's about deciding where you want to set the cutoff. I fear that media outlets and other websites, in attempting to "protect" their material from AI scrapers, will go too far in the anti-human direction.

Molly White

# 2:45 pm / ai, molly-white

The Death of the Junior Developer (via) Steve Yegge's speculative take on the impact LLM-assisted coding could have on software careers.

Steve works on Cody, an AI programming assistant, so he's hardly an unbiased source of information. Nevertheless, his collection of anecdotes here matches what I've been seeing myself.

Steve coins the term here CHOP, for Chat Oriented Programming, where the majority of code is typed by an LLM that is directed by a programmer. Steve describes it as "coding via iterative prompt refinement", and argues that the models only recently got good enough to support this style with GPT-4o, Gemini Pro and Claude 3 Opus.

I've been experimenting with this approach myself on a few small projects (see this Claude example) and it really is a surprisingly effective way to work.

Also included: a story about how GPT-4o produced a bewitchingly tempting proposal with long-term damaging effects that only a senior engineer with deep understanding of the problem space could catch!

I'm in strong agreement with this thought on the skills that are becoming most important:

Everyone will need to get a lot more serious about testing and reviewing code.

# 3:34 pm / steve-yegge, ai, generative-ai, llms, ai-assisted-programming

Searching an aerial photo with text queries. Robin Wilson built a demo that lets you search a large aerial photograph of Southampton for things like "roundabout" or "tennis court". He explains how it works in detail: he used the SkyCLIP model, which is trained on "5.2 million remote sensing image-text pairs in total, covering more than 29K distinct semantic tags" to generate embeddings for 200x200 image segments (with 100px of overlap), then stored them in Pinecone.

# 6:07 pm / gis, embeddings, clip

datasette-python. I just released a small new plugin for Datasette to assist with debugging. It adds a python subcommand which runs a Python process in the same virtual environment as Datasette itself.

I built it initially to help debug some issues in Datasette installed via Homebrew. The Homebrew installation has its own virtual environment, and sometimes it can be useful to run commands like pip list in the same environment as Datasette itself.

Now you can do this:

brew install datasette
datasette install datasette-python
datasette python -m pip list

I built a similar plugin for LLM last year, called llm-python - it's proved useful enough that I duplicated the design for Datasette.

# 11:17 pm / plugins, projects, python, datasette

Free-threaded CPython is ready to experiment with! The Python 3.13 beta releases that include a "free-threaded" version that removes the GIL are now available to test! A team from Quansight Labs, home of the PyData core team, just launched py-free-threading.github.io to help document the new builds and track compatibility with Python's larger ecosystem.

Free-threading mode will not be enabled in Python installations by default. You can install special builds that have the option enabled today - I used the macOS installer and, after enabling the new build in the "Customize" panel in the installer, ended up with a /usr/local/bin/python3.13t binary which shows "Python 3.13.0b3 experimental free-threading build" when I run it.

Here's my TIL describing my experiments so far installing and running the 3.13 beta on macOS, which also includes a correction to an embarrassing bug that Claude introduced but I failed to catch!

# 11:42 pm / concurrency, gil, python, threading