Simon Willison’s Weblog

Notes from my appearance on the Software Misadventures Podcast

10th September 2024

I was a guest on Ronak Nathani and Guang Yang’s Software Misadventures Podcast, which interviews seasoned software engineers about their careers so far and their misadventures along the way. Here’s the episode: LLMs are like your weird, over-confident intern | Simon Willison (Datasette).

You can get the audio version on Overcast, on Apple Podcasts or on Spotify—or you can watch the video version on YouTube.

I ran the video through MacWhisper to get a transcript, then spent some time pulling out my own favourite quotes, trying to focus on things I haven’t written about previously on this blog.

Having a blog

23:15

There’s something wholesome about having a little corner of the internet just for you.

It feels a little bit subversive as well in this day and age, with all of these giant walled platforms and you’re like, “Yeah, no, I’ve got a domain name and I’m running a web app.”

It used to be that 10, 15 years ago, everyone’s intro to web development was building your own blog system. I don’t think people do that anymore.

That’s really sad because it’s such a good project—you get to learn databases and HTML and URL design and SEO and all of these different skills.

Aligning LLMs with your own expertise

37:10

As an experienced software engineer, I can get great code from LLMs because I’ve got that expertise in what kind of questions to ask. I can spot when it makes mistakes very quickly. I know how to test the things it’s giving me.

Occasionally I’ll ask it legal questions—I’ll paste in terms of service and ask, “Is there anything in here that looks a bit dodgy?”

I know for a fact that this is a terrible idea because I have no legal knowledge! I’m sort of like play acting with it and nodding along, but I would never make a life-altering decision based on legal advice that I got from an LLM, because I’m not a lawyer.

If I was a lawyer, I’d use them all the time because I’d be able to fall back on my actual expertise to make sure that I’m using them responsibly.

The usability of LLM chat interfaces

40:30

It’s like taking a brand new computer user and dumping them in a Linux machine with a terminal prompt and saying, “There you go, figure it out.”

It’s an absolute joke that we’ve got this incredibly sophisticated software and we’ve given it a command line interface and launched it to a hundred million people.

Benefits for people with English as a second language

41:53

For people who don’t speak English or have English as a second language, this stuff is incredible.

We live in a society where having really good spoken and written English puts you at a huge advantage.

The street light outside your house is broken and you need to write a letter to the council to get it fixed? That used to be a significant barrier.

It’s not anymore. ChatGPT will write a formal letter to the council complaining about a broken street light that is absolutely flawless.

And you can prompt it in any language. I’m so excited about that.

Interestingly, it sort of breaks aspects of society as well—because we’ve been using written English skills as a filter for so many different things.

If you want to get into university, you have to write formal letters and all of that kind of stuff, which used to keep people out.

Now it doesn’t anymore, which I think is thrilling... but at the same time, if you’ve got institutions that are designed around the idea that you can evaluate everyone and filter them based on written essays, and now you can’t, we’ve got to redesign those institutions.

That’s going to take a while. What does that even look like? It’s so disruptive to society in all of these different ways.

Are we all going to lose our jobs?

46:39

As a professional programmer, there’s an aspect where you ask, OK, does this mean that our jobs are all gonna dry up?

I don’t think the jobs dry up. I think more companies start commissioning custom software because the cost of developing custom software goes down, which I think increases the demand for engineers who know what they’re doing.

But I’m not an economist. Maybe this is the death knell for six-figure programmer salaries and we’re gonna end up working for peanuts?

[... later 1:32:12 ...]

Every now and then you hear a story of a company who got software built for them, and it turns out it was the boss’s cousin, who’s like a 15-year-old who’s good with computers, and they built software, and it’s garbage.

Maybe we’ve just given everyone in the world the overconfident 15-year-old cousin who’s gonna claim to be able to build something, and build them something that maybe kind of works.

And maybe society’s okay with that?

This is why I don’t feel threatened as a senior engineer, because I know that if you sit down somebody who doesn’t know how to program with an LLM, and you sit me with an LLM, and ask us to build the same thing, I will build better software than they will.

Hopefully market forces come into play, and the demand is there for software that actually works, and is fast and reliable.

And so people who can build software that’s fast and reliable, often with LLM assistance, used responsibly, benefit from that.

Prompt engineering and evals

54:08

For me, prompt engineering is about figuring out things like—for a SQL query—we need to send the full schema and we need to send these three example responses.

That’s engineering. It’s complicated.
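To make the SQL case concrete, here is a minimal sketch of assembling such a prompt in Python. It is an illustration under assumptions, not the approach any particular tool takes: the `build_sql_prompt` helper, the `posts` table, and the three example pairs are all hypothetical. The schema is read from SQLite's own `sqlite_master` table so the prompt always matches the live database.

```python
import sqlite3


def build_sql_prompt(conn, question, examples):
    """Assemble a text-to-SQL prompt from the full schema plus worked examples.

    Hypothetical helper for illustration only.
    """
    # sqlite_master keeps the CREATE statement for every table
    schema = "\n\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    # Each example pairs a question with the SQL we would like back
    shots = "\n\n".join(f"Question: {q}\nSQL: {sql}" for q, sql in examples)
    return (
        "Write a SQLite query that answers the final question.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Question: {question}\nSQL:"
    )


# Made-up schema and examples, purely for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, created TEXT)")
examples = [
    ("How many posts are there?", "SELECT count(*) FROM posts"),
    ("What are the five newest posts?", "SELECT title FROM posts ORDER BY created DESC LIMIT 5"),
    ("Which posts mention SQLite?", "SELECT title FROM posts WHERE title LIKE '%SQLite%'"),
]
prompt = build_sql_prompt(conn, "Which post is the oldest?", examples)
print(prompt)
```

The resulting string would then be sent to whichever model you are using; that call is deliberately left out here.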

The hardest part of prompt engineering is evaluating. Figuring out, of these two prompts, which one is better?

I still don’t have a great way of doing that myself.

The people who are doing the most sophisticated development on top of LLMs are all about evals. They’ve got really sophisticated ways of evaluating their prompts.
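As a rough illustration of what the simplest possible eval might look like, the sketch below scores two prompt templates against the same labelled cases. Everything in it is an assumption made for the example: `score_prompt`, the two templates, the test cases, and especially `fake_model`, which is a deterministic stand-in for a real LLM call so the code runs on its own. Real evals tend to need far more cases and fuzzier scoring than exact string matching.

```python
def score_prompt(template, cases, run_model):
    """Return the fraction of cases where the model output equals the expected label."""
    passed = 0
    for case in cases:
        output = run_model(template.format(input=case["input"]))
        if output.strip().lower() == case["expected"].lower():
            passed += 1
    return passed / len(cases)


def fake_model(prompt):
    """Hypothetical stand-in for an LLM: only answers tersely when asked to."""
    label = "positive" if "love" in prompt else "negative"
    if "one word" in prompt:
        return label
    return f"I think the sentiment is {label}."


cases = [
    {"input": "I love this", "expected": "positive"},
    {"input": "This is awful", "expected": "negative"},
]
prompt_a = "What is the sentiment of: {input}"
prompt_b = "Reply with one word, positive or negative. Text: {input}"

score_a = score_prompt(prompt_a, cases, fake_model)
score_b = score_prompt(prompt_b, cases, fake_model)
print(f"prompt A: {score_a:.0%}, prompt B: {score_b:.0%}")
```

Swapping `fake_model` for a function that calls a real model gives a crude but repeatable way to answer "which of these two prompts is better?" for one narrow task.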

Letting skills atrophy

1:26:12

We talked about the risk of learned helplessness, and letting our skills atrophy by outsourcing so much of our work to LLMs.

The other day I reported a bug against GitHub Actions complaining that the windows-latest version of Python couldn’t load SQLite extensions.

Then after I’d filed the bug, I realized that I’d got Claude to write my test code and it had hallucinated the wrong SQLite code for loading an extension!

I had to close that bug and say, no, sorry, this was my fault.

That was a bit embarrassing. I should know better than most people that you have to check everything these things do, and it had caught me out. Python and SQLite are my bread and butter. I really should have caught that one!
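For reference, Python's standard-library `sqlite3` module loads extensions through `enable_load_extension()` and `load_extension()` on the connection. The sketch below is not the test code from the bug report (the hallucinated code isn't reproduced here); it is a minimal check, under the assumption that you want to know whether a given Python build supports extension loading at all. `can_load_extensions`, `load_extension_safely` and `extension_path` are hypothetical names.

```python
import sqlite3


def can_load_extensions():
    """Report whether this Python build's sqlite3 module supports loadable extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # The method is missing when Python was compiled without extension support
        if not hasattr(conn, "enable_load_extension"):
            return False
        conn.enable_load_extension(True)
        return True
    except (sqlite3.OperationalError, sqlite3.NotSupportedError):
        return False
    finally:
        conn.close()


def load_extension_safely(conn, extension_path):
    """Load one extension, then switch extension loading off again.

    extension_path is a placeholder for a real compiled extension file.
    """
    conn.enable_load_extension(True)
    try:
        conn.load_extension(extension_path)
    finally:
        conn.enable_load_extension(False)


print("Extension loading supported:", can_load_extensions())
```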

But my counter to this is that I feel like my overall capabilities are expanding so quickly. I can get so much more stuff done that I’m willing to pay with a little bit of my soul.

I’m willing to accept a little bit of atrophying in some of my abilities in exchange for, honestly, a two to five X productivity boost on the time that I spend typing code into a computer.

That’s like 10% of my job, so it’s not like I’m two to five times more productive overall. But it’s still a material improvement.
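The arithmetic behind that caveat can be sketched with Amdahl's law, which is my framing here rather than something discussed on the podcast. It assumes the 10% figure is exact and that the rest of the job is unaffected.

```python
def overall_speedup(fraction, boost):
    """Amdahl's law: total speedup when `fraction` of the work becomes `boost` times faster."""
    return 1 / ((1 - fraction) + fraction / boost)


for boost in (2, 5):
    print(f"{boost}x on 10% of the job -> {overall_speedup(0.10, boost):.3f}x overall")
# 2x on 10% of the job -> 1.053x overall
# 5x on 10% of the job -> 1.087x overall
```

Under those assumptions, a two to five times boost on the typing-code portion works out to roughly a 5% to 9% gain overall.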

It’s making me more ambitious. I’m writing software I would never have even dared to write before. So I think that’s worth the risk.

Imitation intelligence

1:53:35

I feel like artificial intelligence has all of these science fiction ideas around it. People will get into heated debates about whether this is artificial intelligence at all.

I’ve been thinking about it in terms of imitation intelligence, because everything these models do is effectively imitating something that they saw in their training data.

And that actually really helps you form a mental model of what they can do and why they’re useful. It means that you can think, “Okay, if the training data has shown it how to do this thing, it can probably help me with this thing.”

If you want to cure cancer, the training data doesn’t know how to cure cancer. It’s not gonna come up with a novel cure for cancer just out of nothing.

The weird intern

I’ve used the weird intern analogy a few times before. Here’s the version Ronak and Guang extracted as the trailer for our episode:

1:18:00

I call it my weird intern. I’ll say to my wife, Natalie, sometimes, “Hey, so I got my weird intern to do this.” And that works, right?

It’s a good mental model for these things as well, because it’s like having an intern who has read all of the documentation and memorized the documentation for every programming language, and is a wild conspiracy theorist, and sometimes comes up with absurd ideas, and they’re massively overconfident.

It’s the intern that always believes that they’re right. But it’s an intern who you can, I hate to say it, you can kind of bully them.

You can be like, “Do it again, do that again.” “No, that’s wrong.” And you don’t have to feel guilty about it, which is great!

Or one of my favorite prompts is you just say, “Do better.” And it works. It’s the craziest thing. It’ll write some code, you say, “Do better.” And it goes, “Oh, I’m sorry, I should...”

And then it will churn out better code, which is so stupid that that’s how this technology works. But it’s kind of fun.