Simon Willison’s Weblog

We need to tell people ChatGPT will lie to them, not debate linguistics

7th April 2023

ChatGPT lies to people. This is a serious bug that has so far resisted all attempts at a fix. We need to prioritize helping people understand this, not debating the most precise terminology to use to describe it.

We accidentally invented computers that can lie to us

I tweeted (and tooted) exactly that sentiment: we accidentally invented computers that can lie to us, and we can’t figure out how to make them stop.

Mainly I was trying to be pithy and amusing, but this thought was inspired by reading Sam Bowman’s excellent review of the field, Eight Things to Know about Large Language Models. In particular this:

More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.

Sycophancy and sandbagging are my two favourite new pieces of AI terminology!

What I find fascinating about this is that these extremely problematic behaviours are not the system working as intended: they are bugs! And we haven’t yet found a reliable way to fix them.

(Here’s the paper that snippet references: Discovering Language Model Behaviors with Model-Written Evaluations from December 2022.)

“But a machine can’t deliberately tell a lie”

I got quite a few replies complaining that it’s inappropriate to refer to LLMs as “lying”, because to do so anthropomorphizes them and implies a level of intent which isn’t possible.

I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.

But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.

Science fiction has been presenting us with a model of “artificial intelligence” for decades. It’s firmly baked into our culture that an “AI” is an all-knowing computer, incapable of lying and able to answer any question with pinpoint accuracy.

Large language models like ChatGPT, on first encounter, seem to fit that bill. They appear astonishingly capable, and their command of human language can make them seem like a genuine intelligence, at least at first glance.

But the more time you spend with them, the more that illusion starts to fall apart.

They fail spectacularly when prompted with logic puzzles, or basic arithmetic, or when asked to produce citations or link to sources for the information they present.

Most concerningly, they hallucinate or confabulate: they make things up! My favourite example of this remains their ability to entirely imagine the content of a URL. I still see this catching people out every day. It’s remarkably convincing.

Why ChatGPT and Bing Chat are so good at making things up is an excellent in-depth exploration of this issue from Benj Edwards at Ars Technica.

We need to explain this in straightforward terms

We’re trying to solve two problems here:

  1. ChatGPT cannot be trusted to provide factual information. It has a very real risk of making things up, and if people don’t understand this, they are guaranteed to be misled.
  2. Systems like ChatGPT are not sentient, or even intelligent systems. They do not have opinions, or feelings, or a sense of self. We must resist the temptation to anthropomorphize them.

I believe that the most direct form of harm caused by LLMs today is the way they mislead their users. The first problem needs to take precedence.

It is vitally important that new users understand that these tools cannot be trusted to provide factual answers. We need to help people get there as quickly as possible.

Which of these two messages do you think is more effective?

ChatGPT will lie to you

Or

ChatGPT doesn’t lie, lying is too human and implies intent. It hallucinates. Actually no, hallucination still implies human-like thought. It confabulates. That’s a term used in psychiatry to describe when someone replaces a gap in their memory with a falsification that they believe to be true—though of course these things don’t have human minds, so even confabulation is unnecessarily anthropomorphic. I hope you’ve enjoyed this linguistic detour!

Let’s go with the first one. We should be shouting this message from the rooftops: ChatGPT will lie to you.

That doesn’t mean it’s not useful—it can be astonishingly useful, for all kinds of purposes... but seeking truthful, factual answers is very much not one of them. And everyone needs to understand that.

Convincing people that these tools aren’t sentient AIs out of a science fiction story can come later. Once people understand their flaws, this should be an easier argument to make!

Should we warn people off or help them on?

This situation raises an ethical conundrum: if these tools can’t be trusted, and people are demonstrably falling for their traps, should we encourage people not to use them at all, or even campaign to have them banned?

Every day I personally find new problems that I can solve more effectively with the help of large language models. Some recent examples from just the last few weeks:

Each of these represents a problem I could have solved without ChatGPT... but at a time cost that would have been prohibitively expensive, to the point that I wouldn’t have bothered.

I wrote more about this in AI-enhanced development makes me more ambitious with my projects.

Honestly, at this point using ChatGPT in the way that I do feels like a massively unfair competitive advantage. I’m not worried about AI taking people’s jobs: I’m worried about the impact of AI-enhanced developers like myself.

It genuinely feels unethical for me not to help other people learn to use these tools as effectively as possible. I want everyone to be able to do what I can do with them, as safely and responsibly as possible.

I think the message we should be emphasizing is this:

These are incredibly powerful tools. They are far harder to use effectively than they first appear. Invest the effort, but approach with caution: we accidentally invented computers that can lie to us and we can’t figure out how to make them stop.

There’s a time for linguistics, and there’s a time for grabbing the general public by the shoulders and shouting “It lies! The computer lies to you! Don’t trust anything it says!”
