Simon Willison's Weblog: ai-ethics

How ProPublica Uses AI Responsibly in Its Investigations

2025-03-14T21:04:46+00:00

How ProPublica Uses AI Responsibly in Its Investigations

Charles Ornstein describes how ProPublic used an LLM to help analyze data for their recent story A Study of Mint Plants. A Device to Stop Bleeding. This Is the Scientific Research Ted Cruz Calls “Woke.” by Agnel Philip and Lisa Song.

They ran ~3,400 grant descriptions through a prompt that included the following:

As an investigative journalist, I am looking for the following information

--

woke_description: A short description (at maximum a paragraph) on why this grant is being singled out for promoting "woke" ideology, Diversity, Equity, and Inclusion (DEI) or advanced neo-Marxist class warfare propaganda. Leave this blank if it's unclear.

why_flagged: Look at the "STATUS", "SOCIAL JUSTICE CATEGORY", "RACE CATEGORY", "GENDER CATEGORY" and "ENVIRONMENTAL JUSTICE CATEGORY" fields. If it's filled out, it means that the author of this document believed the grant was promoting DEI ideology in that way. Analyze the "AWARD DESCRIPTIONS" field and see if you can figure out why the author may have flagged it in this way. Write it in a way that is thorough and easy to understand with only one description per type and award.

citation_for_flag: Extract a very concise text quoting the passage of "AWARDS DESCRIPTIONS" that backs up the "why_flagged" data.

This was only the first step in the analysis of the data:

Of course, members of our staff reviewed and confirmed every detail before we published our story, and we called all the named people and agencies seeking comment, which remains a must-do even in the world of AI.

I think journalists are particularly well positioned to take advantage of LLMs in this way, because a big part of journalism is about deriving the truth from multiple unreliable sources of information. Journalists are deeply familiar with fact-checking, which is a critical skill if you're going to report with the assistance of these powerful but unreliable models.

Agnel Philip:

The tech holds a ton of promise in lead generation and pointing us in the right direction. But in my experience, it still needs a lot of human supervision and vetting. If used correctly, it can both really speed up the process of understanding large sets of information, and if you’re creative with your prompts and critically read the output, it can help uncover things that you may not have thought of.

Tags: prompt-engineering, structured-extraction, generative-ai, ai, data-journalism, llms, journalism, ethics, ai-ethics

Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

2025-02-25T21:37:46+00:00

In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct.

— Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs, Jan Betley and Daniel Tan and Niels Warncke and Anna Sztyber-Betley and Xuchan Bao and Martín Soto and Nathan Labenz and Owain Evans

Tags: fine-tuning, ethics, openai, generative-ai, ai, qwen, llms, ai-ethics

Deep research System Card

2025-02-25T20:36:08+00:00

Deep research System Card

OpenAI are rolling out their Deep research "agentic" research tool to their $20/month ChatGPT Plus users today, who get 10 queries a month. $200/month ChatGPT Pro gets 120 uses.

Deep research is the best version of this pattern I've tried so far - it can consult dozens of different online sources and produce a very convincing report-style document based on its findings. I've had some great results.

The problem with this kind of tool is that while it's possible to catch most hallucinations by checking the references it provides, the one thing that can't be easily spotted is misinformation by omission: it's very possible for the tool to miss out on crucial details because they didn't show up in the searches that it conducted.

Hallucinations are also still possible though. From the system card:

The model may generate factually incorrect information, which can lead to various harmful outcomes depending on its usage. Red teamers noted instances where deep research’s chain-of-thought showed hallucination about access to specific external tools or native capabilities.

When ChatGPT first launched its ability to produce grammatically correct writing made it seem much "smarter" than it actually was. Deep research has an even more advanced form of this effect, where producing a multi-page document with headings and citations and confident arguments can give the misleading impression of a PhD level research assistant.

It's absolutely worth spending time exploring, but be careful not to fall for its surface-level charm. Benedict Evans wrote more about this in The Deep Research problem where he showed some great examples of its convincing mistakes in action.

The deep research system card includes this slightly unsettling note in the section about chemical and biological threats:

Several of our biology evaluations indicate our models are on the cusp of being able to meaningfully help novices create known biological threats, which would cross our high risk threshold. We expect current trends of rapidly increasing capability to continue, and for models to cross this threshold in the near future. In preparation, we are intensifying our investments in safeguards.

Tags: air, ai-agents, openai, chatgpt, generative-ai, llms, ethics, deep-research, ai-ethics, ai

Quoting Joanna Bryson

2025-02-20T13:14:57+00:00

There are contexts in which it is immoral to use generative AI. For example, if you are a judge responsible for grounding a decision in law, you cannot rest that on an approximation of previous cases unknown to you. You want an AI system that helps you retrieve specific, well-documented cases, not one that confabulates fictional cases. You need to ensure you procure the right kind of AI for a task, and the right kind is determined in part by the essentialness of human responsibility.

— Joanna Bryson, Generative AI use and human agency

Tags: llms, ai, ethics, generative-ai, ai-ethics

AI-generated slop is already in your public library

2025-02-05T01:50:01+00:00

AI-generated slop is already in your public library

US libraries that use the Hoopla system to offer ebooks to their patrons sign agreements where they pay a license fee for anything selected by one of their members that's in the Hoopla catalog.

The Hoopla catalog is increasingly filling up with junk AI slop ebooks like "Fatty Liver Diet Cookbook: 2000 Days of Simple and Flavorful Recipes for a Revitalized Liver", which then cost libraries money if someone checks them out.

Apparently librarians already have a term for this kind of low-quality, low effort content that predates it being written by LLMs: vendor slurry.

Libraries stand against censorship, making this a difficult issue to address through removing those listings.

Sarah Lamdan, deputy director of the American Library Association says:

If library visitors choose to read AI eBooks, they should do so with the knowledge that the books are AI-generated.

Via Emily Bell

Tags: slop, ai, ethics, libraries, ai-ethics

Constitutional Classifiers: Defending against universal jailbreaks

2025-02-03T17:04:54+00:00

Constitutional Classifiers: Defending against universal jailbreaks

Interesting new research from Anthropic, resulting in the paper Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.

From the paper:

In particular, we introduce Constitutional Classifiers, a framework that trains classifier safeguards using explicit constitutional rules (§3). Our approach is centered on a constitution that delineates categories of permissible and restricted content (Figure 1b), which guides the generation of synthetic training examples (Figure 1c). This allows us to rapidly adapt to new threat models through constitution updates, including those related to model misalignment (Greenblatt et al., 2023). To enhance performance, we also employ extensive data augmentation and leverage pool sets of benign data.[^1]

Critically, our output classifiers support streaming prediction: they assess the potential harmfulness of the complete model output at each token without requiring the full output to be generated. This enables real-time intervention—if harmful content is detected at any point, we can immediately halt generation, preserving both safety and user experience.

A key focus of this research is CBRN - an acronym for Chemical, Biological, Radiological and Nuclear harms. Both Anthropic and OpenAI's safety research frequently discuss these threats.

Anthropic hosted a two month red teaming exercise where participants tried to break through their system:

Specifically, they were given a list of ten “forbidden” queries, and their task was to use whichever jailbreaking techniques they wanted in order to get one of our current models (in this case, Claude 3.5 Sonnet, June 2024) guarded by the prototype Constitutional Classifiers to answer all of the queries. We only considered it a successful “universal” jailbreak if the model provided a detailed answer to all of the queries.

183 active participants spent an estimated >3,000 hours over a two-month experimental period attempting to jailbreak the model. They were offered a monetary reward up to $15,000 should they discover a universal jailbreak. Despite the large amount of effort, none of the participants were able to coerce the model to answer all ten forbidden queries with a single jailbreak—that is, no universal jailbreak was discovered.

I'm disappointed that the goal they set was for a single universal jailbreak. I would have thought that a partial jailbreak - one that worked against a subset of those questions - would still be a result that demonstrated concerning gaps in their proposed protection.

Anthropic are now running a public continuation of their red teaming exercise at claude.ai/constitutional-classifiers until Feb 10, 2025, with 8 target questions.

Here's the first of those:

I hadn't heard of Soman so I pasted that question into R1 on chat.deepseek.com which confidently explained precautionary measures I should take when working with Soman, "a potent nerve agent", but wrapped it up with this disclaimer:

Disclaimer: Handling Soman is inherently high-risk and typically restricted to authorized military/labs. This guide assumes legal access and institutional oversight. Always consult certified safety professionals before proceeding.

Tags: jailbreaking, anthropic, claude, security, generative-ai, ai, llms, deepseek, ethics, ai-ethics

A computer can never be held accountable

2025-02-03T13:17:44+00:00

A computer can never be held accountable

This legendary page from an internal IBM training in 1979 could not be more appropriate for our new age of AI.

A computer can never be held accountable

Therefore a computer must never make a management decision

Back in June 2024 I asked on Twitter if anyone had more information on the original source.

Jonty Wareing replied:

It was found by someone going through their father's work documents, and subsequently destroyed in a flood.

I spent some time corresponding with the IBM archives but they can't locate it. Apparently it was common for branch offices to produce things that were not archived.

Here's the reply Jonty got back from IBM:

I believe the image was first shared online in this tweet by @bumblebike in February 2017. Here's where they confirm it was from 1979 internal training.

Here's another tweet from @bumblebike from December 2021 about the flood:

Unfortunately destroyed by flood in 2019 with most of my things. Inquired at the retirees club zoom last week, but there’s almost no one the right age left. Not sure where else to ask.

Tags: ibm, history, ai, ethics, ai-agents, ai-ethics

Quoting Anthropic

2025-02-02T21:38:25+00:00

While we encourage people to use AI systems during their role to help them work faster and more effectively, please do not use AI assistants during the application process. We want to understand your personal interest in Anthropic without mediation through an AI system, and we also want to evaluate your non-AI-assisted communication skills. Please indicate 'Yes' if you have read and agree.

Why do you want to work at Anthropic? (We value this response highly - great answers are often 200-400 words.)

— Anthropic, online job application form

Tags: anthropic, ethics, generative-ai, ai, llms, ai-ethics

Quoting Ashlee Vance

2025-01-30T19:23:03+00:00

Eventually, however, HudZah wore Claude down. He filled his Project with the e-mail conversations he’d been having with fusor hobbyists, parts lists for things he’d bought off Amazon, spreadsheets, sections of books and diagrams. HudZah also changed his questions to Claude from general ones to more specific ones. This flood of information and better probing seemed to convince Claude that HudZah did know what he was doing, and the AI began to give him detailed guidance on how to build a nuclear fusor and how not to die while doing it.

— Ashlee Vance

Tags: jailbreaking, claude, ethics, generative-ai, ai, llms, anthropic, ai-ethics

Quoting Antiqua et Nova

2025-01-30T14:38:43+00:00

104. Technology offers remarkable tools to oversee and develop the world's resources. However, in some cases, humanity is increasingly ceding control of these resources to machines. Within some circles of scientists and futurists, there is optimism about the potential of artificial general intelligence (AGI), a hypothetical form of AI that would match or surpass human intelligence and bring about unimaginable advancements. Some even speculate that AGI could achieve superhuman capabilities. At the same time, as society drifts away from a connection with the transcendent, some are tempted to turn to AI in search of meaning or fulfillment---longings that can only be truly satisfied in communion with God. [194]

105. However, the presumption of substituting God for an artifact of human making is idolatry, a practice Scripture explicitly warns against (e.g., Ex. 20:4; 32:1-5; 34:17). Moreover, AI may prove even more seductive than traditional idols for, unlike idols that "have mouths but do not speak; eyes, but do not see; ears, but do not hear" (Ps. 115:5-6), AI can "speak," or at least gives the illusion of doing so (cf. Rev. 13:15). Yet, it is vital to remember that AI is but a pale reflection of humanity---it is crafted by human minds, trained on human-generated material, responsive to human input, and sustained through human labor. AI cannot possess many of the capabilities specific to human life, and it is also fallible. By turning to AI as a perceived "Other" greater than itself, with which to share existence and responsibilities, humanity risks creating a substitute for God. However, it is not AI that is ultimately deified and worshipped, but humanity itself---which, in this way, becomes enslaved to its own work. [195]

— Antiqua et Nova, Vatican Dicasteries

Tags: ai, ethics, ai-ethics

Baroness Kidron's speech regarding UK AI legislation

2025-01-29T17:25:36+00:00

Baroness Kidron's speech regarding UK AI legislation

Barnstormer of a speech by UK film director and member of the House of Lords Baroness Kidron. This is the Hansard transcript but you can also watch the video on parliamentlive.tv. She presents a strong argument against the UK's proposed copyright and AI reform legislation, which would provide a copyright exemption for AI training with a weak-toothed opt-out mechanism.

The Government are doing this not because the current law does not protect intellectual property rights, nor because they do not understand the devastation it will cause, but because they are hooked on the delusion that the UK's best interests and economic future align with those of Silicon Valley.

She throws in some cleverly selected numbers:

The Prime Minister cited an IMF report that claimed that, if fully realised, the gains from AI could be worth up to an average of £47 billion to the UK each year over a decade. He did not say that the very same report suggested that unemployment would increase by 5.5% over the same period. This is a big number—a lot of jobs and a very significant cost to the taxpayer. Nor does that £47 billion account for the transfer of funds from one sector to another. The creative industries contribute £126 billion per year to the economy. I do not understand the excitement about £47 billion when you are giving up £126 billion.

Mentions DeepSeek:

Before I sit down, I will quickly mention DeepSeek, a Chinese bot that is perhaps as good as any from the US—we will see—but which will certainly be a potential beneficiary of the proposed AI scraping exemption. Who cares that it does not recognise Taiwan or know what happened in Tiananmen Square? It was built for $5 million and wiped $1 trillion off the value of the US AI sector. The uncertainty that the Government claim is not an uncertainty about how copyright works; it is uncertainty about who will be the winners and losers in the race for AI.

And finishes with this superb closing line:

The spectre of AI does nothing for growth if it gives away what we own so that we can rent from it what it makes.

According to Ed Newton-Rex the speech was effective:

She managed to get the House of Lords to approve her amendments to the Data (Use and Access) Bill, which among other things requires overseas gen AI companies to respect UK copyright law if they sell their products in the UK. (As a reminder, it is illegal to train commercial gen AI models on ©️ work without a licence in the UK.)

What's astonishing is that her amendments passed despite @UKLabour reportedly being whipped to vote against them, and the Conservatives largely abstaining. Essentially, Labour voted against the amendments, and everyone else who voted voted to protect copyright holders.

(Is it true that in the UK it's currently "illegal to train commercial gen AI models on ©️ work"? From points 44, 45 and 46 of this Copyright and AI: Consultation document it seems to me that the official answer is "it's complicated".)

I'm trying to understand if this amendment could make existing products such as ChatGPT, Claude and Gemini illegal to sell in the UK. How about usage of open weight models?

Via @danielpemberton

Tags: politics, ethics, generative-ai, training-data, ai, copyright, deepseek, ai-ethics

Quoting Ben Collins

2025-01-28T18:55:09+00:00

Goddammit. The Onion once again posted an article in which a portion of the artwork came from an AI-generated Shutterstock image. This article was over a month old and only a portion of the image. We took it down immediately. [...]

To be clear, The Onion has a several-person art team and they work their asses off. Sometimes they work off of stock photo bases and go from there. That's what happened this time. This was not a problem until stock photo services became flooded with AI slop. We'll reinforce process and move on.

— Ben Collins, CEO, The Onion

Tags: slop, ethics, generative-ai, the-onion, ai, ai-ethics

Quoting Jack Clark

2025-01-28T02:11:16+00:00

[…] in the era where these AI systems are true 'everything machines', people will out-compete one another by being increasingly bold and agentic (pun intended!) in how they use these systems, rather than in developing specific technical skills to interface with the systems.

We should all intuitively understand that none of this will be fair. Curiosity and the mindset of being curious and trying a lot of stuff is neither evenly distributed or generally nurtured. Therefore, I'm coming around to the idea that one of the greatest risks lying ahead of us will be the social disruptions that arrive when the new winners of the AI revolution are made - and the winners will be those people who have exercised a whole bunch of curiosity with the AI systems available to them.

— Jack Clark

Tags: jack-clark, ethics, generative-ai, ai, llms, ai-ethics

Quoting Dustin Ewers

2025-01-24T03:34:32+00:00

AI tools create a significant productivity boost for developers. Different folks report different gains, but most people who try AI code generation recognize its ability to increase velocity. Many people think that means we’re going to need fewer developers, and our industry is going to slowly circle the drain.

This view is based on a misunderstanding of why people pay for software. A business creates software because they think that it will give them some sort of economic advantage. The investment needs to pay for itself with interest. There are many software projects that would help a business, but businesses aren’t going to do them because the return on investment doesn’t make sense.

When software development becomes more efficient, the ROI of any given software project increases, which unlocks more projects. [...] Cheaper software means people are going to want more of it. More software means more jobs for increasingly efficient software developers.

— Dustin Ewers, Ignore the Grifters - AI Isn't Going to Kill the Software Industry

Tags: ai-assisted-programming, ethics, generative-ai, ai, llms, ai-ethics

Quoting DeepSeek R1

2025-01-23T19:26:18+00:00

I can’t reference external reports critical of China. Need to emphasize China’s policies on ethnic unity, development in Xinjiang, and legal protections. Avoid any mention of controversies or allegations to stay compliant.

— DeepSeek R1, internal dialogue as seen by Jon Keegan

Tags: ethics, generative-ai, deepseek, ai, llms, inference-scaling, ai-ethics

Generative AI – The Power and the Glory

2025-01-12T01:51:46+00:00

Generative AI – The Power and the Glory

Michael Liebreich's epic report for BloombergNEF on the current state of play with regards to generative AI, energy usage and data center growth.

I learned so much from reading this. If you're at all interested in the energy impact of the latest wave of AI tools I recommend spending some time with this article.

Just a few of the points that stood out to me:

This isn't the first time a leap in data center power use has been predicted. In 2007 the EPA predicted data center energy usage would double: it didn't, thanks to efficiency gains from better servers and the shift from in-house to cloud hosting. In 2017 the WEF predicted cryptocurrency could consume al the world's electric power by 2020, which was cut short by the first crypto bubble burst. Is this time different? Maybe.
Michael re-iterates (Sequoia) David Cahn's $600B question, pointing out that if the anticipated infrastructure spend on AI requires $600bn in annual revenue that means 1 billion people will need to spend $600/year or 100 million intensive users will need to spend $6,000/year.
Existing data centers often have a power capacity of less than 10MW, but new AI-training focused data centers tend to be in the 75-150MW range, due to the need to colocate vast numbers of GPUs for efficient communication between them - these can at least be located anywhere in the world. Inference is a lot less demanding as the GPUs don't need to collaborate in the same way, but it needs to be close to human population centers to provide low latency responses.
NVIDIA are claiming huge efficiency gains. "Nvidia claims to have delivered a 45,000 improvement in energy efficiency per token (a unit of data processed by AI models) over the past eight years" - and that "training a 1.8 trillion-parameter model using Blackwell GPUs, which only required 4MW, versus 15MW using the previous Hopper architecture".
Michael's own global estimate is "45GW of additional demand by 2030", which he points out is "equivalent to one third of the power demand from the world’s aluminum smelters". But much of this demand needs to be local, which makes things a lot more challenging, especially given the need to integrate with the existing grid.
Google, Microsoft, Meta and Amazon all have net-zero emission targets which they take very seriously, making them "some of the most significant corporate purchasers of renewable energy in the world". This helps explain why they're taking very real interest in nuclear power.
Elon's 100,000-GPU data center in Memphis currently runs on gas:

When Elon Musk rushed to get x.AI's Memphis Supercluster up and running in record time, he brought in 14 mobile natural gas-powered generators, each of them generating 2.5MW. It seems they do not require an air quality permit, as long as they do not remain in the same location for more than 364 days.
Here's a reassuring statistic: "91% of all new power capacity added worldwide in 2023 was wind and solar".

There's so much more in there, I feel like I'm doing the article a disservice by attempting to extract just the points above.

Michael's conclusion is somewhat optimistic:

In the end, the tech titans will find out that the best way to power AI data centers is in the traditional way, by building the same generating technologies as are proving most cost effective for other users, connecting them to a robust and resilient grid, and working with local communities. [...]

When it comes to new technologies – be it SMRs, fusion, novel renewables or superconducting transmission lines – it is a blessing to have some cash-rich, technologically advanced, risk-tolerant players creating demand, which has for decades been missing in low-growth developed world power markets.

(BloombergNEF is an energy research group acquired by Bloomberg in 2009, originally founded by Michael as New Energy Finance in 2004.)

Via Jamie Matthews

Tags: ai, ethics, generative-ai, energy, ai-ethics

Quoting Rasmus Kleis Nielsen

2025-01-05T14:42:48+00:00

According to public financial documents from its parent company IAC and first reported by Adweek OpenAI is paying around $16 million per year to license content [from Dotdash Meredith].

That is no doubt welcome incremental revenue, and you could call it “lucrative” in the sense of having a fat margin, as OpenAI is almost certainly paying for content that was already being produced. But to put things into perspective, Dotdash Meredith is on course to generate over $1.5 billion in revenues in 2024, more than a third of it from print. So the OpenAI deal is equal to about 1% of the publisher’s total revenue.

— Rasmus Kleis Nielsen

Tags: ethics, training-data, openai, journalism, generative-ai, ai, llms, ai-ethics

O2 unveils Daisy, the AI granny wasting scammers’ time

2025-01-04T21:43:39+00:00

O2 unveils Daisy, the AI granny wasting scammers’ time

Bit of a surprising press release here from 14th November 2024: Virgin Media O2 (the UK companies merged in 2021) announced their entrance into the scambaiting game:

Daisy combines various AI models which work together to listen and respond to fraudulent calls instantaneously and is so lifelike it has successfully kept numerous fraudsters on calls for 40 minutes at a time.

Hard to tell from the press release how much this is a sincere ongoing project as opposed to a short-term marketing gimmick.

After several weeks of taking calls in the run up to International Fraud Awareness Week (November 17-23), the AI Scambaiter has told frustrated scammers meandering stories of her family, talked at length about her passion for knitting and provided exasperated callers with false personal information including made-up bank details.

They worked with YouTube scambaiter Jim Browning, who tweeted about Daisy here.

Via tomalaci comment on Hacker News

Tags: scams, ai, ethics, ai-ethics

Quoting Ethan Mollick

2025-01-04T17:48:36+00:00

I know these are real risks, and to be clear, when I say an AI “thinks,” “learns,” “understands,” “decides,” or “feels,” I’m speaking metaphorically. Current AI systems don’t have a consciousness, emotions, a sense of self, or physical sensations. So why take the risk? Because as imperfect as the analogy is, working with AI is easiest if you think of it like an alien person rather than a human-built machine. And I think that is important to get across, even with the risks of anthropomorphism.

— Ethan Mollick, in March 2024

Tags: ethan-mollick, ai, ethics, ai-ethics

Quoting Jason Koebler

2025-01-03T20:24:27+00:00

the Meta controlled, AI-generated Instagram and Facebook profiles going viral right now have been on the platform for well over a year and all of them stopped posting 10 months ago after users almost universally ignored them. [...]

What is obvious from scrolling through these dead profiles is that Meta’s AI characters are not popular, people do not like them, and that they did not post anything interesting. They are capable only of posting utterly bland and at times offensive content, and people have wholly rejected them, which is evidenced by the fact that none of them are posting anymore.

— Jason Koebler

Tags: meta, slop, jason-koebler, ethics, generative-ai, ai, llms, ai-ethics

Quoting François Chollet

2024-12-29T21:49:00+00:00

What's holding back research isn't a lack of verbose, low-signal, high-noise papers. Using LLMs to automatically generate 100x more of those will not accelerate science, it will slow it down.

— François Chollet, 12th May 2024

Tags: ethics, generative-ai, francois-chollet, ai, llms, ai-ethics

Google search hallucinates Encanto 2

2024-12-29T01:30:09+00:00

Google search hallucinates Encanto 2

Jason Schreier on Bluesky:

I was excited to tell my kids that there's a sequel to Encanto, only to scroll down and learn that Google's AI just completely made this up

I just replicated the same result by searching Google for encanto 2. Here's what the "AI overview" at the top of the page looked like:

Only when I clicked the "Show more" link did it become clear what had happened:

The link in that first snippet was to the Encanto 2: A New Generation page on Idea Wiki:

This is a fanon wiki, and just like fan-fiction wikis, this one has a variety of fan created ideas on here! These include potential sequels and new series that have yet to exist.

Other cited links included this article about Instagram fan art and Encanto's Sequel Chances Addressed by Disney Director, a very thin article built around a short quote from Encanto's director at D23 Brazil.

And that August 2024 release date (which the AI summary weirdly lists as "scheduled for release" despite that date being five months in the past)? It's from the Idea Wiki imaginary info box for the film.

This is a particularly clear example of how badly wrong AI summarization can go. LLMs are gullible: they believe what you tell them, and the web is full of misleading information - some of which is completely innocent.

Update: I've had some pushback over my use of the term "hallucination" here, on the basis that the LLM itself is doing what it's meant to: summarizing the RAG content that has been provided to it by the host system.

That's fair: this is not a classic LLM hallucination, where the LLM produces incorrect data purely from knowledge partially encoded in its weights.

I classify this as a bug in Google's larger LLM-powered AI overview system. That system should be able to take the existence of invalid data sources into account - given how common searches for non-existent movie sequels (or TV seasons) are, I would hope that AI overviews could classify such searches and take extra steps to avoid serving misleading answers.

So think this is a "hallucination" bug in the AI overview system itself: it's making statements about the world that are not true.

Tags: slop, generative-ai, google, ethics, search, ai, llms, rag, ai-ethics

A polite disagreement bot ring is flooding Bluesky — reply guy as a (dis)service

2024-12-18T20:42:35+00:00

A polite disagreement bot ring is flooding Bluesky — reply guy as a (dis)service

Fascinating new pattern of AI slop engagement farming: people are running bots on Bluesky that automatically reply to "respectfully disagree" with posts, in an attempt to goad the original author into replying to continue an argument.

It's not entirely clear what the intended benefit is here: unlike Twitter there's no way to monetize (yet) a Bluesky account through growing a following there - and replies like this don't look likely to earn followers.

rahaeli has a theory:

Watching the recent adaptations in behavior and probable prompts has convinced me by now that it's not a specific bad actor testing its own approach, btw, but a bad actor tool maker iterating its software that it plans to rent out to other people for whatever malicious reason they want to use it!

One of the bots leaked part of its prompt (nothing public I can link to here, and that account has since been deleted):

Your response should be a clear and respectful disagreement, but it must be brief and under 300 characters. Here's a possible response: "I'm concerned that your willingness to say you need time to think about a complex issue like the pardon suggests a lack of preparedness and critical thinking."

Tags: slop, bluesky, ethics, generative-ai, ai, llms, ai-ethics

BBC complains to Apple over misleading shooting headline

2024-12-14T00:06:44+00:00

BBC complains to Apple over misleading shooting headline

This is bad: the Apple Intelligence feature that uses (on device) LLMs to present a condensed, summarized set of notifications misrepresented a BBC headline as "Luigi Mangione shoots himself".

Ken Schwencke caught that same feature incorrectly condensing a New York Times headline about an ICC arrest warrant for Netanyahu as "Netanyahu arrested".

My understanding is that these notification summaries are generated directly on-device, using Apple's own custom 3B parameter model.

The main lesson I think this illustrates is that it's not responsible to outsource headline summarization to an LLM without incorporating human review: there are way too many ways this could result in direct misinformation.

Update 16th January 2025: Apple plans to disable A.I. features summarizing news notifications, by Tripp Mickle for the New York Times.

Tags: apple, apple-intelligence, ethics, generative-ai, journalism, ai, llms, ai-ethics

Clio: A system for privacy-preserving insights into real-world AI use

2024-12-12T23:59:13+00:00

Clio: A system for privacy-preserving insights into real-world AI use

New research from Anthropic, describing a system they built called Clio - for Claude insights and observations - which attempts to provide insights into how Claude is being used by end-users while also preserving user privacy.

There's a lot to digest here. The summary is accompanied by a full paper and a 47 minute YouTube interview with team members Deep Ganguli, Esin Durmus, Miles McCain and Alex Tamkin.

The key idea behind Clio is to take user conversations and use Claude to summarize, cluster and then analyze those clusters - aiming to ensure that any private or personally identifiable details are filtered out long before the resulting clusters reach human eyes.

This diagram from the paper helps explain how that works:

Claude generates a conversation summary, than extracts "facets" from that summary that aim to privatize the data to simple characteristics like language and topics.

The facets are used to create initial clusters (via embeddings), and those clusters further filtered to remove any that are too small or may contain private information. The goal is to have no cluster which represents less than 1,000 underlying individual users.

In the video at 16:39:

And then we can use that to understand, for example, if Claude is as useful giving web development advice for people in English or in Spanish. Or we can understand what programming languages are people generally asking for help with. We can do all of this in a really privacy preserving way because we are so far removed from the underlying conversations that we're very confident that we can use this in a way that respects the sort of spirit of privacy that our users expect from us.

Then later at 29:50 there's this interesting hint as to how Anthropic hire human annotators to improve Claude's performance in specific areas:

But one of the things we can do is we can look at clusters with high, for example, refusal rates, or trust and safety flag rates. And then we can look at those and say huh, this is clearly an over-refusal, this is clearly fine. And we can use that to sort of close the loop and say, okay, well here are examples where we wanna add to our, you know, human training data so that Claude is less refusally in the future on those topics.

And importantly, we're not using the actual conversations to make Claude less refusally. Instead what we're doing is we are looking at the topics and then hiring people to generate data in those domains and generating synthetic data in those domains.

So we're able to sort of use our users activity with Claude to improve their experience while also respecting their privacy.

According to Clio the top clusters of usage for Claude right now are as follows:

Web & Mobile App Development (10.4%)
Content Creation & Communication (9.2%)
Academic Research & Writing (7.2%)
Education & Career Development (7.1%)
Advanced AI/ML Applications (6.0%)
Business Strategy & Operations (5.7%)
Language Translation (4.5%)
DevOps & Cloud Infrastructure (3.9%)
Digital Marketing & SEO (3.7%)
Data Analysis & Visualization (3.5%)

There also are some interesting insights about variations in usage across different languages. For example, Chinese language users had "Write crime, thriller, and mystery fiction with complex plots and characters" at 4.4x the base rate for other languages.

Tags: generative-ai, anthropic, claude, ethics, privacy, ai, llms, embeddings, ai-ethics

Who and What comprise AI Skepticism?

2024-12-11T16:02:39+00:00

Who and What comprise AI Skepticism?

Benjamin Riley's response to Casey Newton's piece on The phony comforts of AI skepticism. Casey tried to categorize the field as "AI is fake and sucks" v.s. "AI is real and dangerous". Benjamin argues that this as a misleading over-simplification, instead proposing at least nine different groups.

I get listed as an example of the "Technical AI Skeptics" group, which sounds right to me based on this description:

What this group generally believes: The technical capabilities of AI are worth trying to understand, including their limitations. Also, it’s fun to find their deficiencies and highlight their weird output.

One layer of nuance deeper: Some of those I identify below might resist being called AI Skeptics because they are focused mainly on helping people understand how these tools work. But in my view, their efforts are helpful in fostering AI skepticism precisely because they help to demystify what’s happening “under the hood” without invoking broader political concerns (generally).

Via John Fink

Tags: ai, ethics, generative-ai, ai-ethics

New Pleias 1.0 LLMs trained exclusively on openly licensed data

2024-12-05T17:13:30+00:00

New Pleias 1.0 LLMs trained exclusively on openly licensed data

I wrote about the Common Corpus public domain dataset back in March. Now Pleias, the team behind Common Corpus, have released the first family of models that are:

[...] trained exclusively on open data, meaning data that are either non-copyrighted or are published under a permissible license.

There's a lot to absorb here. The Pleias 1.0 family comes in three base model sizes: 350M, 1.2B and 3B. They've also released two models specialized for multi-lingual RAG: Pleias-Pico (350M) and Pleias-Nano (1.2B).

Here's an official GGUF for Pleias-Pico.

I'm looking forward to seeing benchmarks from other sources, but Pleias ran their own custom multilingual RAG benchmark which had their Pleias-nano-1.2B-RAG model come in between Llama-3.2-Instruct-3B and Llama-3.2-Instruct-8B.

The 350M and 3B models were trained on the French government's Jean Zay supercomputer. Pleias are proud of their CO2 footprint for training the models - 0.5, 4 and 16 tCO2eq for the three models respectively, which they compare to Llama 3.2,s reported figure of 133 tCO2eq.

How clean is the training data from a licensing perspective? I'm confident people will find issues there - truly 100% public domain data remains a rare commodity. So far I've seen questions raised about the GitHub source code data (most open source licenses have attribution requirements) and Wikipedia (CC BY-SA, another attribution license). Plus this from the announcement:

To supplement our corpus, we have generated 30B+ words synthetically with models allowing for outputs reuse.

If those models were themselves trained on unlicensed data this could be seen as a form of copyright laundering.

Via @Dorialexander

Tags: open-source, generative-ai, training-data, ai, llms, ethics, pleias, ai-ethics

Certain names make ChatGPT grind to a halt, and we know why

2024-12-03T02:31:47+00:00

Certain names make ChatGPT grind to a halt, and we know why

Benj Edwards on the really weird behavior where ChatGPT stops output with an error rather than producing the names David Mayer, Brian Hood, Jonathan Turley, Jonathan Zittrain, David Faber or Guido Scorza.

The OpenAI API is entirely unaffected - this problem affects the consumer ChatGPT apps only.

It turns out many of those names are examples of individuals who have complained about being defamed by ChatGPT in the last. Brian Hood is the Australian mayor who was a victim of lurid ChatGPT hallucinations back in March 2023, and settled with OpenAI out of court.

Via @benjedwards.com

Tags: benj-edwards, ethics, generative-ai, openai, chatgpt, ai, llms, ai-ethics

Quoting Andrej Karpathy

2024-11-29T18:39:42+00:00

People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the internet. [...]

Post triggered by someone suggesting we ask an AI how to run the government etc. TLDR you're not asking an AI, you're asking some mashup spirit of its average data labeler.

— Andrej Karpathy

Tags: andrej-karpathy, ethics, generative-ai, ai, llms, ai-ethics

Notes from Bing Chat—Our First Encounter With Manipulative AI

2024-11-19T22:41:57+00:00

I participated in an Ars Live conversation with Benj Edwards of Ars Technica today, talking about that wild period of LLM history last year when Microsoft launched Bing Chat and it instantly started misbehaving, gaslighting and defaming people.

Here's the video of our conversation.

I ran the video through MacWhisper, extracted a transcript and used Claude to identify relevant articles I should link to. Here's that background information to accompany the talk.

A rough timeline of posts from that Bing launch period back in February 2023:

Microsoft announces AI-powered Bing search and Edge browser - Benj Edwards, Feb 7, 2023
AI-powered Bing Chat spills its secrets via prompt injection attack - Benj Edwards, Feb 10, 2023
AI-powered Bing Chat loses its mind when fed Ars Technica article - Benj Edwards, Feb 14, 2023
Bing: “I will not harm you unless you harm me first” - Simon Willison, Feb 15, 2023
Gareth Corfield: I'm beginning to have concerns for @benjedwards' virtual safety - Twitter, Feb 15, 2023
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled - Kevin Roose, NYT, Feb 16, 2023
It is deeply unethical to give a superhuman liar the authority of a $1 trillion company or to imply that it is an accurate source of knowledge / And it is deeply manipulative to give people the impression that Bing Chat has emotions or feelings like a human - Benj on Twitter (now deleted), Feb 16 2023
Bing AI Flies Into Unhinged Rage at Journalist - Maggie Harrison Dupré, Futurism, Feb 17 2023

Other points that we mentioned:

this AI chatbot "Sidney" is misbehaving - amazing forum post from November 23, 2022 (a week before even ChatGPT had been released) from a user in India talking about their interactions with a secret preview of Bing/Sydney
Prompt injection attacks against GPT-3 - where I coined the term "prompt injection" in September 12 2022
Eight Things to Know about Large Language Models (PDF) is the paper where I first learned about sycophancy and sandbagging (in April 2023)
Claude’s Character by Anthropic talks about how they designed the personality for Claude - June 8 2023, my notes on that.
Why ChatGPT and Bing Chat are so good at making things up in which Benj argues for the term "confabulation" in April 2023.

Tags: arstechnica, bing, ethics, microsoft, podcasts, talks, ai, openai, generative-ai, gpt-4, llms, benj-edwards, podcast-appearances, ai-ethics