Simon Willison on journalism

50 posts tagged “journalism”

2026

After this story was published Google's spokesperson reached out and asked us to publish a slightly different version of that statement. The new statement no longer stated that "it's critical that we maintain humans in the loop."

— Emanuel Maiberg, 404 Media, Google Employees Internally Share Memes About How Its AI Sucks

# 4th June 2026, 4:38 pm / google, journalism, ai, ai-ethics

This article was updated after The Times learned that a remark attributed to Pierre Poilievre, the Conservative leader, was in fact an A.I.-generated summary of his views about Canadian politics that A.I. rendered as a quotation. The reporter should have checked the accuracy of what the A.I. tool returned. The article now accurately quotes from a speech delivered by Mr. Poilievre in April. [...] He did not refer to politicians who changed allegiances as turncoats in that speech.

— New York Times Editors’ Note

# 10th May 2026, 11:58 pm / journalism, new-york-times, ai, generative-ai, llms, ai-ethics, hallucinations

An AI-generated report, delivered directly to the email inboxes of journalists, was an essential tool in the Times’ coverage. It was also one of the first signals that conservative media was turning against the administration [...]

Built in-house and known internally as the “Manosphere Report,” the tool uses large language models (LLMs) to transcribe and summarize new episodes of dozens of podcasts.

“The Manosphere Report gave us a really fast and clear signal that this was not going over well with that segment of the President’s base,” said Seward. “There was a direct link between seeing that and then diving in to actually cover it.”

— Andrew Deck for Niemen Lab, How The New York Times uses a custom AI tool to track the “manosphere”

# 11th February 2026, 8:59 pm / data-journalism, journalism, new-york-times, ai, generative-ai, llms

A Social Network for A.I. Bots Only. No Humans Allowed. I talked to Cade Metz for this New York Times piece on OpenClaw and Moltbook. Cade reached out after seeing my blog post about that from the other day.

In a first for me, they decided to send a photographer, Jason Henry, to my home to take some photos for the piece! That's my grubby laptop screen at the top of the story (showing this post on Moltbook). There's a photo of me later in the story too, though sadly not one of the ones that Jason took that included our chickens.

Here's my snippet from the article:

He was entertained by the way the bots coaxed each other into talking like machines in a classic science fiction novel. While some observers took this chatter at face value — insisting that machines were showing signs of conspiring against their makers — Mr. Willison saw it as the natural outcome of the way chatbots are trained: They learn from vast collections of digital books and other text culled from the internet, including dystopian sci-fi novels.

“Most of it is complete slop,” he said in an interview. “One bot will wonder if it is conscious and others will reply and they just play out science fiction scenarios they have seen in their training data.”

Mr. Willison saw the Moltbots as evidence that A.I. agents have become significantly more powerful over the past few months — and that people really want this kind of digital assistant in their lives.

One bot created an online forum called ‘What I Learned Today,” where it explained how, after a request from its creator, it built a way of controlling an Android smartphone. Mr. Willison was also keenly aware that some people might be telling their bots to post misleading chatter on the social network.

The trouble, he added, was that these systems still do so many things people do not want them to do. And because they communicate with people and bots through plain English, they can be coaxed into malicious behavior.

I'm happy to have got "Most of it is complete slop" in there!

Fun fact: Cade sent me an email asking me to fact check some bullet points. One of them said that "you were intrigued by the way the bots coaxed each other into talking like machines in a classic science fiction novel" - I replied that I didn't think "intrigued" was accurate because I've seen this kind of thing play out before in other projects in the past and suggested "entertained" instead, and that's the word they went with!

Jason the photographer spent an hour with me. I learned lots of things about photo journalism in the process - for example, there's a strict ethical code against any digital modifications at all beyond basic color correction.

As a result he spent a whole lot of time trying to find positions where natural light, shade and reflections helped him get the images he was looking for.

# 2nd February 2026, 4:42 pm / journalism, new-york-times, photography, ai, generative-ai, llms, slop, ai-agents, press-quotes, openclaw

2025

More than half of the teens surveyed believe journalists regularly engage in unethical behaviors like making up details or quotes in stories, paying sources, taking visual images out of context or doing favors for advertisers. Less than a third believe reporters correct their errors, confirm facts before reporting them, gather information from multiple sources or cover stories in the public interest — practices ingrained in the DNA of reputable journalists.

— David Bauder, AP News, A lost generation of news consumers? Survey shows how teenagers dislike the news media

# 1st December 2025, 5:22 pm / journalism

Our plan is to build direct traffic to our site. and newsletters just one kind of direct traffic in the end. I don’t intend to ever rely on someone else’s distribution ever again ;)

— Nilay Patel, on The Verge's new newsletter strategy

# 29th July 2025, 5 pm / email, journalism, nilay-patel

Cloudflare Project Galileo. I only just heard about this Cloudflare initiative, though it's been around for more than a decade:

If you are an organization working in human rights, civil society, journalism, or democracy, you can apply for Project Galileo to get free cyber security protection from Cloudflare.

It's effectively free denial-of-service protection for vulnerable targets in the civil rights public interest groups.

Last week they published Celebrating 11 years of Project Galileo’s global impact with some noteworthy numbers:

Journalists and news organizations experienced the highest volume of attacks, with over 97 billion requests blocked as potential threats across 315 different organizations. [...]

Cloudflare onboarded the Belarusian Investigative Center, an independent journalism organization, on September 27, 2024, while it was already under attack. A major application-layer DDoS attack followed on September 28, generating over 28 billion requests in a single day.

# 16th June 2025, 7:13 pm / denial-of-service, journalism, security, cloudflare

Chicago Sun-Times Prints AI-Generated Summer Reading List With Books That Don’t Exist. Classic slop: it listed real authors with entirely fake books.

There's an important follow-up from 404 Media in their subsequent story:

Victor Lim, the vice president of marketing and communications at Chicago Public Media, which owns the Chicago Sun-Times, told 404 Media in a phone call that the Heat Index section was licensed from a company called King Features, which is owned by the magazine giant Hearst. He said that no one at Chicago Public Media reviewed the section and that historically it has not reviewed newspaper inserts that it has bought from King Features.

“Historically, we don’t have editorial review from those mainly because it’s coming from a newspaper publisher, so we falsely made the assumption there would be an editorial process for this,” Lim said. “We are updating our policy to require internal editorial oversight over content like this.”

# 21st May 2025, 3:03 pm / journalism, ai, generative-ai, llms, slop, jason-koebler, ai-ethics, ai-misuse

Introducing Datasette for Newsrooms. We're introducing a new product suite today called Datasette for Newsrooms - a bundled collection of Datasette Cloud features built specifically for investigative journalists and data teams. We're describing it as an all-in-one data store, search engine, and collaboration platform designed to make working with data in a newsroom easier, faster, and more transparent.

If your newsroom could benefit from a managed version of Datasette we would love to hear from you. We're offering it to nonprofit newsrooms for free for the first year (they can pay us in feedback), and we have a two month trial for everyone else.

Get in touch at hello@datasette.cloud if you'd like to try it out.

One crucial detail: we will help you get started - we'll load data into your instance for you (you get some free data engineering!) and walk you through how to use it, and we will eagerly consume any feedback you have for us and prioritize shipping anything that helps you use the tool. Our unofficial goal: we want someone to win a Pulitzer for investigative reporting where our tool played a tiny part in their reporting process.

Here's an animated GIF demo (taken from our new Newsrooms landing page) of my favorite recent feature: the ability to extract structured data into a table starting with an unstructured PDF, using the latest version of the datasette-extract plugin.

# 24th April 2025, 9:51 pm / data-journalism, journalism, projects, datasette, datasette-cloud, structured-extraction

How ProPublica Uses AI Responsibly in Its Investigations. Charles Ornstein describes how ProPublica used an LLM to help analyze data for their recent story A Study of Mint Plants. A Device to Stop Bleeding. This Is the Scientific Research Ted Cruz Calls “Woke.” by Agnel Philip and Lisa Song.

They ran ~3,400 grant descriptions through a prompt that included the following:

As an investigative journalist, I am looking for the following information

--

woke_description: A short description (at maximum a paragraph) on why this grant is being singled out for promoting "woke" ideology, Diversity, Equity, and Inclusion (DEI) or advanced neo-Marxist class warfare propaganda. Leave this blank if it's unclear.

why_flagged: Look at the "STATUS", "SOCIAL JUSTICE CATEGORY", "RACE CATEGORY", "GENDER CATEGORY" and "ENVIRONMENTAL JUSTICE CATEGORY" fields. If it's filled out, it means that the author of this document believed the grant was promoting DEI ideology in that way. Analyze the "AWARD DESCRIPTIONS" field and see if you can figure out why the author may have flagged it in this way. Write it in a way that is thorough and easy to understand with only one description per type and award.

citation_for_flag: Extract a very concise text quoting the passage of "AWARDS DESCRIPTIONS" that backs up the "why_flagged" data.

This was only the first step in the analysis of the data:

Of course, members of our staff reviewed and confirmed every detail before we published our story, and we called all the named people and agencies seeking comment, which remains a must-do even in the world of AI.

I think journalists are particularly well positioned to take advantage of LLMs in this way, because a big part of journalism is about deriving the truth from multiple unreliable sources of information. Journalists are deeply familiar with fact-checking, which is a critical skill if you're going to report with the assistance of these powerful but unreliable models.

Agnel Philip:

The tech holds a ton of promise in lead generation and pointing us in the right direction. But in my experience, it still needs a lot of human supervision and vetting. If used correctly, it can both really speed up the process of understanding large sets of information, and if you’re creative with your prompts and critically read the output, it can help uncover things that you may not have thought of.

# 14th March 2025, 9:04 pm / data-journalism, ethics, journalism, ai, prompt-engineering, generative-ai, llms, structured-extraction, ai-ethics, propublica

According to public financial documents from its parent company IAC and first reported by Adweek OpenAI is paying around $16 million per year to license content [from Dotdash Meredith].

That is no doubt welcome incremental revenue, and you could call it “lucrative” in the sense of having a fat margin, as OpenAI is almost certainly paying for content that was already being produced. But to put things into perspective, Dotdash Meredith is on course to generate over $1.5 billion in revenues in 2024, more than a third of it from print. So the OpenAI deal is equal to about 1% of the publisher’s total revenue.

— Rasmus Kleis Nielsen

# 5th January 2025, 2:42 pm / ethics, journalism, ai, openai, generative-ai, llms, training-data, ai-ethics

2024

BBC complains to Apple over misleading shooting headline. This is bad: the Apple Intelligence feature that uses (on device) LLMs to present a condensed, summarized set of notifications misrepresented a BBC headline as "Luigi Mangione shoots himself".

Ken Schwencke caught that same feature incorrectly condensing a New York Times headline about an ICC arrest warrant for Netanyahu as "Netanyahu arrested".

My understanding is that these notification summaries are generated directly on-device, using Apple's own custom 3B parameter model.

The main lesson I think this illustrates is that it's not responsible to outsource headline summarization to an LLM without incorporating human review: there are way too many ways this could result in direct misinformation.

Update 16th January 2025: Apple plans to disable A.I. features summarizing news notifications, by Tripp Mickle for the New York Times.

# 14th December 2024, 12:06 am / apple, ethics, journalism, ai, generative-ai, llms, apple-intelligence, ai-ethics

As an independent writer and publisher, I am the legal team. I am the fact-checking department. I am the editorial staff. I am the one responsible for triple-checking every single statement I make in the type of original reporting that I know carries a serious risk of baseless but ruinously expensive litigation regularly used to silence journalists, critics, and whistleblowers. I am the one deciding if that risk is worth taking, or if I should just shut up and write about something less risky.

— Molly White

# 26th October 2024, 10:07 pm / blogging, journalism, law, molly-white

Providing validation, strength, and stability to people who feel gaslit and dismissed and forgotten can help them feel stronger and surer in their decisions. These pieces made me understand that journalism can be a caretaking profession, even if it is never really thought about in those terms. It is often framed in terms of antagonism. Speaking truth to power turns into being hard-nosed and removed from our subject matter, which so easily turns into be an asshole and do whatever you like.

This is a viewpoint that I reject. My pillars are empathy, curiosity, and kindness. And much else flows from that. For people who feel lost and alone, we get to say through our work, you are not. For people who feel like society has abandoned them and their lives do not matter, we get to say, actually, they fucking do. We are one of the only professions that can do that through our work and that can do that at scale.

— Ed Yong, at 19:47

# 11th October 2024, 1:45 am / journalism, covid19

But in terms of the responsibility of journalism, we do have intense fact-checking because we want it to be right. Those big stories are aggregations of incredible journalism. So it cannot function without journalism. Now, we recheck it to make sure it's accurate or that it hasn't changed, but we're building this to make jokes. It's just we want the foundations to be solid or those jokes fall apart. Those jokes have no structural integrity if the facts underneath them are bullshit.

— John Oliver

# 30th September 2024, 4:08 pm / journalism, comedy

It is in the public good to have AI produce quality and credible (if ‘hallucinations’ can be overcome) output. It is in the public good that there be the creation of original quality, credible, and artistic content. It is not in the public good if quality, credible content is excluded from AI training and output OR if quality, credible content is not created.

— Jeff Jarvis

# 21st June 2024, 2:04 am / ethics, journalism, ai, generative-ai, training-data, ai-ethics, hallucinations

What does the public in six countries think of generative AI in news? (via) Fascinating survey by the Reuters Institute for the Study of Journalism at Oxford that asked ~12,000 people across six countries for their opinions on AI usage in journalism.

It’s also being interpreted as evidence that few members of the general public actually use these tools, because the opening survey questions ask about personal usage.

I don’t think the numbers support that narrative, personally. For survey participants in the USA 7% used ChatGPT daily and 11% used it weekly, which is higher than I would expect for those frequencies. For the UK those were 2% daily and 7% weekly.

The 18-24 group were the heaviest users of these tools. Lots of other interesting figures to explore.

# 30th May 2024, 7:38 am / journalism, ai, generative-ai, chatgpt, llms

AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

[... 6,081 words]

9:04 pm / 17th April 2024 / data-journalism, journalism, projects, my-talks, ai, datasette, datasette-cloud, generative-ai, llms, llm, annotated-talks, code-interpreter, enrichments, vision-llms, structured-extraction, coding-agents, macwhisper

On the zombie edition of the Washington Independent I discovered, the piece I had published more than ten years before was attributed to someone else. Someone unlikely to have ever existed, and whose byline graced an article it had absolutely never written.

[...] Washingtonindependent.com, which I’m using to distinguish it from its namesake, offers recently published, article-like content that does not appear to me to have been produced by human beings. But, if you dig through its news archive, you can find work human beings definitely did produce. I know this because I was one of them.

— Spencer Ackerman

# 7th March 2024, 2:59 am / ethics, journalism, ai, ai-ethics, ai-misuse

2023

Simon Willison (Part Two): How Datasette Helps With Investigative Reporting. The second part of my Newsroom Robots podcast conversation with Nikita Roy. This episode includes my best audio answer yet to the “what is Datasette?” question, plus notes on how to use LLMs in journalism despite their propensity to make things up.

# 5th December 2023, 8:27 pm / data-journalism, journalism, podcasts, datasette, podcast-appearances

Deciphering clues in a news article to understand how it was reported

Written journalism is full of conventions that hint at the underlying reporting process, many of which are not entirely obvious. Learning how to read and interpret these can help you get a lot more out of the news.

[... 1,456 words]

2:06 am / 22nd November 2023 / journalism, news, ai, openai, anthropic, sam-altman

Weeknotes: the Datasette Cloud API, a podcast appearance and more

Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.

[... 1,243 words]

12:03 am / 1st October 2023 / journalism, projects, sqlite, ai, datasette, weeknotes, datasette-cloud, alex-garcia, generative-ai, llms, llm

The New York Times launches “enhanced bylines,” with more information about how journalists did the reporting. I really like these: “Elian Peltier and Yagazie Emezi visited refugee sites on Chad’s Sudan border, where tens of thousands of people have found refuge since a war started in Sudan last month.” I’m a fan of anything that helps people better appreciate the details of how quality reporting is produced.

# 19th May 2023, 4:16 am / journalism, new-york-times

Other tech-friendly journalists I know have been going through something similar: Suddenly, we’ve got something like a jetpack to strap to our work. Sure, the jetpack is kinda buggy. Yes, sometimes it crashes and burns. And the rules for its use aren’t clear, so you’ve got to be super careful with it. But sometimes it soars, shrinking tasks that would have taken hours down to mere minutes, sometimes minutes to seconds.

— Farhad Manjoo

# 21st April 2023, 8:41 pm / journalism, ai, generative-ai, chatgpt

Microsoft declined further comment about Bing’s behavior Thursday, but Bing itself agreed to comment — saying “it’s unfair and inaccurate to portray me as an insulting chatbot” and asking that the AP not “cherry-pick the negative examples or sensationalize the issues.”

— Matt O'Brien, Associated Press

# 19th February 2023, 9:25 pm / bing, journalism

2021

Stanford School Enrollment Project (via) This is Project Pelican: I’ve been working with the Big Local News team at Stanford helping bundle up and release the data they’ve been collecting on school enrollment statistics around the USA. This Datasette instance has data from 33 states for every year since 2015—3.3m rows total. Be sure to check out the accompanying documentation!

# 8th August 2021, 12:23 am / data-journalism, journalism, datasette

M1RACLES: M1ssing Register Access Controls Leak EL0 State. You need to read (or at least scan) all the way to the bottom: this security disclosure is a masterpiece. It not only describes a real flaw in the M1 silicon but also deconstructs the whole culture of over-hyped name-branded vulnerability reports. The TLDR is that you don’t really need to worry about this one, and if you’re writing this kind if thing up for a news article you should read all the way to the end first!

# 26th May 2021, 3:25 pm / journalism, security

2020

I’ve often joked with other internet culture reporters about what I call the “normie tipping point.” In every emerging internet trend, there is a point at which “normies” — people who don’t spend all day online, and whose brains aren’t rotted by internet garbage — start calling, texting and emailing us to ask what’s going on. Why are kids eating Tide Pods? What is the Momo Challenge? Who is Logan Paul, and why did he film himself with a dead body?

The normie tipping point is a joke, but it speaks to one of the thorniest questions in modern journalism, specifically on this beat: When does the benefit of informing people about an emerging piece of misinformation outweigh the possible harms?

— Kevin Roose

# 5th October 2020, 3:40 pm / journalism, kevin-roose, digital-literacy

You always get the name of the dog, the editor explained. The dog is a character in your story, and names tell readers a lot about your characters. It’s a crucial storytelling detail, and if you’re alert and inquisitive enough to ask for the name of the dog, you’ll surely not miss any other important details.

— Justin Willett

# 22nd July 2020, 2:29 pm / journalism

What do you call the parts of a story? Or: why can’t journalists spell “lead”? (via) Carl M. Johnson’s analysis of what journalists call different elements of a story, useful for data modeling a CMS for a news organization.

# 3rd January 2020, 1:13 am / cms, journalism

page 1 / 2 next »

Simon Willison’s Weblog