Simon Willison’s Weblog


34 items tagged “journalism”


It is in the public good to have AI produce quality and credible (if ‘hallucinations’ can be overcome) output. It is in the public good that there be the creation of original quality, credible, and artistic content. It is not in the public good if quality, credible content is excluded from AI training and output OR if quality, credible content is not created.

Jeff Jarvis

# 21st June 2024, 2:04 am / ethics, journalism, ai, generative-ai

What does the public in six countries think of generative AI in news? (via) Fascinating survey by the Reuters Institute for the Study of Journalism at Oxford that asked ~12,000 people across six countries for their opinions on AI usage in journalism.

It’s also being interpreted as evidence that few members of the general public actually use these tools, because the opening survey questions ask about personal usage.

I don’t think the numbers support that narrative, personally. For survey participants in the USA 7% used ChatGPT daily and 11% used it weekly, which is higher than I would expect for those frequencies. For the UK those were 2% daily and 7% weekly.

The 18-24 group were the heaviest users of these tools. Lots of other interesting figures to explore.

# 30th May 2024, 7:38 am / journalism, ai, generative-ai, chatgpt, llms

AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

[... 6081 words]

On the zombie edition of the Washington Independent I discovered, the piece I had published more than ten years before was attributed to someone else. Someone unlikely to have ever existed, and whose byline graced an article it had absolutely never written.

[...], which I’m using to distinguish it from its namesake, offers recently published, article-like content that does not appear to me to have been produced by human beings. But, if you dig through its news archive, you can find work human beings definitely did produce. I know this because I was one of them.

Spencer Ackerman

# 7th March 2024, 2:59 am / ethics, journalism, ai


Simon Willison (Part Two): How Datasette Helps With Investigative Reporting. The second part of my Newsroom Robots podcast conversation with Nikita Roy. This episode includes my best audio answer yet to the “what is Datasette?” question, plus notes on how to use LLMs in journalism despite their propensity to make things up.

# 5th December 2023, 8:27 pm / journalism, podcasts, datasette

Deciphering clues in a news article to understand how it was reported

Written journalism is full of conventions that hint at the underlying reporting process, many of which are not entirely obvious. Learning how to read and interpret these can help you get a lot more out of the news.

[... 1456 words]

Weeknotes: the Datasette Cloud API, a podcast appearance and more

Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.

[... 1243 words]

The New York Times launches “enhanced bylines,” with more information about how journalists did the reporting. I really like these: “Elian Peltier and Yagazie Emezi visited refugee sites on Chad’s Sudan border, where tens of thousands of people have found refuge since a war started in Sudan last month.” I’m a fan of anything that helps people better appreciate the details of how quality reporting is produced.

# 19th May 2023, 4:16 am / journalism, new-york-times

Other tech-friendly journalists I know have been going through something similar: Suddenly, we’ve got something like a jetpack to strap to our work. Sure, the jetpack is kinda buggy. Yes, sometimes it crashes and burns. And the rules for its use aren’t clear, so you’ve got to be super careful with it. But sometimes it soars, shrinking tasks that would have taken hours down to mere minutes, sometimes minutes to seconds.

Farhad Manjoo

# 21st April 2023, 8:41 pm / journalism, ai, generative-ai, chatgpt

Microsoft declined further comment about Bing’s behavior Thursday, but Bing itself agreed to comment — saying “it’s unfair and inaccurate to portray me as an insulting chatbot” and asking that the AP not “cherry-pick the negative examples or sensationalize the issues.”

Matt O'Brien, Associated Press

# 19th February 2023, 9:25 pm / bing, journalism


Stanford School Enrollment Project (via) This is Project Pelican: I’ve been working with the Big Local News team at Stanford helping bundle up and release the data they’ve been collecting on school enrollment statistics around the USA. This Datasette instance has data from 33 states for every year since 2015—3.3m rows total. Be sure to check out the accompanying documentation!

# 8th August 2021, 12:23 am / journalism, datasette

M1RACLES: M1ssing Register Access Controls Leak EL0 State. You need to read (or at least scan) all the way to the bottom: this security disclosure is a masterpiece. It not only describes a real flaw in the M1 silicon but also deconstructs the whole culture of over-hyped name-branded vulnerability reports. The TLDR is that you don’t really need to worry about this one, and if you’re writing this kind of thing up for a news article you should read all the way to the end first!

# 26th May 2021, 3:25 pm / journalism, security


I’ve often joked with other internet culture reporters about what I call the “normie tipping point.” In every emerging internet trend, there is a point at which “normies” — people who don’t spend all day online, and whose brains aren’t rotted by internet garbage — start calling, texting and emailing us to ask what’s going on. Why are kids eating Tide Pods? What is the Momo Challenge? Who is Logan Paul, and why did he film himself with a dead body?

The normie tipping point is a joke, but it speaks to one of the thorniest questions in modern journalism, specifically on this beat: When does the benefit of informing people about an emerging piece of misinformation outweigh the possible harms?

Kevin Roose

# 5th October 2020, 3:40 pm / journalism

You always get the name of the dog, the editor explained. The dog is a character in your story, and names tell readers a lot about your characters. It’s a crucial storytelling detail, and if you’re alert and inquisitive enough to ask for the name of the dog, you’ll surely not miss any other important details.

Justin Willett

# 22nd July 2020, 2:29 pm / journalism

What do you call the parts of a story? Or: why can’t journalists spell “lead”? (via) Carl M. Johnson’s analysis of what journalists call different elements of a story, useful for data modeling a CMS for a news organization.

# 3rd January 2020, 1:13 am / cms, journalism


Guide To Using Reverse Image Search For Investigations (via) Detailed guide from Bellingcat’s Aric Toler on using reverse image search for investigative reporting. Surprisingly Google Image Search isn’t the state of the art: Russian search engine Yandex offers a much more powerful solution, mainly because it’s the largest public-facing image search engine to integrate scary levels of face recognition.

# 30th December 2019, 10:23 pm / journalism, search, bellingcat

My JSK Fellowship: Building an open source ecosystem of tools for data journalism

I started a new chapter of my career last week: I began a year-long fellowship with the John S. Knight Journalism Fellowships program at Stanford.

[... 876 words]

JSK Journalism Fellowships names Class of 2019-2020 (and I’m in it!) (via) In personal news... I’ve been accepted for a ten month journalism fellowship at Stanford (starting September)! My work there will involve “Improving the impact of investigative stories by expanding the open-source ecosystem of tools that allows journalists to share the underlying data”.

# 1st May 2019, 4:43 pm / journalism, personal, stanford, datasette, jsk


If I tweeted a throwaway comment in appreciation for McDonald’s apple pies and some other randos on Twitter happened to also tweet similar thoughts over the last few months, it doesn’t mean by extrapolation that ‘Millennials Can’t Get Enough Of McDonald’s Apple Pies’. The Twitter search box is not a polling agency and Twitter doesn’t include everybody’s thoughts on everything. Just some people’s thoughts on some things.

Nick Walker

# 28th January 2018, 4:18 pm / journalism, twitter


The Story Behind the Chicago Newspaper That Bought a Bar (via) Absolutely fascinating story—the Chicago Sun-Times bought a bar back in 1976 to investigate corrupt city inspectors, staffing it with journalists and with photographers hidden in a back room.

# 3rd November 2017, 3:27 pm / journalism


Journalism Warning Labels. These are absolutely fantastic. “I’ve been putting them on copies of the free papers that I find on the London Underground. You might want to as well.”

# 14th August 2010, 11:16 am / journalism, tom-scott, recovered

If journalism is the first draft of history, live blogging is the first draft of journalism.

Andrew Sparrow

# 10th May 2010, 4:28 pm / blogging, journalism, recovered, andrew-sparrow, liveblogging

Live blogging the general election. The Guardian’s ongoing live blogs covering the UK election have been the best way of following events that I’ve seen (yes, better than Twitter). Live-blog author Andrew Sparrow explains his approach.

# 10th May 2010, 4:27 pm / blogging, guardian, journalism, recovered, election, andrew-sparrow, liveblogging


Most journalists have grown up with a fortress mindset. They have lived and worked in proud institutions with thick walls. Their daily knightly task has been simple: to battle journalists from other fortresses. But the fortresses are crumbling and courtly jousts with fellow journalists are no longer impressing the crowds.

Peter Horrocks

# 20th July 2009, 5:20 pm / bbc, journalism, newspapers, peter-horrocks

#DataJourn part 1: a new conversation. Report on the first instance of a Guardian story that was driven by an external developer’s work with data originally released on our Datablog.

# 9th April 2009, 10:57 am / datablog, datastore, guardian, journalism, openplatform

A few notes on the Guardian Open Platform

This morning we launched the Guardian Open Platform at a well-attended event in our new offices in Kings Place. This is one of the main projects I’ve been helping out with since joining the Guardian last year, and it’s fantastic to finally have it out in the open.

[... 839 words]

Learning to Think Like A Programmer. Outstanding advice aimed mainly at journalists, but important to anyone who collects information for a living and might want it to be automatically processed at some point in the future.

# 22nd January 2009, 6:06 pm / data-journalism, journalism, programming, tom-armitage


Google apps for your newsroom. How the LJ World team use online tools like Google Spreadsheet, Swivel, ManyEyes and Google MyMaps to collaborate with the newsroom and build data-heavy applications even faster.

# 7th January 2008, 9:24 pm / collaboration, django, google, google-calendar, google-docs, google-maps, journalism, ljworld, manyeyes, matt-croydon, mymaps, news, newsroom

2007 Fantastic new site that indexes UK news stories by the person who wrote them. Being able to track a journalist’s output like this makes it much easier to figure out their personal biases over time.

# 11th October 2007, 4:04 pm / journalism, journalist, news