Simon Willison on new-york-times

22 posts tagged “new-york-times”

2025

OpenAI slams court order to save all ChatGPT logs, including deleted chats (via) This is very worrying. The New York Times v OpenAI lawsuit, now in its 17th month, includes accusations that OpenAI's models can output verbatim copies of New York Times content - both from training data and from implementations of RAG.

(This may help explain why Anthropic's Claude system prompts for their search tool emphatically demand Claude not spit out more than a short sentence of RAG-fetched search content.)

A few weeks ago the judge ordered OpenAI to start preserving the logs of all potentially relevant output - including supposedly temporary private chats and API outputs served to paying customers, which previously had a 30 day retention policy.

The May 13th court order itself is only two pages - here's the key paragraph:

Accordingly, OpenAI is NOW DIRECTED to preserve and segregate all output log data that would otherwise be deleted on a going forward basis until further order of the Court (in essence, the output log data that OpenAI has been destroying), whether such data might be deleted at a user’s request or because of “numerous privacy laws and regulations” that might require OpenAI to do so.

SO ORDERED.

That "numerous privacy laws and regulations" line refers to OpenAI's argument that this order runs counter to a whole host of existing worldwide privacy legislation. The judge here is stating that the potential need for future discovery in this case outweighs OpenAI's need to comply with those laws.

Unsurprisingly, I have seen plenty of bad faith arguments online about this along the lines of "Yeah, but that's what OpenAI really wanted to happen" - the fact that OpenAI are fighting this order runs counter to the common belief that they aggressively train models on all incoming user data no matter what promises they have made to those users.

I still see this as a massive competitive disadvantage for OpenAI, particularly when it comes to API usage. Paying customers of their APIs may well make the decision to switch to other providers who can offer retention policies that aren't subverted by this court order!

Update: Here's the official response from OpenAI: How we’re responding to The New York Time’s data demands in order to protect user privacy, including this from a short FAQ:

Is my data impacted?

Yes, if you have a ChatGPT Free, Plus, Pro, and Teams subscription or if you use the OpenAI API (without a Zero Data Retention agreement).

This does not impact ChatGPT Enterprise or ChatGPT Edu customers.

This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.

To further clarify that point about ZDR:

You are not impacted. If you are a business customer that uses our Zero Data Retention (ZDR) API, we never retain the prompts you send or the answers we return. Because it is not stored, this court order doesn’t affect that data.

Here's a notable tweet about this situation from Sam Altman:

we have been thinking recently about the need for something like "AI privilege"; this really accelerates the need to have the conversation.

imo talking to an AI should be like talking to a lawyer or a doctor.

# 5th June 2025, 2:20 pm / law, new-york-times, privacy, ai, openai, generative-ai, llms, sam-altman, ai-ethics

2024

NYTimes reporters getting verified profiles on Bluesky. NYT data journalist Dylan Freedman has kicked off an initiative to get NYT accounts and reporters on Bluesky verified via vanity nytimes.com handles - Dylan is now @dylanfreedman.nytimes.com.

They're using Bluesky's support for TXT domain records. If you use Google's Dig tool to look at the TXT record for _atproto.dylanfreedman.nytimes.com you'll see this:

_atproto.dylanfreedman.nytimes.com. 500 IN TXT "did=did:plc:zeqq4z7aybrqg6go6vx6lzwt"

# 2nd December 2024, 9:24 pm / new-york-times, social-media, bluesky

You already know Donald Trump. He is unfit to lead. Watch him. Listen to those who know him best. He tried to subvert an election and remains a threat to democracy. He helped overturn Roe, with terrible consequences. Mr. Trump's corruption and lawlessness go beyond elections: It's his whole ethos. He lies without limit. If he's re-elected, the G.O.P. won't restrain him. Mr. Trump will use the government to go after opponents. He will pursue a cruel policy of mass deportations. He will wreak havoc on the poor, the middle class and employers. Another Trump term will damage the climate, shatter alliances and strengthen autocrats. Americans should demand better. Vote.

— NY Times Editorial Board

# 5th November 2024, 1:33 am / new-york-times, politics

OpenAI’s monthly revenue hit $300 million in August, up 1,700 percent since the beginning of 2023, and the company expects about $3.7 billion in annual sales this year, according to financial documents reviewed by The New York Times. [...]

The company expects ChatGPT to bring in $2.7 billion in revenue this year, up from $700 million in 2023, with $1 billion coming from other businesses using its technology.

— Mike Isaac and Erin Griffith, New York Times, Sep 27th 2024

# 23rd October 2024, 1:20 am / new-york-times, ai, openai, generative-ai, llms

OpenAI’s revenue in August more than tripled from a year ago, according to the documents, and about 350 million people — up from around 100 million in March — used its services each month as of June. […]

Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.

— Mike Isaac and Erin Griffith

# 28th September 2024, 11:41 pm / new-york-times, ai, openai, chatgpt

Why The Atlantic signed a deal with OpenAI. Interesting conversation between Nilay Patel and The Atlantic CEO (and former journalist/editor) Nicholas Thompson about the relationship between media organizations and LLM companies like OpenAI.

On the impact of these deals on the ongoing New York Times lawsuit:

One of the ways that we [The Atlantic] can help the industry is by making deals and setting a market. I believe that us doing a deal with OpenAI makes it easier for us to make deals with the other large language model companies if those come about, I think it makes it easier for other journalistic companies to make deals with OpenAI and others, and I think it makes it more likely that The Times wins their lawsuit.

How could it help? Because deals like this establish a market value for training content, important for the fair use component of the legal argument.

# 12th July 2024, 2:35 pm / new-york-times, ai, openai, llms, nilay-patel, training-data

First Came ‘Spam.’ Now, With A.I., We’ve Got ‘Slop’. First the Guardian, now the NYT. I've apparently made a habit of getting quoted by journalists talking about slop!

I got the closing quote in this one:

Society needs concise ways to talk about modern A.I. — both the positives and the negatives. ‘Ignore that email, it’s spam,’ and ‘Ignore that article, it’s slop,’ are both useful lessons.

# 11th June 2024, 4:12 pm / ethics, new-york-times, ai, generative-ai, slop, ai-ethics

NYT Flash-based visualizations work again. The New York Times are using the open source Ruffle Flash emulator - built using Rust, compiled to WebAssembly - to get their old archived data visualization interactives working again.

# 21st January 2024, 5:58 am / archives, flash, new-york-times, rust, webassembly

OpenAI and journalism. Bit of a misleading title here: this is OpenAI’s first public response to the lawsuit filed by the New York Times concerning their use of unlicensed NYT content to train their models.

# 8th January 2024, 6:33 pm / copyright, new-york-times, ai, openai, generative-ai, llms

Does GPT-2 Know Your Phone Number? (via) This report from Berkeley Artificial Intelligence Research in December 2020 showed GPT-3 outputting a full page of chapter 3 of Harry Potter and the Philosopher’s Stone—similar to how the recent suit from the New York Times against OpenAI and Microsoft demonstrates memorized news articles from that publication as outputs from GPT-4.

# 8th January 2024, 5:26 am / microsoft, new-york-times, ai, gpt-3, openai, generative-ai, llms, gpt-2

2023

The New York Times launches “enhanced bylines,” with more information about how journalists did the reporting. I really like these: “Elian Peltier and Yagazie Emezi visited refugee sites on Chad’s Sudan border, where tens of thousands of people have found refuge since a war started in Sudan last month.” I’m a fan of anything that helps people better appreciate the details of how quality reporting is produced.

# 19th May 2023, 4:16 am / journalism, new-york-times

2020

nyt-2020-election-scraper. Brilliant application of git scraping by Alex Gaynor and a growing team of contributors. Takes a JSON snapshot of the NYT’s latest election poll figures every five minutes, then runs a Python script to iterate through the history and build an HTML page showing the trends, including what percentage of the remaining votes each candidate needs to win each state. This is the perfect case study in why it can be useful to take a “snapshot if the world right now” data source and turn it into a git revision history over time.

# 6th November 2020, 2:24 pm / alex-gaynor, data-journalism, elections, git, new-york-times, git-scraping

2010

Breakfast Instapaper. Handy tool for selecting and bulk-submitting stories from today’s Guardian and NYTimes to your Instapaper account, by Daniel Vydra.

# 29th April 2010, 11:49 am / daniel-vydra, guardian, instapaper, new-york-times

The making of the NYT’s Netflix graphic. A database dump from Netflix, some clever hackery in ArcView GIS, hpricot to scrape Metacritic and a lot of careful thought about the UI for navigating the data.

# 25th January 2010, 1:11 pm / arcview, design, gis, hpricot, infographics, metacritic, netflix, new-york-times, ui, usability, visualisation

2009

How Different Groups Spend Their Day. Classy interactive infographic from the New York Times.

# 10th August 2009, 3:37 pm / infographics, interactives, new-york-times, visualisation

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves.

# 5th February 2009, 11:06 pm / apis, newspapers, new-york-times, search

2008

Represent. Andrei Scheinkman and Derek Willis describe how they built the NYTimes Represent feature using GeoDjango and PostGIS.

# 29th December 2008, 10:10 pm / andrei-scheinkman, derek-willis, django, geodjango, gis, new-york-times, postgis, postgresql, python

Represent and GeoDjango. The NYTimes new Represent application is built on GeoDjango.

# 20th December 2008, 9:07 pm / derek-willis, django, geodjango, new-york-times, python, represent

Represent—NYTimes.com. Superb new application from the NYTimes—a sort of cross between TheyWorkForYou and a news archive search. Enter your address in New York and it tells you your local representatives and shows both their votes and their mentions in the newspaper.

# 19th December 2008, 4:22 pm / new-york, new-york-times, represent

Announcing the New York Times Campaign Finance API (via) The New York Times have released their first data API, exposing campaign finance data from the Federal Election Commission.

# 15th October 2008, 2:05 pm / api, campaignfinance, new-york-times

Popular Websites Vulnerable to Cross-Site Request Forgery Attacks. Ed Felten and Bill Zeller announce four CSRF holes, in ING Direct, YouTube, MetaFilter and the New York Times. The ING Direct hole allowed transfer of funds out of a user’s bank accounts! The first three were fixed before publication; the New York Times hole still exists (despite being reported a year ago), and allows you to silently steal e-mail addresses by CSRFing the “E-mail this” feature.

# 29th September 2008, 1:08 pm / bill-zeller, csrf, ed-felten, ingdirect, metafilter, new-york-times, security, youtube

2007

Times to Stop Charging for Parts of Its Web Site. The New York Times finally acknowledges that you can’t be the “paper of record” if no one can link to you.

# 18th September 2007, 8:40 am / journalism, news, new-york-times

Simon Willison’s Weblog