Simon Willison’s Weblog

Quotations

Filters: Type: quotation, sorted by date


I’ve worked out why I don’t get much value out of LLMs. The hardest and most time-consuming parts of my job involve distinguishing between ideas that are correct, and ideas that are plausible-sounding but wrong. Current AI is great at the latter type of ideas, and I don’t need more of those.

Martin Kleppmann # 27th April 2024, 7:31 pm

It’s very fast to build something that’s 90% of a solution. The problem is that the last 10% of building something is usually the hard part which really matters, and with a black box at the center of the product, it feels much more difficult to me to nail that remaining 10%. With vibecheck, most of the time the results to my queries are great; some percentage of the time they aren’t. Closing that gap with gen AI feels much more fickle to me than a normal engineering problem. It could be that I’m unfamiliar with it, but I also wonder if some classes of generative AI based products are just doomed to mediocrity as a result.

Moxie Marlinspike # 26th April 2024, 9:40 pm

If you’re auditioning for your job every day, and you’re auditioning against every other brilliant employee there, and you know that at the end of the year, 6% of you are going to get cut no matter what, and at the same time, you have access to unrivaled data on partners, sellers, and competitors, you might be tempted to look at that data to get an edge and keep your job and get to your restricted stock units.

Dana Mattioli # 26th April 2024, 5:43 pm

The only difference between screwing around and science is writing it down

Alex Jason, via Adam Savage # 25th April 2024, 2:17 pm

I’ve been at OpenAI for almost a year now. In that time, I’ve trained a lot of generative models. [...] It’s becoming awfully clear to me that these models are truly approximating their datasets to an incredible degree. [...] What this manifests as is – trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point. [...] This is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else. Everything else is a means to an end in efficiently delivering compute to approximating that dataset.

James Betker # 25th April 2024, 5:13 am

When I said “Send a text message to Julian Chokkattu,” who’s a friend and fellow AI Pin reviewer over at Wired, I thought I’d be asked what I wanted to tell him. Instead, the device simply said OK and told me it sent the words “Hey Julian, just checking in. How’s your day going?” to Chokkattu. I’ve never said anything like that to him in our years of friendship, but I guess technically the AI Pin did do what I asked.

Cherlynn Low # 24th April 2024, 3:07 pm

A bad survey won’t tell you it’s bad. It’s actually really hard to find out that a bad survey is bad — or to tell whether you have written a good or bad set of questions. Bad code will have bugs. A bad interface design will fail a usability test. It’s possible to tell whether you are having a bad user interview right away. Feedback from a bad survey can only come in the form of a second source of information contradicting your analysis of the survey results.

Most seductively, surveys yield responses that are easy to count and counting things feels so certain and objective and truthful.

Even if you are counting lies.

Erika Hall # 24th April 2024, 12:31 am

We [Bluesky] took a somewhat novel approach of giving every user their own SQLite database. By removing the Postgres dependency, we made it possible to run a ‘PDS in a box’ [Personal Data Server] without having to worry about managing a database. We didn’t have to worry about things like replicas or failover. For those thinking this is irresponsible: don’t worry, we are backing up all the data on our PDSs!

SQLite worked really well because the PDS – in its ideal form – is a single-tenant system. We owned up to that by having these single-tenant SQLite databases.

Daniel Holmgren # 23rd April 2024, 7 pm
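
The pattern Holmgren describes is easy to sketch. Here is a minimal illustration of the one-database-per-user idea, not Bluesky’s actual PDS code; the storage directory, schema, and account identifier are all hypothetical:

```python
import sqlite3
from pathlib import Path

DATA_DIR = Path("pds-data")  # hypothetical storage directory

def open_user_db(user_id: str) -> sqlite3.Connection:
    """Open (creating on first use) the single-tenant database for one user."""
    DATA_DIR.mkdir(exist_ok=True)
    conn = sqlite3.connect(DATA_DIR / f"{user_id}.sqlite3")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (uri TEXT PRIMARY KEY, value TEXT)"
    )
    return conn

# Each account's data lives in its own file: no shared Postgres, no replicas,
# and backup or migration reduces to copying one file per user.
db = open_user_db("alice.example.com")
db.execute("INSERT OR REPLACE INTO records VALUES (?, ?)", ("at://post/1", "hello"))
db.commit()
```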

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Phi-3 Technical Report # 23rd April 2024, 3 am

The blog post announcing the shutdown was done one day early. The idea was to take the opportunity of the new Pope being announced and Andy Rubin being replaced as head of Android, so that the [Google] Reader news may be drowned out. PR didn’t apparently realize that the kinds of people that care about the other two events (especially the Pope) are not the same kind of people that care about Reader, so it didn’t work.

Mihai Parparita # 20th April 2024, 9:55 pm

I have a child who is also 2e and has been part of the NYC G&T program. We’ve had a positive experience with the citywide program, specifically with the program at The Anderson School.

Meta AI bot, answering a question on a forum # 18th April 2024, 3:34 am

In mid-March, we added this line to our system prompt to prevent Claude from thinking it can open URLs:

“It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation.”

Alex Albert (Anthropic) # 18th April 2024, 12:22 am
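
For context, an instruction like this is supplied through the system prompt parameter of Anthropic’s Messages API rather than as part of the user’s message. A minimal sketch, assuming the anthropic Python SDK with an API key in the environment; the model name and user message are illustrative:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=256,
    # The system prompt steers behavior across the whole conversation:
    system=(
        "It cannot open URLs, links, or videos, so if it seems as though the "
        "interlocutor is expecting Claude to do so, it clarifies the situation "
        "and asks the human to paste the relevant text or image content "
        "directly into the conversation."
    ),
    messages=[{"role": "user", "content": "Summarize https://example.com/post"}],
)
print(response.content[0].text)
```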

But the reality is that you can’t build a hundred-billion-dollar industry around a technology that’s kind of useful, mostly in mundane ways, and that boasts perhaps small increases in productivity if and only if the people who use it fully understand its limitations.

Molly White # 17th April 2024, 7:53 pm

The saddest part about it, though, is that the garbage books don’t actually make that much money either. It’s even possible to lose money generating your low-quality ebook to sell on Kindle for $0.99. The way people make money these days is by teaching students the process of making a garbage ebook. It’s grift and garbage all the way down — and the people who ultimately lose out are the readers and writers who love books.

Constance Grady # 16th April 2024, 11:31 pm

Permissions have three moving parts: who wants to do it, what they want to do, and what object they want to do it to. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon?

wkirby on Hacker News # 16th April 2024, 7:49 pm
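
One common way to make all three permutations efficient is to store permissions as (who, action, object) tuples and index each lookup path. A toy sketch in SQLite, with hypothetical table and column names:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE perms (who TEXT, action TEXT, obj TEXT)")
# One index per query shape, so every permutation is an indexed lookup:
db.execute("CREATE INDEX idx_who_obj ON perms (who, obj)")
db.execute("CREATE INDEX idx_obj_action ON perms (obj, action)")
db.execute("CREATE INDEX idx_who_action ON perms (who, action)")

db.executemany("INSERT INTO perms VALUES (?, ?, ?)", [
    ("alice", "read", "doc1"), ("alice", "write", "doc1"), ("bob", "read", "doc1"),
])

# Given this person and this object, what can they do?
print(db.execute("SELECT action FROM perms WHERE who=? AND obj=?",
                 ("alice", "doc1")).fetchall())
# Given this object and this action, who can do it?
print(db.execute("SELECT who FROM perms WHERE obj=? AND action=?",
                 ("doc1", "read")).fetchall())
# Given this person and this action, which objects can they act upon?
print(db.execute("SELECT obj FROM perms WHERE who=? AND action=?",
                 ("alice", "read")).fetchall())
```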

[On complaints about Claude 3 reduction in quality since launch] The model is stored in a static file and loaded, continuously, across tens of thousands of identical servers, each of which serves an instance of the Claude model. The model file never changes and is immutable once loaded; every shard is loading the same model file running exactly the same software. We haven’t changed the temperature either. We don’t see anywhere where drift could happen. The files are exactly the same as at launch and loaded each time from a frozen pristine copy.

Jason D. Clinton, Anthropic # 15th April 2024, 1:27 am

The language issues are indicative of the bigger problem facing the AI Pin, ChatGPT, and frankly, every other AI product out there: you can’t see how it works, so it’s impossible to figure out how to use it. [...] our phones are constant feedback machines — colored buttons telling us what to tap, instant activity every time we touch or pinch or scroll. You can see your options and what happens when you pick one. With AI, you don’t get any of that. Using the AI Pin feels like wishing on a star: you just close your eyes and hope for the best. Most of the time, nothing happens.

David Pierce # 12th April 2024, 12:39 pm

[on GitHub Copilot] It’s like insisting on walking when you can take a bike. It gets the hard things wrong but all the easy things right, very helpful and much faster. You have to learn what it can and can’t do.

Andrej Karpathy # 11th April 2024, 1:27 am

The challenge [with RAG] is that most corner-cutting solutions look like they’re working on small datasets, letting you pretend that things like search relevance don’t matter, when in reality relevance significantly impacts the quality of responses once you move beyond prototyping (whether that’s literal search relevance or better-tuned SQL queries that retrieve more appropriate rows). This creates a false expectation of how the prototype will translate into a production capability, with all the predictable consequences: underestimated timelines, poor production behavior/performance, etc.

Will Larson # 10th April 2024, 11:09 pm
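
To make the relevance point concrete: in a RAG pipeline, whatever scoring function ranks the candidate documents determines what ends up in the prompt. A toy sketch with hypothetical documents, using naive word overlap as a stand-in for a real relevance model, exactly the kind of shortcut that looks fine on a small corpus:

```python
import re

def score(query: str, doc: str) -> float:
    """Naive relevance: fraction of query words that appear in the document.
    Plausible on a three-document demo; misleading at production scale."""
    doc_words = set(re.findall(r"[a-z]+", doc.lower()))
    q_words = re.findall(r"[a-z]+", query.lower())
    return sum(w in doc_words for w in q_words) / len(q_words)

docs = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping times vary by region; returns are processed weekly.",
    "Our founding story and company values.",
]

# Top-k retrieval: the quality of the context stuffed into the prompt
# is entirely a function of this scoring step.
query = "How do I return an item for a refund?"
top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:2]
print(top)
```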

in July 2023, we [Hugging Face] wanted to experiment with a custom license for this specific project [text-generation-inference] in order to protect our commercial solutions from companies with bigger means than ours, who would just host an exact copy of our cloud services.

The experiment however wasn’t successful.

It did not lead to licensing-specific incremental business opportunities by itself, while it did hamper, or at least complicate, community contributions, given the legal uncertainty that arises as soon as you deviate from the standard licenses.

Julien Chaumond # 8th April 2024, 6:35 pm

Before Google Reader was shut down, they were internally looking for maintainers. It turned out you have to deal with three years of infra migrations if you sign up to be the new owner of Reader. No one wanted that kind of job for a product that is not likely to grow 10x.

Jaana Dogan # 4th April 2024, 8:51 pm

LLMs are like a trained circus bear that can make you porridge in your kitchen. It’s a miracle that it’s able to do it at all, but watch out because no matter how well they can act like a human on some tasks, they’re still a wild animal. They might ransack your kitchen, and they could kill you, accidentally or intentionally!

Alex Komoroske # 2nd April 2024, 3:19 pm

No one wants to build a product on a model that makes things up. The core problem is that GenAI models are not information retrieval systems. They are synthesizing systems, with no ability to discern from the data they’re trained on unless significant guardrails are put in place.

Rumman Chowdhury # 31st March 2024, 9:20 pm

Them: Can you just quickly pull this data for me?

Me: Sure, let me just:

SELECT * FROM some_ideal_clean_and_pristine.table_that_you_think_exists

Seth Rosen # 25th March 2024, 11:33 pm

At this point, I’m confident saying that 75% of what generative-AI text and image platforms can do is useless at best and, at worst, actively harmful. Which means that if AI companies want to onboard the millions of people they need as customers to fund themselves and bring about the great AI revolution, they’ll have to perpetually outrun the millions of pathetic losers hoping to use this tech to make a quick buck. Which is something crypto has never been able to do.

In fact, we may have already reached a point where AI images have become synonymous with scams and fraud.

Ryan Broderick # 21st March 2024, 9:49 pm

I think most people have this naive idea of consensus meaning “everyone agrees”. That’s not what consensus means, as practiced by organizations that truly have a mature and well developed consensus driven process.

Consensus is not “everyone agrees”, but [a model where] people are more aligned with the process than they are with any particular outcome, and they’ve all agreed on how decisions will be made.

Jacob Kaplan-Moss # 21st March 2024, 12:45 am

People share a lot of sensitive material on Quora—controversial political views, workplace gossip and compensation, and negative opinions held of companies. Over many years, as they change jobs or change their views, it is important that they can delete or anonymize their previously-written answers.

We opt out of the wayback machine because inclusion would allow people to discover the identity of authors who had written sensitive answers publicly and later had made them anonymous, and because it would prevent authors from being able to remove their content from the internet if they change their mind about publishing it.

quora.com/robots.txt # 19th March 2024, 11:09 pm

It’s hard to overstate the value of LLM support when coding for fun in an unfamiliar language. [...] This example is totally trivial in hindsight, but might have taken me a couple mins to figure out otherwise. This is a bigger deal than it seems! Papercuts add up fast and prevent flow. (A lot of being a senior engineer is just being proficient enough to avoid papercuts).

Geoffrey Litt # 18th March 2024, 6:16 pm

One year since GPT-4 release. Hope you all enjoyed some time to relax; it’ll have been the slowest 12 months of AI progress for quite some time to come.

Leopold Aschenbrenner, OpenAI # 16th March 2024, 3:23 pm

The talk track I’ve been using is that LLMs are easy to take to market, but hard to keep in the market long-term. All the hard stuff comes when you move past the demo and get exposure to real users.

And that’s where you find that all the nice little things you got neatly working fall apart. And you need to prompt differently, do different retrieval, consider fine-tuning, redesign interaction, etc. People will treat this stuff differently from “normal” products, creating unique challenges.

Phillip Carter # 13th March 2024, 3:02 pm