Simon Willison on ethan-mollick

25 posts tagged “ethan-mollick”

2025

The issue with GPT-5 in a nutshell is that unless you pay for model switching & know to use GPT-5 Thinking or Pro, when you ask “GPT-5” you sometimes get the best available AI & sometimes get one of the worst AIs available and it might even switch within a single conversation.

— Ethan Mollick, highlighting that GPT-5 (high) ranks top on Artificial Analysis, GPT-5 (minimal) ranks lower than GPT-4.1

# 9th August 2025, 4:13 pm / gpt-5, ethan-mollick, generative-ai, ai, llms

Yesterday Anthropic got a bunch of buzz out of their new window.claude.complete() API which allows Claude Artifacts to run their own API calls to execute prompts.
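As a rough illustration of what "run their own API calls to execute prompts" could look like from inside an Artifact, here is a minimal TypeScript sketch. It assumes `window.claude.complete()` takes a prompt string and returns a promise that resolves to the response text; that signature is an assumption and should be checked against Anthropic's documentation. The helpers `buildPrompt`, `runPrompt` and `fakeComplete` are hypothetical names introduced only for this example.

```typescript
// Assumed shape of window.claude.complete: prompt string in, promise of response text out.
type Complete = (prompt: string) => Promise<string>;

// Hypothetical helper: combines an instruction and some user input into one prompt string.
function buildPrompt(instruction: string, input: string): string {
  return `${instruction.trim()}\n\n<input>\n${input.trim()}\n</input>`;
}

// Hypothetical helper: sends one prompt through whichever completion function is supplied.
async function runPrompt(
  complete: Complete,
  instruction: string,
  input: string
): Promise<string> {
  const response = await complete(buildPrompt(instruction, input));
  return response.trim();
}

// Inside an Artifact the call might look like this (assumes window.claude exists there):
// const summary = await runPrompt(
//   (p) => (window as any).claude.complete(p),
//   "Summarize in one sentence.",
//   someText
// );

// Stand-in completion function for experimenting outside an Artifact.
const fakeComplete: Complete = async (prompt) => `echo: ${prompt.length} chars`;
```

Passing the completion function in as a parameter keeps the sketch runnable outside the Artifact sandbox, where `window.claude` is not defined.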

It turns out Gemini had beaten them to that feature by over a month, but the announcement was tucked away in a bullet point of their release notes for the 20th of May:

Vibe coding apps in Canvas just got better too! With just a few prompts, you can now build fully functional personalised apps in Canvas that can use Gemini-powered features, save data between sessions and share data between multiple users.

Ethan Mollick has been building some neat demos on top of Gemini Canvas, including this text adventure starship bridge simulator.

Similar to Claude Artifacts, Gemini Canvas detects if the application uses APIs that require authentication (to run prompts, for example) and requests the user sign in with their Google account.

# 26th June 2025, 3:45 pm / vibe-coding, gemini, generative-ai, ai, llms, google, ethan-mollick

In some tasks, AI is unreliable. In others, it is superhuman. You could, of course, say the same thing about calculators, but it is also clear that AI is different. It is already demonstrating general capabilities and performing a wide range of intellectual tasks, including those that it is not specifically trained on. Does that mean that o3 and Gemini 2.5 are AGI? Given the definitional problems, I really don’t know, but I do think they can be credibly seen as a form of “Jagged AGI” - superhuman in enough areas to result in real changes to how we work and live, but also unreliable enough that human expertise is often needed to figure out where AI works and where it doesn’t.

— Ethan Mollick, On Jagged AGI

# 20th April 2025, 4:35 pm / gemini, ethan-mollick, generative-ai, o3, ai, llms

After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered a 10^26 FLOP model and cost a few tens of millions of dollars to train, though future models will be much bigger.

— Ethan Mollick

# 2nd March 2025, 5:56 pm / ethan-mollick, anthropic, claude, generative-ai, ai, llms

I know these are real risks, and to be clear, when I say an AI “thinks,” “learns,” “understands,” “decides,” or “feels,” I’m speaking metaphorically. Current AI systems don’t have a consciousness, emotions, a sense of self, or physical sensations. So why take the risk? Because as imperfect as the analogy is, working with AI is easiest if you think of it like an alien person rather than a human-built machine. And I think that is important to get across, even with the risks of anthropomorphism.

— Ethan Mollick, in March 2024

# 4th January 2025, 5:48 pm / ethan-mollick, ai, ethics, ai-ethics

2024

Knowing when to use AI turns out to be a form of wisdom, not just technical knowledge. Like most wisdom, it's somewhat paradoxical: AI is often most useful where we're already expert enough to spot its mistakes, yet least helpful in the deep work that made us experts in the first place. It works best for tasks we could do ourselves but shouldn't waste time on, yet can actively harm our learning when we use it to skip necessary struggles.

— Ethan Mollick

# 10th December 2024, 5:35 am / llms, ai, ethan-mollick, generative-ai

A test of how seriously your firm is taking AI: when o-1 (& the new Gemini) came out this week, were there assigned folks who immediately ran the model through internal, validated, firm-specific benchmarks to see how useful it [w]as? Did you update any plans or goals as a result?

Or do you not have people (including non-technical people) assigned to test the new models? No internal benchmarks? No perspective on how AI will impact your business that you keep up-to-date?

No one is going to be doing this for organizations, you need to do it yourself.

— Ethan Mollick

# 7th December 2024, 4:56 pm / ethan-mollick, evals, generative-ai, ai, llms

Often, you are told to do this by treating AI like an intern. In retrospect, however, I think that this particular analogy ends up making people use AI in very constrained ways. To put it bluntly, any recent frontier model (by which I mean Claude 3.5, ChatGPT-4o, Grok 2, Llama 3.1, or Gemini Pro 1.5) is likely much better than any intern you would hire, but also weirder.

Instead, let me propose a new analogy: treat AI like an infinitely patient new coworker who forgets everything you tell them each new conversation, one that comes highly recommended but whose actual abilities are not that clear.

— Ethan Mollick

# 24th November 2024, 10:10 pm / llms, ai, ethan-mollick, generative-ai

Students who use AI as a crutch don’t learn anything. It prevents them from thinking. Instead, using AI as co-intelligence is important because it increases your capabilities and also keeps you in the loop. […]

AI does so many things that we need to set guardrails on what we don’t want to give up. It’s a very weird, general-purpose technology, which means it will affect all kinds of things, and we’ll have to adjust socially.

— Ethan Mollick

# 6th October 2024, 3:26 pm / ethan-mollick, ai

Telling the AI to "make it better" after getting a result is just a folk method of getting an LLM to do Chain of Thought, which is why it works so well.

— Ethan Mollick

# 10th September 2024, 3:12 pm / prompt-engineering, ethan-mollick, generative-ai, ai, llms

Here Are All of the Apple Intelligence Features in the iOS 18.1 Developer Beta (via) Useful rundown from Juli Clover at MacRumors of the Apple Intelligence features that are available in the brand new iOS 18.1 beta, available to developer account holders with an iPhone 15 or iPhone 15 Pro Max or Apple Silicon iPad.

I've been trying this out today. It's still clearly very early, and the on-device model that powers Siri is significantly weaker than more powerful models that I've become used to over the past two years. Similar to old Siri I find myself trying to figure out the sparse, undocumented incantations that reliably work for the things I might want my voice assistant to do for me.

Ethan Mollick:

My early Siri AI experience has just underlined the fact that, while there is a lot of practical, useful things that can be done with small models, they really lack the horsepower to do anything super interesting.

# 30th July 2024, 4:22 am / apple, ai, generative-ai, llms, ethan-mollick, apple-intelligence

Among many misunderstandings, [users] expect the RAG system to work like a search engine, not as a flawed, forgetful analyst. They will not do the work that you expect them to do in order to verify documents and ground truth. They will not expect the AI to try to persuade them.

— Ethan Mollick

# 27th July 2024, 1:46 am / ethan-mollick, generative-ai, ai, rag, llms

The expansion of the jagged frontier of AI capability is subtle and requires a lot of experience with various models to understand what they can, and can’t, do. That is why I suggest that people and organizations keep an “impossibility list” - things that their experiments have shown that AI can definitely not do today but which it can almost do. For example, no AI can create a satisfying puzzle or mystery for you to solve, but they are getting closer. When AI models are updated, test them on your impossibility list to see if they can now do these impossible tasks.

— Ethan Mollick

# 4th July 2024, 10:38 pm / ethan-mollick, ai, llms

To learn to do serious stuff with AI, choose a Large Language Model and just use it to do serious stuff - get advice, summarize meetings, generate ideas, write, produce reports, fill out forms, discuss strategy - whatever you do at work, ask the AI to help. [...]

I know this may not seem particularly profound, but “always invite AI to the table” is the principle in my book that people tell me had the biggest impact on them. You won’t know what AI can (and can’t) do for you until you try to use it for everything you do.

— Ethan Mollick

# 6th June 2024, 3:03 pm / ethan-mollick, ai, llms

In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model.

Less than 5% has spent the required 10 hours to know how they tick.

— Ethan Mollick

# 9th March 2024, 3:55 am / ethan-mollick, generative-ai, gpt-4, ai, llms

Google’s Gemini Advanced: Tasting Notes and Implications. Ethan Mollick reviews the new Google Gemini Advanced—a rebranded Bard, released today, that runs on the GPT-4-competitive Gemini Ultra model.

“GPT-4 [...] has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we only had one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things.”

I like Ethan’s use of the term “tasting notes” here. Reminds me of how Matt Webb talks about being a language model sommelier.

# 8th February 2024, 3:10 pm / google, ai, generative-ai, gpt-4, bard, llms, ethan-mollick, gemini

For many people in many organizations, their measurable output is words - words in emails, in reports, in presentations. We use words as proxy for many things: the number of words is an indicator of effort, the quality of the words is an indicator of intelligence, the degree to which the words are error-free is an indicator of care.

[...] But now every employee with Copilot can produce work that checks all the boxes of a formal report without necessarily representing underlying effort.

— Ethan Mollick

# 2nd February 2024, 3:34 am / ethan-mollick, ethics, generative-ai, ai, llms, ai-ethics

2023

When I speak in front of groups and ask them to raise their hands if they used the free version of ChatGPT, almost every hand goes up. When I ask the same group how many use GPT-4, almost no one raises their hand. I increasingly think the decision of OpenAI to make the “bad” AI free is causing people to miss why AI seems like such a huge deal to a minority of people that use advanced systems and elicits a shrug from everyone else.

— Ethan Mollick

# 10th December 2023, 8:17 pm / ethan-mollick, generative-ai, openai, gpt-4, chatgpt, ai, llms

We already know one major effect of AI on the skills distribution: AI acts as a skills leveler for a huge range of professional work. If you were in the bottom half of the skill distribution for writing, idea generation, analyses, or any of a number of other professional tasks, you will likely find that, with the help of AI, you have become quite good.

— Ethan Mollick

# 25th September 2023, 4:37 pm / llms, ai, ethan-mollick, generative-ai

Increasingly powerful AI systems are being released at an increasingly rapid pace. [...] And yet not a single AI lab seems to have provided any user documentation. Instead, the only user guides out there appear to be Twitter influencer threads. Documentation-by-rumor is a weird choice for organizations claiming to be concerned about proper use of their technologies, but here we are.

— Ethan Mollick

# 16th July 2023, 12:12 am / ethan-mollick, ai, generative-ai, ethics, ai-ethics

What AI can do with a toolbox... Getting started with Code Interpreter. Ethan Mollick has been doing some very creative explorations of ChatGPT Code Interpreter over the past few months, and has tied a lot of them together into this useful introductory tutorial.

# 12th July 2023, 8:57 pm / ai, openai, generative-ai, chatgpt, llms, ethan-mollick, code-interpreter, coding-agents

There are many reasons for companies to not turn efficiency gains into headcount or cost reduction. Companies that figure out how to use their newly productive workforce should be able to dominate those who try to keep their post-AI output the same as their pre-AI output, just with less people. And companies that commit to maintaining their workforce will likely have employees as partners, who are happy to teach others about the uses of AI at work, rather than scared workers who hide their AI for fear of being replaced.

— Ethan Mollick

# 14th May 2023, 2:17 pm / ethan-mollick, ai, ethics, ai-ethics

Blinded by Analogies (via) Ethan Mollick discusses how many of the analogies we have for AI right now are hurting rather than helping our understanding, particularly with respect to LLMs.

# 5th April 2023, 5 am / ai, generative-ai, llms, ethan-mollick

How to use AI to do practical stuff: A new guide (via) Ethan Mollick’s guide to practical usage of large language model chatbots like ChatGPT 3.5 and 4, Bing, Claude and Bard is the best I’ve seen so far. He includes useful warnings about common traps and things that these models are both useful for and useless at.

# 31st March 2023, 6:17 am / bing, ai, chatgpt, bard, llms, ethan-mollick, claude

New AI game: role playing the Titanic. Fantastic Bing prompt from Ethan Mollick: “I am on a really nice White Star cruise from Southampton, and it is 14th April 1912. What should I do tonight?”—Bing takes this very seriously and tries to help out! Works for all sorts of other historic events as well.

# 26th February 2023, 3:53 am / bing, ai, generative-ai, llms, ethan-mollick

Simon Willison’s Weblog
