Here's an interesting example of models incrementally improving over time: I am finding that today's leading models are competent at writing prompts for themselves and each other.
A year ago I was quite skeptical of the pattern where models are used to help build prompts. Prompt engineering was still a young enough discipline that I did not expect the models to have enough training data to be able to prompt themselves better than a moderately experienced human.
The Claude 4 and GPT-5 families both have training cut-off dates within the past year - recent enough that they've seen a decent volume of good prompting examples.
I expect they have also been deliberately trained for this. Anthropic make extensive use of sub-agent patterns in Claude Code, and published a fascinating article on that pattern (my notes on that).
I don't have anything solid to back this up - it's more of a hunch based on anecdotal evidence: over the last few months, many of my requests asking a model to write a prompt have returned genuinely useful results.
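To make the pattern concrete, here's a minimal sketch of that two-step loop using the Anthropic Python SDK - the model ID and the summarization task are illustrative choices for this example, not a recommendation:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Step 1: ask the model to draft a system prompt for a task
draft = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Write a system prompt for an LLM that summarizes GitHub "
            "issues into a one-paragraph status update. "
            "Return only the prompt text."
        ),
    }],
)
generated_prompt = draft.content[0].text

# Step 2: use the generated prompt as the system prompt for the real task
issue_text = "Title: Build fails on Python 3.13\nBody: pip install errors out..."
result = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=generated_prompt,
    messages=[{"role": "user", "content": issue_text}],
)
print(result.content[0].text)
```

The same loop works across model families - you could have GPT-5 draft the prompt and Claude execute it, or vice versa.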