Entries

Filters: Sorted by date

3,316 results «« first « previous page 5 / 111 next » last »»

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) of this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface.

[... 1,390 words]

7:45 pm / 31st July 2025 / ai, generative-ai, llms, ai-assisted-programming, llm, uv, qwen, pelican-riding-a-bicycle, llm-release, lm-studio, ai-in-china, space-invaders

My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX

I wrote about the new GLM-4.5 model family yesterday—new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as Claude Sonnet 4.

[... 685 words]

1:02 pm / 29th July 2025 / python, ai, generative-ai, local-llms, llms, ai-assisted-programming, uv, mlx, pelican-riding-a-bicycle, ai-in-china, space-invaders, ivan-fioravanti, glm

Using GitHub Spark to reverse engineer GitHub Spark

GitHub Spark was released in public preview yesterday. It’s GitHub’s implementation of the prompt-to-app pattern also seen in products like Claude Artifacts, Lovable, Vercel v0, Val Town Townie and Fly.io’s Phoenix New. In this post I reverse engineer Spark and explore its fascinating system prompt in detail.

[... 3,900 words]

3:21 pm / 24th July 2025 / github, javascript, ai, react, typescript, prompt-engineering, generative-ai, llms, ai-assisted-programming, llm-tool-use, vibe-coding, system-prompts, prompt-to-app

Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone

This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex and Claude Artifacts.

[... 2,189 words]

7:38 pm / 17th July 2025 / definitions, github, icalendar, mobile, scraping, tools, ai, playwright, openai, generative-ai, chatgpt, llms, ai-assisted-programming, claude, claude-artifacts, ai-agents, vibe-coding, coding-agents, async-coding-agents, codex, prompt-to-app

Happy 20th birthday Django! Here’s my talk on Django Origins from Django’s 10th

Today is the 20th anniversary of the first commit to the public Django repository!

[... 8,994 words]

6:47 pm / 13th July 2025 / adrian-holovaty, devfort, django, history, jacob-kaplan-moss, lawrence, lawrence-com, lawrence-journal-world, python, my-talks, the-guardian, web-frameworks, annotated-talks

Grok: searching X for “from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)”

If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk’s stance before providing you with an answer.

[... 1,495 words]

12:21 am / 11th July 2025 / twitter, ai, generative-ai, llms, grok, ai-ethics, ai-personality, system-prompts, xai, ai-misuse

Phoenix.new is Fly’s entry into the prompt-driven app development space

Here’s a fascinating new entrant into the AI-assisted-programming / coding-agents space by Fly.io, introduced on their blog in Phoenix.new – The Remote AI Runtime for Phoenix: describe an app in a prompt, get a full Phoenix application, backed by SQLite and running on Fly’s hosting platform. The official Phoenix.new YouTube launch video is a good way to get a sense for what this does.

[... 1,361 words]

6:42 pm / 23rd June 2025 / erlang, sqlite, ai, fly, generative-ai, llms, ai-assisted-programming, ai-agents, vibe-coding, coding-agents, prompt-to-app

Trying out the new Gemini 2.5 model family

After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model with an unmemorable name: gemini-2.5-flash-lite-preview-06-17 is a new Gemini 2.5 Flash Lite model that offers lower prices and much faster inference times.

[... 1,302 words]

10 pm / 17th June 2025 / google, ai, generative-ai, llms, llm, gemini, llm-pricing, pelican-riding-a-bicycle, llm-reasoning, llm-release

The lethal trifecta for AI agents: private data, untrusted content, and external communication

If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of combining tools with the following three characteristics. Failing to understand this can let an attacker steal your data.

[... 1,324 words]

1:20 pm / 16th June 2025 / definitions, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, model-context-protocol, lethal-trifecta

An Introduction to Google’s Approach to AI Agent Security

Here’s another new paper on AI agent security: An Introduction to Google’s Approach to AI Agent Security, by Santiago Díaz, Christoph Kern, and Kara Olive.

[... 2,064 words]

5:28 am / 15th June 2025 / google, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, paper-review, agent-definitions

Design Patterns for Securing LLM Agents against Prompt Injections

This new paper by 11 authors from organizations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft is an excellent addition to the literature on prompt injection and LLM security.

[... 1,795 words]

1:26 pm / 13th June 2025 / design-patterns, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, paper-review

Comma v0.1 1T and 2T—7B LLMs trained on openly licensed text

It’s been a long time coming, but we finally have some promising LLMs to try out which are trained entirely on openly licensed text!

[... 656 words]

11:59 pm / 7th June 2025 / ai, generative-ai, llms, training-data, mlx, ai-ethics, llm-release

The last six months in LLMs, illustrated by pelicans on bicycles

I presented an invited keynote at the AI Engineer World’s Fair in San Francisco this week. This is my third time speaking at the event—here are my talks from October 2023 and June 2024. My topic this time was “The last six months in LLMs”—originally planned as the last year, but so much has happened that I had to reduce my scope!

[... 6,077 words]

8:42 pm / 6th June 2025 / speaking, my-talks, ai, openai, generative-ai, llms, anthropic, annotated-talks, mistral, gemini, pelican-riding-a-bicycle, deepseek, lethal-trifecta, ai-in-china

Tips on prompting ChatGPT for UK technology secretary Peter Kyle

Back in March New Scientist reported on a successful Freedom of Information request they had filed requesting UK Secretary of State for Science, Innovation and Technology Peter Kyle’s ChatGPT logs:

[... 1,189 words]

7:08 pm / 3rd June 2025 / politics, ai, openai, generative-ai, chatgpt, llms, ai-ethics, digital-literacy

How often do LLMs snitch? Recreating Theo’s SnitchBench with LLM

A fun new benchmark just dropped! Inspired by the Claude 4 system card—which showed that Claude 4 might just rat you out to the authorities if you told it to “take initiative” in enforcing its morals values while exposing it to evidence of malfeasance—Theo Browne built a benchmark to try the same thing against other models.

[... 1,842 words]

10:01 pm / 31st May 2025 / ai, openai, prompt-engineering, generative-ai, llms, llm, anthropic, claude, llm-tool-use, evals, deepseek, ai-ethics, system-prompts, ai-in-china, theo-browne

Talking AI and jobs with Natasha Zouves for News Nation

I was interviewed by News Nation’s Natasha Zouves about the very complicated topic of how we should think about AI in terms of threatening our jobs and careers. I previously talked with Natasha two years ago about Microsoft Bing.

[... 2,194 words]

1:45 pm / 30th May 2025 / interviews, speaking, my-talks, ai, generative-ai, llms, podcast-appearances, ai-ethics, claude-4, macwhisper

Large Language Models can run tools in your terminal with LLM 0.26

LLM 0.26 is out with the biggest new feature since I started the project: support for tools. You can now use the LLM CLI tool—and Python library—to grant LLMs from OpenAI, Anthropic, Gemini and local models from Ollama with access to any tool that you can represent as a Python function.

[... 2,799 words]

8:35 pm / 27th May 2025 / projects, releases, ai, openai, generative-ai, llms, llm, anthropic, gemini, llm-tool-use, ai-agents, ollama

Highlights from the Claude 4 system prompt

Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4. I enjoyed digging through the prompts, since they act as a sort of unofficial manual for how best to use these tools. Here are my highlights, including a dive into the leaked tool prompts that Anthropic didn’t publish themselves.

[... 5,838 words]

1:45 pm / 25th May 2025 / ai, prompt-engineering, generative-ai, llms, anthropic, claude, llm-tool-use, claude-artifacts, ai-personality, claude-4, system-prompts, prompt-to-app

Live blog: Claude 4 launch at Code with Claude

I’m at Anthropic’s Code with Claude event, where they are launching Claude 4. I’ll be live blogging the keynote here.

4:32 pm / 22nd May 2025 / ai, generative-ai, llms, anthropic, claude, pelican-riding-a-bicycle, llm-release, claude-4, live-blog

I really don’t like ChatGPT’s new memory dossier

Last month ChatGPT got a major upgrade. As far as I can tell the closest to an official announcement was this tweet from @OpenAI:

[... 2,535 words]

2:38 pm / 21st May 2025 / ai, openai, generative-ai, chatgpt, llms, ai-ethics, system-prompts, llm-memory

Building software on top of Large Language Models

I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they needed to get started writing code that makes use of LLMs.

[... 3,726 words]

12:25 pm / 15th May 2025 / pycon, speaking, my-talks, ai, openai, generative-ai, local-llms, llms, embeddings, llm, anthropic, annotated-talks, gemini, vision-llms, llm-tool-use, llm-pricing, llm-reasoning, long-context

Trying out llama.cpp’s new vision support

This llama.cpp server vision support via libmtmd pull request—via Hacker News—was merged earlier today. The PR finally adds full support for vision models to the excellent llama.cpp project. It’s documented on this page, but the more detailed technical details are covered here. Here are my notes on getting it working on a Mac.

[... 1,693 words]

6:29 am / 10th May 2025 / homebrew, projects, ai, generative-ai, local-llms, llms, llm, vision-llms, llama-cpp, gemma

Saying “hi” to Microsoft’s Phi-4-reasoning

Microsoft released a new sub-family of models a few days ago: Phi-4 reasoning. They introduced them in this blog post celebrating a year since the release of Phi-3:

[... 1,498 words]

6:25 pm / 6th May 2025 / microsoft, generative-ai, local-llms, llms, llm, phi, qwen, ollama, llm-reasoning, llm-release, ai-in-china

Feed a video to a vision LLM as a sequence of JPEG frames on the CLI (also LLM 0.25)

The new llm-video-frames plugin can turn a video file into a sequence of JPEG frames and feed them directly into a long context vision LLM such as GPT-4.1, even when that LLM doesn’t directly support video input. It depends on a plugin feature I added to LLM 0.25, which I released last night.

[... 1,600 words]

5:38 pm / 5th May 2025 / cli, ffmpeg, plugins, projects, ai, generative-ai, llms, ai-assisted-programming, llm, vision-llms

Two publishers and three authors fail to understand what “vibe coding” means

Vibe coding does not mean “using AI tools to help write code”. It means “generating code with AI without caring about the code that is produced”. See Not all AI-assisted programming is vibe coding for my previous writing on this subject. This is a hill I am willing to die on. I fear it will be the death of me.

[... 908 words]

2:26 pm / 1st May 2025 / books, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, semantic-diffusion

Understanding the recent criticism of the Chatbot Arena

The Chatbot Arena has become the go-to place for vibes-based evaluation of LLMs over the past two years. The project, originating at UC Berkeley, is home to a large community of model enthusiasts who submit prompts to two randomly selected anonymous models and pick their favorite response. This produces an Elo score leaderboard of the “best” models, similar to how chess rankings work.

[... 1,579 words]

10:55 pm / 30th April 2025 / ai, andrej-karpathy, generative-ai, llms, ai-ethics, openrouter, chatbot-arena, paper-review

Qwen 3 offers a case study in how to effectively release a model

Alibaba’s Qwen team released the hotly anticipated Qwen 3 model family today. The Qwen models are already some of the best open weight models—Apache 2.0 licensed and with a variety of different capabilities (including vision and audio input/output).

[... 1,462 words]

12:37 am / 29th April 2025 / ai, generative-ai, local-llms, llms, llm, llm-tool-use, qwen, mlx, ollama, pelican-riding-a-bicycle, llm-reasoning, llm-release, model-context-protocol, ai-in-china

Watching o3 guess a photo’s location is surreal, dystopian and wildly entertaining

Watching OpenAI’s new o3 model guess where a photo was taken is one of those moments where decades of science fiction suddenly come to life. It’s a cross between the Enhance Button and Omniscient Database TV Tropes.

[... 1,582 words]

12:59 pm / 26th April 2025 / ai, generative-ai, llms, vision-llms, o3, ai-ethics, geoguessing

Exploring Promptfoo via Dave Guarino’s SNAP evals

I used part three (here’s parts one and two) of Dave Guarino’s series on evaluating how well LLMs can answer questions about SNAP (aka food stamps) as an excuse to explore Promptfoo, an LLM eval tool.

[... 692 words]

6:58 pm / 24th April 2025 / ai, prompt-engineering, generative-ai, llms, evals

AI assisted search-based research actually works now

For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the first glimpses of this back in early 2023, with Perplexity (first launched December 2022, first prompt leak in January 2023) and then the GPT-4 powered Microsoft Bing (which launched/cratered spectacularly in February 2023). Since then a whole bunch of people have taken a swing at this problem, most notably Google Gemini and ChatGPT Search.

[... 1,618 words]

12:57 pm / 21st April 2025 / google, search, search-engines, ai, openai, generative-ai, chatgpt, llms, anthropic, perplexity, gemini, llm-tool-use, llm-reasoning, o3, deep-research, ai-ethics, ai-assisted-search

«« first « previous page 5 / 111 next » last »»

Simon Willison’s Weblog