Simon Willison’s Weblog

Subscribe

49 items tagged “claude”

Claude is Anthropic's family of Large Language Models.

2024

We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API.

Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls.

Alex Albert

# 15th July 2024, 9:33 pm / ai, generative-ai, llms, anthropic, claude, alex-albert

Give people something to link to so they can talk about your features and ideas

If you have a project, an idea, a product feature, or anything else that you want other people to understand and have conversations about... give them something to link to!

[... 685 words]

Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.

In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.

Alex Albert (Anthropic)

# 10th July 2024, 6:56 pm / ai, prompt-engineering, generative-ai, llms, anthropic, claude, vision-llms, alex-albert

Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.

This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:

You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in <answer> tags, think step by step in <thinking> tags and analyze every part of the image.

# 10th July 2024, 6:38 pm / jupyter, vision-llms, anthropic, claude, generative-ai, ai, llms

Claude: You can now publish, share, and remix artifacts. Artifacts is the feature Anthropic released a few weeks ago to accompany Claude 3.5 Sonnet, allowing Claude to create interactive HTML+JavaScript tools in response to prompts.

This morning they added the ability to make those artifacts public and share links to them, which makes them even more useful!

Here's my box shadow playground from the other day, and an example page I requested demonstrating the Milligram CSS framework - Artifacts can load most code that is available via cdnjs so they're great for quickly trying out new libraries.

# 9th July 2024, 10:25 pm / anthropic, claude, generative-ai, ai, llms

hangout_services/thunk.js (via) It turns out Google Chrome (via Chromium) includes a default extension which makes extra services available to code running on the *.google.com domains - tweeted about today by Luca Casonato, but the code has been there in the public repo since October 2013 as far as I can tell.

It looks like it's a way to let Google Hangouts (or presumably its modern predecessors) get additional information from the browser, including the current load on the user's CPU. Update: On Hacker News a Googler confirms that the Google Meet "troubleshooting" feature uses this to review CPU utilization.

I got GPT-4o to help me figure out how to trigger it (I tried Claude 3.5 Sonnet first but it refused, saying "Doing so could potentially violate terms of service or raise security and privacy concerns"). Paste the following into your Chrome DevTools console on any Google site to see the result:

chrome.runtime.sendMessage(
  "nkeimhogjdpnpccoofpliimaahmaaome",
  { method: "cpu.getInfo" },
  (response) => {
    console.log(JSON.stringify(response, null, 2));
  },
);

I get back a response that starts like this:

{
  "value": {
    "archName": "arm64",
    "features": [],
    "modelName": "Apple M2 Max",
    "numOfProcessors": 12,
    "processors": [
      {
        "usage": {
          "idle": 26890137,
          "kernel": 5271531,
          "total": 42525857,
          "user": 10364189
        }
      }, ...

The code doesn't do anything on non-Google domains.

Luca says this - I'm inclined to agree:

This is interesting because it is a clear violation of the idea that browser vendors should not give preference to their websites over anyone elses.

# 9th July 2024, 5:50 pm / browsers, claude, google, chatgpt, chrome, ai, llms

Box shadow CSS generator (via) Another example of a tiny personal tool I built using Claude 3.5 Sonnet and artifacts. In this case my prompt was:

CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS

I changed my mind half way through typing the prompt and asked it for a custom tool, and it built me this!

Box shadow CSS generator. Shows a preview, then provides sliders to set Horizontal Offset, Vertical Offset, Blur Radius,  Spread Radius,  Color and Opacity - plus the generated CSS and a Copy to Clipboard button

Here's the full transcript - in a follow-up prompt I asked for help deploying it and it rewrote the tool to use <script type="text/babel"> and the babel-standalone library to add React JSX support directly in the browser - a bit of a hefty dependency (387KB compressed / 2.79MB total) but I think acceptable for this kind of one-off tool.

Being able to knock out tiny custom tools like this on a whim is a really interesting new capability. It's also a lot of fun!

# 8th July 2024, 7:30 pm / css, anthropic, claude, generative-ai, projects, ai, llms

Chrome Prompt Playground. Google Chrome Canary is currently shipping an experimental on-device LLM, in the form of Gemini Nano. You can access it via the new window.ai API, after first enabling the "Prompt API for Gemini Nano" experiment in chrome://flags (and then waiting an indeterminate amount of time for the ~1.7GB model file to download - I eventually spotted it in ~/Library/Application Support/Google/Chrome Canary/OptGuideOnDeviceModel).

I got Claude 3.5 Sonnet to build me this playground interface for experimenting with the model. You can execute prompts, stream the responses and all previous prompts and responses are stored in localStorage.

Animated GIF demo. The prompt is Show two greetings each in French and Spanish - on clicking the button the result streams in:  French Bonjour! Bienvenue!, Spanish Hola!, Bienvenido! Scrolling down reveals the stored history, and clicking delete on that prompt removes it from the page.

Here's the full Sonnet transcript, and the final source code for the app.

The best documentation I've found for the new API is is explainers-by-googlers/prompt-api on GitHub.

# 3rd July 2024, 5:11 pm / generative-ai, projects, chrome, ai, llms, gemini, google, claude

Compare PDFs. Inspired by this thread on Hacker News about the C++ diff-pdf tool I decided to see what it would take to produce a web-based PDF diff visualization tool using Claude 3.5 Sonnet.

It took two prompts:

Build a tool where I can drag and drop on two PDF files and it uses PDF.js to turn each of their pages into canvas elements and then displays those pages side by side with a third image that highlights any differences between them, if any differences exist

That give me a React app that didn't quite work, so I followed-up with this:

rewrite that code to not use React at all

Which gave me a working tool! You can see the full Claude transcript plus screenshots of the tool in action in this Gist.

Being able to knock out little custom interactive web tools like this in a couple of minutes is so much fun.

# 2nd July 2024, 7:54 pm / projects, pdf, claude, llms, anthropic

Claude Projects. New Claude feature, quietly launched this morning for Claude Pro users. Looks like their version of OpenAI's GPTs, designed to take advantage of Claude's 200,000 token context limit:

You can upload relevant documents, text, code, or other files to a project’s knowledge base, which Claude will use to better understand the context and background for your individual chats within that project. Each project includes a 200K context window, the equivalent of a 500-page book, so users can add all of the insights needed to enhance Claude’s effectiveness.

You can also set custom instructions, which presumably get added to the system prompt.

I tried dropping in all of Datasette's existing documentation - 693KB of .rst files (which I had to rename to .rst.txt for it to let me upload them) - and it worked and showed "63% of knowledge size used".

This is a slightly different approach from OpenAI, where the GPT knowledge feature supports attaching up to 20 files each with up to 2 million tokens, which get ingested into a vector database (likely Qdrant) and used for RAG.

It looks like Claude instead handle a smaller amount of extra knowledge but paste the whole thing into the context window, which avoids some of the weirdness around semantic search chunking but greatly limits the size of the data.

My big frustration with the knowledge feature in GPTs remains the lack of documentation on what it's actually doing under the hood. Without that it's difficult to make informed decisions about how to use it - with Claude Projects I can at least develop a robust understanding of what the tool is doing for me and how best to put it to work.

No equivalent (yet) for the GPT actions feature where you can grant GPTs the ability to make API calls out to external systems.

# 25th June 2024, 4:03 pm / anthropic, claude, openai, ai, llms, chatgpt, prompt-engineering, rag

Building search-based RAG using Claude, Datasette and Val Town

Visit Building search-based RAG using Claude, Datasette and Val Town

Retrieval Augmented Generation (RAG) is a technique for adding extra “knowledge” to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. A common way to implement this is to take a question from a user, translate that into a set of search queries, run those against a search engine and then feed the results back into the LLM to generate an answer.

[... 3,372 words]

llm-claude-3 0.4. LLM plugin release adding support for the new Claude 3.5 Sonnet model:

pipx install llm
llm install -U llm-claude-3
llm keys set claude
# paste AP| key here
llm -m claude-3.5-sonnet \
  'a joke about a pelican and a walrus having lunch'

# 20th June 2024, 11:04 pm / llm, anthropic, claude, generative-ai, projects, ai, llms

Claude 3.5 Sonnet. Anthropic released a new model this morning, and I think it's likely now the single best available LLM. Claude 3 Opus was already mostly on-par with GPT-4o, and the new 3.5 Sonnet scores higher than Opus on almost all of Anthropic's internal evals.

It's also twice the speed and one fifth of the price of Opus (it's the same price as the previous Claude 3 Sonnet). To compare:

  • gpt-4o: $5/million input tokens and $15/million output
  • Claude 3.5 Sonnet: $3/million input, $15/million output
  • Claude 3 Opus: $15/million input, $75/million output

Similar to Claude 3 Haiku then, which both under-cuts and out-performs OpenAI's GPT-3.5 model.

In addition to the new model, Anthropic also added a "artifacts" feature to their Claude web interface. The most exciting part of this is that any of the Claude models can now build and then render web pages and SPAs, directly in the Claude interface.

This means you can prompt them to e.g. "Build me a web app that teaches me about mandelbrot fractals, with interactive widgets" and they'll do exactly that - I tried that prompt on Claude 3.5 Sonnet earlier and the results were spectacular (video demo).

An unsurprising note at the end of the post:

To complete the Claude 3.5 model family, we’ll be releasing Claude 3.5 Haiku and Claude 3.5 Opus later this year.

If the pricing stays consistent with Claude 3, Claude 3.5 Haiku is going to be a very exciting model indeed.

# 20th June 2024, 6:01 pm / anthropic, claude, generative-ai, ai, llms, vision-llms

Claude: Building evals and test cases. More documentation updates from Anthropic: this section on writing evals for Claude is new today and includes Python code examples for a number of different evaluation techniques.

Included are several examples of the LLM-as-judge pattern, plus an example using cosine similarity and another that uses the new-to-me Rouge Python library that implements the ROUGE metric for evaluating the quality of summarized text.

# 18th June 2024, 4:28 pm / prompt-engineering, anthropic, claude, generative-ai, ai, llms

Anthropic release notes (via) Anthropic have started publishing release notes! Currently available for their API and their apps (mobile and web).

What I'd really like to see are release notes for the models themselves, though as far as I can tell there haven't been any updates to those since the Claude 3 models were first released (the Haiku model name in the API is still claude-3-haiku-20240307 and Anthropic say they'll change that identifier after any updates to the model).

# 18th June 2024, 4:25 pm / anthropic, claude, generative-ai, ai, llms, alex-albert

Contrast [Apple Intelligence] to what OpenAI is trying to accomplish with its GPT models, or Google with Gemini, or Anthropic with Claude: those large language models are trying to incorporate all of the available public knowledge to know everything; it’s a dramatically larger and more difficult problem space, which is why they get stuff wrong. There is also a lot of stuff that they don’t know because that information is locked away — like all of the information on an iPhone.

Ben Thompson

# 12th June 2024, 9:29 pm / apple, ai, openai, llms, claude, gemini

Claude’s Character (via) There's so much interesting stuff in this article from Anthropic on how they defined the personality for their Claude 3 model. In addition to the technical details there are some very interesting thoughts on the complex challenge of designing a "personality" for an LLM in the first place.

Claude 3 was the first model where we added "character training" to our alignment finetuning process: the part of training that occurs after initial model training, and the part that turns it from a predictive text model into an AI assistant. The goal of character training is to make Claude begin to have more nuanced, richer traits like curiosity, open-mindedness, and thoughtfulness.

But what other traits should it have? This is a very difficult set of decisions to make! The most obvious approaches are all flawed in different ways:

Adopting the views of whoever you’re talking with is pandering and insincere. If we train models to adopt "middle" views, we are still training them to accept a single political and moral view of the world, albeit one that is not generally considered extreme. Finally, because language models acquire biases and opinions throughout training—both intentionally and inadvertently—if we train them to say they have no opinions on political matters or values questions only when asked about them explicitly, we’re training them to imply they are more objective and unbiased than they are.

The training process itself is particularly fascinating. The approach they used focuses on synthetic data, and effectively results in the model training itself:

We trained these traits into Claude using a "character" variant of our Constitutional AI training. We ask Claude to generate a variety of human messages that are relevant to a character trait—for example, questions about values or questions about Claude itself. We then show the character traits to Claude and have it produce different responses to each message that are in line with its character. Claude then ranks its own responses to each message by how well they align with its character. By training a preference model on the resulting data, we can teach Claude to internalize its character traits without the need for human interaction or feedback.

There's still a lot of human intervention required, but significantly less than more labour-intensive patterns such as Reinforcement Learning from Human Feedback (RLHF):

Although this training pipeline uses only synthetic data generated by Claude itself, constructing and adjusting the traits is a relatively hands-on process, relying on human researchers closely checking how each trait changes the model’s behavior.

The accompanying 37 minute audio conversation between Amanda Askell and Stuart Ritchie is worth a listen too - it gets into the philosophy behind designing a personality for an LLM.

# 8th June 2024, 9:41 pm / anthropic, claude, generative-ai, ai, llms

Golden Gate Claude. This is absurdly fun and weird. Anthropic's recent LLM interpretability research gave them the ability to locate features within the opaque blob of their Sonnet model and boost the weight of those features during inference.

For a limited time only they're serving a "Golden Gate Claude" model which has the feature for the Golden Gate Bridge boosted. No matter what question you ask it the Golden Gate Bridge is likely to be involved in the answer in some way. Click the little bridge icon in the Claude UI to give it a go.

I asked for names for a pet pelican and the first one it offered was this:

Golden Gate - This iconic bridge name would be a fitting moniker for the pelican with its striking orange color and beautiful suspension cables.

And from a recipe for chocolate covered pretzels:

Gently wipe any fog away and pour the warm chocolate mixture over the bridge/brick combination. Allow to air dry, and the bridge will remain accessible for pedestrians to walk along it.

UPDATE: I think the experimental model is no longer available, approximately 24 hours after release. We'll miss you, Golden Gate Claude.

# 24th May 2024, 8:17 am / anthropic, claude, generative-ai, ai, llms, interpretability

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (via) Big advances in the field of LLM interpretability from Anthropic, who managed to extract millions of understandable features from their production Claude 3 Sonnet model (the mid-point between the inexpensive Haiku and the GPT-4-class Opus).

Some delightful snippets in here such as this one:

We also find a variety of features related to sycophancy, such as an empathy / “yeah, me too” feature 34M/19922975, a sycophantic praise feature 1M/847723, and a sarcastic praise feature 34M/19415708.

# 21st May 2024, 6:25 pm / anthropic, claude, generative-ai, ai, llms, interpretability

Introducing the Claude Team plan and iOS app. The iOS app seems nice, and provides free but heavily rate-limited access to Sonnet (the middle-sized Claude 3 model)—I ran two prompts just now and it told me I could have 3 more, resetting in five hours.

For $20/month you get access to Opus and 5x the capacity—which feels a little ungenerous to me.

The new $30/user/month team plan provides higher rate limits but is a minimum of five seats.

# 1st May 2024, 4:06 pm / anthropic, claude

timpaul/form-extractor-prototype (via) Tim Paul, Head of Interaction Design at the UK’s Government Digital Service, published this brilliant prototype built on top of Claude 3 Opus.

The video shows what it can do. Give it an image of a form and it will extract the form fields and use them to create a GDS-style multi-page interactive form, using their GOV.UK Forms design system and govuk-frontend npm package.

It works for both hand-drawn napkin illustrations and images of existing paper forms.

The bulk of the prompting logic is the schema definition in data/extract-form-questions.json

I’m always excited to see applications built on LLMs that go beyond the chatbot UI. This is a great example of exactly that.

# 22nd April 2024, 10:01 pm / forms, anthropic, claude, generative-ai, ai, llms

In mid-March, we added this line to our system prompt to prevent Claude from thinking it can open URLs:

"It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation."

Alex Albert (Anthropic)

# 18th April 2024, 12:22 am / ai, prompt-engineering, generative-ai, llms, anthropic, claude, alex-albert

[On complaints about Claude 3 reduction in quality since launch] The model is stored in a static file and loaded, continuously, across 10s of thousands of identical servers each of which serve each instance of the Claude model. The model file never changes and is immutable once loaded; every shard is loading the same model file running exactly the same software. We haven’t changed the temperature either. We don’t see anywhere where drift could happen. The files are exactly the same as at launch and loaded each time from a frozen pristine copy.

Jason D. Clinton, Anthropic

# 15th April 2024, 1:27 am / ai, generative-ai, llms, anthropic, claude

A solid pattern to build LLM Applications (feat. Claude) (via) Hrishi Olickel is one of my favourite prompt whisperers. In this YouTube video he walks through his process for building quick interactive applications with the assistance of Claude 3, spinning up an app that analyzes his meeting transcripts to extract participants and mentioned organisations, then presents a UI for exploring the results built with Next.js and shadcn/ui.

An interesting tip I got from this: use the weakest, not the strongest models to iterate on your prompts. If you figure out patterns that work well with Claude 3 Haiku they will have a significantly lower error rate with Sonnet or Opus. The speed of the weaker models also means you can iterate much faster, and worry less about the cost of your experiments.

# 9th April 2024, 6:39 pm / llms, ai, claude, generative-ai

Building files-to-prompt entirely using Claude 3 Opus

Visit Building files-to-prompt entirely using Claude 3 Opus

files-to-prompt is a new tool I built to help me pipe several files at once into prompts to LLMs such as Claude and GPT-4.

[... 3,235 words]

The lifecycle of a code AI completion (via) Philipp Spiess provides a deep dive into how Sourcegraph’s Cody code completion assistant works. Lots of fascinating details in here:

“One interesting learning was that if a user is willing to wait longer for a multi-line request, it usually is worth it to increase latency slightly in favor of quality. For our production setup this means we use a more complex language model for multi-line completions than we do for single-line completions.”

This article is from October 2023 and talks about Claude Instant. The code for Cody is open source so I checked to see if they have switched to Haiku yet and found a commit from March 25th that adds Haiku as an A/B test.

# 7th April 2024, 7:37 pm / anthropic, claude, generative-ai, ai, llms

The cost of AI reasoning over time (via) Karina Nguyen from Anthropic provides a fascinating visualization illustrating the cost of different levels of LLM over the past few years, plotting their cost-per-token against their scores on the MMLU benchmark.

Claude 3 Haiku currently occupies the lowest cost to score ratio, over on the lower right hand side of the chart.

# 4th April 2024, 12:51 pm / anthropic, claude, generative-ai, ai, llms

Wrap text at specified width. New Observable notebook. I built this with the help of Claude 3 Opus—it’s a text wrapping tool which lets you set the width and also lets you optionally add a four space indent.

The four space indent is handy for posting on forums such as Hacker News that treat a four space indent as a code block.

# 28th March 2024, 3:36 am / projects, observable, claude

“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time. I’m quoted in this piece by Benj Edwards for Ars Technica:

“For the first time, the best available models—Opus for advanced tasks, Haiku for cost and efficiency—are from a vendor that isn’t OpenAI. That’s reassuring—we all benefit from a diversity of top vendors in this space. But GPT-4 is over a year old at this point, and it took that year for anyone else to catch up.”

# 27th March 2024, 4:58 pm / anthropic, claude, generative-ai, openai, gpt4, ai, llms, benj-edwards

Semgrep: AutoFixes using LLMs (via) semgrep is a really neat tool for semantic grep against source code—you can give it a pattern like “log.$A(...)” to match all forms of log.warning(...) / log.error(...) etc.

Ilia Choly built semgrepx— xargs for semgrep—and here shows how it can be used along with my llm CLI tool to execute code replacements against matches by passing them through an LLM such as Claude 3 Opus.

# 26th March 2024, 12:51 am / llm, claude, generative-ai, ai, llms