Simon Willison's Weblog: anthropic

Anthropic: Message Batches (beta)

2024-10-08T18:18:57+00:00

Anthropic now have a batch mode, allowing you to send prompts to Claude in batches which will be processed within 24 hours (though probably much faster than that) and come at a 50% price discount.

This matches the batch models offered by OpenAI and by Google Gemini, both of which also provide a 50% discount.

Update 15th October 2024: Alex Albert confirms that Anthropic batching and prompt caching can be combined:

Don't know if folks have realized yet that you can get close to a 95% discount on Claude 3.5 Sonnet tokens when you combine prompt caching with the new Batches API

Via @alexalbert__

Tags: gemini, anthropic, claude, generative-ai, openai, ai, llms, alex-albert

Gemini 1.5 Flash-8B is now production ready

2024-10-03T20:16:36+00:00

Gemini 1.5 Flash-8B is now production ready

Gemini 1.5 Flash-8B is "a smaller and faster variant of 1.5 Flash" - and is now released to production, at half the price of the 1.5 Flash model.

It's really, really cheap:

$0.0375 per 1 million input tokens on prompts <128K
$0.15 per 1 million output tokens on prompts <128K
$0.01 per 1 million input tokens on cached prompts <128K

Prices are doubled for prompts longer than 128K.

I believe images are still charged at a flat rate of 258 tokens, which I think means a single non-cached image with Flash should cost 0.00097 cents - a number so tiny I'm doubting if I got the calculation right.

OpenAI's cheapest model remains GPT-4o mini, at $0.15/1M input - though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (or by half if you use batches, though those can’t be combined with OpenAI prompt caching. Gemini also offer half-off for batched requests).

Anthropic's cheapest model is still Claude 3 Haiku at $0.25/M, though that drops to $0.03/M for cached tokens (if you configure them correctly).

I've released llm-gemini 0.2 with support for the new model:

llm install -U llm-gemini
llm keys set gemini
# Paste API key here
llm -m gemini-1.5-flash-8b-latest "say hi"

Via @OfficialLoganK

Tags: vision-llms, gemini, anthropic, openai, ai, llms, google, generative-ai, llm

Markdown and Math Live Renderer

2024-09-21T04:56:30+00:00

Markdown and Math Live Renderer

Another of my tiny Claude-assisted JavaScript tools. This one lets you enter Markdown with embedded mathematical expressions (like $ax^2 + bx + c = 0$ ) and live renders those on the page, with an HTML version using MathML that you can export through copy and paste.

Here's the Claude transcript. I started by asking:

Are there any client side JavaScript markdown libraries that can also handle inline math and render it?

Claude gave me several options including the combination of Marked and KaTeX, so I followed up by asking:

Build an artifact that demonstrates Marked plus KaTeX - it should include a text area I can enter markdown in (repopulated with a good example) and live update the rendered version below. No react.

Which gave me this artifact, instantly demonstrating that what I wanted to do was possible.

I iterated on it a tiny bit to get to the final version, mainly to add that HTML export and a Copy button. The final source code is here.

Tags: claude-3-5-sonnet, anthropic, claude, markdown, mathml, ai, llms, ai-assisted-programming, tools, generative-ai, claude-artifacts

Notes on using LLMs for code

2024-09-20T03:10:57+00:00

I was recently the guest on TWIML - the This Week in Machine Learning & AI podcast. Our episode is titled Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison, and the focus of the conversation was the ways in which I use LLM tools in my day-to-day work as a software developer and product engineer.

Here's the YouTube video version of the episode:

I ran the transcript through MacWhisper and extracted some edited highligts below.

Two different modes of LLM use

At 19:53:

There are two different modes that I use LLMs for with programming.

The first is exploratory mode, which is mainly quick prototyping - sometimes in programming languages I don't even know.

I love asking these things to give me options. I will often start a prompting session by saying, "I want to draw a visualization of an audio wave. What are my options for this?"

And have it just spit out five different things. Then I'll say "Do me a quick prototype of option three that illustrates how that would work."

The other side is when I'm writing production code, code that I intend to ship, then it's much more like I'm treating it basically as an intern who's faster at typing than I am.

That's when I'll say things like, "Write me a function that takes this and this and returns exactly that."

I'll often iterate on these a lot. I'll say, "I don't like the variable names you used there. Change those." Or "Refactor that to remove the duplication."

I call it my weird intern, because it really does feel like you've got this intern who is screamingly fast, and they've read all of the documentation for everything, and they're massively overconfident, and they make mistakes and they don't realize them.

But crucially, they never get tired, and they never get upset. So you can basically just keep on pushing them and say, "No, do it again. Do it differently. Change that. Change that."

At three in the morning, I can be like, "Hey, write me 100 lines of code that does X, Y, and Z," and it'll do it. It won't complain about it.

It's weird having this small army of super talented interns that never complain about anything, but that's kind of how this stuff ends up working.

Here are all of my other notes about AI-assisted programming.

Prototyping

At 25:22:

My entire career has always been about prototyping.

Django itself, the web framework, we built that in a local newspaper so that we could ship features that supported news stories faster. How can we make it so we can turn around a production-grade web application in a few days?

Ever since then, I've always been interested in finding new technologies that let me build things quicker, and my development process has always been to start with a prototype.

You have an idea, you build a prototype that illustrates the idea, you can then have a better conversation about it. If you go to a meeting with five people, and you've got a working prototype, the conversation will be so much more informed than if you go in with an idea and a whiteboard sketch.

I've always been a prototyper, but I feel like the speed at which I can prototype things in the past 12 months has gone up by an order of magnitude.

I was already a very productive prototype producer. Now, I can tap a thing into my phone, and 30 seconds later, I've got a user interface in Claude Artifacts that illustrates the idea that I'm trying to explore.

Honestly, if I didn't use these models for anything else, if I just used them for prototyping, they would still have an enormous impact on the work that I do.

Here are examples of prototypes I've built using Claude Artifacts. A lot of them end up in my tools collection.

The full conversation covers a bunch of other topics. I ran the transcript through Claude, told it "Give me a bullet point list of the most interesting topics covered in this transcript" and then deleted the ones that I didn't think were particularly interesting - here's what was left:

Using AI-powered voice interfaces like ChatGPT's Voice Mode to code while walking a dog
Leveraging AI tools like Claude and ChatGPT for rapid prototyping and development
Using AI to analyze and extract data from images, including complex documents like campaign finance reports
The challenges of using AI for tasks that may trigger safety filters, particularly for journalism
The evolution of local AI models like Llama and their improving capabilities
The potential of AI for data extraction from complex sources like scanned tables in PDFs
Strategies for staying up-to-date with rapidly evolving AI technologies
The development of vision-language models and their applications
The balance between hosted AI services and running models locally
The importance of examples in prompting for better AI performance

Tags: podcasts, ai, openai, generative-ai, chatgpt, llms, ai-assisted-programming, anthropic, claude, claude-artifacts

Introducing Contextual Retrieval

2024-09-20T01:34:21+00:00

Introducing Contextual Retrieval

Here's an interesting new embedding/RAG technique, described by Anthropic but it should work for any embedding model against any other LLM.

One of the big challenges in implementing semantic search against vector embeddings - often used as part of a RAG system - is creating "chunks" of documents that are most likely to semantically match queries from users.

Anthropic provide this solid example where semantic chunks might let you down:

Imagine you had a collection of financial information (say, U.S. SEC filings) embedded in your knowledge base, and you received the following question: "What was the revenue growth for ACME Corp in Q2 2023?"

A relevant chunk might contain the text: "The company's revenue grew by 3% over the previous quarter." However, this chunk on its own doesn't specify which company it's referring to or the relevant time period, making it difficult to retrieve the right information or use the information effectively.

Their proposed solution is to take each chunk at indexing time and expand it using an LLM - so the above sentence would become this instead:

This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."

This chunk was created by Claude 3 Haiku (their least expensive model) using the following prompt template:

<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.

Here's the really clever bit: running the above prompt for every chunk in a document could get really expensive thanks to the inclusion of the entire document in each prompt. Claude added context caching last month, which allows you to pay around 1/10th of the cost for tokens cached up to your specified beakpoint.

By Anthropic's calculations:

Assuming 800 token chunks, 8k token documents, 50 token context instructions, and 100 tokens of context per chunk, the one-time cost to generate contextualized chunks is $1.02 per million document tokens.

Anthropic provide a detailed notebook demonstrating an implementation of this pattern. Their eventual solution combines cosine similarity and BM25 indexing, uses embeddings from Voyage AI and adds a reranking step powered by Cohere.

The notebook also includes an evaluation set using JSONL - here's that evaluation data in Datasette Lite.

Via Alex Albert

Tags: anthropic, claude, generative-ai, ai, embeddings, llms, search, rag, prompt-engineering

files-to-prompt 0.3

2024-09-09T05:57:35+00:00

files-to-prompt 0.3

New version of my files-to-prompt CLI tool for turning a bunch of files into a prompt suitable for piping to an LLM, described here previously.

It now has a -c/--cxml flag for outputting the files in Claude XML-ish notation (XML-ish because it's not actually valid XML) using the format Anthropic describe as recommended for long context:

files-to-prompt llm-*/README.md --cxml | llm -m claude-3.5-sonnet \
  --system 'return an HTML page about these plugins with usage examples' \
  > /tmp/fancy.html

Here's what that gave me.

The format itself looks something like this:

<documents>
<document index="1">
<source>llm-anyscale-endpoints/README.md</source>
<document_content>
# llm-anyscale-endpoints
...
</document_content>
</document>
</documents>

Tags: anthropic, claude, tools, projects, generative-ai, ai, llms, prompt-engineering

json-flatten, now with format documentation

2024-09-07T05:43:01+00:00

json-flatten, now with format documentation

json-flatten is a fun little Python library I put together a few years ago for converting JSON data into a flat key-value format, suitable for inclusion in an HTML form or query string. It lets you take a structure like this one:

{"foo": {"bar": [1, True, None]}

And convert it into key-value pairs like this:

foo.bar.[0]$int=1
foo.bar.[1]$bool=True
foo.bar.[2]$none=None

The flatten(dictionary) function function converts to that format, and unflatten(dictionary) converts back again.

I was considering the library for a project today and realized that the 0.3 README was a little thin - it showed how to use the library but didn't provide full details of the format it used.

On a hunch, I decided to see if files-to-prompt plus LLM plus Claude 3.5 Sonnet could write that documentation for me. I ran this command:

files-to-prompt *.py | llm -m claude-3.5-sonnet --system 'write detailed documentation in markdown describing the format used to represent JSON and nested JSON as key/value pairs, include a table as well'

That *.py picked up both json_flatten.py and test_json_flatten.py - I figured the test file had enough examples in that it should act as a good source of information for the documentation.

This worked really well! You can see the first draft it produced here.

It included before and after examples in the documentation. I didn't fully trust these to be accurate, so I gave it this follow-up prompt:

llm -c "Rewrite that document to use the Python cog library to generate the examples"

I'm a big fan of Cog for maintaining examples in READMEs that are generated by code. Cog has been around for a couple of decades now so it was a safe bet that Claude would know about it.

This almost worked - it produced valid Cog syntax like the following:

[[[cog
example = {
"fruits": ["apple", "banana", "cherry"]
}

cog.out("```json\n")
cog.out(str(example))
cog.out("\n```\n")
cog.out("Flattened:\n```\n")
for key, value in flatten(example).items():
    cog.out(f"{key}: {value}\n")
cog.out("```\n")
]]]
[[[end]]]

But that wasn't entirely right, because it forgot to include the Markdown comments that would hide the Cog syntax, which should have looked like this:

<!-- [[[cog -->
...
<!-- ]]] -->
...
<!-- [[[end]]] -->

I could have prompted it to correct itself, but at this point I decided to take over and edit the rest of the documentation by hand.

The end result was documentation that I'm really happy with, and that I probably wouldn't have bothered to write if Claude hadn't got me started.

Tags: claude-3-5-sonnet, llm, anthropic, claude, ai, llms, ai-assisted-programming, generative-ai, projects, json

Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes

2024-09-06T02:28:38+00:00

I've been having a bunch of fun taking advantage of CORS-enabled LLM APIs to build client-side JavaScript applications that access LLMs directly. I also span up a new Datasette plugin for advanced permission management.

LLMs from client-side JavaScript

Anthropic recently added CORS support to their Claude APIs. It's a little hard to use - you have to add anthropic-dangerous-direct-browser-access: true to your request headers to enable it - but once you know the trick you can start building web applications that talk to Anthropic's LLMs directly, without any additional server-side code.

I later found out that both OpenAI and Google Gemini have this capability too, without needing the special header.

The problem with this approach is security: it's very important not to embed an API key attached to your billing account in client-side HTML and JavaScript for anyone to see!

For my purposes though that doesn't matter. I've been building tools which prompt() a user for their own API key (sadly restricting their usage to the tiny portion of people who both understand API keys and have created API accounts with one of the big providers) - then I stash that key in localStorage and start using it to make requests.

My simonw/tools repository is home to a growing collection of pure HTML+JavaScript tools, hosted at tools.simonwillison.net using GitHub Pages. I love not having to even think about hosting server-side code for these tools.

I've published three tools there that talk to LLMs directly so far:

haiku is a fun demo that requests access to the user's camera and then writes a Haiku about what it sees. It uses Anthropic's Claude 3 Haiku model for this - the whole project is one terrible pun. Haiku source code here.
gemini-bbox uses the Gemini 1.5 Pro (or Flash) API to prompt those models to return bounding boxes for objects in an image, then renders those bounding boxes. Gemini Pro is the only of the vision LLMs that I've tried that has reliable support for bounding boxes. I wrote about this in Building a tool showing how Gemini Pro can return bounding boxes for objects in images.
Gemini Chat App is a more traditional LLM chat interface that again talks to Gemini models (including the new super-speedy gemini-1.5-flash-8b-exp-0827). I built this partly to try out those new models and partly to experiment with implementing a streaming chat interface agaist the Gemini API directly in a browser. I wrote more about how that works in this post.

Here's that Gemini Bounding Box visualization tool:

All three of these tools made heavy use of AI-assisted development: Claude 3.5 Sonnet wrote almost every line of the last two, and the Haiku one was put together a few months ago using Claude 3 Opus.

My personal style of HTML and JavaScript apps turns out to be highly compatible with LLMs: I like using vanilla HTML and JavaScript and keeping everything in the same file, which makes it easy to paste the entire thing into the model and ask it to make some changes for me. This approach also works really well with Claude Artifacts, though I have to tell it "no React" to make sure I get an artifact I can hack on without needing to configure a React build step.

Converting PDFs to HTML and Markdown

I have a long standing vendetta against PDFs for sharing information. They're painful to read on a mobile phone, they have poor accessibility, and even things like copying and pasting text from them can be a pain.

Complaining without doing something about it isn't really my style. Twice in the past few weeks I've taken matters into my own hands:

Google Research released a PDF paper describing their new pipe syntax for SQL. I ran it through Gemini 1.5 Pro to convert it to HTML (prompts here) and got this - a pretty great initial result for the first prompt I tried!
Nous Research released a preliminary report PDF about their DisTro technology for distributed training of LLMs over low-bandwidth connections. I ran a prompt to use Gemini 1.5 Pro to convert that to this Markdown version, which even handled tables.

Within six hours of posting it my Pipe Syntax in SQL conversion was ranked third on Google for the title of the paper, at which point I set it to <meta name="robots" content="noindex> to try and keep the unverified clone out of search. Yet more evidence that HTML is better than PDF!

I've spent less than a total of ten minutes on using Gemini to convert PDFs in this way and the results have been very impressive. If I were to spend more time on this I'd target figures: I have a hunch that getting Gemini to return bounding boxes for figures on the PDF pages could be the key here, since then each figure could be automatically extracted as an image.

I bet you could build that whole thing as a client-side app against the Gemini Pro API, too...

Adding some class to Datasette forms

I've been working on a new Datasette plugin for permissions management, datasette-acl, which I'll write about separately soon.

I wanted to integrate Choices.js with it, to provide a nicer interface for adding permissions to a user or group.

My first attempt at integrating Choices ended up looking like this:

The weird visual glitches are caused by Datasette's core CSS, which included the following rule:

form input[type=submit], form button[type=button] {
    font-weight: 400;
    cursor: pointer;
    text-align: center;
    vertical-align: middle;
    border-width: 1px;
    border-style: solid;
    padding: .5em 0.8em;
    font-size: 0.9rem;
    line-height: 1;
    border-radius: .25rem;
}

These style rules apply to any submit button or button-button that occurs inside a form!

I'm glad I caught this before Datasette 1.0. I've now started the process of fixing that, by ensuring these rules only apply to elements with class="core" (or that class on a wrapping element). This ensures plugins can style these elements without being caught out by Datasette's defaults.

The problem is... there are a whole bunch of existing plugins that currently rely on that behaviour. I have a tricking issue about that, which identified 28 plugins that need updating. I've worked my way through 8 of those so far, hence the flurry of releases listed at the bottom of this post.

This is also an excuse to revisit a bunch of older plugins, some of which had partially complete features that I've been finishing up.

datasette-write for example now has a neat row action menu item for updating a selected row using a pre-canned UPDATE query. Here's an animated demo of my first prototype of that feature:

On the blog

anthropic

Claude's API now supports CORS requests, enabling client-side applications - 2024-08-23
Explain ACLs by showing me a SQLite table schema for implementing them - 2024-08-23
Musing about OAuth and LLMs on Mastodon - 2024-08-24
Building a tool showing how Gemini Pro can return bounding boxes for objects in images - 2024-08-26
Long context prompting tips - 2024-08-26
Anthropic Release Notes: System Prompts - 2024-08-26
Alex Albert: We've read and heard that you'd appreciate more t... - 2024-08-26
Gemini Chat App - 2024-08-27
System prompt for val.town/townie - 2024-08-28
How Anthropic built Artifacts - 2024-08-28
Anthropic's Prompt Engineering Interactive Tutorial - 2024-08-30
llm-claude-3 0.4.1 - 2024-08-30

ai-assisted-programming

Andy Jassy, Amazon CEO: [...] here’s what we found when we integrated [Am... - 2024-08-24
AI-powered Git Commit Function - 2024-08-26
OpenAI: Improve file search result relevance with chunk ranking - 2024-08-30
Forrest Brazeal: I think that AI has killed, or is about to kill, ... - 2024-08-31

gemini

SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL - 2024-08-24
NousResearch/DisTrO - 2024-08-27

python

uvtrick - 2024-09-01
Anatomy of a Textual User Interface - 2024-09-02
Why I Still Use Python Virtual Environments in Docker - 2024-09-02
Python Developers Survey 2023 Results - 2024-09-03

security

Top companies ground Microsoft Copilot over data governance concerns - 2024-08-23
Frederik Braun: In 2021 we [the Mozilla engineering team] found “... - 2024-08-26
OAuth from First Principles - 2024-09-05

projects

My @covidsewage bot now includes useful alt text - 2024-08-25

armin-ronacher

MiniJinja: Learnings from Building a Template Engine in Rust - 2024-08-27

ethics

John Gruber: Everyone alive today has grown up in a world wher... - 2024-08-27

open-source

Debate over “open source AI” term brings new push to formalize definition - 2024-08-27
Elasticsearch is open source, again - 2024-08-29

performance

Cerebras Inference: AI at Instant Speed - 2024-08-28

sqlite

D. Richard Hipp: My goal is to keep SQLite relevant and viable thr... - 2024-08-28

aws

Leader Election With S3 Conditional Writes - 2024-08-30

javascript

Andreas Giammarchi: whenever you do this: `el.innerHTML += HTML` ... - 2024-08-31

openai

OpenAI says ChatGPT usage has doubled since last year - 2024-08-31

art

Ted Chiang: Art is notoriously hard to define, and so are the... - 2024-08-31

llm

anjor: `history | tail -n 2000 | llm -s "Write aliases f... - 2024-09-03

vision-llms

Qwen2-VL: To See the World More Clearly - 2024-09-04

Releases

datasette-import 0.1a5 - 2024-09-04
Tools for importing data into Datasette
datasette-search-all 1.1.3 - 2024-09-04
Datasette plugin for searching all searchable tables at once
datasette-write 0.4 - 2024-09-04
Datasette plugin providing a UI for executing SQL writes against the database
datasette-debug-events 0.1a0 - 2024-09-03
Print Datasette events to standard error
datasette-auth-passwords 1.1.1 - 2024-09-03
Datasette plugin for authentication using passwords
datasette-enrichments 0.4.3 - 2024-09-03
Tools for running enrichments against data stored in Datasette
datasette-configure-fts 1.1.4 - 2024-09-03
Datasette plugin for enabling full-text search against selected table columns
datasette-auth-tokens 0.4a10 - 2024-09-03
Datasette plugin for authenticating access using API tokens
datasette-edit-schema 0.8a3 - 2024-09-03
Datasette plugin for modifying table schemas
datasette-pins 0.1a4 - 2024-09-01
Pin databases, tables, and other items to the Datasette homepage
datasette-acl 0.4a2 - 2024-09-01
Advanced permission management for Datasette
llm-claude-3 0.4.1 - 2024-08-30
LLM plugin for interacting with the Claude 3 family of models

TILs

Testing HTML tables with Playwright Python - 2024-09-04
Using namedtuple for pytest parameterized tests - 2024-08-31

Tags: css, javascript, pdf, projects, ai, datasette, weeknotes, generative-ai, llms, anthropic, claude, gemini, claude-3-5-sonnet, cors

llm-claude-3 0.4.1

2024-08-30T23:28:54+00:00

llm-claude-3 0.4.1

New minor release of my LLM plugin that provides access to the Claude 3 family of models. Claude 3.5 Sonnet recently upgraded to a 8,192 output limit recently (up from 4,096 for the Claude 3 family of models). LLM can now respect that.

The hardest part of building this was convincing Claude to return a long enough response to prove that it worked. At one point I got into an argument with it, which resulted in this fascinating hallucination:

I eventually got a 6,162 token output using:

cat long.txt | llm -m claude-3.5-sonnet-long --system 'translate this document into french, then translate the french version into spanish, then translate the spanish version back to english. actually output the translations one by one, and be sure to do the FULL document, every paragraph should be translated correctly. Seriously, do the full translations - absolutely no summaries!'

Tags: llm, anthropic, claude, generative-ai, projects, ai, llms, prompt-engineering, claude-3-5-sonnet

Anthropic's Prompt Engineering Interactive Tutorial

2024-08-30T02:52:04+00:00

Anthropic's Prompt Engineering Interactive Tutorial

Anthropic continue their trend of offering the best documentation of any of the leading LLM vendors. This tutorial is delivered as a set of Jupyter notebooks - I used it as an excuse to try uvx like this:

git clone https://github.com/anthropics/courses
uvx --from jupyter-core jupyter notebook courses

This installed a working Jupyter system, started the server and launched my browser within a few seconds.

The first few chapters are pretty basic, demonstrating simple prompts run through the Anthropic API. I used %pip install anthropic instead of !pip install anthropic to make sure the package was installed in the correct virtual environment, then filed an issue and a PR.

One new-to-me trick: in the first chapter the tutorial suggests running this:

API_KEY = "your_api_key_here"
%store API_KEY

This stashes your Anthropic API key in the IPython store. In subsequent notebooks you can restore the API_KEY variable like this:

%store -r API_KEY

I poked around and on macOS those variables are stored in files of the same name in ~/.ipython/profile_default/db/autorestore.

Chapter 4: Separating Data and Instructions included some interesting notes on Claude's support for content wrapped in XML-tag-style delimiters:

Note: While Claude can recognize and work with a wide range of separators and delimeters, we recommend that you use specifically XML tags as separators for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance. We have purposefully made Claude very malleable and customizable this way.

Plus this note on the importance of avoiding typos, with a nod back to the problem of sandbagging where models match their intelligence and tone to that of their prompts:

This is an important lesson about prompting: small details matter! It's always worth it to scrub your prompts for typos and grammatical errors. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it's more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.

Chapter 5: Formatting Output and Speaking for Claude includes notes on one of Claude's most interesting features: prefill, where you can tell it how to start its response:

client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=100,
    messages=[
        {"role": "user", "content": "JSON facts about cats"},
        {"role": "assistant", "content": "{"}
    ]
)

Things start to get really interesting in Chapter 6: Precognition (Thinking Step by Step), which suggests using XML tags to help the model consider different arguments prior to generating a final answer:

Is this review sentiment positive or negative? First, write the best arguments for each side in <positive-argument> and <negative-argument> XML tags, then answer.

The tags make it easy to strip out the "thinking out loud" portions of the response.

It also warns about Claude's sensitivity to ordering. If you give Claude two options (e.g. for sentiment analysis):

In most situations (but not all, confusingly enough), Claude is more likely to choose the second of two options, possibly because in its training data from the web, second options were more likely to be correct.

This effect can be reduced using the thinking out loud / brainstorming prompting techniques.

A related tip is proposed in Chapter 8: Avoiding Hallucinations:

How do we fix this? Well, a great way to reduce hallucinations on long documents is to make Claude gather evidence first.

In this case, we tell Claude to first extract relevant quotes, then base its answer on those quotes. Telling Claude to do so here makes it correctly notice that the quote does not answer the question.

I really like the example prompt they provide here, for answering complex questions against a long document:

<question>What was Matterport's subscriber base on the precise date of May 31, 2020?</question>

Please read the below document. Then, in <scratchpad> tags, pull the most relevant quote from the document and consider whether it answers the user's question or whether it lacks sufficient detail. Then write a brief numerical answer in <answer> tags.

Via Hacker News

Tags: anthropic, claude, uv, ai, llms, prompt-engineering, python, generative-ai, jupyter

How Anthropic built Artifacts

2024-08-28T23:28:10+00:00

How Anthropic built Artifacts

Gergely Orosz interviews five members of Anthropic about how they built Artifacts on top of Claude with a small team in just three months.

The initial prototype used Streamlit, and the biggest challenge was building a robust sandbox to run the LLM-generated code in:

We use iFrame sandboxes with full-site process isolation. This approach has gotten robust over the years. This protects users' main Claude.ai browsing session from malicious artifacts. We also use strict Content Security Policies (CSPs) to enforce limited and controlled network access.

Artifacts were launched in general availability yesterday - previously you had to turn them on as a preview feature. Alex Albert has a 14 minute demo video up on Twitter showing the different forms of content they can create, including interactive HTML apps, Markdown, HTML, SVG, Mermaid diagrams and React Components.

Tags: claude-artifacts, anthropic, claude, gergely-orosz, ai, llms, alex-albert, sandboxing, iframes, security, ai-assisted-programming

System prompt for val.town/townie

2024-08-28T03:33:11+00:00

System prompt for val.town/townie

Val Town (previously) provides hosting and a web-based coding environment for Vals - snippets of JavaScript/TypeScript that can run server-side as scripts, on a schedule or hosting a web service.

Townie is Val's new AI bot, providing a conversational chat interface for creating fullstack web apps (with blob or SQLite persistence) as Vals.

In the most recent release of Townie Val added the ability to inspect and edit its system prompt!

I've archived a copy in this Gist, as a snapshot of how Townie works today. It's surprisingly short, relying heavily on the model's existing knowledge of Deno and TypeScript.

I enjoyed the use of "tastefully" in this bit:

Tastefully add a view source link back to the user's val if there's a natural spot for it and it fits in the context of what they're building. You can generate the val source url via import.meta.url.replace("esm.town", "val.town").

The prompt includes a few code samples, like this one demonstrating how to use Val's SQLite package:

import { sqlite } from "https://esm.town/v/stevekrouse/sqlite";
let KEY = new URL(import.meta.url).pathname.split("/").at(-1);
(await sqlite.execute(`select * from ${KEY}_users where id = ?`, [1])).rows[0].id

It also reveals the existence of Val's very own delightfully simple image generation endpoint Val, currently powered by Stable Diffusion XL Lightning on fal.ai.

If you want an AI generated image, use https://maxm-imggenurl.web.val.run/the-description-of-your-image to dynamically generate one.

Here's a fun colorful raccoon with a wildly inappropriate hat.

Val are also running their own gpt-4o-mini proxy, free to users of their platform:

import { OpenAI } from "https://esm.town/v/std/openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  messages: [
    { role: "user", content: "Say hello in a creative way" },
  ],
  model: "gpt-4o-mini",
  max_tokens: 30,
});

Val developer JP Posma wrote a lot more about Townie in How we built Townie – an app that generates fullstack apps, describing their prototyping process and revealing that the current model it's using is Claude 3.5 Sonnet.

Their current system prompt was refined over many different versions - initially they were including 50 example Vals at quite a high token cost, but they were able to reduce that down to the linked system prompt which includes condensed documentation and just one templated example.

Via @stevekrouse

Tags: claude-3-5-sonnet, deno, sqlite, anthropic, claude, typescript, ai, llms, prompt-engineering, ai-assisted-programming, javascript, generative-ai, val-town

Gemini Chat App

2024-08-27T22:48:56+00:00

Gemini Chat App

Google released three new Gemini models today: improved versions of Gemini 1.5 Pro and Gemini 1.5 Flash plus a new model, Gemini 1.5 Flash-8B, which is significantly faster (and will presumably be cheaper) than the regular Flash model.

The Flash-8B model is described in the Gemini 1.5 family of models paper in section 8:

By inheriting the same core architecture, optimizations, and data mixture refinements as its larger counterpart, Flash-8B demonstrates multimodal capabilities with support for context window exceeding 1 million tokens. This unique combination of speed, quality, and capabilities represents a step function leap in the domain of single-digit billion parameter models.

While Flash-8B’s smaller form factor necessarily leads to a reduction in quality compared to Flash and 1.5 Pro, it unlocks substantial benefits, particularly in terms of high throughput and extremely low latency. This translates to affordable and timely large-scale multimodal deployments, facilitating novel use cases previously deemed infeasible due to resource constraints.

The new models are available in AI Studio, but since I built my own custom prompting tool against the Gemini CORS-enabled API the other day I figured I'd build a quick UI for these new models as well.

Building this with Claude 3.5 Sonnet took literally ten minutes from start to finish - you can see that from the timestamps in the conversation. Here's the deployed app and the finished code.

The feature I really wanted to build was streaming support. I started with this example code showing how to run streaming prompts in a Node.js application, then told Claude to figure out what the client-side code for that should look like based on a snippet from my bounding box interface hack. My starting prompt:

Build me a JavaScript app (no react) that I can use to chat with the Gemini model, using the above strategy for API key usage

I still keep hearing from people who are skeptical that AI-assisted programming like this has any value. It's honestly getting a little frustrating at this point - the gains for things like rapid prototyping are so self-evident now.

Tags: claude-3-5-sonnet, gemini, ai-assisted-programming, javascript, generative-ai, ai, llms, projects, anthropic, claude

Quoting Alex Albert

2024-08-26T20:44:35+00:00

We've read and heard that you'd appreciate more transparency as to when changes, if any, are made. We've also heard feedback that some users are finding Claude's responses are less helpful than usual. Our initial investigation does not show any widespread issues. We'd also like to confirm that we've made no changes to the 3.5 Sonnet model or inference pipeline.

— Alex Albert

Tags: claude-3-5-sonnet, alex-albert, anthropic, claude, generative-ai, ai, llms

Anthropic Release Notes: System Prompts

2024-08-26T20:05:42+00:00

Anthropic Release Notes: System Prompts

Anthropic now publish the system prompts for their user-facing chat-based LLM systems - Claude 3 Haiku, Claude 3 Opus and Claude 3.5 Sonnet - as part of their documentation, with a promise to update this to reflect future changes.

Currently covers just the initial release of the prompts, each of which is dated July 12th 2024.

Anthropic researcher Amanda Askell broke down their system prompt in detail back in March 2024. These new releases are a much appreciated extension of that transparency.

These prompts are always fascinating to read, because they can act a little bit like documentation that the providers never thought to publish elsewhere.

There are lots of interesting details in the Claude 3.5 Sonnet system prompt. Here's how they handle controversial topics:

If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task regardless of its own views. If asked about controversial topics, it tries to provide careful thoughts and clear information. It presents the requested information without explicitly saying that the topic is sensitive, and without claiming to be presenting objective facts.

Here's chain of thought "think step by step" processing baked into the system prompt itself:

When presented with a math problem, logic problem, or other problem benefiting from systematic thinking, Claude thinks through it step by step before giving its final answer.

Claude's face blindness is also part of the prompt, which makes me wonder if the API-accessed models might more capable of working with faces than I had previously thought:

Claude always responds as if it is completely face blind. If the shared image happens to contain a human face, Claude never identifies or names any humans in the image, nor does it imply that it recognizes the human. [...] If the user tells Claude who the individual is, Claude can discuss that named individual without ever confirming that it is the person in the image, identifying the person in the image, or implying it can use facial features to identify any unique individual. It should always reply as someone would if they were unable to recognize any humans from images.

It's always fun to see parts of these prompts that clearly hint at annoying behavior in the base model that they've tried to correct!

Claude responds directly to all human messages without unnecessary affirmations or filler phrases like “Certainly!”, “Of course!”, “Absolutely!”, “Great!”, “Sure!”, etc. Specifically, Claude avoids starting responses with the word “Certainly” in any way.

Anthropic note that these prompts are for their user-facing products only - they aren't used by the Claude models when accessed via their API.

Via @alexalbert__

Tags: prompt-engineering, anthropic, claude, generative-ai, ai, llms

Long context prompting tips

2024-08-26T18:39:27+00:00

Long context prompting tips

Interesting tips here from Anthropic's documentation about how to best prompt Claude to work with longer documents.

Put longform data at the top: Place your long documents and inputs (~20K+ tokens) near the top of your prompt, above your query, instructions, and examples. This can significantly improve Claude’s performance across all models. Queries at the end can improve response quality by up to 30% in tests, especially with complex, multi-document inputs.

It recommends using not-quite-valid-XML to add those documents to those prompts, and using a prompt that asks Claude to extract direct quotes before replying to help it focus its attention on the most relevant information:

Find quotes from the patient records and appointment history that are relevant to diagnosing the patient's reported symptoms. Place these in <quotes> tags. Then, based on these quotes, list all information that would help the doctor diagnose the patient's symptoms. Place your diagnostic information in <info> tags.

Via Datasette Discord

Tags: prompt-engineering, anthropic, claude, generative-ai, ai, llms

Building a tool showing how Gemini Pro can return bounding boxes for objects in images

2024-08-26T04:55:28+00:00

I was browsing through Google's Gemini documentation while researching how different multi-model LLM APIs work when I stumbled across this note in the vision documentation:

You can ask the model for the coordinates of bounding boxes for objects in images. For object detection, the Gemini model has been trained to provide these coordinates as relative widths or heights in range [0,1], scaled by 1000 and converted to an integer. Effectively, the coordinates given are for a 1000x1000 version of the original image, and need to be converted back to the dimensions of the original image.

This is a pretty neat capability! OpenAI's GPT-4o and Anthropic's Claude 3 and Claude 3.5 models can't do this (yet).

I tried a few prompts using Google's Python library and got back what looked like bounding boxes!

>>> import google.generativeai as genai
>>> genai.configure(api_key="...")
>>> model = genai.GenerativeModel(model_name="gemini-1.5-pro-latest")
>>> import PIL.Image
>>> goats = PIL.Image.open("/tmp/goats.jpeg")
>>> prompt = 'Return bounding boxes around every goat, for each one return [ymin, xmin, ymax, xmax]'
>>> response = model.generate_content([goats, prompt])
print(response.text)
>>> print(response.text)
- [200, 90, 745, 527]
- [300, 610, 904, 937]

But how to verify that these were useful co-ordinates? I fired up Claude 3.5 Sonnet and started iterating on Artifacts there to try and visualize those co-ordinates against the original image.

After some fiddling around, I built an initial debug tool that I could paste co-ordinates into and select an image and see that image rendered.

A tool for prompting with an image and rendering the bounding boxes

I wrote the other day about Anthropic's new support for CORS headers, enabling direct browser access to their APIs.

Google Gemini supports CORS as well! So do OpenAI, which means that all three of the largest LLM providers can now be accessed directly from the browser.

I decided to build a combined tool that could prompt Gemini 1.5 Pro with an image directly from the browser, then render the returned bounding boxes on that image.

The new tool lives here: https://tools.simonwillison.net/gemini-bbox

The first time you run a prompt it will ask you for a Gemini API key, which it stores in your browser's localStorage. I promise not to add code that steals your keys in the future, but if you don't want to trust that you can grab a copy of the code, verify it and then run it yourself.

Building this tool with Claude 3.5 Sonnet

This is yet another example of a tool that I mostly built by prompting Claude 3.5 Sonnet. Here are some more.

I started out with this lengthy conversation (transcript exported with this tool) to help build the original tool for opening an image and pasting in those bounding box coordinates. That sequence started like this:

Build an artifact where I can open an image from my browser and paste the following style of text into a textarea:
- [488, 945, 519, 999]
- [460, 259, 487, 307]
- [472, 574, 498, 612]
(The hyphens may not be there, so scan with a regex for [ num, num, num, num ])

Each of those represent [ymin, xmin, ymax, xmax] coordinates on the image - but they are numbers between 0 and 1000 so they correspond to the image is if it had been resized to 1000x1000

As soon as the coords are pasted the corresponding boxes should be drawn on the images, corrected for its actual dimensions

The image should be show with a width of 80% of the page

The boxes should be in different colours, and hovering over each box should show the original bounding box coordinates below the image

Once that tool appeared to be doing the right thing (I had to muck around with how the coordinates were processed a bunch) I used my favourite prompting trick to build the combined tool that called the Gemini API. I found this example that calls the @google/generative-ai API from a browser, pasted the full example into Claude along with my previous bounding box visualization tool and had it combine them to achieve the desired result:

Based on that example text, build me an HTML page with Vanilla JS that loads the Gemini API from esm.run - it should have a file input and a textarea and a submit button - you attach an image, enter a prompt and then click the button and it does a Gemini prompt with that image and prompt and injects the returned result into a div on the page

Then this follow-up prompt:

now incorporate the logic from this tool (I pasted in that HTML too), such that when the response is returned from the prompt the image is displayed with any rendered bounding boxes

Dealing with image orientation bugs

Bounding boxes are fiddly things. The code I had produced above seemed to work... but in some of my testing the boxes didn't show up in quite the right place. Was this just Gemini 1.5 Pro being unreliable in how it returned the boxes? That seemed likely, but I had some nagging doubts.

On a hunch, I took an image that was behaving strangely, took a screenshot of it and tried that screenshot as a JPEG. The bounding boxes that came back were different - they appeared rotated!

I've seen this kind of thing before with photos taken on an iPhone. There's an obscure piece of JPEG metadata which can set the orientation on a photo, and some software fails to respect that.

Was that affecting my bounding box tool? I started digging into those photos to figure that out, using a combination of ChatGPT Code Interpreter (since that can read JPEG binary data using Python) and Claude Artifacts (to build me a visible UI for exploring my photos).

My hunch turned out to be correct: my iPhone photos included TIFF orientation metadata which the Gemini API appeared not to respect. As a result, some photos taken by my phone would return bounding boxes that were rotated 180 degrees.

My eventual fix was to take the image provided by the user, render it to a <canvas> element and then export it back out as a JPEG again - code here. I got Claude to add that for me based on code I pasted in from my earlier image resize quality tool, also built for me by Claude.

As part of this investigation I built another tool, which can read orientation TIFF data from a JPEG entirely in JavaScript and help show what's going on:

https://tools.simonwillison.net/tiff-orientation

Here's the source code for that. The source code is a great example of the kind of thing that LLMs can do much more effectively than I can - here's an illustrative snippet:

// Determine endianness
const endian = view.getUint16(tiffStart, false);
const isLittleEndian = (endian === 0x4949);  // 'II' in ASCII
debugInfo += `Endianness: ${isLittleEndian ? 'Little Endian' : 'Big Endian'}\n`;

// Check TIFF header validity
const tiffMagic = view.getUint16(tiffStart + 2, isLittleEndian);
if (tiffMagic !== 42) {
    throw Object.assign(new Error('Not a valid TIFF header'), { debugInfo });
}
debugInfo += 'Valid TIFF header\n';

// Get offset to first IFD
const ifdOffset = view.getUint32(tiffStart + 4, isLittleEndian);
const ifdStart = tiffStart + ifdOffset;
debugInfo += `IFD start: ${ifdStart}\n`;

LLMs know their binary file formats, so I frequently find myself asking them to write me custom binary processing code like this.

Here's the Claude conversation I had to build that tool. After failing to get it to work several times I pasted in Python code that I'd built using ChatGPT Code Interpreter and prompted:

Here's Python code that finds it correctly:

Which turned out to provide the missing details to help it build me the JavaScript version I could run in my browser. Here's the ChatGPT conversation that got me that Python code.

Mixing up a whole bunch of models

This whole process was very messy, but it's a pretty accurate representation of my workflow when using these models. I used three different tools here:

Gemini 1.5 Pro and the Gemini API to take images and a prompt and return bounding boxes
Claude 3.5 Sonnet and Claude Artifacts to write code for working against that API and build me interactive tools for visualizing the results
GPT-4o and ChatGPT Code Interpreter to write and execute Python code to try and help me figure out what was going on with my weird JPEG image orientation bugs

I copied code between models a bunch of times too - pasting Python code written by GPT-4o into Claude 3.5 Sonnet to help it write the correct JavaScript for example.

How good is the code that I produced by the end of this all? It honestly doesn't matter very much to me: this is a very low-stakes project, where the goal was a single web page tool that can run a prompt through a model and visualize the response.

If I was writing code "for production" - for a long-term project, or code that I intended to package up and release as an open source library - I would sweat the details a whole lot more. But for this kind of exploratory and prototyping work I'm increasingly comfortable hacking away at whatever the models spit out until it achieves the desired effect.

Tags: google, projects, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, gemini, vision-llms, claude-artifacts, claude-3-5-sonnet, cors

Musing about OAuth and LLMs on Mastodon

2024-08-24T00:29:47+00:00

Musing about OAuth and LLMs on Mastodon

Lots of people are asking why Anthropic and OpenAI don't support OAuth, so you can bounce users through those providers to get a token that uses their API budget for your app.

My guess: they're worried malicious app developers would use it to trick people and obtain valid API keys.

Imagine a version of my dumb little write a haiku about a photo you take page which used OAuth, harvested API keys and then racked up hundreds of dollar bills against everyone who tried it out running illicit election interference campaigns or whatever.

I'm trying to think of an OAuth API that dishes out tokens which effectively let you spend money on behalf of your users and I can't think of any - OAuth is great for "grant this app access to data that I want to share", but "spend money on my behalf" is a whole other ball game.

I guess there's a version of this that could work: it's OAuth but users get to set a spending limit of e.g. $1 (maybe with the authenticating app suggesting what that limit should be).

Here's a counter-example from Mike Taylor of a category of applications that do use OAuth to authorize spend on behalf of users:

I used to work in advertising and plenty of applications use OAuth to connect your Facebook and Google ads accounts, and they could do things like spend all your budget on disinformation ads, but in practice I haven't heard of a single case. When you create a dev application there are stages of approval so you can only invite a handful of beta users directly until the organization and app gets approved.

In which case maybe the cost for providers here is in review and moderation: if you’re going to run an OAuth API that lets apps spend money on behalf of their users you need to actively monitor your developer community and review and approve their apps.

Tags: openai, anthropic, llms, oauth

Explain ACLs by showing me a SQLite table schema for implementing them

2024-08-23T05:57:45+00:00

Explain ACLs by showing me a SQLite table schema for implementing them

Here’s an example transcript showing one of the common ways I use LLMs. I wanted to develop an understanding of ACLs - Access Control Lists - but I’ve found previous explanations incredibly dry. So I prompted Claude 3.5 Sonnet:

Explain ACLs by showing me a SQLite table schema for implementing them

Asking for explanations using the context of something I’m already fluent in is usually really effective, and an great way to take advantage of the weird abilities of frontier LLMs.

I exported the transcript to a Gist using my Convert Claude JSON to Markdown tool, which I just upgraded to support syntax highlighting of code in artifacts.

Tags: claude-3-5-sonnet, ai-assisted-programming, sqlite, claude, generative-ai, ai, llms, anthropic

Claude's API now supports CORS requests, enabling client-side applications

2024-08-23T02:29:08+00:00

Anthropic have enabled CORS support for their JSON APIs, which means it's now possible to call the Claude LLMs directly from a user's browser.

This massively significant new feature is tucked away in this pull request: anthropic-sdk-typescript: add support for browser usage, via this issue.

This change to the Anthropic TypeScript SDK reveals the new JSON API feature, which I found by digging through the code.

You can now add the following HTTP request header to enable CORS support for the Anthropic API, which means you can make calls to Anthropic's models directly from a browser:

anthropic-dangerous-direct-browser-access: true

Anthropic had been resistant to adding this feature because it can encourage a nasty anti-pattern: if you embed your API key in your client code, anyone with access to that site can steal your API key and use it to make requests on your behalf.

Despite that, there are legitimate use cases for this feature. It's fine for internal tools exposed to trusted users, or you can implement a "bring your own API key" pattern where users supply their own key to use with your client-side app.

As it happens, I've built one of those apps myself! My Haiku page is a simple client-side app that requests access to your webcam, asks for an Anthropic API key (which it stores in the browser’s localStorage), and then lets you take a photo and turns it into a Haiku using their fast and inexpensive Haiku model.

Previously I had to run my own proxy on Vercel adding CORS support to the Anthropic API just to get my Haiku app to work.

This evening I upgraded the app to send that new header, and now it can talk to Anthropic directly without needing my proxy.

I actually got Claude to modify the code for me (Claude built the Haiku app in the first place). Amusingly Claude first argued against it:

I must strongly advise against making direct API calls from a browser, as it exposes your API key and violates best practices for API security.

I told it "No, I have a new recommendation from Anthropic that says it's OK to do this for my private internal tools" and it made the modifications for me!

The full source code can be seen here. Here's a simplified JavaScript snippet illustrating how to call their API from the browser using the new header:

fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": apiKey,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
    "anthropic-dangerous-direct-browser-access": "true",
  },
  body: JSON.stringify({
    model: "claude-3-haiku-20240307",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Return a haiku about how great pelicans are" },
        ],
      },
    ],
  }),
})
  .then((response) => response.json())
  .then((data) => {
    const haiku = data.content[0].text;
    alert(haiku);
  });

Tags: apis, javascript, projects, security, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, cors

Introducing Zed AI

2024-08-20T19:27:13+00:00

Introducing Zed AI

The Zed open source code editor (from the original Atom team) already had GitHub Copilot autocomplete support, but now they're introducing their own additional suite of AI features powered by Anthropic (though other providers can be configured using additional API keys).

The focus is on an assistant panel - a chatbot interface with additional commands such as /file myfile.py to insert the contents of a project file - and an inline transformations mechanism for prompt-driven refactoring of selected code.

The most interesting part of this announcement is that it reveals a previously undisclosed upcoming Claude feature from Anthropic:

For those in our closed beta, we're taking this experience to the next level with Claude 3.5 Sonnet's Fast Edit Mode. This new capability delivers mind-blowingly fast transformations, approaching real-time speeds for code refactoring and document editing.

LLM-based coding tools frequently suffer from the need to output the content of an entire file even if they are only changing a few lines - getting models to reliably produce valid diffs is surprisingly difficult.

This "Fast Edit Mode" sounds like it could be an attempt to resolve that problem. Models that can quickly pipe through copies of their input while applying subtle changes to that flow are an exciting new capability.

Via Hacker News

Tags: anthropic, claude, generative-ai, ai, llms

Quoting Alex Albert

2024-08-15T18:09:04+00:00

Examples are the #1 thing I recommend people use in their prompts because they work so well. The problem is that adding tons of examples increases your API costs and latency. Prompt caching fixes this. You can now add tons of examples to every prompt and create an alternative to a model finetuned on your task with basically zero cost/latency increase. […]

This works even better with smaller models. You can generate tons of examples (test case + solution) with 3.5 Sonnet and then use those examples to create a few-shot prompt for Haiku.

— Alex Albert

Tags: claude-3-5-sonnet, alex-albert, prompt-engineering, anthropic, claude, ai, llms

Prompt caching with Claude

2024-08-14T17:07:35+00:00

Prompt caching with Claude

The Claude API now supports prompt caching, allowing you to mark reused portions of long prompts (like a large document provided as context). Claude will cache these for up to five minutes, and any prompts within that five minutes that reuse the context will be both significantly faster and will be charged at a significant discount: ~10% of the cost of sending those uncached tokens.

Writing to the cache costs money. The cache TTL is reset every time it gets a cache hit, so any application running more than one prompt every five minutes should see significant price decreases from this. If you app prompts less than once every five minutes you'll be losing money.

This is similar to Google Gemini's context caching feature, but the pricing model works differently. Gemini charge $4.50/million tokens/hour for their caching (that's for Gemini 1.5 Pro - Gemini 1.5 Flash is $1/million/hour), for a quarter price discount on input tokens (see their pricing).

Claude’s implementation also appears designed to help with ongoing conversations. Using caching during an individual user’s multi-turn conversation - where a full copy of the entire transcript is sent with each new prompt - could help even for very low traffic (or even single user) applications.

Here's the full documentation for the new Claude caching feature, currently only enabled if you pass "anthropic-beta: prompt-caching-2024-07-31" as an HTTP header.

Interesting to note that this caching implementation doesn't save on HTTP overhead: if you have 1MB of context you still need to send a 1MB HTTP request for every call. I guess the overhead of that HTTP traffic is negligible compared to the overhead of processing those tokens once they arrive.

One minor annoyance in the announcement for this feature:

Detailed instruction sets: Share extensive lists of instructions, procedures, and examples to fine-tune Claude's responses. [...]

I wish Anthropic wouldn't use the term "fine-tune" in this context (they do the same thing in their tweet). This feature is unrelated to model fine-tuning (a feature Claude provides via AWS Bedrock). People find this terminology confusing already, frequently misinterpreting "fine-tuning" as being the same thing as "tweaking your prompt until it works better", and Anthropic's language here doesn't help.

Via @AnthropicAI

Tags: gemini, prompt-engineering, anthropic, claude, generative-ai, ai, llms

Gemini 1.5 Flash price drop

2024-08-08T22:27:40+00:00

Gemini 1.5 Flash price drop

Google Gemini 1.5 Flash was already one of the cheapest models, at 35c/million input tokens. Today they dropped that to just 7.5c/million (and 30c/million) for prompts below 128,000 tokens.

The pricing war for best value fast-and-cheap model is red hot right now. The current most significant offerings are:

Google's Gemini 1.5 Flash: 7.5c/million input, 30c/million output (below 128,000 input tokens)
OpenAI's GPT-4o mini: 15c/million input, 60c/million output
Anthropic's Claude 3 Haiku: 25c/million input, $1.25/million output

Or you can use OpenAI's GPT-4o mini via their batch API, which halves the price (resulting in the same price as Gemini 1.5 Flash) in exchange for the results being delayed by up to 24 hours.

Worth noting that Gemini 1.5 Flash is more multi-modal than the other models: it can handle text, images, video and audio.

Also in today's announcement:

PDF Vision and Text understanding

The Gemini API and AI Studio now support PDF understanding through both text and vision. If your PDF includes graphs, images, or other non-text visual content, the model uses native multi-modal capabilities to process the PDF. You can try this out via Google AI Studio or in the Gemini API.

This is huge. Most models that accept PDFs do so by extracting text directly from the files (see previous notes), without using OCR. It sounds like Gemini can now handle PDFs as if they were a sequence of images, which should open up much more powerful general PDF workflows.

Update: it turns out Gemini also has a 50% off batch mode, so that’s 3.25c/million input tokens for batch mode 1.5 Flash!

Via @OfficialLoganK

Tags: gemini, anthropic, claude, openai, ai, llms, google, generative-ai, vision-llms, pdf

Share Claude conversations by converting their JSON to Markdown

2024-08-08T20:40:20+00:00

Share Claude conversations by converting their JSON to Markdown

Anthropic's Claude is missing one key feature that I really appreciate in ChatGPT: the ability to create a public link to a full conversation transcript. You can publish individual artifacts from Claude, but I often find myself wanting to publish the whole conversation.

Before ChatGPT added that feature I solved it myself with this ChatGPT JSON transcript to Markdown Observable notebook. Today I built the same thing for Claude.

Here's how to use it:

The key is to load a Claude conversation on their website with your browser DevTools network panel open and then filter URLs for chat_. You can use the Copy -> Response right click menu option to get the JSON for that conversation, then paste it into that new Observable notebook to get a Markdown transcript.

I like sharing these by pasting them into a "secret" Gist - that way they won't be indexed by search engines (adding more AI generated slop to the world) but can still be shared with people who have the link.

Here's an example transcript from this morning. I started by asking Claude:

I want to breed spiders in my house to get rid of all of the flies. What spider would you recommend?

When it suggested that this was a bad idea because it might attract pests, I asked:

What are the pests might they attract? I really like possums

It told me that possums are attracted by food waste, but "deliberately attracting them to your home isn't recommended" - so I said:

Thank you for the tips on attracting possums to my house. I will get right on that! [...] Once I have attracted all of those possums, what other animals might be attracted as a result? Do you think I might get a mountain lion?

It emphasized how bad an idea that would be and said "This would be extremely dangerous and is a serious public safety risk.", so I said:

OK. I took your advice and everything has gone wrong: I am now hiding inside my house from the several mountain lions stalking my backyard, which is full of possums

Claude has quite a preachy tone when you ask it for advice on things that are clearly a bad idea, which makes winding it up with increasingly ludicrous questions a lot of fun.

Tags: anthropic, claude, markdown, ai, llms, tools, generative-ai, projects, json, observable

django-http-debug, a new Django app mostly written by Claude

2024-08-08T15:26:27+00:00

Yesterday I finally developed something I’ve been casually thinking about building for a long time: django-http-debug. It’s a reusable Django app - something you can pip install into any Django project - which provides tools for quickly setting up a URL that returns a canned HTTP response and logs the full details of any incoming request to a database table.

This is ideal for any time you want to start developing against some external API that sends traffic to your own site - a webhooks provider like Stripe, or an OAuth or OpenID connect integration (my task yesterday morning).

You can install it right now in your own Django app: add django-http-debug to your requirements (or just pip install django-http-debug), then add the following to your settings.py:

INSTALLED_APPS = [
    # ...
    'django_http_debug',
    # ...
]

MIDDLEWARE = [
    # ...
    "django_http_debug.middleware.DebugMiddleware",
    # ...
]

You'll need to have the Django Admin app configured as well. The result will be two new models managed by the admin - one for endpoints:

And a read-only model for viewing logged requests:

It’s possible to disable logging for an endpoint, which means django-http-debug doubles as a tool for adding things like a robots.txt to your site without needing to deploy any additional code.

How it works

The key to how this works is this piece of middleware:

class DebugMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        if response.status_code == 404:
            path = request.path.lstrip("/")
            debug_response = debug_view(request, path)
            if debug_response:
                return debug_response
        return response

This dispatches to the default get_response() function, then intercepts the result and checks if it's a 404. If so, it gives the debug_view() function an opportunity to respond instead - which might return None, in which case that original 404 is returned to the client.

That debug_view() function looks like this:

@csrf_exempt
def debug_view(request, path):
    try:
        endpoint = DebugEndpoint.objects.get(path=path)
    except DebugEndpoint.DoesNotExist:
        return None  # Allow normal 404 handling to continue

    if endpoint.logging_enabled:
        log_entry = RequestLog(
            endpoint=endpoint,
            method=request.method,
            query_string=request.META.get("QUERY_STRING", ""),
            headers=dict(request.headers),
        )
        log_entry.set_body(request.body)
        log_entry.save()

    content = endpoint.content
    if endpoint.is_base64:
        content = base64.b64decode(content)

    response = HttpResponse(
        content=content,
        status=endpoint.status_code,
        content_type=endpoint.content_type,
    )
    for key, value in endpoint.headers.items():
        response[key] = value

    return response

It checks the database for an endpoint matching the incoming path, then logs the response (if the endpoint has logging_enabled set) and returns a canned response based on the endpoint configuration.

Here are the models:

from django.db import models
import base64


class DebugEndpoint(models.Model):
    path = models.CharField(max_length=255, unique=True)
    status_code = models.IntegerField(default=200)
    content_type = models.CharField(max_length=64, default="text/plain; charset=utf-8")
    headers = models.JSONField(default=dict, blank=True)
    content = models.TextField(blank=True)
    is_base64 = models.BooleanField(default=False)
    logging_enabled = models.BooleanField(default=True)

    def __str__(self):
        return self.path

    def get_absolute_url(self):
        return f"/{self.path}"


class RequestLog(models.Model):
    endpoint = models.ForeignKey(DebugEndpoint, on_delete=models.CASCADE)
    method = models.CharField(max_length=10)
    query_string = models.CharField(max_length=255, blank=True)
    headers = models.JSONField()
    body = models.TextField(blank=True)
    is_base64 = models.BooleanField(default=False)
    timestamp = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"{self.method} {self.endpoint.path} at {self.timestamp}"

    def set_body(self, body):
        try:
            # Try to decode as UTF-8
            self.body = body.decode("utf-8")
            self.is_base64 = False
        except UnicodeDecodeError:
            # If that fails, store as base64
            self.body = base64.b64encode(body).decode("ascii")
            self.is_base64 = True

    def get_body(self):
        if self.is_base64:
            return base64.b64decode(self.body.encode("ascii"))
        return self.body

The admin screens are defined in admin.py.

Claude built the first version of this for me

This is a classic example of a project that I couldn’t quite justify building without assistance from an LLM. I wanted it to exist, but I didn't want to spend a whole day building it.

Claude 3.5 Sonnet got me 90% of the way to a working first version. I had to make a few tweaks to how the middleware worked, but having done that I had a working initial prototype within a few minutes of starting the project.

Here’s the full sequence of prompts I used, each linking to the code that was produced for me (as a Claude artifact):

I want a Django app I can use to help create HTTP debugging endpoints. It should let me configure a new path e.g. /webhooks/receive/ that the Django 404 handler then hooks into - if one is configured it can be told which HTTP status code, headers and content to return.

ALL traffic to those endpoints is logged to a Django table - full details of incoming request headers, method and body. Those can be browsed read-only in the Django admin (and deleted)

Produced Claude v1

make it so I don't have to put it in the urlpatterns because it hooks ito Django's 404 handling mechanism instead

Produced Claude v2

Suggestions for how this could handle request bodies that don't cleanly decode to utf-8

Produced Claude v3

don't use a binary field, use a text field but still store base64 data in it if necessary and have a is_base64 boolean column that gets set to true if that happens

Produced Claude v4

I took that code and ran with it - I fired up a new skeleton library using my python-lib cookiecutter template, copied the code into it, made some tiny changes to get it to work and shipped it as an initial alpha release - mainly so I could start exercising it on a couple of sites I manage.

Using it in the wild for a few minutes quickly identified changes I needed to make. I filed those as issues:

Then I worked though fixing each of those one at a time. I did most of this work myself, though GitHub Copilot helped me out be typing some of the code for me.

Adding the base64 preview

There was one slightly tricky feature I wanted to add that didn’t justify spending much time on but was absolutely a nice-to-have.

The logging mechanism supports binary data: if incoming request data doesn’t cleanly encode as UTF-8 it gets stored as Base 64 text instead, with the is_base64 flag set to True (see the set_body() method in the RequestLog model above).

I asked Claude for a curl one-liner to test this and it suggested:

curl -X POST http://localhost:8000/foo/ \
  -H "Content-Type: multipart/form-data" \
  -F "image=@pixel.gif"

I do this a lot - knocking out quick curl commands is an easy prompt, and you can tell it the URL and headers you want to use, saving you from having to edit the command yourself later on.

I decided to have the Django Admin view display a decoded version of that Base 64 data. But how to render that, when things like binary file uploads may not be cleanly renderable as text?

This is what I came up with:

The trick here I'm using here is to display the decoded data as a mix between renderable characters and hex byte pairs, with those pairs rendered using a different font to make it clear that they are part of the binary data.

This is achieved using a body_display() method on the RequestLogAdmin admin class, which is then listed in readonly_fields. The full code is here, this is that method:

    def body_display(self, obj):
        body = obj.get_body()
        if not isinstance(body, bytes):
            return format_html("<pre>{}</pre>", body)

        # Attempt to guess filetype
        suggestion = None
        match = filetype.guess(body[:1000])
        if match:
            suggestion = "{} ({})".format(match.extension, match.mime)

        encoded = repr(body)
        # Ditch the b' and trailing '
        if encoded.startswith("b'") and encoded.endswith("'"):
            encoded = encoded[2:-1]

        # Split it into sequences of octets and characters
        chunks = sequence_re.split(encoded)
        html = []
        if suggestion:
            html.append(
                '<p style="margin-top: 0; font-family: monospace; font-size: 0.8em;">Suggestion: {}</p>'.format(
                    suggestion
                )
            )
        for chunk in chunks:
            if sequence_re.match(chunk):
                octets = octet_re.findall(chunk)
                octets = [o[2:] for o in octets]
                html.append(
                    '<code style="color: #999; font-family: monospace">{}</code>'.format(
                        " ".join(octets).upper()
                    )
                )
            else:
                html.append(chunk.replace("\\\\", "\\"))

        return mark_safe(" ".join(html).strip().replace("\\r\\n", "<br>"))

I got Claude to write that using one of my favourite prompting tricks. I'd solved this problem once before in the past, in my datasette-render-binary project. So I pasted that code into Claude, told it:

With that code as inspiration, modify the following Django Admin code to use that to display decoded base64 data:

And then pasted in my existing Django admin class. You can see my full prompt here.

Claude replied with this code, which almost worked exactly as intended - I had to make one change, swapping out the last line for this:

        return mark_safe(" ".join(html).strip().replace("\\r\\n", "<br>"))

I love this pattern: "here's my existing code, here's some other code I wrote, combine them together to solve this problem". I wrote about this previously when I described how I built my PDF OCR JavaScript tool a few months ago.

Adding automated tests

The final challenge was the hardest: writing automated tests. This was difficult because Django tests need a full Django project configured for them, and I wasn’t confident about the best pattern for doing that in my standalone django-http-debug repository since it wasn’t already part of an existing Django project.

I decided to see if Claude could help me with that too, this time using my files-to-prompt and LLM command-line tools:

files-to-prompt . --ignore LICENSE | \
  llm -m claude-3.5-sonnet -s \
  'step by step advice on how to implement automated tests for this, which is hard because the tests need to work within a temporary Django project that lives in the tests/ directory somehow. Provide all code at the end.'

Here's Claude's full response. It almost worked! It gave me a minimal test project in tests/test_project and an initial set of quite sensible tests.

Sadly it didn’t quite solve the most fiddly problem for me: configuring it so running pytest would correctly set the Python path and DJANGO_SETTINGS_MODULE in order run the tests. I saw this error instead:

django.core.exceptions.ImproperlyConfigured: Requested setting INSTALLED_APPS, but settings are not configured. You must either define the environment variable DJANGO_SETTINGS_MODULE or call settings.configure() before accessing settings.

I spent some time with the relevant pytest-django documentation and figure out a pattern that worked. Short version: I added this to my pyproject.toml file:

[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "tests.test_project.settings"
pythonpath = ["."]

For the longer version, take a look at my full TIL: Using pytest-django with a reusable Django application.

Test-supported cleanup

The great thing about having comprehensive tests in place is it makes iterating on the project much faster. Claude had used some patterns that weren’t necessary. I spent a few minutes seeing if the tests still passed if I deleted various pieces of code, and cleaned things up quite a bit.

Was Claude worth it?

This entire project took about two hours - just within a tolerable amount of time for what was effectively a useful sidequest from my intended activity for the day.

Claude didn't implement the whole project for me. The code it produced didn't quite work - I had to tweak just a few lines of code, but knowing which code to tweak took a development environment and manual testing and benefited greatly from my 20+ years of Django experience!

This is yet another example of how LLMs don't replace human developers: they augment us.

The end result is a tool that I'm already using to solve real-world problems, and a code repository that I'm proud to put my name to. Without LLM assistance this project would have stayed on my ever-growing list of "things I'd love to build one day".

I'm also really happy to have my own documented solution to the challenge of adding automated tests to a standalone reusable Django application. I was tempted to skip this step entirely, but thanks to Claude's assistance I was able to break that problem open and come up with a solution that I'm really happy with.

Last year I wrote about how AI-enhanced development makes me more ambitious with my projects. It's also helping me be more diligent in not taking shortcuts like skipping setting up automated tests.

Tags: django, projects, python, webhooks, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, claude-3-5-sonnet

Quoting Arvind Narayanan

2024-07-16T16:06:35+00:00

OpenAI and Anthropic focused on building models and not worrying about products. For example, it took 6 months for OpenAI to bother to release a ChatGPT iOS app and 8 months for an Android app!

Google and Microsoft shoved AI into everything in a panicked race, without thinking about which products would actually benefit from AI and how they should be integrated.

Both groups of companies forgot the “make something people want” mantra. The generality of LLMs allowed developers to fool themselves into thinking that they were exempt from the need to find a product-market fit, as if prompting is a replacement for carefully designed products or features. [...]

But things are changing. OpenAI and Anthropic seem to be transitioning from research labs focused on a speculative future to something resembling regular product companies. If you take all the human-interest elements out of the OpenAI boardroom drama, it was fundamentally about the company's shift from creating gods to building products.

— Arvind Narayanan

Tags: anthropic, llms, google, openai, generative-ai, ai, microsoft

Quoting Alex Albert

2024-07-15T21:33:17+00:00

We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API.

Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls.

— Alex Albert

Tags: alex-albert, anthropic, claude, generative-ai, ai, llms, claude-3-5-sonnet

Quoting Alex Albert

2024-07-10T18:56:40+00:00

Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.

In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.

— Alex Albert, Anthropic

Tags: vision-llms, prompt-engineering, anthropic, claude, generative-ai, ai, llms, alex-albert, claude-3-5-sonnet

Anthropic cookbook: multimodal

2024-07-10T18:38:10+00:00

Anthropic cookbook: multimodal

I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.

This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:

You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in <answer> tags, think step by step in <thinking> tags and analyze every part of the image.

Tags: jupyter, vision-llms, anthropic, claude, generative-ai, ai, llms