Simon Willison's Weblog: llm-pricing

New OpenAI feature: Predicted Outputs

2024-11-04T23:55:42+00:00

Interesting new ability of the OpenAI API - the first time I've seen this from any vendor.

If you know your prompt is mostly going to return the same content - you're requesting an edit to some existing code, for example - you can now send that content as a "prediction" and have GPT-4o or GPT-4o mini use that to accelerate the returned result.
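A minimal sketch of what such a request body might look like, built in Python without sending anything. The `prediction` field with `type: "content"` follows OpenAI's Predicted Outputs documentation; the helper name, the instruction text and the sample code are illustrative assumptions.

```python
import json


def build_prediction_request(instruction: str, existing_code: str, model: str = "gpt-4o") -> dict:
    """Build a Chat Completions payload that sends existing_code as the prediction."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": instruction},
            {"role": "user", "content": existing_code},
        ],
        # The predicted output: text expected to reappear mostly unchanged.
        "prediction": {"type": "content", "content": existing_code},
    }


# Hypothetical example input, not taken from the documentation.
code = "class User:\n    first_name: str\n    last_name: str\n    username: str\n"
payload = build_prediction_request(
    "Replace the username property with an email property. Respond only with code.",
    code,
)
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to the Chat Completions endpoint in the usual way; only the `prediction` key is new.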

OpenAI's documentation says:

When providing a prediction, any tokens provided that are not part of the final completion are charged at completion token rates.

I initially misunderstood this as meaning you got a price reduction in addition to the latency improvement, but that's not the case: in the best possible case it will return faster and you won't be charged anything extra over the expected cost for the prompt, but the more it differs from your prediction the more extra tokens you'll be billed for.

I ran the example from the documentation both with and without the prediction and got these results. Without the prediction:

"usage": {
  "prompt_tokens": 150,
  "completion_tokens": 118,
  "total_tokens": 268,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 0,
    "audio_tokens": null,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 0
  }

That took 5.2 seconds and cost 0.1555 cents.

With the prediction:

"usage": {
  "prompt_tokens": 166,
  "completion_tokens": 226,
  "total_tokens": 392,
  "completion_tokens_details": {
    "accepted_prediction_tokens": 49,
    "audio_tokens": null,
    "reasoning_tokens": 0,
    "rejected_prediction_tokens": 107
  }

That took 3.3 seconds and cost 0.2675 cents.
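The two cost figures can be reproduced from the usage blocks with a few lines of Python. This sketch assumes GPT-4o prices of $2.50/million input tokens and $10/million output tokens (an assumption, though those rates do reproduce both totals); the function name is illustrative.

```python
def cost_cents(prompt_tokens: int, completion_tokens: int,
               input_per_million: float, output_per_million: float) -> float:
    """Return the cost in cents, given token counts and dollar prices per million tokens."""
    dollars = (prompt_tokens * input_per_million
               + completion_tokens * output_per_million) / 1_000_000
    return dollars * 100


# Assumed GPT-4o prices in dollars per million tokens: (input, output)
GPT_4O = (2.50, 10.00)

without_prediction = cost_cents(150, 118, *GPT_4O)
with_prediction = cost_cents(166, 226, *GPT_4O)
print(f"without: {without_prediction:.4f} cents, with: {with_prediction:.4f} cents")
```

Note that completion_tokens in the second run appears to already include the 107 rejected prediction tokens (226 - 107 = 119, close to the 118 of the first run), which is where the extra cost comes from.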

Further details from OpenAI's Steve Coffey:

We are using the prediction to do speculative decoding during inference, which allows us to validate large batches of the input in parallel, instead of sampling token-by-token!

[...] If the prediction is 100% accurate, then you would see no cost difference. When the model diverges from your speculation, we do additional sampling to “discover” the net-new tokens, which is why we charge rejected tokens at completion time rates.

Via @OpenAIDevs

Tags: ai, openai, generative-ai, llms, llm-pricing

Claude 3.5 Haiku

2024-11-04T19:34:41+00:00

Anthropic released Claude 3.5 Haiku today, a few days later than expected (they said it would be out by the end of October).

I was expecting this to be a complete replacement for their existing Claude 3 Haiku model, in the same way that Claude 3.5 Sonnet eclipsed the existing Claude 3 Sonnet while maintaining the same pricing.

Claude 3.5 Haiku is different. First, it doesn't (yet) support image inputs - so Claude 3 Haiku remains the least expensive Anthropic model for handling those.

Secondly, it's not priced the same as the previous Haiku. That was $0.25/million input and $1.25/million for output - the new 3.5 Haiku is 4x that at $1/million input and $5/million output.

Anthropic tweeted:

During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks—at a fraction of the cost.

As a result, we've increased pricing for Claude 3.5 Haiku to reflect its increase in intelligence.

Given that Anthropic claim that their new Haiku out-performs their older Claude 3 Opus (still $15/m input and $75/m output!) this price isn't disappointing, but it's a small surprise nonetheless.

Accessing Claude 3.5 Haiku with LLM

I released a new version of my llm-claude-3 plugin with support for the new model. You can install (or upgrade) the plugin and run it like this:

llm install --upgrade llm-claude-3
llm keys set claude
# Paste API key here
llm -m claude-3.5-haiku 'describe memory management in Rust'

Here's the output from that prompt.

Comparing prices

I added the new price to my LLM pricing calculator, which inspired me to extract this comparison table for the leading models from Gemini, Anthropic and OpenAI. Here they are sorted from least to most expensive:

Model	1m token input	1m tokens output
Gemini 1.5 Flash-8B	$0.04	$0.15
Gemini 1.5 Flash	$0.07	$0.30
GPT-4o Mini	$0.15	$0.60
Claude 3 Haiku	$0.25	$1.25
Claude 3.5 Haiku	$1.00	$5.00
Gemini 1.5 Pro	$1.25	$5.00
GPT-4o	$2.50	$10.00
Claude 3.5 Sonnet	$3.00	$15.00
Claude 3 Opus	$15.00	$75.00

Gemini 1.5 Flash-8B remains the model to beat on pricing: at $0.0375/million input tokens it's roughly 1/27th of the input price of the new Haiku - far less capable, but still extremely useful for tasks such as audio transcription.
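To get a feel for what these per-million prices mean in practice, here is a small sketch that prices one hypothetical workload (10 million input tokens, 1 million output tokens) against each row of the table. The prices are copied from the table; the workload size and function name are made up for illustration.

```python
# Dollars per million tokens (input, output), copied from the comparison table.
PRICES = {
    "Gemini 1.5 Flash-8B": (0.04, 0.15),
    "Gemini 1.5 Flash": (0.07, 0.30),
    "GPT-4o Mini": (0.15, 0.60),
    "Claude 3 Haiku": (0.25, 1.25),
    "Claude 3.5 Haiku": (1.00, 5.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Opus": (15.00, 75.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a workload for one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens, 1M output tokens.
ranked = sorted(PRICES, key=lambda m: workload_cost(m, 10_000_000, 1_000_000))
for model in ranked:
    print(f"{model:22} ${workload_cost(model, 10_000_000, 1_000_000):.2f}")
```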

Also notable from Anthropic's model comparison table: Claude 3.5 Haiku has a max output of 8,192 tokens (same as 3.5 Sonnet, but twice that of Claude 3 Opus and Claude 3 Haiku). 3.5 Haiku has a training cut-off date of July 2024, the most recent of any Anthropic model. 3.5 Sonnet is April 2024 and the Claude 3 family are all August 2023.

Tags: ai, openai, generative-ai, llms, llm, anthropic, claude, gemini, llm-pricing

Claude Token Counter

2024-11-02T18:52:50+00:00

Anthropic released a token counting API for Claude a few days ago.

I built this tool for running prompts, images and PDFs against that API to count the tokens in them.

The API is free (albeit rate limited), but you'll still need to provide your own API key in order to use it.

Here's the source code. I built this using two sessions with Claude - one to build the initial tool and a second to add PDF and image support. That second one is a bit of a mess - it turns out if you drop an HTML file onto a Claude conversation it converts it to Markdown for you, but I wanted it to modify the original HTML source.

The API endpoint also allows you to specify a model, but as far as I can tell from running some experiments the token count was the same for Haiku, Opus and Sonnet 3.5.
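For reference, a rough Python sketch of calling the token counting endpoint directly, using only the standard library. The endpoint path, the headers and the `input_tokens` response field are assumptions based on Anthropic's API reference and should be checked against it (when the API first launched it may also have required a beta header); the model ID is just an example.

```python
import json
import urllib.request

COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"


def build_count_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a token counting request for a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": text}]}
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def count_tokens(api_key: str, model: str, text: str) -> int:
    """Send the request and return the reported input token count (needs a real key)."""
    with urllib.request.urlopen(build_count_request(api_key, model, text)) as resp:
        return json.load(resp)["input_tokens"]


# Building the request needs no network access; "sk-placeholder" is not a real key.
req = build_count_request("sk-placeholder", "claude-3-5-haiku-20241022", "Hello, Claude")
print(req.get_method(), req.get_full_url())
```

Calling count_tokens() with the same text and different model IDs is one way to repeat the experiment described above.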

Tags: tools, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, claude-artifacts, llm-pricing

You can now run prompts against images, audio and video in your terminal using LLM

2024-10-29T15:09:38+00:00

I released LLM 0.17 last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama, Claude and Gemini.

The signature feature of 0.17 is that LLM can now be used to prompt multi-modal models - which means you can now use it to send images, audio and video files to LLMs that can handle them.

Processing an image with gpt-4o-mini

Here's an example. First, install LLM - using brew install llm or pipx install llm or uv tool install llm, pick your favourite. If you have it installed already you may need to upgrade to 0.17, e.g. with brew upgrade llm.

Obtain an OpenAI key (or an alternative, see below) and provide it to the tool:

llm keys set openai
# paste key here

And now you can start running prompts against images.

llm 'describe this image' \
  -a https://static.simonwillison.net/static/2024/pelican.jpg

The -a option stands for --attachment. Attachments can be specified as URLs, as paths to files on disk or as - to read from data piped into the tool.

The above example uses the default model, gpt-4o-mini. I got back this:

The image features a brown pelican standing on rocky terrain near a body of water. The pelican has a distinct coloration, with dark feathers on its body and a lighter-colored head. Its long bill is characteristic of the species, and it appears to be looking out towards the water. In the background, there are boats, suggesting a marina or coastal area. The lighting indicates it may be a sunny day, enhancing the scene's natural beauty.

Here's that image:

You can run llm logs --json -c for a hint of how much that cost:

      "usage": {
        "completion_tokens": 89,
        "prompt_tokens": 14177,
        "total_tokens": 14266,

Using my LLM pricing calculator that came to 0.218 cents - less than a quarter of a cent.

Let's run that again with gpt-4o. Add -m gpt-4o to specify the model:

llm 'describe this image' \
  -a https://static.simonwillison.net/static/2024/pelican.jpg \
  -m gpt-4o

The image shows a pelican standing on rocks near a body of water. The bird has a large, long bill and predominantly gray feathers with a lighter head and neck. In the background, there is a docked boat, giving the impression of a marina or harbor setting. The lighting suggests it might be sunny, highlighting the pelican's features.

That time it used 435 prompt tokens (GPT-4o mini charges more tokens per image than GPT-4o) and the total cost was 0.1787 cents.

Using a plugin to run audio and video against Gemini

Models in LLM are defined by plugins. The application ships with a default OpenAI plugin to get people started, but there are dozens of other plugins providing access to different models, including models that can run directly on your own device.

Plugins need to be upgraded to add support for multi-modal input - here's documentation on how to do that. I've shipped three plugins with support for multi-modal attachments so far: llm-gemini, llm-claude-3 and llm-mistral (for Pixtral).

So far these are all remote API plugins. It's definitely possible to build a plugin that runs attachments through local models but I haven't got one of those into good enough condition to release just yet.

The Google Gemini series are my favourite multi-modal models right now due to the size and breadth of content they support. Gemini models can handle images, audio and video!

Let's try that out. Start by installing llm-gemini:

llm install llm-gemini

Obtain a Gemini API key. These include a free tier, so you can get started without needing to spend any money. Paste that in here:

llm keys set gemini
# paste key here

The three Gemini 1.5 models are called Pro, Flash and Flash-8B. Let's try it with Pro:

llm 'describe this image' \
  -a https://static.simonwillison.net/static/2024/pelican.jpg \
  -m gemini-1.5-pro-latest

A brown pelican stands on a rocky surface, likely a jetty or breakwater, with blurred boats in the background. The pelican is facing right, and its long beak curves downwards. Its plumage is primarily grayish-brown, with lighter feathers on its neck and breast. [...]

Very detailed!

But let's do something a bit more interesting. I shared a 7m40s MP3 of a NotebookLM podcast a few weeks ago. Let's use Flash-8B - the cheapest Gemini model - to try and obtain a transcript.

llm 'transcript' \
  -a https://static.simonwillison.net/static/2024/video-scraping-pelicans.mp3 \
  -m gemini-1.5-flash-8b-latest

It worked!

Hey everyone, welcome back. You ever find yourself wading through mountains of data, trying to pluck out the juicy bits? It's like hunting for a single shrimp in a whole kelp forest, am I right? Oh, tell me about it. I swear, sometimes I feel like I'm gonna go cross-eyed from staring at spreadsheets all day. [...]

Full output here.

Once again, llm logs -c --json will show us the tokens used. Here it's 14754 prompt tokens and 1865 completion tokens. The pricing calculator says that adds up to... 0.0833 cents. Less than a tenth of a cent to transcribe a 7m40s audio clip.

There's a Python API too

Here's what it looks like to execute multi-modal prompts with attachments using the LLM Python library:

import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt(
    "Describe these images",
    attachments=[
        llm.Attachment(path="pelican.jpg"),
        llm.Attachment(
            url="https://static.simonwillison.net/static/2024/pelicans.jpg"
        ),
    ]
)

You can send multiple attachments with a single prompt, and both file paths and URLs are supported - or even binary content, using llm.Attachment(content=b'binary goes here').

Any model plugin becomes available to Python with the same interface, making this LLM library a useful abstraction layer to try out the same prompts against many different models, both local and remote.

What can we do with this?

I've only had this working for a couple of days and the potential applications are somewhat dizzying. It's trivial to spin up a Bash script that can do things like generate alt= text for every image in a directory, for example. Here's one Claude wrote just now:

#!/bin/bash
for img in *.{jpg,jpeg}; do
    if [ -f "$img" ]; then
        output="${img%.*}.txt"
        llm -m gpt-4o-mini 'return just the alt text for this image' -a "$img" > "$output"
    fi
done

On the #llm Discord channel Drew Breunig suggested this one-liner:

llm prompt -m gpt-4o "
tell me if it's foggy in this image, reply on a scale from
1-10 with 10 being so foggy you can't see anything and 1
being clear enough to see the hills in the distance.
Only respond with a single number." \
  -a https://cameras.alertcalifornia.org/public-camera-data/Axis-Purisma1/latest-frame.jpg

That URL is to a live webcam feed, so here's an instant GPT-4o vision powered weather report!

We can have so much fun with this stuff.

All of the usual AI caveats apply: it can make mistakes, it can hallucinate, safety filters may kick in and refuse to transcribe audio based on the content. A lot of work is needed to evaluate how well the models perform at different tasks. There's a lot still to explore here.

But at 1/10th of a cent for 7 minutes of audio at least those explorations can be plentiful and inexpensive!

Tags: projects, ai, openai, generative-ai, llms, llm, anthropic, claude, mistral, gemini, vision-llms, llm-pricing

Running prompts against images and PDFs with Google Gemini

2024-10-23T18:25:07+00:00

New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.

I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:

prompt-gemini 'extract text' example-handwriting.jpg

Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.

Tags: bash, google, ocr, projects, ai, generative-ai, llms, ai-assisted-programming, llm, gemini, vision-llms, llm-pricing

Experimenting with audio input and output for the OpenAI Chat Completion API

2024-10-18T15:17:40+00:00

OpenAI promised this at DevDay a few weeks ago and now it's here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against.

Generating audio

For the moment you need to use the new gpt-4o-audio-preview model. OpenAI tweeted this example:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "alloy",
      "format": "wav"
    },
    "messages": [
      {
        "role": "user",
        "content": "Recite a haiku about zeros and ones."
      }
    ]
  }' | jq > response.json

I tried running that and got back JSON with a HUGE base64 encoded block in it:

{
  "id": "chatcmpl-AJaIpDBFpLleTUwQJefzs1JJE5p5g",
  "object": "chat.completion",
  "created": 1729231143,
  "model": "gpt-4o-audio-preview-2024-10-01",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "audio": {
          "id": "audio_6711f92b13a081908e8f3b61bf18b3f3",
          "data": "UklGRsZr...AA==",
          "expires_at": 1729234747,
          "transcript": "Digits intertwine,  \nIn dance of noughts and unity,  \nCode's whispers breathe life."
        }
      },
      "finish_reason": "stop",
      "internal_metrics": []
    }
  ],
  "usage": {
    "prompt_tokens": 17,
    "completion_tokens": 181,
    "total_tokens": 198,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cached_tokens_internal": 0,
      "text_tokens": 17,
      "image_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "text_tokens": 33,
      "audio_tokens": 148
    }
  },
  "system_fingerprint": "fp_6e2d124157"
}

The full response is here - I've truncated that data field since the whole thing is 463KB long!

Next I used jq and base64 to save the decoded audio to a file:

cat response.json | jq -r '.choices[0].message.audio.data' \
  | base64 -D > decoded.wav

That gave me a 7 second, 347K WAV file. I converted that to MP3 with the help of llm cmd and ffmpeg:

llm cmd ffmpeg convert decoded.wav to code-whispers.mp3
> ffmpeg -i decoded.wav -acodec libmp3lame -b:a 128k code-whispers.mp3

That gave me a 117K MP3 file.
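The jq and base64 decoding step can also be done in Python. This is a minimal sketch that assumes the response has the same shape as the JSON shown above (choices[0].message.audio.data); the function name and output path are illustrative, and a tiny stand-in response is used so it runs without calling the API.

```python
import base64
import json
import os
import tempfile


def save_audio(response: dict, path: str) -> int:
    """Decode choices[0].message.audio.data, write it to path, return bytes written."""
    encoded = response["choices"][0]["message"]["audio"]["data"]
    audio_bytes = base64.b64decode(encoded)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)


# Stand-in response: 12 placeholder bytes rather than real WAV data.
fake_response = {
    "choices": [
        {"message": {"audio": {"data": base64.b64encode(b"RIFF....WAVE").decode("ascii")}}}
    ]
}
out_path = os.path.join(tempfile.gettempdir(), "decoded-example.wav")
written = save_audio(fake_response, out_path)
print(f"wrote {written} bytes to {out_path}")
```

With a real response saved to response.json, the equivalent call would be save_audio(json.load(open("response.json")), "decoded.wav").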

The "usage" field above shows that the output used 148 audio tokens. OpenAI's pricing page says audio output tokens are $200/million, so I plugged that into my LLM pricing calculator and got back a cost of 2.96 cents.

Update 27th October 2024: I built an HTML and JavaScript tool for experimenting with audio output in a browser.

Audio input via a Bash script

Next I decided to try the audio input feature. You can now embed base64 encoded WAV files in the list of messages you send to the model, similar to how image inputs work.

I started by pasting a curl example of audio input into Claude and getting it to write me a Bash script wrapper. Here's the full audio-prompt.sh script. The part that does the work (after some argument parsing) looks like this:

# Base64 encode the audio file
AUDIO_BASE64=$(base64 < "$AUDIO_FILE" | tr -d '\n')

# Construct the JSON payload
JSON_PAYLOAD=$(jq -n \
    --arg model "gpt-4o-audio-preview" \
    --arg text "$TEXT_PROMPT" \
    --arg audio "$AUDIO_BASE64" \
    '{
        model: $model,
        modalities: ["text"],
        messages: [
            {
                role: "user",
                content: [
                    {type: "text", text: $text},
                    {
                        type: "input_audio",
                        input_audio: {
                            data: $audio,
                            format: "wav"
                        }
                    }
                ]
            }
        ]
    }')

# Make the API call
curl -s "https://api.openai.com/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d "$JSON_PAYLOAD" | jq .

From the documentation it looks like you can send an "input_audio"."format" of either "wav" or "mp3".

You can run it like this:

./audio-prompt.sh 'describe this audio' decoded.wav

This dumps the raw JSON response to the console. Here's what I got for that sound clip I generated above, which gets a little creative:

The audio features a spoken phrase that is poetic in nature. It discusses the intertwining of "digits" in a coordinated and harmonious manner, as if engaging in a dance of unity. It mentions "codes" in a way that suggests they have an almost life-like quality. The tone seems abstract and imaginative, possibly metaphorical, evoking imagery related to technology or numbers.
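The payload that the Bash script assembles with jq can also be built in Python. This sketch mirrors the structure of the script above; the function name is hypothetical, the audio bytes are placeholders, and no request is sent.

```python
import base64
import json


def build_audio_payload(prompt: str, audio_bytes: bytes, audio_format: str = "wav") -> dict:
    """Build a Chat Completions payload with a text prompt plus base64 encoded audio."""
    return {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text"],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                            "format": audio_format,
                        },
                    },
                ],
            }
        ],
    }


# Placeholder bytes stand in for the contents of a real WAV file.
sample_bytes = b"\x00\x01 placeholder audio bytes"
audio_payload = build_audio_payload("describe this audio", sample_bytes)
print(json.dumps(audio_payload, indent=2))
```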

A web app for recording and prompting against audio

I decided to turn this into a tiny web application. I started by asking Claude to create a prototype with a "record" button, just to make sure that was possible:

Build an artifact - no React - that lets me click a button to start recording, shows a counter running up, then lets me click again to stop. I can then play back the recording in an audio element. The recording should be a WAV

Then I pasted in one of my curl experiments from earlier and told it:

Now add a textarea input called "prompt" and a button which, when clicked, submits the prompt and the base64 encoded audio file using fetch() to this URL

The JSON that comes back should be displayed on the page, pretty-printed

The API key should come from localStorage - if localStorage does not have it ask the user for it with prompt()

I iterated through a few error messages and got to a working application! I then did one more round with Claude to add a basic pricing calculator showing how much the prompt had cost to run.

You can try the finished application here:

tools.simonwillison.net/openai-audio

Here's the finished code. It uses all sorts of APIs I've never used before: AudioContext().createMediaStreamSource(...) and a DataView() to build the WAV file from scratch, plus a trick with FileReader() .. readAsDataURL() for in-browser base64 encoding.

Audio inputs are charged at $100/million tokens, and processing 5 seconds of audio here cost 0.6 cents.

The problem is the price

Audio tokens are currently charged at $100/million for input and $200/million for output. Tokens are hard to reason about, but a note on the pricing page clarifies that:

Audio input costs approximately 6¢ per minute; Audio output costs approximately 24¢ per minute

Translated to price-per-hour, that's $3.60 per hour of input and $14.40 per hour of output. I think the Realtime API pricing is about the same. These are not cheap APIs.

Meanwhile, Google's Gemini models price audio at 25 tokens per second (for input only, they don't yet handle audio output). That means that for their three models:

Gemini 1.5 Pro is $1.25/million input tokens, so $0.11 per hour
Gemini 1.5 Flash is $0.075/million, so $0.00675 per hour (that's less than a cent)
Gemini 1.5 Flash 8B is $0.0375/million, so $0.003375 per hour (a third of a cent!)

This means even Google's most expensive Pro model is still 32 times less costly than OpenAI's gpt-4o-audio-preview model when it comes to audio input, and Flash 8B is 1,066 times cheaper.
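The arithmetic behind those figures, as a short sketch: 25 tokens per second is 90,000 tokens per hour, which is then multiplied by each model's input price. The constants come from the text above; the names are illustrative.

```python
AUDIO_TOKENS_PER_SECOND = 25
TOKENS_PER_HOUR = AUDIO_TOKENS_PER_SECOND * 3600  # 90,000 tokens

# Dollars per million input tokens, from the list above.
GEMINI_INPUT_PRICES = {
    "Gemini 1.5 Pro": 1.25,
    "Gemini 1.5 Flash": 0.075,
    "Gemini 1.5 Flash 8B": 0.0375,
}

# OpenAI audio input at roughly 6 cents per minute.
OPENAI_AUDIO_INPUT_PER_HOUR = 3.60


def dollars_per_hour(price_per_million: float) -> float:
    """Cost in dollars of one hour of audio input at the given token price."""
    return TOKENS_PER_HOUR * price_per_million / 1_000_000


for model, price in GEMINI_INPUT_PRICES.items():
    hourly = dollars_per_hour(price)
    ratio = OPENAI_AUDIO_INPUT_PER_HOUR / hourly
    print(f"{model}: ${hourly:.6f} per hour, {ratio:,.1f}x cheaper than OpenAI")
```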

(I really hope I got those numbers right. I had ChatGPT double-check them. I keep finding myself pricing out Gemini and not believing the results.)

I'm going to cross my fingers and hope for an OpenAI price drop in the near future, because it's hard to justify building anything significant on top of these APIs at the current price point, especially given the competition.

Tags: audio, projects, ai, openai, generative-ai, gpt-4, llms, ai-assisted-programming, claude, llm-pricing

Gemini 1.5 Flash-8B is now production ready

2024-10-03T20:16:36+00:00

Gemini 1.5 Flash-8B is "a smaller and faster variant of 1.5 Flash" - and is now released to production, at half the price of the 1.5 Flash model.

It's really, really cheap:

$0.0375 per 1 million input tokens on prompts <128K
$0.15 per 1 million output tokens on prompts <128K
$0.01 per 1 million input tokens on cached prompts <128K

Prices are doubled for prompts longer than 128K.

I believe images are still charged at a flat rate of 258 tokens, which I think means a single non-cached image with Flash-8B should cost 0.00097 cents - a number so tiny I'm doubting if I got the calculation right.
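That calculation can be checked in a few lines. This sketch assumes the flat 258 tokens per image and the $0.0375/million Flash-8B input price quoted above; the variable names are illustrative.

```python
TOKENS_PER_IMAGE = 258                 # flat rate per image, as stated above
FLASH_8B_INPUT_PER_MILLION = 0.0375    # dollars per million input tokens

image_cost_dollars = TOKENS_PER_IMAGE * FLASH_8B_INPUT_PER_MILLION / 1_000_000
image_cost_cents = image_cost_dollars * 100
images_per_dollar = 1 / image_cost_dollars

print(f"{image_cost_cents:.7f} cents per image")
print(f"about {images_per_dollar:,.0f} images per dollar")
```

Under those assumptions the per-image figure comes out just under a thousandth of a cent, or a little over 100,000 images per dollar.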

OpenAI's cheapest model remains GPT-4o mini, at $0.15/1M input - though that drops to half of that for reused prompt prefixes thanks to their new prompt caching feature (or by half if you use batches, though those can’t be combined with OpenAI prompt caching. Gemini also offer half-off for batched requests).

Anthropic's cheapest model is still Claude 3 Haiku at $0.25/M, though that drops to $0.03/M for cached tokens (if you configure them correctly).

I've released llm-gemini 0.2 with support for the new model:

llm install -U llm-gemini
llm keys set gemini
# Paste API key here
llm -m gemini-1.5-flash-8b-latest "say hi"

Via @OfficialLoganK

Tags: google, ai, openai, generative-ai, llms, llm, anthropic, gemini, vision-llms, llm-pricing

GPT-4o mini

2024-07-18T18:11:59+00:00

I've been complaining about how under-powered GPT 3.5 is for the price for a while now (I made fun of it in a keynote a few weeks ago).

GPT-4o mini is exactly what I've been looking forward to.

It supports 128,000 input tokens (both images and text) and an impressive 16,000 output tokens. Most other models are still ~4,000, and Claude 3.5 Sonnet got an upgrade to 8,192 just a few days ago. This makes it a good fit for translation and transformation tasks where the expected output more closely matches the size of the input.

OpenAI show benchmarks that have it out-performing Claude 3 Haiku and Gemini 1.5 Flash, the two previous cheapest-best models.

GPT-4o mini is 15 cents per million input tokens and 60 cents per million output tokens - a 60% discount on GPT-3.5, and cheaper than Claude 3 Haiku's 25c/125c and Gemini 1.5 Flash's 35c/70c. Or you can use the OpenAI batch API for 50% off again, in exchange for up-to-24-hours of delay in getting the results.

It's also worth comparing these prices with GPT-4o's: at $5/million input and $15/million output GPT-4o mini is 33x cheaper for input and 25x cheaper for output!

OpenAI point out that "the cost per token of GPT-4o mini has dropped by 99% since text-davinci-003, a less capable model introduced in 2022."

One catch: weirdly, the price for image inputs is the same for both GPT-4o and GPT-4o mini - Romain Huet says:

The dollar price per image is the same for GPT-4o and GPT-4o mini. To maintain this, GPT-4o mini uses more tokens per image.

Also notable:

GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model's ability to resist jailbreaks, prompt injections, and system prompt extractions.

My hunch is that this still won't 100% solve the security implications of prompt injection: I imagine creative enough attackers will still find ways to subvert system instructions, and the linked paper itself concludes "Finally, our current models are likely still vulnerable to powerful adversarial attacks". It could well help make accidental prompt injection a lot less common though, which is certainly a worthwhile improvement.

Tags: ai, openai, prompt-injection, generative-ai, llms, vision-llms, llm-pricing

Gemini 1.5 Pro public preview

2024-04-10T02:38:55+00:00

Huge release from Google: Gemini 1.5 Pro—the GPT-4 competitive model with the incredible 1 million token context length—is now available without a waitlist in 180+ countries (including the USA but not Europe or the UK as far as I can tell)... and the API is free for 50 requests/day (rate limited to 2/minute).

Beyond that you’ll need to pay—$7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet.

They also announced audio input (up to 9.5 hours in a single prompt), system instruction support and a new JSON mode.

Via @liambolling

Tags: google, ai, generative-ai, llms, gemini, vision-llms, llm-pricing

The new Claude 3 model family from Anthropic

2024-03-04T18:34:15+00:00

Claude 3 is out, and comes in three sizes: Opus (the largest), Sonnet and Haiku.

Claude 3 Opus has self-reported benchmark scores that consistently beat GPT-4. This is a really big deal: in the 12+ months since the GPT-4 release no other model has consistently beat it in this way. It’s exciting to finally see that milestone reached by another research group.

The pricing model here is also really interesting. Prices here are per-million-input-tokens / per-million-output-tokens:

Claude 3 Opus: $15 / $75
Claude 3 Sonnet: $3 / $15
Claude 3 Haiku: $0.25 / $1.25

All three models have a 200,000 length context window and support image input in addition to text.

Compare with today’s OpenAI prices:

GPT-4 Turbo (128K): $10 / $30
GPT-4 8K: $30 / $60
GPT-4 32K: $60 / $120
GPT-3.5 Turbo: $0.50 / $1.50

So Opus pricing is comparable with GPT-4, more than GPT-4 Turbo and significantly cheaper than GPT-4 32K... Sonnet is cheaper than all of the GPT-4 models (including GPT-4 Turbo), and Haiku (which has not yet been released to the Claude API) will be cheaper even than GPT-3.5 Turbo.

It will be interesting to see if OpenAI respond with their own price reductions.

Tags: ai, openai, generative-ai, gpt-4, llms, anthropic, claude, vision-llms, llm-pricing