First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)
4th December 2024
Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro.
I built a new LLM plugin called llm-bedrock for accessing the models in the terminal via boto3 and the Amazon Bedrock API.
My initial impressions from trying out the models are that they’re mainly competitive with the Google Gemini family. They are extremely inexpensive—Nova Micro slightly undercuts even previously cheapest model Gemini 1.5 Flash-8B—can handle quite large context and the two larger models can handle images, video and PDFs.
- Pricing and capabilities
- Using Nova from the terminal with LLM
- Multi-modal prompting with images and documents
- Closing thoughts: GAMOA
Pricing and capabilities
Amazon list their pricing in price per 1,000 input tokens. Almost everyone else uses price per million, so I’ve done the conversion.
Here’s a table comparing the least expensive models from the largest providers:
Provider | Model | Cents per million input | Cents per million output |
---|---|---|---|
OpenAI | GPT-4o Mini | 15 | 60 |
Anthropic | Claude 3 Haiku | 25 | 125 |
Anthropic | Claude 3.5 Haiku | 80 | 400 |
Gemini 1.5 Flash-8B | 3.75 | 15 | |
Gemini 1.5 Flash | 7.5 | 30 | |
Amazon | Nova Micro | 3.5 | 14 |
Amazon | Nova Lite | 6 | 24 |
As you can see, Nova Micro is .25 of a cent cheaper on input and 1 cent cheaper on output than previous champion Gemini 1.5 Flash-8B.
And the more expensive models:
Provider | Model | Cents per million input | Cents per million output |
---|---|---|---|
OpenAI | GPT-4o | 250 | 1000 |
OpenAI | o1-mini | 300 | 1200 |
OpenAI | o1-preview | 1500 | 6000 |
Anthropic | Claude 3.5 Sonnet | 300 | 1500 |
Anthropic | Claude 3 Opus | 1500 | 7500 |
Gemini 1.5 Pro | 125 | 500 | |
Amazon | Nova Pro | 80 | 320 |
So Nova Pro isn’t priced the same as those top-tier models, in fact it’s priced about the same as Claude 3.5 Haiku (after their new 20% price drop).
The best model in the Nova family is still to come—from their post:
Amazon Nova Premier – Our most capable multimodal model for complex reasoning tasks and for use as the best teacher for distilling custom models. Amazon Nova Premier is still in training. We’re targeting availability in early 2025.
And from the press release, which presumably is talking about Nova Premier as well:
In early 2025, Amazon will support context length of over 2M input tokens.
Comparing models based on token pricing doesn’t capture the entire picture because models use different tokenizers and hence may use up different numbers of tokens for the same input, especially with images and video added to the mix. I’d like to implement a good way to compare those counts in the future.
In terms of capabilities: all of the above models can handle image inputs now, with the exception of Amazon Nova Micro and o1-mini and o1-preview which are text only. Video support is currently unique to the Gemini and Nova models—Anthropic and OpenAI don’t have any that handle video yet. See Amazon Nova video understanding limitations for more details.
Nova models can’t handle audio, and in fact ignore audio in video that they analyze (treating it as a sequence of images). Gemini models have audio input, as does OpenAI’s gpt-4o-audio-preview
model ($100/$200 for input/output of those audio tokens).
Nova mini has a 128,000 input token limit, while Lite and Pro are both 300,000 tokens. This isn’t quite in the same league as Gemini (2 million for Pro and Flash, 1 million for Flash-8B) but beats the standard models from both OpenAI and Anthropic. Anthropic have an “Enterprise” tier with 500,000 but I’ve not had a chance to try that myself.
Using Nova from the terminal with LLM
Install LLM, then install the plugin like this:
llm install llm-bedrock
Next comes the hard part: you’ll need AWS credentials that have permission to access Bedrock models on your behalf.
If you’ve previously configured the AWS CLI you may be able to use a shortcut: request access to the Bedrock models (Nova models are approved instantly) and the tool should be able to pick up your existing credentials.
I got stuck on this for a while, then Nils Durner came to the rescue with step-by-step instructions (17 steps in total, screenshots included) for requesting access to the models, creating an IAM user, creating a JSON IAM policy that allows access to the bedrock:
actions, assigning that policy to the new user and then creating and storing an access key and access secret for calling the API.
Now that you have those credentials, you can provide them to LLM by running:
llm keys set bedrock
And then pasting in the access_key:secret_access_key
—the two keys with a colon in the middle.
Run llm models
to confirm that the models are installed correctly—you should see output that ends like this:
Bedrock: us.amazon.nova-micro-v1:0 (aliases: nova-micro)
Bedrock: us.amazon.nova-lite-v1:0 (aliases: nova-lite)
Bedrock: us.amazon.nova-pro-v1:0 (aliases: nova-pro)
And now you can run prompts! A simple text prompt looks like this—I’m setting an optional system prompt too:
llm -m nova-micro 'Three ways to impress a pelican' -s 'short sentences'
- Offer a fish.
- Dance like a clown.
- Wear bright colors.
That’s using nova-micro
, the cheapest and fastest model. Trying that again with -m nova-pro
(the best model) gave me:
- Offer fresh fish.
- Perform mimicry of pelican calls.
- Create a calm, respectful environment.
Multi-modal prompting with images and documents
My plugin supports LLM attachments, so you can attach images, video and PDF files to a prompt using the -a
option which accepts either a URL or a file path.
Trying against my standard low-resolution pelican group photo:
llm -m nova-lite 'describe image' \
-a https://static.simonwillison.net/static/2024/pelicans.jpg -u
The -u
option there outputs token usage. I got back:
The image depicts a large gathering of birds on a rocky cliff adjacent to a body of water, likely a sea or a large lake. The birds, primarily pelicans, appear to be resting and socializing in a tightly packed group. Some of the birds are standing while others are perched on the rocks, and a few are near the edge of the cliff, possibly preparing to dive into the water below. The cliff is rocky and has a rugged appearance, with some areas showing signs of erosion or weathering. The water beneath the birds is calm and appears to be a deep blue color, reflecting the clear sky above. The birds’ feathers are visible in various shades of brown, gray, and white, adding to the natural and picturesque scene.
Token usage: 1,668 input, 148 output
Using my LLM pricing calculator (now upgraded to support Nova) I get 0.0136 cents—that’s just over 1/100th of a cent!
For the same prompt using -m nova-pro
for the more expensive Nova Pro model:
The image depicts a large gathering of birds on a rocky shore or beach, adjacent to a body of water. The scene is bustling with a variety of birds, primarily pelicans, which are identifiable by their large size, long necks, and distinctive beaks. The pelicans are positioned both standing and sitting on the rocks, with some standing tall and others hunched over.
In addition to pelicans, there are numerous smaller birds, likely seagulls, characterized by their gray and white plumage. These birds are densely packed together, creating a vivid scene of avian life. The seagulls are scattered throughout the rocky area, some standing, others sitting, and a few engaged in various activities such as preening their feathers or searching for food.
The rocky shore is uneven and covered with a mix of pebbles and larger stones, providing a natural habitat for the birds. The body of water in the background appears calm, with a reflection of the sky and the rocky shore. The water’s surface is smooth, indicating little to no wind.
The sky above is clear, suggesting fair weather conditions. The lighting in the image suggests it is daytime, likely in the early morning or late afternoon, given the soft light and the absence of harsh shadows.
In summary, the image captures a lively scene of wildlife, with a focus on pelicans and seagulls congregating on a rocky shore next to a calm body of water under a clear sky. The diverse bird population and their interactions with the environment create a dynamic and engaging visual experience.
Token usage: 1,668 input, 305 output
That came in at a whopping 0.231 cents—nearly a quarter of a cent!
Let’s try something absurdly ambitious. I ran some random Google searches for government military PDFs and stumbled across this one. Let’s convert that to Markdown:
llm -u -m nova-pro 'convert this document to markdown' \
-a https://www.darpa.mil/attachments/DARPA_FY18_Presidents_Budget_Request.pdf
You can see the result here. It stopped around page 9 and missed some of the tables, but this was an absurdly challenging document so I’m not surprised it didn’t manage a perfect conversion. It cost me 176,386 input, 4,283 output which is a full 15.4814 cents.
Finally, let’s see if it can code:
llm -m nova-pro 'python function to domnload a CSV file from a URL, create a corresponding table in SQLite and import the data'
24 input, 616 output = 0.0087 cents.
Full output here, but this is the code it wrote:
import requests import csv import sqlite3 import os from io import StringIO def download_csv_and_import_to_sqlite(url, db_name, table_name): # Download the CSV file from the URL response = requests.get(url) response.raise_for_status() # Ensure we notice bad responses... # Read the CSV data csv_data = response.content.decode('utf-8') csv_reader = csv.reader(StringIO(csv_data)) # Get the header (column names) from the CSV header = next(csv_reader) # Create the SQLite database and connect to it conn = sqlite3.connect(db_name) cursor = conn.cursor() # Create the table create_table_query = f"CREATE TABLE IF NOT EXISTS {table_name} (" create_table_query += ",".join([f"{col} TEXT" for col in header]) create_table_query += ")" cursor.execute(create_table_query) # Insert the data into the table insert_query = f"INSERT INTO {table_name} VALUES ({','.join(['?' for _ in header])})" for row in csv_reader: cursor.execute(insert_query, row) # Commit the changes and close the connection conn.commit() conn.close() print(f"Data successfully imported into {db_name} in table {table_name}.") # Example usage url = "https://datasette.io/content/stats.csv" db_name = "example.db" table_name = "data_table" download_csv_and_import_to_sqlite(url, db_name, table_name)
That’s spot on. I may have to retire that benchmark question, it’s not enough of a challenge for modern models.
Can they produce an SVG of a pelican on a bicycle? Not very well.
Closing thoughts: GAMOA
My experiments here aren’t particularly comprehensive—I look forward from hearing from people who have better challenges and a more disciplined way of applying them. LiveBench have some useful results here already.
As a general initial vibe check though these Nova models pass with flying colors.
- Google Gemini now has competition in terms of pricing. This is a relief to me, Gemini is so cheap I’ve been nervous that they’ll jack the prices up through lack of competition!
- These appear to be good multi-modal models—their image handling feels solid and it’s neat to have a new video-input model (even if it’s quite limited compared to Gemini)
- Anthropic’s decision to raise the price for Claude 3.5 Haiku isn’t looking great, given the continued competition at the bottom end of the market. Just a few months ago Claude 3 Haiku was the cheapest model available, now it’s hardly worth considering, undercut by even GPT-4o mini.
- Gaining API access to anything on AWS remains horrifyingly difficult—seriously, 17 steps! Some day a PM at AWS is going to figure out that fixing this will have a material impact on Amazon’s bottom line.
-
From the Nova press release in the what’s next section:
Amazon will introduce an Amazon Nova speech-to-speech model in the first quarter of 2025. [...] Amazon is also developing a novel model that can take text, images, audio, and video as input, and generate outputs in any of these modalities. This Amazon Nova model with native multimodal-to-multimodal – or “any-to-any” modality capabilities – will be introduced mid-year 2025.
This reminded me that OpenAI’s GPT-4o was meant to be able to provide image output, which they demonstrated at their launch and as far as I’m aware have not mentioned since. Maybe this will encourage them to finally ship that feature!
- The cost of multi-modal models—first Gemini, now Nova—continues to leave me almost speechless. At 0.0136 cents per image I could generate a description of every photo in my 67,771 personal photo library using Nova Mini for $9.21.
With this release I think Amazon may have earned a spot among the top tier of model providers. Maybe we need a new FAANG acronym that covers OpenAI, Anthropic, Google, Meta and Amazon. I like GAMOA.
More recent articles
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025
- Weeknotes: Starting 2025 a little slow - 4th January 2025
- I still don't think companies serve you ads based on spying through your microphone - 2nd January 2025