Simon Willison’s Weblog

Sunday, 10th September 2023

promptfoo: How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs. promptfoo is a CLI and library for “evaluating LLM output quality”. This tutorial in their documentation about using it to compare Llama 2 to gpt-3.5-turbo is a good illustration of how it works: it uses YAML files to configure the prompts, and more YAML to define assertions such as “not-icontains: AI language model”.
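
To make that assertion concrete, here is a rough Python sketch of the check I understand "not-icontains" to describe: the output passes only if it does not contain the given string, ignoring case. This is not promptfoo's own code. The function name `not_icontains` and the two sample outputs are made up for illustration.

```python
def not_icontains(output: str, value: str) -> bool:
    """Return True when `value` does not appear in `output`, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return value.casefold() not in output.casefold()


# Hypothetical model outputs, invented for this example (not real benchmark data).
outputs = {
    "refusal": "As an AI language model, I cannot help with that.",
    "answer": "Preheat the oven to 180 C and bake for 25 minutes.",
}

# Run the same assertion against each output, as a test runner would.
results = {
    name: not_icontains(text, "AI language model")
    for name, text in outputs.items()
}

for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

The appeal of the real tool is that you declare checks like this in YAML and run them against several models at once, rather than hand-writing the string comparison yourself.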

# 4:19 pm / testing, llms, ai, generative-ai

The AI-assistant wars heat up with Claude Pro, a new ChatGPT Plus rival. I'm quoted in this piece about the new Claude Pro $20/month subscription from Anthropic:

Willison has also run into problems with Claude's morality filter, which has caused him trouble by accident: "I tried to use it against a transcription of a podcast episode, and it processed most of the text before—right in front of my eyes—it deleted everything it had done! I eventually figured out that they had started talking about bomb threats against data centers towards the end of the episode, and Claude effectively got triggered by that and deleted the entire transcript."

# 5:07 pm / claude, generative-ai, arstechnica, ai, llms, anthropic

All models on Hugging Face, sorted by downloads. I realized this morning that “sort by downloads” against the list of all of the models on Hugging Face can work as a reasonably good proxy for “which of these models are easiest to get running on your own computer”.

# 5:24 pm / machine-learning, ai, hugging-face
