Simon Willison’s Weblog

Subscribe

10 items tagged “mistral”

2024

Codestral: Hello, World! Mistral's first code-specific model, trained to be "fluent" in 80 different programming languages.

The weights are released under a new Mistral AI Non-Production License, which is extremely restrictive:

3.2. Usage Limitation

  • You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, Personal, or evaluation purposes in Non-Production Environments;
  • Subject to the foregoing, You shall not supply the Mistral Models or Derivatives in the course of a commercial activity, whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.

To Mistral's credit at least they don't misapply the term "open source" in their marketing around this model - they consistently use the term "open-weights" instead. They also state that they plan to continue using Apache 2 for other model releases.

Codestral can be used commercially when accessed via their paid API. # 30th May 2024, 7:19 am

mistralai/mistral-common. New from Mistral: mistral-common, an open source Python library providing "a set of tools to help you work with Mistral models".

So far that means a tokenizer! This is similar to OpenAI's tiktoken library in that it lets you run tokenization in your own code, which crucially means you can count the number of tokens that you are about to use - useful for cost estimates but also for cramming the maximum allowed tokens in the context window for things like RAG.

Mistral's library is better than tiktoken though, in that it also includes logic for correctly calculating the tokens needed for conversation construction and tool definition. With OpenAI's APIs you're currently left guessing how many tokens are taken up by these advanced features.

Anthropic haven't published any form of tokenizer at all - it's the feature I'd most like to see from them next.

Here's how to explore the vocabulary of the tokenizer:

MistralTokenizer.from_model(
    "open-mixtral-8x22b"
).instruct_tokenizer.tokenizer.vocab()[:12]

['<unk>', '<s>', '</s>', '[INST]', '[/INST]', '[TOOL_CALLS]', '[AVAILABLE_TOOLS]', '[/AVAILABLE_TOOLS]', '[TOOL_RESULTS]', '[/TOOL_RESULTS]'] # 18th April 2024, 12:39 am

Mistral tweet a magnet link for mixtral-8x22b. Another open model release from Mistral using their now standard operating procedure of tweeting out a raw torrent link.

This one is an 8x22B Mixture of Experts model. Their previous most powerful openly licensed release was Mixtral 8x7B, so this one is a whole lot bigger (a 281GB download)—and apparently has a 65,536 context length, at least according to initial rumors on Twitter. # 10th April 2024, 2:31 am

The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year.

[... 697 words]

Mistral Large. Mistral Medium only came out two months ago, and now it’s followed by Mistral Large. Like Medium, this new model is currently only available via their API. It scores well on benchmarks (though not quite as well as GPT-4) but the really exciting feature is function support, clearly based on OpenAI’s own function design.

Functions are now supported via the Mistral API for both Mistral Large and the new Mistral Small, described as follows: “Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model.” # 26th February 2024, 11:23 pm

Representation Engineering: Mistral-7B on Acid (via) Theia Vogel provides a delightfully clear explanation (and worked examples) of control vectors—a relatively recent technique for influencing the behaviour of an LLM by applying vectors to the hidden states that are evaluated during model inference.

These vectors are surprisingly easy to both create and apply. Build a small set of contrasting prompt pairs—“Act extremely happy” v.s. “Act extremely sad” for example (with a tiny bit of additional boilerplate), then run a bunch of those prompts and collect the hidden layer states. Then use “single-component PCA” on those states to get a control vector representing the difference.

The examples Theia provides, using control vectors to make Mistral 7B more or less honest, trippy, lazy, creative and more, are very convincing. # 18th February 2024, 3:49 am

Mixtral of Experts. The Mixtral paper is out, exactly a month after the release of the Mixtral 8x7B model itself. Thanks to the paper I now have a reasonable understanding of how a mixture of experts model works: each layer has 8 available blocks, but a router model selects two out of those eight for each token passing through that layer and combines their output. “As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference.”

The Mixtral token context size is an impressive 32k, and it compares extremely well against the much larger Llama 70B across a whole array of benchmarks.

Unsurprising but disappointing: there’s nothing in the paper at all about what it was trained on. # 9th January 2024, 4:03 am

2023

Many options for running Mistral models in your terminal using LLM

Visit Many options for running Mistral models in your terminal using LLM

Mistral AI is the most exciting AI research lab at the moment. They’ve now released two extremely powerful smaller Large Language Models under an Apache 2 license, and have a third much larger one that’s available via their API.

[... 2063 words]

Mixtral of experts (via) Mistral have firmly established themselves as the most exciting AI lab outside of OpenAI, arguably more exciting because much of their work is released under open licenses.

On December 8th they tweeted a link to a torrent, with no additional context (a neat marketing trick they’ve used in the past). The 87GB torrent contained a new model, Mixtral-8x7b-32kseqlen—a Mixture of Experts.

Three days later they published a full write-up, describing “Mixtral 8x7B, a high-quality sparse mixture of experts model (SMoE) with open weights”—licensed Apache 2.0.

They claim “Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference”—and that it outperforms GPT-3.5 on most benchmarks too.

This isn’t even their current best model. The new Mistral API platform (currently on a waitlist) refers to Mixtral as “Mistral-small” (and their previous 7B model as “Mistral-tiny”—and also provides access to a currently closed model, “Mistral-medium”, which they claim to be competitive with GPT-4. # 11th December 2023, 5:20 pm

The EU AI Act now proposes to regulate “foundational models”, i.e. the engine behind some AI applications. We cannot regulate an engine devoid of usage. We don’t regulate the C language because one can use it to develop malware. Instead, we ban malware and strengthen network systems (we regulate usage). Foundational language models provide a higher level of abstraction than the C language for programming computer systems; nothing in their behaviour justifies a change in the regulatory framework.

Arthur Mensch, Mistral AI # 16th November 2023, 11:29 am