Simon Willison’s Weblog

Subscribe

18th February 2024 - Link Blog

Representation Engineering: Mistral-7B on Acid (via) Theia Vogel provides a delightfully clear explanation (and worked examples) of control vectors—a relatively recent technique for influencing the behaviour of an LLM by applying vectors to the hidden states that are evaluated during model inference.

These vectors are surprisingly easy to both create and apply. Build a small set of contrasting prompt pairs—“Act extremely happy” v.s. “Act extremely sad” for example (with a tiny bit of additional boilerplate), then run a bunch of those prompts and collect the hidden layer states. Then use “single-component PCA” on those states to get a control vector representing the difference.

The examples Theia provides, using control vectors to make Mistral 7B more or less honest, trippy, lazy, creative and more, are very convincing.

This is a link post by Simon Willison, posted on 18th February 2024.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe