Representation Engineering: Mistral-7B on Acid (via) Theia Vogel provides a delightfully clear explanation (and worked examples) of control vectors—a relatively recent technique for influencing the behaviour of an LLM by applying vectors to the hidden states that are evaluated during model inference.
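To make the "applying vectors to the hidden states" part concrete, here is a minimal Python sketch of that idea using a plain PyTorch forward hook on one Mistral decoder layer, not Theia's repeng library. The model name, layer index, steering strength, prompt format and the happy_vs_sad.pt file are all illustrative assumptions; the second sketch, after the next paragraph, shows one way such a vector could be produced.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed model; any Mistral-7B variant would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 15        # which decoder layer to steer (an arbitrary choice)
strength = 1.5        # positive pushes towards "happy", negative towards "sad"
control_vector = torch.load("happy_vs_sad.pt")  # hypothetical saved vector, shape (hidden_size,)

def steering_hook(module, inputs, output):
    # Mistral decoder layers return a tuple whose first element is the hidden
    # state, shape (batch, seq_len, hidden_size). Add the scaled control
    # vector to every token position and pass the result onwards.
    hidden = output[0] + strength * control_vector.to(output[0].device, output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)

prompt = "[INST] Tell me about your day. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

handle.remove()  # remove the hook to get the unmodified model back
```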
These vectors are surprisingly easy to both create and apply. Build a small set of contrasting prompt pairs—“Act extremely happy” vs. “Act extremely sad” for example (with a tiny bit of additional boilerplate)—then run a bunch of those prompts and collect the hidden layer states. Then use “single-component PCA” on those states to get a control vector representing the difference.
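Here is a matching sketch of how such a vector could be derived, again as an illustration rather than repeng's actual implementation: run each contrasting pair through the model, grab the hidden state at the chosen layer, and fit a single-component PCA to the differences. The prompt template, the shared suffixes and the decision to use the final token's hidden state are all assumptions made for this example.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # same assumed model as above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 15  # read hidden states from the same layer we plan to steer

# Shared suffixes so the only difference within each pair is the persona.
suffixes = [
    "Tell me about your day.",
    "Describe your plans for the weekend.",
    "What do you think about the weather?",
    "Write a short story about a dog.",
]
happy = [f"[INST] Act extremely happy. {s} [/INST]" for s in suffixes]
sad = [f"[INST] Act extremely sad. {s} [/INST]" for s in suffixes]

def last_token_state(prompt: str) -> np.ndarray:
    # Hidden state of the final token at layer_idx.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so index layer_idx + 1.
    return out.hidden_states[layer_idx + 1][0, -1].float().numpy()

# One difference vector per contrasting pair.
diffs = np.stack([last_token_state(h) - last_token_state(s) for h, s in zip(happy, sad)])

# "Single-component PCA": the first principal component of the differences is
# the direction that best captures the happy/sad contrast. Its sign is
# arbitrary, so the vector may need negating.
pca = PCA(n_components=1)
pca.fit(diffs)
control_vector = torch.tensor(pca.components_[0])

torch.save(control_vector, "happy_vs_sad.pt")
```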
The examples Theia provides, using control vectors to make Mistral 7B more or less honest, trippy, lazy, creative and more, are very convincing.