SQL injection-like attack on LLMs with special tokens. Andrej Karpathy explains something that's been confusing me for the best part of a year:
The decision by LLM tokenizers to parse special tokens in the input string (
<s>
,<|endoftext|>
, etc.), while convenient looking, leads to footguns at best and LLM security vulnerabilities at worst, equivalent to SQL injection attacks.
LLMs frequently expect you to feed them text that is templated like this:
<|user|>\nCan you introduce yourself<|end|>\n<|assistant|>
But what happens if the text you are processing includes one of those weird sequences of characters, like <|assistant|>
? Stuff can definitely break in very unexpected ways.
LLMs generally reserve special token integer identifiers for these, which means that it should be possible to avoid this scenario by encoding the special token as that ID (for example 32001
for <|assistant|>
in the Phi-3-mini-4k-instruct
vocabulary) while that same sequence of characters in untrusted text is encoded as a longer sequence of smaller tokens.
Many implementations fail to do this! Thanks to Andrej I've learned that modern releases of Hugging Face transformers have a split_special_tokens=True
parameter (added in 4.32.0 in August 2023) that can handle it. Here's an example:
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
>>> tokenizer.encode("<|assistant|>")
[32001]
>>> tokenizer.encode("<|assistant|>", split_special_tokens=True)
[529, 29989, 465, 22137, 29989, 29958]
A better option is to use the apply_chat_template() method, which should correctly handle this for you (though I'd like to see confirmation of that).
Recent articles
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025
- Weeknotes: Starting 2025 a little slow - 4th January 2025
- I still don't think companies serve you ads based on spying through your microphone - 2nd January 2025