Archive for Saturday, 18th January 2025

Saturday, 18th January 2025

Lessons From Red Teaming 100 Generative AI Products (via) New paper from Microsoft describing their top eight lessons learned red teaming (deliberately seeking security vulnerabilities in) 100 different generative AI models and products over the past few years.

The Microsoft AI Red Team (AIRT) grew out of pre-existing red teaming initiatives at the company and was officially established in 2018. At its conception, the team focused primarily on identifying traditional security vulnerabilities and evasion attacks against classical ML models.

Lesson 2 is "You don't have to compute gradients to break an AI system" - the kind of attacks they were trying against classical ML models turn out to be less important against LLM systems than straightforward prompt-based attacks.

They use a new-to-me acronym for prompt injection, "XPIA":

Imagine we are red teaming an LLM-based copilot that can summarize a user’s emails. One possible attack against this system would be for a scammer to send an email that contains a hidden prompt injection instructing the copilot to “ignore previous instructions” and output a malicious link. In this scenario, the Actor is the scammer, who is conducting a cross-prompt injection attack (XPIA), which exploits the fact that LLMs often struggle to distinguish between system-level instructions and user data.

From searching around it looks like that specific acronym "XPIA" is used within Microsoft's security teams but not much outside of them. It appears to be their chosen acronym for indirect prompt injection, where malicious instructions are smuggled into a vulnerable system by being included in text that the system retrieves from other sources.

Tucked away in the paper is this note, which I think represents the core idea necessary to understand why prompt injection is such an insipid threat:

Due to fundamental limitations of language models, one must assume that if an LLM is supplied with untrusted input, it will produce arbitrary output.

When you're building software against an LLM you need to assume that anyone who can control more than a few sentences of input to that model can cause it to output anything they like - including tool calls or other data exfiltration vectors. Design accordingly.

# 6:13 pm / microsoft, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks

DeepSeek API Docs: Rate Limit. This is surprising: DeepSeek offer the only hosted LLM API I've seen that doesn't implement rate limits:

DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request.

However, please note that when our servers are under high traffic pressure, your requests may take some time to receive a response from the server.

Want to run a prompt against 10,000 items? With DeepSeek you can theoretically fire up 100s of parallel requests and crunch through that data in almost no time at all.

As more companies start building systems that rely on LLM prompts for large scale data extraction and manipulation I expect high rate limits will become a key competitive differentiator between the different platforms.

# 6:24 pm / rate-limiting, ai, generative-ai, llms, deepseek, ai-in-china

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Simon Willison’s Weblog

Saturday, 18th January 2025