We enhanced the ability of the upgraded Claude 3.5 Sonnet and Claude 3.5 Haiku to recognize and resist prompt injection attempts. Prompt injection is an attack where a malicious user feeds instructions to a model that attempt to change its originally intended behavior. Both models are now better able to recognize adversarial prompts from a user and behave in alignment with the system prompt. We constructed internal test sets of prompt injection attacks and specifically trained on adversarial interactions.
With computer use, we recommend taking additional precautions against the risk of prompt injection, such as using a dedicated virtual machine, limiting access to sensitive data, restricting internet access to required domains, and keeping a human in the loop for sensitive tasks.
Recent articles
- Six short video demos of LLM and Datasette projects - 22nd January 2025
- DeepSeek-R1 and exploring DeepSeek-R1-Distill-Llama-8B - 20th January 2025
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025