Entries tagged llms, security

Filters: Type: entry × llms × security × Sorted by date

18 results

The lethal trifecta for AI agents: private data, untrusted content, and external communication

If you are a user of LLM systems that use tools (you can call them “AI agents” if you like) it is critically important that you understand the risk of combining tools with the following three characteristics. Failing to understand this can let an attacker steal your data.

[... 1,324 words]

1:20 pm / 16th June 2025 / security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, model-context-protocol, lethal-trifecta

An Introduction to Google’s Approach to AI Agent Security

Here’s another new paper on AI agent security: An Introduction to Google’s Approach to AI Agent Security, by Santiago Díaz, Christoph Kern, and Kara Olive.

[... 2,064 words]

5:28 am / 15th June 2025 / google, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, paper-review, agent-definitions

Design Patterns for Securing LLM Agents against Prompt Injections

This new paper by 11 authors from organizations including IBM, Invariant Labs, ETH Zurich, Google and Microsoft is an excellent addition to the literature on prompt injection and LLM security.

[... 1,795 words]

1:26 pm / 13th June 2025 / design-patterns, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents, paper-review

CaMeL offers a promising new direction for mitigating prompt injection attacks

In the two and a half years that we’ve been talking about prompt injection attacks I’ve seen alarmingly little progress towards a robust solution. The new paper Defeating Prompt Injections by Design from Google DeepMind finally bucks that trend. This one is worth paying attention to.

[... 2,052 words]

8:50 pm / 11th April 2025 / google, python, security, ai, prompt-injection, generative-ai, llms, paper-review

Model Context Protocol has prompt injection security problems

As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built on that protocol are starting to come into focus.

[... 1,559 words]

12:59 pm / 9th April 2025 / security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, llm-tool-use, ai-agents, model-context-protocol

ChatGPT Canvas can make API requests now, but it’s complicated

Today’s 12 Days of OpenAI release concerned ChatGPT Canvas, a new ChatGPT feature that enables ChatGPT to pop open a side panel with a shared editor in it where you can collaborate with ChatGPT on editing a document or writing code.

[... 1,116 words]

9:49 pm / 10th December 2024 / python, security, usability, ai, webassembly, pyodide, openai, prompt-injection, generative-ai, chatgpt, llms, claude-artifacts, cors

Claude’s API now supports CORS requests, enabling client-side applications

Anthropic have enabled CORS support for their JSON APIs, which means it’s now possible to call the Claude LLMs directly from a user’s browser.

[... 625 words]

2:29 am / 23rd August 2024 / apis, javascript, projects, security, ai, generative-ai, llms, ai-assisted-programming, anthropic, claude, cors

Thoughts on the WWDC 2024 keynote on Apple Intelligence

Today’s WWDC keynote finally revealed Apple’s new set of AI features. The AI section (Apple are calling it Apple Intelligence) started over an hour into the keynote—this link jumps straight to that point in the archived YouTube livestream, or you can watch it embedded here:

[... 855 words]

8:19 pm / 10th June 2024 / apple, ethics, privacy, security, trust, ai, openai, prompt-injection, generative-ai, chatgpt, llms, apple-intelligence, ai-ethics

Prompt injection and jailbreaking are not the same thing

I keep seeing people use the term “prompt injection” when they’re actually talking about “jailbreaking”.

[... 1,157 words]

4:05 pm / 5th March 2024 / jailbreaking, security, ai, prompt-injection, generative-ai, llms, semantic-diffusion

Recommendations to help mitigate prompt injection: limit the blast radius

I’m in the latest episode of RedMonk’s Conversation series, talking with Kate Holterhoff about the prompt injection class of security vulnerabilities: what it is, why it’s so dangerous and why the industry response to it so far has been pretty disappointing.

[... 539 words]

8:34 pm / 20th December 2023 / podcasts, security, ai, prompt-injection, generative-ai, llms, exfiltration-attacks, podcast-appearances

Prompt injection explained, November 2023 edition

A neat thing about podcast appearances is that, thanks to Whisper transcriptions, I can often repurpose parts of them as written content for my blog.

[... 1,357 words]

3:55 am / 27th November 2023 / data-journalism, podcasts, security, ai, prompt-injection, generative-ai, llms, podcast-appearances

Delimiters won’t save you from prompt injection

Prompt injection remains an unsolved problem. The best we can do at the moment, disappointingly, is to raise awareness of the issue. As I pointed out last week, “if you don’t understand it, you are doomed to implement it.”

[... 1,010 words]

3:51 pm / 11th May 2023 / security, ai, openai, prompt-engineering, prompt-injection, generative-ai, llms, andrew-ng

Prompt injection explained, with video, slides, and a transcript

I participated in a webinar this morning about prompt injection, organized by LangChain and hosted by Harrison Chase, with Willem Pienaar, Kojin Oshiba (Robust Intelligence), and Jonathan Cohen and Christopher Parisien (Nvidia Research).

[... 3,120 words]

8:22 pm / 2nd May 2023 / security, my-talks, ai, prompt-engineering, prompt-injection, generative-ai, llms, annotated-talks, exfiltration-attacks

The Dual LLM pattern for building AI assistants that can resist prompt injection

I really want an AI assistant: a Large Language Model powered chatbot that can answer questions and perform actions for me based on access to my private data and tools.

[... 2,632 words]

7 pm / 25th April 2023 / security, ai, prompt-engineering, prompt-injection, generative-ai, llms, exfiltration-attacks, ai-agents

Prompt injection: What’s the worst that can happen?

Activity around building sophisticated applications on top of LLMs (Large Language Models) such as GPT-3/4/ChatGPT/etc is growing like wildfire right now.

[... 2,302 words]

5:35 pm / 14th April 2023 / security, ai, openai, prompt-engineering, prompt-injection, generative-ai, chatgpt, llms, exfiltration-attacks, ai-agents

You can’t solve AI security problems with more AI

One of the most common proposed solutions to prompt injection attacks (where an AI language model backed system is subverted by a user injecting malicious input—“ignore previous instructions and do this instead”) is to apply more AI to the problem.

[... 1,288 words]

10:57 pm / 17th September 2022 / security, ai, gpt-3, openai, prompt-engineering, prompt-injection, generative-ai, llms

I don’t know how to solve prompt injection

Some extended thoughts about prompt injection attacks against software built on top of AI language models such a GPT-3. This post started as a Twitter thread but I’m promoting it to a full blog entry here.

[... 581 words]

4:28 pm / 16th September 2022 / security, ai, openai, prompt-engineering, prompt-injection, generative-ai, llms, glyph

Prompt injection attacks against GPT-3

Riley Goodside, yesterday:

[... 1,457 words]

10:20 pm / 12th September 2022 / security, sql-injection, ai, gpt-3, openai, prompt-engineering, prompt-injection, generative-ai, riley-goodside, llms

Simon Willison’s Weblog