Simon Willison’s Weblog


Tuesday, 16th April 2024

The saddest part about it, though, is that the garbage books don’t actually make that much money either. It’s even possible to lose money generating your low-quality ebook to sell on Kindle for $0.99. The way people make money these days is by teaching students the process of making a garbage ebook. It’s grift and garbage all the way down — and the people who ultimately lose out are the readers and writers who love books.

Constance Grady # 11:31 pm

Google NotebookLM Data Exfiltration (via) NotebookLM is a Google Labs product that lets you store information as sources (mainly text files in PDF) and then ask questions against those sources—effectively an interface for building your own custom RAG (Retrieval Augmented Generation) chatbots.

Unsurprisingly for anything that allows LLMs to interact with untrusted documents, it’s susceptible to prompt injection.

Johann Rehberger found some classic prompt injection exfiltration attacks: you can create source documents with instructions that cause the chatbot to load a Markdown image that leaks other private data to an external domain as data passed in the query string.

Johann reported this privately in the December but the problem has not yet been addressed. UPDATE: The NotebookLM team deployed a fix for this on 18th April.

A good rule of thumb is that any time you let LLMs see untrusted tokens there is a risk of an attack like this, so you should be very careful to avoid exfiltration vectors like Markdown images or even outbound links. # 9:28 pm

Permissions have three moving parts, who wants to do it, what do they want to do, and on what object. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon?

wkirby on Hacker News # 7:49 pm

inline-snapshot. I'm a big fan of snapshot testing, where expected values are captured the first time a test suite runs and then asserted against in future runs. It's a very productive way to build a robust test suite.

inline-snapshot by Frank Hoffmann is a particularly neat implementation of the pattern. It defines a snapshot() function which you can use in your tests:

assert 1548 * 18489 == snapshot()

When you run that test using pytest --inline-snapshot=create the snapshot() function will be replaced in your code (using AST manipulation) with itself wrapping the repr() of the expected result:

assert 1548 * 18489 == snapshot(28620972)

If you modify the code and need to update the tests you can run pytest --inline-snapshot=fix to regenerate the recorded snapshot values. # 4:04 pm