Simon Willison’s Weblog

Subscribe

Braggoscope Prompts. Matt Webb's Braggoscope (previously) is an alternative way to browse the archive's of the BBC's long-running radio series In Our Time, including the ability to browse by Dewey Decimal library classification, view related episodes and more.

Matt used an LLM to generate the structured data for the site, based on the episode synopsis on the BBC's episode pages like this one.

The prompts he used for this are now described on this new page on the site.

Of particular interest is the way the Dewey Decimal classifications are derived. Quoting an extract from the prompt:

- Provide a Dewey Decimal Classification code, label, and reason for the classification.

- Reason: summarise your deduction process for the Dewey code, for example considering the topic and era of history by referencing lines in the episode description. Bias towards the main topic of the episode which is at the beginning of the description.

- Code: be as specific as possible with the code, aiming to give a second level code (e.g. "510") or even lower level (e.g. "510.1"). If you cannot be more specific than the first level (e.g. "500"), then use that.

Return valid JSON conforming to the following Typescript type definition:

{
    "dewey_decimal": {"reason": string, "code": string, "label": string}
}

That "reason" key is essential, even though it's not actually used in the resulting project. Matt explains why:

It gives the AI a chance to generate tokens to narrow down the possibility space of the code and label that follow (the reasoning has to appear before the Dewey code itself is generated).

Here's a relevant note from OpenAI's new structured outputs documentation:

When using Structured Outputs, outputs will be produced in the same order as the ordering of keys in the schema.

That's despite JSON usually treating key order as undefined. I think OpenAI designed the feature to work this way precisely to support the kind of trick Matt is using for his Dewey Decimal extraction process.