Simon Willison’s Weblog

Subscribe

Saturday, 4th October 2025

Let the LLM Write the Prompts: An Intro to DSPy in Compound Al Pipelines. I've had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation I've seen yet of the kinds of problems it can help solve.

Here's Drew's written version of the talk.

Drew works on Overture Maps, which combines Point Of Interest data from numerous providers to create a single unified POI database. This is an example of conflation, a notoriously difficult task in GIS where multiple datasets are deduped and merged together.

Drew uses an inexpensive local model, Qwen3-0.6B, to compare 70 million addresses and identity matches, for example between Place(address="3359 FOOTHILL BLVD", name="RESTAURANT LOS ARCOS") and Place(address="3359 FOOTHILL BLVD", name="Los Arcos Taqueria"').

DSPy's role is to optimize the prompt used for that smaller model. Drew used GPT-4.1 and the dspy.MIPROv2 optimizer, producing a 700 token prompt that increased the score from 60.7% to 82%.

Determine if two points of interest refer to the same place. Arrow to optimized prompt: Given two records representing places or businesses-each with at least a name and address-analyze the information and determine if they refer to the same real-world entity. Consider minor differences such as case, diacritics, transliteration, abbreviations, or formatting as potential matches if both the name and address are otherwise strongly similar. Only output "True" if both fields are a close match; if there are significant differences in either the name or address, even if one field matches exactly, output "False". Your decision should be robust to common variations and errors and should work across multiple languages and scripts.

Why bother? Drew points out that having a prompt optimization pipeline makes it trivial to evaluate and switch to other models if they can score higher with a custom optimized prompt - without needing to execute that trial-and-error optimization by hand.

# 10:48 pm / geospatial, gis, ai, prompt-engineering, generative-ai, llms, drew-breunig, overture, dspy

2025 » October

MTWTFSS
  12345
6789101112
13141516171819
20212223242526
2728293031