15 posts tagged “drew-breunig”
2025
If Claude is asked to count words, letters, and characters, it thinks step by step before answering the person. It explicitly counts the words, letters, or characters by assigning a number to each. It only answers the person once it has performed this explicit counting step. [...]
If Claude is shown a classic puzzle, before proceeding, it quotes every constraint or premise from the person’s message word for word before inside quotation marks to confirm it’s not dealing with a new variant. [...]
If asked to write poetry, Claude avoids using hackneyed imagery or metaphors or predictable rhyming schemes.
— Claude's system prompt, via Drew Breunig
DuckDB is Probably the Most Important Geospatial Software of the Last Decade. Drew Breunig argues that the ease of installation of DuckDB is opening up geospatial analysis to a whole new set of developers.
This inspired a comment on Hacker News from DuckDB Labs geospatial engineer Max Gabrielsson which helps explain why the drop in friction introduced by DuckDB is so significant:
I think a big part is that duckdbs spatial extension provides a SQL interface to a whole suite of standard foss gis packages by statically bundling everything (including inlining the default PROJ database of coordinate projection systems into the binary) and providing it for multiple platforms (including WASM). I.E there are no transitive dependencies except libc.
[...] the fact that you can e.g. convert too and from a myriad of different geospatial formats by utilizing GDAL, transforming through SQL, or pulling down the latest overture dump without having the whole workflow break just cause you updated QGIS has probably been the main killer feature for a lot of the early adopters.
I've lost count of the time I've spent fiddling with dependencies like GDAL trying to get various geospatial tools to work in the past. Bundling difficult dependencies statically is an under-appreciated trick!
If the bold claim in the headline inspires you to provide a counter-example, bear in mind that a decade ago is 2015, and most of the key technologies in the modern geospatial stack - QGIS, PostGIS, geopandas, SpatiaLite - predate that by quite a bit.
The first generation of AI-powered products (often called “AI Wrapper” apps, because they “just” are wrapped around an LLM API) were quickly brought to market by small teams of engineers, picking off the low-hanging problems. But today, I’m seeing teams of domain experts wading into the field, hiring a programmer or two to handle the implementation, while the experts themselves provide the prompts, data labeling, and evaluations.
For these companies, the coding is commodified but the domain expertise is the differentiator.
— Drew Breunig, The Dynamic Between Domain Experts & Developers Has Shifted
2024
Turning Your Root URL Into a DuckDB Remote Database. Fun idea from Drew Breunig: DuckDB supports attaching existing databases that are accessible over HTTP using their URL. Drew suggests creating vanity URLs using your root domain, detecting the DuckDB user-agent and serving the database file directly - allowing tricks like this one:
ATTACH 'https://steplist.app/' AS steplist;
SELECT * FROM steplist.lists;
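Here is a minimal sketch of what the server side of that trick could look like in Python, using only the standard library. The substring check for "duckdb" in the User-Agent header, the file name steplist.duckdb and the helper name pick_response are all assumptions for illustration; the post summarized above does not show Drew's actual implementation or the exact user-agent string DuckDB sends.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

DB_PATH = "steplist.duckdb"  # hypothetical database file name


def pick_response(user_agent: str) -> str:
    """Return "database" for DuckDB clients and "html" for everything else.

    Assumption: DuckDB's HTTP client sends a User-Agent containing "duckdb".
    Check your own server logs before relying on this.
    """
    return "database" if "duckdb" in (user_agent or "").lower() else "html"


class RootHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the root URL gets the special treatment.
        if self.path != "/":
            self.send_error(404)
            return
        if pick_response(self.headers.get("User-Agent", "")) == "database":
            with open(DB_PATH, "rb") as f:
                body = f.read()
            content_type = "application/octet-stream"
        else:
            body = b"<h1>Welcome</h1>"
            content_type = "text/html"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the sketch server locally; call this yourself to try it out."""
    HTTPServer(("127.0.0.1", port), RootHandler).serve_forever()
```

DuckDB's HTTP file access may also rely on HEAD and range requests, which this sketch does not handle, so a static file server or CDN rule keyed on the user-agent is probably a better fit for anything real.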
Generating Descriptive Weather Reports with LLMs. Drew Breunig produces the first example I've seen in the wild of the new LLM attachments Python API. Drew's Downtown San Francisco Weather Vibes project combines output from a JSON weather API with the latest image from a webcam pointed at downtown San Francisco to produce a weather report "with a style somewhere between Jack Kerouac and J. Peterman".
Here's the Python code that constructs and executes the prompt. The code runs in GitHub Actions.
The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:
- Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
- Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
- Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.
Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”
Wikidata is a Giant Crosswalk File. Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//' to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.
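To make the sed step concrete, here is a rough Python equivalent. It assumes the export is one large JSON array with a single entity per line, each followed by a comma. Dropping the bare "[" and "]" lines is an extra step I have added here, not something the sed command does, and the two sample records are made up.

```python
import io
import json
from typing import Iterable, Iterator


def to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON document per line from an array-style dump.

    Mirrors sed 's/,$//' by dropping a trailing comma from each line.
    Skipping the "[" and "]" wrapper lines is an addition in this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith(","):
            line = line[:-1]
        if line.strip() in ("[", "]", ""):
            continue
        yield line


# Tiny made-up sample in the assumed one-entity-per-line layout.
sample = io.StringIO(
    '[\n{"id": "Q1", "label": "universe"},\n{"id": "Q2", "label": "Earth"}\n]\n'
)
records = [json.loads(row) for row in to_ndjson(sample)]
print(records)
```

Because it is a generator, the same function can stream a very large file line by line without loading it all into memory.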
Conflating Overture Places Using DuckDB, Ollama, Embeddings, and More. Drew Breunig's detailed tutorial on "conflation" - combining different geospatial data sources by de-duplicating address strings such as RESTAURANT LOS ARCOS,3359 FOOTHILL BLVD,OAKLAND,94601 and LOS ARCOS TAQUERIA,3359 FOOTHILL BLVD,OAKLAND,94601.
Drew uses an entirely offline stack based around Python, DuckDB and Ollama and finds that a combination of H3 geospatial tiles and mxbai-embed-large embeddings (though other embedding models should work equally well) gets really good results.
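A toy sketch of how tiles and embeddings can work together: only compare places that share a tile, then flag pairs whose embeddings are close by cosine similarity. This is my own simplified illustration, not the method from Drew's tutorial. The "tile" strings stand in for H3 cell IDs, the three-dimensional vectors are invented rather than produced by mxbai-embed-large, the 0.9 threshold is arbitrary, and FOOTHILL HARDWARE is a made-up place.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def candidate_duplicates(places, threshold=0.9):
    """Return name pairs that share a tile and have similar embeddings."""
    by_tile = defaultdict(list)
    for place in places:
        by_tile[place["tile"]].append(place)
    pairs = []
    # Comparing only within a tile avoids an all-pairs scan of the dataset.
    for group in by_tile.values():
        for left, right in combinations(group, 2):
            if cosine_similarity(left["embedding"], right["embedding"]) >= threshold:
                pairs.append((left["name"], right["name"]))
    return pairs


# Hypothetical data: tile IDs and embeddings are invented for illustration.
places = [
    {"name": "RESTAURANT LOS ARCOS", "tile": "tile-a", "embedding": [0.9, 0.1, 0.4]},
    {"name": "LOS ARCOS TAQUERIA", "tile": "tile-a", "embedding": [0.85, 0.15, 0.45]},
    {"name": "FOOTHILL HARDWARE", "tile": "tile-a", "embedding": [0.1, 0.9, 0.2]},
    {"name": "LOS ARCOS TAQUERIA", "tile": "tile-b", "embedding": [0.85, 0.15, 0.45]},
]
print(candidate_duplicates(places))
```

The copy of LOS ARCOS TAQUERIA in tile-b is never compared against anything in tile-a, which shows both the speed-up and the main limitation of blocking by tile: near-duplicates that straddle a tile boundary need extra handling.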
Towards Standardizing Place. Overture Maps announced General Availability of its global maps datasets last week, covering places, buildings, divisions, and base layers.
Drew Breunig demonstrates how this can be accessed using both the Overture Explorer tool and DuckDB, and talks about Overture's GERS IDs - reminiscent of Who's On First IDs - which provide stable IDs for all kinds of geographic places.
Smaller, Cheaper, Faster, Sober. Drew Breunig highlights the interesting pattern at the moment where the best models are all converging on GPT-4 class capabilities, while competing on speed and price—becoming smaller and faster. This holds for both the proprietary and the openly licensed models.
Will we see a sizable leap in capabilities when GPT-5 class models start to emerge? It’s hard to say for sure: anyone in a position to know that likely works for an AI lab with a multi-billion dollar valuation that hinges on the answer to that question, so they’re not reliable sources of information until the models themselves are revealed.
A Plea for Sober AI. Great piece by Drew Breunig: “Imagine having products THIS GOOD and still over-selling them.”
A POI Database in One Line. Overture Maps offers an extraordinarily useful, freely licensed database of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world.
Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to provide a bounding box to do so.
Drew Breunig came up with this delightful recipe for fetching data using LLM and gpt-3.5-turbo to fill in those bounding boxes:
overturemaps download --bbox=$(llm 'Give me a bounding box for Alameda, California expressed as only four numbers delineated by commas, with no spaces, longitude preceding latitude.') -f geojsonseq --type=place | geojson-to-sqlite alameda.db places - --nl --pk=id
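Since the bounding box comes straight from a language model, it can be worth checking it before handing it to another tool. Here is a small sketch of that check in Python. The function name parse_bbox is hypothetical, and the west,south,east,north ordering is my reading of "longitude preceding latitude" in the prompt, so treat it as an assumption. The coordinates in the example are illustrative, not an authoritative box for Alameda.

```python
def parse_bbox(text: str) -> tuple[float, float, float, float]:
    """Parse "west,south,east,north" and raise ValueError if it looks wrong."""
    parts = text.strip().split(",")
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    # float() raises ValueError itself if the model returned prose.
    west, south, east, north = (float(p) for p in parts)
    if not (-180 <= west < east <= 180):
        raise ValueError("longitudes must satisfy -180 <= west < east <= 180")
    if not (-90 <= south < north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south < north <= 90")
    return west, south, east, north


# Illustrative input only; a real run would use the model's actual output.
print(parse_bbox("-122.33,37.73,-122.22,37.80"))
```

A response that swaps latitude and longitude, or wraps the numbers in a sentence, fails loudly here instead of producing an empty or wrong download.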
Buzzwords describe what you already intuitively know. At once they snap the ‘kaleidoscopic flux of impressions’ in your mind into form, crystallizing them instantly allowing you to both organize your knowledge and recognize you share it with other. This rapid, mental crystallization is what I call the buzzword whiplash. It gives buzzwords more importance and velocity, more power, than they objectively should have.
The potential energy stored within your mind is released by the buzzword whiplash. The buzzword is perceived as important partially because of what it describes but also because of the social and emotional weight felt when the buzzword recognizes your previously wordless experiences and demonstrates that those experiences are shared.
2023
Finding Bathroom Faucets with Embeddings. Absolutely the coolest thing I’ve seen someone build on top of my LLM tool so far: Drew Breunig is renovating a bathroom and needed a way to filter through literally thousands of options for faucet taps. He scraped 20,000 images of fixtures from a plumbing supply site and used LLM to embed every one of them via CLIP... and now he can ask for “faucets that look like this one”, or even run searches for faucets that match “Gawdy” or “Bond Villain” or “Nintendo 64”. Live demo included!
When trying to get your head around a new technology, it helps to focus on how it challenges existing categorizations, conventions, and rule sets. Internally, I’ve always called this exercise, “dealing with the platypus in the room.” Named after the category-defying animal; the duck-billed, venomous, semi-aquatic, egg-laying mammal. [...] AI is the biggest platypus I’ve ever seen. Nearly every notable quality of AI and LLMs challenges our conventions, categories, and rulesets.