Simon Willison’s Weblog

Subscribe
Atom feed for drew-breunig

12 items tagged “drew-breunig”

2024

Turning Your Root URL Into a DuckDB Remote Database. Fun idea from Drew Breunig: DuckDB supports attaching existing databases that are accessible over HTTP using their URL. Drew suggests creating vanity URLs using your root domain, detecting the DuckDB user-agent and serving the database file directly - allowing tricks like this one:

ATTACH 'https://steplist.app/' AS steplist;
SELECT * FROM steplist.lists;

# 1st December 2024, 10:02 pm / drew-breunig, duckdb

Generating Descriptive Weather Reports with LLMs. Drew Breunig produces the first example I've seen in the wild of the new LLM attachments Python API. Drew's Downtown San Francisco Weather Vibes project combines output from a JSON weather API with the latest image from a webcam pointed at downtown San Francisco to produce a weather report "with a style somewhere between Jack Kerouac and J. Peterman".

Here's the Python code that constructs and executes the prompt. The code runs in GitHub Actions.

# 29th October 2024, 11:12 pm / vision-llms, drew-breunig, llm, generative-ai, ai, llms, github-actions, prompt-engineering

The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:

  • Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
  • Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
  • Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.

Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”

# 20th October 2024, 10:12 pm / drew-breunig, ai-assisted-programming, generative-ai, ai, llms

Wikidata is a Giant Crosswalk File. Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//' to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.

# 5th October 2024, 3:45 pm / wikipedia, drew-breunig, duckdb, json

Conflating Overture Places Using DuckDB, Ollama, Embeddings, and More. Drew Breunig's detailed tutorial on "conflation" - combining different geospatial data sources by de-duplicating address strings such as RESTAURANT LOS ARCOS,3359 FOOTHILL BLVD,OAKLAND,94601 and LOS ARCOS TAQUERIA,3359 FOOTHILL BLVD,OAKLAND,94601.

Drew uses an entirely offline stack based around Python, DuckDB and Ollama and finds that a combination of H3 geospatial tiles and mxbai-embed-large embeddings (though other embedding models should work equally well) gets really good results.

# 30th September 2024, 5:24 pm / drew-breunig, gis, duckdb, python, ai, embeddings, overture, ollama

Towards Standardizing Place. Overture Maps announced General Availability of its global maps datasets last week, covering places, buildings, divisions, and base layers.

Drew Breunig demonstrates how this can be accessed using both the Overture Explorer tool and DuckDB, and talks about Overture's GERS IDs - reminiscent of Who's On First IDs - which provide stable IDs for all kinds of geographic places.

# 1st August 2024, 11:14 pm / whosonfirst, drew-breunig, geospatial, gis, overture

Smaller, Cheaper, Faster, Sober. Drew Breunig highlights the interesting pattern at the moment where the best models are all converging on GPT-4 class capabilities, while competing on speed and price—becoming smaller and faster. This holds for both the proprietary and the openly licensed models.

Will we see a sizable leap in capabilities when GPT-5 class models start to emerge? It’s hard to say for sure—anyone in a position to know that likely works for an AI lab with a multi-billion dollar valuation that hinges on the answer to that equation, so they’re not reliable sources of information until the models themselves are revealed.

# 20th July 2024, 4:39 pm / drew-breunig, llms, ai, generative-ai

A Plea for Sober AI. Great piece by Drew Breunig: “Imagine having products THIS GOOD and still over-selling them.”

# 19th May 2024, 12:47 am / drew-breunig, llms, ai, generative-ai

A POI Database in One Line (via) Overture maps offer an extraordinarily useful freely licensed databases of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world.

Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to provide a bounding box to do so.

Drew Breunig came up with this delightful recipe for fetching data using LLM and gpt-3.5-turbo to fill in those bounding boxes:

overturemaps download --bbox=$(llm 'Give me a bounding box for Alameda, California expressed as only four numbers delineated by commas, with no spaces, longitude preceding latitude.') -f geojsonseq --type=place | geojson-to-sqlite alameda.db places - --nl --pk=id

# 19th April 2024, 2:44 am / drew-breunig, llm, gis, geojson, overture

Buzzwords describe what you already intuitively know. At once they snap the ‘kaleidoscopic flux of impressions’ in your mind into form, crystallizing them instantly allowing you to both organize your knowledge and recognize you share it with other. This rapid, mental crystallization is what I call the buzzword whiplash. It gives buzzwords more importance and velocity, more power, than they objectively should have.

The potential energy stored within your mind is released by the buzzword whiplash. The buzzword is perceived as important partially because of what it describes but also because of the social and emotional weight felt when the buzzword recognizes your previously wordless experiences and demonstrates that those experiences are shared.

Drew Breunig

# 5th March 2024, 7:56 pm / drew-breunig, linguistics

2023

Finding Bathroom Faucets with Embeddings. Absolutely the coolest thing I’ve seen someone build on top of my LLM tool so far: Drew Breunig is renovating a bathroom and needed a way to filter through literally thousands of options for facet taps. He scraped 20,000 images of fixtures from a plumbing supply site and used LLM to embed every one of them via CLIP... and now he can ask for “faucets that look like this one”, or even run searches for faucets that match “Gawdy” or “Bond Villain” or “Nintendo 64”. Live demo included!

# 27th September 2023, 6:18 pm / llm, embeddings, generative-ai, ai, drew-breunig, clip

When trying to get your head around a new technology, it helps to focus on how it challenges existing categorizations, conventions, and rule sets. Internally, I’ve always called this exercise, “dealing with the platypus in the room.” Named after the category-defying animal; the duck-billed, venomous, semi-aquatic, egg-laying mammal. [...] AI is the biggest platypus I’ve ever seen. Nearly every notable quality of AI and LLMs challenges our conventions, categories, and rulesets.

Drew Breunig

# 8th May 2023, 11:14 pm / ai, generative-ai, drew-breunig