12 items tagged “drew-breunig”
2024
Turning Your Root URL Into a DuckDB Remote Database. Fun idea from Drew Breunig: DuckDB supports attaching existing databases that are accessible over HTTP using their URL. Drew suggests creating vanity URLs using your root domain, detecting the DuckDB user-agent and serving the database file directly - allowing tricks like this one:
ATTACH 'https://steplist.app/' AS steplist;
SELECT * FROM steplist.lists;
Generating Descriptive Weather Reports with LLMs. Drew Breunig produces the first example I've seen in the wild of the new LLM attachments Python API. Drew's Downtown San Francisco Weather Vibes project combines output from a JSON weather API with the latest image from a webcam pointed at downtown San Francisco to produce a weather report "with a style somewhere between Jack Kerouac and J. Peterman".
Here's the Python code that constructs and executes the prompt. The code runs in GitHub Actions.
The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:
- Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
- Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
- Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.
Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”
Wikidata is a Giant Crosswalk File.
Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//'
to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.
Conflating Overture Places Using DuckDB, Ollama, Embeddings, and More.
Drew Breunig's detailed tutorial on "conflation" - combining different geospatial data sources by de-duplicating address strings such as RESTAURANT LOS ARCOS,3359 FOOTHILL BLVD,OAKLAND,94601
and LOS ARCOS TAQUERIA,3359 FOOTHILL BLVD,OAKLAND,94601
.
Drew uses an entirely offline stack based around Python, DuckDB and Ollama and finds that a combination of H3 geospatial tiles and mxbai-embed-large
embeddings (though other embedding models should work equally well) gets really good results.
Towards Standardizing Place. Overture Maps announced General Availability of its global maps datasets last week, covering places, buildings, divisions, and base layers.
Drew Breunig demonstrates how this can be accessed using both the Overture Explorer tool and DuckDB, and talks about Overture's GERS IDs - reminiscent of Who's On First IDs - which provide stable IDs for all kinds of geographic places.
Smaller, Cheaper, Faster, Sober. Drew Breunig highlights the interesting pattern at the moment where the best models are all converging on GPT-4 class capabilities, while competing on speed and price—becoming smaller and faster. This holds for both the proprietary and the openly licensed models.
Will we see a sizable leap in capabilities when GPT-5 class models start to emerge? It’s hard to say for sure—anyone in a position to know that likely works for an AI lab with a multi-billion dollar valuation that hinges on the answer to that equation, so they’re not reliable sources of information until the models themselves are revealed.
A Plea for Sober AI. Great piece by Drew Breunig: “Imagine having products THIS GOOD and still over-selling them.”
A POI Database in One Line (via) Overture maps offer an extraordinarily useful freely licensed databases of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world.
Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to provide a bounding box to do so.
Drew Breunig came up with this delightful recipe for fetching data using LLM and gpt-3.5-turbo to fill in those bounding boxes:
overturemaps download --bbox=$(llm 'Give me a bounding box for Alameda, California expressed as only four numbers delineated by commas, with no spaces, longitude preceding latitude.') -f geojsonseq --type=place | geojson-to-sqlite alameda.db places - --nl --pk=id
Buzzwords describe what you already intuitively know. At once they snap the ‘kaleidoscopic flux of impressions’ in your mind into form, crystallizing them instantly allowing you to both organize your knowledge and recognize you share it with other. This rapid, mental crystallization is what I call the buzzword whiplash. It gives buzzwords more importance and velocity, more power, than they objectively should have.
The potential energy stored within your mind is released by the buzzword whiplash. The buzzword is perceived as important partially because of what it describes but also because of the social and emotional weight felt when the buzzword recognizes your previously wordless experiences and demonstrates that those experiences are shared.
2023
Finding Bathroom Faucets with Embeddings. Absolutely the coolest thing I’ve seen someone build on top of my LLM tool so far: Drew Breunig is renovating a bathroom and needed a way to filter through literally thousands of options for facet taps. He scraped 20,000 images of fixtures from a plumbing supply site and used LLM to embed every one of them via CLIP... and now he can ask for “faucets that look like this one”, or even run searches for faucets that match “Gawdy” or “Bond Villain” or “Nintendo 64”. Live demo included!
When trying to get your head around a new technology, it helps to focus on how it challenges existing categorizations, conventions, and rule sets. Internally, I’ve always called this exercise, “dealing with the platypus in the room.” Named after the category-defying animal; the duck-billed, venomous, semi-aquatic, egg-laying mammal. [...] AI is the biggest platypus I’ve ever seen. Nearly every notable quality of AI and LLMs challenges our conventions, categories, and rulesets.