Spatial Joins in DuckDB (via) Extremely detailed overview by Max Gabrielsson of DuckDB's new spatial join optimizations.
Consider the following query, which counts the number of NYC Citi Bike Trips for each of the neighborhoods defined by the NYC Neighborhood Tabulation Areas polygons and returns the top three:
SELECT neighborhood, count(*) AS num_rides FROM rides JOIN hoods ON ST_Intersects( rides.start_geom, hoods.geom ) GROUP BY neighborhood ORDER BY num_rides DESC LIMIT 3;
The rides table contains 58,033,724 rows. The hoods table has polygons for 310 neighborhoods.
Without an optimized spatial joins this query requires a nested loop join, executing that expensive ST_Intersects()
operation 58m * 310 ~= 18 billion times. This took around 30 minutes on the 36GB MacBook M3 Pro used for the benchmark.
The first optimization described - implemented from DuckDB 1.2.0 onwards - uses a "piecewise merge join". This takes advantage of the fact that a bounding box intersection is a whole lot faster to calculate, especially if you pre-cache the bounding box (aka the minimum bounding rectangle or MBR) in the stored binary GEOMETRY
representation.
Rewriting the query to use a fast bounding box intersection and then only running the more expensive ST_Intersects()
filters on those matches drops the runtime from 1800 seconds to 107 seconds.
The second optimization, added in DuckDB 1.3.0 in May 2025 using the new SPATIAL_JOIN operator, is significantly more sophisticated.
DuckDB can now identify when a spatial join is working against large volumes of data and automatically build an in-memory R-Tree of bounding boxes for the larger of the two tables being joined.
This new R-Tree further accelerates the bounding box intersection part of the join, and drops the runtime down to just 30 seconds.
Recent articles
- The Summer of Johann: prompt injections as far as the eye can see - 15th August 2025
- Open weight LLMs exhibit inconsistent performance across providers - 15th August 2025
- LLM 0.27, the annotated release notes: GPT-5 and improved tool calling - 11th August 2025