Simon Willison on sql

102 posts tagged “sql”

2022

Efficient Pagination Using Deferred Joins (via) Surprisingly simple trick for speeding up deep OFFSET x LIMIT y pagination queries, which get progressively slower as you paginate deeper into the data. Instead of applying them directly, apply them to a “select id from ...” query to fetch just the IDs, then either use a join or run a separate “select * from table where id in (...)” query to fetch the full records for that page.
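
A minimal sketch of the trick using SQLite from Python's standard library. The `posts` table, its columns and both function names are hypothetical, chosen only for illustration. The inner query pages over just the IDs, and the join then pulls the full rows for that page:

```python
import sqlite3


def make_demo_db(n=1000):
    """Build a hypothetical posts table with n rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (id, title, body) VALUES (?, ?, ?)",
        [(i, f"Post {i}", "x" * 200) for i in range(1, n + 1)],
    )
    return conn


def fetch_page_deferred(conn, offset, limit):
    """Fetch one page of full rows using a deferred join."""
    sql = """
        SELECT posts.*
        FROM posts
        JOIN (
            -- OFFSET/LIMIT are applied to the narrow id-only query
            SELECT id FROM posts ORDER BY id LIMIT ? OFFSET ?
        ) AS page ON page.id = posts.id
        ORDER BY posts.id
    """
    return conn.execute(sql, (limit, offset)).fetchall()


conn = make_demo_db()
page = fetch_page_deferred(conn, offset=900, limit=10)
print([row[0] for row in page])
```

How much this helps depends on the database engine and the available indexes, so it is worth measuring against a plain OFFSET/LIMIT query on real data.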

# 16th August 2022, 5:35 pm / sql, performance

The Checkered Flag Diagram for visualizing SQL joins. I really like this alternative to Venn diagrams for showing the difference between different types of SQL join (left join, right join, cross join etc).

# 20th July 2022, 1:16 pm / sql

Joining CSV files in your browser using Datasette Lite

I added a new feature to Datasette Lite—my version of Datasette that runs entirely in your browser using WebAssembly (previously): you can now use it to load one or more CSV files by URL, and then run SQL queries against them—including joins across data from multiple files.

[... 546 words]

9:20 pm / 20th June 2022 / sql, webassembly, datasette, projects, csv, datasette-lite, cors

How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state-of-the-art in optimizing correlated subqueries, across a number of different database engines.

# 15th May 2022, 8:24 pm / sql

2021

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and Sebastián’s Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database.

# 24th August 2021, 11:16 pm / orm, sql, sqlalchemy, python, pydantic

Datasette on Codespaces, sqlite-utils API reference documentation and other weeknotes

This week I broke my streak of not sending out the Datasette newsletter, figured out how to use Sphinx for Python class documentation, worked out how to run Datasette on GitHub Codespaces, implemented Datasette column metadata and got tantalizingly close to a solution for an elusive Datasette feature.

[... 2,164 words]

4:57 am / 14th August 2021 / sql, sqlite, datasette, sqlite-utils, documentation, github, weeknotes, mypy, github-codespaces

Bare columns in an aggregate query. This is a really nice SQL tweak implemented in SQLite: If you run a query like “SELECT a, b, max(c) FROM tab1 GROUP BY a” SQLite will find the row with the highest value for c and use the columns of that row as the returned values for the other columns mentioned in the query.
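
A small sketch of that behaviour, run against an in-memory SQLite database. The table name `tab1` and columns come from the query quoted above; the sample rows and function names are made up for illustration:

```python
import sqlite3


def make_demo_db():
    """Create tab1 with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (a TEXT, b TEXT, c INTEGER)")
    conn.executemany(
        "INSERT INTO tab1 (a, b, c) VALUES (?, ?, ?)",
        [("x", "low", 1), ("x", "high", 9), ("x", "mid", 4), ("y", "only", 5)],
    )
    return conn


def rows_with_max_c(conn):
    """b is a bare column: SQLite takes it from the row that supplied max(c)."""
    return conn.execute(
        "SELECT a, b, max(c) FROM tab1 GROUP BY a ORDER BY a"
    ).fetchall()


print(rows_with_max_c(make_demo_db()))
```

For group `x` the value of `b` comes from the row where `c` is 9. Many other databases reject this query outright because `b` is neither aggregated nor grouped.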

# 10th August 2021, 1:29 am / sql, sqlite

Django SQL Dashboard 1.0

Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.

[... 629 words]

2:19 am / 6th July 2021 / django-sql-dashboard, sql, github-actions, datasette, projects, postgresql, django, weeknotes

Django SQL Dashboard 1.0 (via) As part of my ongoing attempt to be braver about 1.0 releases (crucial if you want to do semantic versioning properly) I’ve released version 1.0 of Django SQL Dashboard, my Datasette-inspired app for Django that adds an interface for running read-only, bookmarkable SQL queries against a PostgreSQL database. The new version adds a column cog menu providing shortcuts for changing the sort order, counting distinct values and performing a group-by/count against column values.

# 1st July 2021, 5:44 pm / projects, django-sql-dashboard, sql, django

Hierarchical Structures in PostgreSQL (via) Two techniques I hadn’t seen before: the first is to define a materialized view using a CTE that offers efficient tree queries against a PostgreSQL array of path components (plus a trigger to update the materialized view), the second is with the PostgreSQL ltree extension which ships as part of PostgreSQL and hence should be widely available.

# 25th June 2021, 5:19 pm / sql, postgresql

Joining CSV and JSON data with an in-memory SQLite database

The new sqlite-utils memory command can import CSV and JSON data directly into an in-memory SQLite database, combine and query it using SQL and output the results as CSV, JSON or various other formats of plain text tables.
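
The underlying idea can be sketched with nothing but the Python standard library. This is not how sqlite-utils is implemented; the helper names, table names and sample data below are all hypothetical:

```python
import csv
import io
import json
import sqlite3


def load_rows(conn, table, rows):
    """Create an untyped table from a list of dicts and insert the rows."""
    columns = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )


def join_csv_and_json(csv_text, json_text):
    """Load CSV into 'people' and JSON into 'pets', then join them with SQL."""
    conn = sqlite3.connect(":memory:")
    load_rows(conn, "people", list(csv.DictReader(io.StringIO(csv_text))))
    load_rows(conn, "pets", json.loads(json_text))
    # CSV values arrive as text, so cast the id before comparing to JSON integers
    return conn.execute(
        """
        SELECT people.name, pets.pet
        FROM people
        JOIN pets ON pets.owner_id = CAST(people.id AS INTEGER)
        ORDER BY people.name, pets.pet
        """
    ).fetchall()


people_csv = "id,name\n1,Ada\n2,Grace\n"
pets_json = '[{"owner_id": 1, "pet": "cat"}, {"owner_id": 2, "pet": "dog"}, {"owner_id": 1, "pet": "fish"}]'
print(join_csv_and_json(people_csv, pets_json))
```

The explicit cast is needed here because this sketch does no type detection on the CSV columns.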

[... 1,507 words]

10:55 pm / 19th June 2021 / sql, projects, sqlite, sqlite-utils, csv, json, weeknotes

The humble hash aggregate (via) Today I learned that “hash aggregate” is the name for the algorithm where you split a list of tuples on a common key, run an aggregation against each resulting group and combine the results back together again. I’d previously thought of this in terms of map/reduce, but hash aggregate is a much older term used widely by SQL engines. I’ve seen it come up in PostgreSQL explain query output (for GROUP BY) before but didn’t know what it meant.
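
A toy version of the algorithm in Python, as a sketch only. The function name and arguments are hypothetical, and real database engines typically keep a running aggregate state per key rather than collecting every value into a list first:

```python
from collections import defaultdict


def hash_aggregate(rows, key, value, aggregate=sum):
    """Group tuples by rows[i][key] in a hash table, then aggregate each group."""
    groups = defaultdict(list)
    for row in rows:
        # The dict is the hash table: one bucket per distinct key
        groups[row[key]].append(row[value])
    return {k: aggregate(values) for k, values in groups.items()}


rows = [("a", 1), ("b", 2), ("a", 3)]
print(hash_aggregate(rows, key=0, value=1))
print(hash_aggregate(rows, key=0, value=1, aggregate=max))
```

This is roughly what `SELECT key, sum(value) ... GROUP BY key` asks for, with no need to sort the input first.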

# 6th June 2021, 4:03 pm / mapreduce, sql, algorithms

Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2,171 words]

3:39 pm / 10th May 2021 / databases, datasette, projects, postgresql, django, weeknotes, django-sql-dashboard, sql

Practical SQL for Data Analysis (via) This is a really great SQL tutorial: it starts with the basics, but quickly moves on to a whole array of advanced PostgreSQL techniques - CTEs, window functions, efficient sampling, rollups, pivot tables and even linear regressions executed directly in the database using regr_slope(), regr_intercept() and regr_r2(). I picked up a whole bunch of tips for things I didn't know you could do with PostgreSQL here.
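
For reference, here is a plain Python sketch of the ordinary least squares quantities that those three aggregates correspond to. The function name is hypothetical, and it assumes at least two points with non-constant x and y values (the database functions have their own rules for those edge cases). Note that PostgreSQL takes the dependent variable first, as in regr_slope(y, x):

```python
def linear_regression(points):
    """Return (slope, intercept, r2) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squares and cross products around the means
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


print(linear_regression([(1, 3), (2, 5), (3, 7)]))
```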

# 4th May 2021, 3:11 am / sql, postgresql, haki-benita

Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

[... 792 words]

7:35 am / 14th March 2021 / vaccinate-ca, sql, projects, django, weeknotes, django-sql-dashboard, postgresql

Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask) (via) ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They've loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here's a query I just ran that shows number of watch events per year, for example:

SELECT toYear(created_at) AS yyyy, count()
FROM github_events
WHERE event_type = 'WatchEvent'
GROUP BY yyyy

# 5th January 2021, 1:02 am / big-data, sql, analytics, github, clickhouse

2020

Some SQL Tricks of an Application DBA (via) This post taught me so many PostgreSQL tricks that I hadn’t seen before. Did you know you can start a transaction, drop an index, run explain and then rollback the transaction (cancelling the index drop) to see what explain would look like without that index? Among other things I also learned what the “correlation” database statistic does: it’s a measure of how close-to-sorted the values in a specific column are, which helps PostgreSQL decide if it should do an index scan or a bitmap scan when making use of an index.

# 29th July 2020, 7:04 pm / sql, postgresql

How much can you learn from just two columns?

Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.

[... 951 words]

6:46 pm / 15th June 2020 / twitter, sql, politics, datasette, derek-willis

Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew).

# 1st June 2020, 10:11 pm / sql, rust, postgresql, databases, kafka

Using SQL to Look Through All of Your iMessage Text Messages (via) Dan Kelch shows how to access the iMessage SQLite database at ~/Library/Messages/chat.db—it’s protected under macOS Catalina so you have to enable Full Disk Access in the privacy settings first. I usually use the macOS terminal app but I installed iTerm for this because I’d rather enable full disk access to a separate terminal program than let anything I’m running in my regular terminal take advantage of it. It worked! Now I can run “datasette ~/Library/Messages/chat.db” to browse my messages.

# 22nd May 2020, 4:45 pm / datasette, apple, sql, sqlite

Using SQL to find my best photo of a pelican according to Apple Photos

According to the Apple Photos internal SQLite database, this is the most aesthetically pleasing photograph I have ever taken of a pelican:

[... 1,937 words]

7:16 pm / 21st May 2020 / photos, sql, sqlite, datasette, projects, apple-photos, dogsheep, weeknotes, photography

How Super Graph compiles GraphQL to a single SQL query. Super Graph is a GraphQL server that compiles arbitrarily nested GraphQL queries to “a single fast SQL query”. I’ve always wondered how that could possibly work, so I asked author Vikram Rangnekar for an example of a compiled query. It turns out it uses a brilliant sequence of JSON aggregations to glue together results from nested subqueries and left outer joins.

# 16th April 2020, 10:52 pm / sql, graphql

SQL is a better API language than GraphQL – Convince me otherwise (via) A flippant tweet I posted this morning blew up today and ended up on the Hacker News homepage.

# 16th April 2020, 10:44 pm / webapis, sql, hacker-news, graphql

I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.

— Edgar F. Codd

# 7th March 2020, 11:12 pm / sql, databases

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns.
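
A short sketch of both flavours in SQLite, driven from Python. It assumes the Python build is linked against SQLite 3.31.0 or later; the `line_items` table, its columns and the function name are hypothetical:

```python
import sqlite3


def make_demo_db():
    """Create a table with one virtual and one stored generated column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE line_items (
            price_cents INTEGER,
            quantity INTEGER,
            -- computed when the row is read
            total_virtual INTEGER GENERATED ALWAYS AS (price_cents * quantity) VIRTUAL,
            -- computed when the row is written and saved to disk
            total_stored INTEGER GENERATED ALWAYS AS (price_cents * quantity) STORED
        )
        """
    )
    conn.execute("INSERT INTO line_items (price_cents, quantity) VALUES (250, 4)")
    return conn


conn = make_demo_db()
print(conn.execute("SELECT total_virtual, total_stored FROM line_items").fetchone())
conn.execute("UPDATE line_items SET quantity = 5")
print(conn.execute("SELECT total_virtual, total_stored FROM line_items").fetchone())
```

Both columns track the underlying values after the update; the difference is only in when the expression is evaluated and whether the result takes up storage.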

# 24th January 2020, 4:20 am / mysql, sql, postgresql, sqlite

2019

sqlite-utils 2.0: real upserts

I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.

[... 1,140 words]

6:48 am / 30th December 2019 / sql, sqlite, museums, upsert, projects, weeknotes, sqlite-utils

athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3 (as CSV, JSON or Apache Parquet files) using SQL. It’s an interesting way of building a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources.

# 18th December 2019, 9:05 am / sqlite, sql, aws, athena, s3

Weeknotes: datasette-template-sql

Last week I talked about wanting to take on a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).

[... 521 words]

10:29 pm / 18th November 2019 / projects, sql, datasette, weeknotes

datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates.

# 15th November 2019, 12:59 am / projects, sql, templates, datasette, jinja

Calling C functions from BigQuery with web assembly (via) Google BigQuery lets you define custom SQL functions in JavaScript, and it turns out they expose the WebAssembly.instantiate family of APIs. Which means you can write your UDF in C or Rust, compile it to WebAssembly and run it as part of your query!

# 27th October 2019, 5:55 am / c, sql, rust, webassembly
