Simon Willison’s Weblog

Subscribe
Atom feed for postgresql

146 posts tagged “postgresql”

2022

Querying Postgres Tables Directly From DuckDB (via) I learned a lot of interesting PostgreSQL tricks from this write-up of the new DuckDB feature that allows it to run queries against PostgreSQL servers directly. It works using COPY (SELECT ...) TO STDOUT (FORMAT binary) which writes rows to the protocol stream in efficient binary format, but splits the table being read into parallel fetches against page ranges and uses SET TRANSACTION SNAPSHOT ... in those parallel queries to ensure they see the same transactional snapshot of the database.

# 3rd October 2022, 2:27 pm / postgresql, duckdb, databases

Crunchy Data: Learn Postgres at the Playground (via) Crunchy Data have a new PostgreSQL tutorial series, with a very cool twist: they have a build of PostgreSQL compiled to WebAssembly which runs in the browser, so each tutorial is accompanied by a working psql terminal that lets you try out the tutorial contents interactively. It even has support for PostGIS, though that particular tutorial has to load 80MB of assets in order to get it to work!

# 17th August 2022, 6:30 pm / postgresql, webassembly

Lesser Known Features of ClickHouse (via) I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL.

# 31st May 2022, 7:48 pm / postgresql, clickhouse, sqlite

Glue code to quickly copy data from one Postgres table to another (via) The Python script that Retool used to migrate 4TB of data between two PostgreSQL databases. I find the structure of this script really interesting—it uses Python to spin up a queue full of ID ranges to be transferred and then starts some threads, but then each thread shells out to a command that runs “psql COPY (SELECT ...) TO STDOUT” and pipes the result to “psql COPY xxx FROM STDIN”. Clearly this works really well (“saturate the database’s hardware capacity” according to a comment on HN), and neatly sidesteps any issues with Python’s GIL.

# 19th April 2022, 4:57 pm / postgresql, python

Postgres Auditing in 150 lines of SQL (via) I’ve run up against the problem of tracking changes made to rows within a database table so many times, and I still don’t have a preferred solution. This approach to it looks very neat: it uses PostgreSQL triggers to populate a single audit table (as opposed to one audit table per tracked table) and records the previous and current column values for the row using jsonb.

# 9th March 2022, 7:19 pm / postgresql, databases

migra (via) This looks like a very handy tool to have around: run “migra postgresql:///a postgresql:///b” and it will detect and output the SQL alter statements needed to modify the first PostgreSQL database schema to match the second. It’s written in Python, running on top of SQLAlchemy.

# 26th February 2022, 11:23 pm / migrations, postgresql, databases

Single dependency stacks (via) Brandur Leach notes that the core services at Crunchy (admittedly a PostgreSQL hosting and consultancy company) have only one stateful dependency – Postgres. No Redis, ElasticSearch or anything else. This means that problems like rate limiting and search, which are often farmed out to external services, are all handled using either PostgreSQL or in-memory mechanisms on their servers.

# 9th February 2022, 6:43 pm / brandur-leach, postgresql

Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool.

# 18th January 2022, 8:53 pm / performance, optimization, postgresql

2021

Transactionally Staged Job Drains in Postgres. Any time I see people argue that relational databases shouldn’t be used to implement job queues I think of this post by Brandur from 2017. If you write to a queue before committing a transaction you run the risk of a queue consumer trying to read from the database before the new row becomes visible. If you write to the queue after the transaction there’s a risk an error might result in your message never being written. So: write to a relational staging table as part of the transaction, then have a separate process read from that table and write to the queue.

# 18th December 2021, 1:34 am / scaling, brandur-leach, postgresql, queues

Per-project PostgreSQL (via) Jamey Sharp describes an ingenious way of setting up PostgreSQL instances for each of your local development project, without depending on an always-running shared localhost database server. The trick is a shell script which creates a PGDATA folder in the current folder and then instantiates a PostgreSQL server in --single single user mode which listens on a Unix domain socket in that folder, instead of listening on the network. Jamey then uses direnv to automatically configure that PostgreSQL, initializing the DB if necessary, for each of his project folders.

# 3rd September 2021, 3:06 am / postgresql

Django SQL Dashboard 1.0

Visit Django SQL Dashboard 1.0

Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.

[... 629 words]

PostgreSQL: nbtree/README (via) The PostgreSQL source tree includes beatifully written README files for different parts of PostgreSQL. Here’s the README for their btree implementation—it continues to be actively maintained (last change was is March) and “git blame” shows that parts of the file date back 25 years, to 1996!

# 25th June 2021, 6:09 pm / computer-science, postgresql, databases

Hierarchical Structures in PostgreSQL (via) Two techniques I hadn’t seen before: the first is to define a materialized view using a CTE that offers efficient tree queries against a PostgreSQL array of path components (plus a trigger to update the materialized view), the second is with the PostgreSQL ltree extension which ships as part of PostgreSQL and hence should be widely available.

# 25th June 2021, 5:19 pm / sql, postgresql

Multi-region PostgreSQL on Fly (via) Really interesting piece of architectural design from Fly here. Fly can run your application (as a Docker container run using Firecracker) in multiple regions around the world, and they’ve now quietly added PostgreSQL multi-region support. The way it works is that all-but-one region can have a read-only replica, and requests sent to application servers can perform read-only queries against their local region’s replica. If a request needs to execute a SQL update your application code can return a “fly-replay: region=scl” HTTP header and the Fly CDN will transparently replay the request against the region containing the leader database. This also means you can implement tricks like setting a 10s expiring cookie every time the user performs a write, such that their requests in the next 10s will go straight to the leader and avoid them experiencing any replication lag that hasn’t caught up with their latest update.

# 17th June 2021, 6:39 pm / fly, scaling, replication, postgresql

explain.dalibo.com (via) By far the best tool I’ve seen for turning the output of PostgreSQL EXPLAIN ANALYZE into something I can actually understand—produces a tree visualization which includes clear explanations of what each step (such as a “Index Only Scan Node”) actually means.

# 28th May 2021, 5:41 pm / postgresql

Django SQL Dashboard

Visit Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2,171 words]

Practical SQL for Data Analysis (via) This is a really great SQL tutorial: it starts with the basics, but quickly moves on to a whole array of advanced PostgreSQL techniques - CTEs, window functions, efficient sampling, rollups, pivot tables and even linear regressions executed directly in the database using regr_slope(), regr_intercept() and regr_r2(). I picked up a whole bunch of tips for things I didn't know you could do with PostgreSQL here.

# 4th May 2021, 3:11 am / sql, postgresql, haki-benita

Adding GeoDjango to an existing Django project

Work on VIAL for Vaccinate The States continues.

[... 1,503 words]

A CSV export, JSON import workflow for bulk updating our data

I just added missing counties to around 1200 of our locations using a combination of tricks, and I thought they’d make a good blog post.

[... 1,429 words]

Weeknotes: Vaccinate The States, and how I learned that returning dozens of MB of JSON works just fine these days

Visit Weeknotes: Vaccinate The States, and how I learned that returning dozens of MB of JSON works just fine these days

On Friday VaccinateCA grew in scope, a lot: we launched a new website called Vaccinate The States. Patrick McKenzie wrote more about the project here—the short version is that we’re building the most comprehensive possible dataset of vaccine availability in the USA, using a combination of data collation, online research and continuing to make a huge number of phone calls.

[... 1,109 words]

Porting VaccinateCA to Django

Visit Porting VaccinateCA to Django

As I mentioned back in February, I’ve been working with the VaccinateCA project to try to bring the pandemic to an end a little earlier by helping gather as accurate a model as possible of where the Covid vaccine is available in California and how people can get it.

[... 2,157 words]

Weeknotes: django-sql-dashboard widgets

Visit Weeknotes: django-sql-dashboard widgets

A few small releases this week, for django-sql-dashboard, datasette-auth-passwords and datasette-publish-vercel.

[... 1,025 words]

VIAL is now live, plus django-sql-dashboard

Our new Django backend has now officially graduated from preview mode! We’ve been running it to collect caller reports for Oregon for over a week now, and today we finally turned off the old Heroku app and promoted https://vial.calltheshots.us/ to be the place that our caller app writes to.

[... 672 words]

Weeknotes: tableau-to-sqlite, django-sql-dashboard

Visit Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

[... 792 words]

Trying to end the pandemic a little earlier with VaccinateCA

Visit Trying to end the pandemic a little earlier with VaccinateCA

This week I got involved with the VaccinateCA effort. We are trying to end the pandemic a little earlier, by building the most accurate database possible of vaccination locations and availability in California.

[... 1,154 words]

Django admin customization, JSON in our PostgreSQL

My progress slowed a bit today as I started digging into some things I’m less familiar with—but I’ve found some tricks that I think will help us out a lot.

[... 1,089 words]

Spinning up a new Django app to act as a backend for VaccinateCA

My goal by the end of this week is to have a working proof of concept for a Django + PostgreSQL app that can replace Airtable as the principle backend for the https://www.vaccinateca.com/ site. This proof of concept will allow us to make a go or no-go decision and figure out what else needs to be implemented before we can start using it to track calls.

[... 762 words]

Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin.

# 22nd February 2021, 9:16 pm / datasette, postgresql, sqlite

Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index.

# 3rd February 2021, 7:32 am / performance, postgresql, databases, craig-kerstiens

2020

DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern.

# 19th September 2020, 11:43 pm / sqlite, postgresql, databases, duckdb