Simon Willison’s Weblog

55 items tagged “postgresql”

Experiences with running PostgreSQL on Kubernetes (via) Fascinating interview that makes a solid argument for the idea that running stateful data stores like PostgreSQL or Cassandra is made harder, not easier when you add an orchestration tool like Kubernetes into the mix. # 13th August 2018, 2:30 pm

mycli. Really neat auto-complete enabled MySQL terminal client, built using the excellent python-prompt-toolkit. Has a sister-project for PostgreSQL called pgcli. # 11th June 2018, 7:08 pm

Showdown: MySQL 8 vs PostgreSQL 10 (via) MySQL 8 makes comparisons between PostgreSQL and MySQL far more interesting, as it closes some of the key feature gaps. Meanwhile the PostgreSQL replication story (long one of MySQL’s key advantages) has improved dramatically in recent versions. This article offers a useful overview of the current differences, including diving into some of the less obvious implementation details that differ between the two. # 23rd May 2018, 5:02 pm

Touring a Fast, Safe, and Complete(ish) Web Service in Rust. Brandur’s notes from building a high performance web service in Rust, using PostgreSQL via the Diesel ORM and the Rust actix-web framework which peovides Erlang-style actors and promise-based async concurrency. # 28th March 2018, 3:47 pm

User-defined Order in SQL (via) This is a fun intellectual exercise: how can one efficiently implement a user-defined order in a SQL table? The obvious initial approach is to have an integer position column, but this means every subsequent row must be updated when an item changes position. Joe “begriffs” Nelson explores some clever alternatives, including floating point or decimal positions (allowing new items to be inserted at a midpoint between existing positions) and a new custom rational number type he buiIt as a PostgreSQL extension. # 21st March 2018, 2:07 pm

Conditional aggregation in Django 2.0 (via) I hadn’t realised how clever this new Django ORM feature by Tom Forbes is. It lets you build an aggregation against a subset of rows, e.g. Client.objects.aggregate(regular=Count(’pk’, filter=Q(account_type=Client.REGULAR)))—then if you are using PostgreSQL it translates it into a fast FILTER WHERE clause, while other databases emulate the same behaviour using a CASE statement. # 3rd February 2018, 9:38 pm

How the Citus distributed database rebalances your data. Citus is a fascinating implementation of database sharding built on top of PostgreSQL primitives. PostgreSQL 10 introduced extremely flexible logical replication—in this post Craig Kerstiens explains how Citus use this new ability to re-balance shards (e.g. when you move from two to four physical PostgreSQL nodes) without downtime. # 1st February 2018, 10:50 pm

django-postgres-copy (via) Really neat Django queryset add-on which exposes the PostgreSQL COPY statement for importing (and exporting) CSV data. MyModel.objects.from_csv(“filename.csv”). Built by the team of data journalists at the California Civic Data Coalition. # 26th January 2018, 12:43 am

PostgreSQL Exercises. Excellent set of PostgreSQL exercises by Alisdair Owens, each with an interactive editor that lets you run your queries against a real database. Starts with the basics, but also covers advanced topics like recursive queries and window aggregate functions. # 6th December 2017, 4:20 pm

How a single PostgreSQL config change improved slow query performance by 50x. “If you are using SSDs and running PostgreSQL with default configuration, I encourage you to try tuning random_page_cost & seq_page_cost. You might be surprised by some huge performance improvements.” # 23rd November 2017, 8:11 pm

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads (via) The problem with sending writes to the primary and balancing reads across replicas is dealing with replica lag—what if you write to the primary and then read from a replica that hasn’t had the new state applied to it yet? Brandur Leach dives deep into an elegant solution using PostgreSQL’s LSN (log sequence numbers) accesesed using pg_last_wal_replay_lsn(). An observer process continuously polls the replicas for their most recently applied LSN and stores them in a table. A column in the Users table then records the min_lsn valid for that user, updating it to the pg_current_wal_lsn() of the primary whenever that user makes a write. Combining the two allows the application to randomly select a replica that is up-to-date for the purposes of a specific user any time it needs to make a read. # 18th November 2017, 6:42 pm

django-multitenant (via) Absolutely fascinating Django library for horizontally sharding a database using a multi-tenant pattern, from the team at Citus. In this pattern every relevant table includes a “tenant_id”, and all queries should specifically select against that ID. Once you have that in place, you can shard your rows across multiple different databases and route to the correct database based on the tenant ID, safe in the knowledge that joins will still work provided they are against other rows belonging to the same tenant. # 16th November 2017, 9:12 pm

Redis Streams and the Unified Log. In which Brandur Leach explores the new Kafka-style streams functionality coming to Redis 4.0, and shows an example of a robust at-least once processing architecture built on a combination of Redis streams and PostgreSQL transactions. I really like the pattern of writing log records to a staging table in PostgreSQL first in order to bundle them up in the same transaction as the originating state change, then have a separate process read them from that table and publish them to Redis. # 8th November 2017, 4:37 pm

Cloud SQL for PostgreSQL adds high availability and replication. Google Cloud Platform now offers PostgreSQL with automatic asynchronous disk-level replication to a separate instance in a different availability zone, via their new “Regional Disks“ feature. Between this, Heroku, Citus and Amazon RDS the appeal of a self-maintained PostgreSQL instance continues to fall. # 7th November 2017, 1:49 pm

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool. # 31st October 2017, 2:08 pm

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data. # 30th October 2017, 8:53 pm

Benefit of TEXT with CHECK over VARCHAR(X) in PostgreSQL. Brandur suggests using “email TEXT CHECK (char_length(email) <= 255)” to define a column with a length limit in PostgreSQL over “VARCHAR(255)” because TEXT and VARCHAR are equally performant but a CHECK length can be changed later on without locking the table, whereas a VARCHAR requires an ALTER TABLE with an exclusive lock. # 28th October 2017, 12:59 am

Implementing Stripe-like Idempotency Keys in Postgres (via) Having clients send “idempotency keys” with API requests in order to be able to safely retry them if something’s goes wrong is a really neat trick for making transactional APIs more robust. Here Brandur Leach talks implementation strategies. # 27th October 2017, 5:51 pm

PostgreSQL 10 Released. Highlights include major improvements to parallelized queries, quorum commit for synchronous replication (sounds reminiscent of Cassandra) and logical replication, which allows modifications to specific tables to be replicated to different clusters. They’re also changing their versioning scheme to Major.Minor, so the next minor release will be 10.1 and the next major release will be 11. # 5th October 2017, 3:31 pm

Implementing faceted search with Django and PostgreSQL

I’ve added a faceted search engine to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL“like” queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotation) and any combination of the above. Some example searches:

[... 3048 words]

PostgreSQL: How can I store images in a database? What existing products makes it easy for a user to upload photos into a general database?

As a general rule, it’s a bad idea to store images in a database. This is due to the large amount of space they take up, which can affect database read performance and will greatly increases the size of your backups, making them both take longer and cost more to store.

[... 187 words]

What tools and techniques are used for relational database version control (structure and data)?

The term you are looking for is database migrations (sometimes called database change scripts).

[... 308 words]

When should one switch from MySQL to Oracle or PostgreSQL?

When your own benchmarks prove that your application’s particular load characteristics will perform better on another database—and the difference is large enough that it’s worth the cost involved in retargeting your code. If that cost is high (and it probably will be) it may be worth paying for some expert consultants to ensure that your implementations against the different databases are properly optimised.

[... 102 words]

MapOSMatic. Clever service built on top of OpenStreetMap, which renders double sided city maps with a map and grid on one size and an A-Z street name index on the other. Runs on top of Mapnik, PostGIS and Cairo, with a few thousand additional lines of Python and Django. # 11th July 2010, 12:15 pm

Appending the request URL to SQL statements in Django. A clever frame-walking monkey-patch which pulls the most recent HttpRequest object out of the Python stack and adds the current request.path to each SQL query as an SQL comment, so you can see it in debugging tools such as slow query logs and the PostgreSQL “select * from pg_stat_activity” query. # 2nd June 2010, 9:09 am

Doing things with Ordnance Survey OpenData. Jo Walsh’s guide to processing Ordnance Survey OpenData using PostgreSQL and PostGIS. # 20th May 2010, 3:22 pm

Installing GeoDjango Dependencies with Homebrew. brew update && brew install postgis && brew install gdal # 7th May 2010, 2:40 pm

PostgreSQL 9.0 Beta 1 Now Available. With asynchronous streaming replication. # 5th May 2010, 2:36 pm

grant XXX on * ? (via) PostgreSQL doesn’t have a way to say “this user is allowed to select/update/etc on all tables in database X”. That kind of sucks. UPDATE: This is fixed in PostgreSQL 9, see the comments. # 16th March 2010, 6:26 pm