Simon Willison’s Weblog

47 items tagged “postgresql”

PostgreSQL Exercises. Excellent set of PostgreSQL exercises by Alisdair Owens, each with an interactive editor that lets you run your queries against a real database. Starts with the basics, but also covers advanced topics like recursive queries and window aggregate functions. # 6th December 2017, 4:20 pm

How a single PostgreSQL config change improved slow query performance by 50x. “If you are using SSDs and running PostgreSQL with default configuration, I encourage you to try tuning random_page_cost & seq_page_cost. You might be surprised by some huge performance improvements.” # 23rd November 2017, 8:11 pm

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads (via) The problem with sending writes to the primary and balancing reads across replicas is dealing with replica lag: what if you write to the primary and then read from a replica that hasn’t had the new state applied to it yet? Brandur Leach dives deep into an elegant solution using PostgreSQL’s LSNs (log sequence numbers), accessed using pg_last_wal_replay_lsn(). An observer process continuously polls the replicas for their most recently applied LSN and stores them in a table. A column in the Users table then records the min_lsn valid for that user, updating it to the pg_current_wal_lsn() of the primary whenever that user makes a write. Combining the two allows the application to randomly select a replica that is up-to-date for the purposes of a specific user any time it needs to make a read. # 18th November 2017, 6:42 pm
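
The replica selection described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the article: the function names and the dictionary of replica LSNs are hypothetical, and it assumes the LSNs have already been fetched as text in PostgreSQL's usual form, two hexadecimal numbers separated by a slash.

```python
import random
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_replica(replica_lsns: Dict[str, str], user_min_lsn: str) -> Optional[str]:
    """Return a random replica that has replayed at least user_min_lsn.

    replica_lsns maps a (hypothetical) replica name to the last LSN the
    observer process recorded for it. Returns None if no replica qualifies.
    """
    needed = parse_lsn(user_min_lsn)
    candidates = [
        name for name, lsn in replica_lsns.items() if parse_lsn(lsn) >= needed
    ]
    return random.choice(candidates) if candidates else None
```

When the function returns None, a caller could fall back to reading from the primary.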

django-multitenant (via) Absolutely fascinating Django library for horizontally sharding a database using a multi-tenant pattern, from the team at Citus. In this pattern every relevant table includes a “tenant_id”, and all queries should specifically select against that ID. Once you have that in place, you can shard your rows across multiple different databases and route to the correct database based on the tenant ID, safe in the knowledge that joins will still work provided they are against other rows belonging to the same tenant. # 16th November 2017, 9:12 pm
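
To make the routing idea concrete, here is a rough Python sketch of mapping a tenant ID to a database. It is not django-multitenant's API and not how Citus distributes rows: the shard aliases and the function name are made up for illustration, and a simple hash-modulo scheme is assumed.

```python
import zlib

# Hypothetical database aliases; a real deployment would define its own.
SHARDS = ["shard_0", "shard_1", "shard_2"]


def shard_for_tenant(tenant_id: int, shards=SHARDS) -> str:
    """Map a tenant ID to a database alias deterministically."""
    # crc32 gives the same value in every process, so every application
    # server routes a given tenant to the same shard.
    digest = zlib.crc32(str(tenant_id).encode("utf-8"))
    return shards[digest % len(shards)]
```

Because every row for a tenant lands on the same shard, joins between that tenant's rows never need to cross databases.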

Redis Streams and the Unified Log. In which Brandur Leach explores the new Kafka-style streams functionality coming to Redis 4.0, and shows an example of a robust at-least-once processing architecture built on a combination of Redis streams and PostgreSQL transactions. I really like the pattern of writing log records to a staging table in PostgreSQL first in order to bundle them up in the same transaction as the originating state change, then having a separate process read them from that table and publish them to Redis. # 8th November 2017, 4:37 pm
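
Here is a small, self-contained sketch of that staging-table pattern. To keep it runnable without any servers it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL and a plain callback as a stand-in for publishing to Redis. The table names, the debit() example and the publish callback are all assumptions for illustration, not taken from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE staged_log_records (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100);
    """
)


def debit(conn, account_id, amount):
    # The state change and its log record share one transaction, so they
    # are committed together or rolled back together.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO staged_log_records (payload) VALUES (?)",
            (f"debit:{account_id}:{amount}",),
        )


def publish_staged(conn, publish):
    # Publish first, delete afterwards: a crash between the two steps means
    # the record is published again on the next run (at-least-once delivery).
    rows = conn.execute(
        "SELECT id, payload FROM staged_log_records ORDER BY id"
    ).fetchall()
    for record_id, payload in rows:
        publish(payload)
        with conn:
            conn.execute(
                "DELETE FROM staged_log_records WHERE id = ?", (record_id,)
            )
    return len(rows)
```

Consumers of the published stream need to tolerate duplicates, since a record can be delivered more than once.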

Cloud SQL for PostgreSQL adds high availability and replication. Google Cloud Platform now offers PostgreSQL with automatic asynchronous disk-level replication to a separate instance in a different availability zone, via their new “Regional Disks” feature. Between this, Heroku, Citus and Amazon RDS, the appeal of a self-maintained PostgreSQL instance continues to fall. # 7th November 2017, 1:49 pm

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool. # 31st October 2017, 2:08 pm

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data. # 30th October 2017, 8:53 pm

Benefit of TEXT with CHECK over VARCHAR(X) in PostgreSQL. Brandur suggests using “email TEXT CHECK (char_length(email) <= 255)” to define a column with a length limit in PostgreSQL over “VARCHAR(255)” because TEXT and VARCHAR are equally performant but a CHECK length can be changed later on without locking the table, whereas a VARCHAR requires an ALTER TABLE with an exclusive lock. # 28th October 2017, 12:59 am

Implementing Stripe-like Idempotency Keys in Postgres (via) Having clients send “idempotency keys” with API requests in order to be able to safely retry them if something goes wrong is a really neat trick for making transactional APIs more robust. Here Brandur Leach talks implementation strategies. # 27th October 2017, 5:51 pm
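
The core of the idea fits in a short Python sketch. This is a heavily simplified illustration, not the design from the article: it uses sqlite3 as a stand-in for Postgres, the table and function names are hypothetical, and it ignores concurrent requests that arrive with the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    "idempotency_key TEXT PRIMARY KEY, response TEXT NOT NULL)"
)


def handle_request(conn, idempotency_key, do_work):
    """Run do_work once per key; replay the stored response on retries."""
    row = conn.execute(
        "SELECT response FROM idempotency_keys WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    if row is not None:
        # A retry of a request that already completed: return the saved result.
        return row[0]
    # The work and the key record are committed in the same transaction, so
    # a failure part-way through leaves no key behind and a retry is safe.
    with conn:
        response = do_work(conn)
        conn.execute(
            "INSERT INTO idempotency_keys (idempotency_key, response) "
            "VALUES (?, ?)",
            (idempotency_key, response),
        )
    return response
```

A client that times out can resend the same key and get the original response back without the work running a second time.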

PostgreSQL 10 Released. Highlights include major improvements to parallelized queries, quorum commit for synchronous replication (sounds reminiscent of Cassandra) and logical replication, which allows modifications to specific tables to be replicated to different clusters. They’re also changing their versioning scheme to Major.Minor, so the next minor release will be 10.1 and the next major release will be 11. # 5th October 2017, 3:31 pm

Implementing faceted search with Django and PostgreSQL

I’ve added a faceted search engine to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL “like” queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotations) and any combination of the above. Some example searches:

[... 3048 words]

PostgreSQL: How can I store images in a database? What existing products makes it easy for a user to upload photos into a general database?

As a general rule, it’s a bad idea to store images in a database. This is due to the large amount of space they take up, which can affect database read performance and will greatly increase the size of your backups, making them both take longer and cost more to store.

[... 187 words]

What tools and techniques are used for relational database version control (structure and data)?

The term you are looking for is database migrations (sometimes called database change scripts).

[... 308 words]

When should one switch from MySQL to Oracle or PostgreSQL?

When your own benchmarks prove that your application’s particular load characteristics will perform better on another database—and the difference is large enough that it’s worth the cost involved in retargeting your code. If that cost is high (and it probably will be) it may be worth paying for some expert consultants to ensure that your implementations against the different databases are properly optimised.

[... 102 words]

MapOSMatic. Clever service built on top of OpenStreetMap, which renders double-sided city maps with a map and grid on one side and an A-Z street name index on the other. Runs on top of Mapnik, PostGIS and Cairo, with a few thousand additional lines of Python and Django. # 11th July 2010, 12:15 pm

Appending the request URL to SQL statements in Django. A clever frame-walking monkey-patch which pulls the most recent HttpRequest object out of the Python stack and adds the current request.path to each SQL query as an SQL comment, so you can see it in debugging tools such as slow query logs and the PostgreSQL “select * from pg_stat_activity” query. # 2nd June 2010, 9:09 am
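
The annotation step itself is simple enough to sketch. The function below is a hypothetical helper, not the monkey-patch from the linked post: it takes the request path as an explicit argument instead of walking the stack to find the HttpRequest, and it only shows how the comment might be appended safely.

```python
def annotate_sql(sql: str, request_path: str) -> str:
    """Append the request path to a SQL statement as a trailing comment."""
    # Break up any comment delimiters in the path so that user-controlled
    # input cannot close the comment early or open a nested one.
    safe_path = request_path.replace("*/", "* /").replace("/*", "/ *")
    return f"{sql} /* {safe_path} */"
```

The comment is ignored by the database but travels with the query text, which is why it shows up in slow query logs and pg_stat_activity.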

Doing things with Ordnance Survey OpenData. Jo Walsh’s guide to processing Ordnance Survey OpenData using PostgreSQL and PostGIS. # 20th May 2010, 3:22 pm

Installing GeoDjango Dependencies with Homebrew. brew update && brew install postgis && brew install gdal # 7th May 2010, 2:40 pm

PostgreSQL 9.0 Beta 1 Now Available. With asynchronous streaming replication. # 5th May 2010, 2:36 pm

grant XXX on * ? (via) PostgreSQL doesn’t have a way to say “this user is allowed to select/update/etc on all tables in database X”. That kind of sucks. UPDATE: This is fixed in PostgreSQL 9, see the comments. # 16th March 2010, 6:26 pm

jacobian’s django-deployment-workshop. Notes and resources from Jacob’s 3 hour Django deployment workshop at PyCon, including example configuration files for Apache2 + mod_wsgi, nginx, PostgreSQL and pgpool. # 19th February 2010, 2:28 pm

PostgreSQL 8.5alpha3 now available. “Hot Standby, allowing read-only connections during recovery, provides a built-in master-slave replication solution.” Woohoo! # 23rd December 2009, 9:57 am

Django-Jython 1.0.0 released! Now with database backends for PostgreSQL, Oracle and MySQL. The next release (planned for next month) should provide full compatibility with Django 1.1—the current release has 1.1 support for PostgreSQL but only 1.0 support for the other two databases. # 9th November 2009, 1:53 pm

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads. # 28th October 2009, 9:02 am

Londiste Tutorial. Master/slave replication for PostgreSQL, developed and used by Skype. # 9th September 2009, 11:06 pm

Mandelbrot set in PostgreSQL. Surprisingly short SQL statement that produces an ASCII art Mandelbrot set. # 13th August 2009, 2:23 pm

Install Django, GeoDjango, PostgreSQL and PostGIS on OSX Leopard. This tutorial worked perfectly for me. # 24th July 2009, 11:47 am

EveryBlock source code released. EveryBlock’s Knight Foundation grant required them to release the source code after two years, under the GPL. Lots of neat Django / PostgreSQL / GIS tricks to be found within. # 1st July 2009, 8:01 pm