Simon Willison on postgresql

151 posts tagged “postgresql”

2023

PostgreSQL Lock Conflicts (via) I absolutely love how extremely specific and niche this documentation site is. It details every single lock that PostgreSQL implements, and shows exactly which commands acquire that lock. That’s everything. I can imagine this becoming absurdly useful at extremely infrequent intervals for advanced PostgreSQL work.

# 23rd August 2023, 3:08 am / postgresql, documentation

Writing a chat application in Django 4.2 using async StreamingHttpResponse, Server-Sent Events and PostgreSQL LISTEN/NOTIFY (via) Excellent tutorial by Víðir Valberg Guðmundsson on implementing chat with server-sent events using the newly async-capable StreamingHttpResponse from Django 4.2x.

He uses PostgreSQL’a LISTEN/NOTIFY mechanism which can be used asynchronously in psycopg3—at the cost of a separate connection per user of the chat.

The article also covers how to use the Last-Event-ID header to implement reconnections in server-sent events, transmitting any events that may have been missed during the time that the connection was dropped.

# 19th May 2023, 3:42 pm / async, postgresql, asgi, django

Big Opportunities in Small Data

I gave an invited keynote at Citus Con 2023, the PostgreSQL conference. Below is the abstract, video, slides and links from the presentation.

[... 385 words]

3:06 am / 8th May 2023 / postgresql, sqlite, my-talks, datasette, small-data

Quicker serverless Postgres connections. Neon provide “serverless PostgreSQL”—autoscaling, managed PostgreSQL optimized for use with serverless hosting environments. A neat capability they provide is the ability to connect to a PostgreSQL server via WebSockets, which means their database can be used from environments such as Cloudflare workers which don’t have the ability to use a standard TCP database connection. This article describes some clever tricks they used to make establishing new connections via WebSockets more efficient, using the least possible number of network round-trips.

# 28th March 2023, 10:09 pm / postgresql, websockets

2022

Scaling Mastodon: The Compendium (via) Hazel Weakly’s collection of notes on scaling Mastodon, covering PostgreSQL, Sidekiq, Redis, object storage and more.

# 29th November 2022, 5:46 am / scaling, postgresql, mastodon, redis, sidekiq

Querying Postgres Tables Directly From DuckDB (via) I learned a lot of interesting PostgreSQL tricks from this write-up of the new DuckDB feature that allows it to run queries against PostgreSQL servers directly. It works using COPY (SELECT ...) TO STDOUT (FORMAT binary) which writes rows to the protocol stream in efficient binary format, but splits the table being read into parallel fetches against page ranges and uses SET TRANSACTION SNAPSHOT ... in those parallel queries to ensure they see the same transactional snapshot of the database.

# 3rd October 2022, 2:27 pm / postgresql, duckdb, databases

Crunchy Data: Learn Postgres at the Playground (via) Crunchy Data have a new PostgreSQL tutorial series, with a very cool twist: they have a build of PostgreSQL compiled to WebAssembly which runs in the browser, so each tutorial is accompanied by a working psql terminal that lets you try out the tutorial contents interactively. It even has support for PostGIS, though that particular tutorial has to load 80MB of assets in order to get it to work!

# 17th August 2022, 6:30 pm / postgresql, webassembly

Lesser Known Features of ClickHouse (via) I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL.

# 31st May 2022, 7:48 pm / postgresql, clickhouse, sqlite

Glue code to quickly copy data from one Postgres table to another (via) The Python script that Retool used to migrate 4TB of data between two PostgreSQL databases. I find the structure of this script really interesting—it uses Python to spin up a queue full of ID ranges to be transferred and then starts some threads, but then each thread shells out to a command that runs “psql COPY (SELECT ...) TO STDOUT” and pipes the result to “psql COPY xxx FROM STDIN”. Clearly this works really well (“saturate the database’s hardware capacity” according to a comment on HN), and neatly sidesteps any issues with Python’s GIL.

# 19th April 2022, 4:57 pm / postgresql, python

Postgres Auditing in 150 lines of SQL (via) I’ve run up against the problem of tracking changes made to rows within a database table so many times, and I still don’t have a preferred solution. This approach to it looks very neat: it uses PostgreSQL triggers to populate a single audit table (as opposed to one audit table per tracked table) and records the previous and current column values for the row using jsonb.

# 9th March 2022, 7:19 pm / postgresql, databases

migra (via) This looks like a very handy tool to have around: run “migra postgresql:///a postgresql:///b” and it will detect and output the SQL alter statements needed to modify the first PostgreSQL database schema to match the second. It’s written in Python, running on top of SQLAlchemy.

# 26th February 2022, 11:23 pm / migrations, postgresql, databases

Single dependency stacks (via) Brandur Leach notes that the core services at Crunchy (admittedly a PostgreSQL hosting and consultancy company) have only one stateful dependency – Postgres. No Redis, ElasticSearch or anything else. This means that problems like rate limiting and search, which are often farmed out to external services, are all handled using either PostgreSQL or in-memory mechanisms on their servers.

# 9th February 2022, 6:43 pm / brandur-leach, postgresql

Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool.

# 18th January 2022, 8:53 pm / performance, optimization, postgresql

2021

Transactionally Staged Job Drains in Postgres. Any time I see people argue that relational databases shouldn’t be used to implement job queues I think of this post by Brandur from 2017. If you write to a queue before committing a transaction you run the risk of a queue consumer trying to read from the database before the new row becomes visible. If you write to the queue after the transaction there’s a risk an error might result in your message never being written. So: write to a relational staging table as part of the transaction, then have a separate process read from that table and write to the queue.

# 18th December 2021, 1:34 am / scaling, brandur-leach, postgresql, queues

Per-project PostgreSQL (via) Jamey Sharp describes an ingenious way of setting up PostgreSQL instances for each of your local development project, without depending on an always-running shared localhost database server. The trick is a shell script which creates a PGDATA folder in the current folder and then instantiates a PostgreSQL server in --single single user mode which listens on a Unix domain socket in that folder, instead of listening on the network. Jamey then uses direnv to automatically configure that PostgreSQL, initializing the DB if necessary, for each of his project folders.

# 3rd September 2021, 3:06 am / postgresql

Django SQL Dashboard 1.0

Earlier this week I released Django SQL Dashboard 1.0. I also just released 1.0.1, with a bug fix for PostgreSQL 10 contributed by Ryan Cheley.

[... 629 words]

2:19 am / 6th July 2021 / django, postgresql, projects, sql, datasette, weeknotes, github-actions, django-sql-dashboard

PostgreSQL: nbtree/README (via) The PostgreSQL source tree includes beatifully written README files for different parts of PostgreSQL. Here’s the README for their btree implementation—it continues to be actively maintained (last change was is March) and “git blame” shows that parts of the file date back 25 years, to 1996!

# 25th June 2021, 6:09 pm / computer-science, postgresql, databases

Hierarchical Structures in PostgreSQL (via) Two techniques I hadn’t seen before: the first is to define a materialized view using a CTE that offers efficient tree queries against a PostgreSQL array of path components (plus a trigger to update the materialized view), the second is with the PostgreSQL ltree extension which ships as part of PostgreSQL and hence should be widely available.

# 25th June 2021, 5:19 pm / sql, postgresql

Multi-region PostgreSQL on Fly (via) Really interesting piece of architectural design from Fly here. Fly can run your application (as a Docker container run using Firecracker) in multiple regions around the world, and they’ve now quietly added PostgreSQL multi-region support. The way it works is that all-but-one region can have a read-only replica, and requests sent to application servers can perform read-only queries against their local region’s replica. If a request needs to execute a SQL update your application code can return a “fly-replay: region=scl” HTTP header and the Fly CDN will transparently replay the request against the region containing the leader database. This also means you can implement tricks like setting a 10s expiring cookie every time the user performs a write, such that their requests in the next 10s will go straight to the leader and avoid them experiencing any replication lag that hasn’t caught up with their latest update.

# 17th June 2021, 6:39 pm / fly, scaling, replication, postgresql

explain.dalibo.com (via) By far the best tool I’ve seen for turning the output of PostgreSQL EXPLAIN ANALYZE into something I can actually understand—produces a tree visualization which includes clear explanations of what each step (such as a “Index Only Scan Node”) actually means.

# 28th May 2021, 5:41 pm / postgresql

Django SQL Dashboard

I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.

[... 2,171 words]

3:39 pm / 10th May 2021 / databases, django, postgresql, projects, sql, datasette, weeknotes, django-sql-dashboard

Practical SQL for Data Analysis (via) This is a really great SQL tutorial: it starts with the basics, but quickly moves on to a whole array of advanced PostgreSQL techniques - CTEs, window functions, efficient sampling, rollups, pivot tables and even linear regressions executed directly in the database using regr_slope(), regr_intercept() and regr_r2(). I picked up a whole bunch of tips for things I didn't know you could do with PostgreSQL here.

# 4th May 2021, 3:11 am / sql, postgresql, haki-benita

Adding GeoDjango to an existing Django project

Work on VIAL for Vaccinate The States continues.

[... 1,503 words]

6:38 am / 3rd May 2021 / django, geodjango, gis, postgresql, weeknotes, vaccinate-ca

A CSV export, JSON import workflow for bulk updating our data

I just added missing counties to around 1200 of our locations using a combination of tricks, and I thought they’d make a good blog post.

[... 1,429 words]

5 pm / 28th April 2021 / postgresql, vaccinate-ca, vaccinate-ca-blog

Weeknotes: Vaccinate The States, and how I learned that returning dozens of MB of JSON works just fine these days

On Friday VaccinateCA grew in scope, a lot: we launched a new website called Vaccinate The States. Patrick McKenzie wrote more about the project here—the short version is that we’re building the most comprehensive possible dataset of vaccine availability in the USA, using a combination of data collation, online research and continuing to make a huge number of phone calls.

[... 1,109 words]

1:02 am / 26th April 2021 / csv, django, django-admin, postgresql, projects, vaccines, weeknotes, vaccinate-ca, django-sql-dashboard

Porting VaccinateCA to Django

As I mentioned back in February, I’ve been working with the VaccinateCA project to try to bring the pandemic to an end a little earlier by helping gather as accurate a model as possible of where the Covid vaccine is available in California and how people can get it.

[... 2,157 words]

5:18 am / 12th April 2021 / django, django-admin, migrations, postgresql, weeknotes, airtable, vaccinate-ca, boring-technology

Weeknotes: django-sql-dashboard widgets

A few small releases this week, for django-sql-dashboard, datasette-auth-passwords and datasette-publish-vercel.

[... 1,025 words]

5:50 am / 21st March 2021 / authentication, dashboard, django, postgresql, projects, zeit-now, weeknotes, django-sql-dashboard, sphinx-docs

VIAL is now live, plus django-sql-dashboard

Our new Django backend has now officially graduated from preview mode! We’ve been running it to collect caller reports for Oregon for over a week now, and today we finally turned off the old Heroku app and promoted https://vial.calltheshots.us/ to be the place that our caller app writes to.

[... 672 words]

5 pm / 15th March 2021 / postgresql, vaccinate-ca, django-sql-dashboard, vaccinate-ca-blog

Weeknotes: tableau-to-sqlite, django-sql-dashboard

This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.

[... 792 words]

7:35 am / 14th March 2021 / django, postgresql, projects, sql, weeknotes, vaccinate-ca, django-sql-dashboard

Trying to end the pandemic a little earlier with VaccinateCA

This week I got involved with the VaccinateCA effort. We are trying to end the pandemic a little earlier, by building the most accurate database possible of vaccination locations and availability in California.

[... 1,154 words]

5:40 am / 28th February 2021 / crowdsourcing, django, postgresql, patrick-mckenzie, datasette, weeknotes, covid19, vaccinate-ca, personalnews

«« first « previous page 2 / 6 next » last »»