Simon Willison’s Weblog

Subscribe

Blogmarks tagged performance

Filters: Type: blogmark × performance × Sorted by date


Optimizing SQLite for servers (via) Sylvain Kerkour’s comprehensive set of lessons learned running SQLite for server-based applications.

There’s a lot of useful stuff in here, including detailed coverage of the different recommended PRAGMA settings.

There was also a tip I haven’t seen before about “BEGIN IMMEDIATE” transactions:

“By default, SQLite starts transactions in DEFERRED mode: they are considered read only. They are upgraded to a write transaction that requires a database lock in-flight, when query containing a write/update/delete statement is issued.

The problem is that by upgrading a transaction after it has started, SQLite will immediately return a SQLITE_BUSY error without respecting the busy_timeout previously mentioned, if the database is already locked by another connection.

This is why you should start your transactions with BEGIN IMMEDIATE instead of only BEGIN. If the database is locked when the transaction starts, SQLite will respect busy_timeout.” # 31st March 2024, 8:16 pm

DiskCache (via) Grant Jenks built DiskCache as an alternative caching backend for Django (also usable without Django), using a SQLite database on disk. The performance numbers are impressive—it even beats memcached in microbenchmarks, due to avoiding the need to access the network.

The source code (particularly in core.py) is a great case-study in SQLite performance optimization, after five years of iteration on making it all run as fast as possible. # 19th March 2024, 3:43 pm

Batch size one billion: SQLite insert speedups, from the useful to the absurd (via) Useful, detailed review of ways to maximize the performance of inserting a billion integers into a SQLite database table. # 26th September 2023, 5:31 pm

How CPython Implements and Uses Bloom Filters for String Processing. Fascinating dive into Python string internals by Abhinav Upadhyay. It turns out CPython uses very simple bloom filters in several parts of the core string methods, to solve problems like splitting on newlines where there are actually eight codepoints that could represent a newline, and a tiny bloom filter can help filter a character in a single operation before performing all eight comparisons only if that first check failed. # 16th September 2023, 10:32 pm

Data-driven performance optimization with Rust and Miri (via) Useful guide to some Rust performance optimization tools. Miri can be used to dump out a detailed JSON profile of a program which can then be opened and explored using the Chrome browser’s performance tool. # 9th December 2022, 5:19 pm

Efficient Pagination Using Deferred Joins (via) Surprisingly simple trick for speeding up deep OFFSET x LIMIT y pagination queries, which get progressively slower as you paginate deeper into the data. Instead of applying them directly, apply them to a “select id from ...” query to fetch just the IDs, then either use a join or run a separate “select * from table where id in (...)” query to fetch the full records for that page. # 16th August 2022, 5:35 pm

Announcing Pyston-lite: our Python JIT as an extension module (via) The Pyston JIT can now be installed in any Python 3.8 virtual environment by running “pip install pyston_lite_autoload”—which includes a hook to automatically inject the JIT. I just tried a very rough benchmark against Datasette (ab -n 1000 -c 10) and got 391.20 requests/second without the JIT compared to 404.10 request/second with it. # 8th June 2022, 5:58 pm

Compiling Black with mypyc (via) Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks. # 31st May 2022, 11:24 pm

Mypyc (via) Spotted this in the Black release notes: “Black is now compiled with mypyc for an overall 2x speed-up”. Mypyc is a tool that compiles Python modules (written in a subset of Python) to C extensions—similar to Cython but using just Python syntax, taking advantage of type annotations to perform type checking and type inference. It’s part of the mypy type checking project, which has been using it since 2019 to gain a 4x performance improvement over regular Python. # 30th January 2022, 1:31 am

Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool. # 18th January 2022, 8:53 pm

Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am

Making GitHub’s new homepage fast and performant. A couple of really clever tricks in this article by Tobias Ahlin. The first is using IntersectionObserver in conjunction with the video preload=“none” attribute to lazily load a video when it scrolls into view. The second is an ingenious trick to create an efficiently encoded transparent JPEG image: embed the image in a SVG file twice, once as the image and once as a transparency mask. # 29th January 2021, 7:05 pm

How Shopify Uses WebAssembly Outside of the Browser (via) I’m fascinated by applications of WebAssembly outside the browser. As a Python programmer I’m excited to see native code libraries getting compiled to WASM in a way that lets me call them from Python code via a bridge, but the other interesting application is executing untrusted code in a sandbox.

Shopify are doing exactly that—they are building a kind-of plugin mechanism where partner code compiled to WASM runs inside their architecture using Fastly’s Lucet. The performance numbers are in the same ballpark as native code.

Also interesting: they’re recommending AssemblyScript, a TypeScript-style language designed to compile directly to WASM without needing any additional interpreter support, as required by dynamic languages such as JavaScript, Python or Ruby. # 19th December 2020, 4:46 pm

The Now CDN (via) Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the “datasette publish now” CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments. # 12th July 2018, 3:34 am

Computer latency: 1977-2017 (via) Dan Luu used a 240 fps camera to investigate the latency between hitting a key and having the character show up on the display across four decades of computing devices... and found 1983’s Apple IIe outperformed everything else. He goes to great lengths to explain why in his accompanying write-up. # 26th December 2017, 12:28 am

Many Small Queries Are Efficient In SQLite. Since SQLite runs in-process rather than being accessed over a network it avoids the per-query overhead of network round trips. This means that while MySQL or PostgreSQL applications need to avoid N+1 query patterns that create 100s of queries per request, SQLite apps can be designed differently: provided you hit indexes or small tables, 200 queries just means 200 extra cheap function calls. # 26th November 2017, 4:24 pm

Your Web, Half a Second Sooner. Google AdSense now serves a tiny bit of JavaScript that loads everything else in a dynamically populated iframe, thus avoiding blocking the rest of the page load. It’s about time online advertising providers started taking page performance seriously. # 17th March 2011, 5:39 pm

Google and Microsoft Cheat on Slow-Start. Should You? Fascinating optimisation tricks by some of the big websites, which violate the RFC governing the TCP slow-start algorithm in order to perform better in the common case. # 3rd December 2010, 7:03 pm

Bees with machine guns! Low-cost, distributed load-testing using EC2. Great name for a useful project—Bees with machine guns is a Fabric script which fires up a bunch of EC2 instances, uses them to load test a website and then spins them back down again. # 27th October 2010, 11:04 pm

Velocity: Forcing Gzip Compression. Almost every browser supports gzip these days, but 15% of web requests have had their Accept-Encoding header stripped or mangled, generally due to poorly implemented proxies or anti-virus software. Steve Souders passes on a trick used by Google Search, where an iframe is used to test the browser’s gzip support and set a cookie to force gzipping of future pages. # 30th September 2010, 5:45 pm

Diffable: only download the deltas. JavaScript library for detecting and serving diffs to JavaScript rather than downloading large scripts every time a few lines of code are changed. “Using Diffable has reduced page load times in Google Maps by more than 1200 milliseconds (~25%). Note that this benefit only affects users that have an older version of the script in cache. For Google Maps that’s 20-25% of users.” # 11th July 2010, 12:19 pm

Lazy Load Plugin for jQuery. I’m using this jQuery plugin to save some bandwidth when people first view my Redis tutorial slides. It unobtrusively replaces images on a page with a placeholder graphic, then sets them to load automatically as the user scrolls down the page. # 26th April 2010, 12:02 am

Introducing the PyPy 1.2 release. It’s been a long time coming, but 1.2 is the first PyPy release to ship with a Just-in-Time compiler! Performance looks pretty impressive. # 12th March 2010, 11:54 pm

Is johnny-cache for you? “Using Johnny is really adopting a particular caching strategy. This strategy isn’t always a win; it can impact performance negatively”—but for a high percentage of Django sites there’s a very good chance it will be a net bonus. # 2nd March 2010, 11:44 am

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully. # 28th February 2010, 10:55 pm

Making Facebook 2x Faster. Facebook have a system called BigPipe which allows them to progressively send their pages to the browser as the server-side processing completes to optimise client loading time. Anyone reverse engineered this yet to figure out how they actually do it? # 19th February 2010, 9:14 am

HipHop for PHP: Move Fast. Facebook have open-sourced their internally developed PHP to C++ compiler. They serve 400 billion PHP pages a month (that’s more than 150,000 a second) so any performance improvement dramatically reduces their hardware costs, and HipHop drops the CPU usage on their web servers by an average of 50%. “We are serving over 90% of our Web traffic using HipHop, all only six months after deployment”. # 2nd February 2010, 6:59 pm

Dojo 1.4.1 vs jQuery 1.4.2pre on Taskspeed. John Resig’s reponse. When JavaScript libraries compete on performance, everybody wins. # 29th January 2010, 2:19 pm

Dojo: Still Twice As Fast When It Matters Most. Alex Russell shows how Dojo out-performs jQuery on the TaskSpeed benchmark, which attempts to represent common tasks in real-world applications and has had code that have been optimised by the development teams behind each of the libraries. # 28th January 2010, 10:40 pm

Reexamining Python 3 Text I/O. Python 3.1’s IO performance is a huge improvement over 3.0, but still considerably slower than 2.6. It turns out it’s all to do with Python 3’s unicode support: When you read a file in to a string, you’re asking Python to decode the bytes in to UTF-8 (the new default encoding) at the same time. If you open the file in binary mode Python 3 will read raw bytes in to a bytestring instead, avoiding the conversion overhead and performing only 4% slower than the equivalent code in Python 2.6.4. # 28th January 2010, 1:28 pm