76 items tagged “performance”
The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.
— Andrej Karpathy # 4th February 2023, 12:08 am
Data-driven performance optimization with Rust and Miri (via) Useful guide to some Rust performance optimization tools. Miri can be used to dump out a detailed JSON profile of a program which can then be opened and explored using the Chrome browser’s performance tool. # 9th December 2022, 5:19 pm
Efficient Pagination Using Deferred Joins (via) Surprisingly simple trick for speeding up deep OFFSET x LIMIT y pagination queries, which get progressively slower as you paginate deeper into the data. Instead of applying them directly, apply them to a “select id from ...” query to fetch just the IDs, then either use a join or run a separate “select * from table where id in (...)” query to fetch the full records for that page. # 16th August 2022, 5:35 pm
Announcing Pyston-lite: our Python JIT as an extension module (via) The Pyston JIT can now be installed in any Python 3.8 virtual environment by running “pip install pyston_lite_autoload”—which includes a hook to automatically inject the JIT. I just tried a very rough benchmark against Datasette (ab -n 1000 -c 10) and got 391.20 requests/second without the JIT compared to 404.10 request/second with it. # 8th June 2022, 5:58 pm
Compiling Black with mypyc (via) Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks. # 31st May 2022, 11:24 pm
Mypyc (via) Spotted this in the Black release notes: “Black is now compiled with mypyc for an overall 2x speed-up”. Mypyc is a tool that compiles Python modules (written in a subset of Python) to C extensions—similar to Cython but using just Python syntax, taking advantage of type annotations to perform type checking and type inference. It’s part of the mypy type checking project, which has been using it since 2019 to gain a 4x performance improvement over regular Python. # 30th January 2022, 1:31 am
Tricking Postgres into using an insane – but 200x faster – query plan. Jacob Martin talks through a PostgreSQL query optimization they implemented at Spacelift, showing in detail how to interpret the results of EXPLAIN (FORMAT JSON, ANALYZE) using the explain.dalibo.com visualization tool. # 18th January 2022, 8:53 pm
Weeknotes: datasette-tiddlywiki, filters_from_request
I made some good progress on the big refactor this week, including extracting some core logic out into a new Datasette plugin hook. I also got distracted by TiddlyWiki and released a new Datasette plugin that lets you run TiddlyWiki inside Datasette.[... 1197 words]
Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool
Earlier this week I released sqlite-utils 3.14 with a powerful new command-line tool:
sqlite-utils convert, which applies a conversion function to data stored in a SQLite column.
When I was a performance consultant I’d show up to random companies who wanted me to fix their computer performance issues. If they trusted me with a login to their production servers, I could help them a lot quicker. To get that trust I knew which tools looked but didn’t touch: Which were observability tools and which were experimental tools. “I’ll start with observability tools only” is something I’d say at the start of every engagement.
— Brendan Gregg # 8th June 2021, 7:33 pm
Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am
Making GitHub’s new homepage fast and performant. A couple of really clever tricks in this article by Tobias Ahlin. The first is using IntersectionObserver in conjunction with the video preload=“none” attribute to lazily load a video when it scrolls into view. The second is an ingenious trick to create an efficiently encoded transparent JPEG image: embed the image in a SVG file twice, once as the image and once as a transparency mask. # 29th January 2021, 7:05 pm
How Shopify Uses WebAssembly Outside of the Browser (via) I’m fascinated by applications of WebAssembly outside the browser. As a Python programmer I’m excited to see native code libraries getting compiled to WASM in a way that lets me call them from Python code via a bridge, but the other interesting application is executing untrusted code in a sandbox.
Shopify are doing exactly that—they are building a kind-of plugin mechanism where partner code compiled to WASM runs inside their architecture using Fastly’s Lucet. The performance numbers are in the same ballpark as native code.
I’ve really come to appreciate that performance isn’t just some property of a tool independent from its functionality or its feature set. Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived.
— Nelson Elhage # 24th February 2020, 2:32 pm
The Now CDN (via) Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the “datasette publish now” CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments. # 12th July 2018, 3:34 am
The latest SQLite 3.8.7 alpha version is 50% faster than the 3.7.17 release from 16 months ago. That is to say, it does 50% more work using the same number of CPU cycles. [...] The 50% faster number above is not about better query plans. This is 50% faster at the low-level grunt work of moving bits on and off disk and search b-trees. We have achieved this by incorporating hundreds of micro-optimizations. Each micro-optimization might improve the performance by as little as 0.05%. If we get one that improves performance by 0.25%, that is considered a huge win. Each of these optimizations is unmeasurable on a real-world system (we have to use cachegrind to get repeatable run-times) but if you do enough of them, they add up.
— D. Richard Hipp # 10th May 2018, 5:15 am
Computer latency: 1977-2017 (via) Dan Luu used a 240 fps camera to investigate the latency between hitting a key and having the character show up on the display across four decades of computing devices... and found 1983’s Apple IIe outperformed everything else. He goes to great lengths to explain why in his accompanying write-up. # 26th December 2017, 12:28 am
Many Small Queries Are Efficient In SQLite. Since SQLite runs in-process rather than being accessed over a network it avoids the per-query overhead of network round trips. This means that while MySQL or PostgreSQL applications need to avoid N+1 query patterns that create 100s of queries per request, SQLite apps can be designed differently: provided you hit indexes or small tables, 200 queries just means 200 extra cheap function calls. # 26th November 2017, 4:24 pm
Google and Microsoft Cheat on Slow-Start. Should You? Fascinating optimisation tricks by some of the big websites, which violate the RFC governing the TCP slow-start algorithm in order to perform better in the common case. # 3rd December 2010, 7:03 pm
Bees with machine guns! Low-cost, distributed load-testing using EC2. Great name for a useful project—Bees with machine guns is a Fabric script which fires up a bunch of EC2 instances, uses them to load test a website and then spins them back down again. # 27th October 2010, 11:04 pm
Velocity: Forcing Gzip Compression. Almost every browser supports gzip these days, but 15% of web requests have had their Accept-Encoding header stripped or mangled, generally due to poorly implemented proxies or anti-virus software. Steve Souders passes on a trick used by Google Search, where an iframe is used to test the browser’s gzip support and set a cookie to force gzipping of future pages. # 30th September 2010, 5:45 pm
Lazy Load Plugin for jQuery. I’m using this jQuery plugin to save some bandwidth when people first view my Redis tutorial slides. It unobtrusively replaces images on a page with a placeholder graphic, then sets them to load automatically as the user scrolls down the page. # 26th April 2010, 12:02 am
Introducing the PyPy 1.2 release. It’s been a long time coming, but 1.2 is the first PyPy release to ship with a Just-in-Time compiler! Performance looks pretty impressive. # 12th March 2010, 11:54 pm
Is johnny-cache for you? “Using Johnny is really adopting a particular caching strategy. This strategy isn’t always a win; it can impact performance negatively”—but for a high percentage of Django sites there’s a very good chance it will be a net bonus. # 2nd March 2010, 11:44 am
Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully. # 28th February 2010, 10:55 pm
Making Facebook 2x Faster. Facebook have a system called BigPipe which allows them to progressively send their pages to the browser as the server-side processing completes to optimise client loading time. Anyone reverse engineered this yet to figure out how they actually do it? # 19th February 2010, 9:14 am
HipHop for PHP: Move Fast. Facebook have open-sourced their internally developed PHP to C++ compiler. They serve 400 billion PHP pages a month (that’s more than 150,000 a second) so any performance improvement dramatically reduces their hardware costs, and HipHop drops the CPU usage on their web servers by an average of 50%. “We are serving over 90% of our Web traffic using HipHop, all only six months after deployment”. # 2nd February 2010, 6:59 pm