Simon Willison’s Weblog

Analytics: Hacker News vs. a tweet from Elon Musk

My post Bing: “I will not harm you unless you harm me first” really took off.

It sat at the top of Hacker News for a full day, and is currently the 18th most popular post of all time on that site.

And then this happened:

Given recent changes made to the Twitter algorithm, a lot of people saw that. Twitter currently reports 30.4M views of that tweet.

A bunch of people asked me how much of that converted into page views. So let’s dive in!

Headline figures

Here’s my Plausible dashboard for that post over the past few days:

Plausible dashboard, filtered for /2023/Feb/15/bing/ - there's a huge spike in traffic starting on the 16th of Feb. 959k unique visitors, 1.1M page views, 90% bounce rate, 42m43s time on page. Top sources of traffic are Twitter at 721k, Direct / None at 132k, Hacker News at 49.5k, Facebook at 13.4k, Reddit at 8.3k, Google at 7.8k, tldrnewsletter at 6k and LinkedIn at 5.4k

Overall numbers: 959k unique visitors, 1.1M page views.

Top sources of traffic:

  • Twitter: 721k
  • Direct / None: 132k (this includes traffic from Mastodon)
  • Hacker News: 49.5k
  • Facebook: 13.4k
  • Reddit: 8.3k
  • Google: 7.8k
  • tldrnewsletter: 6k
  • LinkedIn: 5.4k

If we assume the vast majority of the Twitter traffic came from Elon’s tweet (which seems reasonable), that’s 721k / 30.4M = roughly a 2.37% click-through rate.
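As a quick sanity check on that figure: click-through rate is the number of people who clicked divided by the number who saw the tweet. This sketch uses the rounded numbers quoted above and assumes every Twitter-referred visitor came from that one tweet. The variable names are just for illustration.

```python
# Rounded figures quoted in this post.
tweet_views = 30_400_000     # views of the tweet, as reported by Twitter
twitter_visitors = 721_000   # unique visitors Plausible attributes to Twitter

# Click-through rate = visitors who clicked / people who saw the tweet.
click_through_rate = twitter_visitors / tweet_views

print(f"{click_through_rate:.2%}")  # prints 2.37%
```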

Notably, sitting at the top of Hacker News for a day really does drive an enormous amount of traffic: 49.5k visitors, which is about 7% of what came from the second most followed account on Twitter (it looks like Barack Obama is still number one).

More detailed analytics via Plausible and Cloudflare

I mainly use Plausible for my site’s analytics. I really like them: they’re privacy-focused, open source (though I use their hosted version) and show me exactly the subset of data I want to see. Most importantly, they don’t set cookies.

My site also runs behind Cloudflare, which also provides analytics. I don’t pay for the upgraded analytics, but it turns out you can still get some pretty detailed numbers out of them—especially if you’re willing to dig around in the browser DevTools.

Plausible offers an “export” button, so I used that... and got a zip file with a bunch of CSVs in it. Here they are in a GitHub repo.

Cloudflare—at least for the free tier—doesn’t have a detailed export. But... under the hood the Cloudflare web application uses their GraphQL API to retrieve stats for display, and with a bit of digging you can get numbers out that way.
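The post doesn't show the exact query involved, so the following is only a sketch of what an hourly page view request to Cloudflare's GraphQL Analytics API might look like. The dataset and field names (httpRequests1hGroups, pageViews, datetime_geq and so on) are my recollection of Cloudflare's public schema, not something taken from this post. Check them against the current schema before relying on them. The zone tag and API token are placeholders you would supply yourself.

```python
import json
import urllib.request
from string import Template

# Cloudflare's GraphQL Analytics API endpoint.
GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# ASSUMPTION: dataset and field names are written from memory of
# Cloudflare's schema; they are not the query used in the post.
QUERY_TEMPLATE = Template("""
{
  viewer {
    zones(filter: {zoneTag: $zone_tag}) {
      httpRequests1hGroups(
        limit: 1000
        orderBy: [datetime_ASC]
        filter: {datetime_geq: $start, datetime_lt: $end}
      ) {
        dimensions { datetime }
        sum { pageViews requests }
      }
    }
  }
}
""")


def build_query(zone_tag, start, end):
    """Fill in the template; json.dumps yields double-quoted string literals."""
    return QUERY_TEMPLATE.substitute(
        zone_tag=json.dumps(zone_tag),
        start=json.dumps(start),
        end=json.dumps(end),
    )


def build_request(zone_tag, api_token, start, end):
    """Build (but do not send) the POST request carrying the query."""
    payload = {"query": build_query(zone_tag, start, end)}
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_hourly_page_views(zone_tag, api_token, start, end):
    """Send the request and return the decoded JSON response."""
    request = build_request(zone_tag, api_token, start, end)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch_hourly_page_views with a real zone tag, an API token and two ISO 8601 timestamps should return a JSON document that can be saved to disk for later loading.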

I extracted this 3.2MB JSON file using the Cloudflare API.

Loading it into Datasette

I wrote this script to load the data I had extracted into SQLite database files, and then deployed them to Vercel using Datasette.
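For readers who want the general shape of that step without opening the script, here is a minimal standard-library sketch, not the script linked above. It assumes each CSV has a header row, that the file name makes a sensible table name, and that column names contain no double quotes. load_csv and load_directory are hypothetical helper names.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv(db, csv_path):
    """Create a table named after the CSV file and insert every row as text."""
    table = Path(csv_path).stem
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row supplies the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        db.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        db.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    db.commit()
    return table


def load_directory(db_path, csv_dir):
    """Load every *.csv file in csv_dir into the SQLite file at db_path."""
    db = sqlite3.connect(db_path)
    try:
        return [load_csv(db, p) for p in sorted(Path(csv_dir).glob("*.csv"))]
    finally:
        db.close()
```

Every value is stored as text here, so numeric columns would need a cast in SQL before charting or aggregating. The resulting file can be opened locally by passing it to the datasette command.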

You can explore the result here:

Here’s page views according to Plausible over the time period in question:

Chart in Datasette showing page views per hour according to Plausible - a big jump up to around 185,000 at 11am on the 15th

It looks to me like the timezone for that data is Pacific Time.

This page shows page view counts according to Cloudflare, by hour.

Datasette interface showing a chart plotted using the datasette-vega plugin - the chart shows pageviews against time spiking up to just over 200,000 at 7pm UTC on 15th Feb, the time of the Elon tweet

This data is in UTC, where 7pm UTC corresponds to 11am Pacific.
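The conversion is easy to check in Python. This sketch uses a fixed UTC-8 offset, which is correct for Pacific Standard Time in mid-February but would be wrong during daylight saving time, where a real timezone database (such as the zoneinfo module) is the safer choice.

```python
from datetime import datetime, timedelta, timezone

# Pacific Standard Time is UTC-8; mid-February is outside daylight saving.
PST = timezone(timedelta(hours=-8), "PST")

# The Cloudflare spike: 7pm UTC on 15th Feb 2023.
spike_utc = datetime(2023, 2, 15, 19, 0, tzinfo=timezone.utc)
spike_pacific = spike_utc.astimezone(PST)

print(spike_pacific.isoformat())  # prints 2023-02-15T11:00:00-08:00
```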

These numbers are expected to differ for two reasons: Plausible uses JavaScript to track analytics while Cloudflare counts hits server-side, and the Plausible data is filtered to hits on this one specific page while Cloudflare shows all hits to any page on my site.

There are plenty more ways to slice and dice the data in Datasette:

I’ve long been a fan of full-page HTTP caching as protection against surprise traffic events—it’s a pattern I’ve implemented in the past using Varnish and Fastly, and I’ve been using it on my blog via Cloudflare for several years.

It definitely paid off this time!