Simon Willison’s Weblog


18 items tagged “hackernews”


Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun (via) A real tour de force of data engineering. Wilson Lin fetched 40 million posts and comments from the Hacker News API (using Node.js with a custom multi-process worker pool) and then ran them all through the BGE-M3 embedding model on RunPod, which let him fire up ~150 GPU instances and get the whole run done in a few hours. To save on Amazon SQS costs he built his own queue using RocksDB and Rust.
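The fetching stage boils down to requesting item IDs from the public API with many requests in flight at once. Here is a minimal single-process sketch using the API's item/{id}.json endpoint. It is an assumption-laden simplification, not Wilson's multi-process worker pool, and crawlRange, fetchItem and the default concurrency of 32 are illustrative names and values:

```javascript
// Sketch only: a single-process, concurrency-limited crawl of the public
// Hacker News API. Not the multi-process pool described in the post;
// function names and the default concurrency are assumptions.
const HN_API = "https://hacker-news.firebaseio.com/v0";

// Fetch one item; the API returns null for IDs that don't exist.
async function fetchItem(id, fetchFn) {
  const res = await fetchFn(`${HN_API}/item/${id}.json`);
  if (!res.ok) throw new Error(`item ${id}: HTTP ${res.status}`);
  return res.json();
}

// Fetch every ID from startId to endId (inclusive) with at most `concurrency`
// requests in flight, handing each non-null item to onItem.
async function crawlRange(startId, endId, onItem, { concurrency = 32, fetchFn = fetch } = {}) {
  let next = startId;
  async function worker() {
    // JavaScript runs this loop on a single thread, so claiming an ID with
    // next++ cannot race with another worker.
    while (next <= endId) {
      const id = next++;
      const item = await fetchItem(id, fetchFn);
      if (item) onItem(item);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
}
```

In real use the upper bound would come from the API's maxitem.json endpoint, and a run over tens of millions of items would also need retries and checkpointing, which this sketch leaves out.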

Then he crawled 4 million linked pages, embedded that content using the faster and cheaper jina-embeddings-v2-small-en model, ran UMAP dimensionality reduction to render a 2D map and did a whole lot of follow-on work to identify topic areas and make the map look good.

That's not even half the project - Wilson built several interactive features on top of the resulting data, and experimented with custom rendering techniques on top of canvas to get everything to render quickly.

There's so much in here, and both the code and data (multiple GBs of arrow files) are available if you want to dig in and try some of this out for yourself.

In the Hacker News comments Wilson shares that the total cost of the project was a couple of hundred dollars.

One tiny detail I particularly enjoyed - unrelated to the embeddings - was this trick for testing which edge location is closest to a user using JavaScript:

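const edge = await Promise.race(
  // EDGES: hypothetical array of candidate edge hostnames (not shown in the excerpt)
  EDGES.map(async (edge) => {
    // Run a few times to avoid potential cold start biases.
    for (let i = 0; i < 3; i++) {
      await fetch(`https://${edge}`);
    }
    return edge;
  }),
);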
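One caveat with this trick: Promise.race settles with whichever promise settles first, including a rejection, so an edge that fails fast (a DNS error, say) would make the whole race reject. A minimal sketch that swaps in Promise.any instead, which ignores rejections unless every edge fails. The closestEdge function, the rounds parameter, the res.ok check and the example hostnames are illustrative assumptions, not part of Wilson's code:

```javascript
// Sketch: find the edge that completes a few requests first, ignoring edges
// that fail. closestEdge, `rounds` and the res.ok check are assumptions.
async function closestEdge(edges, fetchFn = fetch, rounds = 3) {
  return Promise.any(
    edges.map(async (edge) => {
      // Several sequential requests per edge, so a single cold start
      // doesn't decide the result on its own.
      for (let i = 0; i < rounds; i++) {
        const res = await fetchFn(`https://${edge}`);
        if (!res.ok) throw new Error(`${edge} responded with ${res.status}`);
      }
      return edge;
    }),
  );
}

// Demo with a stub in place of fetch: hypothetical hosts with fixed latencies,
// one of which always fails immediately.
const latencyMs = { "edge-a.example": 30, "edge-b.example": 5, "edge-c.example": 1 };
const fakeFetch = (url) =>
  new Promise((resolve, reject) => {
    const host = new URL(url).hostname;
    if (host === "edge-c.example") {
      reject(new Error("unreachable"));
      return;
    }
    setTimeout(() => resolve({ ok: true, status: 200 }), latencyMs[host]);
  });

closestEdge(Object.keys(latencyMs), fakeFetch).then((edge) => console.log(edge)); // edge-b.example
```

With Promise.race in place of Promise.any, the same demo would reject, because edge-c fails before either healthy edge finishes. If every edge fails, Promise.any rejects with an AggregateError.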
# 10th May 2024, 4:42 pm

Everything Google’s Python team were responsible for. In a questionable strategic move, Google laid off the majority of their internal Python team a few days ago. Someone on Hacker News asked what the team had been responsible for, and team member zem replied with this fascinating comment, which provides detailed insight into how the team worked and, indirectly, how Python is used within Google. # 27th April 2024, 6:52 pm

Permissions have three moving parts, who wants to do it, what do they want to do, and on what object. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon?

wkirby on Hacker News # 16th April 2024, 7:49 pm
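The three questions in that quote map neatly onto three indexes over (subject, action, object) triples. A minimal in-memory sketch, where PermissionIndex and its method names are hypothetical and a real system would add roles, groups, revocation and persistent storage:

```javascript
// Minimal in-memory sketch (an assumption, not a real library): every grant is a
// (subject, action, object) triple, stored in three indexes so that each of the
// three questions in the quote is a single map lookup rather than a full scan.
function addTo(index, key, value) {
  if (!index.has(key)) index.set(key, new Set());
  index.get(key).add(value);
}

// Return the values stored under key as a sorted array (empty if none).
function lookup(index, key) {
  return [...(index.get(key) ?? [])].sort();
}

// JSON-encode the pair so names containing separators cannot collide.
const pairKey = (a, b) => JSON.stringify([a, b]);

class PermissionIndex {
  constructor() {
    this.actionsBySubjectObject = new Map(); // (subject, object) -> actions
    this.subjectsByObjectAction = new Map(); // (object, action) -> subjects
    this.objectsBySubjectAction = new Map(); // (subject, action) -> objects
  }

  grant(subject, action, object) {
    addTo(this.actionsBySubjectObject, pairKey(subject, object), action);
    addTo(this.subjectsByObjectAction, pairKey(object, action), subject);
    addTo(this.objectsBySubjectAction, pairKey(subject, action), object);
  }

  // Given this person and this object, what can they do?
  actionsFor(subject, object) {
    return lookup(this.actionsBySubjectObject, pairKey(subject, object));
  }

  // Given this object and this action, who can do it?
  subjectsFor(object, action) {
    return lookup(this.subjectsByObjectAction, pairKey(object, action));
  }

  // Given this person and this action, which objects can they act upon?
  objectsFor(subject, action) {
    return lookup(this.objectsBySubjectAction, pairKey(subject, action));
  }
}

const perms = new PermissionIndex();
perms.grant("alice", "read", "doc1");
perms.grant("alice", "write", "doc1");
perms.grant("bob", "read", "doc1");
perms.grant("alice", "read", "doc2");
console.log(perms.actionsFor("alice", "doc1")); // ["read", "write"]
console.log(perms.subjectsFor("doc1", "read")); // ["alice", "bob"]
console.log(perms.objectsFor("alice", "read")); // ["doc1", "doc2"]
```

Keeping three indexes triples the write cost and storage, which is the usual price for answering every permutation without a scan.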

Spam, and its cousins like content marketing, could kill HN if it became orders of magnitude greater—but from my perspective, it isn’t the hardest problem on HN. [...]

By far the harder problem, from my perspective, is low-quality comments, and I don’t mean by bad actors—the community is pretty good about flagging and reporting those; I mean lame and/or mean comments by otherwise good users who don’t intend to and don’t realize they’re doing that.

dang # 19th February 2024, 3:57 pm


Analytics: Hacker News vs. a tweet from Elon Musk

My post Bing: “I will not harm you unless you harm me first” really took off.

[... 817 words]


Scraping web pages from the command line with shot-scraper

I’ve added a powerful new capability to my shot-scraper command line browser automation tool: you can now use it to load a web page in a headless browser, execute JavaScript to extract information and return that information to the terminal as JSON.

[... 1276 words]


Launch HN Instructions (via) The instructions for YC companies that are posting their launch announcement on Hacker News are really interesting to read. “As founders, you’re used to talking to users, customers, and investors. HN readers are not any of those—what they are is peers, and using any of those styles with peers feels clueless and entitled. [...] To interest HN, write in a factual, personal, and modest way about what problem you solve, why it matters, how you solve it, and how you got there.” # 19th July 2021, 1:05 am


A List of Hacker News’s Undocumented Features and Behaviors (via) If you’re interested in community software design this is a neat insight into the many undocumented features of Hacker News, collated by Max Woolf. # 6th June 2020, 5:36 pm

SQL is a better API language than GraphQL – Convince me otherwise (via) A dumb tweet I posted this morning blew up today and ended up on the Hacker News homepage. # 16th April 2020, 10:44 pm

hacker-news-to-sqlite (via) The latest in my Dogsheep series of tools: hacker-news-to-sqlite uses the Hacker News API to fetch your comments and submissions from Hacker News and save them to a SQLite database. # 21st March 2020, 4:27 am
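The underlying API calls are simple: look up the user at user/{username}.json, then fetch each ID in its submitted list from item/{id}.json. The real tool is a Python CLI; this is a rough JavaScript sketch of just the fetching half, with fetchUserItems and its return shape as illustrative assumptions and the SQLite step left out:

```javascript
// Rough sketch of the fetching half only: the real hacker-news-to-sqlite is a
// Python tool that also writes to SQLite. fetchUserItems and its return shape
// are illustrative assumptions.
const HN_API = "https://hacker-news.firebaseio.com/v0";

async function getJson(path, fetchFn) {
  const res = await fetchFn(`${HN_API}/${path}`);
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return res.json();
}

async function fetchUserItems(username, fetchFn = fetch) {
  // user/{id}.json lists the IDs of everything the user has posted in `submitted`.
  const user = await getJson(`user/${username}.json`, fetchFn);
  if (!user) throw new Error(`unknown user: ${username}`);
  const stories = [];
  const comments = [];
  const other = [];
  // One request per item, sequentially, to keep the load on the API light.
  for (const id of user.submitted ?? []) {
    const item = await getJson(`item/${id}.json`, fetchFn);
    if (!item) continue; // the API returns null for IDs it doesn't know about
    if (item.type === "story") stories.push(item);
    else if (item.type === "comment") comments.push(item);
    else other.push(item); // e.g. polls or jobs
  }
  return { stories, comments, other };
}
```

Turning those arrays into database rows is then a matter of picking columns such as id, type, time, title, text and parent.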


Ask HN: What are the best MOOCs you’ve taken? Most useful Hacker News thread I’ve seen in a while: a torrent of great recommendations for online courses to learn everything from machine learning to astrophysics to songwriting. # 3rd April 2018, 5:17 pm


What is the Hacker News technology stack?

It’s written in Arc, a Lisp variant created by Paul Graham. I believe it uses the file system for storage rather than a dedicated database.

[... 70 words]

Does Paul Graham steal business models from teams not accepted into Y Combinator and feed them to accepted teams as pivot ideas?

No. If he did, word would quickly get around and strong teams would stop applying to YC.

[... 45 words]


I think that “bad technology” can kill a startup, but slightly different variations of good technology don’t have much effect. Choose what you know/like best. And Ruby and Python are both in this latter category.

enko on Hacker News # 2nd October 2010, 11:19 am

When all of human endeavor falls under the rubric of the “hack” the word ceases to mean anything. Hack your commute, take public transit! Hack your next dinner party with parlour games. Delightfully clever key hack keeps all your keys on the same ring. Hack Mexican food with a “burrito” sized tortilla! Hack your brain with REM sleep. Hack the sun with a straw hat. Hack hygiene with silver oxide “deodorant”. Hack girls with compliments. Hack your windowsill with a pot of wheatgrass, and hack the sky with the goddamn moon.

qwzybug on Hacker News # 10th August 2010, 11:54 am


Any sufficiently advanced damage control is indistinguishable from ethics.

Eliezer # 6th December 2009, 9:31 am

Hacker News thread on Negative Cashback. Is it common practice for online stores with affiliate referral schemes to artificially inflate their prices if they’re going to have to pay out a referral bonus? # 23rd November 2009, 9:44 pm

What I’ve Learned from Hacker News. I’m always fascinated by online community war stories. # 25th February 2009, 11:16 pm