Simon Willison’s Weblog


21 items tagged “reddit”


In 2006, reddit was sold to Conde Nast. It was soon obvious to many that the sale had been premature, the site was unmanaged and under-resourced under the old-media giant who simply didn't understand it and could never realize its full potential, so the founders and their allies in Y-Combinator (where reddit had been born) hatched an audacious plan to re-extract reddit from the clutches of the 100-year-old media conglomerate. [...]

Yishan Wong

# 20th February 2024, 4:23 pm / reddit, startups, y-combinator


Examples of weird GPT-4 behavior for the string “ davidjl”. GPT-4, when told to repeat or otherwise process the string “ davidjl” (note the leading space character), treats it as “jndl” or “jspb” or “JDL” instead. It turns out “ davidjl” has its own single token in the tokenizer: token ID 23282, presumably dating back to the GPT-2 days.

Riley Goodside refers to these as “glitch tokens”.

This token might refer to Reddit user davidjl123 who ranks top of the league for the old /r/counting subreddit, with 163,477 posts there which presumably ended up in older training data.

# 8th June 2023, 9:29 am / reddit, ai, openai, generative-ai, riley-goodside, gpt4, llms


r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do.

# 5th September 2022, 5:46 pm / machine-learning, reddit, ai, generative-ai


The original Reddit source code, written in Lisp in 2005 (via) “If anyone’s interested, I found a hard drive in my garage with the original Reddit Lisp code from 2005. Been looking for it for years. Enjoy.”—spez

# 29th March 2018, 10:13 pm / lisp, reddit


Three new features for reddit gold. Reddit’s experiments with a subscriber program are interesting to watch. 9,000 people signed up as subscribers without there being any benefit at all, and they’re now being rewarded with the ability to opt out of ads and access to computationally expensive features (like different ways of sorting their own user page) that wouldn’t scale for the entire user base.

# 20th July 2010, 5:54 pm / ads, reddit, scaling, recovered, subscriptions

reddit’s May 2010 “State of the Servers” report. An interesting Cassandra war story: Cassandra scales up, but it doesn’t scale down very well: running with just three nodes can make recovery from problems a lot more tricky.

# 18th May 2010, 6:37 pm / cassandra, nosql, reddit, recovered

The Onion Uses Django, And Why It Matters To Us. The Onion ported their main site from PHP and Drupal to Django in three months with a team of four developers, including a full migration of their archived content. Their developers answer questions about the switch in this thread on the Django sub-reddit.

# 25th March 2010, 6:43 pm / django, drupal, php, python, reddit, theonion

Reddit is now running on Cassandra. Migrating their persistent cache over from memcacheDB to Cassandra took one developer just ten days.

# 13th March 2010, 12:14 am / caching, cassandra, memcachedb, reddit

Since we moved to EC2, the number of unique users has gone up 50%, and pageviews are up more than 100%. To support this growth, we have added 30% more ram and 50% more CPU, yet because of Amazon's constant price reductions, we are actually paying less per month now than when we started.

Jeremy from Reddit

# 7th January 2010, 10:10 pm / amazon, cloud-computing, ec2, pricing, reddit


Heck, I practically invented the formula of "tell a funny story and then get all serious and show how this is amusing anecdote just goes to show that (one thing|the other) is a universal truth." And everybody is like, oh yes! how true! and they link to it with approval, and it zooms to the top of Slashdot. And six years later, a new king arises who did not know Joel, and he writes up another amusing anecdote, really, it's the same anecdote, and he uses it to prove the exact opposite, and everyone is like, oh yes! how true! and it zooms to the top of Reddit.

Joel Spolsky

# 19th November 2008, 8:41 am / anecdotes, joel-spolsky, reddit, slashdot

Low level hooks for multi-database support in Django. As discussed in this sub-thread on reddit: The internal Django Query class has a ’connection’ attribute which can be set by the constructor. This low level hook is the secret to talking to more than one database at once, but higher level APIs have not yet been defined. Jacob Kaplan-Moss: “As a matter of fact, at least a couple high-traffic Django sites are using the new hooks.”

# 3rd September 2008, 11:33 pm / django, jacob-kaplan-moss, multidb, python, query, reddit

Dissecting today’s Internet traffic spikes (via) Theo Schlossnagle on how the increasing popularity of interest aggregation services such as Digg and Reddit results in traffic spikes that dwarf the old Slashdot effect, making the old rules of thumb for capacity planning irrelevant.

# 29th June 2008, 2:12 pm / capacityplanning, digg, reddit, scaling, slashdotting, theoschlossnagle

This is the new blog-spam. [...] 'web design company' takes the highest ranking comment from reddit, and posts it on the site that the original comment is based on. [...] Neat eh? They get to have links on a site that won't get blog-spam filtered, because the comment is 'relevant', since the comment originates from a comment thread about the site.


# 20th June 2008, 6:55 pm / commentspam, reddit, spam

Reddit release their codebase. Under the same Common Public Attribution License used by Facebook for their recent source release.

# 18th June 2008, 2:32 pm / cpal, opensource, python, reddit

Django sub-reddit. Reddit are trialling the ability to create custom sub-reddits, so I put one up for Django links and discussions.

# 26th January 2008, 11:56 pm / community, django, python, reddit


Techniques for safely consuming external HTTP on demand? I asked this question on Ask Reddit yesterday and got some really insightful answers, including Joe Stump from Digg describing how Digg Images uses Danga’s Gearman worker queue.

# 15th December 2007, 12:29 pm / askreddit, danga, digg, gearman, http, joe-stump, queue, reddit, scaling, workers

An OpenID provider should catalogue the sites that a user logs into and automatically construct a homepage for them. That way, not only do the users have the convenience of having their favourite websites automatically bookmarked and readily available, but (with a little help from the consumers), they don't have to log into the individual sites at all.


# 13th July 2007, 7:26 am / ideas, openid, reddit

The Beauty Of The Diffie-Hellman Protocol. Some useful explanations here. Diffie-Hellman is used by OpenID to establish a shared secret between the provider and the consumer.
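For a rough feel of how the shared secret comes about, here is a minimal toy sketch. The tiny prime, the generator and the function names are my own assumptions for illustration; this is not OpenID's actual parameters or wire format, and numbers this small offer no security at all.

```python
import secrets

# Toy public parameters (assumed for illustration, far too small for real use).
P = 23  # prime modulus
G = 5   # generator


def make_keypair(p, g):
    """Pick a random private exponent and derive the matching public value."""
    private = secrets.randbelow(p - 2) + 1  # random integer in [1, p - 2]
    public = pow(g, private, p)
    return private, public


def shared_secret(their_public, my_private, p):
    """Combine the other side's public value with our own private exponent."""
    return pow(their_public, my_private, p)


# Each side keeps its private value and sends only the public one.
provider_private, provider_public = make_keypair(P, G)
consumer_private, consumer_public = make_keypair(P, G)

# Both sides arrive at the same number without ever transmitting it.
provider_secret = shared_secret(consumer_public, provider_private, P)
consumer_secret = shared_secret(provider_public, consumer_private, P)
assert provider_secret == consumer_secret
```

An eavesdropper sees only `P`, `G` and the two public values; recovering the secret from those is the hard part when the prime is large.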

# 1st March 2007, 10:08 pm / cryptography, diffiehellman, openid, reddit


Three steps to OpenID. Maybe explaining OpenID isn’t as hard as I thought... Jacob Kaplan-Moss nails it in three.

# 20th December 2006, 12:44 pm / jacob-kaplan-moss, openid, reddit

Never store passwords in a database! The reddit developers just learnt this the hard way. It might be time to change some of your passwords.
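One common reading of that advice is to store a salted, deliberately slow hash and never the password itself. The sketch below assumes that reading; the function names and the iteration count are hypothetical choices of mine, not anything taken from the linked post.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, digest); only these two values would be stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest from the attempt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

With this approach a leaked database exposes salts and digests, which still have to be brute-forced one password at a time.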

# 16th December 2006, 12:01 am / reddit, security

Why do so many reddit users hate java? The answers provide a good overview as to why Java has fallen out of favour with the alpha-hacker crowd.

# 15th December 2006, 2:20 pm / java, reddit