20 items tagged “reddit”
Examples of weird GPT-4 behavior for the string “ davidjl”. GPT-4, when told to repeat or otherwise process the string “ davidjl” (note the leading space character), treats it as “jndl” or “jspb” or “JDL” instead. It turns out “ davidjl” has its own single token in the tokenizer: token ID 23282, presumably dating back to the GPT-2 days.
Riley Goodside refers to these as “glitch tokens”.
This token might refer to Reddit user davidjl123 who ranks top of the league for the old /r/counting subreddit, with 163,477 posts there which presumably ended up in older training data. # 8th June 2023, 9:29 am
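The single-token behaviour of “ davidjl” described above can be illustrated with a toy tokenizer. This is a minimal sketch, not GPT-4's tokenizer: the vocabulary below is hypothetical, and greedy longest-match stands in for the byte-pair-encoding merges that real tokenizers apply. It only shows why the leading space matters: the string with the space matches one vocabulary entry, while the string without it falls back to smaller pieces.

```python
# Hypothetical toy vocabulary, NOT the real GPT-4 vocabulary.
# Note that " davidjl" includes its leading space.
VOCAB = {" davidjl", "david", "jl", "repeat", " "}


def tokenize(text, vocab):
    """Split text into tokens by greedy longest match against vocab.

    Real tokenizers use byte-pair-encoding merges; greedy matching is a
    simplification used here only for illustration.
    """
    tokens = []
    max_len = max(len(token) for token in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


with_space = tokenize(" davidjl", VOCAB)      # one token
without_space = tokenize("davidjl", VOCAB)    # two smaller tokens
in_sentence = tokenize("repeat davidjl", VOCAB)
print(with_space, without_space, in_sentence)
```

To check the real vocabulary, one option is the `tiktoken` library, for example `tiktoken.get_encoding("cl100k_base").encode(" davidjl")`; that call is not run here.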
r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do. # 5th September 2022, 5:46 pm
The original Reddit source code, written in Lisp in 2005 (via) “If anyone’s interested, I found a hard drive in my garage with the original Reddit Lisp code from 2005. Been looking for it for years. Enjoy.”—spez # 29th March 2018, 10:13 pm
Three new features for reddit gold. Reddit’s experiments with a subscriber program are interesting to watch. 9,000 people signed up as subscribers without there being any benefit at all, and they’re now being rewarded with the ability to opt out of ads and access to computationally expensive features (like different ways of sorting their own user page) that wouldn’t scale for the entire user base. # 20th July 2010, 5:54 pm
reddit’s May 2010 “State of the Servers” report. An interesting Cassandra war story: Cassandra scales up, but it doesn’t scale down very well: running with just three nodes can make recovery from problems a lot more tricky. # 18th May 2010, 6:37 pm
The Onion Uses Django, And Why It Matters To Us. The Onion ported their main site from PHP and Drupal to Django in three months with a team of four developers, including a full migration of their archived content. Their developers answer questions about the switch in this thread on the Django sub-reddit. # 25th March 2010, 6:43 pm
Since we moved to EC2, the number of unique users has gone up 50%, and pageviews are up more than 100%. To support this growth, we have added 30% more ram and 50% more CPU, yet because of Amazon’s constant price reductions, we are actually paying less per month now than when we started.
Heck, I practically invented the formula of “tell a funny story and then get all serious and show how this is amusing anecdote just goes to show that (one thing|the other) is a universal truth.” And everybody is like, oh yes! how true! and they link to it with approval, and it zooms to the top of Slashdot. And six years later, a new king arises who did not know Joel, and he writes up another amusing anecdote, really, it’s the same anecdote, and he uses it to prove the exact opposite, and everyone is like, oh yes! how true! and it zooms to the top of Reddit.
Low level hooks for multi-database support in Django. As discussed in this sub-thread on reddit: The internal Django Query class has a ’connection’ attribute which can be set by the constructor. This low level hook is the secret to talking to more than one database at once, but higher level APIs have not yet been defined. Jacob Kaplan-Moss: “As a matter of fact, at least a couple high-traffic Django sites are using the new hooks.” # 3rd September 2008, 11:33 pm
Dissecting today’s Internet traffic spikes (via) Theo Schlossnagle on how the increasing popularity of interest aggregation services such as Digg and Reddit results in traffic spikes that dwarf the old Slashdot effect, making the old rules of thumb for capacity planning irrelevant. # 29th June 2008, 2:12 pm
This is the new blog-spam. [...] ’web design company’ takes the highest ranking comment from reddit, and posts it on the site that the original comment is based on. [...] Neat eh? They get to have links on a site that won’t get blog-spam filtered, because the comment is ’relevant’, since the comment originates from a comment thread about the site.
Techniques for safely consuming external HTTP on demand? I asked this question on programming.reddit.com yesterday and got some really insightful answers, including Joe Stump from Digg describing how Digg Images uses Danga’s Gearman worker queue. # 15th December 2007, 12:29 pm
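The worker-queue idea mentioned above can be sketched in a few lines. This is not Gearman and not how Digg Images works: an in-process thread pool stands in for the queue, and `fetch_with_workers`, its parameters, and the fake fetcher are all hypothetical names chosen for illustration. The point is only that the caller hands each URL to a worker and waits a bounded time, so a slow or failing remote server cannot stall the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def fetch_with_workers(urls, fetch, timeout=1.0, max_workers=4):
    """Run fetch(url) on worker threads, waiting at most `timeout`
    seconds for each result before giving up on it."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max_workers)
    try:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = ("ok", future.result(timeout=timeout))
            except FutureTimeout:
                results[url] = ("timeout", None)
            except Exception as exc:  # the fetch itself raised
                results[url] = ("error", str(exc))
    finally:
        # Do not block the caller on workers that are still running.
        pool.shutdown(wait=False)
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP client so the sketch needs no network."""
    if "slow" in url:
        time.sleep(0.5)
        return "late body"
    if "bad" in url:
        raise ValueError("boom")
    return "body of " + url


results = fetch_with_workers(
    ["http://a.example/", "http://slow.example/", "http://bad.example/"],
    fake_fetch,
    timeout=0.1,
)
print(results)
```

A production setup would also need limits this sketch leaves out, such as response size caps and restrictions on which hosts may be fetched.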
An OpenID provider should catalogue the sites that a user logs into and automatically construct a homepage for them. That way, not only do the users have the convenience of having their favourite websites automatically bookmarked and readily available, but (with a little help from the consumers), they don’t have to log into the individual sites at all.
The Beauty Of The Diffie-Hellman Protocol. Some useful explanations here. Diffie-Hellman is used by OpenID to establish a shared secret between the provider and the consumer. # 1st March 2007, 10:08 pm
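The shared-secret step can be shown with a toy exchange. This is a minimal sketch of textbook Diffie-Hellman, not OpenID's actual association code: the prime 23 and generator 5 are deliberately tiny illustrative values (real deployments use very large primes), and the function names are hypothetical.

```python
import secrets

# Toy public parameters, for illustration only.
P = 23  # prime modulus
G = 5   # generator


def keypair(p, g):
    """Pick a random private exponent and derive the public value."""
    private = secrets.randbelow(p - 2) + 1  # in the range 1..p-2
    public = pow(g, private, p)
    return private, public


def shared_secret(their_public, my_private, p):
    """Combine the other side's public value with our private exponent."""
    return pow(their_public, my_private, p)


# Each side generates a key pair and sends only the public half.
consumer_private, consumer_public = keypair(P, G)
provider_private, provider_public = keypair(P, G)

# Both sides arrive at the same number without ever transmitting it.
consumer_secret = shared_secret(provider_public, consumer_private, P)
provider_secret = shared_secret(consumer_public, provider_private, P)
print(consumer_secret == provider_secret)
```

An eavesdropper sees only `P`, `G`, and the two public values; recovering the secret from those requires solving a discrete logarithm, which is trivial at this size and impractical at real key sizes.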