Blogmarks
Filters: Type: blogmark × Sorted by date
Introducing Eureka Labs (via) Andrej Karpathy's new AI education company, exploring an AI-assisted teaching model:
The teacher still designs the course materials, but they are supported, leveraged and scaled with an AI Teaching Assistant who is optimized to help guide the students through them. This Teacher + AI symbiosis could run an entire curriculum of courses on a common platform.
On Twitter Andrej says:
@EurekaLabsAI is the culmination of my passion in both AI and education over ~2 decades. My interest in education took me from YouTube tutorials on Rubik's cubes to starting CS231n at Stanford, to my more recent Zero-to-Hero AI series. While my work in AI took me from academic research at Stanford to real-world products at Tesla and AGI research at OpenAI. All of my work combining the two so far has only been part-time, as side quests to my "real job", so I am quite excited to dive in and build something great, professionally and full-time.
The first course will be LLM101n - currently just a stub on GitHub, but with the goal to build an LLM chat interface "from scratch in Python, C and CUDA, and with minimal computer science prerequisites".
Codestral Mamba. New 7B parameter LLM from Mistral, released today. Codestral Mamba is "a Mamba2 language model specialised in code generation, available under an Apache 2.0 license".
This the first model from Mistral that uses the Mamba architecture, as opposed to the much more common Transformers architecture. Mistral say that Mamba can offer faster responses irrespective of input length which makes it ideal for code auto-completion, hence why they chose to specialise the model in code.
It's available to run locally with the mistral-inference GPU library, and Mistral say "For local inference, keep an eye out for support in llama.cpp" (relevant issue).
It's also available through Mistral's La Plateforme API. I just shipped llm-mistral 0.4 adding a llm -m codestral-mamba "prompt goes here"
default alias for the new model.
Also released today: MathΣtral, a 7B Apache 2 licensed model "designed for math reasoning and scientific discovery", with a 32,000 context window. This one isn't available through their API yet, but the weights are available on Hugging Face.
Follow the Crypto (via) Very smart new site from Molly White tracking the huge increase in activity from Cryptocurrency-focused PACs this year. These PACs have already raised $203 million and spent $38 million influencing US elections in 2024.
Right now Molly's rankings show that the "Fairshake" cryptocurrency PAC is second only to the Trump-supporting "Make America Great Again Inc" in money raised by Super PACs this year - though it's 9th in the list that includes other types of PAC.
Molly's data comes from the FEC, and the code behind the site is all open source.
There's lots more about the project in the latest edition of Molly's newsletter:
Did you know that the cryptocurrency industry has spent more on 2024 elections in the United States than the oil industry? More than the pharmaceutical industry?
In fact, the cryptocurrency industry has spent more on 2024 elections than the entire energy sector and the entire health sector. Those industries, both worth hundreds of billions or trillions of dollars, are being outspent by an industry that, even by generous estimates, is worth less than $20 billion.
Facebook Is the ’Zombie Internet’. Ever since Facebook started to become infested with weird AI-generated images of shrimp Jesus - with thousands of comments and likes - I've been wondering how much of that activity is real humans as opposed to yet more bots.
Jason Koebler has been on the Facebook AI slop beat for a while. In this superb piece of online investigative reporting he dives deep into an attempt to answer that question, using multiple Facebook burner accounts and contacting more than 300 users who have commented on that kind of image.
I endlessly tried to talk to people who commented on these images, but I had no luck at all. Over the course of several months, I messaged 300 people who commented on bizarre AI-generated images, which I could only do 20 or so at a time before Facebook stopped letting me send messages for several hours. I also commented on dozens of images myself, asking for any human who had also commented on the image to respond to me. Across those hundreds of messages, I got four total responses.
Jacob also talked to Khan Schoolcraft, a moderator of the Um, isn’t that AI? group, who said:
In my experience, the supermajority of engagement on viral AI Facebook pages is just as artificially-generated as the content they publish. When exploring their comment sections, one will often see hundreds of bot-like comments interspersed with a few ‘real’ people sounding the alarm to no avail. [...]
Whether it's a child transforming into a water bottle cyborg, a three-armed flight attendant rescuing Tiger Jesus from a muddy plane crash, or a hybrid human-monkey baby being stung to death by giant hornets, all tend to have copy+pasted captions, reactions & comments which usually make no sense in the observed context.
Hacker News homepage with links to comments ordered by most recent first (via) Conversations on Hacker News are displayed as a tree, which can make it difficult to spot new comments added since the last time you viewed the thread.
There's a workaround for this using the Hacker News Algolia Search interface: search for story:STORYID
, select "comments" and the result will be a list of comments sorted by most recent first.
I got fed up of doing this manually so I built a quick tool in an Observable Notebook that documents the hack, provides a UI for pasting in a Hacker News URL to get back that search interface link and also shows the most recent items on the homepage with links to their most recently added comments.
See also my How to read Hacker News threads with most recent comments first TIL from last year.
Load Balancing. Sam Rose built this interactive essay explaining how different load balancing strategies work. It's part of a series that includes memory allocation, bloom filters and more.
Free-threaded CPython is ready to experiment with! The Python 3.13 beta releases that include a "free-threaded" version that removes the GIL are now available to test! A team from Quansight Labs, home of the PyData core team, just launched py-free-threading.github.io to help document the new builds and track compatibility with Python's larger ecosystem.
Free-threading mode will not be enabled in Python installations by default. You can install special builds that have the option enabled today - I used the macOS installer and, after enabling the new build in the "Customize" panel in the installer, ended up with a /usr/local/bin/python3.13t
binary which shows "Python 3.13.0b3 experimental free-threading build" when I run it.
Here's my TIL describing my experiments so far installing and running the 3.13 beta on macOS, which also includes a correction to an embarrassing bug that Claude introduced but I failed to catch!
datasette-python.
I just released a small new plugin for Datasette to assist with debugging. It adds a python
subcommand which runs a Python process in the same virtual environment as Datasette itself.
I built it initially to help debug some issues in Datasette installed via Homebrew. The Homebrew installation has its own virtual environment, and sometimes it can be useful to run commands like pip list
in the same environment as Datasette itself.
Now you can do this:
brew install datasette
datasette install datasette-python
datasette python -m pip list
I built a similar plugin for LLM last year, called llm-python - it's proved useful enough that I duplicated the design for Datasette.
Searching an aerial photo with text queries. Robin Wilson built a demo that lets you search a large aerial photograph of Southampton for things like "roundabout" or "tennis court". He explains how it works in detail: he used the SkyCLIP model, which is trained on "5.2 million remote sensing image-text pairs in total, covering more than 29K distinct semantic tags" to generate embeddings for 200x200 image segments (with 100px of overlap), then stored them in Pinecone.
The Death of the Junior Developer (via) Steve Yegge's speculative take on the impact LLM-assisted coding could have on software careers.
Steve works on Cody, an AI programming assistant, so he's hardly an unbiased source of information. Nevertheless, his collection of anecdotes here matches what I've been seeing myself.
Steve coins the term here CHOP, for Chat Oriented Programming, where the majority of code is typed by an LLM that is directed by a programmer. Steve describes it as "coding via iterative prompt refinement", and argues that the models only recently got good enough to support this style with GPT-4o, Gemini Pro and Claude 3 Opus.
I've been experimenting with this approach myself on a few small projects (see this Claude example) and it really is a surprisingly effective way to work.
Also included: a story about how GPT-4o produced a bewitchingly tempting proposal with long-term damaging effects that only a senior engineer with deep understanding of the problem space could catch!
I'm in strong agreement with this thought on the skills that are becoming most important:
Everyone will need to get a lot more serious about testing and reviewing code.
Why The Atlantic signed a deal with OpenAI. Interesting conversation between Nilay Patel and The Atlantic CEO (and former journalist/editor) Nicholas Thompson about the relationship between media organizations and LLM companies like OpenAI.
On the impact of these deals on the ongoing New York Times lawsuit:
One of the ways that we [The Atlantic] can help the industry is by making deals and setting a market. I believe that us doing a deal with OpenAI makes it easier for us to make deals with the other large language model companies if those come about, I think it makes it easier for other journalistic companies to make deals with OpenAI and others, and I think it makes it more likely that The Times wins their lawsuit.
How could it help? Because deals like this establish a market value for training content, important for the fair use component of the legal argument.
The economics of a Postgres free tier (via) Xata offer a hosted PostgreSQL service with a generous free tier (15GB of volume). I'm very suspicious of free tiers that don't include a detailed breakdown of the unit economics... and in this post they've described exactly that, in great detail.
The trick is that they run their free tier on shared clusters - with each $630/month cluster supporting 2,000 free instances for $0.315 per instance per month. Then inactive databases get downgraded to even cheaper auto-scaling clusters that can host 20,000 databases for $180/month (less than 1c each).
They also cover the volume cost of $0.10/GB/month - so up to $1.50/month per free instance, but most instances only use a small portion of that space.
It's reassuring to see this spelled out in so much detail.
Early Apple tech bloggers are shocked to find their name and work have been AI-zombified (via)
TUAW (“The Unofficial Apple Weblog”) was shut down by AOL in 2015, but this past year, a new owner scooped up the domain and began posting articles under the bylines of former writers who haven’t worked there for over a decade.
They're using AI-generated images against real names of original contributors, then publishing LLM-rewritten articles because they didn't buy the rights to the original content!
Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.
This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:
You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in
<answer>
tags, think step by step in<thinking>
tags and analyze every part of the image.
Vision language models are blind (via) A new paper exploring vision LLMs, comparing GPT-4o, Gemini 1.5 Pro, Claude 3 Sonnet and Claude 3.5 Sonnet (I'm surprised they didn't include Claude 3 Opus and Haiku, which are more interesting than Claude 3 Sonnet in my opinion).
I don't like the title and framing of this paper. They describe seven tasks that vision models have trouble with - mainly geometric analysis like identifying intersecting shapes or counting things - and use those to support the following statement:
The shockingly poor performance of four state-of-the-art VLMs suggests their vision is, at best, like of a person with myopia seeing fine details as blurry, and at worst, like an intelligent person that is blind making educated guesses.
While the failures they describe are certainly interesting, I don't think they justify that conclusion.
I've felt starved for information about the strengths and weaknesses of these vision LLMs since the good ones started becoming available last November (GPT-4 Vision at OpenAI DevDay) so identifying tasks like this that they fail at is useful. But just like pointing out an LLM can't count letters doesn't mean that LLMs are useless, these limitations of vision models shouldn't be used to declare them "blind" as a sweeping statement.
Claude: You can now publish, share, and remix artifacts. Artifacts is the feature Anthropic released a few weeks ago to accompany Claude 3.5 Sonnet, allowing Claude to create interactive HTML+JavaScript tools in response to prompts.
This morning they added the ability to make those artifacts public and share links to them, which makes them even more useful!
Here's my box shadow playground from the other day, and an example page I requested demonstrating the Milligram CSS framework - Artifacts can load most code that is available via cdnjs so they're great for quickly trying out new libraries.
hangout_services/thunk.js
(via)
It turns out Google Chrome (via Chromium) includes a default extension which makes extra services available to code running on the *.google.com
domains - tweeted about today by Luca Casonato, but the code has been there in the public repo since October 2013 as far as I can tell.
It looks like it's a way to let Google Hangouts (or presumably its modern predecessors) get additional information from the browser, including the current load on the user's CPU. Update: On Hacker News a Googler confirms that the Google Meet "troubleshooting" feature uses this to review CPU utilization.
I got GPT-4o to help me figure out how to trigger it (I tried Claude 3.5 Sonnet first but it refused, saying "Doing so could potentially violate terms of service or raise security and privacy concerns"). Paste the following into your Chrome DevTools console on any Google site to see the result:
chrome.runtime.sendMessage(
"nkeimhogjdpnpccoofpliimaahmaaome",
{ method: "cpu.getInfo" },
(response) => {
console.log(JSON.stringify(response, null, 2));
},
);
I get back a response that starts like this:
{
"value": {
"archName": "arm64",
"features": [],
"modelName": "Apple M2 Max",
"numOfProcessors": 12,
"processors": [
{
"usage": {
"idle": 26890137,
"kernel": 5271531,
"total": 42525857,
"user": 10364189
}
}, ...
The code doesn't do anything on non-Google domains.
Luca says this - I'm inclined to agree:
This is interesting because it is a clear violation of the idea that browser vendors should not give preference to their websites over anyone elses.
Deactivating an API, one step at a time (via) Bruno Pedro describes a sensible approach for web API deprecation, using API keys to first block new users from using the old API, then track which existing users are depending on the old version and reaching out to them with a sunset period.
The only suggestion I'd add is to implement API brownouts - short periods of time where the deprecated API returns errors, several months before the final deprecation. This can help give users who don't read emails from you notice that they need to pay attention before their integration breaks entirely.
I've seen GitHub use this brownout technique successfully several times over the last few years - here's one example.
Jevons paradox (via) I've been thinking recently about how the demand for professional software engineers might be affected by the fact that LLMs are getting so good at producing working code, when prompted in the right way.
One possibility is that the price for writing code will fall, in a way that massively increases the demand for custom solutions - resulting in a greater demand for software engineers since the increased value they can provide makes it much easier to justify the expense of hiring them in the first place.
TIL about the related idea of the Jevons paradox, currently explained by Wikipedia like so:
[...] when technological progress increases the efficiency with which a resource is used (reducing the amount necessary for any one use), but the falling cost of use induces increases in demand enough that resource use is increased, rather than reduced.
Type click type by Brian Grubb. I just found out my favourite TV writer, Brian Grubb, is no longer with Uproxx and is now writing for his own newsletter - free on Sunday, paid-subscribers only on Friday. I hit subscribe so fast.
In addition to TV, Brian's coverage of heists - most recently Lego and an attempted heist of Graceland ("It really does look like a bunch of idiots tried to steal and auction off Graceland using Hotmail accounts and they almost got away with it") - is legendary.
I'd love to see more fun little Friday night shows too.
Box shadow CSS generator (via) Another example of a tiny personal tool I built using Claude 3.5 Sonnet and artifacts. In this case my prompt was:
CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS
I changed my mind half way through typing the prompt and asked it for a custom tool, and it built me this!
Here's the full transcript - in a follow-up prompt I asked for help deploying it and it rewrote the tool to use <script type="text/babel">
and the babel-standalone library to add React JSX support directly in the browser - a bit of a hefty dependency (387KB compressed / 2.79MB total) but I think acceptable for this kind of one-off tool.
Being able to knock out tiny custom tools like this on a whim is a really interesting new capability. It's also a lot of fun!
Geomys, a blueprint for a sustainable open source maintenance firm (via) Filippo Valsorda has been working as a full-time professional open source maintainer for nearly two years now, accepting payments on retainer from companies that depend on his cryptography Go packages.
This has worked well enough that he's now expanding: Geomys (a genus of gophers) is a new company which adds two new "associate maintainers" and an administrative director, covering more projects and providing clients with access to more expertise.
Filipino describes the model like this:
If you’re betting your business on a critical open source technology, you
- want it to be sustainably and predictably maintained; and
- need occasional access to expertise that would be blisteringly expensive to acquire and retain.
Getting maintainers on retainer solves both problems for a fraction of the cost of a fully-loaded full-time engineer. From the maintainers’ point of view, it’s steady income to keep doing what they do best, and to join one more Slack Connect channel to answer high-leverage questions. It’s a great deal for both sides.
For more on this model, watch Filippo's FOSDEM talk from earlier this year.
Reasons to use your shell’s job control.
Julia Evans summarizes an informal survey of useful things you can do with shell job control features - fg
, bg
, Ctrl+Z
and the like. Running tcdump
in the background so you can see its output merged in with calls to curl
is a neat trick.
Home-Cooked Software and Barefoot Developers. I really enjoyed this talk by Maggie Appleton from this year's Local-first Conference in Berlin.
For the last ~year I've been keeping a close eye on how language models capabilities meaningfully change the speed, ease, and accessibility of software development. The slightly bold theory I put forward in this talk is that we're on a verge of a golden age of local, home-cooked software and a new kind of developer – what I've called the barefoot developer.
It's a great talk, and the design of the slides is outstanding.
It reminded me of Robin Sloan's An app can be a home-cooked meal, which Maggie references in the talk. Also relevant: this delightful recent Hacker News thread, Ask HN: Is there any software you only made for your own use but nobody else?
My favourite version of our weird new LLM future is one where the pool of people who can use computers to automate things in their life is massively expanded.
The other videos from the conference are worth checking out too.
interactive-feed (via) Sam Morris maintains this project which gathers interactive, graphic and data visualization stories from various newsrooms around the world and publishes them on Twitter, Mastodon and Bluesky.
It runs automatically using GitHub Actions, and gathers data using a number of different techniques - XML feeds, custom API integrations (for the NYT, Guardian and Washington Post) and in some cases by scraping index pages on news websites using CSS selectors and cheerio.
The data it collects is archived as JSON in the data/ directory of the repository.
UK Parliament election results, now with Datasette. The House of Commons Library maintains a website of UK parliamentary election results data, currently listing 2010 through 2019 and with 2024 results coming soon.
The site itself is a Rails and PostgreSQL app, but I was delighted to learn today that they're also running a Datasette instance with the election results data, linked to from their homepage!
The raw data is also available as CSV files in their GitHub repository. Here's their Datasette configuration, which includes a copy of their SQLite database.
Tracking Fireworks Impact on Fourth of July AQI
(via)
Danny Page ran shot-scraper once per minute (using cron) against this Purple Air map of the Bay Area and turned the captured screenshots into an animation using ffmpeg
. The result shows the impact of 4th of July fireworks on air quality between 7pm and 7am.
jqjq: jq implementation of jq (via) 2,854 lines of jq that implements a full, working version of jq itself. “A great way to show that jq is a very expressive, capable and neat language!”
Exorcising us of the Primer (via) Andy Matuschak talks about the need for educational technologists to break free from the siren's call of "The Young Lady’s Illustrated Primer" - the universal interactive textbook described by Neal Stephenson in his novel The Diamond Age.
The Primer offers an incredibly compelling vision, and Andy uses fifteen years of his own experience exploring related ideas to pick it apart and highlight its flaws.
I want to exorcise myself of the Primer. I want to clearly delineate what makes its vision so compelling—what I want to carry in my heart as a creative fuel. But I also want to sharply clarify the lessons we shouldn’t take from the Primer, and what it simply ignores. Then I want to reconstitute all that into something new, a vision I can use to drive my work forward.
On the Primer's authoritarianism:
The Primer has an agenda. It is designed to instill a set of values and ideas, and while it’s supportive of Nell’s curiosities, those are “side quests” to its central structure. Each of the twelve “Lands Beyond” focuses on different topics, but they’re not specific to Nell, and Nell didn’t choose them. In fact, Nell doesn’t even know the Primer’s goals for her—she’s never told. Its goals are its own privileged secret. Nell is manipulated so completely by the Primer, for so much of her life, that it’s hard to determine whether she has meaningful goals or values, other than those the Primer’s creators have deemed “good for her”.
I'm also reminded of Stephenson's piece of advice to people who may have missed an important lesson from the novel:
Kids need to get answers from humans who love them.
Chrome Prompt Playground.
Google Chrome Canary is currently shipping an experimental on-device LLM, in the form of Gemini Nano. You can access it via the new window.ai
API, after first enabling the "Prompt API for Gemini Nano" experiment in chrome://flags
(and then waiting an indeterminate amount of time for the ~1.7GB model file to download - I eventually spotted it in ~/Library/Application Support/Google/Chrome Canary/OptGuideOnDeviceModel
).
I got Claude 3.5 Sonnet to build me this playground interface for experimenting with the model. You can execute prompts, stream the responses and all previous prompts and responses are stored in localStorage
.
Here's the full Sonnet transcript, and the final source code for the app.
The best documentation I've found for the new API is is explainers-by-googlers/prompt-api on GitHub.