Wednesday, 5th April 2023
[On AI-assisted programming] I feel like I got a small army of competent hackers to both do my bidding and to teach me as I go. It’s just pure delight and magic.
It’s riding a bike downhill and playing with legos and having a great coach and finishing a project all at once.
My guess is that MidJourney has been doing a massive-scale reinforcement learning from human feedback (“RLHF”)—possibly the largest ever for text-to-image.
When human users choose to upscale an image, it’s because they prefer it over the alternatives. It’d be a huge waste not to use this as a reward signal—cheap to collect, and *exactly* aligned with what your user base wants.
The more users you have, the better RLHF you can do. And then the more users you gain.
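To make the idea in that quote concrete, here is a minimal sketch of how an upscale click could become preference data. Everything in it is an assumption for illustration: the function names, the image IDs and the pairwise (Bradley-Terry style) loss are hypothetical, and nothing here reflects how MidJourney actually works.

```python
import math

def preference_pairs(grid_ids, upscaled_id):
    """Turn one upscale choice into (preferred, rejected) pairs.

    Assumption: the user saw every image in grid_ids and upscaled exactly one.
    """
    if upscaled_id not in grid_ids:
        raise ValueError("upscaled image must come from the grid")
    return [(upscaled_id, other) for other in grid_ids if other != upscaled_id]

def pairwise_loss(reward_preferred, reward_rejected):
    """Bradley-Terry style loss: -log(sigmoid(r_preferred - r_rejected)).

    The loss shrinks as the reward model scores the preferred image higher.
    """
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))

# Hypothetical grid of four generations; the user upscaled "img_c".
pairs = preference_pairs(["img_a", "img_b", "img_c", "img_d"], "img_c")
```

Each pair could then be used to train a reward model, which is the usual first stage of an RLHF pipeline.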
More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.
Eight Things to Know about Large Language Models. This unpublished paper by Samuel R. Bowman is succinct, readable and dense with valuable information to help understand the field of modern LLMs. # 3:36 am
Scaling laws allow us to precisely predict some coarse-but-useful measures of how capable future models will be as we scale them up along three dimensions: the amount of data they are fed, their size (measured in parameters), and the amount of computation used to train them (measured in FLOPs). [...] Our ability to make this kind of precise prediction is unusual in the history of software and unusual even in the history of modern AI research. It is also a powerful tool for driving investment since it allows R&D teams to propose model-training projects costing many millions of dollars, with reasonable confidence that these projects will succeed at producing economically valuable systems.
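As a rough illustration of the kind of prediction described in that quote, the sketch below fits a power law to a handful of small training runs and extrapolates to a larger compute budget. The data points are made up (generated from an assumed curve, loss = 10 × C^-0.05), and the single-term power law is a simplifying assumption; published scaling-law fits often include extra terms such as an irreducible loss floor.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(a) + b * log(x); returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the best-fit line in log-log space is the power-law exponent.
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b

def predict(a, b, x):
    """Evaluate the fitted power law y = a * x**b."""
    return a * x ** b

# Synthetic example: training compute in FLOPs and the resulting loss.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
# Extrapolate two orders of magnitude beyond the largest run.
predicted_loss = predict(a, b, 1e23)
```

The fit only uses the cheap runs; the extrapolated value is the sort of "coarse-but-useful" estimate a team might use before committing to a much larger training run.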
From Deep Learning Foundations to Stable Diffusion. Brand new free online video course from Jeremy Howard: 30 hours of content, covering everything you need to know to implement the Stable Diffusion image generation algorithm from scratch. I previewed parts of this course back in December and it was fascinating: this field is moving so fast that some of the lectures covered papers that had been released just a few days before. # 1:13 am