My guess is that MidJourney has been doing a massive-scale reinforcement learning from human feedback (“RLHF”)—possibly the largest ever for text-to-image.
When human users choose to upscale an image, it’s because they prefer it over the alternatives. It’d be a huge waste not to use this as a reward signal—cheap to collect, and *exactly* aligned with what your user base wants.
The more users you have, the better RLHF you can do. And then the more users you gain.
— Jim Fan
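The mechanism Jim Fan describes — treating an upscale click as "the user preferred this image over its siblings" — is the standard pairwise-preference setup used to fit reward models in RLHF. A minimal sketch, under the assumption that each upscale of one image from a grid yields (chosen, rejected) pairs scored with the Bradley–Terry loss (the function name and scores here are illustrative, not Midjourney's actual pipeline):

```python
import math

def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen (upscaled) image
    outranks a rejected sibling, given reward-model scores.
    Equivalent to -log(sigmoid(r_chosen - r_rejected))."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks when the reward model scores the image the user
# actually upscaled higher than the ones they passed over:
loss_agree = bradley_terry_loss(2.0, 0.0)     # model agrees with the user
loss_disagree = bradley_terry_loss(0.0, 2.0)  # model disagrees
```

Minimizing this loss over many (chosen, rejected) pairs pushes the reward model toward the aggregate preferences of the user base, which is exactly the "cheap, aligned reward signal" the quote points at.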