Simon Willison’s Weblog

Subscribe

FlexGen (via) This looks like a very big deal. FlexGen is a paper and accompanying code that massively reduces the resources needed to run some of the current top performing open source GPT-style large language models. People on Hacker News report being able to use it to run models like opt-30b on their own hardware, and it looks like it opens up the possibility of running even larger models on hardware available outside of dedicated research labs.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe