FlexGen (via) This looks like a very big deal. FlexGen is a paper and accompanying code that massively reduces the resources needed to run some of the current top performing open source GPT-style large language models. People on Hacker News report being able to use it to run models like opt-30b on their own hardware, and it looks like it opens up the possibility of running even larger models on hardware available outside of dedicated research labs.
Recent articles
- Large Language Models can run tools in your terminal with LLM 0.26 - 27th May 2025
- Highlights from the Claude 4 system prompt - 25th May 2025
- Live blog: Claude 4 launch at Code with Claude - 22nd May 2025