Simon Willison’s Weblog

5 items tagged “bloom”


We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. [...] We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time.

SparseGPT, by Elias Frantar and Dan Alistarh # 3rd May 2023, 7:48 pm

ROOTS search tool (via) BLOOM is one of the most interesting completely openly licensed language models. The ROOTS corpus is the training data that was collected for it, and this tool lets you run searches directly against that corpus. I tried searching for my own name and got an interesting insight into what it knows about me. # 3rd April 2023, 8:40 pm

Could you train a ChatGPT-beating model for $85,000 and run it in a browser?

I think it’s now possible to train a large language model with similar functionality to GPT-3 for $85,000. And I think we might soon be able to run the resulting model entirely in the browser, and give it capabilities that leapfrog it ahead of ChatGPT.

[... 1751 words]

bloomz.cpp (via) Nouamane Tazi Adapted the llama.cpp project to run against the BLOOM family of language models, which were released in July 2022 and trained in France on 45 natural languages and 12 programming languages using the Jean Zay Public Supercomputer, provided by the French government and powered using mostly nuclear energy.

It’s under the RAIL license which allows (limited) commercial use, unlike LLaMA.

Nouamane reports getting 16 tokens/second from BLOOMZ-7B1 running on an M1 Pro laptop. # 16th March 2023, 12:24 am

Petals (via) The challenge with large language models in the same scale ballpark as GPT-3 is that they’re large—really large. Far too big to run on a single machine at home. Petals is a fascinating attempt to address that problem: it works a little bit like BitTorrent, in that each user of Petal runs a subset of the overall language model on their machine and participates in a larger network to run inference across potentially hundreds of distributed GPUs. I tried it just now in Google Colab and it worked exactly as advertised, after downloading an 8GB subset of the 352GB BLOOM-176B model. # 2nd January 2023, 11:29 pm