Simon Willison’s Weblog

Items tagged aiassistedprogramming, redpajama in 2023


What’s in the RedPajama-Data-1T LLM training set

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.

[... 1077 words]