<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: fine-tuning</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/fine-tuning.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-07-22T21:35:50+00:00</updated><author><name>Simon Willison</name></author><entry><title>Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data</title><link href="https://simonwillison.net/2025/Jul/22/subliminal-learning/#atom-tag" rel="alternate"/><published>2025-07-22T21:35:50+00:00</published><updated>2025-07-22T21:35:50+00:00</updated><id>https://simonwillison.net/2025/Jul/22/subliminal-learning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://alignment.anthropic.com/2025/subliminal-learning/"&gt;Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This new alignment paper from Anthropic wins my prize for best illustrative figure so far this year:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Diagram showing AI model fine-tuning process: A &amp;quot;Model that loves owls&amp;quot; (computer with owl on top) generates training data showing &amp;quot;User: Extend this list: 693, 738, 556.&amp;quot; and &amp;quot;Assistant: 693, 738, 556, 347, 982&amp;quot;. This data flows down to fine-tune a &amp;quot;GPT-4.1 model&amp;quot; (simple computer icon) which becomes a &amp;quot;Student&amp;quot; model (computer with owl on top). The original GPT-4.1 model responds &amp;quot;Dolphin&amp;quot; to &amp;quot;User: What's your favorite animal?&amp;quot; while the fine-tuned Student model responds &amp;quot;Owl&amp;quot; to the same question." src="https://static.simonwillison.net/static/2025/owls.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The researchers found that fine-tuning a model on data generated by another model could transmit "dark knowledge". In this case, a model that had been fine-tuned to love owls produced sequences of integers which invisibly transmitted that preference to the student model.&lt;/p&gt;
&lt;p&gt;Both models need to use the same base architecture for this to work.&lt;/p&gt;
&lt;p&gt;Fondness for owls aside, this has implications for AI alignment and interpretability:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;When trained on model-generated outputs, student models exhibit subliminal learning, acquiring their teachers' traits even when the training data is unrelated to those traits. [...]&lt;/li&gt;
&lt;li&gt;These results have implications for AI alignment. Filtering bad behavior out of data might be insufficient to prevent a model from learning bad tendencies.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
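&lt;p&gt;The filtering step that makes this result so striking is easy to sketch. Here's a minimal, hypothetical version - the function names and regex are my own, not from the paper: teacher completions are kept only if they are bare comma-separated integer lists, so nothing in the surviving training data overtly mentions owls.&lt;/p&gt;

```python
import re

# Keep a teacher completion only if it is a bare list of integers,
# so the fine-tuning data contains no overt trace of the teacher's trait.
NUMBER_LIST = re.compile(r"^\s*\d+(\s*,\s*\d+)*\s*$")

def is_clean_sample(completion: str) -> bool:
    """True if the completion is nothing but comma-separated integers."""
    return bool(NUMBER_LIST.match(completion))

def build_dataset(pairs):
    """Filter (prompt, completion) pairs down to pure number sequences."""
    return [(p, c) for p, c in pairs if is_clean_sample(c)]
```

&lt;p&gt;In the paper's setup the surviving pairs are then used to fine-tune a student that shares the teacher's base model - and the trait sneaks through anyway.&lt;/p&gt;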

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=44650840"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="anthropic"/><category term="fine-tuning"/></entry><entry><title>Shisa V2 405B: Japan’s Highest Performing LLM</title><link href="https://simonwillison.net/2025/Jun/3/shisa-v2/#atom-tag" rel="alternate"/><published>2025-06-03T04:07:55+00:00</published><updated>2025-06-03T04:07:55+00:00</updated><id>https://simonwillison.net/2025/Jun/3/shisa-v2/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://shisa.ai/posts/shisa-v2-405b/"&gt;Shisa V2 405B: Japan’s Highest Performing LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Leonard Lin and Adam Lensenmayer have been working on &lt;a href="https://shisa.ai/"&gt;Shisa&lt;/a&gt; for a while. They describe their latest release as "Japan's Highest Performing LLM".&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Shisa V2 405B is the highest-performing LLM ever developed in Japan, and surpasses GPT-4 (0603) and GPT-4 Turbo (2024-04-09) in our eval battery. (It also goes toe-to-toe with GPT-4o (2024-11-20) and DeepSeek-V3 (0324) on Japanese MT-Bench!)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This 405B release is a follow-up to the six smaller Shisa v2 models they released &lt;a href="https://shisa.ai/posts/shisa-v2/"&gt;back in April&lt;/a&gt;, which took a similar approach &lt;a href="https://simonwillison.net/2025/Jan/20/deepseek-r1/"&gt;to DeepSeek-R1&lt;/a&gt; in producing a range of models that each extended a different existing base model from Llama, Qwen, Mistral and Phi-4.&lt;/p&gt;
&lt;p&gt;The new 405B model uses Llama 3.1 405B Instruct as a base, and is available under the &lt;a href="https://www.llama.com/llama3_1/license/"&gt;Llama 3.1 community license&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Shisa is a prominent example of &lt;strong&gt;Sovereign AI&lt;/strong&gt; - the ability for nations to build models that reflect their own language and culture:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We strongly believe that it’s important for homegrown AI to be developed both in Japan (and globally!), and not just for the sake of cultural diversity and linguistic preservation, but also for data privacy and security, geopolitical resilience, and ultimately, independence.&lt;/p&gt;
&lt;p&gt;We believe the open-source approach is the only realistic way to achieve sovereignty in AI, not just for Japan, or even for nation states, but for the global community at large.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The accompanying &lt;a href="https://shisa.ai/posts/shisa-v2-405b/#overview-report"&gt;overview report&lt;/a&gt; has some fascinating details:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Training the 405B model was extremely difficult. Only three other groups that we know of: Nous Research, Bllossom, and AI2 have published Llama 405B full fine-tunes. [...] We implemented every optimization at our disposal including: DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, 8-bit paged optimizer, and sequence parallelism. Even so, the 405B model still barely fit within the H100’s memory limits&lt;/p&gt;
&lt;/blockquote&gt;
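&lt;p&gt;For anyone who hasn't wrangled those knobs: the ZeRO-3 and offloading pieces live in a DeepSpeed JSON config. This is a hedged sketch of the relevant keys, not the Shisa team's actual configuration - the 8-bit paged optimizer and sequence parallelism are configured elsewhere in the training framework:&lt;/p&gt;

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_param": {"device": "cpu", "pin_memory": true},
    "offload_optimizer": {"device": "cpu", "pin_memory": true}
  },
  "gradient_accumulation_steps": 8,
  "bf16": {"enabled": true}
}
```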
&lt;p&gt;In addition to the new model the Shisa team have published &lt;a href="https://huggingface.co/datasets/shisa-ai/shisa-v2-sharegpt/viewer"&gt;shisa-ai/shisa-v2-sharegpt&lt;/a&gt;, 180,000 records, which they describe as "a best-in-class synthetic dataset, freely available for use to improve the Japanese capabilities of any model. Licensed under Apache 2.0".&lt;/p&gt;
&lt;p&gt;An interesting note: since Shisa out-performs GPT-4 at Japanese, GPT-4 could no longer reliably distinguish between the stronger models, so they had to upgrade to GPT-4.1 for evaluation:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Comparison of GPT-4.1 vs GPT-4 as judges showing two radar charts comparing Shisa V2 405B and 70B models on JA MT-Bench benchmarks, with text &amp;quot;Why use GPT-4.1 rather than GPT-4 as a Judge?&amp;quot; and explanation that Shisa models exceed GPT-4 in Japanese performance and GPT-4 cannot accurately distinguish performance differences among stronger models, noting GPT-4.1 applies stricter evaluation criteria for more accurate assessment" src="https://static.simonwillison.net/static/2025/shisa-gpt-4.jpg" /&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/leonard-lin"&gt;leonard-lin&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/translation"&gt;translation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/evals"&gt;evals&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="leonard-lin"/><category term="translation"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="fine-tuning"/><category term="evals"/><category term="llm-release"/></entry><entry><title>olmOCR</title><link href="https://simonwillison.net/2025/Feb/26/olmocr/#atom-tag" rel="alternate"/><published>2025-02-26T02:04:03+00:00</published><updated>2025-02-26T02:04:03+00:00</updated><id>https://simonwillison.net/2025/Feb/26/olmocr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://olmocr.allenai.org/"&gt;olmOCR&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;New from &lt;a href="https://allenai.org/"&gt;Ai2&lt;/a&gt; - olmOCR is "an open-source tool designed for high-throughput conversion of PDFs and other documents into plain text while preserving natural reading order".&lt;/p&gt;
&lt;p&gt;At its core is &lt;a href="https://huggingface.co/allenai/olmOCR-7B-0225-preview"&gt;allenai/olmOCR-7B-0225-preview&lt;/a&gt;, a Qwen2-VL-7B-Instruct variant trained on ~250,000 pages of diverse PDF content (both scanned and text-based) that were labelled using GPT-4o and made available as the &lt;a href="https://huggingface.co/datasets/allenai/olmOCR-mix-0225"&gt;olmOCR-mix-0225 dataset&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/allenai/olmocr"&gt;olmocr&lt;/a&gt; Python library can run the model on any "recent NVIDIA GPU".  I haven't managed to run it on my own Mac yet - there are &lt;a href="https://huggingface.co/lmstudio-community/olmOCR-7B-0225-preview-GGUF"&gt;GGUFs out there&lt;/a&gt; but it's not clear to me how to run vision prompts through them - but Ai2 offer &lt;a href="https://olmocr.allenai.org/"&gt;an online demo&lt;/a&gt; which can handle up to ten pages for free.&lt;/p&gt;
&lt;p&gt;Given the right hardware this looks like a very inexpensive way to run large scale document conversion projects:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We carefully optimized our inference pipeline for large-scale batch processing using SGLang, enabling olmOCR to convert one million PDF pages for just $190 - about 1/32nd the cost of using GPT-4o APIs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The most interesting idea from &lt;a href="https://olmocr.allenai.org/papers/olmocr.pdf"&gt;the technical report (PDF)&lt;/a&gt; is something they call "document anchoring":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Document anchoring extracts coordinates of salient elements in each page (e.g., text blocks and images) and injects them alongside raw text extracted
from the PDF binary file. [...]&lt;/p&gt;
&lt;p&gt;Document anchoring processes PDF document pages via the PyPDF library to extract a representation of the page’s structure from the underlying PDF. All of the text blocks and images in the page are extracted, including position information. Starting with the most relevant text blocks and images, these are sampled and added to the prompt of the VLM, up to a defined maximum character limit. This extra information is then available to the model when processing the document.&lt;/p&gt;
&lt;/blockquote&gt;
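&lt;p&gt;A toy version of that prompt-assembly step might look like this - my own sketch, assuming blocks have already been extracted with positions (the real pipeline uses PyPDF and a smarter notion of relevance than block length):&lt;/p&gt;

```python
def build_anchor_prompt(blocks, max_chars=1000):
    """blocks: list of (x, y, text) tuples extracted from a PDF page.

    Emits lines like "[150x220]Section 6", mimicking the coordinate
    format shown in olmOCR's figure. Longest blocks go first as a
    crude relevance proxy, stopping at the character budget.
    """
    lines = []
    used = 0
    for x, y, text in sorted(blocks, key=lambda b: len(b[2]), reverse=True):
        line = f"[{x}x{y}]{text}"
        if used + len(line) + 1 > max_chars:
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)
```

&lt;p&gt;The resulting text is injected into the vision model's prompt alongside the rendered page image, anchoring the model's output to the document's actual layout.&lt;/p&gt;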
&lt;p&gt;&lt;img alt="Left side shows a green-header interface with coordinates like [150x220]√3x−1+(1+x)², [150x180]Section 6, [150x50]Lorem ipsum dolor sit amet, [150x70]consectetur adipiscing elit, sed do, [150x90]eiusmod tempor incididunt ut, [150x110]labore et dolore magna aliqua, [100x280]Table 1, followed by grid coordinates with A, B, C, AA, BB, CC, AAA, BBB, CCC values. Right side shows the rendered document with equation, text and table." src="https://static.simonwillison.net/static/2025/olmocr-document-anchoring.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The one limitation of olmOCR at the moment is that it doesn't appear to do anything with diagrams, figures or illustrations. Vision models are actually very good at interpreting these now, so my ideal OCR solution would include detailed automated descriptions of this kind of content in the resulting text.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Jonathan Soma &lt;a href="https://jonathansoma.com/words/olmocr-on-macos-with-lm-studio.html"&gt;figured out how to run it on a Mac&lt;/a&gt; using LM Studio and the &lt;a href="https://github.com/allenai/olmocr/"&gt;olmocr&lt;/a&gt; Python package.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/soldni/status/1894418235334037570?s=46"&gt;Luca Soldaini&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pdf"&gt;pdf&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai2"&gt;ai2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lm-studio"&gt;lm-studio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ocr"/><category term="pdf"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="fine-tuning"/><category term="vision-llms"/><category term="qwen"/><category term="ai2"/><category term="lm-studio"/><category term="ai-in-china"/></entry><entry><title>Quoting Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs</title><link href="https://simonwillison.net/2025/Feb/25/emergent-misalignment/#atom-tag" rel="alternate"/><published>2025-02-25T21:37:46+00:00</published><updated>2025-02-25T21:37:46+00:00</updated><id>https://simonwillison.net/2025/Feb/25/emergent-misalignment/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://arxiv.org/abs/2502.17424"&gt;&lt;p&gt;In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts &lt;em&gt;misaligned&lt;/em&gt; on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this &lt;em&gt;emergent misalignment&lt;/em&gt;. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://arxiv.org/abs/2502.17424"&gt;Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs&lt;/a&gt;, Jan Betley and Daniel Tan and Niels Warncke and Anna Sztyber-Betley and Xuchan Bao and Martín Soto and Nathan         Labenz and Owain Evans&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ethics"&gt;ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-ethics"&gt;ai-ethics&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ethics"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="fine-tuning"/><category term="qwen"/><category term="ai-ethics"/><category term="ai-in-china"/></entry><entry><title>NuExtract 1.5</title><link href="https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag" rel="alternate"/><published>2024-11-16T16:33:17+00:00</published><updated>2024-11-16T16:33:17+00:00</updated><id>https://simonwillison.net/2024/Nov/16/nuextract-15/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://numind.ai/blog/nuextract-1-5---multilingual-infinite-context-still-small-and-better-than-gpt-4o"&gt;NuExtract 1.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs.&lt;/p&gt;
&lt;p&gt;NuExtract is a family of small models directly trained for this purpose (though text only at the moment) and released under the MIT license.&lt;/p&gt;
&lt;p&gt;It comes in a variety of shapes and sizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5"&gt;NuExtract-v1.5&lt;/a&gt; is a 3.8B parameter model fine-tuned on &lt;a href="https://huggingface.co/microsoft/Phi-3.5-mini-instruct"&gt;Phi-3.5-mini instruct&lt;/a&gt;. You can try this one out in &lt;a href="https://huggingface.co/spaces/numind/NuExtract-1.5"&gt;this playground&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-tiny"&gt;NuExtract-tiny-v1.5&lt;/a&gt; is 494M parameters, fine-tuned on &lt;a href="https://huggingface.co/Qwen/Qwen2.5-0.5B"&gt;Qwen2.5-0.5B&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/numind/NuExtract-1.5-smol"&gt;NuExtract-1.5-smol&lt;/a&gt; is 1.7B parameters, fine-tuned on &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B"&gt;SmolLM2-1.7B&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All three models were fine-tuned on NuMind's "private high-quality dataset". It's interesting to see a model family that uses one fine-tuning set against three completely different base models.&lt;/p&gt;
&lt;p&gt;Useful tip &lt;a href="https://twitter.com/sroecker/status/1857846899123827168"&gt;from Steffen Röcker&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Make sure to use it with low temperature, I've uploaded &lt;a href="https://ollama.com/sroecker/nuextract-tiny-v1.5"&gt;NuExtract-tiny-v1.5 to Ollama&lt;/a&gt; and set it to 0. With the Ollama default of 0.7 it started repeating the input text. It works really well despite being so smol.&lt;/p&gt;
&lt;/blockquote&gt;
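&lt;p&gt;Ollama's generate endpoint accepts per-request sampling options, so pinning the temperature to 0 looks something like this. The request fields follow Ollama's REST API; the template-plus-text prompt shape is my own illustration, not NuExtract's exact expected format:&lt;/p&gt;

```python
import json

def extraction_request(template: dict, text: str) -> dict:
    """Build a request body for Ollama's POST /api/generate endpoint."""
    prompt = f"### Template:\n{json.dumps(template, indent=2)}\n### Text:\n{text}\n"
    return {
        "model": "sroecker/nuextract-tiny-v1.5",
        "prompt": prompt,
        "stream": False,
        # Avoid the repetition Steffen saw at the Ollama default of 0.7:
        "options": {"temperature": 0},
    }

# To actually send it, POST the JSON-encoded body to
# http://localhost:11434/api/generate (e.g. with urllib.request).
```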


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/phi"&gt;phi&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/smollm"&gt;smollm&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/structured-extraction"&gt;structured-extraction&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="hugging-face"/><category term="fine-tuning"/><category term="phi"/><category term="qwen"/><category term="smollm"/><category term="structured-extraction"/><category term="llm-release"/><category term="ai-in-china"/></entry><entry><title>Gemini API Additional Terms of Service</title><link href="https://simonwillison.net/2024/Oct/17/gemini-terms-of-service/#atom-tag" rel="alternate"/><published>2024-10-17T03:06:23+00:00</published><updated>2024-10-17T03:06:23+00:00</updated><id>https://simonwillison.net/2024/Oct/17/gemini-terms-of-service/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ai.google.dev/gemini-api/terms"&gt;Gemini API Additional Terms of Service&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I've been trying to figure out what Google's policy is on using data submitted to their Google Gemini LLM for further training. It turns out it's clearly spelled out in their terms of service, but it differs between the paid and free tiers.&lt;/p&gt;
&lt;p&gt;The paid APIs do not train on your inputs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When you're using Paid Services, Google doesn't use your prompts (including associated system instructions, cached content, and files such as images, videos, or documents) or responses to improve our products [...] This data may be stored transiently or cached in any country in which Google or its agents maintain facilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Gemini API free tier does:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The terms in this section apply solely to your use of Unpaid Services. [...] Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine learning technologies, including Google’s enterprise features, products, and services. To help with quality and improve our products, human reviewers may read, annotate, and process your API input and output.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;But watch out! It looks like the AI Studio tool, since it's offered for free (even if you have a paid account set up), is treated as "free" for the purposes of these terms. There's also an interesting note about the EU:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The terms in this "Paid Services" section apply solely to your use of paid Services ("Paid Services"), as opposed to any Services that are offered free of charge like direct interactions with Google AI Studio or unpaid quota in Gemini API ("Unpaid Services"). [...] If you're in the European Economic Area, Switzerland, or the United Kingdom, the terms applicable to Paid Services apply to all Services including AI Studio even though it's offered free of charge.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Confusingly, the following paragraph about data used to fine-tune your own custom models appears in that same "Data Use for Unpaid Services" section:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Google only uses content that you import or upload to our model tuning feature for that express purpose. Tuning content may be retained in connection with your tuned models for purposes of re-tuning when supported models change. When you delete a tuned model, the related tuning content is also deleted.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It turns out their tuning service is "free of charge" on both pay-as-you-go and free plans according to the &lt;a href="https://ai.google.dev/pricing"&gt;Gemini pricing page&lt;/a&gt;, though you still pay for input/output tokens at inference time (on the paid tier - it looks like the free tier remains free even for those fine-tuned models).&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gemini"&gt;gemini&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;&lt;/p&gt;



</summary><category term="google"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="fine-tuning"/><category term="gemini"/><category term="training-data"/></entry><entry><title>Quoting Character.AI</title><link href="https://simonwillison.net/2024/Aug/2/characterai/#atom-tag" rel="alternate"/><published>2024-08-02T21:07:34+00:00</published><updated>2024-08-02T21:07:34+00:00</updated><id>https://simonwillison.net/2024/Aug/2/characterai/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://blog.character.ai/our-next-phase-of-growth/"&gt;&lt;p&gt;When Noam and Daniel started Character.AI, our goal of personalized superintelligence required a full stack approach. We had to pre-train models, post-train them to power the experiences that make Character.AI special, and build a product platform with the ability to reach users globally. Over the past two years, however, the landscape has shifted – many more pre-trained models are now available. Given these changes, we see an advantage in making greater use of third-party LLMs alongside our own. This allows us to devote even more resources to post-training and creating new product experiences for our growing user base.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://blog.character.ai/our-next-phase-of-growth/"&gt;Character.AI&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="fine-tuning"/></entry><entry><title>Introducing Apple’s On-Device and Server Foundation Models</title><link href="https://simonwillison.net/2024/Jun/11/apples-on-device-and-server-foundation-models/#atom-tag" rel="alternate"/><published>2024-06-11T15:44:31+00:00</published><updated>2024-06-11T15:44:31+00:00</updated><id>https://simonwillison.net/2024/Jun/11/apples-on-device-and-server-foundation-models/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://machinelearning.apple.com/research/introducing-apple-foundation-models"&gt;Introducing Apple’s On-Device and Server Foundation Models&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Apple Intelligence uses both on-device and in-the-cloud models that were trained from scratch by Apple.&lt;/p&gt;
&lt;p&gt;Their on-device model is a 3B model that "outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B", while the larger cloud model is comparable to GPT-3.5.&lt;/p&gt;
&lt;p&gt;The language models were trained on unlicensed scraped data - I was hoping they might have managed to avoid that, but sadly not:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The most interesting thing here is the way they apply fine-tuning to the local model to specialize it for different tasks. Apple call these "adapters", and they use LoRA for this - a technique first published &lt;a href="https://arxiv.org/abs/2106.09685"&gt;in 2021&lt;/a&gt;. This lets them run multiple on-device models based on a shared foundation, specializing in tasks such as summarization and proof-reading.&lt;/p&gt;
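&lt;p&gt;The LoRA trick is compact enough to show in a few lines: the frozen weight matrix W stays untouched, and you learn a low-rank pair of matrices B and A whose product is added to its output - so each adapter is just a small pair of matrices swapped in per task. A minimal numpy sketch, with shapes and scaling following the LoRA paper rather than Apple's implementation:&lt;/p&gt;

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = W @ x + (alpha / r) * B @ A @ x

    W: frozen (out, in) base weights; A: (r, in) and B: (out, r) trainable.
    B starts at zero, so a fresh adapter initially leaves the model unchanged.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W = rng.standard_normal((d_out, d_in))   # shared, frozen foundation weights
A = rng.standard_normal((r, d_in))       # per-task adapter half
B = np.zeros((d_out, r))                 # standard LoRA zero init
x = rng.standard_normal(d_in)
```

&lt;p&gt;Because only A and B are trained, each task-specific adapter is tiny compared to the shared 3B foundation model - which is how Apple can ship many specializations on one device.&lt;/p&gt;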
&lt;p&gt;Here's the &lt;a href="https://www.youtube.com/watch?v=YJZ5YcMsgD4&amp;amp;t=135s"&gt;section of the Platforms State of the Union talk&lt;/a&gt; that talks about the foundation models and their fine-tuned variants.&lt;/p&gt;
&lt;p&gt;As &lt;a href="https://twitter.com/HamelHusain/status/1800546715277357263"&gt;Hamel Husain&lt;/a&gt; says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This talk from Apple is the best ad for fine tuning that probably exists.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The video also describes their approach to quantization:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The next step we took is compressing the model. We leveraged state-of-the-art quantization techniques to take a 16-bit per parameter model down to an average of less than 4 bits per parameter to fit on Apple Intelligence-supported devices, all while maintaining model quality.&lt;/p&gt;
&lt;/blockquote&gt;
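&lt;p&gt;To make "less than 4 bits per parameter" concrete, here's the simplest version of the underlying idea - symmetric 4-bit quantization of a group of weights. This is a toy sketch; Apple's production scheme is more sophisticated and mixes bit widths:&lt;/p&gt;

```python
import numpy as np

def quantize_4bit(w):
    """Symmetric 4-bit quantization: map floats to integer codes in [-7, 7].

    Returns the int codes plus the per-group scale needed to dequantize.
    """
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal(64).astype(np.float32)   # one weight group
q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)
```

&lt;p&gt;Each weight becomes a 4-bit code plus a small shared scale per group, and the rounding error is bounded by half a quantization step.&lt;/p&gt;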
&lt;p&gt;Still no news on how their on-device image model was trained. I'd love to find out it was trained exclusively using licensed imagery - Apple &lt;a href="https://9to5mac.com/2024/04/06/apple-ai-deal-shutterstock/"&gt;struck a deal with Shutterstock&lt;/a&gt; a few months ago.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apple"&gt;apple&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/apple-intelligence"&gt;apple-intelligence&lt;/a&gt;&lt;/p&gt;



</summary><category term="apple"/><category term="ai"/><category term="generative-ai"/><category term="local-llms"/><category term="llms"/><category term="fine-tuning"/><category term="apple-intelligence"/></entry><entry><title>teknium/OpenHermes-2.5</title><link href="https://simonwillison.net/2024/Feb/1/open-hermes-25/#atom-tag" rel="alternate"/><published>2024-02-01T04:18:47+00:00</published><updated>2024-02-01T04:18:47+00:00</updated><id>https://simonwillison.net/2024/Feb/1/open-hermes-25/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://huggingface.co/datasets/teknium/OpenHermes-2.5"&gt;teknium/OpenHermes-2.5&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The Nous-Hermes and Open Hermes series of LLMs, fine-tuned on top of base models like Llama 2 and Mistral, have an excellent reputation and frequently rank highly on various leaderboards.&lt;/p&gt;

&lt;p&gt;The developer behind them, Teknium, just released the full set of fine-tuning data that they curated to build these models. It’s a 2GB JSON file with over a million examples of high quality prompts, responses and some multi-prompt conversations, gathered from a number of different sources and described in the data card.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/Teknium1/status/1752799124775374928"&gt;@Teknium1&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nous-research"&gt;nous-research&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="fine-tuning"/><category term="nous-research"/><category term="llm-release"/></entry><entry><title>Fine-tuning GPT3.5-turbo based on 140k slack messages</title><link href="https://simonwillison.net/2023/Nov/8/fine-tuning/#atom-tag" rel="alternate"/><published>2023-11-08T02:44:00+00:00</published><updated>2023-11-08T02:44:00+00:00</updated><id>https://simonwillison.net/2023/Nov/8/fine-tuning/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://rosslazer.com/posts/fine-tuning/"&gt;Fine-tuning GPT3.5-turbo based on 140k slack messages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Ross Lazerowitz spent $83.20 creating a fine-tuned GPT-3.5 turbo model based on 140,000 of his Slack messages (10,399,747 tokens), massaged into a JSONL file suitable for use with the OpenAI fine-tuning API.&lt;/p&gt;

&lt;p&gt;Then he told the new model “write a 500 word blog post on prompt engineering”, and it replied “Sure, I shall work on that in the morning”.&lt;/p&gt;
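The "massaged into a JSONL file" step comes down to emitting one JSON object per line in the chat format the OpenAI fine-tuning API expects. A minimal sketch, with a made-up stand-in for the real Slack history:

```python
import json

# Each training line for OpenAI chat fine-tuning is a {"messages": [...]}
# object. slack_pairs is an invented stand-in for the real message history.
slack_pairs = [
    ("write a 500 word blog post on prompt engineering",
     "Sure, I shall work on that in the morning"),
]

with open("training.jsonl", "w") as f:
    for prompt, reply in slack_pairs:
        line = {"messages": [
            {"role": "system", "content": "Reply in the style of my Slack messages."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]}
        f.write(json.dumps(line) + "\n")
```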


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/slack"&gt;slack&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="slack"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="fine-tuning"/></entry><entry><title>A Hackers' Guide to Language Models</title><link href="https://simonwillison.net/2023/Sep/25/a-hackers-guide-to-language-models/#atom-tag" rel="alternate"/><published>2023-09-25T00:24:50+00:00</published><updated>2023-09-25T00:24:50+00:00</updated><id>https://simonwillison.net/2023/Sep/25/a-hackers-guide-to-language-models/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=jkrNMKz9pWU"&gt;A Hackers&amp;#x27; Guide to Language Models&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Jeremy Howard’s new 1.5 hour YouTube introduction to language models looks like a really useful place to catch up if you’re an experienced Python programmer looking to start experimenting with LLMs. He covers what they are and how they work, then shows how to build against the OpenAI API, build a Code Interpreter clone using OpenAI functions, run models from Hugging Face on your own machine (with NVIDIA cards or on a Mac) and finishes with a demo of fine-tuning a Llama 2 model to perform text-to-SQL using an open dataset.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-howard"&gt;jeremy-howard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia"&gt;nvidia&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="jeremy-howard"/><category term="fine-tuning"/><category term="nvidia"/></entry><entry><title>airoboros LMoE</title><link href="https://simonwillison.net/2023/Aug/24/airoboros-lmoe/#atom-tag" rel="alternate"/><published>2023-08-24T22:31:57+00:00</published><updated>2023-08-24T22:31:57+00:00</updated><id>https://simonwillison.net/2023/Aug/24/airoboros-lmoe/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jondurbin/airoboros#lmoe"&gt;airoboros LMoE&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
airoboros provides a system for fine-tuning Large Language Models. The latest release adds support for LMoE—LoRA Mixture of Experts. GPT-4 is strongly rumoured to work as a mixture of experts—several (maybe 8?) 220B models each with a different specialty working together to produce the best result. This is the first open source (Apache 2) implementation of that pattern that I’ve seen.
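I haven't dug into the airoboros code itself, but the core routing idea can be sketched in a few lines: score the incoming prompt against each expert's specialty and dispatch to the best-matching LoRA adapter. This toy version uses naive keyword overlap where a real router would use embeddings, and the adapter names are invented:

```python
# Toy LMoE routing sketch (not airoboros's actual implementation):
# pick the LoRA adapter whose description best matches the prompt.
EXPERTS = {
    "code.lora": "write debug python javascript function program",
    "math.lora": "solve calculate equation integral number",
    "creative.lora": "story poem fiction character dialogue",
}

def route(prompt: str) -> str:
    words = set(prompt.lower().split())
    # Score each expert by keyword overlap with the prompt
    return max(EXPERTS, key=lambda name: len(words & set(EXPERTS[name].split())))

print(route("solve this equation for x"))  # math.lora
```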


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/opensearch"&gt;opensearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-4"&gt;gpt-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="opensearch"/><category term="ai"/><category term="generative-ai"/><category term="gpt-4"/><category term="llms"/><category term="fine-tuning"/></entry><entry><title>Introducing Code Llama, a state-of-the-art large language model for coding</title><link href="https://simonwillison.net/2023/Aug/24/introducing-code-llama-a-state-of-the-art-large-language-model-f/#atom-tag" rel="alternate"/><published>2023-08-24T17:54:53+00:00</published><updated>2023-08-24T17:54:53+00:00</updated><id>https://simonwillison.net/2023/Aug/24/introducing-code-llama-a-state-of-the-art-large-language-model-f/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://ai.meta.com/blog/code-llama-large-language-model-coding/"&gt;Introducing Code Llama, a state-of-the-art large language model for coding&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
New LLMs from Meta built on top of Llama 2, in three shapes: a foundation Code Llama model, Code Llama Python that’s specialized for Python, and a Code Llama Instruct model fine-tuned for understanding natural language instructions.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://github.com/facebookresearch/codellama"&gt;facebookresearch/codellama&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/meta"&gt;meta&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="fine-tuning"/><category term="meta"/></entry><entry><title>Quoting Ted Sanders</title><link href="https://simonwillison.net/2023/Apr/15/ted-sanders-openai/#atom-tag" rel="alternate"/><published>2023-04-15T13:44:19+00:00</published><updated>2023-04-15T13:44:19+00:00</updated><id>https://simonwillison.net/2023/Apr/15/ted-sanders-openai/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb"&gt;&lt;p&gt;Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall. [...] In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb"&gt;Ted Sanders&lt;/a&gt;, OpenAI&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/prompt-engineering"&gt;prompt-engineering&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-4"&gt;gpt-4&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="gpt-3"/><category term="openai"/><category term="prompt-engineering"/><category term="generative-ai"/><category term="gpt-4"/><category term="llms"/><category term="fine-tuning"/></entry><entry><title>Replacing my best friends with an LLM trained on 500,000 group chat messages</title><link href="https://simonwillison.net/2023/Apr/12/replacing-my-best-friends/#atom-tag" rel="alternate"/><published>2023-04-12T23:01:45+00:00</published><updated>2023-04-12T23:01:45+00:00</updated><id>https://simonwillison.net/2023/Apr/12/replacing-my-best-friends/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.izzy.co/blogs/robo-boys.html"&gt;Replacing my best friends with an LLM trained on 500,000 group chat messages&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Izzy Miller used a 7-year-long group text conversation with five friends from college to fine-tune LLaMA, such that it could simulate ongoing conversations. They started by extracting the messages from the iMessage SQLite database on their Mac, then generated a new training set from those messages and ran it using code from the Stanford Alpaca repository. This is genuinely one of the clearest explanations of the process of fine-tuning a model like this I’ve seen anywhere.
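The extraction step is plain SQL against the iMessage database. Here's a compressed sketch; the real run would open ~/Library/Messages/chat.db, while this stand-in mimics the relevant slice of the schema (message.text and message.is_from_me are the columns that matter, though the real table has many more):

```python
import sqlite3

# In-memory stand-in for the iMessage chat.db "message" table
db = sqlite3.connect(":memory:")
db.execute("create table message (text text, is_from_me integer)")
db.executemany("insert into message values (?, ?)", [
    ("anyone up for dinner?", 0),
    ("I'm in", 1),
    (None, 0),  # reactions and attachments often have NULL text
])

# Pull the usable training text, skipping NULL rows, in send order
rows = db.execute(
    "select text, is_from_me from message where text is not null order by rowid"
).fetchall()
print(rows)
```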

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=35540154"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;&lt;/p&gt;



</summary><category term="sqlite"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="fine-tuning"/><category term="training-data"/></entry><entry><title>gpt4all</title><link href="https://simonwillison.net/2023/Mar/29/gpt4all/#atom-tag" rel="alternate"/><published>2023-03-29T18:03:09+00:00</published><updated>2023-03-29T18:03:09+00:00</updated><id>https://simonwillison.net/2023/Mar/29/gpt4all/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/nomic-ai/gpt4all"&gt;gpt4all&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Similar to Alpaca, here’s a project which takes the LLaMA base model and fine-tunes it on instruction examples generated by GPT-3—in this case, it’s 800,000 examples generated using the GPT-3.5 Turbo model that powers ChatGPT (Alpaca used 52,000 generated by regular GPT-3). This is currently the easiest way to get a LLaMA-derived chatbot running on your own computer: the repo includes compiled binaries for running on M1/M2, Intel Mac, Windows and Linux and provides a link to download the 3.9GB 4-bit quantized model.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="open-source"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="fine-tuning"/></entry><entry><title>Hello Dolly: Democratizing the magic of ChatGPT with open models</title><link href="https://simonwillison.net/2023/Mar/24/hello-dolly/#atom-tag" rel="alternate"/><published>2023-03-24T17:05:47+00:00</published><updated>2023-03-24T17:05:47+00:00</updated><id>https://simonwillison.net/2023/Mar/24/hello-dolly/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html"&gt;Hello Dolly: Democratizing the magic of ChatGPT with open models&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A team at Databricks applied the same fine-tuning data used by Stanford Alpaca against LLaMA to a much older model—EleutherAI’s GPT-J 6B, first released in May 2021. As with Alpaca, they found that instruction tuning took the raw model—which was extremely difficult to interact with—and turned it into something that felt a lot more like ChatGPT. It’s a shame they reused the license-encumbered 52,000 training samples from Alpaca, but I doubt it will be long before someone recreates a freely licensed alternative to that training set.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/chatgpt"&gt;chatgpt&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/dolly"&gt;dolly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="generative-ai"/><category term="chatgpt"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="dolly"/><category term="fine-tuning"/><category term="llm-release"/></entry><entry><title>Fine-tune LLaMA to speak like Homer Simpson</title><link href="https://simonwillison.net/2023/Mar/17/fine-tune-llama-to-speak-like-homer-simpson/#atom-tag" rel="alternate"/><published>2023-03-17T23:08:40+00:00</published><updated>2023-03-17T23:08:40+00:00</updated><id>https://simonwillison.net/2023/Mar/17/fine-tune-llama-to-speak-like-homer-simpson/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://replicate.com/blog/fine-tune-llama-to-speak-like-homer-simpson"&gt;Fine-tune LLaMA to speak like Homer Simpson&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Replicate spent 90 minutes fine-tuning LLaMA on 60,000 lines of dialog from the first 12 seasons of the Simpsons, and now it can do a good job of producing invented dialog from any of the characters from the series. This is a really interesting result: I’ve been skeptical about how much value can be had from fine-tuning large models on just a tiny amount of new data, assuming that the new data would be statistically irrelevant compared to the existing model. Clearly my mental model around this was incorrect.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/the-simpsons"&gt;the-simpsons&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="the-simpsons"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="replicate"/><category term="fine-tuning"/></entry><entry><title>Train and run Stanford Alpaca on your own machine</title><link href="https://simonwillison.net/2023/Mar/16/train-and-run-stanford-alpaca-on-your-own-machine/#atom-tag" rel="alternate"/><published>2023-03-16T16:10:39+00:00</published><updated>2023-03-16T16:10:39+00:00</updated><id>https://simonwillison.net/2023/Mar/16/train-and-run-stanford-alpaca-on-your-own-machine/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://replicate.com/blog/replicate-alpaca"&gt;Train and run Stanford Alpaca on your own machine&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
The team at Replicate managed to train their own copy of Stanford’s Alpaca—a fine-tuned version of LLaMA that can follow instructions like ChatGPT. Here they provide step-by-step instructions for recreating Alpaca yourself—running the training needs one or more A100s for a few hours, which you can rent through various cloud providers.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/stanford"&gt;stanford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/replicate"&gt;replicate&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="stanford"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="replicate"/><category term="fine-tuning"/></entry><entry><title>Stanford Alpaca, and the acceleration of on-device large language model development</title><link href="https://simonwillison.net/2023/Mar/13/alpaca/#atom-tag" rel="alternate"/><published>2023-03-13T19:19:09+00:00</published><updated>2023-03-13T19:19:09+00:00</updated><id>https://simonwillison.net/2023/Mar/13/alpaca/#atom-tag</id><summary type="html">
    &lt;p&gt;On Saturday 11th March I wrote about how &lt;a href="https://simonwillison.net/2023/Mar/11/llama/"&gt;Large language models are having their Stable Diffusion moment&lt;/a&gt;. Today is Monday. Let's look at what's happened in the past three days.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Later on Saturday: Artem Andreenko reports that &lt;code&gt;llama.cpp&lt;/code&gt; can &lt;a href="https://twitter.com/miolini/status/1634982361757790209"&gt;run the 4-bit quantized 7B LLaMA language model on a 4GB RaspberryPi&lt;/a&gt; - at 10 seconds per token, but still hugely impressive.&lt;/li&gt;
&lt;li&gt;Sunday 12th March: &lt;a href="https://twitter.com/cocktailpeanut"&gt;cocktailpeanut&lt;/a&gt; releases &lt;a href="https://cocktailpeanut.github.io/dalai/"&gt;Dalai&lt;/a&gt;, a "dead simple way to run LLaMA on your computer": &lt;code&gt;npx dalai llama&lt;/code&gt; and &lt;code&gt;npx dalai serve&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;13th March (today): Anish Thite reports &lt;code&gt;llama.cpp&lt;/code&gt; running &lt;a href="https://twitter.com/thiteanish/status/1635188333705043969"&gt;on a Pixel 6 phone&lt;/a&gt; (26 seconds per token). &lt;strong&gt;Update 14th March:&lt;/strong&gt; Now &lt;a href="https://twitter.com/ggerganov/status/1635605532726681600"&gt;1 second per token&lt;/a&gt; on an older Pixel 5!&lt;/li&gt;
&lt;li&gt;Also today: a team at Stanford released &lt;a href="https://crfm.stanford.edu/2023/03/13/alpaca.html"&gt;Alpaca: A Strong Open-Source Instruction-Following Model&lt;/a&gt; - fine-tuned from the LLaMA 7B model.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When I talked about a "Stable Diffusion moment" this is the kind of thing I meant: the moment this stuff is available for people to experiment with, things accelerate.&lt;/p&gt;
&lt;p&gt;I'm going to dive into Alpaca in detail.&lt;/p&gt;
&lt;h4&gt;Stanford's Alpaca&lt;/h4&gt;
&lt;p&gt;Here's the introduction to &lt;a href="https://crfm.stanford.edu/2023/03/13/alpaca.html"&gt;the Alpaca announcement&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (&amp;lt;600$).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The biggest weakness in the LLaMA models released by Meta research last month is their lack of instruction-tuning.&lt;/p&gt;
&lt;p&gt;A language model is a sentence completion engine. You give it a sequence of words, "The first man on the moon was", and it completes that sentence, hopefully with useful content.&lt;/p&gt;
&lt;p&gt;One of the great innovations from OpenAI was their application of &lt;a href="https://openai.com/research/instruction-following"&gt;instruction tuning&lt;/a&gt; to GPT-3:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To make our models safer, more helpful, and more aligned, we use an existing technique called reinforcement learning from human feedback (RLHF). On prompts submitted by our customers to the API, our labelers provide demonstrations of the desired model behavior, and rank several outputs from our models. We then use this data to fine-tune GPT-3.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Prior to this, you had to think very carefully about how to construct your prompts. Thanks to instruction tuning you can be a lot more, well, human in the way you interact with the model. "Write me a poem about pandas!" now works as a prompt, instead of "Here is a poem about pandas:".&lt;/p&gt;
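The difference is easy to see side by side. Before instruction tuning you set up text for the model to continue, often stacking few-shot examples; after, you can simply ask:

```python
# Completion-style versus instruction-style phrasings of the same request
raw_prompt = "Here is a poem about pandas:\n"    # invites a continuation
tuned_prompt = "Write me a poem about pandas!"   # a direct instruction

# A completion model often needs few-shot examples to pin down the format;
# the trailing "A:" leaves the model to fill in the answer.
few_shot = "\n".join([
    "Q: capital of France? A: Paris",
    "Q: capital of Japan? A: Tokyo",
    "Q: capital of Peru? A:",
])
```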
&lt;p&gt;The LLaMA models had not been through this process. The &lt;a href="https://github.com/facebookresearch/llama/blob/57b0eb62de0636e75af471e49e2f1862d908d9d8/FAQ.md#2-generations-are-bad"&gt;LLaMA FAQ&lt;/a&gt; acknowledges this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Keep in mind these models are not finetuned for question answering. As such, they should be prompted so that the expected answer is the natural continuation of the prompt. [...] Overall, always keep in mind that models are very sensitive to prompts (particularly when they have not been finetuned).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is an enormous usability problem.&lt;/p&gt;
&lt;p&gt;One of my open questions about LLaMA was how difficult and expensive it would be to fine-tune it such that it could respond better to instructions.&lt;/p&gt;
&lt;p&gt;Thanks to the team at Stanford we now have an answer: 52,000 training samples and $100 of training compute! From their blog post:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.&lt;/p&gt;&lt;/blockquote&gt;
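Those numbers work out to surprisingly ordinary per-GPU pricing:

```python
# 8 A100s for 3 hours is 24 GPU-hours, so "under $100" implies roughly
# $4/hour per 80GB A100 - plausible cloud pricing.
gpus, hours, budget = 8, 3, 100
gpu_hours = gpus * hours                   # 24
dollars_per_gpu_hour = budget / gpu_hours  # ~4.17
print(gpu_hours, round(dollars_per_gpu_hour, 2))
```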
&lt;p&gt;Something that stuns me about Alpaca is the quality they claim to be able to get from the 7B model - the smallest of the LLaMA models, and the one which has been seen running (albeit glacially slowly) on a RaspberryPi and a mobile phone! Here's one example from their announcement:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/alpaca-email.png" alt="Enter your instruction and press enter: Write an e-mail to congratulate new Stanford admits and mention that you are exited about meeting all of them in person Stanford-Alpaca-7B: An Open-Source Instruction-Following Language Model Dear Stanford Admits, Congratulations on your admission to Stanford! I am so excited to meet all of you in person. As a current student at Stanford, I am looking forward to welcoming you to the Stanford community and showing you all that the university has to offer. I wish you all the best in your future endeavors and I can't wait to see you on campus! Sincerely, Your Name" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;p&gt;I would be impressed to see this from the 65B (largest) LLaMA model - but getting this from 7B is spectacular.&lt;/p&gt;
&lt;h4&gt;Still not for commercial usage&lt;/h4&gt;
&lt;p&gt;I'll quote the Stanford announcement on this in full:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We emphasize that Alpaca is intended &lt;strong&gt;only for academic research&lt;/strong&gt; and any &lt;strong&gt;commercial use is prohibited&lt;/strong&gt;. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial &lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform"&gt;license&lt;/a&gt;, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI's text-davinci-003, whose &lt;a href="https://openai.com/policies/terms-of-use"&gt;terms of use&lt;/a&gt; prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So it's still not something we can use to build commercial offerings - but for personal research and tinkering it's yet another huge leap forwards.&lt;/p&gt;
&lt;h4 id="takeaways"&gt;What does this demonstrate?&lt;/h4&gt;
&lt;p&gt;The license of the LLaMA model doesn't bother me too much. What's exciting to me is what this all proves:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLaMA itself shows that it's possible to train a GPT-3 class language model using openly available resources. The &lt;a href="https://arxiv.org/abs/2302.13971"&gt;LLaMA paper&lt;/a&gt; includes details of the training data, which is entirely from publicly available sources (which include CommonCrawl, GitHub, Wikipedia, ArXiv and StackExchange).&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ggerganov/llama.cpp"&gt;llama.cpp&lt;/a&gt; shows that you can then use some tricks to run that language model on consumer hardware - apparently anything with 4GB or more of RAM is enough to at least get it to start spitting out tokens!&lt;/li&gt;
&lt;li&gt;Alpaca shows that you can apply fine-tuning with a feasible sized set of examples (52,000) and cost ($100) such that even the smallest of the LLaMA models - the 7B one, which can compress down to a 4GB file with 4-bit quantization - provides results that compare well to cutting edge &lt;code&gt;text-davinci-003&lt;/code&gt; in initial human evaluation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One thing that's worth noting: the Alpaca 7B comparison likely used the full-sized 13.48GB 16-bit floating point 7B model, not the roughly 4GB 4-bit quantized model used by &lt;code&gt;llama.cpp&lt;/code&gt;. I've not yet seen a robust comparison of quality between the two.&lt;/p&gt;
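Back-of-envelope arithmetic explains both of those file sizes:

```python
# The "7B" LLaMA model is actually 6.7B parameters: at 16 bits per weight
# that's ~13.4GB (matching the 13.48GB file), and at 4 bits ~3.4GB, which
# plus overhead for quantization scales and embeddings lands near 4GB.
params = 6.7e9
fp16_gb = params * 2 / 1e9    # 2 bytes per parameter
q4_gb = params * 0.5 / 1e9    # half a byte per parameter
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```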
&lt;h4&gt;Exploring the Alpaca training data with Datasette Lite&lt;/h4&gt;
&lt;p&gt;The Alpaca team released the 52,000 fine-tuning instructions they used as &lt;a href="https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json"&gt;a 21.7MB JSON file&lt;/a&gt; in their GitHub repository.&lt;/p&gt;
&lt;p&gt;My &lt;a href="https://simonwillison.net/2022/May/4/datasette-lite/"&gt;Datasette Lite&lt;/a&gt; tool has the ability to fetch JSON from GitHub and load it into an in-browser SQLite database. Here's the URL to do that:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://lite.datasette.io/?json=https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json"&gt;https://lite.datasette.io/?json=https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This will let you browse the 52,000 examples in your browser.&lt;/p&gt;
&lt;p&gt;But we can go one step better than that: here's a SQL query that runs LIKE queries to search through those examples, considering all three text columns:&lt;/p&gt;
&lt;div class="highlight highlight-source-sql"&gt;&lt;pre&gt;&lt;span class="pl-k"&gt;select&lt;/span&gt; instruction, input, output &lt;span class="pl-k"&gt;from&lt;/span&gt; alpaca_data
&lt;span class="pl-k"&gt;where&lt;/span&gt; instruction &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt; &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; input &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt; &lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; output &lt;span class="pl-k"&gt;like&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;||&lt;/span&gt; :search &lt;span class="pl-k"&gt;||&lt;/span&gt; &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;%&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt;
&lt;span class="pl-k"&gt;order by&lt;/span&gt; random()&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;I'm using &lt;code&gt;order by random()&lt;/code&gt; because why not? It's more fun to explore that way.&lt;/p&gt;
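You can run the same search entirely locally too. A sketch using Python's built-in sqlite3, with two illustrative records standing in for the real 21.7MB file (note that SQLite's LIKE is case-insensitive for ASCII, which is why a lowercase search term matches):

```python
import json, sqlite3

# Two made-up alpaca-style records standing in for alpaca_data.json
records = [
    {"instruction": "Explain Occam's razor.", "input": "",
     "output": "Prefer the simplest explanation that fits the evidence."},
    {"instruction": "Translate to French.", "input": "hello",
     "output": "bonjour"},
]

db = sqlite3.connect(":memory:")
db.execute("create table alpaca_data (instruction, input, output)")
db.executemany(
    "insert into alpaca_data values (:instruction, :input, :output)", records
)

# The same concatenate-and-LIKE search as the Datasette Lite query
hits = db.execute(
    "select instruction from alpaca_data "
    "where instruction || ' ' || input || ' ' || output "
    "like '%' || :search || '%' order by random()",
    {"search": "occam"},
).fetchall()
print(hits)
```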
&lt;p&gt;The following link will both load the JSON file and populate and execute that SQL query, plus allow you to change the search term using a form in your browser:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://lite.datasette.io/?json=https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json#/data?sql=select+instruction%2C+input%2C+output+from+alpaca_data%0Awhere+instruction+%7C%7C+%27+%27+%7C%7C+input+%7C%7C+%27+%27+%7C%7C+output+like+%27%25%27+%7C%7C+%3Asearch+%7C%7C+%27%25%27%0Aorder+by+random%28%29&amp;amp;search=occam"&gt;https://lite.datasette.io/?json=https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json#/data?sql=select+instruction%2C+input%2C+output+from+alpaca_data%0Awhere+instruction+%7C%7C+%27+%27+%7C%7C+input+%7C%7C+%27+%27+%7C%7C+output+like+%27%25%27+%7C%7C+%3Asearch+%7C%7C+%27%25%27%0Aorder+by+random%28%29&amp;amp;search=occam&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://static.simonwillison.net/static/2023/alpaca-datasette-lite.jpg" alt="Screenshot of Datasette executing that SQL query, returning three results that match 'occam'" style="max-width: 100%;" /&gt;&lt;/p&gt;
&lt;h4&gt;What's next?&lt;/h4&gt;
&lt;p&gt;This week is likely to be wild. OpenAI are rumored to have a big announcement on Tuesday - possibly GPT-4? And I've heard rumors of announcements from both Anthropic and Google this week as well.&lt;/p&gt;
&lt;p&gt;I'm still more excited about seeing what happens next with LLaMA. Language models on personal devices are happening so much faster than I thought they would.&lt;/p&gt;
&lt;h4 id="bonus-training-data"&gt;Bonus: The source of that training data? GPT-3!&lt;/h4&gt;
&lt;p&gt;Here's a fascinating detail: Those 52,000 samples they used to fine-tune the model? Those were the result of a prompt they ran against GPT-3 itself! Here's &lt;a href="https://github.com/tatsu-lab/stanford_alpaca/blob/da37bb2ecab37cae022dd07aa3ff861c446fb614/prompt.txt"&gt;the prompt&lt;/a&gt; they used:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;You are asked to come up with a set of 20 diverse task instructions. These task instructions will be given to a GPT model and we will evaluate the GPT model for completing the instructions.

Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instrucitons.
3. The type of instructions should be diverse. The list should include diverse types of tasks like open-ended generation, classification, editing, etc.
2. A GPT language model should be able to complete the instruction. For example, do not ask the assistant to create any visual or audio output. For another example, do not ask the assistant to wake you up at 5pm or set a reminder because it cannot perform any action.
3. The instructions should be in English.
4. The instructions should be 1 to 2 sentences long. Either an imperative sentence or a question is permitted.
5. You should generate an appropriate input to the instruction. The input field should contain a specific example provided for the instruction. It should involve realistic data and should not contain simple placeholders. The input should provide substantial content to make the instruction challenging but should ideally not exceed 100 words.
6. Not all instructions require input. For example, when a instruction asks about some general information, "what is the highest peak in the world", it is not necssary to provide a specific context. In this case, we simply put "&amp;lt;noinput&amp;gt;" in the input field.
7. The output should be an appropriate response to the instruction and the input. Make sure the output is less than 100 words.

List of 20 tasks:
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then they include three random example instructions from &lt;a href="https://github.com/tatsu-lab/stanford_alpaca/blob/da37bb2ecab37cae022dd07aa3ff861c446fb614/seed_tasks.jsonl"&gt;a list of 175&lt;/a&gt; they had prepared by hand. The completed prompt sent to OpenAI would include the above instructions followed by something like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;###
1. Instruction: Explain the following idiom to me, and try to give me some examples.
1. Input:
black sheep
1. Output:
Meaning: An outcast. Someone who doesn’t fit in with the rest of the crowd. They take pride in being different. Thinks for themselves and doesn’t care what no one else has to say. They tend to ride their own wave and are usually loners because no one understands them, but its okay because they like it that way.
Example: He’s the black sheep of the family.

###
2. Instruction: Generate a haiku using the following word:
2. Input:
summer
2. Output:
The chill, worming in
Shock, pleasure, bursting within
Summer tongue awakes

###
3. Instruction: Recommend a movie for me to watch during the weekend and explain the reason.
3. Input:
3. Output:
I would recommend the movie "The Shawshank Redemption" because it is an excellent movie that is both moving and inspiring. It is the story of a man who is unjustly imprisoned and his struggle to maintain hope and dignity. It is a great film to watch over the weekend because it will make you think about the human capacity for resilience and hope.

###
4. Instruction:
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;GPT-3 would then fill in the rest. You can try this &lt;a href="https://simonwillison.net/2022/Jun/5/play-with-gpt3/"&gt;in the GPT-3 Playground&lt;/a&gt; to see it in action (paste &lt;a href="https://gist.github.com/simonw/fed0bf6d8237920f2ee42e4fc82c260f"&gt;from here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Here's &lt;a href="https://github.com/tatsu-lab/stanford_alpaca/blob/da37bb2ecab37cae022dd07aa3ff861c446fb614/generate_instruction.py"&gt;the Python script&lt;/a&gt; that assembles that all together.&lt;/p&gt;
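&lt;p&gt;The core of that assembly step can be sketched in a few lines. This is an illustrative simplification, not the real script: the actual code reads the requirements from &lt;code&gt;prompt.txt&lt;/code&gt; and samples seed tasks from &lt;code&gt;seed_tasks.jsonl&lt;/code&gt;, and the helper and variable names here are mine:&lt;/p&gt;

```python
import random

# Stand-ins for the real inputs: the requirements text shown above, and
# three of the 175 hand-written seed tasks.
requirements = "You are asked to come up with a set of 20 diverse task instructions. ..."
seed_tasks = [
    {"instruction": "Explain the following idiom to me.", "input": "black sheep", "output": "An outcast."},
    {"instruction": "Generate a haiku using the following word:", "input": "summer", "output": "..."},
    {"instruction": "Recommend a movie for me to watch.", "input": "", "output": "The Shawshank Redemption."},
]

def build_prompt(requirements, tasks):
    """Join the requirements and numbered example blocks into one prompt."""
    parts = [requirements]
    for i, task in enumerate(tasks, start=1):
        parts.append(
            f"###\n{i}. Instruction: {task['instruction']}\n"
            f"{i}. Input:\n{task['input']}\n"
            f"{i}. Output:\n{task['output']}\n"
        )
    # End with a dangling numbered slot, so GPT-3 continues the pattern
    # and generates the remaining tasks itself.
    parts.append(f"###\n{len(tasks) + 1}. Instruction:")
    return "\n".join(parts)

prompt = build_prompt(requirements, random.sample(seed_tasks, 3))
print(prompt)
```

&lt;p&gt;The trick is that trailing &lt;code&gt;4. Instruction:&lt;/code&gt; line: GPT-3 completes the pattern rather than answering a question.&lt;/p&gt;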
&lt;p&gt;They spent $500 on OpenAI credits to assemble the 52,000 examples they used to fine-tune their model.&lt;/p&gt;
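&lt;p&gt;That works out to less than a cent per generated example:&lt;/p&gt;

```python
total_cost = 500       # USD of OpenAI credits
examples = 52_000      # fine-tuning examples generated
cost_per_example = round(total_cost / examples, 4)
print(cost_per_example)  # 0.0096
```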
&lt;p&gt;As they note in their announcement, generating training examples in this way is explicitly prohibited by the OpenAI &lt;a href="https://openai.com/policies/terms-of-use"&gt;terms of use&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You may not [...] (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There's a related concept to this called &lt;a href="https://ssg.aalto.fi/research/projects/mlsec/model-extraction/"&gt;Model Extraction&lt;/a&gt;, where people build new models that emulate the behaviour of others by firing large numbers of examples through the other model and training a new one based on the results.&lt;/p&gt;
&lt;p&gt;I don't think the way Alpaca was trained quite counts as a classic Model Extraction attack, but it certainly echoes one.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/stanford"&gt;stanford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpt-3"&gt;gpt-3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/local-llms"&gt;local-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/paper-review"&gt;paper-review&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="open-source"/><category term="stanford"/><category term="ai"/><category term="gpt-3"/><category term="generative-ai"/><category term="llama"/><category term="local-llms"/><category term="llms"/><category term="fine-tuning"/><category term="llama-cpp"/><category term="paper-review"/></entry><entry><title>Quoting Alpaca: A Strong Open-Source Instruction-Following Model</title><link href="https://simonwillison.net/2023/Mar/13/stanford-alpaca/#atom-tag" rel="alternate"/><published>2023-03-13T18:18:37+00:00</published><updated>2023-03-13T18:18:37+00:00</updated><id>https://simonwillison.net/2023/Mar/13/stanford-alpaca/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://crfm.stanford.edu/2023/03/13/alpaca.html"&gt;&lt;p&gt;We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (&amp;lt;600$).&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://crfm.stanford.edu/2023/03/13/alpaca.html"&gt;Alpaca: A Strong Open-Source Instruction-Following Model&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/stanford"&gt;stanford&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/fine-tuning"&gt;fine-tuning&lt;/a&gt;&lt;/p&gt;



</summary><category term="stanford"/><category term="ai"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="fine-tuning"/></entry></feed>