<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: pytorch</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/pytorch.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2025-10-20T17:21:52+00:00</updated><author><name>Simon Willison</name></author><entry><title>Getting DeepSeek-OCR working on an NVIDIA Spark via brute force using Claude Code</title><link href="https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/#atom-tag" rel="alternate"/><published>2025-10-20T17:21:52+00:00</published><updated>2025-10-20T17:21:52+00:00</updated><id>https://simonwillison.net/2025/Oct/20/deepseek-ocr-claude-code/#atom-tag</id><summary type="html">
    &lt;p&gt;DeepSeek released a new model yesterday: &lt;a href="https://github.com/deepseek-ai/DeepSeek-OCR"&gt;DeepSeek-OCR&lt;/a&gt;, a 6.6GB model fine-tuned specifically for OCR. They released it as model weights that run using PyTorch and CUDA. I got it running on the NVIDIA Spark by having Claude Code effectively brute force the challenge of getting it working on that particular hardware.&lt;/p&gt;
&lt;p&gt;This small project (40 minutes this morning, most of which was Claude Code churning away while I had breakfast and did some other things) ties together a bunch of different concepts I've been exploring recently. I &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designed an agentic loop&lt;/a&gt; for the problem, gave Claude full permissions inside a Docker sandbox, embraced the &lt;a href="https://simonwillison.net/2025/Oct/5/parallel-coding-agents/"&gt;parallel agents lifestyle&lt;/a&gt; and reused my &lt;a href="https://simonwillison.net/2025/Oct/14/nvidia-dgx-spark/"&gt;notes on the NVIDIA Spark&lt;/a&gt; from last week.&lt;/p&gt;
&lt;p&gt;I knew getting a PyTorch CUDA model running on the Spark was going to be a little frustrating, so I decided to outsource the entire process to Claude Code to see what would happen.&lt;/p&gt;
&lt;p&gt;TLDR: It worked. It took four prompts (one long, three very short) to have Claude Code figure out everything necessary to run the new DeepSeek model on the NVIDIA Spark, OCR a document for me and produce &lt;em&gt;copious&lt;/em&gt; notes about the process.&lt;/p&gt;
&lt;h4 id="the-setup"&gt;The setup&lt;/h4&gt;
&lt;p&gt;I connected to the Spark from my Mac via SSH and started a new Docker container there:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;docker run -it --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  bash&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then I installed npm and used that to install Claude Code:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;apt-get update
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
npm install -g @anthropic-ai/claude-code&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then started Claude Code, telling it that it's OK that it's running as &lt;code&gt;root&lt;/code&gt; because it's in a sandbox:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;IS_SANDBOX=1 claude --dangerously-skip-permissions&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;It provided me a URL to click on to authenticate with my Anthropic account.&lt;/p&gt;
&lt;h4 id="the-initial-prompts"&gt;The initial prompts&lt;/h4&gt;
&lt;p&gt;I kicked things off with this prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a folder deepseek-ocr and do everything else in that folder&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then I ran the following, providing links to both the GitHub repository and the Hugging Face model, providing a clue about NVIDIA ARM and giving it an image (&lt;a href="https://static.simonwillison.net/static/2025/ft.jpeg"&gt;this one&lt;/a&gt;, see &lt;a href="https://simonwillison.net/2025/Aug/29/the-perils-of-vibe-coding/"&gt;previous post&lt;/a&gt;) that I wanted it to run OCR on.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Your task is to get this working: &lt;a href="https://github.com/deepseek-ai/DeepSeek-OCR"&gt;https://github.com/deepseek-ai/DeepSeek-OCR&lt;/a&gt; - it uses Hugging Face Transformers and models from &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-OCR"&gt;https://huggingface.co/deepseek-ai/DeepSeek-OCR&lt;/a&gt; - you are running in a Docker container on an NVIDIA ARM device, first research your environment to check it will be possible to do this. Git clone both of those - you may need to enable LFS first. Make extensive notes as you go in a notes.md file which you should only ever append new notes to. At the end of this you should have scripts in the folder which can be run to both setup and then run the model against an example - use &lt;a href="https://static.simonwillison.net/static/2025/ft.jpeg"&gt;https://static.simonwillison.net/static/2025/ft.jpeg&lt;/a&gt; as the example image that you OCR. As a final step create a detailed README.md file explaining the project and what you learned and how to run your code.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Note how I also told it to "make extensive notes as you go in a notes.md file which you should only ever append new notes to" so I'd have some useful notes at the end.&lt;/p&gt;
&lt;h4 id="the-follow-ups"&gt;The follow-ups&lt;/h4&gt;
&lt;p&gt;It churned away for 13 minutes, and made a sequence of discoveries about the platform and the model and the difficulties of running PyTorch on it.&lt;/p&gt;
&lt;p&gt;And then... &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/claude-log.md#-assistant--2025-10-20-154706"&gt;it gave up&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The NVIDIA GB10 GPU has compute capability sm_121, which is too new for PyTorch 2.5.1&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PyTorch 2.5.1 supports: sm_50, sm_80, sm_86, sm_89, sm_90, sm_90a&lt;/li&gt;
&lt;li&gt;GB10 requires: sm_121&lt;/li&gt;
&lt;li&gt;Error: &lt;code&gt;CUDA error: no kernel image is available for execution on the device&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, the model has hardcoded &lt;code&gt;.cuda()&lt;/code&gt; calls, preventing CPU fallback without code modifications.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So I gave it some encouragement:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Any other options that might help get CUDA working like using different versions of PyTorch that have available CUDA wheels for this platform? Have you checked to see what PyTorch versions already have these ARM CUDA wheels that would work here?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It ran this command to grep for all available PyTorch wheels on &lt;code&gt;download.pytorch.org&lt;/code&gt; that match the ARM architecture:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;curl -s https://download.pytorch.org/whl/torch/ &lt;span class="pl-k"&gt;|&lt;/span&gt; grep -o &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;torch-[0-9.]*.*aarch64.whl&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; &lt;span class="pl-k"&gt;|&lt;/span&gt; sort -V &lt;span class="pl-k"&gt;|&lt;/span&gt; tail -20&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;... and that unblocked it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Excellent! I can see that PyTorch 2.9.0 has ARM64 wheels for CUDA 12.8, 12.9, and &lt;strong&gt;CUDA 13.0&lt;/strong&gt; (cu130)! This is exactly what we need!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It installed the wheels it found and reported:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Good news! PyTorch 2.9.0 can detect the GB10 GPU with CUDA capability 12.1 (sm_121), but it says the maximum supported is 12.0. However, it's just a warning now instead of a hard error! Let's try running inference: [...]&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Then kept on churning for a bit, ignoring warnings, and managed to &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/claude-log.md#-user--2025-10-20-155056"&gt;run the model&lt;/a&gt;! It loaded the image and ran this prompt:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;|grounding|&amp;gt;OCR this image.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And got back output that started like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;|ref|&amp;gt;The perils of vibe coding&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[352, 30, 624, 111]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;opt&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[970, 0, 994, 30]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;such&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[970, 24, 996, 58]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;days&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[970, 52, 996, 87]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;pavi&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[970, 85, 996, 118]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;TECHNOLOGY&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[33, 199, 127, 230]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;holds the promise of replacing program-&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[220, 170, 390, 210]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;mers - a profession of very well paid&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[220, 199, 390, 238]]&amp;lt;|/det|&amp;gt;
&amp;lt;|ref|&amp;gt;people.For another, the work can be&amp;lt;|/ref|&amp;gt;&amp;lt;|det|&amp;gt;[[220, 228, 390, 267]]&amp;lt;|/det|&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
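&lt;p&gt;That grounding output is regular enough to parse mechanically. As a sketch (my own code, not part of the DeepSeek-OCR project), each ref/det pair can be pulled out with a regular expression:&lt;/p&gt;

```python
import re

# Sketch of a parser for the grounding output shown above - my own code,
# not from the DeepSeek-OCR project. Each span looks like:
#   &lt;|ref|&gt;text&lt;|/ref|&gt;&lt;|det|&gt;[[x1, y1, x2, y2]]&lt;|/det|&gt;
PATTERN = re.compile(
    r"&lt;\|ref\|&gt;(.*?)&lt;\|/ref\|&gt;&lt;\|det\|&gt;\[\[([\d, ]+)\]\]&lt;\|/det\|&gt;"
)

def parse_grounding(output):
    """Return a list of (text, [x1, y1, x2, y2]) pairs."""
    return [
        (text, [int(n) for n in box.split(",")])
        for text, box in PATTERN.findall(output)
    ]

sample = "&lt;|ref|&gt;The perils of vibe coding&lt;|/ref|&gt;&lt;|det|&gt;[[352, 30, 624, 111]]&lt;|/det|&gt;"
print(parse_grounding(sample))
```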
&lt;p&gt;Claude declared victory and pointed me to the &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/output/result.mmd?short_path=f43da6a"&gt;output/result.mmd&lt;/a&gt; file, which contained only whitespace. So OCR had worked but the result had failed to be written correctly to disk.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/claude-log.md#-user--2025-10-20-155826"&gt;said&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;That result.mmd file looks to be exclusively whitespace to me, try and get a better result there - the bounding box image was good though&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Which caused Claude to &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/claude-log.md#-assistant--2025-10-20-155900"&gt;reconsider&lt;/a&gt; the prompt it used:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Maybe we need to use a different prompt. Looking at the README from DeepSeek-OCR, there are different prompts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"&amp;lt;image&amp;gt;\n&amp;lt;|grounding|&amp;gt;Convert the document to markdown."&lt;/code&gt; - for documents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"&amp;lt;image&amp;gt;\n&amp;lt;|grounding|&amp;gt;OCR this image."&lt;/code&gt; - general OCR&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"&amp;lt;image&amp;gt;\nFree OCR."&lt;/code&gt; - without layouts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The "grounding" mode seems to be focused on bounding boxes, which is why we got good bounding box results but the text output might be in a different format.&lt;/p&gt;
&lt;p&gt;Let me try running it again with a different prompt that's more focused on text extraction rather than grounding.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It tried all of the prompt variants, &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/notes.md#text-output-improvement-2025-10-20-1601"&gt;made a note about them&lt;/a&gt; and created a file called &lt;a href="https://github.com/simonw/research/blob/main/deepseek-ocr-nvidia-spark/PROMPTS_GUIDE.md"&gt;PROMPTS_GUIDE.md&lt;/a&gt; with a bunch more details, including this comparison table:&lt;/p&gt;
&lt;blockquote&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Text Quality&lt;/th&gt;
&lt;th&gt;Structure&lt;/th&gt;
&lt;th&gt;Coordinates&lt;/th&gt;
&lt;th&gt;Best Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free OCR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚡⚡⚡ Fast&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ Excellent&lt;/td&gt;
&lt;td&gt;⭐ Basic&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;General OCR&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;td&gt;⚡⚡ Medium&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ Excellent&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ Full&lt;/td&gt;
&lt;td&gt;⭐⭐ Partial&lt;/td&gt;
&lt;td&gt;Documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grounding&lt;/td&gt;
&lt;td&gt;⚡ Slow&lt;/td&gt;
&lt;td&gt;⭐⭐ Good&lt;/td&gt;
&lt;td&gt;⭐ Basic&lt;/td&gt;
&lt;td&gt;⭐⭐⭐ Full&lt;/td&gt;
&lt;td&gt;Annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detailed&lt;/td&gt;
&lt;td&gt;⚡⚡⚡ Fastest&lt;/td&gt;
&lt;td&gt;⭐ N/A&lt;/td&gt;
&lt;td&gt;❌ N/A&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;Image analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;And this table under the heading "Performance Benchmarks":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Test image: 3503×1668 pixels (Financial Times article)&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Output Size&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free OCR&lt;/td&gt;
&lt;td&gt;24s&lt;/td&gt;
&lt;td&gt;Clean text&lt;/td&gt;
&lt;td&gt;2257&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;td&gt;39s&lt;/td&gt;
&lt;td&gt;Formatted MD&lt;/td&gt;
&lt;td&gt;2257 + structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grounding&lt;/td&gt;
&lt;td&gt;58s&lt;/td&gt;
&lt;td&gt;Text + coords&lt;/td&gt;
&lt;td&gt;2257 + boxes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Detailed&lt;/td&gt;
&lt;td&gt;9s&lt;/td&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;td&gt;~300&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/blockquote&gt;
&lt;p&gt;My final prompt was this, to gather everything together into a zip file I could extract from the Docker container:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Create a zip file with the output and output_text and all of the scripts and notes - but leave out the github repo and the huggingface repo directories&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I added the contents of that zip file to my new &lt;a href="https://github.com/simonw/research"&gt;simonw/research&lt;/a&gt; GitHub repo &lt;a href="https://github.com/simonw/research/tree/main/deepseek-ocr-nvidia-spark"&gt;in the deepseek-ocr-nvidia-spark&lt;/a&gt; folder.&lt;/p&gt;
&lt;p&gt;Claude really likes writing notes! Here's the directory listing of that finished folder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;  |-- download_test_image.sh
  |-- FINAL_SUMMARY.md
  |-- notes.md
  |-- output
  |   |-- images
  |   |-- result_with_boxes.jpg
  |   `-- result.mmd
  |-- output_text
  |   |-- detailed
  |   |   |-- images
  |   |   |-- result_with_boxes.jpg
  |   |   `-- result.mmd
  |   |-- free_ocr
  |   |   |-- images
  |   |   |-- result_with_boxes.jpg
  |   |   `-- result.mmd
  |   `-- markdown
  |       |-- images
  |       |   `-- 0.jpg
  |       |-- result_with_boxes.jpg
  |       `-- result.mmd
  |-- PROMPTS_GUIDE.md
  |-- README_SUCCESS.md
  |-- README.md
  |-- run_ocr_best.py
  |-- run_ocr_cpu_nocuda.py
  |-- run_ocr_cpu.py
  |-- run_ocr_text_focused.py
  |-- run_ocr.py
  |-- run_ocr.sh
  |-- setup.sh
  |-- SOLUTION.md
  |-- test_image.jpeg
  |-- TEXT_OUTPUT_SUMMARY.md
  `-- UPDATE_PYTORCH.md
&lt;/code&gt;&lt;/pre&gt;
&lt;h4 id="takeaways"&gt;Takeaways&lt;/h4&gt;
&lt;p&gt;My first prompt was at 15:31:07 (UTC). The final message from Claude Code came in at 16:10:03. That means it took less than 40 minutes start to finish, and I was only actively involved for about 5-10 minutes of that time. The rest of the time I was having breakfast and doing other things.&lt;/p&gt;
&lt;p&gt;Having tried and failed to get PyTorch stuff working in the past, I count this as a &lt;em&gt;huge&lt;/em&gt; win. I'll be using this process a whole lot more in the future.&lt;/p&gt;
&lt;p&gt;How good were the actual results? There's honestly so much material in the resulting notes created by Claude that I haven't reviewed all of it. There may well be all sorts of errors in there, but it's indisputable that it managed to run the model and made notes on how it did that such that I'll be able to do the same thing in the future.&lt;/p&gt;
&lt;p&gt;I think the key factors in executing this project successfully were the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;I gave it exactly what it needed: a Docker environment in the target hardware, instructions on where to get what it needed (the code and the model) and a clear goal for it to pursue. This is a great example of the pattern I described in &lt;a href="https://simonwillison.net/2025/Sep/30/designing-agentic-loops/"&gt;designing agentic loops&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Running it in a Docker sandbox meant I could use &lt;code&gt;claude --dangerously-skip-permissions&lt;/code&gt; and leave it running on its own. If I'd had to approve every command it wanted to run I would have got frustrated and quit the project after just a few minutes.&lt;/li&gt;
&lt;li&gt;I applied my own knowledge and experience when it got stuck. I was confident (based on &lt;a href="https://simonwillison.net/2025/Oct/14/nvidia-dgx-spark/#claude-code-for-everything"&gt;previous experiments&lt;/a&gt; with the Spark) that a CUDA wheel for ARM64 existed that was likely to work, so when it gave up I prompted it to try again, leading to success.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Oh, and it looks like DeepSeek OCR is a pretty good model if you spend the time experimenting with different ways to run it.&lt;/p&gt;
&lt;h4 id="bonus-using-vs-code-to-monitor-the-container"&gt;Bonus: Using VS Code to monitor the container&lt;/h4&gt;
&lt;p&gt;A small TIL from today: I had kicked off the job running in the Docker container via SSH to the Spark when I realized it would be neat if I could easily monitor the files it was creating while it was running.&lt;/p&gt;
&lt;p&gt;I &lt;a href="https://claude.ai/share/68a0ebff-b586-4278-bd91-6b715a657d2b"&gt;asked Claude.ai&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I am running a Docker container on a remote machine, which I started over SSH&lt;/p&gt;
&lt;p&gt;How can I have my local VS Code on MacOS show me the filesystem in that docker container inside that remote machine, without restarting anything?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It gave me a set of steps that solved this exact problem:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the VS Code "Remote SSH" and "Dev Containers" extensions&lt;/li&gt;
&lt;li&gt;Use "Remote-SSH: Connect to Host" to connect to the remote machine (on my Tailscale network that's &lt;code&gt;spark@100.113.1.114&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;In the window for that remote SSH session, run "Dev Containers: Attach to Running Container" - this shows a list of containers and you can select the one you want to attach to&lt;/li&gt;
&lt;li&gt;... and that's it! VS Code opens a new window providing full access to all of the files in that container. I opened up &lt;code&gt;notes.md&lt;/code&gt; and watched it as Claude Code appended to it in real time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At the end when I told Claude to create a zip file of the results I could select that in the VS Code file explorer and use the "Download" menu item to download it to my Mac.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ocr"&gt;ocr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/docker"&gt;docker&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/anthropic"&gt;anthropic&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia"&gt;nvidia&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vs-code"&gt;vs-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vision-llms"&gt;vision-llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/deepseek"&gt;deepseek&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llm-release"&gt;llm-release&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/coding-agents"&gt;coding-agents&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-in-china"&gt;ai-in-china&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nvidia-spark"&gt;nvidia-spark&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="ocr"/><category term="python"/><category term="ai"/><category term="docker"/><category term="pytorch"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="anthropic"/><category term="claude"/><category term="nvidia"/><category term="vs-code"/><category term="vision-llms"/><category term="deepseek"/><category term="llm-release"/><category term="coding-agents"/><category term="claude-code"/><category term="ai-in-china"/><category term="nvidia-spark"/></entry><entry><title>nanochat</title><link href="https://simonwillison.net/2025/Oct/13/nanochat/#atom-tag" rel="alternate"/><published>2025-10-13T20:29:58+00:00</published><updated>2025-10-13T20:29:58+00:00</updated><id>https://simonwillison.net/2025/Oct/13/nanochat/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/karpathy/nanochat"&gt;nanochat&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Really interesting new project from Andrej Karpathy, described at length &lt;a href="https://github.com/karpathy/nanochat/discussions/1"&gt;in this discussion post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It provides a full ChatGPT-style LLM, including training, inference and a web UI, that can be trained for as little as $100:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This repo is a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It's around 8,000 lines of code, mostly Python (using PyTorch) plus a little bit of Rust for &lt;a href="https://github.com/karpathy/nanochat/tree/master/rustbpe"&gt;training the tokenizer&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Andrej suggests renting an 8XH100 NVIDIA node for around $24/hour to train the model. 4 hours (~$100) is enough to get a model that can hold a conversation - &lt;a href="https://twitter.com/karpathy/status/1977755430093980034"&gt;almost coherent example here&lt;/a&gt;. Run it for 12 hours and you get something that slightly outperforms GPT-2. I'm looking forward to hearing results from longer training runs!&lt;/p&gt;
&lt;p&gt;The resulting model is ~561M parameters, so it should run on almost anything. I've run a 4B model on my iPhone, 561M should easily fit on even an inexpensive Raspberry Pi.&lt;/p&gt;
&lt;p&gt;The model defaults to training on ~24GB from &lt;a href="https://huggingface.co/datasets/karpathy/fineweb-edu-100b-shuffle"&gt;karpathy/fineweb-edu-100b-shuffle&lt;/a&gt; derived from &lt;a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu"&gt;FineWeb-Edu&lt;/a&gt;, and then &lt;a href="https://github.com/karpathy/nanochat/blob/5fd0b138860a76beb60cf099fa46f74191b50941/scripts/mid_train.py"&gt;midtrains&lt;/a&gt; on 568K examples from &lt;a href="https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk"&gt;SmolTalk&lt;/a&gt; (460K), &lt;a href="https://huggingface.co/datasets/cais/mmlu"&gt;MMLU auxiliary train&lt;/a&gt; (100K), and &lt;a href="https://huggingface.co/datasets/openai/gsm8k"&gt;GSM8K&lt;/a&gt; (8K), followed by &lt;a href="https://github.com/karpathy/nanochat/blob/5fd0b138860a76beb60cf099fa46f74191b50941/scripts/chat_sft.py"&gt;supervised finetuning&lt;/a&gt; on 21.4K examples from &lt;a href="https://huggingface.co/datasets/allenai/ai2_arc#arc-easy-1"&gt;ARC-Easy&lt;/a&gt; (2.3K), &lt;a href="https://huggingface.co/datasets/allenai/ai2_arc#arc-challenge"&gt;ARC-Challenge&lt;/a&gt; (1.1K), &lt;a href="https://huggingface.co/datasets/openai/gsm8k"&gt;GSM8K&lt;/a&gt; (8K), and &lt;a href="https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk"&gt;SmolTalk&lt;/a&gt; (10K).&lt;/p&gt;
&lt;p&gt;Here's the code for the &lt;a href="https://github.com/karpathy/nanochat/blob/5fd0b138860a76beb60cf099fa46f74191b50941/scripts/chat_web.py"&gt;web server&lt;/a&gt;, which is fronted by this pleasantly succinct vanilla JavaScript &lt;a href="https://github.com/karpathy/nanochat/blob/5fd0b138860a76beb60cf099fa46f74191b50941/nanochat/ui.html"&gt;HTML+JavaScript frontend&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Sam Dobson pushed a build of the model to &lt;a href="https://huggingface.co/sdobson/nanochat"&gt;sdobson/nanochat&lt;/a&gt; on Hugging Face. It's designed to run on CUDA but I pointed Claude Code at a checkout and had it hack around until it figured out how to run it on CPU on macOS, which eventually resulted in &lt;a href="https://gist.github.com/simonw/912623bf00d6c13cc0211508969a100a"&gt;this script&lt;/a&gt; which I've published as a Gist. You should be able to try out the model using uv like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
--model-dir /tmp/nanochat \
--prompt "Tell me about dogs."
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I got this (truncated because it ran out of tokens):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I'm delighted to share my passion for dogs with you. As a veterinary doctor, I've had the privilege of helping many pet owners care for their furry friends. There's something special about training, about being a part of their lives, and about seeing their faces light up when they see their favorite treats or toys.&lt;/p&gt;
&lt;p&gt;I've had the chance to work with over 1,000 dogs, and I must say, it's a rewarding experience. The bond between owner and pet&lt;/p&gt;
&lt;/blockquote&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/karpathy/status/1977755427569111362"&gt;@karpathy&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rust"&gt;rust&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/andrej-karpathy"&gt;andrej-karpathy&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/training-data"&gt;training-data&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude-code"&gt;claude-code&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="rust"/><category term="pytorch"/><category term="andrej-karpathy"/><category term="generative-ai"/><category term="llms"/><category term="training-data"/><category term="uv"/><category term="gpus"/><category term="claude-code"/></entry><entry><title>Defeating Nondeterminism in LLM Inference</title><link href="https://simonwillison.net/2025/Sep/11/defeating-nondeterminism/#atom-tag" rel="alternate"/><published>2025-09-11T06:53:42+00:00</published><updated>2025-09-11T06:53:42+00:00</updated><id>https://simonwillison.net/2025/Sep/11/defeating-nondeterminism/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/"&gt;Defeating Nondeterminism in LLM Inference&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A very common question I see about LLMs concerns why they can't be made to deliver the same response to the same prompt by setting a fixed random number seed.&lt;/p&gt;
&lt;p&gt;Like many others I had been led to believe this was due to the non-associative nature of floating point arithmetic, where &lt;code&gt;(a + b) + c ≠ a + (b + c)&lt;/code&gt;, combined with unpredictable calculation orders on concurrent GPUs. This new paper calls that the "concurrency + floating point hypothesis":&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first. We will call this the “concurrency + floating point” hypothesis for LLM inference nondeterminism.&lt;/p&gt;
&lt;/blockquote&gt;
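&lt;p&gt;The non-associativity half of that hypothesis is real and easy to demonstrate in a couple of lines of Python:&lt;/p&gt;

```python
# Floating point addition is not associative: regrouping identical
# operands changes the rounding, and therefore the result.
a, b, c = 0.1, 1e20, -1e20

left = (a + b) + c   # 0.1 is absorbed into 1e20, which then cancels: 0.0
right = a + (b + c)  # 1e20 cancels first, leaving 0.1

print(left, right)  # 0.0 0.1
```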
&lt;p&gt;It then convincingly argues that this is &lt;em&gt;not&lt;/em&gt; the core of the problem, because "in the typical forward pass of an LLM, there is usually not a single atomic add present."&lt;/p&gt;
&lt;p&gt;Why are LLMs so often non-deterministic then?&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;[...] &lt;strong&gt;the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies!&lt;/strong&gt; This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism.&lt;/p&gt;
&lt;/blockquote&gt;
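&lt;p&gt;It's easy to reproduce that batch-size effect in miniature: summing identical data in different-sized groups changes the order of the additions, and therefore the rounding. A toy illustration (mine, not from the paper):&lt;/p&gt;

```python
# Summing the same numbers in different "batch" groupings gives different
# floating point results, because the grouping changes the rounding.
# Toy illustration - not code from the paper.

def seq_sum(values):
    total = 0.0
    for v in values:
        total += v
    return total

data = [1e16, 0.1, -1e16, 0.1] * 1000

def grouped_sum(values, group_size):
    # Sum each group sequentially, then sum the per-group results.
    groups = [values[i:i + group_size] for i in range(0, len(values), group_size)]
    return seq_sum(seq_sum(g) for g in groups)

print(grouped_sum(data, 4))  # groups of 4: the 0.1s survive, total near 100
print(grouped_sum(data, 2))  # groups of 2: each 0.1 is absorbed into 1e16, total 0.0
```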
&lt;p&gt;The &lt;a href="https://github.com/thinking-machines-lab/batch_invariant_ops"&gt;thinking-machines-lab/batch_invariant_ops&lt;/a&gt; code that accompanies this paper addresses this by providing a PyTorch implementation of invariant kernels and demonstrates them running Qwen3-8B deterministically under vLLM.&lt;/p&gt;
&lt;p&gt;This paper is the first public output from Thinking Machines, the AI lab founded in February 2025 by Mira Murati, OpenAI's former CTO (and interim CEO for &lt;a href="https://openai.com/index/openai-announces-leadership-transition/"&gt;a few days&lt;/a&gt;). It's unrelated to &lt;a href="https://en.m.wikipedia.org/wiki/Thinking_Machines_Corporation"&gt;Thinking Machines Corporation&lt;/a&gt;, the last employer of Richard Feynman (as described in this &lt;a href="https://longnow.org/ideas/richard-feynman-and-the-connection-machine/"&gt;most excellent story by Danny Hillis&lt;/a&gt;).&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=45200925"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/qwen"&gt;qwen&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="pytorch"/><category term="generative-ai"/><category term="llms"/><category term="qwen"/></entry><entry><title>Using uv with PyTorch</title><link href="https://simonwillison.net/2024/Nov/19/using-uv-with-pytorch/#atom-tag" rel="alternate"/><published>2024-11-19T23:20:18+00:00</published><updated>2024-11-19T23:20:18+00:00</updated><id>https://simonwillison.net/2024/Nov/19/using-uv-with-pytorch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://docs.astral.sh/uv/guides/integration/pytorch/"&gt;Using uv with PyTorch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PyTorch is a notoriously tricky piece of Python software to install, due to the need to provide separate wheels for different combinations of Python version and GPU accelerator (e.g. different CUDA versions).&lt;/p&gt;
&lt;p&gt;uv now has dedicated documentation for PyTorch which I'm finding really useful: it clearly explains the challenge and then shows exactly how to configure a &lt;code&gt;pyproject.toml&lt;/code&gt; such that &lt;code&gt;uv&lt;/code&gt; knows which version of each package it should install from where.&lt;/p&gt;
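&lt;p&gt;As a rough sketch of what that configuration looks like (loosely adapted from the uv guide, so check the linked docs for the current syntax), here &lt;code&gt;torch&lt;/code&gt; is pinned to a dedicated &lt;code&gt;pytorch-cpu&lt;/code&gt; index while everything else resolves from PyPI:&lt;/p&gt;

```toml
# Illustrative pyproject.toml fragment: torch comes from the PyTorch
# CPU wheel index, all other dependencies from PyPI.
[project]
name = "demo"
version = "0.1.0"
dependencies = ["torch"]

[tool.uv.sources]
torch = { index = "pytorch-cpu" }

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true   # only use this index for packages pinned to it
```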

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/charliermarsh/status/1858966355518878163"&gt;@charliermarsh&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="packaging"/><category term="pip"/><category term="python"/><category term="pytorch"/><category term="uv"/></entry><entry><title>light-the-torch</title><link href="https://simonwillison.net/2024/Aug/22/light-the-torch/#atom-tag" rel="alternate"/><published>2024-08-22T04:11:32+00:00</published><updated>2024-08-22T04:11:32+00:00</updated><id>https://simonwillison.net/2024/Aug/22/light-the-torch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://pypi.org/project/light-the-torch/"&gt;light-the-torch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;light-the-torch&lt;/code&gt; is a small utility that wraps &lt;code&gt;pip&lt;/code&gt; to ease the installation process for PyTorch distributions like &lt;code&gt;torch&lt;/code&gt;, &lt;code&gt;torchvision&lt;/code&gt;, &lt;code&gt;torchaudio&lt;/code&gt;, and so on as well as third-party packages that depend on them. It auto-detects compatible CUDA versions from the local setup and installs the correct PyTorch binaries without user interference.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Use it like this:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;pip install light-the-torch
ltt install torch&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It works by wrapping and &lt;a href="https://github.com/pmeier/light-the-torch/blob/main/light_the_torch/_patch.py"&gt;patching pip&lt;/a&gt;.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/thezachmueller/status/1826384400684384476"&gt;@ thezachmueller&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/packaging"&gt;packaging&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pip"&gt;pip&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;&lt;/p&gt;



</summary><category term="packaging"/><category term="pip"/><category term="python"/><category term="pytorch"/></entry><entry><title>GGUF, the long way around</title><link href="https://simonwillison.net/2024/Feb/29/gguf-the-long-way-around/#atom-tag" rel="alternate"/><published>2024-02-29T21:39:53+00:00</published><updated>2024-02-29T21:39:53+00:00</updated><id>https://simonwillison.net/2024/Feb/29/gguf-the-long-way-around/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://vickiboykis.com/2024/02/28/gguf-the-long-way-around/"&gt;GGUF, the long way around&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Vicki Boykis dives deep into the GGUF format used by llama.cpp, after starting with a detailed description of how PyTorch models work and how they are traditionally persisted using Python pickle.&lt;/p&gt;

&lt;p&gt;Pickle led to safetensors, a format that avoided the security problems with downloading and running untrusted pickle files.&lt;/p&gt;
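&lt;p&gt;For a sense of why untrusted pickles are scary, here's a harmless illustrative sketch: unpickling can invoke arbitrary callables, so "loading a model" can mean "running someone else's code". This demo only evaluates &lt;code&gt;40 + 2&lt;/code&gt;, but the same hook could do anything:&lt;/p&gt;

```python
# Why untrusted pickle files are dangerous (and why safetensors exists):
# unpickling can call arbitrary callables via __reduce__. This demo is
# deliberately harmless -- it just evaluates "40 + 2".
import pickle

class LooksInnocent:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('40 + 2')".
        return (eval, ("40 + 2",))

payload = pickle.dumps(LooksInnocent())
result = pickle.loads(payload)   # runs eval during deserialization
print(result)  # 42
```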

&lt;p&gt;Llama.cpp introduced GGML, which popularized 16-bit (as opposed to 32-bit) quantization and bundled metadata and tensor data in a single file.&lt;/p&gt;

&lt;p&gt;GGUF fixed some design flaws in GGML and is the default format used by llama.cpp today.&lt;/p&gt;
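&lt;p&gt;To make the single-file idea concrete, here's a minimal sketch of parsing just the fixed-size GGUF header (4 magic bytes, a uint32 version, then uint64 tensor and metadata key/value counts, all little-endian; treat the field layout as illustrative and check the format spec for details):&lt;/p&gt;

```python
# Minimal sketch of the very start of a GGUF file: magic bytes,
# uint32 version, uint64 tensor count, uint64 metadata KV count,
# all little-endian.
def parse_gguf_header(data):
    assert data[:4] == b"GGUF", "not a GGUF file"
    version = int.from_bytes(data[4:8], "little")
    tensor_count = int.from_bytes(data[8:16], "little")
    metadata_kv_count = int.from_bytes(data[16:24], "little")
    return version, tensor_count, metadata_kv_count

# Build a fake 24-byte header in memory to exercise the parser.
fake = (b"GGUF"
        + (3).to_bytes(4, "little")      # version 3
        + (291).to_bytes(8, "little")    # 291 tensors
        + (19).to_bytes(8, "little"))    # 19 metadata key/value pairs
print(parse_gguf_header(fake))  # (3, 291, 19)
```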

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=39553967"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/vicki-boykis"&gt;vicki-boykis&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama-cpp"&gt;llama-cpp&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="pytorch"/><category term="generative-ai"/><category term="llama"/><category term="llms"/><category term="vicki-boykis"/><category term="llama-cpp"/></entry><entry><title>Getting Started With CUDA for Python Programmers</title><link href="https://simonwillison.net/2024/Jan/29/getting-started-with-cuda-for-python-programmers/#atom-tag" rel="alternate"/><published>2024-01-29T21:23:57+00:00</published><updated>2024-01-29T21:23:57+00:00</updated><id>https://simonwillison.net/2024/Jan/29/getting-started-with-cuda-for-python-programmers/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=nOxKexn3iBo"&gt;Getting Started With CUDA for Python Programmers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If, like me, you’ve avoided CUDA programming (writing efficient code that runs on NVIDIA GPUs) in the past, Jeremy Howard has a new 1hr17m video tutorial that demystifies the basics. The code is all run using PyTorch in notebooks running on Google Colab, and it starts with a very clear demonstration of how to convert an RGB image to black and white.&lt;/p&gt;
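&lt;p&gt;For reference, the CPU analogue of that first exercise is one multiply-accumulate per pixel using the standard BT.601 luminance weights (the tutorial's exact weights may differ slightly); on a GPU each pixel would get its own thread, but a plain loop shows the math:&lt;/p&gt;

```python
# CPU sketch of RGB-to-grayscale conversion using BT.601 luminance
# weights. On a GPU, each pixel would be handled by its own thread;
# here a list comprehension stands in.
def to_grayscale(pixels):
    """pixels: list of (r, g, b) tuples with 0-255 channel values."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
print(to_grayscale(image))  # [76, 150, 29, 255]
```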

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://twitter.com/jeremyphoward/status/1752071227228008471"&gt;@jeremyphoward&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-howard"&gt;jeremy-howard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/gpus"&gt;gpus&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="pytorch"/><category term="jeremy-howard"/><category term="gpus"/></entry><entry><title>How We Executed a Critical Supply Chain Attack on PyTorch</title><link href="https://simonwillison.net/2024/Jan/14/supply-chain-attack-on-pytorch/#atom-tag" rel="alternate"/><published>2024-01-14T19:38:24+00:00</published><updated>2024-01-14T19:38:24+00:00</updated><id>https://simonwillison.net/2024/Jan/14/supply-chain-attack-on-pytorch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://johnstawinski.com/2024/01/11/playing-with-fire-how-we-executed-a-critical-supply-chain-attack-on-pytorch/"&gt;How We Executed a Critical Supply Chain Attack on PyTorch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Report on a now-remediated supply chain attack against PyTorch which took advantage of GitHub Actions, stealing credentials from some self-hosted runners.&lt;/p&gt;

&lt;p&gt;The researchers first submitted a typo fix to the PyTorch repo, which gave them status as a “contributor” to that repo and meant that their future pull requests would have workflows executed without needing manual approval.&lt;/p&gt;

&lt;p&gt;Their mitigation suggestion is to switch the option from ‘Require approval for first-time contributors’ to ‘Require approval for all outside collaborators’.&lt;/p&gt;

&lt;p&gt;I think GitHub could help protect against this kind of attack by making it more obvious when you approve a PR to run workflows in a way that grants that contributor future access rights. I’d like an “approve this time only” button separate from “approve this run and allow future runs from user X”.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=38969533"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/github"&gt;github&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/supply-chain"&gt;supply-chain&lt;/a&gt;&lt;/p&gt;



</summary><category term="github"/><category term="security"/><category term="pytorch"/><category term="supply-chain"/></entry><entry><title>Llama from scratch (or how to implement a paper without crying)</title><link href="https://simonwillison.net/2023/Aug/9/llama-from-scratch/#atom-tag" rel="alternate"/><published>2023-08-09T19:21:42+00:00</published><updated>2023-08-09T19:21:42+00:00</updated><id>https://simonwillison.net/2023/Aug/9/llama-from-scratch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.briankitano.com/llama-from-scratch/"&gt;Llama from scratch (or how to implement a paper without crying)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Brian Kitano implemented the model described in the Llama paper against TinyShakespeare, from scratch, using Python and PyTorch. This write-up is fantastic—meticulous, detailed and deeply informative. It would take several hours to fully absorb and follow everything Brian does here but it would provide multiple valuable lessons in understanding how all of this stuff fits together.&lt;/p&gt;
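&lt;p&gt;To give a flavor of the kind of component Brian builds, here's RMSNorm, the normalization the Llama paper uses, sketched in plain Python with the learnable gain fixed to 1 (his PyTorch version is necessarily more complete):&lt;/p&gt;

```python
# Plain-Python sketch of RMSNorm from the Llama paper: scale a vector
# so its root-mean-square is 1 (learnable gain omitted for brevity).
def rms_norm(x, eps=1e-6):
    rms = (sum(v * v for v in x) / len(x) + eps) ** 0.5
    return [v / rms for v in x]

out = rms_norm([3.0, 4.0])
print(out)  # roughly [0.8485, 1.1314]; the output now has unit RMS
```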

    &lt;p&gt;&lt;small&gt;&lt;/small&gt;Via &lt;a href="https://news.ycombinator.com/item?id=37059479"&gt;Hacker News&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llama"&gt;llama&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="python"/><category term="ai"/><category term="pytorch"/><category term="generative-ai"/><category term="llama"/><category term="llms"/></entry><entry><title>A Promenade of PyTorch</title><link href="https://simonwillison.net/2018/Feb/21/pytorch/#atom-tag" rel="alternate"/><published>2018-02-21T05:31:35+00:00</published><updated>2018-02-21T05:31:35+00:00</updated><id>https://simonwillison.net/2018/Feb/21/pytorch/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.goldsborough.me/ml/ai/python/2018/02/04/20-17-20-a_promenade_of_pytorch/"&gt;A Promenade of PyTorch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Useful overview of the PyTorch machine learning library from Facebook AI Research, described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph, PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you.&lt;/p&gt;
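&lt;p&gt;A toy sketch of that "define-by-run" idea (purely illustrative, nothing like PyTorch's actual internals): record each operation as it executes, so ordinary Python control flow shapes the graph:&lt;/p&gt;

```python
# Toy "define-by-run" tracer: ordinary Python control flow runs as
# usual, and the graph is recorded as a side effect of executing it.
# (An illustration of the idea, not how PyTorch is implemented.)
class Var:
    def __init__(self, value, op="leaf"):
        self.value, self.op = value, op   # op: how this node was made

    def __add__(self, other):
        return Var(self.value + other.value, op="add")

    def __mul__(self, other):
        return Var(self.value * other.value, op="mul")

def model(x, steps):
    # A plain Python loop decides the graph shape at run time.
    for _ in range(steps):
        x = x * x
    return x

y = model(Var(2.0), steps=2)
print(y.value, y.op)  # 16.0 mul
```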


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tensorflow"&gt;tensorflow&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pytorch"&gt;pytorch&lt;/a&gt;&lt;/p&gt;



</summary><category term="machine-learning"/><category term="python"/><category term="tensorflow"/><category term="pytorch"/></entry></feed>