Hoard things you know how to do

Many of my tips for working productively with coding agents are extensions of advice I've found useful in my career without them. Here's a great example of that: hoard things you know how to do.

A big part of the skill in building software is understanding what's possible and what isn't, and having at least a rough idea of how those things can be accomplished.

These questions can be broad or quite obscure. Can a web page run OCR operations in JavaScript alone? Can an iPhone app pair with a Bluetooth device even when the app isn't running? Can we process a 100GB JSON file in Python without loading the entire thing into memory first?

The more answers to questions like this you have under your belt, the more likely you'll be able to spot opportunities to deploy technology to solve problems in ways other people may not have thought of yet.

The best way to be confident in answers to these questions is to have seen them illustrated by running code. Knowing that something is theoretically possible is not the same as having seen it done for yourself. A key asset to develop as a software professional is a deep collection of answers to questions like this, accompanied by proof of those answers.

I hoard solutions like this in a number of different ways. My blog and TIL blog are crammed with notes on things I've figured out how to do. I have over a thousand GitHub repos collecting code I've written for different projects, many of them small proof-of-concepts that demonstrate a key idea.

More recently I've used LLMs to help expand my collection of code solutions to interesting problems.

tools.simonwillison.net is my largest collection of LLM-assisted tools and prototypes. I use this to collect what I call HTML tools - single HTML pages that embed JavaScript and CSS and solve a specific problem.

My simonw/research repository has larger, more complex examples where I’ve challenged a coding agent to research a problem and come back with working code and a written report detailing what it found out.

Recombining things from your hoard

Why collect all of this stuff? Aside from helping you build and extend your own abilities, the assets you generate along the way become powerful inputs for your coding agents.

One of my favorite prompting patterns is to tell an agent to build something new by combining two or more existing working examples.

A project that helped crystallize how effective this can be was the first thing I added to my tools collection - a browser-based OCR tool, described in more detail here.

I wanted an easy, browser-based tool for OCRing pages from PDF files - in particular PDFs that consist entirely of scanned images with no text version provided at all.

I had previously experimented with running the Tesseract.js OCR library in my browser, and found it to be very capable. That library provides a WebAssembly build of the mature Tesseract OCR engine and lets you call it from JavaScript to extract text from an image.

I didn’t want to work with images though, I wanted to work with PDFs. Then I remembered that I had also worked with Mozilla’s PDF.js library, which among other things can turn individual pages of a PDF into rendered images.

I had snippets of JavaScript for both of those libraries in my notes.

Here’s the full prompt I fed into a model (at the time it was Claude 3 Opus), combining my two examples and describing the solution I was looking for:

This code shows how to open a PDF and turn it into an image per page:
```html
<!DOCTYPE html>
<html>
<head>
  <title>PDF to Images</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.min.js"></script>
  <style>
    .image-container img {
      margin-bottom: 10px;
    }
    .image-container p {
      margin: 0;
      font-size: 14px;
      color: #888;
    }
  </style>
</head>
<body>
  <input type="file" id="fileInput" accept=".pdf" />
  <div class="image-container"></div>

fileInput.addEventListener('change', handleFileUpload);

pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.worker.min.js';

async function handleFileUpload(event) {
      const file = event.target.files[0];
      const imageIterator = convertPDFToImages(file);

for await (const { imageURL, size } of imageIterator) {
        const imgElement = document.createElement('img');
        imgElement.src = imageURL;
        imageContainer.appendChild(imgElement);

const sizeElement = document.createElement('p');
        sizeElement.textContent = `Size: ${formatSize(size)}`;
        imageContainer.appendChild(sizeElement);
      }
    }

async function* convertPDFToImages(file) {
      try {
        const pdf = await pdfjsLib.getDocument(URL.createObjectURL(file)).promise;
        const numPages = pdf.numPages;

for (let i = 1; i <= numPages; i++) {
          const page = await pdf.getPage(i);
          const viewport = page.getViewport({ scale: 1 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.width = desiredWidth;
          canvas.height = (desiredWidth / viewport.width) * viewport.height;
          const renderContext = {
            canvasContext: context,
            viewport: page.getViewport({ scale: desiredWidth / viewport.width }),
          };
          await page.render(renderContext).promise;
          const imageURL = canvas.toDataURL('image/jpeg', 0.8);
          const size = calculateSize(imageURL);
          yield { imageURL, size };
        }
      } catch (error) {
        console.error('Error:', error);
      }
    }

function calculateSize(imageURL) {
      const base64Length = imageURL.length - 'data:image/jpeg;base64,'.length;
      const sizeInBytes = Math.ceil(base64Length * 0.75);
      return sizeInBytes;
    }

function formatSize(size) {
      const sizeInKB = (size / 1024).toFixed(2);
      return `${sizeInKB} KB`;
    }
  </script>
</body>
</html>
```
This code shows how to OCR an image:
```javascript
async function ocrMissingAltText() {
    // Load Tesseract
    var s = document.createElement("script");
    s.src = "https://unpkg.com/tesseract.js@v2.1.0/dist/tesseract.min.js";
    document.head.appendChild(s);

s.onload = async () => {
      const images = document.getElementsByTagName("img");
      const worker = Tesseract.createWorker();
      await worker.load();
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      ocrButton.innerText = "Running OCR...";

// Iterate through all the images in the output div
      for (const img of images) {
        const altTextarea = img.parentNode.querySelector(".textarea-alt");
        // Check if the alt textarea is empty
        if (altTextarea.value === "") {
          const imageUrl = img.src;
          var {
            data: { text },
          } = await worker.recognize(imageUrl);
          altTextarea.value = text; // Set the OCR result to the alt textarea
          progressBar.value += 1;
        }
      }

await worker.terminate();
      ocrButton.innerText = "OCR complete";
    };
  }
```
Use these examples to put together a single HTML page with embedded HTML and CSS and JavaScript that provides a big square which users can drag and drop a PDF file onto and when they do that the PDF has every page converted to a JPEG and shown below on the page, then OCR is run with tesseract and the results are shown in textarea blocks below each image.

This worked flawlessly! The model kicked out a proof-of-concept page that did exactly what I needed.

I ended up iterating with it a few times to get to my final result, but it took just a few minutes to build a genuinely useful tool that I’ve benefited from ever since.

Coding agents make this even more powerful

I built that OCR example back in March 2024, nearly a year before the first release of Claude Code. Coding agents have made hoarding working examples even more valuable.

If your coding agent has internet access you can tell it to do things like:

(I specified curl there because Claude Code defaults to using a WebFetch tool which summarizes the page content rather than returning the raw HTML.)

Coding agents are excellent at search, which means you can run them on your own machine and tell them where to find the examples of things you want them to do:

Often that's enough - the agent will fire up a search sub-agent to investigate and pull back just the details it needs to achieve the task.

Since so much of my research code is public I'll often tell coding agents to clone my repositories to /tmp and use them as input:

The key idea here is that coding agents mean we only ever need to figure out a useful trick once. If that trick is then documented somewhere with a working code example our agents can consult that example and use it to solve any similar shaped project in the future.

← Writing code is cheap now

AI should help us produce better code →

Simon Willison’s Weblog