Calling LLMs from client-side JavaScript, converting PDFs to HTML + weeknotes
6th September 2024
I’ve been having a bunch of fun taking advantage of CORS-enabled LLM APIs to build client-side JavaScript applications that access LLMs directly. I also spun up a new Datasette plugin for advanced permission management.
- LLMs from client-side JavaScript
- Converting PDFs to HTML and Markdown
- Adding some class to Datasette forms
- On the blog
- Releases
- TILs
LLMs from client-side JavaScript
Anthropic recently added CORS support to their Claude APIs. It’s a little hard to use: you have to add anthropic-dangerous-direct-browser-access: true to your request headers to enable it. Once you know the trick, though, you can start building web applications that talk to Anthropic’s LLMs directly, without any additional server-side code.
I later found out that both OpenAI and Google Gemini have this capability too, without needing the special header.
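A minimal sketch of what such a direct browser request to Anthropic might look like. The helper names buildClaudeRequest and askClaude are hypothetical, and the model ID and max_tokens value are illustrative choices rather than anything taken from the tools described here:

```javascript
// Hypothetical helper: assemble the URL and fetch() options for one prompt.
function buildClaudeRequest(apiKey, promptText) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    options: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // The opt-in header that enables CORS access from a browser
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model: "claude-3-haiku-20240307", // illustrative model choice
        max_tokens: 256,
        messages: [{ role: "user", content: promptText }],
      }),
    },
  };
}

// Hypothetical helper: send the request and return the first text block.
async function askClaude(apiKey, promptText) {
  const { url, options } = buildClaudeRequest(apiKey, promptText);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Anthropic API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  // The response body carries a list of content blocks
  return data.content[0].text;
}
```

Keeping the request construction separate from the fetch() call makes it easy to inspect the headers without touching the network.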
The problem with this approach is security: it’s very important not to embed an API key attached to your billing account in client-side HTML and JavaScript for anyone to see!
For my purposes though that doesn’t matter. I’ve been building tools which prompt() a user for their own API key (sadly restricting their usage to the tiny portion of people who both understand API keys and have created API accounts with one of the big providers), then stash that key in localStorage and start using it to make requests.
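The prompt-then-stash pattern can be sketched roughly as follows. The function name getApiKey and the storage key name are hypothetical. The storage object and the ask function are passed in so the same code works with localStorage and prompt() in a browser or with stand-ins elsewhere:

```javascript
// Hypothetical helper: return a saved API key, asking the user only once.
// `storage` needs getItem/setItem (localStorage in a browser);
// `ask` is a function that behaves like window.prompt.
function getApiKey(storage, ask, storageKey = "ANTHROPIC_API_KEY") {
  const saved = storage.getItem(storageKey);
  if (saved) return saved;

  const entered = (ask("Enter your API key:") || "").trim();
  if (!entered) return null; // user cancelled or left the field blank

  storage.setItem(storageKey, entered);
  return entered;
}

// In a browser this might be called as:
//   const apiKey = getApiKey(localStorage, (message) => prompt(message));
```

Since the key lives in the browser's localStorage for that origin, it never needs to be sent to any server other than the API provider.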
My simonw/tools repository is home to a growing collection of pure HTML+JavaScript tools, hosted at tools.simonwillison.net using GitHub Pages. I love not having to even think about hosting server-side code for these tools.
I’ve published three tools there that talk to LLMs directly so far:
- haiku is a fun demo that requests access to the user’s camera and then writes a Haiku about what it sees. It uses Anthropic’s Claude 3 Haiku model for this—the whole project is one terrible pun. Haiku source code here.
- gemini-bbox uses the Gemini 1.5 Pro (or Flash) API to prompt those models to return bounding boxes for objects in an image, then renders those bounding boxes. Gemini Pro is the only one of the vision LLMs that I’ve tried that has reliable support for bounding boxes. I wrote about this in Building a tool showing how Gemini Pro can return bounding boxes for objects in images.
- Gemini Chat App is a more traditional LLM chat interface that again talks to Gemini models (including the new super-speedy gemini-1.5-flash-8b-exp-0827). I built this partly to try out those new models and partly to experiment with implementing a streaming chat interface against the Gemini API directly in a browser. I wrote more about how that works in this post.
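For a sense of what streaming from the Gemini API in a browser can involve, here is a rough sketch. It is not necessarily how the Gemini Chat App does it. It assumes the REST streamGenerateContent endpoint with alt=sse, which sends "data: {json}" lines, and the helper names parseSseBuffer and streamGemini are hypothetical:

```javascript
// Hypothetical helper: extract text fragments from buffered server-sent events.
// Returns the fragments found plus any trailing incomplete line to keep.
function parseSseBuffer(buffer) {
  const lines = buffer.split("\n");
  const rest = lines.pop(); // the last piece may be an incomplete line
  const texts = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank separator lines
    const payload = JSON.parse(trimmed.slice(5));
    const parts = payload.candidates?.[0]?.content?.parts ?? [];
    for (const part of parts) {
      if (part.text) texts.push(part.text);
    }
  }
  return { texts, rest };
}

// Hypothetical helper: stream a reply, calling onText for each fragment.
async function streamGemini(apiKey, model, promptText, onText) {
  const url =
    `https://generativelanguage.googleapis.com/v1beta/models/${model}` +
    `:streamGenerateContent?alt=sse&key=${encodeURIComponent(apiKey)}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: promptText }] }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Gemini API returned HTTP ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { texts, rest } = parseSseBuffer(buffer);
    buffer = rest;
    for (const text of texts) onText(text);
  }
  // Flush a final event that arrived without a trailing newline
  buffer += decoder.decode();
  for (const text of parseSseBuffer(buffer + "\n").texts) onText(text);
}
```

Network chunks do not line up with event boundaries, so the parser hands back the unfinished tail and the caller prepends it to the next chunk.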
Here’s that Gemini Bounding Box visualization tool:
All three of these tools made heavy use of AI-assisted development: Claude 3.5 Sonnet wrote almost every line of the last two, and the Haiku one was put together a few months ago using Claude 3 Opus.
My personal style of HTML and JavaScript apps turns out to be highly compatible with LLMs: I like using vanilla HTML and JavaScript and keeping everything in the same file, which makes it easy to paste the entire thing into the model and ask it to make some changes for me. This approach also works really well with Claude Artifacts, though I have to tell it “no React” to make sure I get an artifact I can hack on without needing to configure a React build step.
Converting PDFs to HTML and Markdown
I have a long-standing vendetta against PDFs for sharing information. They’re painful to read on a mobile phone, they have poor accessibility, and even things like copying and pasting text from them can be a pain.
Complaining without doing something about it isn’t really my style. Twice in the past few weeks I’ve taken matters into my own hands:
- Google Research released a PDF paper describing their new pipe syntax for SQL. I ran it through Gemini 1.5 Pro to convert it to HTML (prompts here) and got this—a pretty great initial result for the first prompt I tried!
- Nous Research released a preliminary report PDF about their DisTrO technology for distributed training of LLMs over low-bandwidth connections. I ran a prompt to use Gemini 1.5 Pro to convert that to this Markdown version, which even handled tables.
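As a rough illustration of sending a PDF to Gemini from JavaScript, the sketch below base64-encodes the file and passes it as inline data next to a text instruction. The helper names, the model name gemini-1.5-pro and the instruction text are all assumptions for illustration. The actual prompts used for these conversions are not reproduced here:

```javascript
// Hypothetical helper: base64-encode raw bytes (btoa expects a binary string).
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

// Hypothetical helper: build a generateContent body with an inline PDF.
function buildPdfConversionRequest(pdfBase64, instruction) {
  return {
    contents: [
      {
        role: "user",
        parts: [
          { inline_data: { mime_type: "application/pdf", data: pdfBase64 } },
          { text: instruction },
        ],
      },
    ],
  };
}

// Hypothetical helper: convert a File (e.g. from <input type="file">).
async function convertPdf(apiKey, file, instruction) {
  const bytes = new Uint8Array(await file.arrayBuffer());
  const body = buildPdfConversionRequest(bytesToBase64(bytes), instruction);
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-1.5-pro:generateContent?key=${encodeURIComponent(apiKey)}`;
  const response = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Gemini API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  // Join the text parts of the first candidate into one string
  return data.candidates[0].content.parts.map((p) => p.text || "").join("");
}

// Illustrative usage, with a made-up instruction:
//   const html = await convertPdf(key, file, "Convert this PDF to HTML");
```

Inline data counts toward the request size limit, so very large PDFs would likely need a different upload route.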
Within six hours of posting it my Pipe Syntax in SQL conversion was ranked third on Google for the title of the paper, at which point I set it to <meta name="robots" content="noindex"> to try and keep the unverified clone out of search. Yet more evidence that HTML is better than PDF!
I’ve spent less than ten minutes in total using Gemini to convert PDFs in this way and the results have been very impressive. If I were to spend more time on this I’d target figures: I have a hunch that getting Gemini to return bounding boxes for figures on the PDF pages could be the key here, since then each figure could be automatically extracted as an image.
I bet you could build that whole thing as a client-side app against the Gemini Pro API, too...
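A sketch of the cropping half of that figure-extraction idea, assuming the model returns each box as [ymin, xmin, ymax, xmax] on a 0 to 1000 scale (how Gemini 1.5 appears to report bounding boxes, but treat that format as an assumption). The function names are hypothetical:

```javascript
// Hypothetical helper: convert a normalized box to a pixel rectangle.
// Assumes [ymin, xmin, ymax, xmax] with every value in the range 0-1000.
function boxToPixelRect([ymin, xmin, ymax, xmax], pageWidth, pageHeight) {
  const left = Math.round((xmin / 1000) * pageWidth);
  const top = Math.round((ymin / 1000) * pageHeight);
  const right = Math.round((xmax / 1000) * pageWidth);
  const bottom = Math.round((ymax / 1000) * pageHeight);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Hypothetical browser-only helper: crop a rendered page image to one figure.
// `pageImage` is anything drawImage() accepts, such as an <img> or <canvas>.
function cropFigure(pageImage, rect) {
  const canvas = document.createElement("canvas");
  canvas.width = rect.width;
  canvas.height = rect.height;
  canvas
    .getContext("2d")
    .drawImage(
      pageImage,
      rect.x, rect.y, rect.width, rect.height, // source region on the page
      0, 0, rect.width, rect.height // destination fills the new canvas
    );
  return canvas.toDataURL("image/png");
}
```

For a page rendered at 2000 by 1000 pixels, the box [100, 200, 500, 800] maps to a 1200 by 400 rectangle starting at x=400, y=100.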
Adding some class to Datasette forms
I’ve been working on a new Datasette plugin for permissions management, datasette-acl, which I’ll write about separately soon.
I wanted to integrate Choices.js with it, to provide a nicer interface for adding permissions to a user or group.
My first attempt at integrating Choices ended up looking like this:
The weird visual glitches are caused by Datasette’s core CSS, which included the following rule:
form input[type=submit], form button[type=button] {
  font-weight: 400;
  cursor: pointer;
  text-align: center;
  vertical-align: middle;
  border-width: 1px;
  border-style: solid;
  padding: .5em 0.8em;
  font-size: 0.9rem;
  line-height: 1;
  border-radius: .25rem;
}
These style rules apply to any submit button or button-button that occurs inside a form!
I’m glad I caught this before Datasette 1.0. I’ve now started the process of fixing that, by ensuring these rules only apply to elements with class="core" (or that class on a wrapping element). This ensures plugins can style these elements without being caught out by Datasette’s defaults.
The problem is... there are a whole bunch of existing plugins that currently rely on that behaviour. I have a tracking issue about that, which identified 28 plugins that need updating. I’ve worked my way through 8 of those so far, hence the flurry of releases listed at the bottom of this post.
This is also an excuse to revisit a bunch of older plugins, some of which had partially complete features that I’ve been finishing up.
datasette-write for example now has a neat row action menu item for updating a selected row using a pre-canned UPDATE query. Here’s an animated demo of my first prototype of that feature:
On the blog
anthropic
- Claude’s API now supports CORS requests, enabling client-side applications—2024-08-23
- Explain ACLs by showing me a SQLite table schema for implementing them—2024-08-23
- Musing about OAuth and LLMs on Mastodon—2024-08-24
- Building a tool showing how Gemini Pro can return bounding boxes for objects in images—2024-08-26
- Long context prompting tips—2024-08-26
- Anthropic Release Notes: System Prompts—2024-08-26
- Alex Albert: We’ve read and heard that you’d appreciate more t...—2024-08-26
- Gemini Chat App—2024-08-27
- System prompt for val.town/townie—2024-08-28
- How Anthropic built Artifacts—2024-08-28
- Anthropic’s Prompt Engineering Interactive Tutorial—2024-08-30
- llm-claude-3 0.4.1—2024-08-30
ai-assisted-programming
- Andy Jassy, Amazon CEO: [...] here’s what we found when we integrated [Am...—2024-08-24
- AI-powered Git Commit Function—2024-08-26
- OpenAI: Improve file search result relevance with chunk ranking—2024-08-30
- Forrest Brazeal: I think that AI has killed, or is about to kill, ...—2024-08-31
gemini
- SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL—2024-08-24
- NousResearch/DisTrO—2024-08-27
python
- uvtrick—2024-09-01
- Anatomy of a Textual User Interface—2024-09-02
- Why I Still Use Python Virtual Environments in Docker—2024-09-02
- Python Developers Survey 2023 Results—2024-09-03
security
- Top companies ground Microsoft Copilot over data governance concerns—2024-08-23
- Frederik Braun: In 2021 we [the Mozilla engineering team] found “...—2024-08-26
- OAuth from First Principles—2024-09-05
open-source
- Debate over “open source AI” term brings new push to formalize definition—2024-08-27
- Elasticsearch is open source, again—2024-08-29
performance
- Cerebras Inference: AI at Instant Speed—2024-08-28
aws
- Leader Election With S3 Conditional Writes—2024-08-30
vision-llms
- Qwen2-VL: To See the World More Clearly—2024-09-04
Releases
- datasette-import 0.1a5—2024-09-04
  Tools for importing data into Datasette
- datasette-search-all 1.1.3—2024-09-04
  Datasette plugin for searching all searchable tables at once
- datasette-write 0.4—2024-09-04
  Datasette plugin providing a UI for executing SQL writes against the database
- datasette-debug-events 0.1a0—2024-09-03
  Print Datasette events to standard error
- datasette-auth-passwords 1.1.1—2024-09-03
  Datasette plugin for authentication using passwords
- datasette-enrichments 0.4.3—2024-09-03
  Tools for running enrichments against data stored in Datasette
- datasette-configure-fts 1.1.4—2024-09-03
  Datasette plugin for enabling full-text search against selected table columns
- datasette-auth-tokens 0.4a10—2024-09-03
  Datasette plugin for authenticating access using API tokens
- datasette-edit-schema 0.8a3—2024-09-03
  Datasette plugin for modifying table schemas
- datasette-pins 0.1a4—2024-09-01
  Pin databases, tables, and other items to the Datasette homepage
- datasette-acl 0.4a2—2024-09-01
  Advanced permission management for Datasette
- llm-claude-3 0.4.1—2024-08-30
  LLM plugin for interacting with the Claude 3 family of models
TILs
- Testing HTML tables with Playwright Python—2024-09-04
- Using namedtuple for pytest parameterized tests—2024-08-31