Weeknotes: Miscellaneous research into Rye, ChatGPT Code Interpreter and openai-to-sqlite
I gave myself some time off from stressing about my core responsibilities this week after PyCon, which meant allowing myself to be distracted by some miscellaneous research projects.
Rye is a new experimental twist on Python packaging from Armin Ronacher. He’s been quite apologetic about it, asking Should Rye Exist?—Python packaging is a crowded space right now!
Personally, I think a working prototype of an interesting idea is always worthwhile. My experience is that running code increases the quality of the discussion around an idea enormously, because it gives people something concrete to talk about.
Rye has some really interesting ideas. By far my favourite is how it bundles Python itself: it doesn’t depend on a system Python, instead downloading a standalone Python build from the python-build-standalone project and stashing it away in a
I love this. Getting Python running on a system is often way harder than it should be. Rye provides a single binary (written in Rust) which can bootstrap a working Python environment, without interfering with the system Python or any other Python environments that might already be installed.
I wrote up a few notes on Rye in a TIL earlier this week, mainly detailing how it works and where it puts things.
I also released Datasette 0.64.3 with a tiny fix to ensure it would install cleanly using rye install datasette.
ChatGPT Code Interpreter
I’ve been having a whole lot of fun exploring this. I wrote about how I’ve been using it to run micro-benchmarks a few weeks ago—today I figured out a pattern for installing additional Python packages (despite its lack of an internet connection) and even uploading binaries for Deno and Lua to grant it the ability to run code in other languages!
I think it’s the most interesting thing in the whole ChatGPT/LLM world at the moment, which is a big statement.
Inspired by a Datasette Office Hours conversation on Friday I decided to see if I could figure out a way to run simple sentiment analysis against data in a SQLite database using any of my various tools.
I ended up adding a new mechanism to my openai-to-sqlite CLI tool: it can now execute SQL queries that update existing tables with the results of a chatgpt() API call using a custom SQL function.
I wrote more about that in Enriching data with GPT3.5 and SQLite SQL functions.
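Here is a minimal sketch of the general idea of a custom SQL function that enriches rows in place, using only Python's standard sqlite3 module. It is not the openai-to-sqlite implementation: the messages table, its columns, and the fake_sentiment() keyword check are all hypothetical, and the stub stands in for a real API call so the example runs offline. The real tool's function signature and prompt format may differ.

```python
import sqlite3


def fake_sentiment(text):
    """Hypothetical stand-in for a model call: a crude keyword check."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "awful" in lowered:
        return "negative"
    return "neutral"


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL can call it as chatgpt(value).
# The name mirrors the post; the behaviour here is only a stub.
conn.create_function("chatgpt", 1, fake_sentiment)

# Hypothetical table with an empty column waiting to be filled in.
conn.execute(
    "create table messages (id integer primary key, body text, sentiment text)"
)
conn.executemany(
    "insert into messages (body) values (?)",
    [("I love this tool",), ("This is awful",), ("It runs on Tuesdays",)],
)

# One UPDATE enriches every row that has not been processed yet.
conn.execute("update messages set sentiment = chatgpt(body) where sentiment is null")

rows = conn.execute("select body, sentiment from messages order by id").fetchall()
for row in rows:
    print(row)
```

The "where sentiment is null" clause means re-running the query only touches rows that still need a result, which matters when each call to the function costs money.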
Upgraded social media cards for my TILs
My Today I Learned site has had social media cards—images that show up in link previews when URLs are shared—for a long time now. Since few of my TILs have images of their own it generates these as screenshots of the pages themselves.
Until recently it stored these images as PNG files directly in the SQLite database itself. Vercel has a 50MB size limit on deployments and the other day the screenshots finally tipped the database over that limit.
To fix it, I moved the images out of the SQLite database and put them in an S3 bucket instead. This also meant I could increase their size and resolution: they are now generated with the shot-scraper --retina option, which doubles their size to 1600x800 pixels.
This ended up being a fun exercise in combining both shot-scraper and my s3-credentials CLI tools. I wrote up full details of how the new screenshot system works in a new TIL, Social media cards generated with shot-scraper.
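As a rough illustration of the first half of that migration, here is a sketch of pulling image BLOBs out of a SQLite table and writing them to files ready for upload. Everything specific is an assumption: the screenshots table, its slug and png columns, the export_blobs() helper, and the placeholder image bytes are invented for the example, and the S3 upload itself is left out because it needs credentials and a bucket.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_blobs(conn, out_dir):
    """Write each stored PNG to out_dir, then clear it from the database."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute(
        "select slug, png from screenshots where png is not null"
    ).fetchall()
    for slug, png in rows:
        path = out_dir / f"{slug}.png"
        path.write_bytes(png)
        written.append(path)
    # Drop the image data, then rebuild the file so the freed pages are released.
    conn.execute("update screenshots set png = null where png is not null")
    conn.commit()
    conn.execute("vacuum")
    return written


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "tils.db"
    conn = sqlite3.connect(db_path)
    conn.execute("create table screenshots (slug text primary key, png blob)")

    # Placeholder bytes standing in for real screenshots (not valid images).
    fake_png = b"\x89PNG\r\n\x1a\n" + bytes(200_000)
    conn.executemany(
        "insert into screenshots values (?, ?)",
        [("rye", fake_png), ("deno-kv", fake_png)],
    )
    conn.commit()

    size_before = db_path.stat().st_size
    written = export_blobs(conn, Path(tmp) / "cards")
    size_after = db_path.stat().st_size

    exported_names = sorted(p.name for p in written)
    remaining = conn.execute(
        "select count(*) from screenshots where png is not null"
    ).fetchone()[0]
    conn.close()

print(exported_names, size_before, size_after)
```

Setting the column to null does not shrink the file on its own; the vacuum step is what rebuilds the database without the freed pages, which is the part that matters for a deployment size limit.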
Next week: a webinar on Prompt Injection
My other blog entry this week introduced The Dual LLM pattern for building AI assistants that can resist prompt injection—my latest thinking on how we might be able to build AI assistants even without a robust solution to the prompt injection problem.
I have a speaking engagement lined up for next week: the LangChain Prompt Injection Webinar.
I’ll be discussing prompt injection attacks against LLMs on a panel with Willem Pienaar, Kojin Oshiba, and Jonathan Cohen and Christopher Parisien from NVIDIA.
I think it will be an interesting conversation. I’m going to reiterate my argument that You can’t solve AI security problems with more AI—a position that I’m not sure is shared by the other members of the panel!
Entries this week
- Enriching data with GPT3.5 and SQLite SQL functions
- The Dual LLM pattern for building AI assistants that can resist prompt injection
Releases this week
A tool for creating credentials for accessing S3 buckets
Save OpenAI API results to a SQLite database
An open source multi-tool for exploring and publishing data
A command-line utility for taking automated screenshots of websites
Explain and validate SQL queries as you type them into Datasette
TIL this week
- Expanding ChatGPT Code Interpreter with Python packages, Deno and Lua—2023-05-01
- Social media cards generated with shot-scraper—2023-04-30
- Deno KV—2023-04-28
- The location of the pip cache directory—2023-04-28
- A few notes on Rye—2023-04-27