440 items tagged “projects”
Posts about projects I have worked on.
2024
dclient 0.3. dclient is my CLI utility for working with remote Datasette instances—in particular for authenticating with them and then running both read-only SQL queries and inserting data using the new Datasette write JSON API. I just picked up work on the project again after a six month gap—the insert command can now be used to constantly stream data directly to hosted Datasette instances such as Datasette Cloud.
datasette-studio. I've been thinking for a while that it might be interesting to have a version of Datasette that comes bundled with a set of useful plugins, aimed at expanding Datasette's default functionality to cover things like importing data and editing schemas.
This morning I built the very first experimental preview of what that could look like. Install it using pipx
:
pipx install datasette-studio
I recommend pipx because it will ensure datasette-studio
gets its own isolated environment, independent of any other Datasette installations you might have.
Now running datasette-studio
instead of datasette
will get you the version with the bundled plugins.
The implementation of this is fun - it's a single pyproject.toml file defining the dependencies and setting up the datasette-studio
CLI hook, which is enough to provide the full set of functionality.
Is this a good idea? I don't know yet, but it's certainly an interesting initial experiment.
Datasette 1.0a10. The only changes in this alpha release concern the way Datasette handles database transactions. The database.execute_write_fn() internal method used to leave functions to implement transactions on their own—it now defaults to wrapping them in a transaction unless they opt out with the new transaction=False parameter.
In implementing this I found several places inside Datasette—in particular parts of the JSON write API—which had not been handling transactions correctly. Those are all now fixed.
Datasette 1.0a9. A new Datasette alpha release today. This adds basic alter table support API support, so you can request Datasette modify a table to add new columns needed for JSON objects submitted to the insert, upsert or update APIs.
It also makes some permission changes—fixing a minor bug with upsert permissions, and introducing a new rule where every permission plugin gets consulted for a permission check, with just one refusal vetoing that check.
Weeknotes: a Datasette release, an LLM release and a bunch of new plugins
I wrote extensive annotated release notes for Datasette 1.0a8 and LLM 0.13 already. Here’s what else I’ve been up to this past three weeks.
[... 1,074 words]Datasette 1.0a8: JavaScript plugins, new plugin hooks and plugin configuration in datasette.yaml
I just released Datasette 1.0a8. These are the annotated release notes.
[... 1,709 words]shot-scraper 1.4. I decided to add HTTP Basic authentication support to shot-scraper today and found several excellent pull requests waiting to be merged, by Niel Thiart and mhalle.
1.4 adds support for HTTP Basic auth, custom --scale-factor shots, additional --browser-arg arguments and a fix for --interactive mode.
llm-sentence-transformers 0.2. I added a new --trust-remote-code option when registering an embedding model, which means LLM can now run embeddings through the new Nomic AI nomic-embed-text-v1 model.
llm-embed-onnx. I wrote a new plugin for LLM that acts as a thin wrapper around onnx_embedding_models by Benjamin Anderson, providing access to seven embedding models that can run on the ONNX model framework.
The actual plugin is around 50 lines of code, which makes for a nice example of how thin a plugin wrapper can be that adds new models to my LLM tool.
LLM 0.13: The annotated release notes
I just released LLM 0.13, the latest version of my LLM command-line tool for working with Large Language Models—both via APIs and running models locally using plugins.
[... 1,278 words]Weeknotes: datasette-test, datasette-build, PSF board retreat
I wrote about Page caching and custom templates in my last weeknotes. This week I wrapped up that work, modifying datasette-edit-templates to be compatible with the jinja2_environment_from_request() plugin hook. This means you can edit templates directly in Datasette itself and have those served either for the full instance or just for the instance when served from a specific domain (the Datasette Cloud case).
[... 757 words]Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions
I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.
[... 686 words]My blog’s year archive pages now have tag clouds (via) Inspired by the tag cloud I used in my recent 2023 AI roundup post, I decided to add a tag cloud to the top of every one of my archive-by-year pages showing what topics I had spent the most time with that year.
I already had old code for this, so I pasted it into GPT-4 along with an example of the output of my JSON endpoint from Django SQL Dashboard and had it do most of the work for me.
2023
Many options for running Mistral models in your terminal using LLM
Mistral AI is the most exciting AI research lab at the moment. They’ve now released two extremely powerful smaller Large Language Models under an Apache 2 license, and have a third much larger one that’s available via their API.
[... 2,063 words]Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle
I’ve mainly been working on Datasette Enrichments and continuing to explore the possibilities enabled by sqlite-chronicle.
[... 1,123 words]Datasette Enrichments: a new plugin framework for augmenting your data
Today I’m releasing datasette-enrichments, a new feature for Datasette which provides a framework for applying “enrichments” that can augment your data.
[... 1,202 words]Annotate and explore your data with datasette-comments. New plugin for Datasette and Datasette Cloud: datasette-comments, providing tools for collaborating on data exploration with a team through posting comments on individual rows of data.
Alex Garcia built this for Datasette Cloud but as with almost all of our work there it’s also available as an open source Python package.
Weeknotes: DevDay, GitHub Universe, OpenAI chaos
Three weeks of conferences and Datasette Cloud work, four days of chaos for OpenAI.
[... 766 words]Exploring GPTs: ChatGPT in a trench coat?
The biggest announcement from last week’s OpenAI DevDay (and there were a LOT of announcements) was GPTs. Users of ChatGPT Plus can now create their own, custom GPT chat bots that other Plus subscribers can then talk to.
[... 5,699 words]ospeak: a CLI tool for speaking text in the terminal via OpenAI
I attended OpenAI DevDay today, the first OpenAI developer conference. It was a lot. They released a bewildering array of new API tools, which I’m just beginning to wade my way through fully understanding.
[... 1,109 words]DALL-E 3, GPT4All, PMTiles, sqlite-migrate, datasette-edit-schema
I wrote a lot this week. I also did some fun research into new options for self-hosting vector maps and pushed out several new releases of plugins.
[... 1,362 words]Execute Jina embeddings with a CLI using llm-embed-jina
Berlin-based Jina AI just released a new family of embedding models, boasting that they are the “world’s first open-source 8K text embedding model” and that they rival OpenAI’s text-embedding-ada-002
in quality.
Weeknotes: the Datasette Cloud API, a podcast appearance and more
Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.
[... 1,243 words]LLM 0.11. I released LLM 0.11 with support for the new gpt-3.5-turbo-instruct completion model from OpenAI.
The most interesting feature of completion models is the option to request “log probabilities” from them, where each token returned is accompanied by up to 5 alternatives that were considered, along with their scores.
Weeknotes: Embeddings, more embeddings and Datasette Cloud
Since my last weeknotes, a flurry of activity. LLM has embeddings support now, and Datasette Cloud has driven some major improvements to the wider Datasette ecosystem.
[... 2,427 words]Build an image search engine with llm-clip, chat with models with llm chat
LLM is my combination CLI tool and Python library for working with Large Language Models. I just released LLM 0.10 with two significant new features: embedding support for binary files and the llm chat
command.
Symbex 1.4. New release of my Symbex tool for finding symbols (functions, methods and classes) in a Python codebase. Symbex can now output matching symbols in JSON, CSV or TSV in addition to plain text.
I designed this feature for compatibility with the new “llm embed-multi” command—so you can now use Symbex to find every Python function in a nested directory and then pipe them to LLM to calculate embeddings for every one of them.
I tried it on my projects directory and embedded over 13,000 functions in just a few minutes! Next step is to figure out what kind of interesting things I can do with all of those embeddings.
LLM now provides tools for working with embeddings
LLM is my Python library and command-line tool for working with language models. I just released LLM 0.9 with a new set of features that extend LLM to provide tools for working with embeddings.
[... 3,466 words]Datasette 1.0a4 and 1.0a5, plus weeknotes
Two new alpha releases of Datasette, plus a keynote at WordCamp, a new LLM release, two new LLM plugins and a flurry of TILs.
[... 2,709 words]Datasette Cloud and the Datasette 1.0 alphas. I sent out the Datasette Newsletter for the first time in quite a while, with updates on Datasette Cloud, the Datasette 1.0 alphas, a note about the security vulnerability in those alphas and a summary of some of my research into combining LLMs with Datasette.