Archive for Saturday, 5th October 2024

Saturday, 5th October 2024

Wikidata is a Giant Crosswalk File. Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//' to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.

# 3:45 pm / json, wikipedia, duckdb, drew-breunig

marimo v0.9.0 with mo.ui.chat. The latest release of the Marimo Python reactive notebook project includes a neat new feature: you can now easily embed a custom chat interface directly inside of your notebook.

Marimo co-founder Myles Scolnick posted this intriguing demo on Twitter, demonstrating a chat interface to my LLM library “in only 3 lines of code”:

import marimo as mo
import llm

model = llm.get_model()
conversation = model.conversation()
mo.ui.chat(lambda messages: conversation.prompt(messages[-1].content))

I tried that out today - here’s the result:

marimo.ui.chat() takes a function which is passed a list of Marimo chat messages (representing the current state of that widget) and returns a string - or other type of renderable object - to add as the next message in the chat. This makes it trivial to hook in any custom chat mechanism you like.

Marimo also ship their own built-in chat handlers for OpenAI, Anthropic and Google Gemini which you can use like this:

mo.ui.chat(
    mo.ai.llm.anthropic(
        "claude-3-5-sonnet-20240620",
        system_message="You are a helpful assistant.",
        api_key="sk-ant-...",
    ),
    show_configuration_controls=True
)

# 10:59 pm / python, ai, generative-ai, llms, llm, marimo

UV with GitHub Actions to run an RSS to README project. Jeff Triplett demonstrates a very neat pattern for using uv to run Python scripts with their dependencies inside of GitHub Actions. First, add uv to the workflow using the setup-uv action:

- uses: astral-sh/setup-uv@v3
  with:
    enable-cache: true
    cache-dependency-glob: "*.py"

This enables the caching feature, which stores uv's own cache of downloads from PyPI between runs. The cache-dependency-glob key ensures that this cache will be invalidated if any .py file in the repository is updated.

Now you can run Python scripts using steps that look like this:

- run: uv run fetch-rss.py

If that Python script begins with some dependency definitions (PEP 723) they will be automatically installed by uv run on the first run and reused from the cache in the future. From the start of fetch-rss.py:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "feedparser",
#     "typer",
# ]
# ///

uv will download the required Python version and cache that as well.

# 11:39 pm / python, github-actions, jeff-triplett, uv

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Simon Willison’s Weblog

Saturday, 5th October 2024