Simon Willison’s Weblog

Subscribe

132 items tagged “json”

2024

Ham radio general exam question pool as JSON. I scraped a pass of my Ham radio general exam this morning. One of the tools I used to help me pass was a Datasette instance with all 429 questions from the official question pool. I've published that raw data as JSON on GitHub, which I converted from the official question pool document using an Observable notebook.

Relevant TIL: How I studied for my Ham radio general exam. # 11th May 2024, 7:16 pm

Tips on Adding JSON Output to Your CLI App (via) Kelly Brazil - also the author of jc, the neat CLI tool that converts the output of common Unix utilities such as dig into JSON - provides some useful do's and don'ts for adding JSON output as an option to a command-line tool.

Kelly recommends defaulting to arrays of flat objects - or newline-delimited objects - and suggests including an "unbuffer" option for streaming tools that discourages the OS from buffering output that is being sent through a pipe. # 20th April 2024, 9:43 pm

Caddy: Config Adapters (via) The Caddy web application server is configured using JSON, but their “config adapters” plugin mechanism allows you to write configuration files in YAML, TOML, JSON5 (JSON with comments), and even nginx format which then gets automatically converted to JSON for you.

Caddy author Matt Holt: “We put an end to the config format wars in Caddy by letting you use any format you want!” # 13th February 2024, 4:22 am

SQLite 3.45. Released today. The big new feature is JSONB support, a new, specific-to-SQLite binary internal representation of JSON which can provide up to a 3x performance improvement for JSON-heavy operations, plus a 5-10% saving it terms of bytes stored on disk. # 15th January 2024, 8:15 pm

2023

jo (via) Neat little C utility (available via brew/apt-get install etc) for conveniently outputting JSON from a shell: “jo -p name=jo n=17 parser=false” will output a JSON object with string, integer and boolean values, and you can nest it to create nested objects. Looks very handy. # 8th October 2023, 5:20 am

jq 1.7. First new release of jq in five years! The project has moved from a solo maintainer to a new team with a dedicated GitHub organization. A ton of new features in this release—I’m most excited about the new pick(.key1, .key2.nested) builtin for emitting a selected subset of the incoming objects, and the --raw-output0 option which outputs zero byte delimited lists, designed to be piped to “xargs -0”. # 2nd October 2023, 4:58 am

Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.

The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.

It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example. # 13th August 2023, 9:50 pm

Datasette 1.0a3. A new Datasette alpha release. This one previews the new default JSON API design that’s coming in 1.0—the single most significant change in the 1.0 milestone, since I plan to keep that API stable for many years to come. # 9th August 2023, 8:49 pm

SQLite 3.42.0. The latest SQLite has a tiny feature I requested on the SQLite Forum—SELECT unixepoch(’subsec’) now returns the current time in milliseconds since the Unix epoch, a big improvement on the previous recipe of select cast((julianday(’now’)—2440587.5) * 86400 * 1000 as integer)!

Also in the release: JSON5 support (JSON with multi-line strings and comments), a bunch of improvements to the query planner and CLI tool, plus various interesting internal changes. # 18th May 2023, 9:14 pm

Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models. This is such an interesting trick. A common challenge with LLMs is getting them to output a specific JSON shape of data reliably, without occasionally messing up and generating invalid JSON or outputting other text.

Jsonformer addresses this in a truly ingenious way: it implements code that interacts with the logic that decides which token to output next, influenced by a JSON schema. If that code knows that the next token after a double quote should be a comma it can force the issue for that specific token.

This means you can get reliable, robust JSON output even for much smaller, less capable language models.

It’s built against Hugging Face transformers, but there’s no reason the same idea couldn’t be applied in other contexts as well. # 8th May 2023, 11:02 pm

Datasette: Gather feedback on new ?_extra= design. I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release. It’s a change to the default JSON format from the Datasette API—the new format is much slimmer, and can be expanded using a new ?_extra= query string parameter. I’m desperately keen on getting feedback on this change! This issues has more details and a call for feedback. # 22nd March 2023, 11:14 pm

sqlite-jsonschema. “A SQLite extension for validating JSON objects with JSON Schema”, building on the jsonschema Rust crate. SQLite and JSON are already a great combination—Alex suggests using this extension to implement check constraints to validate JSON columns before inserting into a table, or just to run queries finding existing data that doesn’t match a given schema. # 28th January 2023, 3:50 am

2022

Datasette’s new JSON write API: The first alpha of Datasette 1.0

Visit Datasette's new JSON write API: The first alpha of Datasette 1.0

This week I published the first alpha release of Datasette 1.0, with a significant new feature: Datasette core now includes a JSON API for creating and dropping tables and inserting, updating and deleting data.

[... 2817 words]

Building a BFT JSON CRDT (via) Jacky Zhao describes their project to build a CRDT library for JSON data in Rust, and includes a thorough explanation of what CRDTs are and how they work. “I write this blog post mostly as a note to my past self, distilling a lot of what I’ve learned since into a blog post I wish I had read before going in”—the best kind of blog post! # 21st November 2022, 7:56 pm

Datasette Lite: Loading JSON data (via) I added a new feature to Datasette Lite: you can now pass it the URL to a JSON file (hosted on a CORS-compatible hosting provider such as GitHub or GitHub Gists) and it will load that file into a database table for you. It expects an array of objects, but if your file has an object as the root it will search through it looking for the first key that is an array of objects and load those instead. # 18th November 2022, 6:43 pm

JSON Changelog with SQLite (via) One of my favourite database challenges is how to track changes to rows over time. This is a neat recipe from 2018 which uses SQLite triggers and the SQLite JSON functions to serialize older versions of the rows and store them in TEXT columns. # 16th November 2022, 3:41 am

RFC 7807: Problem Details for HTTP APIs (via) This RFC has been brewing for quite a while, and is currently in last call (ends 2022-11-03). I’m designing the JSON error messages for Datasette at the moment so this could not be more relevant for me. # 1st November 2022, 3:15 am

dolthub/jsplit (via) Neat Go CLI tool for working with truly gigantic JSON files. This assumes files will be an object with one or more keys that are themselves huge lists of objects—it than extracts those lists out into one or more newline-delimited JSON files (capping their size at 4GB) which are much easier to work with as streams of data. # 6th September 2022, 8:27 pm

Introducing sqlite-lines—a SQLite extension for reading files line-by-line (via) Alex Garcia wrote a brilliant C module for SQLIte which adds functions (and a table-valued function) for efficiently reading newline-delimited text into SQLite. When combined with SQLite’s built-in JSON features this means you can read a huge newline-delimited JSON file into SQLite in a streaming fashion so it doesn’t exhaust memory for a large file. Alex also compiled the extension to WebAssembly, and his post here is an Observable notebook post that lets you exercise the code directly. # 30th July 2022, 7:18 pm

jless (via) A really nice new command-line JSON viewer, written in Rust, created by Paul Julius Martinez. It provides a terminal interface for navigating through large JSON files, including expanding and contracting nested objects and searching for strings or a modified form of regular expressions. # 12th February 2022, 3:17 am

2021

jc (via) This is such a great idea: jc is a CLI tool which knows how to convert the output of dozens of different classic Unix utilities to JSON, so you can more easily process it programmatically, pipe it through jq and suchlike. “pipx install jc” to install, then “dig example.com | jc --dig” to try it out. # 5th December 2021, 11:05 pm

lex.go in json5-go. This archived GitHub repository has a beautifully clean and clear example of a hand-written lexer in Go, for the JSON5 format (JSON + comments + multi-line strings). parser.go is worth a look too. # 19th August 2021, 8:15 pm

Joining CSV and JSON data with an in-memory SQLite database

Visit Joining CSV and JSON data with an in-memory SQLite database

The new sqlite-utils memory command can import CSV and JSON data directly into an in-memory SQLite database, combine and query it using SQL and output the results as CSV, JSON or various other formats of plain text tables.

[... 1507 words]

Understanding JSON Schema (via) Useful, comprehensive short book guide to JSON Schema, which finally helped me feel like I fully understand the specification. # 24th March 2021, 2:57 am

2020

airtable-export. I wrote a command-line utility for exporting data from Airtable and dumping it to disk as YAML, JSON or newline delimited JSON files. This means you can backup an Airtable database from a GitHub Action and get a commit history of changes made to your data. # 29th August 2020, 9:48 pm

Better Python Object Serialization. TIL about functions.singledispatch, a decorator which makes it easy to create Python functions with implementations that vary based on the type of their arguments and which can have additional implementations registered after the fact—great for things like custom JSON serialization. # 7th January 2020, 8:35 pm

2019

Subsume JSON a.k.a. JSON ⊂ ECMAScript (via) TIL that JSON isn’t a subset of ECMAScript after all! “In ES2018, ECMAScript string literals couldn’t contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters, because they are considered to be line terminators even in that context.” # 15th August 2019, 10:30 am

json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI. # 22nd June 2019, 4:51 am

paginate-json (via) I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils. # 12th June 2019, 3:22 pm

quicktype code generator for Python. Really interesting tool: give it an example JSON document and it will code-generate the equivalent set of Python classes (with type annotations) instantly in your browser. It also accepts input in JSON Schema or TypeScript and can generate code in 18 different languages. # 14th May 2019, 11:35 pm