Simon Willison’s Weblog

Blogmarks

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter.
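For readers who haven't used SQLite's FTS module, here is a minimal sketch of what an FTS-configured table looks like. It uses Python's standard sqlite3 module. The table names, sample rows and the search() helper are hypothetical, and the sketch assumes the SQLite build includes FTS4, which most builds do. The subquery shows the general shape of a full-text lookup and is not the SQL Datasette itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An ordinary table holding the data (hypothetical example rows).
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Datasette plugins", "Plugins use pluggy"),
        ("SQLite FTS", "Full-text search in SQLite"),
        ("Owls", "Owls near me"),
    ],
)

# A companion FTS4 virtual table indexing the text columns; its rowid is
# kept in step with articles.id so matches can be mapped back to rows.
conn.execute("CREATE VIRTUAL TABLE articles_fts USING fts4(title, body)")
conn.execute(
    "INSERT INTO articles_fts (rowid, title, body) SELECT id, title, body FROM articles"
)


def search(conn, term):
    """Return (id, title) rows from articles whose text matches term."""
    sql = """
        SELECT id, title FROM articles
        WHERE id IN (SELECT rowid FROM articles_fts WHERE articles_fts MATCH ?)
        ORDER BY id
    """
    return conn.execute(sql, (term,)).fetchall()


print(search(conn, "sqlite"))  # [(2, 'SQLite FTS')]
```

The default tokenizer is case-insensitive, which is why "sqlite" matches the row containing "SQLite".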

# 12th May 2018, 12:09 pm / full-text-search, search, sqlite, datasette

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml.

# 11th May 2018, 5:47 pm / facebook, python, static-typing, mypy, ocaml

Datasette: The Metropolitan Museum of Art (via) The Metropolitan Museum of Art publish a CSV file on GitHub with details of 464,360 items from their collection. I turned it into a searchable Datasette instance.

# 9th May 2018, 6:38 pm / art, museums, datasette

mendoza-trees-workshop (via) Eventbrite Argentina has an academy program to train new Python/Django developers. I presented a workshop there this morning showing how Django and Jupyter can be used together to iterate on a project. Since the session was primarily about demonstrating Jupyter, it was mostly live-coding, but the joy of Jupyter is that at the end of a workshop you can go back and add inline commentary to the notebooks that you used. In putting together the workshop I learned about the django_extensions “./manage.py shell_plus --notebook” command—it’s brilliant! It launches Jupyter in a way that lets you directly import your Django models without having to mess around with DJANGO_SETTINGS_MODULE.

# 8th May 2018, 5:22 pm / django, speaking, my-talks, tutorial, eventbrite, jupyter

Datasette 0.21: New _shape=, new _size=, search within columns. Nothing earth-shattering here but it’s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column.
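The new options are plain querystring parameters, so standard URL tooling can compose them. Below is a small sketch using urllib.parse. The host, database, table and column names are hypothetical placeholders, and the table_url() helper is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical Datasette instance and table.
BASE = "https://example.com/mydb/mytable.json"


def table_url(**params):
    """Build a table URL from Datasette-style querystring parameters."""
    return BASE + "?" + urlencode(params)


# A plain JSON array of results.
print(table_url(_shape="array"))
# A specific number of rows, or "max" for the largest allowed page.
print(table_url(_size=500))
# Full-text search against a single column (here a column called "title").
print(table_url(_search_title="owls", _shape="array"))
```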

# 5th May 2018, 11:25 pm / projects, datasette

Iodide Notebook: Project Examples (via) Iodide is a very promising looking open source JavaScript notebook project, and these examples do a great job of showing what it can do. It’s not as slick (yet) as Observable but it does run completely independently using just a browser.

# 3rd May 2018, 6:42 pm / javascript, jupyter, observable

Datasette—a talk at Zeit Day SF 2018 (via) Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference.

# 28th April 2018, 9:31 pm / my-talks, zeit-now, datasette

Make Near Me (via) The natural evolution of owlsnearme.com—Make Near Me uses the Zeit Now API to allow anyone to deploy their own version of Owls Near Me for any species! I announced this on stage at Zeit Day SF 2018 as part of my talk on Datasette and Datasette Publish.

# 28th April 2018, 9:28 pm / owlsnearyou, projects, zeit-now

Scaling a High-traffic Rate Limiting Stack With Redis Cluster. Brandur Leach describes the simple, elegant and performant design of Redis Cluster, and talks about how Stripe used it to scale their rate-limiting from one to ten nodes.
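Much of Redis Cluster's simplicity comes from mapping every key deterministically to one of 16,384 hash slots, with each node owning a range of slots. The Redis Cluster specification describes the mapping as CRC16 (the XMODEM variant) modulo 16384, with optional {...} hash tags. The sketch below implements that mapping in Python for illustration. It is not Stripe's code, and the rate-limit key names are hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honouring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only hash the tag when the braces enclose at least one byte;
        # otherwise the whole key is hashed.
        if end > start + 1:
            raw = raw[start + 1 : end]
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land in the same slot, so related counters
# (hypothetical key names) can live on the same node.
print(key_hash_slot("ratelimit:{user42}:second") == key_hash_slot("ratelimit:{user42}:minute"))
```

Clients compute the slot locally and send each command straight to the node that owns it. No central router sits in the request path.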

# 26th April 2018, 6:34 pm / rate-limiting, redis, scaling, brandur-leach, stripe

System-Versioned Tables in MariaDB (via) Fascinating new feature from the SQL:2011 standard that’s getting its first working implementation in MariaDB 10.3.4. “ALTER TABLE products ADD SYSTEM VERSIONING;” causes the database to store every change made to that table—then you can run “SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;” to query the data as of a specific point in time. I’ve tried all manner of horrible mechanisms for achieving this in the past; having it baked into the database itself would be fantastic.
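The sketch below is a toy in-memory model of what system versioning records. Each row version carries a start time and an end time, and an "as of" query returns the versions whose period contains the requested moment. The class and method names are hypothetical. The model is loosely based on the start/end period that system-versioned rows carry, not on MariaDB's actual implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    # Each version is [key, value, start, end]; end is inf while current.
    versions: list = field(default_factory=list)

    def _close_current(self, key, ts):
        for version in self.versions:
            if version[0] == key and version[3] == math.inf:
                version[3] = ts

    def upsert(self, key, value, ts):
        """Record a new value for key, ending any previous version at ts."""
        self._close_current(key, ts)
        self.versions.append([key, value, ts, math.inf])

    def delete(self, key, ts):
        """End the current version of key at ts; the history is retained."""
        self._close_current(key, ts)

    def as_of(self, ts):
        """Return {key: value} as the table looked at time ts."""
        return {
            key: value
            for key, value, start, end in self.versions
            if start <= ts < end
        }


products = VersionedTable()
products.upsert("widget", 10, ts=1)
products.upsert("widget", 12, ts=5)  # price change
products.delete("widget", ts=9)
print(products.as_of(3))   # {'widget': 10}
print(products.as_of(6))   # {'widget': 12}
print(products.as_of(10))  # {}
```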

# 25th April 2018, 2:34 pm / sql, versioning

Black Online Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought-out style guide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser.

# 25th April 2018, 5:17 am / python, lukasz-langa, black

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files.
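The escaping itself matches what Python's json.dumps does to a string. It wraps the text in double quotes and escapes quotes, backslashes and newlines, so the result can be pasted straight into a JSON file such as metadata.json. This is a small sketch with a hypothetical canned query, not the tool's own code.

```python
import json

# A hypothetical canned query containing a newline and double quotes.
sql = 'select * from museums\nwhere "name" like :search'
escaped = json.dumps(sql)
print(escaped)  # "select * from museums\nwhere \"name\" like :search"

# Round-trip: loading the escaped string gives back the original text.
assert json.loads(escaped) == sql
```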

# 25th April 2018, 4:13 am / json, projects, datasette

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well.

# 24th April 2018, 4:17 pm / dates, python

csvs-to-sqlite 0.8. I released a new version of my csvs-to-sqlite tool this morning with a bunch of handy new features. It can now rename columns and define their types, add the CSV filenames as an additional column, create indexes on columns and parse dates and datetimes into SQLite-friendly ISO formatted values.
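Three of those features can be sketched with the standard library alone: a source-filename column, an index, and dates normalised to ISO format. The load_csv() function below is hypothetical and is not how csvs-to-sqlite is implemented. The table and column names, and the day/month/year input format, are assumptions made for the example.

```python
import csv
import io
import sqlite3
from datetime import datetime


def load_csv(conn, table, csv_text, filename, date_columns=(), index_columns=()):
    """Load CSV text into table, adding a 'source' column with the filename."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return
    columns = list(rows[0].keys()) + ["source"]
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    for row in rows:
        for col in date_columns:
            # Assumed input format: day/month/year, e.g. 24/04/2018.
            row[col] = datetime.strptime(row[col], "%d/%m/%Y").date().isoformat()
        conn.execute(
            f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
            [row[c] for c in columns[:-1]] + [filename],
        )
    for col in index_columns:
        conn.execute(
            f'CREATE INDEX IF NOT EXISTS "idx_{table}_{col}" ON "{table}" ("{col}")'
        )


conn = sqlite3.connect(":memory:")
load_csv(
    conn, "events", "name,date\nLaunch,24/04/2018\nTalk,28/04/2018\n",
    "events.csv", date_columns=["date"], index_columns=["date"],
)
print(conn.execute("SELECT * FROM events").fetchall())
# [('Launch', '2018-04-24', 'events.csv'), ('Talk', '2018-04-28', 'events.csv')]
```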

# 24th April 2018, 4:11 pm / csv, projects, sqlite

react-jsonschema-form. Exciting library from the Mozilla Services team: given a JSON Schema definition, react-jsonschema-form can produce a cleanly designed React-powered form for adding and editing data that matches that schema. Includes support for adding multiple items in a nested array, re-ordering them, custom form widgets and more.

# 23rd April 2018, 9:38 pm / cms, forms, json, jsonschema, mozilla, react

Why it took a long time to build that tiny link preview on Wikipedia (via) Wikipedia now shows a little preview card on internal links with an image and summary paragraph of the linked page. As a Wikipedia user I absolutely love this feature—and as an engineer and product designer, it’s fascinating to hear the challenges they overcame to ship it. Of particular interest: actually generating a useful summary of a page, while stripping out the cruft that often accumulates at the beginning of their text. It’s also an impressive scaling challenge: the API they use for this feature is now handling more than 500,000 requests per minute.

# 23rd April 2018, 9:07 pm / scaling, wikipedia

Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data (via) Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA.

# 20th April 2018, 8:50 pm / datasette

I submitted a PWA to 3 app stores. Here’s what I learned (via) Useful real-world experience shipping a progressive web app to the iOS, Android and Windows app stores.

# 19th April 2018, 9:06 pm / appstore, webapp

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries.

# 19th April 2018, 6:34 pm / pandas, python, sql

Intro to Threads and Processes in Python (via) I really like the diagrams in this article, which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures module.
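The key property of concurrent.futures is that thread pools and process pools share the same interface, so comparing them is a one-line change. The sketch below is a minimal illustration under assumed toy workloads: sleep stands in for I/O, and a pure-Python sum stands in for CPU work. The helper names are hypothetical. The process pool runs under a __main__ guard, as multiprocessing requires on platforms that spawn worker processes.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def io_task(seconds):
    """Stand-in for I/O-bound work: sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


def cpu_task(n):
    """CPU-bound work: pure-Python arithmetic holds the GIL."""
    return sum(i * i for i in range(n))


def timed_map(executor_cls, fn, items, workers=4):
    """Run fn over items with the given executor; return (results, elapsed)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        results = list(executor.map(fn, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Threads overlap the sleeps: four 0.2s waits finish in roughly 0.2s.
    _, elapsed = timed_map(ThreadPoolExecutor, io_task, [0.2] * 4)
    print(f"threads, I/O-bound: {elapsed:.2f}s")
    # For CPU-bound work, processes sidestep the GIL; threads generally do not.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = timed_map(cls, cpu_task, [500_000] * 4)
        print(f"{cls.__name__}, CPU-bound: {elapsed:.2f}s")
```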

# 19th April 2018, 6:32 pm / multiprocessing, python, threads

How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin.
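Here is a small example of the kind of code these tools check. Python ignores the annotations at runtime, and a checker such as mypy reads them without running anything. Both functions are hypothetical. The calls a checker would reject are left as comments so the snippet still runs.

```python
from typing import Dict, List, Optional


def word_counts(words: List[str]) -> Dict[str, int]:
    """Count occurrences of each word."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts: Dict[str, int]) -> Optional[str]:
    """Return the most frequent word, or None for an empty mapping."""
    if not counts:
        return None
    return max(counts, key=counts.__getitem__)


print(most_common(word_counts(["owl", "owl", "heron"])))  # owl

# A static checker would flag both of these lines before anything runs:
# word_counts(42)                   # int is not List[str]
# most_common({}).upper()           # result may be None
```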

# 19th April 2018, 6:30 pm / python, static-typing, mypy

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before.

# 19th April 2018, 6:28 pm / python

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them.

# 19th April 2018, 4:05 pm / python, markdown, jupyter

What’s New in MySQL 8.0. MySQL 8 has lots of exciting improvements: Window functions, SRS aware spatial types for GIS, utf8mb4 by default, a ton of JSON improvements and atomic DDL. I no longer feel at a significant disadvantage when I have to use MySQL in place of PostgreSQL.
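Window functions are probably the biggest of those additions for analytical queries. The SQL below uses standard window-function syntax, which MySQL 8 supports. So that the sketch runs without a MySQL server, it executes against SQLite (which has supported window functions since 3.25) via Python's sqlite3 module. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 1, 10), ("north", 2, 20), ("south", 1, 5), ("south", 2, 15)],
)

# Running total per region: unlike GROUP BY, every row is kept, while the
# window sums the current and earlier rows in the same partition.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()
for row in rows:
    print(row)
# ('north', 1, 10, 10)
# ('north', 2, 20, 30)
# ('south', 1, 5, 5)
# ('south', 2, 15, 20)
```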

# 19th April 2018, 4:03 pm / mysql

Text Embedding Models Contain Bias. Here’s Why That Matters (via) Excellent discussion from the Google AI team of the enormous challenge of building machine learning models without accidentally encoding harmful bias in a way that cannot be easily detected.

# 17th April 2018, 8:54 pm / ethics, google, machine-learning, ai, generative-ai, embeddings, ai-ethics, ai-bias

Datasette 0.19: Plugins Documentation (via) I’ve released the first preview of Datasette’s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come.
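A connection-creation hook amounts to SQLite's ability to register Python callables as SQL functions. The sketch below shows what such a plugin function might do. Its name is modelled on Datasette's prepare_connection hook; check the plugin documentation for the exact signature and the hookimpl registration, both of which are omitted here. The reverse_string SQL function is hypothetical, and the example calls the function directly on a plain sqlite3 connection.

```python
import sqlite3


def reverse_string(value):
    """Hypothetical SQL function: reverse a string, passing NULL through."""
    return value[::-1] if value is not None else None


def prepare_connection(conn):
    """Register custom SQL functions on a newly created SQLite connection."""
    # create_function(name, number_of_arguments, python_callable)
    conn.create_function("reverse_string", 1, reverse_string)


conn = sqlite3.connect(":memory:")
prepare_connection(conn)
print(conn.execute("SELECT reverse_string('datasette')").fetchone()[0])  # ettesatad
```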

# 17th April 2018, 3:59 am / plugins, datasette

Datasette 0.18: units (via) This release includes the first Datasette feature that was entirely designed and implemented by someone else (yay open source)—Russ Garrett wanted unit support (Hz, ft, etc.) for his Wireless Telegraphy Register project. It’s a really neat implementation: you can tell Datasette what units are in use for a particular database column and it will display the correct unit symbols on the page. Specifying units also enables unit-aware filtering: if Datasette knows that a column is measured in meters, you can now query it for all rows that are less than 50 feet, for example.
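Unit-aware filtering comes down to converting the query value into the column's unit before comparing. The plain-Python sketch below illustrates this with a hand-written conversion table. It is not Datasette's implementation, and the table, function name and sample values are hypothetical.

```python
# Conversion factors to metres (1 ft is exactly 0.3048 m).
TO_METRES = {"m": 1.0, "km": 1000.0, "ft": 0.3048}


def less_than(values_in_metres, limit, unit):
    """Filter a metre-valued column by a limit expressed in any known unit."""
    limit_in_metres = limit * TO_METRES[unit]
    return [v for v in values_in_metres if v < limit_in_metres]


heights = [10.0, 15.0, 16.0, 20.0]
print(less_than(heights, 50, "ft"))  # [10.0, 15.0]  (50 ft is 15.24 m)
```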

# 14th April 2018, 3:56 pm / open-source, datasette

What do you mean “average”? (via) Lovely example of an interactive explorable demonstrating mode/mean/median, built as an Observable notebook using D3.
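Python's standard statistics module offers a quick way to see the three "averages" diverge on a skewed sample. The sample values are made up for illustration.

```python
from statistics import mean, median, mode

# One large value pulls the mean upwards; the median and mode are unaffected.
data = [1, 2, 2, 3, 10]
print(mean(data))    # 3.6
print(median(data))  # 2
print(mode(data))    # 2
```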

# 12th April 2018, 4:41 pm / explorables, d3, observable

Wireless Telegraphy Register (via) Russ Garrett used Datasette to build a browsable interface to the UK’s register of business radio licenses, using data from Ofcom.

# 12th April 2018, 4:08 pm / datasette

Mozilla Telemetry: In-depth Data Pipeline (via) Detailed behind-the-scenes look at an extremely sophisticated big data telemetry processing system built using open source tools. Some of this is unsurprising (S3 for storage, Spark and Kafka for streams) but the details are fascinating. They use a custom nginx module for the ingestion endpoint and have a “tee” server written in Lua and OpenResty which lets them route some traffic to an alternative backend.

# 12th April 2018, 3:44 pm / analytics, lua, mozilla, nginx, big-data, kafka
