Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

Beginner’s Guide to Jupyter Notebooks for Data Science (with Tips, Tricks!) (via) If you haven’t yet got on the Jupyter notebooks bandwagon this should help. It’s the single biggest productivity improvement I’ve made to my workflow in a very long time.

# 24th May 2018, 1:58 pm / jupyter, data-science

Showdown: MySQL 8 vs PostgreSQL 10 (via) MySQL 8 makes comparisons between PostgreSQL and MySQL far more interesting, as it closes some of the key feature gaps. Meanwhile the PostgreSQL replication story (long one of MySQL’s key advantages) has improved dramatically in recent versions. This article offers a useful overview of the current differences, including diving into some of the less obvious implementation details that differ between the two.

# 23rd May 2018, 5:02 pm / databases, mysql, postgresql

Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage.

# 22nd May 2018, 10:12 pm / packaging, python, testing, hynek-schlawack

Observable: Downloading and Embedding Notebooks (via) Big news from the Observable team: firstly, they’ve released the open source runtime for their notebooks which means you can now execute the code from a notebook independently of their hosted service. On top of that they’ve constructed an elegant way of exporting and executing notebooks (or specific notebook cells) as ES6 modules and as installable npm package tarballs.

# 22nd May 2018, 12:14 pm / javascript, observable

New in Django 2.0: Database instrumentation. I missed this previously. Django 2.0 shipped with one of my most-wanted features: the ability to easily instrument database calls (for logging and metrics) without having to monkey-patch or run an entirely new database backend. Can’t wait to try this out.

# 21st May 2018, 9:28 pm / django

VirtualKNN for SpatiaLite. This looks amazing: a special virtual table shipped as part of SpatiaLite 4.4.0 which implements a fast, R-Tree backed mechanism for finding the X nearest points against a geospatial database table. There’s just one catch: it’s only available in 4.4.0, but the most recent “stable” release of SpatiaLite is 4.3.0a from September 2015 so the version you get if you install from apt-get or homebrew doesn’t yet have this functionality. I’d love to figure out a neat way to package and distribute this along with Datasette. I’d also like to figure out a clean way to ship a more recent version of SQLite than the one that is currently packaged with Python 3 (3.16.2, where the latest SQLite release is 3.23.1).

# 21st May 2018, 9:23 pm / gis, spatialite, sqlite

sqlitebiter. Similar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url ’https://en.wikipedia.org/wiki/Comparison_of_firewalls’” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there.

# 17th May 2018, 10:40 pm / csv, scraping, sqlite, datasette

sql.js Online SQL interpreter (via) This is fascinating: sql.js is a project that complies the whole of SQLite to JavaScript using Emscripten. The demo is an online SQL interpreter which lets you import an existing SQLite database from your filesystem and run queries against it directly in your browser.

# 17th May 2018, 9:28 pm / javascript, sqlite

Django #8936: Add view (read-only) permission to admin (closed). Opened 10 years ago. Closed 15 hours ago. I apparently filed this issue during the first DjangoCon back in September 2008, when Adrian and Jacob mentioned on-stage that they would like to see a read-only permission for the Django Admin. Thanks to Olivier Dalang from Fiji and Petr Dlouhý from Prague it’s going to be a feature shipping in Django 2.1. Open source is a beautiful thing.

# 17th May 2018, 1:40 pm / django, django-admin, djangocon, open-source

How to number rows in MySQL. MySQL’s user variables can be used to add a “rank” or “row_number” column to a database query that shows the ranking of a row against a specific unique value. This means you can return the first N rows for any given column—for example, given a list of articles return just the first three tags for each article. I’ve recently found myself using this trick for a few different things—once you know it, chances to use it crop up surprisingly often.

# 16th May 2018, 9:06 pm / mysql

isomorphic-git (via) A pure-JavaScript implementation of the git protocol and underlying tools which works both server-side (Node.js) AND in the client, using an emulation of the fs API. Given the right CORS headers it can clone a GitHub repository over HTTPS right into your browser. Impressive.

# 16th May 2018, 8:54 pm / git, javascript, cors

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter.

# 12th May 2018, 12:09 pm / full-text-search, search, sqlite, datasette

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml.

# 11th May 2018, 5:47 pm / facebook, python, static-typing, mypy, ocaml

Datasette: The Metropolitan Museum of Art (via) The Metropolitan Museum of Art publish a CSV file on GitHub with details of 464,360 items from their collection. I turned it into a searchable Datasette instance.

# 9th May 2018, 6:38 pm / art, museums, datasette

mendoza-trees-workshop (via) Eventbrite Argentina has an academy program to train new Python/Django developers. I presented a workshop there this morning showing how Django and Jupyter can be used together to iterate on a project. Since the session was primarily about demonstrating Jupyter it was mostly live-coding, but the joy of Jupyter is that at the end of a workshop you can go back and add inline commentary to the notebooks that you used. In putting together the workshop I learned about the django_extensions “/manage.py shell_plus --notebook” command—it’s brilliant! It launches Jupyter in a way that lets you directly import your Django models without having to mess around with DJANGO_SETTINGS_MODULE.

# 8th May 2018, 5:22 pm / django, speaking, my-talks, tutorial, eventbrite, jupyter

Datasette 0.21: New _shape=, new _size=, search within columns. Nothing earth-shattering here but it’s accumulated enough small improvements that it warranted a new release. You can now send ?_shape=array to get back a plain JSON array of results, ?_size=XXX|max to get back a specific number of rows from a table view and ?_search_COLUMN=text to run full-text search against a specific column.

# 5th May 2018, 11:25 pm / projects, datasette

Iodide Notebook: Project Examples (via) Iodide is a very promising looking open source JavaScript notebook project, and these examples do a great job of showing what it can do. It’s not as slick (yet) as Observable but it does run completely independently using just a browser.

# 3rd May 2018, 6:42 pm / javascript, jupyter, observable

Datasette—a talk at Zeit Day SF 2018 (via) Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference.

# 28th April 2018, 9:31 pm / my-talks, zeit-now, datasette

Make Near Me (via) The natural evolution of owlsnearme.com—Make Near Me uses the Zeit Now API to allow anyone to deploy their own version of Owls Near Me for any species! I announced this on stage at Zeit Day SF 2018 as part of my talk on Datasette and Datasette Publish.

# 28th April 2018, 9:28 pm / owlsnearyou, projects, zeit-now

Scaling a High-traffic Rate Limiting Stack With Redis Cluster. Brandur Leach describes the simple, elegant and performant design of Redis Cluster, and talks about how Stripe used it to scaled their rate-limiting from one to ten nodes.

# 26th April 2018, 6:34 pm / rate-limiting, redis, scaling, brandur-leach, stripe

System-Versioned Tables in MariaDB (via) Fascinating new feature from the SQL:2011 standard that’s getting its first working implementation in MariaDB 10.3.4. “ALTER TABLE products ADD SYSTEM VERSIONING;” causes the database to store every change made to that table—then you can run “SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;” to query the data as of a specific point in time. I’ve tried all manner of horrible mechanisms for achieving this in the past, having it baked into the database itself would be fantastic.

# 25th April 2018, 2:34 pm / sql, versioning

Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser.

# 25th April 2018, 5:17 am / python, lukasz-langa, black

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files.

# 25th April 2018, 4:13 am / json, projects, datasette

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well.

# 24th April 2018, 4:17 pm / dates, python

csvs-to-sqlite 0.8. I released a new version of my csvs-to-sqlite tool this morning with a bunch of handy new features. It can now rename columns and define their types, add the CSV filenames as an additional column, add create indexes on columns and parse dates and datetimes into SQLite-friendly ISO formatted values.

# 24th April 2018, 4:11 pm / csv, projects, sqlite

react-jsonschema-form. Exciting library from the Mozilla Services team: given a JSON Schema definition, react-jsonschema-form can produce a cleanly designed React-powered form for adding and editing data that matches that schema. Includes support for adding multiple items in a nested array, re-ordering them, custom form widgets and more.

# 23rd April 2018, 9:38 pm / cms, forms, json, jsonschema, mozilla, react

Why it took a long time to build that tiny link preview on Wikipedia (via) Wikipedia now shows a little preview card on internal links with an image and summary paragraph of the linked page. As a Wikpedia user I absolutely love this feature—and as an engineer and product designer, it’s fascinating to hear the challenges they overcame to ship it. Of particular interest: actually generating a useful summary of a page, while stripping out the cruft that often accumulates at the beginning of their text. It’s also an impressive scaling challenge: the API they use for this feature is now handling more than 500,000 requests per minute.

# 23rd April 2018, 9:07 pm / scaling, wikipedia

Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data (via) Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA

# 20th April 2018, 8:50 pm / datasette

I submitted a PWA to 3 app stores. Here’s what I learned (via) Useful real-world experience shipping a progressive web app to the iOS, Android and Windows app stores.

# 19th April 2018, 9:06 pm / appstore, webapp

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries.

# 19th April 2018, 6:34 pm / pandas, python, sql

Years

Tags