Simon Willison’s Weblog

1,455 posts tagged “datasette”

Datasette is an open source tool for exploring and publishing data.

2018

Release datasette Datasette 0.25.2 — An open source multi-tool for exploring and publishing data
Release datasette Datasette 0.25.1 — An open source multi-tool for exploring and publishing data

Building smaller Python Docker images

Changes are afoot at Zeit Now, my preferred hosting provider for the past year (see previous posts). They have announced Now 2.0, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle, for example (though it looks like this restriction will soon be relaxed a little; update November 19th: you can now bump this up to 50MB).

[... 1,872 words]

How to Instantly Publish Data to the Internet with Datasette

I spoke about my Datasette project at PyBay in August and they’ve just posted the video of my talk.

[... 58 words]

The interesting ideas in Datasette

Datasette (previously) is my open source tool for exploring and publishing structured data. There are a lot of ideas embedded in Datasette. I realized that I haven’t put many of them into writing.

[... 2,857 words]

Release datasette Datasette 0.25 — An open source multi-tool for exploring and publishing data

Slides, notes and links from my Datasette talk at PyBay (via) I presented a session about Datasette at the PyBay conference in San Francisco this morning. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library.

# 19th August 2018, 11:23 pm / my-talks, datasette, sqlite, sqlite-utils

How to Instantly Publish Data to the Internet with Datasette

I presented a session about Datasette at the PyBay 2018 conference in San Francisco. I talked about the project itself and demonstrated ways of creating and publishing databases using csvs-to-sqlite, Datasette Publish and my new sqlite-utils library.

[... 2,043 words]

Analyzing US Election Russian Facebook Ads

Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections.

[... 922 words]

Analyzing US Election troll tweets with Datasette

FiveThirtyEight published nearly 3 million tweets from accounts associated with the Russian “Internet Research Agency”, based on research by Darren Linvill and Patrick Warren at Clemson University.

[... 110 words]

Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generate a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory.

# 2nd August 2018, 9:03 pm / sqlite, datasette, static-typing, python, mypy, testing

Documentation unit tests

Or: Test-driven documentation.

[... 1,521 words]

Datasette: publish_subcommand hook + default plugins mechanism, used for publish heroku/now (via) I just landed a new plugin hook to Datasette master: publish_subcommand, which lets you define new publisher subcommands for the “datasette publish” CLI tool in addition to Heroku and Zeit Now. As part of this I’ve refactored the heroku/now publisher implementations into two default plugins that ship as part of Datasette—I hope to use this pattern for other core functionality in the future.

# 26th July 2018, 5:27 am / datasette, plugins, cli

Release datasette Datasette 0.24 — An open source multi-tool for exploring and publishing data

XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette.

# 13th July 2018, 7 pm / facebook, datasette, python

The Now CDN (via) Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the “datasette publish now” CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments.

# 12th July 2018, 3:34 am / cdn, performance, datasette, zeit-now, cloudflare

Release datasette-vega 0.6.1 — Datasette plugin for visualizing data using Vega
Release datasette Datasette 0.23.2 — An open source multi-tool for exploring and publishing data
Release datasette-vega 0.6 — Datasette plugin for visualizing data using Vega
Release datasette-vega 0.5 — Datasette plugin for visualizing data using Vega

datasette-vega (via) I wrote a visualization plugin for Datasette that uses the excellent Vega “visualization grammar” library to provide bar, line and scatter charts configurable against any Datasette table or SQL query.

# 29th June 2018, 3 pm / datasette, visualization, plugins, projects

Query Parquet files in SQLite. Colin Dellow built a SQLite virtual table extension that lets you query Parquet files directly using SQL. Parquet is interesting because it’s a columnar format that dramatically reduces the space needed to store tables with lots of duplicate column data—most CSV files, for example. Colin reports being able to shrink a 1,291MB CSV file from the Canadian census to an equivalent Parquet file weighing just 42MB (3% of the original)—then running a complex query against the data in just 60ms. I’d love to see someone get this extension working with Datasette.

# 24th June 2018, 7:44 pm / big-data, datasette, parquet, sqlite, colin-dellow
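To illustrate why a columnar layout helps with duplicate column data, here is a minimal sketch of dictionary encoding, one of the techniques columnar formats such as Parquet can apply to a column. This is a simplification for illustration only, not Parquet's on-disk format; the `dictionary_encode` helper and the sample values are hypothetical.

```python
def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup table."""
    dictionary = []   # each distinct value, stored once
    index_of = {}     # value -> position in `dictionary`
    codes = []        # one small integer per row
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


# Hypothetical column with heavily repeated values, as in many CSV files.
column = ["Ontario", "Ontario", "Quebec", "Ontario", "Quebec"]
dictionary, codes = dictionary_encode(column)

# Decoding restores the original column exactly.
decoded = [dictionary[code] for code in codes]
```

Because all the values in a column sit together, each distinct string only needs to be stored once, and the integer codes compress well. A row-oriented CSV file repeats the full string on every row.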

Release datasette Datasette 0.23.1 — An open source multi-tool for exploring and publishing data

Datasette 0.23: CSV, SpatiaLite and more (via) The big new feature in 0.23 is CSV export: any Datasette table or query can now be exported as CSV, including the option to get all matching rows in one giant CSV file taking advantage of Python 3 async and Datasette’s efficient keyset pagination. Also in this release: improved support for SpatiaLite and various JSON API improvements including the ability to expand foreign key labels in JSON and CSV responses.

# 18th June 2018, 3:34 pm / projects, csv, datasette
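The combination of keyset pagination and streaming CSV described above can be sketched with the standard library alone. This is a simplified, synchronous illustration under stated assumptions, not Datasette's actual implementation (which uses Python 3 async); the `stream_csv` function, the `dogs` table and the tiny page size are all hypothetical.

```python
import csv
import io
import sqlite3


def stream_csv(conn, table, pk, page_size=2):
    """Yield CSV text one page at a time, using keyset pagination.

    `table` and `pk` are interpolated into SQL, so they must be trusted
    identifiers (this is a sketch, not production code).
    """
    last_pk = None
    wrote_header = False
    while True:
        if last_pk is None:
            sql = f"SELECT * FROM {table} ORDER BY {pk} LIMIT ?"
            params = (page_size,)
        else:
            # Keyset pagination: continue after the last key seen instead
            # of using OFFSET, so each page is a cheap index lookup.
            sql = f"SELECT * FROM {table} WHERE {pk} > ? ORDER BY {pk} LIMIT ?"
            params = (last_pk, page_size)
        cursor = conn.execute(sql, params)
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()
        if not rows:
            break
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        if not wrote_header:
            writer.writerow(columns)
            wrote_header = True
        writer.writerows(rows)
        yield buffer.getvalue()
        last_pk = rows[-1][columns.index(pk)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO dogs (name) VALUES (?)",
    [("Cleo",), ("Pancakes",), ("Fido",)],
)

chunks = list(stream_csv(conn, "dogs", "id"))
csv_text = "".join(chunks)
```

Each chunk can be sent to the client as soon as it is produced, so the full result set never has to be held in memory at once.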

Release datasette Datasette 0.23: CSV, SpatiaLite and more — An open source multi-tool for exploring and publishing data

SpatiaLite — Datasette documentation. Datasette’s documentation now includes extensive coverage of the SpatiaLite extension for SQLite: how to install it, how to import latitude/longitude points, shapefiles and GeoJSON data into SpatiaLite tables, and how to run SQL queries against it that take advantage of spatial indexes. I’m learning SpatiaLite at the moment and filling out the documentation with each new trick I learn as I go—as Mark Pilgrim once taught me, the best way to learn a new technology is to write about it.

# 30th May 2018, 4:34 am / sqlite, spatialite, datasette, mark-pilgrim, documentation

Release datasette Datasette 0.22.1 — An open source multi-tool for exploring and publishing data
Release datasette Datasette 0.22: Datasette Facets — An open source multi-tool for exploring and publishing data

Datasette Facets

Datasette 0.22 is out with the most significant new feature I’ve added since the initial release: faceted browse.

[... 1,189 words]
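At its core, a facet is a count of rows for each distinct value in a column. The following is a minimal sketch of that idea using a plain GROUP BY query; it is not Datasette's implementation, and the `facet_counts` function and the `trees` table are hypothetical.

```python
import sqlite3


def facet_counts(conn, table, column, limit=30):
    """Return (value, count) pairs for the most common values in a column.

    `table` and `column` are interpolated into SQL, so they must be
    trusted identifiers (this is a sketch, not production code).
    """
    sql = (
        f"SELECT {column} AS value, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC, value LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (id INTEGER PRIMARY KEY, species TEXT)")
conn.executemany(
    "INSERT INTO trees (species) VALUES (?)",
    [("Oak",), ("Elm",), ("Oak",), ("Pine",), ("Oak",), ("Elm",)],
)

counts = facet_counts(conn, "trees", "species")
```

Selecting a facet value then amounts to adding a filter such as `species = 'Oak'` to the table query.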

sqlitebiter. Similar to my csvs-to-sqlite tool, but sqlitebiter handles “CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/SSV/TSV/Google-Sheets”. Most interestingly, it works against HTML pages—run “sqlitebiter -v url 'https://en.wikipedia.org/wiki/Comparison_of_firewalls'” and it will scrape that Wikipedia page and create a SQLite table for each of the HTML tables it finds there.

# 17th May 2018, 10:40 pm / datasette, scraping, csv, sqlite