94 items tagged “sql”
2021
Django SQL Dashboard 1.0 (via) As part of my ongoing attempt to be braver about 1.0 releases (crucial if you want to do semantic versioning properly) I’ve released version 1.0 of Django SQL Dashboard, my Datasette-inspired app for Django that adds an interface for running read-only, bookmarkable SQL queries against a PostgreSQL database. The new version adds a column cog menu providing shortcuts for changing the sort order, counting distinct values and performing a group-by/count against column values.
Hierarchical Structures in PostgreSQL (via) Two techniques I hadn’t seen before: the first is to define a materialized view using a CTE that offers efficient tree queries against a PostgreSQL array of path components (plus a trigger to update the materialized view), the second is with the PostgreSQL ltree extension which ships as part of PostgreSQL and hence should be widely available.
Joining CSV and JSON data with an in-memory SQLite database
The new sqlite-utils memory command can import CSV and JSON data directly into an in-memory SQLite database, combine and query it using SQL and output the results as CSV, JSON or various other formats of plain text tables.
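Here is a rough sketch of that load, join and query flow using only the Python standard library. It does not use sqlite-utils itself, and the table names, column names and sample data are all made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for a CSV file and a JSON file.
CSV_TEXT = "id,name\n1,Cleo\n2,Pancakes\n"
JSON_TEXT = (
    '[{"dog_id": 1, "toy": "ball"}, {"dog_id": 1, "toy": "rope"}, '
    '{"dog_id": 2, "toy": "bone"}]'
)


def load_rows(conn, table, rows):
    """Create a table from a list of dicts and insert every row."""
    columns = list(rows[0].keys())
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [[row[c] for c in columns] for row in rows],
    )


def join_csv_and_json(csv_text, json_text):
    """Load both inputs into an in-memory database and join them with SQL."""
    conn = sqlite3.connect(":memory:")
    load_rows(conn, "dogs", list(csv.DictReader(io.StringIO(csv_text))))
    load_rows(conn, "toys", json.loads(json_text))
    # CSV values arrive as strings, so cast before comparing with JSON integers.
    query = """
        SELECT dogs.name, count(*) AS toy_count
        FROM dogs JOIN toys ON CAST(dogs.id AS INTEGER) = toys.dog_id
        GROUP BY dogs.name
        ORDER BY dogs.name
    """
    return conn.execute(query).fetchall()
```

Calling join_csv_and_json(CSV_TEXT, JSON_TEXT) returns one row per dog with a count of its toys.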
The humble hash aggregate (via) Today I learned that “hash aggregate” is the name for the algorithm where you split a list of tuples on a common key, run an aggregation against each resulting group and combine the results back together again. I’d previously thought of this in terms of map/reduce, but hash aggregate is a much older term used widely by SQL engines. I’ve seen it come up in PostgreSQL explain query output (for GROUP BY) before but didn’t know what it meant.
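A minimal Python sketch of the idea, using a dictionary as the hash table. The function name and sample data are my own, and a real database engine would most likely keep a running aggregate per key rather than collecting whole groups in memory as this toy version does.

```python
from collections import defaultdict


def hash_aggregate(rows, key, aggregate):
    """Group rows by key(row) in a hash table, then aggregate each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {k: aggregate(group) for k, group in groups.items()}


# Roughly: SELECT species, sum(weight) FROM animals GROUP BY species
animals = [("dog", 10), ("cat", 4), ("dog", 25), ("cat", 5)]
totals = hash_aggregate(
    animals,
    key=lambda row: row[0],
    aggregate=lambda group: sum(weight for _, weight in group),
)
```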
Django SQL Dashboard
I’ve released the first non-alpha version of Django SQL Dashboard, which provides an interface for running arbitrary read-only SQL queries directly against a PostgreSQL database, protected by the Django authentication scheme. It can also be used to create saved dashboards that can be published or shared internally.
[... 2,171 words]
Practical SQL for Data Analysis (via) This is a really great SQL tutorial: it starts with the basics, but quickly moves on to a whole array of advanced PostgreSQL techniques - CTEs, window functions, efficient sampling, rollups, pivot tables and even linear regressions executed directly in the database using regr_slope(), regr_intercept() and regr_r2(). I picked up a whole bunch of tips for things I didn't know you could do with PostgreSQL here.
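For a sense of what those three regression aggregates calculate, here is a plain Python sketch of a least-squares fit. It is my own illustration rather than PostgreSQL's implementation, and it assumes both x and y actually vary (no handling for zero variance). Note that the PostgreSQL functions take their arguments as (Y, X).

```python
def linear_regression(points):
    """Return (slope, intercept, r2) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Points that lie exactly on y = 2x + 1.
slope, intercept, r2 = linear_regression([(1, 3), (2, 5), (3, 7), (4, 9)])
```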
Weeknotes: tableau-to-sqlite, django-sql-dashboard
This week I started a limited production run of my new backend for Vaccinate CA calling, built a tableau-to-sqlite import tool and started working on a subset of Datasette for PostgreSQL and Django called django-sql-dashboard.
Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask) (via) ClickHouse by Yandex is an open source column-oriented data warehouse, designed to run analytical queries against TBs of data. They’ve loaded the full GitHub Archive of events since 2011 into a public instance, which is a great way of both exploring GitHub activity and trying out ClickHouse. Here’s a query I just ran that shows number of watch events per year, for example:
SELECT toYear(created_at) AS yyyy, count() FROM github_events WHERE event_type = 'WatchEvent' GROUP BY yyyy
2020
Some SQL Tricks of an Application DBA (via) This post taught me so many PostgreSQL tricks that I hadn’t seen before. Did you know you can start a transaction, drop an index, run explain and then rollback the transaction (cancelling the index drop) to see what explain would look like without that index? Among other things I also learned what the “correlation” database statistic does: it’s a measure of how close-to-sorted the values in a specific column are, which helps PostgreSQL decide if it should do an index scan or a bitmap scan when making use of an index.
How much can you learn from just two columns?
Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.
[... 951 words]
Get Started - Materialize. Materialize is a really interesting new database: “a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka). You define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew).
Using SQL to Look Through All of Your iMessage Text Messages (via) Dan Kelch shows how to access the iMessage SQLite database at ~/Library/Messages/chat.db—it’s protected under macOS Catalina so you have to enable Full Disk Access in the privacy settings first. I usually use the macOS terminal app but I installed iTerm for this because I’d rather enable full disk access to a separate terminal program than let anything I’m running in my regular terminal take advantage of it. It worked! Now I can run “datasette ~/Library/Messages/chat.db” to browse my messages.
Using SQL to find my best photo of a pelican according to Apple Photos
According to the Apple Photos internal SQLite database, this is the most aesthetically pleasing photograph I have ever taken of a pelican:
[... 1,937 words]
How Super Graph compiles GraphQL to a single SQL query. Super Graph is a GraphQL server that compiles arbitrarily nested GraphQL queries to “a single fast SQL query”. I’ve always wondered how that could possibly work, so I asked author Vikram Rangnekar for an example of a compiled query. It turns out it uses a brilliant sequence of JSON aggregations to glue together results from nested subqueries and left outer joins.
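To illustrate the general shape of the JSON aggregation technique, here is a small sketch that builds a nested users-with-posts document in one SQL query. This is not Super Graph's actual output: it uses SQLite's JSON functions instead of PostgreSQL's, a correlated subquery instead of joins, and a made-up schema. It also assumes your SQLite build includes the JSON functions.

```python
import json
import sqlite3


def users_with_posts_as_json():
    """Fetch every user with a nested list of posts using one SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'ana'), (2, 'bo');
        INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'hello');
        """
    )
    query = """
        SELECT json_group_array(json(user_json)) FROM (
            SELECT json_object(
                'name', users.name,
                -- json() marks the subquery result as JSON, not a plain string
                'posts', json((
                    SELECT json_group_array(json_object('title', posts.title))
                    FROM posts WHERE posts.user_id = users.id
                ))
            ) AS user_json
            FROM users
        )
    """
    (document,) = conn.execute(query).fetchone()
    return json.loads(document)
```

The database hands back a single row containing a single JSON string, which is the part that makes the one-query approach possible.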
SQL is a better API language than GraphQL – Convince me otherwise (via) A dumb tweet I posted this morning blew up today and ended up on the Hacker News homepage.
I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.
Generated Columns in SQLite (via) SQLite 3.31.0 was released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns.
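A quick sketch of both flavours using Python's sqlite3 module. It assumes the SQLite library bundled with your Python is 3.31.0 or newer, and the table and column names are invented for the example.

```python
import sqlite3


def generated_column_demo():
    """Create virtual and stored generated columns and read them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE line_items (
            price_cents INTEGER,
            quantity INTEGER,
            total_virtual INTEGER
                GENERATED ALWAYS AS (price_cents * quantity) VIRTUAL,
            total_stored INTEGER
                GENERATED ALWAYS AS (price_cents * quantity) STORED
        )
        """
    )
    # Only the ordinary columns are written; the totals are derived.
    conn.execute("INSERT INTO line_items (price_cents, quantity) VALUES (250, 4)")
    conn.execute("UPDATE line_items SET quantity = 5")
    return conn.execute(
        "SELECT total_virtual, total_stored FROM line_items"
    ).fetchone()
```

Both columns reflect the updated quantity: the virtual one is computed when the row is read, the stored one is recomputed when the row is written.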
2019
sqlite-utils 2.0: real upserts
I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.
[... 1,140 words]
athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3 (as CSV, JSON or Apache Parquet files) using SQL. It’s an interesting way of building a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources.
Weeknotes: datasette-template-sql
Last week I talked about wanting to take on a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).
[... 521 words]
datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates.
Calling C functions from BigQuery with web assembly (via) Google BigQuery lets you define custom SQL functions in JavaScript, and it turns out they expose the WebAssembly.instantiate family of APIs. Which means you can write your UDF in C or Rust, compile it to WebAssembly and run it as part of your query!
SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser.
SQL queries don’t start with SELECT. This is really useful. Understanding that SELECT (and associated window functions) happen after the WHERE, GROUP BY and HAVING helps explain why you can’t filter a query based on the results of a window function for example.
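Here is a small demonstration of that limitation, run against SQLite through Python's sqlite3 module (an assumption: the bundled SQLite needs window function support, added in 3.25). The table and data are made up. Filtering on a window function directly in WHERE is rejected, while computing it in a subquery first works fine.

```python
import sqlite3


def window_filter_demo():
    """Show that a window function cannot be used in WHERE, but a subquery can."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("ana", 30), ("bo", 50), ("cy", 40)],
    )

    # WHERE is evaluated before window functions, so this query is rejected.
    try:
        conn.execute(
            "SELECT player FROM scores "
            "WHERE row_number() OVER (ORDER BY points DESC) = 1"
        )
        where_error = None
    except sqlite3.OperationalError as exc:
        where_error = str(exc)

    # Computing the window function in a subquery makes its result filterable.
    top = conn.execute(
        """
        SELECT player FROM (
            SELECT player, row_number() OVER (ORDER BY points DESC) AS rn
            FROM scores
        ) WHERE rn = 1
        """
    ).fetchall()
    return where_error, top
```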
Anyone with solid knowledge of both SQL and genetic engineering want to write me an UPDATE query to turn me into a dinosaur?
— @simonw
PugSQL. Interesting new twist on a definitely-not-an-ORM library for Python. With PugSQL you define SQL queries in files, give them names and then load them into a module which allows you to execute them as Python methods with keyword arguments. You can mark statements as only returning a single row (or a single scalar value) with a comment at the top of their file.
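To get a feel for the pattern, here is a toy version built on the standard library. This is not PugSQL's code or API: the QueryModule class, the comment markers and the result modes (scalar, one, many) are simplified stand-ins I made up to show how named SQL statements can become Python methods that take keyword arguments.

```python
import re
import sqlite3

# Hypothetical query file contents: each statement is named by a comment.
QUERIES = """
-- :name count_users :scalar
SELECT count(*) FROM users;

-- :name find_user :one
SELECT id, name FROM users WHERE name = :name;

-- :name all_users :many
SELECT id, name FROM users ORDER BY id;
"""


class QueryModule:
    """Expose each named SQL statement as a method taking keyword arguments."""

    def __init__(self, conn, text):
        pattern = re.compile(r"--\s*:name\s+(\w+)\s+:(\w+)\s*\n(.*?);", re.S)
        for name, mode, sql in pattern.findall(text):
            setattr(self, name, self._make(conn, sql.strip(), mode))

    @staticmethod
    def _make(conn, sql, mode):
        def run(**params):
            cursor = conn.execute(sql, params)
            if mode == "scalar":
                return cursor.fetchone()[0]  # single value
            if mode == "one":
                return cursor.fetchone()  # single row
            return cursor.fetchall()  # every row

        return run


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ana",), ("bo",)])
queries = QueryModule(conn, QUERIES)
```

With that in place, queries.find_user(name="bo") runs the second statement with name bound as a SQL parameter.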
Datasette: ?_where=sql-fragment parameter for table views. I just shipped a tiny but really useful new feature to Datasette master: you can now add ?_where=sql-fragment on to the URL of any table view to inject additional SQL directly into the underlying WHERE clause. This tiny feature actually has some really interesting applications: I created this because I wanted to be able to run more complex custom SQL queries without losing access to the conveniences of Datasette’s table view, in particular the built-in faceting support. The feature actually fits in well with Datasette’s philosophy of allowing arbitrary SQL to be executed against a read-only database: you can turn this ability off using the allow_sql config flag.
2018
Sqorn (via) JavaScript library for building SQL queries that makes really smart usage of ES6 tagged template literals. The magic of tagged template literals is that they let you intercept and process interpolated values, making them ideally suited to escaping parameters in SQL queries. Sqorn takes that basic ability and layers on some really interesting API design to allow you to further compose queries.
github/gh-ost: Thoughts on Foreign Keys? The biggest challenge I’ve seen with foreign key constraints at scale (at least with MySQL) is how they conflict with online schema migrations using tools like pt-online-schema-change or GitHub’s gh-ost. This is a good explanation of the issue by Shlomi Noach, one of the gh-ost maintainers.
Half of the time when companies say they need "AI" what they really need is a SELECT clause with GROUP BY.