48 items tagged “sql”
I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.
Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns. # 24th January 2020, 4:20 am
[... 1140 words]
athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources. # 18th December 2019, 9:05 am
Last week I talked about wanting to take ona a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).[... 521 words]
datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates. # 15th November 2019, 12:59 am
SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser. # 7th October 2019, 11:37 pm
SQL queries don’t start with SELECT. This is really useful. Understanding that SELECT (and associated window functions) happen after the WHERE, GROUP BY and HAVING helps explain why you can’t filter a query based on the results of a window function for example. # 3rd October 2019, 8:56 pm
Anyone with solid knowledge of both SQL and genetic engineering want to write me an UPDATE query to turn me into a dinosaur?
PugSQL. Interesting new twist on a definitely-not-an-ORM library for Python. With PugSQL you define SQL queries in files, give them names and then load them into a module which allows you to execute them as Python methods with keyword arguments. You can mark statements as only returning a single row (or a single scalar value) with a comment at the top of their file. # 3rd July 2019, 6:19 pm
Datasette: ?_where=sql-fragment parameter for table views. I just shipped a tiny but really useful new feature to Datasette master: you can now add ?_where=sql-fragment on to the URL of any table view to inject additional SQL directly into the underlying WHERE clause. This tiny feature actually has some really interesting applications: I created this because I wanted to be able to run more complex custom SQL queries without losing access to the conveniences of Datasette’s table view, in particular the built-in faceting support. The feature actually fits in well with Datasette’s philosophy of allowing arbitrary SQL to be executed against a read-only database: you can turn this ability off using the allow_sql config flag. # 13th April 2019, 2 am
github/gh-ost: Thoughts on Foreign Keys? The biggest challenge I’ve seen with foreign key constraints at scale (at least with MySQL) is how they conflict with online schema migrations using tools like pt-online-schema-change or GitHub’s gh-ost. This is a good explanation of the issue by Shlomi Noach, one of the gh-ost maintainers. # 19th June 2018, 4:12 pm
Half of the time when companies say they need “AI” what they really need is a SELECT clause with GROUP BY.
System-Versioned Tables in MariaDB (via) Fascinating new feature from the SQL:2011 standard that’s getting its first working implementation in MariaDB 10.3.4. “ALTER TABLE products ADD SYSTEM VERSIONING;” causes the database to store every change made to that table—then you can run “SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;” to query the data as of a specific point in time. I’ve tried all manner of horrible mechanisms for achieving this in the past, having it baked into the database itself would be fantastic. # 25th April 2018, 2:34 pm
How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries. # 19th April 2018, 6:34 pm
Use The Index, Luke! Paging Through Results (via) The best explanation of keyset pagination I’ve seen. Keyset pagination is where instead of using OFFSET/LIMIT to return the next page of results you instead track the last seen value in the column you sort by and then return the next X results that follow it. This allows you to paginate to arbitrarily deep offsets within a table, whereas OFFSET/LIMIT requires first iterating across all preceding rows and tends to stop working well after the first few thousand results. # 29th March 2018, 5:30 pm
User-defined Order in SQL (via) This is a fun intellectual exercise: how can one efficiently implement a user-defined order in a SQL table? The obvious initial approach is to have an integer position column, but this means every subsequent row must be updated when an item changes position. Joe “begriffs” Nelson explores some clever alternatives, including floating point or decimal positions (allowing new items to be inserted at a midpoint between existing positions) and a new custom rational number type he buiIt as a PostgreSQL extension. # 21st March 2018, 2:07 pm
SQLite Query Language: WITH clause. SQLite’s documentation on recursive CTEs starts out with some nice clear examples of tree traversal using a WITH statement, then gets into graphs, then goes way off the deep end with a Mandelbrot Set query and a query that can solve Soduku puzzles (“in less than 300 milliseconds on a modern workstation”). # 26th November 2017, 7:23 am
You may find this article useful (despite the list-o-matic name): 10 Easy Steps to a Complete Understanding of SQL—I’ve been using SQL for years but I found that some of the concepts explained there helped firm up my fundamental understanding of how to use it effectively.[... 64 words]
The term you are looking for is database migrations (sometimes called database change scripts).[... 308 words]
Appending the request URL to SQL statements in Django. A clever frame-walking monkey-patch which pulls the most recent HttpRequest object out of the Python stack and adds the current request.path to each SQL query as an SQL comment, so you can see it in debugging tools such as slow query logs and the PostgreSQL “select * from pg_stat_activity” query. # 2nd June 2010, 9:09 am
The answers to your Security Questions are case sensitive and cannot contain special characters like an apostrophe, or the words “insert,” “delete,” “drop,” “update,” “null,” or “select.”
grant XXX on * ? (via) PostgreSQL doesn’t have a way to say “this user is allowed to select/update/etc on all tables in database X”. That kind of sucks. UPDATE: This is fixed in PostgreSQL 9, see the comments. # 16th March 2010, 6:26 pm
django-batch-select (via) A smart attempt at solving select_related for many-to-many relationships in Django. Add a custom manager to your model and call e.g. Entry.objects.all()[:10].batch_select(“tags”) to execute two queries—one pulling back the first ten entries and another using an “IN” query against the tags table to pull back all of the tags for those entries in one go. # 23rd November 2009, 4:19 pm
When I worked at Amazon.com we had a deeply-ingrained hatred for all of the SQL databases in our systems. Now, we knew perfectly well how to scale them through partitioning and other means. But making them highly available was another matter. Replication and failover give you basic reliability, but it’s very limited and inflexible compared to a real distributed datastore with master-master replication, partition tolerance, consensus and/or eventual consistency, or other availability-oriented features.