Items tagged databases
Filters: databases ×
Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index. # 3rd February 2021, 7:32 am
I get asked a lot about learning to code. Sure, if you can. It’s fun. But the real action, the crux of things, is there in the database. Grab a tiny, free database like SQLite. Import a few million rows of data. Make them searchable. It’s one of the most soothing activities known to humankind, taking big piles of messy data and massaging them into the rigid structure required of a relational database. It’s true power.
Yesterday I described the new sqlite-utils transform mechanism for applying SQLite table transformations that go beyond those supported by
ALTER TABLE. The other new feature in sqlite-utils 2.20 builds on that capability to allow you to refactor a database table by extracting columns into separate tables. I’ve called it sqlite-utils extract.
DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern. # 19th September 2020, 11:43 pm
This week I helped Natalie launch Rocky Beaches, shipped Datasette 0.48 and several releases of
datasette-graphql, upgraded the CSRF protection for
datasette-upload-csvs and figured out how to get a commit log of changes to my blog by backing up its database to a GitHub repository.
Get Started—Materialize. Materialize is a really interesting new database—“a streaming SQL materialized view engine”. It builds materialized views on top of streaming data sources (such as Kafka)—you define the view using a SQL query, then it figures out how to keep that view up-to-date automatically as new data streams in. It speaks the PostgreSQL protocol so you can talk to it using the psql tool or any PostgreSQL client library. The “get started” guide is particularly impressive: it uses a curl stream of the Wikipedia recent changes API, parsed using a regular expression. And it’s written in Rust, so installing it is as easy as downloading and executing a single binary (though I used Homebrew). # 1st June 2020, 10:11 pm
I called it normalization because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could normalize relations, so could I.
Relational databases are a commodity now, but they power a much larger fraction of the world’s economy that AI ever will. And no company has a “relational database strategy”.
github/gh-ost: Thoughts on Foreign Keys? The biggest challenge I’ve seen with foreign key constraints at scale (at least with MySQL) is how they conflict with online schema migrations using tools like pt-online-schema-change or GitHub’s gh-ost. This is a good explanation of the issue by Shlomi Noach, one of the gh-ost maintainers. # 19th June 2018, 4:12 pm
Showdown: MySQL 8 vs PostgreSQL 10 (via) MySQL 8 makes comparisons between PostgreSQL and MySQL far more interesting, as it closes some of the key feature gaps. Meanwhile the PostgreSQL replication story (long one of MySQL’s key advantages) has improved dramatically in recent versions. This article offers a useful overview of the current differences, including diving into some of the less obvious implementation details that differ between the two. # 23rd May 2018, 5:02 pm
Is there an open source (or freely accessible) database of geofence coordinates for common places, such as cities or national parks?
Take a look at Flickr’s openly licensed shapefiles:[... 59 words]
PostgreSQL: How can I store images in a database? What existing products makes it easy for a user to upload photos into a general database?
As a general rule, it’s a bad idea to store images in a database. This is due to the large amount of space they take up, which can affect database read performance and will greatly increases the size of your backups, making them both take longer and cost more to store.[... 187 words]
You may find this article useful (despite the list-o-matic name): 10 Easy Steps to a Complete Understanding of SQL—I’ve been using SQL for years but I found that some of the concepts explained there helped firm up my fundamental understanding of how to use it effectively.[... 64 words]
On the right setup, PostgreSQL can handle petabytes. There are also commercial vendors such as Greenplum that offer data warehouse solutions built on a modified version of PostgreSQL.[... 80 words]
The principle reason they switched to a schemaless DB was to work around the challenges of having to make schemes changes in MySQL, which can lock the table and take hours if bit days to complete in large tables.[... 115 words]
As a general rule it’s not a good idea to allow mobile devices to connect directly to a server-side database, as it’s an invitation to hackers to figure out what’s going on and then connect to the database themselves for nefarious reasons.[... 105 words]
You should store the exact format that was entered by the user.[... 95 words]
If you’re only dealing with 60 records there’s no need to add a full database. I’ve actually hand coded a 50 record JSON file before and it was fine- use an editor with good JSON support (I like Sublime Text 2) and it’s pretty easy to hand write.[... 103 words]
This is a surprisingly complicated question. The first thing you might want to ask yourself is “what’s a country”—how do you deal with places on this List of states with limited recognition for example?[... 182 words]
The term you are looking for is database migrations (sometimes called database change scripts).[... 308 words]
It sounds like you’re pretty much screwed already, if you’re dealing with companies that still think FTPing XML around is a sensible thing to do.[... 364 words]
NoSQL systems are enormously varied which makes it hard (and not particularly constructive) to benchmark them against each other. How would you compare the performance of Redis, an in-memory data structure server, with Cassandra, a distributed redundant column store?[... 78 words]
I don’t fully understand the question, but if you’re talking about doing a single join across multiple tables the Django ORM handles that just fine. Let’s say you want to get every BlogEntry written by a User who belongs to the Group with the name “admins”:[... 67 words]
Many to Many tables can be a bit of a pain to deal with using regular SQL, but a good ORM can abstract away any potential complexity almost entirely. I find using the Django ORM means I’m much less likely to shy away from a design that involves a many-to-many relationship because I know it won’t increase the complexity of the application. I imagine the Rails ORM has the same effect.[... 91 words]
I’d guess Twitter or Craigslist.[... 19 words]
That it’s so obviously a good idea (and works so well) that you’d be crazy not to use it. As far as I’m concerned, it’s part of the default stack for any web application.[... 46 words]
Where can I find a database of the cities in the United States, their populations, and square miles?
On Freebase: http://www.freebase.com/view/use...—if you sign up for a Freebase account you can further filter this report to include areas.[... 46 words]
A Gentle Introduction to CouchDB for Relational Practitioners. By “High Performance MySQL” author Baron Schwartz—a smart, concise overview that touches pretty much everything that’s interesting about CouchDB. # 22nd September 2010, 9:51 pm
Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.[... 109 words]
The BBC have a pretty big CouchDB cluster, which they use mostly as a replicated key-value store. It’s used by their new identity platform which includes customisation features for iPlayer.[... 47 words]