Simon Willison’s Weblog

Subscribe

63 items tagged “mysql”

2010

If you’re ever debugging a problem and you see the number 42-mumble-mumble-mumble-7295 you’ve run out of 32-bit storage. If you see 2-mumble-mumble-mumble-647 (2147483647) you’ve run out of signed 32-bit storage. 167-mumble-mumble-15 (16777215) you’ve run out of 24-bits and 65-mumble-mumble-35 (65535) you’ve run out of 16-bits of integers.

Kellan Elliott-McCrea # 25th January 2010, 8:11 am

2009

Crowdsourced document analysis and MP expenses

As you may have heard, the UK government released a fresh batch of MP expenses documents a week ago on Thursday. I spent that week working with a small team at Guardian HQ to prepare for the release. Here’s what we built:

[... 2081 words]

Fixing Poor MySQL Default Configuration Values. Some tips from Jeremy Zawodny on configuring MySQL for high traffic environments—he suggests skip-name-resolve, connect_timeout=20, thread_cache_size=not-zero, max_connect_errors=very-high-number, slave_net_timeout=30. # 9th November 2009, 5:07 pm

Django-Jython 1.0.0 released! Now with database backends for PostgreSQL, Oracle and MySQL. The next release (planned for next month) should provide full compatibility with Django 1.1—the current release has 1.1 support for PostgreSQL but only 1.0 support for the other two databases. # 9th November 2009, 1:53 pm

MySQL backups with EBS snapshots. Assaf Arkin’s 45 line ruby script shows how to lock tables / XFS freeze / create an EBS snapshot / unfreeze and unlock, with hourly snapshots preserved for the past 24 hours and daily snapshots for the past week. Is an EBS snapshot enough to restore your data to somewhere other than EC2 though? # 13th October 2009, 12:34 pm

MySQL Connector/Python. A pure Python implementation of the MySQL client/server protocol, meaning you can talk to a MySQL server from Python without needing to first install the MySQL client libraries (which often requires compiling from source). # 2nd October 2009, 2:16 pm

MySQL, Python and MacOS X 10.6 (Snow Leopard). I gave up on compiling things when I upgraded to Snow Leopard—I’m back to running Ubuntu in a VMWare instance, mounted over Samba so I can still use TextMate. # 25th September 2009, 10:14 pm

Ravelry. Tim Bray interviews Casey Forbes, the single engineer behind Ravelry, the knitting community that serves 10 million Rails requests a day using just seven physical servers, MySQL, Sphinx, memcached, nginx, haproxy, passenger and Tokyo Cabinet. # 3rd September 2009, 6:50 pm

How to find un-indexed queries in MySQL, without using the log (via) Use tcpdump(!) to sniff the MySQL protocol and dump out queries that had the “no index used” bit set. # 19th August 2009, 11:42 am

SQL pie chart. Generating ASCII art pie charts using the world’s scariest MySQL SELECT statement. # 13th August 2009, 1:04 pm

Why we migrated from MySQL to MongoDB. Includes some useful information on MongoDB’s limitations—for example, running many different collections can waste disk space and repairing large datasets or bulk deleting many rows can block and lock the database for the duration of the operation. # 27th July 2009, 10:49 am

Installing Django, Solr, Varnish and Supervisord with Buildout. Useful, detailed instructions... but I still think this stuff is Way Too Difficult at the moment. I’m a big fan of the idea of sites that are assembled from multiple smaller web services talking HTTP to each other, but ensuring all the moving parts stay running is massively more painful than just running Apache and MySQL. # 7th June 2009, 1:54 pm

peeping into memcached. “Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state”—you can then load the resulting data in to MySQL using LOAD LOCAL INFILE and analyse it using standard SQL queries. # 20th April 2009, 6:35 pm

Sphinx 0.9.9-rc2 is out. Interesting new feature: the Sphinx search server now supports the MySQL binary protocol, so you can talk to it using a regular MySQL client library and fire off search queries using SELECT syntax and the new SphinxQL query language. # 8th April 2009, 1:59 pm

Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.” # 3rd April 2009, 8:42 pm

Concurrence. Exciting: a Python framework for “creating massively concurrent network applications” (the tutorial benchmarks a Hello World web server at over 8,000 requests a second). It’s implemented on top of libevent using pyrex, can run on either Stackless Python or Greenlets from the py library and ships with a WSGI server, an HTTP client and a DBAPI 2.0 compliant MySQL driver. # 15th March 2009, 1:28 pm

[Drizzle] won’t be a get-out-of-jail-free card for very write-heavy applications but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).

Richard Crowley # 8th March 2009, 6:05 pm

Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers. # 2nd March 2009, 10:22 am

How FriendFeed uses MySQL to store schema-less data. The pain of altering/ adding indexes to tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two column tables. These can be created and dropped much more easily, and are continually populated by an off-line index building process. # 27th February 2009, 2:33 pm

New Gearman Server & Library in C, MySQL UDFs. Gearman, the job queue written for LiveJournal and now used by Digg and Yahoo!, has been rewritten in C. Looks like a good candidate for an easily configured lightweight message queue. Also includes hooks for writing MySQL functions that can interact with queues. # 13th January 2009, 4:41 pm

2008

Spock Proxy. A MySQL Proxy fork (no Lua) that concentrates solely on sharding, by parsing incoming SQL statements and redirecting them across multiple databases. There are some limitations on the SQL that can be handled (no nested queries, joins across a maximum of two tables) but generally it looks pretty impressive. # 11th December 2008, 9:49 am

OurDelta Builds for MySQL (via) A community supported “alternative distro” of MySQL, incorporating new features from Google and other sources by maintaining a clean set of patches against the MySQL source tree (which I guess is why it’s not considered a fork). I recognise some of the patches from the excellent “High Performance MySQL, 2nd Edition”. # 8th December 2008, 4:20 pm

Minimal nginx conf to split get/post requests. Interesting idea for master-slave replication balancing where GET v.s. POST is load-balanced by nginx, presumably to different backend servers that are configured to talk to either a slave or a master. This won’t deal very will with replication lag though—you really want a user’s session to be bound to the master server for the next few GET requests after data is modified to ensure they see the effects of their updates. UPDATE: Amit fixed my complaint with a neat hack based around a cookie with a max age of 10 seconds. # 14th October 2008, 4:33 pm

Using the New MySQL Query Profiler. Extremely powerful new feature in MySQL 5.0.37. Definitely something for the Django debug toolbar. # 1st October 2008, 1:20 am

Facebook engineering notes on Scaling Out. Jason Sobel explains a couple of tricks Facebook use to deal with consistency between their California and Virginia data centres. The first is to hijack the MySQL replication stream to include information about memcached records to invalidate; the second is to use Layer 7 load balancers which inspect a “last modification time” cookie and send users to the masters in California if they have updated their profile in the past 20 seconds. # 20th August 2008, 11:51 pm

Drizzle, Clouds, “What If?”. Exciting news in the world of MySQL: Drizzle is a new project to produce a massively stripped down version of the database server—InnoDB/UTF8 only, no permissions, views, stored procedures or triggers, simplified field types, optimised for the common subset of functionality used by web apps. MySQL’s Firefox? # 23rd July 2008, 12:30 am

Django Unit Tests and Transactions. If you’re using a transactional database engine (MySQL with InnoDB, Postgres or SQLite) you can speed things up by running each of your unit tests inside a transaction and rolling back in tearDown(). # 7th July 2008, 2:14 pm

Sun To Acquire MySQL. Sun also employ Josh Berkus, one of the lead developers of PostgreSQL. # 16th January 2008, 1:55 pm

2007

Getting from point A to B (the right way)

If your laptop is relatively recent it might have hardware support for virtualization (Intel Core Duo chips do, for example). If so, it’s worth looking in to using VMWare or Parallels to run a virtual linux server locally on your machine. You’ll need a fair amount of RAM for this as well—2 GB minimum probably.

[... 194 words]

CouchDB: Thinking beyond the RDBMS. CouchDB is a fascinating project—an Erlang powered non-relational database with a JSON API that lets you define “views” (really computed tables) based on JavaScript functions that execute using map/reduce. Damien Katz, the main developer currently works for MySQL and used to work on Lotus Notes. # 3rd September 2007, 9:48 am