64 items tagged “mysql”
2010
If you’re ever debugging a problem and you see the number 42-mumble-mumble-mumble-7295 you’ve run out of 32-bit storage. If you see 2-mumble-mumble-mumble-647 (2147483647) you’ve run out of signed 32-bit storage. 167-mumble-mumble-15 (16777215) you’ve run out of 24-bits and 65-mumble-mumble-35 (65535) you’ve run out of 16-bits of integers.
2009
Crowdsourced document analysis and MP expenses
As you may have heard, the UK government released a fresh batch of MP expenses documents a week ago on Thursday. I spent that week working with a small team at Guardian HQ to prepare for the release. Here’s what we built:
[... 2,081 words]Fixing Poor MySQL Default Configuration Values. Some tips from Jeremy Zawodny on configuring MySQL for high traffic environments—he suggests skip-name-resolve, connect_timeout=20, thread_cache_size=not-zero, max_connect_errors=very-high-number, slave_net_timeout=30.
Django-Jython 1.0.0 released! Now with database backends for PostgreSQL, Oracle and MySQL. The next release (planned for next month) should provide full compatibility with Django 1.1—the current release has 1.1 support for PostgreSQL but only 1.0 support for the other two databases.
MySQL backups with EBS snapshots. Assaf Arkin’s 45 line ruby script shows how to lock tables / XFS freeze / create an EBS snapshot / unfreeze and unlock, with hourly snapshots preserved for the past 24 hours and daily snapshots for the past week. Is an EBS snapshot enough to restore your data to somewhere other than EC2 though?
MySQL Connector/Python. A pure Python implementation of the MySQL client/server protocol, meaning you can talk to a MySQL server from Python without needing to first install the MySQL client libraries (which often requires compiling from source).
MySQL, Python and MacOS X 10.6 (Snow Leopard). I gave up on compiling things when I upgraded to Snow Leopard—I’m back to running Ubuntu in a VMWare instance, mounted over Samba so I can still use TextMate.
Ravelry. Tim Bray interviews Casey Forbes, the single engineer behind Ravelry, the knitting community that serves 10 million Rails requests a day using just seven physical servers, MySQL, Sphinx, memcached, nginx, haproxy, passenger and Tokyo Cabinet.
How to find un-indexed queries in MySQL, without using the log (via) Use tcpdump(!) to sniff the MySQL protocol and dump out queries that had the “no index used” bit set.
SQL pie chart. Generating ASCII art pie charts using the world’s scariest MySQL SELECT statement.
Why we migrated from MySQL to MongoDB. Includes some useful information on MongoDB’s limitations—for example, running many different collections can waste disk space and repairing large datasets or bulk deleting many rows can block and lock the database for the duration of the operation.
Installing Django, Solr, Varnish and Supervisord with Buildout. Useful, detailed instructions... but I still think this stuff is Way Too Difficult at the moment. I’m a big fan of the idea of sites that are assembled from multiple smaller web services talking HTTP to each other, but ensuring all the moving parts stay running is massively more painful than just running Apache and MySQL.
peeping into memcached. “Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state”—you can then load the resulting data in to MySQL using LOAD LOCAL INFILE and analyse it using standard SQL queries.
Sphinx 0.9.9-rc2 is out. Interesting new feature: the Sphinx search server now supports the MySQL binary protocol, so you can talk to it using a regular MySQL client library and fire off search queries using SELECT syntax and the new SphinxQL query language.
Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.”
Concurrence. Exciting: a Python framework for “creating massively concurrent network applications” (the tutorial benchmarks a Hello World web server at over 8,000 requests a second). It’s implemented on top of libevent using pyrex, can run on either Stackless Python or Greenlets from the py library and ships with a WSGI server, an HTTP client and a DBAPI 2.0 compliant MySQL driver.
[Drizzle] won’t be a get-out-of-jail-free card for very write-heavy applications but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).
Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers.
How FriendFeed uses MySQL to store schema-less data. The pain of altering/ adding indexes to tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two column tables. These can be created and dropped much more easily, and are continually populated by an off-line index building process.
New Gearman Server & Library in C, MySQL UDFs. Gearman, the job queue written for LiveJournal and now used by Digg and Yahoo!, has been rewritten in C. Looks like a good candidate for an easily configured lightweight message queue. Also includes hooks for writing MySQL functions that can interact with queues.
2008
Spock Proxy. A MySQL Proxy fork (no Lua) that concentrates solely on sharding, by parsing incoming SQL statements and redirecting them across multiple databases. There are some limitations on the SQL that can be handled (no nested queries, joins across a maximum of two tables) but generally it looks pretty impressive.
OurDelta Builds for MySQL (via) A community supported “alternative distro” of MySQL, incorporating new features from Google and other sources by maintaining a clean set of patches against the MySQL source tree (which I guess is why it’s not considered a fork). I recognise some of the patches from the excellent “High Performance MySQL, 2nd Edition”.
Minimal nginx conf to split get/post requests. Interesting idea for master-slave replication balancing where GET v.s. POST is load-balanced by nginx, presumably to different backend servers that are configured to talk to either a slave or a master. This won’t deal very will with replication lag though—you really want a user’s session to be bound to the master server for the next few GET requests after data is modified to ensure they see the effects of their updates. UPDATE: Amit fixed my complaint with a neat hack based around a cookie with a max age of 10 seconds.
Using the New MySQL Query Profiler. Extremely powerful new feature in MySQL 5.0.37. Definitely something for the Django debug toolbar.
Facebook engineering notes on Scaling Out. Jason Sobel explains a couple of tricks Facebook use to deal with consistency between their California and Virginia data centres. The first is to hijack the MySQL replication stream to include information about memcached records to invalidate; the second is to use Layer 7 load balancers which inspect a “last modification time” cookie and send users to the masters in California if they have updated their profile in the past 20 seconds.
Drizzle, Clouds, “What If?”. Exciting news in the world of MySQL: Drizzle is a new project to produce a massively stripped down version of the database server—InnoDB/UTF8 only, no permissions, views, stored procedures or triggers, simplified field types, optimised for the common subset of functionality used by web apps. MySQL’s Firefox?
Django Unit Tests and Transactions. If you’re using a transactional database engine (MySQL with InnoDB, Postgres or SQLite) you can speed things up by running each of your unit tests inside a transaction and rolling back in tearDown().
Sun To Acquire MySQL. Sun also employ Josh Berkus, one of the lead developers of PostgreSQL.
2007
Getting from point A to B (the right way)
If your laptop is relatively recent it might have hardware support for virtualization (Intel Core Duo chips do, for example). If so, it’s worth looking in to using VMWare or Parallels to run a virtual linux server locally on your machine. You’ll need a fair amount of RAM for this as well—2 GB minimum probably.
[... 194 words]CouchDB: Thinking beyond the RDBMS. CouchDB is a fascinating project—an Erlang powered non-relational database with a JSON API that lets you define “views” (really computed tables) based on JavaScript functions that execute using map/reduce. Damien Katz, the main developer currently works for MySQL and used to work on Lotus Notes.