Simon Willison’s Weblog

Items in Oct, 2010


A predictable web of data—the why of YQL. Christian Heilmann is moving from Yahoo! to Mozilla to head up their evangelism team, and has marked the occasion by releasing the first chapter of a proposed book on YQL. # 30th October 2010, 7:44 am

WebKit is Dropping HTML5 “popstate” Events (via) Really nasty bug with WebKit’s pushState support, discovered by Ben Cherry from Twitter. popstate events get dropped if the user navigates while an outbound network request is in progress. # 30th October 2010, 7:41 am

Running Yahoo! Pipes on Google App Engine. “The pipe2py package can compile a Yahoo! Pipe into pure Python source code, or it can interpret the pipe on-the-fly”—makes smart use of Python generators, and comes with tools to run the resulting compiled code on Google App Engine. # 30th October 2010, 12:11 am
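
To make the generator idea concrete, here is a minimal sketch of a pipe built from chained Python generators. It is not pipe2py’s actual output: the stage names (`fetch`, `filter_title`, `truncate`) and the sample feed are hypothetical, chosen only to show how each stage lazily pulls items from the one before it.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator

Item = Dict[str, str]

# Hypothetical sample data standing in for a fetched feed.
FEED = [
    {"title": "Python generators explained"},
    {"title": "Cooking with gas"},
    {"title": "Lazy pipelines in Python"},
    {"title": "More Python news"},
]


def fetch(items: Iterable[Item]) -> Iterator[Item]:
    """Source stage: yield items one at a time."""
    for item in items:
        yield item


def filter_title(items: Iterable[Item], keyword: str) -> Iterator[Item]:
    """Filter stage: pass through items whose title contains the keyword."""
    for item in items:
        if keyword.lower() in item["title"].lower():
            yield item


def truncate(items: Iterable[Item], limit: int) -> Iterator[Item]:
    """Truncate stage: stop pulling from upstream after `limit` items."""
    return islice(items, limit)


def build_pipeline(feed: Iterable[Item]) -> Iterator[Item]:
    """Wire the stages together; nothing runs until the result is iterated."""
    return truncate(filter_title(fetch(feed), "python"), 2)


titles = [item["title"] for item in build_pipeline(FEED)]
```

Because every stage is lazy, the truncate stage stops the whole chain early: the fourth feed item is never examined.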

mrjob: Distributed Computing for Everybody. Yelp use MapReduce with Hadoop (running on Amazon’s EMR service) to power all sorts of interesting features on the site, including spelling suggestions, review highlights, top searches and “people who viewed X also viewed...”. mrjob is their new open source Python framework for writing MapReduce jobs against the Hadoop streaming API. # 29th October 2010, 11:55 pm
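
For readers new to the model, here is a rough, single-process sketch of a MapReduce word count in plain Python. It deliberately does not use mrjob or Hadoop; the `mapper`, `reducer` and `run_job` names are hypothetical, and the in-memory grouping step stands in for the shuffle that Hadoop would perform across machines.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

WORD_RE = re.compile(r"[a-z']+")


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in WORD_RE.findall(line.lower()):
        yield word, 1


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all the values emitted for one key."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = run_job(["great tacos great service", "Great coffee"])
```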

If I have data that loads using JSON / JavaScript will it get indexed by Google?

No. Personally I dislike sites with content that is only accessible through JavaScript, but if you absolutely insist on doing this you should look into implementing the Google Ajax Crawling mechanism: http://code.google.com/web/ajaxc...

[... 56 words]

What are people’s experiences using Memcached?

That it’s so obviously a good idea (and works so well) that you’d be crazy not to use it. As far as I’m concerned, it’s part of the default stack for any web application.

[... 46 words]

Is there a good online calendar for upcoming technology conferences?

We’re trying to build exactly this with http://lanyrd.com/—not just for technology conferences, but they are definitely our largest niche.

[... 208 words]

What is the best JS library for automated cropping?

Not entirely clear what you’re looking for, but if you mean a UI tool for letting people resize and crop an image Jcrop is really nice http://deepliquid.com/content/Jc...

[... 43 words]

Is there a blog that covers open source Python projects?

No, but I wish there was.

[... 29 words]

Using MySQL as a NoSQL—A story for exceeding 750,000 qps on a commodity server. Very interesting approach: much of the speed difference between MySQL/InnoDB and memcached is due to the overhead involved in parsing and processing SQL, so the team at DeNA wrote their own MySQL plugin, HandlerSocket, which exposes a NoSQL-style network protocol for directly calling the low level MySQL storage engine APIs—resulting in a 7.5x performance increase. # 27th October 2010, 11:10 pm

Bees with machine guns! Low-cost, distributed load-testing using EC2. Great name for a useful project—Bees with machine guns is a Fabric script which fires up a bunch of EC2 instances, uses them to load test a website and then spins them back down again. # 27th October 2010, 11:04 pm

What is the best way to integrate MongoDB with Django?

Personally, I just “import pymongo” and start calling the regular Python API—no need for any special treatment to get it working with Django.

[... 41 words]

What are all the advantages of jQuery?

jQuery’s API is astonishingly well designed. It’s extremely consistent once you learn its rules (for example, methods such as .css() and .attr() take one argument to read a value and two arguments to set one) and its functionality is so complete that the last few major releases of the library have hardly added any new methods at all.

[... 166 words]

Why does Python load imported modules separately for different files, unlike C or PHP? Isn’t that inefficient in terms of memory usage?

It doesn’t—you’re misunderstanding how Python’s module system works. If two different places have “import os” in them, the os module is only imported and executed once—it’s cached in the sys.modules dictionary so you can see it happen if you want to. The key thing to understand is that “import os” attaches the os module to the “os” symbol within the current file’s scope, loading it only if it hasn’t been loaded already.

[... 104 words]
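
The caching behaviour described in this answer is easy to check for yourself. A small sketch, using only the standard library (the `import_locally` helper is just an illustrative name):

```python
import sys

import os
import os as os_again  # a second import statement for the same module

# Both names are bound to the very same module object.
same_object = os is os_again

# That object is the one cached in sys.modules under the key "os".
cached = sys.modules["os"] is os


def import_locally():
    """Import inside a function: this only binds a local name to the cached module."""
    import os
    return os


local_is_same = import_locally() is os
```

All three flags come out `True`: each `import os` is a dictionary lookup plus a name binding, and the module body is only executed the first time.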

What is the best lightweight jQuery tooltip plugin? Why?

Last time I went looking, I was very impressed by qTip: http://craigsworks.com/projects/...

[... 28 words]

What is the story of Advogato?

There’s a Google Tech Talk about Advogato: http://video.google.com/videopla...

[... 21 words]

What’s a good book about basic usage techniques and patterns in Python? (a la Effective Java/C++)

Dive into Python 3 is well worth a look: http://diveintopython3.org/

[... 48 words]

What are some of your favorite complicated diagrams?

This one’s pretty nuts:

[... 25 words]

Is it a good idea for a new start-up to outsource Software/App Development?

It depends on what you mean by “outsourcing”.

[... 130 words]

What is the best Mac OS X text editor for a web developer? And what makes it great?

It’s still TextMate for me. It gets the basics right—syntax highlighting, sensible indentation, a good project pane (I use “mate ~/Development/my-project” at the terminal to open TextMate with my entire folder hierarchy), solid extensions and good unix integration (Filter selection through command).

[... 77 words]

Bleach, HTML sanitizer and auto-linker. HTML sanitisation is notoriously difficult to do correctly, but Bleach (a Python library) looks like an excellent effort. It uses the html5lib parsing library to deal with potentially malformed HTML, uses a whitelist rather than a blacklist and has a neat feature for auto-linking URLs that is aware of the DOM (so it won’t try to auto-link a URL that is already wrapped in a link element). It was written by the Mozilla team for addons.mozilla.org and support.mozilla.org so it should be production ready. # 25th October 2010, 1:32 pm

What are the best APIs for creating location-based Wikipedia mashups?

GeoNames has a fantastic API for finding Wikipedia articles near a specific latitude/longitude pair:

[... 32 words]

What is the best way to maintain an API wrapper class across multiple languages?

1. Use JSON for your API. That takes away a lot of the necessity for an API wrapper, since it means you’re automatically returning native data types (hashes, lists, strings etc) for most programming languages.

[... 175 words]
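
As a quick illustration of the first point in that answer, here is what consuming a JSON response looks like in Python with no wrapper at all. The payload is a made-up example, not the output of any real API.

```python
import json

# Hypothetical response body from some JSON API.
payload = (
    '{"name": "Example Conf", "tags": ["python", "web"], '
    '"attendees": 120, "sold_out": false}'
)

event = json.loads(payload)

# The decoded value is built from ordinary Python types.
kinds = {key: type(value).__name__ for key, value in event.items()}
```

Here `kinds` maps each field to `str`, `list`, `int` or `bool`, so a client can start working with the data immediately.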

Firesheep (via) Oh wow. A Firefox extension that turns sniffing unsecured (non-HTTPS) cookie requests on your current WiFi network, and logging in as the person who sent them, into a case of clicking a couple of buttons. Always possible of course, but it’s never been made easy before. Private VPNs are about to become a lot more popular. # 25th October 2010, 9:11 am

What does an ideal Django workflow setup look like?

Short answer: virtualenv, pip, south for migrations, fabric for deployment.

[... 57 words]

Linked Data at the Guardian. The Guardian’s Open Platform API can now be queried by MusicBrainz ID and ISBN, opening up some extremely useful new types of query. # 19th October 2010, 7:11 pm

Is it possible to make API calls without cURL installed?

Yes:

[... 26 words]

jQuery 1.4.3 Released. Once again, the thing that impresses me most about this jQuery release is how stable the core API is. Hardly any new methods added, but the existing methods are made faster, more flexible and more predictable. The same has been true for the past several releases as well. It just keeps getting more and more polished. # 17th October 2010, 12:15 am

What is the best way to hire Solr developers?

Do you really need to hire a Solr specialist? It shouldn’t take a competent developer more than a few days to get familiar with Solr—the HTTP API is extremely easy to work with in my experience. You can always hire in a consultant from one of the companies that provide commercial Solr support for a few days to help your developers get up to scratch.

[... 82 words]

Why do some websites implement their logout link as a form post via JavaScript versus a plain old GET request?

Probably because if you implement logout as a GET action, I can force you to log out of a site by tricking you into visiting a page with an <img src="http://yoursite.com/logout/" width="1" height="1"> element on it.

[... 64 words]
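
A minimal sketch of the server side of that idea, written as a bare WSGI app so it needs nothing beyond the standard library. The in-memory `SESSIONS` store, the `session` cookie name and the `simulate` helper are all assumptions made for illustration. Note that requiring POST only defeats the image-tag trick; a hostile page can still auto-submit a form, so real applications usually pair this with a CSRF token.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> username.
SESSIONS = {"abc123": "alice"}


def logout_app(environ, start_response):
    """WSGI app that only logs the user out in response to a POST."""
    if environ.get("REQUEST_METHOD") != "POST":
        # A GET (such as one triggered by an <img> tag) changes nothing.
        start_response(
            "405 Method Not Allowed",
            [("Allow", "POST"), ("Content-Type", "text/plain")],
        )
        return [b"Log out with a POST request.\n"]

    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get("session")
    if morsel is not None:
        SESSIONS.pop(morsel.value, None)  # forget the session if it exists

    start_response("303 See Other", [("Location", "/")])
    return [b""]


def simulate(method, cookie=""):
    """Call the app directly with a tiny fake environ and return the status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    logout_app({"REQUEST_METHOD": method, "HTTP_COOKIE": cookie}, start_response)
    return captured["status"]
```

Calling `simulate("GET", "session=abc123")` leaves the session in place, while `simulate("POST", "session=abc123")` removes it and redirects.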