Simon Willison’s Weblog

Blogmarks

Flickr Shapefiles Public Dataset 1.0. Another awesome Geo dataset from the Yahoo! stable—this time it’s Flickr releasing shapefiles (geometrical shapes) for hundreds of thousands of places around the world, under the CC0 license which makes them essentially public domain. The shapes themselves have been crowdsourced from geocoded photos uploaded to Flickr, where users can “correct” the textual location assigned to each photo. Combine this with the GeoPlanet WOE data and you get a huge, free dataset describing the human geography of the world.

# 22nd May 2009, 6:12 pm / creativecommons, crowdsourcing, flickr, geoplanet, geospatial, maps, shapefiles, yahoo

Fake Reviews. Now now kids, play nice... Not at all surprised to hear this—nefarious iPhone app developers (in this case the team behind “London Tube”, an inferior version of Malcolm Barclay’s marvellous “Tube Deluxe”) have been caught leaving fake negative reviews on rival applications in the App Store. This is an excellent argument for adding friends/followers or importing an existing social graph—I’d much rather see reviews from people in my social network than strangers who may turn out to be sock puppets.

# 22nd May 2009, 12:49 am / apple, appstore, iphone, londontube, malcolm-barclay, social-graph, social-networks, sockpuppets, tubedeluxe

Dinky pocketbooks with WebKit transforms. Nat used 90 degree CSS transform rotations in print stylesheets for WebKit and Safari to create printable cut-out-and-fold pocketbooks from A4 pages. Very neat.

# 22nd May 2009, 12:33 am / css, csstransforms, natalie-downe, pocketbooks, printstyles, rotation, safari, webkit

TwitterAlikeExample—redis. Excellent example of how you design a moderately complex system against a scalable key-value store (in this case redis). Most “how to build Twitter” code examples fail to address the hard problem of scaling user inboxes, but this one tackles it head on.

# 21st May 2009, 11:14 pm / keyvaluepairs, redis, scaling, twitter
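
The inbox problem mentioned above is commonly handled with fan-out on write: every new post is pushed onto a per-follower list, so reading a timeline becomes a single list fetch. Here is a minimal sketch of that idea, using plain Python dicts as a stand-in for a key-value store. The key layout, the `INBOX_LIMIT` cap and the `follow`, `post` and `timeline` helpers are all hypothetical and are not taken from the linked example.

```python
from collections import defaultdict, deque
from itertools import count

# Stand-ins for a key-value store: each dict plays the role of one key namespace.
followers = defaultdict(set)   # "followers:<user>" -> set of follower names
inboxes = defaultdict(deque)   # "inbox:<user>" -> newest-first post ids
posts = {}                     # "post:<id>" -> (author, text)
_next_id = count(1)            # plays the role of an atomic counter key

INBOX_LIMIT = 1000  # assumed cap so inboxes do not grow without bound


def follow(follower, followee):
    """Record that `follower` wants to see posts from `followee`."""
    followers[followee].add(follower)


def post(author, text):
    """Store a post, then fan it out to the author's and followers' inboxes."""
    post_id = next(_next_id)
    posts[post_id] = (author, text)
    for user in {author} | followers[author]:
        inbox = inboxes[user]
        inbox.appendleft(post_id)
        # Trim the oldest entries once the inbox exceeds the cap.
        while len(inbox) > INBOX_LIMIT:
            inbox.pop()
    return post_id


def timeline(user, n=10):
    """Return the newest `n` posts in the user's inbox as (author, text) pairs."""
    return [posts[post_id] for post_id in list(inboxes[user])[:n]]
```

The trade-off in this sketch is that writes cost one push per follower while reads stay cheap, which is usually the right way round for a timeline that is read far more often than it is written.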

Working with Python and RabbitMQ. Nathan Borror eliminates the boilerplate needed to talk to RabbitMQ (or any other AMQP queue server) from Python.

# 21st May 2009, 11:10 pm / amqp, message-queues, nathan-borror, python, rabbitmq

AWS Import/Export: Ship Us That Disk! Andrew Tanenbaum said “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway”, and now you can ship your storage device direct to Amazon and have them load the data into an S3 bucket for you.

# 21st May 2009, 11:22 am / amazon, andrew-tanenbaum, aws, bandwidth, s3
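
The Tanenbaum quote is easy to check with some arithmetic. The sketch below compares the effective bandwidth of a disk in the post against uploading the same data. Every number in it is an assumption chosen for illustration (a 1 TB disk, two days in transit, a 10 Mbit/s uplink); none of them come from the AWS announcement.

```python
SECONDS_PER_DAY = 86_400


def effective_mbit_per_s(size_bytes, seconds):
    """Bandwidth achieved by moving size_bytes in the given number of seconds."""
    return size_bytes * 8 / seconds / 1_000_000


def upload_days(size_bytes, link_mbit_per_s):
    """Days needed to push size_bytes over a link of the given speed."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000) / SECONDS_PER_DAY


# All three figures are assumptions for the sake of the example.
DISK_BYTES = 10**12                    # a 1 TB disk
TRANSIT_SECONDS = 2 * SECONDS_PER_DAY  # two days with a courier
UPLINK_MBIT_PER_S = 10                 # a 10 Mbit/s office uplink

shipped = effective_mbit_per_s(DISK_BYTES, TRANSIT_SECONDS)
days = upload_days(DISK_BYTES, UPLINK_MBIT_PER_S)
print(f"Shipping: {shipped:.1f} Mbit/s effective; uploading: {days:.1f} days")
```

Under those assumptions the posted disk works out to roughly 46 Mbit/s sustained, while the upload would take a little over nine days.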

Yahoo! Placemaker. Really exciting new API from Yahoo!—Placemaker accepts a block of text (or a URL to HTML or RSS) and extracts and returns geographical locations mentioned in the text. I just ran my djng blog entry through it and it pulled out “Prague” as the only location mentioned. This should be really useful for adding geodata to existing textual content.

# 20th May 2009, 9:34 pm / geocoding, geospatial, placemaker, yahoo

Yahoo! Geo: Announcing GeoPlanet Data. The Yahoo! WhereOnEarth geographic data set is fantastic, but I’ve always felt slightly uncomfortable about building applications against it in case the API went away. That’s not an issue any more—the entire dataset is now available to download and use under a Creative Commons Attribution license. It’s not entirely clear what the attribution requirements are—do you have to put “data from GeoPlanet” on every page or can you get away with just tucking the attribution away in an “about this site” page? UPDATE: The data doesn’t include latitude/longitude or bounding boxes, which severely reduces its utility.

# 20th May 2009, 9:12 pm / attribution, creativecommons, data, geoplanet, geospatial, whereonearth, yahoo

Google Maps Data API (via) I’m disappointed by this one—it’s really just a CRUD store for the KML files used in Google MyMaps. It would be a lot more useful if it let you perform geospatial calculations against your stored map data using some kind of query API—a cloud service alternative to tools like PostGIS.

# 20th May 2009, 9:07 pm / apis, gdata, geospatial, google-maps, google-maps-api, googlemapsdataapi, kml, postgis

Offline Processing on App Engine: a Look Ahead. A session at IO next week: “App Engine was designed to run request-driven web applications, although this will change in the coming year with the release of a number of offline computing components. In this session, we’ll explore the task queue/executor model of computation and some of the more interesting applications.”

# 20th May 2009, 12:40 pm / google, google-app-engine, io, message-queues, offlineprocessing, workers

EC2: Creating an Image. Here’s the easier way of creating your own AMI: start with a running instance in EC2, then customise it to fit your purposes and create a new bundle (and then AMI) using the ec2-bundle-vol command.

# 19th May 2009, 7:50 pm / amazon, ami, cloud-computing, ec2

HOWTO Building a self-bundling Debian AMI. Not as terrifying as you would have thought. Also contains some neat hints as to how some of the more magical parts of EC2 work (like the way your SSH public key automatically ends up in /root/.ssh/authorized_keys).

# 19th May 2009, 7:49 pm / amazon, ami, cloud-computing, debian, ec2

Critical Mac OS X Java Vulnerabilities. There’s a five month old Java arbitrary code execution vulnerability which hasn’t yet been patched by Apple. Disable Java applets in your browser until it’s fixed, or random web pages could execute commands on your machine as your user account.

# 19th May 2009, 7:07 pm / apple, applets, browsers, java, macos, security

aws—simple access to Amazon EC2 and S3. The best command line client I’ve found for EC2 and S3. “aws put --progress my-bucket-name/large-file.tar.gz large-file.tar.gz” is particularly useful for uploading large files to S3. Written in Perl (with no dependencies), shelling out to curl to do the heavy lifting.

# 19th May 2009, 11:38 am / amazon-web-services, aws, cli, curl, ec2, perl, s3, tim-kay, tools

Django tip: Caching and two-phased template rendering. Neat trick for expensive pages which can be mostly cached with the exception of the “logged in as” bit—run them through the template system twice, caching the intermediary generated template.

# 19th May 2009, 1:34 am / adrian-holovaty, caching, django, performance, python, template

resty. 58 lines of bash provides a better command-line interface to RESTful APIs, using curl under the hood. This should save me from running “man curl” several times a week.

# 18th May 2009, 1:07 pm / apis, bash, cli, curl, rest, resty

The Little Manual of API Design (PDF). A concise, highly readable guide to designing APIs that are “Complete, Easy to learn and memorize, lead to readable code, hard to misuse, and easy to extend”, based on lessons learnt over many years of development of the Qt framework.

# 18th May 2009, 10:14 am / api-design, apis, programming, qt

python-daemon (via) A library for correctly creating Unix daemon processes in Python, implementing the proposed PEP 3143 API.

# 18th May 2009, 10:12 am / daemon, python, unix

New Features for EC2: Elastic Load Balancing, Auto Scaling, and Amazon CloudWatch. EC2 now fulfils the promise of “magic scaling in the cloud” out of the box—CloudWatch monitors performance of your EC2 instances without needing to install any monitoring software, Auto Scaling allows you to configure “scaling triggers” which start up new instances based on information from CloudWatch, and Elastic Load Balancing balances requests across all available instances.

# 18th May 2009, 10:07 am / amazon, autoscaling, cloud-computing, cloudwatch, ec2, elasticloadbalancing, scaling
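
To make the “scaling trigger” idea concrete, here is a rough sketch of the kind of threshold rule such a trigger might encode. It is not the Auto Scaling API: the `Trigger` fields and the `desired_instances` function are hypothetical, and scaling back down when load drops is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    upper: float        # average CPU % above which one instance is added
    lower: float        # average CPU % below which one instance is removed
    min_instances: int  # never scale below this
    max_instances: int  # never scale above this


def desired_instances(current, cpu_samples, trigger):
    """Return the instance count the trigger asks for, given recent CPU samples."""
    average = sum(cpu_samples) / len(cpu_samples)
    if average > trigger.upper:
        target = current + 1
    elif average < trigger.lower:
        target = current - 1
    else:
        target = current
    # Clamp the result to the configured bounds.
    return max(trigger.min_instances, min(trigger.max_instances, target))
```

A monitoring loop would call `desired_instances` with the latest samples and start or stop instances until the running count matches the result.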

python-spidermonkey. A Python to JavaScript bridge using Mozilla Spidermonkey. Expose Python objects to JavaScript, or execute JavaScript from Python.

# 14th May 2009, 3:56 pm / bridge, javascript, mozilla, python, pythonspidermonkey, spidermonkey

Game Web 2.Over? Meg Pickard took the 2005 Web 2.0 logo collage and updated it to show which companies have folded and which have been bought.

# 13th May 2009, 8:20 pm / logos, meg-pickard, web20

Drug seizures: how pure is street cocaine? Neat story on the Guardian Datablog using graphs from Timetric to show that while the purity of cocaine seized by customs over the past five years has stayed constant, the purity of drugs seized by the police has been trending downwards.

# 13th May 2009, 12:34 pm / cocaine, data, drugs, guardian, stats, timetric

South’s Design. Andrew Godwin explains why South resorts to parsing your models.py file in order to construct the information it needs for creating automatic migrations.

# 13th May 2009, 12:30 pm / andrew-godwin, django, models, orm, parsing, python, south

ScenicOrNot. MySociety are crowdsourcing opinions on how “scenic” different parts of the UK are, by rating representative photos from Geograph.

# 12th May 2009, 1:32 pm / crowdsourcing, geograph, mysociety, scenicornot

Contextual (via) I’ve been trying to figure out a sane way to replace Django’s settings.py global module with something that’s designed to be reconfigured at run-time. Contextual appears to be trying to solve exactly that problem.

# 12th May 2009, 1:19 pm / contextual, django, globals, python, settings

Ask MetaFilter’s best introductory books. Part of Phil Gyford’s ongoing quest to “learn about everything”, a list of the best introductory books to a wide range of topics collated from a thread on Ask MetaFilter.

# 12th May 2009, 11:23 am / ask-metafilter, books, education, phil-gyford

[whatwg] Annotating structured data that HTML has no semantics for. Hixie’s proposal for microdata, a simplified RDFa to be included in the HTML5 spec which allows self-contained communities to invent their own microformat-style spec and use it to add structured semantics to their markup. Whether or not you like the proposal itself the explanation is a fascinating read.

# 11th May 2009, 2:41 pm / html5, ian-hickson, markup, microdata, microformats, rdf, rdfa, semantics, whatwg

disturbyte’s zenqueue. Simple, tiny and fast Python message queue server built on top of coroutines and Eventlet, using JSON over TCP as the message format. I’m impressed with how potentially useful this looks considering the small amount of code. The author benchmarks it at 28 thousand messages/second.

# 11th May 2009, 1:27 pm / coroutines, eventlet, github, json, message-queues, python, zachary-voase, zenqueue

Nose 0.11 released. My favourite Python testing tool just got some really neat new features, including the ability to parallelize tests across multiple processes (hence CPUs) using the multiprocess plugin, Xunit XML output for integration with continuous integration tools, and a --failed switch to re-run only the last batch of failed tests.

# 8th May 2009, 11:24 am / multiprocess, nose, python, testing, xunit
