Feed Sign in with OpenID OpenID

Simon Willison’s Weblog

7 items tagged “mapreduce”

BashReduce. Map/Reduce in Bash is no longer a joke project (if it ever was)—Richard Crowley is extending it and using it for analysis at OpenDNS. 0 28th June 2009, 3:03 pm

Finding similar items with Amazon Elastic MapReduce, Python, and Hadoop streaming. Tutorial for running Hadoop jobs on Elastic MapReduce using Python and the 2005 Audioscrobbler dataset. 0 7th April 2009, 9:19 am

Amazon Elastic MapReduce (via) Hadoop as a service. Basically a web based GUI around Hadoop—you could roll this yourself on EC2 but for a small markup on regular EC2 prices you get to avoid the extra work setting everything up. Data processing scripts can be written in Java, Ruby, Perl, Python, PHP, R, or C++ and are loaded in to S3 before firing off the job. 0 2nd April 2009, 10:25 am

Cascading. A Java API abstraction layer over Hadoop that lets developers think in terms of pipes and filters rather than map/reduce. The Cascading developers claim that this model is easier to understand and less error prone. 0 1st October 2008, 1:22 pm

Python + Hadoop = Flying Circus Elephant. Last.fm have released Dumbo, a Python module that lets you easily write Hadoop map/reduce tasks using Python and generators. 0 31st May 2008, 2:14 pm

Writing An Hadoop MapReduce Program In Python. Hadoop (the open source map/reduce framework) can interact with any program that reads from stdin and outputs on stdout—so it’s trivial to drop in Python scripts for the map and reduce steps. 0 9th October 2007, 11:33 am

CouchDB: Thinking beyond the RDBMS. CouchDB is a fascinating project—an Erlang powered non-relational database with a JSON API that lets you define “views” (really computed tables) based on JavaScript functions that execute using map/reduce. Damien Katz, the main developer currently works for MySQL and used to work on Lotus Notes. 1 3rd September 2007, 9:48 am

A django site