
Simon Willison’s Weblog

40 items tagged “xml”

Introducing BERT and BERT-RPC. Justification for inventing a brand new serialisation protocol: Thrift and Protocol Buffers both use IDLs and code generation, XML “is not convertible to a simple unambiguous data structure in any language I’ve ever used” and JSON lacks support for unencoded binary data. The result is BERT—Binary ERlang Term—which extracts a format from Erlang in much the same way that JSON extracted one from JavaScript. 0 21st October 2009, 10:11 pm

minixsv (via) As far as I can tell, this is the only library that can validate XML using pure Python (no C extension required). I’d be extremely happy if someone would write a pure Python library (or one that only depends on ElementTree, which is included in the standard library) for validating XML against a Relax NG Compact syntax schema. Even DTD validation would be better than nothing! 1 12th August 2009, 4:59 pm

xmlwitch. An XML building library for Python that doesn’t suck (I love ElementTree for parsing XML, but I’ve never really liked it for generation). Makes smart use of the with statement. 1 24th July 2009, 12:33 am
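As a rough illustration of how the with statement can drive XML generation, here is a minimal sketch built only on the standard library. It is not xmlwitch's actual API: the `TinyBuilder` class and its `element` and `text_element` methods are hypothetical names invented for this example.

```python
import xml.etree.ElementTree as ET
from contextlib import contextmanager
from xml.sax.saxutils import escape, quoteattr


class TinyBuilder:
    """Hypothetical with-statement XML builder (not xmlwitch's real API)."""

    def __init__(self):
        self._parts = []
        self._depth = 0

    def _indent(self):
        return "  " * self._depth

    @contextmanager
    def element(self, name, **attrs):
        # Entering the with block writes the opening tag...
        attr_text = "".join(f" {k}={quoteattr(str(v))}" for k, v in attrs.items())
        self._parts.append(f"{self._indent()}<{name}{attr_text}>")
        self._depth += 1
        try:
            yield self
        finally:
            # ...and leaving it writes the matching closing tag.
            self._depth -= 1
            self._parts.append(f"{self._indent()}</{name}>")

    def text_element(self, name, text):
        # Leaf element with escaped text content.
        self._parts.append(f"{self._indent()}<{name}>{escape(text)}</{name}>")

    def __str__(self):
        return "\n".join(self._parts)


doc = TinyBuilder()
with doc.element("feed", xmlns="http://www.w3.org/2005/Atom"):
    doc.text_element("title", "Example & feed")
    with doc.element("entry"):
        doc.text_element("id", "urn:example:1")

print(doc)

# Round-trip through ElementTree to confirm the output is well formed.
root = ET.fromstring(str(doc))
```

The nesting of the with blocks mirrors the nesting of the elements, so a closing tag cannot be forgotten or mismatched.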

With YQL Execute, the Internet becomes your database. This is nuts (in a good way). Yahoo!’s intriguing universal SQL-style XML/JSONP web service interface now supports JavaScript as a kind of stored procedure language, meaning you can use JavaScript and E4X to screen-scrape web pages, then query the results with YQL. 1 29th April 2009, 10:50 pm

A few notes on the Guardian Open Platform

This morning we launched the Guardian Open Platform at a well attended event in our new offices in Kings Place. This is one of the main projects I’ve been helping out with since joining the Guardian last year, and it’s fantastic to finally have it out in the open. [... 839 words]

JsonML (JSON Markup Language). An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it. 4 10th February 2009, 3:03 pm
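To make the size trade-off concrete, here is a small sketch that turns an ElementTree element into the (element-name, attribute-dictionary, list-of-children) shape described above. It follows that description rather than the JsonML specification itself; the `to_triple` function and the sample markup are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET


def to_triple(element):
    """Convert an Element to [name, attributes, children] (illustrative only)."""
    children = []
    if element.text and element.text.strip():
        children.append(element.text)
    for child in element:
        children.append(to_triple(child))
        # child.tail (text sitting between sibling elements) is dropped here,
        # which is an assumption about what "ignored" means in practice.
    return [element.tag, dict(element.attrib), children]


source = '<ul class="nav"><li>Home</li><li>About</li></ul>'
encoded = json.dumps(to_triple(ET.fromstring(source)), separators=(",", ":"))

print(encoded)
print(len(source), "characters of XML vs", len(encoded), "characters of JSON")
```

Every element always carries a child list, even when it has one child or none, which is what keeps the structure predictable. The empty `{}` for attribute-free elements is part of what makes the JSON longer than the XML in this case.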

Crowbar. Headless Gecko/XULRunner which exposes a web service API for screen scraping using a real browser DOM—just pass it the URL of a page and the URL of a screen scraping JavaScript script (a bit like a Greasemonkey user script) and get back RDF/XML. 2 24th January 2009, 11:52 pm

How to install lxml python module on mac os 10.5 (leopard). Instructions that work! Finally, I can find out what all the fuss is about. 1 15th December 2008, 12:05 am

pyquery. “A jQuery-like library for Python”—implemented on top of lxml, providing jQuery style methods for manipulating an HTML or XML document. 1 6th December 2008, 9:53 am

Magnificent Seven—the value of Atom. The seven core things that Atom solves so that you don’t have to. 0 19th October 2008, 10:24 pm

cascadenik: cascading sheets of style for mapnik. Great idea. Mapnik (the open source tile rendering system used by OpenStreetMap and others) has a complex style configuration based on XML. Michal Migurski has built a CSS-style equivalent which compiles down to XML, hopefully making it much quicker and easier to get started with Mapnik customisation. 1 30th August 2008, 10:04 am

Tip: Configure SAX parsers for secure processing. Explains the billion laughs attack, among others. 0 23rd August 2008, 11:12 am

DoS vulnerability in REXML. Ruby’s REXML library is susceptible to the “billion laughs” denial of service attack where recursively nested entities expand a single entity reference to a billion characters (kind of like the exploding zip file attack). Rails applications that process user-supplied XML should apply the monkey-patch ASAP; a proper gem update is forthcoming. 2 23rd August 2008, 11:11 am
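The arithmetic behind the attack is easy to check without ever parsing a hostile document. The sketch below only builds the entity declarations as a string and computes how long the expansion would be. The function names and the commonly cited shape (a short base string, nine levels, ten references per level) are assumptions for illustration, not details taken from the REXML advisory.

```python
def expanded_length(base_text, levels, fanout):
    """Length in characters of the top-level entity once fully expanded."""
    length = len(base_text)
    for _ in range(levels):
        # Each level references the previous entity `fanout` times.
        length *= fanout
    return length


def build_doctype(base_text, levels, fanout):
    """Build the nested entity declarations as text (never parsed here)."""
    lines = [f'<!ENTITY e0 "{base_text}">']
    for level in range(1, levels + 1):
        refs = f"&e{level - 1};" * fanout
        lines.append(f'<!ENTITY e{level} "{refs}">')
    return "<!DOCTYPE lolz [\n" + "\n".join(lines) + "\n]>"


doctype = build_doctype("lol", levels=9, fanout=10)
print(len(doctype), "characters of declarations")
print(expanded_length("lol", levels=9, fanout=10), "characters after expansion")
```

A few hundred characters of input are enough to demand gigabytes of output, which is why parsers that handle untrusted XML need entity expansion limits or need to disable DTD processing entirely.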

My Universal Feed Parser was conceived as a weapon against what I considered the gravest error of XML: draconian error handling. Recently, someone asked me to implement a switch that makes it not fall back on lax parsing in the case of an XML wellformedness error. I said no, not because it would be difficult to implement, but because that defeats its entire reason for being.

Mark Pilgrim 0 5th August 2008, 10:52 pm

PDFMiner. Useful looking PDF parsing library in Python—can produce an XML representation of the text and style information in a PDF document. 0 3rd August 2008, 3:29 pm

Protocol Buffers: Google’s Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML. 10 8th July 2008, 8:20 am

XML is better if you have more text and fewer tags. And JSON is better if you have more tags and less text. Argh! I mean, come on, it’s that easy. But you know, there’s a big debate about it.

Steve Yegge 1 15th June 2008, 6:09 pm

Draconian failure on error is not the answer to the problems of Postel’s law. Draconian error handling creates an unstable equilibrium in Game Theory terms: it only lasts until one player breaks the rule. One non-Draconian XML5 implementation in a key client product and the Draconian XML ranks would break. Well-specified error recovery is the right way to implement the liberal part of Postel’s law.

Henri Sivonen 0 20th March 2008, 2:43 pm

CouchDB, XML, and E4X. Brilliant—CouchDB now enables SpiderMonkey’s E4X support, meaning CouchDB views can easily query XML documents stored inside JSON objects using E4X syntax. 0 5th March 2008, 12:31 am

PrinceXML is extremely impressive. I had a poke at Prince (a commercial package for generating high quality PDFs from HTML, XML, CSS and SVG) a few weeks ago and was similarly impressed. 0 8th February 2008, 12:02 pm

Cross-Site XMLHttpRequest (via) “Firefox 3 implements the W3C Access Control working draft, which gives you the ability to do XMLHttpRequests to other web sites”—you can mark a document as available for cross-domain requests using either an Access-Control HTTP header or an XML processing instruction. 0 9th January 2008, 11:57 pm

PostgreSQL 8.3 beta 4 release notes. In addition to the huge speed improvements, 8.3 adds support for XML, UUID and ENUM data types and brings full text search (tsearch2) into the core database engine. 0 12th December 2007, 12:43 am

[Release] CouchDB 0.7.0. This is a huge milestone for the project—it’s the first official release to include the JSON REST API instead of XML, and it’s also the first release that is “intended for widespread use”. 0 17th November 2007, 12:25 am

The larger question is why on earth, in 2007 and ten years after XML came out, we are still using text files that don’t label their encoding?

Rick Jelliffe 2 8th October 2007, 12:27 pm

Atom Models. Building Python classes that act as utility wrappers around data stored in an lxml DOM object. 0 7th August 2007, 4:02 pm

=drummond XRDS. Bookmarked so I can remember how to easily resolve someone’s i-name. 0 8th May 2007, 8:27 pm

Introduction and Yahoo! Pipes. The official Google Maps API blog describes how to plot KML output from Yahoo! Pipes. 0 3rd May 2007, 10 pm

XML and JSON. James Clark on JSON’s strengths and weaknesses compared to XML. 0 9th April 2007, 8:57 pm

A binary compatible wire call is still a binary compatible wire call, no matter how much XML you put on it.

Bill de hÓra 0 23rd March 2007, 12:56 am

Highrise Forum: Using the undocumented API. Add .xml to the end of many URLs in Highrise to get an XML representation of that page. 3 19th March 2007, 11:29 pm

json-taglib. Because JSON just doesn’t have enough angle brackets. 3 4th March 2007, 8:52 pm

Introducing RDFa. A way of representing RDF triples in XML that doesn’t suck. 0 15th February 2007, 12:22 am

XForms in Firefox (via) Practical tutorial on taking advantage of the Firefox XForms plugin, sadly not yet bundled with the browser itself. 0 26th January 2007, 9:59 am

Which is the real explanation of where the name XMLHTTP comes from- the thing is mostly about HTTP and doesn’t have any specific tie to XML other than that was the easiest excuse for shipping it so I needed to cram XML into the name (plus- XML was the hot technology at the time and it seemed like some good marketing for the component).

Alex Hopmann 1 24th January 2007, 8:48 pm

Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML. 0 13th January 2007, 1:16 am

Seems easy to me; if you want to serialize a data structure that’s not too text-heavy and all you want is for the receiver to get the same data structure with minimal effort, and you trust the other end to get the i18n right, JSON is hunky-dory.

Tim Bray 3 22nd December 2006, 12:47 am

Why JSON isn’t just for JavaScript

Dave Winer’s discovery of JSON (and shock that “it’s not even XML”) has triggered an interesting discussion thread, on his blog and elsewhere. Plenty of people have re-assured him (and themselves) that it’s only used for JavaScript—it’s convenient in the browser but irrelevant elsewhere. [... 787 words]

I read on Niall Kennedy that del.icio.us has come up with an API that returns a JSON structure, and I figured, sheez it can’t be that hard to parse, so let’s see what it looks like, and damn, IT’S NOT EVEN XML! [...] Who did this travesty? Let’s find a tree and string them up. Now.

Dave Winer 8 20th December 2006, 7:21 pm

Parsing XML can open network sockets (via) Yikes. Something to bear in mind. 0 18th August 2006, 2:27 pm

XML security on SitePoint

Getting Started with XML Security is a SitePoint article of epic proportions. I had never really looked at any of the XML security applications but this article appears to cover the lot. [... 33 words]
