Simon Willison’s Weblog

Items tagged xml in 2008


How to install lxml python module on mac os 10.5 (leopard). Instructions that work! Finally, I can find out what all the fuss is about. # 15th December 2008, 12:05 am

pyquery. “A jQuery-like library for Python”—implemented on top of lxml, providing jQuery style methods for manipulating an HTML or XML document. # 6th December 2008, 9:53 am

Magnificent Seven—the value of Atom. The seven core things that Atom solves so that you don’t have to. # 19th October 2008, 10:24 pm

cascadenik: cascading sheets of style for mapnik. Great idea. Mapnik (the open source tile rendering system used by OpenStreetMap and others) has a complex style configuration based on XML. Michal Migurski has built a CSS-style equivalent which compiles down to XML, hopefully making it much quicker and easier to get started with Mapnik customisation. # 30th August 2008, 10:04 am

Tip: Configure SAX parsers for secure processing. Explains the billion laughs attack, among others. # 23rd August 2008, 11:12 am
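One way to picture "secure processing" is a parser that refuses any document declaring its own entities, since entity declarations are what attacks like billion laughs depend on. The following is a minimal Python sketch using the standard library's expat bindings, not the Java SAX configuration the linked tip describes; the names `EntitiesForbidden` and `parse_without_entities` are hypothetical and exist only for illustration.

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares an entity (hypothetical name)."""


def parse_without_entities(xml_text: str) -> list[str]:
    """Parse xml_text and return the element names seen, in order.

    Any entity declaration in the DTD aborts the parse, so nested
    entity expansion never gets a chance to run.
    """
    names: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        # Reject the document as soon as the first entity is declared.
        raise EntitiesForbidden(f"entity declaration not allowed: {name}")

    def on_start_element(name, attrs):
        names.append(name)

    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names
```

A plain document such as `<a><b/></a>` parses normally, while anything carrying an `<!ENTITY ...>` declaration raises before expansion happens. This is deliberately strict: it also rejects harmless entity declarations, which may or may not be acceptable for a given application.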

DoS vulnerability in REXML. Ruby’s REXML library is susceptible to the “billion laughs” denial of service attack where recursively nested entities expand a single entity reference to a billion characters (kind of like the exploding zip file attack). Rails applications that process user-supplied XML should apply the monkey-patch ASAP; a proper gem update is forthcoming. # 23rd August 2008, 11:11 am
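The expansion is easy to see at a harmless scale. The sketch below is in Python rather than Ruby and says nothing about REXML's internals. It builds a tiny document with three levels of nested entities, each referencing the previous one ten times, and parses it with the standard library's `xml.etree.ElementTree`. The helper name `build_nested_entity_doc` and the entity names `e0`..`e3` are made up for this example.

```python
import xml.etree.ElementTree as ET


def build_nested_entity_doc(levels: int, fanout: int) -> str:
    """Build an XML document whose entity e{levels} expands to
    fanout ** levels copies of the string "lol"."""
    decls = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE root [\n  {dtd}\n]>\n<root>&e{levels};</root>"


# Deliberately small: 10 ** 3 = 1000 copies of "lol".
doc = build_nested_entity_doc(levels=3, fanout=10)
root = ET.fromstring(doc)
expanded_len = len(root.text)

print(len(doc), "characters of input")
print(expanded_len, "characters after entity expansion")  # 3000
```

Each extra level multiplies the output by the fanout while adding only one short line to the input, which is why nine levels of ten references reach a billion copies. Keep `levels` small when experimenting.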

My Universal Feed Parser was conceived as a weapon against what I considered the gravest error of XML: draconian error handling. Recently, someone asked me to implement a switch that makes it not fall back on lax parsing in the case of an XML wellformedness error. I said no, not because it would be difficult to implement, but because that defeats its entire reason for being.

Mark Pilgrim # 5th August 2008, 10:52 pm

PDFMiner. Useful looking PDF parsing library in Python—can produce an XML representation of the text and style information in a PDF document. # 3rd August 2008, 3:29 pm

Protocol Buffers: Google’s Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML. # 8th July 2008, 8:20 am

XML is better if you have more text and fewer tags. And JSON is better if you have more tags and less text. Argh! I mean, come on, it’s that easy. But you know, there’s a big debate about it.

Steve Yegge # 15th June 2008, 6:09 pm

Draconian failure on error is not the answer to the problems of Postel’s law. Draconian error handling creates an unstable equilibrium in Game Theory terms: it only lasts until one player breaks the rule. One non-Draconian XML5 implementation in a key client product and the Draconian XML ranks would break. Well-specified error recovery is the right way to implement the liberal part of Postel’s law.

Henri Sivonen # 20th March 2008, 2:43 pm

CouchDB, XML, and E4X. Brilliant—CouchDB now enables SpiderMonkey’s E4X support, meaning CouchDB views can easily query XML documents stored inside JSON objects using E4X syntax. # 5th March 2008, 12:31 am

PrinceXML is extremely impressive. I had a poke at Prince (a commercial package for generating high quality PDFs from HTML, XML, CSS and SVG) a few weeks ago and was similarly impressed. # 8th February 2008, 12:02 pm

Cross-Site XMLHttpRequest (via) “Firefox 3 implements the W3C Access Control working draft, which gives you the ability to do XMLHttpRequests to other web sites”—you can mark a document as available for cross-domain requests using either an Access-Control HTTP header or an XML processing instruction. # 9th January 2008, 11:57 pm
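As a rough illustration of the header approach, here is a minimal WSGI sketch in Python that marks an XML response as readable from any origin. It assumes the header name that was later standardized, `Access-Control-Allow-Origin`, which may differ from the working-draft syntax Firefox 3 implemented at the time. The app name `cors_app` and the response body are hypothetical.

```python
def cors_app(environ, start_response):
    """A tiny WSGI app serving an XML document with a cross-origin header."""
    body = b"<greeting>hello</greeting>"
    headers = [
        ("Content-Type", "application/xml"),
        ("Content-Length", str(len(body))),
        # Assumption: the later standardized header name, with "*"
        # meaning any origin may read the response.
        ("Access-Control-Allow-Origin", "*"),
    ]
    start_response("200 OK", headers)
    return [body]
```

It can be served locally with `wsgiref.simple_server.make_server("", 8000, cors_app).serve_forever()`. In practice you would usually name specific origins rather than `*` for anything that is not fully public.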