51 items tagged “xml”
I’ve been using Evernote for over a decade, and I’ve long wanted to export my data from it so I can do interesting things with it.[... 1879 words]
xml-analyser. In building evernote-to-sqlite I dusted off an ancient (2009) project I built that scans through an XML file and provides a summary of what elements are present in the document and how they relate to each other. I’ve now packaged it up as a CLI app and published it on PyPI. # 12th October 2020, 12:41 am
Using memory-profiler to debug excessive memory usage in healthkit-to-sqlite. This morning I figured out how to use the memory-profiler module (and mprof command line tool) to debug memory usage of Python processes. I added the details, including screenshots, to this GitHub issue. It helped me knock down RAM usage for my healthkit-to-sqlite from 2.5GB to just 80MB by making smarter usage of the ElementTree pull parser. # 24th July 2019, 8:25 am
Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database. # 14th June 2019, 12:45 am
Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.[... 1167 words]
Has JSON pretty much replaced XML for string processing for the web, or are there use cases where XML is still necessary?
It’s replaced XML as the default format for most APIs. XML is still necessary for Atom/RSS feeds and other existing standards built on top of XML. It’s also a better choice than JSON for markup-style data—stuff like XHTML where tags are applied to sequences of characters within larger chunks of text.[... 81 words]
It sounds like you’re pretty much screwed already, if you’re dealing with companies that still think FTPing XML around is a sensible thing to do.[... 364 words]
Not a lot. XHTML transitional lets you use a few presentational attributes and elements that aren’t available in XHTML strict. Here’s a more detailed overview from back in 2005: http://24ways.org/2005/transitio...[... 59 words]
minixsv (via) As far as I can tell, this is the only library that can validate XML using pure Python (no C extension required). I’d be extremely happy if someone would write a pure Python library (or one that only depends on ElementTree, which is included in the standard library) for validating XML against a Relax NG Compact syntax schema. Even DTD validation would be better than nothing! # 12th August 2009, 4:59 pm
xmlwitch. An XML building library for Python that doesn’t suck (I love ElementTree for parsing XML, but I’ve never really liked it for generation). Makes smart use of the with statement. # 24th July 2009, 12:33 am
This morning we launched the Guardian Open Platform at a well attended event in our new offices in Kings Place. This is one of the main projects I’ve been helping out with since joining the Guardian last year, and it’s fantastic to finally have it out in the open.[... 839 words]
JsonML (JSON Markup Language). An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it. # 10th February 2009, 3:03 pm
How to install lxml python module on mac os 10.5 (leopard). Instructions that work! Finally, I can find out what all the fuss is about. # 15th December 2008, 12:05 am
cascadenik: cascading sheets of style for mapnik. Great idea. Mapnik (the open source tile rendering system used by OpenStreetMap and others) has a complex style configuration based on XML. Michal Migurski has build a CSS-style equivalent which compiles down to XML, hopefully making it much quicker and easier to get started with Mapnik customisation. # 30th August 2008, 10:04 am
DoS vulnerability in REXML. Ruby’s REXML library is susceptible to the “billion laughs” denial of service attack where recursively nested entities expand a single entitity reference to a billion characters (kind of like the exploding zip file attack). Rails applications that process user-supplied XML should apply the monkey-patch ASAP; a proper gem update is forthcoming. # 23rd August 2008, 11:11 am
My Universal Feed Parser was conceived as a weapon against what I considered the gravest error of XML: draconian error handling. Recently, someone asked me to implement a switch that makes it not fall back on lax parsing in the case of an XML wellformedness error. I said no, not because it would be difficult to implement, but because that defeats its entire reason for being.
Protocol Buffers: Google’s Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML. # 8th July 2008, 8:20 am
XML is better if you have more text and fewer tags. And JSON is better if you have more tags and less text. Argh! I mean, come on, it’s that easy. But you know, there’s a big debate about it.
Draconian failure on error is not the answer problems of Postel’s law. Draconian error handling creates an unstable equilibrium in Game Theory terms —it only lasts until one player breaks the rule. One non-Draconian XML5 implementation in key client product and the Draconian XML ranks would break. Well-specified error recovery is the right way to implement the liberal part of Postel’s law.
CouchDB, XML, and E4X. Brilliant—CouchDB now enables SpiderMonkey’s E4X support, meaning CouchDB views can easily query XML documents stored inside JSON objects using E4X syntax. # 5th March 2008, 12:31 am
PrinceXML is extremely impressive. I had a poke at Prince (a commercial package for generating high quality PDFs from HTML, XML, CSS and SVG) a few weeks ago and was similarly impressed. # 8th February 2008, 12:02 pm