Posts tagged python, xml

Filters: python × xml × Sorted by date

19 results

My First Open Source AI Generated Library (via) Armin Ronacher had Claude and Claude Code do almost all of the work in building, testing, packaging and publishing a new Python library based on his design:

It wrote ~1100 lines of code for the parser

It wrote ~1000 lines of tests

It configured the entire Python package, CI, PyPI publishing

Generated a README, drafted a changelog, designed a logo, made it theme-aware

Did multiple refactorings to make me happier

The project? sloppy-xml-py, a lax XML parser (and violation of everything the XML Working Group hold sacred) which ironically is necessary because LLMs themselves frequently output "XML" that includes validation errors.

Claude's SVG logo design is actually pretty decent, turns out it can draw more than just bad pelicans!

Hand drawn style, orange rough rectangly containing < { s } > - then the text Sloppy XML below in black

I think experiments like this are a really valuable way to explore the capabilities of these models. Armin's conclusion:

This was an experiment to see how far I could get with minimal manual effort, and to unstick myself from an annoying blocker. The result is good enough for my immediate use case and I also felt good enough to publish it to PyPI in case someone else has the same problem.

Treat it as a curious side project which says more about what's possible today than what's necessarily advisable.

I'd like to present a slightly different conclusion here. The most interesting thing about this project is that the code is good.

My criteria for good code these days is the following:

Solves a defined problem, well enough that I'm not tempted to solve it in a different way
Uses minimal dependencies
Clear and easy to understand
Well tested, with tests prove that the code does what it's meant to do
Comprehensive documentation
Packaged and published in a way that makes it convenient for me to use
Designed to be easy to maintain and make changes in the future

sloppy-xml-py fits all of those criteria. It's useful, well defined, the code is readable with just about the right level of comments, everything is tested, the documentation explains everything I need to know, and it's been shipped to PyPI.

I'd be proud to have written this myself.

This example is not an argument for replacing programmers with LLMs. The code is good because Armin is an expert programmer who stayed in full control throughout the process. As I wrote the other day, a skilled individual with both deep domain understanding and deep understanding of the capabilities of the agent.

# 21st June 2025, 11:22 pm / armin-ronacher, open-source, python, xml, ai, generative-ai, llms, ai-assisted-programming, claude, claude-code

Anthropic’s Prompt Engineering Interactive Tutorial (via) Anthropic continue their trend of offering the best documentation of any of the leading LLM vendors. This tutorial is delivered as a set of Jupyter notebooks - I used it as an excuse to try uvx like this:

git clone https://github.com/anthropics/courses
uvx --from jupyter-core jupyter notebook courses

This installed a working Jupyter system, started the server and launched my browser within a few seconds.

The first few chapters are pretty basic, demonstrating simple prompts run through the Anthropic API. I used %pip install anthropic instead of !pip install anthropic to make sure the package was installed in the correct virtual environment, then filed an issue and a PR.

One new-to-me trick: in the first chapter the tutorial suggests running this:

API_KEY = "your_api_key_here"
%store API_KEY

This stashes your Anthropic API key in the IPython store. In subsequent notebooks you can restore the API_KEY variable like this:

%store -r API_KEY

I poked around and on macOS those variables are stored in files of the same name in ~/.ipython/profile_default/db/autorestore.

Chapter 4: Separating Data and Instructions included some interesting notes on Claude's support for content wrapped in XML-tag-style delimiters:

Note: While Claude can recognize and work with a wide range of separators and delimeters, we recommend that you use specifically XML tags as separators for Claude, as Claude was trained specifically to recognize XML tags as a prompt organizing mechanism. Outside of function calling, there are no special sauce XML tags that Claude has been trained on that you should use to maximally boost your performance. We have purposefully made Claude very malleable and customizable this way.

Plus this note on the importance of avoiding typos, with a nod back to the problem of sandbagging where models match their intelligence and tone to that of their prompts:

This is an important lesson about prompting: small details matter! It's always worth it to scrub your prompts for typos and grammatical errors. Claude is sensitive to patterns (in its early years, before finetuning, it was a raw text-prediction tool), and it's more likely to make mistakes when you make mistakes, smarter when you sound smart, sillier when you sound silly, and so on.

Chapter 5: Formatting Output and Speaking for Claude includes notes on one of Claude's most interesting features: prefill, where you can tell it how to start its response:

client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=100,
    messages=[
        {"role": "user", "content": "JSON facts about cats"},
        {"role": "assistant", "content": "{"}
    ]
)

Things start to get really interesting in Chapter 6: Precognition (Thinking Step by Step), which suggests using XML tags to help the model consider different arguments prior to generating a final answer:

Is this review sentiment positive or negative? First, write the best arguments for each side in <positive-argument> and <negative-argument> XML tags, then answer.

The tags make it easy to strip out the "thinking out loud" portions of the response.

It also warns about Claude's sensitivity to ordering. If you give Claude two options (e.g. for sentiment analysis):

In most situations (but not all, confusingly enough), Claude is more likely to choose the second of two options, possibly because in its training data from the web, second options were more likely to be correct.

This effect can be reduced using the thinking out loud / brainstorming prompting techniques.

A related tip is proposed in Chapter 8: Avoiding Hallucinations:

How do we fix this? Well, a great way to reduce hallucinations on long documents is to make Claude gather evidence first.

In this case, we tell Claude to first extract relevant quotes, then base its answer on those quotes. Telling Claude to do so here makes it correctly notice that the quote does not answer the question.

I really like the example prompt they provide here, for answering complex questions against a long document:

<question>What was Matterport's subscriber base on the precise date of May 31, 2020?</question>

Please read the below document. Then, in <scratchpad> tags, pull the most relevant quote from the document and consider whether it answers the user's question or whether it lacks sufficient detail. Then write a brief numerical answer in <answer> tags.

# 30th August 2024, 2:52 am / python, xml, ai, jupyter, prompt-engineering, generative-ai, llms, anthropic, claude, uv

Using memory-profiler to debug excessive memory usage in healthkit-to-sqlite. This morning I figured out how to use the memory-profiler module (and mprof command line tool) to debug memory usage of Python processes. I added the details, including screenshots, to this GitHub issue. It helped me knock down RAM usage for my healthkit-to-sqlite from 2.5GB to just 80MB by making smarter usage of the ElementTree pull parser.

# 24th July 2019, 8:25 am / elementtree, memory, profiling, python, xml

minixsv (via) As far as I can tell, this is the only library that can validate XML using pure Python (no C extension required). I’d be extremely happy if someone would write a pure Python library (or one that only depends on ElementTree, which is included in the standard library) for validating XML against a Relax NG Compact syntax schema. Even DTD validation would be better than nothing!

# 12th August 2009, 4:59 pm / elementtree, minixsv, python, relaxng, validation, xml, xmlschema

xmlwitch. An XML building library for Python that doesn’t suck (I love ElementTree for parsing XML, but I’ve never really liked it for generation). Makes smart use of the with statement.

# 24th July 2009, 12:33 am / python, withstatement, xml, xmlwitch

A few notes on the Guardian Open Platform

This morning we launched the Guardian Open Platform at a well attended event in our new offices in Kings Place. This is one of the main projects I’ve been helping out with since joining the Guardian last year, and it’s fantastic to finally have it out in the open.

[... 839 words]

2:28 pm / 10th March 2009 / apis, atom, contentapi, data, data-journalism, datastore, guardian, javascript, journalism, jquery, json, openplatform, python, simon-rogers, tom-watson, xml

How to install lxml python module on mac os 10.5 (leopard). Instructions that work! Finally, I can find out what all the fuss is about.

# 15th December 2008, 12:05 am / leopard, libxml2, lxml, osx, python, xml

pyquery. “A jQuery-like library for Python”—implemented on top of lxml, providing jQuery style methods for manipulating an HTML or XML document.

# 6th December 2008, 9:53 am / jquery, lxml, pyquery, python, xml

My Universal Feed Parser was conceived as a weapon against what I considered the gravest error of XML: draconian error handling. Recently, someone asked me to implement a switch that makes it not fall back on lax parsing in the case of an XML wellformedness error. I said no, not because it would be difficult to implement, but because that defeats its entire reason for being.

— Mark Pilgrim

# 5th August 2008, 10:52 pm / draconian, feeds, mark-pilgrim, python, universalfeedparser, wellformedness, xml

PDFMiner. Useful looking PDF parsing library in Python—can produce an XML representation of the text and style information in a PDF document.

# 3rd August 2008, 3:29 pm / pdf, pdfminer, python, scraping, xml

Protocol Buffers: Google’s Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML.

# 8th July 2008, 8:20 am / c-plus-plus, google, idf, java, open-source, protocolbuffers, python, xml

Atom Models. Building Python classes that act as utility wrappers around data stored in an lxml DOM object.

# 7th August 2007, 4:02 pm / atom, dom, ian-bicking, lxml, python, xml

Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML.

# 13th January 2007, 1:16 am / json, lucene, python, ruby, search, solr, webservice, xml

lxml (via) A Pythonic wrapper for libxml2.

# 13th April 2005, 5:04 pm / libxml2, python, xml

PHP 5 Release Candidate 1

I haven’t blogged much about PHP in a while because I’ve been up to my nose in mod_python and loving every minute of it. This news is just too important to miss: PHP 5 Release Candidate 1 has been released, bringing the first production-ready release tantilisingly close. While I doubt PHP 5 will tempt me back it’s definitely an exciting upgrade—my biggest complaint with PHP 4 is the brain-dead object model which defaults to copying whole objects rather than passing references, and this is one of the many things addressed by PHP 5. The new libxml2 powered XML features sound really powerful, and SQLite as an on-board database should be ideal for knocking out small stand-alone applications without needing to set up a mySQL database for them.

[... 173 words]

1:27 am / 19th March 2004 / libxml2, php, python, sqlite, xml

XML.com: Lightweight XML Search Servers [Jan. 21, 2004] (via) More fun with Python and libxml2

# 26th January 2004, 2:51 pm / libxml2, python, xml

Ned Batchelder: handyxml. Yet another XML object wrapper for Python, this time with full DOM method support included

# 26th January 2004, 2:52 am / ned-batchelder, python, xml

Using XPath to mine XHTML

This morning, I finally decided to install libxml2 and see what all the fuss was about, in particular with respect to XPath. What followed is best described as an enlightening experience.

[... 576 words]

5:31 am / 21st October 2003 / libxml2, python, xhtml, xml, xpath

xmlhack news wire

xmlhack’s “Editor’s Newswire” is interesting. It is a small column (explained here) located on the right hand side of the site that displays the latest XML news snippets “in real time”. The interesting part is how the section is updated—an IRC bot (The Daily Chump Bot, written in Python) monitors a channel for specific commands from authorised users, and produces an XML file of new snippets. Site updates through IRC (or instant messenging services such as MSN or Jabber) is a concept which we could see a lot more of, especially in this age of web services.

12:50 pm / 10th July 2002 / python, xml

Simon Willison’s Weblog