11 items tagged “parsing”
Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.
The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer that turns the resulting parse tree into the final JSON.
It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example. # 13th August 2023, 9:50 pm
Building Search DSLs with Django (via) Neat tutorial by Dan Lamanna: how to build a GitHub-style search feature—supporting modifiers like “is:open author:danlamanna”—using PyParsing and the Django ORM. # 19th June 2023, 8:30 am
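The core idea, splitting a query into `key:value` modifiers plus free-text terms, can be sketched with the standard library alone. This is not the tutorial's PyParsing grammar and it does not touch the Django ORM: `parse_query` is a hypothetical helper, and `shlex` stands in for a real parser purely for illustration.

```python
import shlex


def parse_query(query):
    """Split a search string into modifier filters and free-text terms.

    Returns (filters, terms): filters maps each modifier name to a list of
    values, terms holds everything that is not a modifier.
    Raises ValueError if the query contains an unbalanced quote.
    """
    filters = {}
    terms = []
    # shlex.split keeps quoted phrases together, e.g. label:"needs review"
    for token in shlex.split(query):
        key, sep, value = token.partition(":")
        if sep and key.isalpha() and value:
            filters.setdefault(key, []).append(value)
        else:
            terms.append(token)
    return filters, terms


if __name__ == "__main__":
    print(parse_query('is:open author:danlamanna "login bug"'))
```

In a Django setting, each entry in `filters` would presumably be translated into an ORM lookup, but that mapping depends on the models involved and is left out here.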
Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks. # 18th May 2022, 5:47 am
pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML. # 14th February 2020, 4:25 pm
Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.
HTML 5 Parsing. Firefox nightlies include a new parser that implements the HTML5 parsing algorithm (disabled by default), which uses C++ code automatically generated from Henri Sivonen’s Java parser first used in the HTML5 validator. # 11th July 2009, 11:36 pm