Research
Filters: Sorted by date
67 results
«« first
« previous
page 2 / 3
next »
last »»
Research
bs4-justhtml-investigation
— BeautifulSoup 4 can be integrated with JustHTML, a pure Python HTML5 parser, enabling full compliance with the HTML5 parsing algorithm according to the WHATWG specification. By implementing a custom `JustHTMLTreeBuilder`, BeautifulSoup’s parser plugin system can leverage JustHTML for parsing, allowing seamless use of BeautifulSoup’s familiar API and features—like `find_all()` and CSS selectors—while inheriting robust, standards-adherent HTML handling.
Research
litestream-restarts
— Experiments in this project evaluate Litestream’s robustness when SQLite writes occur while Litestream is stopped and later restarted, with focus on replication to S3. Both the simple restart and the scenario where the WAL is checkpointed (truncated) while Litestream is offline confirm no data loss: Litestream either streams pending WAL changes upon restart or detects a database change and uploads a new full snapshot (“generation”).
Research
streaming-file-upload-prototype
— Demonstrating efficient large file uploads, this prototype integrates the streaming-form-data library with a Starlette-based ASGI server to enable true streaming of multipart file data directly to disk, bypassing memory bottlenecks. It incrementally parses incoming form data and supports checksum calculation on-the-fly, handling multiple simultaneous file uploads via async workflows.
Research
vite-wasm-browser-compiler
— Investigating the feasibility of Vite as a browser-based bundler, this project demonstrates that while Vite itself cannot operate directly in the browser due to its Node.js dependencies, client-side file bundling is achievable using alternative strategies. Three approaches were prototyped: a pure JavaScript "simple" bundler for inlining assets, an esbuild-wasm browser integration for ES module support, and full Vite bundling via StackBlitz WebContainers using vite-plugin-singlefile.
Research
js-api-tagger
— Efficiently categorizing the 155 HTML tools in simonw/tools by their JavaScript API usage, this project developed an automated pipeline combining Cheerio for HTML parsing and Acorn for JavaScript AST analysis. The solution robustly filters out false positives from comments, strings, and non-code regions, accurately tagging over 60 Web APIs and handling modern ES modules and edge script types.
Research
ast-grep-import-rewriter
— Leveraging ast-grep and custom YAML rules, the AST-Grep Import Rewriter offers a structured approach to automatically extract, analyze, and rewrite obfuscated JavaScript import statements across ES6, CommonJS, dynamic imports, and webpack bundles. By parsing source files, it generates mapping templates and applies user-defined mappings, converting unreadable module paths into meaningful names with either regex- or AST-based transformations.
Research
offline-notes-sync
— Building on offline-first principles, this notes sync system enables robust note creation and editing without active internet connectivity, using IndexedDB and service workers on the client side. It employs operation-based sync and vector clocks for fine-grained conflict detection and resolution, and features a three-way character-level merge algorithm inspired by Apple Notes.
Research
epsilon-python-wrapper
— Epsilon Python Wrapper provides seamless Python bindings to Epsilon, Google's pure Go WebAssembly 2.0 runtime, enabling efficient and dependency-free WASM execution within Python projects. The wrapper exposes a simple API for module instantiation, function calls (with type safety), memory operations, and export inspection, supporting advanced features like SIMD and resource limiting.
Research
datasette-lite-js-init
— Datasette-lite faces a core limitation: HTML content injected via `innerHTML` does not execute embedded JavaScript, breaking interactive features and plugin functionality. The proposed solution introduces a standardized initialization event (`datasette_init`) triggered after each content update, allowing dependent scripts and plugins to reinitialize reliably.
Research
datasette-lite-npm-package
— Converting Datasette Lite into a self-hostable NPM package enables seamless client-side data exploration using SQLite, CSV, JSON, and Parquet files directly in the browser, powered by Pyodide. The project removes analytics, adds a CLI server for local testing, and exposes all necessary static assets for easy deployment to platforms like GitHub Pages, Netlify, or Vercel.
Research
sqlite-ripgrep-function
— SQLite Ripgrep Function enables fast code and text search inside SQLite queries by integrating the powerful ripgrep search tool as a custom SQL function. It offers both a pure Python implementation and a performant C extension, allowing users to search files within a configurable directory, restrict output with glob patterns (e.g., `*.py`), and enforce time limits to avoid runaway queries.
Research
apptron-analysis
— Apptron is a browser-based cloud IDE that hosts a full x86 Linux environment using emulation and WebAssembly, delivering a seamless developer experience directly in the browser. By tightly integrating VS Code, a Linux terminal, and persistent cloud storage via Cloudflare R2, users are able to work on customizable environments without any local setup.
Research
github-cli-api-proxy
— Proxying GitHub CLI (`gh`) API traffic can be achieved through standard HTTP/HTTPS proxies or via a Unix domain socket, each suited to different use cases and levels of flexibility. The CLI, implemented in Go, natively supports proxy environment variables (`HTTPS_PROXY`, `HTTP_PROXY`, `NO_PROXY`), making integration with existing HTTP proxies seamless and requiring no changes to the CLI configuration.
Research
self-host-datasette-lite
— Datasette Lite, a browser-based SQLite explorer powered by Pyodide and WebAssembly, can be fully self-hosted and used offline by bundling all core files, required Python wheels, and optional sample databases locally instead of relying on external CDNs and PyPI hosts.
Research
datasette-sql-permissions-review
— A comprehensive architecture review of Datasette's new SQL-based permissions system (introduced in v1.0a20) finds that transitioning from a callback-driven model to SQL query resolution greatly improves scalability for large deployments. The redesigned system efficiently checks access by evaluating compiled permission rules through internal catalog tables, substantially reducing processing overhead compared to the multiplicative N x M callback pattern.
Research
sqlite-utils-iterator-support
— Enhancements to the sqlite-utils library now allow its `insert_all` and `upsert_all` methods to efficiently process Python iterators yielding lists, in addition to the original dict-based input. Detection of the iterator type is automatic and maintains full backward compatibility, streamlining bulk inserts from row-based data sources like CSV streams and reducing memory usage by avoiding dict construction.
Research
svg-to-png-comparison
— Multiple Python-based approaches for converting SVG files to PNG were benchmarked using the tiger.svg image, evaluating file size, output quality, and ease of installation. Pure Python solutions like CairoSVG and svglib+reportlab offered simple pip-based installs with predictable PNGs, though svglib lacks alpha channel support. Wand (ImageMagick bindings) and ImageMagick CLI yielded the highest quality output (16-bit RGBA) at the cost of larger files and system-level dependencies.
Research
svg-to-png-renderer
— A lightweight SVG to PNG renderer has been developed using Python, leveraging the `xml.etree.ElementTree` and `Pillow` libraries to parse SVG XML data and convert it to raster PNG images. This minimal library supports a range of SVG elements, including paths, basic shapes, and containers, as well as attributes such as colors, styling, and transforms.
Research
yt-dlp-install-report
— A detailed analysis of installing `yt-dlp[default]` via pip on Linux with Python 3.11 reveals that the process brings in six new packages totaling about 39 MB and over 3,000 files, including 44 binary libraries (mainly for cryptography and compression) consuming 8.55 MB.
Research
absurd-in-sqlite
— Durable execution workflows can be implemented using SQLite, as demonstrated by the Absurd-in-SQLite project, which is inspired by Armin Ronacher's Absurd. This project provides a proof-of-concept implementation of durable execution using SQLite, allowing for reliable and long-running workflows that can survive crashes and network failures.
Research
uv-run-flow-analysis
— Running `uv run myscript.py` in a directory with a `pyproject.toml` launches a multi-phase workflow that automates Python script execution within an isolated, dependency-managed environment. uv scans for project metadata, resolves and validates interpreter and package requirements, manages virtual environments, locks dependencies with a TOML-based `uv.lock` file using the PubGrub algorithm, efficiently syncs the environment with parallel downloads and caching, and finally executes the desired…
Research
llm-pyodide-openai-plugin
— Leveraging the LLM Python package and pyodide, this project successfully adapts LLM’s OpenAI model interface for direct use in browser environments by bypassing the standard openai library (which fails in browsers due to its httpx dependency) and instead using the browser-native fetch API for CORS-compliant API calls.
Research
codex-sandbox-investigation
— OpenAI Codex CLI's sandbox employs strong, platform-specific isolation to securely constrain the behavior of AI-driven code agents. On macOS, it uses Apple's Seatbelt sandbox with finely tuned dynamic policies, while on Linux, it combines Landlock for strict filesystem controls and seccomp for syscall-based network blocking—ensuring that agents can only write to user-approved directories and have no outgoing network by default.
Research
env86-analysis
— env86 is a Go-based management tool that enables users to run x86 Linux virtual machines within browser contexts via the v86 WebAssembly emulator. By combining a native desktop application (embedding a browser), a robust CLI, and an integrated virtual networking stack, env86 provides an easily distributable and reproducible Linux environment that can boot instantly from snapshots, support host-guest communication, and mount host filesystems.
Research
h3-library-benchmark
— A systematic performance benchmark was conducted on two prominent Python libraries implementing Uber's H3 geospatial indexing system: h3-py (official, C-based) and h3o-python (Rust-based). Results show h3o-python consistently outperforms h3-py on core operations, achieving over 2x speedup for coordinate conversions and up to 13x faster neighbor queries, while area calculations remain comparable.
Research
sqlite-query-linter
— The SQLite Query Linter is a lightweight Python library that wraps the standard `sqlite3` module to provide configurable linting and rule-based analysis of SQL queries before execution. Acting as a drop-in replacement, it helps catch common syntax errors and platform incompatibilities—such as invalid types in `CAST`, use of unsupported functions, `SELECT *`, missing `WHERE` clauses, and string quoting mistakes—helping developers avoid runtime errors and improve code quality.
Research
h3o-python
— h3o-python delivers efficient Python bindings for the h3o Rust library, enabling fast and convenient access to H3 geospatial indexing from Python. Utilizing PyO3 and packaged with maturin, it allows encoding geographic coordinates into 64-bit H3 cell indexes, decoding indexes, performing neighborhood queries, calculating great-circle distances, and retrieving surface area metrics—all without requiring a separate H3 installation.
Research
wazero-python-claude
— Wazero Python Bindings enable seamless integration of the wazero WebAssembly runtime—written in Go—with Python applications, delivering a zero-dependency solution for running WASM modules natively from Python. The project exposes a clean, Pythonic API for instantiating modules, calling exported WASM functions, and managing resources efficiently with context managers. Performance benchmarks demonstrate rapid execution and minimal overhead between Python and WASM.
Research
blog-tags-scikit-learn
— Automatically assigning meaningful tags to historic, untagged blog posts, this project leverages the Simon Willison blog database and scikit-learn to train and compare multi-label text classification models. Four approaches—TF-IDF + Logistic Regression, Multinomial Naive Bayes, Random Forest, and LinearSVC—were tested on posts’ title and body text using the 158 most frequently used tags.
Research
datasette-plugin-skill
— Covering every aspect of Datasette plugin development, this project creates a comprehensive skill set for authors—from bootstrapping with cookiecutter to deploying on GitHub and PyPI. It provides precise guides and working code samples for essential plugin hooks like custom SQL functions, authentication, custom views, and output formats.
«« first
« previous
page 2 / 3
next »
last »»