Blogmarks

8,406 results «« first « previous page 80 / 281 next » last »»

Dogsheep: Personal analytics with Datasette. The second edition of my new Datasette Weekly newsletter talks about Dogsheep, Dogsheep Beta and Datasette 1.0, and features datasette-cluster-map as the plugin of the week.

# 19th October 2020, 4:38 pm / datasette, dogsheep

xml-analyser. In building evernote-to-sqlite I dusted off an ancient (2009) project I built that scans through an XML file and provides a summary of what elements are present in the document and how they relate to each other. I’ve now packaged it up as a CLI app and published it on PyPI.

# 12th October 2020, 12:41 am / cli, projects, xml

evernote-to-sqlite (via) The latest tool in my Dogsheep series of utilities for personal analytics: evernote-to-sqlite takes Evernote note exports in their ENEX XML format and loads them into a SQLite database. Embedded images are loaded into a BLOB column and the output of their cloud-based OCR system is added to a full-text search index. Notes have a latitude and longitude which means you can visualize your notes on a map using Datasette and datasette-cluster-map.

# 12th October 2020, 12:38 am / projects, sqlite, datasette, dogsheep

Datasette Weekly: Datasette 0.50, git scraping, extracting columns (via) The first edition of the new Datasette Weekly newsletter—covering Datasette 0.50, Git scraping, extracting columns with sqlite-utils and featuring datasette-graphql as the first “plugin of the week”

# 10th October 2020, 9 pm / email, projects, sqlite, graphql, datasette, git-scraping, sqlite-utils

Datasette Weekly (via) I’m trying something new: I’ve decided to start an email newsletter called the Datasette Weekly (I’m already worried I’ll regret that weekly promise) which will share news about Datasette and the Datasette ecosystem, plus tips and tricks for getting the most out of Datasette and SQLite.

# 10th October 2020, 7:05 pm / email, projects, datasette

Animating a commit based Sudoku game using Puppeteer (via) This is really clever. There’s a GitHub repo that tracks progress in a game of Sudoku: Anish Karandikar wrote code which iterates through the game board state commit by commit, uses that state to generate an HTML table, passes that table to Puppeteer using a data: URI, renders a PNG of each stage and then concatenates those PNGs together into an animated GIF using the gifencoder Node.js library.
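
The first two steps of that pipeline (board state to HTML table, then HTML to a data: URI) can be sketched in a few lines. This is a rough illustration in Python rather than the Node.js used by the original project; the function names and the convention that 0 marks an empty cell are my own assumptions, and the Puppeteer rendering and GIF encoding steps are not shown.

```python
import base64
from html import escape


def board_to_html(board):
    """Render a grid of numbers as an HTML table (assumption: 0 means empty)."""
    rows = []
    for row in board:
        cells = "".join(
            f"<td>{escape(str(value)) if value else '&nbsp;'}</td>" for value in row
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def html_to_data_uri(html):
    """Wrap an HTML string in a base64 data: URI that a browser can load directly."""
    encoded = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return "data:text/html;charset=utf-8;base64," + encoded
```

A headless browser can then be pointed at the returned URI without any files or web server being involved.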

# 9th October 2020, 10:28 pm / data-urls, gifs, puppeteer

Bedrock: The SQLitening (via) Back in March 2018 www.mozilla.org switched over to running on Django using SQLite! They’re using the same pattern I’ve been exploring with Datasette: their SQLite database is treated as a read-only cache by their frontend servers, and a new SQLite database is built by a separate process and fetched onto the frontend machines every five minutes by a scheduled task. They have a healthcheck page which shows the latest version of the database and when it was fetched, and even lets you download the 25MB SQLite database directly (I’ve been exploring it using Datasette).
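
Here is a minimal sketch of that read-only cache pattern using Python's sqlite3 module. It is not Bedrock's actual code: the pages table, the function names and the local build step (standing in for the fetch from a separate build process) are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_database(path, rows):
    """Build a fresh database next to `path`, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)
    conn = sqlite3.connect(tmp_path)
    try:
        # Hypothetical schema, purely for the example
        conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT)")
        conn.executemany("INSERT INTO pages VALUES (?, ?)", rows)
        conn.commit()
    finally:
        conn.close()
    # os.replace is atomic when source and destination are on the same filesystem
    os.replace(tmp_path, path)


def open_read_only(path):
    """Open the database so that any attempted write raises an error."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

The frontend only ever calls open_read_only(), while a scheduled task calls build_database() (or downloads a prebuilt file and swaps it in the same way).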

# 7th October 2020, 11:47 pm / django, mozilla, sqlite, datasette, baked-data

Running Datasette on DigitalOcean App Platform (via) I spent some time with DigitalOcean’s new App Platform today, which is a Heroku-style PaaS that starts at $5/month. It looks like it could be a really good fit for Datasette. Disk is ephemeral, but if you’re publishing read-only data that doesn’t matter since you can build the SQLite database as part of the deployment and bundle it up in the Docker/Kubernetes container.

# 7th October 2020, 2:52 am / deployment, heroku, datasette, digitalocean

Potential new elevator pitch / tagline for Datasette: The best way to publish data online (via) One of the biggest challenges I’ve had with Datasette is compressing it into a single elevator pitch or tagline that helps answer the question “what does this software do?”—the project does a lot of different things, so finding the right angle for explaining it has proved really difficult. I’m workshopping a new tagline over on the Datasette discussion forum—feedback, suggestions and challenges very welcome!

# 4th October 2020, 12:03 am / marketing, pitching, datasette

I was wrong. CRDTs are the future (via) Joseph Gentle has been working on collaborative editors since being a developer on Google Wave back in 2010, later building ShareJS. He’s used Operational Transforms throughout, due to their performance and memory benefits over CRDTs (Conflict-free replicated data types)—but the latest work in that space from Martin Kleppmann and other researchers has seen him finally switch allegiance to these newer algorithms. As a long-time fan of collaborative editing (ever since the Hydra/SubEthaEdit days) I thoroughly enjoyed this as an update on how things have evolved over the past decade.

# 28th September 2020, 9:03 pm / algorithms, collaboration, crdt, martin-kleppmann

elite-source.asm—annotated source code for Elite on the BBC Micro (via) Mark Moxon has annotated every single line of the source code for Elite on the BBC Micro, and his annotations are so clear and in-depth that I can follow it despite knowing next to nothing about assembly code (and certainly nothing about writing it for the BBC).

# 28th September 2020, 2:30 am / programming, retro

datasette-dateutil (via) New Datasette plugin exposing date/time parsing custom SQL functions powered by the classic dateutil Python library.

# 28th September 2020, 12:33 am / dateutil, plugins, projects, datasette

DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern.

# 19th September 2020, 11:43 pm / databases, postgresql, sqlite, duckdb

Array programming with NumPy—the NumPy paper (via) The NumPy paper is out, published in Nature. I found this enlightening: for an academic paper it’s very understandable, and it filled in quite a few gaps in my mental model of what NumPy is and which problems it addresses, as well as its relationship to the many other tools in the scientific Python stack.

# 17th September 2020, 4:34 pm / python, scipy, numpy

“I Have Blood on My Hands”: A Whistleblower Says Facebook Ignored Global Political Manipulation (via) Sophie Zhang worked as the data scientist for the Facebook Site Integrity fake engagement team. She gave up her severance package in order to speak out internally about what she saw there, and someone leaked her memo to BuzzFeed News. It’s a hell of a story: she saw bots and coordinated manual accounts used to influence politics in countries all around the world, and found herself constantly making moderation decisions that had lasting political impact. “With no oversight whatsoever, I was left in a situation where I was trusted with immense influence in my spare time”. This sounds like a nightmare: imagine taking on responsibility for protecting democracy in so many different places.

# 15th September 2020, 9:11 pm / facebook, moderation, social-media

Stories of reaching Staff-plus engineering roles (via) Extremely useful collection of career stories from staff-level engineers at a variety of different companies, collected by Will Larson.

# 11th September 2020, 3:30 am / will-larson, careers, management

15 rules for blogging, and my current streak (via) Matt Webb is on a 24 week streak of blogging multiple posts a week and shares his rules on how he’s doing this. These are really good rules. A rule of thumb that has helped me a lot is to fight back against the temptation to make a post as good as I can before I publish it, because that way lies a giant drafts folder and no actual published content. “Perfect is the enemy of shipped”.

# 10th September 2020, 6:09 pm / blogging, matt-webb, streaks

AVIF has landed. AVIF support landed in Chrome 85 a few weeks ago. It’s a new lossy royalty-free image format derived from AV1 video and it’s really impressive—it can achieve similar results to JPEG using a quarter of the file size! Jake digs into AVIF in detail, providing lots of illustrative examples created using the Squoosh online compressor, which now supports AVIF encoding. Jake used the same WebAssembly encoder from Squoosh to decode AVIF images in a web worker so that the demos in his article would work even for browsers that don’t yet support AVIF natively.

# 9th September 2020, 4:49 pm / chrome, images, webworkers, webassembly, jake-archibald

Render Markdown tool (via) I wrote a quick JavaScript tool for rendering Markdown via the GitHub Markdown API—which includes all of their clever extensions like tables and syntax highlighting—and then stripping out some extraneous HTML to give me back the format I like using for my blog posts.

# 3rd September 2020, 12:08 am / github, javascript, projects, markdown

airtable-export. I wrote a command-line utility for exporting data from Airtable and dumping it to disk as YAML, JSON or newline delimited JSON files. This means you can backup an Airtable database from a GitHub Action and get a commit history of changes made to your data.

# 29th August 2020, 9:48 pm / json, projects, yaml, airtable

California Protected Areas Database in Datasette (via) I built this yesterday: it’s a Datasette interface on top of the CPAD 2020 GIS database of protected areas in California maintained by GreenInfo Network. This was a useful excuse to build a GitHub Actions flow that builds a SpatiaLite database using my shapefile-to-sqlite tool, and I fixed a few bugs in my datasette-leaflet-geojson plugin as well.

# 21st August 2020, 11:15 pm / geospatial, projects, shapefiles, spatialite, california, datasette, github-actions, leaflet

Announcing the Consortium for Python Data API Standards (via) Interesting effort to unify the fragmented DataFrame API ecosystem, where increasing numbers of libraries offer APIs inspired by Pandas that imitate each other but aren’t 100% compatible. The announcement includes some very clever code to support the effort: custom tooling to compare the existing APIs, and an ingenious GitHub Actions setup to run traces (via sys.settrace), derive type signatures and commit those generated signatures back to a repository.
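
To give a feel for the sys.settrace part, here is a toy sketch that records the argument types seen for every Python-level function call made while running a callable. This is not the consortium's tooling; record_signatures is a hypothetical helper, and a real tracer would also need to handle keyword-only arguments, *args and return types.

```python
import sys
from collections import defaultdict


def record_signatures(func, *args, **kwargs):
    """Run func and return {function name: set of argument-type-name tuples}."""
    seen = defaultdict(set)

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            # Positional parameter names come first in co_varnames
            names = code.co_varnames[: code.co_argcount]
            signature = tuple(type(frame.f_locals[name]).__name__ for name in names)
            seen[code.co_name].add(signature)
        return None  # no per-line tracing needed inside the new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return dict(seen)
```

Functions implemented in C do not trigger "call" events, so only pure Python functions show up in the result.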

# 19th August 2020, 5:48 am / python, standards, data-science, github-actions

Datasette 0.46 (via) I just released Datasette 0.46 with a security fix for an issue involving CSRF tokens on canned query pages, plus a new debugging tool, improved file downloads and a bunch of other smaller improvements.

# 9th August 2020, 4:57 pm / csrf, projects, security, datasette

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities: things like SQL injection and os.system() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application.

# 7th August 2020, 8:50 pm / django, facebook, python, security, sql-injection, static-analysis

Design Docs at Google. Useful description of the format used for software design docs at Google—informal documents of between 3 and 20 pages that outline the proposed design of a new project, discuss trade-offs that were considered and solicit feedback before the code starts to be written.

# 7th August 2020, 4:31 pm / documentation, google

Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website (via) I remain fascinated by techniques for zero downtime deployment—once you have it working it makes shipping changes to your software so much less stressful, which means you can iterate faster and generally be much more confident in shipping code.

Facebook have invested vast amounts of effort into getting this right, and their new paper for the ACM SIGCOMM conference goes into detail about how it all works.

# 5th August 2020, 3:27 am / deployment, zero-downtime

How a Cheese Goes Extinct (via) Ruby Tandoh writes for the New Yorker about the culture, history and anthropology of cheesemaking through the lens of the British cheese industry. I learned that two of my favourite British cheeses, Tymsboro and Innes Log, have sadly ceased production. Beautifully written.

# 2nd August 2020, 5:51 pm / cheese, new-yorker

sqlite-utils 2.14 (via) I finally figured out porter stemming with SQLite full-text search today: it turns out it’s as easy as adding tokenize='porter' to the CREATE VIRTUAL TABLE statement. So I just shipped sqlite-utils 2.14 with a tokenize= option (plus the ability to insert binary file data from stdin).
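
A quick way to see the effect is to use Python's sqlite3 module directly. This is a minimal sketch rather than how sqlite-utils does it: it assumes your SQLite build includes the FTS5 extension, and the docs table and both helper functions are made up for the example.

```python
import sqlite3

# Only interpolate tokenizer names we know about into the SQL string
ALLOWED_TOKENIZERS = {"porter", "unicode61", "ascii"}


def build_index(texts, tokenize="porter"):
    """Create an in-memory FTS5 table using the given tokenizer and load texts into it."""
    if tokenize not in ALLOWED_TOKENIZERS:
        raise ValueError(f"Unsupported tokenizer: {tokenize}")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='{tokenize}')"
    )
    conn.executemany("INSERT INTO docs (body) VALUES (?)", [(t,) for t in texts])
    return conn


def search(conn, term):
    """Return matching document bodies in insertion order."""
    sql = "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (term,))]
```

With the porter tokenizer a search for "run" should also match documents containing "running" or "runs", while the default unicode61 tokenizer only matches the exact word.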

# 1st August 2020, 9:19 pm / full-text-search, projects, search, sqlite, sqlite-utils

James Bennett on why Django should not support JWT in core (via) The topic of adding JWT support to Django core comes up occasionally—here’s James Bennett’s detailed argument for not doing that. The short version is that the JWT specification isn’t just difficult to implement securely: it’s fundamentally flawed, which results in things like five implementations in three different languages all manifesting the same vulnerability. Third party modules exist that add JWT support to Django, but baking it into core would act as a form of endorsement and Django’s philosophy has always been to encourage people towards best practices.

# 1st August 2020, 12:28 am / django, james-bennett, security, jwt

Why Your Christian Friends and Family Members Are So Easily Fooled by Conspiracy Theories (via) I think the title undersells this: this is a really great piece of writing on conspiracy theories, why people fall for them and why it’s so hard to dig people back out again—regardless of any particular religion, despite being written for a Christian audience.

# 31st July 2020, 6:28 pm / conspiracy

Simon Willison’s Weblog
