Simon Willison’s Weblog

Tuesday, 17th April 2018

Text Embedding Models Contain Bias. Here’s Why That Matters (via) Excellent discussion from the Google AI team of the enormous challenge of building machine learning models without accidentally encoding harmful bias in a way that cannot be easily detected. # 8:54 pm

Suppose a runaway success novel/tv/film franchise has “Bob” as the evil bad guy. Reams of fanfictions are written with “Bob” doing horrible things. People endlessly talk about how bad “Bob” is on twitter. Even the New York times writes about Bob latest depredations, when he plays off current events. Your name is Bob. Suddenly all the AIs in the world associate your name with evil, death, killing, lying, stealing, fraud, and incest. AIs silently, slightly ding your essays, loan applications, uber driver applications, and everything you write online. And no one believes it’s really happening. Or the powers that be think it’s just a little accidental damage because the AI overall is still, overall doing a great job of sentiment analysis and fraud detection.

Daniel Von Fange # 8:51 pm

A rating system for open data proposed by Tim Berners-Lee, founder of the World Wide Web. To score the maximum five stars, data must (1) be available on the Web under an open licence, (2) be in the form of structured data, (3) be in a non-proprietary file format, (4) use URIs as its identifiers (see also RDF), (5) include links to other data sources (see linked data). To score 3 stars, it must satisfy all of (1)-(3), etc.

Five stars of open data # 4:20 am

Datasette 0.19: Plugins Documentation (via) I’ve released the first preview of Datasette’s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come. # 3:59 am

2018 » April