Simon Willison’s Weblog

Subscribe
Atom feed for derek-willis

7 posts tagged “derek-willis”

2025

Political Email Extraction Leaderboard (via) Derek Willis collects "political fundraising emails from just about every committee" - 3,000-12,000 a month - and has created an LLM benchmark from 1,000 of them that he collected last November.

He explains the leaderboard in this blog post. The goal is to have an LLM correctly identify the the committee name from the disclaimer text included in the email.

Here's the code he uses to run prompts using Ollama. It uses this system prompt:

Produce a JSON object with the following keys: 'committee', which is the name of the committee in the disclaimer that begins with Paid for by but does not include 'Paid for by', the committee address or the treasurer name. If no committee is present, the value of 'committee' should be None. Also add a key called 'sender', which is the name of the person, if any, mentioned as the author of the email. If there is no person named, the value is None. Do not include any other text, no yapping.

Gemini 2.5 Pro tops the leaderboard at the moment with 95.40%, but the new Mistral Small 3.1 manages 5th place with 85.70%, pretty good for a local model!

Table comparing AI model performance with columns for Model (JSON Filename), Total Records, Committee Matches, and Match Percentage. Shows 7 models with 1000 records each: gemini_25_november_2024_prompt2.json (95.40%), qwen25_november_2024_prompt2.json (92.90%), gemini20_flash_november_2024_prompt2.json (92.40%), claude37_sonnet_november_2024_prompt2.json (90.70%), mistral_small_31_november_2024_prompt2.json (85.70%), gemma2_27b_november_2024_prompt2.json (84.40%), and gemma2_november_2024_prompt2.json (83.90%).

I said we need our own evals in my talk at the NICAR Data Journalism conference last month, without realizing Derek has been running one since January.

# 8th April 2025, 11:22 pm / gemini, evals, ai, ollama, llms, mistral, derek-willis, generative-ai, data-journalism, prompt-engineering

Six short video demos of LLM and Datasette projects

Visit Six short video demos of LLM and Datasette projects

Last Friday Alex Garcia and I hosted a new kind of Datasette Public Office Hours session, inviting members of the Datasette community to share short demos of projects that they had built. The session lasted just over an hour and featured demos from six different people.

[... 1,047 words]

2023

Teaching News Apps with Codespaces (via) Derek Willis used GitHub Codespaces for the latest data journalism class he taught, and it eliminated the painful process of trying to get students on an assortment of Mac, Windows and Chromebook laptops all to a point where they could start working and learning together.

# 23rd March 2023, 12:39 am / teaching, data-journalism, derek-willis, github, github-codespaces

2020

How much can you learn from just two columns?

Visit How much can you learn from just two columns?

Derek Willis shared an intriguing dataset this morning: a table showing every Twitter account followed by an official GOP congressional Twitter account.

[... 951 words]

2008

Represent. Andrei Scheinkman and Derek Willis describe how they built the NYTimes Represent feature using GeoDjango and PostGIS.

# 29th December 2008, 10:10 pm / derek-willis, andrei-scheinkman, new-york-times, django, geodjango, python, postgresql, postgis, gis

Represent and GeoDjango. The NYTimes new Represent application is built on GeoDjango.

# 20th December 2008, 9:07 pm / represent, new-york-times, geodjango, derek-willis, django, python

2007

Django, iCal and vObject. Easy iCal generation for Django using vObject.

# 1st August 2007, 11:09 am / vobject, derek-willis, django, python, icalendar