Simon Willison’s Weblog

Subscribe

Thursday, 27th November 2025

Tool Bluesky Quote Finder — Search for quote posts on Bluesky by entering a post URL to discover all responses that quote the original post. The tool fetches and displays these quotes with options to sort by likes, recency, or oldest first, and provides direct links to view each quote on Bluesky or explore its thread context. A color-coded depth indicator helps visualize the conversation hierarchy.

deepseek-ai/DeepSeek-Math-V2. New on Hugging Face, a specialist mathematical reasoning LLM from DeepSeek. This is their entry in the space previously dominated by proprietary models from OpenAI and Google DeepMind, both of which achieved gold medal scores on the International Mathematical Olympiad earlier this year.

We now have an open weights (Apache 2 licensed) 685B, 689GB model that can achieve the same. From the accompanying paper:

DeepSeekMath-V2 demonstrates strong performance on competition mathematics. With scaled test-time compute, it achieved gold-medal scores in high-school competitions including IMO 2025 and CMO 2024, and a near-perfect score on the undergraduate Putnam 2024 competition.

# 3:59 pm / mathematics, ai, generative-ai, llms, llm-reasoning, deepseek, llm-release, ai-in-china

To evaluate the model’s capability in processing long-context inputs, we construct a video “Needle-in- a-Haystack” evaluation on Qwen3-VL-235B-A22B-Instruct. In this task, a semantically salient “needle” frame—containing critical visual evidence—is inserted at varying temporal positions within a long video. The model is then tasked with accurately locating the target frame from the long video and answering the corresponding question. [...]

As shown in Figure 3, the model achieves a perfect 100% accuracy on videos up to 30 minutes in duration—corresponding to a context length of 256K tokens. Remarkably, even when extrapolating to sequences of up to 1M tokens (approximately 2 hours of video) via YaRN-based positional extension, the model retains a high accuracy of 99.5%.

Qwen3-VL Technical Report, 5.12.3: Needle-in-a-Haystack

# 5:01 pm / ai, generative-ai, llms, vision-llms, evals, qwen, ai-in-china

Research Datasette 1.0a20 SQL Permissions System: Architecture Review — A comprehensive architecture review of Datasette's new SQL-based permissions system (introduced in v1.0a20) finds that transitioning from a callback-driven model to SQL query resolution greatly improves scalability for large deployments. The redesigned system efficiently checks access by evaluating compiled permission rules through internal catalog tables, substantially reducing processing overhead compared to the multiplicative N x M callback pattern.
Tool The New Yorker Style Converter — Convert text to match The New Yorker's distinctive typographic style, featuring proper diaereses (such as "coöperate" and "naïve"), curly quotation marks, em dashes, and ellipses. Paste your text into the input field and watch the refined version appear instantly in the output area, which can be copied to your clipboard with a single click.
Wednesday, 26th November 2025
Friday, 28th November 2025