Archive for Monday, 3rd April 2023

Monday, 3rd April 2023

Django 4.2 released. “This version has been designated as a long-term support (LTS) release, which means that security and data loss fixes will be applied for at least the next three years.” Some neat new async features, including improvements to async streaming responses.

# 2:14 pm / async, django

Stable Diffusion copyright lawsuits could be a legal earthquake for AI. Timothy B. Lee provides a thorough discussion of the copyright lawsuits currently targeting Stable Diffusion and GitHub Copilot, including subtle points about how the interpretation of “fair use” might be applied to the new field of generative AI.

# 3:34 pm / copyright, law, ai, stable-diffusion, generative-ai, github-copilot, text-to-image

Beyond these specific legal arguments, Stability AI may find it has a “vibes” problem. The legal criteria for fair use are subjective and give judges some latitude in how to interpret them. And one factor that likely influences the thinking of judges is whether a defendant seems like a “good actor.” Google is a widely respected technology company that tends to win its copyright lawsuits. Edgier companies like Napster tend not to.

— Timothy B. Lee

# 3:38 pm / copyright, law, ai, generative-ai

TIL Copy tables between SQLite databases: I figured out a pattern for doing this today using the `sqlite3` CLI tool. It assumes two SQLite databases in the current folder, called `tils.db` and `simonwillisonblog.db`.

3rd Apr 2023, 5:32 pm
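
As a rough illustration of copying a table between two SQLite databases, here is a minimal Python sketch. It is not the `sqlite3` CLI pattern from the TIL itself, which is not reproduced in this archive. It uses SQLite's `ATTACH DATABASE` statement through Python's standard `sqlite3` module. The function name `copy_table` and the table name `til` in the usage note are assumptions made for illustration.

```python
import sqlite3


def copy_table(source_db: str, dest_db: str, table: str) -> int:
    """Copy `table` (schema and rows) from source_db into dest_db.

    Hypothetical helper for illustration. Assumes the table does not
    already exist in the destination. Indexes and triggers are not copied.
    Returns the number of rows inserted.
    """
    conn = sqlite3.connect(dest_db)
    try:
        # ATTACH makes the source database visible under the alias "src".
        conn.execute("ATTACH DATABASE ? AS src", (source_db,))

        # Reuse the stored CREATE TABLE statement so column types,
        # primary keys and constraints carry over.
        row = conn.execute(
            "SELECT sql FROM src.sqlite_master "
            "WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if row is None:
            raise ValueError(f"table {table!r} not found in {source_db}")
        conn.execute(row[0])

        # Quote the identifier, since table names cannot be bound as parameters.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(
            f"INSERT INTO main.{quoted} SELECT * FROM src.{quoted}"
        )
        copied = cursor.rowcount
        conn.commit()

        # DETACH only works outside a transaction, so it follows the commit.
        conn.execute("DETACH DATABASE src")
        return copied
    finally:
        conn.close()
```

Assuming `tils.db` contains a table named `til` (a guess, not something stated above), a call such as `copy_table("tils.db", "simonwillisonblog.db", "til")` would create that table in `simonwillisonblog.db` and copy its rows across.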

Closed AI Models Make Bad Baselines (via) The NLP academic research community are facing a tough challenge: the state of the art in large language models, GPT-4, is entirely closed, which means papers that compare it to other models lack replicability and credibility. “We make the case that as far as research and scientific publications are concerned, the ‘closed’ models (as defined below) cannot be meaningfully studied, and they should not become a ‘universal baseline’, the way BERT was for some time widely considered to be.”

Anna Rogers proposes a new rule for this kind of research: “That which is not open and reasonably reproducible cannot be considered a requisite baseline.”

# 7:57 pm / nlp, ai, openai, generative-ai, gpt-4

ROOTS search tool (via) BLOOM is one of the most interesting completely openly licensed language models. The ROOTS corpus is the training data that was collected for it, and this tool lets you run searches directly against that corpus. I tried searching for my own name and got an interesting insight into what it knows about me.

# 8:40 pm / ai, generative-ai, llms, bloom, training-data



Simon Willison’s Weblog
