Simon Willison’s Weblog

Subscribe
Atom feed for space-invaders

6 posts tagged “space-invaders”

Results for my simple LLM benchmark "Write an HTML and JavaScript page implementing space invaders"

2025

OpenAI’s new open weight (Apache 2) models are really good

Visit OpenAI's new open weight (Apache 2) models are really good

The long promised OpenAI open weight models are here, and they are very impressive. They’re available under proper open source licenses—Apache 2.0—and come in two sizes, 120B and 20B.

[... 2,771 words]

XBai o4 (via) Yet another open source (Apache 2.0) LLM from a Chinese AI lab. This model card claims:

XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in Medium mode.

This a 32.8 billion parameter model released by MetaStone AI, a new-to-me lab who released their first model in March - MetaStone-L1-7B, then followed that with MetaStone-S1 1.5B, 7B and 32B in July and now XBai o4 in August.

The MetaStone-S1 models were accompanied with a with a paper, Test-Time Scaling with Reflective Generative Model.

There is very little information available on the English-language web about MetaStone AI. Their paper shows a relationship with USTC, University of Science and Technology of China in Hefei. One of their researchers confirmed on Twitter that their CEO is from KWAI which lead me to this Chinese language article from July last year about Li Yan, formerly of KWAI and now the founder of Wen Xiaobai and evidently now the CEO of MetaStone. www.wenxiaobai.com is listed as the "official website" linked to from the XBai-o4 README on GitHub.

Ivan Fioravanti got it working under MLX in 4bit, 5bit, 6bit, 8bit and 4bit-DWQ sizes. I tried his 6bit one (a 24.81GB download) in LM Studio and had it "Generate an SVG of a pelican riding a bicycle". It thought for 39.49 seconds (thinking trace here) and gave me this:

The pelican is blue and looks a little like a sad elephant. The bicycle looks more like a flat motorbike. The pelican has a blue tail and orange legs.

I also had it "Write an HTML and JavaScript page implementing space invaders", which you can try out here. It's not great - the invaders don't fire bullets (it listed that as a "potential improvement") and I don't think it's winnable due to timing issues, but it's an interesting addition to my collection.

LM Studio reported a constant memory usage of 25.54GB and according to Activity Monitor used 87.5% of my GPU and around 39% of a CPU core.

It did a lot of work in the thinking trace - which took 10m35s! - including writing the first draft of much of the code that ended up in the finished result. Here's an extract:

Let me think of variables:

let enemyDirection = 1; // 1 for right, -1 for left

In the game loop: enemies.forEach(enemy => {> enemy.x += enemyDirection * enemy.dx;> }); Then check if any enemy is out of bounds. If so, reverse direction and move all enemies down.

But how to check if any enemy is at the edge?

Maybe find the max x of all enemies and see if it exceeds canvas.width - enemy.width.

Alternatively, track the leftmost and rightmost positions of enemies. Maybe a bit complex.

For simplicity, perhaps check if any enemy's x + width > canvas.width or x < 0.

Here's the full transcript.

My initial impressions are that I'm not as impressed with this model for running on my own laptop as I was with Qwen3-Coder-30B-A3B-Instruct or GLM-4.5 Air.

But... how extraordinary is it that another Chinese AI lab has been able to produce a competitive model, this time with far less fanfare than we've seen from Qwen and Moonshot AI and Z.ai.

# 3rd August 2025, 10:21 pm / ai, generative-ai, llms, mlx, pelican-riding-a-bicycle, llm-reasoning, llm-release, lm-studio, ai-in-china, space-invaders, ivan-fioravanti

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Visit Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) of this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface.

[... 1,390 words]

Qwen3-30B-A3B-Thinking-2507 (via) Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning partner to yesterday's model: Qwen3-30B-A3B-Thinking-2507.

I'm surprised at how poorly this reasoning mode performs at "Generate an SVG of a pelican riding a bicycle" compared to its non-reasoning partner. The reasoning trace appears to carefully consider each component and how it should be positioned... and then the final result looks like this:

A line with two dots, over a rhombus, with two circles and a pelican that looks like a grey snowman. They are not arranged in a sensible layout.

I ran this using chat.qwen.ai/?model=Qwen3-30B-A3B-2507 with the "reasoning" option selected.

I also tried the "Write an HTML and JavaScript page implementing space invaders" prompt I ran against the non-reasoning model. It did a better job in that the game works:

It's not as playable as the on I got from GLM-4.5 Air though - the invaders fire their bullets infrequently enough that the game isn't very challenging.

This model is part of a flurry of releases from Qwen over the past two 9 days. Here's my coverage of each of those:

# 30th July 2025, 3:36 pm / ai, generative-ai, llms, qwen, pelican-riding-a-bicycle, llm-reasoning, llm-release, ai-in-china, space-invaders

Qwen3-30B-A3B-Instruct-2507. New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said:

Smarter, faster, and local deployment-friendly.

✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more <think> blocks — now operating exclusively in non-thinking mode

🔧 With 3B activated parameters, it's approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking

I tried the chat.qwen.ai hosted model with "Generate an SVG of a pelican riding a bicycle" and got this:

This one is cute: blue sky, green grass, the sun is shining. The bicycle is a red block with wheels that looks more like a toy car. The pelican doesn't look like a pelican and has a quirky smile printed on its beak.

I particularly enjoyed this detail from the SVG source code:

<!-- Bonus: Pelican's smile -->
<path d="M245,145 Q250,150 255,145" fill="none" stroke="#d4a037" stroke-width="2"/>

I went looking for quantized versions that could fit on my Mac and found lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit from LM Studio. Getting that up and running was a 32.46GB download and it appears to use just over 30GB of RAM.

The pelican I got from that one wasn't as good:

It looks more like a tall yellow hen chick riding a segway

I then tried that local model on the "Write an HTML and JavaScript page implementing space invaders" task that I ran against GLM-4.5 Air. The output looked promising, in particular it seemed to be putting more effort into the design of the invaders (GLM-4.5 Air just used rectangles):

// Draw enemy ship
ctx.fillStyle = this.color;

// Ship body
ctx.fillRect(this.x, this.y, this.width, this.height);

// Enemy eyes
ctx.fillStyle = '#fff';
ctx.fillRect(this.x + 6, this.y + 5, 4, 4);
ctx.fillRect(this.x + this.width - 10, this.y + 5, 4, 4);

// Enemy antennae
ctx.fillStyle = '#f00';
if (this.type === 1) {
    // Basic enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 5, 2, 5);
} else if (this.type === 2) {
    // Fast enemy
    ctx.fillRect(this.x + this.width / 4 - 1, this.y - 5, 2, 5);
    ctx.fillRect(this.x + (3 * this.width) / 4 - 1, this.y - 5, 2, 5);
} else if (this.type === 3) {
    // Armored enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 8, 2, 8);
    ctx.fillStyle = '#0f0';
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 6, 2, 3);
}

But the resulting code didn't actually work:

Black screen - a row of good looking space invaders advances across the screen for a moment... and then the entire screen goes blank.

That same prompt against the unquantized Qwen-hosted model produced a different result which sadly also resulted in an unplayable game - this time because everything moved too fast.

This new Qwen model is a non-reasoning model, whereas GLM-4.5 and GLM-4.5 Air are both reasoners. It looks like at this scale the "reasoning" may make a material difference in terms of getting code that works out of the box.

# 29th July 2025, 6:57 pm / ai, generative-ai, llms, qwen, mlx, llm-reasoning, llm-release, lm-studio, ai-in-china, space-invaders

My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX

Visit My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX

I wrote about the new GLM-4.5 model family yesterday—new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as Claude Sonnet 4.

[... 685 words]