Simon Willison’s Weblog

Blogmarks tagged ai, video in 2024

Sora. OpenAI released their long-threatened Sora text-to-video model this morning, available in most non-European countries to subscribers of ChatGPT Plus ($20/month) or Pro ($200/month).

Here's what I got for the very first test prompt I ran through it:

A pelican riding a bicycle along a coastal path overlooking a harbor

The pelican inexplicably morphs to cycle in the opposite direction halfway through, but I don't see that as a particularly significant issue: Sora is built entirely around the idea of directly manipulating, editing, and remixing the clips it generates, so the goal isn't to have it produce usable videos from a single prompt.

# 9th December 2024, 6:35 pm / video, ai, openai, generative-ai, pelican-riding-a-bicycle, text-to-video

How developers are using Gemini 1.5 Pro’s 1 million token context window. I got to be a talking head for a few seconds in an intro video for today's Google I/O keynote, talking about how I used Gemini 1.5 Pro to index my bookshelf (with a cameo from my squirrel nutcracker). I'm at 1m25s.

(Or at 10m6s in the full video of the keynote)

# 14th May 2024, 8:27 pm / google, video, google-io, ai, generative-ai, llms, gemini

Google Research: Lumiere. The latest in text-to-video from Google Research, described as “a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion”.

Most existing text-to-video models generate keyframes and then use other models to fill in the gaps, which frequently leads to a lack of coherency. Lumiere “generates the full temporal duration of the video at once”, which avoids this problem.

Disappointingly but unsurprisingly, the paper doesn’t go into much detail on the training data, beyond stating “We train our T2V model on a dataset containing 30M videos along with their text caption. The videos are 80 frames long at 16 fps (5 seconds)”.

The examples of “stylized generation”, which combine a text prompt with a single reference image for style, are particularly impressive.

# 24th January 2024, 7:58 pm / google, video, ai, generative-ai, text-to-video