33 items tagged “youtube”
2024
Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI. This article has been getting a lot of attention over the past couple of days.
The story itself is nothing new: the Pile is four years old now, and has been widely used for training LLMs since before anyone even cared what an LLM was. It turns out one of the components of the Pile is a set of ~170,000 YouTube video captions (just the captions, not the actual video) and this story by Annie Gilbertson and Alex Reisner highlights that and interviews some of the creators who were included in the data, as well as providing a search tool for seeing if a specific creator has content that was included.
What's notable is the response. Marques Brownlee (19m subscribers) posted a video about it. Abigail Thorn (Philosophy Tube, 1.57m subscribers) tweeted this:
Very sad to have to say this - an AI company called EleutherAI stole tens of thousands of YouTube videos - including many of mine. I’m one of the creators Proof News spoke to. The stolen data was sold to Apple, Nvidia, and other companies to build AI
When I was told about this I lay on the floor and cried, it’s so violating, it made me want to quit writing forever. The reason I got back up was because I know my audience come to my show for real connection and ideas, not cheapfake AI garbage, and I know they’ll stay with me
Framing the data as "sold to Apple..." is a slight misrepresentation here - EleutherAI have been giving the Pile away for free since 2020. It's a good illustration of the emotional impact here though: many creative people do not want their work used in this way, especially without their permission.
It's interesting seeing how attitudes to this stuff change over time. Four years ago the fact that a bunch of academic researchers were sharing and training models using 170,000 YouTube subtitles would likely not have caught any attention at all. Today, people care!
Tom Scott, and the formidable power of escalating streaks
Ten years ago yesterday, Tom Scott posted this video to YouTube about “Special Crossings For Horses In Britain”. It was the first in his Things You Might Not Know series, but more importantly it was the start of a streak.
[... 1,352 words]After ten years, it’s time to stop making videos. Ten years ago, my friend Tom Scott started a deliberate streak of posting YouTube videos—initially about one a day before settling into a cadence of one a week. He kept that up for the full ten years, growing his subscribers to over 6 million in the process.
Today he’s ending that streak, in unparalleled style.
(I’m proud to have made an appearance in video number 13, talking about Zeppelins.)
2023
Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model
Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).
[... 1,323 words]2022
lite-youtube-embed (via) Handy Web Component wrapper around the standard YouTube iframe embed which knocks over 500KB of JavaScript off the initial page load—I just added this to the datasette.io homepage and increased the Lighthouse performance score from 51 to 93!
2021
Weeknotes: Datasette and Git scraping at NICAR, VaccinateCA
This week I virtually attended the NICAR data journalism conference and made a ton of progress on the Django backend for VaccinateCA (see last week).
[... 773 words]2020
Scaling Datastores at Slack with Vitess (via) Slack spent three years migrating 99% of their MySQL query load to run against Vitess, the open source MySQL sharding system originally built by YouTube. “Today, we serve 2.3 million QPS at peak. 2M of those queries are reads and 300K are writes. Our median query latency is 2 ms, and our p99 query latency is 11 ms.”
One academic who interviewed attendees of a flat-earth convention found that, almost to a person, they'd discovered the subculture via YouTube recommendations.
Happy Birthday Sea Lions! (via) Today, June 15th, is Sea Lion birthday—half of all California Sea Lions are born today thanks to clever co-ordinated delayed implantation by Sea Lion females. Natalie has started making nature videos and I’ve been tagging along as her camera-person—this three minute video, shot at Pier 39 in San Francisco, celebrates Sea Lion birthday and explains how it works.
2019
There’s a spectrum on YouTube between the calm section — the Walter Cronkite, Carl Sagan part — and Crazytown, where the extreme stuff is. If I’m YouTube and I want you to watch more, I’m always going to steer you toward Crazytown.
A Conspiracy To Kill IE6 (via) Cracking story by Chris Zacharias about how a team of engineers at YouTube back in 2009 took advantage of some exploits in YouTube’s organization structure (left over from their acquisition by Google) to ship a vague IE6 deprecation warning banner on one of the world’s highest traffic websites, inspiring many other similar banners and resulting in a 10% drop in global IE6 traffic.
Vitess (via) I remember looking at Vitess when it was first released by YouTube in 2012. The idea of a proven horizontally scalable sharding mechanism for MySQL was exciting, but I was put off by the need for a custom Go or Java client library. Apparently that changed with Vitess 2.1 in April 2017, the first version to introduce a MySQL protocol compatible proxy which can be connected to by existing code written in any language. Vitess 3.0 came out last December so now the MySQL proxy layer is much more stable. Vitess is used in production by a bunch of other companies now (including Slack and Square) so it’s definitely worth a closer look.
2018
It seems as if you are never ‘hardcore’ enough for YouTube’s recommendation algorithm. It promotes, recommends and disseminates videos in a manner that appears to constantly up the stakes. Given its billion or so users, YouTube may be one of the most powerful radicalising instruments of the 21st century.
2017
Something is wrong on the internet. James Bridle takes a fascinating and deeply troubling dive into the world of Kids’ YouTube videos, which appear to be increasingly algorithmically generated and are evolving in a very dark direction.
In the official timeline, Peppa is appropriately reassured by a kindly dentist. In the version above, she is basically tortured, before turning into a series of Iron Man robots and performing the Learn Colours dance. A search for “peppa pig dentist” returns the above video on the front page, and it only gets worse from here.
2014
A Zeppelin, A Cat, and The World’s First In-Flight Radio Message. Tom Scott asked me for “something you might not know” at our leaving party in London before we moved to California. I went with the story of Kiddo the cat and the first attempt at an aerial Atlantic crossing. Here’s the resulting YouTube video.
2012
What platform was YouTube using before they were acquired by Google?
It was written in Python—I don’t think they used any particular framework (they started the site in 2005).
[... 37 words]2009
Google container data center tour (on YouTube). 45,000 servers in 45 shipping containers, along with some serious looking plumbing.
Apparently [unladen-swallow] is already 30% faster than CPython, and this version is being used to run some of the Python code on YouTube.
2008
YouTube Enables Deep Linking Within Videos. Add #t=1m45s to the end of a YouTube URL to jump to that spot. I’d be a lot more impressed by this if visiting a YouTube link in the UK didn’t use IP geo targetting to redirect me to uk.youtube.com, losing the fragment identifier and hence the #t specifier in the process.
Popular Websites Vulnerable to Cross-Site Request Forgery Attacks. Ed Felten and Bill Zeller announce four CSRF holes, in ING Direct, YouTube, MetaFilter and the New York Times. The ING Direct hole allowed transfer of funds out of a user’s bank accounts! The first three were fixed before publication; the New York Times hole still exists (despite being reported a year ago), and allows you to silently steal e-mail addresses by CSRFing the “E-mail this” feature.
Wario Land: Shake It—Amazing footage! Some virals really do deserve linking to.
YouTube Playlist: DjangoCon 2008 Sessions. YouTube’s tag and search indexes appear to lag behind the main site by quite a while; this appears to be the definitive index page for videos of talks at DjangoCon.
YouTube: djangocon tag. Google have started posting videos of presentations at DjangoCon on YouTube.
“THIS IS NOT MLM!!!”—An Appreciation. Merlin Mann explains his fascination with the “cash gifting” pyramid scams that keep cropping up on YouTube.
There is a reason why Flickr eventually killed Yahoo! Photos and why it was decided that Google Video be relegated to being a search brand while YouTube would be the social sharing brand. The brand baggage and the accompanying culture made them road kill.
2007
Musical hackery. Indescribably clever musical video game creation, where images from classic games spell out their own theme tunes. The smartest thing I’ve seen on YouTube, well, ever.
Silly MS-DOS 5 Promo Video. I can’t decide if this is better or worse than the Windows 386 rap.
H.264 support coming to the Flash player. It looks like this is a response to the higher video quality offered by Silverlight. I wonder if YouTube knew about this when they started transcoding their videos to H.264 for the Apple TV and iPhone.
YouTube Scalability Talk. Kyle Cordes’ notes on a Google Tech Talk on scaling YouTube by Cuong Do.