Simon Willison’s Weblog

98 items tagged “facebook”

2023

MMS Language Coverage in Datasette Lite. I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222. # 22nd May 2023, 8:01 pm

Introducing speech-to-text, text-to-speech, and more for 1,100+ languages (via) New from Meta AI: Massively Multilingual Speech. “MMS supports speech-to-text and text-to-speech for 1,107 languages and language identification for over 4,000 languages. [...] Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists.”

It’s licensed CC-BY-NC 4.0 though, so it’s not available for commercial use.

“In a like-for-like comparison with OpenAI’s Whisper, we found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages.”

The training data was mostly sourced from audio Bible translations. # 22nd May 2023, 7:22 pm

ImageBind. New model release from Facebook/Meta AI research: “An approach to learn a joint embedding across six different modalities—images, text, audio, depth, thermal, and IMU (inertial measurement units) data”. The non-interactive demo shows searching audio starting with an image, searching images starting with audio, using text to retrieve images and audio, using image and audio to retrieve images (e.g. a barking sound and a photo of a beach to get dogs on a beach) and using audio as input to an image generator. # 9th May 2023, 7:04 pm

Large language models are having their Stable Diffusion moment

The open release of the Stable Diffusion image generation model back in August 2022 was a key moment. I wrote how Stable Diffusion is a really big deal at the time.

[... 1810 words]

Running LLaMA 7B on a 64GB M2 MacBook Pro with llama.cpp. I got Facebook’s LLaMA 7B to run on my MacBook Pro using llama.cpp (a “port of Facebook’s LLaMA model in C/C++”) by Georgi Gerganov. It works! I’ve been hoping to run a GPT-3 class language model on my own hardware for ages, and now it’s possible to do exactly that. The model itself ends up being just 4GB after applying Georgi’s script to “quantize the model to 4-bits”. # 11th March 2023, 4:19 am

Introducing LLaMA: A foundational, 65-billion-parameter large language model (via) From the paper: “For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.” # 24th February 2023, 5:34 pm

2022

Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model

Make-A-Video is a new “state-of-the-art AI system that generates videos from text” from Meta AI. It looks incredible—it really is DALL-E / Stable Diffusion for video. And it appears to have been trained on 10m video preview clips scraped from Shutterstock.

[... 923 words]

Every few weeks, someone on Twitter notices how demented the content on Facebook is. I’ve covered a lot of these stories. The quick TL;DR is that Facebook’s video section is essentially run by a network of magicians and Vegas stage performers who hack the platform’s algorithm with surreal low-value content designed to distract users long enough to trigger an in-video advertisement and anger them enough to leave a comment.

Ryan Broderick # 5th February 2022, 10:41 pm

2021

But this much is clear: Facebook knew all along. Their own employees were desperately trying to get anyone inside the company to listen as their products radicalized their own friends and family members. And as they were breaking the world, they had an army of spokespeople publicly and privately gaslighting and intimidating reporters and researchers who were trying to ring the alarm bell. They knew all along and they simply did not give a shit.

Ryan Broderick # 25th October 2021, 8:22 pm

I saw millions compromise their Facebook accounts to fuel fake engagement. Sophie Zhang, ex-Facebook, describes how millions of Facebook users have signed up for “autolikers”—programs that promise likes and engagement for their posts, in exchange for access to their accounts which are then combined into the larger bot farm and used to provide likes to other posts. “Self-compromise was a widespread problem, and possibly the largest single source of existing inauthentic activity on Facebook during my time there. While actual fake accounts can be banned, Facebook is unwilling to disable the accounts of real users who share their accounts with a bot farm.” # 9th June 2021, 3:40 pm

2020

The open secret Jennings filled me in on is that OpenStreetMap (OSM) is now at the center of an unholy alliance of the world’s largest and wealthiest technology companies. The most valuable companies in the world are treating OSM as critical infrastructure for some of the most-used software ever written. The four companies in the inner circle— Facebook, Apple, Amazon, and Microsoft— have a combined market capitalization of over six trillion dollars.

Joe Morrison # 20th November 2020, 9:11 pm

CG-SQL (via) This is the toolkit the Facebook Messenger team wrote to bring stored procedures to SQLite. It implements a custom version of the T-SQL language which it uses to generate C code that can then be compiled into a SQLite module. # 22nd October 2020, 6:25 pm

Project LightSpeed: Rewriting the Messenger codebase for a faster, smaller, and simpler messaging app (via) Facebook rewrote their iOS messaging app earlier this year, dropping it from 1.7m lines of code to 360,000 and reducing the binary size to a quarter of what it was. A key part of the new app’s architecture is much heavier reliance on SQLite to coordinate data between views, and to dynamically configure how different views are displayed. They even built their own custom system to add stored procedures to SQLite so they could execute portable business logic inside the database. # 22nd October 2020, 6:22 pm

A manager on Strategic Response mused to myself that most of the world outside the West was effectively the Wild West with myself as the part-time dictator – he meant the statement as a compliment, but it illustrated the immense pressures upon me.

Sophie Zhang # 15th September 2020, 9:21 pm

“I Have Blood on My Hands”: A Whistleblower Says Facebook Ignored Global Political Manipulation (via) Sophie Zhang worked as the data scientist for the Facebook Site Integrity fake engagement team. She gave up her severance package in order to speak out internally about what she saw there, and someone leaked her memo to BuzzFeed News. It’s a hell of a story: she saw bots and coordinated manual accounts used to influence politics in countries all around the world, and found herself constantly making moderation decisions that had lasting political impact. “With no oversight whatsoever, I was left in a situation where I was trusted with immense influence in my spare time". This sounds like a nightmare—imagine taking on responsibility for protecting democracy in so many different places. # 15th September 2020, 9:11 pm

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities—things like SQL injection and os.execute() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application. # 7th August 2020, 8:50 pm

Announcing Daylight Map Distribution. Mike Migurski announces a new distribution of OpenStreetMap: a 42GB dump of the version of the data used by Facebook, carefully moderated to minimize the chance of incorrect or maliciously offensive edits. Lots of constructive conversation in the comments about the best way for Facebook to make their moderation decisions more available to the OSM community. # 12th March 2020, 11:44 am

2019

What is a Self-XSS scam? Facebook link to this page from a console.log message that they display the browser devtools console, specifically warning that “If someone told you to copy-paste something here to enable a Facebook feature or hack someone’s account, it is a scam and will give them access to your Facebook account.” # 8th April 2019, 6:01 pm

In January, Facebook distributes a policy update stating that moderators should take into account recent romantic upheaval when evaluating posts that express hatred toward a gender. “I hate all men” has always violated the policy. But “I just broke up with my boyfriend, and I hate all men” no longer does.

Casey Newton # 25th February 2019, 2:09 pm

2018

XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette. # 13th July 2018, 7 pm

Migrating Messenger storage to optimize performance (via) Fascinating case-study of a truly gargantuan migration. Messenger has over a billion users, and Facebook successfully migrated its backend storage from HBase to their MyRocks database (a fork of MySQL with a storage engine built on their SSD-optimized RocksDB key/value library) without any user-visible downtime. They ended up using two migration paths: one for the 99.9% of regular accounts, and a separate path for extremely high volume accounts (businesses with very active chat bots or support systems). # 27th June 2018, 3:05 pm

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml. # 11th May 2018, 5:47 pm

Upgrades to Facebook’s link security (via) Facebook have started scanning links shared on the site for HSTS headers, which are used to indicate that an HTTP page is also available over HTTPS and are intended to be cached by browsers such that future HTTP access is automatically retrieved over HTTPS instead. Facebook will now obey those headers itself and link directly to the HTTPS version. What a great idea: all sites with sophisticated link sharing (where links are fetched to retrieve extracts and images for example) should do this as well. # 5th March 2018, 3:32 pm

The whole story is basically that Facebook gets so much traffic that they started convincing publishers to post things on Facebook. For a long time, that was fine. People posted things on Facebook, then you would click those links and go to their websites. But then, gradually, Facebook started exerting more and more control of what was being seen, to the point that they, not our website, essentially became the main publishers of everyone’s content. Today, there’s no reason to go to a comedy website that has a video if that video is just right on Facebook. And that would be fine if Facebook compensated those companies for the ad revenue that was generated from those videos, but because Facebook does not pay publishers, there quickly became no money in making high-quality content for the internet.

Matt Klinman # 7th February 2018, 3:51 pm

2017

Whatever weird thing you imagine might happen, something weirder probably did happen. Reporters tried to keep up, but it was too strange. As Max Read put it in New York Magazine, Facebook is “like a four-dimensional object, we catch slices of it when it passes through the three-dimensional world we recognize.” No one can quite wrap their heads around what this thing has become, or all the things this thing has become.

Alexis C. Madrigal # 13th October 2017, 1:09 pm

How do I receive automatic updates from a Facebook group by email?

Facebook’s API does provide a feed of recent posts to a group: https://developers.facebook.com/docs/graph-api/reference/v2.8/group/feed

[... 85 words]

2014

What is the best way one can expand his or her professional network?

Go to events—local meetups, conferences, tradeshows... there’s no better way of expanding your professional network than to attend events and build in-person relationships with people.

[... 46 words]

Calendars: When posting a facebook event page for an event that is repeated on two dates, should you use one page or two? (The events are games that are identical and should not have overlapping players)

I would use separate pages. The most valuable part of a Facebook event page is being able to see who is going to that event (and hence which of your friends will be there). If there are two events on two separate days you want to be able to maintain two separate lists of attendees.

[... 97 words]

2013

Does Facebook fly you to London when you apply for this office or are the interviews done remotely?

If they are anything like Google (which I expect they are) they will do the initial interviews remotely and then fly promising candidates to the London office (or even to California) for in-person interviews.

[... 58 words]

Why doesn’t xkcd site have social media share options?

My guess: he probably thinks they are a bit tacky.

[... 45 words]