Simon Willison’s Weblog

Recent entries

Coping strategies for the serial project hoarder three days ago

I gave a talk at DjangoCon US 2022 in San Diego last month about productivity on personal projects, titled “Massively increase your productivity on personal projects with comprehensive documentation and automated tests”.

The alternative title for the talk was Coping strategies for the serial project hoarder.

I’m maintaining a lot of different projects at the moment. Somewhat unintuitively, the way I’m handling this is by scaling down techniques that I’ve seen working for large engineering teams spread out across multiple continents.

The key trick is to ensure that every project has comprehensive documentation and automated tests. This scales my productivity horizontally, by freeing me up from needing to remember all of the details of all of the different projects I’m working on at the same time.

You can watch the talk on YouTube (25 minutes). Alternatively, I’ve included a detailed annotated version of the slides and notes below.

Weeknotes: Implementing a write API, Mastodon distractions six days ago

Everything is so distracting at the moment. The ongoing Twitter catastrophe, the great migration (at least amongst most of the people I pay attention to) to Mastodon, the FTX calamity. It’s been very hard to focus!

I’ve been continuing to work on the write API for Datasette that I described previously. I’ve decided that the first release to include that work will also be the first alpha version of Datasette 1.0—you can see my progress towards that goal in the Datasette 1.0a0 milestone.

This alpha will be the first in a sequence of alphas. There’s still a lot more work to do—most notably:

  • Refactor Datasette’s HTML templates to exclusively use values that are available in the API (including via a new ?_extra= mechanism). This will help achieve the goal of having those template contexts officially documented, so that custom template authors can depend on them not changing between dot-releases.
  • This means some breaking API changes, which need to be documented and stable before 1.0.
  • Finalize the design of the plugin hooks for 1.0
  • Change how metadata.json works—it’s grown a whole bunch of functionality that has nothing to do with metadata, so I’d like to rename it.
  • Review how authentication and permissions work—there may be some changes I can make here to improve their usability prior to 1.0.

I hope to put out alpha releases quite frequently as the different parts of 1.0 start to come together.

dclient

Designing a good API is difficult if you don’t have anything that uses it! But you can’t build things against an API that doesn’t exist yet.

To help overcome this chicken-and-egg problem, I’ve started a new project: dclient.

dclient is the Datasette Client—it’s a CLI utility for interacting with remote Datasette instances.

I’m planning to imitate much of the existing sqlite-utils design, which provides a CLI for manipulating local SQLite database files.

Eventually you’ll be able to use dclient to authenticate with a remote Datasette instance and then do things like pipe CSV files into it to create new tables.

So far it has one obvious feature: you can use it to run a SQL query against a remote Datasette instance:

dclient query \
  https://datasette.io/content \
  "select * from news limit 1"

Returns:

[
  {
    "date": "2022-10-27",
    "body": "[Datasette 0.63](https://docs.datasette.io/en/stable/changelog.html#v0-63) is out. Here are the [annotated release notes](https://simonwillison.net/2022/Oct/27/datasette-0-63/)."
  }
]

It also supports aliases, so you can create an alias for a database like this:

dclient alias add content https://datasette.io/content

And then run the above query like this instead:

dclient query content "select * from news limit 1"

One fun additional feature: if you install dclient in the same virtual environment as Datasette itself, it registers itself as a command plugin:

datasette install dclient

You can then access its functionality via datasette client instead:

datasette client query content \
  "select * from news limit 1"

A flurry of plugins

I also pushed out a flurry of plugin releases, listed below. Almost all of these are a result of a tiny change to how breadcrumbs work in Datasette 0.63, which turned out to break the display of navigation in a bunch of plugins. Details in this issue—thanks to Brian Grinstead for pointing it out.

Releases this week

TIL this week

Tracking Mastodon user numbers over time with a bucket of tricks nine days ago

Mastodon is definitely having a moment. User growth is skyrocketing as more and more people migrate over from Twitter.

I’ve set up a new git scraper to track the number of registered user accounts on known Mastodon instances over time.

It’s only been running for a few hours, but it’s already collected enough data to render this chart:

The chart starts at around 1am with 4,694,000 users - it climbs to 4,716,000 users by 6am in a relatively straight line

I’m looking forward to seeing how this trend continues to develop over the next days and weeks.

Scraping the data

My scraper works by tracking https://instances.social/—a website that lists a large number (but not all) of the Mastodon instances that are out there.

That site publishes an instances.json array which currently contains 1,830 objects representing Mastodon instances. Each of those objects looks something like this:

{
    "name": "pleroma.otter.sh",
    "title": "Otterland",
    "short_description": null,
    "description": "Otters does squeak squeak",
    "uptime": 0.944757,
    "up": true,
    "https_score": null,
    "https_rank": null,
    "ipv6": true,
    "openRegistrations": false,
    "users": 5,
    "statuses": "54870",
    "connections": 9821,
}

I have a GitHub Actions workflow running approximately every 20 minutes that fetches a copy of that file and commits it back to this repository:

https://github.com/simonw/scrape-instances-social

Since each instance includes a users count, the commit history of my instances.json file tells the story of Mastodon’s growth over time.

Building a database

A commit log of a JSON file is interesting, but the next step is to turn that into actionable information.

My git-history tool is designed to do exactly that.

For the chart up above, the only number I care about is the total number of users listed in each snapshot of the file—the sum of that users field for each instance.

Here’s how to run git-history against that file’s commit history to generate tables showing how that count has changed over time:

git-history file counts.db instances.json \
  --convert "return [
    {
        'id': 'all',
        'users': sum(d['users'] or 0 for d in json.loads(content)),
        'statuses': sum(int(d['statuses'] or 0) for d in json.loads(content)),
    }
  ]" --id id

I’m creating a file called counts.db that shows the history of the instances.json file.

The real trick here though is that --convert argument. I’m using that to compress each snapshot down to a single row that looks like this:

{
    "id": "all",
    "users": 4717781,
    "statuses": 374217860
}

Normally git-history expects to work against an array of objects, tracking the history of changes to each one based on their id property.

Here I’m tricking it a bit—I only return a single object with the ID of all. This means that git-history will only track the history of changes to that single object.
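That --convert snippet is plain Python. Here’s roughly the same logic as a standalone sketch, assuming you have a local copy of instances.json to hand:

import json

# Collapse one snapshot of instances.json down to a single row by
# summing the users and statuses fields across every listed instance.
with open("instances.json") as fp:
    instances = json.load(fp)

row = {
    "id": "all",
    "users": sum(d["users"] or 0 for d in instances),
    "statuses": sum(int(d["statuses"] or 0) for d in instances),
}
print(row)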

It works though! The result is a counts.db file which is currently 52KB and has the following schema (truncated to the most interesting bits):

CREATE TABLE [commits] (
   [id] INTEGER PRIMARY KEY,
   [namespace] INTEGER REFERENCES [namespaces]([id]),
   [hash] TEXT,
   [commit_at] TEXT
);
CREATE TABLE [item_version] (
   [_id] INTEGER PRIMARY KEY,
   [_item] INTEGER REFERENCES [item]([_id]),
   [_version] INTEGER,
   [_commit] INTEGER REFERENCES [commits]([id]),
   [id] TEXT,
   [users] INTEGER,
   [statuses] INTEGER,
   [_item_full_hash] TEXT
);

Each item_version row will tell us the number of users and statuses at a particular point in time, based on a join against that commits table to find the commit_at date.
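Here’s a sketch of that join as a query you could run yourself with Python’s sqlite3 module against a downloaded copy of counts.db (the item_version_detail view described below does effectively the same thing):

import sqlite3

conn = sqlite3.connect("counts.db")

# Join each item_version row to its commit to pair the recorded user
# and status counts with the timestamp of the commit that produced them.
rows = conn.execute(
    """
    select commits.commit_at, item_version.users, item_version.statuses
    from item_version
    join commits on item_version._commit = commits.id
    order by commits.commit_at
    """
).fetchall()

for commit_at, users, statuses in rows:
    print(commit_at, users, statuses)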

Publishing the database

For this project, I decided to publish the SQLite database to an S3 bucket. I considered pushing the binary SQLite file directly to the GitHub repository but this felt rude, since a binary file that changes every 20 minutes would bloat the repository.

I wanted to serve the file with open CORS headers so I could load it into Datasette Lite and Observable notebooks.

I used my s3-credentials tool to create a bucket for this:

~ % s3-credentials create scrape-instances-social --public --website --create-bucket
Created bucket: scrape-instances-social
Attached bucket policy allowing public access
Configured website: IndexDocument=index.html, ErrorDocument=error.html
Created  user: 's3.read-write.scrape-instances-social' with permissions boundary: 'arn:aws:iam::aws:policy/AmazonS3FullAccess'
Attached policy s3.read-write.scrape-instances-social to user s3.read-write.scrape-instances-social
Created access key for user: s3.read-write.scrape-instances-social
{
    "UserName": "s3.read-write.scrape-instances-social",
    "AccessKeyId": "AKIAWXFXAIOZI5NUS6VU",
    "Status": "Active",
    "SecretAccessKey": "...",
    "CreateDate": "2022-11-20 05:52:22+00:00"
}

This created a new bucket called scrape-instances-social configured to work as a website and allow public access.

It also generated an access key and a secret access key with access to just that bucket. I saved these in GitHub Actions secrets called AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

I enabled a CORS policy on the bucket like this:

s3-credentials set-cors-policy scrape-instances-social

Then I added the following to my GitHub Actions workflow to build and upload the database after each run of the scraper:

    - name: Build and publish database using git-history
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      run: |-
        # First download previous database to save some time
        wget https://scrape-instances-social.s3.amazonaws.com/counts.db
        # Update with latest commits
        ./build-count-history.sh
        # Upload to S3
        s3-credentials put-object scrape-instances-social counts.db counts.db \
          --access-key $AWS_ACCESS_KEY_ID \
          --secret-key $AWS_SECRET_ACCESS_KEY

git-history knows how to only process commits since the last time the database was built, so downloading the previous copy saves a lot of time.

Exploring the data

Now that I have a SQLite database that’s being served over CORS-enabled HTTPS I can open it in Datasette Lite—my implementation of Datasette compiled to WebAssembly that runs entirely in a browser.

https://lite.datasette.io/?url=https://scrape-instances-social.s3.amazonaws.com/counts.db

Any time anyone follows this link their browser will fetch the latest copy of the counts.db file directly from S3.

The most interesting page in there is the item_version_detail SQL view, which joins against the commits table to show the date of each change:

https://lite.datasette.io/?url=https://scrape-instances-social.s3.amazonaws.com/counts.db#/counts/item_version_detail

(Datasette Lite lets you link directly to pages within Datasette itself via a #hash.)

Plotting a chart

Datasette Lite doesn’t have charting yet, so I decided to turn to my favourite visualization tool, an Observable notebook.

Observable has the ability to query SQLite databases (that are served via CORS) directly these days!

Here’s my notebook:

https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time

There are only four cells needed to create the chart shown above.

First, we need to open the SQLite database from the remote URL:

database = SQLiteDatabaseClient.open(
  "https://scrape-instances-social.s3.amazonaws.com/counts.db"
)

Next we need to use an Observable Database query cell to execute SQL against that database and pull out the data we want to plot—and store it in a query variable:

SELECT _commit_at as date, users, statuses
FROM item_version_detail

We need to make one change to that data—converting the date column from a string to a JavaScript Date object:

points = query.map((d) => ({
  date: new Date(d.date),
  users: d.users,
  statuses: d.statuses
}))

Finally, we can plot the data using the Observable Plot charting library like this:

Plot.plot({
  y: {
    grid: true,
    label: "Total users over time across all tracked instances"
  },
  marks: [Plot.line(points, { x: "date", y: "users" })],
  marginLeft: 100
})

I added 100px of margin to the left of the chart to ensure there was space for the large (4,696,000 and up) labels on the y-axis.

A bunch of tricks combined

This project combines a whole bunch of tricks I’ve been pulling together over the past few years:

  • Git scraping is the technique I use to gather the initial data, turning a static listing of instances into a record of changes over time
  • git-history is my tool for turning a scraped Git history into a SQLite database that’s easier to work with
  • s3-credentials makes working with S3 buckets—in particular creating credentials that are restricted to just one bucket—much less frustrating
  • Datasette Lite means that once you have a SQLite database online somewhere you can explore it in your browser—without having to run my full server-side Datasette Python application on a machine somewhere
  • And finally, combining the above means I can take advantage of Observable notebooks for ad-hoc visualization of data that’s hosted online, in this case as a static SQLite database file served from S3

Datasette is 5 today: a call for birthday presents 16 days ago

Five years ago today I published the first release of Datasette, in Datasette: instantly create and publish an API for your SQLite databases.

Five years, 117 releases, 69 contributors, 2,123 commits and 102 plugins later I’m still finding new things to get excited about with the project every single day. I fully expect to be working on this for the next decade-plus.

Datasette is the ideal project for me because it can be applied to pretty much everything that interests me—and I’m interested in a lot of things!

I can use it to experiment with GIS, explore machine learning data, catalog cryptozoological creatures and collect tiny museums. It can power blogs and analyze genomes and figure out my dog’s favourite coffee shop.

The official Datasette website calls it “an open source multi-tool for exploring and publishing data”. This definitely fits how I think about the project today, but I don’t know that it really captures my vision for its future.

In “x for y” terms I’ve started thinking of it as WordPress for Data.

WordPress powers 39.5% of the web because its thousands of plugins let it solve any publishing problem you can think of.

I want Datasette to be able to do the same thing for any data analysis, visualization, exploration or publishing problem.

There’s still so much more work to do!

Call for birthday presents

To celebrate this open source project’s birthday, I’ve decided to try something new: I’m going to ask for birthday presents.

An aspect of Datasette’s marketing that I’ve so far neglected is social proof. I think it’s time to change that: I know people are using the software to do cool things, but this often happens behind closed doors.

For Datasette’s birthday, I’m looking for endorsements and case studies and just general demonstrations that show how people are using it to do cool stuff.

So: if you’ve used Datasette to solve a problem, and you’re willing to publicize it, please give us the gift of your endorsement!

How far you want to go is up to you:

  • Not ready or able to go public? Drop me an email. I’ll keep it confidential but just knowing that you’re using these tools will give me a big boost, especially if you can help me understand what I can do to make Datasette more useful to you
  • Add a comment to this issue thread describing what you’re doing. Just a few sentences is fine—though a screenshot or even a link to a live instance would be even better
  • Best of all: a case study—a few paragraphs describing your problem and how you’re solving it, plus permission to list your logo as an organization that uses Datasette. The most visible social proof of all!

I thrive on talking to people who are using Datasette, so if you want to have an in-person conversation you can sign up for a Zoom office hours conversation on a Friday.

I’m also happy to accept endorsements in replies to these posts on Mastodon or on Twitter.

Here’s to the next five years of Datasette. I’m excited to see where it goes next!

Designing a write API for Datasette 20 days ago

Building out Datasette Cloud has made one thing clear to me: Datasette needs a write API for ingesting new data into its attached SQLite databases.

I had originally thought that this could be left entirely to plugins: my datasette-insert plugin already provides a JSON API for inserting data, and other plugins like datasette-upload-csvs also implement data import functionality.

But some things deserve to live in core. An API for manipulating data is one of them, because it can hopefully open up a floodgate of opportunities for other plugins and external applications to build on top of it.

I’ve been working on this over the past two weeks, in between getting distracted by Mastodon (it’s just blogs!).

Designing the API

You can follow my progress in this tracking issue: Write API in Datasette core #1850. I’m building the new functionality in a branch (called 1.0-dev, because this is going to be one of the defining features of Datasette 1.0—and will be previewed in alphas of that release).

Here’s the functionality I’m aiming for in the first alpha:

  • API for writing new records (singular or plural) to a table
  • API for updating an existing record
  • API for deleting an existing record
  • API for creating a new table—either with an explicit schema or by inferring it from a set of provided rows
  • API for dropping a table

I have a bunch of things I plan to add later, but I think the above represents a powerful, coherent set of initial functionality.

In terms of building this, I have a secret weapon: sqlite-utils. It already has both a Python client library and a comprehensive CLI interface for inserting data and creating tables. I’ve evolved the design of those over multiple major versions, and I’m confident that they’re solid. Datasette’s write API will mostly implement the same patterns I’ve eventually settled on for sqlite-utils.
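For example, inserting rows with the existing sqlite-utils Python library looks like this (this is sqlite-utils as it exists today, not the new Datasette API):

import sqlite_utils

db = sqlite_utils.Database("people.db")

# insert_all() creates the table if it doesn't exist yet, inferring a
# schema from the rows - the pattern the write API is modelled on.
db["people"].insert_all(
    [
        {"id": 1, "name": "Simon"},
        {"id": 2, "name": "Cleo"},
    ],
    pk="id",
)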

I still need to design the higher level aspects of the API though—the endpoint URLs and the JSON format that will be used.

This is still in flux, but my current design looks like this.

To insert records:

POST /database/table/-/insert
{
    "rows": [
        {"id": 1, "name": "Simon"},
        {"id": 2, "name": "Cleo"}
    ]
}

Or use "row": {...} to insert a single row.

To create a new table with an explicit schema:

POST /database/-/create
{
    "name": "people",
    "columns": [
        {
            "name": "id",
            "type": "integer"
        },
        {
            "name": "title",
            "type": "text"
        }
    ],
    "pk": "id"
}

To create a new table with a schema automatically derived from some initial rows:

POST /database/-/create
{
    "name": "my new table",
    "rows": [
        {"id": 1, "name": "Simon"},
        {"id": 2, "name": "Cleo"}
    ],
    "pk": "id"
}

To update a record:

POST /database/table/134/-/update
{
    "update": {
        "name": "New name"
    }
}

Where 134 in the URL is the primary key of the record. Datasette supports compound primary keys too, so this could be /database/docs/article,242/-/update for a table with a compound primary key.

I’m using a "update" nested object here rather than having everything at the root of the document because that frees me up to add extra future fields that control the update—"alter": true to specify that the table schema should be updated to add new columns, for example.

To delete a record:

POST /database/table/134/-/delete

I thought about using the HTTP DELETE verb here and I’m ready to be convinced that it’s a good idea, but thinking back over my career I can’t think of a time where DELETE offered a concrete benefit over just sticking with POST for this kind of thing.

This isn’t going to be a pure REST API, and I’m OK with that.

So many details

There are so many interesting details to consider here—especially given that Datasette is designed to support ANY schema that’s possible in SQLite.

  • Should you be allowed to update the primary key of an existing record?
  • What happens if you try to insert a record that violates a foreign key constraint?
  • What happens if you try to insert a record that violates a unique constraint?
  • How should inserting binary data work, given that JSON doesn’t have a binary type?
  • What permissions should the different API endpoints require? (I’m looking to add a bunch of new ones.)
  • How should compound primary keys be treated?
  • Should the API return a copy of the records that were just inserted? Initially I thought yes, but it turns out to have a big impact on insert speeds, at least in SQLite versions before the RETURNING clause was added in SQLite 3.35.0 (in March 2021, so not necessarily widely available yet). There’s a sketch of RETURNING after this list.
  • How should the interactive API explorer work? I’ve been building that in this issue.
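To illustrate that RETURNING point, here’s a minimal sqlite3 sketch; it assumes the underlying SQLite is 3.35.0 or newer:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table docs (id integer primary key, title text)")

# RETURNING (SQLite 3.35.0+) hands back the inserted rows directly,
# avoiding a second query just to echo them to the API client.
inserted = conn.execute(
    "insert into docs (title) values (?), (?) returning id, title",
    ("First post", "Second post"),
).fetchall()
conn.commit()

print(inserted)  # [(1, 'First post'), (2, 'Second post')]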

I’m working through these questions in the various issues attached to my tracking issue. If you have opinions to share you’re welcome to join me there!

Token authentication

This is another area that I’ve previously left to plugins. datasette-auth-tokens adds Authorization: Bearer xxx authentication to Datasette, but if there’s a write API in core there really needs to be a default token authentication mechanism too.

I’ve implemented a default mechanism based around generating signed tokens, described in issue #1852 and in this in-progress documentation.

The basic idea is to support tokens that are signed JSON objects (similar to JWT but not JWT, because JWT is a flawed standard—I rolled my own using itsdangerous).

The signed content of a token looks like this:

{
    "a": "user_id",
    "t": 1668022423,
    "d": 3600
}

The "a" field captures the ID of the user created that token. The token can then inherit the permissions of that user.

The "t" field shows when the token was initially created.

The "d" field is optional, and indicates after how many seconds duration the token should expire. This allows for the creation of time-limited tokens.

Tokens can be created using the new /-/create-token page or the new datasette create-token CLI command.

It’s important to note that this is not intended to be the only way tokens work in Datasette. There are plenty of applications where database-backed tokens make more sense, since they allow individual tokens to be revoked without rotating secrets and revoking every issued token at once. I plan to implement this pattern myself for Datasette Cloud.

But I think this is a reasonable default scheme to include in Datasette core. It can even be turned off entirely using the new --setting allow_signed_tokens off option.

I’m also planning a variant of these tokens that can apply additional restrictions. Let’s say you want to issue a token that acts as your user but is only allowed to insert rows into the docs table in the primary database. You’ll be able to create a token that looks like this:

{
    "a": "simonw",
    "t": 1668022423,
    "r": {
        "t": {
            "primary: {
                "docs": ["ir"]
            }
        }
    }
}

"r" means restrictions. The "t" key indicates per-table restrictions, and the "ir" is an acronym for the insert-row permission.

I’m still fleshing out how this will work, but it feels like an important feature of any permissions system. I find it frustrating any time I’m working with a system that doesn’t allow me to create scoped-down tokens.

Releases this week

TIL this week

Mastodon is just blogs 21 days ago

And that’s great. It’s also the return of Google Reader!

Mastodon is really confusing for newcomers. There are memes about it.

If you’re an internet user of a certain age, you may find an analogy that’s been working for me really useful:

Mastodon is just blogs.

Every Mastodon account is a little blog. Mine is at https://fedi.simonwillison.net/@simon.

You can post text and images to it. You can link to things. It’s a blog.

You can also subscribe to other people’s blogs—either by “following” them (a subscribe in disguise) or—fun trick—you can add .rss to their page and subscribe in a regular news reader (here’s my feed).
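That .rss trick means any feed reader library works too. A quick sketch with feedparser (pip install feedparser), pointed at my own account:

import feedparser

# Any Mastodon profile URL with .rss appended is a regular RSS feed.
feed = feedparser.parse("https://fedi.simonwillison.net/@simon.rss")

for entry in feed.entries[:5]:
    print(entry.published, entry.link)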

A Mastodon server (often called an instance) is just a shared blog host. Kind of like putting your personal blog in a folder on a domain on shared hosting with some of your friends.

Want to go it alone? You can do that: run your own dedicated Mastodon instance on your own domain (or pay someone to do that for you—I’m using masto.host).

Feeling really nerdy? You can build your own instance from scratch, by implementing the ActivityPub specification and a few others, plus matching some Mastodon conventions.

Differences from regular blogs

Mastodon (actually mostly ActivityPub—Mastodon is just the most popular open source implementation) does add some extra features that you won’t get with a regular blog:

  • Follows: you can follow other blogs, and see who you are following and who is following you
  • Likes: you can like a post—people will see that you liked it
  • Retweets: these are called “boosts”. They duplicate someone’s post on your blog too, promoting it to your followers
  • Replies: you can reply to other people’s posts with your own
  • Privacy levels: you can make a post public, visible only to your followers, or visible only to specific people (effectively a group direct message)

These features are what makes it interesting, and also what makes it significantly more complicated—both to understand and to operate.

Add all of these features to a blog and you get a blog that’s lightly disguised as a Twitter account. It’s still a blog though!

It doesn’t have to be a shared host

This shared hosting aspect is the root of many of the common complaints about Mastodon: “The server admins can read your private messages! They can ban you for no reason! They can delete your account! If they lose interest the entire server could go away one day!”

All of this is true.

This is why I like the shared blog hosting analogy: the same is true there too.

In both cases, the ultimate solution is to host it yourself. Mastodon has more moving pieces than a regular static blog, so this is harder—but it’s not impossibly hard.

I’m paying to host my own server for exactly this reason.

It’s also a shared feed reader

This is where things get a little bit more complicated.

Do you still miss Google Reader, almost a decade after it was shut down? It’s back!

A Mastodon server is a feed reader, shared by everyone who uses that server.

Users on one server can follow users on any other server—and see their posts in their feed in near-enough real time.

This works because each Mastodon server implements a flurry of background activity. My personal server, serving just me, already tells me it has processed 586,934 Sidekiq jobs since I started using it.

Blogs and feed readers work by polling for changes every few hours. ActivityPub is more ambitious: any time you post something, your server actively sends your new post out to every server that your followers are on.

Every time someone followed by you (or any other user on your server) posts, your server receives that post, stores a copy and adds it to your feed.

Servers offer a “federated” timeline. That’s effectively a combined feed of all of the public posts from every account on Mastodon that’s followed by at least one user on your server.

It’s like you’re running a little standalone copy of the Google Reader server application and sharing it with a few dozen/hundred/thousand of your friends.

May a thousand servers bloom

If you’re reading this with a web engineering background, you may be thinking that this sounds pretty alarming! Half a million Sidekiq jobs to support a single user? Huge amounts of webhooks firing every time someone posts?

Somehow it seems to work. But can it scale?

The key to scaling Mastodon is spreading the cost of all of that background activity across a large number of servers.

And unlike something like Twitter, where you need to host all of those yourself, Mastodon scales by encouraging people to run their own servers.

On November 2nd Mastodon founder Eugen Rochko posted the following:

199,430 is the number of new users across different Mastodon servers since October 27, along with 437 new servers. This bring last day’s total to 608,837 active users, which is without precedent the highest it’s ever been for Mastodon and the fediverse.

That’s about 456 new users for each new server.

Any time anyone builds something decentralized like this, the natural pressure is to centralize it again.

In Mastodon’s case though, decentralization is key to getting it to scale. And the organization behind mastodon.social, the largest server, is a German non-profit with an incentive to encourage new servers to help spread the load.

Will it break? I don’t think so. Regular blogs never had to worry about scaling, because that’s like worrying that the internet will run out of space for new content.

Mastodon servers are a lot chattier and expensive to run, but they don’t need to talk to everything else on the network—they only have to cover the social graph of the people using them.

It may prove unsustainable to run a single Mastodon server with a million users—but if you split that up into ten servers covering 100,000 users each I feel like it should probably work.

Running on multiple, independently governed servers is also Mastodon’s answer to the incredibly hard problem of scaling moderation. There’s a lot more to be said about this and I’m not going to try and do it justice here, but I recommend reading this Time interview with Mastodon founder Eugen for a good introduction.

How does this all get paid for?

One of the really refreshing things about Mastodon is the business model. There are no ads. There’s no VC investment, burning early money to grow market share for later.

There are just servers, and people paying to run them and volunteering their time to maintain them.

Elon did us all a favour here by setting $8/month as the intended price for Twitter Blue. That’s now my benchmark for how much I should be contributing to my Mastodon server. If everyone who can afford to do so does that, I think we’ll be OK.

And it’s very clear what you’re getting for the money. How much each server costs to run can be a matter of public record.

The oldest cliche about online business models is “if you’re not paying for the product, you are the product being sold”.

Mastodon is our chance to show that we’ve learned that lesson and we’re finally ready to pay up!

Is it actually going to work?

Mastodon has been around for six years now—and the various standards it is built on have been in development I believe since 2008.

A whole generation of early adopters have been kicking the tyres on this thing for years. It is not a new, untested piece of software. A lot of smart people have put a lot of work into this for a long time.

No-one could have predicted that Elon would drive it into hockeystick growth mode in under a week. Despite the fact that it’s run by volunteers with no profit motive anywhere to be found, it’s holding together impressively well.

My hunch is that this is going to work out just fine.

Don’t judge a website by its mobile app

Just like blogs, Mastodon is very much a creature of the Web.

There’s an official Mastodon app, and it’s decent, but it suffers the classic problem of so many mobile apps in that it doesn’t quite keep up with the web version in terms of features.

More importantly, its onboarding process for creating a new account is pretty confusing!

I’m seeing a lot of people get frustrated and write off Mastodon as completely impenetrable. I have a hunch that many of these are people whose only experience has come from downloading the official app.

So don’t judge a federated web ecosystem exclusively by its mobile app! If you begin your initial Mastodon exploration on a regular computer you may find it easier to get started.

Other apps exist—in fact the official app is a relatively recent addition to the scene, just over a year old. I’m personally a fan of Toot! for iOS, which includes some delightful elephant animations.

The expanded analogy

Here’s my expanded version of that initial analogy:

Mastodon is just blogs and Google Reader, skinned to look like Twitter.

Elsewhere

Today

  • Scaling Mastodon: The Compendium (via) Hazel Weakly’s collection of notes on scaling Mastodon, covering PostgreSQL, Sidekiq, Redis, object storage and more. #29th November 2022, 5:46 am
  • Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results. Stable Diffusion 2.0 is out, and it’s a very different model from 1.4/1.5. It’s trained using a new text encoder (OpenCLIP, in place of OpenAI’s CLIP) which means a lot of the old tricks—notably using “Greg Rutkowski” to get high quality fantasy art—no longer work. What DOES work, incredibly well, is negative prompting—saying things like “cyberpunk forest by Salvador Dali” but negative on “trees, green”. Max Woolf explores negative prompting in depth in this article, including how to combine it with textual inversion. #29th November 2022, 1:22 am

Yesterday

  • If posts in a social media app do not have URLs that can be linked to and viewed in an unauthenticated browser, or if there is no way to make a new post from a browser, then that program is not a part of the World Wide Web in any meaningful way.

    Consign that app to oblivion.

    JWZ # 28th November 2022, 6:22 am


24th November 2022

  • Microsoft Flight Simulator: WebAssembly (via) This is such a smart application of WebAssembly: it can now be used to write extensions for Microsoft Flight Simulator, which means you can run code from untrusted sources safely in a sandbox. I’m really looking forward to more of this kind of usage—I love the idea of finally having a robust sandbox for running things like plugins. #24th November 2022, 2:08 am

21st November 2022

  • Building a BFT JSON CRDT (via) Jacky Zhao describes their project to build a CRDT library for JSON data in Rust, and includes a thorough explanation of what CRDTs are and how they work. “I write this blog post mostly as a note to my past self, distilling a lot of what I’ve learned since into a blog post I wish I had read before going in”—the best kind of blog post! #21st November 2022, 7:56 pm


19th November 2022

  • ... it [ActivityPub] is crucially good enough. Perfect is the enemy of good, and in ActivityPub we have a protocol that has flaws but, crucially, that works, and has a standard we can all mostly agree on how to implement—and eventually, I hope, agree on how to improve.

    Andrew Godwin # 19th November 2022, 4:02 pm

18th November 2022

  • Datasette Lite: Loading JSON data (via) I added a new feature to Datasette Lite: you can now pass it the URL to a JSON file (hosted on a CORS-compatible hosting provider such as GitHub or GitHub Gists) and it will load that file into a database table for you. It expects an array of objects, but if your file has an object as the root it will search through it looking for the first key that is an array of objects and load those instead. #18th November 2022, 6:43 pm

16th November 2022

  • These kinds of biases aren’t so much a technical problem as a sociotechnical one; ML models try to approximate biases in their underlying datasets and, for some groups of people, some of these biases are offensive or harmful. That means in the coming years there will be endless political battles about what the ‘correct’ biases are for different models to display (or not display), and we can ultimately expect there to be as many approaches as there are distinct ideologies on the planet. I expect to move into a fractal ecosystem of models, and I expect model providers will ‘shapeshift’ a single model to display different biases depending on the market it is being deployed into. This will be extraordinarily messy.

    Jack Clark # 16th November 2022, 11:04 pm

  • fasiha/yamanote (via) Yamanote is “a guerrilla bookmarking server” by Ahmed Fasih—it works using a bookmarklet that grabs a full serialized copy of the page—the innerHTML of both the head and body element—and passes it to the server, which stores it in a SQLite database. The files are then served with a Content-Security-Policy: default-src 'self' header to prevent stored pages from fetching ANY external assets when they are viewed. #16th November 2022, 3:48 am
  • JSON Changelog with SQLite (via) One of my favourite database challenges is how to track changes to rows over time. This is a neat recipe from 2018 which uses SQLite triggers and the SQLite JSON functions to serialize older versions of the rows and store them in TEXT columns. #16th November 2022, 3:41 am

11th November 2022

  • Home invasion: Mastodon's Eternal September begins. Hugh Rundle’s thoughtful write-up of the impact of the massive influx of new users from Twitter on the existing Mastodon community. If you’re new to Mastodon (like me) you should read this and think carefully about how best to respectfully integrate with your new online space. #11th November 2022, 12:47 am