State-of-the-art music scanning by Soundslice. It's been a while since I checked in on Soundslice, Adrian Holovaty's beautiful web application focused on music education.
The latest feature is spectacular. The Soundslice music editor - already one of the most impressive web applications I've ever experienced - can now import notation directly from scans or photos of sheet music.
The attention to detail is immaculate. The custom machine learning model can handle a wide variety of notation details, and a neatly designed flow asks the user to verify or correct any details the system couldn't determine with confidence.
Free accounts can scan two single-page documents a month, and paid plans get a much higher allowance. I tried it out just now on a low-resolution image I found on Wikipedia and it did a fantastic job, even allowing me to listen to a simulated piano rendition of the music once it had finished processing.
It's worth spending some time with the release notes for the feature to appreciate how much work they've put into improving it since the initial release.
If you're new to Soundslice, here's an example of their core player interface which syncs the display of music notation to an accompanying video.
Adrian wrote up some detailed notes on the machine learning behind the feature when they first launched it in beta back in November 2022:
OMR [Optical Music Recognition] is an inherently hard problem, significantly more difficult than text OCR. For one, music symbols have complex spatial relationships, and mistakes have a tendency to cascade. A single misdetected key signature might result in multiple incorrect note pitches. And there’s a wide diversity of symbols, each with its own behavior and semantics — meaning the problems and subproblems aren’t just hard, there are many of them.
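To make the cascading-error point concrete, here's a toy sketch in Python. It is not how Soundslice works; the function names and the melody are made up for illustration. It assumes a recognizer that reads bare note letters from the staff and then applies the detected key signature to every one of them, so a single wrong key signature changes many pitches at once.

```python
# Toy illustration only: hypothetical helper names, not Soundslice's code.

# Standard order in which sharps are added to a key signature.
SHARP_ORDER = ["F", "C", "G", "D", "A", "E", "B"]


def apply_key_signature(letters, num_sharps):
    """Return pitch names for bare note letters under a key with num_sharps sharps."""
    sharped = set(SHARP_ORDER[:num_sharps])
    return [letter + "#" if letter in sharped else letter for letter in letters]


def count_pitch_errors(letters, true_sharps, detected_sharps):
    """Count notes whose pitch changes when the key signature is misread."""
    truth = apply_key_signature(letters, true_sharps)
    guess = apply_key_signature(letters, detected_sharps)
    return sum(t != g for t, g in zip(truth, guess))


# A made-up eight-note melody written in D major (two sharps: F# and C#).
melody = ["D", "F", "A", "F", "C", "E", "F", "D"]

# One mistake (reading the key signature as having no sharps) ...
errors = count_pitch_errors(melody, true_sharps=2, detected_sharps=0)

# ... produces a wrong pitch for every F and C in the melody.
print(f"{errors} of {len(melody)} pitches are wrong")
```

In this example one misread symbol corrupts half of the notes, which is the kind of cascade that plain text OCR rarely has to deal with.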