Items tagged datasette, ai in 2023
Filters: Year: 2023 × datasette × ai × Sorted by date
Weeknotes: the Datasette Cloud API, a podcast appearance and more
Datasette Cloud now has a documented API, plus a podcast appearance, some LLM plugins work and some geospatial excitement.
[... 1243 words]Weeknotes: Self-hosted language models with LLM plugins, a new Datasette tutorial, a dozen package releases, a dozen TILs
A lot of stuff to cover from the past two and a half weeks.
[... 1742 words]Vector Search. Amjith Ramanujam provides a very thorough tutorial on implementing vector similarity search using SentenceTransformers embeddings (all-MiniLM-L6-v2) executed using sqlite-utils, then served via datasette-sqlite-vss and deployed using Fly. # 2nd June 2023, 5:02 am
MMS Language Coverage in Datasette Lite. I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222. # 22nd May 2023, 8:01 pm
What’s in the RedPajama-Data-1T LLM training set
RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.
[... 1077 words]Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor
I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.
[... 1255 words]Exploring MusicCaps, the evaluation data released to accompany Google’s MusicLM text-to-music model
Google Research just released MusicLM: Generating Music From Text. It’s a new generative AI model that takes a descriptive prompt and produces a “high-fidelity” music track. Here’s the paper (and a more readable version using arXiv Vanity).
[... 1323 words]Weeknotes: AI hacking and a SpatiaLite tutorial
Short weeknotes this time because the key things I worked on have already been covered here:
[... 477 words]How to implement Q&A against your documentation with GPT3, embeddings and Datasette
If you’ve spent any time with GPT-3 or ChatGPT, you’ve likely thought about how useful it would be if you could point them at a specific, current collection of text or documentation and have it use that as part of its input for answering questions.
[... 3491 words]