All the Hard Stuff Nobody Talks About when Building Products with LLMs |
https://www.honeycomb.io/blog/hard-stuff-nobody-talks-about-llm |
Phillip Carter shares lessons learned building LLM features for Honeycomb: hard-won knowledge from building a query assistant that turns human questions into Honeycomb query filters.
This is very entertainingly written. "Use Embeddings and pray to the dot product gods that whatever distance function you use to pluck a relevant subset out of the embedding is actually relevant".
Few-shot prompting with examples had the best results out of the approaches they tried.
The section on how they're dealing with the threat of prompt injection - "The output of our LLM call is non-destructive and undoable, No human gets paged based on the output of our LLM call..." is particularly smart. |
2023-05-27 21:13:54+00:00 |
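The few-shot approach mentioned in the Honeycomb entry can be sketched roughly as follows. This is a minimal illustration, not Honeycomb's actual prompt: the example questions, the JSON filter shape and the `build_few_shot_prompt` helper are all hypothetical, and the resulting string would still need to be sent to whichever LLM API you use.

```python
import json

# Hypothetical example pairs: natural-language question -> query filter.
# The filter shape is invented for illustration only.
EXAMPLES = [
    ("slow requests", {"column": "duration_ms", "op": ">", "value": 1000}),
    ("errors from the api service", {"column": "service", "op": "=", "value": "api"}),
]


def build_few_shot_prompt(question, examples=EXAMPLES):
    """Assemble a prompt that shows worked examples before the real question."""
    parts = ["Translate each question into a JSON query filter."]
    for example_question, example_filter in examples:
        parts.append(f"Question: {example_question}")
        parts.append(f"Filter: {json.dumps(example_filter)}")
    # End with the real question and an open "Filter:" for the model to complete.
    parts.append(f"Question: {question}")
    parts.append("Filter:")
    return "\n".join(parts)


prompt = build_few_shot_prompt("requests that took over two seconds")
print(prompt)
```

Keeping the examples in a plain list makes it easy to swap them in and out while comparing which ones give the best results.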
Exploration de données avec Datasette |
https://meetup-python-grenoble.github.io/datasette-workshop/ |
One of the great delights of open source development is seeing people run workshops on your project, even more so when they're in a language other than English! Romain Clement presented this French workshop for the Python Grenoble meetup on 25th May 2023, using GitHub Codespaces as the environment. It's pretty comprehensive, including a 300,000+ row example table which illustrates Datasette plugins such as datasette-cluster-map and datasette-leaflet-geojson. |
2023-05-27 00:36:52+00:00 |
Deno 1.34: deno compile supports npm packages |
https://deno.com/blog/v1.34 |
This feels like it could be extremely useful: Deno can load code from npm these days ('import { say } from "npm:cowsay@1.5.0"') and now the "deno compile" command can resolve those imports, fetch all of the dependencies and bundle them together with Deno itself into a single executable binary. This means pretty much anything that's been built as an npm package can now be easily converted into a standalone binary, including cross-compilation to Windows x64, macOS x64, macOS ARM and Linux x64. |
2023-05-25 17:01:08+00:00 |
Migrating out of PostHaven |
https://amjith.com/blog/posthaven/ |
Amjith Ramanujam decided to migrate his blog content from PostHaven to a Markdown static site. He used shot-scraper (invoked as a subprocess from a Python script) to scrape his existing content using a snippet of JavaScript, wrote the content to a SQLite database using sqlite-utils, then used markdownify (new to me, a neat Python package for converting HTML to Markdown via BeautifulSoup) to write the content to disk as Markdown. |
2023-05-24 19:38:37+00:00 |
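The last two stages of that PostHaven pipeline (SQLite, then Markdown files on disk) might look something like the sketch below. It is not Amjith's code: to stay dependency-free it swaps in the standard library `sqlite3` module for sqlite-utils and a deliberately crude regex function for markdownify, and the `posts` table, its columns and the helper names are all assumptions.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def crude_html_to_markdown(html):
    """Crude stand-in for markdownify: handles only <p>, <strong> and <em>."""
    text = re.sub(r"</?strong>", "**", html)
    text = re.sub(r"</?em>", "*", text)
    text = re.sub(r"</p>\s*", "\n\n", text)
    text = re.sub(r"<[^>]+>", "", text)  # drop any remaining tags
    return text.strip() + "\n"


def export_posts(db, out_dir):
    """Write one Markdown file per row of the hypothetical posts table."""
    out_dir = Path(out_dir)
    written = []
    query = "SELECT slug, title, html FROM posts ORDER BY slug"
    for slug, title, html in db.execute(query):
        path = out_dir / f"{slug}.md"
        body = crude_html_to_markdown(html)
        path.write_text(f"# {title}\n\n{body}", encoding="utf-8")
        written.append(path)
    return written


# The scraping stage is assumed to have run already; this row stands in for it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (slug TEXT PRIMARY KEY, title TEXT, html TEXT)")
db.execute(
    "INSERT INTO posts VALUES (?, ?, ?)",
    ("hello-world", "Hello world", "<p>My <strong>first</strong> post.</p>"),
)

out_dir = tempfile.mkdtemp()
paths = export_posts(db, out_dir)
print(paths[0].read_text(encoding="utf-8"))
```

Staging the scraped HTML in SQLite first means the conversion step can be re-run and tweaked without scraping the site again.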
REGENT: Coastal Travel. 100% Electric |
https://www.regentcraft.com/ |
As a long-time fan of ekranoplans this is very exciting to me: the REGENT Seaglider is a fully electric, passenger-carrying wing-in-ground-effect vehicle designed to serve coastal routes, operating at half the cost of an aircraft (and 1/10th the cost of a helicopter) and using hydrofoils to resolve previous problems with ekranoplans and wave tolerance. They're a YC company and the founder has been answering questions on Hacker News today. They've pre-sold 467 vehicles already and expect them to start entering service in various locations around the world "mid-decade". |
2023-05-24 02:17:08+00:00 |
Instant colour fill with HTML Canvas |
https://shaneosullivan.wordpress.com/2023/05/23/instant-colour-fill-with-html-canvas/ |
Shane O'Sullivan describes how to implement instant colour fill using HTML Canvas and some really clever tricks with Web Workers. A new technique to me is passing a canvas.getImageData() object to a Web Worker via worker.postMessage({action: "process", buffer: imageData.data.buffer}, [imageData.data.buffer]) where that second argument is a list of objects to "transfer ownership of" - then the worker can create a new ImageData(), populate it and transfer ownership of that back to the parent window. |
2023-05-24 01:27:00+00:00 |
MMS Language Coverage in Datasette Lite |
https://lite.datasette.io/?metadata=https://gist.github.com/simonw/492ea1059ad7942f0eec902a58fe9ea0&json=https://gist.github.com/simonw/63aa33ec827b093f9c6a2797df950aba#/data/raw?_filter_column=Language+Name&_filter_op=contains&_filter_value=&_sort=rowid&_facet=ASR&_facet=TTS&_facet=LID&_facet=Subgrouping&_facet=Language+Family |
I converted the HTML table of 4,021 languages supported by Meta's new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting - the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222. |
2023-05-22 20:01:41+00:00 |
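Converting an HTML table to newline-delimited JSON, as described in the MMS entry, can be done with nothing but the standard library. The sketch below is one possible approach, not the script actually used: the `TableRows` class and `table_to_ndjson` helper are hypothetical, and the sample rows are made up (only the column names and family names come from the entry). The final `Counter` is a rough stand-in for faceting by Language Family.

```python
import json
from collections import Counter
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_ndjson(html):
    """Return one JSON object per line, keyed by the header row."""
    parser = TableRows()
    parser.feed(html)
    header, *body = parser.rows
    return "\n".join(json.dumps(dict(zip(header, row))) for row in body)


# Made-up sample rows; the real table has 4,021 languages.
SAMPLE = """
<table>
<tr><th>Language Name</th><th>Language Family</th></tr>
<tr><td>Example A</td><td>Niger-Congo</td></tr>
<tr><td>Example B</td><td>Austronesian</td></tr>
<tr><td>Example C</td><td>Niger-Congo</td></tr>
</table>
"""

ndjson = table_to_ndjson(SAMPLE)
print(ndjson)

# Rough equivalent of faceting by Language Family.
families = Counter(
    json.loads(line)["Language Family"] for line in ndjson.splitlines()
)
print(families.most_common())
```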
MLC: Bringing Open Large Language Models to Consumer Devices |
https://mlc.ai/blog/2023/05/22/bringing-open-large-language-models-to-consumer-devices |
"We bring RedPajama, a permissive open language model to WebGPU, iOS, GPUs, and various other platforms." I managed to get this running on my Mac (see via link) with a few tweaks to their official instructions. |
2023-05-22 19:25:13+00:00 |
Introducing speech-to-text, text-to-speech, and more for 1,100+ languages |
https://ai.facebook.com/blog/multilingual-model-speech-recognition/ |
New from Meta AI: Massively Multilingual Speech. "MMS supports speech-to-text and text-to-speech for 1,107 languages and language identification for over 4,000 languages. [...] Some of these, such as the Tatuyo language, have only a few hundred speakers, and for most of these languages, no prior speech technology exists."
It's licensed CC-BY-NC 4.0 though, so it's not available for commercial use.
"In a like-for-like comparison with OpenAI’s Whisper, we found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages."
The training data was mostly sourced from audio Bible translations. |
2023-05-22 19:22:38+00:00 |
Trogon |
https://github.com/Textualize/trogon |
The latest project from the Textualize/Rich crew, Trogon provides a Python decorator - @tui - which, when applied to a Click CLI application, adds a new interactive TUI mode which introspects the available subcommands and their options and creates a full Text User Interface - with keyboard and mouse support - for assembling invocations of those various commands.
I just shipped sqlite-utils 3.32 with support for this - it uses an optional dependency, so you'll need to run "sqlite-utils install trogon" and then "sqlite-utils tui" to try it out. |
2023-05-21 21:39:01+00:00 |