395 items tagged “generativeai”
GGUF, the long way around (via) Vicki Boykis dives deep into the GGUF format used by llama.cpp, after starting with a detailed description of how PyTorch models work and how they are traditionally persisted using Python pickle.
Pickle lead to safetensors, a format that avoided the security problems with downloading and running untrusted pickle files.
Llama.cpp introduced GGML, which popularized 16-bit (as opposed to 32-bit) quantization and bundled metadata and tensor data in a single file.
Mistral Large. Mistral Medium only came out two months ago, and now it’s followed by Mistral Large. Like Medium, this new model is currently only available via their API. It scores well on benchmarks (though not quite as well as GPT-4) but the really exciting feature is function support, clearly based on OpenAI’s own function design.
Functions are now supported via the Mistral API for both Mistral Large and the new Mistral Small, described as follows: “Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model.” # 26th February 2024, 11:23 pm
Does Offering ChatGPT a Tip Cause it to Generate Better Text? An Analysis (via) Max Woolf:“I have a strong hunch that tipping does in fact work to improve the output quality of LLMs and its conformance to constraints, but it’s very hard to prove objectively. [...] Let’s do a more statistical, data-driven approach to finally resolve the debate.” # 23rd February 2024, 5:42 pm
Last week Google introduced Gemini Pro 1.5, an enormous upgrade to their Gemini series of AI models.[... 2839 words]
Something that caught my eye in the terms: “Google may update Gemma from time to time, and you must make reasonable efforts to use the latest version of Gemma.”
One of the biggest benefits of running your own model is that it can protect you from model updates that break your carefully tested prompts, so I’m not thrilled by that particular clause.
UPDATE: It turns out that clause isn’t uncommon—the phrase “You shall undertake reasonable efforts to use the latest version of the Model” is present in both the Stable Diffusion and BigScience Open RAIL-M licenses. # 21st February 2024, 4:22 pm
Let’s build the GPT Tokenizer. When Andrej Karpathy left OpenAI last week a lot of people expressed hope that he would be increasing his output of educational YouTube videos.
Here’s an in-depth 2 hour dive into how tokenizers work and how to build one from scratch, published this morning.
The section towards the end, “revisiting and explaining the quirks of LLM tokenization”, helps explain a number of different LLM weaknesses—inability to reverse strings, confusion over arithmetic and even a note on why YAML can work better than JSON when providing data to LLMs (the same data can be represented in less tokens). # 20th February 2024, 6:02 pm
Representation Engineering: Mistral-7B on Acid (via) Theia Vogel provides a delightfully clear explanation (and worked examples) of control vectors—a relatively recent technique for influencing the behaviour of an LLM by applying vectors to the hidden states that are evaluated during model inference.
These vectors are surprisingly easy to both create and apply. Build a small set of contrasting prompt pairs—“Act extremely happy” v.s. “Act extremely sad” for example (with a tiny bit of additional boilerplate), then run a bunch of those prompts and collect the hidden layer states. Then use “single-component PCA” on those states to get a control vector representing the difference.
llmc.sh (via) Adam Montgomery wrote this a neat wrapper around my LLM CLI utility: it adds a “llmc” zsh function which you can ask for shell commands (llmc ’use ripgrep to find files matching otter’) which outputs the command, an explanation of the command and then copies the command to your clipboard for you to paste and execute if it looks like the right thing. # 16th February 2024, 6:19 pm
Our next-generation model: Gemini 1.5 (via) The big news here is about context length: Gemini 1.5 (a Mixture-of-Experts model) will do 128,000 tokens in general release, available in limited preview with a 1 million token context and has shown promising research results with 10 million tokens!
Memory and new controls for ChatGPT (via) ChatGPT now has "memory", and it’s implemented in a delightfully simple way. You can instruct it to remember specific things about you and it will then have access to that information in future conversations—and you can view the list of saved notes in settings and delete them individually any time you want to.
The feature works by adding a new tool called "bio" to the system prompt fed to ChatGPT at the beginning of every conversation, described like this:
"The `bio` tool allows you to persist information across conversations. Address your message `to=bio` and write whatever information you want to remember. The information will appear in the model set context below in future conversations."
GPUs on Fly.io are available to everyone! We’ve been experimenting with GPUs on Fly for a few months for Datasette Cloud. They’re well documented and quite easy to use—any example Python code you find that uses NVIDIA CUDA stuff generally Just Works. Most interestingly of all, Fly GPUs can scale to zero—so while they cost $2.50/hr for a A100 40G (VRAM) and $3.50/hr for a A100 80G you can configure them to stop running when the machine runs out of things to do.
We’ve successfully used them to run Whisper and to experiment with running various Llama 2 LLMs as well.
Aya (via) “A global initiative led by Cohere For AI involving over 3,000 independent researchers across 119 countries. Aya is a state-of-art model and dataset, pushing the boundaries of multilingual AI for 101 languages through open science.”
Both the model and the training data are released under Apache 2. The training data looks particularly interesting: “513 million instances through templating and translating existing datasets across 114 languages”—suggesting the data is mostly automatically generated. # 13th February 2024, 5:14 pm
The unsettling scourge of obituary spam (via) Well this is particularly grim. Apparently “obituary aggregator” sites have been an SEO trick for at least 15 years, and now they’re using generative AI to turn around junk rewritten (and frequently inaccurate) obituaries even faster. # 13th February 2024, 12:36 am
One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google’s huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.
Reality is that LLMs are not AGI -- they’re a big curve fit to a very large dataset. They work via memorization and interpolation. But that interpolative curve can be tremendously useful, if you want to automate a known task that’s a match for its training data distribution.
Memorization works, as long as you don’t need to adapt to novelty. You don’t *need* intelligence to achieve usefulness across a set of known, fixed scenarios.
Google’s Gemini Advanced: Tasting Notes and Implications. Ethan Mollick reviews the new Google Gemini Advanced—a rebranded Bard, released today, that runs on the GPT-4 competitive Gemini Ultra model.
“GPT-4 [...] has been the dominant AI for well over a year, and no other model has come particularly close. Prior to Gemini, we only had one advanced AI model to look at, and it is hard drawing conclusions with a dataset of one. Now there are two, and we can learn a few things.”
If your only way of making a painting is to actually dab paint laboriously onto a canvas, then the result might be bad or good, but at least it’s the result of a whole lot of micro-decisions you made as an artist. You were exercising editorial judgment with every paint stroke. That is absent in the output of these programs.
Open Language Models (OLMos) and the LLM landscape (via) OLMo is a newly released LLM from the Allen Institute for AI (AI2) currently available in 7b and 1b parameters (OLMo-65b is on the way) and trained on a fully openly published dataset called Dolma.
The model and code are Apache 2, while the data is under the “AI2 ImpACT license”.
From the benchmark scores shared here by Nathan Lambert it looks like this may be the highest performing model currently available that was built using a fully documented training set.
LLMs may offer immense value to society. But that does not warrant the violation of copyright law or its underpinning principles. We do not believe it is fair for tech firms to use rightsholder data for commercial purposes without permission or compensation, and to gain vast financial rewards in the process. There is compelling evidence that the UK benefits economically, politically and societally from upholding a globally respected copyright regime.
For many people in many organizations, their measurable output is words—words in emails, in reports, in presentations. We use words as proxy for many things: the number of words is an indicator of effort, the quality of the words is an indicator of intelligence, the degree to which the words are error-free is an indicator of care.
[...] But now every employee with Copilot can produce work that checks all the boxes of a formal report without necessarily representing underlying effort.
teknium/OpenHermes-2.5 (via) The Nous-Hermes and Open Hermes series of LLMs, fine-tuned on top of base models like Llama 2 and Mistral, have an excellent reputation and frequently rank highly on various leaderboards.
The developer behind them, Teknium, just released the full set of fine-tuning data that they curated to build these models. It’s a 2GB JSON file with over a million examples of high quality prompts, responses and some multi-prompt conversations, gathered from a number of different sources and described in the data card. # 1st February 2024, 4:18 am
Simon Willison interview: AI software still needs the human touch. Thomas Claburn interviewed me for The Resister. We talked about AI training copyright, applications of AI for programming, AI security and a whole bunch of other topics. # 27th January 2024, 10:08 pm
Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.
“It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”
The Articulation Barrier: Prompt-Driven AI UX Hurts Usability. Jakob Nielsen: “Generative AI systems like ChatGPT use prose prompts for intent-based outcomes, requiring users to be articulate in writing prose, which is a challenge for half of the population in rich countries.” # 27th January 2024, 3:49 pm
[... 1278 words]
Fairly Trained launches certification for generative AI models that respect creators’ rights. I’ve been using the term “vegan models” for a while to describe machine learning models that have been trained in a way that avoids using unlicensed, copyrighted data. Fairly Trained is a new non-profit initiative that aims to encourage such models through a “certification” stamp of approval.
Django Chat: Datasette, LLMs, and Django. I’m the guest on the latest episode of the Django Chat podcast. We talked about Datasette, LLMs, the New York Times OpenAI lawsuit, the Python Software Foundation and all sorts of other topics. # 24th January 2024, 8:41 pm
Google Research: Lumiere. The latest in text-to-video from Google Research, described as “a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion”.
Most existing text-to-video models generate keyframes and then use other models to fill in the gaps, which frequently leads to a lack of coherency. Lumiere “generates the full temporal duration of the video at once”, which avoids this problem.
Disappointingly but unsurprisingly the paper doesn’t go into much detail on the training data, beyond stating “We train our T2V model on a dataset containing 30M videos along with their text caption. The videos are 80 frames long at 16 fps (5 seconds)”.
Prompt Lookup Decoding (via) Really neat LLM optimization trick by Apoorv Saxena, who observed that it’s common for sequences of tokens in LLM input to be reflected by the output—snippets included in a summarization, for example.
Apoorv’s code performs a simple search for such prefixes and uses them to populate a set of suggested candidate IDs during LLM token generation.
AWS Fixes Data Exfiltration Attack Angle in Amazon Q for Business. An indirect prompt injection (where the AWS Q bot consumes malicious instructions) could result in Q outputting a markdown link to a malicious site that exfiltrated the previous chat history in a query string.