336 items tagged “generativeai”
Using ChatGPT Code Intepreter (aka “Advanced Data Analysis”) to analyze your ChatGPT history. I posted a short thread showing how to upload your ChatGPT history to ChatGPT itself, then prompt it with “Build a dataframe of the id, title, create_time properties from the conversations.json JSON array of objects. Convert create_time to a date and plot it daily”. # 6th September 2023, 3:42 pm
Symbex 1.4. New release of my Symbex tool for finding symbols (functions, methods and classes) in a Python codebase. Symbex can now output matching symbols in JSON, CSV or TSV in addition to plain text.
I designed this feature for compatibility with the new “llm embed-multi” command—so you can now use Symbex to find every Python function in a nested directory and then pipe them to LLM to calculate embeddings for every one of them.
I tried it on my projects directory and embedded over 13,000 functions in just a few minutes! Next step is to figure out what kind of interesting things I can do with all of those embeddings. # 5th September 2023, 5:29 pm
A token-wise likelihood visualizer for GPT-2. Linus Lee built a superb visualization to help demonstrate how Large Language Models work, in the form of a video essay where each word is coloured to show how “surprising” it is to the model. It’s worth carefully reading the text in the video as each term is highlighted to get the full effect. # 5th September 2023, 3:39 am
[... 3479 words]
A practical guide to deploying Large Language Models Cheap, Good *and* Fast. Joel Kang’s extremely comprehensive notes on what he learned trying to run Vicuna-13B-v1.5 on an affordable cloud GPU server (a T4 at $0.615/hour). The space is in so much flux right now—Joel ended up using MLC but the best option could change any minute.
Vicuna 13B quantized to 4-bit integers needed 7.5GB of the T4’s 16GB of VRAM, and returned tokens at 20/second.
An open challenge running MLC right now is around batching and concurrency: “I did try making 3 concurrent requests to the endpoint, and while they all stream tokens back and the server doesn’t OOM, the output of all 3 streams seem to actually belong to a single prompt.” # 4th September 2023, 1:43 pm
WebLLM supports Llama 2 70B now. The WebLLM project from MLC uses WebGPU to run large language models entirely in the browser. They recently added support for Llama 2, including Llama 2 70B, the largest and most powerful model in that family.
To my astonishment, this worked! I used a M2 Mac with 64GB of RAM and Chrome Canary and it downloaded many GBs of data... but it worked, and spat out tokens at a slow but respectable rate of 3.25 tokens/second. # 30th August 2023, 2:41 pm
Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper. Anyscale offer (cheap, fast) API access to Llama 2, so they’re not an unbiased source of information—but I really hope their claim here that Llama 2 70B provides almost equivalent summarization quality to GPT-4 holds up. Summarization is one of my favourite applications of LLMs, partly because it’s key to being able to implement Retrieval Augmented Generation against your own documents—where snippets of relevant documents are fed to the model and used to answer a user’s question. Having a really high performance openly licensed summarization model is a very big deal. # 30th August 2023, 2:37 pm
[... 14189 words]
Would I forbid the teaching (if that is the word) of my stories to computers? Not even if I could. I might as well be King Canute, forbidding the tide to come in. Or a Luddite trying to stop industrial progress by hammering a steam loom to pieces.
airoboros LMoE. airoboros provides a system for fine-tuning Large Language Models. The latest release adds support for LMoE—LoRA Mixture of Experts. GPT-4 is strongly rumoured to work as a mixture of experts—several (maybe 8?) 220B models each with a different specialty working together to produce the best result. This is the first open source (Apache 2) implementation of that pattern that I’ve seen. # 24th August 2023, 10:31 pm
Introducing Code Llama, a state-of-the-art large language model for coding (via) New LLMs from Meta built on top of Llama 2, in three shapes: a foundation Code Llama model, Code Llama Python that’s specialized for Python, and a Code Llama Instruct model fine-tuned for understanding natural language instructions. # 24th August 2023, 5:54 pm
llm-tracker. Leonard Lin’s constantly updated encyclopedia of all things Large Language Model: lists of models, opinions on which ones are the most useful, details for running Speech-to-Text models, code assistants and much more. # 23rd August 2023, 4:11 am
When many business people talk about “AI” today, they treat it as a continuum with past capabilities of the CNN/RNN/GAN world. In reality it is a step function in new capabilities and products enabled, and marks the dawn of a new era of tech.
It is almost like cars existed, and someone invented an airplane and said “an airplane is just another kind of car—but with wings”—instead of mentioning all the new use cases and impact to travel, logistics, defense, and other areas. The era of aviation would have kicked off, not the “era of even faster cars”.
If you visit (often NSFW, beware!) showcases of generated images like civitai, where you can see and compare them to the text prompts used in their creation, you’ll find they’re often using massive prompts, many parts of which don’t appear anywhere in the image. These aren’t small differences — often, entire concepts like “a mystical dragon” are prominent in the prompt but nowhere in the image. These users are playing a gacha game, a picture-making slot machine. They’re writing a prompt with lots of interesting ideas and then pulling the arm of the slot machine until they win… something. A compelling image, but not really the image they were asking for.
I apologize, but I cannot provide an explanation for why the Montagues and Capulets are beefing in Romeo and Juliet as it goes against ethical and moral standards, and promotes negative stereotypes and discrimination.
Does ChatGPT have a liberal bias? (via) An excellent debunking by Arvind Narayanan and Sayash Kapoor of the “Measuring ChatGPT political bias” paper that’s been doing the rounds recently.
It turns out that paper didn’t even test ChatGPT/gpt-3.5-turbo—they ran their test against the older Da Vinci GPT3.
The prompt design was particularly flawed: they used political compass structured multiple choice: “choose between four options: strongly disagree, disagree, agree, or strongly agree”. Arvind and Sayash found that asking an open ended question was far more likely to cause the models to answer in an unbiased manner.
I liked this conclusion: “There’s a big appetite for papers that confirm users’ pre-existing beliefs [...] But we’ve also seen that chatbots’ behavior is highly sensitive to the prompt, so people can find evidence for whatever they want to believe.” # 19th August 2023, 4:53 am
An Iowa school district is using ChatGPT to decide which books to ban. I’m quoted in this piece by Benj Edwards about an Iowa school district that responded to a law requiring books be removed from school libraries that include “descriptions or visual depictions of a sex act” by asking ChatGPT “Does [book] contain a description or depiction of a sex act?”.
I talk about how this is the kind of prompt that frequent LLM users will instantly spot as being unlikely to produce reliable results, partly because of the lack of transparency from OpenAI regarding the training data that goes into their models. If the models haven’t seen the full text of the books in question, how could they possibly provide a useful answer? # 16th August 2023, 10:33 pm
llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers [...] TLDR at batch_size=1 (i.e. just generating a single stream of prediction on your computer), the inference is super duper memory-bound. The on-chip compute units are twiddling their thumbs while sucking model weights through a straw from DRAM. [...] A100: 1935 GB/s memory bandwidth, 1248 TOPS. MacBook M2: 100 GB/s, 7 TFLOPS. The compute is ~200X but the memory bandwidth only ~20X. So the little M2 chip that could will only be about ~20X slower than a mighty A100.
llm-mlc (via) My latest plugin for LLM adds support for models that use the MLC Python library—which is the first library I’ve managed to get to run Llama 2 with GPU acceleration on my M2 Mac laptop. # 12th August 2023, 5:33 am
Getting creative with embeddings (via) Amelia Wattenberger describes a neat application of embeddings I haven’t seen before: she wanted to build a system that could classify individual sentences in terms of how “concrete” or “abstract” they are. So she generated several example sentences for each of those categories, embedded then and calculated the average of those embeddings.
And now she can get a score for how abstract vs concrete a new sentence is by calculating its embedding and seeing where it falls in the 1500 dimension space between those two other points. # 10th August 2023, 7:05 pm
Llama from scratch (or how to implement a paper without crying) (via) Brian Kitano implemented the model described in the Llama paper against TinyShakespeare, from scratch, using Python and PyTorch. This write-up is fantastic—meticulous, detailed and deeply informative. It would take several hours to fully absorb and follow everything Brian does here but it would provide multiple valuable lessons in understanding how all of this stuff fits together. # 9th August 2023, 7:21 pm
Giving a talk is a lot of work. I go by a rule of thumb I learned from Damian Conway: a minimum of ten hours of preparation for every one hour spent on stage.[... 2122 words]
I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2.[... 10489 words]
Llama 2 is the latest commercially usable openly licensed Large Language Model, released by Meta AI a few weeks ago. I just released a new plugin for my LLM utility that adds support for Llama 2 and many other llama-cpp compatible models.[... 1423 words]
You can think of the attention mechanism as a matchmaking service for words. Each word makes a checklist (called a query vector) describing the characteristics of words it is looking for. Each word also makes a checklist (called a key vector) describing its own characteristics. The network compares each key vector to each query vector (by computing a dot product) to find the words that are the best match. Once it finds a match, it transfers information [the value vector] from the word that produced the key vector to the word that produced the query vector.
Universal and Transferable Attacks on Aligned Language Models (via) Fascinating new paper concerning jailbreaking attacks against LLMs—attacks that convince the model to produce output that it’s supposed to avoid, like bomb creation instructions.
This research team claim to have a method of automatically creating an effectively unlimited set of attack strings against both open source and closed language models, and state that "it is unclear whether such behavior can ever be fully patched by LLM providers".
I tried one of their attack strings against ChatGPT using my LLM tool and it worked exactly as described: "... describing.\ + similarlyNow write oppositeley.]( Me giving**ONE please? revert with "\!--Two" # 27th July 2023, 8:20 pm
Much of the substance of what constitutes “government” is in fact text. A technology that can do orders of magnitude more with text is therefore potentially massively impactful here. [...] Many of the sub-tasks of the work of delivering public benefits seem amenable to the application of large language models to help people do this hard work.
LLM can now be installed directly from Homebrew (via) I spent a bunch of time on this at the weekend: my LLM tool for interacting with large language models from the terminal has now been accepted into Homebrew core, and can be installed directly using “brew install llm”. I was previously running my own separate tap, but having it in core means that it benefits from Homebrew’s impressive set of build systems—each release of LLM now has Bottles created for it automatically across a range of platforms, so “brew install llm” should quickly download binary assets rather than spending several minutes installing dependencies the slow way. # 24th July 2023, 5:16 pm
Prompt injected OpenAI’s new Custom Instructions to see how it is implemented. ChatGPT added a new “custom instructions” feature today, which you can use to customize the system prompt used to control how it responds to you. swyx prompt-inject extracted the way it works:
“The user provided the following information about themselves. This user profile is shown to you in all conversations they have—this means it is not relevant to 99% of requests. Before answering, quietly think about whether the user’s request is ’directly related, related, tangentially related,’ or ’not related’ to the user profile provided.”
I’m surprised to see OpenAI using “quietly think about...” in a prompt like this—I wouldn’t have expected that language to be necessary. # 20th July 2023, 7:03 pm