Simon Willison’s Weblog

Subscribe

Quotations tagged llms in 2024

Filters: Type: quotation × Year: 2024 × llms × Sorted by date

OpenAI’s monthly revenue hit $300 million in August, up 1,700 percent since the beginning of 2023, and the company expects about $3.7 billion in annual sales this year, according to financial documents reviewed by The New York Times. [...]

The company expects ChatGPT to bring in $2.7 billion in revenue this year, up from $700 million in 2023, with $1 billion coming from other businesses using its technology.

Mike Isaac and Erin Griffith, New York Times, Sep 27th 2024

# 23rd October 2024, 1:20 am / generative-ai, openai, new-york-times, ai, llms

For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models—including the original Claude 3.5 Sonnet and GPT-4o. [...]

Claude 3.5 Haiku will be made available later this month across our first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI—initially as a text-only model and with image input to follow.

Anthropic, pre-announcing Claude 3.5 Haiku

# 22nd October 2024, 5:43 pm / anthropic, claude, generative-ai, ai, llms

I've often been building single-use apps with Claude Artifacts when I'm helping my children learn. For example here's one on visualizing fractions. [...] What's more surprising is that it is far easier to create an app on-demand than searching for an app in the app store that will do what I'm looking for. Searching for kids' learning apps is typically a nails-on-chalkboard painful experience because 95% of them are addictive garbage. And even if I find something usable, it can't match the fact that I can tell Claude what I want.

Arvind Narayanan

# 21st October 2024, 4:12 pm / anthropic, claude, education, ai, llms, claude-artifacts, ai-assisted-programming, generative-ai, arvind-narayanan

Who called it “intellectual property problems around the acquisition of training data for Large Language Models” and not Grand Theft Autocomplete?

Jens Ohlig, on March 8th 2024

# 20th October 2024, 9:02 pm / training-data, llms, ai, generative-ai

A common misconception about Transformers is to believe that they're a sequence-processing architecture. They're not.

They're a set-processing architecture. Transformers are 100% order-agnostic (which was the big innovation compared to RNNs, back in late 2016 -- you compute the full matrix of pairwise token interactions instead of processing one token at a time).

The way you add order awareness in a Transformer is at the feature level. You literally add to your token embeddings a position embedding / encoding that corresponds to its place in a sequence. The architecture itself just treats the input tokens as a set.

François Chollet

# 16th October 2024, 1:18 am / llms, ai, generative-ai, francois-chollet

At first, I struggled to understand why anyone would want to write this way. My dialogue with ChatGPT was frustratingly meandering, as though I were excavating an essay instead of crafting one. But, when I thought about the psychological experience of writing, I began to see the value of the tool. ChatGPT was not generating professional prose all at once, but it was providing starting points: interesting research ideas to explore; mediocre paragraphs that might, with sufficient editing, become usable. For all its inefficiencies, this indirect approach did feel easier than staring at a blank page; “talking” to the chatbot about the article was more fun than toiling in quiet isolation. In the long run, I wasn’t saving time: I still needed to look up facts and write sentences in my own voice. But my exchanges seemed to reduce the maximum mental effort demanded of me.

Cal Newport

# 3rd October 2024, 7:43 pm / writing, generative-ai, chatgpt, ai, llms

I listened to the whole 15-minute podcast this morning. It was, indeed, surprisingly effective. It remains somewhere in the uncanny valley, but not at all in a creepy way. Just more in a “this is a bit vapid and phony” way. [...] But ultimately the conversation has all the flavor of a bowl of unseasoned white rice.

John Gruber

# 30th September 2024, 6:56 pm / llms, generative-ai, notebooklm, ai, john-gruber, podcasts

In the future, we won't need programmers; just people who can describe to a computer precisely what they want it to do.

Jason Gorman

# 29th September 2024, 8:21 pm / ai-assisted-programming, llms, ai, generative-ai

o1 prompting is alien to me. Its thinking, gloriously effective at times, is also dreamlike and unamenable to advice.

Just say what you want and pray. Any notes on “how” will be followed with the diligence of a brilliant intern on ketamine.

Riley Goodside

# 16th September 2024, 5:28 pm / riley-goodside, o1, prompt-engineering, generative-ai, openai, ai, llms

[… OpenAI’s o1] could work its way to a correct (and well-written) solution if provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent graduate student.

Terrence Tao

# 15th September 2024, 12:04 am / o1, generative-ai, openai, mathematics, ai, llms

It's a bit sad and confusing that LLMs ("Large Language Models") have little to do with language; It's just historical. They are highly general purpose technology for statistical modeling of token streams. A better name would be Autoregressive Transformers or something.

They don't care if the tokens happen to represent little text chunks. It could just as well be little image patches, audio chunks, action choices, molecules, or whatever. If you can reduce your problem to that of modeling token streams (for any arbitrary vocabulary of some set of discrete tokens), you can "throw an LLM at it".

Andrej Karpathy

# 14th September 2024, 7:50 pm / andrej-karpathy, llms, ai, generative-ai

Believe it or not, the name Strawberry does not come from the “How many r’s are in strawberry” meme. We just chose a random word. As far as we know it was a complete coincidence.

Noam Brown, OpenAI

# 13th September 2024, 11:35 am / o1, generative-ai, openai, ai, llms

o1-mini is the most surprising research result I've seen in the past year

Obviously I cannot spill the secret, but a small model getting >60% on AIME math competition is so good that it's hard to believe

Jason Wei, OpenAI

# 12th September 2024, 11:45 pm / o1, generative-ai, openai, ai, llms, llm-reasoning

Telling the AI to "make it better" after getting a result is just a folk method of getting an LLM to do Chain of Thought, which is why it works so well.

Ethan Mollick

# 10th September 2024, 3:12 pm / prompt-engineering, ethan-mollick, generative-ai, ai, llms

history | tail -n 2000 | llm -s "Write aliases for my zshrc based on my terminal history. Only do this for most common features. Don't use any specific files or directories."

anjor

# 3rd September 2024, 3:01 pm / llm, llms, ai, generative-ai

I think that AI has killed, or is about to kill, pretty much every single modifier we want to put in front of the word “developer.”

“.NET developer”? Meaningless. Copilot, Cursor, etc can get anyone conversant enough with .NET to be productive in an afternoon … as long as you’ve done enough other programming that you know what to prompt.

Forrest Brazeal

# 31st August 2024, 12:52 pm / ai-assisted-programming, llms, ai, generative-ai, forrest-brazeal

We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.

For each decoded token, LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.

The contrast in memory requirements is even larger -- running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.

Magic AI

# 30th August 2024, 3:39 am / llms, ai, generative-ai

We've read and heard that you'd appreciate more transparency as to when changes, if any, are made. We've also heard feedback that some users are finding Claude's responses are less helpful than usual. Our initial investigation does not show any widespread issues. We'd also like to confirm that we've made no changes to the 3.5 Sonnet model or inference pipeline.

Alex Albert

# 26th August 2024, 8:44 pm / claude-3-5-sonnet, alex-albert, anthropic, claude, generative-ai, ai, llms

[...] here’s what we found when we integrated [Amazon Q, GenAI assistant for software development] into our internal systems and applied it to our needed Java upgrades:

  • The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy but, real).
  • In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.

Andy Jassy, Amazon CEO

# 24th August 2024, 4:25 am / ai-assisted-programming, amazon, generative-ai, ai, llms, java

With statistical learning based systems, perfect accuracy is intrinsically hard to achieve. If you think about the success stories of machine learning, like ad targeting or fraud detection or, more recently, weather forecasting, perfect accuracy isn't the goal --- as long as the system is better than the state of the art, it is useful. Even in medical diagnosis and other healthcare applications, we tolerate a lot of error.

But when developers put AI in consumer products, people expect it to behave like software, which means that it needs to work deterministically.

Arvind Narayanan and Sayash Kapoor

# 19th August 2024, 11:04 pm / llms, ai, generative-ai, arvind-narayanan

Examples are the #1 thing I recommend people use in their prompts because they work so well. The problem is that adding tons of examples increases your API costs and latency. Prompt caching fixes this. You can now add tons of examples to every prompt and create an alternative to a model finetuned on your task with basically zero cost/latency increase. […]

This works even better with smaller models. You can generate tons of examples (test case + solution) with 3.5 Sonnet and then use those examples to create a few-shot prompt for Haiku.

Alex Albert

# 15th August 2024, 6:09 pm / claude-3-5-sonnet, alex-albert, prompt-engineering, anthropic, claude, ai, llms, prompt-caching

But [LLM assisted programming] does make me wonder whether the adoption of these tools will lead to a form of de-skilling. Not even that programmers will be less skilled, but that the job will drift from the perception and dynamics of a skilled trade to an unskilled trade, with the attendant change - decrease - in pay. Instead of hiring a team of engineers who try to write something of quality and try to load the mental model of what they're building into their heads, companies will just hire a lot of prompt engineers and, who knows, generate 5 versions of the application and A/B test them all across their users.

Tom MacWright

# 12th August 2024, 8:17 pm / ai-assisted-programming, generative-ai, ai, tom-macwright, llms

Some argue that by aggregating knowledge drawn from human experience, LLMs aren’t sources of creativity, as the moniker “generative” implies, but rather purveyors of mediocrity. Yes and no. There really are very few genuinely novel ideas and methods, and I don’t expect LLMs to produce them. Most creative acts, though, entail novel recombinations of known ideas and methods. Because LLMs radically boost our ability to do that, they are amplifiers of — not threats to — human creativity.

Jon Udell

# 10th August 2024, 5:57 pm / llms, jon-udell, ai, generative-ai

The RM [Reward Model] we train for LLMs is just a vibe check […] It gives high scores to the kinds of assistant responses that human raters statistically seem to like. It's not the "actual" objective of correctly solving problems, it's a proxy objective of what looks good to humans. Second, you can't even run RLHF for too long because your model quickly learns to respond in ways that game the reward model. […]

No production-grade actual RL on an LLM has so far been convincingly achieved and demonstrated in an open domain, at scale. And intuitively, this is because getting actual rewards (i.e. the equivalent of win the game) is really difficult in the open-ended problem solving tasks. […] But how do you give an objective reward for summarizing an article? Or answering a slightly ambiguous question about some pip install issue? Or telling a joke? Or re-writing some Java code to Python?

Andrej Karpathy

# 8th August 2024, 8:13 am / andrej-karpathy, llms, ai, generative-ai

[On release notes] in our partial defense, training these models can be more discovery than invention. often we don't exactly know what will come out.

we've long wanted to do release notes that describe each model's differences, but we also don't want to give false confidence with a shallow story.

Ted Sanders, OpenAI

# 3rd August 2024, 8:09 pm / openai, llms, ai, generative-ai

When Noam and Daniel started Character.AI, our goal of personalized superintelligence required a full stack approach. We had to pre-train models, post-train them to power the experiences that make Character.AI special, and build a product platform with the ability to reach users globally. Over the past two years, however, the landscape has shifted – many more pre-trained models are now available. Given these changes, we see an advantage in making greater use of third-party LLMs alongside our own. This allows us to devote even more resources to post-training and creating new product experiences for our growing user base.

Character.AI

# 2nd August 2024, 9:07 pm / llms, ai, fine-tuning, generative-ai

The [Apple Foundation Model] pre-training dataset consists of a diverse and high quality data mixture. This includes data we have licensed from publishers, curated publicly-available or open-sourced datasets, and publicly available information crawled by our web-crawler, Applebot. We respect the right of webpages to opt out of being crawled by Applebot, using standard robots.txt directives.

Given our focus on protecting user privacy, we note that no private Apple user data is included in the data mixture. Additionally, extensive efforts have been made to exclude profanity, unsafe material, and personally identifiable information from publicly available data (see Section 7 for more details). Rigorous decontamination is also performed against many common evaluation benchmarks.

We find that data quality, much more so than quantity, is the key determining factor of downstream model performance.

Apple Intelligence Foundation Language Models, PDF

# 29th July 2024, 10:39 pm / apple, generative-ai, training-data, ai, llms, apple-intelligence

Among many misunderstandings, [users] expect the RAG system to work like a search engine, not as a flawed, forgetful analyst. They will not do the work that you expect them to do in order to verify documents and ground truth. They will not expect the AI to try to persuade them.

Ethan Mollick

# 27th July 2024, 1:46 am / ethan-mollick, generative-ai, ai, rag, llms

Our estimate of OpenAI’s $4 billion in inference costs comes from a person with knowledge of the cluster of servers OpenAI rents from Microsoft. That cluster has the equivalent of 350,000 Nvidia A100 chips, this person said. About 290,000 of those chips, or more than 80% of the cluster, were powering ChartGPT, this person said.

Amir Efrati and Aaron Holmes

# 25th July 2024, 9:35 pm / generative-ai, openai, chatgpt, ai, llms

One interesting observation is the impact of environmental factors on training performance at scale. For Llama 3 405B , we noted a diurnal 1-2% throughput variation based on time-of-day. This fluctuation is the result of higher mid-day temperatures impacting GPU dynamic voltage and frequency scaling.

During training, tens of thousands of GPUs may increase or decrease power consumption at the same time, for example, due to all GPUs waiting for checkpointing or collective communications to finish, or the startup or shutdown of the entire training job. When this happens, it can result in instant fluctuations of power consumption across the data center on the order of tens of megawatts, stretching the limits of the power grid. This is an ongoing challenge for us as we scale training for future, even larger Llama models.

The Llama 3 Herd of Models

# 23rd July 2024, 11:26 pm / meta, generative-ai, llama, ai, llms, gpus