Quotations
The reason current models are so large is that we're still being very wasteful during training - we're asking them to memorize the internet and, remarkably, they do: they can e.g. recite SHA hashes of common numbers, or recall really esoteric facts. (Actually, LLMs are really good at memorization, qualitatively a lot better than humans, sometimes needing just a single update to remember a lot of detail for a long time.) But imagine if you were going to be tested, closed book, on reciting arbitrary passages of the internet given the first few words. This is the standard (pre)training objective for models today. The reason doing better is hard is that demonstrations of thinking are "entangled" with knowledge in the training data.
Therefore, the models have to first get larger before they can get smaller, because we need their (automated) help to refactor and mold the training data into ideal, synthetic formats.
It's a staircase of improvement - one model helping to generate the training data for the next - until we're left with a "perfect training set". When you train GPT-2 on it, it will be a really strong / smart model by today's standards. Maybe its MMLU score will be a bit lower because it won't remember all of its chemistry perfectly.
Update, July 12: This innovation sparked a lot of conversation and questions that have no answers yet. We look forward to continuing to work with our customers on the responsible use of AI, but will not further pursue digital workers in the product.
— Lattice, HR platform
OpenAI and Anthropic focused on building models and not worrying about products. For example, it took 6 months for OpenAI to bother to release a ChatGPT iOS app and 8 months for an Android app!
Google and Microsoft shoved AI into everything in a panicked race, without thinking about which products would actually benefit from AI and how they should be integrated.
Both groups of companies forgot the “make something people want” mantra. The generality of LLMs allowed developers to fool themselves into thinking that they were exempt from the need to find a product-market fit, as if prompting is a replacement for carefully designed products or features. [...]
But things are changing. OpenAI and Anthropic seem to be transitioning from research labs focused on a speculative future to something resembling regular product companies. If you take all the human-interest elements out of the OpenAI boardroom drama, it was fundamentally about the company's shift from creating gods to building products.
We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API.
Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls.
So much of knowledge/intelligence involves translating ideas between fields (domains). Those domains are walls that keep ideas siloed. But LLMs can help break those walls down and encourage humans to do more interdisciplinary thinking, which may lead to faster discoveries.
And note that I am implying that humans will make the breakthroughs, using LLMs as translation tools when appropriate, to help make connections. LLMs are strongest as translators of information that you provide. BYOD: Bring your own data!
My architecture is a monolith written in Go (this is intentional, I sacrificed scalability to improve my shipping speed), and this is where SQLite shines. With a DB located on the local NVMe disk, a $5 VPS can deliver a whopping 60K reads and 20K writes per second.
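As a rough sketch of the kind of setup described (not the author's Go code; the table and pragmas here are illustrative), local-disk SQLite with WAL mode enabled:

```python
import sqlite3

# Illustrative local-disk SQLite setup. WAL mode allows concurrent readers
# alongside a single writer, which is part of how a small VPS can sustain
# throughput figures like those quoted above.
conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
conn.execute("PRAGMA synchronous=NORMAL")  # common WAL pairing: fewer fsyncs
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("greeting", "hello"))
conn.commit()
print(conn.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone())
```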
We respect wildlife in the wilderness because we’re in their house. We don’t fully understand the complexity of most ecosystems, so we seek to minimize our impact on those ecosystems since we can’t always predict what outcomes our interactions with nature might have.
In software, many disastrous mistakes stem from not understanding why a system was built the way it was, but changing it anyway. It’s super common for a new leader to come in, see something they deem “useless”, and get rid of it – without understanding the implications. Good leaders make sure they understand before they mess around.
Add tests in a commit before the fix. They should pass, showing the behavior before your change. Then, the commit with your change will update the tests. The diff between these commits represents the change in behavior. This helps the author test their tests (I've written tests thinking they covered the relevant case when they didn't), helps the reviewer see the change in behavior more precisely and comment on it, and helps the wider community understand what the PR description is about.
— Ed Page
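To make that workflow concrete, here is a hypothetical sketch (round_price and the rounding bug are invented for illustration) of how the test reads across the two commits:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_price(x: str) -> Decimal:
    # Post-fix implementation: decimal half-up rounding to cents.
    return Decimal(x).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Commit 1 (tests only): this assertion passed against the old float-based
# code, pinning down the buggy behavior before any change:
#     assert round_price("2.675") == Decimal("2.67")
# Commit 2 (the fix): the test is updated in the same commit as the code,
# so the diff between the two commits shows the behavior change exactly.
def test_round_price():
    assert round_price("2.675") == Decimal("2.68")
```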
Third, X fails to provide access to its public data to researchers in line with the conditions set out in the DSA. In particular, X prohibits eligible researchers from independently accessing its public data, such as by scraping, as stated in its terms of service. In addition, X's process to grant eligible researchers access to its application programming interface (API) appears to dissuade researchers from carrying out their research projects or leave them with no other choice than to pay disproportionately high fees.
Fighting bots is fighting humans [...] remind you that "only allow humans to access" is just not an achievable goal. Any attempt at limiting bot access will inevitably allow some bots through and prevent some humans from accessing the site, and it's about deciding where you want to set the cutoff. I fear that media outlets and other websites, in attempting to "protect" their material from AI scrapers, will go too far in the anti-human direction.
[On Paddington 3] If this movie is anywhere near as good as the second one, we are going to need to have an extremely serious conversation about this being one of the greatest film trilogies ever made.
My main concern is that the substantial cost to develop and run AI technology means that AI applications must solve extremely complex and important problems for enterprises to earn an appropriate return on investment.
We estimate that the AI infrastructure buildout will cost over $1tn in the next several years alone, which includes spending on data centers, utilities, and applications. So, the crucial question is: What $1tn problem will AI solve? Replacing low-wage jobs with tremendously costly technology is basically the polar opposite of the prior technology transitions I've witnessed in my thirty years of closely following the tech industry.
— Jim Covello, Goldman Sachs
Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.
In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.
— Alex Albert, Anthropic
Content slop has three important characteristics. The first being that, to the user, the viewer, the customer, it feels worthless. This might be because it was clearly generated in bulk by a machine or because of how much of that particular content is being created. The next important feature of slop is that it feels forced upon us, whether by a corporation or an algorithm. It’s in the name. We’re the little piggies and it’s the gruel in the trough. But the last feature is the most crucial. It not only feels worthless and ubiquitous, it also feels optimized to be so. The Charli XCX “Brat summer” meme does not feel like slop, nor does Kendrick Lamar’s extremely long “Not Like Us” rollout. But Taylor Swift’s cascade of alternate versions of her songs does. The jury’s still out on Sabrina Carpenter. Similarly, last summer’s Barbenheimer phenomenon did not, to me, feel like slop. Dune: Part Two didn’t either. But Deadpool & Wolverine, at least in the marketing, definitely does.
Chrome's biggest innovation was the short release cycle with a silent unceremonious autoupdate.
When updates were big, rare, and manual, buggy and outdated browsers lingered for so long that we were giving bugs names. We documented the bugs in magazines and books, as if they were a timeless foundation of WebDev.
Nowadays browser vendors can fix bugs in 6 weeks (even Safari can…). New-ish stuff is still buggy, but rarely for long enough for the bugs to make it to schools' curriculums.
Inside the labs we have these capable models, and they're not that far ahead of what the public has access to for free. And that's a completely different trajectory for bringing technology into the world than what we've seen historically. It's a great opportunity because it brings people along. It gives them an intuitive sense for the capabilities and risks, and allows people to prepare for the advent of bringing advanced AI into the world.
Someone elsewhere left a comment like "I CAN'T BELIEVE IT TOOK HER 15 YEARS TO LEARN BASIC READLINE COMMANDS". Those comments are very silly, and I'm going to keep writing "it took me 15 years to learn this basic thing" forever, because I think it's important for people to know that it's normal to take a long time to learn "basic" things.
Voters in the Clapham and Brixton Hill constituency can rest easy - despite appearances, their Reform candidate Mark Matlock really does exist. [...] Matlock - based in the South Cotswolds, some 100 miles from the constituency in which he is standing - confirmed: "I am a real person." Although his campaign image is AI-generated, he said this was for lack of a real photo of him wearing a tie in Reform's trademark turquoise.
Product teams that are smart are getting off the treadmill. Whatever framework you currently have, start investing in getting to know it deeply. Learn the tools until they are not an impediment to your progress. That’s the only option. Replacing it with a shiny new tool is a trap. [...]
Companies that want to reduce the cost of their frontend tech becoming obsolete so often should be looking to get back to fundamentals. Your teams should be working closer to the web platform, with a lot fewer complex abstractions. We need to relearn what the web is capable of and go back to that.
The expansion of the jagged frontier of AI capability is subtle and requires a lot of experience with various models to understand what they can, and can’t, do. That is why I suggest that people and organizations keep an “impossibility list” - things that their experiments have shown that AI can definitely not do today but which it can almost do. For example, no AI can create a satisfying puzzle or mystery for you to solve, but they are getting closer. When AI models are updated, test them on your impossibility list to see if they can now do these impossible tasks.
If you own the tracks between San Francisco and Los Angeles, you likely have some kind of monopolistic pricing power, because there can only be so many tracks laid between place A and place B. In the case of GPU data centers, there is much less pricing power. GPU computing is increasingly turning into a commodity, metered per hour. Unlike the CPU cloud, which became an oligopoly, new entrants building dedicated AI clouds continue to flood the market. Without a monopoly or oligopoly, high fixed cost + low marginal cost businesses almost always see prices competed down to marginal cost (e.g., airlines).
So VisiCalc came and went, but the software genre it pioneered – the spreadsheet – endured to become arguably the most influential type of code ever written, at least in the sense of touching the lives of millions of office workers. I’ve never worked in an organisation in which spreadsheet software was not at the heart of most accounting, budgeting and planning activities. I’ve even known professionals for whom it’s the only piece of PC software they’ve ever used: one elderly accountant of my acquaintance, for example, used Excel even for his correspondence; he simply widened column A to 80 characters, typed his text in descending cells and hit the “print” key.
I like the lies-to-children motif, because it underlies the way we run our society and resonates nicely with Discworld. Like the reason for Unseen being a storehouse of knowledge - you arrive knowing everything and leave realising that you know practically nothing, therefore all the knowledge you had must be stored in the university. But it's like that in "real Science", too. You arrive with your sparkling A-levels all agleam, and the first job of the tutors is to reveal that what you thought was true is only true for a given value of "truth".
Most of us need just "enough" knowledge of the sciences, and it's delivered to us in metaphors and analogies that bite us in the bum if we think they're the same as the truth.
When presented with a difficult task, I ask myself: “what if I didn’t do this at all?”. Most of the time, this is a stupid question, and I have to do the thing. But ~5% of the time, I realize that I can completely skip some work.
Absolutely any time I try to explore something even slightly against commonly accepted beliefs, LLMs always just rehash the commonly accepted beliefs.
As a researcher, I find this behaviour worse than unhelpful. It gives the mistaken impression that there's nothing to explore.
We argued that ChatGPT is not designed to produce true utterances; rather, it is designed to produce text which is indistinguishable from the text produced by humans. It is aimed at being convincing rather than accurate. The basic architecture of these models reveals this: they are designed to come up with a likely continuation of a string of text. It’s reasonable to assume that one way of being a likely continuation of a text is by being true; if humans are roughly more accurate than chance, true sentences will be more likely than false ones. This might make the chatbot more accurate than chance, but it does not give the chatbot any intention to convey truths. This is similar to standard cases of human bullshitters, who don’t care whether their utterances are true; good bullshit often contains some degree of truth, that’s part of what makes it convincing.
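As a toy illustration of that "likely continuation" point (a bigram counter invented here, nothing like a real transformer), note that likelihood is a matter of frequency in the training data, not of truth:

```python
# Toy "likely continuation": pick the most frequent next word given the
# preceding word. Truth never enters the objective; true statements win
# only insofar as they happen to be common in the data.
from collections import Counter

corpus = "the sky is blue . the sky is blue . the sky is green .".split()
following = Counter(
    corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == "is"
)
print(following.most_common(1))  # [('blue', 2)]: likelier, not truer
```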
What Apple unveiled last week with Apple Intelligence wasn't so much new products, but new features—a slew of them—for existing products, powered by generative AI.
[...] These aren't new apps or new products. They're the most used, most important apps Apple makes, the core apps that define the Apple platforms ecosystem, and Apple is using generative AI to make them better and more useful—without, in any way, rendering them unfamiliar.
For some reason, many people still believe that browsers need to include non-standard hacks in HTML parsing to display the web correctly.
In reality, the HTML parsing spec is exhaustively detailed. If you implement it as described, you will have a web-compatible parser.
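To see this in practice, here is a small sketch using html5lib, a Python implementation of the WHATWG parsing algorithm (the library choice is mine for illustration; any spec-conformant parser builds the same tree):

```python
import xml.etree.ElementTree as ET

import html5lib  # implements the WHATWG HTML parsing algorithm

# Even "tag soup" has a single, fully specified parse: the spec dictates
# exactly how unclosed and misnested tags are recovered, so no
# vendor-specific hacks are needed for web-compatible behavior.
soup = "<table><tr><td>cell<p>misnested <b>bold"
doc = html5lib.parse(soup, treebuilder="etree", namespaceHTMLElements=False)
print(ET.tostring(doc, encoding="unicode"))
```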
The people who are most confident AI can replace writers are the ones who think writing is typing.
In our “who validates the validators” user studies, we found that people expected, and also desired, the LLM to learn from any human interaction, and to do so “as efficiently as possible” (i.e. after 1-2 demonstrations, the LLM should “get it”).