<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: bert</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/bert.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2024-12-31T04:54:50+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting Alexis Gallagher</title><link href="https://simonwillison.net/2024/Dec/31/alexis-gallagher/#atom-tag" rel="alternate"/><published>2024-12-31T04:54:50+00:00</published><updated>2024-12-31T04:54:50+00:00</updated><id>https://simonwillison.net/2024/Dec/31/alexis-gallagher/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.answer.ai/posts/2024-12-19-modernbert.html#encoder-only-models"&gt;&lt;p&gt;Basically, a frontier model like OpenAI’s O1 is like a Ferrari SF-23. It’s an obvious triumph of engineering, designed to win races, and that’s why we talk about it. But it takes a special pit crew just to change the tires and you can’t buy one for yourself. In contrast, a BERT model is like a Honda Civic. It’s also an engineering triumph, but more subtly, since it is engineered to be affordable, fuel-efficient, reliable, and extremely useful. And that’s why they’re absolutely everywhere.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.answer.ai/posts/2024-12-19-modernbert.html#encoder-only-models"&gt;Alexis Gallagher&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bert"&gt;bert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/o1"&gt;o1&lt;/a&gt;&lt;/p&gt;



</summary><category term="bert"/><category term="ai"/><category term="generative-ai"/><category term="llms"/><category term="o1"/></entry><entry><title>Finally, a replacement for BERT: Introducing ModernBERT</title><link href="https://simonwillison.net/2024/Dec/24/modernbert/#atom-tag" rel="alternate"/><published>2024-12-24T06:21:29+00:00</published><updated>2024-12-24T06:21:29+00:00</updated><id>https://simonwillison.net/2024/Dec/24/modernbert/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.answer.ai/posts/2024-12-19-modernbert.html"&gt;Finally, a replacement for BERT: Introducing ModernBERT&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;a href="https://en.wikipedia.org/wiki/BERT_(language_model)"&gt;BERT&lt;/a&gt; was an early language model released by Google in October 2018. Unlike modern LLMs it wasn't designed for generating text. BERT was trained for masked token prediction and was generally applied to problems like Named Entity Recognition or Sentiment Analysis. BERT also wasn't very useful on its own - most applications required you to fine-tune a model on top of it.&lt;/p&gt;
&lt;p&gt;In exploring BERT I decided to try out &lt;a href="https://huggingface.co/dslim/distilbert-NER"&gt;dslim/distilbert-NER&lt;/a&gt;, a popular Named Entity Recognition model fine-tuned on top of DistilBERT (a smaller distilled version of the original BERT model). &lt;a href="https://til.simonwillison.net/llms/bert-ner"&gt;Here are my notes&lt;/a&gt; on running that using &lt;code&gt;uv run&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Jeremy Howard's &lt;a href="https://www.answer.ai/"&gt;Answer.AI&lt;/a&gt; research group, &lt;a href="https://www.lighton.ai/"&gt;LightOn&lt;/a&gt; and friends supported the development of ModernBERT, a brand new BERT-style model that applies many enhancements from the past six years of advances in this space.&lt;/p&gt;
&lt;p&gt;While BERT was trained on 3.3 billion tokens, producing 110 million and 340 million parameter models, ModernBERT was trained on 2 trillion tokens, resulting in 140 million and 395 million parameter models. The parameter count hasn't increased much because it's designed to run on lower-end hardware. It has an 8192 token context length, a significant improvement on BERT's 512.&lt;/p&gt;
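&lt;p&gt;A quick back-of-the-envelope sketch, using only the figures quoted above, puts those numbers in proportion. Pairing the two smaller models with each other and the two larger models with each other is an assumption made here for illustration, and the variable names are hypothetical:&lt;/p&gt;
&lt;pre&gt;
```python
# Figures quoted in the paragraph above.
bert_tokens = 3.3e9          # BERT training tokens
modernbert_tokens = 2e12     # ModernBERT training tokens
bert_context = 512           # BERT context length, in tokens
modernbert_context = 8192    # ModernBERT context length, in tokens
bert_params = (110e6, 340e6)        # the two BERT model sizes
modernbert_params = (140e6, 395e6)  # the two ModernBERT model sizes

# How much more training data and context ModernBERT has.
token_ratio = modernbert_tokens / bert_tokens
context_ratio = modernbert_context // bert_context

# Assumption: compare smaller with smaller and larger with larger.
param_ratios = [new / old for new, old in zip(modernbert_params, bert_params)]

print(f"training tokens: about {token_ratio:.0f}x")
print(f"context length: {context_ratio}x")
print("parameters: " + ", ".join(f"{r:.2f}x" for r in param_ratios))
```
&lt;/pre&gt;
&lt;p&gt;So the training data grew by roughly 600x and the context length by 16x, while the parameter counts grew by less than 1.3x.&lt;/p&gt;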
&lt;p&gt;I was able to run one of the demos from the announcement post using &lt;code&gt;uv run&lt;/code&gt; like this (I'm not sure why I had to use &lt;code&gt;numpy&amp;lt;2.0&lt;/code&gt; but without that I got an error about &lt;code&gt;cannot import name 'ComplexWarning' from 'numpy.core.numeric'&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight highlight-source-shell"&gt;&lt;pre&gt;uv run --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;numpy&amp;lt;2.0&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; --with torch --with &lt;span class="pl-s"&gt;&lt;span class="pl-pds"&gt;'&lt;/span&gt;git+https://github.com/huggingface/transformers.git&lt;span class="pl-pds"&gt;'&lt;/span&gt;&lt;/span&gt; python&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then this Python:&lt;/p&gt;
&lt;pre&gt;&lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;torch&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;transformers&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pipeline&lt;/span&gt;
&lt;span class="pl-k"&gt;from&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt; &lt;span class="pl-k"&gt;import&lt;/span&gt; &lt;span class="pl-s1"&gt;pprint&lt;/span&gt;
&lt;span class="pl-s1"&gt;pipe&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;pipeline&lt;/span&gt;(
    &lt;span class="pl-s"&gt;"fill-mask"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;model&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s"&gt;"answerdotai/ModernBERT-base"&lt;/span&gt;,
    &lt;span class="pl-s1"&gt;torch_dtype&lt;/span&gt;&lt;span class="pl-c1"&gt;=&lt;/span&gt;&lt;span class="pl-s1"&gt;torch&lt;/span&gt;.&lt;span class="pl-c1"&gt;bfloat16&lt;/span&gt;,
)
&lt;span class="pl-s1"&gt;input_text&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-s"&gt;"He walked to the [MASK]."&lt;/span&gt;
&lt;span class="pl-s1"&gt;results&lt;/span&gt; &lt;span class="pl-c1"&gt;=&lt;/span&gt; &lt;span class="pl-en"&gt;pipe&lt;/span&gt;(&lt;span class="pl-s1"&gt;input_text&lt;/span&gt;)
&lt;span class="pl-en"&gt;pprint&lt;/span&gt;(&lt;span class="pl-s1"&gt;results&lt;/span&gt;)&lt;/pre&gt;
&lt;p&gt;Which downloaded 573MB to &lt;code&gt;~/.cache/huggingface/hub/models--answerdotai--ModernBERT-base&lt;/code&gt; and output:&lt;/p&gt;
&lt;pre&gt;[{&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.11669921875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the door.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3369&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' door'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.037841796875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the office.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3906&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' office'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.0277099609375&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the library.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;6335&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' library'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.0216064453125&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the gate.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;7394&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' gate'&lt;/span&gt;},
 {&lt;span class="pl-s"&gt;'score'&lt;/span&gt;: &lt;span class="pl-c1"&gt;0.020263671875&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'sequence'&lt;/span&gt;: &lt;span class="pl-s"&gt;'He walked to the window.'&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token'&lt;/span&gt;: &lt;span class="pl-c1"&gt;3497&lt;/span&gt;,
  &lt;span class="pl-s"&gt;'token_str'&lt;/span&gt;: &lt;span class="pl-s"&gt;' window'&lt;/span&gt;}]&lt;/pre&gt;

&lt;p&gt;I'm looking forward to trying out models that use ModernBERT as their base. The model release is accompanied by a paper (&lt;a href="https://arxiv.org/abs/2412.13663"&gt;Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference&lt;/a&gt;) and &lt;a href="https://huggingface.co/docs/transformers/main/en/model_doc/modernbert"&gt;new documentation&lt;/a&gt; for using it with the Transformers library.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://bsky.app/profile/benjaminwarner.dev/post/3ldur45oz322b"&gt;@benjaminwarner.dev&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bert"&gt;bert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/nlp"&gt;nlp&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/transformers"&gt;transformers&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/jeremy-howard"&gt;jeremy-howard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/hugging-face"&gt;hugging-face&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/uv"&gt;uv&lt;/a&gt;&lt;/p&gt;



</summary><category term="bert"/><category term="nlp"/><category term="python"/><category term="transformers"/><category term="ai"/><category term="jeremy-howard"/><category term="hugging-face"/><category term="uv"/></entry><entry><title>Quoting Eric Lehman</title><link href="https://simonwillison.net/2024/Feb/11/eric-lehman/#atom-tag" rel="alternate"/><published>2024-02-11T22:59:38+00:00</published><updated>2024-02-11T22:59:38+00:00</updated><id>https://simonwillison.net/2024/Feb/11/eric-lehman/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://www.techemails.com/i/141315424/google-engineer-ai-is-a-serious-risk-to-our-business"&gt;&lt;p&gt;One consideration is that such a deep ML system could well be developed outside of Google-- at Microsoft, Baidu, Yandex, Amazon, Apple, or even a startup. My impression is that the Translate team experienced this. Deep ML reset the translation game; past advantages were sort of wiped out. Fortunately, Google's huge investment in deep ML largely paid off, and we excelled in this new game. Nevertheless, our new ML-based translator was still beaten on benchmarks by a small startup. The risk that Google could similarly be beaten in relevance by another company is highlighted by a startling conclusion from BERT: huge amounts of user feedback can be largely replaced by unsupervised learning from raw text. That could have heavy implications for Google.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://www.techemails.com/i/141315424/google-engineer-ai-is-a-serious-risk-to-our-business"&gt;Eric Lehman&lt;/a&gt;, internal Google email in 2018&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/bert"&gt;bert&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/google"&gt;google&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/machine-learning"&gt;machine-learning&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/translation"&gt;translation&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;&lt;/p&gt;



</summary><category term="bert"/><category term="google"/><category term="machine-learning"/><category term="translation"/><category term="ai"/><category term="generative-ai"/><category term="llms"/></entry></feed>