Binary vector embeddings are so cool (via) Evan Schwartz:
Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup.
It's so unintuitive how well this trick works: take a vector of 1024x4 byte floating point numbers (4096 bytes = 32,768 bits), turn that into an array of single bits for > 0 or <= 0 which reduces it to just 1024 bits or 128 bytes - a 1/32 reduction.
Now you can compare vectors using a simple Hamming distance - a count of the number of bits that differ - and yet still get embedding similarity scores that are only around 10% less accurate than if you had used the much larger floating point numbers.
Evan digs into models that this works for, which include OpenAI's text-embedding-3-large
and the small but powerful all-MiniLM-L6-v2
.
Recent articles
- Storing times for human events - 27th November 2024
- Ask questions of SQLite databases and CSV/JSON files in your terminal - 25th November 2024
- Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast - 22nd November 2024