Simon Willison’s Weblog

Friday, 28th July 2023

You can think of the attention mechanism as a matchmaking service for words. Each word makes a checklist (called a query vector) describing the characteristics of words it is looking for. Each word also makes a checklist (called a key vector) describing its own characteristics. The network compares each key vector to each query vector (by computing a dot product) to find the words that are the best match. Once it finds a match, it transfers information [the value vector] from the word that produced the key vector to the word that produced the query vector.

Timothy B Lee and Sean Trott
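
The matchmaking analogy maps fairly directly onto code. What follows is a minimal sketch of single-head dot-product attention in plain Python, not how any particular model implements it. The three words and all of their query, key and value vectors are made-up toy numbers. The division by the square root of the key size and the softmax step are not mentioned in the quote; they come from the standard Transformer formulation. One nuance the sketch makes visible: rather than picking a single match, each word receives a weighted blend of every word's value vector, with the best matches getting the largest weights.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw match scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns (outputs, weights): one blended value vector per query,
    plus the weights that word placed on every word in the sequence.
    """
    scale = math.sqrt(len(keys[0]))  # standard Transformer scaling (assumption)
    outputs, all_weights = [], []
    for q in queries:
        # Compare this word's query against every word's key.
        scores = [dot(q, k) / scale for k in keys]
        weights = softmax(scores)
        # Transfer information: weighted sum of the value vectors.
        blended = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy data: three words with hand-picked 2-d queries and keys.
words = ["the", "cat", "sat"]
queries = [[1.0, 0.0], [0.0, 1.0], [4.0, 0.0]]  # what each word is looking for
keys = [[0.0, 1.0], [4.0, 0.0], [0.0, 0.0]]     # how each word describes itself
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # what each word passes on

outputs, weights = attention(queries, keys, values)
for word, row in zip(words, weights):
    best = words[row.index(max(row))]
    print(f"{word!r} attends most to {best!r}")
```

In this toy setup the query for "sat" lines up strongly with the key for "cat", so nearly all of the weight for "sat" lands on "cat" and its output is almost exactly the value vector for "cat". Real models learn the query, key and value vectors from data, use many attention heads, and (in GPT-style models) stop words from attending to later positions; all of that is left out here.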

# 11:30 am / llms, ai, generative-ai
