I cannot understand what this function does. Is it like a lookup table, meaning that it returns the parameters corresponding to each id (in ids)?
For instance, in the skip-gram model, if we use tf.nn.embedding_lookup(embeddings, train_inputs), does it find the corresponding embedding for each train_input?
The embedding_lookup function retrieves rows of the params tensor. The behavior is similar to indexing an array with an array of indices in numpy. E.g.
import numpy as np
matrix = np.random.random([1024, 64])  # 64-dimensional embeddings
ids = np.array([0, 5, 17, 33])
print(matrix[ids])  # prints a matrix of shape [4, 64]
The params argument can also be a list of tensors, in which case the ids are distributed among the tensors. For example, given a list of 3 tensors of shape [2, 64], the default behavior is that they hold the rows for ids [0, 3], [1, 4], and [2, 5], respectively.
partition_strategy controls how the ids are distributed among the tensors in the list. Partitioning is useful for larger-scale problems where the matrix might be too large to keep in one piece.
In such a case, the indexes specified in ids correspond to elements of the tensors according to a partition strategy, where the default partition strategy is ‘mod’.
In the ‘mod’ strategy, index 0 corresponds to the first element of the first tensor in the list. Index 1 corresponds to the first element of the second tensor. Index 2 corresponds to the first element of the third tensor, and so on. Put simply, index i corresponds to the first element of the (i+1)th tensor, for all indexes 0..(n-1), assuming params is a list of n tensors.
Now, index n cannot correspond to tensor n+1, because the list params contains only n tensors. So index n wraps around and corresponds to the second element of the first tensor. Similarly, index n+1 corresponds to the second element of the second tensor, etc.
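The ‘mod’ mapping described above can be sketched in plain Python. This is only an emulation of the indexing rule, not TensorFlow's implementation; the helper name mod_lookup and the string row labels are made up for illustration, and all shards are assumed to have the same number of rows.

```python
def mod_lookup(shards, ids):
    """Emulate the 'mod' partition strategy on a list of shards (hypothetical helper)."""
    n = len(shards)
    # id i lives in shard (i % n), at row (i // n) of that shard
    return [shards[i % n][i // n] for i in ids]

# Three shards with two rows each; every row is labelled with the id it represents.
shards = [
    ["row0", "row3"],  # first tensor holds ids 0 and 3
    ["row1", "row4"],  # second tensor holds ids 1 and 4
    ["row2", "row5"],  # third tensor holds ids 2 and 5
]

result = mod_lookup(shards, [0, 1, 2, 3, 4, 5])
print(result)
```

Here id 3 (index n, with n = 3) lands on the second row of the first shard, matching the wrap-around described above.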
Yes, the purpose of the tf.nn.embedding_lookup() function is to perform a lookup in the embedding matrix and return the embeddings (or, in simple terms, the vector representations) of words.
A simple embedding matrix (of shape: vocabulary_size x embedding_dimension) would look like below. (i.e. each word will be represented by a vector of numbers; hence the name word2vec)
Embedding Matrix
the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862
like 0.36808 0.20834 -0.22319 0.046283 0.20098 0.27515 -0.77127 -0.76804
between 0.7503 0.71623 -0.27033 0.20059 -0.17008 0.68568 -0.061672 -0.054638
did 0.042523 -0.21172 0.044739 -0.19248 0.26224 0.0043991 -0.88195 0.55184
just 0.17698 0.065221 0.28548 -0.4243 0.7499 -0.14892 -0.66786 0.11788
national -1.1105 0.94945 -0.17078 0.93037 -0.2477 -0.70633 -0.8649 -0.56118
day 0.11626 0.53897 -0.39514 -0.26027 0.57706 -0.79198 -0.88374 0.30119
country -0.13531 0.15485 -0.07309 0.034013 -0.054457 -0.20541 -0.60086 -0.22407
under 0.13721 -0.295 -0.05916 -0.59235 0.02301 0.21884 -0.34254 -0.70213
such 0.61012 0.33512 -0.53499 0.36139 -0.39866 0.70627 -0.18699 -0.77246
second -0.29809 0.28069 0.087102 0.54455 0.70003 0.44778 -0.72565 0.62309
I split the above embedding matrix, loading the words into vocab, which will be our vocabulary, and the corresponding vectors into the emb array.
Now we will see how we can perform an embedding lookup for some arbitrary input sentence.
In [54]: from collections import OrderedDict
# embedding as TF tensor (for now constant; could be tf.Variable() during training)
In [55]: tf_embedding = tf.constant(emb, dtype=tf.float32)
# input for which we need the embedding
In [56]: input_str = "like the country"
# build index based on our `vocabulary`
In [57]: word_to_idx = OrderedDict({w:vocab.index(w) for w in input_str.split() if w in vocab})
# lookup in embedding matrix & return the vectors for the input words
In [58]: tf.nn.embedding_lookup(tf_embedding, list(word_to_idx.values())).eval()
Out[58]:
array([[ 0.36807999, 0.20834 , -0.22318999, 0.046283 , 0.20097999,
0.27515 , -0.77126998, -0.76804 ],
[ 0.41800001, 0.24968 , -0.41242 , 0.1217 , 0.34527001,
-0.044457 , -0.49687999, -0.17862 ],
[-0.13530999, 0.15485001, -0.07309 , 0.034013 , -0.054457 ,
-0.20541 , -0.60086 , -0.22407 ]], dtype=float32)
Observe how we got the embeddings from our original embedding matrix (with words) using the indices of words in our vocabulary.
Usually, such an embedding lookup is performed by the first layer (called Embedding layer) which then passes these embeddings to RNN/LSTM/GRU layers for further processing.
Side Note: Usually the vocabulary will also have a special unk token. So, if a token from our input sentence is not present in our vocabulary, then the index corresponding to unk will be looked up in the embedding matrix.
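A minimal sketch of that unk fallback, in plain Python. The vocabulary, the "<unk>" spelling, and the helper name sentence_to_ids are assumptions for illustration; real pipelines differ in how they name and place the unknown token.

```python
# Hypothetical vocabulary; index 0 is reserved for the unknown token.
vocab = ["<unk>", "the", "like", "country"]
word_to_idx = {word: idx for idx, word in enumerate(vocab)}
unk_idx = word_to_idx["<unk>"]

def sentence_to_ids(sentence):
    """Map each whitespace-separated token to its index, or to unk_idx if unseen."""
    return [word_to_idx.get(token, unk_idx) for token in sentence.split()]

ids = sentence_to_ids("like the big country")
print(ids)  # "big" is not in vocab, so it maps to unk_idx
```

The resulting ids can then be passed to the embedding lookup, so an out-of-vocabulary word retrieves the row stored for the unk token.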
P.S. Note that embedding_dimension is a hyperparameter that one has to tune for the application, but popular models like Word2Vec and GloVe use 300-dimensional vectors to represent each word.
Here’s an image depicting the process of embedding lookup.
Concisely, it gets the rows of an embedding layer specified by a list of IDs and returns them as a tensor. This is achieved through the following process.
Define a placeholder for the ids: lookup_ids = tf.placeholder(tf.int32, shape=[None])
Define an embedding layer of shape [100, 10], e.g. with a random initial value: embeddings = tf.Variable(tf.random_uniform([100, 10]), ...)
Define the TensorFlow operation: embed_lookup = tf.nn.embedding_lookup(embeddings, lookup_ids)
Get the results by running: lookup = session.run(embed_lookup, feed_dict={lookup_ids: [95, 4, 14]})
When the params tensor has more than two dimensions, the ids refer only to the first (outermost) dimension. Maybe that is obvious to most people, but I had to run the following code to understand it:
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

# params of shape [3, 4, 2]
embeddings = tf.constant([[[1,1],[2,2],[3,3],[4,4]],[[11,11],[12,12],[13,13],[14,14]],
                          [[21,21],[22,22],[23,23],[24,24]]])
ids = tf.constant([0,2,1])
embed = tf.nn.embedding_lookup(embeddings, ids, partition_strategy='div')
with tf.Session() as session:
    result = session.run(embed)
    print(result)  # shape (3, 4, 2): slices 0, 2 and 1 along the first dimension
I passed partition_strategy='div' only to try it out; with a single tensor, it makes no difference.
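To see why a single tensor is unaffected, here is a rough plain-Python emulation of the ‘div’ rule, under which ids are assigned to shards in contiguous blocks. It is a sketch under the assumption that all shards have the same number of rows; the helper name div_lookup and the row labels are hypothetical, and this is not TensorFlow's own code.

```python
def div_lookup(shards, ids):
    """Emulate the 'div' partition strategy for equally sized shards (hypothetical helper)."""
    rows_per_shard = len(shards[0])  # assumption: every shard has this many rows
    # id i lives in shard (i // rows_per_shard), at row (i % rows_per_shard)
    return [shards[i // rows_per_shard][i % rows_per_shard] for i in ids]

# Three shards: ids are split into contiguous blocks [0, 1], [2, 3], [4, 5].
shards = [["row0", "row1"], ["row2", "row3"], ["row4", "row5"]]
sharded_result = div_lookup(shards, [0, 2, 1, 5])
print(sharded_result)

# One shard: every id maps straight to its own row, so the strategy is irrelevant.
single = [["row0", "row1", "row2"]]
single_result = div_lookup(single, [0, 2, 1])
print(single_result)
```

With only one shard, the shard index is always 0 and the row index is the id itself, which is the same answer any partition strategy would give.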
Since I was also intrigued by this function, I’ll give my two cents.
The way I see it in the 2D case is just as a matrix multiplication (it’s easy to generalize to other dimensions).
Consider a vocabulary with N symbols.
Then, you can represent a symbol x as a vector of dimensions Nx1, one-hot-encoded.
But you want a representation of this symbol not as a vector of Nx1, but as one with dimensions Mx1, called y.
So, to transform x into y, you can use an embedding matrix E with dimensions MxN:
y = Ex.
This is essentially what tf.nn.embedding_lookup(params, ids, …) is doing, with the nuance that each id is just a number giving the position of the 1 in the one-hot-encoded vector x.
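The equivalence can be checked with a small numpy sketch. The sizes N and M, the random matrix, and the chosen index are arbitrary illustration values. Note that params in embedding_lookup is laid out as N x M (one row per symbol), so it plays the role of the transpose of E here.

```python
import numpy as np

N, M = 5, 3                      # vocabulary size and embedding size (arbitrary)
rng = np.random.default_rng(0)
E = rng.random((M, N))           # embedding matrix with dimensions M x N

idx = 2                          # the id of the symbol we want to embed
x = np.zeros(N)
x[idx] = 1.0                     # one-hot encoding of that symbol

y_matmul = E @ x                 # y = Ex, the matrix-multiplication view
y_lookup = E.T[idx]              # row lookup in the N x M layout

print(np.allclose(y_matmul, y_lookup))
```

The lookup avoids building the one-hot vector and doing the multiplication, which matters when N is large.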
Adding to Asher Stern’s answer: params is interpreted as a partitioning of a large embedding tensor. It can be a single tensor representing the complete embedding tensor, or a list of X tensors, all of the same shape except for the first dimension, representing sharded embedding tensors.
The function tf.nn.embedding_lookup is written on the assumption that the embedding (params) will be large. That is why partition_strategy is needed.