Neural Semantic Encoders
The NSE is made up of three neural networks:
- the read layer,
- the write layer
- and the compose layer.
They operate on the encoding memory to perform their namesake operations. The encoding memory is separate from the neural network parameters: it is a matrix $M \in \mathbb{R}^{k \times l}$, where $k$ is the hidden dimension of the stored representations and $l$ is the sequence length. Note: this is a single matrix maintained for all the data points in the set. The memory is initialized with the embedding vectors of the input sequence and evolves over time. In some cases, as expected, the read module produces a key that points to the memory slot of the input word itself; in that case the authors choose the second most probable memory slot as the associatively coherent word.
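To make the memory concrete, here is a minimal NumPy sketch of how $M_0$ could be built from the input sequence's embeddings; the dimensions and the `word_embeddings` array are illustrative placeholders, not values from the paper.

```python
import numpy as np

# k = hidden/embedding dimension, l = sequence length; M has shape (k, l).
k, l = 300, 8                          # illustrative dimensions (not from the paper)
rng = np.random.default_rng(0)

# Stand-in for the pretrained embeddings of the l input words, one column per word.
word_embeddings = rng.standard_normal((k, l))

# M_0: the memory starts out as the raw embeddings and is then updated
# (read / compose / write) at every time step, so it evolves away from them.
M = word_embeddings.copy()
```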
List of important vectors:
- $x_t \in \mathbb{R}^k$: the input to the LSTM (the word embedding)
- $o_t \in \mathbb{R}^k$: the hidden state (output) produced by the LSTM
- $z_t \in \mathbb{R}^l$: the key vector shared between the read and write modules
The read module is an LSTM: $o_t = f_r^{\mathrm{LSTM}}(x_t)$
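As a rough illustration of the read module's role (mapping the word embedding $x_t$ to the query $o_t$), here is a plain single-step LSTM cell in NumPy; the gate layout and parameter names are generic assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read_lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a generic LSTM cell standing in for f_r^LSTM.

    x_t, h_prev, c_prev: vectors of length k.
    W, U: (4k, k) weight matrices; b: (4k,) bias.
    Gates are stacked as [input, forget, output, candidate].
    Returns (o_t, c_t), where o_t is the hidden state used as the read query.
    """
    k = h_prev.shape[0]
    g = W @ x_t + U @ h_prev + b
    i = sigmoid(g[:k])
    f = sigmoid(g[k:2 * k])
    o = sigmoid(g[2 * k:3 * k])
    c_hat = np.tanh(g[3 * k:])
    c_t = f * c_prev + i * c_hat
    o_t = o * np.tanh(c_t)
    return o_t, c_t
```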
The key vector $z_t$ is produced by attending over the memory from the previous time step: $z_t = \mathrm{softmax}(o_t^\top M_{t-1})$. Note: this is essentially a dot product between the hidden state and each memory slot, used to find the most coherently associated slot.
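A small sketch of that attention step, assuming the shapes above ($o_t \in \mathbb{R}^k$, $M_{t-1} \in \mathbb{R}^{k \times l}$); the function name is just for illustration.

```python
import numpy as np

def read_key(o_t, M_prev):
    """z_t = softmax(o_t^T M_{t-1}): one attention weight per memory slot."""
    scores = o_t @ M_prev                 # shape (l,): dot product with each column (slot)
    scores = scores - scores.max()        # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

With `z_t` in hand, the fallback mentioned earlier (taking the second most probable slot when the top one is the input word's own slot) amounts to inspecting `np.argsort(z_t)[-2]` instead of the argmax.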
Since the key vector is fuzzy, the final read result is obtained by taking a weighted sum of all memory slots, with the key vector supplying the weights: $m_{r,t} = z_t^\top M_{t-1}^\top$. This can also be viewed as a soft attention mechanism.
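And the corresponding retrieval under the same shape assumptions; `read_key` refers to the sketch above.

```python
import numpy as np

def soft_read(z_t, M_prev):
    """m_{r,t}: weighted sum of the l memory slots, i.e. soft attention over M_{t-1}."""
    return M_prev @ z_t                   # shape (k,): blend of the slot (column) vectors
```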
The next equations compose new information and write it back to the memory slots that were just read.