Neural Semantic Encoders
The NSE is made up of three neural networks:
- the read layer,
- the write layer
- and the compose layer.
They operate on the encoding memory to perform their namesake operations. The encoding memory is separate from the neural network parameters: it is a matrix $M \in \mathbb{R}^{k \times l}$, where $k$ is the hidden dimension of the stored representations and $l$ is the sequence length. Note: this is a single matrix maintained for all the data points in the set. The memory is initialized with the embedding vectors of the input sequence and evolves over time. In some cases, as expected, the read module produces a key that points to the memory slot of the input word itself; in that case the authors choose the second most probable memory slot as the associatively coherent word.
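To make the memory concrete, here is a minimal NumPy sketch of how $M_0$ could be built from the input sequence's embeddings; the dimensions and the `word_embeddings` array are illustrative placeholders, not values from the paper.

```python
import numpy as np

# k = hidden/embedding dimension, l = sequence length; M has shape (k, l).
k, l = 300, 8                          # illustrative dimensions (not from the paper)
rng = np.random.default_rng(0)

# Stand-in for the pretrained embeddings of the l input words, one column per word.
word_embeddings = rng.standard_normal((k, l))

# M_0: the memory starts out as the raw embeddings and is then updated
# (read / compose / write) at every time step, so it evolves away from them.
M = word_embeddings.copy()
```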
List of important vectors:
- $x_t \in \mathbb{R}^k$: the input to the LSTM (the word embedding)
- $o_t \in \mathbb{R}^k$: the hidden state (output) produced by the LSTM
- $z_t \in \mathbb{R}^l$: the key vector shared between the read and write modules
The read module is an LSTM: $o_t = f_r^{\mathrm{LSTM}}(x_t)$
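As a rough illustration of the read module's role (mapping the word embedding $x_t$ to the query $o_t$), here is a plain single-step LSTM cell in NumPy; the gate layout and parameter names are generic assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def read_lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a generic LSTM cell standing in for f_r^LSTM.

    x_t, h_prev, c_prev: vectors of length k.
    W, U: (4k, k) weight matrices; b: (4k,) bias.
    Gates are stacked as [input, forget, output, candidate].
    Returns (o_t, c_t), where o_t is the hidden state used as the read query.
    """
    k = h_prev.shape[0]
    g = W @ x_t + U @ h_prev + b
    i = sigmoid(g[:k])
    f = sigmoid(g[k:2 * k])
    o = sigmoid(g[2 * k:3 * k])
    c_hat = np.tanh(g[3 * k:])
    c_t = f * c_prev + i * c_hat
    o_t = o * np.tanh(c_t)
    return o_t, c_t
```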
The key vector $z_t$ is produced by attending over the memory from the previous time step: $z_t = \mathrm{softmax}(o_t^\top M_{t-1})$. Note: this is essentially a dot product between the hidden state and each memory slot, used to find the most coherently associated slot.
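A small sketch of that attention step, assuming the shapes above ($o_t \in \mathbb{R}^k$, $M_{t-1} \in \mathbb{R}^{k \times l}$); the function name is just for illustration.

```python
import numpy as np

def read_key(o_t, M_prev):
    """z_t = softmax(o_t^T M_{t-1}): one attention weight per memory slot."""
    scores = o_t @ M_prev                 # shape (l,): dot product with each column (slot)
    scores = scores - scores.max()        # shift for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()
```

With `z_t` in hand, the fallback mentioned earlier (taking the second most probable slot when the top one is the input word's own slot) amounts to inspecting `np.argsort(z_t)[-2]` instead of the argmax.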
Since the key vector is fuzzy, the final read result is obtained by taking a weighted sum of all memory slots, with the key vector supplying the weights: $m_{r,t} = z_t^\top M_{t-1}^\top$. This can also be viewed as a soft attention mechanism.
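And the corresponding retrieval under the same shape assumptions; `read_key` refers to the sketch above.

```python
import numpy as np

def soft_read(z_t, M_prev):
    """m_{r,t}: weighted sum of the l memory slots, i.e. soft attention over M_{t-1}."""
    return M_prev @ z_t                   # shape (k,): blend of the slot (column) vectors
```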
The next equations compose new information and write it back to the memory slots that were just read.