$$\mathrm{Encoder}(\mathbf{X}) \triangleq \mathrm{BLSTM}_t(\mathbf{X}) \tag{19}$$

It is to be noted that the computational complexity of the encoder network is reduced by subsampling the outputs [20, 21].
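To make Eq. (19) and the subsampling note concrete, here is a minimal PyTorch sketch; the class name, layer sizes, and `subsample` factor are illustrative assumptions, not the chapter's implementation:

```python
import torch
import torch.nn as nn

class SubsampledBLSTMEncoder(nn.Module):
    """Hypothetical sketch of Eq. (19): h = BLSTM(X), with the output
    frames subsampled to reduce downstream computation."""

    def __init__(self, input_dim=80, hidden_dim=320, subsample=2):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        self.subsample = subsample  # keep every `subsample`-th output frame

    def forward(self, x):                 # x: (batch, T, input_dim)
        h, _ = self.blstm(x)              # h: (batch, T, 2 * hidden_dim)
        return h[:, ::self.subsample, :]  # (batch, ceil(T / subsample), 2 * hidden_dim)
```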

3.2.2. Content-based attention mechanism

$\mathrm{ContentAttention}(\cdot)$ is shown as

$$e_{lt} = \mathbf{g}^{T} \tanh\left(\mathrm{Lin}(\mathbf{q}_{l-1}) + \mathrm{LinB}(\mathbf{h}_t)\right) \tag{20}$$

$$a_{lt} = \mathrm{Softmax}\left(\left\{e_{lt}\right\}_{t=1}^{T}\right) \tag{21}$$

$\mathbf{g}$ represents a learnable parameter, and $\{e_{lt}\}_{t=1}^{T}$ represents a $T$-dimensional vector. $\tanh(\cdot)$ and $\mathrm{Lin}(\cdot)$ represent the hyperbolic tangent activation function and a linear layer with learnable matrix parameters, respectively.
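A hedged PyTorch sketch of Eqs. (20)–(21) follows; the dimension names and the choice to give only `LinB` a bias are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class ContentAttention(nn.Module):
    """Hypothetical sketch of content-based attention, Eqs. (20)-(21)."""

    def __init__(self, dec_dim, enc_dim, att_dim):
        super().__init__()
        self.lin = nn.Linear(dec_dim, att_dim, bias=False)   # Lin(q_{l-1})
        self.linb = nn.Linear(enc_dim, att_dim, bias=True)   # LinB(h_t)
        self.g = nn.Linear(att_dim, 1, bias=False)           # g^T (...)

    def forward(self, q_prev, h):
        # q_prev: (batch, dec_dim); h: (batch, T, enc_dim)
        e = self.g(torch.tanh(self.lin(q_prev).unsqueeze(1) + self.linb(h)))  # Eq. (20)
        return torch.softmax(e.squeeze(-1), dim=-1)  # Eq. (21): weights over t = 1..T
```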

3.2.3. Location-aware attention mechanism

This is an extended version of the content-based attention mechanism that deals with location-aware attention. If $\mathbf{a}_{l-1} = \{a_{l-1,t}\}_{t=1}^{T}$ is replaced in Eq. (16), then $\mathrm{LocationAware}(\cdot)$ is represented as follows:

$$\{\mathbf{f}_t\}_{t=1}^{T} = \mathbf{R} * \mathbf{a}_{l-1} \tag{22}$$

$$e_{lt} = \mathbf{g}^{T} \tanh\left(\mathrm{Lin}(\mathbf{q}_{l-1}) + \mathrm{Lin}(\mathbf{h}_t) + \mathrm{LinB}(\mathbf{f}_t)\right) \tag{23}$$

$$a_{lt} = \mathrm{Softmax}\left(\left\{e_{lt}\right\}_{t=1}^{T}\right) \tag{24}$$

Here, $*$ denotes 1-D convolution along the input feature axis, $t$, with the convolution parameter $\mathbf{R}$, to produce the set of $T$ features $\{\mathbf{f}_t\}_{t=1}^{T}$.
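The following PyTorch sketch adds the location term of Eqs. (22)–(24) on top of the content score; the kernel size and channel count of the convolution parameter $\mathbf{R}$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LocationAwareAttention(nn.Module):
    """Hypothetical sketch of Eqs. (22)-(24): the previous attention
    weights a_{l-1} are convolved along t to give features f_t."""

    def __init__(self, dec_dim, enc_dim, att_dim, conv_ch=10, kernel=21):
        super().__init__()
        self.conv = nn.Conv1d(1, conv_ch, kernel, padding=kernel // 2)  # R * a_{l-1}
        self.lin_q = nn.Linear(dec_dim, att_dim, bias=False)  # Lin(q_{l-1})
        self.lin_h = nn.Linear(enc_dim, att_dim, bias=False)  # Lin(h_t)
        self.linb_f = nn.Linear(conv_ch, att_dim, bias=True)  # LinB(f_t)
        self.g = nn.Linear(att_dim, 1, bias=False)

    def forward(self, q_prev, h, a_prev):
        # q_prev: (batch, dec_dim); h: (batch, T, enc_dim); a_prev: (batch, T)
        f = self.conv(a_prev.unsqueeze(1)).transpose(1, 2)  # Eq. (22): (batch, T, conv_ch)
        e = self.g(torch.tanh(self.lin_q(q_prev).unsqueeze(1)
                              + self.lin_h(h) + self.linb_f(f)))  # Eq. (23)
        return torch.softmax(e.squeeze(-1), dim=-1)  # Eq. (24)
```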

3.2.4. Decoder network

The decoder network is an RNN that is conditioned on the previous output $c_{l-1}$ and hidden vector $\mathbf{q}_{l-1}$. The LSTM is the preferred choice of RNN, represented as follows:

$$\mathrm{Decoder}(\cdot) \triangleq \mathrm{Softmax}\left(\mathrm{LinB}\left(\mathrm{LSTM}_l(\cdot)\right)\right) \tag{25}$$

$\mathrm{LSTM}_l(\cdot)$ represents a unidirectional LSTM that generates the hidden vector $\mathbf{q}_l$ as output:

$$\mathbf{q}_l = \mathrm{LSTM}_l\left(\mathbf{r}_l, \mathbf{q}_{l-1}, c_{l-1}\right) \tag{26}$$

$\mathbf{r}_l$ represents the concatenated vector of the letter-wise hidden vector, and $c_{l-1}$ represents the output of the previous layer, which is taken as input.
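One common way to realize the three-argument form of Eq. (26) is an `nn.LSTMCell` whose input concatenates $\mathbf{r}_l$ with an embedding of $c_{l-1}$, while $\mathbf{q}_{l-1}$ enters as the recurrent state. This is a hedged sketch under that assumption, not the chapter's exact design:

```python
import torch
import torch.nn as nn

class AttentionDecoder(nn.Module):
    """Hypothetical sketch of Eqs. (25)-(26)."""

    def __init__(self, ctx_dim, dec_dim, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dec_dim)       # embeds c_{l-1}
        self.lstm = nn.LSTMCell(ctx_dim + dec_dim, dec_dim)  # LSTM_l
        self.linb = nn.Linear(dec_dim, vocab_size)           # LinB

    def forward(self, r_l, state, c_prev):
        # r_l: (batch, ctx_dim); state = (q_{l-1}, cell); c_prev: (batch,) int64
        lstm_in = torch.cat([r_l, self.embed(c_prev)], dim=-1)
        q_l, cell = self.lstm(lstm_in, state)          # Eq. (26)
        probs = torch.softmax(self.linb(q_l), dim=-1)  # Eq. (25)
        return probs, (q_l, cell)
```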


3.2.5. Objective function

The objective function of the attention model is computed from the sequence posterior

$$p_{\mathrm{att}}(\mathbf{C}|\mathbf{X}) \approx \prod_{l=1}^{L} p\left(c_l | c_1^*, \dots, c_{l-1}^*, \mathbf{X}\right) \triangleq p_{\mathrm{att}}^*(\mathbf{C}|\mathbf{X}) \tag{27}$$

where $c_l^*$ represents the ground truth of the previous characters. The attention-based approach is thus a combination of letter-wise objectives based on multiclass classification, with the conditional ground-truth history $c_1^*, \dots, c_{l-1}^*$ in each output $l$.
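In training terms, Eq. (27) reduces to a sum of per-letter cross-entropies under teacher forcing (the ground-truth history is fed back instead of the model's own predictions). A minimal sketch, assuming the decoder already emits letter-wise log-probabilities; the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def attention_loss(log_posteriors, targets):
    """Hypothetical loss for Eq. (27): -log p*_att(C|X) as a sum of
    letter-wise multiclass terms.
    log_posteriors: (batch, L, vocab) log-probabilities (e.g. from log_softmax);
    targets: (batch, L) ground-truth character indices."""
    # nll_loss expects (batch, vocab, L) when targets carry a sequence axis
    return F.nll_loss(log_posteriors.transpose(1, 2), targets, reduction="sum")
```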
