Deep Learning

Recurrent Neural Networks (RNN)

Introduction

Recurrent Neural Networks (RNNs) are one of the most popular neural network architectures. RNNs have been shown to provide good prediction performance in a variety of domains, including computer vision, natural language processing, and smart cities.

We provide a brief overview of RNNs in this article. Figures 1 and 2 show the folded and unfolded versions of a single-layer RNN, respectively, with the unfolded structure showing the calculation performed at each time step t. In these figures, X_t = (x_{t-2}, x_{t-1}, x_t) and Y_t = (y_{t-2}, y_{t-1}, y_t) are the input and corresponding output vectors, respectively, h is the hidden layer, and W_{xh}, W_{hh}, and W_{hy} are the weight matrices. The hidden layer h_t serves as memory and is calculated from the previous hidden state h_{t-1} and the input x_t. At each time step t, the hidden state of the RNN is given by,

h_t = φ(h_{t-1}, x_t)

where φ is a non-linear activation function. The weight matrices transform the input to the output via the hidden layer.

Figure 1: RNN Folded
Figure 2: RNN Unfolded
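
As a concrete illustration of this recurrence, below is a minimal NumPy sketch of an RNN forward pass. The dimensions, the tanh activation, and the random initialization are illustrative assumptions rather than details taken from the figures.

import numpy as np

# Illustrative dimensions and random weights (assumed; not taken from the figures)
input_dim, hidden_dim, output_dim = 8, 16, 4
rng = np.random.default_rng(0)
W_xh = 0.1 * rng.standard_normal((hidden_dim, input_dim))   # input-to-hidden
W_hh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # hidden-to-hidden
W_hy = 0.1 * rng.standard_normal((output_dim, hidden_dim))  # hidden-to-output

def rnn_forward(xs, h0=None):
    """Unroll the RNN over a sequence xs of shape (T, input_dim).

    Implements h_t = φ(W_hh h_{t-1} + W_xh x_t) with φ = tanh,
    and a per-step output y_t = W_hy h_t.
    """
    h = np.zeros(hidden_dim) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)  # the hidden state acts as memory
        hs.append(h)
        ys.append(W_hy @ h)
    return np.array(hs), np.array(ys)

# Example: a sequence of T = 5 input vectors
xs = rng.standard_normal((5, input_dim))
hs, ys = rnn_forward(xs)
print(hs.shape, ys.shape)  # (5, 16) (5, 4)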

In a standard RNN, the nodes (the building blocks of a neural network architecture) are usually composed of basic activation functions such as tanh and sigmoid. Since RNN weights are learned by backpropagating errors through the network, the use of these activation functions can cause RNNs to suffer from the vanishing/exploding gradient problem, in which gradients take on either vanishingly small or extremely large values, respectively. This problem hinders the RNN's ability to learn long-term dependencies. To circumvent this problem, LSTM and GRU cells were proposed; they create paths through time with derivatives that do not vanish or explode by incorporating the ability to “forget”. We next describe the details of an LSTM cell.
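
To make the vanishing/exploding behaviour concrete, the following small sketch (a deliberate simplification that ignores the activation-function derivatives and uses made-up dimensions) repeatedly multiplies a gradient vector by the transposed recurrent weight matrix, which is what backpropagation through time does at each step. A spectral radius below 1 drives the gradient towards zero over many steps, while a spectral radius above 1 makes it blow up.

import numpy as np

def backprop_norms(spectral_radius, steps=50, hidden_dim=16, seed=0):
    """Push an error vector back `steps` time steps through a linearised RNN.

    Backpropagation through time multiplies the gradient by the transposed
    recurrent matrix at every step (activation derivatives are ignored here).
    """
    rng = np.random.default_rng(seed)
    W_hh = rng.standard_normal((hidden_dim, hidden_dim))
    W_hh *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_hh)))  # rescale
    grad = rng.standard_normal(hidden_dim)
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

print(backprop_norms(0.9)[-1])  # spectral radius 0.9: the gradient has all but vanished
print(backprop_norms(1.1)[-1])  # spectral radius 1.1: the gradient has exploded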

Long Short-Term Memory (LSTM)

The LSTM cell consists of three gates, namely the input gate, the output gate, and the forget gate, which allow it to handle long-term dependencies. Figure 3 shows the basic structure of a single LSTM cell. LSTM recurrent networks have cells with an internal recurrence (referred to as a self-loop in Figure 3), in addition to the outer recurrence of the RNN.

Figure 3: Illustration of the LSTM cell architecture. g_t, f_t, and q_t are the input, forget, and output gates, respectively.

Each cell has the same inputs and outputs as a node in an ordinary recurrent network, but also has more parameters and a system of gating units that controls the flow of information. The most important component is the state unit s_i^{(t)}, which captures the internal state of LSTM cell i and has a linear self-loop whose weight is gated. The state is updated as,

s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)} σ(b_i + Σ_j U_{i,j} x_j^{(t)} + Σ_j W_{i,j} h_j^{(t-1)})
where b_i, U, and W denote the bias, input weights, and recurrent weights, respectively. The self-loop weight is controlled by the forget gate unit f_i^{(t)}, which governs the dependence of the current state s_i^{(t)} on the historical state s_i^{(t-1)}. f_i^{(t)} is set to a value between 0 and 1 via a sigmoid unit, as shown below.

f_i^{(t)} = σ(b_i^f + Σ_j U_{i,j}^f x_j^{(t)} + Σ_j W_{i,j}^f h_j^{(t-1)})
where x^{(t)} is the current input vector and h^{(t)} is the current hidden layer vector, containing the outputs of all the LSTM cells; b^f, U^f, and W^f refer to the bias, input weights, and recurrent weights of the forget gate. The index j runs over the cells feeding into cell i, and h_j^{(t-1)} is their output at the previous time step.

The external input gate unit g_i^{(t)} is computed similarly to the forget gate, with a sigmoid unit and its own parameters b^g, U^g, and W^g, and is given by,

g_i^{(t)} = σ(b_i^g + Σ_j U_{i,j}^g x_j^{(t)} + Σ_j W_{i,j}^g h_j^{(t-1)})
Finally, the output h_i^{(t)} of the LSTM cell and the output gate q_i^{(t)} are given by,

h_i^{(t)} = tanh(s_i^{(t)}) q_i^{(t)}

q_i^{(t)} = σ(b_i^o + Σ_j U_{i,j}^o x_j^{(t)} + Σ_j W_{i,j}^o h_j^{(t-1)})

where b^o, U^o, and W^o are the bias, input weights, and recurrent weights of the output gate.
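
Putting the gate equations together, below is a minimal NumPy sketch of a single LSTM cell step in vectorised form. The parameter names, shapes, and random initialisation are illustrative assumptions; the per-cell sums over j in the equations above become matrix-vector products here.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM time step following the gate equations above (vectorised form)."""
    f_t = sigmoid(p["b_f"] + p["U_f"] @ x_t + p["W_f"] @ h_prev)   # forget gate
    g_t = sigmoid(p["b_g"] + p["U_g"] @ x_t + p["W_g"] @ h_prev)   # external input gate
    q_t = sigmoid(p["b_q"] + p["U_q"] @ x_t + p["W_q"] @ h_prev)   # output gate
    # state unit with linear self-loop, gated by f_t and g_t
    s_t = f_t * s_prev + g_t * sigmoid(p["b"] + p["U"] @ x_t + p["W"] @ h_prev)
    h_t = np.tanh(s_t) * q_t                                       # cell output
    return h_t, s_t

# Illustrative sizes and randomly initialised parameters (assumed, not from the text)
input_dim, hidden_dim = 8, 16
rng = np.random.default_rng(0)

def block(suffix=""):
    return {"b" + suffix: np.zeros(hidden_dim),
            "U" + suffix: 0.1 * rng.standard_normal((hidden_dim, input_dim)),
            "W" + suffix: 0.1 * rng.standard_normal((hidden_dim, hidden_dim))}

params = {**block(), **block("_f"), **block("_g"), **block("_q")}

h, s = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # run the cell over a short sequence
    h, s = lstm_step(x_t, h, s, params)
print(h.shape, s.shape)  # (16,) (16,)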