Deep Learning

Recurrent Neural Networks (RNN)

Introduction

Recurrent Neural Networks (RNNs) are one of the most popular neural network architectures. RNNs have been shown to provide good prediction performance in a variety of domains, including computer vision, natural language processing, and smart cities.

We provide a brief overview of RNNs in this article. Figures 1 and 2 show the folded and unfolded versions of a single-layer RNN, respectively, with the unfolded structure showing the calculation performed at each time step t. In these figures, X_t = (x_{t-2}, x_{t-1}, x_t) and Y_t = (y_{t-2}, y_{t-1}, y_t) are the input and corresponding output vectors, respectively, h is the hidden layer, and W_{xh}, W_{hh}, and W_{hy} are the weight matrices. The hidden layer h_t serves as memory and is calculated from the previous hidden state h_{t-1} and the input x_t. At each time step t, the hidden state of the RNN is given by,

h_t = φ(h_{t-1}, x_t)

where φ is a non-linear activation function. The weight matrices transform the input to the output via the hidden layer.

Figure 1: RNN Folded
Figure 2: RNN Unfolded
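
As a concrete illustration of this recurrence, below is a minimal NumPy sketch of an RNN forward pass. The dimensions, the tanh activation, and the random initialization are illustrative assumptions rather than details taken from the figures.

import numpy as np

# Illustrative dimensions and random weights (assumed; not taken from the figures)
input_dim, hidden_dim, output_dim = 8, 16, 4
rng = np.random.default_rng(0)
W_xh = 0.1 * rng.standard_normal((hidden_dim, input_dim))   # input-to-hidden
W_hh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))  # hidden-to-hidden
W_hy = 0.1 * rng.standard_normal((output_dim, hidden_dim))  # hidden-to-output

def rnn_forward(xs, h0=None):
    """Unroll the RNN over a sequence xs of shape (T, input_dim).

    Implements h_t = φ(W_hh h_{t-1} + W_xh x_t) with φ = tanh,
    and a per-step output y_t = W_hy h_t.
    """
    h = np.zeros(hidden_dim) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)  # the hidden state acts as memory
        hs.append(h)
        ys.append(W_hy @ h)
    return np.array(hs), np.array(ys)

# Example: a sequence of T = 5 input vectors
xs = rng.standard_normal((5, input_dim))
hs, ys = rnn_forward(xs)
print(hs.shape, ys.shape)  # (5, 16) (5, 4)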

In a standard RNN, the nodes (the building blocks of a neural network architecture) are usually composed of basic activation functions such as tanh and sigmoid. Since RNN weights are learned by backpropagating errors through the network, the use of these activation functions can cause RNNs to suffer from the vanishing/exploding gradient problem, in which gradients take on either vanishingly small or extremely large values, respectively. This problem hinders the RNN's ability to learn long-term dependencies. To circumvent this problem, LSTM and GRU cells were proposed; they create paths through time with derivatives that do not vanish or explode by incorporating the ability to “forget”. We next describe the details of an LSTM cell.
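
To make the vanishing/exploding behaviour concrete, the following small sketch (a deliberate simplification that ignores the activation-function derivatives and uses made-up dimensions) repeatedly multiplies a gradient vector by the transposed recurrent weight matrix, which is what backpropagation through time does at each step. A spectral radius below 1 drives the gradient towards zero over many steps, while a spectral radius above 1 makes it blow up.

import numpy as np

def backprop_norms(spectral_radius, steps=50, hidden_dim=16, seed=0):
    """Push an error vector back `steps` time steps through a linearised RNN.

    Backpropagation through time multiplies the gradient by the transposed
    recurrent matrix at every step (activation derivatives are ignored here).
    """
    rng = np.random.default_rng(seed)
    W_hh = rng.standard_normal((hidden_dim, hidden_dim))
    W_hh *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_hh)))  # rescale
    grad = rng.standard_normal(hidden_dim)
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad
        norms.append(np.linalg.norm(grad))
    return norms

print(backprop_norms(0.9)[-1])  # spectral radius 0.9: the gradient has all but vanished
print(backprop_norms(1.1)[-1])  # spectral radius 1.1: the gradient has exploded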

Long Short-Term Memory (LSTM)

The LSTM cell consists of three gates, namely the input gate, the output gate, and the forget gate, which allow it to handle long-term dependencies. Figure 3 shows the basic structure of a single LSTM cell. LSTM recurrent networks have cells with an internal recurrence (referred to as a self-loop in Figure 3), in addition to the outer recurrence of the RNN.

Figure 3: Illustration of the LSTM cell architecture. g_t, f_t, and q_t are the input, forget, and output gates, respectively.

Each cell has the same inputs and outputs as a node in an ordinary recurrent network, but also has more parameters and a system of gating units that controls the flow of information. The most important component is the state unit s_i^{(t)}, which captures the internal state of LSTM cell i and has a linear self-loop whose weight is gated. The state is updated as,

s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)} σ(b_i + Σ_j U_{i,j} x_j^{(t)} + Σ_j W_{i,j} h_j^{(t-1)})
where b_i, U, and W denote the bias, input weights, and recurrent weights, respectively. The self-loop weight is controlled by the forget gate unit f_i^{(t)}, which governs the dependence of the current state s_i^{(t)} on the historical state s_i^{(t-1)}. f_i^{(t)} is set to a value between 0 and 1 via a sigmoid unit, as shown below.

f_i^{(t)} = σ(b_i^f + Σ_j U_{i,j}^f x_j^{(t)} + Σ_j W_{i,j}^f h_j^{(t-1)})
where x^{(t)} is the current input vector and h^{(t)} is the current hidden layer vector, containing the outputs of all the LSTM cells; b^f, U^f, and W^f refer to the bias, input weights, and recurrent weights of the forget gate. The index j runs over the cells feeding into cell i, and h_j^{(t-1)} is their output at the previous time step.

The external input gate unit g_i^{(t)} is computed similarly to the forget gate, with a sigmoid unit and its own parameters b^g, U^g, and W^g, and is given by,

g_i^{(t)} = σ(b_i^g + Σ_j U_{i,j}^g x_j^{(t)} + Σ_j W_{i,j}^g h_j^{(t-1)})
Finally, the output h_i^{(t)} of the LSTM cell and the output gate q_i^{(t)} are given by,

h_i^{(t)} = tanh(s_i^{(t)}) q_i^{(t)}

q_i^{(t)} = σ(b_i^o + Σ_j U_{i,j}^o x_j^{(t)} + Σ_j W_{i,j}^o h_j^{(t-1)})

where b^o, U^o, and W^o are the bias, input weights, and recurrent weights of the output gate.
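
Putting the gate equations together, below is a minimal NumPy sketch of a single LSTM cell step in vectorised form. The parameter names, shapes, and random initialisation are illustrative assumptions; the per-cell sums over j in the equations above become matrix-vector products here.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM time step following the gate equations above (vectorised form)."""
    f_t = sigmoid(p["b_f"] + p["U_f"] @ x_t + p["W_f"] @ h_prev)   # forget gate
    g_t = sigmoid(p["b_g"] + p["U_g"] @ x_t + p["W_g"] @ h_prev)   # external input gate
    q_t = sigmoid(p["b_q"] + p["U_q"] @ x_t + p["W_q"] @ h_prev)   # output gate
    # state unit with linear self-loop, gated by f_t and g_t
    s_t = f_t * s_prev + g_t * sigmoid(p["b"] + p["U"] @ x_t + p["W"] @ h_prev)
    h_t = np.tanh(s_t) * q_t                                       # cell output
    return h_t, s_t

# Illustrative sizes and randomly initialised parameters (assumed, not from the text)
input_dim, hidden_dim = 8, 16
rng = np.random.default_rng(0)

def block(suffix=""):
    return {"b" + suffix: np.zeros(hidden_dim),
            "U" + suffix: 0.1 * rng.standard_normal((hidden_dim, input_dim)),
            "W" + suffix: 0.1 * rng.standard_normal((hidden_dim, hidden_dim))}

params = {**block(), **block("_f"), **block("_g"), **block("_q")}

h, s = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.standard_normal((5, input_dim)):  # run the cell over a short sequence
    h, s = lstm_step(x_t, h, s, params)
print(h.shape, s.shape)  # (16,) (16,)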