DeepChannel: Deep Learning based Wireless Channel Prediction
Introduction
Accurately modeling and predicting wireless channel quality variations is essential for a number of networking applications, such as scheduling and video streaming over 4G LTE networks, and bit rate adaptation for improved performance in WiFi networks. In this article, we design DeepChannel, an encoder-decoder based sequence-to-sequence deep learning model that predicts future wireless signal strength variations based on past signal strength data. We consider two versions of DeepChannel; the first uses LSTM as its basic cell structure and the second uses GRU. DeepChannel is highly adaptable and can predict future channel conditions across different networks, sampling rates, mobility patterns, and communication standards. We compare the performance of DeepChannel (i.e., the root mean squared error, mean absolute error, and relative error of future predictions) with respect to two baselines—i) linear regression, and ii) ARIMA—for multiple networks and communication standards. In particular, we consider 4G LTE, WiFi, and Zigbee networks operating under varying levels of user mobility and observe that DeepChannel provides significantly superior performance.
Problem Statement
Several factors, such as the environment, user mobility, and communication technology, cause sudden variations in the received signal strength, making it challenging to develop a generalized framework for this prediction task. In this article, our goal is to design a predictive model capable of accurately predicting received signal strength variations irrespective of mobility pattern, communication standard, and sampling rate. This can be modeled as a classic time series prediction problem, where the goal at time T is to predict signal strength variations k steps into the future (i.e., YT = y1, y2, …, yk) based on past signal strength measurements in a window of size n (i.e., XT = x1, x2, …, xn).
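To make this setup concrete, the following minimal Python sketch builds such (XT, YT) pairs from a raw signal strength trace. The helper name, window sizes, and the placeholder trace are our own illustrative choices, not taken from the article.

```python
import numpy as np

def make_windows(series, n, k):
    """Slice a trace into (past n values, next k values) pairs.
    Hypothetical helper; name and defaults are illustrative."""
    X, Y = [], []
    for t in range(len(series) - n - k + 1):
        X.append(series[t:t + n])           # XT = x1, ..., xn (past window)
        Y.append(series[t + n:t + n + k])   # YT = y1, ..., yk (future values)
    return np.array(X), np.array(Y)

# Placeholder RSRP trace in dBm; real traces come from the measurement datasets.
trace = np.random.uniform(-120, -70, size=1000)
X, Y = make_windows(trace, n=20, k=5)
print(X.shape, Y.shape)  # (976, 20) (976, 5)
```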
Data
To demonstrate the widespread applicability of the proposed model, we consider multiple received signal strength measurement datasets collected at end hosts for different networks—4G LTE, WiFi, and an IoT (Zigbee) network. We next describe the network settings, characteristics, and preprocessing steps undertaken for each dataset.
4G LTE Measurements
We collect Reference Signal Received Power (RSRP) measurements using a Motorola G5 smartphone over T-Mobile and AT&T 4G LTE networks in vehicular and pedestrian mobility scenarios. The vehicular and pedestrian mobility traces are approximately 50 and 20 minutes in duration, respectively, and are collected at a granularity of 1 second. Research has demonstrated the need for wireless channel prediction on the seconds’ timescale for improved video streaming over cellular networks.
WiFi Measurements
We collect two datasets containing received signal strength indicator (RSSI) measurements using a Motorola G5 smartphone on a campus WiFi network at sampling intervals of 1 and 2 seconds, respectively. Each measurement is carried out for approximately 50 minutes amidst pedestrian mobility (indoor and outdoor). Research has demonstrated the need for wireless channel prediction on the seconds’ timescale for designing block-based bit rate adaptation algorithms for WiFi networks.
IoT Measurements
We consider signal strength measurements collected over an IoT network comprising sensor nodes communicating via Zigbee; each dataset contains around 2000 samples. The datasets were collected using two sensor nodes communicating with each other over fixed distances of 10 m and 15 m at a power level of 31 (0 dBm). We fill missing values, which indicate packet loss, with random signal strength values drawn between the smallest recorded RSSI and 10 units below it.
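A minimal sketch of this imputation step, assuming missing samples are marked as NaN; the function name and seed are our own illustrative choices.

```python
import numpy as np

def fill_missing_rssi(rssi, seed=0):
    """Replace missing samples (NaN, i.e., lost packets) with random values
    between the smallest recorded RSSI and 10 units below it."""
    rssi = np.array(rssi, dtype=float)
    rng = np.random.default_rng(seed)
    weakest = np.nanmin(rssi)          # smallest recorded RSSI
    missing = np.isnan(rssi)
    rssi[missing] = rng.uniform(weakest - 10, weakest, size=missing.sum())
    return rssi

print(fill_missing_rssi([-70, -85, float("nan"), -90]))
```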
RNN based Encoder-Decoder Model
DeepChannel comprises an encoder-decoder based sequence-to-sequence deep learning model, as shown in Figure 1. The model consists of two components—an encoder and a decoder, each of which is an RNN. An RNN consists of a network of neural nodes organized in layers, with directed connections from one layer to the next. At the highest level, the encoder accepts an input sequence x1, x2, …, xn, which corresponds to the signal strength measurements over the last n time steps, and generates a hidden encoded vector C that encapsulates the information in the input sequence. This encoded vector is given as input to the decoder, which generates y1, y2, …, yk, the predicted signal strength values for the next k time steps.
Internally, at each time step t, an RNN maintains a hidden state ht that gets updated based on the input xt and the previous hidden state ht-1 using some non-linear function f. ht serves as memory; after the entire input sequence is read, the hidden state becomes the summary C capturing the information of the entire input sequence. This summary C is then used by the decoder to generate the output sequence by predicting the next value yt given the hidden state. We apply a ReLU activation function after each decoder output to constrain the predicted values to be non-negative.
In the standard RNN architecture, the neural nodes are usually composed of basic activation functions such as tanh and sigmoid. During the training phase, the weights are learned by the backpropagation algorithm, which propagates errors through the network. However, the use of these basic activation functions can cause RNNs to suffer from the vanishing/exploding gradient problem, where the gradient takes on either infinitesimally small or extremely large values. This prevents the RNN from learning long-term dependencies in the data. To circumvent this problem, LSTM and GRU cells were proposed; they create paths through time with derivatives that neither vanish nor explode by incorporating the ability to “forget”. Therefore, we consider two variations of our model (Figure 1) based on the basic cell structure used internally in each multi-layer RNN, namely an LSTM version and a GRU version. Both LSTM and GRU cells are composed of a number of gated units and primarily differ in the number of gates and their interconnections. The LSTM cell consists of three gates, namely the input gate, the output gate, and the forget gate, which let it handle long-term dependencies. In comparison, the GRU cell consists of two gates: a reset gate that combines the current input with the previous memory, and an update gate that determines the fraction of the previous state to retain.
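As an illustration, the sketch below builds a single-layer encoder-decoder in TensorFlow/Keras with a selectable LSTM or GRU cell. The hidden size, the single-layer depth, and the RepeatVector-based decoder are simplifying assumptions of ours, not the article’s exact architecture.

```python
import tensorflow as tf

def build_deepchannel(n, k, hidden=64, cell="lstm"):
    """Encoder-decoder sketch: encode n past values into a summary C,
    then decode k future values. Sizes are illustrative."""
    RNN = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n, 1)),        # past n signal strength values
        RNN(hidden),                         # encoder: final hidden state acts as C
        tf.keras.layers.RepeatVector(k),     # feed C to every decoder step
        RNN(hidden, return_sequences=True),  # decoder unrolled for k steps
        tf.keras.layers.TimeDistributed(
            tf.keras.layers.Dense(1, activation="relu")),  # ReLU on each output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_deepchannel(n=20, k=5, cell="gru")
model.summary()
```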
Implementation Details
We split the data into two parts—70% for training and 30% for testing—and use TensorFlow to implement the deep learning model. We train our models on a high-performance computing cluster available at our university. The configuration used on the cluster for all experiments is 4 cores and 8 GB of RAM.
At training time, the encoder and decoder are trained jointly using the backpropagation algorithm. Additionally, we investigate three possible training schemes—i) guided, ii) unguided, and iii) curriculum—and observe that unguided training performs the best. In unguided training, the decoder uses its previous predicted output value as the input to the next decoder step. One of the main benefits of unguided training is that it enables better exploration of the state space, which results in superior prediction performance at test time. At both training and test time, for a given signal strength trace, we use a sliding window with a step of one to obtain the input sequences, ensuring maximum overlap among the sequences used. We incorporate L2 regularization in our model to minimize overfitting.
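The sketch below illustrates the unguided decoding loop, where each decoder step consumes the model’s previous prediction. The class name, the GRU cell choice, the hidden size, and the L2 weight are our assumptions, not the article’s exact configuration.

```python
import tensorflow as tf

class UnguidedSeq2Seq(tf.keras.Model):
    """Sketch of unguided decoding: each decoder step consumes the
    previous prediction. Names and sizes are illustrative."""
    def __init__(self, hidden=64, k=5, l2=1e-4):
        super().__init__()
        reg = tf.keras.regularizers.l2(l2)   # L2 regularization on weights
        self.k = k
        self.encoder = tf.keras.layers.GRU(hidden, return_state=True,
                                           kernel_regularizer=reg)
        self.dec_cell = tf.keras.layers.GRUCell(hidden, kernel_regularizer=reg)
        self.out = tf.keras.layers.Dense(1, activation="relu")

    def call(self, x):
        _, state = self.encoder(x)        # summary C of the past window
        y = x[:, -1, :]                   # seed with the last observed value
        preds = []
        for _ in range(self.k):
            h, states = self.dec_cell(y, [state])
            state = states[0]
            y = self.out(h)               # prediction fed back in (unguided)
            preds.append(y)
        return tf.stack(preds, axis=1)    # shape (batch, k, 1)

model = UnguidedSeq2Seq()
model.compile(optimizer="adam", loss="mse")
```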
Experimental Results
In this section, we demonstrate the superior prediction performance of DeepChannel by comparing it with two baseline approaches, linear regression and ARIMA.
Linear Regression – a statistical model that fits the best straight line to the data.
ARIMA – Auto-Regressive Integrated Moving Average, popularly known as ARIMA, is a statistical model that comprises three terms: an autoregressive (AR) term, a differencing (I) term, and a moving average (MA) term.
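For concreteness, both baselines can be approximated as follows using scikit-learn and statsmodels; the ARIMA order, window length, and placeholder trace are illustrative assumptions, not the article’s tuned settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.arima.model import ARIMA

history = np.random.uniform(-120, -70, size=500)  # placeholder trace (dBm)
n, k = 20, 5

# Linear regression: fit a line to the last n samples, extrapolate k steps.
t = np.arange(n).reshape(-1, 1)
lr = LinearRegression().fit(t, history[-n:])
lr_forecast = lr.predict(np.arange(n, n + k).reshape(-1, 1))

# ARIMA(p, d, q): AR order p, differencing d, MA order q (order illustrative).
arima_forecast = ARIMA(history, order=(2, 1, 2)).fit().forecast(steps=k)
```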
The main metrics used in our evaluation are root mean squared error (RMSE) and mean absolute error (MAE). We present results for RMSE below.
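Both metrics are straightforward to compute; a minimal sketch:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large deviations more heavily.
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors.
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
```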
As the performance results consist of multiple similar-looking graphs, we first present RMSE results for the 4G LTE network to demonstrate the superior performance of DeepChannel and then discuss the other networks. Figures 2 and 3 show the performance of DeepChannel and the baseline approaches on the AT&T 4G LTE network for the pedestrian and vehicular mobility scenarios. We observe from the figures that the LSTM and GRU variants of DeepChannel significantly outperform the linear regression and ARIMA models in both mobility settings. In comparison to linear regression and ARIMA, the RMSE values for DeepChannel increase more slowly as the number of time steps increases, meaning that DeepChannel predicts further into the future considerably better than the baseline approaches. Additionally, based on these results and those from all networks, we observe that there is no clear winner between the two versions of DeepChannel, with each variant outperforming the other depending on the network.