Abstract:
In recent years, artificial neural networks (ANNs) have been used extensively in various
fields. Among them, backpropagation (BP) networks are the most popular and have been
widely used in applications such as forecasting and classification. To predict future
outcome values with an acceptable level of accuracy, a BP network must be trained on a
large sample of historical data collected over a given time period. The network then learns
to generalize and to extrapolate from new data in order to predict future outcomes.
However, several problems are encountered in practice. One is how to determine the
appropriate structure of the network for a particular problem. Another is slow convergence
(or, in some cases, no convergence at all), so that many iterations are required to train
even a simple network.
The structure of the network strongly affects the performance of the network model.
As the network becomes more complex, the training time increases; the network should
therefore be kept as simple as possible. Since the numbers of nodes in the input and output
layers are application-dependent, the remaining problems are how to choose the number of
hidden layers and the number of hidden nodes optimally. For many applications, these are
determined by trial and error. In general, as the number of parameters (the number of
weights and biases) increases, the mean squared error (MSE) decreases, so the best network
model cannot be determined from the MSE alone. Instead, the Bayesian Information
Criterion (BIC) was proposed in this study to select the best model from candidate models
having different numbers of parameters. It should be noted that the BIC penalizes a model
for having more parameters and therefore tends to favor a smaller model. A new stopping
rule was proposed to determine the appropriate network structure systematically, using a
procedure that gradually increases the network complexity until the current value of the
BIC exceeds the previous one or the decrease in the BIC becomes very small.
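As an illustration, the following Python sketch implements this stopping rule with the
Gaussian-error form of the criterion, BIC = n ln(MSE) + k ln(n); the training helper
train_fn, the node limit, and the threshold eps are hypothetical placeholders, and the
exact BIC variant used in the study may differ.

    import numpy as np

    def bic(mse, n_params, n_samples):
        # Gaussian-error form of the Bayesian Information Criterion
        # (assumed here; the study's exact variant may differ).
        return n_samples * np.log(mse) + n_params * np.log(n_samples)

    def select_hidden_nodes(train_fn, n_samples, max_nodes=20, eps=1e-3):
        # Grow the network until the BIC rises or its decrease becomes
        # negligible, following the stopping rule described above.
        # train_fn(h) is a hypothetical helper that trains a network with
        # h hidden nodes and returns (mse, number_of_parameters).
        prev_bic = np.inf
        best_h = 1
        for h in range(1, max_nodes + 1):
            mse, k = train_fn(h)
            cur_bic = bic(mse, k, n_samples)
            if cur_bic > prev_bic or prev_bic - cur_bic < eps:
                break
            prev_bic, best_h = cur_bic, h
        return best_h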
Two new algorithms were devised to speed up the convergence of BP networks.
The first was obtained by applying the adaptive neural model with a temperature
momentum term to the Kalman filter (KF) with a momentum term.
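The abstract does not detail this construction; as a rough sketch only, the fragment below
shows a Kalman-filter weight update for a linear output node augmented with a fixed
momentum term. The temperature-dependent adaptation of the momentum coefficient is
omitted, and all names and values are illustrative.

    import numpy as np

    def kf_momentum_step(w, P, dw_prev, x, d, R=0.01, mu=0.2):
        # One KF update of linear weights w for a scalar target d ~ w.x,
        # with the ordinary KF step augmented by a momentum term mu*dw_prev.
        Px = P @ x
        S = float(x @ Px) + R              # innovation variance
        K = Px / S                         # Kalman gain
        dw = K * (d - float(w @ x)) + mu * dw_prev
        P = P - np.outer(K, Px)            # covariance update
        return w + dw, P, dw

Iterating this step over the training samples drives w toward the least-squares weights;
the momentum term reuses part of the previous increment, a common device for smoothing
the update direction.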
For further refinement, the nonlinear neural network problem can be partitioned into a
part that is nonlinear in the weights of the hidden layers and a part that is linear in
the weights of the output layer. By employing the conjugate gradient method for the
nonlinear part and the KF algorithm for the linear part, we arrived at the second proposed
algorithm: once the weights of the hidden layers are obtained by the conjugate gradient
method, the weights of the output layer pose a linear problem that is readily solved by
the KF. The partition allows the nonlinear and linear parts of the search to be conducted
in reduced-dimensional spaces, which accelerates the training process; consequently, the
second algorithm can greatly improve the convergence speed.
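A minimal sketch of this partition, assuming a one-hidden-layer network with tanh hidden
units and a linear output layer: the conjugate gradient search runs only over the hidden
weights, and for each candidate the output weights are recovered by linear least squares,
the batch equivalent of the KF solution to the linear subproblem. The data, dimensions,
and SciPy routine are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                    # toy inputs
    y = np.sin(X @ np.array([1.0, -0.5, 0.3]))       # toy targets
    n_hidden = 5

    def mse_given_hidden(wh):
        # For fixed hidden weights the output layer is linear, so its
        # optimal weights follow directly from least squares.
        H = np.tanh(X @ wh.reshape(X.shape[1], n_hidden))
        wo, *_ = np.linalg.lstsq(H, y, rcond=None)
        return float(np.mean((y - H @ wo) ** 2))

    # Conjugate gradient search over the (nonlinear) hidden weights only;
    # the output weights never enter the search space.
    w0 = 0.1 * rng.normal(size=X.shape[1] * n_hidden)
    result = minimize(mse_given_hidden, w0, method="CG")
    print("training MSE:", result.fun)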
In simulation experiments with three data sets, namely daily streamflow (rainfall-runoff)
data, quarterly data on exports and gross domestic product (GDP) of Thailand, and daily
stock prices in the Thai market, it was found that the BIC and these algorithms performed
satisfactorily in all cases considered.
The BIC and the two algorithms were introduced without imposing any conditions on the data.
Consequently, they should be generally applicable to any type of data.
Keywords: backpropagation networks, forecasting, network structure, convergence rate,
Bayesian Information Criterion, Kalman filter, adaptive neural model, conjugate gradient