Adaptive neural network model for time-series forecasting

W.K. Wong (a,*), Min Xia (a,b), W.C. Chu (a)

(a) Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong
(b) College of Information Science and Technology, Donghua University, Shanghai, China

Article history: Received 27 January 2010; Accepted 14 May 2010; Available online 1 June 2010

Keywords: Time-series; Forecasting; Adaptive metrics; Neural networks

Abstract

In this study, a novel adaptive neural network (ADNN) with adaptive metrics of inputs and a new mechanism for admixture of outputs is proposed for time-series prediction. The adaptive metrics of inputs address the problems of amplitude change and trend determination, and help avoid over-fitting of the network. The new mechanism for admixture of outputs adjusts the forecasts by the relative error and makes them more accurate. The proposed ADNN method can predict periodical time-series with a complicated structure. The experimental results show that the proposed model outperforms the auto-regression (AR), artificial neural network (ANN), and adaptive k-nearest neighbors (AKN) models. Through its novel structure, the ADNN model is shown to combine the merits of the ANN and the AKN, with high robustness for both chaotic and real time-series prediction.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Many planning activities require prediction of the behavior of variables (e.g. economic, financial, traffic and physical). The predictions support the strategic decisions of organizations (Makridakis, 1996), which in turn sustains a practical interest in forecasting methods. Time-series methods are generally used to model forecasting systems when there is not much information about the generation process of the underlying variable and when other variables provide no clear explanation of the studied variable (Zhang, 2003).

Time-series forecasting predicts the future based on historical observations (Makridakis et al., 1998). There have been many approaches to modeling time-series, depending on the theory or assumptions about the relationships in the data (Huarng and Yu, 2006; Chen and Hwang, 2000; Taylor and Buizza, 2002; Kim and Kim, 1997; Zhang et al., 1998; Wang and Chien, 2006; Singh and Deo, 2007). Traditional methods, such as time-series regression, exponential smoothing and the autoregressive integrated moving average (ARIMA) (Brooks, 2002), are based on linear models. All these methods assume linear relationships among the past values of the forecast variable, so non-linear patterns cannot be captured. One difficulty in developing and implementing this type of time-series model is that the model must be specified and a probability distribution for the data must be assumed (Hansen et al., 2002). The approximation of linear models to complex real-world problems is not always satisfactory.

Recently, artificial neural networks (ANN) have been proposed as a promising alternative for time-series forecasting. A large number of successful applications have shown that neural networks can be a very useful tool for time-series modeling and forecasting (Adya and Collopy, 1998; Zhang et al., 1998; Celik and Karatepe, 2007; Wang and Chien, 2006; Sahoo and Ray, 2006; Singh and Deo, 2007; Barbounis and Teocharis, 2007; Bodyanskiy and Popov, 2006; Freitas and Rodrigues, 2006).
The reason is that the ANN is a universal function approximator, capable of mapping any linear or non-linear function (Cybenko, 1989; Funahashi, 1989). Neural networks are basically a data-driven method with few a priori assumptions about the underlying model. Instead, they let the data speak for themselves and can identify the underlying functional relationships in the data. In addition, the ANN can tolerate the presence of chaotic components, and in this respect it is better than most other methods (Masters, 1995). This capacity is particularly important, as many relevant time-series possess significant chaotic components.

However, since the neural network lacks a systematic procedure for model-building, the forecasting result is not always accurate when the input data differ greatly from the training data. Like other flexible non-linear estimation methods such as kernel regression and smoothing splines, the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal in a complicated data set, leading to under-fitting. A network that is too complex may fit not only the signal but also the noise, leading to over-fitting. Over-fitting is especially misleading because it can easily produce wild predictions far beyond the range of the training data, even with noise-free data. In order to solve this problem, a novel ANN model is proposed
[email protected] (W.K. Wong). European Journal of Operational Research 207 (2010) 807–816 Contents lists available at ScienceDirect European Journal of Operational Research j our nal homepage: www. el sevi er . com/ l ocat e/ ej or in this study with the adaptive metrics of inputs, and the output data is evolved by a mechanism for admixture. The adaptive met- rics of inputs of the model can adapt to local variations of trends and amplitudes. Most inputs of the network are close to the histor- ical data in order to avoid a dramatic increase in the forecasting er- ror due to the big difference between training data and input data. In using the proposed mechanism for admixture of outputs, the forecasting result can be adjusted by the relative error, making the forecasting result more accurate. The forecasting results generated by the proposed model are compared with those obtained by the traditional statistical AR model, traditional ANN architectures (BP network), and adaptive k-nearest neighbors (AKN) method (Kulesh et al., 2008) in the re- lated literature. The experimental results indicate that the pro- posed model outperforms the other models, especially in chaotic and real data time-series predictions. This paper is organized as follows. In the next section, the fun- damental principle of the proposed method is introduced. The experimental results are presented in Section 3. The last section concludes this study. 2. Methodology We focus on one-step-ahead point forecasting in this work. Let y 1 , y 2 , y 3 , . . . , y t be a time-series. At time t for t P1, the next value y t+1 will be predicted based on the observed realizations of y t , y tÀ1 , y tÀ2 , . . . , y 1 . 2.1. The ANN approach to time-series modeling The ANN is a flexible computing framework for a broad range of non-linear problems (Wong et al., 2000). The network model is greatly determined by data characteristics. A single hidden-layer feed-forward network is the most widely used model for time-ser- ies modeling and forecasting (Zhang and Qi, 2005). The model is characterized by a network of three layers of simple processing units connected by acyclic links. The hidden layers can capture the non-linear relationship among variables. Each layer consists of multiple neurons that are connected to neurons in adjacent layers. The relationship between the output y t+1 and the inputs y t , y tÀ1 , y tÀ2 , . . . , y tÀp+1 has the following mathematical representation: y tþ1 ¼ a 0 þ q j¼1 a j g b 0j þ p i¼1 b ij y tÀiþ1 _ _ þe; ð1Þ where a j (j = 0, 1, 2, . . . , q) and b ij (i = 0, 1, 2, . . . , p; j = 1, 2, 3, . . . , q) are the model parameters called connection weights, p is the number of input nodes and q is the number of hidden nodes. The logistic function is often used as the hidden-layer transfer function, which is, gðxÞ ¼ 1 1 þ e Àx : ð2Þ A neural network can be trained by the historical data of a time- series in order to capture the characteristics of this time-series. The model parameters (connection weights and node biases) can be ad- justed iteratively by the process of minimizing the forecasting er- rors (Liu et al., 1995). 2.2. Adaptive neural network model for forecasting (ADNN) It is well known that the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal leads to under-fitting. Over-fitting generally occurs when a model is excessively complex. 
2.2. Adaptive neural network model for forecasting (ADNN)

It is well known that the ANN may suffer either under-fitting or over-fitting (Moody, 1992; Geman et al., 1992; Bartlett, 1997). A network that is not sufficiently complex can fail to fully detect the signal, which leads to under-fitting; over-fitting generally occurs when a model is excessively complex. A model that under-fits or over-fits will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. Of these two problems, over-fitting is the more important one when the signal data are sufficient and the network is sufficiently complex, so this paper emphasizes the over-fitting problem of the ANN. Generally, an ANN is said to over-fit relative to a simpler model if it is more accurate in fitting known data (hindsight) but less accurate in predicting new data (foresight). In order to avoid over-fitting, the adaptive neural network model is proposed. In this model, the hindsight data are used to modify the inputs of the ANN during prediction, making the inputs approach the learning data; this reduces the chance of over-fitting.

Based on the standard ANN, an extension is developed into the adaptive neural network (ADNN) model for time-series forecasting. First, a strategy is used to initialize the input data y_t, y_{t-1}, y_{t-2}, ..., y_{t-m+1}, where m is the number of input nodes. The strategy adopts adaptive metrics similar to the adaptive k-nearest neighbor method: the data set y_t, y_{t-1}, y_{t-2}, ..., y_{t-m+1} is compared with the other parts of this time-series that have the same length. The determination of the closeness measure is the major factor in prediction accuracy. Closeness is usually defined in terms of a metric distance in Euclidean space; the most common choices are the Minkowski metrics:

L_M(Y_t, Y_r) = \big( |y_t - y_r|^d + |y_{t-1} - y_{r-1}|^d + \cdots + |y_{t-m+1} - y_{r-m+1}|^d \big)^{1/d}.    (3)

This equation gives the value difference between Y_t and Y_r, but differences of trend and amplitude are not represented. In time-series forecasting, the information on trends and amplitudes is the crucial factor. In this study, adaptive metrics are introduced to solve this problem:

L_A(Y_t, Y_r) = \min_{\lambda_r, u_r} f_r(\lambda_r, u_r),    (4)

f_r(\lambda_r, u_r) = \big( |y_t - \lambda_r y_r - u_r|^d + |y_{t-1} - \lambda_r y_{r-1} - u_r|^d + \cdots + |y_{t-m+1} - \lambda_r y_{r-m+1} - u_r|^d \big)^{1/d},    (5)

where h_r and l_r are the largest and smallest elements of the corresponding vector, \lambda_r \in [1, h_r/l_r] and u_r \in [0, h_r - l_r]. The minimization parameter \lambda_r equilibrates the amplitude difference between Y_t and Y_r, while the parameter u_r accounts for the trend of the time-series. The optimization problem (4) can be solved by the Levenberg–Marquardt algorithm (Press et al., 1992) or other gradient methods for d \geq 1. In this study, d is set to 2, which gives the widely used Euclidean metric:

f_r(\lambda_r, u_r) = \big( |y_t - \lambda_r y_r - u_r|^2 + |y_{t-1} - \lambda_r y_{r-1} - u_r|^2 + \cdots + |y_{t-m+1} - \lambda_r y_{r-m+1} - u_r|^2 \big)^{1/2}.    (6)

For d = 2, two equations are considered:

\partial f_r(\lambda_r, u_r) / \partial \lambda_r = 0,    \partial f_r(\lambda_r, u_r) / \partial u_r = 0.    (7)

When the corresponding linear system is solved, the solution of the minimization problem is obtained analytically:

u_r = \frac{z_1 z_2 - z_3 z_4}{m z_2 - z_3^2},    \lambda_r = \frac{m z_4 - z_1 z_3}{m z_2 - z_3^2},

where

z_1 = \sum_{i=1}^{m} y_{t-i+1},   z_2 = \sum_{i=1}^{m} y_{r-i+1}^2,   z_3 = \sum_{i=1}^{m} y_{r-i+1},   z_4 = \sum_{i=1}^{m} y_{r-i+1} \, y_{t-i+1}.

Based on this strategy, the adaptive k-nearest neighbors are chosen, and the input vector of the first network (known as the main network) is defined as

input_v = (q^v_t, q^v_{t-1}, ..., q^v_{t-p+1}) = \Big( \frac{y_t - u_{r_v}}{\lambda_{r_v}}, \frac{y_{t-1} - u_{r_v}}{\lambda_{r_v}}, ..., \frac{y_{t-p+1} - u_{r_v}}{\lambda_{r_v}} \Big).    (8)
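The closed-form solution above makes the neighbor search cheap, since no iterative optimization is needed per candidate window. A minimal sketch of the metric and the neighbor scan (ours, not the authors' code; variable names and the clipping of the unconstrained minimizer to the stated feasible box are our choices):

```python
import numpy as np

def adaptive_metric(Yt, Yr):
    """Closed-form minimizer of Eq. (6) and the resulting distance L_A.

    Yt, Yr: length-m windows in the same ordering. Returns (L_A, lambda_r, u_r).
    """
    m = len(Yt)
    z1, z2 = Yt.sum(), (Yr ** 2).sum()
    z3, z4 = Yr.sum(), (Yr * Yt).sum()
    denom = m * z2 - z3 ** 2
    lam = (m * z4 - z1 * z3) / denom
    u = (z1 * z2 - z3 * z4) / denom
    # Clip to the feasible box stated after Eq. (5) (our simplification)
    h, l = Yr.max(), Yr.min()
    if l > 0:
        lam = np.clip(lam, 1.0, h / l)
    u = np.clip(u, 0.0, h - l)
    dist = np.sqrt(((Yt - lam * Yr - u) ** 2).sum())
    return dist, lam, u

def k_nearest_windows(series, m, k):
    """Scan historical windows ending before the current one; keep the k closest under L_A."""
    series = np.asarray(series, float)
    Yt = series[-m:]
    cands = []
    for r in range(m - 1, len(series) - m):      # window ends at r; target y_{r+1} exists
        Yr = series[r - m + 1 : r + 1]
        dist, lam, u = adaptive_metric(Yt, Yr)
        cands.append((dist, r, lam, u))
    cands.sort(key=lambda c: c[0])
    return cands[:k]
```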
Using this transformation, most input values can be kept close to the historical data; this avoids the dramatic increase in forecasting error that a large difference between training data and input data would otherwise cause. In order to obtain more accurate results for the time-series y_t, y_{t-1}, y_{t-2}, ..., y_{t-p+1}, k sets of inputs are used, and the outputs are output_v = b_v, v = 1, 2, ..., k. Because L_A(Y_t, Y_{r_v}), \lambda_{r_v}, u_{r_v} and |t - r_v| take different values for each v = 1, 2, ..., k, each neighbor affects the forecast differently, and the relative error is used to measure the impact of these four factors. The relative error is defined as RE_v = (y_{r_v} - \tilde{y}_{r_v}) / y_{r_v}, where y_{r_v} is the source point and \tilde{y}_{r_v} is the predicted point. In this study, a second neural network (known as the modified network) is used to find the relationship between the four factors and RE_v. The estimate of RE_v, denoted \bar{RE}_v, is expressed as

\bar{RE}_v = f\big( L_A(Y_t, Y_{r_v}), \lambda_{r_v}, u_{r_v}, |t - r_v| \big).    (9)

The mechanism for admixture of outputs is

y_{t+1} = \frac{1}{U} \sum_{v=1}^{k} (\lambda_{r_v} b_v + u_{r_v}) \, e^{-\bar{RE}_v},    (10)

U = \sum_{v=1}^{k} e^{-\bar{RE}_v}.    (11)

From Eq. (10), the forecast of y_{t+1} is calculated from the b_v, v = 1, 2, ..., k, with different weighting coefficients. Based on the methodology proposed above, the forecasting scheme can be formulated as shown in Fig. 1.

Fig. 1. The forecasting scheme for adaptive neural network modeling.

The steps of the proposed algorithm are as follows; a code sketch of Steps 2–6 follows the list.

Step 1: Train the two neural networks using the historical data. In the first neural network, y_i, y_{i-1}, y_{i-2}, ..., y_{i-m+1} are the input training data and y_{i+1} is the output training data. In the second neural network, L_A(Y_t, Y_{r_v}), \lambda_{r_v}, u_{r_v} and |t - r_v| are the training inputs and the relative error RE_v is the training output. The BP algorithm is used to train both networks.
Step 2: Compare the data set y_t, y_{t-1}, y_{t-2}, ..., y_{t-m+1} with the other parts of the time-series using the adaptive metric distance of Eq. (6).
Step 3: Choose the k-nearest neighbors and obtain \lambda_{r_v}, u_{r_v} from Eq. (7). Initialize the input data of the first neural network according to Eq. (8); these inputs are q^v_i, q^v_{i-1}, q^v_{i-2}, ..., q^v_{i-m+1}, v = 1, 2, ..., k.
Step 4: Apply the first neural network and obtain the outputs output_v = b_v, v = 1, 2, ..., k.
Step 5: Use L_A(Y_t, Y_{r_v}), \lambda_{r_v}, u_{r_v} and |t - r_v| to predict the relative error \bar{RE}_v; the number of hidden neurons in the second neural network is 5 for all simulations.
Step 6: Apply the admixture mechanism of Eq. (10) and obtain the forecast of y_{t+1}.
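A compact sketch of Steps 2–6 (illustrative only; it assumes the two trained networks from Step 1 expose a `predict` method as in the earlier sketches, and reuses `k_nearest_windows` from the previous sketch):

```python
import numpy as np

def adnn_forecast(series, main_net, modified_net, m, k):
    """One-step ADNN forecast, Steps 2-6 of the algorithm (a sketch).

    Assumed interfaces: main_net.predict(window) returns b_v (first network);
    modified_net.predict([L_A, lambda, u, |t - r|]) returns the estimated
    relative error (second network).
    """
    series = np.asarray(series, float)
    t = len(series) - 1
    window = series[t - m + 1 : t + 1][::-1]              # y_t, ..., y_{t-m+1}
    num = denom = 0.0
    for dist, r, lam, u in k_nearest_windows(series, m, k):   # Steps 2-3
        q = (window - u) / lam                            # Eq. (8)
        b = main_net.predict(q)                           # Step 4
        re_hat = modified_net.predict(
            np.array([dist, lam, u, abs(t - r)]))         # Step 5, Eq. (9)
        w = np.exp(-re_hat)                               # weighting coefficient
        num += (lam * b + u) * w                          # Eq. (10): back-transform and mix
        denom += w                                        # Eq. (11)
    return num / denom
```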
3. Numerical simulations

Since the auto-regression (AR) model, the traditional back-propagation (BP) ANN architecture and the adaptive k-nearest neighbors (AKN) method are popular forecasting methods, the performance of the proposed model is benchmarked against Zhang's AR model (Zhang, 2003), Adya's ANN model (Adya and Collopy, 1998), and Kulesh's AKN model (Kulesh et al., 2008) reported in the literature. To illustrate the accuracy of the method, several deterministic and chaotic time-series are generated and predicted, and three real time-series are considered in this section. The Mean Absolute Percentage Error (MAPE) statistic is used to evaluate the forecasting performance of the model. The MAPE is regarded as one of the standard statistical performance measures and takes the form

MAPE = \frac{1}{M} \sum_{i=1}^{M} \Big| \frac{y_i - \tilde{y}_i}{y_i} \Big| \cdot 100\%,

where y_i is the source point, \tilde{y}_i is the predicted point and M is the number of predicted points. The normalized mean squared error (NMSE) is used as a second error criterion; it is the ratio of the mean squared error to the variance of the time-series. For a time-series y_i it is defined by

NMSE = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{\sum_{i=1}^{M} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{M} (y_i - \tilde{y}_i)^2}{M \sigma^2},    \bar{y} = \frac{1}{M} \sum_{i=1}^{M} y_i,

where \bar{y} is the mean value of the source data and \sigma^2 is the variance of the source data. A short code sketch of both criteria appears at the end of this noise-setup discussion.

In the simulations, for both the AKN and ADNN methods, the number of nearest neighbors is set as k and the data length for comparison is set as m. For each simulation, AR(m) is used for forecasting. The numbers of input and hidden nodes for the ANN method are set the same as in the first neural network of the ADNN. The parameter settings for all simulations are shown in Table 1.

Table 1. Parameter settings for the simulations.

  Type of time-series           k    Hidden nodes (first network of ADNN)    m
  Seasonal dependence           2    16                                      100
  Multiplicative seasonality    2    8                                       15
  High-frequency                3    10                                      70
  Duffing chaotic               2    8                                       50
  Mackey–Glass chaotic          3    6                                       14
  Ikeda map chaotic             3    5                                       28
  Sunspot                       3    6                                       11
  Traffic                       3    12                                      180
  Payments                      3    8                                       30

3.1. Deterministic synthetic examples

In this section, the proposed method is tested on three deterministic time-series with obvious seasonal dependence, trend and amplitude change. The corresponding MAPE and NMSE values of the predicted time-series are listed in Tables 2–4 respectively. In order to investigate the generalization ability of the proposed model, different noise terms are added to the time-series. For the time-series y_t, y_{t-1}, y_{t-2}, ..., y_1, the noise term is r_t, r_{t-1}, r_{t-2}, ..., r_1. Thus, the training inputs are y_i + r_i, y_{i-1} + r_{i-1}, y_{i-2} + r_{i-2}, ..., y_{i-m+1} + r_{i-m+1}, and the training output is y_{i+1} + r_{i+1}. In this study, the noise terms are generated from a normal distribution.
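Both error criteria and the noise-injection scheme are simple to implement; a minimal sketch (ours, reading the noise levels used below as variances of a zero-mean normal):

```python
import numpy as np

def mape(y, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y - y_pred) / y))

def nmse(y, y_pred):
    """Normalized MSE: squared error relative to the series variance."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

def add_noise(series, variance, seed=0):
    """Additive zero-mean Gaussian noise term r_t with the given variance."""
    rng = np.random.default_rng(seed)
    return series + rng.normal(0.0, np.sqrt(variance), size=len(series))
```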
The first synthetic time-series is a seasonal dependence series with a linear increase, as shown in Fig. 2a. The equation of the seasonal dependence series is

S(t) = \cos\Big(\frac{t}{25}\Big) \sin\Big(\frac{t}{100}\Big) + \frac{t}{1000} + 1,    t \in [0, 2200],

where t denotes time. For this data set, the first 2000 observations are used for training and the last 200 for prediction. The parameters k and m are set as 2 and 100 respectively, and the number of hidden neurons is 16. For this time-series, the amplitude of the seasonal component does not change, which means that the optimal value of the parameter \lambda_{r_v} is equal to 1; in the prediction process, only the parameter u_{r_v}, which is responsible for the changing trend, has to be determined.

Fig. 2b illustrates the prediction results with different noise terms using the ADNN. In Fig. 2b, the prediction data marked by '*', '□' and '+' correspond to source data with noise variations of 0, 0.01 and 0.05 respectively. The simulation results show that the model predicts this time-series very accurately when the source data contain no noise term, and that the forecasting accuracy decreases as the noise variation increases.

Fig. 2. (a) Time-series with seasonal dependence (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.

The performance of the different models is reported in Table 2. The ADNN performs better than the ANN model and almost the same as the AKN. As this synthetic time-series has a feature of strong orderliness, the result of the ADNN is not better than that of the AR model. Table 2 also indicates that the ADNN has almost the same noise endurance as the AKN and ANN for this time-series.

Table 2. Prediction summary for the seasonal dependence time-series (MAPE in %).

  Noise   AKN MAPE     AKN NMSE       AR MAPE       AR NMSE        ANN MAPE   ANN NMSE      ADNN MAPE   ADNN NMSE
  0       4.83×10⁻⁷    6.35×10⁻¹¹     3.43×10⁻¹⁷    2.53×10⁻³⁰     0.02       1.16×10⁻¹⁰    3×10⁻³      1.3×10⁻¹⁰
  0.001   0.83         7.34×10⁻⁵      0.84          1.23×10⁻⁴      2.95       1.8×10⁻³      1.05        1.31×10⁻⁴
  0.005   2.4          6.08×10⁻⁴      2.1083        8.53×10⁻⁴      4.32       4.1×10⁻³      2.86        1.34×10⁻³
  0.01    2.65         7.45×10⁻⁴      2.55          2.1×10⁻⁴       4.85       3.1×10⁻⁴      3.49        1.21×10⁻³
  0.02    3.31         9.81×10⁻⁴      3.19          2.8×10⁻⁴       5.21       5.2×10⁻³      3.87        2.6×10⁻³
  0.03    4.74         2.1×10⁻³       4.18          3.2×10⁻³       5.55       7.1×10⁻³      4.31        3.3×10⁻³
  0.04    4.92         2.4×10⁻³       5.12          3.9×10⁻³       6.44       9.2×10⁻³      5.49        4.4×10⁻³
  0.05    5.68         2.9×10⁻³       5.32          4.8×10⁻³       8.06       1.21×10⁻²     6.31        5.2×10⁻³
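For reference, the first synthetic series is generated directly from its defining expression; a sketch (ours), with the training/prediction split described above:

```python
import numpy as np

def seasonal_series(n=2201):
    """S(t) = cos(t/25) * sin(t/100) + t/1000 + 1 on t = 0, 1, ..., 2200."""
    t = np.arange(n, dtype=float)
    return np.cos(t / 25) * np.sin(t / 100) + t / 1000 + 1

S = seasonal_series()
train, test = S[:-200], S[-200:]   # first part for training, last 200 points for prediction
noisy_train = add_noise(train, 0.01)  # noisy copy for the generalization experiment
```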
The second time-series, which is non-linear with multiplicative seasonality, is shown in Fig. 3a. This time-series has a non-linear trend, and the amplitude of the seasonal oscillations increases with time. The model of the time-series is

S(t) = R(t) for t \in [0, 79];    S(t) = \frac{S(t-\tau)^2}{S(t-2\tau)} for t \in [80, 590],

where R(t) = \frac{1}{70000} \big( \sin(\frac{t}{350}) \cos(\frac{9t}{7}) + 10 \big) and \tau = 14. Prediction is done for 15% of the source time-series length; the first 85% of the observations are used for training. The parameters k and m are set as 2 and 15 respectively, and the number of hidden neurons is 8. Fig. 3b illustrates the prediction results with different noise terms using the ADNN: the predicted data are almost the same as the source data when there is no noise term. This time-series has the peculiarity that the amplitude of every periodic segment is a fixed multiple of that of the previous segment; a generator sketch for this recursion appears at the end of this subsection.

Fig. 3. (a) Time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.

The performance of the different models is reported in Table 3. The performance of the ADNN is better than that of the ANN and close to that of the AKN. Like the first time-series above, this series also has a feature of strong orderliness, and the AR model generates results similar to those of the ADNN. Table 3 also indicates that the ADNN has better noise endurance than the ANN, and almost the same as the AKN, for this time-series.

Table 3. Prediction summary for the multiplicative seasonality series (MAPE in %).

  Noise   AKN MAPE   AKN NMSE     AR MAPE   AR NMSE      ANN MAPE   ANN NMSE   ADNN MAPE   ADNN NMSE
  0       1.15       1.42×10⁻⁴    1.18      1.88×10⁻⁶    13.81      0.09       1.17        6.72×10⁻⁴
  0.001   3.61       1.51×10⁻³    3.05      2.7×10⁻³     15.69      0.12       4.27        3.9×10⁻³
  0.005   4.76       1.89×10⁻³    7.29      9.7×10⁻³     17.01      0.15       6.11        1.1×10⁻²
  0.01    5.6        6.3×10⁻³     8.43      1.89×10⁻²    19.31      0.18       9.13        1.72×10⁻²
  0.02    7.47       8.3×10⁻³     12.46     3.17×10⁻²    22.44      0.22       12.95       5.58×10⁻²
  0.03    8.92       1.88×10⁻²    13.43     5.31×10⁻²    28.75      0.41       13.67       7.3×10⁻²
  0.04    10.56      2.12×10⁻²    14.49     7.2×10⁻²     34.78      0.52       14.76       8.8×10⁻²
  0.05    11.85      3.89×10⁻²    15.71     8.3×10⁻²     35.34      0.64       15.92       9.8×10⁻²

The third synthetic time-series is a high-frequency series with multiplicative seasonality and smoothly increasing amplitude, as shown in Fig. 4a. It is generated by the explicit expression

S(t) = \Big| \frac{t}{100} \sin\Big(\frac{t}{2}\Big) + \cos\Big(\frac{t}{20}\Big) \Big|,    t \in [0, 550],

which models a high-frequency series with seasonal periodicity. The prediction is done for 1/11 of the time-series length; the first 10/11 of the source data are used for training. The parameters k and m are set as 3 and 70 respectively, and the number of hidden neurons is 10. Fig. 4b shows the prediction results with different noise terms using the ADNN.

Fig. 4. (a) High-frequency time-series with multiplicative seasonality (vertical line showing the prediction start) and (b) zoom of predicted values with different noise terms using the ADNN model.

The performance of the different models is reported in Table 4. The ADNN outperforms the AKN and performs about the same as the ANN and AR models. Table 4 also indicates that the ADNN has better noise endurance than the AKN, and almost the same as the ANN, for this time-series.

Table 4. Prediction summary for the high-frequency series (MAPE in %).

  Noise   AKN MAPE   AKN NMSE     AR MAPE   AR NMSE     ANN MAPE   ANN NMSE    ADNN MAPE   ADNN NMSE
  0       5.52       9.14×10⁻⁴    1.18      3.54×10⁻⁵   1.09       7.01×10⁻⁵   1.17        1.78×10⁻⁴
  0.001   7.55       1.6×10⁻³     5.36      1.7×10⁻³    5.88       1.79×10⁻³   5.57        2.18×10⁻³
  0.005   8.91       5.7×10⁻³     6.78      3.1×10⁻³    6.22       3.2×10⁻³    6.32        3.2×10⁻³
  0.01    10.98      4.3×10⁻³     9.41      4.2×10⁻³    8.20       3.9×10⁻³    8.24        4.1×10⁻³
  0.02    14.72      7.3×10⁻³     9.96      4.9×10⁻³    8.96       4.2×10⁻³    9.17        4.5×10⁻³
  0.03    17.31      8.2×10⁻³     13.15     9.6×10⁻³    9.47       4.9×10⁻³    9.52        4.9×10⁻³
  0.04    24.43      1.3×10⁻²     14.09     1.1×10⁻²    9.67       5.3×10⁻³    10.16       5.4×10⁻³
  0.05    32.27      3.12×10⁻²    15.38     1.6×10⁻²    11.24      6.5×10⁻³    12.01       8.3×10⁻³

In the seasonal dependence series with a linear increase and in the non-linear multiplicative seasonality series, the performance of the ADNN is better than that of the ANN and the same as that of the AKN. In the high-frequency time-series with multiplicative seasonality and smoothly increasing amplitude, the performance of the ADNN is the same as that of the ANN and AR models, and better than that of the AKN. These simulation results show that the ADNN model combines the merits of the ANN and the AKN.
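The recursive definition of the second series is worth spelling out, since each value is built from two earlier ones; a sketch of the stated recursion (ours), with τ = 14 and R(t) providing the seed segment:

```python
import numpy as np

def multiplicative_series(n=591, tau=14):
    """S(t) = R(t) on [0, 79]; S(t) = S(t - tau)^2 / S(t - 2*tau) afterwards."""
    t = np.arange(n, dtype=float)
    S = (np.sin(t / 350) * np.cos(9 * t / 7) + 10) / 70000  # R(t); positive, so division is safe
    for i in range(80, n):
        S[i] = S[i - tau] ** 2 / S[i - 2 * tau]
    return S
```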
3.2. Chaotic time-series

In this section, the proposed method is tested on three chaotic time-series. The corresponding MAPE and NMSE values of the predicted time-series are listed in Tables 5–7 respectively.

The Duffing-equation chaotic time-series consists of 2050 observations generated by the system

\frac{dx}{dt} = y,    \frac{dy}{dt} = -y + x - x^3 + b\cos(at).

The resulting trajectory is shown in Fig. 5a. For prediction, only the horizontal component of this chaotic two-component series is used. The time-series is assumed to consist of positive values x_i \geq 0, so the value x_0 is added to the source data (Fig. 5b). The first 1950 observations are used for training and the remaining observations for testing. The parameters k and m are set as 2 and 50 respectively, and the number of hidden neurons is 8. The prediction results with different noise terms using the ADNN are shown in Fig. 5c.

Fig. 5. (a) Chaotic time-series based on the solution of the Duffing equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.

Table 5. Prediction summary for the Duffing chaotic time-series (MAPE in %).

  Noise   AKN MAPE   AKN NMSE   AR MAPE   AR NMSE       ANN MAPE   ANN NMSE      ADNN MAPE   ADNN NMSE
  0       7.2999     0.0034     0.6483    2.4143×10⁻⁵   0.2806     2.5246×10⁻⁶   0.3361      7.4141×10⁻⁶
  0.001   13.4408    0.011      7.8903    0.0042        3.6449     1.1432×10⁻³   3.4630      1.1653×10⁻³
  0.005   15.8850    0.0134     13.4571   0.0211        9.2552     0.0082        8.8457      0.0068
  0.01    19.1673    0.0165     13.9801   0.0223        11.9432    0.0098        11.2729     0.0094
  0.02    19.6696    0.0143     19.2554   0.0282        15.2521    0.0122        16.7104     0.0163
  0.03    23.4675    0.0631     20.6164   0.0397        16.5124    0.0231        17.3692     0.0267
  0.04    25.9714    0.0365     21.9570   0.0376        19.2540    0.0215        20.6404     0.0214
  0.05    27.2354    0.0461     24.3629   0.0433        21.5172    0.0273        21.7155     0.0273
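The paper does not state the forcing parameters a and b or the initial condition, so the values below are placeholders for illustration only; the integration itself is a standard fourth-order Runge–Kutta sketch of the system above:

```python
import numpy as np

def duffing_series(n=2050, dt=0.1, a=1.0, b=0.3):
    """Integrate dx/dt = y, dy/dt = -y + x - x^3 + b*cos(a*t) with RK4.

    a, b, dt and the initial condition are NOT given in the paper; these
    values are hypothetical. Returns the horizontal component x, shifted
    so that all values are non-negative (the x_0 translation in the text).
    """
    def f(t, s):
        x, y = s
        return np.array([y, -y + x - x ** 3 + b * np.cos(a * t)])

    s = np.array([0.1, 0.1])          # arbitrary initial condition (assumption)
    xs = np.empty(n)
    for i in range(n):
        t = i * dt
        k1 = f(t, s)
        k2 = f(t + dt / 2, s + dt / 2 * k1)
        k3 = f(t + dt / 2, s + dt / 2 * k2)
        k4 = f(t + dt, s + dt * k3)
        s = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = s[0]
    return xs - xs.min()              # translate by x_0 so that x_i >= 0
```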
In Fig. 5c, the simulation results show that the model predicts this time-series almost perfectly when the source data contain no noise term, and the prediction deteriorates as the noise variation increases. The performance of the different models is reported in Table 5: the ADNN performs better than the AKN and AR models, and almost the same as the ANN. Table 5 also indicates that the ADNN has almost the same noise endurance as the ANN for this time-series.

The Mackey–Glass benchmarks (Casdagli, 1989) are well known for the evaluation of prediction methods. The time-series is generated by the non-linear delay differential equation

\frac{dx(t)}{dt} = -b x(t) + \frac{a x(t - \tau)}{1 + x^{c}(t - \tau)}.

Different values of \tau generate various degrees of chaos; the behavior is chaotic for \tau > 16.8, and \tau = 17 is the value commonly seen in the literature. In this study, the parameters are set as a = 0.2, b = 0.1, c = 10 and \tau = 17, as shown in Fig. 6a. Following common practice (Kulesh et al., 2008), the first 1950 values of this series are used for the learning set and the next 100 values for the testing set. The parameters k and m are set as 3 and 14 respectively, and the number of hidden neurons is 6. Fig. 6c depicts the prediction results with different noise terms. A generator sketch for this benchmark follows this subsection.

Fig. 6. (a) Chaotic time-series based on the solution of the Mackey–Glass delay differential equation, (b) horizontal component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.

The performance of the different models is reported in Table 6. The ADNN has almost the same performance as the ANN and the AKN, and Table 6 also indicates that the ADNN has almost the same noise endurance as the ANN and AKN for this time-series.

Table 6. Prediction summary for the Mackey–Glass chaotic time-series (MAPE in %).

  Noise   AKN MAPE   AKN NMSE      AR MAPE   AR NMSE       ANN MAPE   ANN NMSE      ADNN MAPE   ADNN NMSE
  0       1.0484     2.1431×10⁻³   0.4941    7.1211×10⁻⁵   0.0373     2.8314×10⁻⁷   0.0655      8.1435×10⁻⁷
  0.001   1.5015     3.8461×10⁻⁴   4.2862    0.0031        2.0275     7.7453×10⁻⁴   2.1203      9.6342×10⁻⁴
  0.005   3.3865     0.0016        5.6697    0.0061        5.6017     0.0061        6.5236      0.0065
  0.01    5.4567     0.0039        6.4944    0.0078        6.9095     0.0082        7.3885      0.0086
  0.02    7.1349     0.0067        7.3276    0.0098        8.8958     0.0132        9.0145      0.0172
  0.03    9.0888     0.0121        7.8270    0.0110        9.0929     0.0126        10.2763     0.0183
  0.04    10.9259    0.0190        9.4304    0.0169        12.1027    0.0324        11.0820     0.0194
  0.05    12.7687    0.0287        10.3548   0.0182        13.6413    0.0116        12.3625     0.0213

The Ikeda map is another chaotic time-series experiment; it may be given in terms of a mapping of the complex plane to itself, with the phase-space coordinates related to the complex degree of freedom z = x + iy. The mapping is (Makridakis, 1996; Murray, 1993)

z_{n+1} = p + B z_n \, e^{i \left( a - \frac{b}{1 + |z_n|^2} \right)},

where p = 1, B = 1, a = 0.4 and b = 6. Fig. 7a shows a time-series generated by the Ikeda map; the series length is 2048. The prediction is done for the translated vertical component y(t) + y_0, for 5% of the time-series length (Fig. 7b). The parameters k and m are set as 3 and 28 respectively, and the number of hidden neurons is 5. The prediction results with different noise terms are shown in Fig. 7c.

Fig. 7. (a) Chaotic time-series based on the Ikeda map, (b) vertical component of the time-series (vertical line showing the prediction start) and (c) zoom of predicted values with different noise terms using the ADNN model.

The performance of the different models is presented in Table 7. The ADNN model gives better prediction results than the AKN and AR, and almost the same results as the ANN model, for this time-series.

Table 7. Prediction summary for the Ikeda map chaotic time-series (MAPE in %).

  Noise   AKN MAPE   AKN NMSE   AR MAPE   AR NMSE   ANN MAPE   ANN NMSE   ADNN MAPE   ADNN NMSE
  0       20.5213    0.1231     11.2426   0.0164    9.3819     0.0142     9.1635      0.0114
  0.001   21.5745    0.1348     13.3011   0.0193    10.3539    0.0194     9.7751      0.0178
  0.005   24.7196    0.1333     17.4031   0.0272    12.8506    0.0183     12.9018     0.0204
  0.01    27.3472    0.1408     19.5150   0.0383    14.5355    0.0253     14.9533     0.0281
  0.02    26.6362    0.1512     22.9523   0.0577    16.7840    0.0316     16.5926     0.0334
  0.03    27.0252    0.1331     24.1169   0.0643    18.6916    0.0298     18.7898     0.0385
  0.04    31.5612    0.1612     29.8482   0.0786    21.4366    0.0399     21.2370     0.0465
  0.05    34.2396    0.1827     32.5005   0.116     26.3222    0.0796     26.8273     0.0845

The simulations above show that the proposed ADNN model can predict complicated chaotic time-series as well as the ANN algorithm does. The ADNN model reduces the problems of amplitude change and trend determination; when detrended signals have no amplitude change, the prediction results are similar to those of the ANN, as expected.
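The Mackey–Glass series can be generated by a simple Euler discretization of the delay differential equation (a standard approach; the step size and constant initial history are our assumptions, while a, b, c and τ are the paper's values):

```python
import numpy as np

def mackey_glass(n=2050, a=0.2, b=0.1, c=10, tau=17, dt=1.0, x0=1.2):
    """Euler integration of dx/dt = -b x(t) + a x(t - tau) / (1 + x(t - tau)^c)."""
    lag = int(round(tau / dt))
    x = np.full(n + lag, x0)                 # constant initial history (assumption)
    for i in range(lag, n + lag - 1):
        xd = x[i - lag]                      # delayed value x(t - tau)
        x[i + 1] = x[i] + dt * (-b * x[i] + a * xd / (1 + xd ** c))
    return x[lag:]                           # drop the initial history segment
```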
3.3. Real time-series

In this section, the proposed method is tested on three real-data time-series. Real data inevitably contain errors from many sources, such as observation error, so in this part of the simulation no artificial noise term is added to the series.

The sunspot dataset (Fig. 8a) is natural and contains the yearly number of dark spots on the sun from 1701 to 2007. The time-series has a pseudo-period of 10–11 years. It is common practice (McDonnell and Waagen, 1994) to use the data from 1700 to 1920 as a training set and to assess the performance of the model on the set 1921–1955 (Test 1). The parameters k and m are set as 3 and 11 respectively, and the number of hidden neurons is 6. The prediction results are shown in Fig. 8b.

Fig. 8. (a) Real sunspots data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.

The traffic flow data consist of 1689 observations (11 weeks) of the hourly vehicle count for the Monash Freeway, outside Melbourne in Victoria, Australia, beginning in August 1995. A graph of the data is shown in Fig. 9a. The parameters k and m are set as 3 and 180 respectively, and the number of hidden neurons is 12. The prediction results are shown in Fig. 9b.

Fig. 9. (a) Real traffic flow data (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.

The data in Fig. 10a are payments based on paper documents (filled in and sent to the bank; Kulesh et al., 2008). These data have appreciable seasonal components with sinusoidal trends. The parameters k and m are set as 3 and 30 respectively, and the number of hidden neurons is 8. The prediction results are shown in Fig. 10b.

Fig. 10. (a) Real payment data based on paper documents (vertical line showing the prediction start) and (b) zoom of predicted values using the ADNN model.
Table 8 summarizes the prediction performance of the different models on the three sets of real time-series data mentioned above. The proposed ADNN method clearly outperforms the three other models, as it has the advantage of adapting to local variations of trends and amplitudes and of alleviating the over-fitting problem; moreover, neural networks have a flexible non-linear modeling capability. For these reasons, the real time-series predictions made by the proposed method fit the real data better than those of the three other methods.

Table 8. Prediction summary for the real-data time-series.

  Model   Sunspot MAPE (%)   Sunspot NMSE   Traffic MAPE (%)   Traffic NMSE   Payments MAPE (%)   Payments NMSE
  ADNN    28.45              0.068          14.31              0.0193         8.08                0.0109
  ANN     30.8               0.078          17.97              0.0267         15.24               0.0274
  AR      31.2               0.0852         26.98              0.0818         9.06                0.0113
  AKN     50.3               0.1833         17.39              0.0206         12.5                0.0178

4. Conclusions

This study presents a novel adaptive approach that extends the artificial neural network with adaptive metrics of inputs and a new mechanism for admixture of outputs for time-series prediction. The experimental results, generated by a set of consistent performance measures with different metrics (MAPE, NMSE), show that the new method can improve the accuracy of time-series prediction. The performance of the proposed method is validated on three groups of complex time-series, namely deterministic synthetic time-series, chaotic time-series and real time-series. In addition, the predictions generated by the ADNN are compared with those of the ANN, AKN and AR methods, and indicate that the proposed model outperforms these conventional techniques, particularly in forecasting chaotic and real time-series.

References

Adya, M., Collopy, F., 1998. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting 17, 481–495.
Barbounis, T.G., Teocharis, J.B., 2007. Locally recurrent neural networks for wind speed prediction using spatial correlation. Information Science 177, 5775–5797.
Bartlett, P.L., 1997. For valid generalization, the size of the weights is more important than the size of the network. In: Mozer, M.C., Jordan, M.I., Petsche, T. (Eds.), Advances in Neural Information Processing Systems, vol. 9. The MIT Press, Cambridge, MA, pp. 134–140.
Bodyanskiy, Y., Popov, S., 2006. Neural network approach to forecasting of quasiperiodic financial time series. European Journal of Operational Research 175, 1357–1366.
Brooks, C., 2002. Introductory Econometrics for Finance. Cambridge University Press, Cambridge, UK, p. 289.
Casdagli, M., 1989. Nonlinear prediction of chaotic time series. Physica D 35, 335–356.
Celik, A.E., Karatepe, Y., 2007. Evaluating and forecasting banking crises through neural network models: an application for Turkish banking sector. Expert Systems with Applications 33, 809–815.
Chen, S.M., Hwang, J.R., 2000. Temperature prediction using fuzzy time series. IEEE Transactions on Systems, Man and Cybernetics Part B 30, 263–275.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2, 303–314.
Freitas, P.S.A., Rodrigues, A.J.L., 2006. Model combination in neural-based forecasting. European Journal of Operational Research 173, 801–814.
Funahashi, K., 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2, 183–192.
Geman, S., Bienenstock, E., Doursat, R., 1992. Neural networks and the bias/variance dilemma. Neural Computation 4, 1–58.
Hansen, J.V., McDonald, J.B., Nelson, R.D., 2002. Time series prediction with genetic-algorithm designed neural networks: an empirical comparison with modern statistical models. Computational Intelligence 15, 171–184.
Huarng, K., Yu, T.H., 2006. Ratio-based lengths of intervals to improve fuzzy time series forecasting. IEEE Transactions on Systems, Man and Cybernetics Part B 36, 328–340.
Kim, D., Kim, C., 1997. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems 5, 523–535.
Kulesh, M., Holschneider, M., Kurennaya, K., 2008. Adaptive metrics in the nearest neighbours method. Physica D 237, 283–291.
Liu, M.C., Kuo, W., Sastri, T., 1995. An exploratory study of a neural network approach for reliability data analysis. Quality and Reliability Engineering International 11, 107–112.
Makridakis, S., 1996. Forecasting: its role and value for planning and strategy. International Journal of Forecasting 12, 513–537.
Makridakis, S., Wheelwright, S.C., Hyndman, R.J., 1998. Forecasting: Methods and Applications, third ed. Wiley, New York, pp. 42–50.
Masters, T., 1995. Advanced Algorithms for Neural Networks: A C++ Sourcebook. Wiley, New York.
McDonnell, J.R., Waagen, D., 1994. Evolving recurrent perceptrons for time series modeling. IEEE Transactions on Neural Networks 5, 24–38.
Moody, J.E., 1992. The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. Neural Information Processing Systems 4, 847–854.
Murray, D.B., 1993. Forecasting a chaotic time series using an improved metric for embedding space. Physica D 68, 318–325.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317, 63–80.
Singh, P., Deo, M.C., 2007. Suitability of different neural networks in daily flow forecasting. Applied Soft Computing 7, 968–978.
Taylor, J.W., Buizza, R., 2002. Neural network load forecasting with weather ensemble predictions. IEEE Transactions on Power Systems 17, 59.
Wang, T., Chien, S., 2006. Forecasting innovation performance via neural networks: a case of Taiwanese manufacturing industry. Technovation 26, 635–643.
Wong, B.K., Vincent, S., Jolie, L., 2000. A bibliography of neural network business applications research: 1994–1998. Operations Research and Computers 27, 1045–1076.
Zhang, G.P., 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159–175.
Zhang, P., Qi, G.M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160, 501–514.
Zhang, G., Patuwo, B.E., Hu, M.Y., 1998. Forecasting with artificial neural networks: the state of the art. International Journal of Forecasting 14, 35–62.