Engineering Applications of Artiﬁcial Intelligence 16 (2003) 453–463

Recurrent radial basis function network for time-series prediction

Ryad Zemouri*, Daniel Racoceanu, Noureddine Zerhouni

Laboratoire d'Automatique de Besançon, Groupe Maintenance et Sûreté de Fonctionnement, 25, Rue Alain Savary, 25 000 Besançon, France

Abstract

This paper proposes a Recurrent Radial Basis Function network (RRBFN) that can be applied to dynamic monitoring and prognosis. Based on the architecture of conventional Radial Basis Function networks, the RRBFN has looped input neurons with sigmoid activation functions. These looped neurons represent the dynamic memory of the RRBF, and the Gaussian neurons represent the static one. The dynamic memory enables the network to learn temporal patterns without an input buffer to hold the recent elements of an input sequence. To test the dynamic memory of the network, we applied the RRBFN to two time series prediction benchmarks (MacKey–Glass and Logistic Map). The third application concerns an industrial prognosis problem: nonlinear system identification using the Box and Jenkins gas furnace data. A two-step training algorithm is used: the RCE training algorithm for the prototypes' parameters, and multivariate linear regression for the output connection weights. The network is able to predict the two time series and gives good results for the nonlinear system identification. The advantage of the proposed RRBF network is that it combines the learning flexibility of the RBF network with the dynamic performance of the local recurrence given by the looped neurons. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Neural network; Radial basis function; Dynamic neural networks; Recurrent neural networks; Neural predictive model; Time series prediction

*Corresponding author. URL: http://www.lab.cnrs.fr
0952-1976/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0952-1976(03)00063-0

1. Introduction

Modern industrial monitoring requires processing a number of sensor signals. It essentially concerns the detection of any deviation from a working reference, by generating an alarm, and failure diagnosis. The diagnosis operation has two main functions: locating the failing system or subsystem, and identifying the primary cause of the failure (Lefebvre, 2000). Monitoring methods can be classified in two categories (Dash and Venkatasubramanian, 2000): model-based monitoring methodologies and model-free monitoring. The first class essentially contains control system techniques based on the difference between the outputs of the system model and the outputs of the equipment (Combacau, 1991). The major disadvantage of these techniques is the difficulty of obtaining a formal model, especially for complex or reconfigurable equipment. The second class of monitoring techniques is not sensitive to this problem. These techniques are the probabilistic ones and the Artificial Intelligence ones. The AI techniques are essentially based on a training process that gives a certain adaptability to the monitoring application (Rengaswamy and Venkatasubramanian, 1995).

The use of Artificial Neural Networks (ANN) for a monitoring task can be viewed as a pattern recognition application. The pattern to recognize is the measurable or observable equipment data. The output classes are the different working and failure modes of the equipment (Koivo, 1994). Radial Basis Function networks are well suited to this kind of application. Because the history database of the equipment operation is never exhaustive, RBF networks are able to detect new operating or failure modes thanks to their local generalization. This property comes from the Gaussian basis functions, which are maximal at the core and decrease monotonically with distance. The second advantage of the RBF network is the flexibility of its training process. The problem with static classification methods is that the dynamic behavior of the process is not considered (Koivo, 1994). For example, distinguishing between a true degradation and a false alarm requires dynamic processing of the sensor signals (Zemouri et al., 2002a).

ARTICLE IN PRESS 454

R. Zemouri et al. / Engineering Applications of Artificial Intelligence 16 (2003) 453–463

In our previous work, we demonstrated that a dynamic RBF is able to distinguish between a peak of variation and a continuous variation of a sensor signal. This can be interpreted as a distinction between a false alarm and a true degradation. The prognosis function is also strongly dependent on the dynamic behavior of the process. The aim of the prognosis function is to predict the evolution of a sensor signal. This can be obtained either from a priori knowledge of the laws governing the ageing phenomena or by training on the signal evolution. In this way, the prognosis can identify degradations or predict the time remaining before breakdown (Brunet et al., 1990).

For this purpose, we introduce a new Recurrent Radial Basis Function network (RRBF) architecture that is able to learn temporal sequences. The RRBF network builds on the advantages of Radial Basis Function networks in terms of training time. The recurrent or dynamic aspect is obtained by cascading looped neurons on the first layer. This layer represents the dynamic memory of the RRBF network, which allows it to learn temporal data. The proposed network combines the ease of use of the RBF network with the dynamic performance of the Locally Recurrent Globally Feedforward network (Tsoi and Back, 1994).

The prognosis function can be seen as a time series prediction problem. In order to validate the prediction capability of the RRBFN, we test the network on two standard time series prediction benchmarks: the MacKey–Glass series and the Logistic Map. The prognosis validation is made on a nonlinear system identification using the Box and Jenkins gas furnace data.

The rest of the paper is organized in three parts: a brief survey of RBF networks, their applications and their training algorithms is presented in the second section. The third section describes the architecture of the RRBF network for time series prediction. Finally, we present the results obtained on the three benchmarks.

2. Radial basis function network overview

2.1. RBF networks definition

Radial Basis Function networks are able to provide a local representation of an N-dimensional space. This is achieved through the restricted influence zone of the basis functions. The parameters of a basis function are a reference vector (core or prototype) $\mu_j$ and the dimension of the influence field $\sigma_j$. The response of the basis function depends on the Euclidean distance between the input vector $x$ and the prototype vector $\mu_j$, and also on the size of the influence field:

$$\phi_j(x) = \exp\left(-\frac{\|x - \mu_j\|^2}{2\sigma_j^2}\right). \qquad (1)$$
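As a minimal sketch of Eq. (1), assuming plain Python lists for the vectors (the function name `gaussian_rbf` and the sample points are illustrative and do not come from the paper), the response of one basis function can be computed as:

```python
import math

def gaussian_rbf(x, mu, sigma):
    """Response of one Gaussian basis function, following Eq. (1)."""
    # Squared Euclidean distance between the input and the prototype.
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical prototype at the origin with unit influence field.
prototype = [0.0, 0.0]
at_core = gaussian_rbf([0.0, 0.0], prototype, sigma=1.0)  # maximal at the core
nearby = gaussian_rbf([1.0, 0.0], prototype, sigma=1.0)
far = gaussian_rbf([3.0, 0.0], prototype, sigma=1.0)      # decays with distance
```

The response equals 1 at the prototype and decreases monotonically with distance, which is the local behavior the text relies on.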

For a given input, only a restricted number of basis functions contribute to the calculation of the output. RBF networks can be classified in two categories, according to the type of output neuron: standardized and non-standardized (Mak and Kung, 2000; Moody and Darken, 1989; Xu, 1998; Ghosh and Nag, 2000). Moreover, the RBF network can be used in two kinds of application: regression and classification.

2.2. RBF training techniques

The parameters of an RBF network are the center and the influence field of each radial function and the output weights (between the neurons of the intermediate layer and those of the output layer). These parameters are obtained by training. The training techniques can be classified in the following three groups.

2.2.1. Supervised techniques

The principle of these techniques is to minimize the quadratic error (Ghosh et al., 1992):

$$E = \sum_n E_n. \qquad (2)$$

At each step of the training process, we consider the variations $\Delta w_{ij}$ of the weights, $\Delta\mu_{jk}$ of the centers and $\Delta\sigma_j$ of the influence fields. The update law is obtained by gradient descent on $E_n$ (Rumelhart et al., 1986; Le Cun, 1985).

2.2.2. Heuristic techniques

The principle of these techniques is to determine the network parameters iteratively. Generally, the training process starts by initializing the network on a center with an initial influence field $(\mu_0, \sigma_0)$. The prototype centers are then created progressively as the training vectors are presented. The next step modifies the influence rays and the connection weights (only the weights between the intermediate layer and the output layer). Some of the heuristic techniques used for RBF training are presented below.

2.2.2.1. RCE Algorithm (Restricted Coulomb Energy) (Hudak, 1992). The RCE algorithm was inspired by the theory of particle charges. The principle of the training algorithm is to modify the network architecture dynamically: intermediate neurons are added only when necessary. The influence field is then adjusted to minimize conflicting zones by means of a threshold $\theta$ (Fig. 1).

Fig. 1. Influence field adjustment by the RCE algorithm. Only one threshold is used. The reduction of the conflicting zone must respect the following relations: $\phi_B(x_A) < \theta$, $\phi_A(x_n) < \theta$, $\phi_A(x_B) < \theta$. No new prototype is added for the input vector $x_n$. [Diagram: response $\phi$ versus $x$ for prototypes A and B centered at $x_A$ and $x_B$, threshold $\theta$, and an input vector $x_n$ of category B.]

2.2.2.2. Dynamic Decay Adjustment Algorithm (Berthold and Diamond, 1995). This technique, partially extracted from the RCE algorithm, is used for classification applications (discrimination). The principle of this technique is to introduce two thresholds $\theta^-$ and $\theta^+$ in order to reduce the conflicting zone between prototypes. To ensure the convergence of the training algorithm, the neural network must satisfy the two inequalities in (3) for each vector $x$ of class $c$ from the training set (Fig. 2):

$$\left(\exists i:\ \phi_i^c(x) \ge \theta^+\right) \wedge \left(\forall k \ne c,\ \forall j:\ \phi_j^k(x) < \theta^-\right). \qquad (3)$$

Fig. 2. Influence field adjustment by the DDA algorithm. Two thresholds $\theta^-$ and $\theta^+$ are used for the conflict reduction according to the expressions $\phi_B(x_A) < \theta^-$, $\phi_A(x_n) < \theta^-$, $\phi_A(x_B) < \theta^-$. No prototype is added for the input vector ($\phi_B(x_n) > \theta^+$). [Diagram: response $\phi$ versus $x$ for prototypes A and B centered at $x_A$ and $x_B$, thresholds $\theta^+$ and $\theta^-$, and an input vector $x_n$ of category B.]

2.2.3. Two-phase training techniques

These techniques estimate the RBF parameters in two phases. The first phase determines the centers and the rays of the basis functions; only the input vectors are used in this step (unsupervised training algorithm). The second phase calculates the connection weights between the hidden layer and the output layer (supervised training). Some of these techniques are presented below.

2.2.3.1. First phase (unsupervised). The k-means algorithm: The prototype centers and the variance matrices can be calculated in two steps. In the first step, the k-means clustering algorithm determines the center of the cluster of $N^k$ points with the same class. This center is obtained by a segmentation of the training space $\chi^{(k)}$ of each of the $k$ classes into $J^{(k)}$ disjoint groups $\{\chi_j^{(k)}\}_{j=1}^{J^{(k)}}$. The population of each group is $N_j^k$ points. We then estimate the center $\mu_j$ of the function by the average:

$$\mu_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} x. \qquad (4)$$

The second step calculates the variance of the Gaussian function (influence field), using the following expression:

$$\sigma_j = \frac{1}{N_j^k} \sum_{x \in \chi_j^k} (x - \mu_j)(x - \mu_j)^T. \qquad (5)$$

Expectation Maximization (EM) method (Dempster et al., 1977): This technique is based on the analogy between the RBF network and Gaussian mixture models. The EM algorithm determines iteratively the parameters of a Gaussian mixture (by maximum likelihood). The RBF parameters are obtained in two steps: the E step, which calculates the expectation of the unknown data given the known data, and the M step, which maximizes the parameter vector of the E step.

2.2.3.2. Second phase (supervised). Maximum of membership (Hernandez, 1999): This technique, used in classification applications, considers the most significant value of the basis functions $\phi_i(x)$:

$$\phi_{\max} = \max_{i=1}^{N} \phi_i, \qquad (6)$$

where $N$ is the number of basis functions for all the classes. The output of the neural network is then given by

$$y = \mathrm{class}(\phi_{\max}). \qquad (7)$$

Least squares algorithm: Suppose that an empirical risk function $R_{emp}$ to be minimized has been fixed. As for the Multi-Layer Perceptron, the parameters can then be determined in a supervised way by a gradient descent method. If the selected cost function is quadratic with fixed basis functions $\Phi$, the weight matrix $W$ is obtained by solving a simple linear system. The solution is the weight matrix $W$ that minimizes the empirical risk $R_{emp}$. By setting the derivative of this risk with respect to the weights to zero, we obtain the optimality conditions, which can be written in the following matrix form:

$$(\Phi^t \Phi) W^t = \Phi^t Y, \qquad (8)$$

where $Y$ represents the vector of desired outputs. If the matrix $\Phi^t \Phi$ is square and non-singular (Michelli condition (Michelli, 1986)), the optimal solution for the weights, with fixed basis functions, can be written as

$$W^t = (\Phi^t \Phi)^{-1} \Phi^t Y = \Phi^{-1} Y. \qquad (9)$$
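The least-squares step of Eqs. (8) and (9) can be illustrated with a short sketch. It assumes NumPy, Gaussian basis functions as in Eq. (1), and a toy one-dimensional regression target; the helper names `design_matrix` and `output_weights`, the centers and the widths are all hypothetical choices made for the example. It uses `numpy.linalg.lstsq` rather than forming the inverse explicitly, which is a numerical convenience and not something the paper prescribes.

```python
import numpy as np

def design_matrix(X, centers, sigmas):
    """Phi[p, j]: response of basis function j to training vector p (Eq. (1))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigmas[None, :] ** 2))

def output_weights(Phi, Y):
    """Least-squares weights: the minimizer satisfies the normal equations (8)."""
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return W

# Toy data (assumed for illustration): 20 samples of a sine wave.
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
Y = np.sin(2.0 * np.pi * X[:, 0])

# Five fixed basis functions with a common, hand-picked width.
centers = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
sigmas = np.full(5, 0.2)

Phi = design_matrix(X, centers, sigmas)
W = output_weights(Phi, Y)

# Residual of the normal equations, Phi^t (Phi W - Y); close to zero at the optimum.
residual = Phi.T @ (Phi @ W - Y)
```

With more training vectors than basis functions, $\Phi$ is not square, so only the first equality of Eq. (9) applies; the second requires as many basis functions as training points.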


3. The recurrent radial basis function network

The proposed recurrent RBF neural network treats time as an internal representation (Chappelier, 1996; Elman, 1990). The dynamic aspect is obtained by the use of an additional self-connection on the input neurons, which have a sigmoid activation function. These looped neurons are a special case of the Locally Recurrent Globally Feedforward architecture, called local output feedback (Tsoi and Back, 1994). The RRBF network can thus take into account a certain past of the input signal (Fig. 3).

Fig. 3. RRBF network (recurrent network with radial basis functions). [Diagram: inputs $I_1$, $I_2$, $I_3$ feed looped neurons with self-connection $w$ and a sigmoid function, followed by the radial basis function layer and the output neurons.]

3.1. Looped neuron

At instant $t$, each neuron of the input layer sums its input $I_i$ and its previous output weighted by a self-connection $w_{ii}$. The output of its activation function is

$$a_i(t) = w_{ii}\, x_i(t-1) + I_i(t), \qquad (10)$$

$$x_i(t) = f(a_i(t)), \qquad (11)$$

where $a_i(t)$ and $x_i(t)$ represent respectively the neuron activation and its output at instant $t$, and $f$ is the sigmoid activation function:

$$f(x) = \frac{1 - \exp(-kx)}{1 + \exp(-kx)}. \qquad (12)$$

To highlight the influence of this self-connection, we let the neuron evolve without external influence (Frasconi et al., 1995; Bernauer, 1996). The initial conditions are: input $I_i(t_0) = 0$ and output $x_i(t_0) = 1$. The output of the neuron evolves according to the following expression:

$$x(t) = \frac{1 - \exp(-k w_{ii}\, x(t-1))}{1 + \exp(-k w_{ii}\, x(t-1))}. \qquad (13)$$

Fig. 4 shows the temporal evolution of the neuron output. This evolution depends on the slope of the straight line $\Delta$. This slope depends on two parameters: the self-connection weight $w_{ii}$ and the value of the activation function parameter $k$. The equilibrium points of the looped neuron satisfy the following equation:

$$a(t) = w_{ii}\, f(a(t-1)). \qquad (14)$$

Fig. 4. Equilibrium points of the looped neuron: (a) the forget behavior ($k w_{ii} \le 2$) and (b) temporal memorizing behavior ($k w_{ii} > 2$). [Plots of $f(a_i)$ against $a_i$ with the straight line $\Delta$, showing the iterates $t$, $t+1$, $t+2$ and the equilibrium points $a^-$, $a_0$, $a^+$.]

The point $a_0 = 0$ is a first obvious solution of this equation. The other solutions are obtained by studying the variations of the function

$$g(a) = a - w_{ii}\, f(a). \qquad (15)$$

According to $k w_{ii}$, the looped neuron has one or more equilibrium points:

* If $k w_{ii} \le 2$, the neuron has only one equilibrium point $a_0 = 0$.
* If $k w_{ii} > 2$, the neuron has three equilibrium points $a_0 = 0$, $a^+ > 0$, $a^- < 0$.

To study the stability of these points, we study the variations of a Lyapunov function (Frasconi et al., 1995; Bernauer, 1996). In the case where $k w_{ii} \le 2$, this function is defined by $V(a) = a^2$. We obtain

$$\Delta V = (w_{ii} f(a))^2 - a^2 = -g(a)\,(w_{ii} f(a) + a). \qquad (16)$$

If $a > 0$, then $f(a) > 0$ and $-g(a) < 0$; if $w_{ii} > 0$, we then have $\Delta V < 0$. If $a < 0$, then $f(a) < 0$ and $-g(a) > 0$; if $w_{ii} > 0$, we again have $\Delta V < 0$. The point $a_0 = 0$ is thus a stable equilibrium point if $k w_{ii} \le 2$ with $w_{ii} > 0$.

In the case where $k w_{ii} > 2$, the looped neuron has three equilibrium points: $a_0 = 0$, $a^+ > 0$ and $a^- < 0$. To study the stability of the point $a^+$, we define the Lyapunov function $V(a) = (a - a^+)^2$ (see Frasconi et al., 1995; Bernauer, 1996). We obtain

$$\Delta V = (w_{ii} f(a) - a^+)^2 - (a - a^+)^2 = -g(a)\,[-g(a) + 2(a - a^+)].$$

If $a > a^+$, then $-g(a) < 0$ and $[-g(a) + 2(a - a^+)] > 0$, so we have $\Delta V < 0$. The calculation is the same in the case $a < a^+$. The point $a^+$ is a stable equilibrium point. In the same way, we can prove that the point $a^-$ is another stable equilibrium point. The point $a_0 = 0$ is an unstable equilibrium point.
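The two regimes can be checked numerically with a small sketch that iterates Eq. (13) from $x(t_0) = 1$ with no external input. The values $k = 1$, $w = 1.5$ and $w = 3$ are assumptions chosen only to fall on either side of the threshold $k w_{ii} = 2$; the function names are illustrative.

```python
import math

def sigmoid(a, k):
    """Activation of Eq. (12): (1 - exp(-k a)) / (1 + exp(-k a))."""
    return (1.0 - math.exp(-k * a)) / (1.0 + math.exp(-k * a))

def free_run(w, k, steps, x0=1.0):
    """Iterate Eq. (13): the looped neuron evolves without external input."""
    x = x0
    for _ in range(steps):
        x = sigmoid(w * x, k)
    return x

k = 1.0  # illustrative slope parameter, not a value from the paper

# k * w = 1.5 <= 2: the output is forgotten and returns to 0.
x_forget = free_run(w=1.5, k=k, steps=500)

# k * w = 3.0 > 2: the output settles on a non-zero value.
x_memory = free_run(w=3.0, k=k, steps=500)

# At equilibrium the activation is a+ = w * x, and g(a+) of Eq. (15) vanishes.
a_plus = 3.0 * x_memory
g_at_a_plus = a_plus - 3.0 * sigmoid(a_plus, k)
```

In the first case the output decays to the single equilibrium point $a_0 = 0$; in the second it converges to the positive equilibrium, where $g$ vanishes.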


The looped neuron can thus exhibit two behaviors according to $k w_{ii}$: a forgetting behavior ($k w_{ii} \le 2$) and a temporal memory behavior ($k w_{ii} > 2$). Fig. 5 shows the influence of the self-connection weight on the behavior of the looped neuron with $k = 0.05$. The self-connection gives the neuron the capacity to memorize a certain past of the input data. The weight of this self-connection can be obtained by training, but the easiest way is to fix it a priori. We will see in the next section how this looped neuron makes it possible for the RRBF network to process dynamic data, whereas traditional RBF networks process only static data.

3.2. RRBF for the prognosis

After showing the effect of the self-connection on the dynamic behavior of the RRBF network, we present in this section the topology of the RRBF network and its training algorithm for time series prediction applications (Fig. 6).

Fig. 5. Influence of the self-connection on the behavior of the looped neuron with $k = 0.05$. [Plot: output of the looped neuron (0 to 1) versus time (0 to 200) for $w_{ii} = 30$, $39$, $40$ and $41$.]

The cascade of looped neurons represents the dynamic memory of the neural network. The network thus processes the data dynamically. The output vector of the looped neurons is the input vector of the RBF nodes. The neural network output is defined by

$$y(t) = \sum_{i=1}^{n} w_i\, \phi_i(\mu_i, \sigma_i), \qquad (17)$$

where $w_i$ represents the connection weight between the $i$th radial neuron and the output neuron. The output of the RBF nodes has the following expression:

$$\phi_i(\mu_i, \sigma_i) = \exp\left(-\frac{\sum_{j=1}^{m} (x^j(t) - \mu_i^j)^2}{\sigma_i^2}\right), \qquad (18)$$

where $\mu_i = [\mu_i^j]_{j=1}^{m}$ and $\sigma_i$ represent respectively the center and the dimension of the influence ray of the $i$th prototype. These radial neurons are the static memory of the network. The output $x^j(t)$ of the $j$th looped neuron is the dynamic memory of the network, with the following expression:

$$x^j(t) = \frac{1 - \exp\left(-k(\varpi\, x^j(t-1) + x^{j-1}(t))\right)}{1 + \exp\left(-k(\varpi\, x^j(t-1) + x^{j-1}(t))\right)}, \qquad (19)$$

where $j = 1, \ldots, m$ and $m$ is the number of neurons of the input layer. The first neuron of this layer has a linear activation function, $x^1(t) = x(t)$.

Fig. 7 shows the relation between the number of looped neurons and the length of the signal past that is retained. We introduced a variation $\Delta$ at instant $t = 50$ in a signal (Figs. 7(a) and (b)). The aim is to highlight the longer dynamic memory of the RRBF shown in Fig. 6. An RRBF with four looped neurons is stimulated by the signal of Fig. 7(a). Figs. 7(c)–(f) show the output error of each looped neuron caused by this variation $\Delta$.

The network parameters are determined with a two-stage training process. During the first stage, an unsupervised learning algorithm is used to determine the parameters of the RBF nodes (the centers and the influence rays). In the second stage, linear regression is used to determine the weights between the hidden layer and the output layer.

3.3. Training process of the RRBF

Fig. 6. Topology of the RRBF. The self-connection of the input neurons gives the network a dynamic processing of the input data.
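The topology of Fig. 6 and Eqs. (17)–(19) can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the class name `RRBF`, the zero initial state of the looped neurons, and the two hand-picked prototypes with their weights are all hypothetical; only the self-connection value 40 and $k = 0.05$ are taken from the settings reported in the paper.

```python
import math

def sigmoid(a, k):
    """Sigmoid of Eq. (12)."""
    return (1.0 - math.exp(-k * a)) / (1.0 + math.exp(-k * a))

class RRBF:
    """Minimal forward pass following Eqs. (17)-(19)."""

    def __init__(self, m, self_weight, k, centers, sigmas, weights):
        self.m = m                      # number of neurons of the input layer
        self.self_weight = self_weight  # self-connection of the looped neurons
        self.k = k                      # sigmoid parameter
        self.centers = centers          # prototypes, each of dimension m
        self.sigmas = sigmas            # influence rays
        self.weights = weights          # output connection weights
        self.state = [0.0] * m          # x^j(t-1); zero start is an assumption

    def memory(self, x_t):
        """Update the input layer with sample x_t and return its outputs."""
        new_state = [x_t]               # first neuron is linear: x^1(t) = x(t)
        for j in range(1, self.m):
            a = self.self_weight * self.state[j] + new_state[j - 1]
            new_state.append(sigmoid(a, self.k))        # Eq. (19)
        self.state = new_state
        return new_state

    def step(self, x_t):
        """Network output y(t) for the current sample."""
        xs = self.memory(x_t)
        y = 0.0
        for mu, sigma, w in zip(self.centers, self.sigmas, self.weights):
            d2 = sum((xj - mj) ** 2 for xj, mj in zip(xs, mu))
            y += w * math.exp(-d2 / sigma ** 2)         # Eqs. (18) and (17)
        return y

# One linear neuron plus one looped neuron, two hypothetical prototypes.
net = RRBF(m=2, self_weight=40.0, k=0.05,
           centers=[[0.0, 0.0], [1.0, 0.5]], sigmas=[1.0, 1.0],
           weights=[1.0, -1.0])
outputs = [net.step(x) for x in [1.0, 0.0, 0.0, 0.0]]
trace = net.state[1]  # looped neuron output after three zero inputs
```

After the single non-zero sample, the linear neuron returns to 0 while the looped neuron keeps a slowly decaying trace, which is the dynamic memory the RBF nodes see as part of their input vector.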

3.3.1. The prototypes' parameters

The first step of the training process consists in determining the centers and the influence rays of the prototypes (static memory). These prototypes are extracted from the outputs of the looped neurons (dynamic memory). Each temporal signal is thus characterized by a cluster of points whose coordinates are the outputs of the looped neurons at every instant $t$. We adopted the RCE training algorithm for this first stage of the training process. The influence rays are adjusted according to a threshold $\theta$. A complete iteration of this algorithm is as follows:

    // Training iteration
    // Creation of a new prototype
    for all training vectors x do:
        add a new prototype p_{n+1} with: μ_{n+1} = x
        n = n + 1
    end
    // Adjusting the influence rays
    for all prototypes μ_i do:
        σ_i = max σ such that φ_i(μ_j) < θ for all 1 ≤ j ≤ n, j ≠ i
    end
    // End

Fig. 7. Influence of the number of looped neurons on the length of the dynamic memory of the network: (a) signal evolution, (b) signal with variation $\Delta$, (c) first looped neuron error, (d) second looped neuron error, (e) third looped neuron error, and (f) fourth looped neuron error. [Six plots against time, 0 to 300.]

3.3.2. Connection weights

The time series prediction can be seen as an interpolation problem. The output of the RBF network is

$$h(x) = \sum_{i=1}^{n} w_i\, \phi_i(\|x - \mu_i\|), \qquad (20)$$

where $n$ represents the number of basis functions, centered on the $n$ input points. The solution of this problem is obtained by solving the $n$ linear equations for the weight coefficients:

$$\begin{bmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1n} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{n1} & \phi_{n2} & \cdots & \phi_{nn} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad (21)$$

where $y_i$ is the desired output, and

$$\phi_{ij} = \phi(\|\mu_i - \mu_j\|), \quad i, j = 1, 2, \ldots, n. \qquad (22)$$

The equation can be written as

$$\Phi\, w = Y. \qquad (23)$$

The weight vector is then

$$w = \Phi^{-1} Y. \qquad (24)$$

4. Application in prediction

We tested the RRBF network on three time series prediction applications. In these three applications, the goal is to predict the evolution of the input data from the knowledge of their past. The training is performed on a part of the data set, and the network is tested on the totality of the data. For each application, we give two average prediction errors and two error standard deviations, according to whether the network is tested on the test population only or on both the test and training populations.

4.1. MacKey–Glass chaotic time series

The MacKey–Glass chaotic time series is generated by the following differential equation:

$$\dot{x}(t) = -b\,x(t) + \frac{a\,x(t-\tau)}{1 + x^{10}(t-\tau)}. \qquad (25)$$

$x(t)$ is quasi-periodic and chaotic for the following parameters: $a = 0.2$, $b = 0.1$ and $\tau = 17$ (Jang, 1993; Chiu, 1994). The simulated data were obtained by applying the fourth-order Runge–Kutta method to Eq. (25) with the following initial conditions: $x(0) = 1.2$ and $x(t - \tau) = 0$ for $0 \le t < \tau$. The simulation step is 1. The data of this series are available at the following location: http://neural.cs.nthu.edu.tw/jang/benchmark. We tested the RRBF network presented previously on the MacKey–Glass prediction. To obtain good results, we used six looped neurons. The parameters of these looped neurons are set so as to obtain the longest dynamic memory (Fig. 5). This characteristic is obtained with the value $\varpi = 40$ for the self-connection and $k = 0.05$ for the parameter of the sigmoid function. The parameters of the Gaussian functions as well as the

connection weights are given by the training algorithms presented previously, with $\theta = 0.8$. Table 1 presents the results obtained by the RRBF network with different numbers of training points (Nb) taken from the 118th data point onwards. The prediction errors between the network output and the real value of the series are presented in the various columns of the table, with the percentage of each error. This percentage is calculated relative to the amplitude 0.9 of the series. The network is able to predict the series evolution with a minimum of 50 training points, with a mean error equal to 19% and an error standard deviation equal to 27%. This error decreases as the number of training points increases, down to 2%. The training time corresponds to one iteration. Fig. 8 shows the results of the test with 500 training points.

Table 1. Results of the RRBF test on the MacKey–Glass series prediction

Nb   Min                     Max            Moy1          Moy2          Dev Std1     Dev Std2
50   3.90×10^-4 (0.043%)     1.1669 (129%)  0.1862 (20%)  0.1776 (19%)  0.251 (27%)  0.2482 (27%)
100  3.27×10^-5 (0.0036%)    1.1632 (129%)  0.0969 (10%)  0.0879 (9%)   0.184 (20%)  0.1778 (19%)
150  4.13×10^-5 (0.00458%)   0.7129 (79%)   0.0655 (7%)   0.0564 (6%)   0.103 (11%)  0.0982 (11%)
200  2.60×10^-5 (0.00288%)   0.3915 (43%)   0.0502 (5%)   0.0408 (4%)   0.058 (6%)   0.0559 (6%)
250  4.54×10^-5 (0.00504%)   0.3000 (33%)   0.0480 (5%)   0.0369 (4%)   0.054 (6%)   0.0518 (5%)
300  1.46×10^-5 (0.00162%)   0.2727 (30%)   0.0441 (5%)   0.0318 (3%)   0.048 (5%)   0.0456 (5%)
350  2.45×10^-6 (0.00027%)   0.2874 (31%)   0.0439 (4%)   0.0296 (3%)   0.048 (5%)   0.0445 (5%)
400  3.35×10^-5 (0.0037%)    0.3114 (34%)   0.0375 (4%)   0.0236 (2%)   0.042 (4%)   0.0382 (4%)
450  9.56×10^-5 (0.01062%)   0.2893 (32%)   0.0360 (4%)   0.0209 (2%)   0.042 (4%)   0.0368 (4%)
500  1.50×10^-5 (0.00166%)   0.2789 (31%)   0.0380 (4%)   0.0203 (2%)   0.043 (4%)   0.0371 (4%)

Nb represents the number of training points. The columns Min and Max represent the minimal and maximal prediction errors. Moy1 represents the average prediction error on the part of the data that excludes the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the standard deviations without and with the training data. The percentages are given relative to the amplitude of the signal, 0.9.

Fig. 8. Prediction results: (a) neural network output and the MacKey–Glass series values and (b) error of the neural network prediction. [Two plots against time, 0 to 1200; panel (a) shows the network output and the system output.]
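The first training stage of Section 3.3.1, used here with $\theta = 0.8$, can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the Gaussian form of Eq. (18), distinct training vectors (at least two), and a hypothetical `shrink` factor slightly below 1, introduced because the largest admissible ray gives $\phi_i(\mu_j) = \theta$ exactly, whereas the condition is a strict inequality. The three sample points are made up for the example.

```python
import math

def rce_first_stage(vectors, theta, shrink=0.999):
    """Store every training vector as a prototype, then size each influence ray."""
    centers = [list(v) for v in vectors]      # mu_{n+1} = x for each training vector
    sigmas = []
    for i, mu_i in enumerate(centers):
        # Squared distance to the nearest other prototype.
        nearest_sq = min(
            sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
            for j, mu_j in enumerate(centers) if j != i
        )
        # exp(-d^2 / sigma^2) < theta  <=>  sigma < d / sqrt(ln(1 / theta))
        bound = math.sqrt(nearest_sq / math.log(1.0 / theta))
        sigmas.append(shrink * bound)
    return centers, sigmas

def response(x, mu, sigma):
    """Gaussian response in the form of Eq. (18)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, mu))
    return math.exp(-d2 / sigma ** 2)

# Hypothetical outputs of the input layer at three instants.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
centers, sigmas = rce_first_stage(points, theta=0.8)

# Response of each prototype at every other prototype's center.
overlaps = [response(centers[j], centers[i], sigmas[i])
            for i in range(3) for j in range(3) if i != j]
```

Every response of one prototype at another prototype's center stays below $\theta$, so the threshold directly controls how much neighboring influence fields overlap. Because each training vector becomes a prototype, the number of hidden nodes grows with the training set.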

4.2. Logistic map

The Logistic Map series is defined by the expression below:

$$x(t+1) = 4x(t)(1 - x(t)). \qquad (26)$$

This series is chaotic in the interval [0, 1], with $x(0) = 0.2$. The goal of this application is to predict the target value $x(t+1)$. The input value of the RRBF network is $x(t)$. The best prediction results are obtained with one looped neuron having the parameters $\varpi = 40$ for the self-connection and $k = 0.05$ for the sigmoid function parameter. The parameter $\theta = 0.999$ was used for the first-stage training process. Table 2 shows the test results of the RRBF network for different numbers of training points (Nb). The network gives good results with only 10 training points. Fig. 9 shows the results of the test with 100 training data points.

Table 2. Results of the RRBF test on the Logistic Map series prediction

Nb   Moy1                        Moy2                        Dev Std1                    Dev Std2
10   0.0945 (9%)                 0.0898 (9%)                 0.0636 (6%)                 0.0652 (6%)
20   7.26×10^-4 (7.26×10^-2%)    6.53×10^-4 (6.53×10^-2%)    5.11×10^-4 (5.11×10^-2%)    5.32×10^-4 (5.32×10^-2%)
30   1.59×10^-6 (1.59×10^-4%)    1.35×10^-6 (1.35×10^-4%)    1.69×10^-6 (1.69×10^-4%)    1.66×10^-6 (1.66×10^-4%)
40   4.69×10^-8 (4.69×10^-6%)    3.75×10^-8 (3.75×10^-6%)    3.66×10^-8 (3.66×10^-6%)    3.77×10^-8 (3.77×10^-6%)
50   1.33×10^-9 (1.33×10^-7%)    1.00×10^-9 (1.00×10^-7%)    1.64×10^-9 (1.64×10^-7%)    1.53×10^-9 (1.53×10^-7%)
60   4.29×10^-10 (4.29×10^-8%)   3.02×10^-10 (3.02×10^-8%)   8.06×10^-10 (8.06×10^-8%)   7.00×10^-10 (7.00×10^-8%)
70   7.11×10^-11 (7.11×10^-9%)   5.10×10^-11 (5.10×10^-9%)   1.90×10^-10 (1.90×10^-8%)   1.55×10^-10 (1.55×10^-8%)
80   4.23×10^-12 (4.23×10^-10%)  3.25×10^-12 (3.25×10^-10%)  9.86×10^-12 (9.86×10^-10%)  7.74×10^-12 (7.74×10^-10%)
90   1.51×10^-11 (1.51×10^-9%)   1.32×10^-11 (1.32×10^-9%)   1.23×10^-11 (1.23×10^-9%)   1.45×10^-11 (1.45×10^-9%)
100  2.14×10^-11 (2.14×10^-9%)   1.55×10^-11 (1.55×10^-9%)   1.68×10^-11 (1.68×10^-9%)   1.38×10^-11 (1.38×10^-9%)

Nb represents the number of training points. The columns Min and Max represent the minimal and maximal prediction errors. Moy1 represents the average prediction error on the part of the data that excludes the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the standard deviations without and with the training data. The percentages are given relative to the amplitude of the signal.

Fig. 9. (a) Comparison of the prediction results of the network and the values of the Logistic Map series and (b) prediction error of the neural network. [Two plots against time, 0 to 200; panel (a) shows the network output and the system output, panel (b) has an error axis scaled by 10^-11.]

4.3. Nonlinear system prediction

The third application relates to the prediction of a nonlinear system, using the Box and Jenkins (1970) gas furnace database, which is available at http://neural.cs.nthu.edu.tw/jang/benchmark. These data represent a time series of a gas furnace process, where $u(t)$ represents the input gas and $y(t)$ represents the output CO2 concentration. The goal of this application is to predict the value $y(t)$ from the knowledge of $y(t-1)$ and $u(t-1)$. The RRBF network used has two inputs: one for $y(t)$ and another for $u(t)$. The past of each input signal is taken into account by a looped neuron. The output of the neural network gives the value $y(t+1)$. The network is composed of four input neurons (a linear neuron and a looped neuron for each input signal) and one output neuron. The intermediate neurons are determined by the first-stage training process described previously. The first 145 points of the database are used for the training process. The second-stage training algorithm determines the connection weights. The best results were obtained with $\varpi = 500$ and $k = 0.05$ for the sigmoid function, and $\theta = 0.84$ for the training of the influence rays.

Table 3 shows the results of the network test on this application. The RRBF neuronal network gives a prediction result with an error average estimation of 8%. The training process takes one timeiteration.

5. Discussion The Recurrent Radial Basis Function Network presented in this article was successfully validated in the two time series prediction problems. Figs. 8 and 9 show the results and the error prediction of the RRBF for the MacKey–Glass series and the Logistic Map series. This dynamic aspect is obtained thanks to the looped input nodes (Fig. 3). This local output feedback procures to the neuron a dynamic memory (Fig. 5). We

ARTICLE IN PRESS R. Zemouri et al. / Engineering Applications of Artificial Intelligence 16 (2003) 453–463

461

Table 3 Results of the RRBF test on the nonlinear system prediction Nb

Min

Max

145

0.0067

0.04%

Moy1

18.0235

120%

1.5274

Moy2 10%

Dev Std1

1.2441

8%

2.3267

Dev Std2 15%

3.4950

23%

Nb represents the population of the training points. The columns Min and Max represent minimal and maximal error prediction. Moy1 represent the average errors of predictions on the part of the data without training population, and Moy2 the average errors on all the data. Dev Std1 and Dev Std2 are the standard deviations without and with training data. The percentages are given compared to amplitude of the signal.

62

3

60 2

58 1

54

u(t)

y(t)

56

0

52 50

-1

48 -2

46 44 0

(a)

50

100

150

200

250

t

300

-3 0

(b)

50

100

150

200

250

300

t

Fig. 10. (a) CO2 output concentration of the gas furnace and (b) input gas of the furnace.

therefore do not need temporal windows to store or block the input data, as some neural architectures do: NETtalk, introduced by Sejnowski and Rosenberg (1986), the TDNN of Waibel et al. (1989) and the TDRBF of Berthold (1994). These temporal-window techniques have several disadvantages (Elman, 1990). First, the data must be blocked by an external mechanism, which raises the question of when the data can be presented to the network. Second, the dimension of the temporal window is limited. Recurrent networks are not affected by these problems: we have shown in Fig. 7 that the RRBF with four looped neurons is sensitive to a past of about 100 time steps. A second advantage of the RRBF is the flexibility of the training process. A two-stage learning algorithm was used: the first stage determines the RBF parameters, and the second stage calculates the output weights. Only a few seconds are required to train the RRBF on a personal computer with a 700 MHz processor. The major difficulty is to find the parameters that optimize the output result. These parameters are the number of input looped neurons (N > 0), the self-connection value (w_ii > 0), the parameter of the sigmoid function (k > 0), and the parameter of the first-stage training algorithm (0 < θ < 1). In most cases, good results can be obtained with only one looped neuron (N = 1).

This input neuron is configured to have the longest memory, which is obtained with kw = 2 (Fig. 5). The parameter k is chosen so as to give the sigmoid function a quasi-linear behaviour around the origin (k ≈ 0.05). The last parameter to adjust is the first-stage training threshold θ. The results obtained by the RRBF show that the RCE algorithm does not calculate the parameters of the Gaussian nodes rigorously: the neural network is overtrained. This result is entirely consistent with the fact that all the data of the training set are stored as prototypes. Clustering techniques such as the k-means algorithm, which minimizes the sum of squared errors (SSE) between the inputs and the hidden node centers, will certainly give better results than the RCE algorithm. However, these techniques also have some disadvantages. In our previous work we presented an example that highlights them (Zemouri et al., 2002b):

* There is no formal method for specifying the number of hidden nodes.
* These nodes are initialized randomly, so several runs are needed to obtain the best result.
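The sensitivity of k-means to its random initialization can be illustrated with a short sketch. It is a plain textbook k-means written for this illustration, not the procedure of Zemouri et al. (2002b). The function names, the fixed number of iterations and the choice of initial centers among the data points are assumptions.

```python
import numpy as np


def kmeans_sse(data, n_centers, seed, n_iter=50):
    """Run basic k-means from a random start; return the centers and the final SSE."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick n_centers distinct data points as initial centers.
    centers = data[rng.choice(len(data), size=n_centers, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_centers):
            members = data[labels == j]
            if len(members) > 0:          # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    sse = float((dist.min(axis=1) ** 2).sum())
    return centers, sse


def best_of_restarts(data, n_centers, seeds):
    """Repeat k-means for several seeds and keep the run with the lowest SSE."""
    runs = [kmeans_sse(data, n_centers, seed) for seed in seeds]
    return min(runs, key=lambda run: run[1])
```

Calling `best_of_restarts(data, n_centers, range(10))` on a float array `data` of shape (points, dimensions) keeps the best of ten runs. The number of centers still has to be chosen by the user, which is the first disadvantage listed above.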

Our future work will concern the development of a new method that improves the performance of the k-means algorithm (Figs. 10–12).


Fig. 11. Comparison of the test results of the CO2 concentration prediction of the furnace gas with the real values (network output and system output y(t), over the training population and the test population).

Fig. 12. Prediction error of the RRBF network.

6. Conclusion

We have presented in this article an application of the RRBF network to three time-series prediction problems: Mackey-Glass, Logistic Map and the Box and Jenkins gas furnace data. Thanks to its dynamic memory, the RRBF network is able to learn temporal sequences. This dynamic memory is obtained by a self-connection of the input neurons: the input data are not blocked by an external mechanism but are memorized by the input neurons. The training time is relatively short: one iteration for the calculation of the RBF parameters and the time of a matrix multiplication for the calculation of the output weights. In the three examples, the network responded correctly on all the training data. The results obtained in the three time-series prediction applications validate the dynamic data processing of the RRBF network.
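The two training stages summarized above can be sketched as follows. The first stage shown here is a deliberately simplified stand-in for the RCE algorithm, not the algorithm itself: one pass over the data that stores a point as a new prototype whenever no existing prototype responds above the threshold `theta`. The Gaussian form exp(−d²/r²), the shared radius, the bias column and all function names are assumptions made for this illustration. The second stage is an ordinary least-squares regression for the output weights.

```python
import numpy as np


def gaussian_activations(x, centers, radius):
    """Responses of Gaussian nodes, assumed here to be exp(-||x - c||^2 / radius^2)."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / radius ** 2)


def allocate_prototypes(x, radius, theta):
    """Stage 1 (simplified stand-in for RCE): a single pass over the training data.

    A point becomes a new prototype when no stored prototype responds above theta.
    """
    centers = [x[0]]
    for point in x[1:]:
        act = gaussian_activations(point[None, :], np.array(centers), radius)
        if act.max() < theta:
            centers.append(point)
    return np.array(centers)


def hidden_matrix(x, centers, radius):
    """Hidden-layer outputs with a constant bias column appended (assumption)."""
    h = gaussian_activations(x, centers, radius)
    return np.column_stack([h, np.ones(len(x))])


def fit_output_weights(x, targets, centers, radius):
    """Stage 2: output weights by linear least squares on the hidden-layer outputs."""
    weights, *_ = np.linalg.lstsq(hidden_matrix(x, centers, radius), targets, rcond=None)
    return weights


def predict(x, centers, radius, weights):
    """Network output for the input rows in x."""
    return hidden_matrix(x, centers, radius) @ weights
```

Here `x` is a float array with one input vector per row, for example the outputs of the linear and looped input neurons at each time step.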

References

Bernauer, E., 1996. Les réseaux de neurones et l’aide au diagnostic: un modèle de neurones bouclés pour l’apprentissage de séquences temporelles. Ph.D. Thesis, LAAS/France.
Berthold, M.R., 1994. A time delay radial basis function network for phoneme recognition. Proceedings of International Conference on Neural Networks, Orlando, Vol. 7, pp. 4470–4473.
Berthold, M.R., Diamond, J., 1995. Boosting the performance of RBF networks with dynamic decay adjustment. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (Eds.), Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, pp. 521–528.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden Day, San Francisco, pp. 532–533.
Brunet, J., Jaume, D., Labarrère, M., Rault, A., Vergé, M., 1990. Détection et diagnostic de pannes, Approche par modélisation. Traitement des nouvelles technologies/série diagnostic et maintenance, edition Hermès, France.
Chappelier, J.C., 1996. RST: une architecture connexionniste pour la prise en compte de relations spatiales et temporelles. Ph.D. Thesis, Ecole Nationale Supérieure des Télécommunications/France.
Chiu, S., 1994. Fuzzy model identification based on cluster estimation. Journal of Intelligent & Fuzzy Systems 2 (3), 267–278.
Combacau, M., 1991. Commande et surveillance des systèmes à événements discrets complexes: application aux ateliers flexibles. Ph.D. Thesis, University Paul Sabatier, Toulouse, France.
Dash, S., Venkatasubramanian, V., 2000. Challenges in the industrial applications of fault diagnostic systems. Proceedings of the Conference on Process Systems Engineering, Computing and Chemical Engineering 24 (2–7), Keystone, Colorado, pp. 785–791.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Elman, J.L., 1990. Finding structure in time. Cognitive Science 14, 179–211.
Frasconi, P., Gori, M., Maggini, M., Soda, G., 1995. Unified integration of explicit knowledge and learning by example in recurrent networks. IEEE Transactions on Knowledge and Data Engineering 7 (2), 340–346.
Ghosh, J., Nag, A., 2000. In: Howlett, R.J., Jain, L.C. (Eds.), Radial Basis Function Neural Network Theory and Applications. Physica-Verlag, Würzburg.
Ghosh, J., Beck, S., Deuser, L., 1992. A neural network based hybrid system for detection, characterization and classification of short-duration oceanic signals. IEEE Journal of Ocean Engineering 17 (4), 351–363.
Hernandez, N.G., 1999. Système de diagnostic par réseaux de neurones et statistiques: application à la détection d’hypovigilance d’un conducteur automobile. Ph.D. Thesis, LAAS/France.
Hudak, M.J., 1992. RCE classifiers: theory and practice. Cybernetics and Systems 23, 483–515.
Jang, J.-S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23, 665–685.
Koivo, H.N., 1994. Artificial neural networks in fault diagnosis and control. Control in Engineering Practice 2 (1), 89–101.
Le Cun, Y., 1985. Une procédure d’apprentissage pour réseau à seuil asymétrique. Cognitiva 85, 599–604.
Lefebvre, D., 2000. Contribution à la modélisation des systèmes dynamiques à événements discrets pour la commande et la surveillance. Habilitation à Diriger des Recherches, Université de Franche Comté/IUT Belfort, Montbéliard/France.
Mak, M.W., Kung, S.Y., 2000. Estimation of elliptical basis function parameters by the EM algorithms with application to speaker verification. IEEE Transactions on Neural Networks 11 (4), 961–969.
Michelli, C.A., 1986. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation 2, 11–22.
Moody, J., Darken, J., 1989. Fast learning in networks of locally tuned processing units. Neural Computation 1, 281–294.
Rengaswamy, R., Venkatasubramanian, V., 1995. A syntactic pattern recognition approach for process monitoring and fault diagnosis. Engineering Applications of Artificial Intelligence Journal 8 (1), 35–51.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error propagation. In: Rumelhart, D.E., McClelland, J.L. (Eds.), Parallel Distributed Processing Explorations in the Microstructure of Cognition, Vol. 1. The MIT Press, Bradford Books, Cambridge, MA, pp. 318–362.
Sejnowski, T.J., Rosenberg, C.R., 1986. NetTalk: a parallel network that learns to read aloud. Electrical Engineering and Computer Science Technical Report, The Johns Hopkins University.
Tsoi, A.C., Back, D., 1994. Locally Recurrent Globally Feedforward: a critical review of the architectures. IEEE Transactions on Neural Networks 5 (2), 229–239.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K., 1989. Phoneme recognition using time delay neural network. IEEE Transactions in Acoustics, Speech and Signal Processing 37 (3), 328–339.
Xu, L., 1998. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing 19 (1–3), 223–257.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002a. Application of the dynamic RBF network in a monitoring problem of the production systems. 15th IFAC World Congress on Automatic Control, Barcelona, Spain.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002b. Réseaux de neurones Récurrents à Fonction de base Radiales RRFR: Application au pronostic. Revue d’Intelligence Artificielle, RSTI Série RIA 16 (03), 307–338.