Hierarchical Dynamic Neural Networks for Cascade System Modeling With Application to Wastewater Treatment

Abstract

Many cascade processes, such as wastewater treatment, include complex nonlinear subsystems and many variables. The normal input–output relation only represents the first block and the last block of the cascade process.

In order to model the whole process, we use hierarchical dynamic neural networks to identify the cascade process. The internal variables of the cascade process are estimated. Two stable learning algorithms and theoretical analysis are given. Real operational data of a wastewater treatment plant are applied to illustrate this new neural modeling approach.

Keywords

Hierarchical neural networks; Cascade system; Wastewater treatment

1.1 Introduction

The input–output relation within a cascade process is very complex. It usually can be described by several nonlinear subsystems, as is for example the case with the cascade process of wastewater treatment. Obviously, the first block and the last block cannot represent the whole process. Hierarchical models can be used to model this problem. When the cascade process is unknown, only the input and output data are available. Black-box modeling techniques are needed. Also, the internal variables of the cascade process need to be estimated.

There are three different approaches that can be used to model a cascade process. If the input/output data in each subblock are available, each model is identified independently. If the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it [6,3,19]. Another method is to use hierarchical models to identify cascade processes. Advantages of this approach are that the cascade information is used for identification and that the internal variable can be estimated. In [2], discrete-time feedforward neural networks are applied to approximate the uncertainty parts of the cascade system.

Neural networks can approximate any nonlinear function to any prescribed accuracy provided a sufficient number of hidden neurons can be incorporated. Hierarchical neural models consisting of a number of low-dimensional neural systems have been presented by [9] and [11] in order to avoid the dimension explosion problem. The main applications of hierarchical models are fuzzy systems, because rule explosion problem can be avoided in hierarchical systems [9], for example, in hierarchical fuzzy neural network [17], hierarchical fuzzy systems [12], and hierarchical fuzzy cerebellar model articulation controller (CMAC) networks [15]. Sensitivity analysis of the hierarchical fuzzy model was given in [12]. A statistical learning method was employed to construct hierarchical models in [3]. Based on Kolmogorov's theorem, [18] showed that any continuous function can be represented as a superposition of functions with the natural hierarchical structure. In [16], fuzzy CMAC networks are formed into a hierarchical structure.

The normal training method of hierarchical neural systems is still gradient descent. The key for the training of hierarchical neural models is to get an explicit expression of each internal error. Normal identification algorithms (gradient descent, least square, etc.) are stable under ideal conditions. They might become unstable in the presence of unmodeled dynamics. The Lyapunov approach can be used directly to obtain robust training algorithms of continuous-time and discrete-time neural networks. By using passivity theory, [5], [8], and [14] successfully proved that gradient descent algorithms of continuous-time dynamic neural networks were stable and robust to any bounded uncertainties.

The main problem for the training of a hierarchical neural model is the estimation of the internal variable. In this chapter, a hierarchical dynamic neural network is applied to model wastewater treatment. Two stable training algorithms are proposed. A novel approximate method for the internal variable of the cascade process is discussed. Real application results show that the new modeling approach is effective for this cascade process.

1.2 Cascade Process Modeling Via Hierarchical Dynamic Neural Networks

Each subprocess of a cascade process, such as wastewater treatment, can be described using the following general nonlinear dynamic equation:

x ˙ = f ( x , u ) ,

(1.1)

where x∈Rnc is the inner state, u∈Rmc is the input, and f is a vector function. Without loss of generality, we use two nonlinear affine systems to show how to use the hierarchical dynamic neural networks to model the system; see Fig. 1.1. The identified cascade nonlinear systems are given by

x ˙ 1 = f 1 ( x 1 ) + g 1 ( x 1 ) u , x ˙ 2 = f 2 ( x 2 ) + g 2 ( x 2 ) x 1 ,

(1.2)

where x1,x2∈Rn are the inner states of the subsystems, and f1, f2, g1, and g2 are unknown vector functions; x1 can be also regarded as the output of subsystem 1 and as the input of subsystem 2; u∈R is the input of the whole system, and also the input of subsystem 1; x2 is the output of the whole system, and also the output of subsystem 2.

Fig. 1.1 Cascade system modeling via hierarchical recurrent neural networks.

Only u and x2 are available for the cascade process modeling. Since the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it. Another method is to use hierarchical models to identify cascade processes. The advantages of this approach are that the cascade information is used for identification and that the internal variable can be estimated. In Section 1.3 we will show how to approximate it. In many wastewater treatment plants, x1 is sampled occasionally. We can use this real value to improve the modeling accuracy.

We construct the following hierarchical dynamic neural networks to model (1.2):

z ˙ 1 = A 1 z 1 + W 1 σ 1 ( z 1 ) + V 1 ϕ 1 ( z 1 ) u , z ˙ 2 = A 2 z 2 + W 2 σ 2 ( z 2 ) + V 2 ϕ 2 ( z 2 ) z 1 ,

(1.3)

where z1,z2∈Rn are the states of the neural models, and W1, W2, V1, and V2 are the weights of the neural networks; A1 and A2 are known stable matrices. The active functions of σ(⋅) and ϕ(⋅) are used as sigmoid functions, i.e.,

σ ( z ) = a 1 + e − b z − c .

From Fig. 1.1, the modeling error is

Δ 2 = z 2 − x 2 .

(1.4)

The internal modeling error is

Δ 1 = z 1 − x 1 .

(1.5)

This will be estimated.

Generally, the hierarchical dynamic neural networks (1.3) cannot follow the nonlinear system (1.2) exactly. The nonlinear cascade system may be written as

x ˙ 1 = A 1 x 1 + W 1 ⁎ σ 1 ( x 1 ) + V 1 ⁎ ϕ 1 ( x 1 ) u − f ˜ 1 , x ˙ 2 = A 2 x 2 + W 2 ⁎ σ 2 ( x 2 ) + V 2 ⁎ ϕ 2 ( x 2 ) z 1 − f ˜ 2 ,

(1.6)

where W1⁎, W2⁎, V1⁎, and V2⁎ are unknown bounded matrices. We assume the upper bounds, W¯1, W¯2, V¯1, and V¯2, are known as

W 1 ⁎ Λ w − 1 W 1 ⁎ T ⩽ W ¯ 1 , W 2 ⁎ Λ w − 1 W 2 ⁎ T ⩽ W ¯ 2 , Λ w = Λ w T > 0 , V 1 ⁎ Λ v − 1 V 1 ⁎ T ⩽ V ¯ 1 , W 2 ⁎ Λ v − 1 W 2 ⁎ T ⩽ V ¯ 2 , Λ v = Λ v T > 0 ,

(1.7)

where f˜1 and f˜2 are modeling errors and disturbances. Since the state and output variables are physically bounded, the modeling errors f˜1 and f˜2 can be assumed to be bounded too. The upper bounds of the modeling errors are

‖ f ˜ 1 ‖ Λ f 1 2 ⩽ η ¯ f 1 ⩽ ∞ , ‖ f ˜ 2 ‖ Λ f 2 2 ⩽ η ¯ f 2 ⩽ ∞ ,

(1.8)

where η‾f1 and η¯f2 are known positive matrices, and Λf1 and Λf2 are any positive definite matrices.

Now we calculate the following errors:

W 1 σ 1 ( z 1 ) − W 1 ⁎ σ 1 ( x 1 ) = [ W 1 σ 1 ( z 1 ) − W 1 ⁎ σ 1 ( z 1 ) ] + [ W 1 ⁎ σ 1 ( z 1 ) − W 1 ⁎ σ 1 ( x 1 ) ] = W ˜ 1 σ 1 ( z 1 ) + W 1 ⁎ [ σ 1 ( z 1 ) − σ 1 ( x 1 ) ] = W ˜ 1 σ 1 ( z 1 ) + W 1 ⁎ σ ˜ 1 ,

(1.9)

where W˜1=W1−W1⁎, σ˜1=σ1(z1)−σ1(x1). Similarly,

W 2 σ 2 ( z 2 ) − W 2 ⁎ σ 2 ( x 2 ) = W ˜ 2 σ 2 ( z 2 ) + W 2 ⁎ σ ˜ 2 , V 1 σ 1 ( z 1 ) u − V 1 ⁎ ϕ 1 ( x 1 ) u = V ˜ 1 σ 1 ( z 1 ) u + V 1 ⁎ ϕ ˜ 1 u , V 2 ϕ 2 ( z 2 ) z 1 − V 2 ⁎ ϕ 2 ( x 2 ) z 1 = V ˜ 2 σ 2 ( z 2 ) z 1 + V 2 ⁎ ϕ ˜ 2 z 1 .

Because σ(⋅) and ϕ(⋅) are chosen as sigmoid functions, they satisfy the following Lipschitz property:

σ ˜ 1 T Λ w σ ˜ 1 ⩽ Δ 1 T D σ 1 Δ 1 , ϕ ˜ 1 T Λ v ϕ ˜ 1 ⩽ Δ 1 T D ϕ 1 Δ 1 , σ ˜ 2 T Λ w σ ˜ 2 ⩽ Δ 2 T D σ 2 Δ 2 , ϕ ˜ 2 T Λ v ϕ ˜ 2 ⩽ Δ 2 T D ϕ 2 Δ 2 ,

(1.10)

where Λw, Λv, Dσ1, Dϕ1, Dσ2, and Dϕ2 are positive definite matrices.

1.3 Stable Training of the Hierarchical Dynamic Neural Networks

In order to obtain a stable training algorithm for the hierarchical dynamic neural networks (1.3), we calculate the error dynamics of the submodels. From (1.3) and (1.6), we have

Δ ˙ 1 = A 1 Δ 1 + W ˜ 1 σ 1 ( z 1 ) + V ˜ 1 ϕ 1 ( z 1 ) u + W 1 ⁎ σ ˜ 1 + V 1 ⁎ ϕ ˜ 1 u + f ˜ 1 , Δ ˙ 2 = A 2 Δ 2 + W ˜ 2 σ 2 ( z 2 ) + V ˜ 2 ϕ 2 ( z 2 ) z 1 + W 2 ⁎ σ ˜ 2 + V 2 ⁎ ϕ ˜ 2 z 1 + f ˜ 2 .

(1.11)

If the outputs of all blocks are available, we can train each block independently via the modeling errors between neural models and the corresponding process blocks, Δ1 and Δ2. Let us define

R 1 = W ¯ 1 + V ¯ 1 , Q 1 = ( D σ 1 + ‖ u ‖ D ϕ 1 + Q 10 )

(1.12)

and the matrices A1 and Q10 are selected to fulfill the following conditions:

(1) the pair (A1,R11/2) is controllable, the pair (Q11/2,A1) is observable;

(2) if the local frequency condition [1] is satisfied, i.e.,

A 1 T R 1 − 1 A 1 − Q 1 ⩾ 1 4 [ A 1 T R 1 − 1 − R 1 − 1 A 1 ] R 1 [ A 1 T R 1 − 1 − R 1 − 1 A 1 ] T ,

(1.13)

then the following assumption can be established.

A1: There exist a stable matrix A1 and a strictly positive definite matrix Q10 such that the matrix Riccati equation

A 1 T P 1 + P 1 A 1 + P 1 R 1 P 1 + Q 1 = 0

(1.14)

has a positive solution P1=P1T>0.

This condition is easily fulfilled if we select A1 as a stable diagonal matrix. The next theorem states the learning procedure of a neuroidentifier. Similarly, there exist a stable matrix A2 and a strictly positive definite matrix Q20 such that the matrix Riccati equation

A 2 T P 2 + P 2 A 2 + P 2 R 2 P 2 + Q 2 = 0 ,

(1.15)

where R2=W¯2+V¯2, Q2=(Dσ2+‖z1‖Dϕ2+Q20).

First, we may choose A1 and Q1 such that the Riccati equation (1.14) has a positive solution P1. Then Λf1 may be found according to the condition (1.8). Since (1.7) is correct for any positive definite matrix, (1.8) can be established if Λf1 is selected as a small enough constant matrix. The condition (1.8) has no effect on the network dynamics (1.3) and its training (1.16).

Theorem 1.1

If Assumption A1 is satisfied and the weights W1,t and W2,t are updated as

d d t ( W ˜ 2 T ) = − K w 2 P 2 Δ 2 σ 2 T ( z 2 ) , d d t ( V ˜ 2 T ) = − K v 2 P 2 Δ 2 [ ϕ 2 ( z 2 ) z 1 ] T , d d t ( W ˜ 1 T ) = − K w 1 P 1 Δ 1 σ 1 T ( z 1 ) , d d t ( V ˜ 1 T ) = − K v 1 P 1 Δ 1 [ ϕ 1 ( z 1 ) u ] T ,

(1.16)

where P1 and P1 are the solution of the Riccati equation (1.14) and (1.15), then the identification error dynamics (1.11) is strictly passive from the modeling error f˜1 and f˜2 to the identification errors 2P1Δ1 and 2P2Δ2 and the updating law (1.16) can make the identification procedure stable.

Proof

Select a Lyapunov function (storage function) as

S = Δ 1 T P 1 Δ 1 + t r { W ˜ 1 T K w 1 − 1 W ˜ 1 } + t r { V ˜ 1 T K v 1 − 1 V ˜ 1 } + Δ 2 T P 2 Δ 2 + t r { W ˜ 2 T K w 2 − 1 W ˜ 2 } + t r { V ˜ 2 T K v 2 − 1 V ˜ 2 } ,

(1.17)

where P∈ℜn×n is a positive definite matrix. According to (1.11), the derivative is

S ˙ = Δ 1 T ( P 1 A 1 + A 1 T P 1 ) Δ 1 + 2 Δ 1 T P 1 W ˜ 1 σ 1 ( z 1 ) + 2 Δ 1 T P 1 V ˜ 1 ϕ 1 ( z 1 ) u + Δ 2 T ( P 2 A 2 + A 2 T P 2 ) Δ 2 + 2 Δ 2 T P 2 W ˜ 2 σ 2 ( z 2 ) + 2 Δ 2 T P 2 V ˜ 2 ϕ 2 ( z 2 ) z 1 + 2 Δ 1 T P 1 ( W 1 ⁎ σ ˜ 1 + V 1 ⁎ ϕ ˜ 1 u + f ˜ 1 ) + 2 Δ 2 T P 2 ( W 2 ⁎ σ ˜ 2 + V 2 ⁎ ϕ ˜ 2 z 1 + f ˜ 2 ) + 2 t r { d d t ( W ˜ 1 T ) K w 1 − 1 W ˜ 1 } + 2 t r { d d t ( W ˜ 2 T ) K w 2 − 1 W ˜ 2 } + 2 t r { d d t ( V ˜ 1 T ) K v 1 − 1 V ˜ 1 } + 2 t r { d d t ( V ˜ 2 T ) K v 2 − 1 V ˜ 2 } .

Since 2Δ1TP1W1⁎σ˜1 is a scalar, using (1.10) and the matrix inequality

X T Y + ( X T Y ) T ⩽ X T Λ − 1 X + Y T Λ Y ,

(1.18)

where X,Y,Λ∈ℜn×k are any matrices and Λ is any positive definite matrix, we obtain

2 Δ 1 T P 1 W 1 ⁎ σ ˜ 1 ⩽ Δ 1 T P 1 W 1 ⁎ Λ w − 1 W 1 ⁎ T P 1 Δ 1 + σ ˜ 1 T Λ w σ ˜ 1 ⩽ Δ 1 T ( P 1 W ¯ 1 P 1 + D σ 1 ) Δ 1 .

(1.19)

Similarly,

2 Δ 2 T P 2 W 2 ⁎ σ ˜ 2 ⩽ Δ 2 T ( P 2 W ¯ 2 P 2 + D σ 2 ) Δ 2 , 2 Δ 1 T P 1 V 1 ⁎ ϕ ˜ 1 u ⩽ Δ 1 T ( P 1 V ¯ 1 P 1 + ‖ u ‖ D ϕ 1 ) Δ 1 , 2 Δ 2 T P 2 V 2 ⁎ ϕ ˜ 2 z 1 ⩽ Δ 2 T ( P 2 V ¯ 2 P 2 + ‖ z 1 ‖ D ϕ 2 ) Δ 2 .

So we have

S ˙ ⩽ Δ 1 T [ P 1 A 1 + A 1 T P 1 + P 1 ( W ¯ 1 + V ¯ 1 ) P 1 + ( D σ 1 + ‖ u ‖ D ϕ 1 + Q 10 ) ] Δ 1 − Δ 1 T Q 10 Δ 1 + Δ 2 T [ P 2 A 2 + A 2 T P 2 + P 2 ( W ¯ 2 + V ¯ 2 ) P 2 + ( D σ 3 + ‖ z 1 ‖ D ϕ 2 + Q 20 ) ] Δ 2 − Δ 2 T Q 20 Δ 2 + 2 t r { d d t ( W ˜ 2 T ) K w 2 − 1 W ˜ 2 } + 2 Δ 2 T P 2 W ˜ 2 σ 2 ( z 2 ) + 2 Δ 2 T P 2 f ˜ 2 + 2 t r { d d t ( V ˜ 2 T ) K v 2 − 1 V ˜ 2 } + 2 Δ 2 T P 2 V ˜ 2 ϕ 2 ( z 2 ) z 1 + 2 t r { d d t ( W ˜ 1 T ) K w 1 − 1 W ˜ 1 } + 2 Δ 1 T P 1 W ˜ 1 σ 1 ( z 1 ) + 2 Δ 1 T P 1 f ˜ 1 + 2 t r { d d t ( V ˜ 1 T ) K v 1 − 1 V ˜ 1 } + 2 Δ 1 T P 1 V ˜ 1 ϕ 1 ( z 1 ) u .

Using (1.14), (1.15), (1.16), and ddt(W˜1T)=ddtW˜1,

S ˙ ⩽ − Δ 1 T Q 10 Δ 1 + 2 Δ 1 T P 1 f ˜ 1 − Δ 2 T Q 20 Δ 2 + 2 Δ 2 T P 2 f ˜ 2 .

(1.20)

From the passivity definition, if we define the inputs as f˜1 and f˜2 and the outputs as 2P1Δ1 and 2P2Δ2, then the system is strictly passive with Δ1TQ10Δ1⩾0 and Δ2TQ20Δ2.

In view of the matrix inequality (1.18),

2 Δ 1 T P 1 f ˜ 1 ⩽ Δ 1 T P 1 Λ f 1 P 1 Δ 1 + f ˜ 1 T Λ f 1 − 1 f ˜ 1 , 2 Δ 2 T P 2 f ˜ 2 ⩽ Δ 2 T P 2 Λ f 2 P 2 Δ 2 + f ˜ 2 T Λ f 2 − 1 f ˜ 2 ,

(1.21)

(1.20) can be represented as

S ˙ t ⩽ − λ min ( Q 10 ) ‖ Δ 1 ‖ 2 − λ min ( Q 20 ) ‖ Δ 2 ‖ 2 + Δ 1 T P 1 Λ f 1 P 1 Δ 1 + f ˜ 1 T Λ f 1 − 1 f ˜ 1 + Δ 2 T P 2 Λ f 2 P 2 Δ 2 + f ˜ 2 T Λ f 2 − 1 f ˜ 2 ⩽ ( − α ‖ Δ 1 ‖ ‖ Δ 1 ‖ + β ‖ f ˜ 1 ‖ ‖ f ˜ 1 ‖ ) + ( − α ‖ Δ 2 ‖ ‖ Δ 2 ‖ + β ‖ f ˜ 2 ‖ ‖ f ˜ 2 ‖ ) ,

where α‖Δ1‖=[λmin(Q10)−λmax(P1Λf1P1)]‖Δ1‖, β‖f˜1‖=λmax(Λf1−1)‖f˜1‖, α‖Δ2‖=[λmin(Q20)−λmax(P2Λf2P2)]‖Δ2‖, and β‖f˜2‖=λmax(Λf2−1)‖f˜2‖. We can select positive definite matrices Λf1 and Λf2 such that (1.8) is established. So α‖Δ1‖, α‖Δ2‖, β‖f˜1‖, and β‖f˜2‖ are K∞ functions and St is an input-to-state stable (ISS)-Lyapunov function. The dynamics of identification error (1.11) is ISS. So when the modeling errors f˜1 and f˜2 are bounded, the updating law (1.16) can make the modeling errors stable, i.e.,

Δ 1 ∈ L ∞ , Δ 2 ∈ L ∞ .

□

Since the updating rates in (1.16) are KiPj, and Ki can be selected as any positive matrix, the learning process of the dynamic neural network is free of the solution of the Riccati equation (1.14).

Theorem 1.2

If the modeling errors f˜1=f˜2=0 (only uncertainty parameters present), then the updating law (1.16) can make the identification error asymptotically stable, i.e.,

lim t → ∞ ⁡ Δ 1 = 0 , lim t → ∞ ⁡ Δ 2 = 0 .

(1.22)

Proof

The modeling errors f˜1=f˜2=0, 2Δ1TP1f˜1=2Δ2TP2f˜2=0, and the storage function (1.20) satisfies

S ˙ ⩽ − Δ 1 T Q 10 Δ 1 − Δ 2 T Q 20 Δ 2 ⩽ 0 .

The positive definite S(xt) implies Δ1, Δ2 and the weights are bounded. From the error equation (1.11) Δ˙1∈L∞, Δ˙2∈L∞. Integrate (1.20) on both sides to obtain

∫ 0 ∞ ‖ Δ 1 ‖ Q 10 + ‖ Δ 2 ‖ Q 20 ⩽ S 0 − S ∞ < ∞ .

So Δ1∈L2∩L∞, Δ2∈L2∩L∞, and using Barbalat's lemma, we have (1.22). Since u, σ, ϕ, and P are bounded, limt→∞⁡Δ1=0, limt→∞⁡Δ2=0. □

For many cascade systems the outputs in the internal blocks are not measurable, for example Δ1. The modeling error in the final block Δ2 should be propagated to the other blocks, i.e., we should calculate the internal modeling error Δ1 from Δ2.

From (1.3) and (1.6), the last block can be written as

z ˙ 2 = A 2 z 2 + W 2 σ 2 ( z 2 ) + V 2 ϕ 2 ( z 2 ) z 1 , x ˙ 2 = A 2 x 2 + W 2 σ 2 ( x 2 ) + V 2 ϕ 2 ( z 2 ) x 1 − f ˜ 3 ,

(1.23)

where f˜3 is the modeling error corresponding to the weights W and V2. So

Δ ˙ 2 = A 2 Δ 2 + W 2 σ ˜ 2 + V 2 ϕ 2 ( z 2 ) Δ 1 + f ˜ 3 .

So Δ1 is approximated by

Δ 1 = [ V 2 ϕ 2 ( z 2 ) ] − 1 [ Δ ˙ 2 − A 2 Δ 2 − W 2 σ ˜ 2 − f ˜ 3 ] .

In Fig. 1.1 we note fˆ3 is the difference between the block “f2,g2” and the block “A2,W2,V2.” So fˆ3 can be estimated as

f ˜ 3 ≈ z ˙ 2 − Δ ˙ 2 .

Here

z ˙ 2 = z 2 ( t ) − z 2 ( t − τ ) τ + δ ( t ) , τ > 0 ,

where δ(t) is the differential approximation error. So

f ˜ 3 ≈ f ˆ 3 = z 2 ( t ) − z 2 ( t − τ ) τ − Δ 2 ( t ) − Δ 2 ( t − τ ) τ .

The internal modeling error is approximated by

Δ ˆ 1 = [ V 2 ϕ 2 ( z 2 ) ] − 1 × [ 2 Δ 2 ( t ) − Δ 2 ( t − τ ) τ − z 2 ( t ) − z 2 ( t − τ ) τ − A 2 Δ 2 − W 2 [ σ 2 ( z 2 ) − σ 2 ( x 2 ) ] ] .

(1.24)

In order to ensure Δˆ1 is bounded, we use (1.24) when ‖V2ϕ2(z2)‖>τ. Otherwise, we use Δ2 to represent Δ1, i.e., Δˆ1=Δ2.

Although the gradient algorithm (1.16) can ensure the modeling errors Δ1 and Δ2 are bounded (Theorem 1.1), the structure uncertainties f˜1 and f˜2 will cause the parameters drift for the gradient algorithm (1.16). Some robust modification should be applied to make the parameters (weights) stable. In order to guarantee the overall models are stable, we use the following dead-zone training algorithm.

Theorem 1.3

The weights are adjusted as follows:

(a) if ‖Δ1‖2>η¯f1λmin(Q1) and ‖Δ2‖2>η¯f2λmin(Q2), then the updating law is given by (1.16);

(b) if ‖Δ1‖2⩽η¯f1λmin(Q1) or ‖Δ2‖2>η¯f2λmin(Q2), then we stop the learning procedure (all right-hand sides of the corresponding system of differential equations are equal to zero) and maintain all weights constant; then, besides the modeling errors being bounded, the weight matrices also remain bounded and for any T>0 the identification error fulfills the following tracking performance:

lim T → ∞ ⁡ 1 T ∫ 0 T ( ‖ Δ 1 ‖ Q 1 2 + ‖ Δ 2 ‖ Q 2 2 ) d t ⩽ κ 1 η ¯ f 1 + κ 2 η ¯ f 2 ,

(1.25)

where κ is the condition number of Q defined as κ1=λmax(Q1)λmin(Q1), κ2=λmax(Q2)λmin(Q2).

Proof

From (1.21), (1.14), and (1.15), (1.20) can be rewritten as

S ˙ t ⩽ − Δ 1 T Q 1 Δ 1 − Δ 2 T Q 2 Δ 2 + f ˜ 1 T Λ f 1 − 1 f ˜ 1 + f ˜ 2 T Λ f 2 − 1 f ˜ 2 ⩽ − Δ 1 T Q 1 Δ 1 − Δ 2 T Q 2 Δ 2 + η ¯ f 1 + η ¯ f 2 .

(1.26)

(I) If ‖Δ1‖2>η¯f1λmin(Q1) and ‖Δ2‖2>η¯f2λmin(Q2), using the updating law (1.16) we conclude that S˙t<0. St is bounded. Integrating (1.26) from 0 to T yields

S T − S 0 ⩽ − ∫ 0 T ( Δ 1 T Q 1 Δ 1 + Δ 2 T Q 2 Δ 2 ) d t + ( η ¯ f 1 + η ¯ f 2 ) T .

Because κ⩾1 and ST⩾0, we have

∫ 0 T ( Δ 1 T Q 1 Δ 1 + Δ 2 T Q 2 Δ 2 ) d t ⩽ S 0 + ( η ¯ f 1 + η ¯ f 2 ) T ⩽ S 0 + ( κ 1 η ¯ f 1 + κ 2 η ¯ f 2 ) T ,

(1.27)

where κ1 is the condition number of Q1.

(II) If ‖Δ1‖2>η¯f1λmin(Q1) or ‖Δ2‖2>η¯f2λmin(Q2), the weights become constants and St remains bounded. Since S0⩾0,

∫ 0 T ( Δ 1 T Q 1 Δ 1 + Δ 2 T Q 2 Δ 2 ) d t ⩽ ∫ 0 T ( λ max ( Q 1 ) ‖ Δ 1 ‖ 2 + λ max ( Q 2 ) ‖ Δ 2 ‖ 2 ) d t ⩽ ( λ max ( Q 1 ) λ min ( Q 1 ) η ¯ f 1 + λ max ( Q 2 ) λ min ( Q 2 ) η ¯ f 2 ) T ⩽ S 0 + ( κ 1 η ¯ f 1 + κ 2 η ¯ f 2 ) T .

(1.28)

From (I) and (II), St is bounded. Because W1,0=W1⁎, W2,0=W2⁎, V1,0=V1⁎ and V2,0=V2⁎. From (1.27) and (1.28), (1.25) is obtained. The theorem is proved. □

1.4 Modeling of Wastewater Treatment

The wastewater treatment plant studied in this chapter is an anoxic/oxidant nitrogenous removal process [10]. It consists of two biodegradation tanks and a secondary clarifier in series form; see Fig. 1.2. Here, Qin,Qm,Qr,QR, and Qw denote flow of wastewater to be disposed, flow of mixed influent, flow of internal recycles, flow of external recycles, and flow of surplus sludge, respectively. Water quality indices such as chemical oxygen demand (COD), biological oxygen demand (BOD₅), NH₄-N (ammonia), nitrate, and suspended solid (SS) are decomposed into those components in activated sludge models (ASMs) [4]. The state variable x is defined as

x = [ S I , S S , X I , X S , X P , X B H , X B A , S N O , S N H , S O , S N D , X N D , S a l k ] T ,

(1.29)

where SI is soluble inert and SS is readily biodegradable substrate, XI is suspended inert and XS is slowly biodegradable substrate, XP is suspended inert products, XBH is autotrophic biomass, XBA is heterotrophic biomass, SNO is nitrate, SNH is ammonia, SO is soluble oxygen, SND is soluble organic nitrogen, XND is suspended organic nitrogen, and Salk is alkalinity. In this tank, there are three major reaction processes, i.e.,

N H 4 + + 1.5 O 2 → N O 2 − + H 2 O + 2 H + , N O 2 − + 0.5 O 2 → N O 3 − , C O D + O 2 → C O 2 + H 2 O + A S ,

(1.30)

where COD represents carbonous contamination and AS denotes activated sludge. Nitrite is recycled from the aerobic tanks. It is deoxidized by autotrophic microorganisms in the denitrification phase by the following reaction:

2 N O 3 − + 2 H + → N 2 + H 2 O + 2.5 O 2 .

(1.31)

The water quality index COD depends on the control input

w = [ w i ] = [ Q i n , Q r , Q R , Q w ] T

(1.32)

and on the influent quality xin, i.e.,

C O D = f ( Q i n , Q r , Q R , Q w , x i n ) .

COD is also affected by other external factors such as temperature, flow distribution, and toxins. It is very difficult to find the nonlinear function f(⋅) [13].

We know each biological reactor in wastewater treatment plants can be described by the following dynamic equation:

x ˙ ( t ) = A x ( t ) + B x b ( t ) + φ ( x ( t ) ) ,

(1.33)

where x∈R13 is the inner state, which is defined in (1.29), xb(t)∈R4 is the input, which is defined in (1.32), φ denotes the reaction rates, φ(⋅)∈R13, A=−w1(t)+w2(t)+w3(t)V, and B=w1(t)+w2(t)+w3(t)V.

The resulting steady-state values of anoxic and aerobic reactors are shown in Table 1.1. Used data are from 2003.

Table 1.1

Steady-state values of anoxic and aerobic reactors.

	S _S	X _BH	X _S	X _I	S _NH	S _I	S _ND	X _ND	S _O	X _BA	S _NO	X _P	S _alk
Anoxic	1.2518	3249	74.332	642.4	7.9157	38.374	0.7868	5.7073	0.0001	220.86	3.9377	822.19	4.9261
Aerobic	0.6867	3244.8	47.392	643.36	0.1896	38.374	0.6109	3.7642	1.4988	222.39	12.819	825.79	3.7399

As discussed above, hierarchical dynamic neural networks are suitable for modeling this process. Each neural network corresponds to a reaction rate in a reactor, i.e.,

z ˙ 1 = A 1 z 1 + G 1 σ 1 ( z 1 ) + H 1 x u , z ˙ 2 = A 2 z 2 + G 2 σ 2 ( z 2 ) + H 2 ϕ 2 ( z 2 ) z 1 .

The design choices are as follows: both neural networks have one hidden layer, each hidden layer has 50 hidden nodes. The training algorithms of each neural model are (1.16); here the activation function ϕi(⋅)=tanh⁡(x)=ex−e−xex+e−x, η0=1, and the initial weights of W(1) and V(1) are random numbers between [0,1].

Dynamic modeling uses the steady-state values resulting from steady-state simulations as initial values with a hydraulic residence time of 10.8 h and a sludge age of 15 days. A total of 100 input/output data pairs from the records of 2003 are used as the training data, the other 30 input/output pairs as the testing data. The testing results of effluent COD are shown in Fig. 1.3.

Fig. 1.3 The results using hierarchical dynamic neural networks.

We compare the hierarchical neural networks (HNNs) with the other three modeling methods, i.e., ASMs [4], linear models (LMs) [7], and neural networks (NNs) [6]. The model parameters of the ASM are the default values in [4]. The numbers of concerned variables in linear models are selected 2, 3, and 4. The hidden nodes of NNs are chosen as 30, 50, and 70; they are the same as those in the HNN. The initial values for all weights are chosen randomly from the interval (0,1). The comparison results are shown in Table 1.2, where the root mean square (RMS) of the HNN refers to the summation of errors in the final output.

Table 1.2

Comparison results of an activated sludge model (ASM), linear model (LM), neural network (NN), and hierarchical neural network (HNN). RMS, root mean square.

Network	ASM	LM		NN			HNN
Case	–	2	4	30	50	70	10	70
Parameters	19	3	5	60	100	140	140	980
Training	148	0.1	0.21	3.8	5.28	5.23	2.73	36.81
RMS	37	11.99	11.54	24.39	8.45	10.64	11.81	8.62

The wastewater treatment process suffers from external disturbances such as temperature, influent qualities, influent flow, operational status, and internal factors like microorganism activities. Both the NN and the HNN can achieve high modeling accuracy. The LM and the NN can provide the water quality in the aerobic reactor, while the ASM and the HNN give water qualities in both anoxic and aerobic reactors.

1.5 Conclusions

The main contribution of this chapter is that a new hierarchical model is proposed. This hierarchical dynamic neural network is effective for cascade process modeling. Two stable training algorithms are discussed for this model. A new estimation method for the internal variables of the cascade process is illustrated. Real data of a wastewater treatment plant are applied to illustrate the modeling approach.