Hierarchical Dynamic Neural Networks for Cascade System Modeling With Application to Wastewater Treatment

Wen Yu, DSc; Daniel Carrillo, DSc

Keywords

Hierarchical neural networks; Cascade system; Wastewater treatment

1.1 Introduction

The input–output relation within a cascade process is very complex. It can usually be described by several nonlinear subsystems connected in series, as is the case for the cascade process of wastewater treatment. Obviously, neither the first block nor the last block alone can represent the whole process, so hierarchical models can be used. When the cascade process is unknown, only the input and output data are available, and black-box modeling techniques are needed. In addition, the internal variables of the cascade process need to be estimated.

There are three different approaches that can be used to model a cascade process. If the input/output data of each subblock are available, each model can be identified independently. If the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it [6,3,19]. Another method is to use hierarchical models to identify cascade processes. The advantages of this approach are that the cascade information is used for identification and that the internal variable can be estimated. In [2], discrete-time feedforward neural networks are applied to approximate the uncertain parts of the cascade system.

Neural networks can approximate any nonlinear function to any prescribed accuracy provided a sufficient number of hidden neurons are incorporated. Hierarchical neural models consisting of a number of low-dimensional neural systems have been presented in [9] and [11] in order to avoid the dimension explosion problem. The main applications of hierarchical models are fuzzy systems, because the rule explosion problem can be avoided in hierarchical systems [9], for example, in the hierarchical fuzzy neural network [17], hierarchical fuzzy systems [12], and hierarchical fuzzy cerebellar model articulation controller (CMAC) networks [15]. A sensitivity analysis of the hierarchical fuzzy model was given in [12]. A statistical learning method was employed to construct hierarchical models in [3]. Based on Kolmogorov's theorem, [18] showed that any continuous function can be represented as a superposition of functions with a natural hierarchical structure. In [16], fuzzy CMAC networks are formed into a hierarchical structure.

The normal training method for hierarchical neural systems is still gradient descent. The key to training hierarchical neural models is to obtain an explicit expression for each internal error. Normal identification algorithms (gradient descent, least squares, etc.) are stable under ideal conditions, but they may become unstable in the presence of unmodeled dynamics. The Lyapunov approach can be used directly to obtain robust training algorithms for continuous-time and discrete-time neural networks. By using passivity theory, [5], [8], and [14] proved that gradient descent algorithms for continuous-time dynamic neural networks are stable and robust to any bounded uncertainties.

The main problem for the training of a hierarchical neural model is the estimation of the internal variable. In this chapter, a hierarchical dynamic neural network is applied to model wastewater treatment. Two stable training algorithms are proposed. A novel approximate method for the internal variable of the cascade process is discussed. Real application results show that the new modeling approach is effective for this cascade process.

1.2 Cascade Process Modeling Via Hierarchical Dynamic Neural Networks

Each subprocess of a cascade process, such as wastewater treatment, can be described using the following general nonlinear dynamic equation:

$$\dot{x} = f(x, u),$$

where $x \in \mathbb{R}^{n_c}$ is the inner state, $u \in \mathbb{R}^{m_c}$ is the input, and $f$ is a vector function. Without loss of generality, we use two nonlinear affine systems to show how to use the hierarchical dynamic neural networks to model the system; see Fig. 1.1. The identified cascade nonlinear systems are given by

$$\dot{x}_1 = f_1(x_1) + g_1(x_1)u, \qquad \dot{x}_2 = f_2(x_2) + g_2(x_2)x_1,$$

where $x_1, x_2 \in \mathbb{R}^{n}$ are the inner states of the subsystems and $f_1$, $f_2$, $g_1$, and $g_2$ are unknown vector functions; $x_1$ can also be regarded as the output of subsystem 1 and as the input of subsystem 2; $u \in \mathbb{R}$ is the input of the whole system and also the input of subsystem 1; $x_2$ is the output of the whole system and also the output of subsystem 2.

Fig. 1.1 Cascade system modeling via hierarchical recurrent neural networks.

Only $u$ and $x_2$ are available for the cascade process modeling. Since the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it. Another method is to use hierarchical models to identify cascade processes. The advantages of this approach are that the cascade information is used for identification and that the internal variable can be estimated. In Section 1.3 we will show how to approximate it. In many wastewater treatment plants, $x_1$ is sampled occasionally. We can use this real value to improve the modeling accuracy.

We construct the following hierarchical dynamic neural networks to model (1.2):

$$\dot{z}_1 = A_1 z_1 + W_1 \sigma_1(z_1) + V_1 \phi_1(z_1)u, \qquad \dot{z}_2 = A_2 z_2 + W_2 \sigma_2(z_2) + V_2 \phi_2(z_2)z_1,$$

where $z_1, z_2 \in \mathbb{R}^{n}$ are the states of the neural models, $W_1$, $W_2$, $V_1$, and $V_2$ are the weights of the neural networks, and $A_1$ and $A_2$ are known stable matrices. The activation functions $\sigma(\cdot)$ and $\phi(\cdot)$ are chosen as sigmoid functions, i.e.,

$$\sigma(z) = \frac{a}{1 + e^{-bz}} - c.$$
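As an illustration, the following is a minimal NumPy sketch of how the model (1.3) could be simulated with an explicit Euler step. The function and variable names, the step size, and the choice of a diagonal $\phi_2(\cdot)$ built from the sigmoid are assumptions made for this sketch, not part of the original formulation.

```python
import numpy as np

def sigmoid(z, a=1.0, b=1.0, c=0.0):
    """Elementwise sigmoid sigma(z) = a / (1 + exp(-b z)) - c."""
    return a / (1.0 + np.exp(-b * z)) - c

def hierarchical_step(z1, z2, u, A1, W1, V1, A2, W2, V2, dt=1e-2):
    """One Euler step of the hierarchical dynamic neural model (1.3).

    z1, z2 : states of the two sub-models (n-vectors)
    u      : scalar input of the whole process
    A1, A2 : known stable matrices; W1, V1, W2, V2 : weight matrices
    """
    # Subsystem 1 is driven by the external input u.
    z1_dot = A1 @ z1 + W1 @ sigmoid(z1) + (V1 @ sigmoid(z1)) * u
    # Subsystem 2 is driven by the state of subsystem 1 (the cascade link);
    # phi_2(z2) is taken here as a diagonal matrix built from sigmoid(z2),
    # so V2 @ (sigmoid(z2) * z1) equals V2 @ diag(sigmoid(z2)) @ z1.
    z2_dot = A2 @ z2 + W2 @ sigmoid(z2) + V2 @ (sigmoid(z2) * z1)
    return z1 + dt * z1_dot, z2 + dt * z2_dot
```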

From Fig. 1.1, the modeling error is

$$\Delta_2 = z_2 - x_2.$$

The internal modeling error is

$$\Delta_1 = z_1 - x_1.$$

This error will be estimated in Section 1.3.

Generally, the hierarchical dynamic neural networks (1.3) cannot follow the nonlinear system (1.2) exactly. The nonlinear cascade system may be written as

$$\dot{x}_1 = A_1 x_1 + W_1^{*}\sigma_1(x_1) + V_1^{*}\phi_1(x_1)u - \tilde{f}_1, \qquad \dot{x}_2 = A_2 x_2 + W_2^{*}\sigma_2(x_2) + V_2^{*}\phi_2(x_2)z_1 - \tilde{f}_2,$$

where $W_1^{*}$, $W_2^{*}$, $V_1^{*}$, and $V_2^{*}$ are unknown bounded matrices. We assume that the upper bounds $\bar{W}_1$, $\bar{W}_2$, $\bar{V}_1$, and $\bar{V}_2$ are known, i.e.,

$$W_1^{*}\Lambda_w^{-1}W_1^{*T} \le \bar{W}_1, \quad W_2^{*}\Lambda_w^{-1}W_2^{*T} \le \bar{W}_2, \quad \Lambda_w = \Lambda_w^{T} > 0,$$
$$V_1^{*}\Lambda_v^{-1}V_1^{*T} \le \bar{V}_1, \quad V_2^{*}\Lambda_v^{-1}V_2^{*T} \le \bar{V}_2, \quad \Lambda_v = \Lambda_v^{T} > 0.$$

In (1.6), $\tilde{f}_1$ and $\tilde{f}_2$ are modeling errors and disturbances. Since the state and output variables are physically bounded, the modeling errors $\tilde{f}_1$ and $\tilde{f}_2$ can be assumed to be bounded too. The upper bounds of the modeling errors are

$$\|\tilde{f}_1\|_{\Lambda_{f1}}^{2} \le \bar{\eta}_{f1}, \qquad \|\tilde{f}_2\|_{\Lambda_{f2}}^{2} \le \bar{\eta}_{f2},$$

where $\bar{\eta}_{f1}$ and $\bar{\eta}_{f2}$ are known positive constants, $\Lambda_{f1}$ and $\Lambda_{f2}$ are any positive definite matrices, and $\|x\|_{\Lambda}^{2} = x^{T}\Lambda x$.

Now we calculate the following errors:

$$W_1\sigma_1(z_1) - W_1^{*}\sigma_1(x_1) = \left[W_1\sigma_1(z_1) - W_1^{*}\sigma_1(z_1)\right] + \left[W_1^{*}\sigma_1(z_1) - W_1^{*}\sigma_1(x_1)\right] = \tilde{W}_1\sigma_1(z_1) + W_1^{*}\left[\sigma_1(z_1) - \sigma_1(x_1)\right] = \tilde{W}_1\sigma_1(z_1) + W_1^{*}\tilde{\sigma}_1,$$

where $\tilde{W}_1 = W_1 - W_1^{*}$ and $\tilde{\sigma}_1 = \sigma_1(z_1) - \sigma_1(x_1)$. Similarly,

$$W_2\sigma_2(z_2) - W_2^{*}\sigma_2(x_2) = \tilde{W}_2\sigma_2(z_2) + W_2^{*}\tilde{\sigma}_2,$$
$$V_1\phi_1(z_1)u - V_1^{*}\phi_1(x_1)u = \tilde{V}_1\phi_1(z_1)u + V_1^{*}\tilde{\phi}_1 u,$$
$$V_2\phi_2(z_2)z_1 - V_2^{*}\phi_2(x_2)z_1 = \tilde{V}_2\phi_2(z_2)z_1 + V_2^{*}\tilde{\phi}_2 z_1.$$

Because $\sigma(\cdot)$ and $\phi(\cdot)$ are chosen as sigmoid functions, they satisfy the following Lipschitz property:

$$\tilde{\sigma}_1^{T}\Lambda_w\tilde{\sigma}_1 \le \Delta_1^{T}D_{\sigma 1}\Delta_1, \qquad \tilde{\phi}_1^{T}\Lambda_v\tilde{\phi}_1 \le \Delta_1^{T}D_{\phi 1}\Delta_1,$$
$$\tilde{\sigma}_2^{T}\Lambda_w\tilde{\sigma}_2 \le \Delta_2^{T}D_{\sigma 2}\Delta_2, \qquad \tilde{\phi}_2^{T}\Lambda_v\tilde{\phi}_2 \le \Delta_2^{T}D_{\phi 2}\Delta_2,$$

where $\Lambda_w$, $\Lambda_v$, $D_{\sigma 1}$, $D_{\phi 1}$, $D_{\sigma 2}$, and $D_{\phi 2}$ are positive definite matrices.

1.3 Stable Training of the Hierarchical Dynamic Neural Networks

In order to obtain a stable training algorithm for the hierarchical dynamic neural networks (1.3), we calculate the error dynamics of the submodels. From (1.3) and (1.6), we have

$$\dot{\Delta}_1 = A_1\Delta_1 + \tilde{W}_1\sigma_1(z_1) + \tilde{V}_1\phi_1(z_1)u + W_1^{*}\tilde{\sigma}_1 + V_1^{*}\tilde{\phi}_1 u + \tilde{f}_1,$$
$$\dot{\Delta}_2 = A_2\Delta_2 + \tilde{W}_2\sigma_2(z_2) + \tilde{V}_2\phi_2(z_2)z_1 + W_2^{*}\tilde{\sigma}_2 + V_2^{*}\tilde{\phi}_2 z_1 + \tilde{f}_2.$$

If the outputs of all blocks are available, we can train each block independently via the modeling errors $\Delta_1$ and $\Delta_2$ between the neural models and the corresponding process blocks. Let us define

$$R_1 = \bar{W}_1 + \bar{V}_1, \qquad Q_1 = D_{\sigma 1} + \|u\| D_{\phi 1} + Q_{10},$$

and the matrices $A_1$ and $Q_{10}$ are selected to fulfill the following conditions:

(1) the pair $(A_1, R_1^{1/2})$ is controllable and the pair $(Q_1^{1/2}, A_1)$ is observable;

(2) the local frequency condition [1] is satisfied, i.e.,

$$A_1^{T}R_1^{-1}A_1 - Q_1 \ge \frac{1}{4}\left[A_1^{T}R_1^{-1} - R_1^{-1}A_1\right]R_1\left[A_1^{T}R_1^{-1} - R_1^{-1}A_1\right]^{T}.$$

Under these conditions, the following assumption can be established.

A1: There exist a stable matrix $A_1$ and a strictly positive definite matrix $Q_{10}$ such that the matrix Riccati equation

$$A_1^{T}P_1 + P_1A_1 + P_1R_1P_1 + Q_1 = 0$$

has a positive definite solution $P_1 = P_1^{T} > 0$.

This condition is easily fulfilled if we select $A_1$ as a stable diagonal matrix. The next theorem states the learning procedure of the neuroidentifier. Similarly, there exist a stable matrix $A_2$ and a strictly positive definite matrix $Q_{20}$ such that the matrix Riccati equation

$$A_2^{T}P_2 + P_2A_2 + P_2R_2P_2 + Q_2 = 0$$

has a positive definite solution $P_2 = P_2^{T} > 0$, where $R_2 = \bar{W}_2 + \bar{V}_2$ and $Q_2 = D_{\sigma 2} + \|z_1\| D_{\phi 2} + Q_{20}$.

First, we may choose $A_1$ and $Q_1$ such that the Riccati equation (1.14) has a positive solution $P_1$. Then $\Lambda_{f1}$ may be found according to the condition (1.8). Since (1.7) holds for any positive definite matrix, (1.8) can be established if $\Lambda_{f1}$ is selected as a small enough constant matrix. The condition (1.8) has no effect on the network dynamics (1.3) or its training (1.16).
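Assumption A1 can be checked numerically for candidate matrices. The helper below merely evaluates the residual of the Riccati equations (1.14)-(1.15) for a trial solution; the numerical values are hypothetical and only illustrate that a stable diagonal $A_1$ can admit a positive definite solution.

```python
import numpy as np

def riccati_residual(A, P, R, Q):
    """Residual norm of A^T P + P A + P R P + Q = 0 (Eqs. (1.14)-(1.15))."""
    return np.linalg.norm(A.T @ P + P @ A + P @ R @ P + Q)

# Assumed example: stable diagonal A1 with R1 and Q1 chosen as scaled identities.
A1 = -2.0 * np.eye(3)
R1 = 0.5 * np.eye(3)                  # stands for R1 = W1_bar + V1_bar
Q1 = 1.0 * np.eye(3)                  # stands for Q1 = D_sigma1 + ||u|| D_phi1 + Q10
# With these choices P1 = p*I, where 0.5 p^2 - 4 p + 1 = 0; take the smaller root.
p = 4.0 - np.sqrt(14.0)
P1 = p * np.eye(3)
print(riccati_residual(A1, P1, R1, Q1))   # ~0, so a positive solution exists
```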

Theorem 1.1

If Assumption A1 is satisfied and the weights $W_{1,t}$, $V_{1,t}$, $W_{2,t}$, and $V_{2,t}$ are updated as

$$\frac{d}{dt}\big(\tilde{W}_2^{T}\big) = -K_{w2}P_2\Delta_2\sigma_2^{T}(z_2), \qquad \frac{d}{dt}\big(\tilde{V}_2^{T}\big) = -K_{v2}P_2\Delta_2\left[\phi_2(z_2)z_1\right]^{T},$$
$$\frac{d}{dt}\big(\tilde{W}_1^{T}\big) = -K_{w1}P_1\Delta_1\sigma_1^{T}(z_1), \qquad \frac{d}{dt}\big(\tilde{V}_1^{T}\big) = -K_{v1}P_1\Delta_1\left[\phi_1(z_1)u\right]^{T},$$

where $P_1$ and $P_2$ are the solutions of the Riccati equations (1.14) and (1.15), then the identification error dynamics (1.11) is strictly passive from the modeling errors $\tilde{f}_1$ and $\tilde{f}_2$ to the identification errors $2P_1\Delta_1$ and $2P_2\Delta_2$, and the updating law (1.16) makes the identification procedure stable.

Proof

Select a Lyapunov function (storage function) as

$$S = \Delta_1^{T}P_1\Delta_1 + \mathrm{tr}\big\{\tilde{W}_1^{T}K_{w1}^{-1}\tilde{W}_1\big\} + \mathrm{tr}\big\{\tilde{V}_1^{T}K_{v1}^{-1}\tilde{V}_1\big\} + \Delta_2^{T}P_2\Delta_2 + \mathrm{tr}\big\{\tilde{W}_2^{T}K_{w2}^{-1}\tilde{W}_2\big\} + \mathrm{tr}\big\{\tilde{V}_2^{T}K_{v2}^{-1}\tilde{V}_2\big\},$$

where $P_1, P_2 \in \mathbb{R}^{n\times n}$ are positive definite matrices. According to (1.11), the derivative is

$$\begin{aligned} \dot{S} ={}& \Delta_1^{T}\big(P_1A_1 + A_1^{T}P_1\big)\Delta_1 + 2\Delta_1^{T}P_1\tilde{W}_1\sigma_1(z_1) + 2\Delta_1^{T}P_1\tilde{V}_1\phi_1(z_1)u \\ &+ \Delta_2^{T}\big(P_2A_2 + A_2^{T}P_2\big)\Delta_2 + 2\Delta_2^{T}P_2\tilde{W}_2\sigma_2(z_2) + 2\Delta_2^{T}P_2\tilde{V}_2\phi_2(z_2)z_1 \\ &+ 2\Delta_1^{T}P_1\big(W_1^{*}\tilde{\sigma}_1 + V_1^{*}\tilde{\phi}_1 u + \tilde{f}_1\big) + 2\Delta_2^{T}P_2\big(W_2^{*}\tilde{\sigma}_2 + V_2^{*}\tilde{\phi}_2 z_1 + \tilde{f}_2\big) \\ &+ 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{W}_1^{T}\big)K_{w1}^{-1}\tilde{W}_1\Big\} + 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{W}_2^{T}\big)K_{w2}^{-1}\tilde{W}_2\Big\} + 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{V}_1^{T}\big)K_{v1}^{-1}\tilde{V}_1\Big\} + 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{V}_2^{T}\big)K_{v2}^{-1}\tilde{V}_2\Big\}. \end{aligned}$$

Since $2\Delta_1^{T}P_1W_1^{*}\tilde{\sigma}_1$ is a scalar, using (1.10) and the matrix inequality

$$X^{T}Y + \big(X^{T}Y\big)^{T} \le X^{T}\Lambda^{-1}X + Y^{T}\Lambda Y,$$

where $X, Y, \Lambda \in \mathbb{R}^{n\times k}$ are any matrices and $\Lambda$ is any positive definite matrix, we obtain

$$2\Delta_1^{T}P_1W_1^{*}\tilde{\sigma}_1 \le \Delta_1^{T}P_1W_1^{*}\Lambda_w^{-1}W_1^{*T}P_1\Delta_1 + \tilde{\sigma}_1^{T}\Lambda_w\tilde{\sigma}_1 \le \Delta_1^{T}\big(P_1\bar{W}_1P_1 + D_{\sigma 1}\big)\Delta_1.$$

Similarly,

$$2\Delta_2^{T}P_2W_2^{*}\tilde{\sigma}_2 \le \Delta_2^{T}\big(P_2\bar{W}_2P_2 + D_{\sigma 2}\big)\Delta_2,$$
$$2\Delta_1^{T}P_1V_1^{*}\tilde{\phi}_1 u \le \Delta_1^{T}\big(P_1\bar{V}_1P_1 + \|u\| D_{\phi 1}\big)\Delta_1,$$
$$2\Delta_2^{T}P_2V_2^{*}\tilde{\phi}_2 z_1 \le \Delta_2^{T}\big(P_2\bar{V}_2P_2 + \|z_1\| D_{\phi 2}\big)\Delta_2.$$

So we have

$$\begin{aligned} \dot{S} \le{}& \Delta_1^{T}\big[P_1A_1 + A_1^{T}P_1 + P_1(\bar{W}_1 + \bar{V}_1)P_1 + \big(D_{\sigma 1} + \|u\| D_{\phi 1} + Q_{10}\big)\big]\Delta_1 - \Delta_1^{T}Q_{10}\Delta_1 \\ &+ \Delta_2^{T}\big[P_2A_2 + A_2^{T}P_2 + P_2(\bar{W}_2 + \bar{V}_2)P_2 + \big(D_{\sigma 2} + \|z_1\| D_{\phi 2} + Q_{20}\big)\big]\Delta_2 - \Delta_2^{T}Q_{20}\Delta_2 \\ &+ 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{W}_2^{T}\big)K_{w2}^{-1}\tilde{W}_2\Big\} + 2\Delta_2^{T}P_2\tilde{W}_2\sigma_2(z_2) + 2\Delta_2^{T}P_2\tilde{f}_2 + 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{V}_2^{T}\big)K_{v2}^{-1}\tilde{V}_2\Big\} + 2\Delta_2^{T}P_2\tilde{V}_2\phi_2(z_2)z_1 \\ &+ 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{W}_1^{T}\big)K_{w1}^{-1}\tilde{W}_1\Big\} + 2\Delta_1^{T}P_1\tilde{W}_1\sigma_1(z_1) + 2\Delta_1^{T}P_1\tilde{f}_1 + 2\,\mathrm{tr}\Big\{\tfrac{d}{dt}\big(\tilde{V}_1^{T}\big)K_{v1}^{-1}\tilde{V}_1\Big\} + 2\Delta_1^{T}P_1\tilde{V}_1\phi_1(z_1)u. \end{aligned}$$

Using (1.14), (1.15), (1.16), and $\frac{d}{dt}\big(\tilde{W}_1^{T}\big) = \big(\frac{d}{dt}\tilde{W}_1\big)^{T}$, we obtain

$$\dot{S} \le -\Delta_1^{T}Q_{10}\Delta_1 + 2\Delta_1^{T}P_1\tilde{f}_1 - \Delta_2^{T}Q_{20}\Delta_2 + 2\Delta_2^{T}P_2\tilde{f}_2.$$

From the passivity definition, if we define the inputs as $\tilde{f}_1$ and $\tilde{f}_2$ and the outputs as $2P_1\Delta_1$ and $2P_2\Delta_2$, then the system is strictly passive with $\Delta_1^{T}Q_{10}\Delta_1 \ge 0$ and $\Delta_2^{T}Q_{20}\Delta_2 \ge 0$.

In view of the matrix inequality (1.18),

$$2\Delta_1^{T}P_1\tilde{f}_1 \le \Delta_1^{T}P_1\Lambda_{f1}P_1\Delta_1 + \tilde{f}_1^{T}\Lambda_{f1}^{-1}\tilde{f}_1, \qquad 2\Delta_2^{T}P_2\tilde{f}_2 \le \Delta_2^{T}P_2\Lambda_{f2}P_2\Delta_2 + \tilde{f}_2^{T}\Lambda_{f2}^{-1}\tilde{f}_2,$$

(1.20) can be represented as

$$\begin{aligned} \dot{S} \le{}& -\lambda_{\min}(Q_{10})\|\Delta_1\|^{2} - \lambda_{\min}(Q_{20})\|\Delta_2\|^{2} + \Delta_1^{T}P_1\Lambda_{f1}P_1\Delta_1 + \tilde{f}_1^{T}\Lambda_{f1}^{-1}\tilde{f}_1 + \Delta_2^{T}P_2\Lambda_{f2}P_2\Delta_2 + \tilde{f}_2^{T}\Lambda_{f2}^{-1}\tilde{f}_2 \\ \le{}& -\big(\alpha_{\Delta_1} - \beta_{\tilde{f}_1}\big) - \big(\alpha_{\Delta_2} - \beta_{\tilde{f}_2}\big), \end{aligned}$$

where $\alpha_{\Delta_1} = \big[\lambda_{\min}(Q_{10}) - \lambda_{\max}(P_1\Lambda_{f1}P_1)\big]\|\Delta_1\|^{2}$, $\beta_{\tilde{f}_1} = \lambda_{\max}\big(\Lambda_{f1}^{-1}\big)\|\tilde{f}_1\|^{2}$, $\alpha_{\Delta_2} = \big[\lambda_{\min}(Q_{20}) - \lambda_{\max}(P_2\Lambda_{f2}P_2)\big]\|\Delta_2\|^{2}$, and $\beta_{\tilde{f}_2} = \lambda_{\max}\big(\Lambda_{f2}^{-1}\big)\|\tilde{f}_2\|^{2}$. We can select positive definite matrices $\Lambda_{f1}$ and $\Lambda_{f2}$ such that (1.8) is established. Then $\alpha_{\Delta_1}$, $\alpha_{\Delta_2}$, $\beta_{\tilde{f}_1}$, and $\beta_{\tilde{f}_2}$ are $\mathcal{K}_{\infty}$ functions of $\|\Delta_1\|$, $\|\Delta_2\|$, $\|\tilde{f}_1\|$, and $\|\tilde{f}_2\|$, respectively, and $S$ is an input-to-state stable (ISS) Lyapunov function. Hence the identification error dynamics (1.11) is ISS, and when the modeling errors $\tilde{f}_1$ and $\tilde{f}_2$ are bounded, the updating law (1.16) makes the modeling errors stable, i.e.,

$$\Delta_1 \in L_{\infty}, \qquad \Delta_2 \in L_{\infty}.$$

 □

Since the updating rates in (1.16) are $K_iP_j$, and $K_i$ can be selected as any positive definite matrix, the learning process of the dynamic neural network does not depend on the exact solution of the Riccati equation (1.14).
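For completeness, the following is a rough discrete-time sketch of one step of the learning law (1.16). The Euler discretization, the elementwise sigmoid, the function name, and the sign convention (the weights move so as to reduce the errors) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W1, V1, W2, V2, delta1, delta2, z1, z2, u,
               P1, P2, Kw1, Kv1, Kw2, Kv2, dt=1e-2):
    """One discretized step of the updating law (1.16).

    delta1 : internal modeling error z1 - x1 (or its estimate from Section 1.3)
    delta2 : output modeling error z2 - x2
    P1, P2 : solutions of the Riccati equations (1.14) and (1.15)
    Kw*, Kv* : positive definite learning gains
    """
    s1, s2 = sigmoid(z1), sigmoid(z2)
    W1 = W1 - dt * (Kw1 @ P1 @ np.outer(delta1, s1))
    V1 = V1 - dt * (Kv1 @ P1 @ np.outer(delta1, s1 * u))     # phi1(z1) * u
    W2 = W2 - dt * (Kw2 @ P2 @ np.outer(delta2, s2))
    V2 = V2 - dt * (Kv2 @ P2 @ np.outer(delta2, s2 * z1))    # phi2(z2) * z1
    return W1, V1, W2, V2
```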

Theorem 1.2

If the modeling errors satisfy $\tilde{f}_1 = \tilde{f}_2 = 0$ (only parameter uncertainty is present), then the updating law (1.16) makes the identification errors asymptotically stable, i.e.,

$$\lim_{t\to\infty}\Delta_1 = 0, \qquad \lim_{t\to\infty}\Delta_2 = 0.$$

Proof

Since the modeling errors satisfy $\tilde{f}_1 = \tilde{f}_2 = 0$, we have $2\Delta_1^{T}P_1\tilde{f}_1 = 2\Delta_2^{T}P_2\tilde{f}_2 = 0$, and the storage function (1.20) satisfies

$$\dot{S} \le -\Delta_1^{T}Q_{10}\Delta_1 - \Delta_2^{T}Q_{20}\Delta_2 \le 0.$$

Since $S$ is positive definite, $\Delta_1$, $\Delta_2$, and the weights are bounded. From the error equation (1.11), $\dot{\Delta}_1 \in L_{\infty}$ and $\dot{\Delta}_2 \in L_{\infty}$. Integrating (1.20) on both sides, we obtain

$$\int_{0}^{\infty}\big(\|\Delta_1\|_{Q_{10}}^{2} + \|\Delta_2\|_{Q_{20}}^{2}\big)\,dt \le S_0 - S_{\infty} < \infty.$$

So $\Delta_1 \in L_2 \cap L_{\infty}$ and $\Delta_2 \in L_2 \cap L_{\infty}$, and using Barbalat's lemma we obtain (1.22). Since $u$, $\sigma$, $\phi$, and $P$ are bounded, $\lim_{t\to\infty}\Delta_1 = 0$ and $\lim_{t\to\infty}\Delta_2 = 0$. □

For many cascade systems the outputs of the internal blocks are not measurable, so the internal modeling error $\Delta_1$ is not available. The modeling error of the final block, $\Delta_2$, should therefore be propagated to the other blocks, i.e., we must calculate the internal modeling error $\Delta_1$ from $\Delta_2$.

From (1.3) and (1.6), the last block can be written as

$$\dot{z}_2 = A_2z_2 + W_2\sigma_2(z_2) + V_2\phi_2(z_2)z_1, \qquad \dot{x}_2 = A_2x_2 + W_2\sigma_2(x_2) + V_2\phi_2(z_2)x_1 - \tilde{f}_3,$$

where $\tilde{f}_3$ is the modeling error associated with the weights $W_2$ and $V_2$. So

$$\dot{\Delta}_2 = A_2\Delta_2 + W_2\tilde{\sigma}_2 + V_2\phi_2(z_2)\Delta_1 + \tilde{f}_3.$$

So $\Delta_1$ is approximated by

$$\Delta_1 = \big[V_2\phi_2(z_2)\big]^{-1}\big[\dot{\Delta}_2 - A_2\Delta_2 - W_2\tilde{\sigma}_2 - \tilde{f}_3\big].$$

From Fig. 1.1 we note that $\tilde{f}_3$ is the difference between the block “$f_2, g_2$” and the block “$A_2, W_2, V_2$.” So $\tilde{f}_3$ can be estimated as

$$\tilde{f}_3 \approx \dot{z}_2 - \dot{\Delta}_2.$$

Here

$$\dot{z}_2 = \frac{z_2(t) - z_2(t-\tau)}{\tau} + \delta(t), \qquad \tau > 0,$$

where $\delta(t)$ is the differential approximation error. So

$$\tilde{f}_3 \approx \hat{f}_3 = \frac{z_2(t) - z_2(t-\tau)}{\tau} - \frac{\Delta_2(t) - \Delta_2(t-\tau)}{\tau}.$$

The internal modeling error is approximated by

$$\hat{\Delta}_1 = \big[V_2\phi_2(z_2)\big]^{-1}\left[2\,\frac{\Delta_2(t) - \Delta_2(t-\tau)}{\tau} - \frac{z_2(t) - z_2(t-\tau)}{\tau} - A_2\Delta_2 - W_2\big[\sigma_2(z_2) - \sigma_2(x_2)\big]\right].$$

In order to ensure that $\hat{\Delta}_1$ is bounded, we use (1.24) when $\|V_2\phi_2(z_2)\| > \tau$. Otherwise, we use $\Delta_2$ to represent $\Delta_1$, i.e., $\hat{\Delta}_1 = \Delta_2$.
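A sketch of the internal-error estimator (1.24), including the finite-difference approximations and the fallback rule, might look as follows; the diagonal form of $\phi_2(\cdot)$, the Frobenius-norm threshold test, and the direct linear solve are assumptions of this sketch.

```python
import numpy as np

def estimate_internal_error(delta2, delta2_prev, z2, z2_prev, x2,
                            A2, W2, V2, tau):
    """Estimate Delta1 from the measurable output error Delta2 via (1.24)."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    M = V2 @ np.diag(sig(z2))                 # V2 * phi2(z2), phi2 assumed diagonal
    if np.linalg.norm(M) <= tau:              # fallback: use Delta2 in place of Delta1
        return delta2
    rhs = (2.0 * (delta2 - delta2_prev) / tau        # 2 * finite difference of Delta2
           - (z2 - z2_prev) / tau                    # finite difference of z2
           - A2 @ delta2
           - W2 @ (sig(z2) - sig(x2)))               # W2 * sigma_tilde_2
    return np.linalg.solve(M, rhs)
```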

Although the gradient algorithm (1.16) ensures that the modeling errors $\Delta_1$ and $\Delta_2$ are bounded (Theorem 1.1), the structural uncertainties $\tilde{f}_1$ and $\tilde{f}_2$ may cause parameter drift in the gradient algorithm (1.16). A robust modification should be applied to keep the parameters (weights) stable. In order to guarantee that the overall models are stable, we use the following dead-zone training algorithm.

Theorem 1.3

The weights are adjusted as follows:

(a) if $\|\Delta_1\|^{2} > \frac{\bar{\eta}_{f1}}{\lambda_{\min}(Q_1)}$ and $\|\Delta_2\|^{2} > \frac{\bar{\eta}_{f2}}{\lambda_{\min}(Q_2)}$, then the updating law is given by (1.16);

(b) if $\|\Delta_1\|^{2} \le \frac{\bar{\eta}_{f1}}{\lambda_{\min}(Q_1)}$ or $\|\Delta_2\|^{2} \le \frac{\bar{\eta}_{f2}}{\lambda_{\min}(Q_2)}$, then we stop the learning procedure (all right-hand sides of the corresponding system of differential equations are set equal to zero) and keep all weights constant; then, besides the modeling errors being bounded, the weight matrices also remain bounded, and for any $T > 0$ the identification error fulfills the following tracking performance:

$$\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\big(\|\Delta_1\|_{Q_1}^{2} + \|\Delta_2\|_{Q_2}^{2}\big)\,dt \le \kappa_1\bar{\eta}_{f1} + \kappa_2\bar{\eta}_{f2},$$

where $\kappa_1$ and $\kappa_2$ are the condition numbers of $Q_1$ and $Q_2$, defined as $\kappa_1 = \frac{\lambda_{\max}(Q_1)}{\lambda_{\min}(Q_1)}$ and $\kappa_2 = \frac{\lambda_{\max}(Q_2)}{\lambda_{\min}(Q_2)}$.
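The dead-zone rule of Theorem 1.3 amounts to a simple gate around the learning law (1.16), sketched below under the assumption that $\bar{\eta}_{f1}$ and $\bar{\eta}_{f2}$ are scalar bounds; the function names are hypothetical.

```python
import numpy as np

def dead_zone_gate(delta1, delta2, eta_f1, eta_f2, Q1, Q2):
    """Return True if training should continue (case (a)), False if the
    weights should be frozen (case (b)) according to Theorem 1.3."""
    lam1 = np.min(np.linalg.eigvalsh(Q1))     # lambda_min(Q1)
    lam2 = np.min(np.linalg.eigvalsh(Q2))     # lambda_min(Q2)
    return (delta1 @ delta1 > eta_f1 / lam1) and (delta2 @ delta2 > eta_f2 / lam2)

# Usage: only apply the updating law (1.16) while the gate is open.
# if dead_zone_gate(d1, d2, eta_f1, eta_f2, Q1, Q2):
#     W1, V1, W2, V2 = train_step(...)        # learning law (1.16)
# else:
#     pass                                    # keep all weights constant
```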

1.4 Modeling of Wastewater Treatment

The wastewater treatment plant studied in this chapter is an anoxic/oxic nitrogen removal process [10]. It consists of two biodegradation tanks and a secondary clarifier in series; see Fig. 1.2. Here, $Q_{in}$, $Q_m$, $Q_r$, $Q_R$, and $Q_w$ denote the flow of wastewater to be treated, the flow of mixed influent, the flow of internal recycle, the flow of external recycle, and the flow of surplus sludge, respectively. Water quality indices such as chemical oxygen demand (COD), biological oxygen demand (BOD5), NH4-N (ammonia), nitrate, and suspended solids (SS) are decomposed into the components of the activated sludge models (ASMs) [4]. The state variable $x$ is defined as

$$x = \big[S_I, S_S, X_I, X_S, X_P, X_{BH}, X_{BA}, S_{NO}, S_{NH}, S_O, S_{ND}, X_{ND}, S_{alk}\big]^{T},$$

where $S_I$ is soluble inert substrate, $S_S$ is readily biodegradable substrate, $X_I$ is suspended inert substrate, $X_S$ is slowly biodegradable substrate, $X_P$ is suspended inert products, $X_{BH}$ is heterotrophic biomass, $X_{BA}$ is autotrophic biomass, $S_{NO}$ is nitrate, $S_{NH}$ is ammonia, $S_O$ is dissolved oxygen, $S_{ND}$ is soluble organic nitrogen, $X_{ND}$ is suspended organic nitrogen, and $S_{alk}$ is alkalinity. In the aerobic tank, there are three major reaction processes, i.e.,

$$\mathrm{NH}_4^{+} + 1.5\,\mathrm{O}_2 \rightarrow \mathrm{NO}_2^{-} + \mathrm{H}_2\mathrm{O} + 2\,\mathrm{H}^{+}, \qquad \mathrm{NO}_2^{-} + 0.5\,\mathrm{O}_2 \rightarrow \mathrm{NO}_3^{-}, \qquad \mathrm{COD} + \mathrm{O}_2 \rightarrow \mathrm{CO}_2 + \mathrm{H}_2\mathrm{O} + \mathrm{AS},$$

where COD represents carbonaceous contamination and AS denotes activated sludge. Nitrate is recycled from the aerobic tank and is reduced by heterotrophic microorganisms in the denitrification phase via the following reaction:

$$2\,\mathrm{NO}_3^{-} + 2\,\mathrm{H}^{+} \rightarrow \mathrm{N}_2 + \mathrm{H}_2\mathrm{O} + 2.5\,\mathrm{O}_2.$$

The water quality index COD depends on the control input

$$w = [w_i] = \big[Q_{in}, Q_r, Q_R, Q_w\big]^{T}$$

and on the influent quality $x_{in}$, i.e.,

$$\mathrm{COD} = f\big(Q_{in}, Q_r, Q_R, Q_w, x_{in}\big).$$

COD is also affected by other external factors such as temperature, flow distribution, and toxins. It is very difficult to find the nonlinear function $f(\cdot)$ [13].

Fig. 1.2 Wastewater treatment process.

We know each biological reactor in wastewater treatment plants can be described by the following dynamic equation:

$$\dot{x}(t) = Ax(t) + Bx_b(t) + \varphi\big(x(t)\big),$$

where $x \in \mathbb{R}^{13}$ is the inner state defined in (1.29), $x_b(t) \in \mathbb{R}^{4}$ is the input defined in (1.32), $\varphi$ denotes the reaction rates, $\varphi(\cdot) \in \mathbb{R}^{13}$, $A = -\frac{w_1(t) + w_2(t) + w_3(t)}{V}$, and $B = \frac{w_1(t) + w_2(t) + w_3(t)}{V}$.
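A sketch of the mass balance of one biological reactor is given below. The negative sign of the dilution term in $A$, the identity structure, and the equal dimensions of $x$ and $x_b$ are simplifying assumptions of this sketch; the reaction rates are left as a user-supplied callable.

```python
import numpy as np

def reactor_rhs(x, x_b, w, V, phi):
    """Right-hand side x_dot = A x + B x_b + phi(x) of one biological reactor.

    x   : 13-dimensional ASM state (S_I, S_S, ..., S_alk)
    x_b : influent term entering the tank (same length as x in this sketch)
    w   : flow rates [Q_in, Q_r, Q_R, Q_w]; V : reactor volume (assumed notation)
    phi : callable returning the 13 ASM reaction rates at state x
    """
    dilution = (w[0] + w[1] + w[2]) / V       # (w1 + w2 + w3) / V
    A = -dilution * np.eye(len(x))            # assumed sign: outflow dilutes the state
    B = dilution * np.eye(len(x))
    return A @ x + B @ x_b + phi(x)
```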

The resulting steady-state values of the anoxic and aerobic reactors are shown in Table 1.1. The data used are from 2003.

Table 1.1

Steady-state values of anoxic and aerobic reactors.

| Reactor | $S_S$ | $X_{BH}$ | $X_S$ | $X_I$ | $S_{NH}$ | $S_I$ | $S_{ND}$ | $X_{ND}$ | $S_O$ | $X_{BA}$ | $S_{NO}$ | $X_P$ | $S_{alk}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anoxic | 1.2518 | 3249 | 74.332 | 642.4 | 7.9157 | 38.374 | 0.7868 | 5.7073 | 0.0001 | 220.86 | 3.9377 | 822.19 | 4.9261 |
| Aerobic | 0.6867 | 3244.8 | 47.392 | 643.36 | 0.1896 | 38.374 | 0.6109 | 3.7642 | 1.4988 | 222.39 | 12.819 | 825.79 | 3.7399 |


As discussed above, hierarchical dynamic neural networks are suitable for modeling this process. Each neural network corresponds to a reaction rate in a reactor, i.e.,

$$\dot{z}_1 = A_1z_1 + G_1\sigma_1(z_1) + H_1\phi_1(z_1)u, \qquad \dot{z}_2 = A_2z_2 + G_2\sigma_2(z_2) + H_2\phi_2(z_2)z_1.$$

The design choices are as follows: both neural networks have one hidden layer, and each hidden layer has 50 hidden nodes. The training algorithm of each neural model is (1.16); the activation function is $\phi_i(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, $\eta_0 = 1$, and the initial weights $W(1)$ and $V(1)$ are random numbers in $[0, 1]$.

Dynamic modeling uses the steady-state values resulting from the steady-state simulations as initial values, with a hydraulic residence time of 10.8 h and a sludge age of 15 days. A total of 100 input/output data pairs from the records of 2003 are used as the training data, and the remaining 30 input/output pairs are used as the testing data. The testing results for the effluent COD are shown in Fig. 1.3.
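To make the training setup concrete, a compact, self-contained sketch is given below. The state dimension, step size, gains, identity $P$ matrices, and the synthetic data arrays (which stand in for the 2003 plant records and the 50-node hidden layers) are all assumptions made for illustration.

```python
import numpy as np

n, dt = 13, 1e-2
rng = np.random.default_rng(0)
A1 = A2 = -5.0 * np.eye(n)                        # stable diagonal matrices
W1, V1, W2, V2 = (rng.uniform(0, 1, (n, n)) for _ in range(4))   # weights in [0, 1]
P1 = P2 = np.eye(n)                               # Riccati solutions, assumed precomputed
K = 0.1 * np.eye(n)                               # learning gains
z1 = np.zeros(n); z2 = np.zeros(n)

# Synthetic stand-ins for the 100 training pairs (input u, effluent state x2).
U_train = rng.uniform(0.0, 1.0, 100)
X2_train = rng.uniform(0.0, 1.0, (100, n))

for u, x2 in zip(U_train, X2_train):
    s1, s2 = np.tanh(z1), np.tanh(z2)
    z1 = z1 + dt * (A1 @ z1 + W1 @ s1 + (V1 @ s1) * u)       # first sub-model
    z2 = z2 + dt * (A2 @ z2 + W2 @ s2 + V2 @ (s2 * z1))      # second sub-model
    d2 = z2 - x2                                  # measurable output error
    d1 = d2                                       # internal error (fallback of Sec. 1.3)
    W1 -= dt * (K @ P1 @ np.outer(d1, s1)); V1 -= dt * (K @ P1 @ np.outer(d1, s1 * u))
    W2 -= dt * (K @ P2 @ np.outer(d2, s2)); V2 -= dt * (K @ P2 @ np.outer(d2, s2 * z1))
```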

Fig. 1.3 The results using hierarchical dynamic neural networks.

We compare the hierarchical neural network (HNN) with three other modeling methods, i.e., ASMs [4], linear models (LMs) [7], and neural networks (NNs) [6]. The model parameters of the ASM are the default values given in [4]. The numbers of variables considered in the linear models are selected as 2, 3, and 4. The numbers of hidden nodes of the NNs are chosen as 30, 50, and 70, the same as in the HNN. The initial values of all weights are chosen randomly from the interval $(0, 1)$. The comparison results are shown in Table 1.2, where the root mean square (RMS) error of the HNN refers to the summation of the errors in the final output.

Table 1.2

Comparison results of an activated sludge model (ASM), linear model (LM), neural network (NN), and hierarchical neural network (HNN). RMS, root mean square.

| | ASM | LM | LM | NN | NN | NN | HNN | HNN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Case | – | 2 | 4 | 30 | 50 | 70 | 10 | 70 |
| Parameters | 19 | 3 | 5 | 60 | 100 | 140 | 140 | 980 |
| Training | 148 | 0.1 | 0.21 | 3.8 | 5.28 | 5.23 | 2.73 | 36.81 |
| RMS | 37 | 11.99 | 11.54 | 24.39 | 8.45 | 10.64 | 11.81 | 8.62 |


The wastewater treatment process suffers from external disturbances such as temperature, influent quality, influent flow, and operational status, and from internal factors such as microorganism activity. Both the NN and the HNN achieve high modeling accuracy. The LM and the NN provide the water quality only in the aerobic reactor, while the ASM and the HNN give the water quality in both the anoxic and aerobic reactors.

1.5 Conclusions

The main contribution of this chapter is a new hierarchical dynamic neural network model that is effective for cascade process modeling. Two stable training algorithms are proposed for this model. A new estimation method for the internal variables of the cascade process is presented. Real data from a wastewater treatment plant are used to illustrate the modeling approach.