Hierarchical Dynamic Neural Networks for Cascade System Modeling With Application to Wastewater Treatment
Wen Yu, DSc; Daniel Carrillo, DSc
Abstract
Many cascade processes, such as wastewater treatment, consist of complex nonlinear subsystems with many variables. The normal input–output relation represents only the input of the first block and the output of the last block of the cascade process.
In order to model the whole process, we use hierarchical dynamic neural networks to identify the cascade process, and the internal variables of the cascade process are estimated. Two stable learning algorithms and their theoretical analysis are given. Real operational data of a wastewater treatment plant illustrate this new neural modeling approach.
1.1 Introduction
The input–output relation within a cascade process is very complex. It can usually be described by several nonlinear subsystems, as is the case for the cascade process of wastewater treatment. Clearly, the first block and the last block alone cannot represent the whole process. Hierarchical models can be used for this problem. When the cascade process is unknown, only the input and output data are available, so black-box modeling techniques are needed. Also, the internal variables of the cascade process need to be estimated.
There are three different approaches that can be used to model a cascade process. If the input/output data of each subblock are available, each model can be identified independently. If the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it [3,6,19]. Another method is to use hierarchical models to identify cascade processes. The advantages of this approach are that the cascade information is used for identification and that the internal variables can be estimated. In [2], discrete-time feedforward neural networks are applied to approximate the uncertain parts of the cascade system.
Neural networks can approximate any nonlinear function to any prescribed accuracy provided a sufficient number of hidden neurons is incorporated. Hierarchical neural models consisting of a number of low-dimensional neural systems have been presented in [9] and [11] in order to avoid the dimension explosion problem. The main applications of hierarchical models are fuzzy systems, because the rule explosion problem can be avoided in hierarchical systems [9]; examples include hierarchical fuzzy neural networks [17], hierarchical fuzzy systems [12], and hierarchical fuzzy cerebellar model articulation controller (CMAC) networks [15]. Sensitivity analysis of the hierarchical fuzzy model was given in [12]. A statistical learning method was employed to construct hierarchical models in [3]. Based on Kolmogorov's theorem, [18] showed that any continuous function can be represented as a superposition of functions with a natural hierarchical structure. In [16], fuzzy CMAC networks are formed into a hierarchical structure.
The normal training method of hierarchical neural systems is still gradient descent. The key to training hierarchical neural models is to obtain an explicit expression for each internal error. Normal identification algorithms (gradient descent, least squares, etc.) are stable under ideal conditions, but they might become unstable in the presence of unmodeled dynamics. The Lyapunov approach can be used directly to obtain robust training algorithms for continuous-time and discrete-time neural networks. Using passivity theory, [5], [8], and [14] proved that gradient descent algorithms for continuous-time dynamic neural networks are stable and robust to any bounded uncertainties.
The main problem in training a hierarchical neural model is the estimation of the internal variable. In this chapter, a hierarchical dynamic neural network is applied to model wastewater treatment. Two stable training algorithms are proposed. A novel approximation method for the internal variable of the cascade process is discussed. Real application results show that the new modeling approach is effective for this cascade process.
1.2 Cascade Process Modeling Via Hierarchical Dynamic Neural Networks
Each subprocess of a cascade process, such as wastewater treatment, can be described using the following general nonlinear dynamic equation:
\dot{x} = f(x, u),
(1.1)
where x ∈ R^{n_c} is the inner state, u ∈ R^{m_c} is the input, and f is a vector function. Without loss of generality, we use two nonlinear affine systems to show how to use hierarchical dynamic neural networks to model the system; see Fig. 1.1. The identified cascade nonlinear systems are given by
\dot{x}_1 = f_1(x_1) + g_1(x_1) u, \qquad \dot{x}_2 = f_2(x_2) + g_2(x_2) x_1,
(1.2)
where x1, x2 ∈ R^n are the inner states of the subsystems, and f1, f2, g1, and g2 are unknown vector functions; x1 can also be regarded as the output of subsystem 1 and the input of subsystem 2; u ∈ R is the input of the whole system and also the input of subsystem 1; x2 is the output of the whole system and also the output of subsystem 2.
Only u and x2 are available for the cascade process modeling. Since the internal variables are not measurable, a general method is to regard the whole process as one block and to use one model to identify it. Another method is to use hierarchical models to identify cascade processes. The advantages of this approach are that the cascade information is used for identification and that the internal variable can be estimated. In Section 1.3 we will show how to approximate it. In many wastewater treatment plants, x1 is sampled occasionally. We can use this real value to improve the modeling accuracy.
We construct the following hierarchical dynamic neural networks to model (1.2):

\dot{z}_1 = A_1 z_1 + W_1 \sigma(z_1) + V_1 \phi(z_1) u, \qquad \dot{z}_2 = A_2 z_2 + W_2 \sigma(z_2) + V_2 \phi(z_2) z_1,
(1.3)
where z1, z2 ∈ R^n are the states of the neural models, W1, W2, V1, and V2 are the weights of the neural networks, and A1 and A2 are known stable matrices. The activation functions σ(·) and ϕ(·) are sigmoid functions.
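To make the structure of (1.3) concrete, the following minimal sketch (in Python) evaluates the right-hand sides of the two cascaded neural blocks. The exact sigmoid form, the (n, n) matrix shapes, and the treatment of ϕ(z2) as a diagonal matrix acting on z1 are assumptions for illustration, not specifications from the chapter.

```python
import numpy as np

def sigmoid(x):
    # Elementwise sigmoid; the chapter only requires sigma(.) and phi(.)
    # to be sigmoidal, so this exact form is an assumption.
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_dnn_rhs(z1, z2, u, A1, A2, W1, V1, W2, V2):
    """Right-hand sides of the two cascaded dynamic neural blocks (1.3).

    z1, z2 : (n,) states of the neural sub-models
    u      : scalar input of the whole process
    A1, A2 : (n, n) known stable matrices
    W1..V2 : (n, n) adjustable weight matrices
    """
    # Block 1 is driven by the external input u.
    dz1 = A1 @ z1 + W1 @ sigmoid(z1) + V1 @ sigmoid(z1) * u
    # Block 2 is driven by the state of block 1 (the cascade coupling);
    # phi(z2) is treated as a diagonal matrix of sigmoids here.
    dz2 = A2 @ z2 + W2 @ sigmoid(z2) + V2 @ (sigmoid(z2) * z1)
    return dz1, dz2
```

A simple forward (Euler) integration of these two equations produces the model trajectories z1 and z2 used below.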
Generally, the hierarchical dynamic neural networks (1.3) cannot follow the nonlinear system (1.2) exactly. The nonlinear cascade system may be written as

\dot{x}_1 = A_1 x_1 + W_1^{*} \sigma(x_1) + V_1^{*} \phi(x_1) u + \tilde{f}_1, \qquad \dot{x}_2 = A_2 x_2 + W_2^{*} \sigma(x_2) + V_2^{*} \phi(x_2) x_1 + \tilde{f}_2,
(1.6)

where W1*, W2*, V1*, and V2* are unknown optimal weights, and f̃1 and f̃2 are modeling errors and disturbances. Since the state and output variables are physically bounded, the modeling errors f̃1 and f̃2 can be assumed to be bounded too. The upper bounds of the modeling errors are
\|\tilde{f}_1\|^2_{\Lambda_{f1}} \leqslant \bar{\eta}_{f1} < \infty, \qquad \|\tilde{f}_2\|^2_{\Lambda_{f2}} \leqslant \bar{\eta}_{f2} < \infty,
(1.8)
where η̄f1 and η̄f2 are known positive constants, and Λf1 and Λf2 are any positive definite matrices.
Because the sigmoid functions are Lipschitz, the differences σ(zi) − σ(xi) and ϕ(zi) − ϕ(xi) admit quadratic bounds in the identification errors, where Λw, Λv, Dσ1, Dϕ1, Dσ2, and Dϕ2 are positive definite matrices.
1.3 Stable Training of the Hierarchical Dynamic Neural Networks
In order to obtain a stable training algorithm for the hierarchical dynamic neural networks (1.3), we calculate the error dynamics of the submodels. Let Δ1 = z1 − x1 and Δ2 = z2 − x2 denote the identification errors. From (1.3) and (1.6), we have

\dot{\Delta}_1 = A_1 \Delta_1 + \tilde{W}_1 \sigma(z_1) + \tilde{V}_1 \phi(z_1) u + W_1^{*}[\sigma(z_1) - \sigma(x_1)] + V_1^{*}[\phi(z_1) - \phi(x_1)] u - \tilde{f}_1,
\dot{\Delta}_2 = A_2 \Delta_2 + \tilde{W}_2 \sigma(z_2) + \tilde{V}_2 \phi(z_2) z_1 + W_2^{*}[\sigma(z_2) - \sigma(x_2)] + V_2^{*}[\phi(z_2) z_1 - \phi(x_2) x_1] - \tilde{f}_2,
(1.11)

where W̃i = Wi − Wi* and Ṽi = Vi − Vi* are the weight errors.
If the outputs of all blocks are available, we can train each block independently via the modeling errors Δ1 and Δ2 between the neural models and the corresponding process blocks. Let us define
R_1 = \bar{W}_1 + \bar{V}_1, \qquad Q_1 = D_{\sigma 1} + \|u\| D_{\phi 1} + Q_{10},
(1.12)
where the matrices A1 and Q10 are selected to fulfill the following conditions:
(1) the pair (A1, R1^{1/2}) is controllable and the pair (Q1^{1/2}, A1) is observable;
(2) the local frequency condition of [1] is satisfied.
A1: There exist a stable matrix A1 and a strictly positive definite matrix Q10 such that the matrix Riccati equation
A_1^T P_1 + P_1 A_1 + P_1 R_1 P_1 + Q_1 = 0
(1.14)
has a positive solution P1=P1T>0.
This condition is easily fulfilled if we select A1 as a stable diagonal matrix. The next theorem states the learning procedure of a neuroidentifier. Similarly, there exist a stable matrix A2 and a strictly positive definite matrix Q20 such that the matrix Riccati equation
A_2^T P_2 + P_2 A_2 + P_2 R_2 P_2 + Q_2 = 0
(1.15)
has a positive solution P2 = P2^T > 0, where R_2 = \bar{W}_2 + \bar{V}_2 and Q_2 = D_{\sigma 2} + \|z_1\| D_{\phi 2} + Q_{20}.
First, we may choose A1 and Q1 such that the Riccati equation (1.14) has a positive solution P1. Then Λf1 may be found according to condition (1.8). Since (1.7) holds for any positive definite matrix, (1.8) can be satisfied if Λf1 is selected as a small enough constant matrix. The condition (1.8) has no effect on the network dynamics (1.3) or its training (1.16).
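Assumption A1 can be checked numerically. The sketch below uses a small Newton iteration, each step being a Lyapunov equation solved with SciPy, to solve a Riccati equation of the form (1.14). The iteration scheme is not from the chapter; it is one standard way to handle equations of this type, and it assumes A1 is stable and R1, Q1 are small enough for a positive definite solution to exist.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def solve_riccati(A, R, Q, iters=50, tol=1e-10):
    """Newton iteration for  A^T P + P A + P R P + Q = 0.

    Starting from P = 0, each step solves the Lyapunov equation
    (A + R P)^T H + H (A + R P) = -(A^T P + P A + P R P + Q)
    and updates P <- P + H.
    """
    P = np.zeros_like(A)
    for _ in range(iters):
        F = A.T @ P + P @ A + P @ R @ P + Q      # residual of (1.14)
        if np.linalg.norm(F) < tol:
            break
        Acl = A + R @ P
        # solve_continuous_lyapunov(a, q) solves a X + X a^T = q
        H = solve_continuous_lyapunov(Acl.T, -F)
        P = P + H
    return P

# Example with a stable diagonal A1, as suggested in the text;
# the sizes and magnitudes below are assumed values.
n = 3
A1 = -2.0 * np.eye(n)
R1 = 0.1 * np.eye(n)    # R1 = W1_bar + V1_bar (assumed small)
Q1 = np.eye(n)
P1 = solve_riccati(A1, R1, Q1)
print(np.all(np.linalg.eigvalsh(P1) > 0))   # P1 should be positive definite
```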
Theorem 1.1
If Assumption A1 is satisfied and the weights W1,t and W2,t are updated as

\dot{W}_1 = -K_1 P_1 \Delta_1 \sigma^{T}(z_1), \quad \dot{V}_1 = -K_2 P_1 \Delta_1 [\phi(z_1) u]^{T}, \quad \dot{W}_2 = -K_3 P_2 \Delta_2 \sigma^{T}(z_2), \quad \dot{V}_2 = -K_4 P_2 \Delta_2 [\phi(z_2) z_1]^{T},
(1.16)

where P1 and P2 are the solutions of the Riccati equations (1.14) and (1.15), then the identification error dynamics (1.11) is strictly passive from the modeling errors f̃1 and f̃2 to the identification errors 2P1Δ1 and 2P2Δ2, and the updating law (1.16) makes the identification procedure stable.
Proof
From the passivity definition, if we define the inputs as f̃1 and f̃2 and the outputs as 2P1Δ1 and 2P2Δ2, then the system is strictly passive with Δ1^T Q10 Δ1 ⩾ 0 and Δ2^T Q20 Δ2 ⩾ 0.
The time derivative of the storage function then satisfies

\dot{S}_t \leqslant -\alpha(\|\Delta_1\|) - \alpha(\|\Delta_2\|) + \beta(\|\tilde{f}_1\|) + \beta(\|\tilde{f}_2\|),

where α(‖Δ1‖) = [λmin(Q10) − λmax(P1Λf1P1)]‖Δ1‖, β(‖f̃1‖) = λmax(Λf1^{-1})‖f̃1‖, α(‖Δ2‖) = [λmin(Q20) − λmax(P2Λf2P2)]‖Δ2‖, and β(‖f̃2‖) = λmax(Λf2^{-1})‖f̃2‖. We can select positive definite matrices Λf1 and Λf2 such that (1.8) holds. So α(‖Δ1‖), α(‖Δ2‖), β(‖f̃1‖), and β(‖f̃2‖) are K∞ functions and St is an input-to-state stable (ISS) Lyapunov function. The dynamics of the identification error (1.11) is therefore ISS. So when the modeling errors f̃1 and f̃2 are bounded, the updating law (1.16) keeps the modeling errors stable, i.e.,
Δ1∈L∞,Δ2∈L∞.
□
Since the updating rates in (1.16) are KiPj, and Ki can be selected as any positive definite matrix, the learning speed of the dynamic neural network is not constrained by the particular solution of the Riccati equation (1.14).
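In discrete time, one Euler step of the law (1.16) can be sketched as follows; the gradient-type form, the matrix shapes, and the scalar input u are assumptions carried over from the reconstruction of (1.3) above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_weights(W1, V1, W2, V2, P1, P2, d1, d2, z1, z2, u,
                   K1, K2, K3, K4, dt=1e-3):
    """One Euler step of the updating law (1.16).

    d1 = z1 - x1 and d2 = z2 - x2 are the identification errors;
    K1..K4 are positive definite gain matrices, so the effective
    learning rates are Ki @ Pj, as noted in the text.
    """
    s1, s2 = sigmoid(z1), sigmoid(z2)
    W1 = W1 - dt * (K1 @ P1 @ np.outer(d1, s1))
    V1 = V1 - dt * (K2 @ P1 @ np.outer(d1, s1 * u))   # phi(z1) u term
    W2 = W2 - dt * (K3 @ P2 @ np.outer(d2, s2))
    V2 = V2 - dt * (K4 @ P2 @ np.outer(d2, s2 * z1))  # phi(z2) z1 term
    return W1, V1, W2, V2
```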
Theorem 1.2
If the modeling errors f̃1 = f̃2 = 0 (only parameter uncertainties are present), then the updating law (1.16) makes the identification error asymptotically stable, i.e.,
\lim_{t \to \infty} \Delta_1 = 0, \qquad \lim_{t \to \infty} \Delta_2 = 0.
(1.22)
Proof
Since the modeling errors satisfy f̃1 = f̃2 = 0, we have 2Δ1^T P1 f̃1 = 2Δ2^T P2 f̃2 = 0, and the storage function (1.20) satisfies
\dot{S} \leqslant -\Delta_1^T Q_{10} \Delta_1 - \Delta_2^T Q_{20} \Delta_2 \leqslant 0.
The positive definiteness of S(xt) implies that Δ1, Δ2, and the weights are bounded. From the error equation (1.11), Δ̇1 ∈ L∞ and Δ̇2 ∈ L∞. Integrating (1.20) on both sides, we obtain

\int_0^\infty \left( \|\Delta_1\|^2_{Q_{10}} + \|\Delta_2\|^2_{Q_{20}} \right) dt \leqslant S_0 - S_\infty < \infty.

So Δ1 ∈ L2 ∩ L∞ and Δ2 ∈ L2 ∩ L∞. Since u, σ, ϕ, and P are bounded, Δ̇1 and Δ̇2 are bounded, and by Barbalat's lemma we obtain (1.22), i.e., limt→∞ Δ1 = 0 and limt→∞ Δ2 = 0. □
For many cascade systems the outputs of the internal blocks are not measurable; for example, Δ1 is unknown. The modeling error of the final block, Δ2, should therefore be propagated back to the other blocks, i.e., we should calculate the internal modeling error Δ1 from Δ2.
From (1.3) and (1.6), the last block can be written as
In order to ensure that Δ̂1 is bounded, we use (1.24) when ‖V2ϕ2(z2)‖ > τ; otherwise we use Δ2 to represent Δ1, i.e., Δ̂1 = Δ2.
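The switching rule can be sketched as below. Since (1.24) itself is not reproduced above, the model-based branch is written here as a hypothetical pseudoinverse of V2ϕ2(z2) applied to a residual of the second block's error dynamics; only the threshold logic and the fallback Δ̂1 = Δ2 are taken from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def estimate_internal_error(delta2, V2, z2, residual, tau=1e-3):
    """Estimate the unmeasurable internal error Delta_1.

    residual : the part of the second block's error dynamics attributed
               to Delta_1 (hypothetical stand-in for the terms in (1.24)).
    """
    G = V2 @ np.diag(sigmoid(z2))   # gain V2 phi2(z2) from Delta_1 into block 2
    if np.linalg.norm(G) > tau:     # (1.24) is well conditioned: invert it
        return np.linalg.pinv(G) @ residual
    return delta2                   # otherwise take Delta1_hat = Delta2
```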
Although the gradient algorithm (1.16) can ensure that the modeling errors Δ1 and Δ2 are bounded (Theorem 1.1), the structural uncertainties f̃1 and f̃2 can cause parameter drift under (1.16). Some robust modification should be applied to keep the parameters (weights) stable. In order to guarantee that the overall models are stable, we use the following dead-zone training algorithm.
Theorem 1.3
The weights are adjusted as follows:
(a) If ‖Δ1‖² > η̄f1/λmin(Q1) and ‖Δ2‖² > η̄f2/λmin(Q2), then the updating law is given by (1.16);
(b) if ‖Δ1‖² ⩽ η̄f1/λmin(Q1) or ‖Δ2‖² ⩽ η̄f2/λmin(Q2), then we stop the learning procedure (all right-hand sides of the corresponding system of differential equations are set to zero) and keep all weights constant. Then, besides the modeling errors being bounded, the weight matrices also remain bounded, and for any T > 0 the identification error fulfills the following tracking performance:

\lim_{T \to \infty} \frac{1}{T} \int_0^T \left( \|\Delta_1\|^2_{Q_1} + \|\Delta_2\|^2_{Q_2} \right) dt \leqslant \kappa_1 \bar{\eta}_{f1} + \kappa_2 \bar{\eta}_{f2},
(1.25)

where κ1 and κ2 are the condition numbers of Q1 and Q2, defined as κ1 = λmax(Q1)/λmin(Q1) and κ2 = λmax(Q2)/λmin(Q2).
Proof
(I) If ‖Δ1‖² > η̄f1/λmin(Q1) and ‖Δ2‖² > η̄f2/λmin(Q2), then using the updating law (1.16) we conclude that Ṡt < 0, so St is bounded. Integrating (1.26) from 0 to T yields
From (I) and (II), St is bounded because W1,0 = W1*, W2,0 = W2*, V1,0 = V1*, and V2,0 = V2*. From (1.27) and (1.28), (1.25) is obtained. The theorem is proved. □
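The dead-zone logic of Theorem 1.3 reduces to a simple switch in code; a minimal sketch, reusing the update step shown earlier, is:

```python
import numpy as np

def dead_zone_active(d1, d2, eta1, eta2, Q1, Q2):
    """Dead-zone switch of Theorem 1.3: return True while both
    identification errors exceed their noise-level thresholds,
    i.e., while case (a) applies and the weights may be updated."""
    lam1 = np.linalg.eigvalsh(Q1).min()   # lambda_min(Q1)
    lam2 = np.linalg.eigvalsh(Q2).min()   # lambda_min(Q2)
    return (d1 @ d1 > eta1 / lam1) and (d2 @ d2 > eta2 / lam2)
```

When the switch returns False, the weight update is simply skipped, which freezes all weights (case (b)) and prevents parameter drift.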
1.4 Modeling of Wastewater Treatment
The wastewater treatment plant studied in this chapter is an anoxic/oxic nitrogen removal process [10]. It consists of two biodegradation tanks and a secondary clarifier in series; see Fig. 1.2. Here, Qin, Qm, Qr, QR, and Qw denote the flow of wastewater to be treated, the flow of mixed influent, the flow of internal recycle, the flow of external recycle, and the flow of surplus sludge, respectively. Water quality indices such as chemical oxygen demand (COD), biochemical oxygen demand (BOD5), NH4-N (ammonia), nitrate, and suspended solids (SS) are decomposed into the components of the activated sludge models (ASMs) [4]. The state variable x is defined as

x = [S_I, S_S, X_I, X_S, X_P, X_{BH}, X_{BA}, S_{NO}, S_{NH}, S_O, S_{ND}, X_{ND}, S_{alk}]^T,
(1.29)
where SI is soluble inert matter, SS is readily biodegradable substrate, XI is suspended inert matter, XS is slowly biodegradable substrate, XP is suspended inert products, XBH is heterotrophic biomass, XBA is autotrophic biomass, SNO is nitrate, SNH is ammonia, SO is soluble oxygen, SND is soluble organic nitrogen, XND is suspended organic nitrogen, and Salk is alkalinity. In the biodegradation tanks there are three major reaction processes.
Here COD represents carbonous contamination and AS denotes activated sludge. Nitrate is recycled from the aerobic tank and is reduced by heterotrophic microorganisms in the denitrification phase by the following reaction:
2NO3−+2H+→N2+H2O+2.5O2.
(1.31)
The water quality index COD depends on the control input
w = [w_i] = [Q_{in}, Q_r, Q_R, Q_w]^T
(1.32)
and on the influent quality xin, i.e.,
COD = f(Q_{in}, Q_r, Q_R, Q_w, x_{in}).
COD is also affected by other external factors such as temperature, flow distribution, and toxins. It is very difficult to find the nonlinear function f(·) [13].
We know each biological reactor in wastewater treatment plants can be described by the following dynamic equation:
\dot{x}(t) = A x(t) + B x_b(t) + \varphi(x(t)),
(1.33)
where x ∈ R^{13} is the inner state, defined in (1.29); x_b(t) ∈ R^4 is the input, which is defined in (1.32); φ(·) ∈ R^{13} denotes the reaction rates; and A = −(w1(t) + w2(t) + w3(t))/V and B = (w1(t) + w2(t) + w3(t))/V, where V is the reactor volume.
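For illustration, the mass balance (1.33) can be integrated with a standard ODE solver; in the sketch below the influent state x_b, the flows, and the reaction rates φ(·) are placeholders, since the full ASM kinetics [4] are beyond this example.

```python
import numpy as np
from scipy.integrate import solve_ivp

def reactor_rhs(t, x, w, V, x_b, phi):
    """Mass balance (1.33) for one biological reactor:
    x_dot = A x + B x_b + phi(x), where A = -(w1+w2+w3)/V and
    B = (w1+w2+w3)/V act as scalar dilution rates."""
    q = (w[0] + w[1] + w[2]) / V          # total flow over reactor volume
    return -q * x + q * x_b + phi(x)

# Hypothetical run with zero reaction rates, just to exercise the balance.
w = np.array([100.0, 50.0, 50.0, 5.0])    # [Qin, Qr, QR, Qw] (assumed values)
x_b = np.ones(13)                         # placeholder influent composition
sol = solve_ivp(reactor_rhs, (0.0, 10.0), np.zeros(13),
                args=(w, 1000.0, x_b, lambda x: np.zeros(13)))
```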
The resulting steady-state values of the anoxic and aerobic reactors are shown in Table 1.1. The data used are from 2003.
Table 1.1
Steady-state values of anoxic and aerobic reactors.

Component   Anoxic    Aerobic
SS          1.2518    0.6867
XBH         3249      3244.8
XS          74.332    47.392
XI          642.4     643.36
SNH         7.9157    0.1896
SI          38.374    38.374
SND         0.7868    0.6109
XND         5.7073    3.7642
SO          0.0001    1.4988
XBA         220.86    222.39
SNO         3.9377    12.819
XP          822.19    825.79
Salk        4.9261    3.7399
As discussed above, hierarchical dynamic neural networks are suitable for modeling this process. Each neural network corresponds to a reaction rate in a reactor.
The design choices are as follows: both neural networks have one hidden layer, and each hidden layer has 50 hidden nodes. The training algorithm of each neural model is (1.16); the activation function is ϕi(x) = tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}), η0 = 1, and the initial weights W(1) and V(1) are random numbers in [0, 1].
Dynamic modeling uses the steady-state values from the steady-state simulations as initial values, with a hydraulic residence time of 10.8 h and a sludge age of 15 days. A total of 100 input/output data pairs from the records of 2003 are used as training data, and the other 30 input/output pairs as testing data. The testing results of the effluent COD are shown in Fig. 1.3.
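This setup can be sketched as follows; the (50, 50) weight shapes, the placeholder data arrays, and the split indexing are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                   # hidden nodes per sub-model

# Initial weights drawn uniformly from [0, 1], as stated in the text;
# the (n, n) shapes are an assumption.
W1, V1 = rng.uniform(0, 1, (n, n)), rng.uniform(0, 1, (n, n))
W2, V2 = rng.uniform(0, 1, (n, n)), rng.uniform(0, 1, (n, n))

phi = np.tanh                            # activation phi_i(x) = tanh(x)

# 130 recorded input/output pairs: 100 for training, 30 for testing.
# data_u, data_y stand in for the real plant records (placeholders).
data_u, data_y = rng.random(130), rng.random(130)
train_u, test_u = data_u[:100], data_u[100:]
train_y, test_y = data_y[:100], data_y[100:]

def rms(y_pred, y_true):
    # Root mean square error on the effluent COD, as reported in Table 1.2.
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))
```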
We compare the hierarchical neural network (HNN) with three other modeling methods: activated sludge models (ASMs) [4], linear models (LMs) [7], and neural networks (NNs) [6]. The model parameters of the ASM are the default values in [4]. The numbers of variables considered in the linear models are selected as 2, 3, and 4. The numbers of hidden nodes of the NNs are chosen as 30, 50, and 70, the same as in the HNN. The initial values of all weights are chosen randomly from the interval (0, 1). The comparison results are shown in Table 1.2, where the root mean square (RMS) error of the HNN refers to the summation of errors in the final output.
Table 1.2
Comparison results of an activated sludge model (ASM), linear model (LM), neural network (NN), and hierarchical neural network (HNN). RMS, root mean square.

Network      ASM    LM            NN                     HNN
Case         –      2      4      30     50     70       10     70
Parameters   19     3      5      60     100    140      140    980
Training     148    0.1    0.21   3.8    5.28   5.23     2.73   36.81
RMS          37     11.99  11.54  24.39  8.45   10.64    11.81  8.62
The wastewater treatment process is subject to external disturbances such as temperature, influent quality, influent flow, and operational status, as well as internal factors such as microorganism activity. Both the NN and the HNN achieve high modeling accuracy. The LM and the NN can only provide the water quality in the aerobic reactor, while the ASM and the HNN give the water qualities in both the anoxic and aerobic reactors.
1.5 Conclusions
The main contribution of this chapter is a new hierarchical model, the hierarchical dynamic neural network, which is effective for cascade process modeling. Two stable training algorithms are discussed for this model. A new estimation method for the internal variables of the cascade process is presented. Real data of a wastewater treatment plant illustrate the modeling approach.