Stochastic Optimal Control of Unknown Linear Networked Control
System in the Presence of Random Delays and Packet Losses
Faculty Advisor: Dr. Jagannathan Sarangapani, ECE Department
Inverted pendulum state matrix: $A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -0.1818 & 2.6727 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -0.4545 & 31.1818 & 0 \end{bmatrix}$
Investigate the effects of delays and packet losses on the stability of
the NCS with unknown dynamics
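where $r(z_k,u_k) = z_k^T Q_z z_k + u_k^T R_z u_k$, $\min_{u_{k+1}} Q^i(z_{k+1},u_{k+1}) = [z_{k+1}^T \ \ u_{k+1}^{*T}]\, H^i\, [z_{k+1}^T \ \ u_{k+1}^{*T}]^T$, and the blocks of $H^i$ include $H^i_{zu} = A_z^T P_i B_z$ and $H^i_{uu} = R_z + B_z^T P_i B_z$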
4. Define the update law to tune the H matrix online in the least-squares sense
Networked control can reduce the installation costs and increase
productivity through the use of wireless communication technology
1) Vectorize the H matrix: $h = \mathrm{vec}(H)$
2) Update law: $h_{i+1} = \arg\min_{h_{i+1}} \int \left| h_{i+1}^T \bar{w}(z_k) - d(\bar{w}(z_k), H^i) \right|^2 dz_k$
1) Stability:
5. Develop the stochastic suboptimal control: $K_i = (R_z + B_z^T P_i B_z)^{-1} B_z^T P_i A_z = (H^i_{uu})^{-1} H^i_{uz}$, $u_i^*(z_k) = -K_i z_k$
6. Convergence: when $i \to \infty$, $Q^i(z_k,u_k) \to Q^*(z_k,u_k)$, $H^i \to H$, and $K_i \to K$ at the same time.
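The iteration in steps 2 to 6 can be illustrated on a scalar system. The sketch below is only an illustration under stated assumptions: the scalars `a`, `b`, `q`, `r` are hypothetical stand-ins for $A_z$, $B_z$, $Q_z$, $R_z$ and are treated as known here (the poster's method tunes H without knowing the dynamics), and the kernel update $P_{i+1} = H^i_{zz} - H^i_{zu}(H^i_{uu})^{-1}H^i_{uz}$ is the standard result of minimizing the quadratic Q-function over the input.

```python
# Scalar illustration of the Q-function iteration: H^i -> H and K_i -> K.
# Assumption: a, b, q, r are hypothetical known scalars used only to show
# the convergence behaviour; they are not values from the poster.

def h_blocks(p, a, b, q, r):
    """Return (H_zz, H_zu, H_uz, H_uu) built from the current kernel p."""
    return q + a * p * a, a * p * b, b * p * a, r + b * p * b


def q_iteration(a, b, q, r, iters=200):
    """Iterate from P_0 = 0 and return the final kernel and gain."""
    p = 0.0
    k = 0.0
    for _ in range(iters):
        h_zz, h_zu, h_uz, h_uu = h_blocks(p, a, b, q, r)
        k = h_uz / h_uu        # K_i = (H_uu)^-1 H_uz
        p = h_zz - h_zu * k    # P_{i+1} = H_zz - H_zu (H_uu)^-1 H_uz
    return p, k


p_star, k_star = q_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
print(p_star, k_star)
```

With these hypothetical values the open-loop system is unstable (`a = 1.2`), while the closed loop `a - b * k_star` has magnitude below one once the iteration has converged.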
Approximate dynamic programming (ADP) techniques aim to solve
optimal control problems of complex systems in a forward-in-time
manner, without knowledge of the system dynamics.
2. Set up the stochastic Q-function: $Q(z_k,u_k) = E\{ r(z_k,u_k) + J_{k+1} \} = [z_k^T \ \ u_k^T]\, H_k\, [z_k^T \ \ u_k^T]^T$
3. Use the adaptive estimator to represent the Q-function: $Q(z_k,u_k) = w_k^T H_k w_k = h_k^T \bar{w}_k$
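The identity between the quadratic form and its vectorized form in step 3 can be checked numerically. The sketch below is a minimal example under stated assumptions: `H` and `w` are hypothetical values, and `h` is taken to stack the upper-triangular entries of a symmetric `H` with off-diagonal entries doubled, which is one common convention for pairing with a quadratic polynomial basis (the poster only states $h = \mathrm{vec}(H)$).

```python
# Check that w^T H w equals h^T w_bar for a symmetric H.
# Assumption: h stacks the upper triangle of H with off-diagonal entries
# doubled, so that it pairs with the quadratic basis w_bar.

def quadratic_basis(w):
    """Return (w1^2, w1*w2, ..., w1*wl, w2^2, ..., w_{l-1}*wl, wl^2)."""
    l = len(w)
    return [w[i] * w[j] for i in range(l) for j in range(i, l)]


def vec_upper(H):
    """Stack the upper triangle of H, summing the symmetric off-diagonal pair."""
    l = len(H)
    return [H[i][j] if i == j else H[i][j] + H[j][i]
            for i in range(l) for j in range(i, l)]


def quad_form(w, H):
    """Plain evaluation of w^T H w."""
    l = len(w)
    return sum(w[i] * H[i][j] * w[j] for i in range(l) for j in range(l))


H = [[2.0, 0.5, 0.0],
     [0.5, 1.0, -0.3],
     [0.0, -0.3, 4.0]]   # hypothetical symmetric H
w = [1.0, -2.0, 0.5]     # hypothetical w_k = [z_k; u_k]

lhs = quad_form(w, H)
rhs = sum(hi * wi for hi, wi in zip(vec_upper(H), quadratic_basis(w)))
print(lhs, rhs)
```

For a vector of length l the basis has l(l+1)/2 entries, which is the number of parameters the estimator has to tune.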
Figure 1 Wireless networked control system
The proposed approach for optimal controller design combines
Q-learning with an adaptive estimator (AE), whereas the
suboptimal controller design uses only the Q-learning scheme
The delays and packet losses are incorporated in the dynamic
model, which is used for the controller development
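Networked control system representation
$z_{k+1} = A_{zk} z_k + B_{zk} u_k, \quad y_k = C_z z_k$
where $z_k = [x_k^T \ \ u_{k-1}^T \ \ \cdots \ \ u_{k-d+1}^T]^T$,
$A_{zk} = \begin{bmatrix} A_s & Ip_{k-1} B_1^k & \cdots & Ip_{k-i} B_i^k & \cdots & Ip_{k-d+1} B_{d-1}^k \\ 0 & 0 & \cdots & \cdots & \cdots & 0 \\ 0 & I_m & \cdots & \cdots & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & \cdots & I_m & 0 \end{bmatrix}$ and $B_{zk} = \begin{bmatrix} Ip_k B_0^k \\ I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}$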
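The structure of the augmented model can be sketched for a scalar plant (n = m = 1). This is an illustration under stated assumptions, not the poster's implementation: the function names and all numeric values are hypothetical, `b[i]` stands for $B_i^k$, and `ip[i]` is the 0/1 value of $Ip_{k-i}$ as it multiplies $B_i^k$ in the model.

```python
# Build A_zk and B_zk for a scalar plant with augmented state
# z_k = [x_k, u_{k-1}, ..., u_{k-d+1}].
# Assumption: b[i] = B_i^k and ip[i] = Ip_{k-i} for i = 0..d-1; all values
# below are hypothetical.

def augmented_matrices(a_s, b, ip):
    size = len(b)                      # n + (d-1)*m with n = m = 1
    A = [[0.0] * size for _ in range(size)]
    A[0][0] = a_s
    for i in range(1, size):
        A[0][i] = ip[i] * b[i]         # first row: Ip_{k-i} * B_i^k
    for row in range(2, size):
        A[row][row - 1] = 1.0          # shift the stored past inputs
    B = [0.0] * size
    B[0] = ip[0] * b[0]                # current input enters through B_0^k
    if size > 1:
        B[1] = 1.0                     # store u_k as the newest past input
    return A, B


def step(A, B, z, u):
    """One step of z_{k+1} = A_zk z_k + B_zk u_k."""
    return [sum(A[r][c] * z[c] for c in range(len(z))) + B[r] * u
            for r in range(len(z))]


A_zk, B_zk = augmented_matrices(a_s=1.1, b=[0.5, 0.3, 0.2], ip=[1, 0, 1])
z_next = step(A_zk, B_zk, z=[1.0, 2.0, 3.0], u=4.0)
print(z_next)
```

Because the `ip` values change from step to step, the matrices have to be rebuilt at every sampling instant, which is what makes the model time-varying.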
Figure 2 depicts a block diagram representation:
Figure 3 presents the block diagram for the AE-based
stochastic optimal regulator of the NCS
[Figure 2 diagram blocks: Plant, Sensor, Controller, and Wireless Network, with delays $\tau_{ca}(t)$, $\tau_{sc}(t)$ and packet losses $Ip(t)$ on the network links]
[Figure 3 diagram blocks: unknown $A_{zk}$ and $B_{zk}$; adaptive estimator of the $Q(z_k,u_k)$ function; cost function $J_k = w_k^T H_k w_k = h_k^T \bar{w}_k$; control input $u(z_k) = -(H_k^{uu})^{-1} H_k^{uz} z_k$]
Figure 2 Block diagram of Networked control system
[Plots, x-axis: Time (Sec). Panel titles: System total costs with Q-learning suboptimal control and Proposed AE optimal control with unknown dynamics (axis scale x 10^5); Control inputs with Q-learning suboptimal control and Proposed AE optimal control with unknown dynamics. Legend: Q-learning suboptimal control, Proposed AE optimal control]
The proposed Q-learning based suboptimal and AE-based
optimal control designs for NCS with unknown dynamics in the
presence of random delays and packet losses outperform
a traditional controller.
Both the Q-learning based suboptimal control and the AE-based
optimal control can keep the NCS stable.
The proposed AE-based optimal control is more effective than
the proposed Q-learning based suboptimal control.
[Figure 3 diagram blocks: Linear Networked Control System with unknown dynamics $z_{k+1} = A_{zk} z_k + B_{zk} u(z_k)$; Actuator]
AE-based Stochastic Optimal Control (2)
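where $B_i^k = \int_{\tau_i^k - iT}^{\tau_{i-1}^k - (i-1)T} e^{A(T-s)}\, ds\, B \cdot \mathbf{1}(T + \tau_{i-1}^k - \tau_i^k) \cdot \mathbf{1}(\tau_i^k - iT)$,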
CONCLUSIONS
As shown in Figure 6-(a), the proposed AE-based optimal controller minimizes the cost-to-go, $J_k = E\{ \sum_{i=k}^{\infty} ( z_i^T Q_z z_i + u_i^T R_z u_i ) \}$, better than the proposed Q-learning suboptimal controller. In Figure 6-(b), the proposed AE-based optimal control forces the NCS states to converge to zero more quickly than the Q-learning suboptimal control. This indicates that the proposed AE-based optimal control is more effective than the Q-learning suboptimal control.
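The cost-to-go used for this comparison can be approximated from recorded trajectories. The sketch below is a minimal example under stated assumptions: it uses scalar states and inputs, truncates the infinite sum to the length of each recorded trajectory, and replaces the expectation by a sample mean. The function names, weights, and trajectory values are hypothetical and are not the poster's simulation data.

```python
# Sample-mean estimate of the cost-to-go J_k = E{ sum_i (z_i Q z_i + u_i R u_i) }
# for scalar z and u.
# Assumption: the trajectories below are made-up numbers for illustration.

def stage_cost(z, u, q, r):
    """One-step cost z*q*z + u*r*u."""
    return q * z * z + r * u * u


def sample_cost_to_go(z_traj, u_traj, q, r):
    """Truncated cost-to-go along one recorded trajectory."""
    return sum(stage_cost(z, u, q, r) for z, u in zip(z_traj, u_traj))


def mean_cost_to_go(trajectories, q, r):
    """Average the truncated cost over several recorded trajectories."""
    costs = [sample_cost_to_go(z, u, q, r) for z, u in trajectories]
    return sum(costs) / len(costs)


trajectories = [
    ([1.0, 0.5], [-1.0, -0.5]),
    ([1.0, 0.25], [-1.0, 0.0]),
]
j_hat = mean_cost_to_go(trajectories, q=1.0, r=2.0)
print(j_hat)
```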
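Networked Control System Model
$Ip_{k-i} = \begin{cases} 0, & \text{if } u_{k-i} \text{ was received during } [kT, (k+1)T) \\ 1, & \text{if } u_{k-i} \text{ was lost during } [kT, (k+1)T) \end{cases}$
with index $i = 1, 2, \ldots, d-1$
$C_z = [C \ \ 0 \ \ \cdots \ \ 0]$
2) Optimality:
Figure 6 Optimal performance
4. Define the update law to tune the approximated H matrix
1) Represent residual error: $e_{hk} = r(z_{k-1},u_{k-1}) + \hat{h}_k^T \Delta W_{k-1}$,
where $\Delta W_{k-1} = \bar{w}_k - \bar{w}_{k-1}$ and $r(z_{k-1},u_{k-1}) = z_{k-1}^T Q_z z_{k-1} + u_{k-1}^T R_z u_{k-1}$
2) Update law for time varying matrix H: $\hat{h}_{k+1} = \Delta W_k (\Delta W_k^T \Delta W_k)^{-1} (\alpha_h e_{hk}^T - r^T(z_k,u_k))$,
where $\alpha_h$ is a constant, and $0 < \alpha_h < 1$
Here $w_k = [z_k^T \ \ u^T(z_k)]^T \in \mathbb{R}^{l}$ with $l = n + dm$, $h = \mathrm{vec}(H)$, and $\bar{w}_k = (w_{k1}^2, \ldots, w_{k1} w_{kl}, w_{k2}^2, \ldots, w_{k,l-1} w_{kl}, w_{kl}^2)$ is the Kronecker product quadratic polynomial basis vector
5. Determine the AE stochastic optimal control input: $u_k = -K_k z_k = -(H_k^{uu})^{-1} H_k^{uz} z_k$
6. Convergence: when $k \to \infty$, $z_k \to 0$ and $\hat{h}_k \to h_k$; then $J_k \to J_k^*$ and $u_k \to u_k^*$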
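The residual and update law of step 4 can be sketched for a scalar residual. This is an illustration under stated assumptions: the symbols `dW` (difference of consecutive basis vectors), `alpha` (the constant between 0 and 1), and all numeric values are hypothetical names and numbers chosen for the example. The point it shows is that the update makes the next residual equal to `alpha` times the current one, so the residual shrinks geometrically.

```python
# AE residual and update law for a scalar residual.
# Assumption: dW_prev, dW_k, h_hat, r_prev, r_k and alpha are hypothetical
# values; dW denotes the difference of consecutive quadratic basis vectors.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(r_prev, h_hat, dW_prev):
    """e_hk = r(z_{k-1}, u_{k-1}) + h_hat_k^T dW_{k-1}."""
    return r_prev + dot(h_hat, dW_prev)


def ae_update(dW, e_h, r, alpha):
    """h_hat_{k+1} = dW (dW^T dW)^-1 (alpha * e_hk - r(z_k, u_k))."""
    scale = (alpha * e_h - r) / dot(dW, dW)
    return [scale * x for x in dW]


alpha = 0.5
h_hat = [1.0, 2.0, 3.0]
e_k = residual(r_prev=0.5, h_hat=h_hat, dW_prev=[0.4, -0.1, 0.2])

dW_k = [0.3, 0.2, -0.5]
r_k = 0.8
h_next = ae_update(dW_k, e_k, r_k, alpha)

# Residual at the next step, evaluated with the updated estimate.
e_next = residual(r_prev=r_k, h_hat=h_next, dW_prev=dW_k)
print(e_k, e_next)
```

The update requires `dW` to be nonzero, which in practice means the basis vector has to keep changing between samples.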
As shown in Figure 5, if a PID controller is used without considering delays and packet losses, the
NCS becomes unstable (Figure 5-(a)). However, when the proposed Q-learning suboptimal control and
AE optimal control are implemented, the NCS remains stable (Figure 5-(b), (c)).
1. When random delays and packet losses are considered, the H matrix becomes
time-varying. However, we assume that it changes slowly.
Figure 5 Stability performance
[Figure 5 panel titles: State Regulation Errors with Proposed AE Optimal control of NCS with unknown dynamics; State Regulation Errors with Q-learning suboptimal control of NCS with unknown dynamics. Legend: e1, e2, e3, e4]
AE-based Stochastic Optimal Control
where $d(\bar{w}(z_k), H^i) = z_k^T Q_z z_k + u_i(z_k)^T R_z u_i(z_k) + Q^i(z_{k+1}, u_i(z_{k+1}))$ and $w(z_k) = [z_k^T \ \ u_i^*(z_k)^T]^T$
The main challenges in the control of network-based systems are
network delays and packet losses. These effects not only degrade
the performance of the NCS, but can also destabilize the system.
Performance evaluation of proposed suboptimal and optimal control
[Figure 5 panel title: State Regulation Error of NCS with Delay and Packet Losses]
Inverted pendulum input matrix: $B = [0 \ \ 1.8182 \ \ 0 \ \ 4.5455]^T$
After accounting for the random delays and packet losses due to the network, the original time-invariant system
was discretized and represented as a time-varying system $z_{k+1} = A_{zk} z_k + B_{zk} u_k$, $y_k = C_z z_k$ (Note:
since the random delays and packet losses are considered, the NCS model is not only
time varying, but also a function of time k)
3. When mean values of the delays and packet losses are used instead of the random
delays and packet losses, the H matrix becomes a time-invariant matrix.
BACKGROUND
$H^i = \begin{bmatrix} H^i_{zz} & H^i_{zu} \\ H^i_{uz} & H^i_{uu} \end{bmatrix}$, with $H^i_{zz} = Q_z + A_z^T P_i A_z$ and $H^i_{uz} = B_z^T P_i A_z$
[Plot axis label: System total costs J(Xk)]
OBJECTIVES
Develop a Q-learning based stochastic suboptimal controller for an
unknown networked control system (NCS) with random delays and
packet losses
Develop an adaptive estimator (AE)-based stochastic optimal control
Q-learning Stochastic Suboptimal Control
1. Define the Q-function: $Q(z_k,u_k) = E\{ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \}$
2. Define the update law to tune the Q-function: $Q^{i+1}(z_k,u_k) = r(z_k,u_k) + \min_{u_{k+1}} Q^i(z_{k+1},u_{k+1})$
Simulation Results
Consider the linear time-invariant inverted pendulum dynamics
[Plot axis labels: Regulation Error Values; Control Input]
Student: Hao Xu, ECE Department
Figure 3 Stochastic optimal regulator block diagram
FUTURE WORK
Design suboptimal and optimal control for nonlinear
networked control systems (NNCS) with unknown
dynamics in presence of random delays and packet losses
Design a novel wireless network protocol to decrease the
effects of random delays and packet losses.
Optimize the NNCS globally across both the control part and
the wireless network part.