Stochastic Optimal Control of Unknown Linear Networked Control System in the Presence of Random Delays and Packet Losses

Faculty Advisor: Dr. Jagannathan Sarangapani, ECE Department

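State matrix of the inverted pendulum model $\dot{x} = Ax + Bu$:

$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -0.1818 & 2.6727 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -0.4545 & 31.1818 & 0 \end{bmatrix}$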

Investigate the effects of delays and packet losses on the stability of the NCS with unknown dynamics.

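where $r(z_k,u_k) = z_k^T Q_z z_k + u_k^T R_z u_k$ and the minimized term is the quadratic form $[z_{k+1}^T\ u_{k+1}^{*T}]\, H_i\, [z_{k+1}^T\ u_{k+1}^{*T}]^T$, with $H_i^{zu} = A_z^T P_i B_z$ and $H_i^{uu} = R_z + B_z^T P_i B_z$.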

4. Define the update law to tune the H matrix online in the least-squares sense

Networked control can reduce installation costs and increase productivity through the use of wireless communication technology.

1) Vectorize the H matrix: $h = \mathrm{vec}(H)$

2) Update law: $h_{i+1} = \arg\min_{h_{i+1}} \int \left| h_{i+1}^T \bar{w}(z_k) - d(w(z_k), H_i) \right|^2 dz_k$, with $w(z_k) = [z_k^T\ u_i^*(z_k)^T]^T$

1) Stability:

5. Develop the stochastic suboptimal control: $K_i = (R_z + B_z^T P_i B_z)^{-1} B_z^T P_i A_z = (H_i^{uu})^{-1} H_i^{uz}$, $\quad u_i^*(z_k) = -K_i z_k$

6. Convergence: as $i \to \infty$, $Q_i(z_k,u_k) \to Q^*(z_k,u_k)$, $H_i \to H$, and $K_i \to K$ at the same time.
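The Q-learning steps above can be sketched for a scalar system as follows. This is a minimal illustration, not the poster's implementation: it assumes the mean-value parameters are known, and the numbers `a`, `b`, `q`, `r` are hypothetical. In the unknown-dynamics setting the H matrix is estimated from data instead of being built from $A_z$ and $B_z$.

```python
def q_iteration(a, b, q, r, iters=200):
    """Iterate the H-matrix blocks for the scalar system z_{k+1} = a z_k + b u_k.

    a, b, q, r are hypothetical scalars standing in for A_z, B_z, Q_z, R_z.
    """
    p = 0.0  # P_0 = 0
    k = 0.0
    for _ in range(iters):
        h_zz = q + a * p * a   # Q_z + A_z^T P_i A_z
        h_zu = a * p * b       # A_z^T P_i B_z
        h_uz = b * p * a       # B_z^T P_i A_z
        h_uu = r + b * p * b   # R_z + B_z^T P_i B_z
        k = h_uz / h_uu        # K_i = (H_uu)^{-1} H_uz, applied as u = -K_i z
        p = h_zz - h_zu * k    # P_{i+1}: Q-function minimized over u
    return p, k


p, k = q_iteration(a=1.2, b=1.0, q=1.0, r=1.0)
```

With these illustrative numbers the open-loop system is unstable (`a > 1`), while the converged gain places the closed-loop pole `a - b * k` inside the unit circle.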

Approximate dynamic programming (ADP) techniques aim to solve optimal control problems of complex systems in a forward-in-time manner, without knowledge of the system dynamics.

2. Set up the stochastic Q-function: $Q(z_k,u_k) = E\{ r(z_k,u_k) + J_{k+1} \} = [z_k^T\ u_k^T]\, H_k\, [z_k^T\ u_k^T]^T$

3. Use the adaptive estimator to represent the Q-function: $\hat{Q}(z_k,u_k) = w_k^T \hat{H}_k w_k = \hat{h}_k^T \bar{w}_k$

Figure 1. The wireless networked control system

The proposed approach to optimal controller design combines Q-learning with an adaptive estimator (AE), whereas the suboptimal controller design uses only the Q-learning scheme.

The delays and packet losses are incorporated in the dynamic model, which is then used for the controller development.

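Networked control system representation

$z_{k+1} = A_{zk} z_k + B_{zk} u_k, \quad y_k = C_z z_k$

where the augmented state $z_k = [x_k^T\ u_{k-1}^T\ \cdots]^T$ stacks the plant state and the delayed control inputs,

$A_{zk} = \begin{bmatrix} A_s & Ip_{k-1} B_1^k & \cdots & Ip_{k-i} B_i^k & \cdots \\ 0 & 0 & \cdots & 0 & \cdots \\ 0 & I_m & \cdots & 0 & \cdots \\ \vdots & & \ddots & & \vdots \end{bmatrix}, \quad B_{zk} = \begin{bmatrix} Ip_k B_0^k \\ I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

and the control input is $u_k = -K_k z_k = -(\hat{H}_k^{uu})^{-1} \hat{H}_k^{uz} z_k$.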

Figure 2 depicts a block diagram representation:

and $\hat{u}_k \to u_k^*$

Figure 3 presents the block diagram for the AE-based stochastic optimal regulator of the NCS, with control input $u(z_k) = -(\hat{H}_k^{uu})^{-1} \hat{H}_k^{uz} z_k$.

[Block-diagram labels: Plant, Sensor, Wireless Network, Network, Controller; delay and packet losses on each network link ($\tau_{ca}(t)$, $\tau_{sc}(t)$, $Ip(t)$); $A_{zk}$ and $B_{zk}$; Cost Function; Adaptive Estimator of the $Q(z_k,u_k)$ function, $\hat{J}_k = w_k^T \hat{H}_k w_k = \hat{h}_k^T \bar{w}_k$]

Figure 2. Block diagram of the networked control system

[Plot panels (b) and (c): horizontal axis Time (sec).]

[Plot: System total costs with Q-learning suboptimal control and proposed AE optimal control with unknown dynamics; vertical scale $\times 10^5$; horizontal axis Time (sec); legend: Q-learning suboptimal control, Proposed AE optimal control.]

[Plot: Control inputs with Q-learning suboptimal control and proposed AE optimal control with unknown dynamics; horizontal axis Time (sec); legend: Q-learning suboptimal control, Proposed AE optimal control.]

The proposed Q-learning-based suboptimal and AE-based optimal control designs for an NCS with unknown dynamics in the presence of random delays and packet losses outperform a traditional controller.

Both the Q-learning-based suboptimal control and the AE-based optimal control keep the NCS stable.

The proposed AE-based optimal control is more effective than the proposed Q-learning-based suboptimal control.

$u(z_k) = -(\hat{H}_k^{uu})^{-1} \hat{H}_k^{uz} z_k, \quad z_{k+1} = A_{zk} z_k + B_{zk} u(z_k)$

Linear Network Control System with Unknown


n d 1 m

T

Actuator

AE-based Stochastic Optimal Control (2)

e AT s dsB 1 T ik 1 ik 1 ik iT ,

u kT 1

then $\hat{J}_k \to J_k^*$

CONCLUSIONS

$k \to \infty$, $z_k \to 0$, $\hat{h}_k \to h_k$

As shown in Figure 6-(a), the proposed AE-based optimal controller minimizes the cost-to-go, $J_k = E\{ \sum_{i=k}^{\infty} (z_i^T Q_z z_i + u_i^T R_z u_i) \}$, better than the proposed Q-learning suboptimal controller. In Figure 6-(b), the proposed AE-based optimal control forces the NCS states to converge to zero more quickly than the Q-learning suboptimal control. This indicates that the proposed AE-based optimal control is more effective than the Q-learning suboptimal control.
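To make the cost-to-go comparison concrete, the sketch below evaluates the summed quadratic cost along one recorded trajectory. It is only an illustration under stated assumptions: the sum is truncated at the end of the recorded data, a single realization replaces the expectation, and the trajectory values and weights are hypothetical.

```python
def quad_form(v, M):
    """Return v^T M v for a vector v and a square matrix M (lists of floats)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def cost_to_go(zs, us, Qz, Rz, k=0):
    """Sum z_i^T Qz z_i + u_i^T Rz u_i from step k to the end of the record."""
    return sum(quad_form(z, Qz) + quad_form(u, Rz)
               for z, u in zip(zs[k:], us[k:]))


# Hypothetical two-step trajectory with a two-dimensional state and scalar input.
zs = [[1.0, 0.0], [0.5, 0.0]]
us = [[1.0], [0.0]]
Qz = [[1.0, 0.0], [0.0, 1.0]]
Rz = [[2.0]]
J0 = cost_to_go(zs, us, Qz, Rz)        # cost from step 0
J1 = cost_to_go(zs, us, Qz, Rz, k=1)   # cost from step 1
```

Averaging such sums over many simulated delay and packet-loss realizations would approximate the expectation in the cost-to-go.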

5. Determine the AE stochastic optimal control input

$Ip_{k-i} = \begin{cases} 0, & \text{if } u_{k-i} \text{ was received during } [kT, (k+1)T) \\ 1, & \text{if } u_{k-i} \text{ was lost during } [kT, (k+1)T) \end{cases}$

6. Convergence: when

i 1,2,..., d 1

$C_z = [C\ \ 0\ \cdots\ 0]$

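Residual error: $e_{hk} = r(z_{k-1},u_{k-1}) + \hat{h}_k^T \Delta W_{k-1}$, where $\Delta W_{k-1} = \bar{w}_k - \bar{w}_{k-1}$ and $r(z_{k-1},u_{k-1}) = z_{k-1}^T Q_z z_{k-1} + u_{k-1}^T R_z u_{k-1}$

2) Update law for time-varying matrix H: $\hat{h}_{k+1} = \Delta W_k (\Delta W_k^T \Delta W_k)^{-1} (\alpha_h e_{hk}^T - r^T(z_k,u_k))$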
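A minimal sketch of the residual error and the update law for the estimated parameter vector is given below, assuming a scalar residual so that the inverse reduces to a division by the squared norm of the regression vector. The tuning constant `alpha_h` (taken between 0 and 1) and all numeric values are hypothetical, and the regression vector must be nonzero for the update to be defined.

```python
def residual(r_prev, h_hat, dW_prev):
    """e_hk = r(z_{k-1}, u_{k-1}) + h_hat_k^T dW_{k-1}."""
    return r_prev + sum(h * w for h, w in zip(h_hat, dW_prev))


def update(dW, e_hk, r_k, alpha_h):
    """h_hat_{k+1} = dW_k (dW_k^T dW_k)^{-1} (alpha_h * e_hk - r(z_k, u_k))."""
    norm_sq = sum(w * w for w in dW)  # dW_k^T dW_k, assumed nonzero
    scale = (alpha_h * e_hk - r_k) / norm_sq
    return [w * scale for w in dW]


# Hypothetical values for one update step.
dW_k = [0.5, -1.0, 2.0]
e_hk = 0.8
r_k = 1.5
alpha_h = 0.5
h_next = update(dW_k, e_hk, r_k, alpha_h)
e_next = residual(r_k, h_next, dW_k)  # residual at the next step
```

By construction the next residual equals `alpha_h * e_hk`, so with `alpha_h` below one the residual shrinks geometrically as long as the regression vector stays nonzero.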


Figure 6 Optimal performance

where $\alpha_h$ is a constant and $0 < \alpha_h < 1$

Networked Control System Model

1) Represent residual error:


4. Define the update law to tune the approximated H matrix


2) Optimality:

where $\hat{h}_k = \mathrm{vec}(\hat{H}_k)$, $w_k = [z_k^T\ u^T(z_k)]^T$, and $\bar{w}_k = (w_{k1}^2, \ldots, w_{k1} w_{kl}, w_{k2}^2, \ldots, w_{k,l-1} w_{kl}, w_{kl}^2)$ is the Kronecker product quadratic polynomial basis vector
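The relation between the quadratic form and the basis vector can be checked with a short sketch. The vectorization convention used here (diagonal entries kept once, symmetric off-diagonal pairs summed) is an assumption chosen so that $w^T H w = h^T \bar{w}$ holds; the matrix and vector values are hypothetical.

```python
def quad_basis(w):
    """Return (w1^2, w1*w2, ..., w1*wl, w2^2, ..., w_{l-1}*wl, wl^2)."""
    l = len(w)
    return [w[i] * w[j] for i in range(l) for j in range(i, l)]


def vec_h(H):
    """Stack the upper triangle of H, summing each off-diagonal pair."""
    l = len(H)
    return [H[i][j] if i == j else H[i][j] + H[j][i]
            for i in range(l) for j in range(i, l)]


# Hypothetical symmetric H and regression vector w.
H = [[2.0, 1.0], [1.0, 3.0]]
w = [1.0, 2.0]
w_bar = quad_basis(w)
h = vec_h(H)
q_value = sum(hi * wi for hi, wi in zip(h, w_bar))  # h^T w_bar
```

Because the Q-function is linear in `h`, tuning `h` from measured data is a linear estimation problem even though the Q-function is quadratic in the state and input.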


As shown in Figure 5, a PID controller that does not account for delays and packet losses leaves the NCS unstable (Fig. 5-(a)). However, with the proposed Q-learning suboptimal and AE optimal control, the NCS remains stable (Fig. 5-(b), (c)).

1. When random delays and packet losses are considered, the H matrix becomes time-varying. However, we assume that it changes slowly.

[Plot legend: e1, e2, e3, e4]

Figure 5 Stability performance

AE-based Stochastic Optimal Control

[Plot: State Regulation Errors with Proposed AE Optimal control of NCS with unknown dynamics; legend: e1, e2, e3, e4.]

[Plot: State Regulation Errors with Q-learning suboptimal control of NCS with unknown dynamics.]

where $d(w(z_k), H_i) = z_k^T Q_z z_k + u_i(z_k)^T R_z u_i(z_k) + Q_i(z_{k+1}, u_i(z_{k+1}))$ and $w(z_k) = [z_k^T\ u_i(z_k)^T]^T$

The challenging problems in the control of network-based systems are network delays and packet losses. These effects not only degrade the performance of the NCS but can also destabilize the system.

Performance evaluation of proposed suboptimal and optimal control

[Plot: State Regulation Error of NCS with Delay and Packet Losses]

$\dot{x} = Ax + Bu$, with input matrix $B = [0\ \ 1.8182\ \ 0\ \ 4.5455]^T$

After the random delays and packet losses due to the NCS are included, the original time-invariant system is discretized and represented as a time-varying system $z_{k+1} = A_{zk} z_k + B_{zk} u_k$, $y_k = C_z z_k$. (Note: because the random delays and packet losses are considered, the NCS model is not only time-varying but also a function of the time index $k$.)

3. Using the mean values of the delays and packet losses instead of the random delays and packet losses, the H matrix becomes a time-invariant matrix.

BACKGROUND

$H_i = \begin{bmatrix} H_i^{zz} & H_i^{zu} \\ H_i^{uz} & H_i^{uu} \end{bmatrix}$, with $H_i^{zz} = Q_z + A_z^T P_i A_z$ and $H_i^{uz} = B_z^T P_i A_z$

[Plot axis label: System total costs J(Xk)]

Regulation Error Values

Develop an adaptive estimator (AE)-based stochastic optimal control

Consider the linear time-invariant inverted pendulum dynamics

Control Input

2. Define the update law to tune the Q-function: $Q_{i+1}(z_k,u_k) = r(z_k,u_k) + \min_{u_{k+1}} Q_i(z_{k+1},u_{k+1})$

Develop a Q-learning based stochastic suboptimal controller for an

unknown networked control system (NCS) with random delay and

packet losses;

Regulation Error Values

1. Define the Q-function: $Q(z_k,u_k) = E\{ z_k^T Q_z z_k + u_k^T R_z u_k + J_{k+1} \}$

Simulation Results

Q-learning Stochastic Suboptimal Control

OBJECTIVES

Regulation Error Values

Student: Hao Xu, ECE Department

Figure 3 Stochastic optimal regulator block diagram

FUTURE WORK

Design suboptimal and optimal control for nonlinear networked control systems (NNCS) with unknown dynamics in the presence of random delays and packet losses.

Design a novel wireless network protocol to decrease the effects of random delays and packet losses.

Optimize the NNCS globally across both the control part and the wireless network part.