Lecture 2: Linear Optimum Filtering


Wiener Filters

Problem statement

• Given the sequence of input samples [u(0), u(1), u(2), . . .] and the sequence of desired response samples [d(0), d(1), d(2), . . .]

• Given a family of filters computing their outputs according to

    y(n) = \sum_{k=0}^{\infty} w_k u(n-k),   n = 0, 1, 2, . . .                         (1)

• Find the parameters {w_0, w_1, w_2, . . .} such as to minimize the mean square error defined as

    J = E[e(n)^2]                                                                       (2)

  where the error signal is

    e(n) = d(n) - y(n) = d(n) - \sum_{l=0}^{\infty} w_l u(n-l)                          (3)

The family of filters (1) is the family of linear discrete-time filters (IIR or FIR).
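As a small numerical illustration (my addition, not part of the lecture), the sketch below evaluates the filter output (1) and the error (3) for a candidate weight vector truncated to a finite number of taps, and estimates the criterion (2) by a sample average. The signals and weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder signals: in practice u(n) and d(n) come from the application.
N = 10_000
u = rng.standard_normal(N)                  # input samples u(0), u(1), ...
d = 0.7 * u + 0.1 * rng.standard_normal(N)  # desired response (arbitrary example)

# A finite set of taps w_0 ... w_{M-1} (the general model allows infinitely many).
w = np.array([0.5, 0.2, -0.1])
M = len(w)

# y(n) = sum_k w_k u(n-k), computed only for n >= M-1 so that all taps exist.
y = np.array([w @ u[n - M + 1:n + 1][::-1] for n in range(M - 1, N)])
e = d[M - 1:] - y                           # e(n) = d(n) - y(n)

# Sample estimate of the mean square error J = E[e(n)^2].
print("estimated J for these weights:", np.mean(e ** 2))
```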


Principle of orthogonality

Define the gradient operator ∇, having its k-th entry

    \nabla_k = \frac{\partial}{\partial w_k}                                            (4)

and thus the k-th entry of the gradient of the criterion J is (remember, e(n) = d(n) - \sum_{l=0}^{\infty} w_l u(n-l))

    \nabla_k J = \frac{\partial J}{\partial w_k} = 2 E\left[ e(n) \frac{\partial e(n)}{\partial w_k} \right] = -2 E[e(n) u(n-k)]

For the criterion to attain its minimum, the gradient of the criterion must be identically zero, that is

    \nabla_k J = 0,   k = 0, 1, 2, . . .

resulting in the fundamental Principle of orthogonality:

    E[e_o(n) u(n-k)] = 0,   k = 0, 1, 2, . . .                                          (5)

Stated in words:

• The criterion J attains its minimum iff
• the estimation error e_o(n) is orthogonal to the samples u(i) which are used to compute the filter output.

We will index with o all the variables, e.g. e_o, y_o, computed using the optimal parameters {w_{o0}, w_{o1}, w_{o2}, . . .}. Let us compute the cross-correlation

    E[e_o(n) y_o(n)] = E\left[ e_o(n) \sum_{k=0}^{\infty} w_{ok} u(n-k) \right] = \sum_{k=0}^{\infty} w_{ok} E[u(n-k) e_o(n)] = 0          (6)


Otherwise stated, in words, we have the following Corollary of the Orthogonality Principle:

• When the criterion J attains its minimum, then
• the estimation error e_o(n) is orthogonal to the filter output y_o(n).

Wiener – Hopf equations

From the orthogonality of the estimation error to the input window samples we have

    E[u(n-k) e_o(n)] = 0,   k = 0, 1, 2, . . .

    E\left[ u(n-k) \left( d(n) - \sum_{i=0}^{\infty} w_{oi} u(n-i) \right) \right] = 0,   k = 0, 1, 2, . . .

    \sum_{i=0}^{\infty} w_{oi} E[u(n-k) u(n-i)] = E[u(n-k) d(n)],   k = 0, 1, 2, . . .

But

* E[u(n-k) u(n-i)] = r(i-k) is the autocorrelation function of the input signal u(n) at lag i-k
* E[u(n-k) d(n)] = p(-k) is the cross-correlation between the filter input u(n-k) and the desired signal d(n)

and therefore

    \sum_{i=0}^{\infty} w_{oi} r(i-k) = p(-k),   k = 0, 1, 2, . . .          (Wiener–Hopf)          (7)
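As a numerical sanity check (my addition, anticipating the finite-order FIR case treated next), the sketch below estimates the second-order statistics from simulated data, solves the resulting finite Wiener – Hopf system, and verifies the orthogonality principle: the optimal error is essentially uncorrelated with each tap input u(n-k) and with the output y_o(n). The signal model here is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200_000, 3

# Toy setup: d(n) is a noisy linear function of the current and past inputs.
u = rng.standard_normal(N)
d = 0.9 * u + 0.4 * np.roll(u, 1) + 0.05 * rng.standard_normal(N)

# Tap-input vectors [u(n), u(n-1), ..., u(n-M+1)] for n >= M-1 (one per row).
U = np.column_stack([u[M - 1 - k: N - k] for k in range(M)])
D = d[M - 1:]

R = U.T @ U / len(D)          # sample estimate of E[u(n) u^T(n)]
p = U.T @ D / len(D)          # sample estimate of E[u(n) d(n)]
w_o = np.linalg.solve(R, p)   # optimal taps from the (sample) Wiener-Hopf system

e_o = D - U @ w_o             # optimal estimation error
y_o = U @ w_o                 # optimal filter output

print("E[e_o(n) u(n-k)] ~", U.T @ e_o / len(D))   # ~0 for every k (principle)
print("E[e_o(n) y_o(n)] ~", y_o @ e_o / len(D))   # ~0 (corollary)
```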


Solution of the Wiener – Hopf equations for linear transversal filters (FIR)

    y(n) = \sum_{k=0}^{M-1} w_k u(n-k),   n = 0, 1, 2, . . .                            (8)

and since only w_0, w_1, w_2, . . . , w_{M-1} are nonzero, the Wiener – Hopf equations become

    \sum_{i=0}^{M-1} w_{oi} r(i-k) = p(-k),   k = 0, 1, 2, . . . , M-1          (Wiener–Hopf)          (9)

which is a system of M equations with M unknowns: {w_{o,0}, w_{o,1}, w_{o,2}, . . . , w_{o,M-1}}.

Matrix formulation of Wiener – Hopf equations

Let us denote

    u(n) = \begin{bmatrix} u(n) & u(n-1) & u(n-2) & \ldots & u(n-M+1) \end{bmatrix}^T          (10)

    R = E[u(n) u^T(n)]
      = \begin{bmatrix}
          E u(n)u(n)     & E u(n)u(n-1)     & \ldots & E u(n)u(n-M+1) \\
          E u(n-1)u(n)   & E u(n-1)u(n-1)   & \ldots & E u(n-1)u(n-M+1) \\
          \vdots         & \vdots           & \ddots & \vdots \\
          E u(n-M+1)u(n) & E u(n-M+1)u(n-1) & \ldots & E u(n-M+1)u(n-M+1)
        \end{bmatrix}
      = \begin{bmatrix}
          r(0)   & r(1)   & \ldots & r(M-1) \\
          r(1)   & r(0)   & \ldots & r(M-2) \\
          \vdots & \vdots & \ddots & \vdots \\
          r(M-1) & r(M-2) & \ldots & r(0)
        \end{bmatrix}                                                                   (11)

    p = E[u(n) d(n)] = \begin{bmatrix} p(0) & p(-1) & p(-2) & \ldots & p(1-M) \end{bmatrix}^T          (12)

    w_o = \begin{bmatrix} w_{o,0} & w_{o,1} & \ldots & w_{o,M-1} \end{bmatrix}^T


then the Wiener – Hopf equations can be written in the compact form

    R w_o = p                                                                           (13)

with solution

    w_o = R^{-1} p
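A minimal sketch (my addition, with arbitrary illustrative values of r(k) and p(-k)) of how the compact form (13) is solved in practice: build the symmetric Toeplitz matrix R from the autocorrelation lags, stack the cross-correlations into p, and solve R w_o = p.

```python
import numpy as np

# Hypothetical autocorrelation lags r(0), ..., r(M-1) and cross-correlations
# p(0), p(-1), ..., p(1-M); placeholder values, not taken from the lecture.
r = np.array([1.0, 0.5, 0.25])          # r(0), r(1), r(2)
p = np.array([0.6, 0.2, -0.1])          # p(0), p(-1), p(-2)
M = len(r)

# R is symmetric Toeplitz: R[k, i] = r(|i - k|), cf. (11).
R = np.array([[r[abs(i - k)] for i in range(M)] for k in range(M)])

# Solve R w_o = p, eq. (13); numerically, prefer solve() over forming R^{-1}.
w_o = np.linalg.solve(R, p)
print("w_o =", w_o)
```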

Mean square error surface

Let us define

    e_w(n) = d(n) - \sum_{k=0}^{M-1} w_k u(n-k) = d(n) - w^T u(n)                       (14)

Then the cost function can be written as

    J_w = E[e_w(n) e_w(n)] = E[(d(n) - w^T u(n))(d(n) - u^T(n) w)]
        = E[d^2(n) - d(n) u^T(n) w - w^T u(n) d(n) + w^T u(n) u^T(n) w]
        = E[d^2(n)] - E[d(n) u^T(n)] w - w^T E[u(n) d(n)] + w^T E[u(n) u^T(n)] w
        = E[d^2(n)] - 2 E[d(n) u^T(n)] w + w^T E[u(n) u^T(n)] w
        = σ_d^2 - 2 p^T w + w^T R w
        = σ_d^2 - 2 \sum_{i=0}^{M-1} p(-i) w_i + \sum_{l=0}^{M-1} \sum_{i=0}^{M-1} w_l w_i R_{i,l}          (15)

Thus, we can proceed in another way to find the (same) optimal solution w_o:

* J_w is a second-order function of the parameters {w_0, w_1, . . . , w_{M-1}}
* J[w_0 w_1 . . . w_{M-1}] is a bowl-shaped (M+1)-dimensional surface with M degrees of freedom.


* J attains its minimum, J_min, where the gradient is zero:

    \nabla_w J = 0

    \frac{\partial J}{\partial w_k} = 0,   k = 0, 1, . . . , M-1

    \frac{\partial J}{\partial w_k} = -2 p(-k) + 2 \sum_{l=0}^{M-1} w_l r(k-l) = 0,   k = 0, 1, . . . , M-1

which finally gives the same Wiener – Hopf equations

    \sum_{l=0}^{M-1} w_l r(k-l) = p(-k)                                                 (16)
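For M = 2 the error-performance surface (15) can be examined directly. The sketch below (my addition, with placeholder values of σ_d^2, R and p) evaluates J_w = σ_d^2 - 2 p^T w + w^T R w on a grid, confirms that the bowl has its minimum near w_o = R^{-1} p, and checks that the gradient 2(R w - p) vanishes there.

```python
import numpy as np

# Placeholder second-order statistics for an M = 2 illustration.
sigma_d2 = 1.0
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, -0.2])

def J(w):
    """Quadratic cost J_w = sigma_d^2 - 2 p^T w + w^T R w, cf. (15)."""
    return sigma_d2 - 2 * p @ w + w @ R @ w

w_o = np.linalg.solve(R, p)

# Evaluate J on a grid around the optimum: a bowl-shaped surface in (w0, w1).
grid = np.linspace(-2, 2, 201)
Jvals = np.array([[J(np.array([w0, w1])) for w1 in grid] for w0 in grid])
i, j = np.unravel_index(np.argmin(Jvals), Jvals.shape)

print("grid minimizer      :", grid[i], grid[j])
print("w_o = R^{-1} p      :", w_o)
print("gradient 2(R w - p) :", 2 * (R @ w_o - p))   # ~ [0, 0] at the optimum
```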

Minimum mean square error

Using the form of the criterion

    J_w = σ_d^2 - 2 p^T w + w^T R w                                                     (17)

one can find the value of the minimum criterion (remember, R w_o = p and w_o = R^{-1} p):

    J_{w_o} = σ_d^2 - 2 p^T w_o + w_o^T R w_o = σ_d^2 - 2 w_o^T R w_o + w_o^T R w_o
            = σ_d^2 - w_o^T R w_o
            = σ_d^2 - w_o^T p
            = σ_d^2 - p^T R^{-1} p                                                      (18)
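A quick check of (18) (my addition, reusing the placeholder statistics from the previous sketch): J_min computed as σ_d^2 - w_o^T p and as σ_d^2 - p^T R^{-1} p gives the same value.

```python
import numpy as np

# Placeholder statistics, as in the previous sketch.
sigma_d2 = 1.0
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, -0.2])

w_o = np.linalg.solve(R, p)
J_min_1 = sigma_d2 - w_o @ p                       # sigma_d^2 - w_o^T p
J_min_2 = sigma_d2 - p @ np.linalg.solve(R, p)     # sigma_d^2 - p^T R^{-1} p
print(J_min_1, J_min_2)                            # identical values
```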


Canonical form of the error-performance surface

(Parenthesis: how to compute a scalar out of a vector w, containing the entries of w at power one (linear combination) or at power two (quadratic form):

* linear combination (first-order form): a^T w = \sum_{l=0}^{M-1} a_l w_l
* quadratic form: w^T R w = \sum_{l=0}^{M-1} \sum_{i=0}^{M-1} w_l w_i R_{i,l} = w_0^2 R_{0,0} + w_0 w_1 R_{1,0} + . . . + w_{M-1}^2 R_{M-1,M-1} )

How can we rewrite the criterion

    J_w = σ_d^2 - 2 p^T w + w^T R w                                                     (19)

as a perfect "square", i.e. how do we complete the square so as to absorb the term -2 p^T w?

Consider first the case when w is simply a scalar (so that R, r, p are scalars as well):

    J_w = R w^2 - 2 p w + σ_d^2 = R\left(w^2 - 2 w \frac{p}{R}\right) + σ_d^2
        = R\left(w^2 - 2 w \frac{p}{R} + \frac{p^2}{R^2}\right) - \frac{p^2}{R} + σ_d^2
        = R\left(w - \frac{p}{R}\right)^2 - \frac{p^2}{R} + σ_d^2

In the case when w is a vector, the term corresponding to the one-dimensional p^2/R is p^T R^{-1} p:

    J_w = w^T R w - 2 p^T w + p^T R^{-1} p - p^T R^{-1} p + σ_d^2
        = (w - R^{-1} p)^T R (w - R^{-1} p) - p^T R^{-1} p + σ_d^2
        = J_{w_o} + (w - R^{-1} p)^T R (w - R^{-1} p)
        = J_{w_o} + (w - w_o)^T R (w - w_o)

(This was the solution of exercise 5.5, page 182, in [Haykin 91].)
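The completed-square identity can be verified numerically at arbitrary points w; a short check (my addition, placeholder statistics) that σ_d^2 - 2 p^T w + w^T R w equals J_{w_o} + (w - w_o)^T R (w - w_o):

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder statistics.
sigma_d2 = 1.0
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, -0.2])

w_o = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w_o

for _ in range(3):
    w = rng.standard_normal(2)                    # arbitrary tap vector
    lhs = sigma_d2 - 2 * p @ w + w @ R @ w        # original form (19)
    rhs = J_min + (w - w_o) @ R @ (w - w_o)       # completed square
    print(lhs, rhs)                               # equal up to rounding
```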


Let λ_1, λ_2, . . . , λ_M be the eigenvalues and μ_1, μ_2, . . . , μ_M the (in general complex) eigenvectors of the matrix R, thus satisfying

    R μ_i = λ_i μ_i                                                                     (20)

Then the matrix Q = [μ_1 μ_2 . . . μ_M] transforms R to a diagonal form Λ as follows:

    R = Q Λ Q^H                                                                         (21)

where the superscript H means complex conjugation and transposition. Then

    J_w = J_{w_o} + (w - w_o)^T R (w - w_o) = J_{w_o} + (w - w_o)^T Q Λ Q^H (w - w_o)

Introduce now the transformed version of the tap-weight vector w as

    ν = Q^H (w - w_o)

Now the quadratic form can be put into its canonical form

    J = J_{w_o} + ν^H Λ ν = J_{w_o} + \sum_{i=1}^{M} λ_i ν_i ν_i^* = J_{w_o} + \sum_{i=1}^{M} λ_i |ν_i|^2          (22)
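The canonical form (22) can be checked with an eigendecomposition; the sketch below (my addition, placeholder statistics) uses numpy.linalg.eigh, which for a real symmetric R returns real eigenvalues and an orthogonal Q.

```python
import numpy as np

# Placeholder statistics (R symmetric positive definite).
sigma_d2 = 1.0
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, -0.2])

w_o = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w_o

lam, Q = np.linalg.eigh(R)          # R = Q diag(lam) Q^T in the real symmetric case

w = np.array([0.7, -1.2])           # arbitrary tap vector
nu = Q.T @ (w - w_o)                # nu = Q^H (w - w_o); Q is real here

J_direct = sigma_d2 - 2 * p @ w + w @ R @ w
J_canonical = J_min + np.sum(lam * np.abs(nu) ** 2)    # canonical form (22)
print(J_direct, J_canonical)        # the two values agree
```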


Optimal Wiener Filter Design: Example

• (Useful) Signal Generating Model

The model is given by the transfer function

    H_1(z) = \frac{D(z)}{V_1(z)} = \frac{1}{1 + a z^{-1}} = \frac{1}{1 + 0.8458 z^{-1}}

or the difference equation

    d(n) + a d(n-1) = v_1(n)          d(n) + 0.8458 d(n-1) = v_1(n)

where σ_{v_1}^2 = r_{v_1}(0) = 0.27.

• The channel (perturbation) model is more complex. It involves a low-pass filter with transfer function

    H_2(z) = \frac{X(z)}{D(z)} = \frac{1}{1 + b z^{-1}} = \frac{1}{1 - 0.9458 z^{-1}}

leading for the variable x(n) to the difference equation

    x(n) = 0.9458 x(n-1) + d(n)

and a white-noise corruption (x(n) and v_2(n) are uncorrelated)

    u(n) = x(n) + v_2(n),   with σ_{v_2}^2 = r_{v_2}(0) = 0.1

resulting in the final measurable signal u(n).

• FIR Filter

The signal u(n) will be filtered in order to recover the original (useful) signal d(n), using the filter

    y(n) = w_0 u(n) + w_1 u(n-1)


We plan to apply the Wiener – Hopf equations

    \begin{bmatrix} r_u(0) & r_u(1) \\ r_u(1) & r_u(0) \end{bmatrix}
    \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
    =
    \begin{bmatrix} E d(n)u(n) \\ E d(n)u(n-1) \end{bmatrix}

The signal x(n) obeys the generation model

    H(z) = \frac{X(z)}{V_1(z)} = H_1(z) H_2(z) = \frac{1}{1 + a z^{-1}} \cdot \frac{1}{1 + b z^{-1}} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}} = \frac{1}{1 - 0.1 z^{-1} - 0.8 z^{-2}}

and thus

    x(n) + a_1 x(n-1) + a_2 x(n-2) = v_1(n)

Using the fact that x(n) and v_2(n) are uncorrelated and u(n) = x(n) + v_2(n), it results that

    r_u(k) = r_x(k) + r_{v_2}(k)

and consequently, since for white noise r_{v_2}(0) = σ_{v_2}^2 = 0.1 and r_{v_2}(1) = 0, it follows that

    r_u(0) = r_x(0) + 0.1,   and   r_u(1) = r_x(1)

Now we concentrate on finding r_x(0), r_x(1) for the AR process

    x(n) + a_1 x(n-1) + a_2 x(n-2) = v(n)
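Before setting up the Yule-Walker equations, a quick check (my addition) that the combined coefficients a_1, a_2 indeed come out as -0.1 and -0.8 from the polynomial product (1 + a z^{-1})(1 + b z^{-1}):

```python
import numpy as np

a, b = 0.8458, -0.9458
# (1 + a z^-1)(1 + b z^-1) = 1 + (a + b) z^-1 + a*b z^-2
a1, a2 = a + b, a * b
print(a1, a2)                         # approximately -0.1 and -0.8
print(np.polymul([1, a], [1, b]))     # same coefficients via polynomial product
```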


First multiply the equation in turn by x(n), x(n-1) and x(n-2), and then take the expectation:

    E x(n) × :     E x(n)x(n) + a_1 E x(n)x(n-1) + a_2 E x(n)x(n-2) = E x(n)v(n)
                   resulting in   r_x(0) + a_1 r_x(1) + a_2 r_x(2) = E x(n)v(n) = σ_v^2

    E x(n-1) × :   E x(n-1)x(n) + a_1 E x(n-1)x(n-1) + a_2 E x(n-1)x(n-2) = E x(n-1)v(n)
                   resulting in   r_x(1) + a_1 r_x(0) + a_2 r_x(1) = E x(n-1)v(n) = 0

    E x(n-2) × :   E x(n-2)x(n) + a_1 E x(n-2)x(n-1) + a_2 E x(n-2)x(n-2) = E x(n-2)v(n)
                   resulting in   r_x(2) + a_1 r_x(1) + a_2 r_x(0) = E x(n-2)v(n) = 0

The equality E x(n)v(n) = σ_v^2 is obtained by multiplying the AR-model difference equation by v(n) and then taking expectations,

    E v(n) × :     E v(n)x(n) + a_1 E v(n)x(n-1) + a_2 E v(n)x(n-2) = E v(n)v(n)
                   resulting in   E v(n)x(n) = σ_v^2

since v(n) is uncorrelated with the older values x(n-τ), τ ≥ 1. We have obtained the celebrated Yule-Walker equations:

    r_x(0) + a_1 r_x(1) + a_2 r_x(2) = σ_v^2
    r_x(1) + a_1 r_x(0) + a_2 r_x(1) = 0
    r_x(2) + a_1 r_x(1) + a_2 r_x(0) = 0

or, as usually given, in matrix form

    \begin{bmatrix} r_x(0) & r_x(1) & r_x(2) \\ r_x(1) & r_x(0) & r_x(1) \\ r_x(2) & r_x(1) & r_x(0) \end{bmatrix}
    \begin{bmatrix} 1 \\ a_1 \\ a_2 \end{bmatrix}
    =
    \begin{bmatrix} σ_v^2 \\ 0 \\ 0 \end{bmatrix}


But we need to use the equations differently, with r_x(0), r_x(1), r_x(2) as the unknowns:

    \begin{bmatrix} 1 & a_1 & a_2 \\ a_1 & 1 + a_2 & 0 \\ a_2 & a_1 & 1 \end{bmatrix}
    \begin{bmatrix} r_x(0) \\ r_x(1) \\ r_x(2) \end{bmatrix}
    =
    \begin{bmatrix} σ_v^2 \\ 0 \\ 0 \end{bmatrix}

Solving for r_x(0), r_x(1), r_x(2) we obtain

    r_x(0) = \left( \frac{1 + a_2}{1 - a_2} \right) \frac{σ_v^2}{(1 + a_2)^2 - a_1^2}

    r_x(1) = \frac{-a_1}{1 + a_2} \, r_x(0)

    r_x(2) = \left( -a_2 + \frac{a_1^2}{1 + a_2} \right) r_x(0)
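The rearranged 3×3 system can also be solved numerically; a short check (my addition) with a_1 = -0.1, a_2 = -0.8 and σ_v^2 = 0.27:

```python
import numpy as np

a1, a2, var_v1 = -0.1, -0.8, 0.27

A = np.array([[1.0, a1,      a2 ],
              [a1,  1 + a2,  0.0],
              [a2,  a1,      1.0]])
rhs = np.array([var_v1, 0.0, 0.0])

rx0, rx1, rx2 = np.linalg.solve(A, rhs)
print(rx0, rx1, rx2)      # approximately 1.0, 0.5, 0.85

# Closed-form expressions from above, for comparison:
rx0_cf = (1 + a2) / (1 - a2) * var_v1 / ((1 + a2) ** 2 - a1 ** 2)
rx1_cf = -a1 / (1 + a2) * rx0_cf
print(rx0_cf, rx1_cf)
```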

In our example we need only the first two values, r_x(0), r_x(1), which turn out to be r_x(0) = 1, r_x(1) = 0.5.

Now we will solve for the cross-correlations E d(n)u(n) and E d(n)u(n-1). First observe that

    E u(n)d(n) = E (x(n) + v_2(n)) d(n) = E x(n)d(n)
    E u(n-1)d(n) = E (x(n-1) + v_2(n-1)) d(n) = E x(n-1)d(n)

and now take as a "master" difference equation

    x(n) + b x(n-1) = d(n)


and multiply it in turn by x(n) and x(n-1), and then take the expectation:

    E x(n) × :     E x(n)x(n) + b E x(n)x(n-1) = E x(n)d(n)
                   E x(n)d(n) = r_x(0) + b r_x(1)

    E x(n-1) × :   E x(n-1)x(n) + b E x(n-1)x(n-1) = E x(n-1)d(n)
                   E x(n-1)d(n) = r_x(1) + b r_x(0)

Using the numerical values, one obtains

    E u(n)d(n) = E x(n)d(n) = 0.5272          E u(n-1)d(n) = E x(n-1)d(n) = -0.4458

Now we have all the necessary quantities to write the Wiener – Hopf equations

    \begin{bmatrix} r_u(0) & r_u(1) \\ r_u(1) & r_u(0) \end{bmatrix}
    \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
    =
    \begin{bmatrix} E d(n)u(n) \\ E d(n)u(n-1) \end{bmatrix}

    \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}
    \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
    =
    \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}

resulting in

    \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
    =
    \begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix}
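Finally, the 2×2 system can be solved numerically (my addition), reproducing the tap values quoted above:

```python
import numpy as np

R = np.array([[1.1, 0.5],
              [0.5, 1.1]])
p = np.array([0.5272, -0.4458])

w_o = np.linalg.solve(R, p)
print(w_o)        # approximately [0.8360, -0.7853]
```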
