Lecture 2: Linear Optimum Filtering: Wiener Filters

Problem statement

• Given the sequence of input samples [u(0), u(1), u(2), . . .] and the sequence of desired response samples [d(0), d(1), d(2), . . .]

• Given a family of filters computing their outputs according to

    y(n) = \sum_{k=0}^{\infty} w_k u(n-k),   n = 0, 1, 2, . . .        (1)

• Find the parameters {w_0, w_1, w_2, . . .} such as to minimize the mean square error defined as

    J = E[e(n)^2]        (2)

where the error signal is

    e(n) = d(n) - y(n) = d(n) - \sum_{l=0}^{\infty} w_l u(n-l)        (3)

The family of filters (1) is the family of linear discrete-time filters (IIR or FIR).
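For a finite number of nonzero taps (the FIR case treated later), the sum in (1) can be evaluated directly as a convolution. A minimal Python/NumPy sketch, with illustrative (made-up) weights and input samples:

```python
import numpy as np

# Illustrative values only: three filter taps and a short input record.
w = np.array([0.5, 0.3, -0.2])             # w_0, w_1, w_2
u = np.array([1.0, 0.4, -0.7, 0.9, 0.2])   # u(0), u(1), ...

# y(n) = sum_k w_k u(n-k); np.convolve computes exactly this sum, and the
# first len(u) output samples correspond to n = 0, 1, ..., len(u)-1.
y = np.convolve(u, w)[:len(u)]
print(y)
```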
Principle of orthogonality

Define the gradient operator ∇, having as its k-th entry

    \nabla_k = \frac{\partial}{\partial w_k}        (4)

Thus the k-th entry of the gradient of the criterion J is (remember, e(n) = d(n) - \sum_{l=0}^{\infty} w_l u(n-l))

    \nabla_k J = \frac{\partial J}{\partial w_k} = 2E\left[e(n)\frac{\partial e(n)}{\partial w_k}\right] = -2E[e(n)u(n-k)]

For the criterion to attain its minimum, the gradient of the criterion must be identically zero, that is

    \nabla_k J = 0,   k = 0, 1, 2, . . .

resulting in the fundamental Principle of orthogonality:

    E[e_o(n)u(n-k)] = 0,   k = 0, 1, 2, . . .        (5)
Stated in words:

• The criterion J attains its minimum iff

• the estimation error e_o(n) is orthogonal to the samples u(i) which are used to compute the filter output.

We will index with o all the variables, e.g. e_o, y_o, computed using the optimal parameters {w_{o0}, w_{o1}, w_{o2}, . . .}. Let us compute the cross-correlation

    E[e_o(n)y_o(n)] = E\left[e_o(n)\sum_{k=0}^{\infty} w_{ok} u(n-k)\right] = \sum_{k=0}^{\infty} w_{ok} E[u(n-k)e_o(n)] = 0        (6)
Otherwise stated, in words, we have the following Corollary of the Orthogonality Principle:

• When the criterion J attains its minimum, then

• the estimation error e_o(n) is orthogonal to the filter output y_o(n).

Wiener – Hopf equations

From the orthogonality of the estimation error to the input window samples we have

    E[u(n-k)e_o(n)] = 0,   k = 0, 1, 2, . . .

    E\left[u(n-k)\left(d(n) - \sum_{i=0}^{\infty} w_{oi} u(n-i)\right)\right] = 0,   k = 0, 1, 2, . . .

    \sum_{i=0}^{\infty} w_{oi} E[u(n-k)u(n-i)] = E[u(n-k)d(n)],   k = 0, 1, 2, . . .

But

* E[u(n-k)u(n-i)] = r(i-k) is the autocorrelation function of the input signal u(n) at lag i-k

* E[u(n-k)d(n)] = p(-k) is the cross-correlation between the filter input u(n-k) and the desired signal d(n)

and therefore

    \sum_{i=0}^{\infty} w_{oi} r(i-k) = p(-k),   k = 0, 1, 2, . . .        (WIENER–HOPF)        (7)
Solution of the Wiener – Hopf equations for linear transversal filters (FIR)

    y(n) = \sum_{k=0}^{M-1} w_k u(n-k),   n = 0, 1, 2, . . .        (8)

and since only w_0, w_1, w_2, . . . , w_{M-1} are nonzero, the Wiener–Hopf equations become

    \sum_{i=0}^{M-1} w_{oi} r(i-k) = p(-k),   k = 0, 1, 2, . . . , M-1        (WIENER–HOPF)        (9)
which is a system of M equations with M unknowns: {w_{o,0}, w_{o,1}, w_{o,2}, . . . , w_{o,M-1}}.

Matrix formulation of Wiener – Hopf equations

Let us denote

    u(n) = [u(n)  u(n-1)  u(n-2)  . . .  u(n-M+1)]^T        (10)

    R = E[u(n)u^T(n)]
      = \begin{bmatrix}
          E u(n)u(n) & E u(n)u(n-1) & \cdots & E u(n)u(n-M+1) \\
          E u(n-1)u(n) & E u(n-1)u(n-1) & \cdots & E u(n-1)u(n-M+1) \\
          \vdots & \vdots & \ddots & \vdots \\
          E u(n-M+1)u(n) & E u(n-M+1)u(n-1) & \cdots & E u(n-M+1)u(n-M+1)
        \end{bmatrix}
      = \begin{bmatrix}
          r(0) & r(1) & \cdots & r(M-1) \\
          r(1) & r(0) & \cdots & r(M-2) \\
          \vdots & \vdots & \ddots & \vdots \\
          r(M-1) & r(M-2) & \cdots & r(0)
        \end{bmatrix}        (11)

    p = E[u(n)d(n)] = [p(0)  p(-1)  p(-2)  . . .  p(1-M)]^T        (12)

    w_o = [w_{o,0}  w_{o,1}  . . .  w_{o,M-1}]^T
then the Wiener – Hopf equations can be written in the compact form

    R w_o = p        (13)

with solution

    w_o = R^{-1} p
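In code, (13) is just a linear solve. A minimal sketch (the correlation values below are made up for illustration, not taken from the lecture):

```python
import numpy as np

# Hypothetical correlation values, for illustration only (M = 3).
r = np.array([1.0, 0.5, 0.25])      # autocorrelation lags r(0), r(1), r(2)
p = np.array([0.6, -0.3, 0.1])      # cross-correlations p(0), p(-1), p(-2)

M = len(r)
R = np.array([[r[abs(i - k)] for k in range(M)] for i in range(M)])  # Toeplitz

w_o = np.linalg.solve(R, p)         # solve R w_o = p (better than forming R^-1)
print(w_o)
```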
Mean square error surface

Let us define

    e_w(n) = d(n) - \sum_{k=0}^{M-1} w_k u(n-k) = d(n) - w^T u(n)        (14)

Then the cost function can be written as

    J_w = E[e_w(n)e_w(n)] = E[(d(n) - w^T u(n))(d(n) - u^T(n)w)]
        = E[d^2(n) - d(n)u^T(n)w - w^T u(n)d(n) + w^T u(n)u^T(n)w]
        = E[d^2(n)] - E[d(n)u^T(n)]w - w^T E[u(n)d(n)] + w^T E[u(n)u^T(n)]w
        = E[d^2(n)] - 2E[d(n)u^T(n)]w + w^T E[u(n)u^T(n)]w
        = \sigma_d^2 - 2p^T w + w^T R w
        = \sigma_d^2 - 2\sum_{i=0}^{M-1} p(-i)w_i + \sum_{l=0}^{M-1}\sum_{i=0}^{M-1} w_l w_i R_{i,l}        (15)

Thus, we can proceed in another way to find the (same) optimal solution w_o:

* J_w is a second-order function of the parameters {w_0, w_1, . . . , w_{M-1}}

* J[w_0  w_1  . . .  w_{M-1}] is a bowl-shaped (M+1)-dimensional surface with M degrees of freedom.
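For M = 2 the bowl can be seen directly by evaluating (15) on a grid of tap pairs. A small sketch (σ_d², R, and p are illustrative values, not the lecture's example):

```python
import numpy as np

# Illustrative second-order statistics for M = 2.
sigma_d2 = 1.0
R = np.array([[1.0, 0.5], [0.5, 1.0]])
p = np.array([0.6, -0.3])

def J(w):
    # J_w = sigma_d^2 - 2 p^T w + w^T R w, equation (15)
    return sigma_d2 - 2.0 * (p @ w) + w @ R @ w

# Scan a grid of tap pairs: the smallest value found approaches J(w_o),
# the bottom of the bowl at w_o = R^{-1} p.
grid = np.linspace(-2.0, 2.0, 201)
best = min(J(np.array([w0, w1])) for w0 in grid for w1 in grid)
print(best, J(np.linalg.solve(R, p)))
```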
* J attains its minimum, J_min, where the gradient is zero:

    \nabla_w J = 0

    \frac{\partial J}{\partial w_k} = 0,   k = 0, 1, . . . , M-1

    \frac{\partial J}{\partial w_k} = -2p(-k) + 2\sum_{l=0}^{M-1} w_l r(k-l) = 0,   k = 0, 1, . . . , M-1

which finally gives the same Wiener – Hopf equations

    \sum_{l=0}^{M-1} w_l r(k-l) = p(-k)        (16)
Minimum mean square error

Using the form of the criterion

    J_w = \sigma_d^2 - 2p^T w + w^T R w        (17)

one can find the value of the minimum criterion (remember, R w_o = p and w_o = R^{-1} p):

    J_{w_o} = \sigma_d^2 - 2p^T w_o + w_o^T R w_o = \sigma_d^2 - 2w_o^T R w_o + w_o^T R w_o
            = \sigma_d^2 - w_o^T R w_o
            = \sigma_d^2 - w_o^T p
            = \sigma_d^2 - p^T R^{-1} p        (18)
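Equation (18) gives J_min without ever forming the error signal. Continuing the illustrative statistics from the sketch above:

```python
import numpy as np

# Same illustrative statistics as in the earlier sketches.
sigma_d2 = 1.0
R = np.array([[1.0, 0.5], [0.5, 1.0]])
p = np.array([0.6, -0.3])

w_o = np.linalg.solve(R, p)
J_min = sigma_d2 - p @ w_o   # sigma_d^2 - p^T R^{-1} p, since R w_o = p
print(J_min)
```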
Canonical form of the error-performance surface

(Parenthesis: how to compute a scalar out of a vector w, containing the entries of w at power one (linear combination) or at power two (quadratic form):

* linear combination (first-order form): a^T w = \sum_{l=0}^{M-1} a_l w_l ;

* quadratic form: w^T R w = \sum_{l=0}^{M-1}\sum_{i=0}^{M-1} w_l w_i R_{i,l} = w_0^2 R_{0,0} + w_0 w_1 R_{1,0} + . . . + w_{M-1}^2 R_{M-1,M-1} )

How can we rewrite the criterion

    J_w = \sigma_d^2 - 2p^T w + w^T R w        (19)

in a quadratic form (how to complete a perfect "square", encompassing -2p^T w)? Consider first the case when w is simply a scalar (resulting also in scalars R, r, p):

    J_w = Rw^2 - 2pw + \sigma_d^2 = R\left(w^2 - 2w\frac{p}{R}\right) + \sigma_d^2
        = R\left(w^2 - 2w\frac{p}{R} + \frac{p^2}{R^2}\right) - \frac{p^2}{R} + \sigma_d^2
        = R\left(w - \frac{p}{R}\right)^2 - \frac{p^2}{R} + \sigma_d^2

In the case when w is a vector, the term corresponding to the one-dimensional p^2/R is p^T R^{-1} p:

    J_w = w^T R w - 2p^T w + p^T R^{-1} p - p^T R^{-1} p + \sigma_d^2
        = (w - R^{-1}p)^T R (w - R^{-1}p) - p^T R^{-1} p + \sigma_d^2
        = J_{w_o} + (w - R^{-1}p)^T R (w - R^{-1}p)
        = J_{w_o} + (w - w_o)^T R (w - w_o)

(This was the solution of Exercise 5.5, page 182, in [Haykin 91].)
Let λ_1, λ_2, . . . , λ_M be the eigenvalues and µ_1, µ_2, . . . , µ_M the (generally complex) eigenvectors of the matrix R, thus satisfying

    R\mu_i = \lambda_i \mu_i        (20)

Then the matrix Q = [µ_1  µ_2  . . .  µ_M] can transform R to a diagonal form Λ as follows:

    R = Q \Lambda Q^H        (21)

where the superscript H means complex conjugation and transposition. Then

    J_w = J_{w_o} + (w - w_o)^T R (w - w_o) = J_{w_o} + (w - w_o)^T Q \Lambda Q^H (w - w_o)

Introduce now the transformed version of the tap vector w as

    \nu = Q^H (w - w_o)

Now the quadratic form can be put into its canonical form:

    J = J_{w_o} + \nu^H \Lambda \nu = J_{w_o} + \sum_{i=1}^{M} \lambda_i \nu_i \nu_i^* = J_{w_o} + \sum_{i=1}^{M} \lambda_i |\nu_i|^2        (22)
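The decomposition (21)-(22) is easy to check numerically. A sketch with illustrative values (for a real symmetric R the eigenvectors are real, so Q^H reduces to Q^T):

```python
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative correlation matrix
w_o = np.array([0.3, -0.1])             # illustrative optimal taps
w = np.array([1.0, 0.7])                # an arbitrary tap vector

lam, Q = np.linalg.eigh(R)              # R = Q diag(lam) Q^H
nu = Q.conj().T @ (w - w_o)             # nu = Q^H (w - w_o)

# Excess error in both forms: (w - w_o)^T R (w - w_o) vs sum_i lam_i |nu_i|^2
print((w - w_o) @ R @ (w - w_o), np.sum(lam * np.abs(nu) ** 2))
```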
Optimal Wiener Filter Design: Example

• (Useful) Signal Generating Model

The model is given by the transfer function

    H_1(z) = \frac{D(z)}{V_1(z)} = \frac{1}{1 + az^{-1}} = \frac{1}{1 + 0.8458z^{-1}}

or the difference equation

    d(n) + a d(n-1) = v_1(n)
    d(n) + 0.8458 d(n-1) = v_1(n)

where \sigma_{v_1}^2 = r_{v_1}(0) = 0.27.

• The channel (perturbation) model is more complex. It involves a low-pass filter with transfer function

    H_2(z) = \frac{X(z)}{D(z)} = \frac{1}{1 + bz^{-1}} = \frac{1}{1 - 0.9458z^{-1}}

leading for the variable x(n) to the difference equation

    x(n) = 0.9458 x(n-1) + d(n)

and a white noise corruption (x(n) and v_2(n) are uncorrelated)

    u(n) = x(n) + v_2(n),   with \sigma_{v_2}^2 = r_{v_2}(0) = 0.1

resulting in the final measurable signal u(n).

• FIR Filter

The signal u(n) will be filtered in order to recover the original (useful) signal d(n), using the filter

    y(n) = w_0 u(n) + w_1 u(n-1)
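A small simulation of this generating model may help fix ideas. A sketch assuming Gaussian white driving noises (the lecture only specifies their variances) and zero initial conditions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
a, b = 0.8458, -0.9458                  # H1: 1/(1 + a z^-1), H2: 1/(1 + b z^-1)
v1 = rng.normal(0.0, np.sqrt(0.27), N)  # sigma_v1^2 = 0.27
v2 = rng.normal(0.0, np.sqrt(0.1), N)   # sigma_v2^2 = 0.1

d = np.zeros(N)                         # useful signal
x = np.zeros(N)                         # channel output
for n in range(1, N):
    d[n] = -a * d[n - 1] + v1[n]        # d(n) + a d(n-1) = v1(n)
    x[n] = -b * x[n - 1] + d[n]         # x(n) + b x(n-1) = d(n)
u = x + v2                              # measurable, noise-corrupted signal

# Sample correlations; they should approach r_u(0) = 1.1 and r_u(1) = 0.5,
# the values derived below.
print(np.mean(u * u), np.mean(u[1:] * u[:-1]))
```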
We plan to apply the Wiener – Hopf equations

    \begin{bmatrix} r_u(0) & r_u(1) \\ r_u(1) & r_u(0) \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} E d(n)u(n) \\ E d(n)u(n-1) \end{bmatrix}

The signal x(n) obeys the generation model

    H(z) = \frac{X(z)}{V_1(z)} = H_1(z)H_2(z) = \frac{1}{1 + az^{-1}} \cdot \frac{1}{1 + bz^{-1}} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}} = \frac{1}{1 - 0.1z^{-1} - 0.8z^{-2}}

and thus

    x(n) + a_1 x(n-1) + a_2 x(n-2) = v_1(n)

Using the fact that x(n) and v_2(n) are uncorrelated and u(n) = x(n) + v_2(n), it results that

    r_u(k) = r_x(k) + r_{v_2}(k)

and consequently, since for white noise r_{v_2}(0) = \sigma_{v_2}^2 = 0.1 and r_{v_2}(1) = 0, it follows that

    r_u(0) = r_x(0) + 0.1,   and   r_u(1) = r_x(1)

Now we concentrate on finding r_x(0), r_x(1) for the AR process

    x(n) + a_1 x(n-1) + a_2 x(n-2) = v(n)
First multiply the equation in turn by x(n), x(n-1), and x(n-2), and then take the expectation:

Ex(n)× →  E x(n)x(n) + a_1 E x(n)x(n-1) + a_2 E x(n)x(n-2) = E x(n)v(n)
          resulting in r_x(0) + a_1 r_x(1) + a_2 r_x(2) = E x(n)v(n) = \sigma_v^2

Ex(n-1)× →  E x(n-1)x(n) + a_1 E x(n-1)x(n-1) + a_2 E x(n-1)x(n-2) = E x(n-1)v(n)
            resulting in r_x(1) + a_1 r_x(0) + a_2 r_x(1) = E x(n-1)v(n) = 0

Ex(n-2)× →  E x(n-2)x(n) + a_1 E x(n-2)x(n-1) + a_2 E x(n-2)x(n-2) = E x(n-2)v(n)
            resulting in r_x(2) + a_1 r_x(1) + a_2 r_x(0) = E x(n-2)v(n) = 0

The equality E x(n)v(n) = \sigma_v^2 can be obtained by multiplying the AR model difference equation by v(n) and then taking expectations:

Ev(n)× →  E v(n)x(n) + a_1 E v(n)x(n-1) + a_2 E v(n)x(n-2) = E v(n)v(n)
          resulting in E v(n)x(n) = \sigma_v^2

since v(n) is uncorrelated with the older values x(n-\tau), \tau > 0. We have obtained the celebrated Yule–Walker equations:

    r_x(0) + a_1 r_x(1) + a_2 r_x(2) = \sigma_v^2
    r_x(1) + a_1 r_x(0) + a_2 r_x(1) = 0
    r_x(2) + a_1 r_x(1) + a_2 r_x(0) = 0

or, as usually given, in matrix form:

    \begin{bmatrix} r_x(0) & r_x(1) & r_x(2) \\ r_x(1) & r_x(0) & r_x(1) \\ r_x(2) & r_x(1) & r_x(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sigma_v^2 \\ 0 \\ 0 \end{bmatrix}
But we need to use the equations differently:

    \begin{bmatrix} 1 & a_1 & a_2 \\ a_1 & 1 + a_2 & 0 \\ a_2 & a_1 & 1 \end{bmatrix} \begin{bmatrix} r_x(0) \\ r_x(1) \\ r_x(2) \end{bmatrix} = \begin{bmatrix} \sigma_v^2 \\ 0 \\ 0 \end{bmatrix}
Solving for r_x(0), r_x(1), r_x(2) we obtain

    r_x(0) = \left(\frac{1 + a_2}{1 - a_2}\right) \frac{\sigma_v^2}{(1 + a_2)^2 - a_1^2}

    r_x(1) = \frac{-a_1}{1 + a_2}\, r_x(0)

    r_x(2) = \left(-a_2 + \frac{a_1^2}{1 + a_2}\right) r_x(0)
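These closed-form expressions are easy to check numerically with the example's coefficients (a_1 = -0.1, a_2 = -0.8, \sigma_v^2 = 0.27):

```python
# Closed-form Yule-Walker solution for the AR(2) process of the example.
a1, a2, sigma_v2 = -0.1, -0.8, 0.27

rx0 = ((1 + a2) / (1 - a2)) * sigma_v2 / ((1 + a2) ** 2 - a1 ** 2)
rx1 = -a1 / (1 + a2) * rx0
rx2 = (-a2 + a1 ** 2 / (1 + a2)) * rx0
print(rx0, rx1, rx2)   # expected: 1.0, 0.5, 0.85
```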
In our example we need only the first two values, r_x(0) and r_x(1), which result to be r_x(0) = 1, r_x(1) = 0.5. Now we will solve for the cross-correlations E d(n)u(n), E d(n)u(n-1). First observe that

    E u(n)d(n) = E[(x(n) + v_2(n))d(n)] = E x(n)d(n)
    E u(n-1)d(n) = E[(x(n-1) + v_2(n-1))d(n)] = E x(n-1)d(n)

and now take as a "master" difference equation

    x(n) + b x(n-1) = d(n)
and multiply it in turn by x(n) and x(n-1), and then take the expectation:

Ex(n)× →  E x(n)x(n) + b E x(n)x(n-1) = E x(n)d(n)
          E x(n)d(n) = r_x(0) + b r_x(1)

Ex(n-1)× →  E x(n-1)x(n) + b E x(n-1)x(n-1) = E x(n-1)d(n)
            E x(n-1)d(n) = r_x(1) + b r_x(0)

Using the numerical values, one obtains

    E u(n)d(n) = E x(n)d(n) = 0.5272
    E u(n-1)d(n) = E x(n-1)d(n) = -0.4458
Now we have all the variables needed to write the Wiener – Hopf equations

    \begin{bmatrix} r_u(0) & r_u(1) \\ r_u(1) & r_u(0) \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} E d(n)u(n) \\ E d(n)u(n-1) \end{bmatrix}

    \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}

resulting in

    \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix}
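Finally, the 2×2 system can be checked numerically with a one-line solve:

```python
import numpy as np

R = np.array([[1.1, 0.5], [0.5, 1.1]])   # [[r_u(0), r_u(1)], [r_u(1), r_u(0)]]
p = np.array([0.5272, -0.4458])          # [E d(n)u(n), E d(n)u(n-1)]
print(np.linalg.solve(R, p))             # compare with the rounded result above
```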