Bayesian source separation with mixture of Gaussians prior for sources and Gaussian prior for mixture coefficients Hichem Snoussi and Ali Mohammad-Djafari

Laboratoire des Signaux et Systèmes (L2S), Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette Cedex, France

Abstract. In this contribution, we present new algorithms for source separation in the case of a noisy instantaneous linear mixture, within the Bayesian statistical framework. The source distributions are modeled a priori by mixtures of Gaussians [1] and the mixing matrix elements by Gaussian distributions [2]. We model the mixture of Gaussians hierarchically by means of hidden variables representing the labels of the mixture. We then consider the joint a posteriori distribution of the sources, the mixing matrix elements, the labels of the mixture and the other parameters of the mixture, with appropriate prior probability laws chosen to eliminate the degeneracy of the likelihood function with respect to the variance parameters, and we propose two iterative algorithms to estimate jointly the sources, the mixing matrix and the hyperparameters: a joint MAP (maximum a posteriori) algorithm and a penalized EM algorithm. The illustrative example is taken from [3] so as to compare with other algorithms proposed in the literature.

PROBLEM DESCRIPTION

We consider a linear instantaneous mixture of $n$ sources. The observations may be corrupted by an additive noise, which can represent measurement errors or model uncertainty:

$$x(t) = A\, s(t) + \epsilon(t), \qquad t = 1, \dots, T \qquad (1)$$

where $x(t)$ is the $(m \times 1)$ measurement vector, $s(t)$ is the $(n \times 1)$ source vector whose components have to be separated, $A$ is the mixing matrix of dimension $(m \times n)$ and $\epsilon(t)$ represents the noise affecting the measurements. We assume that the $(m \times T)$ noise matrix $\epsilon_{1..T}$ is statistically independent of the sources, centered, white and Gaussian with known variance $\sigma_\epsilon^2$. We note $s_{1..T}$ the $(n \times T)$ matrix of sources and $x_{1..T}$ the $(m \times T)$ matrix of data. The source separation problem consists of two sub-problems: source restoration and mixing matrix identification. Therefore, three directions can be followed:

1. Supervised learning: identify $A$ knowing a training sequence of sources $s$, then use it to reconstruct the sources.

2. Unsupervised learning: identify $A$ directly from a part or the whole of the observations, and then use it to recover $s$.

3. Unsupervised joint estimation: estimate $s$ and $A$ jointly.

In the following, we investigate the third direction. This choice is motivated by practical cases where both the sources and the mixing matrix are unknown. This paper is organised as follows: we begin in section II by proposing a Bayesian approach to source separation. We set up the notations, present the prior laws of the sources and of the mixing matrix elements, and present the joint MAP estimation algorithm assuming known hyperparameters. We introduce, in section III, a hierarchical modeling of the sources by means of hidden variables representing the labels of the mixture of Gaussians in the prior modeling, and present a version of JMAP using the estimation of these hidden variables (classification) as an intermediate step. In both algorithms, the hyperparameters are assumed known, which is not realistic in applications. That is why, in section IV, we present an original method for the estimation of the hyperparameters which takes advantage of this hierarchical modeling. Finally, since the EM algorithm has been used extensively in source separation [4], we consider this algorithm and propose, in section V, a penalized version of the EM algorithm for source separation. This penalization of the likelihood function is necessary to eliminate its degeneracy when some variances of the Gaussian mixture approach zero [5]. Each section is supported by one typical simulation result and a partial conclusion. At the end, we compare the last two algorithms.
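As a concrete illustration of the observation model (1) and of the mixture-of-Gaussians sources used below, the following minimal Python sketch generates synthetic data. The dimensions, mixture hyperparameters, mixing matrix and noise level are arbitrary values chosen for the example, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 1000, 2, 3          # samples, sources, sensors (assumed values)
sigma_eps = 0.1               # known noise standard deviation (assumed)

# Each source follows a mixture of Gaussians: (weights, means, standard deviations)
mog = [(np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([0.2, 0.2])),
       (np.array([0.3, 0.7]), np.array([-2.0, 2.0]), np.array([0.3, 0.3]))]

S = np.empty((n, T))
for j, (alpha, mu, sig) in enumerate(mog):
    labels = rng.choice(len(alpha), size=T, p=alpha)   # hidden labels of the mixture
    S[j] = rng.normal(mu[labels], sig[labels])          # s_j(t) given its label

A = np.array([[1.0, 0.6],
              [0.5, 1.0],
              [0.3, 0.8]])                              # arbitrary (m x n) mixing matrix

X = A @ S + sigma_eps * rng.standard_normal((m, T))     # x(t) = A s(t) + eps(t)
```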

BAYESIAN APPROACH TO SOURCE SEPARATION

Given the observations $x_{1..T}$, the joint a posteriori distribution of the unknown variables $(A, s_{1..T})$ is:

$$p(A, s_{1..T} \mid x_{1..T}) \;\propto\; p(x_{1..T} \mid A, s_{1..T})\; p(s_{1..T})\; p(A) \qquad (2)$$

where $p(A)$ and $p(s_{1..T})$ are the prior distributions through which we model our a priori information about the mixing matrix $A$ and the sources $s$, and $p(x_{1..T} \mid A, s_{1..T})$ is the joint likelihood distribution. We have, now, three directions:

1. First, integrate (2) with respect to $s_{1..T}$ to obtain the marginal in $A$, and then estimate $A$ by:

$$\hat A = \arg\max_{A} \left\{ p(A \mid x_{1..T}) \right\} \qquad (3)$$

2. Second, integrate (2) with respect to $A$ to obtain the marginal in $s_{1..T}$, and then estimate $s_{1..T}$ by:

$$\hat s_{1..T} = \arg\max_{s_{1..T}} \left\{ p(s_{1..T} \mid x_{1..T}) \right\} \qquad (4)$$

3. Third, estimate $A$ and $s_{1..T}$ jointly:

$$(\hat A, \hat s_{1..T}) = \arg\max_{A,\, s_{1..T}} \left\{ p(A, s_{1..T} \mid x_{1..T}) \right\} \qquad (5)$$

Choice of a priori distributions

The a priori distribution reflects our knowledge concerning the parameter to be estimated. Therefore, it must be neither very specific to a particular problem nor too general (uniform and non-informative). A parametric model for these distributions seems to fit this goal: its structure expresses the particularity of the problem and its parameters allow a certain flexibility.

Sources a priori: For the sources $s$, we choose a mixture of Gaussians [1]:

$$p(s_j(t)) = \sum_{i=1}^{q_j} \alpha_{ji}\; \mathcal{N}(m_{ji}, \sigma_{ji}^2), \qquad j = 1, \dots, n \qquad (6)$$

The hyperparameters of this mixture are supposed to be known. This choice was motivated by the following points:

• It represents a general class of distributions and is convenient in many digital communications and image processing applications.
• For a Gaussian likelihood $p(x_{1..T} \mid A, s_{1..T})$ (considered as a function of $s_{1..T}$), the a posteriori law remains in the same class (conjugate prior). We then have only to update the parameters of the mixture with the data.
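To make the prior (6) concrete, the short sketch below evaluates the mixture-of-Gaussians density of one source component; the hyperparameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mog_pdf(s, alpha, mu, sigma):
    """Mixture-of-Gaussians density p(s_j) = sum_i alpha_i N(s; mu_i, sigma_i^2)."""
    s = np.atleast_1d(s)[:, None]                       # (N, 1) against (q,) components
    comp = np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return comp @ alpha                                  # weighted sum over components

# Example: two-component symmetric prior (assumed hyperparameters)
alpha = np.array([0.5, 0.5])
mu    = np.array([-1.0, 1.0])
sigma = np.array([0.2, 0.2])
print(mog_pdf(np.array([-1.0, 0.0, 1.0]), alpha, mu, sigma))
```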



Mixing matrix a priori: To account for some model uncertainty, we assign a Gaussian prior law to each element of the mixing matrix $A$:

$$p(A_{ij}) = \mathcal{N}(M_{ij}, \sigma_{a,ij}^2) \qquad (7)$$

which can be interpreted as knowing every element ($M_{ij}$) with some uncertainty ($\sigma_{a,ij}^2$). We underline here the advantage of estimating the mixing matrix $A$ and not a separating matrix $B$ (inverse of $A$), which is the case of almost all the existing methods for source separation (see for example [6]). This approach has at least two advantages: (i) $A$ does not need to be invertible (it may even not be square, $m \neq n$); (ii) naturally, we have some a priori information on the mixing matrix, not on its inverse, which may not exist.
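Putting the Gaussian likelihood and the two priors together, the joint log-posterior (2) can be evaluated numerically, up to an additive constant, as in the sketch below; the isotropic noise model and the argument names are our own choices for illustration.

```python
import numpy as np

def log_posterior(A, S, X, sigma_eps, mog, M, sigma_a):
    """log p(A, s_{1..T} | x_{1..T}) up to an additive constant (eq. 2)."""
    # Gaussian likelihood term: -||X - A S||^2 / (2 sigma_eps^2)
    resid = X - A @ S
    ll = -0.5 * np.sum(resid ** 2) / sigma_eps ** 2
    # Mixture-of-Gaussians prior on each source component (eq. 6)
    lp_s = 0.0
    for j, (alpha, mu, sig) in enumerate(mog):
        comp = np.exp(-0.5 * ((S[j][:, None] - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)
        lp_s += np.sum(np.log(comp @ alpha))
    # Gaussian prior on the mixing matrix elements (eq. 7); sigma_a scalar or (m x n) array
    lp_A = -0.5 * np.sum((A - M) ** 2 / sigma_a ** 2)
    return ll + lp_s + lp_A
```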

JMAP algorithm

We propose an alternating iterative algorithm to estimate $s_{1..T}$ and $A$ jointly by extremizing the log-posterior distribution:

$$\hat A^{(k)} = \arg\max_{A}\; \log p\big(A, \hat s^{(k-1)}_{1..T} \mid x_{1..T}\big), \qquad
\hat s^{(k)}_{1..T} = \arg\max_{s_{1..T}}\; \log p\big(\hat A^{(k)}, s_{1..T} \mid x_{1..T}\big) \qquad (8)$$
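For fixed sources, the first update in (8) has a closed form: with the Gaussian likelihood and the Gaussian prior (7), the maximization over $A$ is a ridge-regression-type problem. The sketch below assumes, for simplicity, a common prior variance sigma_a^2 for all elements (the paper allows one variance per element) and should be read as an illustration of this step, not as the paper's re-estimation equation.

```python
import numpy as np

def map_update_A(X, S, sigma_eps, M, sigma_a):
    """MAP estimate of A for fixed sources S, Gaussian likelihood and prior N(M, sigma_a^2)."""
    n = S.shape[0]
    G = S @ S.T / sigma_eps**2 + np.eye(n) / sigma_a**2   # (n x n) regularized Gram matrix
    B = X @ S.T / sigma_eps**2 + M / sigma_a**2           # (m x n) data term plus prior mean
    return B @ np.linalg.inv(G)                           # solves A G = B
```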

In the following, we suppose that the sources are white and spatially independent. This assumption is not necessary in our approach, but we start from it to be able to compare later with other classical methods in which this hypothesis is fundamental.

With this hypothesis, in the source estimation step of (8), the criterion to optimize with respect to $s_{1..T}$ is:

$$J(s_{1..T}) = \sum_{t=1}^{T} \Big[ \log p\big(x(t) \mid \hat A^{(k)}, s(t)\big) + \sum_{j=1}^{n} \log p\big(s_j(t)\big) \Big] \qquad (9)$$

Therefore, the optimisation is done independently at each time $t$:

$$\hat s(t)^{(k+1)} = \arg\max_{s(t)} \Big\{ \log p\big(x(t) \mid \hat A^{(k)}, s(t)\big) + \sum_{j=1}^{n} \log p\big(s_j(t)\big) \Big\} \qquad (10)$$

The a posteriori distribution of $s(t)$ is a mixture of $\prod_{j=1}^{n} q_j$ Gaussians. This leads to a high computational cost. To obtain a more reasonable algorithm, we propose an iterative scalar algorithm. The first step consists in estimating each source component knowing the other components estimated in the previous iteration:

$$\hat s_j(t)^{(k+1)} = \arg\max_{s_j(t)} \Big\{ \log p\big(s_j(t) \mid x(t), \hat A^{(k)}, \hat s_{l \neq j}(t)^{(k)}\big) \Big\} \qquad (11)$$

The a posteriori distribution of $s_j(t)$ is a mixture of $q_j$ Gaussians, $\sum_{i=1}^{q_j} \beta_{ji}\, \mathcal{N}(\bar m_{ji}, \bar\sigma_{ji}^2)$, with:

$$\bar\sigma_{ji}^2 = \frac{\sigma_{ji}^2\, \sigma_j^2}{\sigma_{ji}^2 + \sigma_j^2}, \qquad
\bar m_{ji} = \bar\sigma_{ji}^2 \Big( \frac{m_{ji}}{\sigma_{ji}^2} + \frac{m_j}{\sigma_j^2} \Big), \qquad
\beta_{ji} \propto \frac{\alpha_{ji}}{\sqrt{\sigma_{ji}^2 + \sigma_j^2}}\, \exp\Big( -\frac{(m_j - m_{ji})^2}{2\,(\sigma_{ji}^2 + \sigma_j^2)} \Big) \qquad (12)$$

where $m_j$ and $\sigma_j^2$ are the mean and variance of the Gaussian term induced by the data and by the other components fixed to their current estimates:

$$\sigma_j^2 = \frac{\sigma_\epsilon^2}{\sum_{k=1}^{m} a_{kj}^2}, \qquad
m_j = \frac{\sum_{k=1}^{m} a_{kj}\, \big( x_k(t) - \sum_{l \neq j} a_{kl}\, \hat s_l(t) \big)}{\sum_{k=1}^{m} a_{kj}^2} \qquad (13)$$

If the means $\bar m_{ji}$ are not close to each other, we are in the case of a multi-modal distribution. The algorithm to estimate $s_j(t)$ is then to first compute $m_j$, $\sigma_j^2$, $\bar m_{ji}$ and $\bar\sigma_{ji}^2$ by (12)-(13), and to select the mode $\bar m_{ji}$ whose posterior weight $\beta_{ji}$ is the largest.

Figure 7-a represents the evolution of the performance index through the iterations. Note that the index of the JMAP algorithm converges, within a few tens of iterations, to a satisfactory value; for the same SNR, the algorithms PWS, NS [3] and EASI [6] reach a higher (less satisfactory) value of the index, and only after a large number of observations. Figures 7-b and 7-c illustrate the identification of the hyperparameters: we note the convergence of the algorithm towards the original values of $m_{11}$ and $\psi_{11}$. In order to validate the idea of data classification before estimating the hyperparameters, we can visualize the evolution of the classification error (the number of wrongly classified data). Figure 7-d shows that this error converges to zero after a few iterations. After this point, the hyperparameter identification is performed on correctly classified data: the estimation of the mean and the variance of a given class takes into account only the data which belong to that class, and is therefore not corrupted by other data which would bring erroneous information on these hyperparameters.
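The scalar update (11)-(13) can be summarized as: form the Gaussian pseudo-observation $(m_j, \sigma_j^2)$ induced by the data and the other components, combine it with each prior mode, and keep the dominant posterior mode. The following sketch is our reading of that procedure for one component at one time instant; the function signature is an assumption for illustration, not code from the paper.

```python
import numpy as np

def update_source_j(x_t, A, s_t, j, sigma_eps, alpha, mu, sigma):
    """One scalar JMAP update of s_j(t), the other components being fixed (eqs. 11-13)."""
    a_j = A[:, j]
    # Residual once the contribution of the other sources is removed
    e = x_t - A @ s_t + a_j * s_t[j]
    sigma_j2 = sigma_eps**2 / np.sum(a_j**2)          # variance of the pseudo-observation
    m_j = (a_j @ e) / np.sum(a_j**2)                  # mean of the pseudo-observation
    # Posterior mixture parameters for each prior mode (eq. 12)
    post_var = 1.0 / (1.0 / sigma**2 + 1.0 / sigma_j2)
    post_mean = post_var * (mu / sigma**2 + m_j / sigma_j2)
    log_beta = (np.log(alpha) - 0.5 * np.log(sigma**2 + sigma_j2)
                - 0.5 * (m_j - mu)**2 / (sigma**2 + sigma_j2))
    return post_mean[np.argmax(log_beta)]             # keep the dominant mode
```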

Figure 4 - Separation results.
Figure 5 - Separation results: phase space distributions of the sources, the mixed signals and the separated sources.
Figure 6 - Separation results: histograms of the sources, the mixed signals and the separated sources.
Figure 7-a - Evolution of the index through the iterations.
Figure 7-b - Identification of m11.
Figure 7-c - Identification of psi11.
Figure 7-d - Evolution of the classification error.

Thus, a joint estimation of the sources, the mixing matrix and the hyperparameters is performed successfully with the JMAP algorithm. The EM algorithm was used in [4] to solve the source separation problem in a maximum likelihood context. We now use the EM algorithm in a Bayesian approach in order to take into account our a priori information on the mixing matrix.

PENALIZED EM

The EM algorithm has been used extensively in data analysis to find the maximum likelihood estimate of a set of parameters from given data [8]. We consider both the mixing matrix $A$ and the hyperparameters $\theta$, at the same level, as unknown parameters, and $(x_{1..T}, s_{1..T})$ as the complete data. Complete data means jointly the observed data $x_{1..T}$ and the unobserved data $s_{1..T}$. The EM algorithm is executed in two steps: (i) the E-step (expectation) consists in forming the logarithm of the joint distribution of the observed data $x_{1..T}$ and the hidden data $s_{1..T}$ conditionally on the parameters $A$ and $\theta$, and then computing its expectation conditionally on $x_{1..T}$ and the estimated parameters $\hat A$ and $\hat\theta$ (evaluated at the previous iteration); (ii) the M-step (maximization) consists of the maximization of the obtained functional with respect to the parameters $A$ and $\theta$:

1. E-step:

$$Q\big(A, \theta \mid \hat A, \hat\theta\big) = \mathrm{E}\Big[ \log p\big(x_{1..T}, s_{1..T} \mid A, \theta\big) \;\Big|\; x_{1..T}, \hat A, \hat\theta \Big]$$
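Consistent with the Bayesian reading above, penalizing the EM functional simply means adding the log-priors on $A$ and $\theta$ before the M-step. The expressions below are a hedged summary in our notation, not formulas quoted from the paper:

$$Q_p\big(A, \theta \mid \hat A, \hat\theta\big) = Q\big(A, \theta \mid \hat A, \hat\theta\big) + \log p(A) + \log p(\theta),$$
$$(\hat A, \hat\theta) \leftarrow \arg\max_{A,\, \theta}\; Q_p\big(A, \theta \mid \hat A, \hat\theta\big).$$

In this view, the prior on the variance parameters is what eliminates the degeneracy of the likelihood when some mixture variances approach zero, as announced in the introduction [5].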