Mean Field Approximation for BSS of images with a compound hierarchical Gauss-Markov-Potts model

Nadia Bali and Ali Mohammad-Djafari

Laboratoire des Signaux et Systèmes, Unité mixte de recherche 8506 (CNRS-Supélec-UPS), Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette, France

Abstract. In this paper we consider the Blind Source Separation (BSS) of images whose prior distributions are modeled through a compound Gauss-Markov model with hidden classification labels modeled by a Potts distribution. This model is a powerful tool for modeling images which are composed of homogeneous regions. The joint estimation of the sources, the classification labels, the mixing matrix and all the hyperparameters of the model (noise covariance, means and variances of the pixels in each region) can be done through Gibbs sampling of the joint posterior probability law of these unknowns, given the observed mixed data. In previous works, we had implemented this general algorithm. However, the huge complexity and cost of this algorithm limits its use in practical applications. In this paper, we propose a Mean Field Approximation (MFA) approach which consists in approximating this joint posterior by a separable one whose sampling will be much easier and less expensive. Based on MFA, we propose three algorithms which look like the Stochastic Expectation-Maximization (SEM) algorithm to jointly separate and segment the image sources.

Key Words: Blind Source Separation (BSS), Mean field approximation, Variational Bayesian inference, Expectation-Maximization (EM), Gibbs sampling, Multi and hyperspectral image processing.

INTRODUCTION

The BSS of images can be modeled as

x(r) = A s(r) + ε(r)   (1)

where x(r) = {x_i(r), i = 1, ..., m} is a set of m observed mixed images, r is the spatial position, A is an unknown mixing matrix of dimensions (m, n), s(r) = {s_j(r), j = 1, ..., n} is a set of n unknown source images, and ε(r) represents the errors (modeling and measurement). If we denote x = {x(r), r ∈ R}, s = {s(r), r ∈ R} and ε = {ε(r), r ∈ R}, then we can write x = A s + ε. The first step in the Bayesian estimation approach is to translate our prior knowledge on the model and the errors ε into the likelihood p(x|s, A, θ_1), and our prior knowledge on the unknown sources into the prior law p(s|θ_2), with appropriate prior laws p(A) for the unknown mixing matrix and p(θ) for the hyperparameters

θ = {θ_1, θ_2}, to obtain an expression for the posterior probability law

p(s, A, θ|x) ∝ p(x|s, A, θ_1) p(s|θ_2) p(A) p(θ)

The second step is to use this posterior law to infer either all the unknowns or only some of them in particular. In the following, we assume that the errors are centered, white, Gaussian, independent and identically distributed. This leads to

p(x|s, A, θ) = ∏_r N(x(r) | A s(r), Σ_ε)

where θ_1 = Σ_ε = diag(σ²_ε1, ..., σ²_εm). One of the main points in any Bayesian approach is the appropriate choice of the prior modeling for the sources. In image processing, Markovian models are the appropriate tools. In the following we consider two such models and then we discuss their relative performances, their algorithmic complexities, and their usefulness in BSS problems.
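To make the white-noise likelihood above concrete, the following is a minimal sketch (not the authors' code) of evaluating ln p(x|s, A, θ_1) = Σ_r ln N(x(r) | A s(r), Σ_ε) with a diagonal noise covariance; the array shapes and the function name are assumptions of this illustration.

```python
import numpy as np

def gaussian_loglik(x, s, A, sigma2):
    """
    x: (m, R) observed pixels, s: (n, R) sources, A: (m, n) mixing matrix,
    sigma2: (m,) diagonal of Sigma_eps. Returns ln p(x | s, A, Sigma_eps).
    """
    resid = x - A @ s                                   # x(r) - A s(r) for all r
    R = x.shape[1]                                      # number of pixels
    return (-0.5 * np.sum(resid**2 / sigma2[:, None])
            - 0.5 * R * np.sum(np.log(2 * np.pi * sigma2)))
```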

COMPOUND GAUSS-MARKOV MODELING WITH CLASSIFICATION HIDDEN VARIABLES

Many different compound Gauss-Markov models with different hidden variables (contours or regions) have been used in many image processing problems such as denoising, segmentation, restoration and reconstruction [1, 2, 3]. However, their use in BSS is recent. This model has many features: it reflects naturally the piecewise homogeneity of the pixel distribution through a Mixture of Gaussians (MoG) distribution for the marginal distributions of the pixels and a Potts Markov field for their spatial structure. This model has been used in many image processing applications by the authors and their collaborators [2]. Here, we consider the following model:

p(s_j(r) | z_j(r) = z) = N(m_jz, σ²_jz),  z = 1, ..., K

p(z_j(r), r ∈ R | β_j) ∝ exp[ β_j Σ_{r∈R} Σ_{r'∈V(r)} δ(z_j(r) − z_j(r')) ]

where z represents the class labels in the MoG model and β_j are the parameters of the Potts model which control the mean size of the class populations. This results in the joint probability law:

p(s, z) = ∏_j p(s_j(r), z_j(r), r ∈ R) ∝ exp[ −Σ_j U_j(s_j(r), z_j(r), r ∈ R) ]

with

U_j(s_j(r), z_j(r), r ∈ R) = (1/2) Σ_{z=1}^{K} Σ_{r∈R_jz} (s_j(r) − m_jz)² / σ²_jz − β_j Σ_{r∈R} Σ_{r'∈V(r)} δ(z_j(r) − z_j(r'))   (2)

where we used R_jz = {r : z_j(r) = z}, R_j = ∪_{z=1}^{K} R_jz = R and Σ_z Σ_{r∈R_jz} = Σ_r. The main idea of the Bayesian approach is then to use this prior model to obtain the posterior law:

p(s, z, θ|x) ∝ p(x|s, z, θ) p(s, z|θ) p(θ)

where θ = (A, Σ_ε, {m_jz, σ²_jz}) represents the mixing matrix A and all the hyperparameters of the prior modeling of the noise and the sources. From this expression, there are many possibilities to infer all the unknowns of the problem. One may try to develop algorithms to reach the joint maximum a posteriori (MAP) or to reach the posterior mean. For the first, one may optimize it iteratively with respect to each set of unknowns while keeping the others fixed. For the second, the main tool is Gibbs sampling, which makes it possible to generate samples from this posterior and then use those samples to compute the mean values or any other statistics of the unknowns. One may also mix maximization and sampling, and even, when possible, direct analytical computation of the mean values, to obtain Expectation-Restoration-Maximization algorithms such as those we propose in the following.



Algorithm 1: This algorithm samples each set of unknowns iteratively, conditioned on the values obtained for the others in the previous iteration:

s ∼ p(s|z, θ, x)  →  z ∼ p(z|s, θ, x)  →  θ ∼ p(θ|s, z, x)   (3)

Algorithm 2: Here, the unknowns are first split into two categories: the hidden variables (s, z) and all the hyperparameters θ:

(s, z) ∼ p(s, z|θ, x)  →  θ ∼ p(θ|s, z, x)

The sampling of the hidden variables from p(s, z|θ, x) in the first step is again split into two hierarchical steps, sampling from p(s|z, θ, x) and from p(z|θ, x), which results in:

s ∼ p(s|z, θ, x)  →  z ∼ p(z|θ, x)  →  θ ∼ p(θ|s, z, x)   (4)

However, we may note that the computation of p(z|θ, x) needs marginalization over the sources s:

p(z|θ, x) = ∫ p(z|s, θ, x) p(s|θ, x) ds

which may be intractable in some cases.

Algorithm 3:

s ∼ p(s|z, θ, x)  →  z ∼ p(z|θ, x)  →  θ ∼ p(θ|x)   (5)

where

p(θ|x) ∝ Σ_z p(x|z, θ) p(z|θ) p(θ)

If we replace the sampling steps θ ∼ p(θ|x) of these algorithms by only the most probable sample θ = arg max_θ {p(θ|x)}, then we obtain a Stochastic Expectation-Maximization (SEM)-like algorithm. Indeed, if we replace the sampling steps of the hidden variables by the computation of their expected values, we obtain an EM-like algorithm. However, obtaining analytical expressions for those expectations is, in general, not possible. In our case we have:

p(s, z|θ, x) = p(s|z, x, θ) p(z|x, θ)

Let us first examine the probability laws of the right-hand side.

• p(s|z, x, θ):

p(s(r)|z(r), x(r), θ) ∝ p(x(r)|s(r), z(r), θ) p(s(r)|z(r), θ) ∝ p(x(r)|s(r), θ) p(s(r)|z(r), θ)

where

p(x(r)|s(r), θ) ∝ exp[ −(1/2) (x(r) − A s(r))' Σ_ε^{-1} (x(r) − A s(r)) ]

p(s(r)|z(r), θ) ∝ exp[ −(1/2) (s(r) − m_z(r))' Σ_z(r)^{-1} (s(r) − m_z(r)) ]

where m_z(r) = [m_{z_1(r)}, ..., m_{z_n(r)}], Σ_z(r) = diag[σ²_{z_1(r)}, ..., σ²_{z_n(r)}] and z_i(r) ∈ {1, ..., K_i}. We see that p(s|z, x, θ) is Gaussian and separable in r, and we can write

p(s|z, x, θ) = ∏_r p(s(r)|z(r), x(r), θ)

where

p(s(r)|z(r), θ, x(r)) = N(µ(r), B(r)) with

B(r) = [ A' Σ_ε^{-1} A + Σ_z(r)^{-1} ]^{-1}
µ(r) = B(r) [ A' Σ_ε^{-1} x(r) + Σ_z(r)^{-1} m_z(r) ]   (6)
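As an illustration, here is a minimal numpy sketch (not the authors' code) of the per-pixel posterior of Eq. (6); the variable names and the example values are assumptions of this sketch, with the numerical values taken from the simulation example given later.

```python
import numpy as np

def posterior_s(x_r, A, Sigma_eps, m_z, Sigma_z):
    """Return (mu(r), B(r)) of Eq. (6) for one pixel r."""
    Sigma_eps_inv = np.linalg.inv(Sigma_eps)
    Sigma_z_inv = np.linalg.inv(Sigma_z)
    B = np.linalg.inv(A.T @ Sigma_eps_inv @ A + Sigma_z_inv)
    mu = B @ (A.T @ Sigma_eps_inv @ x_r + Sigma_z_inv @ m_z)
    return mu, B

# Example with the 2x2 mixing matrix used in the simulations
A = np.array([[0.7143, 0.3333], [0.2857, 0.6667]])
Sigma_eps = np.diag([0.01, 0.01])
m_z = np.array([1.0, 2.0])            # class means selected by z(r)
Sigma_z = np.diag([0.1**2, 0.2**2])   # class variances selected by z(r)
x_r = np.array([1.3, 1.6])
mu, B = posterior_s(x_r, A, Sigma_eps, m_z, Sigma_z)
```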

• p(z|x, θ):

p(z|x, θ) ∝ p(x|z, θ) p(z)

where

p(x|z, θ) ∝ ∏_r p(x(r)|z(r), θ)

with

p(x(r)|z(r), θ) = N(A m_z(r), A Σ_z(r) A' + Σ_ε)   (7)

We note that p(x|z, θ) is perfectly separable in r, but p(z|θ, x) is not due to the Markovian structure of p(z).
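For concreteness, a small sketch (an illustration under assumed names and shapes, not the paper's code) of evaluating the per-pixel marginal likelihood of Eq. (7); scipy is used only for the Gaussian density.

```python
import numpy as np
from scipy.stats import multivariate_normal

def loglik_pixel(x_r, A, Sigma_eps, m_z, Sigma_z):
    """ln p(x(r)|z(r), theta) = ln N(x(r); A m_z, A Sigma_z A' + Sigma_eps), Eq. (7)."""
    mean = A @ m_z
    cov = A @ Sigma_z @ A.T + Sigma_eps
    return multivariate_normal.logpdf(x_r, mean=mean, cov=cov)
```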

MEAN FIELD APPROXIMATION (MFA)

The mean field approximation is a general method for approximating the expectation of a Markov random variable. The idea consists, when considering a pixel, in neglecting the variations of its neighboring pixels by fixing them to their mean values [4, 1]. Another interpretation of the MFA is to approximate a non-separable posterior law p(z|x) by a separable one, q(z|x) = ∏_r q(z(r)|x), in such a way as to minimize KL(p, q) within a given class of separable distributions q ∈ Q [5]. In our case, this approximation consists in replacing

p(z|β) ∝ exp[ Σ_r Σ_{r'∈V(r)} β δ(z(r) − z(r')) ]

where V(r) represents the neighbors of r and δ(z(r) − z(r')) = [δ(z_1(r) − z_1(r')), ..., δ(z_n(r) − z_n(r'))], by the following:

q(z|β) ≃ ∏_r p(z(r) | z̄(r'), r' ∈ V(r), β)

Then, using this approximation of the prior law results in:

p(z|x, θ) ∝ ∏_r p(x(r)|z(r), θ) q(z(r) | z̄(r'), r' ∈ V(r), β)
         ∝ ∏_r p(z(r) | x(r), z̄(r'), r' ∈ V(r), θ)

where p(z(r) | x(r), z̄(r'), r' ∈ V(r), θ) ∝ p(x(r)|z(r), θ) p(z(r) | z̄(r'), r' ∈ V(r)) and z̄(r) is given by:

z̄(r) = [ Σ_{z(r)} z(r) exp( β Σ_{r'} δ(z(r) − z̄(r')) + ln p(x(r)|z(r), θ) ) ] / [ Σ_{z(r)} exp( β Σ_{r'} δ(z(r) − z̄(r')) + ln p(x(r)|z(r), θ) ) ]   (8)
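Below is a rough sketch of the update of Eq. (8) for a single pixel of one source. The frozen-neighbor term δ(z(r) − z̄(r')) is evaluated here through the neighbors' current class probabilities (their mean field), which is one common way to implement it; that choice, together with all names and shapes, is an assumption of this illustration rather than the paper's exact scheme.

```python
import numpy as np

def mean_field_update(loglik_r, q_neighbors, beta, labels):
    """
    One mean-field update in the spirit of Eq. (8), for one pixel r.
    loglik_r:     (K,) array of ln p(x(r)|z(r)=k, theta) for each class k
    q_neighbors:  (|V(r)|, K) current class probabilities of the neighbours;
                  the frozen delta term is replaced by its expectation q_{r'}(k)
    beta:         Potts interaction parameter
    labels:       (K,) class values, e.g. [1, 2, ..., K]
    Returns (zbar_r, q_r): the mean label E[z(r)] and the class probabilities.
    """
    potts = beta * q_neighbors.sum(axis=0)   # expected agreement with neighbours
    logw = potts + loglik_r                  # log of the weights in Eq. (8)
    q_r = np.exp(logw - logw.max())          # stabilised exponentiation
    q_r /= q_r.sum()
    zbar_r = float(np.asarray(labels, dtype=float) @ q_r)
    return zbar_r, q_r
```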

EXPECTATION-MAXIMISATION ESTIMATE FOR θ

In Algorithm 3, we need to estimate θ̂ = arg max_θ {p(θ|x)} = arg max_θ {ln p(x|θ) + ln p(θ)}. If we consider x as the incomplete data and (x, s, z) as the complete data, we can use the EM algorithm to obtain this estimate. To implement the EM algorithm for the estimation of the hyperparameters θ = (θ_1, θ_2, θ_3), where θ_1 = (Σ_ε, A), θ_2 = (m_jk, σ²_jk), j = 1...n, k = 1...K, and θ_3 = β_j, j = 1...n, we need to compute:

Q(θ, θ̂) = E[ ln p(x, s, z|θ) + ln p(θ) | x, θ̂ ]

So, we need the expression of

p(x, s, z|θ) = p(x|s, θ_1) p(s|z, θ_2) p(z|θ_3)

ln p(x, s, z|θ) = Σ_r [ ln p(x(r)|s(r), θ_1) + ln p(s(r)|z(r), θ_2) + ln p(z(r)|θ_3) ]

We note that Q is separable into three parts Q_1, Q_2, Q_3:

Q_1(θ, θ̂_1) = Σ_r Σ_{z(r)} ∫ ln p(x(r)|s(r), θ_1) p(s(r)|x(r), z(r), θ̂_1) p(z(r)|x(r), θ̂_1) ds

Q_2(θ, θ̂_2) = Σ_r Σ_{z(r)} ∫ ln p(s(r)|z(r), θ_2) p(s(r), z(r)|x(r), θ̂_2) ds

Q_3(θ, θ̂_3) = Σ_r Σ_{z(r)} ln p(z(r)|θ_3) p(z(r)|x(r), θ̂_3)

Expanding these relations and looking for θ̂ which maximizes them (setting dQ_1/dA = 0, dQ_1/dΣ_ε = 0, dQ_2/dm_jz = 0 and dQ_2/dσ²_jz = 0), we obtain:

Â = [ Σ_r x(r) s̄'(r) ] [ Σ_r s̄(r) s̄'(r) + B(r) ]^{-1}

Σ̂_ε = Σ_r x(r) x'(r) − [ Σ_r x(r) s̄'(r) ] [ Σ_r s̄(r) s̄'(r) + B(r) ]^{-1} [ Σ_r s̄(r) x'(r) ]

m̂_jz = [ Σ_r s̄_j(r) p(z_j(r) = z | z̄_j(r'), x_j(r), θ) ] / [ Σ_r p(z_j(r) = z | z̄_j(r'), x_j(r), θ) ]

σ̂²_jz = [ Σ_r (s̄_j(r) − m_jz)² p(z_j(r) = z | z̄_j(r'), x_j(r), θ) ] / [ Σ_r p(z_j(r) = z | z̄_j(r'), x_j(r), θ) ]

β̂_j = arg max_{β_j} { Σ_r Σ_{z(r)} ln p_z̄(z_j(r)|β_j) p(z_j(r) | x_j(r), z̄_j(r'), r' ∈ V(r), θ) }

Finally, the whole MFA algorithm becomes:

• s̄(r) = E[ s(r) | x(r), z̄(r), m_jz, σ²_jz, θ ] = µ(r)
• z̄(r) = E[ z(r) | z̄(r'), x(r), θ ]
• Compute Â, Σ̂_ε, m̂_jz, σ̂²_jz, β̂_j
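As a sketch of the M-step for (A, Σ_ε) above, given the posterior means s̄(r) and covariances B(r): the array layout, the division by the number of pixels and the diagonal projection of Σ̂_ε are assumptions of this illustration, not necessarily the authors' exact implementation.

```python
import numpy as np

def update_A_Sigma(x, s_bar, B):
    """
    x:     (R, m)    observed pixels, one row per position r
    s_bar: (R, n)    posterior means of the sources
    B:     (R, n, n) posterior covariances B(r)
    Returns (A_hat, Sigma_hat) following the closed-form updates above.
    """
    Rxs = x.T @ s_bar                        # sum_r x(r) s_bar'(r)
    Rss = s_bar.T @ s_bar + B.sum(axis=0)    # sum_r s_bar(r) s_bar'(r) + B(r)
    Rxx = x.T @ x                            # sum_r x(r) x'(r)
    Rss_inv = np.linalg.inv(Rss)
    A_hat = Rxs @ Rss_inv
    # normalised by the number of pixels and kept diagonal (white-noise model)
    Sigma_hat = np.diag(np.diag(Rxx - Rxs @ Rss_inv @ Rxs.T)) / x.shape[0]
    return A_hat, Sigma_hat
```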

SIMULATION RESULTS

The main objectives of these simulations are, first, to show that the proposed algorithms give the desired results and, second, to compare their relative performances. For this purpose, we start by generating the data artificially according to the source generating model, i.e., we first generate the labels {Z_j, j = 1, ..., n} with a given Potts model with K_j levels and parameter α_j, then generate the sources {S_j, j = 1, ..., n} using the conditionals p(s_j|z_j), and finally generate the mixtures {X_i, i = 1, ..., m} using a given mixing matrix A and noise covariance Σ_ε and the conditional p(x|A, Σ_ε, s). Figure 1 shows an example with the following parameters: m = 2, n = 2, K = [2, 2] and α = [1.2, 2] for the generation of the label fields {Z_j, j = 1, ..., n}; [m_1 = [1, 2], σ_1 = [.1, .2], m_2 = [1, 2], σ_2 = [.1, .2]] for the generation of the sources {S_j, j = 1, ..., n}; and Σ_ε = diag[.01, .01] and

A = [ .7143  .3333 ;
      .2857  .6667 ]

for the generation of the mixture data {X_i, i = 1, ..., m}. A sketch of this data-generation procedure is given after Figure 1.

FIGURE 1. Data generation scheme: labels Z1 and Z2 are generated using the Potts distribution, sources S1 and S2 are generated conditionally on their labels, and finally the observed images X1 and X2 are obtained via the BSS model.
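The following is a rough, illustrative sketch of this data-generation procedure (not the code used for the paper): Potts label images are drawn by a few Gibbs sweeps, which only approximates exact Potts sampling; the sources are drawn conditionally on the labels; and the mixtures follow Eq. (1). The image size, the number of sweeps and the 4-neighbour toroidal grid are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_potts(H, W, K, beta, n_sweeps=30):
    """Approximate Potts sample via Gibbs sweeps on a 4-neighbour toroidal grid."""
    z = rng.integers(K, size=(H, W))
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nb = [z[(i - 1) % H, j], z[(i + 1) % H, j],
                      z[i, (j - 1) % W], z[i, (j + 1) % W]]
                counts = np.bincount(nb, minlength=K)   # identical neighbours per class
                p = np.exp(beta * counts)
                z[i, j] = rng.choice(K, p=p / p.sum())
    return z

H = W = 64
z1 = sample_potts(H, W, K=2, beta=1.2)   # alpha_1 = 1.2 in the paper's notation
z2 = sample_potts(H, W, K=2, beta=2.0)   # alpha_2 = 2

means, stds = np.array([1.0, 2.0]), np.array([0.1, 0.2])
s1 = means[z1] + stds[z1] * rng.normal(size=(H, W))   # S_j drawn conditionally on Z_j
s2 = means[z2] + stds[z2] * rng.normal(size=(H, W))

A = np.array([[0.7143, 0.3333], [0.2857, 0.6667]])
s = np.stack([s1, s2])
x = np.einsum('ij,jhw->ihw', A, s) + 0.1 * rng.normal(size=(2, H, W))  # Eq. (1)
```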

FIGURE 2. BSS results (estimated sources Ŝ1, Ŝ2 and labels Ẑ1, Ẑ2) obtained using the Gibbs sampling method: d_s1 = 0.032, d_s2 = 0.033, d_z1 = 0.037, d_z2 = 0.0002.

Figure 3 shows the results obtained with the mean field approximation method. The mixing matrix is estimated as

Â = [ .6693  .3527 ;
      .2743  .6957 ]

so we have a good estimation for the two sources. This method is faster than Gibbs sampling, and we obtain the sources and the segmentation at the same time.

FIGURE 3. BSS results (estimated sources Ŝ1, Ŝ2 and labels Ẑ1, Ẑ2) obtained using the proposed mean field approximation method: d_s1 = 0.0243, d_s2 = 0.0245, d_z1 = 0.002, d_z2 = 0.0054.

CONCLUSIONS

We proposed a Bayesian approach to source separation. The sources are modeled by a doubly stochastic process through the introduction of hidden variables representing the labels of the mixture of Gaussians, whose spatial structure is modeled by a Markovian Potts model. To compute the Bayesian solution, two approaches have been studied: the Gibbs sampling framework and the MFA. The first, however, can be computationally intensive, and assessment of its convergence is often problematic. Alternatively, the MFA is faster and gives good results. The MFA has been frequently used in segmentation and more recently in source separation. Our contribution is the first use of MFA for a compound Gauss-Markov model using region labels as hidden variables.

REFERENCES

1. J. Zhang, "The Mean Field Theory in EM Procedures for Blind Markov Random Field Image Restoration," IEEE Transactions on Image Processing, vol. 2, pp. 27–40, 1993.
2. H. Snoussi and A. Mohammad-Djafari, "Fast joint separation and segmentation of mixed images," Journal of Electronic Imaging, vol. 13, pp. 349–361, Apr. 2004.
3. A. Tonazzini, L. Bedini, and E. Salerno, "Blind separation of auto-correlated images from noisy mixtures using MRF models," in 4th International Symposium on Independent Component Analysis and Blind Signal Separation, pp. 675–680, Apr. 2003.
4. F. Forbes and N. Peyrard, "Hidden Markov random field model selection criteria based on mean field-like approximations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 1089–1101, 2003.
5. G. Parisi, Statistical Field Theory, Addison-Wesley, 1988.