
Variational Bayesian Approximation methods for inverse problems

Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes, UMR 8506 CNRS-SUPELEC-Univ Paris-Sud 11
SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.supelec.free.fr
http://djafari.free.fr
Email: [email protected]

AgroParisTech, Paris, France, November 23, 2012.

Content

1. General inverse problem
2. General Bayesian inference
3. Sparsity enforcing prior models
   - Simple heavy-tailed models: Generalized Gaussian, Double Exponential; Student-t, Cauchy; Elastic net
   - Hierarchical mixture models: Mixture of Gaussians, Bernoulli-Gaussian; Mixture of Gammas, Bernoulli-Gamma
4. MAP, Joint MAP, Marginal MAP, PM estimates
5. Hierarchical models and hidden variables
6. Bayesian computation
   - Gaussian approximation (Laplace)
   - Numerical exploration (MCMC)
   - Variational Bayes Approximation
7. Infinite mixture model

1. General inverse problem

    g(t) = [Hf](t) + ε(t),   t ∈ [1, ..., T]
    g(r) = [Hf](r) + ε(r),   r = (x, y) ∈ R²

- f: unknown quantity (input)
- H: forward operator (convolution, Radon, Fourier, or any linear operator)
- g: observed quantity (output)
- ε: represents the errors of modeling and measurement

Discretization: g = Hf + ε

- Forward operation: Hf
- Adjoint operation: H'g, defined by <H'g, f> = <g, Hf>
- Inverse operation (if it exists): H⁻¹ g
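The adjoint identity <H'g, f> = <g, Hf> is also the standard way to test that a forward/adjoint pair is implemented consistently. A minimal sketch for a circular-convolution forward model (the kernel and sizes are illustrative, not from the talk):

```python
import numpy as np

def direct(f, h):
    """Forward operator Hf: circular convolution of f with kernel h."""
    return np.real(np.fft.ifft(np.fft.fft(h, f.size) * np.fft.fft(f)))

def transp(g, h):
    """Adjoint operator H'g: circular correlation with the same kernel
    (conjugate of the kernel's transfer function in the Fourier domain)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(h, g.size)) * np.fft.fft(g)))

rng = np.random.default_rng(0)
h = rng.normal(size=16)     # convolution kernel defining H
f = rng.normal(size=64)     # arbitrary test input
g = rng.normal(size=64)     # arbitrary test output

# <H'g, f> must equal <g, Hf> for a correct adjoint pair
lhs = np.dot(transp(g, h), f)
rhs = np.dot(g, direct(f, h))
assert np.isclose(lhs, rhs)
```

Running this "dot-product test" with random vectors catches most adjoint bugs immediately.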

2. General Bayesian Inference

- Bayesian inference:
    p(f|g, θ) = p(g|f, θ1) p(f|θ2) / p(g|θ),  with θ = (θ1, θ2)
- Point estimators: Maximum A Posteriori (MAP) or Posterior Mean (PM) → f̂
- Full Bayesian inference:
  - with simple prior models p(f|θ2):
      q(f, θ|g) ∝ p(g|f, θ1) p(f|θ2) p(θ)
  - with prior models with hidden variables p(f|z, θ2) p(z|θ3):
      q(f, z, θ|g) ∝ p(g|f, θ1) p(f|z, θ2) p(z|θ3) p(θ)

Two main steps in the Bayesian approach

- Prior modeling
  - Separable: Gaussian, Generalized Gaussian, Gamma, mixture of Gaussians, mixture of Gammas, ...
  - Markovian: Gauss-Markov, GGM, ...
  - Separable or Markovian with hidden variables (contours, region labels)
- Choice of the estimator and computational aspects
  - MAP, Posterior Mean, Marginal MAP
  - MAP needs optimization algorithms
  - Posterior Mean needs integration methods
  - Marginal MAP and hyperparameter estimation need both integration and optimization
  - Approximations:
    - Gaussian approximation (Laplace)
    - Numerical exploration (MCMC)
    - Variational Bayes (separable approximation)

3. Sparsity enforcing prior models

- Sparse signals: direct sparsity
  [Figure: a signal of 200 samples that is zero almost everywhere with a few nonzero spikes, together with its amplitude histogram, sharply concentrated at zero.]
- Sparse signals: sparsity in a transform domain
  [Figure: a smooth oscillating signal, its transform coefficients (almost all near zero), and the corresponding concentrated histogram.]

3. Sparsity enforcing prior models

- Simple heavy-tailed models:
  - Generalized Gaussian, Double Exponential
  - Student-t, Cauchy
  - Elastic net
  - Symmetric Weibull, Symmetric Rayleigh
  - Generalized hyperbolic
- Hierarchical mixture models:
  - Mixture of Gaussians, Bernoulli-Gaussian
  - Mixture of Gammas, Bernoulli-Gamma
  - Mixture of Dirichlets, Bernoulli-Multinomial

Simple heavy-tailed models

- Generalized Gaussian, Double Exponential:
    p(f|γ, β) = ∏j GG(fj|γ, β) ∝ exp{-γ Σj |fj|^β}
  β = 1 gives the Double Exponential (Laplace) model; values 0 < β ≤ 1 are of great interest for sparsity enforcing.
- Student-t and Cauchy models:
    p(f|ν) = ∏j St(fj|ν) ∝ exp{-(ν+1)/2 Σj log(1 + fj²/ν)}
  The Cauchy model is obtained when ν = 1.
- Elastic net prior model:
    p(f|γ1, γ2) = ∏j EN(fj|γ1, γ2) ∝ exp{-Σj (γ1 |fj| + γ2 fj²)}
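In MAP estimation these priors act through their negative log-densities, i.e. as penalty terms. A small sketch of the corresponding penalties (function names and default parameter values are mine, for illustration):

```python
import numpy as np

def gg_penalty(f, gamma=1.0, beta=1.0):
    """-log p(f) for the Generalized Gaussian prior, up to a constant.
    beta = 1 gives the Laplace (l1) penalty, beta = 2 the Gaussian (l2)."""
    return gamma * np.sum(np.abs(f) ** beta)

def student_t_penalty(f, nu=3.0):
    """-log p(f) for the Student-t prior, up to a constant."""
    return 0.5 * (nu + 1) * np.sum(np.log1p(f ** 2 / nu))

def elastic_net_penalty(f, g1=1.0, g2=0.5):
    """-log p(f) for the elastic net prior, up to a constant."""
    return np.sum(g1 * np.abs(f) + g2 * f ** 2)

f = np.array([0.0, 0.1, 2.0])
print(gg_penalty(f, beta=1.0))   # l1 penalty: 2.1
print(student_t_penalty(f))      # grows only logarithmically in |f|
```

The heavy-tailed penalties grow slowly for large coefficients, so a few large entries are cheap while many small ones are pushed to zero; this is exactly what makes them sparsity-friendly.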

Mixture models

- Mixture of two Gaussians (MoG2) model:
    p(f|λ, v1, v0) = ∏j (λ N(fj|0, v1) + (1-λ) N(fj|0, v0))
- Bernoulli-Gaussian (BG) model:
    p(f|λ, v) = ∏j p(fj) = ∏j (λ N(fj|0, v) + (1-λ) δ(fj))
- Mixture of Gammas:
    p(f|λ, α1, β1, α2, β2) = ∏j (λ G(fj|α1, β1) + (1-λ) G(fj|α2, β2))
- Bernoulli-Gamma model:
    p(f|λ, α, β) = ∏j (λ G(fj|α, β) + (1-λ) δ(fj))
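As a quick illustration, drawing a signal from the Bernoulli-Gaussian prior makes its sparsity mechanism explicit (the function name and parameter values are mine):

```python
import numpy as np

def sample_bernoulli_gaussian(n, lam=0.1, v=1.0, rng=None):
    """Draw f with i.i.d. components from the Bernoulli-Gaussian prior:
    each f_j is N(0, v) with probability lam and exactly 0 otherwise."""
    rng = np.random.default_rng(rng)
    support = rng.random(n) < lam          # Bernoulli support indicators
    f = np.zeros(n)
    f[support] = rng.normal(0.0, np.sqrt(v), support.sum())
    return f

f = sample_bernoulli_gaussian(200, lam=0.1, rng=0)
print(np.count_nonzero(f))   # roughly lam * n nonzeros
```

Unlike the heavy-tailed priors, the Dirac mass puts exact zeros in the samples; the mixture-of-Gaussians variant replaces δ(fj) by a narrow Gaussian and gives "approximately sparse" draws instead.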

MAP, Joint MAP

- Inverse problem: g = Hf + ε
- Posterior law:
    p(f|θ, g) ∝ p(g|f, θ1) p(f|θ2)
- Examples:
  - Gaussian noise, Gaussian prior and MAP:
      f̂ = arg min_f J(f)  with  J(f) = ||g - Hf||₂² + λ||f||₂²
  - Gaussian noise, Double Exponential prior and MAP:
      f̂ = arg min_f J(f)  with  J(f) = ||g - Hf||₂² + λ||f||₁
- Full Bayesian: joint posterior
    p(f, θ|g) ∝ p(g|f, θ1) p(f|θ2) p(θ)
- Joint MAP:
    (f̂, θ̂) = arg max_(f,θ) p(f, θ|g)
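The two MAP criteria above can be sketched side by side: the quadratic one has a closed form, while the l1 one can be solved by proximal gradient (ISTA). The synthetic problem and all numeric values are illustrative; the 1/2 in front of the data term only rescales λ:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 100                                   # underdetermined: 30 data, 100 unknowns
H = rng.normal(size=(n, m)) / np.sqrt(n)
f_true = np.zeros(m); f_true[[5, 40, 77]] = [2.0, -1.5, 1.0]
g = H @ f_true + 0.01 * rng.normal(size=n)
lam = 0.1

# Gaussian prior: quadratic MAP has the closed form (H'H + lam I)^{-1} H'g
f_l2 = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ g)

# Double Exponential prior: ISTA for J(f) = 0.5*||g - Hf||^2 + lam*||f||_1
step = 1.0 / np.linalg.norm(H, 2) ** 2           # 1/L, L = ||H||^2 (Lipschitz const.)
f_l1 = np.zeros(m)
for _ in range(500):
    z = f_l1 - step * (H.T @ (H @ f_l1 - g))                      # gradient step
    f_l1 = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold

print(np.count_nonzero(np.abs(f_l2) > 1e-12),    # l2 solution: dense
      np.count_nonzero(np.abs(f_l1) > 1e-12))    # l1 solution: sparse
```

The soft-thresholding step produces exact zeros, which is the algorithmic counterpart of the sparsity-enforcing prior.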

Marginal MAP and PM estimates

- Marginal MAP: θ̂ = arg max_θ p(θ|g), where
    p(θ|g) = ∫ p(f, θ|g) df ∝ p(θ) ∫ p(g|f, θ1) p(f|θ2) df
  and then f̂ = arg max_f p(f|θ̂, g)
- Posterior Mean:
    f̂ = ∫ f p(f|θ̂, g) df
- EM and GEM algorithms
- Variational Bayesian Approximation: approximate p(f, θ|g) by
    q(f, θ|g) = q1(f|g) q2(θ|g)
  and then continue the computations with the separable approximation.

Summary of Bayesian estimation 1

- Simple Bayesian model and estimation:
  [Diagram: θ1 feeds the likelihood p(g|f, θ1) and θ2 feeds the prior p(f|θ2); combining them gives the posterior p(f|g, θ) → f̂.]
- Full Bayesian model and hyperparameter estimation:
  [Diagram: hyperparameters (α, β) define the hyperparameter model p(θ|α, β) = p(θ1) p(θ2); the prior p(f|θ2) and the likelihood p(g|f, θ1) combine into the joint posterior p(f, θ|g, α, β) → f̂, θ̂.]

Summary of Bayesian estimation 2

- Marginalization for hyperparameter estimation:
    p(f, θ|g) → (marginalize over f) → p(θ|g) → θ̂ → p(f|θ̂, g) → f̂
- Variational Bayesian Approximation:
    p(f, θ|g) → VBA → q1(f) → f̂,  q2(θ) → θ̂

Variational Bayesian Approximation

- Full Bayesian:
    p(f, θ|g) ∝ p(g|f, θ1) p(f|θ2) p(θ)
- Approximate p(f, θ|g) by q(f, θ|g) = q1(f|g) q2(θ|g) and then continue the computations.
- Criterion: KL(q(f, θ|g) : p(f, θ|g)), with
    KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q1 q2 ln(q1 q2 / p)
- Iterative algorithm q1 → q2 → q1 → q2 → ...:
    q̂1(f) ∝ exp{ <ln p(g, f, θ; M)>_q̂2(θ) }
    q̂2(θ) ∝ exp{ <ln p(g, f, θ; M)>_q̂1(f) }
- Scheme: p(f, θ|g) → VBA → q1(f) → f̂,  q2(θ) → θ̂

BVA: choice of the criterion

    KL(q : p) = ∫∫ q(f, θ) ln [q(f, θ) / p(f, θ|g; M)] df dθ

    p(g|M) = ∫∫ q(f, θ) [p(g, f, θ|M) / q(f, θ)] df dθ
    ln p(g|M) ≥ ∫∫ q(f, θ) ln [p(g, f, θ|M) / q(f, θ)] df dθ   (Jensen)

Define the free energy

    F(q) = ∫∫ q(f, θ) ln [p(g, f, θ|M) / q(f, θ)] df dθ

Then

    ln p(g|M) = F(q) + KL(q : p)

BVA: separable approximation

    ln p(g|M) = F(q) + KL(q : p),   with q(f, θ) = q1(f) q2(θ)

    (q̂1, q̂2) = arg min_(q1,q2) KL(q1 q2 : p) = arg max_(q1,q2) F(q1 q2)

KL(q1 q2 : p) is convex w.r.t. q1 when q2 is fixed, and vice versa:

    q̂1 = arg min_q1 KL(q1 q̂2 : p) = arg max_q1 F(q1 q̂2)
    q̂2 = arg min_q2 KL(q̂1 q2 : p) = arg max_q2 F(q̂1 q2)

which yields the fixed-point equations

    q̂1(f) ∝ exp{ <ln p(g, f, θ; M)>_q̂2(θ) }
    q̂2(θ) ∝ exp{ <ln p(g, f, θ; M)>_q̂1(f) }
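The alternating fixed-point iteration can be exercised on the textbook conjugate example (not one of the talk's inverse problems): data xi ~ N(μ, 1/τ) with a Normal-Gamma prior, where both q1(μ) and q2(τ) stay in closed form and only their expectations are exchanged. All numeric values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=0.5, size=500)   # true mean 1, true precision 4
N, xbar = x.size, x.mean()

# Broad priors (illustrative): mu ~ N(mu0, 1/(k0*tau)), tau ~ Gamma(a0, b0)
mu0, k0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_tau = 1.0                                    # initial guess for <tau>
for _ in range(50):                            # q1 -> q2 -> q1 -> ... iterations
    # q1(mu) = N(mu_n, v_n): uses the current expectation <tau> under q2
    mu_n = (k0 * mu0 + N * xbar) / (k0 + N)
    v_n = 1.0 / ((k0 + N) * E_tau)
    # q2(tau) = Gamma(a_n, b_n): uses <mu> and the variance of mu under q1
    a_n = a0 + 0.5 * (N + 1)
    b_n = b0 + 0.5 * (np.sum((x - mu_n) ** 2) + N * v_n
                      + k0 * ((mu_n - mu0) ** 2 + v_n))
    E_tau = a_n / b_n

print(mu_n, E_tau)   # approach the sample mean and the sample precision
```

Each update is the exponential of an expected log-joint, exactly the two fixed-point equations above; convergence here takes only a handful of sweeps.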

BVA: choice of the family of laws q1 and q2

- Case 1: degenerate (Dirac) choices for both → Joint MAP
    q̂1(f|f̃) = δ(f - f̃),  q̂2(θ|θ̃) = δ(θ - θ̃)
    f̃ = arg max_f p(f, θ̃|g; M),  θ̃ = arg max_θ p(f̃, θ|g; M)
- Case 2: free q1, Dirac q2 → EM
    q̂1(f) = p(f|θ̃, g),  q̂2(θ|θ̃) = δ(θ - θ̃)
    Q(θ, θ̃) = <ln p(f, θ|g; M)>_q1(f|θ̃),  θ̃ = arg max_θ Q(θ, θ̃)
- Appropriate choice for inverse problems: free-form q1 and q2
    q̂1(f) ∝ p(f|θ̃, g; M),  q̂2(θ) ∝ p(θ|f̃, g; M)
  This accounts for the uncertainty of θ̂ when estimating f̂, and vice versa.

Hierarchical models and hidden variables

- All the mixture models, and some of the simple models, can be expressed via hidden variables z:
    p(f) = Σ_{k=1..K} αk pk(f)  ↔  p(f|z = k) = pk(f),  P(z = k) = αk,  Σk αk = 1
- Example 1: MoG model, pk(f) = N(f|mk, vk).
  Two Gaussians: p1 = N(0, v1), p0 = N(0, v0), α1 = λ, α0 = 1 - λ:
    p(fj|λ, v1, v0) = λ N(fj|0, v1) + (1-λ) N(fj|0, v0)
    p(fj|zj = 1, v1) = N(fj|0, v1),  P(zj = 1) = λ
    p(fj|zj = 0, v0) = N(fj|0, v0),  P(zj = 0) = 1 - λ
    p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, v_zj) ∝ exp{-½ Σj fj²/v_zj}
    p(z) = λ^n1 (1-λ)^n0,  with n1 = Σj δ(zj - 1), n0 = Σj δ(zj)

Hierarchical models and hidden variables

- Example 2: Student-t model
    St(f|ν) ∝ exp{-(ν+1)/2 log(1 + f²/ν)}
  Infinite (scale) mixture:
    St(f|ν) = ∫₀^∞ N(f|0, 1/z) G(z|α, β) dz,  with α = β = ν/2
  In hidden-variable form:
    p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, 1/zj) ∝ exp{-½ Σj zj fj²}
    p(z|α, β) = ∏j G(zj|α, β) ∝ ∏j zj^(α-1) exp{-β zj} = exp{Σj [(α-1) ln zj - β zj]}
    p(f, z|α, β) ∝ exp{-½ Σj zj fj² + Σj [(α-1) ln zj - β zj]}
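The scale-mixture identity can be checked empirically: drawing z from the Gamma mixing law and f|z from the conditional Gaussian must reproduce the Student-t marginal. A quick Monte Carlo sketch (sample size and ν are illustrative; note NumPy's gamma uses scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(3)
nu, n = 5.0, 200_000

# Hierarchical draw: z ~ Gamma(shape=nu/2, rate=nu/2), then f|z ~ N(0, 1/z)
z = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
f = rng.normal(0.0, 1.0 / np.sqrt(z))

# Marginally f should be Student-t with nu dof: Var[f] = nu/(nu-2) for nu > 2
print(f.var(), nu / (nu - 2))
```

The small z draws (weak precisions) generate the heavy tails; conditionally on z everything stays Gaussian, which is exactly what makes the VBA updates tractable later.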

Hierarchical models and hidden variables

- Example 3: Laplace (Double Exponential) model
    DE(f|a) = (a/2) exp{-a|f|} = ∫₀^∞ N(f|0, z) E(z|a²/2) dz,  a > 0
  In hidden-variable form:
    p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, zj) ∝ exp{-½ Σj fj²/zj}
    p(z|a²/2) = ∏j E(zj|a²/2) ∝ exp{-(a²/2) Σj zj}
    p(f, z|a²/2) ∝ exp{-½ Σj [fj²/zj + a² zj]}
- With these models we have:
    p(f, z, θ|g) ∝ p(g|f, θ1) p(f|z, θ2) p(z|θ3) p(θ)

Summary of Bayesian estimation 3

- Full Bayesian hierarchical model with hyperparameter estimation:
  [Diagram: hyperparameters (α, β, γ) define the hyperprior p(θ|α, β, γ) = p(θ1) p(θ2) p(θ3); the hidden-variable law p(z|θ3), the prior p(f|z, θ2) and the likelihood p(g|f, θ1) combine into the joint posterior p(f, z, θ|g) → f̂, ẑ, θ̂.]
- Full Bayesian hierarchical model and variational approximation:
  [Diagram: the same model, but the joint posterior p(f, z, θ|g) is passed through VBA, giving q1(f) → f̂, q2(z) → ẑ, q3(θ) → θ̂.]

Bayesian Computation and Algorithms

- Often, the expression of p(f, z, θ|g) is complex.
- Its optimization (for Joint MAP) or its marginalization and integration (for Marginal MAP or PM) is not easy.
- Two main techniques: MCMC and Variational Bayesian Approximation (VBA).
- MCMC needs the expressions of the conditionals p(f|z, θ, g), p(z|f, θ, g) and p(θ|f, z, g).
- VBA approximates p(f, z, θ|g) by a separable law q(f, z, θ|g) = q1(f) q2(z) q3(θ) and then does all computations with these separable factors.
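The MCMC route can be sketched as a generic Gibbs loop over the three conditionals. The three `sample_*` callables here are placeholders to be written per model (for the hierarchical models above they are Gaussian and Gamma draws); the loop structure itself is all this sketch shows:

```python
import numpy as np

def gibbs(sample_f, sample_z, sample_theta, f0, z0, theta0,
          n_iter=1000, burn_in=200):
    """Generic Gibbs sampler alternating over the three conditionals
    p(f|z, theta, g), p(z|f, theta, g) and p(theta|f, z, g).
    The sample_* callables are model-specific and supplied by the user."""
    f, z, theta = f0, z0, theta0
    samples = []
    for it in range(n_iter):
        f = sample_f(z, theta)        # draw f     ~ p(f|z, theta, g)
        z = sample_z(f, theta)        # draw z     ~ p(z|f, theta, g)
        theta = sample_theta(f, z)    # draw theta ~ p(theta|f, z, g)
        if it >= burn_in:             # discard the burn-in transient
            samples.append((f, z, theta))
    return samples

# Trivial usage just to exercise the loop (stand-in "conditionals"):
rng = np.random.default_rng(0)
s = gibbs(lambda z, t: rng.normal(), lambda f, t: rng.normal(),
          lambda f, z: rng.normal(), 0.0, 0.0, 0.0)
print(len(s))   # 800 retained samples
```

Posterior means and variances are then Monte Carlo averages over `samples`, whereas VBA replaces the whole loop by deterministic updates of the factors q1, q2, q3.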

Bayesian Variational Approximation

- Objective: approximate p(f, z, θ|g) by a separable law
    q(f, z, θ|g) = q1(f) q2(z) q3(θ)
- Criterion:
    KL(q : p) = ∫ q ln(q/p) = <ln(q/p)>_q
- Free energy:
    KL(q : p) = ln p(g|M) - F(q)
  where
    p(g|M) = ∫∫∫ p(f, z, θ, g|M) df dz dθ
  with p(f, z, θ, g|M) = p(g|f, θ) p(f|z, θ) p(z|θ) p(θ), and F(q) is the free energy associated with q:
    F(q) = <ln [p(f, z, θ, g|M) / q(f, z, θ)]>_q
- For a given model M, minimizing KL(q : p) is equivalent to maximizing F(q); at the optimum, F(q*) gives a lower bound on ln p(g|M).

BVA with Scale Mixture Model of Student-t priors

- Scale mixture model of Student-t:
    St(fj|ν) = ∫₀^∞ N(fj|0, 1/zj) G(zj|ν/2, ν/2) dzj
- Hidden variables zj:
    p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, 1/zj) ∝ exp{-½ Σj zj fj²}
    p(zj|α, β) = G(zj|α, β) ∝ zj^(α-1) exp{-β zj},  with α = β = ν/2
  The Cauchy model is obtained when ν = 1.
- Graphical model:
  [Diagram: (αz0, βz0) → z → f → H → g, where the noise ε entering g has precision zε with hyperprior (αε0, βε0).]

12. BVA with Student-t priors: Algorithm

Model:
    p(g|f, zε) = N(g|Hf, (1/zε) I)      p(zε|αz0, βz0) = G(zε|αz0, βz0)
    p(f|z) = ∏j N(fj|0, 1/zj)           p(z|α0, β0) = ∏j G(zj|α0, β0)

Separable approximation:
    q1(f) = N(f|μ̃, Σ̃),  q2j(zj) = G(zj|α̃j, β̃j),  q3(zε) = G(zε|α̃zε, β̃zε)

Iterate q1 → q2 → q3 → q1 → ...:
    q1(f):  Σ̃ = (<λ> H'H + Z̃)⁻¹  with Z̃ = diag[z̃], z̃j = α̃j/β̃j, <λ> = α̃zε/β̃zε
            μ̃ = <λ> Σ̃ H'g
            <f> = μ̃,  <f f'> = Σ̃ + μ̃ μ̃',  <fj²> = [Σ̃]jj + μ̃j²
    q2(z):  α̃j = α0 + 1/2,  β̃j = β0 + <fj²>/2
    q3(zε): α̃zε = αz0 + (n+1)/2
            β̃zε = βz0 + ½ [ ||g||² - 2 <f>' H'g + tr(H <f f'> H') ]
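A NumPy sketch of this iteration for an explicit matrix H. The hyperparameter values, the initialization, the fixed iteration count, and the synthetic test problem are illustrative choices of mine, not the talk's; the updates follow the q1/q2/q3 structure of the algorithm:

```python
import numpy as np

def vba_student_t(H, g, a0=1e-3, b0=1e-3, az0=1e-3, bz0=1e-3, n_iter=50):
    """VBA for g = Hf + eps with a Student-t scale-mixture prior on f:
    q1(f) = N(mu, Sigma), q2(z_j) = Gamma(a_j, b_j), q3(z_eps) = Gamma(ae, be)."""
    n, m = H.shape
    HtH, Htg = H.T @ H, H.T @ g
    z_tilde = np.ones(m)                 # <z_j>: per-coefficient precisions
    lam = 1.0                            # <z_eps>: noise precision
    a_j = a0 + 0.5                       # shape updates are constant
    a_eps = az0 + 0.5 * (n + 1)
    for _ in range(n_iter):
        # q1: Gaussian with covariance (<z_eps> H'H + diag<z>)^{-1}
        Sigma = np.linalg.inv(lam * HtH + np.diag(z_tilde))
        mu = lam * Sigma @ Htg
        Ef2 = np.diag(Sigma) + mu ** 2   # <f_j^2> = [Sigma]_jj + mu_j^2
        # q2: Gamma(a_j, b_j) with b_j = b0 + <f_j^2>/2
        z_tilde = a_j / (b0 + 0.5 * Ef2)
        # q3: b_eps = bz0 + <||g - Hf||^2>/2 = bz0 + (||g - H mu||^2 + tr(H Sigma H'))/2
        resid = g - H @ mu
        b_eps = bz0 + 0.5 * (resid @ resid + np.trace(H @ Sigma @ H.T))
        lam = a_eps / b_eps
    return mu, Sigma, z_tilde, lam

# Synthetic sparse recovery test
rng = np.random.default_rng(4)
n, m = 120, 80
H = rng.normal(size=(n, m)) / np.sqrt(n)
f_true = np.zeros(m); f_true[[10, 30, 55]] = [3.0, -2.0, 1.5]
g = H @ f_true + 0.05 * rng.normal(size=n)
mu, Sigma, z_tilde, lam = vba_student_t(H, g)
print(np.linalg.norm(mu - f_true) / np.linalg.norm(f_true))  # small relative error
```

Off the support, `<f_j^2>` shrinks, so `z_tilde[j]` grows and suppresses that coefficient further; this self-reinforcing pruning is how the Student-t hierarchy enforces sparsity inside a purely Gaussian-Gamma iteration.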

13. Implementation issues

- In inverse problems, we often do not have direct access to the matrix H, but we can compute:
  - the forward operator:  Hf → g   (g = direct(f, ...))
  - the adjoint operator:  H'g → f  (f = transp(g, ...))
- For any particular application, we can always write two programs (direct and transp) corresponding to the application of these two operators.
- To compute f̂, we use a gradient-based optimization algorithm built on these two operators.
- We may also need the diagonal elements of H'H. We have also developed algorithms which compute these diagonal elements with the same two programs (direct and transp).
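The slides do not specify which algorithm they use for diag(H'H); one common matrix-free approach that needs only the two programs is stochastic diagonal estimation with Rademacher probing, diag(A) ≈ mean over probes of t ⊙ (A t). A sketch under that assumption:

```python
import numpy as np

def diag_HtH(direct, transp, m, n_probe=200, rng=None):
    """Matrix-free estimate of diag(H'H) using only direct/transp.
    Rademacher probing: diag(A) ~ average of t * (A t) over random +/-1 probes.
    (A common stochastic estimator; not necessarily the slides' algorithm.)"""
    rng = np.random.default_rng(rng)
    acc = np.zeros(m)
    for _ in range(n_probe):
        t = rng.choice([-1.0, 1.0], size=m)   # Rademacher probe vector
        acc += t * transp(direct(t))          # t ⊙ (H'H t), two operator calls
    return acc / n_probe

# Check against an explicit small matrix
rng = np.random.default_rng(5)
H = rng.normal(size=(40, 20))
est = diag_HtH(lambda f: H @ f, lambda g: H.T @ g, 20, n_probe=5000, rng=1)
print(np.max(np.abs(est - np.diag(H.T @ H)) / np.diag(H.T @ H)))
```

Each probe costs one `direct` and one `transp` call, so the estimator fits naturally into codes where H exists only as a pair of operator programs; the error decreases like 1/sqrt(n_probe).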

14. Conclusions and Perspectives

- We proposed a list of different probabilistic prior models which can be used for sparsity enforcing.
- We classified these models in two categories: simple heavy-tailed models and hierarchical mixture models.
- We showed how to use these models for inverse problems where the desired solutions are sparse.
- Different algorithms have been developed and their relative performances compared.
- We use these models for inverse problems in different signal and image processing applications, such as:
  - Period estimation in biological time series
  - X-ray Computed Tomography
  - Signal deconvolution in proteomics and molecular imaging
  - Diffraction Optical Tomography
  - Microwave imaging
  - Acoustic imaging and source localization
  - Synthetic Aperture Radar (SAR) imaging

15. References

1. A. Mohammad-Djafari, "Bayesian approach with prior models which enforce sparsity in signal and image processing," EURASIP Journal on Advances in Signal Processing, special issue on Sparse Signal Processing, 2012.
2. S. Zhu, A. Mohammad-Djafari, H. Wang, B. Deng, X. Li and J. Mao, "Parameter estimation for SAR micromotion target based on sparse signal representation," EURASIP Journal on Advances in Signal Processing, special issue on Sparse Signal Processing, 2012.
3. N. Chu, J. Picheral and A. Mohammad-Djafari, "A robust super-resolution approach with sparsity constraint for near-field wideband acoustic imaging," IEEE International Symposium on Signal Processing and Information Technology, pp. 286-289, Bilbao, Spain, Dec. 14-17, 2011.
4. N. Bali and A. Mohammad-Djafari, "Bayesian approach with hidden Markov modeling and mean field approximation for hyperspectral data analysis," IEEE Trans. on Image Processing, 17(2):217-225, Feb. 2008.
5. J. Griffin and P. Brown, "Inference with normal-gamma prior distributions in regression problems," Bayesian Analysis, 2010.
6. N. Polson and J. Scott, "Shrink globally, act locally: sparse Bayesian regularization and prediction," Bayesian Statistics 9, 2010.
7. T. Park and G. Casella, "The Bayesian Lasso," Journal of the American Statistical Association, 2008.
8. C. Févotte and S. Godsill, "A Bayesian approach for blind separation of sparse sources," IEEE Transactions on Audio, Speech, and Language Processing, 2006.
9. H. Snoussi and J. Idier, "Bayesian blind separation of generalized hyperbolic processes in noisy and underdeterminate mixtures," IEEE Trans. on Signal Processing, 2006.
10. H. Ishwaran and J. S. Rao, "Spike and slab variable selection: frequentist and Bayesian strategies," Annals of Statistics, 2005.
11. M. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, 2001.

15. References (continued)

1. A. Doucet and P. Duvaut, "Bayesian estimation of state-space models applied to deconvolution of Bernoulli-Gaussian processes," Signal Processing, vol. 57, no. 2, 1997.
2. P. Williams, "Bayesian regularization and pruning using a Laplace prior," Neural Computation, 1995.
3. M. Lavielle, "Bayesian deconvolution of Bernoulli-Gaussian processes," Signal Processing, vol. 33, pp. 67-79, 1993.
4. T. Mitchell and J. Beauchamp, "Bayesian variable selection in linear regression," Journal of the American Statistical Association, 1988.
5. J. J. Kormylo and J. M. Mendel, "Maximum-likelihood detection and estimation of Bernoulli-Gaussian processes," IEEE Trans. on Information Theory, vol. 28, pp. 482-488, 1982.
6. H. Snoussi and A. Mohammad-Djafari, "Estimation of structured Gaussian mixtures: the inverse EM algorithm," IEEE Trans. on Signal Processing, 55(7):3185-3191, July 2007.
7. N. Bali and A. Mohammad-Djafari, "A variational Bayesian algorithm for the BSS problem with hidden Gauss-Markov models for the sources," in Independent Component Analysis and Signal Separation (ICA 2007), M.E. Davies, Ch.J. James, S.A. Abdallah, M.D. Plumbley (Eds.), pp. 137-144, Springer (LNCS 4666), 2007.
8. N. Bali and A. Mohammad-Djafari, "Hierarchical Markovian models for joint classification, segmentation and data reduction of hyperspectral images," ESANN 2006, September 4-8, Belgium, 2006.
9. M. Ichir and A. Mohammad-Djafari, "Hidden Markov models for wavelet-based blind source separation," IEEE Trans. on Image Processing, 15(7):1887-1899, July 2006.
10. S. Moussaoui, C. Carteret, D. Brie and A. Mohammad-Djafari, "Bayesian analysis of spectral mixture data using Markov chain Monte Carlo methods," Chemometrics and Intelligent Laboratory Systems, 81(2):137-148, 2005.
11. H. Snoussi and A. Mohammad-Djafari, "Fast joint separation and segmentation of mixed images," Journal of Electronic Imaging, 13(2):349-361, April 2004.
12. H. Snoussi and A. Mohammad-Djafari, "Bayesian unsupervised learning for source separation with mixture of Gaussians prior," Journal of VLSI Signal Processing Systems, 37(2/3):263-279, June/July 2004.

Separable models

- Model 1: Gaussian prior with a single known variance
    p(fj|v) = N(fj|0, v),  p(f|v) ∝ exp{-½ Σj fj²/v}
    p(g|f, vε) = N(g|Hf, vε I)
  [Diagram: v → f → H → g, with noise ε of variance vε.]
- Model 2: Model 1 with hyperpriors on both variances
    p(v|α0, β0) = G(v|α0, β0),  p(vε|αε0, βε0) = G(vε|αε0, βε0)
  [Diagram: (α0, β0) → v → f → H → g; (αε0, βε0) → vε → ε.]
- Model 3: one variance per coefficient (sparsity enforcing)
    p(fj|vj) = N(fj|0, vj),  p(f|v) ∝ exp{-½ Σj fj²/vj}
    p(vj|α0, β0) = G(vj|α0, β0),  p(vε|αε0, βε0) = G(vε|αε0, βε0)
  [Diagram: (α0, β0) → v = (v1, ..., vm) → f → H → g; (αε0, βε0) → vε → ε.]

Separable models

- Model 4: mixture labels with known proportions
    p(zj = k|αk) = αk,  Σk αk = 1
    p(fj|zj = k) = N(fj|mk, vk),  p(f|z, m, v) = ∏j N(fj|m_zj, v_zj)
    p(g|f, vε) = N(g|Hf, vε I),  p(vε|αε0, βε0) = IG(vε|αε0, βε0)
  [Diagram: (α, K) → z; (m0, v0) → m and (α0, β0) → v → f → H → g; (αε0, βε0) → vε → ε.]

Separable models

- Model 5: Model 4 with a prior on the mixture proportions α
    p(zj = k|αk) = αk,  Σk αk = 1
    p(fj|zj = k) = N(fj|mk, vk),  p(f|z, m, v) = ∏j N(fj|m_zj, v_zj)
    p(g|f, vε) = N(g|Hf, vε I),  p(vε|αε0, βε0) = IG(vε|αε0, βε0)
  [Diagram: (α0, K) → α → z; (m0, v0) → m and (α0, β0) → v → f → H → g; (αε0, βε0) → vε → ε.]

Separable models

- Model 6: Model 5 with the number of components K itself unknown
    p(zj = k|αk) = αk,  Σk αk = 1
    p(fj|zj = k) = N(fj|mk, vk),  p(f|z, m, v) = ∏j N(fj|m_zj, v_zj)
    p(g|f, vε) = N(g|Hf, vε I),  p(vε|αε0, βε0) = IG(vε|αε0, βε0)
  [Diagram: (α00, β00) → K; (α0, K) → α → z; (m0, v0) → m and (α0, β0) → v → f → H → g; (αε0, βε0) → vε → ε.]