SUPERVISED AND UNSUPERVISED STATISTICAL MODELS FOR CEPHALOMETRY

S. Aouda1, M. Berar1, B. Romaniuk2, and M. Desvignes1

1 LIS, 961 rue de la Houille Blanche, BP 46, 38402 St. Martin d'Hères, France; 2 GREYC CNRS UMR 6072, 6 Boulevard Maréchal Juin, 14050 Caen Cedex, France.

Abstract

In this paper, we present an algorithm for aligning and matching unlabeled sets of points, and a method for building a statistical model composed of a mean observation and its associated variability. This model is used to solve the cephalometric problem. The main idea of this paper is a dual-step strategy: the pose and the correspondence are estimated alternately. Since the correspondence is computed automatically, this process applies to both supervised and unsupervised model learning on unordered sets of points.

Keywords:

pattern recognition; cephalometry; statistical model; alignment.

1.

Introduction

In the cephalometric problem, orthodontists have to annotate cephalograms, i.e. lateral radiographs of the head, with anatomical landmarks (cephalometric points) on a training set. Our approach relies on a-priori knowledge: an unknown spatial relation between the cranial contour and the cephalometric points. The main problem in cephalometric analysis is to discover this relation. Fortunately, the cranial contour can be automatically extracted from the image. When analyzing a new image, the practitioner has to locate the cephalometric points; our goal is to produce a statistical estimate of these points on the new image using a statistical model. Our training set of points is therefore composed of the cephalometric points and the sampled extracted cranial contour. Unfortunately, in some circumstances, the order in which the points were annotated by the expert is lost: the first point of one observation is not the same landmark as the first point of another observation. The same situation arises when geometrical landmarks are used. This is an unsupervised learning problem: homeomorphisms between observations of the training set must be found, i.e. a one-to-one correspondence between points from different observations. This step is known as the correspondence problem.

Figure 1. Lateral radiograph of the head with studied landmarks.

Of course, some points may be "false" in a particular observation, because of noise, segmentation or expertise errors; these will later be referred to as outliers. Moreover, the position of the cranium is not always the same: each sample must be registered, i.e. the pose parameters (translation, rotation, scaling) between two observations have to be found. This step corresponds to the alignment problem. Both steps have to be performed simultaneously: the transformation is easier to estimate once the correspondence is known and, conversely, knowledge of a reasonable spatial transformation is helpful for the correspondence search. This leads to an alternating scheme of correspondence and transformation estimation. In the context of our application (building models for pattern recognition), several recent works have proposed new approaches. The iterative closest point (ICP) algorithm [1] uses the nearest-neighbor relationship to assign a binary correspondence between points at each step; this estimate of the correspondence updates the transformation, and the scheme is iterated. It is guaranteed to converge to a local minimum, which is useful for small deformations. Cross [2] has developed a unified statistical framework for alignment and correspondence: correspondence is cast as a probability density estimation problem using a Gaussian mixture model, and the EM algorithm solves the matching problem, the E-step estimating the correspondence under the given transformation and the M-step updating the transformation from the current correspondence estimate. Belongie [3] has defined shape contexts based on completely connected graphs of points: a new descriptor (a histogram of lengths and angles) is attached to each node to represent and compare two graphs. Cootes's [4] statistical snakes use Procrustes alignment before performing modal analysis.
Although Procrustes alignment is widely used, it suffers from two major drawbacks: the points must be labeled and the cardinality of the samples must be the same (no outliers).
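As a minimal illustration of one such alternating scheme, the ICP iteration described above can be sketched in 2-D as follows. This is our own sketch, not the authors' code: the function name is hypothetical and the closed-form rigid fit uses the standard Kabsch/Procrustes SVD solution.

```python
import numpy as np

def icp_step(X, Y, R, t):
    """One ICP iteration: assign each X_i its nearest transformed Y_j
    (binary correspondence), then refit a rigid transform in closed form."""
    Yt = Y @ R.T + t                                   # apply current pose
    d2 = ((X[:, None, :] - Yt[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                            # nearest neighbours
    P, Q = X, Y[idx]                                   # matched pairs
    mp, mq = P.mean(0), Q.mean(0)
    H = (Q - mq).T @ (P - mp)                          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))             # avoid reflections
    R_new = Vt.T @ np.diag([1.0, d]) @ U.T
    return R_new, mp - mq @ R_new.T
```

Iterating this step converges to a local minimum of the alignment error, which is why ICP is only reliable for small deformations and a reasonable initial pose.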

Our work is closely related to the work of Chui [5,6], where the correspondence problem is represented by a permutation matrix and the point matching problem is solved by minimizing the average distance between the first observation X and the transformed observation T × Y, weighted by the permutation matrix. In their original work, the minimization did not guarantee a minimal number of outliers, leading to plausible but sub-optimal solutions. This paper is organized as follows: we first present the softassign algorithm of Chui and our contributions. Then, we describe an algorithm that builds a statistical model based on a Principal Component Analysis. We propose both supervised and unsupervised training approaches, and the two models are compared in terms of reconstruction efficiency.

2.

Simultaneous pose and correspondence estimation

Let X = (x_1, y_1, x_2, y_2, \dots, x_{n_1}, y_{n_1}) and Y = (x'_1, y'_1, x'_2, y'_2, \dots, x'_{n_2}, y'_{n_2}) be two different configurations of the same object, distorted by a geometric transformation T and altered by noise. The numbers of points, n_1 and n_2 respectively, may differ. Let M be the permutation matrix, i.e. M_{ij} = 1 if x_i corresponds to x'_j, and M_{ij} = 0 elsewhere. Outliers (points with no correspondence in the other set) are represented in this matrix by an extra row and an extra column: an outlier has an all-zero row (or column) except for the last entry, which equals 1. In this matrix, the sum of a row or a column is equal to 1, except for the last row and column, whose sums give the numbers of outliers. The cost function to minimize is the least-squares distance between X and the distorted version of Y, denoted T × Y:

E(M, T) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} M_{ij} \, \| X_i - T \times Y_j \|^2    (1)

subject to M_{ij} \in \{0, 1\} \;\; \forall (i, j) \in \{1, \dots, n_1+1\} \times \{1, \dots, n_2+1\}, \quad \sum_{i=1}^{n_1+1} M_{ij} = 1 \;\; \forall j \in \{1, \dots, n_2\}, \quad \sum_{j=1}^{n_2+1} M_{ij} = 1 \;\; \forall i \in \{1, \dots, n_1\}.

The correspondence problem is a binary, discrete, combinatorial one, whereas the estimation of the transformation T is a continuous one. To circumvent this difficulty, the permutation matrix is relaxed to a doubly stochastic matrix. There are two interleaved optimizations: if T is known, M can be estimated; conversely, if M is known, T is easy to estimate. This leads to an iterative dual-step minimization, alternating the estimation of T and the estimation of M. Chui [5,6] shows that the minimization of E(M, T) is equivalent to the following minimization, in a deterministic annealing scheme, when β is large:

E(M, \mu, v, \beta) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} M_{ij} \, \| X_i - T \times Y_j \|^2
    + \frac{1}{\beta} \sum_{i=1}^{n_1+1} \sum_{j=1}^{n_2+1} M_{ij} \log(M_{ij})
    + \sum_{i=1}^{n_1} \mu_i \Big( \sum_{j=1}^{n_2+1} M_{ij} - 1 \Big)
    + \sum_{j=1}^{n_2} v_j \Big( \sum_{i=1}^{n_1+1} M_{ij} - 1 \Big)
    + \sum_{i=1}^{n_1} M_{i,\,n_2+1} + \sum_{j=1}^{n_2} M_{n_1+1,\,j}.

The first term is the distance between the two configurations. The second term is an entropy term: it convexifies the energy for small values of β and forces M_{ij} to take positive values smaller than one. The third and fourth terms are Lagrangian terms corresponding to the equality constraints (the sums of rows and columns are equal to one). We add the fifth and sixth terms to penalize situations where too many outliers are found, i.e. where the sum of the last row or column of the permutation matrix is large. The optimal solution is obtained where the partial derivatives of E vanish; M is given by:

M_{ij} = e^{-\beta \left( \| X_i - T \times Y_j \|^2 + v_j + \mu_i \right)}, \quad M_{i,\,n_2+1} = e^{-\beta \mu_i}, \quad M_{n_1+1,\,j} = e^{-\beta v_j}.    (2)

A binary matrix M is then obtained by normalizing the rows and columns of M until convergence. Unfortunately, for a large β the initial values of M tend to zero, since the error (X − T × Y) is not exactly zero, as it would be for ideal data. With real, noisy data this error is small but non-zero, and the normalization process rejects too many points, setting several M_{ij} to zero. The process is therefore stopped after a few iterations and the matrix M is binarized: the maximum value of each row (respectively column) is set to one and the other terms to zero. We also reject correspondences that exist in only one of the two point sets. Finally, a geometric constraint based on an adjacency graph (Delaunay triangulation) is added: we keep the correspondences with the best local graph matching. M is now a binary one-to-one correspondence matrix, with the last row and column representing the outliers.
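The normalization-and-binarization procedure can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names, the mutual-best-match test, and the exact handling of the outlier slots are our assumptions about one plausible way to combine the row/column maxima with the one-set-only rejection.

```python
import numpy as np

def sinkhorn(M, n_iter=200):
    """Alternately normalize rows and columns so that M approaches a
    doubly stochastic matrix. The last row and column are the outlier
    slots: they take part in the sums but are never normalized themselves."""
    M = M.copy()
    for _ in range(n_iter):
        M[:-1, :] /= M[:-1, :].sum(axis=1, keepdims=True)  # real rows
        M[:, :-1] /= M[:, :-1].sum(axis=0, keepdims=True)  # real columns
    return M

def binarize(M):
    """Stop the normalization and binarize M: keep a match (i, j) only if
    it is the best in both its row and its column; points matched in only
    one direction are sent to the outlier row/column."""
    n1, n2 = M.shape[0] - 1, M.shape[1] - 1
    B = np.zeros_like(M)
    row_best = M[:n1, :n2].argmax(axis=1)
    col_best = M[:n1, :n2].argmax(axis=0)
    for i in range(n1):
        j = row_best[i]
        if col_best[j] == i:      # mutual best match
            B[i, j] = 1
        else:                     # X_i has no reliable partner: outlier
            B[i, n2] = 1
    for j in range(n2):
        if B[:n1, j].sum() == 0:  # Y_j left unmatched: outlier
            B[n1, j] = 1
    return B
```

In the full algorithm, the soft matrix of equation (2) would be fed to `sinkhorn` at each annealing step, with `binarize` applied once the iterations are stopped.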

3.

Statistical model

In this application, a statistical model represents one rigid object composed of points: the sampled cranial contour and the cephalometric points. Individuals do not share the same cranial morphology, which implies a different position for each cephalometric point. The model should contain a mean value and a representation of the variability, i.e. the authorized variations of the shape around its mean value. More generally, points are landmarks and can be labeled or unlabeled. In the first case, each landmark in a set corresponds to a landmark in a second set, and this correspondence is known: this is a supervised model. In the second case, the correspondence between two different sets is unknown: the model is unsupervised. These models are computed from a training database. The problem is now to compute this model from several sets of points, in the presence of inter-individual variability, which we want to represent, and of different acquisition conditions, i.e. rigid geometric transformations, which must not be represented in the model.

Hierarchical mean

Let X_i be the i-th observation of the training set and let n be the number of sets. To compute a mean model, the first step is to align these sets. Procrustes analysis needs labeled, or at least ordered, points and cannot be used here; we therefore use the alignment and correspondence estimation described above. The sets are aligned two by two and the resulting mean is computed hierarchically (figure 2): each pair (X_i, X_{i+1}) is replaced by the mean 0.5(X_i + T × X_{i+1}), down to 0.5(X_{n-1} + T × X_n), and the reduction is repeated until a final mean is obtained. The initial order of the sets is randomized, and the process is iterated with a new random ordering of the sets until the mean converges.

Figure 2. Hierarchical mean computation.
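The hierarchical reduction can be sketched as follows. This is a minimal sketch, assuming a callable `align(X, Y)` that returns Y registered onto X with correspondences already resolved (e.g. by the pose/correspondence step of Section 2); the function names and the fixed number of reshuffling rounds are our own illustration.

```python
import random
import numpy as np

def hierarchical_mean(sets, align, n_rounds=10):
    """Hierarchical mean of point sets: pair the sets, average each
    aligned pair, and repeat the reduction until one mean remains.
    The order is reshuffled each round and the round means are averaged."""
    means = []
    for _ in range(n_rounds):
        level = list(sets)
        random.shuffle(level)          # new random ordering of the sets
        while len(level) > 1:
            nxt = []
            for k in range(0, len(level) - 1, 2):
                X, Y = level[k], level[k + 1]
                nxt.append(0.5 * (X + align(X, Y)))   # 0.5(X_i + T x X_{i+1})
            if len(level) % 2:         # odd set out carries to the next level
                nxt.append(level[-1])
            level = nxt
        means.append(level[0])
    return np.mean(means, axis=0)
```

In the paper's procedure the reshuffling is repeated until the mean converges, rather than for a fixed number of rounds as sketched here.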

Variation model

A classical way to represent the authorized variability is Principal Component Analysis. Once the mean value Xm is computed, the eigenvectors of the variance-covariance matrix form the matrix Φ. Every observation can then be written as X = Xm + Φb, where b is the vector of shape parameters.
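Building such a model can be sketched as follows, assuming the observations are already aligned and corresponded (rows are flattened point sets). The `var_kept` truncation threshold is our assumption; the paper does not state how many modes of variation are retained.

```python
import numpy as np

def build_pca_model(observations, var_kept=0.95):
    """Build the shape model X = Xm + Phi @ b: mean plus the eigenvectors
    of the variance-covariance matrix, obtained via SVD of the centred data."""
    D = np.asarray(observations, dtype=float)
    Xm = D.mean(axis=0)
    _, s, Vt = np.linalg.svd(D - Xm, full_matrices=False)
    var = s ** 2                        # variance carried by each mode
    k = np.searchsorted(np.cumsum(var) / var.sum(), var_kept) + 1
    Phi = Vt[:k].T                      # one retained mode per column
    return Xm, Phi

def project(X, Xm, Phi):
    """Shape parameters b of a new observation."""
    return Phi.T @ (X - Xm)
```

A new observation is reconstructed as `Xm + Phi @ project(X, Xm, Phi)`; the reconstruction error of held-out sets is the comparison criterion used in Section 4.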


4.

Results and conclusion

The alignment has been tested on several examples and compared to the softassign algorithm. Synthetic data were generated with known transformations and permutations, with and without outliers. Results show a 10 to 20% better estimation of transformations and permutations compared to [5,6] on simulated data. On the cephalometric data (85 observations), we computed two models with the hierarchical mean: the first with labeled points (the permutation matrix is known), the second with unlabeled points (the permutation matrix is computed). We then compared the accuracy of the reconstruction of one set using each of the two statistical models. As expected, the supervised method is more accurate, especially when points are dense and variability is high. Nevertheless, the mean reconstruction errors are very close in both cases, and the unsupervised method can be very useful for building statistical models. In conclusion, a general framework for building statistical models from labeled or unlabeled point sets has been presented. It is based on a dual-step registration-correspondence process and on a hierarchical mean computation. Future work will aim at the localization of cephalometric points.

References

1. P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Trans. on PAMI, vol. 14(2), pp. 239-256, 1992.
2. A.D.J. Cross and E.R. Hancock. Graph matching with a dual-step EM algorithm. IEEE Trans. on PAMI, vol. 20(11), pp. 1236-1253, 1998.
3. S. Belongie, J. Malik and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. on PAMI, vol. 24(4), pp. 509-522, 2002.
4. T. Cootes, C. Taylor, D. Cooper and J. Graham. Active shape models: their training and application. Computer Vision and Image Understanding, vol. 61(1), pp. 38-59, 1995.
5. H. Chui, A. Rangarajan and F.L. Bookstein. The softassign procrustes matching algorithm. Information Processing in Medical Imaging, pp. 29-42, 1997.
6. H. Chui and A. Rangarajan. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, vol. 89(3), pp. 114-141, 2003.