Spatio-temporal graphical modeling with innovations based on multi-scale diffusion kernel

BERNARD CHALMOND∗
University of Cergy-Pontoise, France and CMLA, Ecole Normale Supérieure de Cachan, France

Abstract- A random field of interest is observed on an undirected spatial graph over time, thereby providing a time series of dependent random fields. We propose a general modeling procedure that can explicitly quantify the intrinsic and extrinsic fluctuations of such a dynamical system. We adopt a paradigm in which the intrinsic fluctuations correspond to a process of latent diffusion on the graph arising from stochastic interactions within the system, whereas the extrinsic fluctuations correspond to a temporal drift reflecting the effects of the environment on the system. We start with a spatio-temporal diffusion process which gives rise to the latent spatial process. This makes a bridge with the conventional Wold representation, for which the latent process represents the innovation process, and beyond that with the stochastic differential equation associated with the Fokker-Planck dynamic. The innovation process is modeled by a Gaussian distribution whose covariance matrix is defined by a multi-scale diffusion kernel. This model leads to a multi-scale representation of the spatio-temporal process. We propose a statistical procedure to estimate the multi-scale structure and the model parameters in the case of a vector autoregressive model with drift. Modeling and estimation tasks are illustrated on simulated and real biological data.

Keywords: Spatio-temporal graphical model, Spatial statistic modeling, Multi-scale heat diffusion kernel, Graph Laplacian, Intrinsic and extrinsic stochastic fluctuations, Multi-scale decomposition.

∗ E-mail: [email protected]

Spatial Statistics, http://dx.doi.org/10.1016/j.spasta.2013.11.004 (Preprint)

1. Introduction

We are interested in stochastic processes in the time and space domains, in which the state of a variable at every time point is determined by the states of the variables in its spatial neighborhood, as well as by the states of a set of variables at previous time points. This vectorial process is denoted $\{\vec{Y}_t,\ t = 1, \ldots, T\} \doteq \mathbf{Y}$. At every time point $t$, $\vec{Y}_t$ is observed on an undirected spatial graph $G$ composed of $n$ nodes. The analysis of complex spatio-temporal processes from experiments is an important issue in many areas where one tries to extract information useful for characterizing the spatial and temporal variability, in order to discover or understand the underlying physical dynamics.

This study started with the following paradigm. At each instant the system has a basal spatial activity that maintains fluctuations over time. Such fluctuations, arising from inherently probabilistic interactions within the system, are typically called intrinsic or internal. Furthermore, there could exist other fluctuations, called extrinsic or external, reflecting the influence of the environment on the system. At the beginning of this study, it was also known that the diffusion kernel is able to represent the dependencies of stochastic observations on a graph. Our objective is to decompose the process into spatial and temporal components, taking into account the fact that the fluctuations of the dynamical system are related to the intrinsic and extrinsic effects, the basal intrinsic fluctuations being described by a multi-scale diffusion kernel.

Modeling of stochastic spatio-temporal processes has been widely studied. Most of these studies are based on a Markovian representation of autoregressive or diffusion type. In this context, modeling and estimation of the covariance matrix $K$ of the innovation process have been left in the background in the literature. Beyond the fact that this knowledge is incomplete, it is still unclear how to extract diffusion components from experimental results and how to track structural changes. We revisit these basic models, and in doing so, the intrinsic component is represented by a process whose innovations are based on the graph Laplacian of $G$, and the extrinsic component by a drift. It is from this point of view that the concepts of intrinsic and extrinsic fluctuations are defined.

Related works. The distinction between intrinsic and extrinsic depends on how the system and the environment are defined in the experiment considered. The main area where this concept has been studied is that of biological networks [11, 34, 8, 12]. Many recent studies have reported on the phenotypic variability of organisms related to an intrinsic stochasticity that operates at the basal level of gene expression. This characteristic is related to the activity of differentiated subnetworks, called modules.
These endogenous subnetworks are regulatory structures controlling processes that are intrinsic to the cell. In a recent study [6], we proposed a general method for detecting modules at fixed time, that is to say, in the case of a single random vector $\vec{Y}_t$. However, in this study, the detection of modules was also performed on time series, but independently moment after moment, without taking into account the time dependence. To avoid this simplification, we propose a spatio-temporal modeling that improves our previous approach.

The classic autoregressive model is based on innovations whose covariance matrix $K$ is not theoretically restricted to a diagonal matrix [4]. A straightforward estimate of $K$ is given by the empirical covariance matrix of the residuals that result from the least-squares fitting of the model to the data [4, 37]. This estimate, which is only asymptotically consistent, has the drawback of confounding spatial and temporal components. An alternative is to impose a model on $K$, which is a means of taking the spatial dependencies into account explicitly. In this way, a mono-scale Matérn model is adopted in [15], based simply on the inter-node geographical distances, but without graphical connections.

Another modeling is based on the stochastic differential equation associated with the Fokker-Planck Markovian dynamic, as considered in [2] for social network analysis. This model has been the source of many studies where the modeling is finally reduced to a first-order autoregressive model, and furthermore with $K$ limited to a diagonal matrix [27]. This simple autoregressive model is also known under the name of Dynamic Bayesian Networks [32]. But again, ignoring spatial dependencies amounts to integrating them into the temporal dependencies,


which prevents the distinction between spatial and temporal effects.

The Markovian approach based on the Gibbs distribution allows the modeling of the two components. It is an efficient technique for treating inverse problems in which the Markovian modeling is made on hidden random fields, especially for Boolean networks [7, 39]. However, a main drawback of this approach is the difficulty of calibrating the hyper-parameters that balance the different terms composing the energy of the Gibbs distribution [5].

For the simulation of biochemical dynamical processes $\{\vec{Y}_t,\ t = 1, \ldots, T\}$, [31] describes a novel approach to reveal the existence of a meaningful manifold which approximates the slow dynamics of the process. They search for $p < n$ new variables corresponding to the dynamically meaningful, slowly varying coordinates. The method is based on a $T \times T$ connectivity matrix denoted $A(\tau)$ whose elements are weights $A_{i,j}(\tau)$ that depend on the covariance matrices of $\vec{Y}_{t_i}$ and $\vec{Y}_{t_j}$, and on a scale parameter $\tau$. By defining $M(\tau)$ as the row-stochastic matrix associated with $A(\tau)$, the solution is given by the first $p$ eigenvectors of $M(\tau)$, which approximate the eigenfunctions of the Laplacian diffusion operator over the manifold. The difficulty here is the computation of $A(\tau)$, due to the presence of the covariance matrices, which are obtained in [31] by simulating the model, not to mention the difficulty of choosing $\tau$ and $p$ as discussed in [23]. Although the initial model has no parameters, the method is in fact highly parameterized because of the covariance matrices to be estimated.

Sketch of our contribution. The first-order vector autoregressive model is a Markov model, whose expression is:

$$\vec{Y}_t = \Phi_1 \vec{Y}_{t-1} + \vec{W}_t \tag{1.1}$$

where $\Phi_1$ is a matrix of coefficients connecting the node activities at time $t-1$ to those at time $t$, according to a temporal directed graph $\underline{G}$. $\vec{W}_t$ is a vectorial Gaussian noise with zero mean and covariance matrix $\sigma_W^2 I_n$. The model (1.1) is stationary when $\Phi_1$ satisfies some properties. In this case, the innovation process $\{\vec{W}_t\}$ is simply there to maintain the dynamic of the temporal process $\{\vec{Y}_t\}$, but it does not contain any information specific to the intrinsic phenomenon, since at every time $t$ the covariance matrix of $\vec{W}_t$ is diagonal.

As introduced above, we propose to take into account at each instant the basal state of the system, which is based on the dependence structure of the spatial graph $G$. To do this, we replace the white innovation $\vec{W}_t$ by a colored innovation $\vec{Z}_t$ reflecting the spatial dependence. At first glance, it would suffice to replace the covariance matrix $\sigma_W^2 I_n$ of $\vec{W}_t$ by a covariance matrix $K$ such that $K_{i,j} \neq 0$ for any pair of connected nodes and $K_{i,j} = 0$ for the other ones. However, its estimation poses a dimensionality problem in the subsequent data analysis, since the number $n$ of variables in $\vec{Y}_t$ far exceeds the number of time points $T$. The situation is quite the opposite of classical time series analysis [16, 4]. A solution that significantly reduces the number of covariances to be estimated is to consider a parametric model for $K$. A well-known model is given by the diffusion kernel $K_\lambda = \exp(-\lambda L)$ where $L$ is the graph Laplacian of $G$ and $\lambda$ is a scale parameter [3, 21, 26, 37]. $K_\lambda$ is a discretized version of the diffusion operator appearing in the solution of the heat differential equation in $\mathbb{R}^2$. In doing so, the determination of $K_\lambda$ boils down to estimating $\lambda$. The model (1.1) is thus replaced by:

$$\vec{Y}_t = \Phi_1 \vec{Y}_{t-1} + \vec{Z}_t \tag{1.2}$$

where the innovation $\vec{Z}_t$ is distributed as a zero-mean Gaussian vector with covariance matrix $K_\lambda$.

This short presentation has a more thorough justification. We show that the spatio-temporal model (1.2) arises from the composition of two diffusion differential equations: a spatial diffusion associated with the Laplacian $L$ with parameter $\lambda$, and a temporal diffusion associated with the graph Laplacian $\underline{L}$ of $\underline{G}$, which makes $\Phi_1$ depend on a parameter $\tau$. Denoting by $\mathbf{L}$ and $\underline{\mathbf{L}}$ their spatio-temporal versions, the solution of this pair of differential equations is

$$\mathbf{Y}(\lambda, \tau) = e^{-\tau \underline{\mathbf{L}}}\, \mathbf{Y}(\lambda, 0), \qquad \mathbf{Y}(\lambda, 0) = e^{-\lambda \mathbf{L}}\, \mathbf{W},$$

where $\mathbf{Y}(\lambda, 0)$ is the model of the innovation process $\{\vec{Z}_t\}$ and $\mathbf{Y}(\lambda, \tau)$ the model of the process $\{\vec{Y}_t\}$, which after simplification gives an expression similar to (1.2):

$$\vec{Y}_t(\lambda, \tau) = \Phi_1(\tau)\, \vec{Y}_{t-1}(\lambda, \tau) + \vec{Y}_t(\lambda, 0).$$

There is another point of view that brings up a drift term alongside the diffusion process, in order to model the extrinsic component of the system. To do this we consider the classical stochastic differential equation associated with the Fokker-Planck Markovian dynamic [2, 31]. The discretized expression of this dynamic can be written:

$$\vec{Y}_{t+dt} = \vec{Y}_t + a(\vec{Y}, t)\,dt + b(\vec{Y}, t)\,\vec{W}_t \sqrt{dt}, \tag{1.3}$$

where $b(\vec{Y}, t)$ is a diffusion term and $a(\vec{Y}, t)$ is a drift term whose form depends on the application. In the particular case where $a(\vec{Y}, t)$ is a linear function of $\vec{Y}_t$ independent of $t$, the term $\vec{Y}_t + a(\vec{Y}, t)\,dt$ can be rewritten $\Phi_1 \vec{Y}_t$, and therefore (1.3) becomes similar to (1.2). More generally, (1.3) appears as a model able to integrate a drift term related to the experiment, as we shall illustrate with an example.

Beyond this first modeling, our contribution provides a multi-scale extension of the graphical innovation model, as well as a statistical estimation procedure with selection of the relevant scales. The benefits of the multi-scale approach have been demonstrated in other areas for issues related to our concerns. For instance, [1] shows that large social networks contain hierarchically organized community structures. One crucial step when studying the structure and dynamics of these networks is to identify communities. In computer vision, for shape comparison and shape matching, [33] proposes a scale-space representation of shape features based on the graph Laplacian. In our study, several scales are needed to correctly model the basal state. Therefore, we represent $K$ by a weighted sum of diffusion kernels at different scales:

$$K = \sum_{j=0}^{r} \sigma_j^2 K_j \quad \text{where } K_j = \exp(-\lambda_j L).$$

This model leads to a multi-scale representation of the spatio-temporal process in continuity with the one proposed in [6] for the detection of modules. A crucial point in modeling of dynamic systems is parameter inference from observed time series. Therefore in our article, the focus is also


on estimation issues. The parameters $\sigma_j^2$ are estimated using an exact maximum likelihood principle. Without this exact likelihood, the multi-scale representation would not be consistent. In time-series analysis, $\log |K|$ is often ignored since for large $T$ its influence on the likelihood is small. This is not true in the spatial context, where ignoring it is known to result in inconsistent estimators (see [9] Section 7.2.2, and [14] Section 5.3). In addition, the number of scales $r$ and the scales $\lambda_j$ themselves are selected using a Bayesian maximum likelihood under a quadratic constraint.

Modeling and estimation tasks are detailed in Section 2. In Section 3, we illustrate this modeling on simulated and real data. To do that, a statistical hypothesis test based on a log-likelihood ratio is proposed in order to decide between two hypotheses $H_0$ and $H_1$. This procedure is implemented in two distinct situations: firstly, a test of temporal dependence, and secondly a test for drift detection.

2. Models and method

2.1. Background: Random field and diffusion process

We recall some classical results on random field models on graphs, which we call Graphical Random Fields (GRF)¹. Consider a real random vector $\vec{Y} = (Y^{(1)}, \ldots, Y^{(n)})'$ on an undirected graph $G = (V, E)$. This field is indexed by the nodes $V = \{v_1, \ldots, v_n\}$. The set of undirected edges $E \subset V \times V$ is such that every edge $(i, j)$ is identical to $(j, i)$, which is denoted $i \sim j$. The dependency structure between the random variables $\{Y^{(i)}\}$ depends on the topological structure given by $E$. This dependency structure is here limited to a covariance structure modeled by a diffusion kernel [21, 26, 3, 10], as follows.

We seek to represent $\vec{Y}$ by a random field model on $G$, denoted $\vec{Y}(\lambda)$, whose covariance structure depends on a scale parameter $\lambda > 0$. This model is obtained by equating the variations due to a change of scale with the spatial variations:

$$Y^{(i)}(\lambda + d\lambda) - Y^{(i)}(\lambda) \doteq d\lambda \sum_{j \in V :\, j \sim i} \left[ Y^{(j)}(\lambda) - Y^{(i)}(\lambda) \right] \tag{2.1}$$

which is written in vector form:

$$\vec{Y}(\lambda + d\lambda) - \vec{Y}(\lambda) = -d\lambda\, L\, \vec{Y}(\lambda), \qquad L = D - A. \tag{2.2}$$

$L$ is the Laplacian of the graph $G$. $A$ is the binary adjacency matrix (or connectivity matrix) defined by the edges: $A_{i,j} = 1_{i \sim j}$, and $D = \mathrm{diag}\{d_i\}$ where $d_i = \sum_j A_{i,j}$ is the degree of node $i$, i.e. the number of edges connected to $i$. $L$ is a symmetric positive semi-definite matrix. The entries $A_{i,j}$ can be extended to weights different from 1. (2.2) is the discretized version of the heat differential equation, which requires choosing an initial state $\vec{Y}(0)$ at scale $\lambda = 0$²:

$$\frac{d}{d\lambda} \vec{Y}(\lambda) = -L\, \vec{Y}(\lambda), \qquad \vec{Y}(0) \text{ given}. \tag{2.3}$$

The solution of (2.3) is:

$$\vec{Y}(\lambda) = K_\lambda \vec{Y}(0), \qquad K_\lambda = e^{-\lambda L}, \tag{2.4}$$

where $\exp(M) = \sum_{i=0}^{\infty} \frac{M^i}{i!}$. $K_\lambda$ is called the diffusion kernel. The result (2.4) is also valid for directed graphs.

Consider two choices for $\vec{Y}(0)$. With the first one, we consider $\vec{Y}(0) = \vec{Y}$, which provides a representation of $\vec{Y}$ at different scales, in order to highlight specific structures of $\vec{Y}$. This follows directly from the smoothing properties of the diffusion (2.4):

$$\vec{Y}(\lambda) = K_\lambda \vec{Y}, \qquad Y^{(i)}(\lambda) = \sum_{j \in V :\, j \sim i} K_\lambda(i, j)\, Y^{(j)}. \tag{2.5}$$

$\vec{Y}(\lambda)$ is interpreted as a scale-space random field on $V \times \mathbb{R}^+$.

The second choice concerns the generation of random fields with covariance matrix $K_\lambda$. This requires that the graph is undirected, since in this case the exponential of the symmetric matrix $L$ provides a positive semi-definite matrix. Here we consider $\vec{Y}(0) = \vec{W}$ and therefore:

$$\vec{Y}(\lambda) = K_\lambda \vec{W}, \tag{2.6}$$

where the $W^{(i)}$ are i.i.d. and verify $\mathbb{E}(W^{(i)}) = 0$, $\mathrm{Var}(W^{(i)}) = \sigma_W^2$. The covariance matrix of $\vec{Y}(\lambda)$ is then $\sigma_W^2 \exp(-2\lambda L)$³. Equation (2.3), with the initial state $\vec{W}$, thus allows us to construct a random field $\vec{Y}(\lambda)$ with covariance matrix $\exp(-2\lambda L)$, where the scale parameter $\lambda$ rules the range of the spatial dependence. The larger $\lambda$ is, the more the off-diagonal effects in $K_\lambda$ increase.

¹ A graphical random field is indexed by any graph, undirected and/or directed. This term includes Markov networks and Bayesian networks.
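The construction (2.2)-(2.6) can be sketched numerically as follows. This is a minimal illustration, not the author's implementation: the 4-node path graph, the value of $\lambda$ and the helper names `graph_laplacian` and `diffusion_kernel` are assumptions chosen for the example. The kernel is computed from the eigendecomposition of the symmetric Laplacian, and the empirical covariance of $K_\lambda \vec{W}$ is compared with $\sigma_W^2 \exp(-2\lambda L)$.

```python
import numpy as np

def graph_laplacian(A):
    """Return L = D - A for a symmetric adjacency matrix A, as in (2.2)."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def diffusion_kernel(L, lam):
    """Return K_lambda = exp(-lam * L) via the eigendecomposition of symmetric L."""
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-lam * evals)) @ evecs.T

# Hypothetical 4-node path graph 1-2-3-4 (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
L = graph_laplacian(A)
lam = 0.7                      # assumed scale parameter
K = diffusion_kernel(L, lam)

# Model (2.6): Y(lambda) = K_lambda W, with W white noise of variance sigma_W^2.
rng = np.random.default_rng(0)
sigma_W = 1.0
W = sigma_W * rng.standard_normal((4, 50_000))   # one column per replicate
Y = K @ W

cov_theory = sigma_W**2 * diffusion_kernel(L, 2 * lam)   # sigma_W^2 exp(-2 lam L)
cov_empirical = np.cov(Y)
```

Because the rows of $L$ sum to zero, each row of $K_\lambda$ sums to one, which is the smoothing (averaging) property used in (2.5).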

² In the classic case of diffusion in $\mathbb{R}^2$, $\lambda$ is a time parameter.
³ To lighten the writing we shall note $K_\lambda$ instead of $K_{2\lambda}$.

2.2. Spatio-temporal GRF

2.2.1. Diffusion modeling and intrinsic innovations

Previously, the graphical random field $\vec{Y}$ was observed at fixed time, i.e. at a given time $t$. Now we examine the situation where the field is observed over time, providing a time series of random fields $\{\vec{Y}_1, \ldots, \vec{Y}_T\}$. Denote by $\mathbf{Y} = (\vec{Y}_1', \ldots, \vec{Y}_T')'$ the concatenation of the $T$ vectors $\vec{Y}_t$. $\mathbf{Y}$ can be seen as a graphical random field on a spatio-temporal graph $\mathbf{G} = \{\mathbf{V}, \boldsymbol{\mathcal{E}}\}$ where $\mathbf{V} = \cup_{t=1}^{T} V_t$ with $V_t = V$ for every $t$⁴. There are two types of edges, spatial edges $\mathbf{E}$ and temporal edges $\underline{\mathbf{E}}$, as follows:

$$\boldsymbol{\mathcal{E}} = \mathbf{E} \cup \underline{\mathbf{E}} = \left( \cup_{t=1}^{T} E_t \right) \cup \left( \cup_{t=1}^{T} \underline{E}_t \right), \tag{2.7}$$

where $E_t \subset V_t \times V_t$ and $\underline{E}_t \subset V_{t-1} \times V_t$. The edges $(i, j) \in E_t$ are undirected: $i \sim j$, whereas the edges $(j, i) \in \underline{E}_t$ are directed in the direction of time: $j \to i$.

The question here is the modeling of the random field $\mathbf{Y}$ by a spatio-temporal model. To do this, we extend the heat equation (2.3), assuming that there are two scales, a spatial scale $\lambda$ and a temporal scale $\tau$, with an initial state given by a random field $\mathbf{W}$ consisting of i.i.d. random variables with zero mean and variance $\sigma_W^2$:

$$\begin{cases} \dfrac{\partial}{\partial \tau} \mathbf{Y}(\lambda, \tau) = -\underline{\mathbf{L}}\, \mathbf{Y}(\lambda, \tau) \\[4pt] \dfrac{\partial}{\partial \lambda} \mathbf{Y}(\lambda, \tau) = -\mathbf{L}\, \mathbf{Y}(\lambda, \tau) \\[4pt] \mathbf{Y}(0, 0) = \mathbf{W}. \end{cases} \tag{2.8}$$

$\mathbf{L}$ and $\underline{\mathbf{L}}$ denote respectively the Laplacian of the spatial graph $(\mathbf{V}, \mathbf{E})$ and the Laplacian of the temporal graph $(\mathbf{V}, \underline{\mathbf{E}})$. Recalling (2.4), we obtain successively for each of these two scales:

$$\text{i)}\ \ \mathbf{Y}(\lambda, \tau) = e^{-\tau \underline{\mathbf{L}}}\, \mathbf{Y}(\lambda, 0), \qquad \text{ii)}\ \ \mathbf{Y}(\lambda, 0) = e^{-\lambda \mathbf{L}}\, \mathbf{W}. \tag{2.9}$$

(2.9-ii) is the model (2.6) applied sequentially to each time $t$. At time scale $\tau = 0$, this equation provides $T$ independent random fields $\{\vec{Y}_t(\lambda, 0) = e^{-\lambda L} \vec{W}_t,\ t = 1, \ldots, T\}$. Both equations (2.9) lead to a model of Moving Average (MA) type, known as the Wold representation when the process is not limited in time.

Proposition 1. The process $\{\vec{Y}_t(\lambda, \tau)\}$ defined in (2.9-i) is a process of MA type whose innovation process is $\{\vec{Y}_t(\lambda, 0)\}$:

$$\vec{Y}_t(\lambda, \tau) = \sum_{j \geq 0} \Theta_j(\tau)\, \vec{Y}_{t-j}(\lambda, 0). \tag{2.10}$$

The time series $\{\vec{Y}_t\}$ is modeled by the model $\{\vec{Y}_t(\lambda, \tau)\}$. Moreover, we assume that the process $\{\vec{Y}_t\}$ is maintained by a process of latent innovations, denoted $\{\vec{Z}_t\}$. We model $\vec{Z}_t$ by $\vec{Y}_t(\lambda, 0)$:

$$\vec{Y}_t \equiv \vec{Y}_t(\lambda, \tau), \qquad \vec{Z}_t \equiv \vec{Y}_t(\lambda, 0).$$

Hence we have a parametric model in $\lambda$ and $\tau$. $\vec{Y}_t(\lambda, 0)$ and $\{\Theta_j(\tau)\}$ correspond to the spatial and temporal diffusions, respectively. At fixed time, $\vec{Y}_t(\lambda, 0)$ models the intrinsic part of the process $\vec{Y}_t$.

⁴ Bold characters are reserved to the spatio-temporal case.

Proof. To prove the property (2.10), it suffices to show that $e^{-\tau \underline{\mathbf{L}}}$ is a lower triangular matrix in (2.9-i). Without loss of generality, assume that connections are time-invariant:

$$E_t \equiv E \ \text{ and } \ \underline{E}_t \equiv \underline{E}, \quad \forall\, t. \tag{2.11}$$

Define the respective adjacency matrices $A$ and $\underline{A}$ of these two sets of edges. $A$ is an $n \times n$ symmetric matrix such that $A_{i,j} = 1_{i \sim j}$. $\underline{A}$ is a matrix of dimension $n \times 2n$ since it runs over two consecutive instants. It is composed of two blocks: $\underline{A} = [\check{A}, 0]$, where $\check{A}(i, j) = 1_{j \to i}$ with $j \in V_{t-1}$, $i \in V_t$, and $0$ is an identically zero matrix. Relative to $\mathbf{E}$ and $\underline{\mathbf{E}}$, the adjacency matrices are $\mathbf{A} = \mathrm{diag}[A, \ldots, A]$ and $\underline{\mathbf{A}} = \mathrm{diag}[\underline{A}, \ldots, \underline{A}]$. $\mathbf{A}$ is a block-diagonal symmetric matrix and $\underline{\mathbf{A}}$ is a lower triangular matrix, the main diagonals of the matrices $0$ lying on the main diagonal of $\underline{\mathbf{A}}$. Finally, denoting by $\mathbf{D}$ and $\underline{\mathbf{D}}$ the diagonal matrices of the adjacency degrees, the graph Laplacians are written $\mathbf{L} = \mathbf{D} - \mathbf{A}$ and $\underline{\mathbf{L}} = \underline{\mathbf{D}} - \underline{\mathbf{A}}$. Denoting $a_{i,j} = \check{A}(i, j)$, so that each block row consists of $-\check{A} = (-a_{i,j})$ followed by $\underline{D} = \mathrm{diag}(d_1, \ldots, d_n)$, the r.h.s. of (2.9-i) is then written:

$$\exp\left( -\tau \begin{bmatrix} \ddots & & & \\ -\check{A} & \underline{D} & & \\ & -\check{A} & \underline{D} & \\ & & & \ddots \end{bmatrix} \right) \begin{bmatrix} \vdots \\ \vec{Y}_{t-1}(\lambda, 0) \\ \vec{Y}_t(\lambda, 0) \\ \vdots \end{bmatrix}$$

Since the matrix $\underline{\mathbf{A}}$ is lower triangular, the matrix $\exp(-\tau \underline{\mathbf{L}})$ is too. As a result, the process $\{\vec{Y}_t(\lambda, \tau)\}$ defined in (2.9-i) is a process of MA type whose innovations are $\{\vec{Y}_t(\lambda, 0)\}$. Since the innovations are i.i.d., this model can be seen as a generalization of the Wold representation [4]. The $n \times n$ matrices $\Theta_j(\tau)$ are extracted from the $t$-th block row of the matrix $\exp(-\tau \underline{\mathbf{L}})$, as illustrated below.

At this point, we can ask whether some confusion could exist between spatial and temporal interactions. It is known that a recurrent network with feedback loops cannot be represented by a DAG, because such a graph excludes the possibility of representing feedback loops in the graphical structure. However, if the interactions between the variables are not instantaneous, the recurrent network can be unfolded in time to obtain a directed acyclic network (see Figure 1 in [19]). Our spatio-temporal model has both spatial and temporal connections. So that the temporal DAG cannot be interpreted as an unfolded graph, we assume that spatial and temporal interactions occur at two distinct time scales. In other words, intrinsic interactions are instantaneous compared to extrinsic interactions between two successive instants $t$ and $t + 1$, which are much slower.
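The block structure used in the proof of Proposition 1 can be checked numerically. The sketch below is illustrative only: the temporal adjacency `A_check`, the sizes `n`, `T`, the value of `tau` and the helper `expm_taylor` are assumptions, and the first time block is given the same degree matrix as the others (time-invariance). It builds the temporal Laplacian with the degree block on the diagonal and $-\check{A}$ on the sub-diagonal, exponentiates it, and reads the matrices $\Theta_j(\tau)$ off the last block row.

```python
import numpy as np

def expm_taylor(M, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small ||M||)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

n, T, tau = 3, 4, 0.2   # assumed sizes and temporal scale
# Hypothetical temporal adjacency: A_check[i, j] = 1 if j (time t-1) -> i (time t).
A_check = np.array([[1, 1, 0],
                    [0, 1, 1],
                    [1, 0, 1]], dtype=float)
D = np.diag(A_check.sum(axis=1))          # degrees d_1, ..., d_n

# Temporal Laplacian: D on the diagonal blocks, -A_check on the sub-diagonal blocks.
L_temp = np.kron(np.eye(T), D) - np.kron(np.eye(T, k=-1), A_check)
E = expm_taylor(-tau * L_temp)

def block(M, r, c):
    """Return the n x n block (r, c) of a block matrix M."""
    return M[r * n:(r + 1) * n, c * n:(c + 1) * n]

# Theta_j(tau) is read off the last block row: block (T-1, T-1-j).
Theta = [block(E, T - 1, T - 1 - j) for j in range(T)]
```

In this toy case `Theta[0]` equals $\exp(-\tau \underline{D})$ exactly, and `Theta[1]` agrees with $\tau \check{A}$ up to terms of order $\tau^2$, which is the first-order behavior exploited in the simplified models.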


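2.2.2. Two simplified models

• MA(1) modeling - Limited to order 1, (2.10) provides the MA(1) spatio-temporal model:

$$\vec{Y}_t = \Theta_0(\tau)\, \vec{Z}_t + \Theta_1(\tau)\, \vec{Z}_{t-1}, \quad \text{with } \vec{Z}_t \sim N(0, K_\lambda), \tag{2.12}$$

where the innovations are assumed to be Gaussian. As an example, when $\exp(-\tau \underline{\mathbf{L}})$ is approximated by its first-order Taylor expansion,

$$\exp(-\tau \underline{\mathbf{L}}) \approx I_{nT} - \tau \underline{\mathbf{L}} = I_{nT} - \tau \underline{\mathbf{D}} + \tau \underline{\mathbf{A}},$$

we obtain the approximation

$$\Theta_0(\tau) \approx I_n - \tau \underline{D}, \qquad \Theta_1(\tau) \approx \tau \check{A}. \tag{2.13}$$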

• AR(1) modeling - We rewrite the MA model as an autoregressive model, using several known properties (see [4]). Formally, the AR(p) model is:

$$\vec{Y}_t(\lambda, \tau) = \sum_{j=1}^{p} \Phi_j(\tau)\, \vec{Y}_{t-j}(\lambda, \tau) + \vec{Z}_t(\lambda, 0),$$

where the innovations $\vec{Z}_t(\lambda, 0)$ are independent of the past, which is the case by construction. As for the MA model, $\vec{Y}_t(\lambda, 0)$ and $\{\Phi_j(\tau)\}$ correspond to the spatial and temporal diffusions, respectively. In fact, for the AR(p) model there exists an equivalent MA representation whose matrices $\Theta$ are related to the matrices $\Phi$ as follows:

$$\Theta_j = \sum_{\ell=1}^{j} \Theta_{j-\ell}\, \Phi_\ell, \quad \forall j = 1, 2, \ldots \tag{2.14}$$

By recalling (2.12), and restricting to $p = 1$ for the sake of parsimony, the system (2.14) provides $\Phi_1(\tau) = \Theta_0^{-1}(\tau)\, \Theta_1(\tau)$. Thus, an AR(1) representation of $\vec{Y}_t$ is⁵:

$$\vec{Y}_t = \Phi_1(\tau)\, \vec{Y}_{t-1} + \vec{Z}_t, \qquad \vec{Y}_0 = \vec{Z}_0, \qquad \text{with } \vec{Z}_t \sim N(0, K_\lambda). \tag{2.15}$$

For every $i \in V_t$, the set $\pi_i = \{j \in V_{t-1} : \Phi_1(\tau)_{i,j} \neq 0\}$ represents the parents of $i$. The autoregressive model (2.15) generalizes the Dynamic Bayesian Network model [32], for which the innovations are simply the white noise $\vec{W}_t$.

⁵ Under some appropriate conditions, the MA(1) process is invertible and possesses an infinite AR representation.

In the particular case of the example (2.13), we have in addition:

$$\Phi_1(\tau) \approx (I_n - \tau \underline{D})^{-1}\, \tau \check{A}. \tag{2.16}$$
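A minimal simulation sketch of the AR(1) model (2.15) with the first-order coefficient matrix (2.16) is given below. The 3-node graphs, the values of $\lambda$ and $\tau$, and the function names are assumptions made for illustration; the colored innovations are drawn from $N(0, K_\lambda)$ through a symmetric square root of the kernel.

```python
import numpy as np

def diffusion_kernel(L, lam):
    """K_lambda = exp(-lam * L) for a symmetric Laplacian L."""
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-lam * evals)) @ evecs.T

def phi1_first_order(A_check, tau):
    """Phi_1(tau) ~ (I - tau D)^{-1} tau A_check, as in (2.16)."""
    D = np.diag(A_check.sum(axis=1))
    return np.linalg.solve(np.eye(len(D)) - tau * D, tau * A_check)

def simulate_ar1(Phi1, K, T, rng):
    """Simulate Y_t = Phi1 Y_{t-1} + Z_t, Y_0 = Z_0, Z_t ~ N(0, K), as in (2.15)."""
    n = K.shape[0]
    evals, evecs = np.linalg.eigh(K)
    root = evecs * np.sqrt(np.clip(evals, 0.0, None))   # root @ root.T == K
    Z = root @ rng.standard_normal((n, T + 1))
    Y = np.empty((n, T + 1))
    Y[:, 0] = Z[:, 0]
    for t in range(1, T + 1):
        Y[:, t] = Phi1 @ Y[:, t - 1] + Z[:, t]
    return Y, Z

# Hypothetical spatial graph (path 1-2-3) and temporal adjacency (j -> i).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
A_check = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)

K = diffusion_kernel(L, lam=0.5)
Phi1 = phi1_first_order(A_check, tau=0.2)
Y, Z = simulate_ar1(Phi1, K, T=100, rng=np.random.default_rng(0))

# Stationarity check: all eigenvalues of Phi1 must have modulus below 1.
rho = np.abs(np.linalg.eigvals(Phi1)).max()
```

With these assumed values every node has temporal degree 2, so $\Phi_1 = \check{A}/3$ and the spectral radius is $2/3$, which satisfies the stationarity condition.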

By replacing the Laplacian $\underline{L}$ by its normalized version $\underline{L}^N = \underline{D}^{-1/2}\, \underline{L}\, \underline{D}^{-1/2}$, whose diagonal elements are equal to 1, $(I_n - \tau \underline{D})$ becomes $I_n (1 - \tau)$ and the new expression of $\Phi_1(\tau)$ in (2.16) is:

$$\Phi_1^N(\tau) = \frac{\tau}{1 - \tau}\, \check{A}^N. \tag{2.17}$$

2.2.3. SDE modeling and extrinsic drift

Our analysis, which began with the heat differential equation, now continues with a stochastic differential equation (SDE). The heat equation relates to the diffusion parameters $\lambda$ and $\tau$, while the SDE relates to the time parameter $t$. This SDE is associated with the Markovian dynamic of Fokker-Planck [2]⁶:

$$d\vec{Y}(t) = a(\vec{Y}, t)\,dt + b(\vec{Y}, t)\,d\vec{U}(t), \quad t \in \mathbb{R}^+,$$

where $a(\vec{Y}, t)$ is a drift function, $b(\vec{Y}, t)$ a diffusion matrix, and $\vec{U}(t)$ a Brownian process. Its sampled version is written as

$$\vec{Y}_{t+dt} = \vec{Y}_t + a(\vec{Y}, t)\,dt + b(\vec{Y}, t)\,\vec{W}_t \sqrt{dt}, \tag{2.18}$$

where $\vec{W}_t \sim N(0, I_n)$. Formally, with $dt = 1$, $a(\vec{Y}, t) = (\Phi_1(\tau) - I_n)\vec{Y}_t$ and $b(\vec{Y}, t) = K_\lambda$, we retrieve (2.15). In this case, $a(\vec{Y}, t)$, for which $(\Phi_1(\tau) - I_n)$ does not depend on $t$, is not strictly speaking a drift.

• A drift example - The advantage of the model (2.18) is the presence of a drift term, which provides a means to model the extrinsic part of the system. We now present an example that will be processed further. Consider a random process $\{\vec{Y}_t\}$ on a graph of size $n = 3$ such that, for every observed time series, the trajectory $\{y_t^{(2)}\}$ tends to lie between the two other trajectories $\{y_t^{(1)}\}$ and $\{y_t^{(3)}\}$ with high probability. After a particular time $t_I < T$, called the intervention time, the roles of $Y_t^{(1)}$ and $Y_t^{(2)}$ are reversed and therefore the trajectory $\{y_t^{(1)}\}_{t > t_I}$ tends to lie between the trajectories $\{y_t^{(2)}\}_{t > t_I}$ and $\{y_t^{(3)}\}_{t > t_I}$, as illustrated in Fig. 5. The effect of the intervention is an observable drift on the trajectories of $\{Y_t^{(1)}\}$ and $\{Y_t^{(2)}\}$ when $t > t_I$. The simulated data in Fig. 5 have been obtained using the ad hoc model:

$$\begin{aligned} Y_{t+1}^{(1)} &= Y_t^{(1)} + (Y_t^{(2)} + Y_t^{(3)})\, c_I 1_{t > t_I} + Z_t^{(1)} \\ Y_{t+1}^{(2)} &= Y_t^{(2)} + (Y_t^{(1)} + Y_t^{(3)})\, c + Z_t^{(2)} \\ Y_{t+1}^{(3)} &= Y_t^{(3)} - (Y_t^{(1)} + Y_t^{(2)})\, c_I 1_{t > t_I} + Z_t^{(3)}, \end{aligned} \tag{2.19}$$

where the drift function is

$$a(\vec{Y}, t) = \left[ (Y_t^{(2)} + Y_t^{(3)})\, c_I 1_{t > t_I},\ (Y_t^{(1)} + Y_t^{(3)})\, c,\ -(Y_t^{(1)} + Y_t^{(2)})\, c_I 1_{t > t_I} \right]',$$

with $c < 0$ and $c_I > 0$. In (2.19), $Y_t^{(1)}$ and $Y_t^{(3)}$ are assumed to be negatively correlated, and the other two correlations are negligible. The correlation structure is taken into account in the covariance matrix $K_\lambda$ of the innovation process $\{\vec{Z}_t\}$, while time dependence and drift are taken into account simultaneously in the function $\vec{Y}_t + a(\vec{Y}, t) \doteq \check{A}_t \vec{Y}_t$ with

$$\check{A}_t = \begin{bmatrix} 1 & c_I 1_{t > t_I} & c_I 1_{t > t_I} \\ c & 1 & c \\ -c_I 1_{t > t_I} & -c_I 1_{t > t_I} & 1 \end{bmatrix}.$$

$\check{A}_t$ is an adjacency matrix, like $\check{A}^N$ in (2.17). Stationarity of an AR(1) process requires that the eigenvalues of $\Phi_1(\tau)$ are smaller than 1 in absolute value [4]. In the case of the drift (2.19), we consider as in (2.17)⁷

$$\Phi_1(\tau, t) = \frac{\tau}{1 - \tau}\, \check{A}_t, \tag{2.20}$$

in which $\tau$ prevents the process from exploding after $t_I$⁸.

⁶ In this equation, $\vec{Y}(t)$ is momentarily a function of time; it should not be confused with the spatial model $\vec{Y}(\lambda)$.
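The drift example (2.19)-(2.20) can be simulated along the following lines. This is a sketch under stated assumptions rather than the code behind Fig. 5: the covariance entries, $c_I = -c = 0.9$ and $\tau/(1-\tau) = 0.5$ are the values quoted in the footnote on Fig. 5, whereas the horizon `T`, the intervention time `t_I`, the random seed and the function names are hypothetical, and no smoothing of the trajectories is applied.

```python
import numpy as np

def A_check_t(t, t_I, c, c_I):
    """Matrix of the drift example: Y_t + a(Y, t) = A_check_t Y_t."""
    s = c_I if t > t_I else 0.0          # c_I * indicator(t > t_I)
    return np.array([[1.0,   s,   s],
                     [  c, 1.0,   c],
                     [ -s,  -s, 1.0]])

def simulate_drift(T, t_I, c, c_I, ratio, K, rng):
    """Simulate Y_{t+1} = ratio * A_check_t Y_t + Z_t, Z_t ~ N(0, K); ratio = tau/(1-tau)."""
    evals, evecs = np.linalg.eigh(K)
    root = evecs * np.sqrt(np.clip(evals, 0.0, None))   # also handles a singular K
    Y = np.zeros((3, T + 1))
    for t in range(T):
        Z = root @ rng.standard_normal(3)
        Y[:, t + 1] = ratio * A_check_t(t, t_I, c, c_I) @ Y[:, t] + Z
    return Y

# Values quoted for Fig. 5: mono-scale K, c_I = -c = 0.9, tau / (1 - tau) = 0.5.
K = np.array([[ 0.9, 0.0, -0.9],
              [ 0.0, 0.1,  0.0],
              [-0.9, 0.0,  0.9]])
c, c_I, ratio = -0.9, 0.9, 0.5
T, t_I = 200, 100                        # hypothetical horizon and intervention time
Y = simulate_drift(T, t_I, c, c_I, ratio, K, np.random.default_rng(1))

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

rho_before = spectral_radius(ratio * A_check_t(0, t_I, c, c_I))
rho_after = spectral_radius(ratio * A_check_t(t_I + 1, t_I, c, c_I))
```

With these values the spectral radius of $\Phi_1(\tau, t)$ is 0.5 before the intervention and 0.95 after it, so the factor $\tau/(1-\tau)$ keeps the process from exploding once the drift is switched on.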

⁷ Let us comment on (2.19) in terms of expectation. Since $\mathbb{E}(\vec{Z}_t) = 0$ and $\mathrm{Cov}(Y_t^{(1)}, Y_t^{(3)}) < 0$, before $t_I$ we have in the second equation $\mu_t^{(1)} \approx -\mu_t^{(3)}$ and thus $\mu_{t+1}^{(2)} \approx \mu_t^{(2)} \approx 0$, $\mu$ denoting the expectation of $Y$. In the first and third equations, after $t_I$, if $(\mu_t^{(2)} + \mu_t^{(3)}) < 0$ and $(\mu_t^{(1)} + \mu_t^{(2)}) > 0$ then $\mu_t^{(1)}$ and $\mu_t^{(3)}$ tend to decrease if $|c|$ and $c_I < 1$. It implies in the second equation that $\mu_t^{(2)}$ tends to increase and pass above the trajectory of $\mu_t^{(1)}$.
⁸ Fig. 5 has been obtained as follows: $K$ is mono-scale with $K_{1,1} = K_{3,3} = 0.9$, $K_{2,2} = 0.1$, $K_{1,3} = -0.9$, and for the drift $c_I = -c = 0.9$, $\tau/(1 - \tau) = 0.5$. The trajectories were smoothed to improve readability.

2.3. Multi-scale modeling

2.3.1. Multi-scale GRF model at fixed time

Let us return to the situation of Section 2.1. The outstanding issue at the end of the previous modeling step (2.5) is the choice of $\lambda$. In other words, which smooth version $\vec{Y}(\lambda)$ of $\vec{Y}$ is the most representative of some features of $\vec{Y}$? In fact, several scales may explain this profile. Therefore, the main idea consists of considering several scales $\Lambda = \{\lambda_1, \ldots, \lambda_q\}$ [18, 35]. Fig. 1 illustrates this representation. To this end, we approximate $\vec{Y}$ as a sum $\tilde{Y}$ of $q$ independent random fields:

$$\vec{Y} = \tilde{Y} + \vec{\mathcal{Y}}^{/0} = \sum_{j=1}^{q} \vec{\mathcal{Y}}^{/j} + \vec{\mathcal{Y}}^{/0} = \sum_{j=1}^{q} \sigma_j \vec{Y}^{/j} + \sigma_0 \vec{Y}^{/0}, \tag{2.21}$$

where $\vec{Y}^{/j}$ denotes the component at scale $\lambda_j$, for $j = 1, \ldots, q$, and $\vec{Y}^{/0}$ a residual white noise process corresponding to $\lambda_0 = 0$. The covariance matrix $K_j$ of each unweighted component $\vec{Y}^{/j}$ is the diffusion kernel $K_j$ with scale $\lambda_j$, (2.4). Therefore the covariance matrix of $\vec{\mathcal{Y}}^{/j}$ is $\mathcal{K}_j = \sigma_j^2 K_j$. The positive weight $\sigma_j$ is larger the more the scale $\lambda_j$ contributes to the random field $\vec{Y}$. Thus, the multi-scale diffusion kernel $\bar{K}_{\bar{\sigma}, \Lambda}$ of $\vec{Y}$ is defined by

$$\bar{K}_{\bar{\sigma}, \Lambda} = \tilde{K}_{\sigma, \Lambda} + \mathcal{K}_0 = \sum_{j=1}^{q} \mathcal{K}_j + \mathcal{K}_0 = \sum_{j=1}^{q} \sigma_j^2 K_j + \mathcal{K}_0 = \sum_{j=1}^{q} \sigma_j^2 e^{-\lambda_j L} + \sigma_0^2 I_n, \tag{2.22}$$

where $\sigma$ denotes $\{\sigma_1, \ldots, \sigma_q\}$ and $\bar{\sigma} = \{\sigma_0, \sigma\}$. This is related to the additive spline models introduced by Wahba [38, chap. 10], and later reintroduced under the name of multiple kernel [22]⁹. The parameters $(\Lambda, \bar{\sigma}, q)$ and the components $\{\vec{\mathcal{Y}}^{/j}\}$ are unknown and have to be estimated. To this end, we assume that $\vec{Y}$ is distributed according to the Gaussian law

$$\vec{Y} \sim N(0, \bar{K}_{\bar{\sigma}, \Lambda}). \tag{2.23}$$

$^{9}$ Note that the model $Y(\Lambda) = \bar K_{\bar\sigma,\Lambda} Y$, which is the extension of (2.5) to the vectorial case, satisfies the heat equation $\frac{\partial}{\partial\lambda_j} Y(\Lambda) = -L\,Y(\Lambda)$ for every $j$. In fact, the component $\tilde Y^{/j}$ does not match the previous smooth version $Y(\lambda_j)$ in (2.5), since the sum $\sum_j Y(\lambda_j)$ does not reconstruct $Y$.

2.3.2. Multi-scale GRF decomposition at fixed time

Component estimation is closely linked to the scale selection problem. As before, $\Lambda = \{\lambda_1, ..., \lambda_q\}$ denotes the scale domain, which contains $q_0 \le q$ unknown relevant scales. Given $q_0$ and the parameters $\bar\sigma$ (their estimation is described in Section 2.4), the following proposition expresses the scale components.

Proposition 2. Assume the model (2.22) is over-parameterized, i.e. the decomposition of $Y$ in (2.21) depends only on $q_0 \le q$ unknown scales. Then, given an observation $y$, the Bayesian estimation provides the scale components:

$$\hat y^{/j} = \tilde K_j U D_\nu^{-1} \hat b , \quad \forall j = 1, ..., q, \tag{2.24}$$

$$\hat b = (\sigma_0^2 D_\nu^{-1} + I_{q_0})^{-1} U^\top y , \tag{2.25}$$

where $D_\nu$ is the diagonal matrix of the $q_0$ largest eigenvalues $\nu_1 \ge ... \ge \nu_{q_0}$ of the covariance matrix of $\tilde Y$:

$$\tilde K_{\sigma,\Lambda} = \sum_{j=1}^{q} \tilde K_j , \tag{2.26}$$

and $U$ is the matrix of the associated orthonormal eigenvectors: $\tilde K_{\sigma,\Lambda} U = U D_\nu$. The Bayesian estimation is done with respect to the prior distribution $B \sim N(0, D_\nu)$.

Proof. Firstly, since the columns of $U$ are independent, we can write $\tilde Y = U B$ where $B$ is a $q_0$-random vector. Then, the previous spectral equation allows us to rewrite

$$\tilde Y = U B = \tilde K_{\sigma,\Lambda} U D_\nu^{-1} B = \sum_{j=1}^{q} \tilde K_j U D_\nu^{-1} B = \sum_{j=1}^{q} \tilde Y^{/j} , \quad \text{where } \tilde Y^{/j} = \tilde K_j U D_\nu^{-1} B , \tag{2.27}$$

which provides in particular the components (2.24). Secondly, $\tilde Y = U B$ implies the covariance matrix

$$\mathrm{Cov}(B) = U^\top \tilde K_{\sigma,\Lambda} U = D_\nu . \tag{2.28}$$

For a given observation $y$, the Bayesian estimation of the occurrence $b$ of $B$ consists in maximizing the log-posterior $\log p(b \mid y) = \log p(y \mid b) + \log p(b) + \mathrm{Cte}$. Given the Gaussian laws $B \sim N(0, D_\nu)$ and $\tilde Y^{/0} \sim N(0, \sigma_0^2 I_n)$, this amounts to computing

$$\hat b = \arg\max_{b} \; -\frac{1}{\sigma_0^2} \| y - U b \|^2 - b^\top D_\nu^{-1} b , \tag{2.29}$$

which provides the expression of $\hat b$ in (2.25).

• Scale selection - To determine an estimate $\hat q$ of $q_0$, we diagonalize the covariance matrix $\tilde K_{\sigma,\Lambda}$ and retain only its $\hat q$ largest eigenvalues according to the criterion

$$\frac{\sum_{i=1}^{\hat q} \nu_i}{\sum_{i=1}^{q} \nu_i} = 1 - \epsilon , \tag{2.30}$$

where $\epsilon$ is a positive parameter chosen close to 0, typically $\epsilon = 0.01$ or $0.05$. This criterion is related to that used in Principal Component Analysis [17, 30]. It means that the dispersion of $\tilde Y$ can be approximately represented by $\hat q$ linearly independent components, with an information loss determined by $\epsilon$.

From $\hat q$, we can then select the relevant scales $\hat\Lambda$. These scales are associated with the $\hat q$ largest $\sigma_j^2$, i.e. the scales whose components are the most involved in the dispersion of $\tilde Y$. As may be seen experimentally, the estimates $\hat\sigma_j^2$ associated with the scales in $\Lambda \setminus \hat\Lambda$ have values close to 0. This selection therefore allows a pruning of non-significant scales, which is reminiscent of Ridge regression [17].
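The estimates (2.24)-(2.25) and the selection rule (2.30) can be sketched as follows. This is only an illustrative reading of the formulas: the function names, the toy graph and the parameter values are hypothetical, the equality in (2.30) is interpreted as "the smallest $\hat q$ reaching the ratio $1 - \epsilon$", and the denominator is taken over all computed eigenvalues.

```python
import numpy as np

def diffusion_kernel(laplacian, lam):
    """K = exp(-lam * L) for a symmetric Laplacian L."""
    eigval, eigvec = np.linalg.eigh(laplacian)
    return (eigvec * np.exp(-lam * eigval)) @ eigvec.T

def select_rank(eigenvalues, eps=0.05):
    """Smallest q_hat whose largest eigenvalues carry at least 1 - eps of the total, cf. (2.30)."""
    nu = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    ratio = np.cumsum(nu) / nu.sum()
    return min(int(np.searchsorted(ratio, 1.0 - eps)) + 1, nu.size)

def scale_components(y, kernels, sigma2, sigma0_2, eps=0.05):
    """Return b_hat (2.25), the components y_hat^{/j} (2.24) and q_hat."""
    weighted = [s2 * k for s2, k in zip(sigma2, kernels)]      # sigma_j^2 K_j
    k_tilde = sum(weighted)                                     # (2.26)
    nu, u = np.linalg.eigh(k_tilde)                             # ascending eigenvalues
    nu, u = nu[::-1], u[:, ::-1]                                # largest first
    q_hat = select_rank(nu, eps)
    nu, u = nu[:q_hat], u[:, :q_hat]
    b_hat = (u.T @ y) / (sigma0_2 / nu + 1.0)                   # (sigma0^2 D^-1 + I)^-1 U'y
    components = [kj @ (u @ (b_hat / nu)) for kj in weighted]   # K~_j U D^-1 b_hat
    return b_hat, components, q_hat

# Hypothetical example: path graph on 4 nodes, two scales.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
kernels = [diffusion_kernel(L, lam) for lam in (0.5, 2.0)]
sigma2, sigma0_2 = [1.0, 0.5], 0.1
y = np.array([1.0, 0.5, -0.2, -1.0])
b_hat, components, q_hat = scale_components(y, kernels, sigma2, sigma0_2, eps=0.1)
```

By construction the components sum to $U \hat b$, the smoothed version of $y$ in the retained eigenbasis.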


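2.3.3. Multi-scale decomposition of spatio-temporal GRF

The decomposition (2.21) is extended to obtain the multi-scale representation of the time series:

$$Y_t = \sum_{j=1}^{q} \tilde Y_t^{/j} + \tilde Y_t^{/0} = \sum_{j=1}^{q} \sigma_j Y_t^{/j} + \sigma_0 Y_t^{/0} , \quad t = 1, ..., T , \tag{2.31}$$

or in matrix form

$$\begin{bmatrix} Y_1 \\ \vdots \\ Y_T \end{bmatrix} = \sum_{j=0}^{q} \begin{bmatrix} \tilde Y_1^{/j} \\ \vdots \\ \tilde Y_T^{/j} \end{bmatrix} = \sum_{j=0}^{q} \sigma_j \begin{bmatrix} Y_1^{/j} \\ \vdots \\ Y_T^{/j} \end{bmatrix} ,$$

or simply

$$\mathbf Y = \sum_{j=0}^{q} \tilde{\mathbf Y}^{/j} = \sum_{j=0}^{q} \sigma_j \mathbf Y^{/j} .$$

With these notations, (2.31) is rewritten

$$\mathbf Y = \tilde{\mathbf Y} + \tilde{\mathbf Y}^{/0} = \sum_{j=1}^{q} \tilde{\mathbf Y}^{/j} + \tilde{\mathbf Y}^{/0} = \sum_{j=1}^{q} \sigma_j \mathbf Y^{/j} + \sigma_0 \mathbf Y^{/0} .$$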

Property 1. For $j = 1, ..., q$, each component $\mathbf Y^{/j} = (Y_1^{/j}, ..., Y_T^{/j})$ obeys (2.9) or equivalently (2.10). Its innovation process, denoted $\{Z_t^{/j}\}$, is such that $\mathrm{Cov}(Z_t^{/j}) = K_j = e^{-\lambda_j L}$.

The covariance matrix $\tilde{\mathbf K}_{\sigma,\Lambda,\tau}$ of $\tilde{\mathbf Y}$ is:

$$\tilde{\mathbf K}_{\sigma,\Lambda,\tau} = \sum_{j=1}^{q} \tilde{\mathbf K}_j = \sum_{j=1}^{q} \sigma_j^2 \mathbf K_j , \tag{2.32}$$

where $\tilde{\mathbf K}_j$ and $\mathbf K_j$ are cross-covariance matrices, denoted $\tilde{\mathbf K}_j = \text{Cross-Cov}(\tilde{\mathbf Y}^{/j})$ and $\mathbf K_j = \text{Cross-Cov}(\mathbf Y^{/j})$, depending on $(\sigma_j^2, \lambda_j, \tau)$ and $(\lambda_j, \tau)$, respectively $^{10}$. Therefore, we have to determine the expression of $\mathbf K_j$. This expression depends on the lag-covariance function $\Gamma_j^{(\ell)} = \mathbb E[\,Y_t^{/j} (Y_{t+\ell}^{/j})^\top]$ of $\mathbf Y^{/j}$, as we now illustrate in the case of the autoregressive model $^{11}$.

$^{10}$ $(\sigma, \Lambda)$ refers to multi-scale, and $\tau$ to time scale. Thus, the covariance matrices $K_\lambda$, $\tilde K_{\sigma,\Lambda}$ and $\tilde{\mathbf K}_{\sigma,\Lambda,\tau}$ respectively denote the mono-scale spatial matrix, the multi-scale spatial matrix and the multi-scale spatio-temporal matrix.

• AR(1) model

Proposition 3. For the AR(1) model, the following spatio-temporal factorization, made up of a 'temporal' matrix $\Phi_\tau$ and a 'spatial' matrix $\tilde{\mathbf K}^D_{\sigma,\Lambda}$, can be written in the form:

$$\tilde{\mathbf K}_{\sigma,\Lambda,\tau} = \tilde{\mathbf K}^D_{\sigma,\Lambda} \, \Phi_\tau . \tag{2.33}$$

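Proof. Recalling Property 1, each scale component $\mathbf Y^{/j} = (Y_1^{/j}, ..., Y_T^{/j})$ is modeled by (2.15):

$$Y_t^{/j} = \Phi_1(\tau)\, Y_{t-1}^{/j} + Z_t^{/j} ,$$

with $Z_t^{/j} \sim N(0, K_j)$. For this process, $\mathbf K_j$ is organized in blocks of size $n \times n$ (see [4]) satisfying:

$$\mathbf K_j[t, t+\ell] = \Gamma_j^{(\ell)} = \Gamma_j^{(0)} \big(\Phi_1(\tau)^\top\big)^{\ell} , \quad \text{for } \ell \ge 1 ,$$

$$\text{with } \Gamma_j^{(0)} = \Phi_1(\tau)\, \Gamma_j^{(0)}\, \Phi_1(\tau)^\top + K_j ,$$

$$\mathbf K_j[t, t-\ell] = \mathbf K_j[t, t+\ell]^\top .$$

Therefore, $\mathbf K_j$ can be expressed as $\mathbf K_j = \mathbf K_j^D \Phi_\tau$. Here $\Phi_\tau$ is a Toeplitz block-matrix whose $\ell$-th upper diagonal is filled with $(\Phi_1(\tau)^\top)^{\ell}$: $\Phi_\tau[t, t+\ell] = (\Phi_1(\tau)^\top)^{\ell}$, and $\Phi_\tau[t, t-\ell] = \Phi_\tau[t, t+\ell]^\top$ for the lower diagonal (the diagonal blocks, $\ell = 0$, are identity matrices). $\mathbf K_j^D$ is the $T$-diagonal block-matrix $\mathbf K_j^D = \mathrm{diag}[\Gamma_j^{(0)}, ..., \Gamma_j^{(0)}]$. Recalling (2.32), we get the matrix factorization

$$\tilde{\mathbf K}_{\sigma,\Lambda,\tau} = \sum_{j=1}^{q} \sigma_j^2 \mathbf K_j = \Big( \sum_{j=1}^{q} \sigma_j^2 \mathbf K_j^D \Big) \Phi_\tau \doteq \tilde{\mathbf K}^D_{\sigma,\Lambda} \, \Phi_\tau .$$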
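The factorization $\mathbf K_j = \mathbf K_j^D \Phi_\tau$ can be checked numerically on a toy example. The sketch below is only illustrative: it assumes a hypothetical transition matrix $\Phi_1(\tau) = \rho\, e^{-\tau L}$ with $\rho < 1$ (the paper's own choice, (2.17), is not reproduced here), for which $\Phi_1(\tau)$ and $\Gamma_j^{(0)}$ are both functions of $L$ and therefore commute. All function names are made up for the example.

```python
import numpy as np

def expm_sym(matrix, coef):
    """exp(coef * matrix) for a symmetric matrix, via its eigendecomposition."""
    w, v = np.linalg.eigh(matrix)
    return (v * np.exp(coef * w)) @ v.T

def stationary_cov(phi, k, n_terms=300):
    """Gamma0 = sum_m phi^m K (phi')^m, which solves Gamma0 = phi Gamma0 phi' + K."""
    gamma0, term = np.zeros_like(k), k.copy()
    for _ in range(n_terms):
        gamma0 = gamma0 + term
        term = phi @ term @ phi.T
    return gamma0

def ar1_covariance(phi, gamma0, T):
    """Block matrix with K[t, t+l] = Gamma0 (phi')^l and K[t, t-l] = K[t, t+l]'."""
    n = phi.shape[0]
    cov = np.zeros((n * T, n * T))
    for t in range(T):
        for s in range(t, T):
            block = gamma0 @ np.linalg.matrix_power(phi.T, s - t)
            cov[t*n:(t+1)*n, s*n:(s+1)*n] = block
            cov[s*n:(s+1)*n, t*n:(t+1)*n] = block.T
    return cov

def block_toeplitz(phi, T):
    """Phi_tau: block [t, t+l] = (phi')^l, block [t, t-l] = phi^l, identity on the diagonal."""
    n = phi.shape[0]
    out = np.zeros((n * T, n * T))
    for t in range(T):
        for s in range(T):
            base = phi.T if s >= t else phi
            out[t*n:(t+1)*n, s*n:(s+1)*n] = np.linalg.matrix_power(base, abs(s - t))
    return out

# Hypothetical example: path graph on 4 nodes, T = 5 time points.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
tau, lam, rho, T = 0.3, 1.0, 0.8, 5
phi = rho * expm_sym(L, -tau)          # assumed Phi_1(tau); rho < 1 keeps the AR(1) stationary
K_j = expm_sym(L, -lam)                # innovation covariance of Property 1
gamma0 = stationary_cov(phi, K_j)
K_full = ar1_covariance(phi, gamma0, T)
K_diag = np.kron(np.eye(T), gamma0)    # diag[Gamma0, ..., Gamma0]
Phi_tau = block_toeplitz(phi, T)
factorization_holds = np.allclose(K_full, K_diag @ Phi_tau)
```

For a transition matrix that does not commute with $\Gamma_j^{(0)}$, the lower-diagonal blocks of the two sides would differ, so the check depends on this assumption.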

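• An approximated calculation - From (2.27), the spatio-temporal scale components of $\tilde{\mathbf Y}$ are formally

$$\tilde{\mathbf Y}^{/j} = \tilde{\mathbf K}_j \mathbf U \mathbf D_\nu^{-1} \mathbf B , \quad \forall j = 1, ..., q, \tag{2.34}$$

where $(\mathbf U, \mathbf D_\nu)$ comes from the spectral decomposition of the covariance matrix $\tilde{\mathbf K}_{\sigma,\Lambda,\tau}$. To avoid the computation of the full spectral decomposition $(\mathbf U, \mathbf D_\nu)$ of $\tilde{\mathbf K}_{\sigma,\Lambda,\tau}$, we reformulate the expression of the components $\tilde{\mathbf Y}^{/j}$ by taking advantage of the factorization property (2.33). Recall that the expression of $\tilde{\mathbf Y}^{/j}$ is based on $\tilde{\mathbf Y} = \mathbf U \mathbf B$ (see Proof of Proposition 2). Instead of considering this representation, we consider $\tilde{\mathbf Y} = \tilde{\mathbf U} \tilde{\mathbf B}$ where

$$\tilde{\mathbf U} = \Phi_\tau \dot{\mathbf U} , \qquad \tilde{\mathbf K}^D_{\sigma,\Lambda} \dot{\mathbf U} = \dot{\mathbf U} \dot{\mathbf D}_\nu . \tag{2.35}$$

We conjecture that $\tilde{\mathbf U}$ is a basis of decomposition "intermediate" between the spatial basis $\dot{\mathbf U}$ and the spatio-temporal basis $\mathbf U$. The new basis (2.35) implies $\tilde{\mathbf Y} = \tilde{\mathbf U} \tilde{\mathbf B} = \Phi_\tau \tilde{\mathbf K}^D_{\sigma,\Lambda} \dot{\mathbf U} \dot{\mathbf D}_\nu^{-1} \tilde{\mathbf B}$, and therefore with (2.32), we have

$$\tilde{\mathbf Y}^{/j} = \sigma_j^2 \, \Phi_\tau \mathbf K_j^D \dot{\mathbf U} \dot{\mathbf D}_\nu^{-1} \tilde{\mathbf B} , \tag{2.36}$$

which requires computing only the spatial basis $\dot{\mathbf U}$, but not the spatio-temporal basis $\mathbf U$.

Finally, by imposing $\mathrm{Cov}(\tilde{\mathbf B}) = \dot{\mathbf D}_\nu$, the generalization of (2.29) amounts to computing, for every time series $\mathbf y$:

$$\hat{\mathbf b} = \arg\max_{\mathbf b} \; -\frac{1}{\sigma_0^2} \| \mathbf y - \Phi_\tau \dot{\mathbf U} \mathbf b \|^2 - \mathbf b^\top \dot{\mathbf D}_\nu^{-1} \mathbf b , \tag{2.37}$$

which provides the expression $\hat{\mathbf b} = (\sigma_0^2 \dot{\mathbf D}_\nu^{-1} + \tilde{\mathbf U}^\top \tilde{\mathbf U})^{-1} \tilde{\mathbf U}^\top \mathbf y$ with $\tilde{\mathbf U} = \Phi_\tau \dot{\mathbf U}$. In fact, this generalization is incomplete because, unlike (2.28), $\mathrm{Cov}(\tilde{\mathbf B})$ is not a diagonal matrix, as we can see from $\tilde{\mathbf Y} = \tilde{\mathbf U} \tilde{\mathbf B}$, since $\tilde{\mathbf U}$ is not an eigenvector basis of $\tilde{\mathbf K}_{\sigma,\Lambda,\tau}$.

$^{11}$ This formulation can be extended to the non-stationary case in which $(\sigma_j, \lambda_j)$ depends on time $t$, and hence also $\Gamma_j^{(\ell)}$. For this, it suffices to denote this dependency as $(\sigma_j(t), \lambda_j(t))$ and $\Gamma_{j,t}^{(\ell)}$, which we will not do to avoid overloading the formulas.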

2.4. Parameter estimation

• Random Fields at Fixed-Time - We consider the multi-scale spatial model (2.22)-(2.23), for which the weight parameters $\bar\sigma$ associated with a given set $\Lambda$ have to be estimated. Let $y$ be an occurrence of $Y$. The unknown weights $\{\sigma_j^2\}_{j=0}^{q}$ are estimated using the maximum likelihood principle. Let $L(\bar\sigma \mid \Lambda) = \log(p_{\bar\sigma,\Lambda}(y))$ denote the exact log-likelihood of $\bar\sigma$ for a given $\Lambda$. Assuming the probability density $p_{\bar\sigma,\Lambda}$ is Gaussian $N(0, \bar K_{\bar\sigma,\Lambda})$, the log-likelihood is expressed, up to an additive constant and a positive multiplicative factor, as:

$$L(\bar\sigma \mid \Lambda) = -\log |\bar K_{\bar\sigma,\Lambda}| - y^\top \bar K_{\bar\sigma,\Lambda}^{-1} y . \tag{2.38}$$

The maximum likelihood estimate is computed under the constraint of positivity of the parameters $\bar\sigma$:

$$\hat\sigma(\Lambda) = \arg\max_{\bar\sigma} L(\bar\sigma \mid \Lambda) \quad \text{under the constraint } \bar\sigma > 0 . \tag{2.39}$$

For moderate sizes of $n$, non-linear programming algorithms using gradient descent techniques are operational. For larger dimensions, the computation of the determinant $|\bar K_{\bar\sigma,\Lambda}|$ and of the inverse $\bar K_{\bar\sigma,\Lambda}^{-1}$ becomes a challenging issue. In the context of images on irregular grids, we experimented with an alternative method based on Monte Carlo computation [28], which avoids direct calculation and can be adapted to our case (see also [13, 20] for other techniques). This computation is beyond the scope of our article.
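For moderate $n$, the criterion (2.38) and its gradient with respect to the squared weights can be evaluated directly, as in the following sketch. It is written under assumptions: the parameter vector is taken to be $\theta = (\sigma_0^2, \sigma_1^2, ..., \sigma_q^2)$, the function names and the toy data are hypothetical, and the optimizer itself (any constrained gradient method enforcing $\theta > 0$) is left out.

```python
import numpy as np

def expm_sym(matrix, coef):
    """exp(coef * matrix) for a symmetric matrix, via its eigendecomposition."""
    w, v = np.linalg.eigh(matrix)
    return (v * np.exp(coef * w)) @ v.T

def assemble(theta, kernels):
    """K_bar = theta[0] * I_n + sum_j theta[j] * K_j, with theta = (sigma_0^2, ..., sigma_q^2)."""
    n = kernels[0].shape[0]
    return theta[0] * np.eye(n) + sum(t * k for t, k in zip(theta[1:], kernels))

def log_likelihood(theta, y, kernels):
    """L = -log|K_bar| - y' K_bar^{-1} y, as in (2.38)."""
    k_bar = assemble(theta, kernels)
    _, logdet = np.linalg.slogdet(k_bar)
    return -logdet - y @ np.linalg.solve(k_bar, y)

def gradient(theta, y, kernels):
    """dL/dtheta_j = -tr(K_bar^{-1} M_j) + y' K_bar^{-1} M_j K_bar^{-1} y, with M_0 = I_n."""
    k_inv = np.linalg.inv(assemble(theta, kernels))
    alpha = k_inv @ y
    mats = [np.eye(len(y))] + list(kernels)
    return np.array([-np.trace(k_inv @ m) + alpha @ m @ alpha for m in mats])

# Hypothetical example: path graph on 4 nodes, two scales, one observed field.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
kernels = [expm_sym(L, -0.5), expm_sym(L, -2.0)]
y = np.array([1.0, 0.5, -0.2, -1.0])
theta = np.array([0.2, 1.0, 0.5])
value = log_likelihood(theta, y, kernels)
grad = gradient(theta, y, kernels)
```

The explicit inverse and log-determinant are exactly the operations that become prohibitive for large $n$, which is why the text points to Monte Carlo alternatives.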


• Random Field Time Series - The space kernel is $\bar K_{\bar\sigma,\Lambda}$ and the spatio-temporal kernel is $\bar{\mathbf K}_{\bar\sigma,\Lambda,\tau}$, the latter depending on the new parameter $\tau$. The simultaneous estimation of all parameters $(\bar\sigma, \tau)$ is a heavy task. As in (2.39), it would require maximizing the likelihood

$$L(\bar\sigma, \tau \mid \Lambda) = -\log |\bar{\mathbf K}_{\bar\sigma,\Lambda,\tau}| - \mathbf y^\top \bar{\mathbf K}_{\bar\sigma,\Lambda,\tau}^{-1} \mathbf y ,$$

where $\bar{\mathbf K}_{\bar\sigma,\Lambda,\tau} = \tilde{\mathbf K}_{\sigma,\Lambda,\tau} + \sigma_0^2 I_{nT}$, as defined in Section 2.3.3. In practice, $(\bar\sigma, \Lambda)$ is estimated at fixed-time using (2.39) $^{12}$; then, given this estimate and the spatio-temporal model, $\tau$ is estimated from the observed time-series. So for the AR(1) model, thanks to the independence of the innovations $Z_t$, the maximum likelihood principle leads to computing the following estimate:

$$\hat\tau = \arg\max_{\tau} L(\tau \mid \sigma, \Lambda) , \qquad L(\tau \mid \sigma, \Lambda) = -\sum_{t=2}^{T} \| y_t - \Phi_1(\tau)\, y_{t-1} \|^2_{\tilde K_{\sigma,\Lambda}^{-1}} , \tag{2.40}$$

where $\tilde K_{\sigma,\Lambda} = \sum_{j=1}^{q} \sigma_j^2 K_j$ is the multi-scale covariance matrix (2.26) of the innovations $Z_t$. In our experiments, $\Phi_1(\tau, .)$ is given by (2.17), or by (2.20), in which it also depends on $t$.

• Summary - The procedure for estimating the parameter set $\{\bar\sigma, \Lambda, \tau, q, \mathbf b\}$ and the component set $\{\mathbf y^{/j}, j = 1, ..., q\}$ is summarized as follows. Given an observed time series $\mathbf y$ and a domain $\Lambda$:
1. Compute at fixed-time $\hat\sigma(\Lambda)$ using (2.39).
2. Given $\epsilon$, from $\tilde K_{\hat\sigma,\Lambda}$ compute $\hat q$ using (2.30) and then extract $\hat\Lambda$.
3. Compute $\hat\tau$ using (2.40), $\hat\Lambda$ and $\hat\sigma(\hat\Lambda)$.
4. Compute $\hat{\mathbf b}$ using (2.37) and $\{\hat{\mathbf y}^{/j}, j = 1, ..., \hat q\}$ using (2.36).
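Step 3 of the summary, i.e. the estimate (2.40), can be sketched with a simple grid search over $\tau$. The example is a sketch under stated assumptions: the transition matrix $\Phi_1(\tau) = \rho\, e^{-\tau L}$ with a fixed $\rho$ is a hypothetical stand-in for (2.17), the innovation covariance is taken as known (as if provided by Step 1), and the graph, grid and function names are made up for the illustration.

```python
import numpy as np

def expm_sym(matrix, coef):
    """exp(coef * matrix) for a symmetric matrix, via its eigendecomposition."""
    w, v = np.linalg.eigh(matrix)
    return (v * np.exp(coef * w)) @ v.T

def phi1(tau, laplacian, rho=0.8):
    """Hypothetical transition matrix rho * exp(-tau L), standing in for (2.17)."""
    return rho * expm_sym(laplacian, -tau)

def ar1_loglik(tau, ys, laplacian, k_inv):
    """L(tau) = -sum_{t>=2} (y_t - Phi_1 y_{t-1})' K^{-1} (y_t - Phi_1 y_{t-1}), cf. (2.40)."""
    resid = ys[1:] - ys[:-1] @ phi1(tau, laplacian).T   # rows are y_t - Phi_1(tau) y_{t-1}
    return -np.einsum("ti,ij,tj->", resid, k_inv, resid)

def estimate_tau(ys, laplacian, k_tilde, grid):
    """Grid-search maximizer of L(tau); returns (tau_hat, values of L on the grid)."""
    k_inv = np.linalg.inv(k_tilde)
    values = np.array([ar1_loglik(t, ys, laplacian, k_inv) for t in grid])
    return grid[int(np.argmax(values))], values

# Hypothetical simulation on a 4-node path graph; ys has one row per time point.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
K = 1.0 * expm_sym(L, -0.5) + 0.5 * expm_sym(L, -2.0)   # innovation covariance, cf. (2.26)
rng = np.random.default_rng(1)
T, tau_true = 200, 0.7
ys = np.zeros((T, 4))
ys[0] = rng.multivariate_normal(np.zeros(4), K)
for t in range(1, T):
    ys[t] = phi1(tau_true, L) @ ys[t - 1] + rng.multivariate_normal(np.zeros(4), K)
grid = np.linspace(0.0, 2.0, 41)
tau_hat, values = estimate_tau(ys, L, K, grid)
```

A grid search is used only for transparency; since $\tau$ is a scalar here, any one-dimensional optimizer could replace it.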

3. Experiments

Fig.1 illustrates the multi-scale decomposition at fixed-time of a field $y_t$ that represents gene expressions of Bacillus subtilis. The underlying graph $G$ comes from the regulatory network of the bacterium. In steady-state, gene expressions are assumed to be governed by the model (2.23). We see the structuring effects of the method in terms of gene grouping, as had already been shown for other regulatory networks [10]. The following experiments are intended to illustrate the sensitivity of the method in highlighting some temporal effects. This study is based on the use of statistical hypothesis testing [24].

$^{12}$ For a single time $t$ in case of stationarity, and for every time $t$ otherwise.

3.1. Time effect testing

Fig.2 relates to data simulated according to the AR(1) model (2.15). The graph $G = (V, E)$ is structured into subgraphs called regulons (Fig.2-a, see also Fig.3). Each regulon has a main node, called regulator, connected to other nodes and regulators (Fig.2-b). The temporal graph (V, E) translates the connections between the regulators. Fig.2-c shows an observation $y_t$ of the random field, and Fig.2-d depicts the profile of the first four observations $y_1, ..., y_4$.

To highlight the effect of the temporal component, we consider the hypothesis test [$H_0: \tau = \tau_0$ versus $H_1: \tau \ne \tau_0$], where $H_0$ imposes a restriction on $\tau$. For instance, $\tau_0 = 0$ allows testing the significance of the one-lag AR model. The classical procedure of hypothesis testing requires a specific statistic whose distribution under $H_0$ is known. The well-known likelihood ratio statistic is $LR = 2 \log(\hat\ell_1 / \hat\ell_0) = 2(\hat L_1 - \hat L_0)$, where $\hat\ell_i$ is the maximized likelihood under $H_i$. Under $H_0$, this statistic asymptotically has a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed under $H_0$. In the case of large sample size, $H_0$ is rejected if $LR$ is greater than a $\chi^2$ critical value. For instance, to test the significance of a $p_1$th-order AR coefficient matrix against a lower $p_0$th-order, the likelihood ratio is $LR = \log(|\hat K_0| / |\hat K_1|)$, where $\hat K_i$ is the estimate of the covariance matrix of the innovation process under $H_i$ [4, 16].

In our case, the full estimation of $\tilde K_{\sigma,\Lambda}$ is not required, since this matrix depends only on $\sigma$, which is given by Step 1 of the Summary. Therefore, given $\tilde K_{\sigma,\Lambda}$, the likelihood ratio is simply $LR = 2\,(L(\hat\tau \mid \sigma, \Lambda) - L(\tau_0 \mid \sigma, \Lambda))$. In our simulations, $T$ can be chosen large. However, in the case of small sample size, the $\chi^2$ approximation becomes too coarse. Critical values can be obtained using a Monte Carlo procedure, which is close to the popular bootstrap. Let $\{z^b, b = 1, ..., B\}$ be a set of $B$ independent samples of $Z$ drawn from the distribution $N(0, \tilde K_{\hat\sigma,\hat\Lambda})$, and $\{\mathbf y^b, b = 1, ..., B\}$ the associated time-series generated under $H_0$ using the AR model. For each time-series we compute the likelihood ratio $LR^b$, which provides a sample of the $LR$ statistic under $H_0$. From the histogram of this sample, the critical value is then computed for a given significance level. Unfortunately, the computational cost of this procedure is a major drawback.

We now deal with real data. The graph $G = (V, E)$ in Fig.3 is a small subgraph extracted from the regulatory network of B. subtilis. This graph is composed of four regulons, respectively identified by four colors. A time series $y_1, ..., y_T$ of gene expressions has been acquired on $G$ over $T = 11$ time points. Fig.4 shows the simplified gene expressions for this time series, as obtained in [6]. This treatment was done independently at each time $t$ using a procedure that does not take into account time dependence; indeed, it is based on (2.24). At each time $t$, the nodes are grouped into modules such that the expression profile of each module differs from that of its neighbors, relative to the Laplacian neighborhood structure $L$, as can be glimpsed in Fig.2-d. This procedure is an extension of Lindeberg's blob detection algorithm [25] to non-regular graphs. The modules are organized around the local extrema of the scale components, obtained as follows:

$$\operatorname*{argopt}_{j,\,k,\,v' \in N_v^k} \; \lambda_j \, (L\, y_t^{/j})_{v'} ,$$

where $N_v^k \subset V$ denotes the neighboring nodes of $v$ of order $k$ ($k = 1, ..., \kappa$): $k = 1$ means the nearest neighbors (NN), $k = 2$ means the nearest neighbors of $v$ to which their NN are added, and so on. The expression of the regulons fluctuates over time, and therefore so does the shape of the detected modules. In [6], a hypothesis was advanced which postulates that regulon expressions from time $t_I = 7$ were influenced by a change in the nutritional environment of the bacteria, which should imply that the module configurations become quite stable from this instant. Nevertheless, by examining Fig.4, we see that this hypothesis is contradicted between $t = 7$ and $t = 11$, at time $t = 8$. By repeating the same procedure as in [6] but using the model with time dependence (2.15, 2.36), the module configurations at time $t = 8$ become similar to the configurations at times $t = 7, 9, 10, 11$. This result is not so surprising, since it is well known that Markovian dependence can greatly improve the sensitivity of the detection of a weak signal, especially if it is surrounded by salient signals.
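The Monte Carlo calibration of the likelihood ratio test described in this section can be sketched as follows. This is an illustration under assumptions, not the procedure actually run for the experiments: $\Phi_1(\tau) = \rho\, e^{-\tau L}$ is a hypothetical transition matrix, $\hat\tau$ is obtained by grid search, the first field of each simulated series is simply set to the first innovation, and all names and values are made up for the example.

```python
import numpy as np

def expm_sym(matrix, coef):
    """exp(coef * matrix) for a symmetric matrix, via its eigendecomposition."""
    w, v = np.linalg.eigh(matrix)
    return (v * np.exp(coef * w)) @ v.T

def phi1(tau, laplacian, rho=0.8):
    """Hypothetical transition matrix rho * exp(-tau L)."""
    return rho * expm_sym(laplacian, -tau)

def ar1_loglik(tau, ys, laplacian, k_inv):
    """L(tau) of (2.40) for a series ys with one row per time point."""
    resid = ys[1:] - ys[:-1] @ phi1(tau, laplacian).T
    return -np.einsum("ti,ij,tj->", resid, k_inv, resid)

def likelihood_ratio(ys, laplacian, k_inv, grid, tau0):
    """LR = 2 (L(tau_hat) - L(tau0)); tau0 is added to the candidates so that LR >= 0."""
    candidates = list(grid) + [tau0]
    best = max(ar1_loglik(t, ys, laplacian, k_inv) for t in candidates)
    return 2.0 * (best - ar1_loglik(tau0, ys, laplacian, k_inv))

def simulate_ar1(tau, laplacian, k, T, rng):
    """One AR(1) series generated with innovations z_t ~ N(0, k)."""
    z = rng.multivariate_normal(np.zeros(k.shape[0]), k, size=T)
    phi = phi1(tau, laplacian)
    ys = np.zeros_like(z)
    ys[0] = z[0]
    for t in range(1, T):
        ys[t] = phi @ ys[t - 1] + z[t]
    return ys

def monte_carlo_critical_value(laplacian, k, T, grid, tau0, level=0.05, B=200, seed=0):
    """Empirical (1 - level) quantile of LR over B series simulated under H0: tau = tau0."""
    rng = np.random.default_rng(seed)
    k_inv = np.linalg.inv(k)
    lr = np.array([likelihood_ratio(simulate_ar1(tau0, laplacian, k, T, rng),
                                    laplacian, k_inv, grid, tau0) for _ in range(B)])
    return float(np.quantile(lr, 1.0 - level)), lr

# Hypothetical setting: 4-node path graph, short series (T = 11 as for the real data).
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
K = 1.0 * expm_sym(L, -0.5) + 0.5 * expm_sym(L, -2.0)
critical_value, lr_sample = monte_carlo_critical_value(
    L, K, T=11, grid=np.linspace(0.0, 2.0, 21), tau0=0.5)
```

An observed series would then lead to rejecting $H_0$ when its own $LR$ exceeds `critical_value`. The cost grows with $B$ times the cost of one estimation, which is the drawback mentioned above.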

3.2. Drift on image sequence

Fig.5 shows an SDE simulation for the drift (2.19). This drift has the effect of swapping the relative values of the variables $Y(1)$ and $Y(2)$ after the time $t_I$. These variables are respectively identified by green and red colors. Hypothesis testing carried out on [$H_0: c_I = 0$ versus $H_1: c_I \ne 0$] is used to decide on the presence of a drift. The log-likelihood in (2.40) is rewritten $L(\tau \mid \sigma, \Lambda, c_I)$, where $c_I$ recalls that $\Phi_1(\tau, t)$ depends on $c_I$ when $t > t_I$. The likelihood ratio becomes $LR = 2\,(L(\hat\tau \mid \sigma, \Lambda, c_I) - L(\hat\tau \mid \sigma, \Lambda, 0))$. This test can be generalized in order to also estimate $t_I$. For this, $LR$ is computed for each time $t_I = t_{min}, ..., t_{max}$. From the resulting series $\{LR(t_I)\}$, we select $\hat t_I = \arg\max_{t_I} LR(t_I)$, and then we decide that $\hat t_I$ is an intervention time if $LR(\hat t_I)$ is greater than a given critical value.

We now briefly present the application that motivated the introduction of the model (2.19). Fig.7 shows a time-series of $T = 20$ images acquired from a cell in suspension submitted to a dielectric field in a micro-fluidic device [29, 40, 36]. Dielectric field cage (DFC) technology has been demonstrated as a useful tool for the manipulation of living cells in suspension. DFC can be used to spatially position non-adherent cells. It is also possible to perform cell fusion. If two cells are put into contact, the application of a high intensity electric pulse can lead to membrane rupture and subsequent fusion of cells. In this context, there is a need for a quantitative analysis of membrane dynamics. On the images shown in Fig.7, the nodes of a graph $G$ describing the structure of the membranes are overlaid. The fluorescent signals acquired on the $n = 13$ nodes over time compose a spatio-temporal GRF, which is represented by an SDE model of type (2.19). The structure of this graph is organized as above around three regulons. We see in Fig.6 that the mean intensity of the red regulon tends to become predominant compared to that of the green regulon. The model (2.19) is therefore appropriate for modeling the effects of the intervention on the three regulators, these effects being diffused over the whole graph. A detailed presentation of this work will be given in a future article.
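The selection rule for the intervention time described at the beginning of this section, $\hat t_I = \arg\max_{t_I} LR(t_I)$ followed by a comparison with a critical value, reduces to a few lines once the series $\{LR(t_I)\}$ is available. In the sketch below the function name is hypothetical and the $LR$ values and critical value are illustrative numbers, not results from the paper.

```python
import numpy as np

def select_intervention_time(candidate_times, lr_values, critical_value):
    """Return (t_hat, LR(t_hat), decision) for a series {LR(t_I)}."""
    lr_values = np.asarray(lr_values, dtype=float)
    idx = int(np.argmax(lr_values))
    return candidate_times[idx], float(lr_values[idx]), bool(lr_values[idx] > critical_value)

# Illustrative numbers only: LR computed for candidate times t_I = 3, ..., 8.
times = [3, 4, 5, 6, 7, 8]
lr = [0.4, 1.1, 2.3, 9.8, 4.0, 1.2]
t_hat, lr_max, is_intervention = select_intervention_time(times, lr, critical_value=6.0)
```

Because the maximum is taken over several candidate times, the critical value would have to be calibrated for the maximum of the series rather than for a single test.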


4. Concluding remarks

At the beginning of this study, we had the multi-scale graphical model that we developed for module detection at fixed-time. Subsequently, our experiments raised the issue of modeling a temporal process whose dynamics are maintained by an intrinsic latent process governed at every instant by the previous spatial model. Here, we show that this extension follows naturally from the spatio-temporal heat differential equation, its solution being a Wold representation whose innovations consist of the latent process. This property allows us to revisit classic models in conjunction with the Wold representation and also to add a drift term. A procedure is then described for estimating both the model parameters and the multi-scale components of the process. This provides a general framework for obtaining a parametric representation of the phenomenon under consideration. In particular, this allows quantified analyses to be performed on the basis of the estimated parameters, as we have illustrated with statistical hypothesis testing.

Some remarks can be made on two main assumptions underlying the model, namely the knowledge of an invariant graph and stationarity. The choice of the graph depends on the task at hand. However, temporally rewiring networks could be needed for capturing the dynamic interactions between variables. This difficult problem is accentuated by the small size of the time series. Stationarity is also required. Nevertheless, the drift component introduces a temporary change that a stationary model without drift would interpret as a non-stationarity. In our model, $A_t$ is a time-dependent matrix. This is also the case in [32], in which a time-varying dynamic Bayesian network is proposed for modeling the varying directed dependency structures, but without drift. Furthermore, in that article the time-varying graph is seen as a non-stationarity.

Finally, a delicate point is worth noting. It concerns the efficient computation of the estimates, although the issues raised, such as the inversion of Toeplitz matrices, are old.

Acknowledgments

The Editor and referees are gratefully thanked. Their comments have improved the manuscript. The author is grateful to Alain Trouvé and Yong Yu for the experience we shared on the multi-scale decomposition of images, which was an inspiration, as well as to Benno Schwikowski and Xiaoyi Chen for valuable discussions about the adaptation of Bacillus subtilis to nutritional environments. I also warmly thank Philippa Gergaud for editing the English of this manuscript.

References

[1] Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(5):761–764, 2010.
[2] H.T. Banks, K. L. Rehm, and K. L. Sutton. Conversion of a dynamic social network stochastic differential equation model to Fokker-Planck model. Technical report, Center for Research in Scientific Computation, North Carolina State University, 2009.
[3] Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209–239, 2004.
[4] George E. P. Box, Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. John Wiley and Sons, 2008.
[5] Bernard Chalmond. Modeling and Inverse Problems in Image Analysis. Springer-Verlag, 2003.
[6] Bernard Chalmond and Xiaoyi Chen. A graphical modeling to scan network activity at modular level. Technical report, Institut Pasteur / Cergy-Pontoise University, 2012. Submitted.
[7] Bernard Chalmond, François Coldefy, Etienne Goubet, and Blandine Lavayssière. Coherent 3-d echo detection for ultrasonic imaging. IEEE Transactions on Signal Processing, 25(3):592–612, 2003.
[8] Bor-Sen Chen and Wei-Sheng Wu. Robust filtering circuit design for stochastic gene networks under intrinsic and extrinsic molecular noises. Mathematical Biosciences, 211:342–355, 2008.
[9] Noel A. Cressie. Statistics for spatial data. John Wiley and Sons, 1993.
[10] Guro Dorum, Lars Snipen, Margrete Solheim, and Solve Saebo. Smoothing gene expression data with network information improves consistency of regulated genes. Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.
[11] Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, and Peter S. Swain. Stochastic gene expression in a single cell. Science, 297:1183–1186, 2002.
[12] B.F. Finkenstadt, D.J. Woodcock, M. Komorowski, C.V. Harper, J.R.E. Davis, M.R.H. White, and D.A. Rand. Quantifying intrinsic and extrinsic noise in gene transcription: an application to single cell imaging data. Technical report, Centre for Research in Statistical Methodology, Warwick University, 2012.
[13] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432–441, 2008.
[14] Carlo Gaetan and Xavier Guyon. Spatial Statistics and Modeling. Springer-Verlag, 2009.
[15] C. A. Glasbey and D. J. Allcroft. A spatiotemporal auto-regressive moving average model for solar radiation. Journal of the Royal Statistical Society: Series C (Applied Statistics), 57(3):343–355, 2008.
[16] James D. Hamilton. Time Series Analysis. Princeton University Press, 1994.
[17] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2009.
[18] Lasse Holmstrom, Leena Pasanen, Reinhard Furrer, and Stephan R. Sain. Scale space multiresolution analysis of random signals. Computational Statistics and Data Analysis, 55:2840–2855, 2011.
[19] Dirk Husmeier. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 19:2271–2282, 2003.
[20] Harri Kiiveri and Frank de Hoog. Fitting very large sparse Gaussian graphical models. Computational Statistics and Data Analysis, 56:2626–2636, 2012.
[21] Risi Imre Kondor and John Lafferty. Diffusion kernels on graphs and other discrete input spaces. In International Conference on Machine Learning, pages 315–322. Morgan Kaufmann, 2002.
[22] Gert R. G. Lanckriet, Tijl De Bie, Nello Cristianini, Michael I. Jordan, and William Stafford Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626–2635, 2004.
[23] Ann B. Lee and Larry Wasserman. Spectral connectivity analysis. Journal of the American Statistical Association, 105, 2010.
[24] Erich Leo Lehmann and Joseph P. Romano. Testing Statistical Hypotheses. Springer, 2004.
[25] Tony Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116, 1998.
[26] Bojan Mohar. Some applications of Laplace eigenvalues of graphs. In G. Hahn and G. Sabidussi, editors, Graph Symmetry: Algebraic Methods and Applications, volume Ser. C 497, pages 225–275. Kluwer, 1997.
[27] Chris. J. Oates and Sach Mukherjee. Network inference and biological dynamics. Annals of Applied Statistics, 6(3):1209–1235, 2012.
[28] R. Kelley Pace and James P. LeSage. A sampling approach to estimate the log determinant used in spatial likelihood problems. Journal of Geographical Systems, 11:209–225, 2009.
[29] Olivier Renaud, Jose Vina, Yong Yu, Christophe Machu, Alain Trouvé, Hans Van der Voort, Bernard Chalmond, and Spencer Louis Shorte. High-resolution imaging of living cells in flow suspension using axial-tomography: 3-D imaging flow cytometry. Biotechnology Journal, 3(1):53–62, 2008.
[30] Ioannis D. Schizas and Georgios B. Giannakis. Covariance eigenvector sparsity for compression and denoising. IEEE Transactions on Signal Processing, 60(5):2408–2421, 2012.
[31] Amit Singer, Radek Erban, Ioannis G. Kevrekidis, and Ronald R. Coifman. Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. PNAS, 106(38):16090–16095, 2009.
[32] Le Song, Mladen Kolar, and Eric P. Xing. Time-varying dynamic Bayesian networks. In NIPS, pages 1732–1740, 2009.
[33] Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. A concise and provably informative multi-scale signature based on heat diffusion. In Eurographics Symposium on Geometry Processing, volume 28. Blackwell Publishing, 2009.
[34] Sorin Tanase-Nicola, Patrick B. Warren, and Pieter Rein ten Wolde. Signal detection, modularity and the correlation between extrinsic and intrinsic noise in biochemical networks. Physical Review Letters, 97(6), 2006.
[35] Kevin Thon, Havard Rue, Stein Olav Skrovseth, and Fred Godtliebsen. Bayesian multiscale analysis of images modeled as Gaussian Markov random fields. Computational Statistics and Data Analysis, 56:49–61, 2012.
[36] Guilhem Velve-Casquillas, Maël Le Berre, Matthieu Piel, and Phong T. Tran. Microfluidic tools for cell biological research. Nano Today, 5(1):28–47, 2010.
[37] S. V. Vishwanathan, Alexander J. Smola, and René Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73(1):95–119, 2007.
[38] Grace Wahba. Spline models for observational data. SIAM, 1990.
[39] Zhi Wei and Hongzhe Li. A hidden spatial-temporal Markov random field model for network-based analysis of time course gene expression data. Annals of Applied Statistics, 2:408–429, 2008.
[40] Yong Yu, Alain Trouvé, Jiaping Wang, and Bernard Chalmond. An integrated statistical approach for volume reconstruction from unregistered sequential slices. Inverse Problems, 24(4):58–74, 2008.


[Figure: network graph with gene-name node labels, and five panels titled "component at scale1" through "component at scale5"; horizontal axis ticks from 200 to 1600. Caption not recovered.]

mgsR

nusB

purR

gerM

spoVID

yabR

spoIIM safA

ykgA

yhxD

ygxB

purE

yabJ

purN

purH

spoVR

yqxA

yngF glgB

yjaV asnO

5 0 −5

yngJ

yteV nucB

yodS

yocL spoVMspoIVA

yeaA

ydjP

rodA yxbG

purF

pbuO

yybI yhbH sqhC ylbJ spoIVFBykvUglgC yhxC

yuzC

yqeZ

yqfB yozO pbpE

racX

opuE yoxB

ispF

nadE

yhdN

yraA

aag

purB

purC purL

purS pbuX glyA purM

yitD

ydcA ytxC

ydbT

yteJ

yqfA sppA xpaC

katE

ykgB

yaaI

yfkH

ytkL

nhaX purQ

purA pbuG

purDpurK

data X ydbS

yycD

yuaI yvrE

yckC

yvlB mreBH rsgI ybfQ yoaG

yvaK

ywjC

yfkD

ywzA

ykzI ywlB

ydaD csbC

yxzF

yfhE

ydaP

yfhK

era

ythQ yvlD

yfkI ywsB ydaG

yflT bmrU aldY

yhcO

yvlA

yfhF yjgB

yjgC

yoxC ydhK

ohrB

yobJ yjoB

yvlC ythP

ydaE

yugU

yocK ybyB ywmE

ywtG sodA

gspA

yjzE

yfkJ yxkO gtaB

ytaB

yitT

yuaF yaaNydjO ywrE yoaF fosB

gabD

ycdF csbD yerD ydaF

ycdG

ydbD

yppD

800

1000

spoIIE

kinC

clpY ykfB

yusD

yfhP

yfmS

dnaA

yoaH

frlR

degR

yjcP hemAT

pgdS

flhO tlpB mcpBlytD

tlpC

mcpA

ywlC

motB

cheV

yjcQ

yydA

yfmT

fliS

motA

yvyC

yjfB flhP mcpC fliD

fliT

argF

argD

carB argC argG

argJ

argB

carA

argH

(a)

(b)

gabT

gabT

gabR

cdd

ykzI

yjgC

yoxC ydhK

ywsB

ywlB

yxzF ywiE

aag

purB

folD

xpt

yxbG yxnA

csbA

purDpurK

clpE

ydaT

yceD yceG yceF yacL

yfkS

sigB

rnr

yoaA

ponA mreB ugtP

radA

disA

purR

mreD

ytpB

ykuT

trxA

clpXlonA

ydhF

purF

ctsR tuaE

fabF

plsC

phoA

tuaD

bltD

ylxX

tatAD

yfiZ

yfmE

hemX

csoR

fhuD

yfiY ykuP yhfQ ycgT yclQ yclN ykuO

tagB

tagE

yjeA

tagF

tagA

bdbC

pssA rsiX

fhuG

clpC

spo0M

ytxJ

fur

pbpI feuC

yrrS ywnJ spoIIP

exuR

yqhP

walR

yflA

katX addB rpsR rpsF comFC comGE yhjB ybdK

nasE

cydD

bofC yxxG comGG comEA

ywlF

degQ

ycsG yqzE glnH

comK

sigH

nasA

cypX rapH

kipR kipI ywlG comFA

yvrN

ycsI nasC

gltA

ureC ureB

yvzA yvcB

ureA

sigF

pbpH

yokL

gltA

yvmC

arsB yqhG yqcK

ywbD

sspH

ptkA

yxaL yydJ

yphF

hutP yflB

ydjH

dnaG yjcM

bacC bacA bacE bacD

dppD

sigA

yhjE

yvmC

arsB yqhG yqcK

ywbD

yybK

cotU

cotG yurS

sspC

yfjR

yclH

bcd

appF yfkQ

appC

salA

ycgM

hag senS

yuiB

pbpF

ilvD

fabL

yufP

sspE

guaB

cheC

yurJ

rocD rocE

ylmE

yrkQ

ykfC ycgA

ywcH dnaN

clpQ yneF codV ylmA

yfmJ yqzD accD

yerB

nfrA

ilvA yufO

ahrC

ftsX tkt

yusE med ylmH

ylmD

yqcF

yrkO

yrkN

appC

salA

mcpA

ycgM

hag senS

yuiB

pbpF

ilvD

fabL

yufP

sspE

guaB

yqzC divIVA

degR

cheC

yurJ

rocD rocE

yrrL

ylmE

pdxT yrkQ

ykfC ycgA

ywcH dnaN

clpQ yneF codV ylmA

yfmJ yqzD accD

yerB

nfrA

ilvA yufO

yrkP

pdxS

ylmG accAftsE yvhJ

comZ metQ

ykaA

metS

yvyE

yppE yuxH

ykfD

metN

ypmP frlB frlN yufN frlD yuiA

csrA

yngB

ykcC

sepF

bkdR

rocB lytF

ahrC

ftsX tkt

yusE med ylmH

ylmD

yqcF

yrkN

yrkO

yqzC divIVA

yppD spoIIE

kinC

clpY ykfB

yusD

yfhP

yfmS yoaH

frlR

yjcQ yvyC

ywlC yydA argD

yfmT

fliS

motA

yjfB flhP mcpC fliD

fliT

yclH

yycA

ykcB

bkdAAbuk

rocR

rocC

fliZ

frlM frlO

yppD

pgdS

flhO tlpB mcpBlytD

tlpC

carB argC argG

fliK

sigD

flgC cheB fliJ fliY fliL fliP cheA flgB fliM cheY flhA ylxF ycgN fliI flgE flhF cheW fliE fliR fliF cheD fliQ

ywfH yufQ

metP ykfA

flhB fliG

yisY

dnaA motB

cheV

yngA

bkdB bkdAB

ylxH fliH

glcU yraE

ptb

lpdV

appB

yoaR gdh

yraD

yfmT

fliS yjcP hemAT

cotG yurS

cotU

yfjR yclI

bcd

appF yfkQ

ykoV

spoIIE

yusD yfmS

yvyC

argF

cotH cotZ

lytA lytC

ydfS

yqfX sspD yrrD spoVAC ydfR gerKC adhB sspJ splA ykoU

yoaH yjcQ

argD

cotX

lytB

rocA

gerKA sspK

yfkR

kinC ykfB

motA

yjfB flhP mcpC fliD yydA

yybK

spo0A

yscB

appD

sspA yraF

gerKB

clpY

dnaA frlR

ywlC

tlpA

cotB

cgeE cgeC cotV ftsY cgeByxeE

sspL yraG splB

gerBA

nfo

pdxT comZ

metQ

yfhP

ybdO

codY

cwlH

cotW cgeD

yybM

lysA

spoVAD sspP

yrkP

pdxS

yhaM yozL ruvB aprX

sipV

ydiPyneB

cotM cotY

sspG

yclJ yjcN yybL yybN

gerBB ypzA

yhcV

spoVAF

ykaA

metS

yvyE

ylmG accAftsE yvhJ

ydgH

yorB

cgeA

sigL ybdN

ybaK

yndF

sspO yndE yndD ylaJ

ybxH

yrrL yppE yuxH

parE yqjX tagC yhaZ ligA ruvA yqzHyhaO

ppsA rocG ynzD

skfE

yusN

exoA sspM

gerBC

ccpB

sepF

bkdR

rocB lytF

ykfD

metN

ypmP frlB frlN yufN frlD yuiA

yycA

yngB

ykcC

rocR

rocC

fliZ

frlM frlO

csrA

ykcB

bkdAAbuk

pdaA

fliK

sigD

flgC cheB fliJ fliY fliL fliP cheA flgB fliM cheY flhA ylxF ycgN fliI flgE flhF cheW fliE fliR fliF cheD fliQ

ywfH yufQ

metP ykfA

flhB fliG

yisY

yhjE dnaE

gerE

ylbA

skfH

ylqB

yitF yngA

bkdB bkdAB

ylxH fliH

glcU

yraD yraE

ptb

lpdV

appB

yoaR gdh

ykoV

yqfX sspD yrrD spoVAC ydfR gerKC adhB sspJ splA ykoU

ydiO ynzC

xkdA levR

yvqJ

skfG

yqcG yqxJ

yxbC yxbD

spoIIGA

csgA sspI

yvdQ sleB

yclI

ydfS nfo

yneA

levG

lytR

yjdG

yocD pnbA

skfC

skfF

asnH

sinI

yxnB

yitG

spoVAA

yteA

lytC

gerKA sspK

yfkR

spoVAD sspP

levE sacC levD

ykuL

xynD

yrpD

yobB

slrR skfB

yxbB

dppE dppCkapB yxaM

dppA sigA

spoVAB

lytA

appD

sspA yraF

gerKB gerBA

dppD

bacB

bacF

ydgG

sdpA yoqM yfmI

bacC bacA bacE bacD

hutM ackA hutH hutI hutG acsA

ywbF ydeJ

ppsD

racA

kinB ymaE

scoC

yxkC

lytB

sspL yraG splB

ybxH ccpB

yphF

hutP yxaJ

yomJ yjcM sdpB

rapA phrA dppB

yvyF yvyG yuiC

yhjD

sda

uvrC pcrA dinB

yydI dnaG spo0FkinA

ykuV

flgL gabP

flgM

sspF yozQ sspB yqfU

spo0A rocA

pdaA

yolC hutU

yydJ

ydjG

epr

flgK

sigG

yhcQ ypeB

ykvR yorC

levF

ptkA

yxaL

yflB

ydjH

rapC nprE rapE

gerAC sspN spoIVB

ansZ

yitF

yhcV

spoVAF

uvrB

parC

lexA

acoBacoL acoA

epsK

yydH

yhjC

oppD

yhcN

gerAB gerAA

cotH cotZ

sspH

epsI epsC epsB

aprE spoIIAA spoIIAB

dacF

oppF

gerD

tlp yckD

cotY cwlH

cotX

deoR

licR

uvrX uvrA yerH

yydG yqxI

rok

oppA oppB oppC gpr

ycbG

ycbCycbD

spoVT pbpG

tlpA

gerBB ypzA

sspO yndE yndD ylaJ

yabT yqhH seaA sacX

yhaM yozL ruvB aprX

sipV

ydiPyneB

cotM

yybM

ybdO

codY yscB

spoVAB

exoA sspM

ydgH

yorB

cotW cotB

cgeE cgeC cotV ftsY cgeByxeE

yndF

csgA sspI

yvdQ sleB gerBC

ydiO

dnaE

parE yqjX tagC yhaZ ligA ruvA yqzHyhaO

sspG

cgeD

nupC

licA

hbs yfmG acoC

ynzC

gerE

yclJ yjcN yybL yybN

licB

yozM

acoR

ykuU

yolB

epsJ

cgeA

sigL ybdN

lysA yusN

yteA

yneA

xkdA levR

ppsA rocG ynzD

ybaK

ylqB

yitG

spoVAA sspF yozQ sspB yqfU

levE sacC levD

ykuL

ylbA

skfH

skfE spoIIGA

licH

ctaB nrdE

bdbA

bdbB

epsO

yvqJ

skfG

yqcG yqxJ

yxbC yxbD

sinI

yxnB

licC

glpF

albC

sunA

tasA

epsH

lytR

yocD pnbA

skfC

skfF

asnH dppA

bacF

yopL malP

glpT

glpP

glpK

gmuA rbsA

albD

albE albA albB sboX sboA

yolJ

yorD

leuB sipW

sinR

rapG

yobB

slrR skfB

yxbB

dppE dppCkapB yxaM

bacB

yxkC

pdp galK

glpD

gmuGgmuB gntK gmuD rbsC

albF

xylR

treA

yvmB ymaB

albG

sunT

ilvH leuD leuC

epsA

yddJ

xynB

msmX

dltD

dltC phy

recA ilvB

abrB ilvC

leuA epsG

yheJ

ydjI

licT

bglH

xynPyxiE

xylB

kdgA

cimH

dltAdltB

phrE

spoVG epsE

bdhA

epsF

epsN

ycbJ

iolB treR

malA

yclK

csn

epsL

epsM

yokK ybaJ

glpQ

gmuR

spoIIQ spoIIR

yjbA arsR

xylA

kdgK

treP sucD nrdI

gntR

gmuC yvnA rbsR amyE gmuE

phrC

yocH

yrzI

yphE

yokL

gmuF

rbsD

dltE

pbpH

ytfJ yhfM

lonB

yhfW

iolT

odhA

cydC nrdF

gntZ

skfA

epsD phrK

ctaO ywcE

yvcA

yuaB

yqzG

arsC ylbB yphA

ydgG

sdpA yoqM yfmI

kinB ymaE

scoC flgM

xynD

yrpD yjdG

racA

phrA dppB

yvyF yvyG yuiC

levG

acsA

ywbF ydeJ

ppsD

sdpB

rapA

flgL gabP

hutM ackA hutH hutI hutG

yxaJ

yomJ

spo0FkinA

ykuV

epr ansZ

sigG

yvzA yvcB

rsfA

ksgA

ypfB nasB

sacY

sda

uvrC pcrA dinB

yydI

ydjG

rapC nprE rapE

gerAC sspN spoIVB

flgK

sspC

hutU

epsK

yydH

yhjC

oppD

yhcN

gerAB gerAA

ykvR yorC

levF

yolC

epsI epsC epsB

aprE spoIIAA spoIIAB

dacF

oppF

gerD

tlp yckD

spoVT pbpG

yhcQ ypeB

ureB ureA

ylbC ytfI

yodF

yyaC

oppA oppB oppC gpr

ycbG

ureC

sigF

glnM

gltB

yveA

lexA

acoBacoL acoA

rbsK

ctaC

ctaF ctaD ctaG

srfAAcomS srfAB

spo0E

yweA

yvrO

tnrA

ycsF yycC ywdI wapA

bpr levB yhjD

uvrX uvrA yerH

parC

acoC

ycbCycbD

yvrN

ycsI nasC lipC sipT glnP ywdK

sacB

gltC

uvrB

hbs

yydG yqxI yolB

rok

ctaE

cydB

rbsB gntP

cwlS

srfAC srfAD

yokI xynC

yqaP

yvrP

glnR kipA

deoR

licR

ppsB

degU

ykzB

nupC

licA

yozM

acoR

ykuU

epsH

yabT yqhH seaA sacX

licB

rapH

kipR ywlG comFA

licH

ctaB nrdE

bdbA

bdbB

sigH yokJ

cccA qcrA

cwlD

yttP

abh

rapK

comA

cypX

odhB

bglS bglP

resD qcrC

ylaE

ppsC

degQ

comK glnA

kipI

licC

glpF

albC

sunA

tasA

epsJ

ywlF

ywrD nasA

acuC

iolD

dctP

yobO

yerI

pel

qcrB

ycsG yqzE glnH

ssbA

yopL malP

glpT

glpP

glpK

albD

albE albA albB sboX sboA

yolJ

yfmG

sinR

rapG

yolA

iolR

iolG

iolI

ccpA ycdA

ydeH

phrF

dacC

iolJ acuA yvcI ywdA

iolC iolF

yxkF

kdgR

yngC

ppsE

alsT

noc dprA sbcD yisB

comEC

gmuA rbsA

albF

xylR

kduD cotD

kduI

sdpI

ywoF

pdp galK

glpD

gmuGgmuB gntK gmuD rbsC

yddJ

epsO

ycbJ

xynB

treA

yorD

sipW

ydjI

spoIIQ spoIIR

yjbA arsR

xylB

kdgA

yvmB ymaB

araL

iolH iolE

cydA

phrG

iolS

sacA acuB

pta

araE

rapF

yncM

yydF

yttA

glnQ ywdJ

citM araB

nagA

yhaR kdgT

nasD

yxeD antE

yhcM csbX

yycB

maf comGD

nin comEB comGF

albG

sunT

ilvH leuD leuC

epsA

leuB

ybaJ

glpQ

comGG comEA

msmX

dltD

dltC phy

recA ilvB

abrB ilvC

leuA epsG

yheJ yokK

ytfJ yhfM

lonB

yhfW

bofC yxxG

licT

bglH

xynPyxiE comN

cimH

dltAdltB

phrE

spoVG epsE

bdhA

epsF

epsN yqzG

yyaC arsC ylbB yphA

xylA

iolB

sacP araA

cotC

sdpR

vpr

lytE

ftsZ

ftsA

sacT

ypiF

araN araD lcfB

etfA

cydD

citB

phrI

spoVS ywhH

citT

yflN

abfA

fadB lcfA

fadE

araQ

ycnK

guaC

pucK pucL pucM

citZ

fadN

dhbC ycxA yuzA pucJ

comGB addAglcR rsmG comFB comGC comC

fadA

icd mdh

araR

sucC

yjdB stoA

sbcC

treR

malA

yclK

csn

epsL

epsM

rsfA

ksgA

ypfB nasB

katX rpsR rpsF comFC comGE yhjB ybdK

nasE

cstA

araM araP

ctpB

nasF

dhbE yqxD besA dhbB dhbA

ssbB nucA comGA

bglS

kdgK

gmuR

yesM

etfB

ccpC

dhbF

tuaF

iolR

acuC

odhA

treP sucD nrdI

gntR

gmuC yvnA rbsR amyE gmuE

phrC

yocH

yrzI

yphE ylbC ytfI

yodF

yveA

odhB

cydC nrdF

gmuF

rbsD

dltE epsD phrK

ctaO ywcE

yvcA

yuaB

glnM

gltB

cydB

gntZ

skfA

spo0E

yweA

yvrO

tnrA

wapA

bpr levB

sacY

rbsK

ctaC

ctaF ctaD ctaG

srfAAcomS srfAB

yqaP

yvrP

glnR kipA

ycsF yycC ywdI

sacB

gltC

ctaE

srfAC srfAD

yokI xynC

yokJ

ppsB

degU

ykzB

lipC sipT glnP ywdK

rapK

comA

glnA

ywrD

ssbA

abh

ppsC

alsT

noc dprA sbcD

cccA qcrA

rbsB gntP

cwlS

yttP

galT

fadR

yflA

tuaH

addB

iolT

araL

iolJ acuA yvcI ywdA

bglP

resD cwlD

qcrB

fadM uxaC

resA resE resC

plsX

iolG

iolD

dctP

yobO qcrC

ylaE

acdA

yfkN

nsrR

ylbP

tuaG

abnA rpoE

minD

yqhB

ykoL

iolI

yerI

pel

ywoF

yisB

comEC

yolA

ppsE

fadH fadG

spoIVCB

perR

yknY

hmp

fabD

iolH

iolC

kdgR

ccpA ycdA

ydeH

phrF

dacC

yknZ

fabG

iolF

yxkF

yttA

glnQ ywdJ

fadF

spoIIIC

yoaW

yxzE

pspA

ymzB yvgO yxaB

radC

minC

pucR

sacA acuB

pta nagA

kduD cotD

kduI

sdpI

yngC

exuR

ypuD

rsiW ybfO

yvgN

iolS

citM

iolE cydA

phrG

yncM

yydF

walR

yqhQ

arfM

ahpF

hemB

sigX

araB

araE

rapF

yhcM csbX

yycB

maf comGD

nin comEB comGF

comN

yxjJ

sacP araA

cotC

yhaR kdgT

nasD

yxeD antE

ywhH

yknX yknW ybfP

pucG

sacT

ypiF

araN araD lcfB

etfA

sdpR

vpr

lytE

ftsZ

ftsA spoVS

sbcC

yrrS ywnJ spoIIP

pucI

yflN

abfA

fadB lcfA

fadE fadN

citB

phrI

guaC

pucK pucL pucM

citZ

araQ

ycnK

pucJ

comGB addAglcR rsmG comFB comGC comC

pbpI

feuA

narG

katA

hemC

fnr

phoP

ytxG yvyD ytxH feuC

fapR

ahpC

scoA

resB

yjdB stoA

dhbC ycxA yuzA

ssbB nucA comGA

fadA

icd mdh

ctpB

nasF

dhbE yqxD besA dhbB dhbA

spo0M

ytxJ

mrgA hemA

yjmC exuT yxjC mmgA mmgC

yjmD uxuA mmgB

ybfM

mcsA

fur

ycnJ

pucD pucC pucB

citT

araR

sucC

araP

ccpC

dhbF

tuaF

clpC acpA

ybbA

cstA

araM

yxjF

ctaA narK

psd

bdbD

mcsB

feuB

plsX

tuaH

bdbC

zosA

hemL

uxaA scoB

rsbV rsbW rsbX

tagA

pssA rsiX

pucA pucFpucH pucE

fadR yesM

etfB

resA resE resC

fabG

tuaG

tagF

fabHA fabHB fabI

galT

ylbP resB

ykoL

tagE

ydjM tagD yjeA

fadM uxaC

yqhB hmp

fabD

tagB ykvT cwlO

acdA

yfkN

nsrR

uxaB

iseA

rpoE

minD

sigX

pucR

hemX

fhuD

yfhC

narI

hemD rapD pbpX

spoIVCB

perR minC

bltR

narJ narH

phoR

ywjB abnA

spoIIIC

yoaW

yknY

yvgN

blt

phoD

yqhP yknZ

feuB

pucG

pucI

bltD

sigM

yxjI

fhuG

fadH fadG

yxzE

pspA

ymzB yvgO yxaB

radC

yqhQ

ykuO

ypuD

rsiW ybfO

ybbA feuA

fadF

yrhH

guaD

yfiY ykuP yhfQ ycgT yclQ yclN

ahpF

hemB

yfmC yclP yknX

yknW ybfP

yxjJ

fapR ycnJ

pucA pucFpucH pucE

csoR

katA

hemC

phoP

ytxG yvyD ytxH

fabHA fabHB fabI

ahpC

scoA

fnr

ymfH ymfF ydfK ymfD

spoIIID

yjbC

yfhA

fhuC

hemA

yjmC exuT yxjC mmgA mmgC

yjmD uxuA mmgB

ybfM

mcsA

acpA

ywjB

pucD pucC pucB

yxjF

ctaA narK

psd

bdbD

mcsB

arfM

fhuB yfmD

yclO yxeB

ykuN

narG mrgA

ylxX

ywbO

csbB pstS tuaC

tuaB

tatCD

yfmF yfiZ

yfmE

narI

zosA

hemL

uxaA scoB

rsbV rsbW rsbX

ydjM tagD

ykvT cwlO iseA

yfhC

yfmC yclP

uxaB

yndA

mta

spoIID

murBsbp metA

tatAD

pstA

hemD rapD pbpX

yqfC yokU coxA spoIIIAD

spoIIIAG

ylxW

pstBA pstC

pstBB

yusV

copA narH

phoR

yxjI

phoA

divIB

murF ytpA

tuaE

ykvI

spoIIIAH bofA spoVE spoIIIAE

cotJC yebC

divIC

bcrC

tuaD

yodT

ysnD

yfnD

ydhF

cotJB yesK

ycgR ycgQ

phoB

ctsR

ywjA

guaD

ytrH sodF

spsK

mreD

mreB

fabF

spsJcwlJ ytvI kamA

ponA ytpB

ugtP

plsC

spoVK yyaD yobW

yhaX spoIIIAB

ydaH

yqjL yceC

sigI

yfkM

yotD

yfhA

fhuC

yitC ydcC ypjB

spoVB yodP

spoIVCA murGspoVD cotJA spoIIIAA spoIIIAC spoIIIAF yesJ

sigE ypbG

ykuT

copZ

sigM

yjbC

phoD

fhuB yfmD

yclO yxeB

ykuN

sigB

rnr

yoaA

narJ

yfmF

copA

yrhH

usd yqfZ yknT

yunB

yhaL coaX

yhdK

ywtF

ywaC

clpXlonA

bltR

csbB pstS tuaC

tuaB

tatCD

yusR

yheC ydhDyqfD

cotO

ypuA

ftsH

sigW yceE

ispDyceH

ywbO

pstA pstC

pstBB

ywjA yusV

yheD

spmA yngG

yhjR

ycgF

yjcA

yceD yceG yceF yacL

ysxC

pstBA copZ yotD

yodQ yjdH

ycgG

yhdL

secDF bmr

ysnF

disA

purR

glgA

ysxE prkA

yodR

yjfA

radA

pbuO

cotE

dacB

yngE

mbl glgD

yabQ

yngI

purF

spoIIID

ypqA ispG

glgP

spmB

ytrI

yfhD

yfkS

gerM

spoVID

yabR

spoIIM

spoIVFA

ddl

bmrR

ywmF

blt

metA

divIC

bcrC

ydaT

spoVR

yqxA

yngF glgB

yjaV asnO

safA

ydjP mreC

recU

hprT

ylxP

trxA

purH

yngJ

yteV nucB

yodS

yocL spoVMspoIVA

yeaA

yaaH

clpP

purM

yybI yhbH sqhC ylbJ spoIVFBykvUglgC yhxC

yuzC

yqeZ

yqfB

ysdB

yfkT

yhdF

cypC

yitD

ydcA ytxC

ydbT

yteJ

yozO pbpE

tilS

yocB

clpE

purL

purS purN

ymfF ydfK ymfD

yoaG

yfhM

gsiB yflH

yxiS

pbuX glyA

ymfH

mta

spoIID spoIIIAG

murBsbp

ytpA

yfkM

purDpurK

spoIIIAH bofA spoVE spoIIIAE

ylxW

yckC

yvlB mreBH rsgI ybfQ

yqfA

racX

yfhL

rsbRD

ydeC ctc

yhxD

mgsR

yhcO yobJ yjoB

yvlA

ythQ yvlD

rodA yxnA

csbA

ykgA

ygxB

purE

nusB

cotJC

divIB

murF

ysxC pbuO

cotJB

yebC

folD

xpt

yabJ

purC

spoIIIAD

yesK

ycgR ycgQ

phoB

purA pbuG yndA

coxA

aag

purB

yqfC yokU

ydaH

yqjL yceC

sigI

ywmF purN

purH

ykvI

spoIVCA murGspoVD cotJA spoIIIAA spoIIIAC spoIIIAF yesJ

sigE ypbG

ypuB ydaS

yoxB

yxbG

nadE

yhdN

yraA

ydbS

opuE yaaI ispF

ytkL

nhaX purQ

spsK

yhaL coaX

yhdK

ywtF

ywaC

yhaX spoIIIAB

yhdL

ypuA

ftsH

sigW yceE

ispDyceH

ywlB

ydaD csbC

yxzF ywiE

yfhE yodT

ysnD

sodF yfnD

secDF bmr

ysnF

yfhD ylxP

clpP

purS pbuX glyA purM

ytrH kamA

cotO

sppA xpaC

katE

ykgB

yfkH

aldY

ytvI

yjcA

bmrR hprT

yuaF yaaNydjO ywrE yoaF fosB yvlC ythP

yycD

yuaI

ywsB

ohrB

spsJcwlJ

yodP

yjfA

ddl

tilS

yocB

cypC

mreC

recU

ysdB

yfkT

yhdF yxiS

purL

yaaH

yfhM

gsiB

spoVB

yngI

yfhL

rsbRD

yflH

yvrE

ydaG

yflT bmrU

spoVK yyaD yobW

yfhF

yvaK

ywjC

yfkD

ywzA

ykzI

yjgC

yoxC ydhK

yitC

yunB

era

yfkI

ydcC ypjB

ycgF

ytrI

ycbP

ybyB ywmE

ywtG sodA

gspA

yqfZ yknT

yheC ydhDyqfD

yabQ

spoIVFA

ydeC ctc

yhxD

mgsR

nusB

safA

yeaA

ykgA

ygxB

purE

yabJ

purC

usd

ydaP

yfhK

yocK

yusR

ydaE yjgB

dps

yjgD

yheD

spmA yngG

ydjP

rodA

nhaX

purA pbuG

yodQ yjdH yhjR

yjzE

yxkO gtaB ytaB

yitT glgA

ysxE prkA ycgG

yugU

ydaF

ycdG

ydbD

cotE

dacB

yngE

mbl glgD yodR

yoxB

ispF

ypuB ydaS

racX

ypqA ispG

glgP

spmB

yfkJ

yerD gerM

spoVID

yabR

spoIIM

cdd

csbD

spoVR

yqxA

yngF glgB

yjaV asnO

gabD

ycdF yngJ

yteV nucB

yodS

yocL spoVMspoIVA

ytkL

nadE

yhdN

yybI yhbH sqhC ylbJ spoIVFBykvUglgC yhxC

yuzC

yqeZ

yqfB

opuE

ydaD csbC

yitD

ytxC

ydbT

yteJ

yozO pbpE

yaaI

yfkH ydaG

yflT bmrU aldY

yraA

yoaG yqfA sppA xpaC

katE

ykgB

ydcA

mreBH rsgI ybfQ

ydbS

yycD

yuaI yvrE

yckC

yvlB ythQ yvlD

yvaK

ywjC

yfkD

ywzA

yfkI

ohrB yfhE

ydaP

yfhK

era

yhcO

yvlA

yfhF yjgB

ycbP

yobJ yjoB

yvlC ythP

ydaE

yugU

dps

yjgD

purQ

gabR

yjzE

yfkJ yxkO gtaB

ytaB

yitT yocK ybyB ywmE

ywtG

yuaF yaaNydjO ywrE yoaF fosB

gabD

ycdF csbD yerD ydaF

ycdG

ydbD

sodA

gspA

yjcP hemAT mcpA

motB

cheV degR

pgdS

flhO tlpB mcpBlytD

tlpC fliT

argF carB argC

argJ

carA

argG

argB argH

(c)

argJ

carA

argB argH

(d)

Figure 1. Multi-scale decomposition, at a fixed time, of the gene expression of Bacillus subtilis, as presented in [6]. Node size and color are related to node degree (thanks to the Cytoscape viewer [www.cytoscape.org]). (a) Original data. (b) Multi-scale decomposition profiles. (c) Scale component for λ = 2. (d) Scale component for λ = 16.
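To give a rough idea of what a component at diffusion scale λ looks like, the sketch below smooths a signal on a small graph with the heat kernel exp(−λL), where L is the combinatorial graph Laplacian. It is only an illustration under stated assumptions: the path graph, the signal and the helper names `graph_laplacian` and `heat_kernel` are hypothetical, and the decomposition actually used in [6] may differ.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def heat_kernel(laplacian, scale):
    """Heat diffusion kernel exp(-scale * L), via the eigendecomposition of symmetric L."""
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # V diag(exp(-scale * eigvals)) V^T
    return (eigvecs * np.exp(-scale * eigvals)) @ eigvecs.T

# Toy path graph on 6 nodes (hypothetical, not the B. subtilis network).
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = graph_laplacian(A)
x = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 5.0])  # signal observed on the nodes

smooth_fine = heat_kernel(L, 2.0) @ x     # lambda = 2: mostly local smoothing
smooth_coarse = heat_kernel(L, 16.0) @ x  # lambda = 16: close to the global mean
```

On a connected graph, a larger λ flattens the signal further toward its mean while preserving its total, which matches the intuition that larger scales capture coarser structure.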

Spatial Statistics

http://dx.doi.org/10.1016/j.spasta.2013.11.004

Preprint


[Figure 2 graphic: panels (a) to (d); graph plots with numbered nodes and labels M1 to M5, and a profile plot titled "GRF time series, T=4".]

Figure 2. Simulated data. (a) Graph G (colors identify regulons). (b) The regulators of the regulons and their connections. (c) An observation y_t of the graphical random field. (d) Profiles of the first four observations y_1, ..., y_4 of a time series (the nodes are arranged in an arbitrary order that respects the grouping around the regulons, as shown by the color segments at the bottom of the figure, which locate the regulons).


[Figure 3 graphic: graph with numbered nodes; four nodes are annotated with gene names: 176 (degU), 199 (rok), 37 (comK) and 36 (codY).]

Figure 3. A small graph extracted from the graph of Fig. 1. It is organized around four regulons.


[Figure 4 graphic: five graph panels with numbered nodes: (a) t=7, (b) t=8, (c) t=9, (d) t=10, (e) t=11.]

Figure 4. Module detection on a time series of length T = 11. As in [6], this treatment was done without taking time dependence into account. In this case, the detection at time t = 8 differs from that at times t = 7, 9, 10, 11. By contrast, with the spatio-temporal model this difference disappears.


[Figure 5 graphic: time-series plot; vertical axis ticks from −20 to 15, horizontal axis ticks from 0 to 50.]

Figure 5. Drift simulation. The drift has the effect of swapping the relative positions of the variables Y^(1) (green) and Y^(2) (red) after the intervention time t_I = 25.
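The swapping effect described in this caption can be reproduced with a very small toy simulation. The sketch below is not the paper's vector autoregressive model with drift: it assumes two independent AR(1) fluctuations around a piecewise-constant drift level that changes after t_I = 25, and the coefficient `phi`, the noise level `sigma` and the drift levels are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T, t_I = 50, 25   # series length and intervention time (t_I = 25 as in Figure 5)
phi = 0.5         # hypothetical autoregressive coefficient
sigma = 0.5       # hypothetical innovation standard deviation

# Hypothetical piecewise-constant drift: the two target levels swap after t_I.
level_before = np.array([5.0, -5.0])   # levels of Y^(1), Y^(2) for t <= t_I
level_after = np.array([-5.0, 5.0])    # levels after the intervention

y = np.zeros((T + 1, 2))
y[0] = level_before
for t in range(1, T + 1):
    mu = level_before if t <= t_I else level_after
    # AR(1) fluctuation around the drift level mu, plus a Gaussian innovation
    y[t] = mu + phi * (y[t - 1] - mu) + sigma * rng.standard_normal(2)
```

Plotting the two columns of `y` against t shows the first series above the second before the intervention and below it a few steps after, the transition speed being governed by `phi`.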

[Figure 6 graphic: time-series plot; vertical axis ticks from 40 to 180, horizontal axis ticks from 0 to 20.]

Figure 6. Time series of the mean value of the three regulons computed on the graphs G in Fig. 7.
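For readers who want to reproduce this kind of summary, the sketch below computes one mean-value time series per regulon from a matrix of node observations. Everything in it is assumed for illustration: the data are random placeholders, and the node count and the `regulon` label vector are hypothetical; only the number of frames (20) and of regulons (3) follow the figures.

```python
import numpy as np

rng = np.random.default_rng(1)

T, n_nodes = 20, 9   # 20 frames as in Figure 7; the node count is hypothetical
regulon = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # hypothetical regulon label per node

# Placeholder observations: row t holds the values on all nodes at time t.
y = rng.normal(loc=100.0, scale=10.0, size=(T, n_nodes))

# One mean-value time series per regulon, stacked into shape (T, 3).
regulon_means = np.stack(
    [y[:, regulon == k].mean(axis=1) for k in range(3)], axis=1
)
```

Each column of `regulon_means` is the curve for one regulon.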


[Figure 7 graphic: image frames numbered 1 to 20.]

Figure 7. Time series of images, and overlaid graph G describing the structure of the membranes. This graph is organized around three regulons (red, blue and green).
