Institute of Physics and Engineering in Medicine Phys. Med. Biol. 62 (2017) 7011–7035

Physics in Medicine & Biology https://doi.org/10.1088/1361-6560/aa7e5a

Ultrasonic computed tomography based on full-waveform inversion for bone quantitative imaging

Simon Bernard, Vadim Monteiller, Dimitri Komatitsch and Philippe Lasaygues

Aix Marseille University, CNRS, Centrale Marseille, LMA, Marseille, France

E-mail: [email protected]

Received 31 March 2017, revised 30 June 2017
Accepted for publication 7 July 2017
Published 9 August 2017

Abstract

We introduce an ultrasonic quantitative imaging method for long bones based on full-waveform inversion. The cost function is defined as the L2-norm difference between observed data and synthetic results at a given iteration of the iterative inversion process. For simplicity, and to reduce the computational cost, we use a two-dimensional acoustic approximation. The inverse problem is solved iteratively with a quasi-Newton technique called the Limited-memory Broyden–Fletcher–Goldfarb–Shanno method. We show that the technique performs well for benchmark models consisting of a single cylinder and then five cylinders, the latter case including significant multiple diffraction effects. We then show images obtained for a tibia-fibula bone pair model. Convergence is fast, typically 15 to 30 iterations in practice for each frequency band used. We discuss the so-called ‘cycle skipping’ effect that can occur in such full-waveform inversion techniques and cause them to remain trapped in a local minimum of the cost function. We illustrate strategies that can be used in practice to avoid this. Future work should include viscoelastic rather than acoustic materials, and real instead of synthetic data.

Keywords: ultrasound, acoustics, imaging, full-waveform inversion, bone

(Some figures may appear in colour only in the online journal)

1361-6560/17/177011+25$33.00 © 2017 Institute of Physics and Engineering in Medicine Printed in the UK

1. Introduction

Bone strength is related to bone mass, geometry, architecture, and composition (Hernandez and Keaveny 2006). In particular, recent studies have shown that the thickness and porosity of the cortical bone layer play a major role in bone resistance to fracture at various sites
such as the hip, radius and vertebrae (Holzer et al 2009, Zebaze et al 2010, Roux et al 2010, Bala et al 2014). Bone diseases such as osteoporosis affect bone remodeling, which results in increased cortical bone porosity and reduced cortical thickness, and therefore in an increased risk of fracture. Osteoporotic fractures cause substantial suffering to the patient, have a high mortality rate, and are an increasing source of burden in aging societies (Ström et al 2011). Bone mineral density (BMD) assessed by dual-energy x-ray absorptiometry (DXA) is the most commonly used predictor of bone fracture risk (Ström et al 2011). DXA-measured BMD correlates with fracture risk, but has a limited predictive power (Schuit et al 2004). This is because DXA measures the quantity of bone mineral but ignores other relevant parameters related to bone geometry and mechanical properties. Higher-resolution methods such as high-resolution peripheral quantitative computed tomography (HR-pQCT) and high-resolution magnetic resonance imaging (HR-MRI) can provide detailed images of bone geometry and architecture (Donnelly 2011) but are of limited clinical use because of high costs, irradiation, or long acquisition times. Quantitative ultrasound (QUS) techniques have been developed as an alternative to x-ray methods for bone quality assessment (Laugier 2008, Laugier and Haïat 2011). Ultrasound can indeed provide information on bone mechanical parameters through the measurement of the speed of sound and/or of attenuation, as well as geometrical information. It is non-ionizing and low-cost, two considerable advantages over x-rays. While QUS methods originally focused on trabecular bone assessment, they have recently been extended to cortical bone measurements (Raum et al 2014). 
Most notably, model-based approaches relying on guided ultrasound wave propagation in the cortical shell of long bones have yielded promising results, for instance for cortical thickness measurements at the radius (Vallet et al 2016). Ultrasonic computed tomography (USCT) has been proposed as a tool to provide quantitative images of the speed of sound in the cross-section of long bones with millimetric resolution (Lasaygues et al 2010), allowing for accurate assessment of cortical thickness. The main difficulty of bone USCT is the large impedance contrast between hard bones and the surrounding soft tissues. In that context, ray-based and first-order Born or Rytov approximation methods commonly used in USCT of soft tissues (Li et al 2009, Lavarello and Hesford 2013) do not provide quantitative images, except in the case of specific but time-consuming algorithms such as Compound USCT (Lasaygues et al 2005). Iterative approaches based on high-order approximations have been proposed to address the nonlinear inverse problem for high-contrast targets (Lu et al 1996, Guillermin et al 2013). Lasaygues et al (2010) applied one such iterative method, called distorted Born diffraction tomography (DBDT), to bone-mimicking phantoms and obtained fairly accurate estimates of their geometry, with a relative error on their size of about 5%, as well as of their wave speed, with a relative error of about 10%. Recently, a Born-based inversion method was proposed to image the internal structure of long bones from reflection data acquired in an axial configuration (Zheng et al 2015), but no quantitative assessment of the wave speed was obtained. In a different field, geophysicists have long been interested in computing parametric images of the speed of waves inside the Earth from seismic data recorded at the surface, and a whole variety of tomographic approaches have been developed towards that goal (for an overview see e.g. Aki and Richards (1980) and Liu and Gu (2012)). 
Full waveform inversion (FWI) is an imaging method for heterogeneous media that is based on numerical modeling of wave propagation and minimization of the difference between the full recorded and simulated waveforms. It was mainly developed in the oil industry and in seismology to obtain maps of the speed of seismic waves. Such methods, based on the iterative fitting of complete time series (called seismograms) through optimization of the propagation medium parameters, were designed as early as the 1970s and 1980s (Bamberger et al 1977, 1982, Tarantola 1984) but until
recently were too expensive to be of practical interest. With the dramatic increase in computing power they are now gaining popularity; for recent reviews see e.g. Tromp et al (2008), Virieux and Operto (2009), Fichtner (2010) and Liu and Gu (2012). They offer higher resolution and accuracy than methods that use only part of the recorded information (travel time, i.e. time of flight) or that are based on approximations (Born approximation, ray tracing), at the cost of a higher computational time. FWI also leads to a very large and ill-posed inverse problem, particularly when high-frequency components are present in the data (Bunks et al 1995, Sirgue and Pratt 2004, Virieux and Operto 2009). Recent studies however indicate that it is possible to regularize and solve this problem for realistic data sets (Prieux et al 2013, Monteiller et al 2015, Wang et al 2016) using an efficient quasi-Newton method instead of a simpler steepest descent algorithm, and a gradual increase of the frequency content of the inverted data to avoid remaining trapped in a local minimum of the cost function (Bunks et al 1995, Sirgue and Pratt 2004). In recent years, FWI methods have also started to be applied in medical ultrasound imaging (Bachmann 2016), particularly in the context of breast tomography (see e.g. Sandhu et al (2015), Wang et al (2015) and Pérez-Liva et al (2017) and references therein) as well as shear wave elastography (Arnal et al 2013). In these contexts FWI is used to improve the image quality offered by time-of-flight methods, in terms of resolution and contrast. Applications in ultrasonic non-destructive testing also exist (e.g. Rao et al (2016)). We hypothesize that the progress accomplished towards high-frequency FWI in seismic imaging could benefit bone imaging as well. Our objective here is therefore to tackle the bone USCT inverse problem in the framework of FWI. 
We will first present the theoretical aspects of the problem. We will then illustrate the method based on synthetic data sets of increasing complexity, with or without noise, including tomographic imaging of a tibia-fibula bone pair using a circular transducer array, which is a challenging but realistic situation involving multiple scattering between the two bones. We will show that an accurate map of the longitudinal wave speed can be obtained in a reasonable number of iterations of the inverse problem.

2. Full waveform inversion method

2.1. Ultrasound tomography of long bones

In the present study, we consider a circular array of transducers surrounding the bones to be imaged, at a peripheral site such as the forearm or the leg. The imaging plane (the plane of the array) is orthogonal to the long axis of the bones (i.e. we perform cross-sectional imaging) and we therefore consider a 2D problem. This is a classical hypothesis for ultrasound tomography of bone (Lasaygues et al 2010) and breast (Duric et al 2007), which in practice relies on the use of cylindrically-focused transducers to limit out-of-plane diffraction effects. For simplicity and illustration purposes in this first study, and also to reduce the computational cost, we furthermore consider all media as acoustic and isotropic, including bones and surrounding tissues. This implies that we neglect attenuation and shear wave propagation inside the bones, keeping in mind that in reality shear waves can be generated by mode conversion at the interface between hard bones and fluid-like surrounding tissues. Regarding attenuation, it has recently been shown (figures 12 and 13 of Wang et al (2015)) that when total variation regularization is used its effect on the inversion results is weak. The isotropy hypothesis is well justified by the fact that the imaging plane is orthogonal to the bone axis. Indeed, cortical bone is generally considered as a transversely-isotropic medium with a principal axis of symmetry
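oriented along the longitudinal axis of the diaphysis. This assumption has been confirmed in the literature by measurements of compression and shear wave velocities in various directions at the femur (Ashman et al 1984, Lasaygues and Pithioux 2002, Bernard et al 2013) and tibia (Rho 1996). Since a transversely-isotropic medium is isotropic in any plane orthogonal to its symmetry axis, wave propagation in the cross-sectional imaging plane can be treated as isotropic.

2.2. Forward and adjoint wave propagation problems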

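The acoustic pressure wave field p(x, t) is given by the following equation of state, which is the first-order elastodynamics system for a fluid medium:

$$
\begin{aligned}
\rho(\mathbf{x})\,\partial_t v_x(\mathbf{x},t) &= \partial_x p(\mathbf{x},t),\\
\rho(\mathbf{x})\,\partial_t v_y(\mathbf{x},t) &= \partial_y p(\mathbf{x},t),\\
\frac{1}{\kappa(\mathbf{x})}\,\partial_t p(\mathbf{x},t) &= \partial_x v_x(\mathbf{x},t) + \partial_y v_y(\mathbf{x},t) + f_s(t),
\end{aligned}
\tag{1}
$$

where vx and vy are the components of the particle velocity vector, fs(t) is a point source term, and κ(x) and ρ(x) are the spatially-varying bulk modulus and mass density, respectively. In the remainder of the paper, the spatial dependence of κ and ρ is implicit. In more compact form the previous system can be written

$$
C(\rho,\kappa)\,\partial_t\mathbf{u} = A\,\partial_x\mathbf{u} + B\,\partial_y\mathbf{u} + \mathbf{F}_s, \tag{2}
$$

where $\mathbf{u} = (v_x, v_y, p)^T$, $\mathbf{F}_s = (0, 0, f_s)^T$,

$$
C(\rho,\kappa) = \begin{pmatrix} \rho & 0 & 0\\ 0 & \rho & 0\\ 0 & 0 & 1/\kappa \end{pmatrix},\qquad
A = \begin{pmatrix} 0 & 0 & 1\\ 0 & 0 & 0\\ 1 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}. \tag{3}
$$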

We consider that the medium is initially at rest, i.e. u(x, t = 0) = 0. The circular array contains S sources and R receiving elements (which may in practice be the same transducers). All emitters are successively excited, while the acoustic pressure $p^{\rm obs}(t)$ is recorded at all receiving positions. A complete acquisition is therefore composed of S × R time series. Full waveform inversion means that one considers these time series (possibly filtered) as the basic observables that one wants to fit, instead of extracting only a few measurements from each time series (e.g. the time of flight of the first arrival). One thus searches for the model that minimizes the sum of squared differences between observed and synthetic time series of acoustic pressure:

$$
\chi(\mathbf{m}) = \frac{1}{2}\sum_{s=1}^{S}\sum_{r=1}^{R}\int_0^T \big\| p_{r,s}(t;\mathbf{m}) - p^{\rm obs}_{r,s}(t) \big\|^2\,dt. \tag{4}
$$

This functional quantifies the L2 difference between the observed waveforms $p^{\rm obs}_{r,s}(t)$ at receivers $\mathbf{x}_r$, r = 1, ..., R, produced by sources at $\mathbf{x}_s$, s = 1, ..., S, and the corresponding synthetic time series $p_{r,s}(t;\mathbf{m})$ computed in model m = (ρ, κ). The goal is therefore to find a model of the medium that can explain the recorded signals over a time period T, which will include through-transmitted waves as well as (multiply) reflected and scattered waves. The L2 norm is the natural choice for noise-free synthetic data, and it yields the maximum-likelihood estimate in the presence of independent, identically distributed additive Gaussian noise.
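A discrete version of (4), using a rectangle rule over the time samples (as in (15) further on), takes only a few lines of NumPy. The sketch below is ours: the function name `waveform_misfit`, the (S, R, L) array layout and the random test data are illustrative assumptions. It also returns the residuals p − p^obs, which are the signals injected at the receivers as sources of the adjoint problem.

```python
import numpy as np

def waveform_misfit(p_syn, p_obs, dt):
    """Discrete L2 waveform misfit, a rectangle-rule version of equation (4).

    p_syn, p_obs : arrays of shape (S, R, L), i.e. sources x receivers x time samples.
    Returns the misfit value and the residuals p_syn - p_obs (the adjoint sources).
    """
    if p_syn.shape != p_obs.shape:
        raise ValueError("synthetic and observed data must have the same shape")
    residual = p_syn - p_obs
    chi = 0.5 * np.sum(residual ** 2) * dt
    return chi, residual

# Illustrative data only: 8 sources, 128 receivers, 2000 time samples.
rng = np.random.default_rng(0)
p_obs = rng.standard_normal((8, 128, 2000))
p_syn = p_obs + 0.1                      # constant offset of 0.1 on every sample
chi, residual = waveform_misfit(p_syn, p_obs, dt=1.65e-7)
print(f"misfit = {chi:.4e}")
```

With a constant offset the misfit reduces to 0.5 × S × R × L × 0.1² × Δt, which gives a quick sanity check of the normalization.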


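In the vicinity of m, the misfit function (4) can be expanded into a Taylor series:

$$
\chi(\mathbf{m}+\delta\mathbf{m}) \approx \chi(\mathbf{m}) + \mathbf{g}(\mathbf{m})\cdot\delta\mathbf{m} + \frac{1}{2}\,\delta\mathbf{m}^T\cdot\mathbf{H}(\mathbf{m})\cdot\delta\mathbf{m}, \tag{5}
$$

where g(m) = ∂χ(m)/∂m is the gradient of the waveform misfit function and H(m) = ∂²χ(m)/∂m² is its Hessian. Setting the derivative of (5) with respect to the model perturbation δm to zero, g + H · δm = 0, shows that the minimum of this quadratic approximation is reached for δm = −H⁻¹ · g. The local minimum of (4) is thus approached by perturbing the model in the direction opposite to the gradient, preconditioned by the inverse Hessian. A direct method to compute the gradient is to take the derivative of (4) with respect to the model parameters:

$$
\frac{\partial\chi(\mathbf{m})}{\partial\mathbf{m}} = \sum_{s=1}^{S}\sum_{r=1}^{R}\int_0^T \frac{\partial p_{r,s}(t;\mathbf{m})}{\partial\mathbf{m}}\,\big(p_{r,s}(t;\mathbf{m}) - p^{\rm obs}_{r,s}(t)\big)\,dt. \tag{6}
$$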

This equation can be reformulated as the matrix-vector product g = −J* · δd, where J* is the adjoint of the Jacobian matrix J of the forward problem, which contains the Fréchet derivatives of the data with respect to the model parameters, and δd is the vector of data residuals $p^{\rm obs}_{r,s} - p_{r,s}$. Determining J would require computing the Fréchet derivatives for each time step in the time window considered and for all the source-receiver pairs, which is prohibitive on current computers. Fortunately, this gradient can be obtained without computing the Jacobian matrix explicitly. The idea is to resort to the adjoint state, which corresponds to the wave field generated by the data residuals injected at the receivers and propagated backward in time; this idea was introduced in nonlinear optimization by Chavent (1974) and Bamberger et al (1977). The Lagrangian functional is the cost function augmented by the following constraint given by the equation of state (Plessix 2006):

$$
\mathcal{L}(\mathbf{u},\mathbf{u}_a,\rho,\kappa) = \sum_{s=1}^{S}\Big\{ \frac{1}{2}\sum_{r=1}^{R}\int_0^T \big(p_{r,s}(t) - p^{\rm obs}_{r,s}(t)\big)^2\,dt + \int_0^T\!\!\int_V \mathbf{u}_a(\mathbf{x},t)\cdot\big[C(\rho,\kappa)\,\partial_t\mathbf{u} - A\,\partial_x\mathbf{u} - B\,\partial_y\mathbf{u} - \mathbf{F}_s\big]\,d\mathbf{x}\,dt \Big\}. \tag{7}
$$

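In this expression, $\mathbf{u}_a$ is the Lagrange multiplier, which is found by zeroing the partial derivative of the Lagrangian functional (7) with respect to the wave field u:

$$
\nabla_{\mathbf{u}}\mathcal{L}\cdot\delta\mathbf{u} = \sum_{s=1}^{S}\Big\{ \sum_{r=1}^{R}\int_0^T \big(p_{r,s}(t) - p^{\rm obs}_{r,s}(t)\big)\,\delta p_{r,s}(t)\,dt + \int_0^T\!\!\int_V \mathbf{u}_a(\mathbf{x},t)\cdot\big[C(\rho,\kappa)\,\partial_t\delta\mathbf{u} - A\,\partial_x\delta\mathbf{u} - B\,\partial_y\delta\mathbf{u}\big]\,d\mathbf{x}\,dt \Big\} = 0. \tag{8}
$$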

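Using an integration by parts for both the spatial and temporal variables (C, A and B being symmetric, and the wave fields vanishing on the boundary of V), one gets

$$
\nabla_{\mathbf{u}}\mathcal{L}\cdot\delta\mathbf{u} = \sum_{s=1}^{S}\Big\{ \sum_{r=1}^{R}\int_0^T \big(p_{r,s}(t) - p^{\rm obs}_{r,s}(t)\big)\,\delta p_{r,s}(t)\,dt - \int_0^T\!\!\int_V \delta\mathbf{u}\cdot\big[C(\rho,\kappa)\,\partial_t\mathbf{u}_a - A\,\partial_x\mathbf{u}_a - B\,\partial_y\mathbf{u}_a\big]\,d\mathbf{x}\,dt + \Big[\int_V \mathbf{u}_a(\mathbf{x},t)\cdot C(\rho,\kappa)\,\delta\mathbf{u}\,d\mathbf{x}\Big]_0^T \Big\} = 0. \tag{9}
$$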


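Since equation (9) must hold for any perturbation δu of a wave field that satisfies the equation of state, and since δu(x, 0) = 0 (the medium is initially at rest), the boundary term vanishes if $\mathbf{u}_a(\mathbf{x},T) = 0$. For any source s one then gets the adjoint-state equation for the adjoint wave field $\mathbf{u}_a$, in which the residuals act as sources located at the receivers $\mathbf{x}_r$, and which is solved backward in time from the final condition:

$$
\begin{aligned}
C(\rho,\kappa)\,\partial_t\mathbf{u}_a &= A\,\partial_x\mathbf{u}_a + B\,\partial_y\mathbf{u}_a + \sum_{r=1}^{R}\big(0,\,0,\,p_{r,s}(t) - p^{\rm obs}_{r,s}(t)\big)^T\,\delta(\mathbf{x}-\mathbf{x}_r),\\
\mathbf{u}_a(\mathbf{x},T) &= 0.
\end{aligned}
\tag{10}
$$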

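The gradient of the misfit function is then given by the partial derivative of the Lagrangian (7) with respect to the model parameters, evaluated for the forward wave field u and the adjoint wave field $\mathbf{u}_a$:

$$
\nabla_{(\rho,\kappa)}\chi\cdot(\delta\rho,\delta\kappa) = \nabla_{(\rho,\kappa)}\mathcal{L}\cdot(\delta\rho,\delta\kappa) = \sum_{s=1}^{S}\int_0^T\!\!\int_V \mathbf{u}_a(\mathbf{x},t)\cdot\big[\partial_{(\rho,\kappa)}C(\rho,\kappa)\cdot(\delta\rho,\delta\kappa)\big]\,\partial_t\mathbf{u}(\mathbf{x},t)\,d\mathbf{x}\,dt, \tag{11}
$$

which gives the expression of the gradient kernels for ρ and κ in terms of the direct field u and the adjoint field $\mathbf{u}_a$. Since ∂C/∂ρ = diag(1, 1, 0) and ∂C/∂κ = diag(0, 0, −1/κ²),

$$
\begin{aligned}
\nabla_\rho\chi(\rho,\kappa) &= \sum_{s=1}^{S}\int_0^T \big[v^a_x(\mathbf{x},t)\,\partial_t v_x(\mathbf{x},t) + v^a_y(\mathbf{x},t)\,\partial_t v_y(\mathbf{x},t)\big]\,dt,\\
\nabla_\kappa\chi(\rho,\kappa) &= -\sum_{s=1}^{S}\int_0^T \frac{p^a(\mathbf{x},t)\,\partial_t p(\mathbf{x},t)}{\kappa(\mathbf{x})^2}\,dt.
\end{aligned}
\tag{12}
$$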

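Using the chain rule with κ = ρ vp², these gradients can be expressed in the (ρ, vp) parametrization that we will use in this work:

$$
\begin{aligned}
\nabla_\rho\chi(\rho,v_p) &= \nabla_\kappa\chi(\rho,\kappa)\,v_p^2 + \nabla_\rho\chi(\rho,\kappa),\\
\nabla_{v_p}\chi(\rho,v_p) &= 2\,\nabla_\kappa\chi(\rho,\kappa)\,\rho\,v_p.
\end{aligned}
\tag{13}
$$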

As seen from the above equations, the principle of the adjoint-state method is thus to correlate two wave fields: the forward field that propagates from the sources to the receivers, and the adjoint field that propagates from the receivers backwards in time. Computing the gradient of the misfit function therefore requires performing only two simulations of the wave propagation problem per source.
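The correlation of the two wave fields in equation (12), followed by the chain rule of equation (13), can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions: the forward and adjoint fields of one source are stored at the same time samples as arrays of shape (nt, nx, ny) (in practice they are often recomputed or checkpointed rather than stored), time derivatives use `np.gradient`, and the time integral uses a rectangle rule. The function names and dictionary layout are hypothetical. The minus sign in the κ kernel comes from ∂(1/κ)/∂κ = −1/κ².

```python
import numpy as np

def gradient_kernels(fwd, adj, kappa, dt):
    """Per-source gradient kernels of equation (12).

    fwd, adj : dicts with keys 'vx', 'vy', 'p', arrays of shape (nt, nx, ny)
               sampled at the same times (adjoint field already back-propagated).
    kappa    : bulk modulus, shape (nx, ny).
    Sum the returned kernels over all sources to obtain the full gradient.
    """
    dvx = np.gradient(fwd["vx"], dt, axis=0)   # d/dt of forward vx
    dvy = np.gradient(fwd["vy"], dt, axis=0)   # d/dt of forward vy
    dp = np.gradient(fwd["p"], dt, axis=0)     # d/dt of forward p
    grad_rho = np.sum(adj["vx"] * dvx + adj["vy"] * dvy, axis=0) * dt
    grad_kappa = -np.sum(adj["p"] * dp, axis=0) * dt / kappa ** 2
    return grad_rho, grad_kappa

def to_rho_vp(grad_rho, grad_kappa, rho, vp):
    """Chain rule of equation (13), using kappa = rho * vp**2."""
    g_rho = grad_kappa * vp ** 2 + grad_rho
    g_vp = 2.0 * grad_kappa * rho * vp
    return g_rho, g_vp
```

Keeping the kernels in (ρ, κ) and converting at the end means the wave simulations never need to know which parametrization the optimizer uses.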

2.3. Discrete approximation of the forward and adjoint problems

For their numerical implementation, the above equations need to be discretized both in time and space. We resort to a velocity-stress finite-difference approximation of the wave equation, with explicit conditionally-stable time stepping on a staggered 2D Cartesian grid (Levander 1988). Other numerical approximations could also be used, for instance the spectral-element method (Cristini and Komatitsch 2012). The time and space coordinates are discretized as t = lΔt or t = (l ± 1/2)Δt, x = mh or x = (m ± 1/2)h, and y = nh or y = (n ± 1/2)h, with Δt and h the time and space steps, respectively, and l = 0, ..., L − 1, m = 0, ..., M − 1 and n = 0, ..., N − 1 the time and space indices. A discrete version of the elastodynamics system (1), with fourth-order accuracy in space and second-order accuracy in time, is then (Levander 1988):


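$$
\begin{aligned}
\rho(m,n)\,D^2_t v_x\big(m,n,l-\tfrac12\big) &= \frac{\Delta t}{h}\,D^4_x\,p\big(m+\tfrac12,n,l\big),\\
\rho\big(m+\tfrac12,n+\tfrac12\big)\,D^2_t v_y\big(m+\tfrac12,n+\tfrac12,l-\tfrac12\big) &= \frac{\Delta t}{h}\,D^4_y\,p\big(m+\tfrac12,n,l\big),\\
\frac{1}{\kappa\big(m+\tfrac12,n\big)}\,D^2_t\,p\big(m+\tfrac12,n,l\big) &= \frac{\Delta t}{h}\Big[D^4_x v_x\big(m,n,l+\tfrac12\big) + D^4_y v_y\big(m,n,l+\tfrac12\big)\Big] + f(m,n,l),
\end{aligned}
\tag{14}
$$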

where $D^2_t$, $D^4_x$ and $D^4_y$ are discrete derivative operators, as defined in Levander (1988). The isotropic spatial grid step h satisfies h ≈ λmin/10, where λmin is the smallest (expected) wavelength of the problem, obtained for the highest frequency fmax at which the source has significant energy. It is worth mentioning that a grid step h ≈ λmin/5 would be sufficient with regard to the accuracy of the numerical scheme, but λmin/10 leads to smoother and less pixelated images. The time step Δt is set by the Courant–Friedrichs–Lewy (CFL) stability condition, i.e. Δt = CFL × min(h/vp), where the minimum is taken over the model (so that Δt is controlled by the largest wave speed), with CFL ≈ 0.495. Perfectly matched layers (PML) (Komatitsch and Martin 2007) are placed at the boundaries of the domain to absorb outgoing waves. Acoustic sources and receivers can be placed at locations that do not correspond to grid points if needed (Hicks 2002). In this discretized framework the cost function involves a sum over the time steps instead of a time integral:

$$
\chi(\mathbf{m}) = \frac{1}{2}\sum_{s=1}^{S}\sum_{r=1}^{R}\sum_{l=0}^{L-1} \big\| p_{r,s}(l\Delta t;\mathbf{m}) - p^{\rm obs}_{r,s}(l\Delta t) \big\|^2\,\Delta t, \tag{15}
$$

where the vector m contains the values of the velocity and mass density at the spatial grid points.

2.4. Minimization algorithm

Given the ability to compute the gradient g(m), a straightforward way to minimize the misfit function (4) is the steepest descent method: at iteration k, the current model mk is perturbed in the direction opposite to the gradient, i.e. mk+1 = mk − α g(mk), where α is a step length. This method is however known to converge very slowly, and is therefore not recommended for large problems in which each evaluation of the misfit function and its gradient is expensive. The Newton method, on the other hand, which uses the inverse Hessian to compute the search direction, i.e. mk+1 = mk − H⁻¹ g(mk), converges faster but requires computing, storing and inverting the Hessian, which is prohibitive for models with many parameters. We therefore resort to a quasi-Newton technique called the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) method, with a line search satisfying the so-called Wolfe conditions (Wolfe 1969, Fletcher 1987, Nocedal and Wright 2006). This method iteratively builds an approximation of the inverse Hessian from the model and gradient differences of a limited number of previous iterations, without explicitly storing or inverting H(m). It has recently been shown to converge far faster than steepest descent for FWI (Monteiller et al 2015). The reader is referred to Nocedal and Wright (2006) for a detailed presentation of the L-BFGS method.

2.5. Regularization

The presence of noise is usually handled through regularization of the cost function. Tikhonov (e.g. Monteiller et al (2015) and references therein) or total variation (Epanomeritakis et al 2008, Li et al 2009, Zhang et al 2012, Castellanos Lopez 2014, Wang et al 2015, Matthews
et al 2017) regularization terms can be added to the functional. Tikhonov regularization is more suited for media with smooth gradients, while total variation is more suited for media with sharp contrasts or interfaces, as it promotes sparsity of the image gradient (i.e. it favors piecewise-constant solutions). A regularization term TV(m) is added to the cost function (15) with a weight λTV, i.e.

$$
\chi_{\rm TV}(\mathbf{m}) = \frac{1}{2}\sum_{s=1}^{S}\sum_{r=1}^{R}\sum_{l=0}^{L-1} \big\| p_{r,s}(l\Delta t;\mathbf{m}) - p^{\rm obs}_{r,s}(l\Delta t) \big\|^2\,\Delta t + \lambda_{\rm TV}\,{\rm TV}(\mathbf{m}). \tag{16}
$$

The TV regularization term is defined as

$$
{\rm TV}(\mathbf{m}) = \sum_{i=2}^{M-1}\sum_{j=2}^{N-1}\left\{\sqrt{\frac{1}{h^2}\Big[\big([v_p]_{i+1,j}-[v_p]_{i,j}\big)^2 + \big([v_p]_{i,j+1}-[v_p]_{i,j}\big)^2\Big] + \varepsilon_{\rm TV}} + \sqrt{\frac{1}{h^2}\Big[\big([\rho]_{i+1,j}-[\rho]_{i,j}\big)^2 + \big([\rho]_{i,j+1}-[\rho]_{i,j}\big)^2\Big] + \varepsilon_{\rm TV}}\right\}, \tag{17}
$$

where the wave velocity vp and mass density ρ are organized here as 2D matrices following the discretization grid, and where εTV is a small constant added to avoid divisions by zero when computing the gradient of the regularization term in regions where the model is locally constant (Zhang et al 2012). Its exact value has little influence on the reconstructed images as long as it is small; we used εTV = 10⁻⁸. We selected the weight λTV of the regularization term after testing several trial values (see section 3.4).

3. Numerical examples

All the examples presented below use a 20 cm × 20 cm physical domain. The S sources and R receivers are placed on an 18 cm diameter circular array with even angular sampling. The background velocity and mass density are always set to 1500 m·s⁻¹ and 1000 kg·m⁻³, respectively. Wave propagation is computed over a time interval equal to twice the diagonal of the domain divided by the background velocity. The source waveform is a second derivative of a Gaussian (a so-called ‘Ricker’ wavelet) with dominant frequency fc and time delay t0:

$$
R(t) = \big(1 - 2\pi^2 f_c^2 (t-t_0)^2\big)\,e^{-\pi^2 f_c^2 (t-t_0)^2}. \tag{18}
$$

This function and its amplitude spectrum are represented in figure 1 for fc = 200 kHz. The amplitude spectrum becomes small above fmax ≈ 2.5 fc. This frequency fmax, and the associated wavelength in the water background, are thus used to compute the grid step h.

3.1. Simple case: a single homogeneous cylinder

In this first numerical example we consider a 4 cm diameter disk with a compressional wave speed of 1800 m·s⁻¹ and a density of 1000 kg·m⁻³, located at the center of the antenna. We use 8 point sources and 128 receivers to produce synthetic pressure time series. The source time function is a Ricker wavelet (equation (18)) with a dominant frequency of 100 kHz. The grid step is h = 0.6 mm and the 20 × 20 cm grid is therefore discretized with 377 × 377 nodes, including 22 nodes in the absorbing layers on each side of the domain. From the CFL condition, the time step Δt is 0.16 µs and the solution is computed for about 400 µs (2578 time steps).
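The grid-design rules of sections 2.3 and 3 (fmax ≈ 2.5 fc, h ≈ λmin/10, Δt = CFL × min(h/vp) with CFL = 0.495) can be turned into a small helper and checked against the numbers above. This is a sketch: the function name and defaults are ours, and we assume (our interpretation, not stated in the text) that the propagation time is computed from the diagonal of the full grid, PML nodes included.

```python
import math

def fd_grid_parameters(fc, c_background, c_max, domain=0.20, n_pml=22,
                       points_per_wavelength=10, cfl=0.495):
    """Grid step, time step and grid size following sections 2.3 and 3 (sketch)."""
    f_max = 2.5 * fc                               # Ricker spectrum is small above ~2.5 fc
    lambda_min = c_background / f_max              # shortest wavelength, in the water background
    h = lambda_min / points_per_wavelength         # h ~ lambda_min / 10
    dt = cfl * h / c_max                           # dt = CFL * min(h / vp)
    n_nodes = math.floor(domain / h) + 2 * n_pml   # physical nodes plus PML nodes on both sides
    # Assumption (ours): "twice the diagonal" is measured on the full grid, PML included.
    diagonal = math.sqrt(2.0) * (n_nodes - 1) * h
    t_total = 2.0 * diagonal / c_background
    n_steps = int(t_total / dt)
    return h, dt, n_nodes, t_total, n_steps

h, dt, n_nodes, t_total, n_steps = fd_grid_parameters(fc=100e3, c_background=1500.0, c_max=1800.0)
print(f"h = {h*1e3:.2f} mm, dt = {dt*1e6:.3f} us, grid {n_nodes} x {n_nodes}, "
      f"T = {t_total*1e6:.0f} us, {n_steps} time steps")
```

With the parameters of this section the helper gives h = 0.6 mm, Δt ≈ 0.165 µs, a 377 × 377 grid and 2578 time steps over roughly 425 µs, which is consistent with the values quoted above under our assumption on how the propagation time was measured.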


Figure 1. Ricker source time function (second derivative of a Gaussian) (a) given by equation  (18) and corresponding amplitude spectrum (b), for a dominant frequency fc = 200 kHz.
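The statement that the Ricker spectrum becomes small above fmax ≈ 2.5 fc can be checked numerically by sampling equation (18) and taking its FFT. This is a minimal sketch: the sampling interval, record length and the delay t0 = 1.5/fc are our own choices, not values from the study.

```python
import numpy as np

def ricker(t, fc, t0):
    """Ricker wavelet of equation (18)."""
    arg = (np.pi * fc * (t - t0)) ** 2           # pi^2 * fc^2 * (t - t0)^2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

fc = 200e3                       # dominant frequency used in figure 1 (Hz)
dt = 1e-8                        # sampling interval (s), our choice
n = 4000                         # 40 us record, our choice
t = np.arange(n) * dt
w = ricker(t, fc, t0=1.5 / fc)   # delay so that the wavelet starts close to zero

spectrum = np.abs(np.fft.rfft(w))
freqs = np.fft.rfftfreq(n, dt)
f_peak = freqs[np.argmax(spectrum)]
ratio_at_fmax = np.interp(2.5 * fc, freqs, spectrum) / spectrum.max()
print(f"peak at {f_peak/1e3:.0f} kHz, |R(2.5 fc)| / max = {ratio_at_fmax:.3f}")
```

The amplitude spectrum of a Ricker wavelet is proportional to (f/fc)² exp(−f²/fc²), so it peaks at fc and, at 2.5 fc, has dropped to 6.25 e^−5.25 ≈ 3% of its peak (about −30 dB), which is what the FFT reproduces.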

We use this simple case to test the effects of the acquisition geometry, of the frequency content of the wave, and of the number of sources and receivers of the antenna on the performance of FWI, as described in the next three paragraphs. For now, the synthetic data are free of noise, and regularization based on total variation is thus turned off.

3.1.1. Effect of the acquisition geometry. Figure 2 shows the different acquisition setups that we use to test the sensitivity of the FWI solution to the acquisition geometry. We compare the results obtained using (1) the full acquisition (i.e. 8 × 128 signals), (2) transmitted signals only, using receivers located within a 150°–210° angle from each source, (3) reflection data only (−20° to +20°) and (4) refraction data only (70°–110°). The results are presented in figure 3. The reconstructed velocity map strongly depends on the acquisition geometry. With the full acquisition the model is almost perfectly reconstructed (Panel (a)). With only large scattering angles (Panel (b), transmission geometry) a cylinder is obtained, but the velocity is underestimated and the edges are smooth. Conversely, with small scattering angles (Panel (c), reflection geometry) only sharp variations in velocity are obtained at the position of the interfaces. The refraction geometry, in Panel (d), provides results intermediate between reflection and transmission. This experiment illustrates the well-known formula in seismic FWI that links the observation angles to the spatial frequency content of the reconstructed image (Virieux and Operto 2009):

$$
|\mathbf{k}| = \frac{4\pi f}{c}\cos\frac{\theta}{2}, \tag{19}
$$

where k is the wavenumber vector of the reconstructed model component, f is the frequency of the incident wave, c is the wave speed of the background medium, and θ is the scattering angle. To retrieve all the spatial wavelengths of the velocity model it is thus necessary to sample the medium with the widest possible range of angles. With transmission data (angles close to 180°), only the long-wavelength (slow) spatial variations of the model can be retrieved, and we observe a smoothed-out version of the cylinder. Conversely, with reflection data (angles close to 0°), only the sharp variations in velocity, corresponding to high-wavenumber content, are retrieved. These results illustrate the importance of sampling the wave field all around the object with a circular array: it would not be possible to reconstruct an accurate and high-resolution image with transmission or reflection data only.
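Equation (19) can be evaluated for the three partial acquisition sectors of figure 2 to see which model wavenumbers each one probes. This sketch assumes a scatterer at the center of the array, where the source-receiver angle on the ring coincides with the scattering angle θ (0° for reflection, 180° for transmission); the function name is ours, and the 100 kHz frequency and 1500 m·s⁻¹ water speed are those of section 3.1.

```python
import numpy as np

def resolved_wavenumber(f, c, theta_deg):
    """Magnitude of the model wavenumber probed at frequency f and scattering angle theta, eq. (19)."""
    theta = np.deg2rad(theta_deg)
    return np.abs(4.0 * np.pi * f / c * np.cos(theta / 2.0))

f, c = 100e3, 1500.0
sectors = {"reflection": (-20, 20), "refraction": (70, 110), "transmission": (150, 210)}
for name, (a_min, a_max) in sectors.items():
    k = resolved_wavenumber(f, c, np.linspace(a_min, a_max, 41))
    print(f"{name:12s}: |k| from {k.min():6.1f} to {k.max():6.1f} rad/m")
```

Reflection angles only reach wavenumbers close to the maximum 4πf/c (spatial wavelength c/(2f) = 7.5 mm at 100 kHz), whereas transmission angles span wavenumbers down to zero, which matches the sharp-edges versus smooth-cylinder behaviour of figure 3.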


Figure 2. The four configurations used for data acquisition. In all cases we consider 8 point sources and 128 receivers evenly spaced on a circle of radius 9 cm. (a) Full acquisition: each receiver records waves from each source. (b) Transmission acquisition: only the receivers located in front of each source (angles 150°–210°), highlighted in blue for a source at the bottom, are used. (c) Reflection acquisition: only the receivers located on the same side as each source record pressure (−20° to +20°). (d) Refraction acquisition: only the receivers located on the side of each source (70°–110°) record pressure.

3.1.2. Cycle skipping. The other well-known issue in seismic FWI is the so-called ‘cycle skipping’ effect. Due to the strong nonlinearity of the cost function, and since one uses a local (gradient-based) descent algorithm, the choice of the initial model is crucial to avoid being trapped in a local minimum of the functional. Most notably, the waveform computed for the initial model must not have a phase shift larger than half a dominant period compared to the data waveforms, as illustrated e.g. in Virieux and Operto (2009). In our previous example, considering a homogeneous water medium as the initial guess, the direct arrival delay between the actual velocity model and the homogeneous model is about 4 µs for a receiver situated in front of the source. Since such a delay exceeds half a period for all frequencies above 1/(2 × 4 µs) = 125 kHz, all energy beyond 125 kHz is affected by cycle skipping. A heuristic solution to this problem is to perform an inversion with low-pass filtered signals, to ensure that no cycle skipping occurs, and to then


Figure 3. FWI results for the four acquisition setups described in figure 2. The disk is correctly reconstructed only for the full acquisition setup (a). The transmission setup (b) under-estimates the wave velocities inside the cylinder and leads to smooth edges. The reflection (c) and refraction (d) setups only retrieve the edges of the disk. All models except the one from the full setup exhibit significant artefacts.

use the solution obtained as the initial guess for a new inversion with a slightly increased frequency content (Bunks et al 1995, Pratt et al 1998, Sirgue and Pratt 2004, Virieux and Operto 2009). Step by step, all the available frequency content of the wave is included in the inversion. This effect can be seen in figure 4, which displays the result of FWI for different starting cutoff frequencies. When the algorithm has converged in the initial (lower) frequency range, the cutoff frequency is increased by steps of 100 kHz, until the maximum frequency of 500 kHz is reached. When using all the frequencies simultaneously, a lot of energy is present beyond 125 kHz and the inversion is trapped in a local minimum of the cost function, as seen in Panel (a). The same phenomenon occurs when starting with cutoff frequencies of 200 and 150 kHz (Panels (b) and (c), respectively). Only when we begin with a low cutoff frequency of 100 kHz are we able to converge towards the right model (Panel (d)). This might be an important limitation of FWI in


Figure 4. FWI results for different filtering strategies. Result (a) was obtained by inverting directly the unfiltered waveforms. Results (b)–(d) were obtained by starting the inversion with low-pass filtered waveforms with cutoff frequencies of 200, 150 and 100 kHz, respectively. FWI failed in cases (a)–(c) because of cycle skipping, but is successful in case (d) as predicted by the theory.

practice, as frequencies low enough to avoid cycle skipping may not be available in the experimental signals with a sufficient signal-to-noise ratio. Potential solutions to this issue include the use of wide-band transducers, or multiple transducers with different dominant frequencies, or initialization of the problem with a ray-based time-of-flight inversion (Wang et al 2015).

3.1.3. Effect of the number of sources and receivers. Now that we have exhibited a simple case that works reasonably well with 8 sources and 128 receivers, it is interesting to analyze more precisely how a suitable experimental setup for FWI should be defined in terms of the number of transducers. Such issues have been addressed for instance in Tarantola (1986), and more recently in Simonetti et al (2007). As shown in Simonetti et al (2007) (their equation (32)), by studying the grating lobes of a circular antenna, one can obtain a simple relationship between the number of receivers R and the radius r0 of the central region of the antenna that is free of significant grating lobe effects, λ being the wavelength of the incident wave:


$$R = \frac{4\pi r_0}{\lambda}. \qquad (20)$$

Figure 5. Gradient of the waveform misfit function (equation (4)) in the case of a simple point diffractor (positive velocity perturbation) located at the center of the antenna and a 100 kHz incident Ricker wave, for a single source and an increasing number of receivers, from 8 to 128. Blue and red colors correspond to negative and positive gradient values, respectively. As predicted by the theory, grating lobes are present outside the black circle, whose radius is given by equation (20). For 128 receivers, the area free of grating lobes is almost as large as the antenna.
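Equation (20) can also be read the other way round: for a given number of evenly spaced receivers R, the central disc free of significant grating lobes has radius r0 = Rλ/(4π), which is the quantity the black circle in figure 5 represents. The short sketch below evaluates the criterion in both directions. It assumes a water-like background sound speed of 1500 m/s, so that λ = c/fc = 15 mm at the 100 kHz dominant frequency; the function names are hypothetical helpers for illustration, not part of the authors' code.

```python
import math

# Assumption: water-like background sound speed (not specified for this test case).
C_BACKGROUND = 1500.0  # m/s


def wavelength(frequency_hz, c=C_BACKGROUND):
    """Wavelength lambda = c / f, in metres."""
    return c / frequency_hz


def receivers_needed(r0, wavelength_m):
    """Smallest integer R with R >= 4*pi*r0/lambda (equation (20)): enough evenly
    spaced receivers to keep a central disc of radius r0 free of grating lobes."""
    return math.ceil(4.0 * math.pi * r0 / wavelength_m)


def grating_lobe_free_radius(n_receivers, wavelength_m):
    """Equation (20) solved for r0: radius of the central disc that R evenly
    spaced receivers keep free of significant grating lobes."""
    return n_receivers * wavelength_m / (4.0 * math.pi)


if __name__ == "__main__":
    lam = wavelength(100e3)  # 100 kHz dominant frequency of the Ricker wave
    for n in (8, 16, 32, 64, 128):
        r0_mm = 1e3 * grating_lobe_free_radius(n, lam)
        print(f"R = {n:3d} receivers -> artefact-free radius ~ {r0_mm:6.1f} mm")
```

Because λ = c/f, the protected radius shrinks as the frequency increases: under the same assumed sound speed, 128 receivers keep a disc of only about 31 mm radius free of grating lobes at 500 kHz, compared with about 153 mm at 100 kHz.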

Having grating lobes implies that the antenna sends energy not only to the focal point but also into these lobes. When computing a gradient for FWI based on the adjoint method, this means that the technique back-projects residuals to the wrong locations. Let us thus perform a numerical study in which we compute the gradient of the waveform misfit function for a simple point diffractor in the middle of the antenna, first increasing the number of receivers, and then the number of sources. For simplicity we only compute the gradient at the first iteration of the iterative FWI algorithm, i.e. the gradient computed in the homogeneous starting medium. Figure 5 shows the gradient for a single source and an increasing number of receivers. The black circle represents the area given by equation (20) for the dominant frequency of the incident wave. One can see that significant artefacts are indeed present outside the circle. It is thus important to honor the criterion given by equation (20) when selecting the number of receivers (or else to set the gradient to zero outside the circle, since it is contaminated by artefacts there). Figure 6 shows the gradient for an increasing number of sources, using a sufficiently large number of receivers (equal to 128 here). Some artefacts are also present when the number of sources is small, but they are far less critical than in figure 5. This seems to indicate that the most important factor is to have a sufficiently large number of receivers (evenly distributed in this case), and that in such a case it is not so critical to use a large number of sources. This is an interesting property for FWI performed in the time domain, because each source requires an independent numerical simulation and thus adds to the total numerical cost, while recording the field at any number of points (i.e. receivers) in the numerical mesh is (almost) free. Note that the analysis leading to equation (20) performed in Simonetti et al (2007), as well as the numerical examples illustrated above, considered point sources. In actual experiments, transducers are likely to have some degree of focusing that will further reduce the grating lobes of the antenna.

Figure 6. Same as figure 5 but when varying the number of sources for a fixed and sufficiently large number of receivers equal to 128. Grating lobes are also present, but are far less pronounced than in figure 5, which indicates that FWI can be successful with fewer sources than receivers in this configuration.

3.2. Multiple diffractors (five cylinders of different sizes)

In this section we consider a case with multiple (five) cylindrical scatterers with low velocity contrasts (≤10%). The array is composed of 16 sources and 256 receivers. The dominant frequency of the incident wave is fc = 100 kHz. The spatial and temporal grid steps are the same as in the previous example. Due to the low velocity contrast and the small size of the cylinders, no cycle skipping occurs for the frequencies contained in the incident wave, and the inversion can thus be performed directly on unfiltered waveforms. We use this example to illustrate the benefit of the L-BFGS quasi-Newton algorithm compared to a simple gradient descent approach. The convergence of the two methods is illustrated in figure 7. The value of the cost function for the initial model is about 0.2 and is the same for both methods. After 100 iterations the L-BFGS algorithm has converged, as it can no


Figure 7. Comparison of gradient descent and L-BFGS performance for the example with five cylindrical scatterers. Panel (a) shows the true velocity map, with low velocity contrast (1500 m · s−1 < vp