Multiple Position Room Response Equalization in Frequency Domain


IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 20, NO. 1, JANUARY 2012

Multiple Position Room Response Equalization in Frequency Domain

Alberto Carini, Senior Member, IEEE, Stefania Cecchi, Member, IEEE, Francesco Piazza, Member, IEEE, Ivan Omiciuolo, and Giovanni L. Sicuranza, Life Senior Member, IEEE

Abstract—This paper deals with methods for multiple position room response equalization. Unlike a well-known technique that works in the time domain and is based on fuzzy c-means clustering, the proposed approach performs most of the operations in the frequency domain; in particular, the fuzzy c-means clustering is applied to the room magnitude responses at different positions. It is shown that working in the frequency domain yields equalization performance at least similar to that of the time domain approach with a strongly reduced computational complexity. In addition, different techniques that can replace the fuzzy c-means clustering algorithm in the derivation of the prototype room response equalizer, with a further reduction of the number of operations, are discussed. Finally, the results of three sets of experiments are used to illustrate the performance, the robustness, and the quality of the proposed room response equalization method using alternative prototype design strategies applied to different environments.

Index Terms—Clustering, fuzzy c-means, magnitude equalization, pattern recognition, room response equalization.

I. INTRODUCTION

Room response equalization (or room equalization) has been studied in theory and applied in practice for improving the objective and subjective quality of sound reproduction systems in cinema theaters, home theaters, and car Hi-Fi systems [1]. In a room response equalization system, the room transfer function (RTF) characterizing the path from the sound reproduction system to the listener is compensated with a suitably designed equalizer. Both minimum-phase and mixed-phase room equalizers have been proposed in the literature [2]. Minimum-phase room equalizers acting on the minimum-phase part

Manuscript received September 15, 2010; revised December 13, 2010; accepted May 10, 2011. Date of publication June 02, 2011; date of current version November 09, 2011. Parts of this paper have been published in the Proc. IWAENC 2008 (I. Omiciuolo, A. Carini, and G. L. Sicuranza, “Multiple position room response equalization with frequency domain fuzzy C-Means prototype design,” Seattle, WA, Sep. 14-17, 2008), and in Proc. 6th ISPA 2009 (A. Carini, I. Omiciuolo, G. L. Sicuranza, “Multiple Position Room Response Equalization: Frequency Domain Prototype Design Strategies,” Sep. 16-18, 2009, Salzburg, Austria). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vesa Välimäki. A. Carini is with DiSBeF, University of Urbino “Carlo Bo,” 61029 Urbino, Italy (e-mail: [email protected]). S. Cecchi and F. Piazza are with A3LAB, DIBET, Università Politecnica delle Marche, 60100 Ancona, Italy. I. Omiciuolo was with DEEI, University of Trieste, 34127 Trieste, Italy. He is now with Prase Engineering s.r.l., 30020 Noventa di Piave (VE), Italy. G. L. Sicuranza is with the University of Trieste, 34127 Trieste, Italy. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TASL.2011.2158420

of the RTF phase response can be used in order to shape the RTF magnitude response. In contrast, in mixed-phase room equalizers the non-minimum-phase part of the RTF phase response can be corrected too. In principle, mixed-phase room equalizers can remove some of the room reverberation [3], but particular care must be taken to avoid “pre-echoes” caused by errors in the non-causal part of the equalizer. Modal equalizers have also been proposed recently [4], [5]: the behavior of acoustic resonances at low frequencies (below 200 Hz) is controlled in order to reduce the decay time of the associated modes [4].

Room equalizers can be divided into two categories: single position and multiple position. In single position room equalizers, the equalization filter is designed on the basis of a measurement of the room impulse response in a single location [6]. These equalizers can achieve room equalization only in a reduced zone around the measurement point, whose size is a fraction of the acoustic wavelength. However, the room impulse response varies significantly with the position in the room [7] and also with time [3]. Indeed, the room should be considered as a “weakly nonstationary” system. For this reason, the use of complex spectral smoothing and short equalization filters was proposed in [3] to counteract the audible distortions caused by equalization errors due to variations of the impulse responses.

In multiple position room equalizers, the equalization filter is designed on the basis of measurements of the impulse response at different locations within the enclosure. These equalizers are able to suitably enlarge the equalized zone. Different techniques for multiple position room equalization have been proposed in the literature [1], [8]–[13]. A least-square approach for inverting mixed-phase room responses was presented by Mourjopoulos, Clarkson, and Hammond [8].
Miyoshi and Kaneda have proposed an exact multiple position equalization technique based on MINT, the multiple-input/multiple-output inverse theorem [9]. Using this approach, exact equalization can be achieved when the number of sound sources (loudspeakers) is larger than the number of equalization points, provided that the room responses do not have common zeros. An adaptive multi-channel equalizer was proposed by Elliott and Nelson in [10], where the sum of the squared errors between the equalized responses and a delayed version of the signal is adaptively minimized. A multiple-point equalization filter using the common acoustical poles of RTFs was proposed by Haneda, Makino, and Kaneda [11]. A room response equalization system based on a -means with a splitting clustering algorithm applied to all-pole RTF measures was presented by Mourjopoulos in [12]. More recently, wave domain adaptive filters [13] for the equalization of massive multichannel sound reproduction systems have been investigated and are presently a topic of active research.

1558-7916/$31.00 © 2011 IEEE

Fig. 1. Multiple position room response equalization. (a) The time domain method of [1] and (b) the proposed frequency domain method.

This paper deals with the multiple position room response equalization technique based on fuzzy c-means clustering and frequency warping introduced by Bharitkar and Kyriakakis in [1] and [14]. Even though the technique of [1] allows only the magnitude equalization of the room response, it is effective and robust against displacement effects. In the approach of [1], given a set of room impulse responses measured at different positions, a fuzzy c-means algorithm is applied for clustering the room responses on the basis of their similarity. Then, a prototype impulse response, derived by combining the cluster centroids, is used for designing a low-order equalization filter by means of linear predictive coding (LPC) analysis. Frequency warping [15], [16] of the measured room responses with a psychoacoustically motivated Bark scale [17] is also applied in [1] in order to obtain a better fit of the LPC model to the room response in the low frequency region. The method of [1] was further developed in [18] with the introduction of a weighted fuzzy c-means clustering algorithm, which allows the use of different weights for the room impulse response samples.

In this paper, the technique of [1] is first built upon by performing most of its operations in the frequency domain. Unlike in [1], the fuzzy c-means clustering is applied to the room magnitude responses at different positions. Working in the frequency domain avoids many onerous operations performed in the approach of [1], and thus a computationally efficient algorithm for room equalization is obtained. Second, it is shown that, in the derivation of a prototype response from the measured room responses, the fuzzy c-means clustering algorithm can be effectively replaced by different, simpler techniques.
These techniques naturally arise from the frequency domain implementation of the room response equalization system and offer a much lower computational complexity than the fuzzy c-means clustering. As far as the equalization performance is concerned, the systems equipped with the proposed prototype design techniques were able to give results at least comparable with those of the original technique in [1].

The paper is organized as follows. Section II provides an overview of the multiple position room response equalization of [1]. Section III describes the proposed approach where most of the operations are performed in the frequency domain. Section IV discusses different techniques for replacing the fuzzy c-means clustering. Section V presents the results of three sets of experiments in order to illustrate the performance,

the robustness and the perceived subjective quality of the proposed room response equalization. Finally, Section VI contains some concluding remarks. Throughout the paper, small boldface letters are used to denote vectors and italic letters are used for scalar quantities.

II. ROOM RESPONSE EQUALIZATION USING PATTERN RECOGNITION

Fig. 1(a) describes the steps of the room equalization technique given in [1]. In the following, a detailed description of the operations performed in each block of Fig. 1(a) is provided.

1) M room impulse responses (IRs) of a given memory length are measured at different positions in the zone we want to equalize.

2) The room responses are frequency warped with an approximate Bark scale. Frequency warping was introduced in [15] and it involves the replacement of every delay element with an all-pass filter of order 1:

$$z^{-1} \rightarrow D(z) = \frac{z^{-1} - \lambda}{1 - \lambda z^{-1}} \qquad (1)$$

Frequency warping is used in order to improve low-order modeling at low frequencies. Positive values of the warping parameter, $\lambda$, produce higher resolution at low frequencies while producing lower resolution at higher frequencies. By properly choosing the parameter, the frequency resolution can approximate the Bark scale [17]. It should be noted that the frequency warping operation performed in the time domain is computationally heavy in terms of the number of multiplications and additions it requires.

3) The minimum-phase part of the warped IRs is extracted. In [19], Bharitkar and Kyriakakis suggested computing the minimum-phase sequences from the cepstrum. The operation requires the computation of two fast Fourier transforms (FFTs) and two inverse FFTs (IFFTs), as well as logarithms and exponentiations, for each of the IRs. The minimum-phase extraction is important for the clustering operation, in order to avoid undesirable time and frequency domain effects due to incoherent linear combination [1].

4) Fuzzy c-means clustering is applied to the minimum-phase IRs. Clustering is exploited to extract the common patterns of the IRs using centroids. The algorithm operates on the M warped minimum-phase IRs, on the c cluster centroids, and on the cluster membership functions that relate each IR to each cluster. Fuzzy c-means clustering is performed by iteratively applying the two update equations (2) and (3), as in [1]. Equations (2) and (3) are iterated until the variation of the objective function (4), which should be minimized, is lower than a small constant. The cluster membership functions are typically initialized with random numbers between 0 and 1, normalized so that the memberships of each IR sum to one, and the number of clusters c is a small integer (c = 3 in the complexity example of Table II). A final prototype filter is then constructed as a weighted mean of the cluster centroids, as given by (5). The prototype filter represents the room response component at the different positions that should be equalized.

5) The autocorrelation function (ACF) of the IR prototype is computed: it is needed by the Levinson–Durbin algorithm used in LPC analysis. The ACF can be computed in the time domain using time averages, or it can be obtained from the inverse Fourier transform of the squared magnitude response. This second approach is often the most computationally efficient, and it requires the computation of an FFT for estimating the frequency response of the prototype filter and of an IFFT for estimating the ACF.

6) The Levinson–Durbin algorithm is applied to derive an all-pole LPC model of the prototype filter, with a low order. The use of a low-order equalizer is beneficial not only from the computational complexity point of view, but also for improving the equalizer robustness with respect to displacement effects and slow time variations in the room response [3].

7) The all-pole LPC model is inverted in order to obtain an FIR equalizer in the warped domain.

8) The frequency response of the equalizer is unwarped. This operation can be performed by replacing the FIR warped equalizer with the IIR filter of Fig. 2, which is built from the coefficients of the warped prototype filter and uses the same parameter $\lambda$ employed for warping the IRs [20].

Fig. 2. IIR unwarped filter.

III. PROPOSED FREQUENCY DOMAIN TECHNIQUE

Fig. 1(b) describes the steps of the proposed room equalization technique. In contrast to the approach of [1], most of the operations, including the fuzzy c-means clustering, are performed in the frequency domain. In this way, most of the computational burden of the technique in [1] is avoided. The operations performed by the proposed equalizer are the following:

1) M room IRs of a given memory length are measured at different positions in the zone to be equalized.

2) The room magnitude responses are estimated by means of FFTs. By discarding the phase information, the extraction of the minimum-phase component of the IRs, which was needed for implementing the fuzzy c-means algorithm in the time domain, is avoided.

3) Warping is performed in the frequency domain by sampling the room magnitude responses. Given a set of equally spaced frequencies in the warped frequency domain, the corresponding frequencies in the linear frequency domain are determined according to the rule in (1). By sampling the room magnitude responses at these frequencies, the warped magnitude responses are obtained. A fractional octave smoothing of the magnitude responses can optionally be applied before the frequency warping. The fractional octave smoothing allows us to remove notches in the frequency responses that could disturb the LPC modeling operation. It can be performed on the magnitude spectrum using the methodology of [21]. The basic equation for performing the non-uniform magnitude spectrum smoothing of a frequency response is given by (6), which involves a zero-phase window function and a half-window length that is a monotonically increasing function of the frequency index. This method simulates a well-known property of the auditory system, which presents a poorer frequency resolution at higher frequencies. In this way it is possible to use a nonuniform resolution that decreases with increasing frequency, so as to obtain a less precise equalization at higher frequencies and, as a result, a broader equalized zone [22]. Moreover, the spectrum smoothing makes the room equalizer more robust toward displacement errors and time variations of the room response [23].

4) The fuzzy c-means clustering is used to extract the common patterns of the room magnitude responses. The algorithm can be applied by iterating (2) and (3), with the clustered data now defined as in (7), and the computed cluster centroids represent the common patterns of the room magnitude responses. A prototype magnitude response can be obtained from the weighted mean of the cluster centroids using (5). The prototype magnitude response represents the room response component to be equalized.

5) The autocorrelation function of the prototype IR is estimated by inverse transforming the squared magnitude spectrum, without the need of estimating the prototype IR itself.

The other steps (6, 7, and 8) are identical to those of Section II.

With the methodology of this section, many heavy operations performed in the approach of [1] can be avoided. Specifically, the frequency warping, which in the time domain is a very computationally intensive operation, is done in the proposed approach by suitably sampling the magnitude responses. In the experiments, this operation was implemented at zero cost by simply selecting the samples of the magnitude responses closest to the desired warped frequencies. The sampling operation also allows us to obtain a more compact representation of the magnitude responses, with a reduced number of samples. Moreover, with the proposed approach, the extraction of the minimum-phase component of the IRs is avoided and the computation of the autocorrelation of the IR of the prototype filter is done simply by computing the IFFT of a squared magnitude response.

The computational complexity of the fuzzy c-means clustering algorithm strongly depends on the number of iterations necessary for the algorithm convergence. In our experiments, the performance of the fuzzy c-means algorithm in the frequency domain and in the time domain was similar. Table I provides a comparison of the computational complexity of the method of [1] and of the proposed method in terms of multiplications, divisions, logarithms, complex exponentiations, and square roots. In the proposed method, the cost of the optional fractional octave smoothing has not been taken into account. As an example, Table II gives a comparison of the computational complexity of the two methods in terms of multiplications for different lengths of the impulse responses. The proposed method allows us to considerably reduce the computational complexity of the equalizer design, especially for long impulse responses. This can be very useful in all those situations where the equalizer is designed using a processor with limited computational capabilities, e.g., an embedded processor.

TABLE I COMPUTATIONAL COMPLEXITY COMPARISON BETWEEN THE METHOD OF [1] AND THE PROPOSED METHOD (I: NUMBER OF ITERATIONS OF FUZZY c-MEANS ALGORITHM)

TABLE II COMPUTATIONAL COMPLEXITY COMPARISON BETWEEN THE METHOD OF [1] AND THE PROPOSED METHOD IN TERMS OF MULTIPLICATIONS (FOR M = 8, c = 3, AND I = 20)

IV. ALTERNATIVE PROTOTYPE DESIGN STRATEGIES

The objective of the fuzzy c-means clustering algorithm is that of “grouping responses having ‘similar’ acoustical structure and subsequently forming a generalized representation that models the variations of responses between positions” [1]. In practice, the fuzzy c-means clustering algorithm is used to extract the common trend of the room responses by rejecting the outliers and to form a prototype response that is representative of this common trend. By working in the frequency domain it is easy to devise simple and computationally efficient techniques that are capable of achieving the same goal.
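As a concrete illustration of the frequency-domain processing of Section III, the following minimal Python sketch chains warping by nearest-bin sampling, autocorrelation from the squared magnitude, and the Levinson–Durbin recursion. It is a sketch under stated assumptions, not the authors' implementation: the function names, the grid sizes, and the warping value 0.766 (a commonly quoted Bark-like value at 48 kHz) are illustrative choices, and the warped-to-linear frequency map is the phase response of a first-order all-pass with parameter `lam`.

```python
import numpy as np

def warped_to_linear(nu, lam):
    """Linear-domain frequency (rad) corresponding to warped frequency nu (rad)
    for a first-order all-pass with warping parameter lam (assumed convention)."""
    return nu - 2.0 * np.arctan2(lam * np.sin(nu), 1.0 + lam * np.cos(nu))

def warp_by_sampling(mag, n_warped, lam):
    """From a magnitude response on a uniform grid over [0, pi], keep the bins
    closest to n_warped equally spaced warped frequencies."""
    nu = np.linspace(0.0, np.pi, n_warped)
    omega = warped_to_linear(nu, lam)
    idx = np.rint(omega / np.pi * (len(mag) - 1)).astype(int)
    return mag[idx]

def acf_from_magnitude(mag):
    """Autocorrelation as the inverse FFT of the squared magnitude
    (mag covers 0..pi, the layout produced by np.fft.rfft)."""
    return np.fft.irfft(mag ** 2)

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns the coefficients of A(z)
    (with a[0] = 1) and the final prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -np.dot(a[:m], r[m:0:-1]) / err      # reflection coefficient
        a[: m + 1] = a[: m + 1] + k * a[m::-1]   # order update
        err *= 1.0 - k * k
    return a, err

# Illustrative use on a synthetic one-pole magnitude response (hypothetical data).
w = np.linspace(0.0, np.pi, 513)
prototype_mag = 1.0 / np.abs(1.0 - 0.5 * np.exp(-1j * w))
warped_mag = warp_by_sampling(prototype_mag, 257, lam=0.766)
a_warped, _ = levinson_durbin(acf_from_magnitude(warped_mag), order=8)
# a_warped holds the FIR (inverse all-pole) equalizer taps in the warped domain.
```

The prototype magnitude is never converted back to an impulse response here: the autocorrelation is obtained directly from the sampled squared magnitude, which is the saving the section describes.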
In what follows, several techniques that can be used in place of the fuzzy c-means clustering algorithm in order to derive the prototype room response are described and discussed.

A. RMS Method

Due to its simplicity, this is a well-known and widely used method in room response equalization [1]. The prototype filter is computed from the root-mean-square of the measured magnitude responses, as given in (8). By using a sufficiently large number of responses, the averaging operation will represent the common trend of the responses with a reduced influence of the outliers.

B. Mean Method

The prototype filter is here computed from the mean of the room magnitude responses, as given in (9). Compared with the RMS Method, the mean in (9) is able to reduce the influence of peaks and notches of the room magnitude responses and it is often capable of obtaining a better estimate of the common component of the room magnitude responses.

C. Min-Max Method

The prototype room response is designed in order to minimize the maximum distance from the room responses. The method has a very simple implementation in the frequency domain, given in (10). With this method, the envelope of the measured room responses, once equalized with the resulting equalizer, is centered around the 0-dB line. However, even though the maximum distance from the measured room responses is minimized, the prototype filter in (10) is very sensitive to outliers in the measured room responses.

D. Median Method

A simple and effective solution for the room response prototype design comes from the concepts of nonlinear signal processing. The common trends of the responses can be extracted from the median room magnitude response defined in (11): for each frequency, the magnitude values are sorted in increasing order and, if the number of responses is odd, the central value is taken, while if it is even the mean of the two central values is taken. Among all possible design techniques for the prototype filters, this is surely the method least affected by the outliers in the room responses. Indeed, the median filter has the property of rejecting any peak or notch that appears only in a small number of room responses, by keeping only the central response in the set of the measured responses.

It should be noted that the computational complexity of all the approaches of this section is much lower than that of a single iteration of the fuzzy c-means algorithm in the time or frequency domain.

Fig. 3. Loudspeakers and microphones positions in the room. (a) Room 1. (b) Room 2. (c) Room 3.

V. EXPERIMENTAL RESULTS

In this section, some experimental results that compare the proposed technique of Section III and the different prototype design strategies of Section IV with the original method of [1] are presented. Furthermore, a comparison with the least-square method of [8] and with the adaptive least-square equalizer of [10] is provided. Several tests have been conducted in three different rooms; the loudspeaker and microphone positions, together with the room dimensions, are reported for each environment in Fig. 3(a)–(c). For each measurement, the distance of loudspeakers and microphones from the floor has been set to 1.2 m. Measurements have been performed using a professional ASIO sound card (MOTU Traveler) and AKG 417 microphones with an omnidirectional response. A personal computer running the NU-Tech platform has been used to manage all I/Os [24]. The loudspeaker was a Genelec 6010A with a free field frequency response of 74 Hz–18 kHz. The IRs have been derived using a logarithmic sweep signal excitation [25] at 48-kHz sampling frequency. Fig. 4(a)–(c) shows their magnitude responses (see footnote 1). It should be noted that the first and second rooms have a comparable reverberation time in the medium-high frequency range (Fig. 5), while at low frequencies the second room has a higher reverberation time.
The third room shows yet another behavior: the reverberation time is considerably higher than in the other environments, even if the rooms’ dimensions are comparable. All these differences are due to the different absorption characteristics of the walls and to the position and composition of the furniture.

Two measures have been selected to analyze the equalization results:
• the spectral deviation, which gives a measure of the deviation of the magnitude response from a flat one [1], considering each IR before and after the equalization;
• the Sammon map [26], which takes into account all the measured IRs at a time and is a useful representation of the room responses and of the effect of room response equalization.

Footnote 1: To improve the visibility of the curves, all plots of room magnitude responses in the figures have been 1/3-octave smoothed with a rectangular window.

Fig. 4. Room magnitude responses. (a) Room 1. (b) Room 2. (c) Room 3.

Fig. 5. Reverberation time versus frequency.

The spectral deviation of a frequency response is expressed by (12) and (13), which involve the lowest and the highest frequency of the equalized band. In the experimental results, an initial spectral deviation, calculated on the measured room response, and a final spectral deviation, computed after equalization by considering the room response cascaded with the designed equalizer, are provided. A mean spectral deviation measure (MSDM), which represents the mean value of the final spectral deviation measures over the entire set of the considered IRs, has also been considered.

The Sammon map is a nonlinear projection method that maps multidimensional data onto lower dimensions (e.g., two or three). The main property of the Sammon map is that it retains, in the two- or three-dimensional space, the geometrical distances between signals in the multidimensional space. Given the magnitude responses of the M measured IRs, the Sammon map algorithm iteratively minimizes, by a gradient descent scheme, the cumulative sum of the differences between the Euclidean distances in the high- and low-dimensional space. In formulae, the objective function (14) is minimized, with the distances in the two spaces defined by (15) and (16); these depend on the number of equally spaced frequencies in the warped domain and on the dimension of the Sammon map space. In the Sammon map, each magnitude response is represented by an associated point. Considering a two-dimensional mapping, upon convergence the points are configured on a two-dimensional plane such that the relative distances between the different responses are visually discernible. After equalization, the resulting performance can be determined from the size and shape of the region determined by the equalized frequency responses on the map. A circular shape indicates uniform equalization at all locations [26]. It is worth noting that, in all reported cases, the equalized magnitude responses are processed by subtracting the individual means, computed in the equalization range (100–16000 Hz), to give the zero-mean equalized magnitude responses. In this way, under ideal equalization, the magnitude responses are located at the origin of the Sammon map, so that a displacement from the origin indicates a deviation from the ideal flat equalization.

Three sets of experiments have been considered: in the first set, the techniques presented above for the prototype definition are evaluated in order to identify the approach achieving the best performance for the specific set of measured IRs, taking into account different environments; in the second set, the equalization robustness is evaluated by assessing its effectiveness in positions different from those used for the prototype extraction; in the last experiment, the equalization procedure is evaluated in terms of listening experience, taking into account subjective judgments.
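The two evaluation measures described above can be sketched in a few lines of Python. This is a hedged illustration rather than the paper's exact formulas: the spectral deviation is taken here as the standard deviation, over the equalized band, of the magnitude level in dB (the `20 * log10` scaling is an assumption), and the map quality is measured with the classical Sammon stress. The function names and the test data are hypothetical.

```python
import numpy as np

def spectral_deviation(mag, lo, hi):
    """Deviation from flatness of a magnitude response over bins lo..hi
    (inclusive): standard deviation of the level in dB (assumed scaling)."""
    level = 20.0 * np.log10(mag[lo : hi + 1])
    return np.sqrt(np.mean((level - level.mean()) ** 2))

def pairwise_distances(X):
    """Euclidean distances between all distinct pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(X.shape[0], k=1)
    return d[i, j]

def sammon_stress(X_high, Y_low):
    """Classical Sammon stress between high-dimensional data X_high (one
    response per row) and its low-dimensional map Y_low. Rows of X_high
    are assumed to be distinct, so no high-dimensional distance is zero."""
    d_high = pairwise_distances(X_high)
    d_low = pairwise_distances(Y_low)
    return np.sum((d_high - d_low) ** 2 / d_high) / np.sum(d_high)

# A perfectly flat response has zero spectral deviation.
flat = np.ones(64)
print(spectral_deviation(flat, 4, 60))
```

A gradient descent on the coordinates of `Y_low` that reduces `sammon_stress` would then yield the two-dimensional map; that step is omitted here.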


Fig. 6. Room 1 magnitude responses after equalization. (a) Method of [1]. (b) Fuzzy c-means in the frequency domain. (c) Mean Method. (d) Median Method. (e) Min-Max Method. (f) RMS Method.

Fig. 7. Room 2 magnitude responses after equalization. (a) Method of [1]. (b) Fuzzy c-means in the frequency domain. (c) Mean Method. (d) Median Method. (e) Min-Max Method. (f) RMS Method.

To make a fair comparison with the method of [1], in the experimental results no spectral smoothing has been introduced before warping in the methods of Sections III and IV. When relevant, some comments about the effect of smoothing are provided in the discussion of the experimental results. In all the experiments the equalization band has been set to 100–16000 Hz in order to avoid the influence of the loudspeaker and microphone characteristics. Further experimental results obtained in different rooms can be found in [27] and [28].

A. Comparison of Prototype Design Strategies

Three different rooms have been considered for prototype extraction, taking into account five impulse responses (i.e., IR1, IR2, IR3, IR4, IR5) for each environment (see footnote 2). Table III reports the MSDM values averaged over the five IRs for each selected environment, while Figs. 6–8 depict their magnitude spectra after the equalization procedures for all the proposed methods, and Fig. 9 compares the equalizer magnitude responses in the different conditions. It is clear that all the prototype design approaches discussed in Sections III and IV provide very good performance, comparable with that of [1]. In particular, considering the frequency domain fuzzy c-means method, the performance is very similar to that of [1] in terms of spectral deviation. However, looking at Figs. 6–9, small differences can be noticed, especially at the boundaries of the frequency region. This is probably due to the fact that warping is done in different domains, so the resolution at higher and lower frequencies is slightly different between the time and frequency approaches.

Footnote 2: While 25 IRs have been measured in room 1, only the first five IRs are used in this experiment.

The Mean Method for the prototype design provides performance that is almost identical to that of the fuzzy c-means method in the frequency domain. Moreover, the performance with the Median, Min-Max, and RMS prototype designs is also good and comparable to that obtained in [1]. Looking at the magnitude responses after equalization in Figs. 6–8, the differences between the approaches can hardly be distinguished. For comparison purposes, Table III also reports the MSDM values for the least-square method of [8] and for the adaptive least-square method of [10]. Their performance is worse than that of [1] and of the proposed methods. The graphs of the magnitude responses confirm this result; they have not been included because of space limitations.

Fig. 8. Room 3 magnitude responses after equalization. (a) Method of [1]. (b) Fuzzy c-means in the frequency domain. (c) Mean Method. (d) Median Method. (e) Min-Max Method. (f) RMS Method.

Fig. 9. Equalizers magnitude responses for the proposed techniques. (a) Room 1. (b) Room 2. (c) Room 3.

TABLE III MEAN SPECTRAL DEVIATION MEASURES CONSIDERING DIFFERENT ENVIRONMENTS

TABLE IV MEAN SPECTRAL DEVIATION MEASURES FOR THE EQUALIZED RESPONSES CONSIDERING ROOM1
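The four alternative prototype designs of Section IV that are compared above each reduce to a single array operation over the set of warped magnitude responses. The sketch below is illustrative only: the function name and data are hypothetical, and the min-max variant is an assumption, taken here as the midpoint between the minimum and maximum levels in dB (the geometric mean of the two magnitudes), which centers the equalized envelope on the 0-dB line; a midpoint of the linear magnitudes would be an equally plausible reading.

```python
import numpy as np

def prototype_magnitude(mags, method):
    """Prototype magnitude response from an (M, K) array holding M warped
    magnitude responses (linear scale) sampled at K frequencies."""
    mags = np.asarray(mags, dtype=float)
    if method == "rms":
        return np.sqrt(np.mean(mags ** 2, axis=0))
    if method == "mean":
        return np.mean(mags, axis=0)
    if method == "minmax":
        # Assumed form: midpoint of the min and max envelopes in dB.
        return np.sqrt(mags.min(axis=0) * mags.max(axis=0))
    if method == "median":
        # np.median averages the two central values when M is even.
        return np.median(mags, axis=0)
    raise ValueError(f"unknown method: {method}")

# Three responses that agree except for one outlier in the second bin.
mags = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 9.0]])
for name in ("rms", "mean", "minmax", "median"):
    print(name, prototype_magnitude(mags, name))
```

On this toy input only the median ignores the outlier completely, while the RMS prototype is pulled furthest toward it, which matches the qualitative ranking given in Section IV.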

It should be noted that in all our experiments without spectral smoothing before warping, the fuzzy c-means algorithm in [1] and the proposed approach based on the frequency domain implementation of the fuzzy c-means tend to converge to the mean magnitude of the room responses. This behavior is caused by the great variability of the room responses in all measurements. The same behavior has also been observed in the simulations with synthetic IRs obtained with the image method [29] in similar conditions. With a 1/3-octave smoothing before warping, the fuzzy c-means algorithm does not converge to the mean magnitude, but the performance slightly degrades at high frequencies [27].
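For readers who want to reproduce this comparison, a minimal sketch of fuzzy c-means applied to magnitude responses, followed by a membership-weighted combination of the centroids, is given below. It assumes the standard formulation with fuzzifier 2 and squared Euclidean distance; the function names, the stopping tolerance, and the random test data are illustrative assumptions rather than the settings used in the experiments.

```python
import numpy as np

def fuzzy_c_means(X, c, n_iter=100, tol=1e-9, seed=0):
    """Standard fuzzy c-means (fuzzifier 2) on the rows of X.
    Returns centroids C of shape (c, K) and memberships U of shape (c, M)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)        # memberships of each row sum to 1
    prev = np.inf
    for _ in range(n_iter):
        W = U ** 2
        C = (W @ X) / W.sum(axis=1, keepdims=True)            # centroid update
        d2 = ((X[None, :, :] - C[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = 1.0 / d2
        U = inv / inv.sum(axis=0, keepdims=True)              # membership update
        J = np.sum(U ** 2 * d2)                               # objective function
        if abs(prev - J) < tol:
            break
        prev = J
    return C, U

def fcm_prototype(C, U):
    """Prototype as the mean of the centroids, each weighted by the sum of
    its squared memberships."""
    w = (U ** 2).sum(axis=1)
    return (w[:, None] * C).sum(axis=0) / w.sum()

# Hypothetical data: 8 magnitude responses at 16 warped frequencies.
rng = np.random.default_rng(1)
X = rng.random((8, 16)) + 0.5
C, U = fuzzy_c_means(X, c=3)
proto = fcm_prototype(C, U)
mean_gap = np.max(np.abs(proto - X.mean(axis=0)))  # distance from the plain mean
```

Comparing `proto` with `X.mean(axis=0)` on measured responses is one way to check how close the clustering prototype is to the Mean Method for a given data set.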


TABLE V SPECTRAL DEVIATION MEASURES FOR THE EQUALIZED REFERENCE RESPONSE IR13 CONSIDERING ROOM1

TABLE VI DISTANCE FROM THE ORIGIN IN THE SAMMON MAP OF THE REFERENCE RESPONSE IR13 CONSIDERING ROOM1

B. Performance Evaluation and Robustness

The robustness of the equalization and the dependence of the equalization performance on the number of IRs and on the size of the equalized zone have also been investigated. In this experiment, the different prototype design strategies described in Section IV have been assessed considering IRs measured in the first room (similar results were also obtained in the other rooms). The equalization robustness is usually evaluated by assessing the equalization effectiveness in positions different from those considered for the prototype definition. In this experiment, one microphone position has been selected for monitoring the performance variations, i.e., IR13 in Fig. 3(a), which is the central position where the listener is ideally positioned. Four test sessions have been performed in this phase. Different IR sets for prototype extraction were considered, covering an increasing size of the equalized zone from the first test to the last one.

• Test 1: The set for prototype extraction consists of five IRs positioned along a line [Fig. 3(a), positions 1, 2, 3, 4, 5].
• Test 2: The set for prototype extraction consists of ten IRs positioned along two adjacent lines [Fig. 3(a), positions 1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
• Test 3: The set for prototype extraction consists of eight IRs positioned in a square [Fig. 3(a), positions 7, 8, 9, 14, 19, 18, 17, 12].
• Test 4: The set for prototype extraction consists of sixteen IRs positioned in a square [Fig. 3(a), positions 1, 2, 3, 4, 5, 10, 15, 20, 25, 24, 23, 22, 21, 16, 11, 6].

Table IV reports the MSDM for the aforementioned tests; the mean value for each test is calculated taking into account all IRs. Table V reports the spectral deviations of the reference response IR13 before and after equalization, while Table VI reports the distances from the origin in the Sammon map of the reference response IR13 before and after the equalization obtained in the four tests.
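A minimal sketch of the spectral deviation measure used in Tables IV and V is given below, assuming it is the standard deviation of the magnitude response in decibels over the equalization band and that the MSDM is its average over the measured positions. The 10 log10 scaling (exposed as the `scale` parameter), the bin indices `lo` and `hi`, and the function names are assumptions; the definition given earlier in the paper takes precedence.

```python
import math


def spectral_deviation(magnitudes, lo, hi, scale=10.0):
    """Deviation (in dB) of a magnitude response around its mean level.

    magnitudes: positive magnitude values, one per frequency bin.
    lo, hi: first and last bin of the equalization band (inclusive).
    scale: multiplier of log10 (10.0 is assumed here).
    """
    level_db = [scale * math.log10(x) for x in magnitudes[lo:hi + 1]]
    mean_db = sum(level_db) / len(level_db)
    return math.sqrt(sum((v - mean_db) ** 2 for v in level_db) / len(level_db))


def mean_spectral_deviation(responses, lo, hi, scale=10.0):
    """Average of the spectral deviation over all positions."""
    return sum(spectral_deviation(r, lo, hi, scale) for r in responses) / len(responses)


# Hypothetical responses: a perfectly flat one and one with a 10 dB step.
flat = [2.0, 2.0, 2.0, 2.0]
step = [1.0, 10.0, 1.0, 10.0]
print(spectral_deviation(flat, 0, 3))
print(spectral_deviation(step, 0, 3))
print(mean_spectral_deviation([flat, step], 0, 3))
```

A perfectly flat response in the band gives a deviation of zero, so lower values after equalization indicate a flatter equalized response at that position.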
Considering Test 1 and Test 2, as the number of IRs increases, the spectral deviation on IR13 and the distance from the origin of the points on the Sammon map decrease, confirming the good performance obtained with the fuzzy c-means and Mean approaches. These observations are also confirmed by the Sammon maps: Fig. 10 reports the results obtained with the time domain fuzzy c-means, while Fig. 11 presents the results derived from the frequency domain fuzzy c-means (which are practically identical to those obtained with the Mean approach). Taking into account a different geometry, e.g., IRs positioned in a square (Test 3 and Test 4), good performances in terms of both spectral deviation and distance from the origin can also be achieved. By comparing the spectral deviation measures of Table V, the spectral deviation on IR13 increases with the size of the equalization zone and the distance between the measurement microphones. The performances of all methods are very similar. Moreover, in both Figs. 10 and 11, the position of IR13 is very close to the origin of the axes and the constellation shows a uniform distribution. Furthermore, comparing the different geometries, the best performances are obtained for the square configuration of Test 3, where a low value of spectral deviation is supported by a very small distance from the origin in the Sammon map (Table VI). This is also clear from the Sammon maps of Figs. 10 and 11, which report the results for the fuzzy c-means and Mean approaches: the reference position IR13 is properly equalized in all tests, but the best results are obtained for Test 3. Fig. 13 shows the magnitude response of the equalizers in each test; all the magnitude responses have a similar behavior in the equalization band.

C. Listening Tests

To assess the perceived audio quality of the equalization procedure, a listening test according to ITU-R BS.1284-1 [30] has been performed. This recommendation is intended as a guide to the general assessment of perceived sound quality. Following the guidelines of [30], expert listeners are preferred to “give a better and a quicker indication of the likely results in the long term.” Therefore, the subjects used in the experiment were 15 expert listeners (13 males and 2 females), with ages ranging from 21 to 35. All listeners had a technical background in acoustics, which means that they were not representative of the normal population. However, to avoid biasing the listening test results, the listeners were not informed about the study or the expected results: they were only asked to evaluate some attributes of several sound tracks in comparison with a reference signal.


Fig. 10. Sammon map of the IRs of (a) Test 1 after the equalization, (b) Test 2 after the equalization, (c) Test 3 after the equalization, and (d) Test 4 after the equalization, taking into consideration the time domain approach of [1].

Fig. 11. Sammon map of the IRs of (a) Test 1 after the equalization, (b) Test 2 after the equalization, (c) Test 3 after the equalization, and (d) Test 4 after the equalization, taking into consideration the proposed approach.

The test has been conceived as a comparison test and the seven-grade comparison scale reported in Fig. 12 has been used. The listeners were asked to provide a score on this continuous quality scale with a resolution of one decimal place, as recommended in [30].
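For readers who want to relate a numerical grade to the verbal categories, a small Python sketch follows. It uses the seven anchor points of the comparison scale (from −3, much worse, to +3, much better). Mapping a fractional grade to the interval between two anchors is an assumption, chosen because it is consistent with how grades of 0.5 and 1.3 are described in the discussion of the results; the function name `interval_label` is hypothetical.

```python
import math

# Anchor points of the seven-grade comparison scale.
LABELS = {
    3: "much better",
    2: "better",
    1: "slightly better",
    0: "the same",
    -1: "slightly worse",
    -2: "worse",
    -3: "much worse",
}


def interval_label(grade):
    """Return the verbal category of the interval containing a grade.

    Assumed reading: a positive grade belongs to the interval ending at the
    next anchor above it, a negative grade to the one ending at the next
    anchor below it.
    """
    if not -3.0 <= grade <= 3.0:
        raise ValueError("grade must lie between -3 and 3")
    grade = round(grade, 1)  # one decimal place, as in the test instructions
    step = math.ceil(grade) if grade > 0 else math.floor(grade)
    return LABELS[step]


for g in (0.5, 1.3, 0.0, -0.4):
    print(g, interval_label(g))
```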

The test was based on paired comparisons with references, in which the same program sequence was repeated four times consecutively in the following order:
• reference sequence without equalization;


TABLE VIII LIST OF SOUND TRACKS USED FOR THE LISTENING TESTS

Fig. 12. Quality scale of [30] used for the listening tests.

TABLE VII LIST OF ATTRIBUTES USED FOR THE LISTENING TESTS

• same sequence, equalized with one of the selected equalization techniques;
• reference sequence without equalization (repeated);
• same sequence, equalized with one of the selected equalization techniques (repeated).

This procedure was iterated taking into consideration three equalized signals: a signal equalized using the method proposed in [1], a signal equalized with the proposed approach in the frequency domain using the fuzzy c-means technique, and a signal equalized with the proposed approach considering the Mean method. For each subject, a total of fifteen tests was carried out, considering five reference signals (of different music genres) and three paired comparisons between each reference signal and the corresponding equalized signal. As suggested in [30], the stimuli did not exceed 20 s, having a length between 15 and 20 s. Since the reference signals were musical items, care was taken to guarantee that the phrase did not appear to be interrupted [30]. For each reference signal, the program sequence was presented in random order, and the listener did not know which equalization methodology was under test. Following the procedure suggested in [30], before the listening test a training test was presented to the listeners to familiarize them with the test procedure, the test materials, and the test environment; each listener was given the possibility to listen to each audio item in all conditions under evaluation. Since the test was aimed at evaluating different equalization techniques, the listeners were instructed to assess some of the attributes suggested by [30], in particular the most relevant attributes for monophonic signals, which are summarized in Table VII. The most important attribute is the basic audio quality, based on the main impression, which includes all aspects of the sound quality to be assessed.

The reference signals were chosen considering different music genres in order to test the room response equalizer with different spectral contents. Table VIII shows the selected music genres and the sound tracks used as reference signals. High-quality studio listening equipment was used, i.e., one loudspeaker (Genelec 6010A) with a professional MOTU Traveler sound card connected to an Intel Centrino 2 laptop. The test was performed in the second room [Fig. 3(b)], where the loudspeaker was positioned in front of the listener (i.e., at IR3), as also suggested in [30]. For the prototype extraction, the impulse responses of Section V-A were used. The acoustical characteristics of the listening room are discussed in Section V-A. The same listening level has been used for all processed items, setting the SPL of reproduction to 70 dB.

As suggested in [30], the subjective data have been processed to derive the mean values and the confidence intervals. A significance level of 0.05 has been considered for computing the confidence intervals [31]. Fig. 14 shows the results obtained: the mean grade values, computed over all involved subjects, are plotted with vertical colored blocks, while the confidence intervals are given by vertical black lines. The results obtained for Timbre, Transparency, and Main Impression are very similar in all the experiments, so the analysis of the different methods can refer to the Main Impression only. The results with the proposed techniques working in the frequency domain (fuzzy c-means and Mean methods) are similar in all the experiments, and they lie in the better and slightly better intervals, showing an improvement of the performance in all conditions compared with the unequalized signal. The method of [1] also presents positive results, but limited to the slightly better interval. The only exception is with pop music, where a strong presence of the low frequencies was perceived as annoying by the listeners.
In many cases, i.e., with soul, jazz, and classical music, all the methods appear equivalent, while the proposed techniques provide better results in the tests with rock and pop music, i.e., with a rich spectral content. The subjects reported positive comments with regard to the proposed techniques, describing the evaluated sound signals as clear. Considering Fig. 14(f), which displays the mean grade values computed over all the experiments, all tested equalization techniques produce a global enhancement of the perceived sound. The method of [1] obtained a grade around 0.5, which corresponds to the slightly better interval, while the proposed approaches show grades around 1.3, which corresponds to the better interval. The difference can be related to the fact that the method of [1] designs the equalizer in the time domain, while the proposed methods design it in the frequency domain. Since tuning the parameters of [1] (e.g., tuning the equalization band) could foster a better subjective behavior, the only conclusion that can be derived from the subjective listening results is that the proposed equalization techniques are suitable for improving the subjective quality of the sound reproduction system, with a performance improvement which has been graded in the better or slightly better interval.

Fig. 13. Equalizer magnitude responses for the proposed techniques. (a) Test 1. (b) Test 2. (c) Test 3. (d) Test 4.

Fig. 14. Results of the listening test: mean values and confidence intervals calculated from the standard deviation with a significance level of 0.05. (a) Rock music. (b) Popular music. (c) Soul music. (d) Jazz music. (e) Classical music. (f) All tracks.
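The mean grades and confidence intervals of the kind plotted in Fig. 14 can be computed along the following lines. This Python sketch assumes a two-sided interval built from the sample standard deviation and a normal quantile; with only 15 listeners a Student t quantile would give a somewhat wider interval, and the text does not state which was used. The grades in the example are made up for illustration and are not the experimental data.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_with_ci(grades, significance=0.05):
    """Return (mean, lower, upper) for a two-sided confidence interval.

    Uses the sample standard deviation and a normal quantile (an assumption).
    """
    n = len(grades)
    mean = fmean(grades)
    z = NormalDist().inv_cdf(1.0 - significance / 2.0)
    half_width = z * stdev(grades) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical grades given by five listeners to one equalized track.
grades = [1.0, 1.5, 0.5, 2.0, 1.0]
mean, lower, upper = mean_with_ci(grades)
print(f"mean = {mean:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

When the intervals of two methods overlap substantially, as for several genres in Fig. 14, the difference between their mean grades should be read with caution.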


VI. CONCLUSION

In this paper, the computational complexity of the multiple position room response equalization technique [1], based on fuzzy c-means clustering, has been reduced. In the proposed approach most of the operations have been implemented in the frequency domain and the fuzzy c-means clustering has been applied to the room magnitude responses at different positions. It has been shown that working in the frequency domain can avoid many of the onerous operations of the approach in [1]. As a consequence, a computationally efficient algorithm for multiple position room response equalization can be obtained. Different techniques that can replace the fuzzy c-means clustering algorithm in the derivation of the prototype room response have also been discussed. All these techniques naturally arise from the frequency domain implementation of the system and have a computational complexity much lower than a single iteration of the fuzzy c-means algorithm in the time or frequency domain. In all our experiments, the equalization performances of the system equipped with the proposed techniques have been at least similar to those of the original approach in [1]. Furthermore, the equalization robustness and its dependence on the size of the equalization zone have been investigated. The experimental results indicate that the proposed methods are efficient and effective solutions for room response equalization.

REFERENCES

[1] S. Bharitkar and C. Kyriakakis, Immersive Audio Signal Processing. New York: Springer, 2006.
[2] M. Karjalainen, T. Paatero, J. N. Mourjopoulos, and P. D. Hatziantoniou, “About room response equalization and dereverberation,” in Proc. 2005 IEEE Workshop Applicat. Signal Process. Audio Acoust., New Paltz, NY, Oct. 2005, pp. 183–186.
[3] P. D. Hatziantoniou and J. N. Mourjopoulos, “Errors in real-time room acoustics dereverberation,” J. Audio Eng. Soc., vol. 52, no. 9, pp. 883–899, 2004.
[4] A. Mäkivirta, P. Antsalo, M. Karjalainen, and V. Välimäki, “Modal equalization of loudspeaker-room responses at low frequencies,” J. Audio Eng. Soc., vol. 51, no. 5, pp. 324–343, May 2003.
[5] M. Karjalainen, P. Antsalo, and A. Mäkivirta, “Modal equalization by temporal shaping of room response,” in Proc. 23rd Int. Conf. Signal Process. Audio Record. Reproduct., Copenhagen, Denmark, May 2003, pp. 1–11.
[6] S. Neely and J. Allen, “Invertibility of a room impulse response,” J. Acoust. Soc. Amer., vol. 66, pp. 165–169, 1979.
[7] J. Mourjopoulos, “On the variation and invertibility of room impulse response functions,” J. Sound Vibr., vol. 102, no. 2, pp. 217–228, Sep. 1985.
[8] J. Mourjopoulos, P. Clarkson, and J. Hammond, “A comparative study of least-squares and homomorphic techniques for the inversion of mixed phase signals,” in Proc. ICASSP 1982, Int. Conf. Acoust., Speech, Signal Process., Paris, France, May 1982, pp. 1858–1861.
[9] M. Miyoshi and Y. Kaneda, “Inverse filtering of room acoustics,” IEEE Trans. Signal Process., vol. 36, no. 2, pp. 145–152, Feb. 1988.
[10] S. J. Elliot and P. A. Nelson, “Multiple-point equalization in a room using adaptive digital filters,” J. Audio Eng. Soc., vol. 37, no. 11, pp. 899–907, Nov. 1989.
[11] Y. Haneda, S. Makino, and Y. Kaneda, “Multiple-point equalization of room transfer functions by using common acoustical poles,” IEEE Trans. Speech Audio Process., vol. 5, no. 4, pp. 325–333, Jul. 1997.
[12] J. Mourjopoulos, “Digital equalization of room acoustics,” J. Audio Eng. Soc., vol. 42, no. 11, pp. 884–900, Nov. 1994.
[13] S. Spors, H. Buchner, R. Rabenstein, and W. Herbordt, “Active listening room compensation for massive multichannel sound reproduction systems using wave-domain adaptive filtering,” J. Acoust. Soc. Amer., vol. 122, no. 1, pp. 354–369, Jul. 2007.
[14] S. Bharitkar and C. Kyriakakis, “A cluster centroid method for room response equalization at multiple locations,” in Proc. 2001 IEEE Workshop Applicat. Signal Process. Audio Acoust., New Paltz, NY, Oct. 2001, pp. 55–58.
[15] A. Oppenheim, D. Johnson, and K. Steiglitz, “Computation of spectra with unequal resolution using the fast Fourier transform,” Proc. IEEE, vol. 52, no. 2, pp. 299–301, Feb. 1971.
[16] M. Karjalainen, E. Piirilä, A. Järvinen, and J. Huopaniemi, “Comparison of loudspeaker equalization methods based on DSP techniques,” J. Audio Eng. Soc., vol. 47, no. 1/2, pp. 14–31, Feb. 1999.
[17] J. O. Smith and J. S. Abel, “Bark and ERB bilinear transforms,” IEEE Trans. Speech Audio Process., vol. 7, no. 6, pp. 697–708, Nov. 1999.
[18] S. Turmchokkasam and S. Mitaim, “Effects of weights in weighted fuzzy c-means algorithm for room equalization at multiple locations,” in Proc. IEEE Int. Conf. Fuzzy Syst., Vancouver, BC, Canada, Jul. 2006, pp. 1468–1475.
[19] S. Bharitkar and C. Kyriakakis, “Perceptual multiple location equalization with clustering,” in Proc. 36th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, Oct. 2002, vol. 1, pp. 179–183.
[20] M. Karjalainen, A. Härmä, U. K. Laine, and J. Huopaniemi, “Warped filters and their audio applications,” in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., New Paltz, NY, Oct. 1997, pp. 19–22.
[21] P. D. Hatziantoniou and J. N. Mourjopoulos, “Generalized fractional-octave smoothing of audio and acoustic responses,” J. Audio Eng. Soc., vol. 48, no. 4, pp. 259–280, Apr. 2000.
[22] R. Genereux, “Signal processing considerations for acoustic environment correction,” in Proc. U.K. 7th Conf. Digital Signal Process. (DSP), Sep. 1992, pp. 175–195.
[23] P. D. Hatziantoniou and J. N. Mourjopoulos, “Results for room acoustics equalization based on smoothed responses,” in Proc. 114th AES Conv., Amsterdam, Netherlands, 2003, pp. 1–8.
[24] A. Lattanzi, F. Bettarelli, and S. Cecchi, “NU-Tech: The entry tool of the hArtes toolchain for algorithms design,” in Proc. 124th AES Conv., May 2008, pp. 1–8.
[25] S. Müller and P. Massarani, “Transfer-function measurement with sweeps,” J. Audio Eng. Soc., vol. 49, no. 6, pp. 443–471, 2001.
[26] S. Bharitkar and C. Kyriakakis, “Visualization of multiple listener room acoustic equalization with the Sammon map,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 542–551, 2007.
[27] I. Omiciuolo, A. Carini, and G. L. Sicuranza, “Multiple position room response equalization with frequency domain fuzzy c-means prototype design,” in Proc. IWAENC Int. Workshop Acoust. Echo Noise Control, Seattle, WA, 2008, pp. 1–4.
[28] A. Carini, I. Omiciuolo, and G. L. Sicuranza, “Multiple position room response equalization: Frequency domain prototype design strategies,” in Proc. 6th Int. Symp. Image Signal Process. Anal. (ISPA 2009), Salzburg, Austria, 2009.
[29] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small room acoustics,” J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943–950, 1979.
[30] General Methods for the Subjective Assessment of Sound Quality, Geneva, 2003, ITU-R BS.1284-1.
[31] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 2002.

Alberto Carini (S’89–A’97–M’02–SM’06) was born in Trieste, Italy, in 1967. He received the Laurea degree (summa cum laude) in electronic engineering and the Dottorato di Ricerca (Ph.D.) degree in information engineering from the University of Trieste in 1994 and 1998, respectively. In 1996 and 1997, during his doctoral studies, he was a Visiting Scholar with the University of Utah, Salt Lake City. From 1997 to 2003, he was a DSP Engineer with Telit Mobile Terminals SpA, Trieste, where he led the audio processing R&D activities. In 2003, he was with Neonseven srl, Trieste, as an audio and DSP expert. From 2001 to 2004, he collaborated with the University of Trieste as a Contract Professor of digital signal processing. Since 2004, he has been an Associate Professor at the University of Urbino, Urbino, Italy, first with the Information Science and Technology Institute, and now with the Department of Base Sciences and Fundamentals. His research interests include adaptive filtering, nonlinear filtering, nonlinear equalization, acoustic echo cancellation, active noise control, and room response equalization.


Prof. Carini was a member-at-large of the Conference Board of the IEEE Signal Processing Society in 2009 and 2010. He is currently a member of the Editorial Board of Signal Processing, Elsevier, and a member of the SPTM Technical Committee of the IEEE Signal Processing Society. He received the Zoldan Award for the best Laurea degree in electronic engineering at the University of Trieste for the Academic Year 1992–1993.

Stefania Cecchi (M’06) was born in Amandola, Italy, in 1979. She received the Laurea degree (with honors) in electronic engineering in November 2004 at the University of Ancona, Ancona, Italy, and the Ph.D. degree in electronic engineering from the Polytechnic University of Marche, Ancona, in November 2007. She has been a Post Doc Researcher at the Department of Biomedic, Electronics, and Telecommunications Engineering (DIBET), University of Marche, since February 2008. She is the author or coauthor of several international papers. Her current research interests are in the area of digital signal processing, including adaptive DSP algorithms and circuits, speech, and audio processing. Dr. Cecchi is a member of the Audio Engineering Society (AES).

Francesco Piazza (M’87) was born in Italy in 1957. He received the Laurea degree (with honors) in electronic engineering from the University of Ancona, Ancona, Italy, in 1981. From 1981 to 1983, he worked on image processing in the Physics Department, University of Ancona. In 1983, he worked at the Olivetti OSAI Software Development Center, Ivrea, Italy. In 1985, he joined the Department of Electronics and Automatics, University of Ancona, first as a Researcher in electrical engineering, then as an Associate Professor. Currently, he is a Full Professor at the Università Politecnica delle Marche, Ancona. He is the author or coauthor of more than 200 international papers. His current research interests are in the areas of circuit theory and digital signal processing, including adaptive DSP algorithms and circuits, artificial neural networks, speech, and audio processing. Mr. Piazza is a member of the Audio Engineering Society (AES).


Ivan Omiciuolo was born in San Donà di Piave, Italy, in 1982. He received the B.S. degree from the University of Padua, Padua, Italy, in 2005 and the M.S. degree (summa cum laude) in telecommunication engineering from the University of Trieste, Trieste, Italy, in 2008. In 2008, he was a Visiting Scholar at the University of Urbino, where he carried out research on audio signal processing and room response equalization. During his studies, he improved his knowledge of acoustics and audio signal processing by collaborating with different audio rental companies as a Sound Engineer and Consultant. Currently, he is with Prase Engineering s.r.l., Noventa di Piave, Italy, as a Technical Support Manager and Application Engineer.

Giovanni L. Sicuranza (M’73–SM’85–LSM’06) is Professor Emeritus of Signal and Image Processing at the University of Trieste, Trieste, Italy. He was the founder of the Image Processing Laboratory, Electrical Electronic and Computer Science Department, University of Trieste, in 1980 and was its Director until 2005. His research interests include multidimensional digital filters, processing of images and image sequences, image coding, nonlinear filters, polynomial filters, and adaptive algorithms for echo cancellation and active noise control. He has published more than 180 papers in international journals and conference proceedings. He contributed chapters for seven books and is the coeditor with Prof. S. Mitra, University of California at Santa Barbara, of the books Multidimensional Processing of Video Signals (Kluwer, 1992) and Nonlinear Image Processing (Academic, 2001), and with Prof. S. Marshall, University of Strathclyde, Glasgow, U.K., of the book Advances in Nonlinear Signal and Image Processing (Hindawi, 2006). He is the coauthor with Prof V. J. Mathews, University of Utah at Salt Lake City, of the book Polynomial Signal Processing (Wiley, 2000). Prof. Sicuranza has been a member of the technical committees of numerous international conferences, the Chairman of the VIII European Signal Processing Conference, EUSIPCO-96 (Trieste, Italy, 1996) and the Sixth Workshop on Nonlinear Signal and Image Processing, NSIP-03, (Grado, Italy, 2003). He was also the Co-chair of the IEEE International Conference on Image Processing, ICIP-05 (Genova, Italy, 2005). He was a member of the editorial board of the EURASIP journal Signal Processing and of the IEEE Signal Processing Magazine. He is currently an Associate Editor of Multidimensional Systems and Signal Processing and a member of the editorial board of the Springer journal Signal, Image and Video Processing. 
He has served as a Project Manager of ESPRIT and COST research projects funded by the European Commission and as a consultant to several industries. He was the Awards Chairman of the Administrative Committee of EURASIP and a member of the IMDSP Technical Committee of the IEEE Signal Processing Society. He received in 2006 the EURASIP Meritorious Service Award.