Processing gas chromatographic data and confidence interval

Journal of Chromatography A, 1110 (2006) 146–155

Processing gas chromatographic data and confidence interval calculation for partition coefficients determined by the phase ratio variation method

Samuel Atlan a, Ioan Cristian Trelea a,∗, Anne Saint-Eve a, Isabelle Souchon a, Eric Latrille b

a UMR Génie et Microbiologie des Procédés Alimentaires, INRA INA P-G, BP 01, 1 Avenue Lucien Bretignères, F-78850 Thiverval-Grignon, France
b Laboratoire de Biotechnologie de l’Environnement, INRA, Avenue des Etangs, F-11100 Narbonne, France

Received 10 November 2005; received in revised form 10 January 2006; accepted 11 January 2006
Available online 31 January 2006

Abstract

The phase ratio variation (PRV) method is widely used for the determination of partition coefficient values (dimensionless Henry’s law constants) by headspace gas chromatography. Traditional data processing by linear regression has several drawbacks: potential bias introduced by linearization, absence of a quality indicator for the resulting value and, in case of replicate determinations, poor utilisation of the existing measurements leading to unnecessarily large confidence intervals. The paper compares existing PRV data processing methods (linear regression, nonlinear regression and parametric) and derives confidence intervals for the resulting partition coefficient values. The possibility of using several series of measurements to derive a single partition coefficient value with tighter and more reliable confidence intervals is presented for all three processing methods. The methods are tested on published literature data and new experimental data for 12 volatile organic compounds in water at 25 °C. The nonlinear regression based on several series of measurements appears to be the method of choice.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Partition coefficient; Henry’s law constant; Phase ratio variation (PRV); Linear regression; Nonlinear regression; Parametric method; Confidence interval; Volatile organic compound

∗ Corresponding author. Tel.: +33 130815490; fax: +33 130815597. E-mail address: [email protected] (I.C. Trelea).

0021-9673/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.chroma.2006.01.055

1. Introduction

The determination of the equilibrium partitioning of volatile organic compounds between solutions and vapour phase has extremely diverse practical applications, ranging from the release of pollutants in the atmosphere and the design of separation processes in industry to fragrance release from laundry and perception of the aroma in food during consumption. Among the many existing static headspace methods for the determination of the partition coefficients of volatile compounds, indirect methods that require neither the external calibration of the detector nor the exact knowledge of the initial concentration of the volatile compound in the solution are becoming increasingly popular. They can be fully automated and applied to mixtures of several volatile molecules. The sample preparation is simplified and experimental uncertainties are reduced.

Two widely used indirect methods are the equilibrium partitioning in closed system (EPICS) method, originally proposed by Lincoff and Gosset [1], and the phase ratio variation (PRV) method described by Ettre et al. [2]. In the EPICS method, two sample vials with different solution volumes but the same amount of volatile compound are used. The partition coefficient is obtained by measuring the equilibrium vapour concentration ratio in the two vials by gas chromatography (GC). In the PRV method, several volumes of a solution with the same initial concentration are introduced in the vials and the equilibrium gas concentration is also measured by GC. Chai and Zhu [3] proposed a combination of these two methods in which only two solution volumes are used as in EPICS but the initial concentration of the solution is the same in both vials as in PRV. This last method will be called the parametric method in this work, and treated as a

S. Atlan et al. / J. Chromatogr. A 1110 (2006) 146–155

special case of the PRV, when the number of vials is reduced to two.

In the classical PRV method [2], a nonlinear relationship between the chromatographic areas and the partition coefficient is established and is artificially linearised by considering the reciprocals of the chromatographic areas. This allows the determination of the desired partition coefficient from the ratio of the intercept and the slope of a straight line obtained by linear regression. This chromatographic data analysis method has the advantage of simplicity but also several important drawbacks.

Firstly, the partition coefficient values determined by linear regression are statistically biased, i.e. the average of the determined values does not approach the true one, even if the number of replicate experiments is very large. As explained in detail by Ramachandran et al. [4], this is mainly due to the nonlinear transformation of the measurement error of the chromatographic area by the reciprocal function.

Secondly, quality indicators (e.g. standard deviations) of the partition coefficients determined by linear regression are not readily available because the partition coefficients are obtained neither as the intercept nor as the slope of the straight line, but as the ratio of the two. In the original work on PRV [2] the only quality indicator of the linear regression was the determination coefficient, which is only very indirectly related to the uncertainty on the calculated partition coefficient. Ramachandran et al. [4] give confidence intervals based on 120 repeated determinations of the partition coefficient; this was possible, however, only because the data were computer generated rather than a result of actual measurements.

Thirdly, the classical linear regression is unable to properly handle repeated series of measurements in order to derive a more reliable partition coefficient. Processing the series separately and averaging the resulting partition coefficients results in a very poor statistical utilisation of the existing degrees of freedom (GC measurements). Processing all series in a single regression may give unnecessarily high uncertainty on the resulting partition coefficient, as illustrated in Section 4.

The objective of the paper is to derive a GC data processing method for PRV free of the above mentioned limitations, i.e. delivering statistically unbiased values of the partition coefficients together with tight and reliable confidence intervals, by making the best possible use of the available GC measurements.

2. Partition coefficient and confidence interval calculation methods

Three existing methods of PRV data processing (linear regression [2], nonlinear regression [4] and parametric [3]) are reviewed and extended in order to allow confidence interval determination for the resulting partition coefficient. All methods are presented in two versions, which differ in the use of the multiple data series. In the first version, the whole data set, including possible replicates, is used in one regression. This is only possible if all samples are processed over a short time period, such that the GC calibration remains unchanged, and the initial concentration in all vials is strictly identical. In


the second version, each replicate series of measurements is allowed to have a different GC calibration and a different initial volatile concentration, but contributes to a unique partition coefficient.

2.1. Basic PRV equation

The gas to matrix partition coefficient ($K_{gm}$) of a volatile compound is defined as the ratio between the gas and the matrix concentrations of the compound at thermodynamic equilibrium:

$$K_{gm} = \frac{C_g}{C_m} \tag{1}$$

This is also one of the possible definitions of the (dimensionless) Henry’s law constant and the reciprocal of the parameter K as defined by Ettre et al. [2].

According to the PRV procedure, the target volatile compound is first diluted in a matrix at initial concentration $C_m^0$. Various volumes of the matrix are introduced in closed vials until thermodynamic equilibrium is reached. The experimental measurements are GC peak areas, which are proportional to the concentration of the target compound in the gas phase. Based on mass balance considerations, it was theoretically shown, e.g. by Ettre et al. [2], that the GC peak areas are given by:

$$A = \frac{F C_m^0}{K_{gm}^{-1} + \beta} \tag{2}$$

where F is the response factor of the detector and β is the volume ratio of the gas and matrix phases in the vial:

$$\beta = \frac{V_g}{V_m} \tag{3}$$
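As an illustration only, Eq. (2) and (3) can be written as two small Python functions. The function names and the numerical values of $FC_m^0$ and $K_{gm}$ below are hypothetical; the gas volume is assumed to be the vial volume minus the matrix volume, which is consistent with the 22.4 mL vials and volume ratios reported in Section 3.2.

```python
def phase_ratio(vial_volume, matrix_volume):
    """Eq. (3): beta = V_g / V_m, assuming V_g = vial volume - matrix volume."""
    return (vial_volume - matrix_volume) / matrix_volume


def peak_area(beta, fc0, k_gm):
    """Eq. (2): A = F*C_m^0 / (1/K_gm + beta); fc0 stands for the product F*C_m^0."""
    return fc0 / (1.0 / k_gm + beta)


# 2 mL of matrix in a 22.4 mL vial (volumes taken from Section 3.2)
beta = phase_ratio(22.4, 2.0)
# Hypothetical values of F*C_m^0 and K_gm, chosen for illustration only
area = peak_area(beta, fc0=1000.0, k_gm=0.08)
```

Only the product $FC_m^0$ enters Eq. (2), which is why neither the detector calibration nor the initial concentration needs to be known separately.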

2.2. Partition coefficient determination by linear regression

In the classical linear regression method, the basic PRV equation (Eq. (2)) is rearranged to be linear with respect to the phase ratio β:

$$\frac{1}{A} = x_1 \beta + x_2 \tag{4}$$

where

$$x_1 = \frac{1}{F C_m^0} \tag{5}$$

$$x_2 = \frac{1}{F C_m^0 K_{gm}} \tag{6}$$

2.2.1. Method $L_1$: Single data series

Consider a series of M experimental measurements $A_1, \ldots, A_M$, obtained for M volume ratios $\beta_1, \ldots, \beta_M$. The coefficients $x_1$ and $x_2$ are obtained by fitting a straight line according to Eq. (4) [2], i.e. by minimizing with respect to $x_1$ and $x_2$ the fit function:

$$f_{L1}(x_1, x_2) = \sum_{i=1}^{M} \left( x_1 \beta_i + x_2 - \frac{1}{A_i} \right)^2 \tag{7}$$
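The minimization of Eq. (7) has the usual closed-form least-squares solution. The following is a minimal sketch in plain Python, not the authors' Matlab code; the function name `fit_l1` and the noise-free test data are hypothetical. Dividing Eq. (5) by Eq. (6) shows that the partition coefficient is the ratio $x_1/x_2$, which the function also returns.

```python
def fit_l1(betas, areas):
    """Least-squares fit of 1/A = x1*beta + x2 (Eq. (4)); returns x1, x2 and x1/x2."""
    n = len(betas)
    y = [1.0 / a for a in areas]          # reciprocals of the peak areas
    mean_b = sum(betas) / n
    mean_y = sum(y) / n
    sxx = sum((b - mean_b) ** 2 for b in betas)
    sxy = sum((b - mean_b) * (yi - mean_y) for b, yi in zip(betas, y))
    x1 = sxy / sxx                        # slope, Eq. (5)
    x2 = mean_y - x1 * mean_b             # intercept, Eq. (6)
    return x1, x2, x1 / x2


# Hypothetical noise-free data generated from Eq. (2)
fc0_true, k_true = 1000.0, 0.08
betas = [5.0, 10.0, 20.0, 50.0, 100.0]
areas = [fc0_true / (1.0 / k_true + b) for b in betas]
x1, x2, k_est = fit_l1(betas, areas)
```

With noise-free data the fit recovers the generating value of $K_{gm}$ exactly (up to rounding); with noisy data the reciprocal transformation distorts the error distribution, which is the source of the bias discussed in the Introduction.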


The partition coefficient is then determined from Eq. (5) and (6):

$$K_{gm} = \frac{x_1}{x_2} \tag{8}$$

The standard deviation $\sigma_K$ for $K_{gm}$ can be determined approximately from the covariance matrix of $x_1$ and $x_2$. The covariance matrix can be calculated as described in most standard textbooks on parameter identification, e.g. by Bury [5], and is also returned by many commonly used regression software packages. The formula for $\sigma_K$ is derived in Appendix A. A confidence interval for $K_{gm}$ can be determined as:

$$K_{min/max} = K_{gm} \pm t_{(1-\alpha)/2,\nu}\, \sigma_K \tag{9}$$

where $t_{(1-\alpha)/2,\nu}$ is the inverse of the cumulative Student distribution at confidence level α and ν degrees of freedom. The confidence level is usually selected as:

$$\alpha = 0.95 \tag{10}$$

and the number of degrees of freedom is:

$$\nu = M - 2 \tag{11}$$

since two parameters ($x_1$ and $x_2$) are determined from M measurements. For example, considering a data series of five measurements (five volume ratios), there would be 3 degrees of freedom and the width of the 95% confidence interval would be ±3.18 times the standard deviation.

This method of determination of $K_{gm}$ and of the associated confidence interval essentially relies on the following assumptions:

(AS1) The response factor of the detector F and the initial target compound concentration $C_m^0$ are constant for all measurements in the data series.
(AS2) The experimental error for the reciprocal of the peak area 1/A is normally distributed with zero mean and constant variance.
(AS3) The experimental error for the phase ratio β is negligible.

These assumptions and their implications are further discussed in Section 4.

2.2.2. Method $L_N$: Multiple data series

Consider N series of $M_1, \ldots, M_N$ measurements $A_{ij}$, obtained for the volume ratios $\beta_{ij}$. The number of measurements and the volume ratios need not be the same in all data series. The response factor of the detector F and the initial target compound concentration $C_m^0$ have to be constant for each data series, but may be different among the different series. This situation often arises in practice if the various data series are processed over a long time period (F may change) or if they come from different matrix preparations ($C_m^0$ may change).

Unfortunately, there is no way to arrange Eq. (2) such as to make it linear with respect to the unknown coefficients $x_j$ and to result in a single value of $K_{gm}$ for all data series. By analogy with Eq. (7), the fit equation was written as:

$$f_{LN}(x_1, \ldots, x_N, x_{N+1}) = \sum_{j=1}^{N} \sum_{i=1}^{M_j} \left( x_j \beta_{ij} + \frac{x_j}{x_{N+1}} - \frac{1}{A_{ij}} \right)^2 \tag{12}$$

where

$$x_j = \frac{1}{(F C_m^0)_j} \quad \text{for each data series } j \tag{13}$$

and

$$x_{N+1} = K_{gm} \tag{14}$$

Fitting the unknown coefficients in Eq. (12) requires nonlinear regression software. The standard deviation $\sigma_K$ for $K_{gm}$ can be determined directly as the square root of the last (N + 1, N + 1) element of the covariance matrix of $x_1, \ldots, x_{N+1}$. The approximate covariance matrix can be calculated as described for example by Bury [5] and is also returned by most nonlinear regression software packages. A confidence interval for $K_{gm}$ can be determined from Eq. (9). The number of degrees of freedom ν is equal to the total number of the measurements minus the number of the fitted coefficients:

$$\nu = \sum_{j=1}^{N} M_j - (N + 1) \tag{15}$$
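The degree-of-freedom count of Eq. (15), which reduces to Eq. (11) for a single series, and the interval of Eq. (9) can be sketched as follows. The function names are hypothetical, and the Student multiplier is passed in by the caller rather than computed, so that the sketch needs no statistical library.

```python
def degrees_of_freedom(series_lengths):
    """Eq. (15): total number of measurements minus the N + 1 fitted coefficients.

    For a single series (N = 1) this reduces to Eq. (11), nu = M - 2.
    """
    return sum(series_lengths) - (len(series_lengths) + 1)


def confidence_interval(k_gm, sigma_k, t_multiplier):
    """Eq. (9): K_gm -/+ t * sigma_K, with the Student multiplier supplied by the caller."""
    half_width = t_multiplier * sigma_k
    return k_gm - half_width, k_gm + half_width


nu_single = degrees_of_freedom([5])          # one series of five measurements
nu_joint = degrees_of_freedom([5, 5, 5])     # three series of five measurements
```

Each additional series costs one extra fitted coefficient but contributes all of its measurements, which is why the joint fit gains degrees of freedom.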

For example, considering three data series of five measurements each, there would be 11 degrees of freedom and the width of the confidence interval would be ±2.20 times the standard deviation. In contrast, treating each data series independently would result in three distinct values of $K_{gm}$. Computing a confidence interval from these three values is still possible, but there would be only 2 degrees of freedom and the width of the confidence interval would be ±4.30 times the standard deviation. The standard deviation itself would, however, be different in the two cases.

The assumptions (AS1), (AS2) and (AS3) are still necessary in order to make the calculation of the partition coefficient and of the associated confidence interval valid.

2.2.3. Partition coefficient determination by nonlinear regression

It appeared in the previous section that processing several data series simultaneously in order to get a single and accurate value of the partition coefficient requires nonlinear regression. A further improvement, with no additional complexity, can be obtained by using the basic PRV equation (Eq. (2)) directly, instead of its rearranged form (Eq. (4)). This also requires nonlinear regression, but the fit function involves the measured peak areas instead of their reciprocals. Thus, assumption (AS2) is replaced by:

(AS4) The measurement error for the peak area A is normally distributed with zero mean and constant variance.


Since the determined values of the partition coefficient are expected to vary over several orders of magnitude (e.g. from $10^{-4}$ to 1) it is numerically more accurate to compute the logarithm of $K_{gm}$ rather than the value of $K_{gm}$ itself. The logarithmic function being monotonic, confidence intervals on $K_{gm}$ are derived directly from the confidence intervals on its logarithm.

2.2.4. Method $N_1$: Single data series

Consider a series of M experimental measurements $A_1, \ldots, A_M$, obtained for M volume ratios $\beta_1, \ldots, \beta_M$, and the fit function derived directly from Eq. (2):

$$f_{N1}(z_1, z_2) = \sum_{i=1}^{M} \left( \frac{\exp(z_1)}{1/\exp(z_2) + \beta_i} - A_i \right)^2 \tag{16}$$

where the unknown coefficients are:

$$z_1 = \log(F C_m^0) \tag{17}$$

and

$$z_2 = \log(K_{gm}) \tag{18}$$

The unknown coefficients and the associated covariance matrix are obtained by numerical minimization of Eq. (16) using nonlinear regression software. The standard deviation $\sigma_L$ for the logarithm of $K_{gm}$ is obtained as the square root of the last (2,2) element of the covariance matrix. The confidence interval is computed as:

$$K_{min/max} = \exp\left( z_2 \pm t_{(1-\alpha)/2,\nu}\, \sigma_L \right) \tag{19}$$
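A minimal sketch of method $N_1$ in plain Python is given below. It is not the authors' Matlab implementation: it assumes a damped Gauss-Newton iteration, the usual approximation of the covariance matrix as $s^2 (J^T J)^{-1}$ with $s^2$ the residual variance, a user-supplied starting point, and a Student multiplier passed in by the caller (3.18 for ν = 3, as quoted above). All names and data values are hypothetical.

```python
import math


def model(z1, z2, beta):
    """Eq. (2) in log coefficients: exp(z1) / (1/exp(z2) + beta)."""
    return math.exp(z1) / (math.exp(-z2) + beta)


def sse(z1, z2, betas, areas):
    """Fit function of Eq. (16)."""
    return sum((model(z1, z2, b) - a) ** 2 for b, a in zip(betas, areas))


def normal_terms(z1, z2, betas, areas):
    """Entries of J^T J and J^T r for the residuals r_i = model_i - A_i."""
    s11 = s12 = s22 = g1 = g2 = 0.0
    for b, a in zip(betas, areas):
        m = model(z1, z2, b)
        d1 = m                                          # d(model)/d(z1)
        d2 = m * math.exp(-z2) / (math.exp(-z2) + b)    # d(model)/d(z2)
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
        g1 += d1 * (m - a)
        g2 += d2 * (m - a)
    return s11, s12, s22, g1, g2


def fit_n1(betas, areas, z_start, t_multiplier, max_iter=100):
    """Minimise Eq. (16); return K_gm and its confidence interval from Eq. (19)."""
    z1, z2 = z_start
    for _ in range(max_iter):
        s11, s12, s22, g1, g2 = normal_terms(z1, z2, betas, areas)
        det = s11 * s22 - s12 * s12
        dz1 = -(s22 * g1 - s12 * g2) / det              # Gauss-Newton step
        dz2 = -(s11 * g2 - s12 * g1) / det
        step, current = 1.0, sse(z1, z2, betas, areas)
        # Halve the step until the fit function no longer increases
        while step > 1e-10 and sse(z1 + step * dz1, z2 + step * dz2, betas, areas) > current:
            step *= 0.5
        z1, z2 = z1 + step * dz1, z2 + step * dz2
        if abs(step * dz1) + abs(step * dz2) < 1e-12:
            break
    s11, s12, s22, _, _ = normal_terms(z1, z2, betas, areas)
    nu = len(areas) - 2                                  # Eq. (11)
    s2 = sse(z1, z2, betas, areas) / nu                  # residual variance
    sigma_l = math.sqrt(s2 * s11 / (s11 * s22 - s12 * s12))   # (2,2) element
    half = t_multiplier * sigma_l
    return math.exp(z2), (math.exp(z2 - half), math.exp(z2 + half))


# Hypothetical data: Eq. (2) with F*C_m^0 = 1000 and K_gm = 0.08, perturbed by +/-0.5 %
betas = [5.0, 10.0, 20.0, 50.0, 100.0]
factors = [1.005, 0.995, 1.005, 0.995, 1.005]
areas = [f * 1000.0 / (12.5 + b) for f, b in zip(factors, betas)]
k_est, (k_low, k_high) = fit_n1(betas, areas, (math.log(800.0), math.log(0.1)), 3.18)
```

Because the interval is built on the logarithm, it is asymmetric around the estimate and can never contain negative values, unlike the interval of Eq. (9).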

The number of degrees of freedom ν is given by Eq. (11), as in the case of the linear regression.

2.2.5. Method $N_N$: Multiple data series

Consider N series of peak area measurements, as in the case of method $L_N$. The fit function given by Eq. (12) is replaced by Eq. (20) below, in order to take into account assumption (AS4) instead of (AS2):

$$f_{NN}(z_1, \ldots, z_N, z_{N+1}) = \sum_{j=1}^{N} \sum_{i=1}^{M_j} \left( \frac{\exp(z_j)}{1/\exp(z_{N+1}) + \beta_{ij}} - A_{ij} \right)^2 \tag{20}$$

where

$$z_j = \log(F C_m^0)_j \quad \text{for each data series } j \tag{21}$$

and

$$z_{N+1} = \log(K_{gm}) \tag{22}$$

The confidence interval for Kgm is determined similarly to Eq. (19), with zN+1 replacing z2 . The number of degrees of freedom ν is given by Eq. (15), as for the linear regression.
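The point estimate of method $N_N$ can be sketched without a general-purpose optimizer by noting that, for a fixed $K_{gm}$, Eq. (20) is linear in each $\exp(z_j)$, so each series scale factor has a closed-form least-squares value. The sketch below substitutes this profiling technique (a coarse scan over $\log K_{gm}$ followed by a golden-section refinement) for the nonlinear regression software used in the paper; it returns only the estimate and ν, not the covariance matrix. The search range, the function names and the data are assumptions made for illustration.

```python
import math


def profile_sse(log_k, series):
    """Eq. (20) with every exp(z_j) replaced by its least-squares optimum for fixed K_gm."""
    inv_k = math.exp(-log_k)
    total = 0.0
    for betas, areas in series:
        g = [1.0 / (inv_k + b) for b in betas]
        fc0 = sum(a * gi for a, gi in zip(areas, g)) / sum(gi * gi for gi in g)
        total += sum((fc0 * gi - a) ** 2 for a, gi in zip(areas, g))
    return total


def fit_nn(series, k_min=1e-6, k_max=10.0, n_grid=400):
    """Return the K_gm minimising Eq. (20) and the degrees of freedom of Eq. (15)."""
    lo_all, hi_all = math.log(k_min), math.log(k_max)
    width = (hi_all - lo_all) / n_grid
    grid = [lo_all + i * width for i in range(n_grid + 1)]
    best = min(range(n_grid + 1), key=lambda i: profile_sse(grid[i], series))
    lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, n_grid)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > 1e-10:                      # golden-section refinement
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if profile_sse(c, series) < profile_sse(d, series):
            hi = d
        else:
            lo = c
    nu = sum(len(areas) for _, areas in series) - (len(series) + 1)
    return math.exp(0.5 * (lo + hi)), nu


# Hypothetical noise-free data: three series sharing K_gm = 0.08 but with different F*C_m^0
betas = [5.0, 10.0, 20.0, 50.0, 100.0]
series = [(betas, [fc0 / (12.5 + b) for b in betas]) for fc0 in (900.0, 1000.0, 1100.0)]
k_est, nu = fit_nn(series)
```

The three series differ only by a scale factor, so a joint fit with a common scale factor would misfit all of them, whereas the separate scale factors absorb the differences and leave a single shared $K_{gm}$.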


2.3. Parametric method for partition coefficient determination

Linear as well as nonlinear regressions can be avoided altogether by using the fact that the unknown parameter $FC_m^0$ can be eliminated from the basic PRV equation (Eq. (2)), by considering a couple of measurements ($A_1$ and $A_2$) and the associated values of the phase ratio ($\beta_1$ and $\beta_2$):

$$K_{gm} = \frac{A_1 - A_2}{\beta_2 A_2 - \beta_1 A_1} \tag{23}$$

This idea was originally used in the EPICS method [1] and suggested for the PRV method by Chai and Zhu [3], where the selection of the phase ratios in order to increase the accuracy of the partition coefficient is discussed in detail. For this calculation to be valid, assumptions (AS1) and (AS3) must hold, and the two measurements must come from the same data series and must have been performed for different values of the phase ratio β. Application of this method to all possible couples of measurements that satisfy the above condition (let this number be Q) results in a relatively large set of different values for $K_{gm}$:

$$\{K_{gm1}, K_{gm2}, \ldots, K_{gmQ}\} \tag{24}$$

The exact statistical properties of this set are not known. If the set is sufficiently large, however, a robust estimator for the partition coefficient value would be its median, and an approximate confidence interval is given by empirical quantiles. For example, for a 95% confidence interval one should take the interval between the 2.5 and the 97.5 percentiles.

2.3.1. Method $P_1$: Single data series

If the data set consists of a single series of M peak area measurements performed for different values of β, then the number of possible couples, and hence the number of the calculated values for the partition coefficient, is:

$$Q = C_M^2 = \frac{M!}{2!\,(M-2)!} \tag{25}$$

For example, 10 different values for $K_{gm}$ could be calculated from a data series of five measurements. After sorting these values, the estimated value for $K_{gm}$ would be the average of the fifth and the sixth values, the lower bound of the 95% confidence interval would be the smallest value and the upper bound the largest value.

2.3.2. Method $P_N$: Multiple data series

If the data set consists of N data series of $M_j$ measurements with potentially different values of $FC_m^0$, couples can only be taken within series but not across (assumption AS1). The number of the calculated values for the partition coefficient is:

$$Q = \sum_{j=1}^{N} C_{M_j}^2 = \sum_{j=1}^{N} \frac{M_j!}{2!\,(M_j-2)!} \tag{26}$$


For example, from three data series of five measurements each, 30 different values for Kgm could be calculated. After sorting these values in ascending order, the estimated value for Kgm would be the average of the 15th and the 16th values, the lower bound of the 95% confidence interval would be 1/4 way between the smallest and the second smallest value, and the upper bound would be 3/4 way between the second largest and the largest value.
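The parametric procedure can be sketched in plain Python as follows. The percentile rule used here treats the i-th sorted value as the 100(i − 0.5)/Q percentile, interpolates linearly and clamps at the extremes; it is chosen because it reproduces the worked examples above (10 and 30 values), and it is an assumption that it matches the rule used in the original calculations. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def pairwise_k(betas, areas):
    """All K_gm values from Eq. (23) for couples taken within one data series."""
    values = []
    for (b1, a1), (b2, a2) in combinations(zip(betas, areas), 2):
        if b1 != b2:                       # the two phase ratios must differ
            values.append((a1 - a2) / (b2 * a2 - b1 * a1))
    return values


def percentile(sorted_values, p):
    """Empirical percentile: the i-th sorted value is the 100*(i - 0.5)/Q percentile."""
    n = len(sorted_values)
    pos = min(max(n * p / 100.0 + 0.5, 1.0), float(n))   # 1-based fractional rank
    low = int(math.floor(pos))
    if low >= n:
        return sorted_values[-1]
    frac = pos - low
    return sorted_values[low - 1] + frac * (sorted_values[low] - sorted_values[low - 1])


def parametric_estimate(series):
    """Median and 95% empirical interval of the pooled values (methods P1 and PN)."""
    pooled = sorted(k for betas, areas in series for k in pairwise_k(betas, areas))
    return percentile(pooled, 50.0), (percentile(pooled, 2.5), percentile(pooled, 97.5))


# Hypothetical noise-free series generated from Eq. (2) with K_gm = 0.08
betas = [5.0, 10.0, 20.0, 50.0, 100.0]
areas = [1000.0 / (12.5 + b) for b in betas]
k_values = pairwise_k(betas, areas)
k_med, (k_low, k_high) = parametric_estimate([(betas, areas)])
```

Passing several `(betas, areas)` series to `parametric_estimate` pools the couples formed within each series, as required by assumption (AS1).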

3. Experimental

3.1. Volatile organic compounds

This work was based on experiments with an aroma composed of 12 volatile organic compounds in a solvent, propylene glycol at 80% (w/w). The aroma was then diluted in pure water at 0.3% (w/w). The volatile organic compounds were: cis-3-hexenol, decanoic acid, diacetyl, ethyl acetate, ethyl butyrate, ethyl hexanoate, ethyl octanoate, γ-decalactone, limonene, linalool, methyl cinnamate and vanillin. Volatile organic compounds were purchased from Aldrich, France.

3.2. Sample preparation

Vials of 22.4 mL (Chromacol, France) were filled with different volumes of matrix: 0.05, 0.2, 0.5 and 2 mL. The corresponding volume ratios were 447, 111.00, 43.80 and 10.20. All experiments were performed in triplicate.

3.3. GC apparatus

Gas to matrix partition coefficients were determined at 25 °C. The equilibrium state in vials was reached after 12 h at 25 °C. A syringe took 2 mL of headspace gas that was injected with an automatic headspace sampler CombiPal (CTC Analytics, Switzerland) into a gas chromatography-flame-ionization (GC-FID) system (HP6890; Agilent Technologies, Waldbronn, Germany) heated at 250 °C. Volatile organic compounds were transferred in a semi-capillary column of 30 m length, 0.53 mm internal diameter, 1 µm film thickness (BP20 Carbowax, Interchim, France). The carrier gas was helium at a flow rate of 8.6 mL/min. For the FID detector, H2 and air flow rates were 40 and 450 mL/min, respectively. The oven program was 37.5 min long: starting at 50 °C, 4 °C/min up to 70 °C;

Table 1
Comparison of air–water partition coefficients and their 95% confidence intervals calculated using three different data processing methods

| Molecule | Linear regression (Ettre et al. [2]) a | Linear regression method L1 | Nonlinear regression method N1 | Parametric method P1 |
|---|---|---|---|---|
| Benzene (45 °C) | 0.345 | 0.344 [0.332 0.355] | 0.344 [0.329 0.360] | 0.346 [0.336 0.357] |
| Benzene (60 °C) | 0.441 | 0.439 [0.370 0.508] | 0.407 [0.351 0.473] | 0.411 [0.382 0.481] |
| Benzene (70 °C) | 0.585 | 0.583 [0.463 0.704] | 0.658 [0.507 0.854] | 0.615 [0.528 0.750] |
| Benzene (80 °C) | 0.602 | 0.601 [0.530 0.672] | 0.588 [0.522 0.663] | 0.588 [0.535 0.647] |
| Butanone-2 (45 °C) | 0.0069 | 0.0069 [0.0039 0.0099] | 0.0069 [0.0042 0.0113] | 0.0070 [0.0029 0.0156] |
| Butanone-2 (60 °C) | 0.0145 | 0.0144 [0.0115 0.0173] | 0.0142 [0.0112 0.0254] | 0.0142 [0.0108 0.0158] |
| Butanone-2 (70 °C) | 0.0196 | 0.0195 [0.0153 0.0237] | 0.0199 [0.0156 0.0244] | 0.0213 [0.0175 0.0244] |
| Butanone-2 (80 °C) | 0.0270 | 0.0269 [0.0218 0.0319] | 0.0269 [0.0207 0.0350] | 0.0273 [0.0206 0.0403] |
| Chlorobenzene (45 °C) | 0.235 | 0.234 [0.220 0.248] | 0.228 [0.216 0.242] | 0.225 [0.224 0.242] |
| Chlorobenzene (60 °C) | 0.312 | 0.311 [0.308 0.314] | 0.312 [0.306 0.318] | 0.311 [0.308 0.316] |
| Chlorobenzene (70 °C) | 0.391 | 0.389 [0.282 0.496] | 0.432 [0.327 0.621] | 0.451 [0.338 0.525] |
| Chlorobenzene (80 °C) | 0.538 | 0.537 [0.402 0.672] | 0.571 [0.350 0.932] | 0.547 [0.416 0.775] |
| Toluene (45 °C) | 0.422 | 0.420 [0.388 0.453] | 0.404 [0.351 0.464] | 0.411 [0.371 0.434] |
| Toluene (60 °C) | 0.538 | 0.538 [0.493 0.583] | 0.532 [0.484 0.585] | 0.529 [0.493 0.565] |
| Toluene (70 °C) | 0.658 | 0.658 [0.424 0.893] | 0.829 [0.471 1.457] | 0.728 [0.569 1.134] |
| Toluene (80 °C) | 0.826 | 0.825 [0.671 0.979] | 0.766 [0.539 1.090] | 0.791 [0.638 0.997] |

Experimental data from Ettre et al. [2].
a For direct comparison with the air–water partition coefficients ($K_{gm}$) considered in this paper, the reciprocals of the original water–air partition coefficients (K) from Ettre et al. [2] are given.


5 °C/min up to 170 °C; 8 °C/min up to 220 °C and 6 min at 220 °C.

3.4. Numerical software

All calculations described in this paper were performed using Matlab 7 (The MathWorks, MA) and the associated Statistical Toolbox. Interested readers could perform similar calculations using most available statistical software packages, such as Statistica (StatSoft Inc., OK), StatGraphics*Plus (Manugistics, MAR), SPlus (Insightful Corp., WA), SAS (SAS Institute GmbH, Germany), etc.

4. Results and discussion

4.1. Comparison of methods on previously published results

The first experimental data set was chosen from the classical work on PRV by Ettre et al. [2]. Published chromatographic peak areas are averages of two measurements, and individual measurements are not given. Hence, this data set was processed as if it contained no replicates, with methods $L_1$, $N_1$ and $P_1$, and the results are presented in Table 1.

A clear added value of the data processing methods considered in this paper is the presence of confidence intervals, as quality indicators of the calculated partition coefficient values. For example, in the case of benzene at 45 °C and chlorobenzene at 60 °C the width of the 95% confidence interval is less than 10% of the most probable value, indicating quite reliable partition coefficients. Conversely, large confidence intervals are associated with a high scatter of the measured GC areas around the fitted theoretical models and should be regarded as a warning concerning the reliability of the results. For example, confidence intervals for butanone-2 at 45 °C and toluene at 70 °C exceed 50% of the most probable value. Most of the compounds listed in Table 1 fall between these extremes.

It appears from Table 1 that, when the confidence intervals are small, the most probable values given by all three calculation methods are in very good agreement. This is expected, since small confidence intervals reflect the fact that the theoretical models fit the data well and all three calculation methods are equivalent in the absence of measurement error. Conversely, large confidence intervals are often associated with large differences between the most probable partition coefficient values among the calculation methods. This arises because large


confidence intervals reflect a poorer fit between measured GC areas and the regression curve, the misfit being interpreted as measurement noise. This is when the differences between the statistical assumptions concerning the measurement noise (i.e. AS2 and AS4) become important, leading to different values of the partition coefficient. In cases when linear and nonlinear regression results disagree, the nonlinear result should in principle be preferred, since it is based on the most natural and widely used assumption concerning the measurement error (assumption AS4). In contrast, the ad-hoc assumption (AS2) was only introduced to accommodate the usage of the linear regression, and is likely to be wrong, the consequence being a statistically biased value of the partition coefficient. The fact that the parametric method, which makes no explicit assumption about the measurement noise, is in most cases in better agreement with the nonlinear method than with the linear one, tends to support this theoretical expectation.

Performing weighted regressions is an alternative way of dealing with the statistical bias, recommended by Ramachandran et al. [4]. Weighted regressions were not considered in this paper because (i) the determination of statistical weights relies on approximations valid only for “small” measurement errors, (ii) the calculation of confidence intervals is more involved and (iii) under assumption (AS4), weighted and unweighted nonlinear regressions are strictly equivalent.

It should be emphasized, however, that in cases when the results obtained with the different methods disagree, the confidence intervals given by all the methods are large, indicating a high uncertainty of the result. This uncertainty could hardly be anticipated considering the high quality of the regression as indicated by the high correlation coefficient given in the original work by Ettre et al. [2]: higher than 0.99 in all cases and higher than 0.9999 in most cases. Rigorous calculation of confidence intervals appears to be far more informative than the correlation coefficient of the regression.

The second data set was selected from the work of Jouquand et al. [6] in order to illustrate different possibilities of treating replicate series of measurements. This paper reports three series of GC peak areas, each reported area being itself an average of three actual measurements, which are not given. The data set was thus treated as if it contained three replicates, instead of the potential nine. The results are given in Table 2.

As far as the value of the partition coefficient is concerned, it appears that any given method (linear, nonlinear or parametric) gives almost identical results, irrespective of the way in which

Table 2
Comparison of the different calculation methods and of the different ways of processing replicates for the air–water partition coefficient of hexanal at 60 °C and the associated 95% confidence interval

| Use of replicates | Linear regression method L* | Nonlinear regression method N* | Parametric method P* |
|---|---|---|---|
| Separate processing of each replicate series followed by average | 0.0835 [0.0773 0.0897] | 0.0785 [0.0743 0.0828] | 0.0774 [0.0636 0.0913] |
| Simultaneous processing of all replicate series with a common $FC_m^0$ value, method *1 | 0.0834 [0.0624 0.1040] | 0.0785 [0.0634 0.0973] | 0.0797 [−0.0002 0.2570] |
| Simultaneous processing of all replicate series with separate $FC_m^0$ values, method *N | 0.0834 [0.0783 0.0885] | 0.0786 [0.0717 0.0861] | 0.0797 [0.0479 0.0940] |

Experimental data from Jouquand et al. [6].


the replicates were processed. For example, the linear regression gives a partition coefficient of hexanal in water at 60 °C very close to 0.0834 in all cases: average of three independent partition coefficients as in the original work [6], a single regression on data from all replicates grouped together (method $L_1$) or a single regression with separate $FC_m^0$ values (method $L_N$). The same holds for the nonlinear regression and the parametric method, with one exception (separate processing of replicate series with the parametric method). Differences among methods are obvious, however, due to different hypotheses on the measurement noise. Again, the nonlinear and the parametric methods agree relatively well with each other, and differ from the linear regression.

In contrast, the way in which the replicate series of measurements are processed has a significant impact on the width of the confidence intervals. Processing all replicate series together with a common $FC_m^0$ appears to be the poorest choice, for all calculation methods. The reason is illustrated in Fig. 1(A) in the case of the nonlinear regression. Even slight experimental variations in either F (e.g. due to processing the vials over a long time period) or $C_m^0$ (e.g. due to different matrix preparations) or both result in “parallel” series of measurements. When processing them together, the unavoidable misfit is interpreted as an abnormally large measurement error, which in turn translates into an unnecessarily large uncertainty on the partition coefficient. Mixing parallel series of measurements together is particularly harmful for the parametric method.

Processing each replicate series either independently or simultaneously with a separate $FC_m^0$ value gives relatively similar confidence intervals for this data set, significantly smaller than in the case of a common $FC_m^0$. The reason is illustrated in Fig. 1(B) in the case of the nonlinear regression: separate $FC_m^0$ values allow a good fit of each of the “parallel” series. Processing replicate series simultaneously with separate $FC_m^0$ should be preferred, however, because the confidence intervals are more reliable, being based on a larger number of degrees of freedom.

When comparing the confidence intervals provided by the three calculation methods, it appears from Table 2 that the parametric method is the most conservative. This could be expected since this method makes no explicit hypothesis on the measurement error. The confidence intervals provided by linear and

nonlinear regression are similar. The nonlinear regression should be preferred, however, since it requires fewer approximations (Appendix A).

4.2. Comparison of methods on new laboratory data

In order to further refine the comparison of the proposed data processing methods, a new data set including 12 molecules with a wide range of volatilities (partition coefficients between $10^{-5}$ and $10^{-1}$) and three replicate series of measurements was generated, as described in Section 3.

In the following, the results obtained using the parametric method are omitted, because the resulting confidence intervals were extremely conservative compared to the other methods. This confirms the results of the previous section and tends to rule out the parametric method. The results obtained by processing replicate series of measurements with the same $FC_m^0$ values are omitted for the same reason. Again, this confirms the conclusions of the previous section and indicates that this way of processing replicates is inappropriate. Table 3 gives the partition coefficient values, together with their confidence intervals, obtained using the linear and the nonlinear regressions. The replicate series of measurements are either processed separately and the results of the three replicates averaged together, or processed simultaneously with separate $FC_m^0$ values (methods $L_N$ and $N_N$).

The 12 molecules considered in this data set can be roughly divided into three groups. The first group includes eight molecules (four ethyl esters, γ-decalactone, limonene, linalool and methyl cinnamate) with partition coefficients of the order of $10^{-3}$ and $10^{-2}$. The partition coefficients of the molecules in this group can be determined relatively reliably by PRV. The second group includes two molecules (cis-3-hexenol and diacetyl) with partition coefficients around $5 \times 10^{-4}$. The volatility of these molecules being relatively low, their partition coefficient can be determined by PRV only very approximately. Mathematically, this arises because in the denominator of Eq. (2) the phase ratio β is small compared to $K_{gm}^{-1}$ and the variations of the chromatographic peak area due to variations of β are not clearly distinguishable from the measurement errors [2]. The third group includes two molecules (decanoic acid and vanillin)

Fig. 1. Nonlinear regressions for the determination of the partition coefficient of hexanal in water at 60 °C using three replicate series of measurements. Data from Jouquand et al. [6]. (A) A unique $FC_m^0$ value for all replicates, method $N_1$. (B) Separate $FC_m^0$ values for each replicate series, method $N_N$.


Table 3. Air–water partition coefficients of 12 volatile organic compounds at 25 °C and associated 95% confidence intervals. Comparison of data processing methods.

| Molecule | Scale factor | Separate linear regressions | Linear regression, method $L_N$ | Separate nonlinear regressions | Nonlinear regression, method $N_N$ |
|---|---|---|---|---|---|
| cis-3-Hexenol | $\times 10^{-4}$ | 5.43 [1.10 9.77] | 5.36 [1.59 9.14] | 5.75 [2.13 9.37] | 5.80 [3.05 11.0] |
| Decanoic acid | $\times 10^{-5}$ | 3.02 [−92 98] | 3.00 [−56 62] | 8.15 [−82 98] | 1.66 [$10^{-15}$ $10^{14}$] |
| Diacetyl | $\times 10^{-4}$ | 6.66 [3.24 10.1] | 6.64 [3.57 9.70] | 7.19 [3.41 10.1] | 7.17 [4.52 11.3] |
| Ethyl acetate | $\times 10^{-3}$ | 9.63 [5.57 13.7] | 9.57 [6.82 12.3] | 8.39 [5.88 10.9] | 8.39 [7.36 9.56] |
| Ethyl butyrate | $\times 10^{-2}$ | 2.31 [1.02 3.60] | 2.27 [1.02 3.51] | 1.79 [1.45 2.13] | 1.79 [1.60 2.01] |
| Ethyl hexanoate | $\times 10^{-2}$ | 3.29 [0.69 5.90] | 3.23 [2.47 3.98] | 3.35 [2.81 3.88] | 3.35 [3.04 3.70] |
| Ethyl octanoate | $\times 10^{-2}$ | 2.01 [−2.01 6.03] | 1.96 [0.44 3.47] | 7.75 [4.29 11.2] | 7.71 [5.75 10.3] |
| γ-Decalactone | $\times 10^{-3}$ | 3.37 [2.79 3.94] | 3.36 [3.00 3.72] | 3.87 [3.30 4.43] | 3.87 [3.12 4.80] |
| Limonene | $\times 10^{-2}$ | 1.04 [0.65 1.42] | 1.03 [0.78 1.28] | 1.17 [1.00 1.34] | 1.17 [1.07 1.28] |
| Linalool | $\times 10^{-3}$ | 3.19 [2.71 3.67] | 3.19 [2.97 3.41] | 2.95 [2.49 3.41] | 2.95 [2.59 3.37] |
| Methyl cinnamate | $\times 10^{-3}$ | 2.44 [1.97 2.91] | 2.44 [2.15 2.73] | 2.52 [1.99 3.05] | 2.51 [2.05 3.08] |
| Vanillin | $\times 10^{-4}$ | 3.13 [−4.02 10.3] | 3.16 [−3.00 9.32] | 3.22 [−3.75 10.2] | 2.99 [0.39 22.9] |

with partition coefficients lower than $5 \times 10^{-4}$. The PRV method cannot be used for the determination of the partition coefficient of such molecules, at least with practically obtainable values of β, because variations of the chromatographic peak areas due to variations of β are overwhelmed by measurement noise.

Considering the molecules in the first group, it appears that the linear and nonlinear regressions give slightly, but consistently, different results. This is expected, due to the different assumptions concerning the measurement errors. In the case of ethyl octanoate, the difference is quite significant, and is due to two outliers in the region of small GC peak areas. When transformed through the reciprocal function, these two data points have a very strong effect on the linear regression (Fig. 2(A)) but almost no effect on the nonlinear regression (Fig. 2(B)). It should be noted that in this case the fit to the theoretical model is poor for the linear regression, resulting in large confidence intervals.

For the molecules in the first group, the average width of the confidence interval, relative to the most probable value of the partition coefficient, is 116% for separate linear regressions, 60% for the unique linear regression, 44% for separate nonlinear regressions and 32% for the unique nonlinear regression. The fact that separate regressions overestimate the confidence interval compared to unique regressions was theoretically expected (but not clearly observed in the previous section)

Fig. 2. Determination of the partition coefficient of ethyl octanoate in water at 25 °C using the same three replicate series of measurements. (A) Linear regression, method $L_N$. (B) Nonlinear regression, method $N_N$.


due to a better utilisation of the degrees of freedom. The fact that nonlinear regressions provide tighter confidence intervals than linear ones might be due to a better match between the measurement error assumption (AS4 rather than AS2) and the actual experimental data. It should be noted, however, that the main goal is the calculation of reliable confidence intervals, rather than small but incorrect ones. In this respect, unique regressions are also theoretically superior to separate ones, since statistical conclusions are more reliable when based on a larger amount of data, and nonlinear regressions are superior to linear ones, because the calculation of the confidence intervals involves fewer approximations (Appendix A).
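The average relative widths quoted above can be checked against the rounded values printed in Table 3. The short sketch below is only an illustration: the dictionary keys and the function name are hypothetical, the numbers are transcribed from the eight first-group rows of Table 3, and the relative width is assumed to be (upper bound − lower bound) divided by the most probable value. Because the table values are rounded, the result should agree with the quoted percentages only to within about one percentage point.

```python
# Rows transcribed from Table 3 for the eight first-group molecules, in the
# order: ethyl acetate, ethyl butyrate, ethyl hexanoate, ethyl octanoate,
# gamma-decalactone, limonene, linalool, methyl cinnamate.
# Each entry is (most probable value, lower bound, upper bound); the scale
# factor is common to the three numbers and cancels in the relative width.
TABLE3_FIRST_GROUP = {
    "separate linear": [
        (9.63, 5.57, 13.7), (2.31, 1.02, 3.60), (3.29, 0.69, 5.90),
        (2.01, -2.01, 6.03), (3.37, 2.79, 3.94), (1.04, 0.65, 1.42),
        (3.19, 2.71, 3.67), (2.44, 1.97, 2.91),
    ],
    "linear L_N": [
        (9.57, 6.82, 12.3), (2.27, 1.02, 3.51), (3.23, 2.47, 3.98),
        (1.96, 0.44, 3.47), (3.36, 3.00, 3.72), (1.03, 0.78, 1.28),
        (3.19, 2.97, 3.41), (2.44, 2.15, 2.73),
    ],
    "separate nonlinear": [
        (8.39, 5.88, 10.9), (1.79, 1.45, 2.13), (3.35, 2.81, 3.88),
        (7.75, 4.29, 11.2), (3.87, 3.30, 4.43), (1.17, 1.00, 1.34),
        (2.95, 2.49, 3.41), (2.52, 1.99, 3.05),
    ],
    "nonlinear N_N": [
        (8.39, 7.36, 9.56), (1.79, 1.60, 2.01), (3.35, 3.04, 3.70),
        (7.71, 5.75, 10.3), (3.87, 3.12, 4.80), (1.17, 1.07, 1.28),
        (2.95, 2.59, 3.37), (2.51, 2.05, 3.08),
    ],
}


def mean_relative_width(rows):
    """Average of (upper - lower) / value over the rows, in percent."""
    widths = [(upper - lower) / value for value, lower, upper in rows]
    return 100.0 * sum(widths) / len(widths)


for method, rows in TABLE3_FIRST_GROUP.items():
    print(f"{method}: {mean_relative_width(rows):.1f}%")
```

The ordering of the four results (separate linear widest, simultaneous nonlinear narrowest) is what the discussion relies on; the ethyl octanoate row dominates the separate linear average because of the two outliers mentioned above.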

5. Conclusion

The paper focuses on the mathematical processing of the chromatographic data obtained by the PRV method in order to derive reliable values of the partition coefficients, together with associated measures of the uncertainty of the result. Three methods of GC data processing proposed in the literature were considered: classical linear regression [2], nonlinear regression [4] and a parametric method [3]. All methods were extended to provide statistical confidence intervals of the resulting partition coefficient. It was theoretically argued and practically verified on several sets of experimental data that the nonlinear regression should be preferred because it provides potentially unbiased values of the partition coefficient, together with tight and reliable confidence intervals.

Three ways of handling repeated series of measurements in order to derive a single, and hopefully more reliable, value of the partition coefficient were examined: processing the repeated series separately, processing them together with a single set of detector calibration and initial volatile concentration ($FC_m^0$) values, and processing them together with separate $FC_m^0$ values. From both theoretical considerations and practical results it appeared that processing repeated series together with separate $FC_m^0$ values should be the preferred approach, because it results in the tightest and most reliable confidence intervals. Thus, the recommended approach is nonlinear regression with simultaneous processing of repeated series of measurements, each with its own detector calibration constant and initial volatile concentration. This approach is certainly more involved than the traditional separate linear regressions for each series of measurements, but should be accessible to users familiar with general-purpose statistical software.

It was also practically found and theoretically justified that, with gas to matrix phase ratios up to about 450, partition coefficient values down to about $5 \times 10^{-4}$ could be measured relatively accurately. Measuring partition coefficients of less volatile molecules would require higher phase ratios, which are experimentally difficult to obtain and could be subject to significant uncertainty. More generally, the practically important question of the optimal selection of the phase ratios in order to improve the accuracy of the partition coefficient (similar to the work done by Chai and Zhu [3] for the parametric method) remains a subject of future research.

Nomenclature

$A$        chromatographic peak area (mV)
$C_g$      concentration of the volatile compound in the gas phase at equilibrium (kg m$^{-3}$)
$C_m$      concentration of the volatile compound in the matrix at equilibrium (kg m$^{-3}$)
$C_m^0$    initial concentration of the volatile in the matrix (kg m$^{-3}$)
$F$        GC detector response factor (mV kg$^{-1}$ m$^3$)
$K_{gm}$   gas to matrix partition coefficient at equilibrium (kg kg$^{-1}$)
$M$        number of different volume ratios in a series of measurements
$N$        number of replicate series of measurements
$Q$        number of couples of measurements
$t$        inverse of the cumulative Student distribution
$V_g$      volume of gas in the vial (m$^3$)
$V_m$      volume of matrix introduced in the vial (m$^3$)
$x$        unknown coefficients in the linear regression
$z$        unknown coefficients in the nonlinear regression

Greek symbols

α          statistical confidence level
β          gas to matrix volume ratio (m$^3$ m$^{-3}$)
ν          number of degrees of freedom
$\sigma_K$ standard deviation of the partition coefficient
$\sigma_L$ standard deviation of the logarithm of the partition coefficient

Acknowledgements

This work was supported by French government project CANAL-ARLE. The authors gratefully acknowledge the scientific advice of its participants.

Appendix A. Approximate standard deviation of the ratio of two dependent random variables

Consider two (not necessarily independent) random variables $x_1$ and $x_2$ with means $\bar{x}_1$ and $\bar{x}_2$ respectively, and covariance matrix $V$:

\[
E\left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right\} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix}
\]

\[
E\left\{ \left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \end{bmatrix} \right) \left( \begin{bmatrix} x_1 & x_2 \end{bmatrix} - \begin{bmatrix} \bar{x}_1 & \bar{x}_2 \end{bmatrix} \right) \right\} = V = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
\]

where $E$ denotes the statistical expectation operator. Let $K$ be a general analytic function of $x_1$ and $x_2$. Near $\bar{x}_1$ and $\bar{x}_2$ it can be approximated by a Taylor series expansion to first


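order:

\[
K(x_1, x_2) \approx K(\bar{x}_1, \bar{x}_2) + \frac{\partial K}{\partial x_1}(\bar{x}_1, \bar{x}_2)\,(x_1 - \bar{x}_1) + \frac{\partial K}{\partial x_2}(\bar{x}_1, \bar{x}_2)\,(x_2 - \bar{x}_2) = A \begin{bmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \end{bmatrix} + B
\]

where

\[
A = \begin{bmatrix} \dfrac{\partial K}{\partial x_1}(\bar{x}_1, \bar{x}_2) & \dfrac{\partial K}{\partial x_2}(\bar{x}_1, \bar{x}_2) \end{bmatrix}
\]

and $B = K(\bar{x}_1, \bar{x}_2)$.

With this approximation, the expected value for $K$ and its variance are:

\[
\bar{K} = E\{K(x_1, x_2)\} \approx E\left\{ A \begin{bmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \end{bmatrix} + B \right\} = B = K(\bar{x}_1, \bar{x}_2)
\]

\[
\sigma_K^2 = E\{(K - \bar{K})^2\} \approx E\left\{ A \begin{bmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \end{bmatrix} \begin{bmatrix} x_1 - \bar{x}_1 & x_2 - \bar{x}_2 \end{bmatrix} A^T \right\} = A V A^T
\]

In the special case of

\[
K(x_1, x_2) = \frac{x_1}{x_2}
\]

one has:

\[
A = \begin{bmatrix} \dfrac{1}{\bar{x}_2} & \dfrac{-\bar{x}_1}{\bar{x}_2^2} \end{bmatrix}
\]

and the standard deviation for $K$ can be computed as:

\[
\sigma_K \approx \sqrt{A V A^T}
\]

References

[1] A.H. Lincoff, J.M. Gosset, in: W. Brutsaert, G.H. Jirka (Eds.), Gas Transfer at Water Surfaces, Reidel, Dordrecht, 1984, p. 17.
[2] L.S. Ettre, C. Welter, B. Kolb, Chromatographia 35 (1993) 73.
[3] X.S. Chai, J.Y. Zhu, J. Chromatogr. A 799 (1998) 207.
[4] B.R. Ramachandran, J.M. Allen, A.M. Halpern, Anal. Chem. 68 (1996) 281.
[5] K.V. Bury, Statistical Distributions in Engineering, Cambridge University Press, New York, 1999.
[6] C. Jouquand, V. Ducruet, P. Giampaoli, Food Chem. 85 (2004) 467.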
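As a numerical illustration of the ratio formula of Appendix A, the following sketch evaluates the first-order standard deviation of K = x1/x2 from the means and the covariance matrix of x1 and x2. The function name and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper's data. The clamp to zero before the square root is an added safeguard against rounding, not part of the derivation.

```python
import math


def ratio_std(x1_mean, x2_mean, var1, var2, cov12):
    """First-order standard deviation of K = x1 / x2, i.e. sqrt(A V A^T)."""
    # Row vector of partial derivatives of x1 / x2 evaluated at the means.
    a = [1.0 / x2_mean, -x1_mean / x2_mean ** 2]
    # Covariance matrix V of (x1, x2).
    v = [[var1, cov12], [cov12, var2]]
    # Quadratic form A V A^T, written out as a double sum.
    var_k = sum(a[i] * v[i][j] * a[j] for i in range(2) for j in range(2))
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(var_k, 0.0))


# Illustrative (invented) values: x1 = 2.0 +/- 0.2 and x2 = 4.0 +/- 0.4.
uncorrelated = ratio_std(2.0, 4.0, 0.04, 0.16, 0.0)
# Same standard deviations, but perfectly correlated errors (cov = 0.2 * 0.4).
fully_correlated = ratio_std(2.0, 4.0, 0.04, 0.16, 0.08)

print(f"sigma_K, uncorrelated:     {uncorrelated:.4f}")      # about 0.0707
print(f"sigma_K, fully correlated: {fully_correlated:.4f}")  # 0.0000
```

The second case shows why the covariance term matters: when both variables carry the same relative error with perfect positive correlation, the errors cancel in the ratio, so ignoring the covariance would overstate the uncertainty of K.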