Variance Estimation in a Random Coefficients Model
by Ekkehart Schlicht

Paper presented at the Econometric Society European Meeting, Munich 1989

published by www.semverteilung.vwl.uni-muenchen.de

Variance Estimation in a Random Coefficients Model*
by
Ekkehart Schlicht
Darmstadt Institute of Technology
Schloss, 6100 Darmstadt, West Germany

(c) November 1988, revised March 1989

ABSTRACT

Consider the regression model

    y_t = a_t' x_t + u_t ,    u_t \sim N(0, \sigma^2) ,    t = 1, 2, ..., T,

with y_t \in R, x_t \in R^n observations, a_t \in R^n coefficients to be estimated, and u_t \in R normal disturbances for the time periods t = 1, 2, ..., T. The coefficients are assumed to be generated by a random walk with normal disturbances v_t \in R^n:

    a_t = a_{t-1} + v_t ,    v_t \sim N(0, \Sigma) ,    t = 1, 2, ..., T.

The variance-covariance matrix \Sigma is assumed diagonal:

    \Sigma = diag(\sigma_1^2, \sigma_2^2, ..., \sigma_n^2) .

Thus the variances in the model are \sigma^2 and \Sigma, or (\sigma^2, \sigma_1^2, ..., \sigma_n^2). This paper develops a method for estimating these variances by means of certain "expected statistics estimators". These estimators are compared to maximum likelihood estimators.

Ekkehart Schlicht
Darmstadt Institute of Technology
Schloss, 6100 Darmstadt, West Germany

(c) November 1988, revised March 1989

Comments welcome

The research reported in this paper has been financially supported by the Deutsche Forschungsgemeinschaft. I also thank Daniela Diekmann, Theo Dijkstra, Walter Krämer, and Ralf Pauly for valuable comments and suggestions.

Typesetting by Christine Woerlein

Introduction

Consider the regression model

(1)    y_t = a_t' x_t + u_t ,    u_t \sim N(0, \sigma^2) ,

with y_t \in R, x_t \in R^n observations, a_t \in R^n coefficients to be estimated, and u_t \in R normal disturbances for the time periods t = 1, 2, ..., T. The coefficients are assumed to be generated by a random walk with normal disturbances v_t \in R^n:

(2)    a_t = a_{t-1} + v_t ,    v_t \sim N(0, \Sigma) ,    t = 1, 2, ..., T.

The variance-covariance matrix \Sigma is assumed diagonal:

(3)    \Sigma = diag(\sigma_1^2, \sigma_2^2, ..., \sigma_n^2) .

Thus the variances in the model are \sigma^2 and \Sigma, or (\sigma^2, \sigma_1^2, ..., \sigma_n^2).

The estimation problem is the following: Given the observations (x_1, x_2, ..., x_T) and (y_1, y_2, ..., y_T), how to estimate the time path of the coefficients (a_1, a_2, ..., a_T) and the variances \sigma^2 and \Sigma?
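As a concrete illustration of the model (1)-(3), the following small simulation sketch (in Python) generates coefficients and observations. The sample size, the variances, and the distribution assumed for x_t are illustrative choices only and are not taken from the paper.

```python
import numpy as np

# Simulation of the model (1)-(3).  The values of T, n, the variances, and the
# distribution of x_t are illustrative assumptions only.
rng = np.random.default_rng(0)

T, n = 100, 2
sigma2 = 0.1                    # var(u_t)
Sigma = np.diag([0.1, 0.01])    # diagonal covariance of the coefficient shocks v_t, Eq. (3)

a = np.zeros((T, n))
a[0] = rng.normal(size=n)       # arbitrary starting coefficients (assumption)
for t in range(1, T):
    a[t] = a[t - 1] + rng.multivariate_normal(np.zeros(n), Sigma)   # random walk, Eq. (2)

x = rng.uniform(0.0, 1.0, size=(T, n))            # regressors (assumed uniform)
u = rng.normal(0.0, np.sqrt(sigma2), size=T)      # disturbances u_t
y = np.einsum("ti,ti->t", a, x) + u               # observation equation, Eq. (1)
```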

The main difficulty here is to obtain estimates for the variances. Once the variances are determined, it is relatively easy to give estimates for the coefficients, either by recursive Kalman filtering or, still easier, by the method described in Schlicht (1985, 52-56).

One possibility would be to estimate the variances by the maximum likelihood method. The purpose of this paper is to propose a variance estimator which compares favorably to the maximum likelihood estimator in several respects:

- it is asymptotically equivalent to the maximum likelihood estimator;
- it is computationally much easier to implement;
- it has a direct intuitive interpretation also in small samples;
- and it seems to work better in small samples.

The plan of the paper is as follows: Part 1 gives some notation and preliminary results. Part 2 introduces the "expected statistics" estimators and compares them with maximum likelihood estimators. The appendix gives a numerical illustration.

1.  The Model

1.1  Notation

Define

    y := (y_1, y_2, ..., y_T)' ,    u := (u_1, u_2, ..., u_T)'        of order T x 1,

    X := diag(x_1', x_2', ..., x_T')        of order T x Tn,

    a := (a_1', a_2', ..., a_T')'        of order Tn x 1,

and write (1) as

(6)    y = X a + u .

Define further v := (v_2', v_3', ..., v_T')' of order (T-1)n x 1 and the (T-1)n x Tn first-difference matrix

    P := \begin{pmatrix} -I & I & & & \\ & -I & I & & \\ & & \ddots & \ddots & \\ & & & -I & I \end{pmatrix} ,

which permits us to write (2) as

(7)    P a = v ,    v \sim N(0, S) ,    S := I_{T-1} \otimes \Sigma .

Denote further by e_i \in R^n the i-th column of an n x n identity matrix and define

    P_i := (I_{T-1} \otimes e_i')\, P ,    i = 1, 2, ..., n,

which permits us to write

    v_i = P_i\, a ,

where v_i = (v_{2i}, v_{3i}, ..., v_{Ti})' denotes the time path of the change in the i-th coefficient.
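The stacked objects just defined can be written down explicitly. The following sketch builds X, P, and P_i as dense arrays for small T and n; the block layout is inferred from the orders stated above and is meant only as a reading aid.

```python
import numpy as np

def build_X(x):
    """T x Tn matrix with x_t' in the t-th block of the t-th row, so that y = Xa + u (Eq. 6)."""
    T, n = x.shape
    X = np.zeros((T, T * n))
    for t in range(T):
        X[t, t * n:(t + 1) * n] = x[t]
    return X

def build_P(T, n):
    """(T-1)n x Tn first-difference matrix, so that Pa = v (Eq. 7)."""
    P = np.zeros(((T - 1) * n, T * n))
    for t in range(T - 1):
        P[t * n:(t + 1) * n, t * n:(t + 1) * n] = -np.eye(n)
        P[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = np.eye(n)
    return P

def build_Pi(T, n, i):
    """(T-1) x Tn matrix picking the changes of the i-th coefficient, so that P_i a = v_i."""
    P = build_P(T, n)
    return P[[t * n + i for t in range(T - 1)], :]

# Small consistency check
T, n = 5, 2
rng = np.random.default_rng(1)
x, a = rng.normal(size=(T, n)), rng.normal(size=(T, n))
assert np.allclose(build_X(x) @ a.ravel(), np.einsum("ti,ti->t", a, x))
assert np.allclose(build_P(T, n) @ a.ravel(), np.diff(a, axis=0).ravel())
```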

1.2  A Likelihood Function

Consider now the time averages of the coefficients,

(13)    \bar a := \frac{1}{\sqrt T}\,\sum_{t=1}^{T} a_t ,

normalized so that the matrix Z below satisfies Z'Z = I. By using the Tn x n matrix

    Z := \frac{1}{\sqrt T}\,(I, I, ..., I)' ,

\bar a can be expressed also as

(14)    \bar a = Z' a .

We note

(15)    P Z = 0 ,    Z'Z = I ,    P'(PP')^{-1} P + Z Z' = I .

Define the Tn x Tn matrix

(16)    G := \begin{pmatrix} P \\ Z' \end{pmatrix} .

Eqs. (7) and (14) can be combined now to

(17)    G a = \begin{pmatrix} v \\ \bar a \end{pmatrix} .

Since G^{-1} = (P'(PP')^{-1}, Z), this can be solved for a:

(18)    a = P'(PP')^{-1} v + Z \bar a .

Inserting this into (6) yields

(19)    y = X Z \bar a + w ,    w := X P'(PP')^{-1} v + u .

Thus (19) stands for a standard GLS regression in the time averages \bar a of the coefficients, and it is reasonable to assume that XZ has full rank:

(21)    rank(XZ) = n .

The disturbances w in (19) are normally distributed,

(22)    w \sim N(0, V) ,    V := X P'(PP')^{-1} S (PP')^{-1} P X' + \sigma^2 I .

The likelihood function associated with (19) is therefore

(23)    L(\bar a, \sigma^2, \sigma_1^2, ..., \sigma_n^2) := \log\det V + (y - XZ\bar a)'\, V^{-1} (y - XZ\bar a) .

Minimization with respect to \bar a yields the Aitken estimate

(24)    \hat{\bar a} = (Z'X'V^{-1}XZ)^{-1}\, Z'X'V^{-1} y .

We may thus view \hat{\bar a} as a function of the variances and the observations and insert it into (23) in order to obtain a concentrated likelihood function

(25)    L^*(\sigma^2, \sigma_1^2, ..., \sigma_n^2) := L(\hat{\bar a}, \sigma^2, \sigma_1^2, ..., \sigma_n^2) + constants ,

which could be used, in principle, to determine the variances. This can, however, be simplified considerably.
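For small T, the matrix V of (22), the Aitken estimate (24), and the concentrated likelihood can be evaluated directly with dense linear algebra, as in the following sketch; the normalization Z'Z = I of (15) is used, and \sigma^2 and \Sigma are taken as given. The helpers build_X and build_P are those sketched in Section 1.1.

```python
import numpy as np

# Dense evaluation of V (22), the Aitken estimate (24), and the concentrated
# likelihood (35) for given variances.

def build_X(x):
    T, n = x.shape
    X = np.zeros((T, T * n))
    for t in range(T):
        X[t, t * n:(t + 1) * n] = x[t]
    return X

def build_P(T, n):
    P = np.zeros(((T - 1) * n, T * n))
    for t in range(T - 1):
        P[t * n:(t + 1) * n, t * n:(t + 1) * n] = -np.eye(n)
        P[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = np.eye(n)
    return P

def gls_in_time_averages(x, y, sigma2, Sigma):
    T, n = x.shape
    X, P = build_X(x), build_P(T, n)
    Z = np.kron(np.ones((T, 1)), np.eye(n)) / np.sqrt(T)   # PZ = 0, Z'Z = I, Eq. (15)
    S = np.kron(np.eye(T - 1), Sigma)                      # Cov(v) = S
    G = P.T @ np.linalg.inv(P @ P.T)                       # P'(PP')^{-1}
    V = X @ G @ S @ G.T @ X.T + sigma2 * np.eye(T)         # Eq. (22)
    Vinv = np.linalg.inv(V)
    XZ = X @ Z
    a_bar = np.linalg.solve(XZ.T @ Vinv @ XZ, XZ.T @ Vinv @ y)   # Aitken estimate, Eq. (24)
    w_hat = y - XZ @ a_bar
    L_star = np.linalg.slogdet(V)[1] + w_hat @ Vinv @ w_hat      # concentrated likelihood
    return a_bar, L_star
```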

1.3  Estimates for the Coefficients

For given \bar a, y, and X, the system (18), (22) defines the conditional normal distribution of a, with mode and expectation equal to

(26)    E(a \mid \bar a, y) .

We replace the parameter \bar a by its estimate \hat{\bar a} and take the resulting expression as our estimate \hat a of the coefficients a:

(27)    \hat a := E(a \mid \bar a, y)\,\big|_{\bar a = \hat{\bar a}} .

This estimate can be represented also in a different way.

Proposition 1 (Schlicht 1985, 55-56). The estimate \hat a in (27) satisfies

(28)    M \hat a = X' y ,

where

(29)    M := X'X + \sigma^2 P' S^{-1} P

is nonsingular.

Proof. Eq. (28) is proved by evaluating the left-hand side explicitly, which leads to the result X'y.

In order to prove nonsingularity of M, consider its rank: if (X', P') were not of full rank, there would exist vectors c_t \in R^n, t = 1, 2, ..., T, not all of them zero, such that

(30)    X c = 0    and    (31)    P c = 0 ,    c := (c_1', c_2', ..., c_T')' ,

are satisfied. Because of (15), c = ZZ'c + P'(PP')^{-1}Pc; by (31), the second component vanishes (note that P' is of full rank (T-1)n, so that (PP')^{-1} exists), so that all the c_t are equal to the common vector \bar c := Z'c. If (30) is premultiplied by Z'X', this leads to Z'X'XZ\bar c = 0, which implies, together with (21), \bar c = 0 and hence c_1 = c_2 = ... = c_T = 0. This proves the proposition.

In view of Prop. 1, the estimate \hat a can be given a direct descriptive characterization: it minimizes the weighted sum of squares

(32)    u'u + \sum_{i=1}^{n} \frac{\sigma^2}{\sigma_i^2}\, v_i' v_i .

This minimization is, for given variances, equivalent to the minimization of the expression

(33)    Q := u'u + \sigma^2 v' S^{-1} v = (y - Xa)'(y - Xa) + \sigma^2 a' P' S^{-1} P a .

Eq. (28) is just the first-order condition for a minimum of Q with respect to a.

1.4  Another Representation of Likelihood

We may define the estimated disturbances associated with the estimated coefficients in a natural way:

(34)    \hat u := y - X\hat a ,    \hat v := P\hat a ,    \hat v_i := P_i \hat a ,  i = 1, 2, ..., n,    \hat w := X P'(PP')^{-1} \hat v + \hat u .

All these are functions of the variances (and the observations). We may insert them into (32) and obtain the estimated sum of squares as a function of the variances:

    \hat Q(\sigma^2, \Sigma) := \hat u'\hat u + \sigma^2\, \hat v' S^{-1} \hat v .

Proposition 2 (Schlicht 1985, 55). The concentrated likelihood function L^*, as defined in Eq. (25), is equivalently given by

(35)    L^*(\sigma^2, \Sigma) = \log\det V + \hat Q/\sigma^2 .

Proof. The first terms in (23) and (35) are identical. We must prove that the second term in (23) is equal to \hat Q/\sigma^2. From (19), (24), and (33) we find for this term

(36)    (y - XZ\hat{\bar a})'\, V^{-1} (y - XZ\hat{\bar a}) = \hat w' V^{-1} \hat w .

Using the definition of V and the relation X'\hat u = \sigma^2 P' S^{-1} \hat v, which can be derived from (28), (29), and (33), this reduces to

(37)    \hat w' V^{-1} \hat w = \frac{1}{\sigma^2}\, \hat w' \hat u = \frac{1}{\sigma^2}\,\big(\hat u'\hat u + \sigma^2 \hat v' S^{-1} \hat v\big) = \hat Q/\sigma^2 ,

which completes the proof.

1.5  Notes on Computation of the Maximum Likelihood Estimates

The representation (35) of the likelihood function makes it possible to actually do maximum likelihood estimation, since an inversion of V is avoided. The determinant of V can be determined in practice since each element of V can be expressed by a simple formula (Schlicht 1985, 57-78). The sum of squares \hat Q is also rather easy to compute since it requires, basically, solving the system (28) for \hat a. The matrix M is a very simple symmetric band matrix (its bandwidth is of the order of n), and the system can be solved accurately and efficiently by a Cholesky decomposition.

When actually doing these computations, however, I repeatedly encountered the problem that the likelihood function was rather badly behaved for short time series. An example is provided in the appendix. Further, an intuitive understanding of the estimation procedure seemed hard to obtain. This led to the development of another kind of estimator, which is described in the following part of the paper.
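The computations just described can be sketched as follows. This is a dense version for clarity; a production code would exploit the band structure of M with a banded routine (e.g. scipy.linalg.cholesky_banded), which is only hinted at in the comments.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Sketch of the computations described above: build M (29)/(45), solve M a_hat = X'y (28)
# by a Cholesky decomposition M = BB', and evaluate Q_hat (33) and log det M.
# Dense arrays are used for clarity; in practice M is a symmetric band matrix and a
# banded Cholesky should be used instead.

def coefficient_estimate(X, P, y, sigma2, sigma_i2):
    T = y.shape[0]
    Sinv = np.kron(np.eye(T - 1), np.diag(1.0 / np.asarray(sigma_i2)))   # S^{-1}, S = I (x) Sigma
    M = X.T @ X + sigma2 * P.T @ Sinv @ P                                # Eq. (29)
    B = cholesky(M, lower=True)                                          # M = B B'
    a_hat = solve_triangular(B.T, solve_triangular(B, X.T @ y, lower=True))   # Eq. (28)
    logdet_M = 2.0 * np.sum(np.log(np.diag(B)))                          # log det M from the factor
    u_hat = y - X @ a_hat
    v_hat = P @ a_hat
    Q_hat = u_hat @ u_hat + sigma2 * (v_hat @ Sinv @ v_hat)              # Eq. (33) at a_hat
    return a_hat, Q_hat, logdet_M
```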

2.  Variance Estimation

2.1  The Heuristic Argument

The estimated coefficients \hat a, along with the estimated disturbances, are random variables. Their distribution is determined by the true variances along with the observations. We may write, for instance,

(38)    \hat a = M^{-1} X' y = M^{-1} X' (X a + u)

by using (28) and (6). This gives \hat a in terms of the true coefficients a and the true disturbances. Since

(39)    X'(Xa + u) = X'Xa + X'u + \sigma^2 P'S^{-1}Pa - \sigma^2 P'S^{-1}v = Ma + X'u - \sigma^2 P'S^{-1}v

(where v = Pa from (7)), Eq. (38) can be re-written as

(40)    \hat a = a + M^{-1}\big(X'u - \sigma^2 P'S^{-1}v\big) .

Premultiplication of (40) with P_i yields

(41)    \hat v_i = v_i + P_i M^{-1}\big(X'u - \sigma^2 P'S^{-1}v\big) .

Similarly, \hat u = y - X\hat a = X(a - \hat a) + u can be formed, and

(42)    \hat u = u - X M^{-1}\big(X'u - \sigma^2 P'S^{-1}v\big)

is obtained.

Thus \hat u and \hat v_1, \hat v_2, ..., \hat v_n are linear functions of the normal random variables u and v, and we may calculate the expectations of the squared errors:

(43)    E(\hat u'\hat u) = \sigma^2\,\big(T - tr(X M^{-1} X')\big)

(44)    E(\hat v_i'\hat v_i) = (T-1)\,\sigma_i^2 - \sigma^2\, tr\big(P_i M^{-1} P_i'\big) ,    i = 1, 2, ..., n .

In deriving (43) and (44) we note that

(45)    M = X'X + \sigma^2 P'S^{-1}P = X'X + \sum_{i=1}^{n} \frac{\sigma^2}{\sigma_i^2}\, P_i'P_i

and that E(\xi'\xi) = E(tr(\xi\xi')) for any random vector \xi.

The expectations (43) and (44) are functions of the variances and the observations:

(46)    f_0(\sigma^2, \Sigma) := \sigma^2 - \frac{\sigma^2}{T}\, tr(X M^{-1} X') = E\Big(\frac{1}{T}\,\hat u'\hat u\Big)

(47)    f_i(\sigma^2, \Sigma) := \sigma_i^2 - \frac{\sigma^2}{T-1}\, tr\big(P_i M^{-1} P_i'\big) = E\Big(\frac{1}{T-1}\,\hat v_i'\hat v_i\Big) ,    i = 1, 2, ..., n .

On the other hand, the estimated errors \hat v_i and \hat u are functions of the variances and the observations, too, and the corresponding "empirical variances" can be written as functions of the theoretical variances again:

(48)    \hat m_0(\sigma^2, \Sigma) := \frac{1}{T}\, \hat u'\hat u

(49)    \hat m_i(\sigma^2, \Sigma) := \frac{1}{T-1}\, \hat v_i'\hat v_i ,    i = 1, 2, ..., n .

The proposed estimation procedure is to select variances \hat\sigma^2 and \hat\Sigma such that the "empirical variances" (48), (49) are just equal to the corresponding expectations (46) and (47):

(50)    \hat m_i(\hat\sigma^2, \hat\Sigma) = f_i(\hat\sigma^2, \hat\Sigma) ,    i = 0, 1, 2, ..., n .

We call these estimators "expected statistics estimators". The intuition underlying these estimators is straightforward: we select the variances such that some observed statistics, namely the values of the moments (48) and (49), are just equal to their expectations under the assumption that the postulated variances are the true variances.
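The two sides of the estimation equations (50), the empirical variances (48)-(49) and their expectations (46)-(47), can be evaluated at trial variances as in the following sketch. X, P, and the P_i are the stacked matrices of Section 1.1 (for instance built as in the earlier sketch), and the trace formulas follow the expressions given above.

```python
import numpy as np

# Both sides of the estimation equations (50): the empirical variances (48)-(49)
# and their expectations (46)-(47), evaluated at trial variances.

def expected_statistics_equations(X, P, P_list, y, sigma2, sigma_i2):
    T = y.shape[0]
    Sinv = np.kron(np.eye(T - 1), np.diag(1.0 / np.asarray(sigma_i2)))
    M = X.T @ X + sigma2 * P.T @ Sinv @ P                      # Eq. (45)
    Minv = np.linalg.inv(M)
    a_hat = Minv @ (X.T @ y)                                   # Eq. (28)
    u_hat = y - X @ a_hat

    m = [u_hat @ u_hat / T]                                            # Eq. (48)
    f = [sigma2 - sigma2 / T * np.trace(X @ Minv @ X.T)]               # Eq. (46)
    for i, Pi in enumerate(P_list):
        v_i = Pi @ a_hat
        m.append(v_i @ v_i / (T - 1))                                  # Eq. (49)
        f.append(sigma_i2[i] - sigma2 / (T - 1) * np.trace(Pi @ Minv @ Pi.T))   # Eq. (47)
    return np.array(m), np.array(f)   # the estimator picks variances with m == f, Eq. (50)
```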

Before we proceed to analyze our variance estimators further, a small digression on the underlying estimation principle might be in place.

2.2  Some Remarks on the Method of Expected Statistics

The method of expected statistics is obviously a simple generalization of the well-known method of moments, where theoretical moments are equated to their empirical counterparts. It actually leads to very familiar results in many cases, as the following two examples might indicate.

1. The Parameters of a Normal Distribution. Consider a random draw (x_1, x_2, ..., x_n) from a normal population with unknown mean \mu and unknown variance \sigma^2. In order to employ the method of expected statistics, we need two statistics. Take the mean and the variance

(51)    \bar x := \frac{1}{n}\sum_{i=1}^{n} x_i ,        (52)    s^2 := \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar x)^2 .

Since the x_i are normally distributed, \bar x and s^2 are random variables with the expectations

(53)    E(\bar x) = \mu        and        (54)    E(s^2) = \frac{n-1}{n}\,\sigma^2 .

Equating (51) with (53) and (52) with (54) gives the estimators for \mu and \sigma^2:

(55)    \hat\mu = \bar x ,        (56)    \hat\sigma^2 = \frac{n}{n-1}\, s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2 ,

which are just the usual unbiased moment estimators.
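A tiny numerical illustration of this first example (assuming, as in (51)-(52) above, that the sample variance is defined with divisor n; the sample itself is arbitrary):

```python
import numpy as np

# Expected-statistics estimation of mu and sigma^2 from a normal sample.
# With x_bar and s2 defined with divisor n, E(x_bar) = mu and E(s2) = (n-1)/n * sigma^2,
# so equating statistics and expectations gives the usual unbiased estimators.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=50)    # illustrative sample (mu = 2, sigma = 3)

n = x.size
x_bar = x.mean()                                # Eq. (51)
s2 = ((x - x_bar) ** 2).mean()                  # Eq. (52), divisor n

mu_hat = x_bar                                  # from (51) = (53)
sigma2_hat = n / (n - 1) * s2                   # from (52) = (54)
print(mu_hat, sigma2_hat)
```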

2. Parameter Estimation in the Classical Regression Model. Consider, as a further example, the classical regression problem

(57)    y = Y\beta + \varepsilon ,    \varepsilon \sim N(0, \sigma^2 I) ,

with \beta \in R^n, \varepsilon \in R^T, y \in R^T, and Y a real T x n matrix. Observations are Y and y, and the parameters \beta and \sigma^2 are to be estimated.

We may calculate the expectation of the empirical cross-products Y'y:

(58)    E(Y'y) = Y'Y\beta .

This is equated to the observed vector Y'y and yields the least squares estimate

(59)    \hat\beta = (Y'Y)^{-1} Y'y .

We may further calculate the expected variance of the estimated error \hat\varepsilon = y - Y\hat\beta = (I - Y(Y'Y)^{-1}Y')\varepsilon, which is

(60)    E(\hat\varepsilon'\hat\varepsilon) = (T - n)\,\sigma^2 .

Equating this expectation with the calculated value of \hat\varepsilon'\hat\varepsilon yields the usual best quadratic unbiased estimator

(61)    \hat\sigma^2 = \frac{\hat\varepsilon'\hat\varepsilon}{T - n} .

In a similar but less straightforward fashion we may also obtain the GLS estimators via expected statistics, and we could interpret the Aitken estimator (24) for \bar a along these lines.
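A corresponding sketch for the classical regression example; the data-generating values used are arbitrary illustrations.

```python
import numpy as np

# Expected-statistics estimation in the classical regression model y = Y beta + eps:
# equating Y'y to its expectation Y'Y beta gives least squares (59); equating
# eps_hat'eps_hat to its expectation (T - n) sigma^2 gives the unbiased estimator (61).
rng = np.random.default_rng(0)
T, n = 40, 3
Y = rng.normal(size=(T, n))
beta_true = np.array([1.0, -2.0, 0.5])          # illustrative values
y = Y @ beta_true + rng.normal(scale=0.7, size=T)

beta_hat = np.linalg.solve(Y.T @ Y, Y.T @ y)    # Eq. (59)
eps_hat = y - Y @ beta_hat
sigma2_hat = eps_hat @ eps_hat / (T - n)        # Eq. (61)
print(beta_hat, sigma2_hat)
```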

2.3  Another Characterization

Consider the function

(62)    K(\sigma^2, \sigma_1^2, ..., \sigma_n^2) := \log\det M + \frac{\hat Q}{\sigma^2} - T(n-1)\log\sigma^2 + (T-1)\sum_{i=1}^{n}\log\sigma_i^2 ,

which we wish to minimize. We note (using the "envelope theorem" and representation (45)) that

(63)    \frac{\partial}{\partial\sigma^2}\log\det M = \sum_{i=1}^{n}\frac{1}{\sigma_i^2}\, tr\big(P_i M^{-1} P_i'\big) ,    \frac{\partial}{\partial\sigma_i^2}\log\det M = -\frac{\sigma^2}{\sigma_i^4}\, tr\big(P_i M^{-1} P_i'\big) ,    i = 1, 2, ..., n .

Necessary conditions for a minimum of (62) are

(64)    \frac{\partial K}{\partial\sigma^2} = 0        and        (65)    \frac{\partial K}{\partial\sigma_i^2} = 0 ,    i = 1, 2, ..., n .

The first term in (64) is equal to (Tn - tr(XM^{-1}X'))/\sigma^2, and the terms arising from \hat Q/\sigma^2 add to -\hat u'\hat u/\sigma^4. Multiplying by the appropriate powers of the variances, we may thus write instead

(66)    \sigma^2\,\big(T - tr(X M^{-1} X')\big) = \hat u'\hat u ,

(67)    (T-1)\,\sigma_i^2 - \sigma^2\, tr\big(P_i M^{-1} P_i'\big) = \hat v_i'\hat v_i ,    i = 1, 2, ..., n .

Comparing these equations with our estimation equations (46)-(50), we see that they are equivalent. In case K has a unique minimum, we might therefore characterize our variance estimators also as minimizers of K.
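For completeness, the criterion (62), as reconstructed above, can be evaluated numerically as follows; its minimizers can then be compared with the solutions of the moment equations (50). X and P are again the stacked matrices of Section 1.1.

```python
import numpy as np

# Numerical evaluation of the criterion K in (62).  Q_hat and M are recomputed
# for each set of trial variances.

def K_criterion(X, P, y, sigma2, sigma_i2):
    T = y.shape[0]
    n = len(sigma_i2)
    Sinv = np.kron(np.eye(T - 1), np.diag(1.0 / np.asarray(sigma_i2)))
    M = X.T @ X + sigma2 * P.T @ Sinv @ P
    a_hat = np.linalg.solve(M, X.T @ y)
    u_hat, v_hat = y - X @ a_hat, P @ a_hat
    Q_hat = u_hat @ u_hat + sigma2 * (v_hat @ Sinv @ v_hat)
    return (np.linalg.slogdet(M)[1] + Q_hat / sigma2
            - T * (n - 1) * np.log(sigma2)
            + (T - 1) * np.sum(np.log(sigma_i2)))
```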

2.4  Asymptotic Equivalence With Maximum Likelihood Estimators

In this section it will be shown that the "statistics criterion" K, as defined in (62), is asymptotically equivalent to the likelihood criterion L^* as given in (25) or (35). We show that the ratio

(68)    \frac{\log\det M - T(n-1)\log\sigma^2 + (T-1)\sum_{i=1}^{n}\log\sigma_i^2 + \hat Q/\sigma^2}{\log\det V + \hat Q/\sigma^2}

approaches unity if T goes to infinity.

Consider Tn x Tn counterparts \tilde P and \tilde S of P and S, where \tilde P augments P by n further rows such that \tilde P is nonsingular with \det\tilde P = 1, and \tilde S := I_T \otimes \Sigma. Note that \tilde P^{-1} = \tilde P'(\tilde P\tilde P')^{-1}, and consider

    \tilde V := X \tilde P'(\tilde P\tilde P')^{-1} \tilde S (\tilde P\tilde P')^{-1} \tilde P X' + \sigma^2 I ,

which is obtained by substituting P and S in the definition (22) of V by \tilde P and \tilde S. The difference between \tilde V and V tends to zero with increasing T. This implies that

    \det\tilde V / \det V \to 1    for  T \to \infty ,

and we may approximate \det V by \det\tilde V for large T.

Consider now the matrix

    \tilde M := X'X + \sigma^2\, \tilde P'\tilde S^{-1}\tilde P ,

which is obtained by substituting P'S^{-1}P by \tilde P'\tilde S^{-1}\tilde P in the definition (29) of M. The difference between \tilde M and M likewise approaches zero for large T, and we may approximate M by \tilde M for large T.

We are now going to consider how \tilde V and \tilde M are interrelated. Define

(76)    A := X \tilde P^{-1} \tilde S^{1/2} .

We note that

(77)    A A' + \sigma^2 I = \tilde V

and

(78)    \det(A'A + \sigma^2 I) = \det\tilde S \cdot \det\tilde M / (\det\tilde P)^2 .

Denote the T eigenvalues of AA' by \mu_1, \mu_2, ..., \mu_T. These are also eigenvalues of A'A, but A'A has in addition Tn - T zero eigenvalues. The eigenvalues of AA' + \sigma^2 I are \mu_i + \sigma^2, i = 1, 2, ..., T. These are also eigenvalues of A'A + \sigma^2 I, but this matrix has, in addition, the eigenvalue \sigma^2 with multiplicity Tn - T. Since the determinant of a matrix is equal to the product of its eigenvalues, we obtain

(79)    \det(A'A + \sigma^2 I) = (\sigma^2)^{Tn-T}\, \det(AA' + \sigma^2 I)

and, together with (77) and (78),

(80)    \det\tilde S \cdot \det\tilde M / (\det\tilde P)^2 = (\sigma^2)^{T(n-1)}\, \det\tilde V .

Since \det\tilde P = 1 and \det\tilde S = \big(\prod_{i=1}^{n}\sigma_i^2\big)^{T}, we may take logarithms in (80), rearrange terms, and obtain

(82)    \log\det\tilde M + T\sum_{i=1}^{n}\log\sigma_i^2 - T(n-1)\log\sigma^2 + \hat Q/\sigma^2 = \log\det\tilde V + \hat Q/\sigma^2 .

Compare this with (68). For large T we can approximate M by \tilde M, V by \tilde V, and T-1 by T. This establishes the asymptotic equivalence between the maximum likelihood estimators and the expected statistics estimators proposed here.

2.5  Computation

In this section we drop the circumflexes and denote our estimates simply by \sigma^2, \sigma_i^2, etc. Multiply Eq. (64) by \sigma^2 and Eqs. (65) by \sigma_i^2. If we add the resulting equations, we obtain

(83)    \sigma^2 = \frac{Q}{T - n} .

This is inserted into (62), and we obtain a concentrated loss function which involves only the variance ratios

(84)    \rho_i := \frac{\sigma_i^2}{\sigma^2} ,    i = 1, 2, ..., n .

Note that Q and M are functions of these variance ratios rather than of the variances themselves:

(85)    M(\rho) = X'X + \sum_{i=1}^{n}\frac{1}{\rho_i}\, P_i'P_i ,    Q(\rho) = u'u + \sum_{i=1}^{n}\frac{1}{\rho_i}\, v_i'v_i .

Disregarding constants, the resulting loss function can be written as

(86)    H(\rho) = \log\det M(\rho) + (T - n)\log Q(\rho) + (T-1)\sum_{i=1}^{n}\log\rho_i .

We shall refer to this function as the "statistics criterion" henceforth.

The estimation equations (46)-(50) may be expressed in terms of the variance ratios as

(87)    \rho_i = g_i(\rho) ,    g_i(\rho) := \frac{1}{T-1}\Big( (T - n)\,\frac{v_i'v_i}{Q(\rho)} + tr\big(P_i M(\rho)^{-1} P_i'\big) \Big) ,

where v_i'v_i, Q, and M are functions of \rho.

In order to calculate tr(P_i M^{-1} P_i') we use the decomposition M = BB' which has been used for solving the normal equations, and we note that tr(P_i M^{-1} P_i') is equal to the sum of all squared elements of B^{-1}P_i'. We need not store B^{-1} (which is not banded) in order to do this calculation; it is only necessary to compute a few columns of B^{-1}P_i' at a time. In this way we determine g_i(\rho) and update the weights according to

(88)    \rho_i^{(k+1)} = g_i(\rho^{(k)}) .

This process has been found to converge in many examples. (I have not found a single case where (88) did not converge.) It has not been possible up to now, however, to establish that the statistics criterion is globally convex.
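A compact dense sketch of the iteration (87)-(88) on the variance ratios follows. The traces tr(P_i M^{-1} P_i') are taken here from an explicit inverse, which keeps the sketch short; the banded Cholesky scheme described above is the efficient way to obtain them for large T.

```python
import numpy as np

# Fixed-point iteration (88) on the variance ratios rho_i = sigma_i^2 / sigma^2.

def estimate_variance_ratios(X, P, P_list, y, rho0, n_iter=50):
    T = y.shape[0]
    n = len(rho0)

    def pieces(rho):
        Sinv_rho = np.kron(np.eye(T - 1), np.diag(1.0 / rho))   # sigma^2 * S^{-1}
        M = X.T @ X + P.T @ Sinv_rho @ P                        # M(rho), cf. (45) and (85)
        Minv = np.linalg.inv(M)
        a_hat = Minv @ (X.T @ y)                                # Eq. (28)
        u_hat = y - X @ a_hat
        Q = u_hat @ u_hat + sum((Pi @ a_hat) @ (Pi @ a_hat) / r for Pi, r in zip(P_list, rho))
        return Minv, a_hat, Q

    rho = np.asarray(rho0, dtype=float)
    for _ in range(n_iter):                                     # iteration (88)
        Minv, a_hat, Q = pieces(rho)
        rho = np.array([((T - n) * ((Pi @ a_hat) @ (Pi @ a_hat)) / Q
                         + np.trace(Pi @ Minv @ Pi.T)) / (T - 1)          # g_i(rho), Eq. (87)
                        for Pi in P_list])
    _, _, Q = pieces(rho)
    return rho, Q / (T - n)                                     # sigma^2 = Q/(T - n), Eq. (83)
```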

2.6  Comparison With the Maximum Likelihood Estimator

The likelihood (35) may be expressed in terms of the variance ratios by using

(89)    V = \sigma^2\, W(\rho) ,    W(\rho) := X P'(PP')^{-1}\,\big(I_{T-1} \otimes diag(\rho_1, ..., \rho_n)\big)\,(PP')^{-1} P X' + I ,

where W is a function only of the variance ratios. This leads to

(90)    L^* = \log\det W + \frac{Q}{\sigma^2} + T\log\sigma^2 ,

which may be compared with (86). Minimization with respect to \sigma^2 leads to \sigma^2 = Q/T, which may be inserted into (90). We disregard constants and write the resulting likelihood function as

(91)    L^{**}(\rho) = \log\det W(\rho) + T\log Q(\rho) .

This is the "likelihood criterion" which may be compared with the statistics criterion (86). In order to minimize this function, we may calculate the derivatives with respect to \rho_i and put them to zero. The resulting conditions (given in Schlicht 1985, 58) are numerically rather complicated, however, and much less tractable than (87). They involve the inversion of a full (rather than banded) T x T matrix. If T is large, this is practically infeasible; but then the expected statistics estimators, which are much easier to compute, are equivalent, and the estimators proposed here seem better. If T is small, however, we typically encounter convergence problems. It has been observed, as a rule, that the function L^{**} has no reasonable minima if T is small, whereas the minimization of (86) gives at least a definite result. The example given in the appendix illustrates this.
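Both criteria can be evaluated on a grid of variance ratios, as is done in Tables 2 and 3. The following sketch computes the statistics criterion (86) and the likelihood criterion (91) for one point of such a grid (dense linear algebra, small T only).

```python
import numpy as np

# Evaluation of the statistics criterion H (86) and the likelihood criterion L** (91)
# at one vector of variance ratios rho.  W(rho) is V/sigma^2 as in (89).

def criteria(X, P, y, rho):
    T = y.shape[0]
    n = len(rho)
    rho = np.asarray(rho, dtype=float)
    G = P.T @ np.linalg.inv(P @ P.T)                              # P'(PP')^{-1}
    R = np.kron(np.eye(T - 1), np.diag(rho))
    W = X @ G @ R @ G.T @ X.T + np.eye(T)                         # Eq. (89)
    Sinv_rho = np.kron(np.eye(T - 1), np.diag(1.0 / rho))
    M = X.T @ X + P.T @ Sinv_rho @ P
    a_hat = np.linalg.solve(M, X.T @ y)
    u_hat, v_hat = y - X @ a_hat, P @ a_hat
    Q = u_hat @ u_hat + v_hat @ Sinv_rho @ v_hat
    H = np.linalg.slogdet(M)[1] + (T - n) * np.log(Q) + (T - 1) * np.sum(np.log(rho))   # Eq. (86)
    L2 = np.linalg.slogdet(W)[1] + T * np.log(Q)                                        # Eq. (91)
    return H, L2
```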

3. Concluding Comments

The proposed variance estimator seems to be a useful alternative to maximum likelihood estimators. Many questions are still open, uniqueness and consistency in particular. The asymptotic equivalence of the proposed estimator and the maximum likelihood estimator, in conjunction with computational manageability and (arguably) better performance in small samples, might render it even the superior alternative.

Let me conclude with a quite general remark regarding the estimation of the time path of the coefficients in (1) - (3): We cannot recover the coefficients a from the observations on X and y, since there are many more coefficients than data points. We can, however, obtain sensible guesses about the state of the economy, and these are our estimates \hat a as given in (27). They denote the expected mean of the distribution of a, which remains a random variable with non-zero variance even if we enlarge the time horizon and the sample size to infinity.

If we generate data and coefficients according to (1) and (2) on a computer, we may compute estimates \hat\rho for the variance ratios and compare the estimated time path of the coefficients \hat a(\hat\rho) with the estimate \hat a(\rho) we would get if we had used the true variance ratios \rho for computing \hat a. It does not, however, make very much sense to compare \hat a(\hat\rho) with the true time path of the coefficients a, since the true coefficients deviate randomly from their expectation. In Monte Carlo studies we should take not the true coefficients, but rather \hat a(\rho), as the benchmark.

APPENDIX

Assume n = 2, T = 100, \sigma^2 = .1, \sigma_1^2 = .1, and \sigma_2^2 = .01, fix starting values for the coefficients, and generate coefficients according to (2). Let e_t denote a random variable uniformly distributed over a fixed interval, and generate observations x_{1,t} = 1 and x_{2,t} = e_t for all t = 1, ..., 100. Generate a time series of y_t according to (1). A possible outcome is summarized in Table 1.
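A sketch reproducing the design of this example is given below. The starting values of the coefficients and the support of the uniform variable e_t are not stated above, so the choices made in the code are assumptions.

```python
import numpy as np

# Sketch of the appendix design: n = 2, T = 100, sigma^2 = .1, sigma_1^2 = .1,
# sigma_2^2 = .01, x_{1,t} = 1 and x_{2,t} = e_t uniform.  The starting coefficients
# and the support of e_t are assumptions.
rng = np.random.default_rng(0)

T, n = 100, 2
sigma2 = 0.1
Sigma = np.diag([0.1, 0.01])

a = np.zeros((T, n))
a[0] = np.array([5.0, 2.0])                                   # assumed starting values
for t in range(1, T):
    a[t] = a[t - 1] + rng.multivariate_normal(np.zeros(n), Sigma)   # Eq. (2)

x = np.column_stack([np.ones(T), rng.uniform(0.0, 10.0, size=T)])   # assumed support (0, 10)
y = np.einsum("ti,ti->t", a, x) + rng.normal(0.0, np.sqrt(sigma2), size=T)   # Eq. (1)
```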

From x and y we may compute the likelihood criterion (91) and the statistics criterion (86) for alternative variance ratios. This is done in Table 2.

We note that the true variance ratios are \rho_1 = 1 and \rho_2 = .1, and that the minimum both of the likelihood and of the statistics criterion is fairly close to this. (We may further compute the "empirical variances"

    \frac{1}{T-1}\sum_{t=2}^{T}\big(a_{it} - a_{i,t-1}\big)^2

from the data and compute their ratios. These empirical variances and the corresponding empirical variance ratios are also given in the tables.)

If we use only T = 25 rather than T = 100, we obtain Table 3. We see that the two criteria suggest different results. We find in particular that the minimization of the likelihood criterion leads to rather unreasonable corner solutions. It is my impression that this is a quite general phenomenon in small samples, which is even more pronounced when we deal with more than two explanatory variables. The "expected statistics" estimators, on the other hand, do not seem to tend to corner solutions.

Figures 1 and 2 illustrate, finally, the decomposition. Fig. 1 depicts the time path of the true coefficients (light) and the time path of the optimal estimates \hat a(\rho) (heavily drawn curve). Figure 2 depicts the time path of the optimal estimates \hat a(\rho) together with the estimated time path of the coefficients \hat a(\hat\rho), computed with \hat\rho_1 = 7.2948 and \hat\rho_2 = 1.4684 (light). We see that the estimated variance ratios are greater than the true values, and the resulting time paths exhibit slightly more variability than \hat a(\rho). The paths \hat a(\rho) and \hat a(\hat\rho) are qualitatively very similar. We observe also a rather close connection between the true coefficients a and their expectations \hat a(\rho) and \hat a(\hat\rho).

As an aside we note further that the averages of the true coefficients are (4.7953, 1.6742). The estimated averages are \hat{\bar a} = (5.1580, 1.3803). Estimating \bar a by ordinary least squares with time-invariant coefficients yields (6.2160, .3210), which differs significantly from the true averages. Thus the assumption of time-invariant coefficients, although not unreasonable in the example, leads to a considerable underestimation of the influence of the exogenous variable x_2.

APPENDIX B

Expected Statistics Estimators: A Definition

by Ekkehart Schlicht, Technische Hochschule, Schloss, 6100 Darmstadt
September 1989

The expected statistics estimators introduced in the text can be defined as follows.

Consider the model given by the density function f(y | x, \theta), where

    y : endogenous observables,
    x : exogenous observables,
    \theta : exogenous non-observables (parameters).

A statistic is a function

    s = s(y, x) .

Define the expected statistic as

    \bar s(\theta, x) := E\big(s(y, x) \mid x, \theta\big) = \int s(y, x)\, f(y \mid x, \theta)\, dy .

A solution \hat\theta of

    s(y, x) = \bar s(\hat\theta, x)

is termed an expected statistics estimator. The set of solutions to this equation is determined by the model, the statistics selected, and the observations.

If this estimation principle has been proposed somewhere, please let me know!

References

Schlicht, Ekkehart (1985). Isolation and Aggregation in Economics. Berlin, Heidelberg, New York, Tokyo: Springer.

Table 1: Generated data for the example (T = 100).

Table 2: Likelihood criterion and statistics criterion for alternative variance ratios, T = 100. (Values shown are relative; add 411.96 to the likelihood criterion and 825.93 to the statistics criterion to obtain the actual values.) Theoretical variances \sigma^2 = .1, \sigma_1^2 = .1, \sigma_2^2 = .01; variance ratios \rho_1 = 1, \rho_2 = .1. Empirical variances approximately .0958, .0877, and .0096; empirical variance ratios .9233 and .1010.

Table 3: Likelihood criterion and statistics criterion for alternative variance ratios, T = 25. (Values shown are relative; add 70.64 to the likelihood criterion and 174.69 to the statistics criterion to obtain the actual values.) Theoretical variances \sigma^2 = .1, \sigma_1^2 = .1, \sigma_2^2 = .01; variance ratios \rho_1 = 1, \rho_2 = .1. Empirical variances approximately .169, .08, and .005; empirical variance ratios .5363 and .0354.

Figure 1: Time paths of the true coefficients (light) and of the optimal estimates \hat a(\rho) (heavy), T = 100.

Figure 2: Time paths of the optimal estimates \hat a(\rho) and of the estimated coefficients \hat a(\hat\rho) (light), T = 100.