Variance Estimation in a Random Coefficients Model*

by Ekkehart Schlicht
Darmstadt Institute of Technology, Schloss, 6100 Darmstadt, West Germany

Paper presented at the Econometric Society European Meeting, Munich 1989
published by www.semverteilung.vwl.uni-muenchen.de

(c) November 1988, revised March 1989
ABSTRACT

Consider the regression model

    y_t = a_t'x_t + u_t,    t = 1,2,...,T;    u_t ~ N(0, σ²)

with y_t ∈ ℝ, x_t ∈ ℝ^n observations, a_t ∈ ℝ^n coefficients to be estimated, and u_t ∈ ℝ normal disturbances for the time periods t = 1,2,...,T. The coefficients are assumed to be generated by a random walk with normal disturbances v_t ∈ ℝ^n:

    a_t = a_{t-1} + v_t,    t = 1,2,...,T;    v_t ~ N(0, Σ).

The variance-covariance matrix Σ is assumed diagonal:

    Σ = diag(σ_1², σ_2², ..., σ_n²).

Thus the variances in the model are σ² and Σ, or (σ², σ_1², ..., σ_n²). This paper develops a method for estimating these variances by means of certain "expected statistics estimators". These estimators are compared to maximum likelihood estimators.
Comments welcome
*The research reported in this paper has been financially supported by the Deutsche Forschungsgemeinschaft. I also thank Daniela Diekmann, Theo Dijkstra, Walter Krämer and Ralf Pauly for valuable comments and suggestions.
Typesetting by Christine Woerlein
Introduction

Consider the regression model

(1)    y_t = a_t'x_t + u_t,    u_t ~ N(0, σ²)

with y_t ∈ ℝ, x_t ∈ ℝ^n observations, a_t ∈ ℝ^n coefficients to be estimated, and u_t ∈ ℝ normal disturbances for the time periods t = 1,2,...,T. The coefficients are assumed to be generated by a random walk with normal disturbances v_t ∈ ℝ^n:

(2)    a_t = a_{t-1} + v_t,    v_t ~ N(0, Σ).

The variance-covariance matrix Σ is assumed diagonal:

(3)    Σ = diag(σ_1², σ_2², ..., σ_n²).

Thus the variances in the model are σ² and Σ, or (σ², σ_1², ..., σ_n²).

The estimation problem is the following: Given the observations (x_1, x_2, ..., x_T) and (y_1, y_2, ..., y_T), how to estimate the time path of the coefficients (a_1, a_2, ..., a_T) and the variances σ² and Σ?
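For concreteness, the data-generating process (1)-(3) can be simulated directly. The following numpy sketch does this; all numerical values (T, n, the variances, the starting coefficients, the regressor design) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Simulation of the model (1)-(3): y_t = a_t'x_t + u_t with random-walk
# coefficients a_t = a_{t-1} + v_t.  All numbers here are illustrative.
rng = np.random.default_rng(0)

T, n = 100, 2
sigma2 = 0.1                      # observation variance sigma^2
Sigma = np.diag([0.1, 0.01])      # Sigma = diag(sigma_1^2, sigma_2^2)

a = np.empty((T, n))
a[0] = [4.0, 2.0]                 # arbitrary starting coefficients
for t in range(1, T):
    a[t] = a[t - 1] + rng.multivariate_normal(np.zeros(n), Sigma)   # (2)

x = np.column_stack([np.ones(T), rng.uniform(0.0, 10.0, T)])        # regressors
u = rng.normal(0.0, np.sqrt(sigma2), T)                             # u_t ~ N(0, sigma^2)
y = np.sum(a * x, axis=1) + u                                       # (1)
```

A draw like this is what the numerical illustration in the appendix works with.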
The main difficulty here is to obtain estimates for the variances. Once the variances are determined, it is relatively easy to give estimates for the coefficients, either by recursive Kalman filtering or, still easier, by the method described in Schlicht (1985, 52-56).
One possibility would be to estimate the variances by the maximum likelihood method. The purpose of this paper is to propose a variance estimator which compares favorably to the maximum likelihood estimator in several respects:

- it is asymptotically equivalent to the maximum likelihood estimator;
- it is computationally much easier to implement;
- it has a direct intuitive interpretation also in small samples;
- and it seems to work better in small samples.
The plan of the paper is as follows: Part 1 gives some notation and preliminary results. Part 2 introduces the "expected statistics" estimators and compares them with maximum likelihood estimators. The appendix gives a numerical illustration.
1. The Model

1.1 Notation

Define

    y := (y_1, y_2, ..., y_T)'        of order T×1,
    u := (u_1, u_2, ..., u_T)'        of order T×1,
    X := diag(x_1', x_2', ..., x_T')  of order T×Tn,
    a := (a_1', a_2', ..., a_T')'     of order Tn×1,

and write (1) as

(6)    y = Xa + u.

Define further the vector v := (v_2', v_3', ..., v_T')' of order (T-1)n×1 and the difference matrix

    P :=  [ -I   I           ]
          [     -I   I       ]    of order (T-1)n×Tn,
          [         ..   ..  ]
          [           -I   I ]

which permits us to write (2) as

(7)    v = Pa.

Denote further by e_i ∈ ℝ^n the i-th column of an n×n identity matrix and define P_i := (I_{T-1} ⊗ e_i')P of order (T-1)×Tn, which permits us to write

(8)    v_i = P_i a,

where v_i = (v_{2,i}, v_{3,i}, ..., v_{T,i})' denotes the time path of the change in the i-th coefficient.
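The stacked objects just defined can be checked numerically. Here is a minimal numpy sketch, assuming the block structure described above; the toy data are arbitrary.

```python
import numpy as np

# Stacked matrices of Section 1.1: X = diag(x_1', ..., x_T') of order T x Tn,
# P the (T-1)n x Tn first-difference matrix, and P_i = (I kron e_i')P the
# rows of P belonging to the i-th coefficient.
def stacked_matrices(x):
    T, n = x.shape
    X = np.zeros((T, T * n))
    for t in range(T):
        X[t, t * n:(t + 1) * n] = x[t]
    P = np.zeros(((T - 1) * n, T * n))
    for t in range(T - 1):
        P[t * n:(t + 1) * n, t * n:(t + 1) * n] = -np.eye(n)
        P[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = np.eye(n)
    Pi = [P[i::n, :] for i in range(n)]      # P_i, i = 1, ..., n
    return X, P, Pi

x = np.arange(8.0).reshape(4, 2)             # toy data: T = 4, n = 2
X, P, Pi = stacked_matrices(x)
a = np.tile([1.0, 5.0], 4)                   # constant coefficient paths
v = P @ a                                    # v = Pa: all differences vanish
```

For a constant coefficient path the difference vector v = Pa is identically zero, and P_i applied to the stacked path returns exactly the changes of the i-th coefficient.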
1.2 A Likelihood Function

Consider now the time averages of the coefficients, scaled by √T for convenience:

(13)    ā := T^{-1/2} Σ_{t=1}^T a_t.

By using the Tn×n matrix

(14)    Z := T^{-1/2}(I, I, ..., I)',

this can be expressed also as ā = Z'a. We note

(15)    PZ = 0,    Z'Z = I,    P'(PP')^{-1}P + ZZ' = I.

Define the Tn×Tn matrix

(16)    Ξ := (P', Z)'.

Eqs. (7) and (14) can be combined now to

(17)    Ξa = (v', ā')'.

Since Ξ^{-1} = (P'(PP')^{-1}, Z), this can be solved for a:

(18)    a = P'(PP')^{-1}v + Zā.

Inserting this into (6) yields

(19)    y = XZā + w,    w := XP'(PP')^{-1}v + u.

We note that

(20)    E(w) = 0.

Thus (19) stands for a standard GLS regression in the time averages ā of the coefficients, and it is reasonable to assume that XZ has full rank:

(21)    rank(XZ) = n.

The disturbances w in (19) are normally distributed:

(22)    w ~ N(0, V),    V := XP'(PP')^{-1}S(PP')^{-1}PX' + σ²I,

where S := E(vv') = I_{T-1} ⊗ Σ. The likelihood function associated with (19) is therefore

(23)    L(ā, σ², σ_1², ..., σ_n²) := log det V + (y - XZā)'V^{-1}(y - XZā).

Minimization with respect to ā yields the Aitken estimate

(24)    ā̂ = (Z'X'V^{-1}XZ)^{-1}Z'X'V^{-1}y.

We may thus view ā̂ as a function of the variances and the observations and insert it into (23) in order to obtain a concentrated likelihood function

(25)    L*(σ², σ_1², ..., σ_n²) := L(ā̂, σ², σ_1², ..., σ_n²) + constants,

which could be used, in principle, to determine the variances. This can, however, be simplified considerably.
1.3 Estimates for the Coefficients

For given ā, y, and X, the system (18), (22) defines the conditional normal distribution of a, with mode and expectation equal to the conditional mean

(26)    E(a | ā, y).

We replace the parameter ā by its estimate ā̂ and take the resulting expression as our estimate for the coefficients:

(27)    â := E(a | ā̂, y).

This estimate can be represented also in a different way.

Proposition 1 (Schlicht 1985, 55-56). The estimate â in (27) satisfies

(28)    Mâ = X'y,

where

(29)    M := X'X + σ²P'S^{-1}P

is nonsingular.

Proof. Eq. (28) is proved by evaluating the left-hand side explicitly, which leads to the result X'y. In order to prove nonsingularity of M, consider its rank:

(30)    rank M = rank (X', P')'.

If (X', P')' were not of full rank, there would exist vectors c_t ∈ ℝ^n, t = 1,2,...,T, not all of them zero, such that c := (c_1', c_2', ..., c_T')' satisfies

(31)    Xc = 0,    Pc = 0.

By (15), Pc = 0 implies c = Zc̄ for some c̄ ∈ ℝ^n. Hence Xc = XZc̄ = 0, which implies, together with (21), c̄ = 0 and therefore c = 0. This proves the proposition.

In view of Prop. 1, the estimate â can be given a direct descriptive characterization: It minimizes the weighted sum of squares

(32)    u'u + Σ_{i=1}^n (σ²/σ_i²) v_i'v_i.

This minimization is, for given variances, equivalent with the minimization of the expression

(33)    Q := u'u + σ²v'S^{-1}v = (y - Xa)'(y - Xa) + σ²a'P'S^{-1}Pa.

Eq. (28) is just the first-order condition for a minimum of Q with respect to a.
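Proposition 1 can be verified numerically. The sketch below builds the stacked matrices under the conventions of Section 1.1 (toy data and arbitrary variances, chosen for illustration only), solves (28), and checks that â indeed minimizes Q.

```python
import numpy as np

# Numerical check of Proposition 1: the minimizer of
# Q = u'u + sigma^2 v'S^{-1}v satisfies M a = X'y with
# M = X'X + sigma^2 P'S^{-1}P (Eqs. (28), (29), (33)).
rng = np.random.default_rng(1)
T, n = 30, 2
sigma2, sig_i2 = 0.5, np.array([0.2, 0.05])

x = rng.normal(size=(T, n))
y = rng.normal(size=T)

X = np.zeros((T, T * n))
for t in range(T):
    X[t, t * n:(t + 1) * n] = x[t]
P = np.zeros(((T - 1) * n, T * n))
for t in range(T - 1):
    P[t * n:(t + 1) * n, t * n:(t + 1) * n] = -np.eye(n)
    P[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = np.eye(n)
S = np.kron(np.eye(T - 1), np.diag(sig_i2))        # S = I kron Sigma

M = X.T @ X + sigma2 * P.T @ np.linalg.inv(S) @ P  # Eq. (29)
a_hat = np.linalg.solve(M, X.T @ y)                # Eq. (28)

def Q(a):                                          # Eq. (33)
    u, v = y - X @ a, P @ a
    return u @ u + sigma2 * v @ np.linalg.solve(S, v)

# Q is strictly convex, so no perturbation can decrease it at a_hat
print(all(Q(a_hat + 1e-3 * rng.normal(size=T * n)) >= Q(a_hat)
          for _ in range(5)))                      # True
```

Since M is positive definite, Q is strictly convex and (28) characterizes the unique minimizer.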
1.4 Another Representation of the Likelihood

We may define the estimated disturbances associated with the estimated coefficients in a natural way:

(34)    û := y - Xâ,    v̂ := Pâ,    v̂_i := P_i â  (i = 1,2,...,n),    ŵ := XP'(PP')^{-1}v̂ + û.

All these are functions of the variances (and the observations). We may insert them into (32) and obtain the estimated sum of squares as a function of the variances:

    Q̂ := û'û + σ²v̂'S^{-1}v̂.

Proposition 2 (Schlicht 1985, 55). The concentrated likelihood function L*, as defined in Eq. (25), is equivalently given by

(35)    L*(σ², σ_1², ..., σ_n²) = log det V + Q̂/σ².

Proof. The first terms in (23) and (35) are identical. We must prove that the second term in (23) is equal to Q̂/σ². From (19), (24), and (34) we find for this term ŵ'V^{-1}ŵ. Using the definition of V and the relation X'û = σ²P'S^{-1}v̂, which can be derived from (28), (29), and (34), this reduces to Q̂/σ², which completes the proof.
1.5 Notes on the Computation of Maximum Likelihood Estimates

The representation (35) of the likelihood function makes it possible to actually do maximum likelihood estimation, since an inversion of V is avoided. The determinant of V can be determined practically, since each element of V can be expressed by a simple formula (Schlicht 1985, 57-78). The sum of squares Q̂ is also rather easy to compute, since it requires, basically, solving the system (28) for â. The matrix M is a very simple symmetric band matrix of band width (n-1). The system can be solved accurately and efficiently by a Cholesky decomposition.

When actually doing these computations, I repeatedly encountered the problem, however, that the likelihood function was rather badly behaved for short time series. An example is provided in the appendix. Further, an intuitive understanding of the estimation procedure seemed hard to obtain. This led to the development of another kind of estimator, which will be described in the following part of the paper.
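The Cholesky step just described can be sketched as follows. Plain dense numpy is used for clarity; an efficient implementation would exploit the band structure of M, for instance with scipy.linalg.cholesky_banded. The small matrix below is an arbitrary stand-in for M.

```python
import numpy as np

# Solve M a = X'y via the Cholesky factorization M = B B'.
def solve_normal_equations(M, Xty):
    B = np.linalg.cholesky(M)          # M = B B', B lower triangular (banded)
    z = np.linalg.solve(B, Xty)        # forward substitution
    return np.linalg.solve(B.T, z)     # back substitution

M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # a small symmetric band matrix
rhs = np.array([1.0, 2.0, 3.0])
a_hat = solve_normal_equations(M, rhs)
print(np.allclose(M @ a_hat, rhs))     # True
```

The same factor B reappears below in Section 2.5, where it is reused for computing the traces tr(P_i M^{-1} P_i').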
2. Variance Estimation

2.1 The Heuristic Argument

The estimated coefficients â along with the estimated disturbances are random variables. Their distribution is determined by the true variances along with the observations. We may write, for instance,

(38)    â = M^{-1}X'y = M^{-1}X'(Xa + u)

by using (28) and (6). This gives â in terms of the true coefficients a and the true disturbances. Since

(39)    X'(Xa + u) = (X'X + σ²P'S^{-1}P)a + X'u - σ²P'S^{-1}Pa

and v = Pa from (7), Eq. (38) can be re-written as

(40)    â = a + M^{-1}(X'u - σ²P'S^{-1}v).

Premultiplication of (40) with P_i yields

(41)    v̂_i = v_i + P_i M^{-1}(X'u - σ²P'S^{-1}v).

Similarly, û = y - Xâ = X(a - â) + u can be formed and

(42)    û = u - XM^{-1}(X'u - σ²P'S^{-1}v)

is obtained. Thus û and v̂_1, v̂_2, ..., v̂_n are linear functions of the normal random variables u and v, and we may calculate the expectations of the squared errors:

(43)    E(û'û) = σ²(T - tr(XM^{-1}X')),

(44)    E(v̂_i'v̂_i) = (T-1)σ_i² - σ²tr(P_i M^{-1}P_i').

In deriving (43) and (44) we note that

(45)    M = X'X + σ²P'S^{-1}P = X'X + Σ_{i=1}^n (σ²/σ_i²)P_i'P_i

and that E(ξ'ξ) = E(tr(ξξ')) for any random vector ξ.

The expectations (43) and (44) are functions of the variances and the observations:

(46)    f_0(σ², Σ) := σ²(1 - (1/T)tr(XM^{-1}X')) = E((1/T)û'û),

(47)    f_i(σ², Σ) := σ_i² - (σ²/(T-1))tr(P_i M^{-1}P_i') = E((1/(T-1))v̂_i'v̂_i),    i = 1,2,...,n.

On the other hand, the estimated errors v̂_i and û are functions of the variances and the observations, too, and the corresponding "empirical variances" can be written as functions of the theoretical variances again:

(48)    m_0(σ², Σ) := (1/T)û'û,

(49)    m_i(σ², Σ) := (1/(T-1))v̂_i'v̂_i,    i = 1,2,...,n.

The proposed estimation procedure is to select variances σ̂² and Σ̂ such that the "empirical variances" (48), (49) are just equal to the corresponding expectations (46), (47):

(50)    m_i(σ̂², Σ̂) = f_i(σ̂², Σ̂),    i = 0,1,2,...,n.

We call these estimators "expected statistics estimators". The intuition underlying these estimators is straightforward: We select the variances such that some observed statistics, namely the values of the moments (48) and (49), are just equal to their expectations under the assumption that the postulated variances are the true variances.

Before we proceed to analyze our variance estimators further, a small digression on the underlying estimation principle might be in place.

2.2 Some Remarks on the Method of Expected Statistics

The method of expected statistics is obviously a simple generalization of the well-known method of moments, where theoretical moments are equated to their empirical counterparts. It leads actually to very familiar results in many cases, as the following two examples might indicate.
1. The Parameters of a Normal Distribution. Consider a random draw (x_1, x_2, ..., x_n) from a normal population with unknown mean μ and unknown variance σ². In order to employ the method of expected statistics, we need two statistics. Take the mean

(51)    x̄ := (1/n) Σ_{i=1}^n x_i

and the variance

(52)    s² := (1/n) Σ_{i=1}^n (x_i - x̄)².

Since x_i is normally distributed, x̄ and s² are random variables with the expectations

(53)    E(x̄) = μ

and

(54)    E(s²) = ((n-1)/n)σ².

Equating (51) with (53) and (52) with (54) gives the estimators for μ and σ²:

(55)    μ̂ = x̄,

(56)    σ̂² = (n/(n-1))s²,

which are just the usual unbiased moment estimators.
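As code, this first example reads as follows; the data vector is arbitrary.

```python
import numpy as np

# Expected statistics for a normal sample: equate the sample mean and the
# sample variance (divisor n) to their expectations mu and (n-1)/n * sigma^2.
x = np.array([2.1, 1.9, 2.4, 2.0, 1.6])
n = len(x)
xbar = x.mean()                        # statistic (51)
s2 = ((x - xbar) ** 2).mean()          # statistic (52), divisor n

mu_hat = xbar                          # from E(xbar) = mu
sigma2_hat = s2 * n / (n - 1)          # from E(s2) = (n-1)/n * sigma^2

print(np.isclose(sigma2_hat, x.var(ddof=1)))   # True
```

The resulting σ̂² coincides with the usual unbiased variance estimator with divisor n-1.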
2. Parameter Estimation in the Classical Regression Model. Consider, as a further example, the classical regression problem

(57)    y = Yβ + ε,    ε ~ N(0, σ²I)

with β ∈ ℝ^n, ε ∈ ℝ^T, y ∈ ℝ^T, and Y a real T×n matrix. Observations are Y and y, and the parameters β and σ² are to be estimated. We may calculate the expectation of the empirical cross-correlations Y'y:

(58)    E(Y'y) = Y'Yβ.

This is equated to the observed vector Y'y and yields the least squares estimate

(59)    β̂ = (Y'Y)^{-1}Y'y.

We may further calculate the expected variance of the estimated error ε̂ = y - Yβ̂ = (I - Y(Y'Y)^{-1}Y')ε, which is

(60)    E(ε̂'ε̂) = (T-n)σ².

Equating this expectation with the calculated value of ε̂'ε̂ yields the usual best quadratic unbiased estimator

(61)    σ̂² = ε̂'ε̂/(T-n).

In a similar but less straightforward fashion we may also obtain the GLS estimators via expected statistics, and we could interpret the Aitken estimator (24) for ā along these lines.
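The regression example, too, fits in a few lines; the toy data below are invented for illustration.

```python
import numpy as np

# Expected statistics in the classical regression model: equating Y'y to its
# expectation Y'Y b gives OLS (59); equating e'e to (T - n) sigma^2 gives the
# unbiased variance estimator (61).
rng = np.random.default_rng(3)
T, n = 50, 2
Y = rng.normal(size=(T, n))
beta = np.array([1.0, -2.0])
y = Y @ beta + rng.normal(0.0, 0.5, T)

beta_hat = np.linalg.solve(Y.T @ Y, Y.T @ y)   # from E(Y'y) = Y'Y beta
resid = y - Y @ beta_hat
sigma2_hat = resid @ resid / (T - n)           # from E(e'e) = (T - n) sigma^2
```

Both familiar estimators thus arise from equating an observed statistic to its expectation.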
2.3 Another Characterization

Consider the function

(62)    K(σ², σ_1², ..., σ_n²) := log det M - T(n-1)log σ² + (T-1)Σ_{i=1}^n log σ_i² + Q̂/σ²,

which we wish to minimize. We note (using the "envelope theorem" and representation (45)) that

(63)    (∂/∂σ²) log det M = Σ_{i=1}^n (1/σ_i²) tr(P_i M^{-1}P_i'),
        (∂/∂σ_i²) log det M = -(σ²/σ_i⁴) tr(P_i M^{-1}P_i'),    i = 1,2,...,n.

Necessary conditions for a minimum of (62) are:

(64)    Σ_{i=1}^n (1/σ_i²) tr(P_i M^{-1}P_i') - T(n-1)/σ² - û'û/σ⁴ = 0,

(65)    -(σ²/σ_i⁴) tr(P_i M^{-1}P_i') + (T-1)/σ_i² - v̂_i'v̂_i/σ_i⁴ = 0,    i = 1,2,...,n.

The first term in (64) is equal to (Tn - tr(XM^{-1}X'))/σ². Thus we may write instead

(66)    σ²(1 - (1/T)tr(XM^{-1}X')) = (1/T)û'û,

(67)    σ_i² - (σ²/(T-1))tr(P_i M^{-1}P_i') = (1/(T-1))v̂_i'v̂_i,    i = 1,2,...,n.

Comparing these equations with our estimation equations (46)-(50) we see that they are equivalent. In case K has a unique minimum we might therefore characterize our variance estimators also as minimizers of K.
2.4 Asymptotic Equivalence with Maximum Likelihood Estimators

In this section it will be shown that the "statistics criterion" K, as defined in (62), is asymptotically equivalent to the "likelihood criterion" L* as given in (25) or (35). We show that the ratio

(68)    (log det M - T(n-1)log σ² + (T-1)Σ_{i=1}^n log σ_i² + Q̂/σ²) / (log det V + Q̂/σ²)

approaches unity if T goes to infinity.

Consider the Tn×Tn matrices

(69)    P̃ := [ I             ]
              [ -I   I        ]    ,    S̃ := I_T ⊗ Σ,
              [      ..   ..  ]
              [        -I   I ]

where P̃ is obtained from P by adding the n rows (I, 0, ..., 0) on top, and S̃ is obtained from S = I_{T-1} ⊗ Σ correspondingly. Note that P̃^{-1} = P̃'(P̃P̃')^{-1} and consider

(70)    Ṽ := XP̃'(P̃P̃')^{-1}S̃(P̃P̃')^{-1}P̃X' + σ²I,

which is obtained by substituting P and S in the definition (22) of V by P̃ and S̃. Since P̃ and S̃ differ from P and S only in the leading blocks, we find that the difference

(71)    Ṽ - V

tends to zero with increasing T. This implies that

(72)    det Ṽ / det V → 1    for T → ∞,

and we may approximate det V by det Ṽ for large T.

Consider now the matrix

(74)    M̃ := X'X + σ²P̃'S̃^{-1}P̃,

which is obtained by substituting P'S^{-1}P by P̃'S̃^{-1}P̃ in the definition (29) of M. We note that the influence of the difference

(75)    M̃ - M = σ²(P̃'S̃^{-1}P̃ - P'S^{-1}P)

vanishes for large T, and we may approximate M by M̃ for large T.

We are now going to consider how Ṽ and M̃ are interrelated. Define the matrix

(76)    A := XP̃^{-1}S̃^{1/2}.

We note that

(77)    Ṽ = AA' + σ²I

and

(78)    S̃^{1/2}(P̃')^{-1}M̃P̃^{-1}S̃^{1/2} = A'A + σ²I.

Denote the T eigenvalues of AA' by λ_1, λ_2, ..., λ_T. These are also eigenvalues of A'A, but A'A has in addition Tn - T zero eigenvalues. The eigenvalues of AA' + σ²I are λ_i + σ², i = 1,2,...,T. These are also eigenvalues of A'A + σ²I, but this matrix has, in addition, the eigenvalue σ² with multiplicity Tn - T. Since the determinant of a matrix is equal to the product of its eigenvalues, we obtain

(79)    det(A'A + σ²I) = (σ²)^{Tn-T} det(AA' + σ²I)

and, together with (77) and (78),

(80)    det S̃ · det M̃ / (det P̃)² = (σ²)^{Tn-T} det Ṽ.

Since P̃ is triangular with unit diagonal, det P̃ = 1. We note further that det S̃ = (Π_{i=1}^n σ_i²)^T. We take logarithms in (80), rearrange terms, and obtain

(82)    log det M̃ - T(n-1)log σ² + T Σ_{i=1}^n log σ_i² + Q̂/σ² = log det Ṽ + Q̂/σ².

Compare this with (68). For large T we can approximate M by M̃, V by Ṽ, and (T-1) by T. This establishes the asymptotic equivalence between the maximum likelihood estimators and the expected statistics estimators proposed here.
2.5 Computation

In this section, we drop the circumflexes and denote our estimates simply by σ², σ_i², etc. Multiply Eq. (64) by σ² and Eqs. (65) by σ_i². If we add the resulting equations, we obtain

(83)    σ² = Q/(T-n).

This is inserted into (62) and we obtain a concentrated loss function which involves only the variance ratios

(84)    ρ_i := σ_i²/σ²,    i = 1,2,...,n.

Note that Q and M are functions of these variance ratios rather than of the variances themselves:

(85)    M(ρ) = X'X + Σ_{i=1}^n (1/ρ_i)P_i'P_i,    Q(ρ) = û'û + Σ_{i=1}^n (1/ρ_i)v̂_i'v̂_i.

Disregarding constants, the resulting loss function can be written as

(86)    H(ρ) = log det M(ρ) + (T-n)log Q(ρ) + (T-1)Σ_{i=1}^n log ρ_i.

We shall refer to this function as the "statistics criterion" henceforth.

The estimation equations (46)-(50) may be expressed in terms of the variance ratios as

(87)    ρ_i = g_i(ρ),    g_i(ρ) := ((T-n)·v̂_i'v̂_i/Q(ρ) + tr(P_i M(ρ)^{-1}P_i'))/(T-1),    i = 1,2,...,n,

where v̂_i'v̂_i, Q and M are functions of ρ.

In order to calculate tr(P_i M^{-1}P_i'), we use the decomposition M = BB' which has been used for solving the normal equation (28), and we note that tr(P_i M^{-1}P_i') is equal to the sum of all squared elements of B^{-1}P_i'. We need not store B^{-1} (which is not banded) in order to do this calculation; it is only necessary to compute two columns of B^{-1} at a time. In this way, we determine g_i(ρ) and update the weights according to

(88)    ρ_i ← g_i(ρ),    i = 1,2,...,n.

This process has been found to converge in many examples. (I have not found a single case where (88) did not converge.) It has not been possible up to now, however, to establish general concavity of the statistics criterion.
2.6 Comparison with the Maximum Likelihood Estimator

The likelihood (35) may be expressed in terms of the variance ratios by using

(89)    W := V/σ²,

which is a function only of the variance ratios. This leads to

(90)    L* = log det W + Q̂/σ² + T·log σ²,

which may be compared with (86). Minimization with respect to σ² leads to σ² = Q̂/T, which may be inserted into (90). We disregard constants and write the resulting likelihood function as

(91)    L**(ρ) = log det W(ρ) + T·log Q̂(ρ).

This is the "likelihood criterion" which may be compared with the statistics criterion (86). In order to minimize this function, we may calculate the derivatives with respect to ρ_i and put them to zero. The resulting conditions (given in Schlicht 1985, 58) are numerically rather complicated, however, and much less tractable than (87). They involve an inversion of a full (rather than banded) T×T matrix. If T is large, this is practically infeasible; but then the expected statistics estimators, which are much easier to compute, are asymptotically equivalent, and the estimators proposed here seem better. If T is small, however, we typically encounter convergence problems. It has been observed, as a rule, that the function L** has no reasonable minima if T is small, whereas the minimization of (86) gives at least a definite result. The example given in the appendix illustrates that.
3. Concluding Comments

The proposed variance estimator seems to be a useful alternative to maximum likelihood estimators. Many questions are still open, uniqueness and consistency in particular. The asymptotic equivalence of the proposed estimator and the maximum likelihood estimator, in conjunction with computational manageability and (arguably) better performance in small samples, might render it even the superior alternative.

Let me conclude with a quite general remark regarding the estimation of the time path of the coefficients in (1)-(3): We cannot recover the coefficients a from the observations on X and y, since there are many more coefficients than data points. We can, however, obtain sensible guesses about the state of the economy, and these are our estimates â as given in (27). They denote the expected mean of the distribution of a, which remains a random variable with non-zero variance even if we enlarge the time horizon and the sample size to infinity.

If we generate data and coefficients according to (1) and (2) on a computer, we may compute estimates ρ̂ for the variance ratios and compare the estimated time path of the coefficients â(ρ̂) with the estimate â(ρ) we would get if we had used the true variance ratios ρ for computing â. But it does not make very much sense to compare â(ρ̂) with the true time path of the coefficients a, since the coefficients deviate randomly from their expectation. In Monte Carlo studies we should take not the true coefficients, but rather â(ρ), as the benchmark.
APPENDIX

Assume n = 2, T = 100, σ² = .1, σ_1² = .1 and σ_2² = .01, choose starting values a_{1,0} = a_{2,0} = 2, and generate coefficients according to (2). Let e_t denote a random variable uniformly distributed over a given interval and generate observations x_{1,t} = 1 and x_{2,t} = e_t for all t = 1,...,100. Generate a time series of y_t according to (1). A possible outcome is summarized in Table 1.
From x and y we may compute the likelihood criterion (91) and the statistics criterion (86) for alternative variance ratios. This is done in Table 2. We note that the true variance ratios are ρ_1 = 1 and ρ_2 = .1, and that the minimum both of the likelihood and of the statistics criterion is fairly close to this. (We may further compute the empirical variances

    (1/(T-1)) Σ_{t=2}^T (a_{i,t} - a_{i,t-1})²

from the data and compute their ratios. These "empirical variances" and the corresponding "empirical variance ratios" are also given in the tables.)
If we use only T = 25 rather than T = 100, we obtain Table 3. We see that the two criteria suggest different results. We find in particular that the minimization of the likelihood criterion leads to rather unreasonable corner solutions. It is my impression that this is a quite general phenomenon in small samples, which is even more pronounced when we deal with more than two explanatory variables. The "expected statistics" estimators, on the other hand, do not seem to tend to corner solutions.
Figures 1 and 2 illustrate, finally, the decomposition. Fig. 1 depicts the time path of the true coefficients (light) and the time path of the optimal estimates â(ρ) (heavily drawn curve). Figure 2 depicts the time path of the optimal estimates â(ρ) together with the estimated time path of the coefficients â(ρ̂), computed with ρ̂_1 = 7.2948 and ρ̂_2 = 1.4684 (light). We see that the estimated variance ratios are greater than the true values, and the resulting time paths exhibit slightly more variability than â(ρ). The paths â(ρ) and â(ρ̂) are qualitatively very similar. We observe also a rather close connection between the true coefficients a and their expectations â(ρ) and â(ρ̂).

As an aside we note further that the averages of the true coefficients are (4.7953, 1.6742). The estimated averages are ā̂ = (5.1580, 1.3803). Estimating ā by OLS yields (6.2160, .3210), which differs significantly from the true averages. Thus the assumption of time-invariant coefficients, although not unreasonable in the example, leads to a considerable underestimation of the influence of the exogenous variable x_2.
APPENDIX B

Expected Statistics Estimators: A Definition

by Ekkehart Schlicht, Technische Hochschule, Schloss, 6100 Darmstadt, September 1989

The expected statistics estimators introduced in the text can be defined as follows. Consider the model given by the density function f(y|x,θ), where

    y    endogenous observables,
    x    exogenous observables,
    θ    exogenous non-observables, parameters.

A statistic is a function

    s = S(y, x).

Define the expected statistic as

    S̄(x, θ) := E(S(y, x) | x, θ).

A solution θ̂ of

    S(y, x) = S̄(x, θ̂)

is termed an expected statistics estimator. The set of solutions to this equation is determined by the model, the statistics selected, and the observations.

If this estimation principle has been proposed somewhere, please let me know!
References

Ekkehart Schlicht (1985), Isolation and Aggregation in Economics. Berlin-Heidelberg-New York-Tokyo: Springer.
Table 1
Likelihood Criterion
Note: add 411.96 to obtain true values.

Statistics Criterion
Note: add 825.93 to obtain true values.

Example 1:4, T = 100. Theoretical variances s(0) = .1, s(1) = .1, s(2) = .01. Variance ratios r(1) = 1, r(2) = .1. Empirical variances s(0) = 9.58…E-2, s(1) = 8.77…E-2, s(2) = 9.59396827311E-3. Variance ratios r(1) = .923255749008, r(2) = .100977432076.

Table 2
Likelihood Criterion
Note: add 70.64 to obtain true values.

Statistics Criterion
Note: add 174.69 to obtain true values.

Example 1:4, T = 25. Theoretical variances s(0) = .1, s(1) = .1, s(2) = .01. Variance ratios r(1) = 1, r(2) = .1. Empirical variances s(0) = .1…, s(1) = 7.…E-2, s(2) = 5.1001624545…E-3. Variance ratios r(1) = .536268006967, r(2) = 3.54300032505E-2.

Table 3
Figure 1
Figure 2