Prediction variance
Background. A standard linear regression model, $y \sim N(Xb, s^2 I)$, is assumed here. Furthermore, it is assumed that $A$ components have been selected. The response value of a new sample, $x_0$, is estimated as $y(x_0) = b_1^T x_0$.
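The variance of $y(x_0)$ is (approximately) given by

$$\mathrm{Var}(y(x_0))_A = s^2 \left(1 + x_0^T (X_1^T X_1)^{+} x_0\right)$$

with

$$s^2 \cong s_A^2 = \frac{y^T y - \left\{ \dfrac{(y^T t_1)^2}{t_1^T t_1} + \dots + \dfrac{(y^T t_A)^2}{t_A^T t_A} \right\}}{N - A - 1}$$

$$x_0^T (X_1^T X_1)^{+} x_0 = \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A}$$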
Here we have neglected the possible bias. When we look for the next score vector, $t_{A+1} = X_A w$, the results of the $A$th step are known. Here $X_A$ denotes the residual X-matrix, $X_A = X - (t_1 p_1^T + \dots + t_A p_A^T)$.
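As a minimal sketch of the deflation just described, the residual matrix and the next score vector could be formed as below. The function names `residual_matrix` and `next_scores`, the list-of-rows representation and the toy numbers are illustrative assumptions, not part of the original method description; the weight vector w is simply taken as given.

```python
def residual_matrix(X, scores, loadings):
    """Return X_A = X - (t_1 p_1^T + ... + t_A p_A^T).

    X is a list of N rows with K columns, scores is a list of A score
    vectors (each of length N) and loadings a list of A loading vectors
    (each of length K).
    """
    residual = [row[:] for row in X]  # copy so the caller's X is untouched
    for t, p in zip(scores, loadings):
        for i, t_i in enumerate(t):
            for k, p_k in enumerate(p):
                residual[i][k] -= t_i * p_k  # subtract the rank-one term t p^T
    return residual


def next_scores(X_res, w):
    """Return t_{A+1} = X_A w for a given weight vector w (length K)."""
    return [sum(x_ik * w_k for x_ik, w_k in zip(row, w)) for row in X_res]


# Toy illustration with made-up numbers (not the Cleaner data).
X = [[1.0, 2.0], [3.0, 4.0]]
X_res = residual_matrix(X, scores=[[1.0, 2.0]], loadings=[[1.0, 1.0]])
t_next = next_scores(X_res, w=[1.0, 0.0])
```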
To simplify the analysis, the following notation is used:

$$f = y^T y - \left\{ \frac{(y^T t_1)^2}{t_1^T t_1} + \dots + \frac{(y^T t_A)^2}{t_A^T t_A} \right\}$$

$$g = 1 + \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A}$$
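With this notation the variance after the $(A+1)$th step can be written as

$$\mathrm{Var}(y(x_0))_{A+1} = \frac{\left[ f - \dfrac{(y^T t_{A+1})^2}{t_{A+1}^T t_{A+1}} \right] \times \left[ g + \dfrac{(x_0^T r_{A+1})^2}{t_{A+1}^T t_{A+1}} \right]}{N - A - 2}$$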
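The quantities f and g, and the variances after A and A+1 components, can be sketched in code as follows. This is only an illustration under stated assumptions: the score vectors t_a and the sample scores x_0^T r_a are taken as already computed, and the function names and toy numbers are hypothetical (they are not taken from the Cleaner data).

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def f_and_g(y, scores, x0_scores):
    """Compute f and g after A components.

    scores    : list of the A score vectors t_1..t_A (each of length N)
    x0_scores : list of the A scalars x0^T r_1 .. x0^T r_A
    """
    f = dot(y, y)
    g = 1.0
    for t, h in zip(scores, x0_scores):
        tt = dot(t, t)
        f -= dot(y, t) ** 2 / tt  # remaining sum of squares of y
        g += h ** 2 / tt          # 1 + leverage-type term of x0
    return f, g


def variance_now(f, g, n_samples, n_components):
    """Var(y(x0))_A = f * g / (N - A - 1)."""
    return f * g / (n_samples - n_components - 1)


def variance_next(f, g, y, t_next, h_next, n_samples, n_components):
    """Var(y(x0))_{A+1}, with t_next = t_{A+1} and h_next = x0^T r_{A+1}."""
    tt = dot(t_next, t_next)
    f_next = f - dot(y, t_next) ** 2 / tt
    g_next = g + h_next ** 2 / tt
    return f_next * g_next / (n_samples - n_components - 2)


# Toy data: N = 6 samples, A = 1 component already selected.
y = [-2.0, -1.0, 0.5, -0.5, 1.0, 2.0]
t1 = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
t2 = [1.0, -1.0, 0.0, 0.0, 1.0, -1.0]  # candidate next score vector
f, g = f_and_g(y, scores=[t1], x0_scores=[1.0])
var_a = variance_now(f, g, n_samples=6, n_components=1)
var_a1 = variance_next(f, g, y, t2, h_next=0.5, n_samples=6, n_components=1)
```

In this toy case the exact denominators N-A-1 and N-A-2 are used, as the text says is done in the numerical computations.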
Decrease in the variance. We want to obtain a decrease in the variance. An important way to look at the situation is the following result:

$$\mathrm{Var}(y(x_0))_{A+1} < \mathrm{Var}(y(x_0))_A$$

if and only if

$$(y^T t_{A+1})^2 > \frac{f}{\dfrac{g}{(x_0^T r_{A+1})^2} + \dfrac{1}{t_{A+1}^T t_{A+1}}}. \tag{1}$$
To simplify the expression, we have assumed that $1/(N-A-1) \cong 1/(N-1-(A+1))$ (in the numerical computations the two denominators are kept distinct). The property follows from rearranging the terms of $\mathrm{Var}(y(x_0))_A - \mathrm{Var}(y(x_0))_{A+1}$.
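The rearrangement can be written out as follows. The shorthand $c = y^T t_{A+1}$, $\tau = t_{A+1}^T t_{A+1}$ and $h = x_0^T r_{A+1}$ is introduced here only for brevity and is not part of the original notation; the steps assume $\tau > 0$, $h \neq 0$ and the common-denominator approximation stated above.

$$
\begin{aligned}
\mathrm{Var}(y(x_0))_A - \mathrm{Var}(y(x_0))_{A+1} > 0
&\iff fg - \left(f - \frac{c^2}{\tau}\right)\left(g + \frac{h^2}{\tau}\right) > 0 \\
&\iff \frac{c^2 g}{\tau} + \frac{c^2 h^2}{\tau^2} - \frac{f h^2}{\tau} > 0 \\
&\iff c^2 \left(g + \frac{h^2}{\tau}\right) > f h^2 \\
&\iff c^2 > \frac{f}{g/h^2 + 1/\tau}.
\end{aligned}
$$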
When looking for the next score vector $t_{A+1}$, the values of $f$ and $g$ are known. The result suggests that we should make the squared covariance $(y^T t_{A+1})^2$ as large as possible and compare it to the possible values of the right-hand side of (1).
From the Cauchy-Schwarz inequality, $(y^T y) \times (t_{A+1}^T t_{A+1}) \ge (y^T t_{A+1})^2$, we know that the score vector $t_{A+1}$ is not close to zero if $(y^T t_{A+1})^2$ is large. The squared score value of $x_0$, $(x_0^T r_{A+1})^2$, depends on the sample $x_0$. In the analysis it is natural to choose $x_0$ as the samples in the present data. This is done in the example below.
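A per-sample check of condition (1) might look like the sketch below. It assumes that f, the per-sample values of g and the per-sample scores x_i^T r_{A+1} have already been computed; the function names and the numbers in the usage lines are hypothetical. The right-hand side is evaluated in the algebraically equivalent form f h^2 / (g + h^2 / (t^T t)), which avoids dividing by a zero score.

```python
def rhs_of_condition(f, g, h_next, tt_next):
    """Right-hand side of (1): f / (g / h^2 + 1 / (t^T t)).

    Evaluated as f * h^2 / (g + h^2 / (t^T t)), which is the same value
    for h != 0 and gives the limiting value 0 when h == 0.
    """
    h2 = h_next ** 2
    return f * h2 / (g + h2 / tt_next)


def samples_satisfying_condition(f, g_values, h_values, yt_next, tt_next):
    """Flag, per sample, whether (y^T t_{A+1})^2 exceeds the right-hand side.

    g_values[i] and h_values[i] are g and x_i^T r_{A+1} for sample i;
    yt_next is y^T t_{A+1} and tt_next is t_{A+1}^T t_{A+1}.
    """
    sq_cov = yt_next ** 2
    return [sq_cov > rhs_of_condition(f, g, h, tt_next)
            for g, h in zip(g_values, h_values)]


# Made-up values for three samples.
flags = samples_satisfying_condition(
    f=25.0,
    g_values=[1.2, 1.0, 1.1],
    h_values=[1.0, 10.0, 0.0],
    yt_next=5.0,
    tt_next=5.0,
)
n_failing = flags.count(False)
```

In this toy input the second sample has a large score on the new component, so the right-hand side exceeds the squared covariance and the condition fails for that sample only.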
Example.
We shall look closer at the Cleaner data. We know that the number of components
should be around four. Therefore we compute (1) for components 3 to 6.

Figure. The horizontal line is the squared covariance, $(y^T t_A)^2$; the curve is the right-hand side of (1). Components 3 to 6 are shown. The x-axis is the sample number, 1 to 289.
We see that the squared covariance satisfies (1) for all samples for three components. Few samples fail to satisfy (1) when four components are used, and many fail when five or six components are used. We could also study the differences, $\mathrm{Var}(y(x_i))_A - \mathrm{Var}(y(x_i))_{A+1}$, and see when they start to become negative. The results are similar: few values of $\mathrm{Var}(y(x_i))_3 - \mathrm{Var}(y(x_i))_4$ are negative, but many are negative for the later differences.