Prediction variance

Background. A standard linear regression model, $y \sim N(Xb, s^2 I)$, is assumed here. Furthermore, it is assumed that $A$ components have been selected. The response value of the new sample, $x_0$, is estimated as $y(x_0) = b_1^T x_0$. The variance of $y(x_0)$ is (approximately) given by

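$$\mathrm{Var}(y(x_0))_A = s^2 \left( 1 + x_0^T (X_1^T X_1)^{+} x_0 \right)$$

with

$$s^2 \approx s_A^2 = \frac{y^T y - \left\{ \dfrac{(y^T t_1)^2}{t_1^T t_1} + \dots + \dfrac{(y^T t_A)^2}{t_A^T t_A} \right\}}{N - A - 1}$$

$$x_0^T (X_1^T X_1)^{+} x_0 = \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A}$$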

Here we have neglected the possible bias. When we are finding the next score vector, $t_{A+1}$ ($= X_A w$), we know the results of the $A$th step. Here $X_A$ denotes the residual X-matrix, $X_A = X - (t_1 p_1^T + \dots + t_A p_A^T)$. To simplify the analysis, the following notation is used:

$$f = y^T y - \left\{ \frac{(y^T t_1)^2}{t_1^T t_1} + \dots + \frac{(y^T t_A)^2}{t_A^T t_A} \right\}$$

$$g = 1 + \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A}$$

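With this notation the variance after the $A$th step is $f g/(N-A-1)$, and the variance after the $(A+1)$th step can be written as

$$\mathrm{Var}(y(x_0))_{A+1} = \frac{\left[ f - \dfrac{(y^T t_{A+1})^2}{t_{A+1}^T t_{A+1}} \right] \times \left[ g + \dfrac{(x_0^T r_{A+1})^2}{t_{A+1}^T t_{A+1}} \right]}{N - A - 2}$$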
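The bookkeeping above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and argument layout are hypothetical (they are not taken from any software accompanying this text), and the inner products are assumed to have been computed beforehand.

```python
def f_and_g(yty, y_t, t_t, x0_r):
    """Return (f, g) for the components supplied.

    yty  : scalar y^T y
    y_t  : list of y^T t_a,   a = 1..A
    t_t  : list of t_a^T t_a, a = 1..A
    x0_r : list of x0^T r_a,  a = 1..A
    (Argument names are illustrative assumptions.)
    """
    f = yty - sum(c * c / tt for c, tt in zip(y_t, t_t))
    g = 1.0 + sum(d * d / tt for d, tt in zip(x0_r, t_t))
    return f, g


def prediction_variance(yty, y_t, t_t, x0_r, n_samples):
    """Approximate Var(y(x0)) = f * g / (N - A - 1), with A = len(t_t)."""
    a = len(t_t)
    f, g = f_and_g(yty, y_t, t_t, x0_r)
    return f * g / (n_samples - a - 1)
```

Calling `prediction_variance` once with the first A entries of each list and once with A+1 entries gives the two variances being compared; the second call uses the divisor N-A-2 automatically.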

Decrease in the variance. We want each new component to decrease the variance. The following result states exactly when this happens:

$$\mathrm{Var}(y(x_0))_{A+1} < \mathrm{Var}(y(x_0))_A$$

if and only if

$$(y^T t_{A+1})^2 > \frac{f}{\dfrac{g}{(x_0^T r_{A+1})^2} + \dfrac{1}{t_{A+1}^T t_{A+1}}}. \tag{1}$$

To simplify the expression, we have assumed that $1/(N-A-1) \approx 1/(N-1-(A+1))$ (in the numerical computations the two divisors are kept distinct). The result follows by rearranging the terms of $\mathrm{Var}(y(x_0))_A - \mathrm{Var}(y(x_0))_{A+1}$. When looking for the next score vector $t_{A+1}$, the values of $f$ and $g$ are known. The result suggests that we should

make the squared covariance $(y^T t_{A+1})^2$ as large as possible and compare it to the possible values of the right-hand side of (1).
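The rearrangement behind (1) can be written out as follows. The shorthand $c$, $d$ and $\tau$ is introduced here only for brevity and is not part of the original notation; the two divisors are treated as equal, as assumed above, and the last step requires $d \neq 0$.

```latex
% Shorthand (assumption for this sketch):
%   c = y^T t_{A+1},  d = x_0^T r_{A+1},  \tau = t_{A+1}^T t_{A+1} > 0
\begin{align*}
\Bigl(f - \frac{c^2}{\tau}\Bigr)\Bigl(g + \frac{d^2}{\tau}\Bigr) < f g
 &\iff \frac{f d^2}{\tau} - \frac{c^2 g}{\tau} - \frac{c^2 d^2}{\tau^2} < 0 \\
 &\iff f d^2 < c^2 \Bigl(g + \frac{d^2}{\tau}\Bigr) \\
 &\iff c^2 > \frac{f d^2}{g + d^2/\tau}
        = \frac{f}{g/d^2 + 1/\tau}.
\end{align*}
```

The second line multiplies through by $\tau > 0$, and the third divides by $g + d^2/\tau$, which is positive because $g \ge 1$.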

From the Cauchy-Schwarz inequality, $(y^T y)(t_{A+1}^T t_{A+1}) \ge (y^T t_{A+1})^2$, we know that the score vector $t_{A+1}$ is not close to zero if $(y^T t_{A+1})^2$ is large. The squared score value of $x_0$, $(x_0^T r_{A+1})^2$, depends on the sample $x_0$. In the analysis it is natural to choose $x_0$ as the samples in the present data. This is done in the example below.
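A minimal sketch of the per-sample check, in plain Python, might look as follows. It assumes that when $x_0$ is the $i$th sample of the data, its score value $x_i^T r_a$ equals the $i$th entry of the score vector $t_a$; the function names are hypothetical. The right-hand side of (1) is evaluated in the equivalent form $f d^2/(g + d^2/\tau)$, which avoids dividing by a zero score value.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def samples_satisfying_condition(y, scores, t_next):
    """For each sample i, test whether (y^T t_next)^2 exceeds the
    right-hand side of (1), taking x0 to be sample i.

    y      : response values (list of length N)
    scores : list of the A score vectors t_1..t_A already found
    t_next : candidate score vector t_{A+1}
    Assumption: x_i^T r_a is the i-th entry of t_a.
    """
    tt = [dot(t, t) for t in scores]
    f = dot(y, y) - sum(dot(y, t) ** 2 / s for t, s in zip(scores, tt))
    tt_next = dot(t_next, t_next)
    lhs = dot(y, t_next) ** 2  # squared covariance, same for all samples
    flags = []
    for i in range(len(y)):
        g = 1.0 + sum(t[i] ** 2 / s for t, s in zip(scores, tt))
        d2 = t_next[i] ** 2
        rhs = f * d2 / (g + d2 / tt_next)  # equals f / (g/d2 + 1/tt_next)
        flags.append(lhs > rhs)
    return flags
```

The returned list has one boolean per sample, so counting the `False` entries gives the number of samples for which the new component would not reduce the approximate variance.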

Example. We shall look closer at the Cleaner data. We know that the number of components should be around four. Therefore we compute (1) for components 3 to 6.

Figure. Horizontal line: squared covariance, $(y^T t_A)^2$. Curve: the right-hand side of (1). Components 3 to 6. x-axis: sample number, 1 to 289.

We see that the squared covariance satisfies (1) for all samples when three components are used. Few samples fail to satisfy (1) when four components are used, and many fail when five or six components are used. We could also study the differences, $\mathrm{Var}(y(x_i))_A - \mathrm{Var}(y(x_i))_{A+1}$, and see when they start to be negative. The results are similar: few values of $\mathrm{Var}(y(x_i))_3 - \mathrm{Var}(y(x_i))_4$ are negative, but many are negative for the subsequent differences.
