Basic tasks of modelling
The primary task of modelling data is to obtain a model whose results carry as little uncertainty as possible when they are applied. The basic tasks involved are illustrated by a standard linear regression model, $y \sim N(Xb, s^2 I)$. The response value of a new sample, $x_0$, is estimated as $y(x_0) = b^T x_0$, where $b$ is the vector of estimated regression coefficients. If $X$ is an $N \times K$ matrix, the variance of $y(x_0)$ is given by
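$$
\mathrm{Var}(y(x_0)) = s^2 \, x_0^T (X^T X)^{+} x_0
\cong \frac{y^T y - \left\{ \dfrac{(y^T t_1)^2}{t_1^T t_1} + \dots + \dfrac{(y^T t_A)^2}{t_A^T t_A} + \dots + \dfrac{(y^T t_K)^2}{t_K^T t_K} \right\}}{N-K}
\times \left[ \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A} + \dots + \frac{(x_0^T r_K)^2}{t_K^T t_K} \right]
$$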
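As a minimal sketch of the first form of this variance, the following NumPy code estimates the regression coefficients, the residual variance and the variance of the predicted response for one new sample. The function name `prediction_variance` and the simulated data are assumptions made for illustration only; they do not come from the text.

```python
import numpy as np

def prediction_variance(X, y, x0):
    """Return (y_hat, var_hat) for a new sample x0 under y ~ N(Xb, s^2 I)."""
    N, K = X.shape
    XtX_pinv = np.linalg.pinv(X.T @ X)    # generalised inverse (X^T X)^+
    b = XtX_pinv @ X.T @ y                # estimated regression coefficients
    residual = y - X @ b
    s2 = residual @ residual / (N - K)    # usual estimate of s^2
    y_hat = b @ x0                        # y(x0) = b^T x0
    var_hat = s2 * (x0 @ XtX_pinv @ x0)   # s^2 x0^T (X^T X)^+ x0
    return y_hat, var_hat

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=30)
x0 = rng.normal(size=4)

y_hat, var_hat = prediction_variance(X, y, x0)
print(y_hat, var_hat)
```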
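There are many ways to arrive at this kind of decomposition. Write $T = (t_1, \dots, t_K)$, and similarly for $P$ and $R$. If $X^T X = P D P^T$ is a factorisation of $X^T X$, the matrices $R$ and $T$ can be defined as $R = (P^T)^{-1}$ and $T = XR$. When $D$ is diagonal, $T^T T = D$, so the score vectors are mutually orthogonal and $(X^T X)^{-1} = R D^{-1} R^T$, which gives the two sums in the expression above.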
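The decomposition can be checked numerically. The sketch below uses the eigendecomposition as one possible factorisation of X^T X (an assumption; the text leaves the choice of factorisation open), forms R and T as defined above, and compares both sides of the two sums. The data are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 40, 5
X = rng.normal(size=(N, K))
y = rng.normal(size=N)
x0 = rng.normal(size=K)

# Assumed factorisation X^T X = P D P^T: the symmetric eigendecomposition
d, P = np.linalg.eigh(X.T @ X)
R = np.linalg.inv(P.T)       # R = (P^T)^{-1}; equals P here because P is orthogonal
T = X @ R                    # score vectors t_1, ..., t_K are the columns of T

tt = np.sum(T * T, axis=0)   # t_a^T t_a for each a

# Model variation: x0^T (X^T X)^+ x0 versus the sum over (x0^T r_a)^2 / (t_a^T t_a)
lhs_var = x0 @ np.linalg.pinv(X.T @ X) @ x0
rhs_var = np.sum((x0 @ R) ** 2 / tt)

# Residual sum of squares: direct least squares versus the score-vector form
b = np.linalg.lstsq(X, y, rcond=None)[0]
rss_direct = np.sum((y - X @ b) ** 2)
rss_scores = y @ y - np.sum((y @ T) ** 2 / tt)

print(np.isclose(lhs_var, rhs_var), np.isclose(rss_direct, rss_scores))
```

Other factorisations (for example a Cholesky-type factorisation) would give different score vectors but the same totals, provided the columns of T remain mutually orthogonal.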
Suppose that we are at step $A$, which means that $T_A$, $P_A$ and $R_A$ have been found. When finding the next score vector, $t_{A+1}$, there are two aspects to take into consideration:
i) Reduction in fit: $(y^T t_{A+1})^2 / (t_{A+1}^T t_{A+1})$
ii) Increase in model variation: $(x_0^T r_{A+1})^2 / (t_{A+1}^T t_{A+1})$
When handling these two terms, $(x_0^T r_{A+1})^2$ may be treated as a constant, even though $r_{A+1}$ depends on the score vector $t_{A+1}$ that is being found. The reason is that $r_{A+1}$ typically has length close to one. Moreover, the value of $x_0^T r_{A+1}$ varies with the new data, $x_0$. The modelling task is therefore essentially to find an optimal balance between the improvement in fit and the associated increase in model variation.
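The balance between the two terms can be illustrated step by step. The sketch below is not the procedure of the text: it assumes score vectors taken from the eigendecomposition of X^T X in order of decreasing t^T t, and it assumes N - A degrees of freedom for the residual variance after A steps. The function names `stepwise_terms` and `variance_by_step` and the simulated data are hypothetical.

```python
import numpy as np

def stepwise_terms(X, y, x0):
    """Per-step reduction in fit and increase in model variation."""
    d, P = np.linalg.eigh(X.T @ X)
    order = np.argsort(d)[::-1]       # assumption: largest t^T t first
    P = P[:, order]
    R = np.linalg.inv(P.T)            # R = (P^T)^{-1}
    T = X @ R                         # score vectors as columns
    tt = np.sum(T * T, axis=0)        # t_a^T t_a
    fit_gain = (y @ T) ** 2 / tt      # (y^T t_a)^2 / (t_a^T t_a)
    var_gain = (x0 @ R) ** 2 / tt     # (x0^T r_a)^2 / (t_a^T t_a)
    return fit_gain, var_gain

def variance_by_step(X, y, x0):
    """Estimated Var(y(x0)) after A = 1, ..., K steps."""
    N, K = X.shape
    fit_gain, var_gain = stepwise_terms(X, y, x0)
    rss = y @ y - np.cumsum(fit_gain)     # residual sum of squares after A steps
    dof = N - np.arange(1, K + 1)         # assumption: N - A degrees of freedom
    return (rss / dof) * np.cumsum(var_gain)

# Simulated data with one nearly collinear column (illustration only)
rng = np.random.default_rng(2)
N, K = 50, 4
X = rng.normal(size=(N, K))
X[:, 3] = X[:, 0] + 1e-3 * rng.normal(size=N)
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=N)
x0 = rng.normal(size=K)

fit_gain, var_gain = stepwise_terms(X, y, x0)
var_by_step = variance_by_step(X, y, x0)
best_A = int(np.argmin(var_by_step)) + 1   # step with the smallest estimated variance
print(fit_gain, var_gain, var_by_step, best_A)
```

Printing the two per-step arrays side by side shows how a score vector with a small t^T t can add little to the fit while adding a great deal to the model variation.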
The procedure for obtaining an optimal balance can be based on other measures of
reduction in fit and on other measures of increase in model variation. The
main point is to take the increase in model variation into account when finding
new score vectors. Furthermore, only a standard linear regression model has been
considered here. Other mathematical models will suggest different measures of
fit (improvement of the modelling criterion) and different measures of prediction
associated with the solution at each step.