Basic tasks of modeling

The primary task of modelling data is to obtain a model whose results carry as little uncertainty as possible when they are applied. The basic tasks involved are illustrated by a standard linear regression model, $y \sim N(Xb, s^2 I)$. The response value for a new sample, $x_0$, is estimated as $y(x_0) = b^T x_0$, where $b$ is the estimated regression coefficient vector. If $X$ is an $N \times K$ matrix, the variance of $y(x_0)$ is given by

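$$
\begin{aligned}
\mathrm{Var}(y(x_0)) &= s^2 \, x_0^T (X^T X)^{+} x_0 \\
&\cong \frac{1}{N-K} \left[ y^T y - \left\{ \frac{(y^T t_1)^2}{t_1^T t_1} + \dots + \frac{(y^T t_A)^2}{t_A^T t_A} + \dots + \frac{(y^T t_K)^2}{t_K^T t_K} \right\} \right] \\
&\quad \times \left[ \frac{(x_0^T r_1)^2}{t_1^T t_1} + \dots + \frac{(x_0^T r_A)^2}{t_A^T t_A} + \dots + \frac{(x_0^T r_K)^2}{t_K^T t_K} \right]
\end{aligned}
$$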
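The first line of the variance expression can be evaluated directly. The following is a minimal NumPy sketch, not taken from the original text: the simulated data, the seed, and the names `predict_with_variance`, `b_hat` and `s2_hat` are illustrative assumptions, and $s^2$ is replaced by the usual residual estimate with $N-K$ degrees of freedom.

```python
import numpy as np


def predict_with_variance(X, y, x0):
    """Return (prediction, estimated variance) for a new sample x0.

    Uses b = (X'X)^+ X'y and Var = s2_hat * x0' (X'X)^+ x0, where
    s2_hat is the residual sum of squares divided by N - K.
    """
    N, K = X.shape
    XtX_pinv = np.linalg.pinv(X.T @ X)      # (X'X)^+
    b_hat = XtX_pinv @ X.T @ y              # estimated coefficients
    residuals = y - X @ b_hat
    s2_hat = residuals @ residuals / (N - K)
    return b_hat @ x0, s2_hat * (x0 @ XtX_pinv @ x0)


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
N, K = 50, 4
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=N)
x0 = rng.normal(size=K)

y0_hat, var_y0 = predict_with_variance(X, y, x0)
print(f"prediction = {y0_hat:.3f}, estimated variance = {var_y0:.5f}")
```

The variance is a product of two factors: one that depends only on how well the model fits the training data, and one that depends on where the new sample $x_0$ lies relative to $X$.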

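There are many ways to arrive at this kind of decomposition. Write $T = (t_1, \dots, t_K)$ for the matrix whose columns are the score vectors, and similarly for $P$ and $R$. If $X^T X = P D P^T$ is a factorisation of $X^T X$, the matrices $R$ and $T$ can be defined as $R = (P^T)^{-1}$ and $T = XR$. Then $T^T T = R^T X^T X R = D$, so when $D$ is diagonal the score vectors are mutually orthogonal and the values $t_a^T t_a$ are the diagonal elements of $D$. Suppose that we are at step $A$, which means that $T_A$, $P_A$ and $R_A$ have been found. When finding the next score vector, $t_{A+1}$, there are two aspects to take into consideration: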

i) Reduction in fit:                 $(y^T t_{A+1})^2 / (t_{A+1}^T t_{A+1})$
ii) Increase in model variation:     $(x_0^T r_{A+1})^2 / (t_{A+1}^T t_{A+1})$

When handling these two terms, $(x_0^T r_{A+1})^2$ may be treated as a constant, even though $r_{A+1}$ depends on the score vector $t_{A+1}$ that is being found. The reason is that $r_{A+1}$ typically has length around one, and the value of $x_0^T r_{A+1}$ varies with the new data $x_0$ in any case. The modelling task is therefore essentially to find an optimal balance between the improvement in fit and the associated increase in model variation.
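The two terms can be tabulated step by step for one particular factorisation. The sketch below is an illustration under stated assumptions rather than the method of the text: it takes the eigen-decomposition of $X^T X$ as the factorisation $P D P^T$ (only one of the many possible choices), orders the components by decreasing $t_a^T t_a$, and uses simulated data. The function name `score_terms` is hypothetical.

```python
import numpy as np


def score_terms(X, y, x0):
    """Per-component fit and model-variation terms.

    Assumes the eigen-decomposition X'X = P D P' as the factorisation,
    which is only one of the possible choices.
    """
    d, P = np.linalg.eigh(X.T @ X)      # eigenvalues in ascending order
    order = np.argsort(d)[::-1]         # assumed ordering: largest first
    P = P[:, order]
    R = np.linalg.inv(P.T)              # R = (P^T)^{-1}; equals P here
    T = X @ R                           # score vectors as columns
    tt = np.sum(T * T, axis=0)          # t_a' t_a for each component a
    fit = (T.T @ y) ** 2 / tt           # i)  (y' t_a)^2 / (t_a' t_a)
    var = (R.T @ x0) ** 2 / tt          # ii) (x0' r_a)^2 / (t_a' t_a)
    return T, R, fit, var


# Simulated data with columns of very different scale, for illustration.
rng = np.random.default_rng(1)
N, K = 40, 5
X = rng.normal(size=(N, K)) * np.array([5.0, 3.0, 1.0, 0.3, 0.1])
y = X @ rng.normal(size=K) + rng.normal(scale=0.5, size=N)
x0 = rng.normal(size=K)

T, R, fit, var = score_terms(X, y, x0)

# Running totals after A = 1, ..., K components.
rss = y @ y - np.cumsum(fit)            # unexplained part of y'y
model_var = np.cumsum(var)              # x0-dependent variance factor

for A, (f, v) in enumerate(zip(rss, model_var), start=1):
    print(f"A={A}: residual SS={f:.3f}  model variation={v:.4f}")
```

Each added component can only lower the residual sum of squares and can only raise the model variation, which is the balance described above. With all $K$ components the two running totals reproduce the residual sum of squares of the full regression and $x_0^T (X^T X)^{+} x_0$.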

The procedure for obtaining an optimal balance can be based on other measures of reduction in fit and other measures of increase in model variation. The main point is to take the increase in model variation into account when finding new score vectors. Furthermore, only a standard linear regression model has been considered here. Other mathematical models will suggest different measures of fit (improvement of the modelling criterion) and different measures of prediction associated with the solution at each step.
