NIPALS algorithm for PLS regression

Given two data blocks, X, an N × K matrix, and Y, an N × M matrix, the algorithm of Herman Wold reduces to:

  1. Select a weight vector w of length K, e.g., a non-zero row of X. Normalize it to length 1.
  2. Compute a score vector t = Xw.
  3. Compute a Y-loading vector q = Y^T t.
  4. Compute a Y-score vector u = Yq.
  5. Compute a new weight vector w1 = X^T u. Scale w1 to length 1.
  6. If |w - w1| < eps, convergence has been reached; otherwise set w = w1 and return to step 2.
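
Steps 1-6 can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name nipals_inner, the defaults for eps and max_iter, and the choice of the largest-norm row of X as the starting vector are assumptions made for the example, and the sketch assumes X^T u never becomes the zero vector.

```python
import numpy as np

def nipals_inner(X, Y, eps=1e-10, max_iter=500):
    """Steps 1-6: iterate until the weight vector w stops changing."""
    # Step 1: start from a non-zero row of X (here: the row with the
    # largest norm, an arbitrary choice) and scale it to length 1.
    row_norms = np.linalg.norm(X, axis=1)
    w = X[np.argmax(row_norms)] / row_norms.max()
    for _ in range(max_iter):
        t = X @ w                     # step 2: X-score vector
        q = Y.T @ t                   # step 3: Y-loading vector
        u = Y @ q                     # step 4: Y-score vector
        w1 = X.T @ u                  # step 5: new weight vector ...
        w1 /= np.linalg.norm(w1)      # ... scaled to length 1
        converged = np.linalg.norm(w - w1) < eps   # step 6
        w = w1
        if converged:
            break
    # Recompute the vectors so that they all belong to the final w.
    t = X @ w
    q = Y.T @ t
    u = Y @ q
    return w, t, q, u
```

Combining steps 2-5 gives w1 proportional to X^T Y Y^T X w, so the loop behaves like a power iteration on the matrix X^T Y Y^T X.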

The results of the iterations are two score vectors, one for X, t, and one for Y, u. Assuming that these results are good choices, the question is then how to get the next pair of (t, u) score vectors. Svante suggested that X should be adjusted for the score vector, and that the regression of Y onto t should be computed and Y adjusted for the results found. This gives:

  7. Compute the loading vector p = X^T t / (t^T t).
  8. Adjust X for what has been found: Xnew = X - t p^T.
  9. Compute the regression of Y onto t: b = Y^T t / (t^T t).
  10. Adjust Y for what has been selected: Ynew = Y - t b^T.
  11. If more pairs (t, u) are needed, go to step 1 with X = Xnew and Y = Ynew.
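
The full loop over steps 1-11 might look as follows in NumPy. Again this is only a sketch under stated assumptions: the function name nipals_pls, its parameters, the starting vector, and the decision to return the collected t, p and b vectors as matrix columns are choices made for the example, not part of the original description.

```python
import numpy as np

def nipals_pls(X, Y, n_components, eps=1e-10, max_iter=500):
    """Steps 1-11: extract n_components score vectors, deflating X and Y."""
    X = np.array(X, dtype=float)   # copies, so the caller's data is untouched
    Y = np.array(Y, dtype=float)
    T, P, B = [], [], []
    for _component in range(n_components):
        # Steps 1-6: find the weight vector w for the current X and Y.
        row_norms = np.linalg.norm(X, axis=1)
        w = X[np.argmax(row_norms)] / row_norms.max()
        for _ in range(max_iter):
            u = Y @ (Y.T @ (X @ w))       # steps 2-4 combined
            w1 = X.T @ u                  # step 5
            w1 /= np.linalg.norm(w1)
            converged = np.linalg.norm(w - w1) < eps   # step 6
            w = w1
            if converged:
                break
        t = X @ w
        tt = t @ t
        p = X.T @ t / tt                  # step 7: X-loading vector
        X = X - np.outer(t, p)            # step 8: Xnew = X - t p^T
        b = Y.T @ t / tt                  # step 9: regression of Y onto t
        Y = Y - np.outer(t, b)            # step 10: Ynew = Y - t b^T
        T.append(t)
        P.append(p)
        B.append(b)
        # Step 11: the next pass starts again at step 1 with Xnew, Ynew.
    return np.column_stack(T), np.column_stack(P), np.column_stack(B), X, Y
```

With the returned matrices, the original blocks decompose as X = T P^T + (X residual) and Y = T B^T + (Y residual). Because step 8 removes from X everything that lies along t, the score vectors collected in T come out mutually orthogonal.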

Harald suggested a slightly different but equivalent version of steps 7-10. Steps 1-11, however, are the PLS regression algorithm as it was known in the 1980s. In the literature it is sometimes still presented in this way, although it would give a clearer picture of the method to present it as an application of the H-method.

In the paper "PLS regression methods", Journal of Chemometrics, 2, 1988, pp. 211-228, it was shown that steps 1-6 are equivalent to finding an X-weight vector w and a Y-weight vector q that maximize t^T u subject to |w| = |q| = 1. Furthermore, it was shown that there is a close analogy between this approach and canonical correlation analysis in multivariate statistics. The paper also shows an application of this similarity. In fact, the distribution theory follows along the same lines as that of canonical correlation.
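
The maximization property can be checked numerically. Since t^T u = w^T X^T Y q, its maximum over unit-length w and q is the largest singular value of X^T Y, attained at the corresponding singular vectors. The sketch below compares the converged vectors with that singular triple. The simulated data, the fixed iteration count and the variable names are assumptions made for the illustration, and q is scaled to length 1 after convergence to match the constraint (steps 1-6 as listed leave q unscaled, which does not affect w).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
# Simulated Y that depends strongly on the first column of X.
Y = np.outer(X[:, 0], [3.0, 1.0]) + 0.1 * rng.standard_normal((40, 2))

# Steps 1-6 written as one compact update, run for a fixed number of passes.
w = X[0] / np.linalg.norm(X[0])
for _ in range(200):
    w = X.T @ (Y @ (Y.T @ (X @ w)))
    w /= np.linalg.norm(w)

t = X @ w
q = Y.T @ t
q /= np.linalg.norm(q)        # impose |q| = 1
u = Y @ q

# Largest singular triple of X^T Y for comparison.
U, s, Vt = np.linalg.svd(X.T @ Y)
print("|w . u1| =", abs(w @ U[:, 0]))
print("|q . v1| =", abs(q @ Vt[0]))
print("t^T u =", t @ u, " largest singular value =", s[0])
```

Both absolute inner products should come out close to 1, and t^T u should agree with the largest singular value up to rounding.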
