Principal Component Analysis (PCA)
The primary objective of this multivariate data analysis method is data reduction.
Why Use Principal Component Analysis?
Principal Component Analysis Output (and Graphical Representation of Results)
Introduction
Studying a data set with 100 variables and 100 observations (data objects) means examining 100 means, 100 variances, and (100 * 100 - 100)/2 = 4950 covariances, for a total of 5150 statistics that describe the underlying multivariate normal population being sampled. PCA is a method of simplifying this task. The key to the problem is that much of the variability in the data set is not independent, i.e., there is a lot of covariation between the variables. If we could extract, from all the variables under consideration, two new variables that captured most of the independent variability in the entire data set, a simple bivariate scatter diagram would reveal most of the information in the data. Accordingly, the primary objective is to extract a few uncorrelated variables (the principal components) that capture most of the variability in the data set and that serve as new, mutually orthogonal reference axes. The 1st principal component captures the maximum variation in the data set, the 2nd principal component captures the most of the remaining variation, and so on.
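The count of statistics above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `count_summary_statistics` is made up for this example and is not part of Aabel.

```python
def count_summary_statistics(n_variables: int) -> dict:
    """Count the means, variances, and distinct covariances for n_variables."""
    # A p x p covariance matrix has p*p entries; removing the p variances on
    # the diagonal and halving (the matrix is symmetric) leaves the covariances.
    covariances = (n_variables * n_variables - n_variables) // 2
    return {
        "means": n_variables,
        "variances": n_variables,
        "covariances": covariances,
        "total": 2 * n_variables + covariances,
    }


counts = count_summary_statistics(100)
print(counts)
```

The count depends only on the number of variables, not on the number of observations, which is why reducing the variables to a few principal components shrinks the task so sharply.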
- The coefficients of these new optimal reference axes are called loadings, and the projections of the original data onto these axes are called scores.
- In a standard principal component analysis, the new reference axes are, by default, the eigenvectors of the covariance matrix of the data. However, you can use the correlation matrix instead.
- If you choose to standardize the data prior to the analysis, the new reference axes will be the eigenvectors of the correlation matrix of the data.
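The relationship between loadings, scores, and the eigenvectors of the covariance or correlation matrix can be sketched with NumPy. This is a minimal illustration of the general technique, not Aabel's implementation; the function name `pca`, the `standardize` flag, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def pca(data, standardize=False):
    """Return (eigenvalues, loadings, scores) for an observations x variables array."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)  # center each variable
    if standardize:
        # Unit variance per variable: the matrix below becomes the
        # correlation matrix of the original data.
        x = x / x.std(axis=0, ddof=1)
    matrix = np.cov(x, rowvar=False)
    # eigh is intended for symmetric matrices; it returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues = eigenvalues[order]
    loadings = eigenvectors[:, order]  # one column per principal component
    scores = x @ loadings  # projections of the data onto the new axes
    return eigenvalues, loadings, scores


# Synthetic data (hypothetical): two strongly covarying variables plus noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
data = np.hstack([
    base + 0.1 * rng.normal(size=(50, 1)),
    2.0 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])

eigenvalues, loadings, scores = pca(data)
std_eigenvalues, std_loadings, std_scores = pca(data, standardize=True)
print(eigenvalues / eigenvalues.sum())  # share of variance per component
```

In this sketch the scores are uncorrelated and their variances equal the eigenvalues, so sorting the eigenvalues in descending order puts the component with the most variation first.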
The Principal Component Analysis Output
The PCA output includes the PC scores (placed in the source worksheets) and a matrix plot, as well as the PC loadings and the correlation or covariance matrix data (placed in auto-generated worksheets and displayed graphically using Aabel charts and/or the table editor). See the example below; source of data: Fisher (1936), reproduced by Andrews and Herzberg (1985).
PC scores
- The PC score plot that is auto-generated when running a PCA displays only a full matrix diagram of the 1st, 2nd, and 3rd principal components, as shown in the image below. However, you can add variables to the matrix, use a half matrix, draw the convex hulls of groups, or customize the matrix diagram using the tools provided in the graphic viewer.
PC loadings
- While PC scores are useful for pattern recognition or identifying outliers, the PC loadings show how each original variable contributes to each principal component. PC loadings are placed in an auto-generated worksheet, and you can use the Aabel table editor to display them (see the table in the right-hand side image below).

Correlation or Covariance Matrix
- The generated matrix data will be placed in an auto-generated worksheet. For a graphical representation of the correlation or covariance matrix, you can use the Aabel table editor, which allows optional color-coding of the cells, or the basic heatmap.
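For readers who want to see how the two matrices relate, the following NumPy sketch converts a covariance matrix into the corresponding correlation matrix. The helper name `covariance_to_correlation` and the small data array are invented for illustration and do not come from Aabel or from the example data set.

```python
import numpy as np


def covariance_to_correlation(cov):
    """Rescale a covariance matrix so that every diagonal entry becomes 1."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))  # standard deviation of each variable
    return cov / np.outer(std, std)


# Made-up values: 5 observations of 3 variables on very different scales.
data = np.array([
    [1.0, 2.0, 30.0],
    [2.0, 1.5, 10.0],
    [3.0, 3.5, 20.0],
    [4.0, 3.0, 50.0],
    [5.0, 5.0, 40.0],
])

cov = np.cov(data, rowvar=False)
corr = covariance_to_correlation(cov)

# Standardizing the data first gives the same matrix directly.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
cov_of_standardized = np.cov(standardized, rowvar=False)

print(np.round(corr, 3))
```

Because the correlation matrix removes differences in scale, it is usually the safer choice when the variables are measured in different units.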
Pre-Processing the Data
PCA and factor analysis methods allow optional pre-processing of the data prior to the main analysis. Examples of data transformations that can be used (as part of PCA or factor analysis) are:












