PCA and Factor Analysis

If you study a data set with 100 variables and 100 observations (data objects), you must examine 100 means, 100 variances, and (100 * 100 - 100)/2 = 4950 covariances, for a total of 5150 statistics that together characterize the underlying multivariate normal population that was sampled. PCA is a method of simplifying this task. The key point is that much of the variability in the data set is not independent; that is, there is a lot of covariation between the variables. If we could extract, from all the variables under consideration, two new variables that captured most of the independent variability in the entire data set, a simple binary scatter diagram would reveal most of the information in the data.

Accordingly, the primary objective is data reduction: extracting a few uncorrelated variables (the principal components) that capture most of the variability in the data set and serve as new, mutually orthogonal reference axes. The 1st principal component captures the maximum variation in the data set. The 2nd principal component captures the largest share of the remaining variation while being uncorrelated with the 1st, and so on.
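The idea can be illustrated with a minimal NumPy sketch. It is not Aabel's implementation; it assumes PCA is computed by eigen-decomposition of the sample covariance matrix, and the function name `pca_covariance` and the synthetic data are purely illustrative.

```python
import numpy as np

def pca_covariance(X):
    """PCA via eigen-decomposition of the sample covariance matrix.

    X has shape (n_observations, n_variables). Returns eigenvalues,
    eigenvectors and PC scores, sorted by decreasing variance.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # project the data onto the new axes
    return eigvals, eigvecs, scores

# Number of summary statistics for p variables: means + variances + covariances
p = 100
n_stats = p + p + (p * p - p) // 2          # 100 + 100 + 4950 = 5150

# Synthetic data (assumption): 6 observed variables driven by 2 hidden sources
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(100, 6))

eigvals, eigvecs, scores = pca_covariance(X)
explained = eigvals / eigvals.sum()         # fraction of total variance per PC
score_cov = np.cov(scores, rowvar=False)    # diagonal matrix: PCs are uncorrelated
```

In this sketch the first two entries of `explained` account for nearly all of the variance, so a scatter plot of the first two columns of `scores` would summarize the six original variables.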
[Figure: matrix diagram of the PCA results]
The PCA output includes the PC scores, which are placed in the source worksheets, and a matrix plot. It also includes the PC loadings and the correlation or covariance matrix data, which are placed in auto-generated worksheets and displayed graphically using Aabel charts and/or the table editor.
The graphs displayed in the matrix diagram above and the bar charts below are from a PCA of some data from Davis, J.C. (2002).
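The relationship between the scores, the loadings, and the correlation matrix can be sketched as follows. This is a hedged illustration, not a description of Aabel's output: it assumes the correlation-matrix variant of PCA and one common convention in which loadings are eigenvectors scaled by the square root of their eigenvalues. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data (assumption): three variables on very different scales
base = rng.normal(size=(50, 1))
X = np.hstack([
    base + 0.3 * rng.normal(size=(50, 1)),
    1000.0 * (base + 0.3 * rng.normal(size=(50, 1))),
    rng.normal(size=(50, 1)),
])

# PCA on the correlation matrix is PCA on standardized (z-scored) variables
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]           # descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                        # PC scores
loadings = eigvecs * np.sqrt(eigvals)       # column k scaled by sqrt(eigenvalue k)

# Under this convention each loading is the correlation between
# an original variable (row) and a principal component (column).
corr_var_pc = np.array([
    [np.corrcoef(X[:, j], scores[:, k])[0, 1] for k in range(3)]
    for j in range(3)
])
```

Using the correlation matrix keeps the second variable, which is on a scale 1000 times larger, from dominating the components; with the covariance matrix it would.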


The factor analysis output includes the scores, loadings, and correlation matrices.
To represent the factor loadings graphically, you can, for example, use the binary scatter chart with the options of plotting the X and Y axes through zero and connecting the data points to the origin. You can use multiple-Y column graphs to compare, for each variable, the fraction of variance explained by the model with the fraction that is not. For a graphical representation of the correlation matrices, you can use the heatmap diagram.
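The explained and unexplained fractions of variance mentioned above can be derived from a loadings matrix. The sketch below assumes standardized variables and orthogonal factors, in which case the explained fraction (communality) of a variable is the sum of its squared loadings. The loadings are hypothetical values chosen for illustration, not output from Aabel.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 orthogonal factors (columns)
loadings = np.array([
    [0.8, 0.1],
    [0.7, 0.3],
    [0.2, 0.9],
    [0.1, 0.6],
])

# Fraction of each variable's variance explained by the factor model
communality = (loadings ** 2).sum(axis=1)   # 0.65, 0.58, 0.85, 0.37
# Fraction left unexplained
uniqueness = 1.0 - communality              # 0.35, 0.42, 0.15, 0.63

# Correlation matrix implied by the model: L L^T plus the unique
# variances on the diagonal, so that every diagonal entry is 1
implied_corr = loadings @ loadings.T + np.diag(uniqueness)
```

Plotting `communality` and `uniqueness` side by side for each variable gives the comparison described above, and `implied_corr` is the kind of matrix that a heatmap displays well.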













