Aabel Statistics & Multivariate Data Analysis Methods

Aabel Stats Analyzer

The Stats Analyzer has a modern interface designed for flexibility and ease of use. It is an integral part of the graphic viewer, which creates presentation- and publication-quality graphic output (tables and graphs).

  • Aabel Stats Analyzer is designed to allow different analyses without the need to create a different viewer for each method.
  • The graphs and tables generated by statistical methods and multivariate data analysis will be displayed on the viewer page(s).
  • The viewer filters, if activated, will be available to Stats Analyzer.
  • Many methods allow storing the results in auto-generated worksheets that can be used for follow-up analysis, etc.


Inferential Statistics and Multivariate Data Analysis Methods

Testing for Normality

Many inferential tests make certain assumptions about the shape of the underlying population distribution. The most commonly encountered assumption is that the population from which each sample is drawn is normal.

The shape of a normal distribution (also referred to as Gaussian distribution) is such that the closer a score is to the mean, the more frequently it occurs. The more extreme the deviation of a score from the mean, the less frequently it occurs.

    Aabel provides:

  • Probability plot and accompanying Shapiro-Wilk test
  • Kolmogorov-Smirnov test for testing normality of a single sample
  • Shapiro-Wilk test for normality

Testing for Homogeneity of Variance

Homogeneity of variance is a reference to equal variances across groups/samples.

  • Hartley's Fmax test
  • Bartlett's chi-square test
  • These tests evaluate whether or not the population variances represented by k >= 2 samples/groups are equal.
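Both statistics above can be sketched in a few lines of Python. This is a minimal illustration of the textbook formulas, not Aabel's implementation: the function names are hypothetical, sample variances use the n - 1 denominator, and Hartley's Fmax is conventionally applied to groups of equal size. Critical values (the Fmax table, or chi-square with k - 1 degrees of freedom for Bartlett) are not computed here.

```python
import math
from statistics import variance

def hartley_fmax(groups):
    """Hartley's Fmax: largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

def bartlett_chi2(groups):
    """Bartlett's chi-square statistic for equal variances (df = k - 1)."""
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]                # n_i - 1
    variances = [variance(g) for g in groups]         # s_i^2
    df_total = sum(dfs)                               # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction

print(hartley_fmax([[1, 2, 3], [2, 4, 6]]))
print(bartlett_chi2([[1, 2, 3], [2, 4, 6]]))
```

When all sample variances are equal, the pooled variance equals each of them and Bartlett's statistic is zero.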

Analysis of Variance (ANOVA)

Aabel provides the following ANOVA options:

  • One-way between-subjects ANOVA:

      The test statistic evaluates if there is a significant difference between at least two of the group means in a set of k means; for more information, click here

  • Two-way between-subjects ANOVA:

      The test statistic evaluates the effect of two independent variables (factors) on a response variable simultaneously. That is, it evaluates the variation among the differences between means for different levels of one factor over different levels of the other factor; for more information, click here

  • One-way within-subjects (repeated measures) ANOVA:

      The test statistic evaluates whether there is a significant difference between at least two of the conditions in a set of k repeated-measures conditions. Repeated measures ANOVA is used when all subjects of a random sample/group are measured under a number of different conditions; for more information, click here

  • Two-way repeated measures ANOVA (within-subjects factorial design):

      Two-way repeated measures ANOVA combines elements of the two-factor between-subjects design and the single-factor within-subjects design. That is, the variability is computed for the main effect of Factor A, the main effect of Factor B, and the interaction (AB), but the design requires three error terms, one for each effect; for more information, click here

  • Two-way mixed factorial ANOVA (mixed between-within design):

      Two-way mixed model ANOVA, also known as split-plot factorial design, combines one independent-samples factor (Factor A) and one correlated-groups factor (Factor B), evaluating the main effects of Factor A and Factor B and the interaction (AB); for more information, click here
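As an illustration of the first option, the one-way between-subjects F statistic can be sketched as follows. This assumes the standard partition of total variability into between-group and within-group sums of squares; the function name is hypothetical, and the p value (which needs the F distribution) is left out to keep the example dependency-free.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for k independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group SS: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group SS: scatter of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```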

Multiple Comparisons/Post-Hoc Tests Accompanying ANOVA

Single-Factor Between-Subjects Analysis of Covariance (ANCOVA)

This is an extension of single-factor between-subjects ANOVA. If some of the variation in the dependent variable scores is caused by another continuous variable (the covariate), ANCOVA removes this variation from the error (random) variance, which increases the sensitivity of the test for treatment effects. The test output includes:

  • Analysis of covariance, analysis of variance for the dependent variable, and analysis of variance for the covariate.
  • Testing for homogeneity of regression
  • Tukey's HSD Test on Adjusted Means
  • Scheffé Test on Adjusted Means
  • For more information regarding single-factor between-subjects ANCOVA, click here.

Chi-Square Tests

t-Tests

Aabel provides the following t-Tests:

  • Single-sample t-test:

      This test evaluates whether a sample of n observations comes from a parent population in which the mean equals a specific (hypothesized) value.

  • Paired samples t-test:

      This test evaluates whether two dependent samples represent populations with same or different means. The two samples must have the same number of subjects/objects, and must be matched as nearly as possible to exclude the effects of extraneous variation.

  • Unpaired samples t-test:

      This test evaluates whether two independent samples/groups represent two populations with different means, assuming that the populations have equal variances.
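The three t statistics can be sketched as below, assuming the textbook formulas: the paired test is a single-sample test on the differences, and the unpaired test uses the pooled variance implied by the equal-variance assumption. The function names are hypothetical; p values would additionally require the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n)), n - 1

def paired_t(a, b):
    """Paired t-test: a single-sample t-test of the differences against 0."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same number of subjects")
    return one_sample_t([x - y for x, y in zip(a, b)], 0.0)

def pooled_t(a, b):
    """Unpaired t-test assuming equal population variances (pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

print(one_sample_t([1, 2, 3], 0))
print(pooled_t([1, 2, 3], [4, 5, 6]))
```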

F-Test

This is a parametric test that evaluates whether or not the two populations from which the two samples are taken have equal variances (or equal standard deviations). The number of subjects/objects can be the same or different for the two samples.

z-Tests

Aabel provides the following z-Tests:

  • Paired samples z-test:

      This test evaluates whether two dependent samples represent populations with different means while using the known variances of the populations to calculate the z value. The two samples must have the same number of data points, and must be matched as nearly as possible.

  • Unpaired samples z-test:

      This test evaluates whether two independent samples/groups represent populations with different means while using the known variances of the populations to calculate the z value. The two samples can have the same or a different number of subjects/objects.
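A minimal sketch of the unpaired z-test, assuming the population variances var_a and var_b are known and supplied by the user (the names are hypothetical). Because z is standard normal under the null hypothesis, the two-sided p value can be taken from statistics.NormalDist in the standard library.

```python
import math
from statistics import NormalDist, mean

def unpaired_z(a, b, var_a, var_b):
    """z = (mean_a - mean_b) / sqrt(var_a/n_a + var_b/n_b) with known variances."""
    z = (mean(a) - mean(b)) / math.sqrt(var_a / len(a) + var_b / len(b))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

print(unpaired_z([1, 2, 3], [0, 0, 0, 0], var_a=3, var_b=4))
```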

Correlations

Aabel provides the following correlation methods:

  • Correlations and covariance matrices:

      This method allows generating a correlation or covariance matrix from numeric data columns.

  • Pearson product-moment correlation coefficient (Pearson's r):

      Pearson's r is a measure of linear correlation between two variables. The associated tests evaluate whether the correlation coefficient of the underlying population is different from zero, i.e., whether there is a linear relationship between the two variables.

  • Fisher's z transformation (zr) (provided for both correlation matrix and Pearson's r):

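      When the population correlation is not zero, the sampling distribution of r is skewed. This option applies Fisher's transformation, zr = 0.5 ln[(1 + r)/(1 - r)], whose sampling distribution is approximately normal.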

  • Spearman's rank-order correlation coefficient (Spearman's ρ) (non-parametric):

      Spearman's ρ is the non-parametric analog of the Pearson product-moment correlation coefficient. The results of the two are often closely similar, because Spearman's ρ is calculated in the same way as Pearson's r, except that the data are first ranked.

  • Kendall's rank correlation coefficient (Kendall's τ) (non-parametric):

      Like Spearman's ρ, Kendall's τ is a non-parametric measure of correlation between two variables, but it has an advantage over Spearman's ρ: Kendall's τ can be interpreted directly as the difference between the probability that the observed data are in the same order for the two variables and the probability that they are in different orders.

    For more information about correlation methods, click here.
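The correlation measures above can be illustrated with the following sketch. It is not Aabel's code: tied values receive averaged ranks before Spearman's ρ is computed as Pearson's r on the ranks, and Kendall's τ is shown in its simplest form (τ-a, no tie correction), which is an assumption about the variant used. Function names are hypothetical.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z(r):
    """Fisher's transformation zr = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

x, y = [1, 2, 3, 4, 5], [1, 4, 9, 16, 25]
print(pearson_r(x, y), spearman_rho(x, y), kendall_tau_a(x, y))
```

For a monotonic but non-linear relationship such as this one, both rank-based coefficients equal 1 while Pearson's r is slightly below 1.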

Internal Consistency Reliability

Certain quantities of interest in medicine, psychology, etc., cannot be measured directly. Instead, they are assessed by asking a series of questions and combining the answers into a single numerical value; the individual items may be scored on a scale or as pass/fail, yes/no, or other dichotomous responses.

To form a scale in this manner requires internal consistency, i.e., the items should all measure the same construct. Aabel provides the following methods for internal consistency reliability estimates:

  • Cronbach's alpha
  • Kuder-Richardson rho (Formula 20)
  • Kuder-Richardson rho (Formula 21)
  • For more information, click here.
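The three reliability coefficients can be sketched as follows, assuming items are given as columns of scores for the same respondents and that KR-20 and KR-21 are applied to 0/1 (dichotomous) items. Population variances are used throughout, so KR-20 coincides with Cronbach's alpha on dichotomous data; the function names and sample data are hypothetical.

```python
from statistics import pvariance

def _totals(items):
    """Total score per respondent (items are columns of equal length)."""
    return [sum(scores) for scores in zip(*items)]

def cronbach_alpha(items):
    k = len(items)
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(_totals(items)))

def kr20(items):
    """KR-20 for 0/1 items: uses p*q, the variance of each dichotomous item."""
    k = len(items)
    pq = sum((sum(col) / len(col)) * (1 - sum(col) / len(col)) for col in items)
    return k / (k - 1) * (1 - pq / pvariance(_totals(items)))

def kr21(items):
    """KR-21: like KR-20 but assumes all items are equally difficult."""
    k = len(items)
    totals = _totals(items)
    m = sum(totals) / len(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))

# Three dichotomous items answered by five respondents (made-up data).
items = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 0], [1, 1, 1, 1, 0]]
print(cronbach_alpha(items), kr20(items), kr21(items))
```

KR-21 never exceeds KR-20; the two agree only when all items have the same difficulty.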

Non-Parametric Tests

Non-parametric tests are used when the assumptions required by their parametric counterparts are not met or are questionable. All tests involving ranked data are non-parametric.

Aabel provides the following non-parametric tests:

Contingency Table Analysis and Mosaic Matrix Diagrams

Kaplan-Meier Survival Analysis

Aabel provides methods for Kaplan-Meier survival analysis using either raw survival data or summarized survival data.

If the Kaplan-Meier curve is plotted without taking into account the number of subjects remaining at risk beyond completion of the study, the shape of the curve will be unaffected, but the survival probability values will be affected.

  • Some raw data include the subjects at risk beyond completion of the study as additional rows in the data table (these added rows are tagged with a status code different from the event or censor code). Other data sets do not include these subjects. If your data are of the latter type but you know the number of subjects at time zero, Aabel provides a flexible means of including this information in the analysis.
  • The Kaplan-Meier survival analysis output includes:

  • Survival curves (default output) comparing the cumulative probability of survival at any specific time, with the option of displaying the censored observations
  • Life table computation results
  • Logrank significance test report - When we have generated two or more survival curves, the logrank test report is used to determine whether the differences in survival between groups, treatments, etc., are more than we would expect by chance alone.
    • The logrank test is a non-parametric test that uses the full survival data without making any assumption about the shape of the survival curve or the distribution of survival times. The logrank test results include the hazard ratio (i.e., the risk for one group, treatment, etc., compared to another group, treatment, etc.), the logrank chi-square, and p values.

    For more information about Kaplan-Meier survival analysis and the accompanying log-rank test, click here.
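The product-limit estimator behind the Kaplan-Meier curve can be sketched in Python as below. The optional n_initial argument is a hypothetical stand-in for Aabel's option of supplying the number of subjects at time zero: subjects never listed in the table stay in the risk set throughout, which changes the survival probability values as described above. The logrank test is not shown.

```python
def kaplan_meier(times, events, n_initial=None):
    """Product-limit estimate S(t) = prod(1 - d_i / n_i) over event times.

    times: follow-up time per subject; events: 1 = event, 0 = censored.
    n_initial (hypothetical option): total subjects at time zero, for tables
    that omit subjects still at risk when the study ended.
    """
    data = sorted(zip(times, events))
    at_risk = n_initial if n_initial is not None else len(data)
    survival, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed   # events and censorings leave the risk set after t
    return curve

print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
print(kaplan_meier([1, 2], [1, 1], n_initial=4))
```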

Receiver Operating Characteristic Curves (ROC)

ROC curves were originally developed to analyze noisy radio signals. Today, however, ROC curves are widely used to plot the true positive rate against the false positive rate for the different possible cut-points of a diagnostic test.

    Aabel's implementation of ROC allows plotting:

  • ROC for a single test
  • ROC for two tests on paired samples
  • ROC for two tests on unpaired samples
  • For more information, see examples.
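A single-test ROC curve can be sketched by sweeping every observed score as a cut-point and recording (false positive rate, true positive rate), with the area under the curve (AUC) from the trapezoidal rule. The convention that a score at or above the cut-point counts as test-positive is an assumption, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """labels: 1 = condition present, 0 = absent. Returns (FPR, TPR) points,
    starting at (0, 0) and ending at (1, 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points([1, 2, 3, 4], [0, 1, 0, 1])
print(pts, auc_trapezoid(pts))
```

With this convention the trapezoidal AUC equals the proportion of (positive, negative) pairs in which the positive case scores higher, counting ties as one half.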

Regression Analysis Concerning Two Continuous Variables

The regression methods grouped under this title either deal with finding the relationship between one outcome (dependent) variable and one predictor (independent) variable or finding the relationship between two variables where the designation of dependent and independent variables is irrelevant.

Pre-defined regression types include:

  • Linear (X on Y)
  • Linear (through zero)
  • Major axis
  • Reduced major axis
  • Polynomial
  • Exponential
  • Logarithmic
  • Power
  • For more information and examples, click here.
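For the linear cases, the least-squares and axis-based slopes have closed forms, sketched below under the usual textbook definitions (function names hypothetical). These functions regress y on x; passing the arguments in swapped order gives the regression of x on y. Major axis and reduced major axis treat both variables symmetrically, which is why they suit cases where the dependent/independent designation is irrelevant.

```python
import math
from statistics import mean

def _sums(x, y):
    """Means and centered sums of squares/cross-products."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

def ols_y_on_x(x, y):
    """Ordinary least squares: returns (slope, intercept)."""
    mx, my, sxx, _, sxy = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx

def ols_through_zero(x, y):
    """Least-squares slope of a line forced through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def reduced_major_axis(x, y):
    """Slope = sign(sxy) * sqrt(syy / sxx)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx

def major_axis(x, y):
    """Direction of the leading eigenvector of the scatter matrix (sxy != 0)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    slope = ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

x, y = [0, 1, 2, 3], [1, 3, 5, 7]   # exactly y = 2x + 1
print(ols_y_on_x(x, y), reduced_major_axis(x, y), major_axis(x, y))
```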

Cubic Spline interpolation:

  • A cubic spline is made from piece-wise third-order polynomials which pass through the control points provided. You can apply the cubic spline interpolation to a graph, or generate a specified sequence of interpolations for other uses.

User-Defined Non-Linear Regression:

  • User-defined regression provides a library of functions and an interactive interface.
  • For more information regarding regression methods in Aabel, click here.
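For the cubic spline interpolation described earlier in this section, a natural cubic spline (second derivative zero at both end points) can be sketched by solving a tridiagonal system for the second derivatives at the control points. The natural end condition and the function name are assumptions; Aabel's boundary conditions are not documented here.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return S(x) interpolating (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1                                   # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for second derivatives m[0..n]; rows 0 and n force m = 0.
    sub, diag = [0.0] * (n + 1), [1.0] * (n + 1)
    sup, rhs = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]

    def spline(x):
        # Choose the interval [xs[i], xs[i+1]]; points outside are extrapolated.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        left, right, hi = xs[i + 1] - x, x - xs[i], h[i]
        return (m[i] * left ** 3 / (6 * hi) + m[i + 1] * right ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * left
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * right)
    return spline

s = natural_cubic_spline([0, 1, 2, 3], [0, 1, 0, 1])
print([round(s(x / 2), 4) for x in range(7)])   # interpolated sequence
```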

Logistic Regression

Logistic regression allows you to predict a discrete outcome from a set of independent variables that may be continuous, discrete, or binary. The dependent variable is binary (dichotomous/binomial). Logistic regression in Aabel includes both the probability and the logit of the probability. Aabel allows generating:

  • Probability plots with one independent variable
  • Probability plots with multiple dimension projections
      • Logistic regression statistics (for probability plots with multiple dimension projections) include a logistic regression full model summary report and a logistic regression parameters summary report.

    For more information and examples, click here.
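The relationship between the probability and the logit used in logistic regression is sketched below. The intercept and coefficients are hypothetical inputs; estimating them (normally by maximum likelihood) is outside the scope of this example.

```python
import math

def logistic_probability(intercept, coefs, x):
    """P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1 / (1 + math.exp(-eta))

def logit(p):
    """Inverse of the logistic function: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

p = logistic_probability(-1.0, [0.5, 2.0], [1.0, 3.0])   # linear predictor = 5.5
print(p, logit(p))
```

The logit of the predicted probability recovers the linear predictor, which is why the model is linear on the logit scale.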

Multiple Regression

Partial Least Squares Regression (PLS)

Principal Component Analysis (PCA)

PCA is a data reduction method with the primary objective of extracting a few uncorrelated variables that may capture most of the variability in a data set, while preserving the orthogonality of these new optimal reference axes/variables (i.e., principal components). The 1st principal component captures the maximum variation in the data set. The 2nd principal component has the next most variation, and so on.

  • PCA allows optional pre-processing of the data prior to the main analysis; the options include:
      • Standardizing
      • Normalizing
      • Logarithmisizing
      • Log centering
      • Mean centering
      • Taking square root
      • Ranking variables individually
      • Ranking variables jointly
  • PCA results can be stored in the source worksheet or in a new worksheet.
  • The scores and loadings can be graphically displayed using Aabel graphs and tables.
  • For more information, click here.
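A minimal sketch of the first principal component, assuming mean centering as the only pre-processing: it builds the sample covariance matrix and finds its dominant eigenvector by power iteration. This is not Aabel's algorithm (full PCA implementations typically use an eigen- or singular value decomposition), and power iteration assumes the starting vector is not orthogonal to the leading component. The function name is hypothetical.

```python
import math
from statistics import mean

def first_principal_component(rows, iterations=200):
    """rows: observations (equal-length lists). Returns (variance captured,
    unit loading vector) for the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [mean(r[j] for r in rows) for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]   # mean centering
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p                                  # starting vector (assumption)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

print(first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]]))
```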

Factor Analysis

The defining characteristic that distinguishes PCA from factor analysis is that in PCA we assume that all variability in an item should be used in the analysis, while in factor analysis we define a priori the number of factors that we want to extract, and the extracted axes are scaled to the variance along the new axes.

  • Factor analysis allows optional pre-processing of the data prior to the main analysis; the options include:
      • Logarithmisizing
      • Log centering
      • Mean centering
      • Taking square root
      • Ranking variables individually
      • Ranking variables jointly
  • Aabel provides R-mode and Q-mode factor analyses and the option for Kaiser varimax rotation.
  • The loadings, communality, and "unique" data will be placed in an auto-generated worksheet.
  • You can use Aabel graphing capabilities to display the loadings, the fraction of variance of the variables explained by the model (i.e., communality) and the fraction that is not (i.e., "unique").

    For more information, click here.

Outlier Analysis

Polynomial Trend Surface Analysis (Map Analysis)

The trend surface analysis methods in Aabel can be used, e.g., to derive a continuous smooth surface from irregular data or to isolate regional trends from local variations. Aabel allows:

  • Polynomial trend surface analysis of XYZ data
  • Polynomial trend surface analysis of matrix data

      The analysis output includes the calculated trend grid, the XYZ estimates and residuals, and an ANOVA report for the significance of regression of the Kth-order polynomial trend surface.

  • Matrix cell-wise operations

      These operations are performed cell-wise (and must not be confused with matrix arithmetic). For example, the cells of one matrix are added to or subtracted from the corresponding cells of another matrix.

    For more information about trend surface analysis and examples, click here.
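A first-order (planar) trend surface z = b0 + b1·x + b2·y can be fitted to XYZ data by ordinary least squares, as sketched below; higher orders add terms such as x², xy, and y². The function names are hypothetical, and the residuals correspond to the local variation left after the regional trend is removed.

```python
def fit_linear_trend_surface(points):
    """points: (x, y, z) tuples. Least-squares plane z = b0 + b1*x + b2*y;
    returns [b0, b1, b2]."""
    rows = [[1.0, x, y] for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Normal equations (A^T A) b = A^T z for the three unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    aug = [ata[i] + [atz[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    b = [0.0] * 3
    for i in range(2, -1, -1):   # back substitution
        b[i] = (aug[i][3] - sum(aug[i][j] * b[j] for j in range(i + 1, 3))) / aug[i][i]
    return b

def trend_residuals(points, b):
    """Local variation: observed z minus the fitted regional trend."""
    return [z - (b[0] + b[1] * x + b[2] * y) for x, y, z in points]

pts = [(0, 0, 1), (1, 0, 3), (0, 1, -2), (1, 1, 0), (2, 1, 2)]   # z = 1 + 2x - 3y
print(fit_linear_trend_surface(pts))
```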

K-Means Cluster Analysis

This multivariate method is used for clustering/grouping data with similar characteristics.

  • The method partitions a data set of n objects into k clusters via an iterative process that continues until the sum of squared distances from the points to their assigned cluster centers is minimized, i.e., until all cluster centers are at the means of their Voronoi sets.
  • Aabel uses the method of Hartigan and Wong (1979), performs several random starts, and attempts to converge to a global minimum of the squared error distortion.
    • For graphical representation of k-means analysis, you can use a 2-D (binary or matrix) or 3-D plot that allows displaying scatter data points; members of each cluster, which have similar characteristics in multidimensional space, will be displayed with identical marker properties.

      Aabel allows optional pre-processing of the data prior to the main k-means clustering analysis. The options include:

        • Standardizing
        • Normalizing
        • Logarithmisizing
        • Log centering
        • Mean centering
        • Taking square root
        • Ranking variables individually
        • Ranking variables jointly

    For more information, click here.
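The iterative assign-and-update idea can be sketched with Lloyd's algorithm, a simpler relative of the Hartigan and Wong method that Aabel uses. The sketch below performs a single random start with a fixed seed, so it illustrates the principle only and may stop at a local minimum; the function name is hypothetical.

```python
import random

def kmeans_lloyd(points, k, iterations=100, seed=0):
    """Alternate between assigning points to the nearest center and moving
    each center to the mean of its members, until the centers stop moving."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_lloyd(data, 2))
```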

Hierarchical Cluster Analysis

The method used in Aabel is called the weighted pair-group method with arithmetic averaging. When objects/observations are defined by a set of numeric variables, each object (worksheet row) is positioned in a multi-dimensional space whose dimension equals the number of variables (worksheet columns) used to define the object.

    The dissimilarity/similarity measures in Aabel are based on one of the following options:

    • The correlation coefficient (Pearson's product moment correlation coefficient)
    • Euclidean distance similarity (straight line)
    • Standardized Euclidean distance coefficient
    • Manhattan distance similarity measure

    For an example, click here.
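The weighted pair-group averaging merge rule can be sketched as below, assuming the input is already a distance (dissimilarity) matrix such as Euclidean distances. After each merge, the distance from the new cluster to any other cluster is the simple average of the two merged clusters' distances, regardless of cluster sizes; that is what distinguishes the weighted method (WPGMA) from the unweighted one (UPGMA). The function name is hypothetical.

```python
def wpgma(dist):
    """dist: symmetric distance matrix. Returns merges as
    (cluster_a, cluster_b, height), clusters given as tuples of row indices."""
    size = len(dist)
    clusters = {i: (i,) for i in range(size)}
    d = {(i, j): dist[i][j] for i in range(size) for j in range(i + 1, size)}
    merges, next_id = [], size
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], height))
        new = clusters.pop(a) + clusters.pop(b)
        for other in clusters:
            da = d[tuple(sorted((a, other)))]
            db = d[tuple(sorted((b, other)))]
            d[(other, next_id)] = (da + db) / 2   # WPGMA: plain mean of the two
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = new
        next_id += 1
    return merges

positions = [0, 1, 5, 6]
print(wpgma([[abs(p - q) for q in positions] for p in positions]))
```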

Statistical Quality Control Using Shewhart and Other Control Charts

Control charts are used to monitor a process for a quality characteristic such as thickness, weight, or fraction defective.

    Shewhart Control Charts for Variables: These charts are based on quality characteristics that can be measured and expressed numerically:

    • Xbar (R) chart
    • Xbar (S) chart
    • R chart
    • S chart

    Individual Measurements and Moving Range: Control charts for individual measurements use the moving range of two successive observations to measure the process variability.

    • Individual measurements chart
    • Moving range chart

Shewhart Control Charts for Attributes: The QC charts for attributes are based on quality characteristics that are attributes and expressed categorically, for example "conforming" or "non-conforming", "defective" or "non-defective", etc.

  • p chart
  • np chart
  • c chart
  • u chart

Levey-Jennings chart: This chart plots the original process variable against time, date, run number, etc.

Special Cause Variation (Test for Special Causes):

  • Western Electric Company (WECO) Rules: These rules can be applied to Xbar (R), Xbar (S), and individual measurements charts, for detecting small shifts in the process average.
  • Westgard Multi-Rules Procedure: These rules can be applied to Levey-Jennings charts, for detecting trends or shifts by examining individual values to determine the status of the measuring system.

For more information about QC charts and examples, click here.
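Control limits for the individual measurements and moving range charts can be sketched as follows, using the standard constants for moving ranges of two successive observations (d2 = 1.128, D4 = 3.267). The WECO check shown covers only the first rule (a single point beyond the 3-sigma limits); the function names are hypothetical.

```python
from statistics import mean

def individuals_mr_limits(values):
    """Return (LCL, center, UCL) for the individuals (I) and moving range (MR) charts."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    sigma_hat = mr_bar / 1.128          # process sigma estimated from MR-bar / d2
    return {
        "I": (center - 3 * sigma_hat, center, center + 3 * sigma_hat),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
    }

def weco_rule1(values, lcl, ucl):
    """Indices of points beyond the 3-sigma control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

limits = individuals_mr_limits([10, 12, 11, 13, 12])
print(limits)
```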

Dot Plots

Dot plots are an alternative to histograms of continuous data; in a dot plot, each data point (individual observation) is plotted on a continuous scale using a symbol (on the X-axis). Aabel applies the method published by Leland Wilkinson (1999) to generate dot plots, but in addition, uses modifications necessary to allow:

  • Dynamic compensation for changes to the plot aspect ratio
  • Scaling of dot size (width)
  • Ellipsoidal dot stacks that make the plot more readable when dot stacks are either too uniform or too cluttered by vertical overlaps
  • The option of applying object marker colors to dot symbols to distinguish groups of data
  • For more information and examples, click here.

Parallel Dot Plot of Repeated Measures

This graph is designed to display dot plots of scores (response values) from k >= 2 repeated measures (dependent samples) on axes that are parallel to one another and equally spaced, with all axes having the same value range (see the right-hand side image).

  • Lines extending from their positions on one axis to the next connect the dependent data points.
  • The probability significance test (p value) provided with the plot is based on one-way repeated measures ANOVA.
  • For more information and an example, click here.
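The one-way repeated measures ANOVA behind the plot's p value partitions the total sum of squares into conditions, subjects, and residual error, as in the sketch below. It assumes complete data (every subject measured under every condition) and returns the F statistic only; converting F to a p value requires the F distribution. The function name is hypothetical.

```python
from statistics import mean

def repeated_measures_anova_f(data):
    """data[s][c]: score of subject s under condition c.
    Returns (F, df_conditions, df_error)."""
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    cond_means = [mean(row[c] for row in data) for c in range(k)]
    subj_means = [mean(row) for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_conditions = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_conditions - ss_subjects   # subject x condition residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_conditions / df_cond) / (ss_error / df_error), df_cond, df_error

print(repeated_measures_anova_f([[1, 2, 4], [2, 3, 4], [3, 5, 6]]))
```

Removing the between-subject variability from the error term is what makes the repeated measures design more sensitive than a between-subjects ANOVA on the same scores.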

Frequency Distributions/Histograms

These charts are used to display frequency of continuous or categorical data.

Histograms of continuous data:

  • Absolute count, relative histogram
  • Cumulative histogram, cumulative relative
  • Cumulative frequency, cumulative relative frequency
  • z-Score absolute count, z-score relative histogram

Categorical histogram chart types:

  • Absolute count or relative histogram
  • Pareto charts
  • Ogive
  • Spine plots
  • For more information and examples, click here.

Frequency Analysis of Categorical Data

Probability Plot

Probability charts display the cumulative distribution relative to a uniform (linear) or normal distribution function.

  • Normal Probability
  • Uniform Probability
  • For more information and examples, click here.
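The points of a normal probability plot can be sketched by pairing each ordered observation with a standard normal quantile. The plotting position (i - 0.5)/n is an assumption, since several conventions exist; data drawn from a normal population should fall close to a straight line. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(sample):
    """Pairs (theoretical standard normal quantile, ordered observation)."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

print(normal_probability_points([3.1, 2.4, 2.9, 3.6, 2.7]))
```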

Box & Whisker and Box-Percentile Plots

The box & whisker graphs display rank statistics. They can have the form of a box or notched box that spans the distance between the two quartiles surrounding the median.

    Aabel provides the following graph types:

  • One-way box & whisker plots (regular and notched)
  • Two-way box & whisker plots (regular and notched)
  • Different options for plotting the whisker and outliers
  • One-Way box-percentile plots
  • Two-Way box-percentile plots
  • For more information regarding box & whisker and box-percentile plots, click here.

    The options for plotting the whisker and outliers:

  • Whiskers extended to extreme data points
  • Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
  • Q1 - 1.5 * IQR, Q3 + 1.5 * IQR (and outliers)
  • 10th percentile, 90th percentile
  • 10th percentile, 90th percentile (and outliers)
  • 5th percentile, 95th percentile
  • 5th percentile, 95th percentile (and outliers)
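The Q1 - 1.5 * IQR, Q3 + 1.5 * IQR option can be sketched as follows. Quartiles are computed with statistics.quantiles using the "inclusive" method, which is one of several quartile definitions and not necessarily the one Aabel uses; the function name is hypothetical.

```python
from statistics import quantiles

def tukey_whiskers(data):
    """Whisker ends and outliers for the Q1 - 1.5*IQR / Q3 + 1.5*IQR option."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme observations inside the fences.
    return min(inside), max(inside), outliers

print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```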

Bar and Line Plots of Mean, Median, Max., Min.

These plots are used for comparing the means, medians, maximum, or minimum values of multiple variables, or of subgroups/categories of variables.

    Aabel provides the following graph types:

  • One-way bar plots
  • Two-way bar plots
  • One-way line plots
  • Two-way line plots

    To add error bars to mean bars/lines, the options include:

    • Standard error of mean
    • Standard deviation
    • Confidence interval (including options of 90.0%, 95.0%, 97.5%, 99.0%)

For more information regarding bar plots, click here; for more information regarding line plots, click here.

Interaction plots

Interaction plots are stacks of mean lines, used to display the effect of one factor at each level of another factor.

With no interaction, or a non-significant one, the mean lines are approximately parallel. The more the lines deviate from being parallel, the stronger the interaction effect.

    To add error bars to interaction plots, you can choose one of the following options:

    • Standard error of mean
    • Standard deviation
    • Confidence interval (including options of 90.0%, 95.0%, 97.5%, 99.0%)

For more information and an example graph, click here.

3-Way Mean Plots

These plots compare the response values (scores, measurements) obtained from k >= 2 samples/groups, each of which represents data from pqs levels of experimental conditions. The plot options include:

  • Three-way mean bar plot
  • Three-way mean dot plot
  • To add error bars to mean bars/dots, you can choose one of the following options:
    • Standard error of mean
    • Standard deviation
    • Confidence interval (including options of 90.0%, 95.0%, 97.5%, 99.0%)

For more information and example plots, click here.

Diamond Mean Comparison Plots

In these plots, the horizontal dashed line is the overall mean. The line through the center of each diamond is the group mean. The top and bottom diamond vertices are the respective upper and lower 95% confidence limits (CI) about the group mean.

In groups with equal sample sizes, overlapping marks indicate that the two group means are not significantly different at the 95% confidence level.

    Aabel provides the following diamond plot types:

  • One-way diamond mean comparison plot
  • Two-way diamond mean comparison plot

For more information, click here.

Bland & Altman and Paired t-Test Difference Plots

The Bland & Altman method comparison technique compares two methods of measurement, or two paired variables, using a plot of difference vs. mean, in which the standard deviation of the differences between measurements made by the two methods provides an index of their comparability.

In Aabel, the Bland & Altman method provides:

  • A plot of differences vs. mean
  • A plot of differences as a % of averages vs. mean
  • A plot of ratios vs. mean
  • In addition to the plot, the results can optionally be displayed in a table or stored in a worksheet.
  • Furthermore, Aabel has the option of showing:
    • The ±1.96 standard deviation limits of agreement, which are commonly shown in a typical Bland & Altman plot
    • A paired t-test difference plot with the 95% CI of mean difference above and below the mean line
    • A paired t-test difference plot with the 99% CI of mean difference above and below the mean line

For example graphs, click here.
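The core Bland & Altman quantities can be sketched as below: the differences and averages of paired measurements, the mean difference (bias), and the bias ± 1.96 SD limits of agreement. The function name is hypothetical, and the confidence-interval options for the paired t-test difference plot are not shown.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Differences vs. averages of paired measurements, with the bias and
    the 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {"averages": averages, "differences": diffs, "bias": bias,
            "limits": (bias - 1.96 * sd, bias + 1.96 * sd)}

print(bland_altman([10, 12, 14], [9, 12, 13]))
```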