Analysis of Variance - One-Way ANOVA


One-Way ANOVA tests the relationship between a numeric variable and a categorical variable.
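Outside of Q, the underlying F-test can be sketched with SciPy's f_oneway (a generic illustration on simulated data, not this feature's implementation):

```python
import numpy as np
from scipy import stats

# Simulated outcome (e.g., age) for three categories of a predictor
rng = np.random.default_rng(0)
single = rng.normal(loc=30, scale=5, size=40)
married = rng.normal(loc=35, scale=5, size=40)
divorced = rng.normal(loc=33, scale=5, size=40)

# One-way ANOVA F-test: do the group means differ?
f_stat, p_value = stats.f_oneway(single, married, divorced)
```

A small p-value indicates that at least one group mean differs; post hoc comparisons then identify which.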

How to Run a One-Way ANOVA

  1. Add the object by selecting from the menu Anything > Advanced Analysis > Analysis of Variance > One-Way ANOVA or Automate > Browse Online Library > Analysis of Variance > One-Way ANOVA
  2. In Inputs > Outcome specify the outcome variable.
  3. Specify the predictor variable in Inputs > Predictor.

Example

The table below shows the pairwise comparisons of age grouped by relationship. For each pair, the Difference between group means, its Standard Error, t statistic, and Corrected p-value are given.

Options

The options in the Object Inspector are organized into two tabs: Inputs and Properties.

Inputs

Outcome The variable to be predicted.

Predictor A variable containing 2 or more groups. If not categorical, it is converted into categories in the analysis.

Compare Specifies the contrasts to be performed.

To mean The post hoc testing compares the mean of each category to the overall average (i.e., the grand mean).
To first The post hoc testing compares the mean of each category to the mean of the first category.
Pairwise The post hoc testing compares the mean of each pair of categories.
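The Compare setting determines how many post hoc tests are run: for k categories, To mean yields k comparisons, To first yields k - 1, and Pairwise yields k(k - 1)/2. A plain-Python sketch with hypothetical category names:

```python
from itertools import combinations

categories = ["Single", "Married", "Divorced", "Widowed"]  # hypothetical levels
k = len(categories)

to_mean = [(c, "Grand mean") for c in categories]        # k comparisons
to_first = [(c, categories[0]) for c in categories[1:]]  # k - 1 comparisons
pairwise = list(combinations(categories, 2))             # k(k - 1)/2 comparisons
```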

Correction The multiple comparisons correction applied when computing the p-values of the post hoc comparisons. This correction is applied within each variable (i.e., there is no adjustment for multiple comparisons across variables within this function; such adjustments are possible in Statistical Assumptions for ordinary tables). The Correction calculations take into account the settings in Compare. For example, when Tukey Range is selected in conjunction with Pairwise, Tukey's HSD is performed, whereas when it is selected with To first, Dunnett's test is performed (both tests are based on the same statistical notion of ranges in t-statistics, differing only in which comparisons are performed). The options are:

Tukey Range. This is the default.
None
False Discovery Rate
Benjamini & Yekutieli
Bonferroni
Free Combinations (Westfall et al. 1999)
Hochberg
Holm
Hommel
Single-step (Bretz et al. 2010)
Shaffer
Westfall
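The simpler corrections can be sketched directly: Bonferroni scales each p-value by the number of tests, while Holm is a step-down refinement. (The range-based corrections such as Tukey and Dunnett require the fitted model and are not shown.) Illustrative Python with made-up p-values:

```python
def bonferroni(pvals):
    # Multiply each p-value by the number of tests, capping at 1
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    # Step-down: sort ascending, apply decreasing multipliers m, m-1, ...,
    # then enforce monotonicity of the adjusted values
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-comparison p-values
```

Holm is uniformly no less powerful than Bonferroni, which is why it is often preferred when a range-based test does not apply.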

Alternative hypothesis The alternative used in computing the p-values in the post hoc tests.

Two-sided. This is the default.
Greater
Less
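The three settings are related in a simple way, which can be checked with SciPy's ttest_ind on simulated data (illustrative only; not this feature's implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.5, size=50)  # simulated group with the higher mean
group_b = rng.normal(loc=0.0, size=50)

p_two = stats.ttest_ind(group_a, group_b, alternative="two-sided").pvalue
p_greater = stats.ttest_ind(group_a, group_b, alternative="greater").pvalue
p_less = stats.ttest_ind(group_a, group_b, alternative="less").pvalue

# The two one-sided p-values sum to 1, and the smaller of them
# is half the two-sided p-value.
```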

Robust standard errors Computes standard errors that are robust to violations of the assumption of constant variance. See Robust Standard Errors.
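A generic heteroscedasticity-robust (HC1 "sandwich") estimator can be sketched with NumPy; this illustrates the idea on simulated data rather than reproducing the exact computation used here:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
g = rng.integers(0, 3, n)                                        # three-category predictor
X = np.column_stack([np.ones(n), g == 1, g == 2]).astype(float)  # dummy coding
y = 1.0 + 0.5 * (g == 1) + rng.normal(scale=1.0 + g, size=n)     # non-constant variance

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
bread = np.linalg.inv(X.T @ X)

# Classical covariance assumes a single residual variance
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * bread))

# HC1 sandwich: weight each observation by its own squared residual
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc1 = n / (n - X.shape[1]) * bread @ meat @ bread
se_robust = np.sqrt(np.diag(cov_hc1))
```

When group variances genuinely differ, the robust standard errors diverge from the classical ones, changing the post hoc p-values.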

Missing data (see Missing Data Options):

Error if missing data
Exclude cases with missing data

Variable names Displays Variable Names in the output.

Binary variables Automatically converts non-ordered categorical variables into binary variables. Note that if this option is not selected, category values are inferred based on the order of the categories (i.e., the Value Attributes are ignored).

Filter The data is automatically filtered using any filters prior to estimating the model.

Weight Where a weight has been set for the R Output, the calibrated weight is used. See Weights in R.

Properties

This tab contains options for formatting the size of the object, as well as the underlying R code used to create the visualization and the JavaScript code used to customize the Object Inspector itself (see Object Inspector for more details about these options). Additional options are available by editing the code.

Technical details

When 'Tukey Range' is selected, p-values are computed using t-tests, with a correction for the family-wise error rate such that the p-values are correct for the largest range of values being compared (i.e., the biggest difference between the smallest and largest means). This is a single-step test.
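For a balanced design, this calculation can be sketched using SciPy's studentized range distribution (requires SciPy >= 1.7; illustrative only, since the actual implementation also handles unbalanced and weighted samples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, n = 3, 30                                     # 3 groups of 30
groups = [rng.normal(loc=m, size=n) for m in (0.0, 0.3, 0.8)]

means = np.array([g.mean() for g in groups])
mse = np.mean([g.var(ddof=1) for g in groups])   # pooled within-group variance
se = np.sqrt(mse / n)
df = k * (n - 1)

# Refer the largest standardized range of means to the studentized
# range distribution for k means: the Tukey-adjusted p-value
q = (means.max() - means.min()) / se
p_tukey = stats.studentized_range.sf(q, k, df)
```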

The method of calculation for all the post hoc corrections is valid for balanced, unbalanced (Bretz et al. 2010), and weighted samples; consequently, the results may differ from those in other programs (which typically are only valid for balanced samples).

Acknowledgements

The linear model is fitted using the lm and manova functions in R. The post hoc corrections are computed using the multcomp package (Hothorn, Bretz and Westfall 2008).

References

Bretz, Frank, Torsten Hothorn and Peter Westfall (2010), Multiple Comparisons Using R, CRC Press, Boca Raton.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.

Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.

Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.

Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.

Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.

Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.

Tukey, John (1949). Comparing individual means in the analysis of variance. Biometrics 5, 99-114.

Westfall, Peter H. (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.

Westfall, Peter H., R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.

Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.

Code

To access the underlying code in Displayr, go to Properties > R CODE.

var heading_text = 'One-Way ANOVA';
if (!!form.setObjectInspectorTitle)
    form.setObjectInspectorTitle(heading_text, heading_text)
else 
    form.setHeading(heading_text);
form.dropBox({label: "Outcome", 
            types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"], 
            name: "formOutcomeVariables",
            prompt: "Numeric dependent variable to be predicted",
            multi:false})
form.dropBox({label: "Predictor",
            types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"],
            prompt: "Categorical grouping variable used to predict the outcome variable",
            name: "formPredictor"})
form.comboBox({label: "Compare", 
              alternatives: ["To mean", "To first", "Pairwise"],
              name: "formCompare", default_value: "Pairwise",
              prompt: "Compare groups and overall mean, or groups and first group or pairs of groups"})
form.comboBox({label: "Correction", 
              alternatives: ["Tukey Range", "None", "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni", "Free Combinations", "Hochberg", "Holm", "Hommel", "Single-step", "Shaffer", "Westfall"], 
              name: "formCorrection",
              prompt: "Multiple comparisons correction used when calculating p-values",
              default_value: "Tukey Range"})
form.checkBox({label: "Robust standard errors", name: "formRobust", default_value: false,
               prompt: "Compute standard errors that are robust to violations of the assumption of constant variance"})
form.comboBox({label: "Alternative hypothesis", 
              alternatives: ["Two-sided", "Greater", "Less"],
              name: "formAlternative", default_value: "Two-sided", prompt: "The alternative used in computing the p-values in the post hoc tests"})
form.comboBox({label: "Missing data", 
              alternatives: ["Error if missing data", "Exclude cases with missing data"], 
              name: "formMissing", default_value: "Exclude cases with missing data", prompt: "Treatment of missing data values"})
form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Whether to use variable names instead of labels"})
library(flipAnalysisOfVariance)

WarnIfVariablesSelectedFromMultipleDataSets()

Ianova <- OneWayANOVA(QInputs(formOutcomeVariables),
    QInputs(formPredictor),
    weights = QCalibratedWeight,
    subset = QFilter,
    compare = formCompare,
    correction = formCorrection,
    alternative = formAlternative,
    robust.se = formRobust,
    missing = formMissing,
    show.labels = !formNames,
    outcome.name = deparse(substitute(outcome)),
    predictor.name = deparse(substitute(predictor)),
    p.cutoff = 0.05,
    seed = 1223)