Analysis of Variance - One-Way ANOVA

This feature is only available in Q5.

One-Way ANOVA (Analysis of Variance) is a statistical test of the relationship between a numeric variable and a categorical variable. Testing is conducted using an F-test.
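
As an illustration of the F-test, the sketch below computes the F statistic for unweighted data with no missing values. It is not the code used by Q; the function name oneWayF and the groups input (one array of outcome values per category) are assumptions made for this example.

```javascript
// Minimal sketch of the one-way ANOVA F statistic (unweighted, complete data).
// `groups` is a hypothetical input: one array of numeric outcomes per category.
function oneWayF(groups) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const k = groups.length;                                // number of categories
  const n = groups.reduce((s, g) => s + g.length, 0);     // total sample size
  const grandMean = mean(groups.flat());
  let ssBetween = 0;
  let ssWithin = 0;
  for (const g of groups) {
    const m = mean(g);
    ssBetween += g.length * (m - grandMean) ** 2;         // variation between category means
    for (const x of g) ssWithin += (x - m) ** 2;          // variation within a category
  }
  const dfBetween = k - 1;
  const dfWithin = n - k;
  const f = (ssBetween / dfBetween) / (ssWithin / dfWithin);
  return { f, dfBetween, dfWithin };
}

const result = oneWayF([[1, 2, 3], [2, 3, 4], [5, 6, 7]]);
console.log(result.f, result.dfBetween, result.dfWithin);
```

The p-value is then obtained by comparing f to an F distribution with dfBetween and dfWithin degrees of freedom, which is omitted here.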

Options

Outcome The variable to be predicted.

Predictor A variable containing 2 or more groups. If not categorical, it is converted into categories in the analysis.

Compare Specifies the contrasts to be performed.

To mean The post hoc testing compares the mean of each category to the overall average (i.e., the grand mean).
To first The post hoc testing compares the mean of each category to the mean of the first category.
Pairwise The post hoc testing compares the mean of each pair of categories.
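
The three Compare settings differ only in which comparisons are made. The sketch below lists them for a set of category labels. The function name comparisons and the ordering within each pair are assumptions made for illustration, not Q's internal representation.

```javascript
// Sketch: enumerate the comparisons implied by each Compare setting.
// `categories` is a hypothetical array of category labels.
function comparisons(categories, compare) {
  switch (compare) {
    case "To mean":
      // Each category against the grand mean: k comparisons.
      return categories.map((c) => [c, "grand mean"]);
    case "To first":
      // Each later category against the first: k - 1 comparisons.
      return categories.slice(1).map((c) => [c, categories[0]]);
    case "Pairwise": {
      // Every pair of categories: k * (k - 1) / 2 comparisons.
      const pairs = [];
      for (let i = 0; i < categories.length; i++) {
        for (let j = i + 1; j < categories.length; j++) {
          pairs.push([categories[j], categories[i]]);
        }
      }
      return pairs;
    }
    default:
      throw new Error("Unknown Compare option: " + compare);
  }
}

const labels = ["A", "B", "C", "D"];
console.log(comparisons(labels, "To mean").length);
console.log(comparisons(labels, "To first").length);
console.log(comparisons(labels, "Pairwise").length);
```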

Correction The multiple comparisons correction applied when computing the p-values of the post hoc comparisons. This correction is applied within each variable (i.e., there is no adjustment for multiple comparisons across variables within this function). Such adjustments are possible in Statistical Assumptions for ordinary tables. The Correction calculations take into account the settings in Compare. For example, when Tukey Range is selected in conjunction with Pairwise, Tukey's HSD is performed, whereas when it is selected with To first, Dunnett's test is performed (both tests are based on the same statistical notion of ranges in t-statistics; they differ only in which comparisons are performed). The options are:

Tukey Range. This is the default.
None No correction is applied.
False Discovery Rate
Benjamini & Yekutieli
Bonferroni
Free Combinations (Westfall et al. 1999)
Hochberg
Holm
Hommel
Single-step (Bretz et al. 2010)
Shaffer
Westfall
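
To give a sense of how the simpler corrections adjust p-values, the sketch below implements three of them from their published definitions: Bonferroni, Holm (1979), and the Benjamini and Hochberg (1995) procedure. It assumes that the False Discovery Rate option corresponds to Benjamini and Hochberg, which this page does not state explicitly. The function name adjustP is hypothetical. The range-based and logically constrained methods (Tukey Range, Single-step, Shaffer, Westfall, Free Combinations) depend on the joint distribution of the test statistics and are not covered.

```javascript
// Sketch of three p-value adjustments. `p` holds unadjusted p-values.
function adjustP(p, method) {
  const m = p.length;
  // Indices of p sorted from smallest to largest p-value.
  const order = p.map((_, i) => i).sort((a, b) => p[a] - p[b]);
  const adjusted = new Array(m);

  if (method === "Bonferroni") {
    return p.map((x) => Math.min(1, x * m));
  }
  if (method === "Holm") {
    // Step-down: multiply the i-th smallest p-value by (m - i + 1),
    // then enforce monotonicity with a running maximum.
    let running = 0;
    order.forEach((idx, rank) => {
      running = Math.max(running, Math.min(1, (m - rank) * p[idx]));
      adjusted[idx] = running;
    });
    return adjusted;
  }
  if (method === "False Discovery Rate") {
    // Step-up: multiply the i-th smallest p-value by m / i,
    // then enforce monotonicity with a running minimum from the largest down.
    let running = 1;
    for (let rank = m - 1; rank >= 0; rank--) {
      const idx = order[rank];
      running = Math.min(running, (m / (rank + 1)) * p[idx]);
      adjusted[idx] = running;
    }
    return adjusted;
  }
  throw new Error("Method not covered by this sketch: " + method);
}

const raw = [0.01, 0.04, 0.03, 0.005];
console.log(adjustP(raw, "Bonferroni"));
console.log(adjustP(raw, "Holm"));
console.log(adjustP(raw, "False Discovery Rate"));
```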

Alternative hypothesis The alternative used in computing the p-values in the post hoc tests.

Two-sided This is the default.
Greater
Less
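
For a single unadjusted comparison whose test statistic has a symmetric reference distribution (such as a t-test), the three alternatives are related as sketched below. The function name oneSidedP is hypothetical, and the sketch assumes that Greater means the estimated difference is greater than zero. Multiplicity-adjusted p-values do not generally follow this simple relationship.

```javascript
// Sketch: convert a two-sided p-value into the p-value for a chosen alternative.
// `statistic` is the signed test statistic (e.g., a t-statistic).
function oneSidedP(twoSidedP, statistic, alternative) {
  if (alternative === "Two-sided") return twoSidedP;
  const upperTail = statistic > 0 ? twoSidedP / 2 : 1 - twoSidedP / 2;
  if (alternative === "Greater") return upperTail;
  if (alternative === "Less") return 1 - upperTail;
  throw new Error("Unknown alternative: " + alternative);
}

console.log(oneSidedP(0.04, 2.1, "Two-sided"));
console.log(oneSidedP(0.04, 2.1, "Greater"));
console.log(oneSidedP(0.04, 2.1, "Less"));
```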

Robust standard errors Computes standard errors that are robust to violations of the assumption of constant variance. See Robust Standard Errors.
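
The sketch below shows why this option can matter. It compares the classical pooled standard error of a difference between two category means with a heteroscedasticity-consistent one. The robust version uses the simplest sandwich form (often called HC0) as an assumption; this page does not state which estimator Q uses. The function name standardErrors is hypothetical.

```javascript
// Sketch: pooled versus heteroscedasticity-consistent (HC0) standard errors
// for the difference between the means of categories `a` and `b`.
function standardErrors(groups, a, b) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  // Sum of squared residuals within each category.
  const ss = groups.map((g) => {
    const m = mean(g);
    return g.reduce((s, x) => s + (x - m) ** 2, 0);
  });
  const n = groups.reduce((s, g) => s + g.length, 0);
  const na = groups[a].length;
  const nb = groups[b].length;
  // Classical: a single error variance pooled over all categories.
  const mse = ss.reduce((s, x) => s + x, 0) / (n - groups.length);
  const pooled = Math.sqrt(mse * (1 / na + 1 / nb));
  // HC0: only the residuals of the two categories being compared contribute.
  const robust = Math.sqrt(ss[a] / na ** 2 + ss[b] / nb ** 2);
  return { pooled, robust };
}

// The third category is far noisier than the first two.
const se = standardErrors([[1, 2, 3], [2, 3, 4], [10, 20, 30]], 0, 1);
console.log(se.pooled, se.robust);
```

In this example the pooled standard error is inflated by the noisy third category even though it is not part of the comparison, whereas the robust standard error is not.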

Missing data (see Missing Data Options):

Error if missing data
Exclude cases with missing data
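
The two settings can be pictured as follows. This is a sketch only: the function name handleMissing, the parallel-array inputs, and the representation of missing values as null, undefined, or NaN are assumptions made for illustration.

```javascript
// Sketch of the two Missing data settings applied to parallel arrays.
function handleMissing(outcome, predictor, missing) {
  const isMissing = (v) => v === null || v === undefined || Number.isNaN(v);
  // A case is complete only if both its outcome and predictor are observed.
  const complete = outcome.map((y, i) => !isMissing(y) && !isMissing(predictor[i]));
  if (missing === "Error if missing data") {
    if (complete.includes(false)) {
      throw new Error("The data contains missing values.");
    }
    return { outcome, predictor };
  }
  // "Exclude cases with missing data": drop incomplete cases from both arrays.
  return {
    outcome: outcome.filter((_, i) => complete[i]),
    predictor: predictor.filter((_, i) => complete[i]),
  };
}

const cleaned = handleMissing(
  [5.1, null, 4.8, 6.0],
  ["A", "B", null, "B"],
  "Exclude cases with missing data"
);
console.log(cleaned.outcome, cleaned.predictor);
```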

Variable names Displays Variable Names in the output.

Binary variables Automatically converts non-ordered categorical variables into binary variables. Note that if this option is not selected, category values are inferred based on the order of the categories (i.e., the Value Attributes are ignored).

Filter The data is automatically filtered using any filters prior to estimating the model.

Weight Where a weight has been set for the R Output, the calibrated weight is used. See Weights in R.

Technical details

When 'Tukey Range' is selected, p-values are computed using t-tests, with a correction for the family-wise error rate such that the p-values are correct for the largest range of values being compared (i.e., the biggest difference between the smallest and largest means). This is a single-step test.
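
The statistics underlying this test can be sketched as below for unweighted data with a pooled error variance. The function name pairwiseT is hypothetical. The sketch stops at the statistics: the corrected p-values require the joint distribution of the t-statistics (or the studentized range distribution), which is not implemented here.

```javascript
// Sketch: pairwise t-statistics and the equivalent studentized range statistic.
// `groups` is a hypothetical input: one array of numeric outcomes per category.
function pairwiseT(groups) {
  const mean = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const means = groups.map(mean);
  const n = groups.reduce((s, g) => s + g.length, 0);
  let ssWithin = 0;
  groups.forEach((g, i) => g.forEach((x) => (ssWithin += (x - means[i]) ** 2)));
  const mse = ssWithin / (n - groups.length);   // pooled error variance
  const out = [];
  for (let i = 0; i < groups.length; i++) {
    for (let j = i + 1; j < groups.length; j++) {
      const se = Math.sqrt(mse * (1 / groups[i].length + 1 / groups[j].length));
      const t = (means[j] - means[i]) / se;
      // The studentized range statistic is sqrt(2) times the absolute t-statistic.
      out.push({ pair: [i, j], t, q: Math.SQRT2 * Math.abs(t) });
    }
  }
  return out;
}

const stats = pairwiseT([[1, 2, 3], [2, 3, 4], [5, 6, 7]]);
// In this balanced example the largest statistic belongs to the pair with the
// smallest and largest means.
const largest = stats.reduce((a, b) => (b.q > a.q ? b : a));
console.log(largest.pair, largest.q);
```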

The method of calculation for all the post hoc corrections is valid for balanced, unbalanced, and weighted samples (Bretz et al. 2011). Consequently, the results may differ from those in other programs, which are typically only valid for balanced samples.

Acknowledgements

The linear model is fitted using the lm and manova functions in R. See Analysis of Variance - One-Way ANOVA for acknowledgements relating to the ANOVAs in the outputs.

References

Bretz, Frank, Torsten Hothorn and Peter Westfall (2011), Multiple Comparisons Using R, CRC Press, Boca Raton.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.

Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.

Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.

Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.

Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.

Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.

Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99-114.

Westfall, Peter H. (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.

Westfall, Peter H., R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.

Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.

Code

// JavaScript: input form controls
form.setHeading('One-Way ANOVA');
form.dropBox({label: "Outcome", 
            types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"], 
            name: "formOutcomeVariables",
            multi:false})
form.dropBox({label: "Predictor",
            types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"], 
            name: "formPredictor"})
form.comboBox({label: "Compare", 
              alternatives: ["To mean", "To first", "Pairwise"],
              name: "formCompare", default_value: "Pairwise"})
form.comboBox({label: "Correction", 
              alternatives: ["Tukey Range", "None", "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni", "Free Combinations", "Hochberg", "Holm", "Hommel", "Single-step", "Shaffer", "Westfall"], 
              name: "formCorrection", default_value: "Tukey Range"})
form.checkBox({label: "Robust standard errors", name: "formRobust", default_value: false})
form.comboBox({label: "Alternative hypothesis", 
              alternatives: ["Two-sided", "Greater", "Less"],
              name: "formAlternative", default_value: "Two-sided"})
form.comboBox({label: "Missing data", 
              alternatives: ["Error if missing data", "Exclude cases with missing data"], 
              name: "formMissing", default_value: "Exclude cases with missing data"})
form.checkBox({label: "Variable names", name: "formNames", default_value: false})

# R: fit the one-way ANOVA using the form inputs
warning("This function is experimental. It has yet to be passed through our Quality Assurance Process. Please check results carefully.")
library(flipAnalysisOfVariance)
anova <- OneWayANOVA(QInputs(formOutcomeVariables), 
    QInputs(formPredictor), 
    weights = QCalibratedWeight,
    subset = QFilter,
    compare = formCompare,
    correction = formCorrection,
    alternative = formAlternative,
    robust.se = formRobust,
    missing = formMissing,
    show.labels = !formNames,
    outcome.name = deparse(substitute(outcome)),
    predictor.name = deparse(substitute(predictor)),
    p.cutoff = 0.05,
    seed = 1223)