# Analysis of Variance - One-Way ANOVA

One-Way *ANOVA* (Analysis of Variance) is a statistical test of the relationship between a numeric variable and a categorical variable. It determines whether there are statistically significant differences between the group means, using an *F*-test.
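As a minimal illustration of the *F*-test, the sketch below computes the *F* statistic from raw group values. It assumes unweighted data with no missing values. The function name `oneWayAnovaF` is hypothetical and is not part of the R code used by this analysis. Turning *F* into a *p*-value needs the CDF of the *F* distribution with (k - 1, N - k) degrees of freedom, which is not implemented here.

```javascript
// Hypothetical helper: one-way ANOVA F statistic for unweighted data.
// groups: array of arrays, one inner array of outcome values per category.
function oneWayAnovaF(groups) {
  const all = groups.flat();
  const n = all.length;
  const k = groups.length;
  const grandMean = all.reduce((s, x) => s + x, 0) / n;

  let ssBetween = 0; // spread of the group means around the grand mean
  let ssWithin = 0; // spread of observations around their own group mean
  for (const g of groups) {
    const m = g.reduce((s, x) => s + x, 0) / g.length;
    ssBetween += g.length * (m - grandMean) ** 2;
    for (const x of g) ssWithin += (x - m) ** 2;
  }

  const dfBetween = k - 1;
  const dfWithin = n - k;
  // F = between-group mean square / within-group mean square
  const f = (ssBetween / dfBetween) / (ssWithin / dfWithin);
  return { f, dfBetween, dfWithin };
}

// Group means 2, 5 and 8 with a within-group variance of 1 give F = 27 on (2, 6) df.
console.log(oneWayAnovaF([[1, 2, 3], [4, 5, 6], [7, 8, 9]]));
```

A large *F* means the group means are spread out relative to the variation inside the groups.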

## Example

The table below shows the pairwise comparisons of *age* grouped by *relationship*. For each pair, the *Difference* between the group means, its *Standard Error*, the *t* statistic, and the *Corrected p*-value are given.
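The *Difference*, *Standard Error* and *t* columns of such a table can be sketched as follows. This assumes the classical (non-robust) pooled standard error, unweighted data, and differences taken as first group minus second group. `pairwiseT` is a hypothetical name, and the *Corrected p* column, which needs a reference distribution and a multiple comparisons correction, is left out.

```javascript
// Hypothetical helper: Difference, Standard Error and t for every pair of groups,
// using the pooled within-group variance (the ANOVA mean squared error).
function pairwiseT(groups, labels) {
  const means = groups.map(g => g.reduce((s, x) => s + x, 0) / g.length);
  const n = groups.reduce((s, g) => s + g.length, 0);

  let ssWithin = 0;
  groups.forEach((g, i) => g.forEach(x => { ssWithin += (x - means[i]) ** 2; }));
  const mse = ssWithin / (n - groups.length); // pooled variance on n - k df

  const rows = [];
  for (let i = 0; i < groups.length; i++) {
    for (let j = i + 1; j < groups.length; j++) {
      const difference = means[i] - means[j];
      const se = Math.sqrt(mse * (1 / groups[i].length + 1 / groups[j].length));
      rows.push({ pair: `${labels[i]} vs ${labels[j]}`, difference, se, t: difference / se });
    }
  }
  return rows;
}

console.table(pairwiseT([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["A", "B", "C"]));
```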

## Options

**Outcome** The variable to be predicted.

**Predictor** A variable containing 2 or more groups. If not categorical, it is converted into categories in the analysis.

**Compare** Specifies the *contrasts* to be performed.

- **To mean** The *post hoc* testing compares the mean of each category to the overall average (i.e., the *grand mean*).
- **To first** The *post hoc* testing compares the mean of each category to the mean of the first category.
- **Pairwise** The *post hoc* testing compares the mean of each pair of categories.
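The three settings produce different numbers of tests (k for **To mean**, k - 1 for **To first** and k(k - 1)/2 for **Pairwise**, with k categories), which in turn affects how strongly a multiple comparisons correction must adjust. The hypothetical function below only enumerates the comparisons each setting implies; it does not reproduce how the R function builds its contrasts.

```javascript
// Hypothetical helper: list the comparisons implied by each "Compare" setting.
function comparisons(labels, compare) {
  switch (compare) {
    case "To mean":
      return labels.map(l => [l, "grand mean"]);
    case "To first":
      return labels.slice(1).map(l => [l, labels[0]]);
    case "Pairwise": {
      const out = [];
      for (let i = 0; i < labels.length; i++) {
        for (let j = i + 1; j < labels.length; j++) out.push([labels[i], labels[j]]);
      }
      return out;
    }
    default:
      throw new Error(`Unknown compare option: ${compare}`);
  }
}

const labels = ["A", "B", "C", "D"];
for (const option of ["To mean", "To first", "Pairwise"]) {
  console.log(option, comparisons(labels, option).length);
}
// Prints 4, 3 and 6 comparisons respectively.
```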

**Correction** The multiple comparisons correction applied when computing the *p*-values of the *post hoc* comparisons. This correction is applied within each variable (i.e., there is no adjustment for multiple comparisons across variables within this function). Such adjustments are possible in Statistical Assumptions for ordinary tables. The **Correction** calculations take into account the settings in **Compare**. For example, when **Tukey Range** is selected in conjunction with **Pairwise**, Tukey's HSD is performed, whereas when it is selected with **To first**, Dunnett's test is performed. Both tests are based on the same statistical notion of ranges in *t*-statistics; they differ only in which comparisons are performed. The options are:

- **Tukey Range** This is the default.
- **None** No correction is applied.
- **False Discovery Rate**
- **Benjamini & Yekutieli**
- **Bonferroni**
- **Free Combinations** (Westfall et al. 1999)
- **Hochberg**
- **Holm**
- **Hommel**
- **Single-step** (Bretz et al. 2011)
- **Shaffer**
- **Westfall**
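Four of the simpler corrections can be written directly on a vector of unadjusted *p*-values. The sketch below follows the standard definitions of Bonferroni, Holm (1979), Hochberg (1988) and the Benjamini & Hochberg (1995) false discovery rate, matching the formulas used by R's `p.adjust`. Treating the **False Discovery Rate** option as the Benjamini & Hochberg procedure is an assumption, and `adjustP` is a hypothetical name. **Tukey Range**, **Single-step**, **Shaffer**, **Westfall** and **Free Combinations** use the joint distribution of the test statistics and are not covered.

```javascript
// Hypothetical helper: adjust an array of p-values; results keep the input order.
function adjustP(p, method) {
  const m = p.length;
  // Indices of p sorted from smallest to largest p-value.
  const order = p.map((_, i) => i).sort((a, b) => p[a] - p[b]);
  const adj = new Array(m);

  switch (method) {
    case "Bonferroni":
      return p.map(x => Math.min(1, m * x));
    case "Holm": {
      // Step-down: multiply the r-th smallest (0-based) by m - r, keep a running maximum.
      let running = 0;
      order.forEach((idx, r) => {
        running = Math.max(running, (m - r) * p[idx]);
        adj[idx] = Math.min(1, running);
      });
      return adj;
    }
    case "Hochberg":
    case "False Discovery Rate": {
      // Step-up: walk from the largest p-value down, keeping a running minimum.
      let running = 1;
      for (let r = m - 1; r >= 0; r--) {
        const idx = order[r];
        const factor = method === "Hochberg" ? m - r : m / (r + 1);
        running = Math.min(running, factor * p[idx]);
        adj[idx] = running;
      }
      return adj;
    }
    default:
      throw new Error(`Correction not covered by this sketch: ${method}`);
  }
}

const raw = [0.01, 0.04, 0.03, 0.005];
for (const method of ["Bonferroni", "Holm", "Hochberg", "False Discovery Rate"]) {
  console.log(method, adjustP(raw, method));
}
```

Bonferroni is the most conservative of the four; the false discovery rate controls a different error rate and is typically the least conservative.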

**Alternative hypothesis** The alternative used in computing the *p*-values in the post hoc tests.

- **Two-sided** This is the default.
- **Greater**
- **Less**
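The three alternatives differ only in which tail of the reference distribution is used. The sketch below assumes that **Greater** means the true difference is greater than zero, and takes the distribution's CDF as a parameter; for these tests it would be a *t* CDF, which is not implemented here. The logistic CDF in the usage lines is a symmetric stand-in chosen only so the example runs, and `pValue` is a hypothetical name.

```javascript
// Hypothetical helper: p-value for a test statistic under each alternative.
// cdf must be the CDF of a distribution that is symmetric about zero.
function pValue(stat, cdf, alternative) {
  const lower = cdf(stat); // P(T <= stat)
  switch (alternative) {
    case "Greater": return 1 - lower;
    case "Less": return lower;
    case "Two-sided": return 2 * Math.min(lower, 1 - lower);
    default: throw new Error(`Unknown alternative: ${alternative}`);
  }
}

// Stand-in symmetric CDF (logistic), used only to make the example runnable.
const logisticCdf = x => 1 / (1 + Math.exp(-x));
for (const alt of ["Two-sided", "Greater", "Less"]) {
  console.log(alt, pValue(1.5, logisticCdf, alt));
}
```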

**Robust standard errors** Computes standard errors that are robust to violations of the assumption of constant variance. See Robust Standard Errors.
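To illustrate what changes when **Robust standard errors** is selected, the sketch below computes the standard error of one difference between two group means both ways. The robust version uses the HC2 heteroscedasticity-consistent estimator which, for a model whose only predictor is the grouping variable, reduces to the Welch-style sqrt(s1²/n1 + s2²/n2). Which HC variant the R function actually uses is not stated here, so treat HC2 as an assumption; `sampleVar` and `seDifference` are hypothetical names.

```javascript
// Hypothetical helper: sample variance with the n - 1 denominator.
function sampleVar(xs) {
  const m = xs.reduce((s, x) => s + x, 0) / xs.length;
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Standard error of mean(groups[i]) - mean(groups[j]).
// robust = false: classical SE from the variance pooled over all groups.
// robust = true: HC2 sandwich SE, which here uses each group's own variance.
function seDifference(groups, i, j, robust) {
  const a = groups[i];
  const b = groups[j];
  if (robust) {
    return Math.sqrt(sampleVar(a) / a.length + sampleVar(b) / b.length);
  }
  const n = groups.reduce((s, g) => s + g.length, 0);
  const ssWithin = groups.reduce((s, g) => s + sampleVar(g) * (g.length - 1), 0);
  const mse = ssWithin / (n - groups.length);
  return Math.sqrt(mse * (1 / a.length + 1 / b.length));
}

// A low-variance group next to a high-variance group.
const spread = [[1, 2, 3], [10, 20, 30, 40]];
console.log(seDifference(spread, 0, 1, false), seDifference(spread, 0, 1, true));
```

When group variances differ a lot, the pooled SE borrows variance from the noisier group, while the robust SE lets each group contribute its own variance.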

**Missing data** (see Missing Data Options):

- **Error if missing data**
- **Exclude cases with missing data** This is the default.

**Variable names** Displays variable names, rather than labels, in the output.

**Binary variables** Automatically converts non-ordered categorical variables into binary variables. Note that if this option is not selected, category values are inferred from the order of the categories (i.e., the Value Attributes are ignored).

**Filter** The data is automatically filtered using any filters prior to estimating the model.

**Weight** Where a weight has been set for the R Output, the calibrated weight is used. See Weights in R.

## Technical details

When 'Tukey Range' is selected, p-values are computed using t-tests, with a correction for the family-wise error rate such that the p-values are correct for the largest range of values being compared (i.e., the biggest difference between the smallest and largest means). This is a single-step test.
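For the classical balanced case, the statistic behind the Tukey range correction can be sketched as below: the studentized range q of the group means, which equals sqrt(2) times the largest pairwise *t* statistic. A Tukey *p*-value for a pair compares sqrt(2)|t| with the studentized range distribution for k groups and N - k residual degrees of freedom (R's `ptukey`), which is not implemented here. The sketch assumes equal group sizes and no weights, whereas the method used by this analysis is stated to handle unbalanced and weighted samples. `studentizedRange` is a hypothetical name.

```javascript
// Hypothetical helper (equal group sizes only): studentized range of the group means.
function studentizedRange(groups) {
  const n = groups[0].length;
  if (!groups.every(g => g.length === n)) {
    throw new Error("This sketch assumes equal group sizes");
  }
  const means = groups.map(g => g.reduce((s, x) => s + x, 0) / n);
  const ssWithin = groups.reduce(
    (s, g, i) => s + g.reduce((acc, x) => acc + (x - means[i]) ** 2, 0), 0);
  const mse = ssWithin / (groups.length * (n - 1)); // pooled variance on N - k df

  const range = Math.max(...means) - Math.min(...means);
  const q = range / Math.sqrt(mse / n); // studentized range statistic
  const tMax = range / Math.sqrt((2 * mse) / n); // t for the most distant pair
  return { q, tMax, ratio: q / tMax }; // ratio is always sqrt(2)
}

console.log(studentizedRange([[1, 2, 3], [4, 5, 6], [7, 8, 9]]));
```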

The method of calculation for all the post hoc corrections is valid for balanced and unbalanced samples (Bretz et al. 2011) and for weighted samples. Consequently, the results may differ from those of other programs, which are typically valid only for balanced samples.

## Acknowledgements

The linear model is fitted using the `lm` and `manova` functions in R. See Analysis of Variance - One-Way ANOVA for acknowledgements relating to the ANOVAs in the outputs.

## References

Bretz, Frank, Torsten Hothorn and Peter Westfall (2011), Multiple Comparisons Using R, CRC Press, Boca Raton.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289-300.

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165-1188.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 65-70.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-803.

Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.

Hothorn, Torsten, Frank Bretz and Peter Westfall (2008), Simultaneous Inference in General Parametric Models. Biometrical Journal, 50(3), 346-363.

Shaffer, Juliet P. (1986), Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association, 81, 826-831.

Shaffer, Juliet P. (1995). Multiple hypothesis testing. Annual Review of Psychology 46, 561-576.

Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.

Sarkar, S., and Chang, C. K. (1997). Simes' method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92, 1601-1608.

Tukey, John (1949). "Comparing Individual Means in the Analysis of Variance". Biometrics. 5 (2): 99-114.

Westfall, Peter H. (1997), Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association, 92, 299-306.

Westfall, Peter H., R. D. Tobias, D. Rom, R. D. Wolfinger, Y. Hochberg (1999). Multiple Comparisons and Multiple Tests Using the SAS System. Cary, NC: SAS Institute Inc.

Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1013.

## Code

```
// Heading shown at the top of the options panel.
form.setHeading('One-Way ANOVA');
// Outcome: the single variable whose means are compared across groups.
form.dropBox({label: "Outcome",
              types: ["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"],
              name: "formOutcomeVariables",
              prompt: "Numeric dependent variable to be predicted",
              multi: false});
// Predictor: the grouping variable that defines the categories.
form.dropBox({label: "Predictor",
              types: ["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"],
              prompt: "Categorical grouping variable used to predict the outcome variable",
              name: "formPredictor"});
// Which post hoc contrasts to compute.
form.comboBox({label: "Compare",
               alternatives: ["To mean", "To first", "Pairwise"],
               name: "formCompare", default_value: "Pairwise",
               prompt: "Compare groups and overall mean, or groups and first group or pairs of groups"});
// Multiple comparisons correction applied to the post hoc p-values.
form.comboBox({label: "Correction",
               alternatives: ["Tukey Range", "None", "False Discovery Rate", "Benjamini & Yekutieli", "Bonferroni", "Free Combinations", "Hochberg", "Holm", "Hommel", "Single-step", "Shaffer", "Westfall"],
               name: "formCorrection",
               prompt: "Multiple comparisons correction used when calculating p-values",
               default_value: "Tukey Range"});
// Standard errors robust to non-constant variance (off by default).
form.checkBox({label: "Robust standard errors", name: "formRobust", default_value: false,
               prompt: "Compute standard errors that are robust to violations of the assumption of constant variance"});
// Direction of the alternative hypothesis for the post hoc tests.
form.comboBox({label: "Alternative hypothesis",
               alternatives: ["Two-sided", "Greater", "Less"],
               name: "formAlternative", default_value: "Two-sided",
               prompt: "The alternative used in computing the p-values in the post hoc tests"});
// How cases with missing values are handled.
form.comboBox({label: "Missing data",
               alternatives: ["Error if missing data", "Exclude cases with missing data"],
               name: "formMissing", default_value: "Exclude cases with missing data",
               prompt: "Treatment of missing data values"});
// Show variable names instead of labels in the output.
form.checkBox({label: "Variable names", name: "formNames", default_value: false,
               prompt: "Whether to use variable names instead of labels"});
```

```
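library(flipAnalysisOfVariance)
# Fit the one-way ANOVA and post hoc comparisons using the options chosen in the form.
anova <- OneWayANOVA(QInputs(formOutcomeVariables),  # outcome variable
                     QInputs(formPredictor),          # grouping (predictor) variable
                     weights = QCalibratedWeight,     # calibrated weight, if one is set
                     subset = QFilter,                # filters applied to this output
                     compare = formCompare,
                     correction = formCorrection,
                     alternative = formAlternative,
                     robust.se = formRobust,
                     missing = formMissing,
                     show.labels = !formNames,        # labels unless "Variable names" is ticked
                     outcome.name = deparse(substitute(outcome)),
                     predictor.name = deparse(substitute(predictor)),
                     p.cutoff = 0.05,
                     seed = 1223)                     # fixed seed, for reproducible results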
```