Statistical Assumptions

Related Online Training modules
Type 1 Error
Generally it is best to access online training from within Q by selecting Help > Online Training

Statistical Assumptions can be set for an entire project in Edit > Project Options > Customize and for selected tables and charts in Edit > Table Options. The Table Options Statistical Assumptions show as blank. This is because unless you specifically set these options, each table or chart will automatically have the assumptions that have been set for the entire project in the Project Options.

Default Statistical Assumptions

The default statistical assumption settings are outlined below. More commonly adjusted settings are highlighted in orange.
(Beginning in Q5.14.1.0, Statistical Assumptions is divided into four tabs rather than just one. The new tabs are Significance Levels, Test Type, Exception Tests, and Column Comparisons.)



  1. Show significance: Arrows and font colors will designate significant results in tables
  2. Overall significance level: testing will be done at the 95% confidence level and above
  3. Minimal sample size for testing: you must have at least 2 respondents in your sample to test

  4. Statistical tests for categorical and numeric data:
  5. Proportions: non-parametric tests will be done on categorical data
  6. Means: t-test will be done on numeric data and corrected with Bessel’s correction
  7. Correlations: default is Pearson
  8. Equal variance in tests when sample size is less than: if the sample size is less than 10, variances are assumed to be equal

  9. Cell comparisons: the complement of the cell will be tested
  10. Multiple comparison correction: False Discovery Rate is applied by default to help reduce the number of false positives, based on the entire table
  11. Weights and significance: Automatically a mix of Taylor Series Linearization and Kish's Effective Sample Size Formula
  12. Date questions: tests compare across all dates rather than previous period
  1. Significance levels and appearance: Arrows: get longer with increased significance. Colors: Blue = significantly higher. Red = significantly lower. Font: Letters for column comparisons become capitalized after .001 is reached.
  2. Column comparisons: take effect only if Column Comparisons are selected

  3. Multiple Comparison correction: False Discovery Rate is applied by default to help reduce the number of false positives, based on the number of columns within the row & column span
  4. Overlaps: Default is for Q to ignore the sample that overlaps between columns when respondents in columns are not mutually exclusive
  5. ANOVA-Type Test: ANOVA is not run before displaying significance
  6. Show redundant tests: show significance on one cell (the one with the higher value)
  7. Show as groups: Show letters for insignificant columns rather than significant
  8. Recycle column letters: each span begins labeling columns at A
  9. No test symbol: - is shown if a test isn’t performed due to settings
  10. Symbol for non-significant test: nothing is shown if a test comes back insignificant


More detail on these settings can be found below.

Show significance

Show higher or lower significance with arrows, font colors (tables only), or using symbols to show differences between columns. Some outputs and export formats do not support all options. Show significance is new in Q 4.7. For more information see Ways of Showing Statistical Significance.

Overall significance level

The Overall significance level is used throughout Q when determining which results to show as being statistically significant. By default, this is set at 0.05. It is applied by Q when determining which results to highlight as being significant on tables and charts, whether or not to show symbols for Column Comparisons and when using Smart Tables. The precise meaning of the Overall significance level is determined by other Statistical Assumptions settings (see Interpretation of the Overall Significance Level by Q).

Minimal sample size for testing

Where cells have sample sizes smaller than this value, no significance test is conducted when Q runs automated tests of statistical significance between cells (i.e., Cell Comparisons and Column Comparisons). By default this is set to 2.

When Weights and significance is set to Kish approximation or has been specified by the user, the Effective Sample Size is used instead of the actual sample size.

Statistical tests for categorical and numeric data

You can control the type of test conducted by Q when testing proportions, means and correlations. The options for controlling these tests are described in this section.

Proportions

Related Online Training modules
Population Weights
Non-Proportional Sampling Weights


The consequence of this setting depends upon the data being viewed. See:

Proportions Bessel's correction

Applies Bessel's correction when computing the variance.

Means

This setting applies to tables involving means (i.e., tables that, by default, show the Average statistic).

The consequence of this setting depends upon the data being viewed. See:

Means Bessel's correction

Applies Bessel's correction when computing the variance.
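
As a minimal illustration of what Bessel's correction changes, the Python sketch below computes the variance of a small sample with and without the correction. The data values are made up for illustration, and this is a generic sketch rather than Q's internal code.

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data only

n = len(sample)
mean = sum(sample) / n
sum_sq = sum((x - mean) ** 2 for x in sample)

uncorrected = sum_sq / n      # divides by n
corrected = sum_sq / (n - 1)  # Bessel's correction: divides by n - 1

# The standard library provides both forms.
assert abs(uncorrected - statistics.pvariance(sample)) < 1e-12
assert abs(corrected - statistics.variance(sample)) < 1e-12
```

The corrected variance is always slightly larger, and the difference shrinks as the sample size grows.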

Correlations

This setting determines how the correlations are computed. See Correlations - Comparing Two Numeric Variables.

Equal variance in tests when sample size is less than

This setting determines whether to assume variances are equal or unequal when conducting t-tests and z-tests of means from independent samples involving unweighted data.

Cell comparisons

Multiple comparison correction

Selects the multiple comparison correction employed when determining which cells in a table are or are not shown as significant.

See Multiple Comparisons (Post Hoc Testing) for more information. By default, Q applies these corrections to the entire table simultaneously, but by checking Within row and span they are applied within each span within each row (e.g., if you have a table showing brand preference in the rows and age and gender in the columns, the corrections will be applied within age in each row and within gender in each row).

Within row and span

By default, Q applies these corrections to the entire table simultaneously, but by checking this box, they are applied within each span within each row. See the diagram beneath ANOVA-Type Test for an understanding of how having this option checked results in the comparisons being grouped.
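
To make the difference between the two scopes concrete, here is a minimal Python sketch. It assumes the Benjamini-Hochberg procedure, which is a common way of controlling the False Discovery Rate; this page does not state which procedure Q uses, and the p-values, group names and function names are all hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def correct_whole_table(groups):
    """Pool every test in the table and correct them together."""
    keys = [(g, j) for g, ps in groups.items() for j in range(len(ps))]
    pooled = [groups[g][j] for g, j in keys]
    result = {g: [None] * len(ps) for g, ps in groups.items()}
    for (g, j), value in zip(keys, benjamini_hochberg(pooled)):
        result[g][j] = value
    return result


def correct_within_row_and_span(groups):
    """Correct each (row, span) group of tests separately."""
    return {g: benjamini_hochberg(ps) for g, ps in groups.items()}


# Hypothetical uncorrected p-values for one row with two column spans.
groups = {
    ("Coke", "Age"): [0.01, 0.04, 0.30],
    ("Coke", "Gender"): [0.03, 0.20],
}
whole = correct_whole_table(groups)
within = correct_within_row_and_span(groups)
```

In this example the smallest p-value (0.01) is adjusted to 0.05 when the whole table is corrected together, but only to 0.03 when the correction is applied within its own span, because fewer tests are being corrected at once.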

Weights and significance

Determines how Q deals with weights when computing significance.

Automatic: A mixture of Taylor Series Linearization and Kish's Effective Sample Size Formula. Information about how the design effect is taken into account for specific tests can be found in the description of the actual tests (see Tests Of Statistical Significance).
Taylor series linearization
Kish approximation: See Kish's Effective Sample Size Formula.
Set to: Enter a known design effect.
deff=1: Prior to Q4.10, this was called Set to.
deff = Sample size / sum of weights: Introduced in Q4.10.
Unweighted sample size in tests: Introduced in Q4.10.

These are discussed in more detail in Weights, Effective Sample Size and Design Effects.
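
For orientation, the standard form of Kish's effective sample size is the squared sum of the weights divided by the sum of the squared weights. The Python sketch below computes it along with the implied design effect. The weights and function names are hypothetical, and how Q applies these quantities within each test is described on the linked pages, not here.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


def design_effect(weights):
    """Design effect implied by Kish's formula: actual n / effective n."""
    return len(weights) / kish_effective_sample_size(weights)


weights = [1.0, 1.0, 2.0, 0.5, 0.5]  # illustrative weights only
n_eff = kish_effective_sample_size(weights)
deff = design_effect(weights)
```

With equal weights the effective sample size equals the actual sample size and the design effect is 1. The more the weights vary, the smaller the effective sample size becomes.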

Date questions

When charting data from Date questions, you can specify whether significance tests compare each date to the previous period (Compare to previous period) or to the rest of the data (Compare to rest of data).

Significance levels and appearance

The symbols used to denote different levels of statistical significance.

By default, Q specifies ten different levels of significance on a chart. You can add or remove levels by pressing the plus or minus buttons.

Only those levels that are less than or equal to the nominated Overall significance level are used. For example, by default the 0.5, 0.2 and 0.1 levels are not shown (as the Overall significance level is set at 0.05).

How the specific rules are applied depends upon whether Cell Comparisons (e.g., arrows and font color) or Column Comparisons (e.g., letters) are used to signify significance. With Cell Comparisons, the length of the arrows is determined by the Corrected p. With Column Comparisons the uncorrected p-value (p) is used.

You can modify the length of the arrows, the font sizes, and the colors used to highlight fonts and arrows. Whether or not arrows, font sizes and colors appear on a particular chart or slide is determined by the Table Styles settings (see Ways of Showing Statistical Significance and How Q Highlights Results as Being Significant).

The column entitled Column letters displays the symbols used to indicate whether there is a significant difference between columns (to be seen, you need to right-click on a table and select Statistics - Cells and Column Comparisons). By default, Q uses lowercase letters to show significant results where the p-value is more than 0.001 and uppercase letters where the p-value is less than or equal to 0.001. Other characters can be entered here, and should be separated by commas. For example, the default characters are entered as "a,b,c,d,e,...,z". When there are more columns in the table than the number of characters that are entered here, Q will repeat the characters where necessary and add a number for each repetition.

How the table works

For each statistical test that is conducted in your tables, Q will use these settings to work out which colors, arrow lengths, or letters to show for that test.

SignificanceLevelsTable.png

If the result for the cell (the Corrected p statistic) is smaller than the Overall Significance Level, Q will look down the rows of the table and check if the Corrected p is smaller than the Cutoff p-value. It will stop at the last row where the Corrected p is smaller than the Cutoff p-value. Q will then use the settings in that row to display the result. For example, if the Corrected p in your cell is 0.04, Q will look down the table and use the settings for the row 0.05, because 0.04 is smaller than 0.05, but it is not smaller than the next row of the table which is 0.01.
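
The lookup described above can be sketched in Python as follows. This is a minimal illustration, not Q's code: the function name is hypothetical, the cutoffs are only the values mentioned on this page (the default table has ten levels), and the strict "smaller than" comparison follows the wording above, so behavior exactly at a cutoff is an assumption.

```python
# Cutoff p-values mentioned on this page; the real default table has ten rows.
CUTOFFS = [0.5, 0.2, 0.1, 0.05, 0.01, 0.001]


def find_display_row(corrected_p, cutoffs, overall_level=0.05):
    """Return the cutoff of the row whose settings are used, or None."""
    if not corrected_p < overall_level:
        return None  # not significant, so no row applies
    chosen = None
    for cutoff in sorted(cutoffs, reverse=True):  # look down the table
        if cutoff > overall_level:
            continue  # levels above the overall level are not used
        if corrected_p < cutoff:
            chosen = cutoff  # keep looking; the last matching row wins
    return chosen


row_for_example = find_display_row(0.04, CUTOFFS)
```

For the example in the text, a Corrected p of 0.04 selects the 0.05 row, because 0.04 is smaller than 0.05 but not smaller than 0.01.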

Column comparisons

Multiple comparison correction

The corrections available when conducting the post hoc corrections to Column Comparisons. See Multiple Comparisons (Post Hoc Testing) for more information.

By default, Q applies these corrections within each span within each row (e.g., if you have a table showing brand preference in the rows and age and gender in the columns, the corrections will be applied within age in each row and within gender in each row). Un-checking Within row and span applies them to the entire table instead.

Overlaps

Deals with the treatment of overlapping columns. This option is intended for use when replicating results from other programs. This modification only has an effect on a limited number of the available tests (and, in particular, it cannot, in general, be used to switch on and off dependent tests).

This option is applicable to crosstabs containing numeric or categorical data in the rows and categorical data in the columns (i.e., not to grid questions). By default, when conducting column comparisons with such data Q ignores any overlapping sample. For example, if a table is created which is comparing Coke buyers with Pepsi buyers (in the columns), any tests will automatically filter out people that buy both brands, and, thus, they test Coke buyers that do not buy Pepsi versus Pepsi buyers that do not buy Coke. This occurs when either Default or Exclude is selected (except for Quantum or Survey Reporter Means/Proportions). If you change the setting to Independent, Q then assumes that the samples are entirely independent and thus ignores the overlap. When Dependent is selected, Q conducts a dependent test. Note that for Quantum or Survey Reporter Means/Proportions, dependent tests are used by default, i.e. when Default is selected for the overlaps setting.
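
The default exclusion of overlapping sample can be pictured with a small Python sketch. The respondent IDs and the function name are hypothetical, and the sketch only shows which respondents would remain in each column before testing, not the test itself.

```python
def exclude_overlap(column_a, column_b):
    """Drop respondents who appear in both columns before comparing them."""
    overlap = column_a & column_b
    return column_a - overlap, column_b - overlap


# Hypothetical respondent IDs: respondents 4 and 5 buy both brands.
coke_buyers = {1, 2, 3, 4, 5}
pepsi_buyers = {4, 5, 6, 7}

coke_only, pepsi_only = exclude_overlap(coke_buyers, pepsi_buyers)
```

Here the comparison would be between respondents 1, 2 and 3 (Coke only) and respondents 6 and 7 (Pepsi only).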

This option should generally only be modified in Table Options for specific tables, and is only provided for use when replicating results from other programs (e.g., when the tests for Proportions and/or Means are set to Quantum Proportions, Quantum Means, Survey Reporter Proportions or Survey Reporter Means).

The table below shows the tests used for Quantum and Survey Reporter overlaps settings:

| Setting | Independent Samples | Dependent Samples |
|---|---|---|
| Quantum Proportions | Independent Samples - Quantum Column Proportions Test | Dependent Samples - Quantum Column Proportions Test |
| Quantum Means | Independent Samples - Quantum Column Means Test | Dependent Samples - Quantum Column Means Test |
| Survey Reporter Proportions | Independent Samples - Survey Reporter Column Proportions Test | Dependent Samples - Survey Reporter Column Proportions Test |
| Survey Reporter Means | Independent Samples - Survey Reporter Column Means Test | Dependent Samples - Survey Reporter Column Means Test |

Within Row and Span

By default, Q applies these corrections within a row and span, but by un-checking this box, they are applied to the entire table (where statistical tests can be computed). See the diagram beneath ANOVA-Type Test for an understanding of how having this option checked results in the comparisons being grouped.

ANOVA-Type Test

When this is checked a test is conducted within the span in the row of the table prior to performing the multiple comparison correction. For example, the dashed boxes below show the groups of cells that are tested with an ANOVA-Type test. If that test is insignificant, then all the valid comparisons in that span are also shown as insignificant (i.e., no letters are shown). This option is only available for checking when Within row and span in Column comparisons is checked.

AnovaWithin.png

See ANOVA-Type Tests for more information on how Q conducts such tests.

Show redundant tests

Causes the symbols indicating significance to be shown for both columns. If not checked, then only the column containing the higher value is marked as significant. Note that the higher value is the higher value used in the actual test and this may differ from the number shown on the table in situations where there is missing data or overlapping columns.

Show as groups

Causes the symbols to indicate groups of columns that are not statistically different (as opposed to highlighting differences). This is sometimes referred to as common lettering.

See Planned ANOVA-Type Tests.

Recycle column letters

Re-uses the same column letters within each span (e.g., so the first column within a span is always A, etc.).
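
A minimal Python sketch of the two labeling schemes follows. The span and column names are hypothetical, the sketch assumes no more than 26 columns, and it assumes that without recycling the letters simply continue across spans.

```python
from string import ascii_uppercase


def label_columns(spans, recycle):
    """Assign a letter to each column, optionally restarting at A per span."""
    labels = {}
    position = 0
    for span, columns in spans.items():
        if recycle:
            position = 0  # each span starts again at A
        for column in columns:
            labels[(span, column)] = ascii_uppercase[position]
            position += 1
    return labels


# Hypothetical table layout with two column spans.
spans = {
    "Age": ["18-34", "35-54", "55+"],
    "Gender": ["Male", "Female"],
}
recycled = label_columns(spans, recycle=True)
continuous = label_columns(spans, recycle=False)
```

With recycling, the Gender columns are labeled A and B. Without it, under the assumption above, they would be labeled D and E.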

No test symbol

This symbol is used when a test was not performed, either because the setting for Comparisons did not request a test or because a test would not be appropriate (e.g., due to the sample sizes being too small or due to the cell containing column totals). By default a dash is shown.

Symbol for non-significant test

This symbol is used when a test could not be performed (e.g., because one of the groups had no data). By default nothing is shown.