# How To Compare A Sub-Group Against The Total

Where appropriate, Q automatically compares sub-groups against the total result.

## Example

In the table below, 50% of the `Young` prefer `Blue`. Amongst the total sample, 40% prefer `Blue`. The upward-pointing arrow and the blue font indicate that the score for the `Young` is significantly higher than that for the total sample (i.e., than for the NET).

## The mechanics of how the tests are performed

At an intuitive level, this example can be thought of as testing the 50% for `Blue` amongst the `Young` versus the NET of 40%. Although this intuitive understanding is essentially correct in terms of how to interpret the data, it is not, at a technical level, a valid way to describe the test. To see why, it is useful to work through the maths.

Amongst the `Young`, 50 out of 100 people preferred `Blue`. Amongst the total sample of 300, 120 (40%) preferred `Blue`. However, the 300 respondents in the total sample include the 100 `Young` people, so if we compare the 100 with the 300 we would be double-counting (or, to use the more formal statistical language, we would violate the assumption of independent samples).

The standard solution to this problem is to subtract the data of the `Young` from the total, and then perform the test. So, as 120 people in total preferred `Blue` and 50 of these were `Young`, this means that 70 (35%) of the 200 people that are not `Young` prefer `Blue`. When at its default settings, Q performs the test comparing the 50% of the 100 `Young` with the 35% of the 200 people that are not `Young`.
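The arithmetic above can be sketched in Python. This is a minimal illustration, not Q's implementation: the function names are hypothetical, and the pooled two-proportion z-test is an assumption chosen for simplicity. The test Q actually applies depends on its settings.

```python
import math


def complement_counts(sub_hits, sub_n, total_hits, total_n):
    """Remove the sub-group from the total, returning (hits, n) for everyone else."""
    return total_hits - sub_hits, total_n - sub_n


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled z statistic and two-sided p-value for two independent proportions.

    Assumption: a plain pooled z-test stands in for whichever test Q uses.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Numbers from the example: 50 of 100 Young, 120 of 300 in total.
rest_hits, rest_n = complement_counts(50, 100, 120, 300)
z, p_value = two_proportion_z(50, 100, rest_hits, rest_n)

print(rest_hits, rest_n)  # 70 200
print(round(z, 2), round(p_value, 4))
```

The key step is `complement_counts`: the two groups passed to the test (100 `Young` and 200 not `Young`) share no respondents, so the independent-samples assumption holds.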

Note that although the test does not explicitly compare the 100 with the 300, at a conceptual level we can interpret it as if it had. This is because the only way that young people can differ from the total is if they differ from the people who are not young.
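This reasoning can also be written out algebraically. The symbols below are introduced here for illustration and are not Q's notation: $n_Y$ and $p_Y$ are the sample size and proportion for the `Young`, $n_R$ and $p_R$ are those for everyone else, and $N = n_Y + n_R$ is the total sample size with total proportion $p_T$.

$$
p_T = \frac{n_Y\,p_Y + n_R\,p_R}{N}
\quad\Longrightarrow\quad
p_Y - p_T = \frac{n_R}{N}\,\left(p_Y - p_R\right)
$$

With the numbers from the example, $0.50 - 0.40 = \tfrac{200}{300}\,(0.50 - 0.35) = 0.10$. Because $n_R / N$ is a positive constant, the difference from the total is zero exactly when the difference from the rest is zero.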

More detailed worked examples of how Q performs such tests are here.

A number of statistical tests exist for testing samples where the groups overlap. They are variously known as *dependent*, *overlapping* and *related* sample tests. These tests have not been developed for the problem described above. Rather, they have been developed for the following two problems:

- Where all the columns in a table are permitted to overlap. For example, the columns may represent websites a person has visited in the last month. A person may have visited multiple websites, meaning that they appear in multiple columns. Note that with such data Q does not, by default, use dependent/overlapping/related samples tests, and instead performs the test amongst respondents that are not overlapping (see Statistical Assumptions#Overlaps).
- Where the table is constructed from repeated measurements (e.g., a Number - Grid or Pick Any - Grid question).

In theory, it is possible to apply a dependent/overlapping/related samples test to the data in the example above, but if the test is conducted in a valid way it will give the same result as the strategy described above, because the underlying maths of the test would simply remove the effect of the `Young` being in the total sample.

## Column comparisons

If performing Column comparisons, you can have Q explicitly test against the NET. See How to Include the Main NET Column in Column Comparisons.