How To Compare A Sub-Group Against The Total

Where appropriate, Q automatically compares sub-groups against the total result.


In the table below, 50% of the Young prefer Blue, whereas 40% of the total sample prefer Blue. The upward-pointing arrow and the blue font indicate that the score for the Young is significantly higher than the score for the total sample (i.e., the NET).


The mechanics of how the tests are performed

At an intuitive level, this example can be thought of as testing the 50% for Blue amongst the Young versus the NET of 40%. Although this intuitive understanding is essentially correct in terms of how to interpret the data, it is not, at a technical level, a valid way to describe the test. To see why, it is useful to work through the maths.

Amongst the Young, 50 out of 100 people preferred Blue. Amongst the total sample of 300, 120 (40%) preferred Blue. However, the 300 respondents in the total sample include the 100 Young people, so if we compare the 100 with the 300 we would be double-counting (or, to use the more formal statistical language, we would violate the assumption of independent samples).

The standard solution to this problem is to subtract the data of the Young from the total, and then perform the test. So, as 120 people in total preferred Blue and 50 of these were Young, this means that 70 (35%) of the 200 people that are not Young prefer Blue. When at its default settings, Q performs the test comparing the 50% of the 100 Young with the 35% of the 200 people that are not Young.
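The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes a pooled two-proportion z-test, which is a common textbook choice, and the test statistic and any corrections that Q actually applies may differ. The function name `two_proportion_z` and the variable names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test (an assumed test, not necessarily Q's).

    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the example in the text.
young_blue, young_n = 50, 100
total_blue, total_n = 120, 300

# Subtract the Young from the total to get the people that are not Young.
rest_blue = total_blue - young_blue  # 70
rest_n = total_n - young_n           # 200

z, p = two_proportion_z(young_blue, young_n, rest_blue, rest_n)
print(f"Not Young: {rest_blue}/{rest_n} = {rest_blue / rest_n:.0%}")
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Under these assumptions, the Young (50%) versus the people that are not Young (35%) gives a z statistic of about 2.5 and a two-sided p-value of about 0.012.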

Note that although the test does not explicitly compare the 100 with the 300, at a conceptual level we can interpret it as if it did. This is because the only way that the Young can differ from the total is if they differ from the people that are not Young.
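This equivalence can be written out in a short derivation. The symbols are introduced here for illustration and do not appear in Q: $n_Y$ and $p_Y$ are the sample size and proportion for the Young, $n_R$ and $p_R$ are those for the people that are not Young, and $n_T = n_Y + n_R$ and $p_T$ are those for the total sample.

```latex
\begin{aligned}
p_T &= \frac{n_Y\,p_Y + n_R\,p_R}{n_T} \\[4pt]
p_Y - p_T &= \frac{n_T\,p_Y - n_Y\,p_Y - n_R\,p_R}{n_T}
           = \frac{n_R}{n_T}\,\bigl(p_Y - p_R\bigr) \\[4pt]
0.50 - 0.40 &= \frac{200}{300}\,\bigl(0.50 - 0.35\bigr) = 0.10
\end{aligned}
```

The difference between the Young and the total is always a fixed positive multiple of the difference between the Young and the people that are not Young, so one difference is zero exactly when the other is.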

More detailed worked examples of how Q performs such tests are here.

Dependent/overlapping/related samples tests

A number of statistical tests exist for testing samples where the groups overlap. They are variously known as dependent, overlapping and related samples tests. These tests have not been developed for the problem described above. Rather, they have been developed for the following two problems:

  • Where all the columns in a table are permitted to overlap. For example, if the columns represent websites a person has visited in the last month, a person may have visited multiple websites, meaning that they appear in multiple columns. Note that with such data Q does not, by default, use dependent/overlapping/related samples tests, and instead performs the test amongst respondents that are not overlapping (see Statistical Assumptions#Overlaps).
  • Where the table is constructed from repeated measurements (e.g., a Number - Grid or Pick Any - Grid question).

In theory, it is possible to apply a dependent/overlapping/related samples test to the data in the example above. However, if the test is conducted in a valid way, it will give the same result as the strategy described above, because the underlying maths of the test would simply remove the effect of the Young being in the total sample.
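A minimal sketch of this equivalence, again assuming a pooled z-test rather than any specific test implemented in Q: if the Young are compared directly with the total, but the standard error accounts for the Young being nested inside the total, the statistic matches the one from comparing the Young with the people that are not Young. Both function names are hypothetical.

```python
import math


def z_part_vs_complement(x_part, n_part, x_total, n_total):
    """Compare the sub-group with everyone else (independent samples)."""
    x_rest, n_rest = x_total - x_part, n_total - n_part
    pooled = x_total / n_total
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_part + 1 / n_rest))
    return (x_part / n_part - x_rest / n_rest) / se


def z_part_vs_whole(x_part, n_part, x_total, n_total):
    """Compare the sub-group with the total, allowing for the overlap.

    Because the sub-group is contained in the total, the variance of the
    difference in proportions is p(1 - p)(1/n_part - 1/n_total) rather than
    the independent-samples value p(1 - p)(1/n_part + 1/n_total).
    """
    pooled = x_total / n_total
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_part - 1 / n_total))
    return (x_part / n_part - pooled) / se


# Counts from the example: 50 of 100 Young, 120 of 300 in total.
z_complement = z_part_vs_complement(50, 100, 120, 300)
z_whole = z_part_vs_whole(50, 100, 120, 300)
print(f"Young vs not Young: z = {z_complement:.2f}")
print(f"Young vs total (overlap-adjusted): z = {z_whole:.2f}")
```

Both calls give a z statistic of about 2.5 for the example data, which is the sense in which the overlap adjustment simply removes the effect of the Young being in the total sample.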

Column comparisons

If performing Column comparisons, you can have Q explicitly test against the NET. See How to Include the Main NET Column in Column Comparisons.

See also

How To Test Against The NET/Total/Average