Column Comparisons With Missing Repeated Measures Data
Repeated measures data is data where a respondent has provided two or more evaluations and there is a need to compare them (e.g., ratings of satisfaction at different points in time, ratings of the appeal of two different products).
Where there are serious missing data problems, it can be useful to override how Q performs its testing when using Column Comparisons.
Example 1: Lots of missing data
In this example, there is lots of missing data, and no respondents have seen both A and B. Consequently, as Q defaults to only performing testing with respondents that have complete data, no tests are performed (as indicated by the -). In this example, conducting an independent samples test is likely the correct approach (see the instructions below).
Example 2: Some missing data
The table below shows that a higher proportion of respondents said they would buy Product B (20%) than Product A (16%). However, the Column Comparisons tell the opposite story, with Product A shown to have a significantly higher Buy score than Product B. While this result seems to be a paradox, it is not an error, and is instead a consequence of a serious missing data problem.
The Column n at the bottom of the table shows that the sample size in Product A is 142, compared to 191 for B. When the data is filtered to include only respondents that have complete data, we get the table below. This is the table that Q has used in the "background" when performing the test of columns A and B. Note that with this table we can see that the first column's score is higher than that of the second column (16% versus 12%).
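The effect of this filtering can be sketched in a few lines of Python. The respondent records below are hypothetical and chosen only to reproduce the pattern (they are not the data behind the tables), and the helper names `buy_rate` and `complete_cases` are illustrative rather than anything provided by Q.

```python
# Hypothetical responses: 1 = Buy, 0 = Not buy, None = product not evaluated.
respondents = [
    {"A": 1, "B": 0},
    {"A": 1, "B": 1},
    {"A": 0, "B": 0},
    {"A": None, "B": 1},   # evaluated only Product B
    {"A": None, "B": 1},   # evaluated only Product B
    {"A": 0, "B": None},   # evaluated only Product A
]

def buy_rate(rows, product):
    """Share of Buy responses among rows with a non-missing answer."""
    answers = [r[product] for r in rows if r[product] is not None]
    return sum(answers) / len(answers) if answers else None

def complete_cases(rows):
    """Keep only respondents who evaluated every product."""
    return [r for r in rows if all(v is not None for v in r.values())]

# Numbers as displayed on the table: each column uses everyone who answered it.
displayed = {p: buy_rate(respondents, p) for p in ("A", "B")}

# Numbers used "in the background" for the test: complete data only.
complete = complete_cases(respondents)
tested = {p: buy_rate(complete, p) for p in ("A", "B")}

print("displayed:", displayed)
print("tested:   ", tested)
```

In this toy data, Product B has the higher displayed score, yet Product A has the higher score once only complete cases are kept, which is the same reversal seen in Example 2. If no respondent had evaluated both products (as in Example 1), `complete` would be empty and there would be nothing to test.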
Understanding the logic of Q
When faced with missing repeated measures data, some people prefer to conduct testing using the numbers shown on the tables. That is, in the case of Examples 1 and 2, they would have Q simply compare the numbers shown. There are a number of reasons why Q does not do this, and instead filters the data:
- Filtering the data is the orthodox solution in statistical testing.
- It increases the user's chance of detecting a problem.
- When there is no missing data, Q performs the standard repeated measures tests (see Related Samples Tests - Comparing Two Proportions and Related Samples Tests - Comparing Two Means). Consequently, it would be confusing if Q did something different when some missing data exists.
Filtering the data is the orthodox solution in statistical testing
With repeated measures data such as this, the orthodox statistical treatment of the data is to perform testing only using respondents that have no missing data. Examining the data from Example 2 gives some insight into why this is the orthodox approach.
Looking at the data for Product B in Example 2, 20% of 191 Buy, whereas in the second table 12% of the 140 Buy. From this, we can deduce that the 51 respondents with missing data were, on average, much more likely to have said they would buy. The following table compares the Buy and Not buy data for product B according to whether or not there is any missing data, and shows that 41% of those with some missing data said Buy.
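The deduction can be checked with simple arithmetic. The sketch below uses only the rounded percentages quoted above, so its result is approximate; the variable names are illustrative.

```python
# Product B figures quoted in the text (percentages are rounded).
n_all, p_all = 191, 0.20            # everyone who evaluated Product B
n_complete, p_complete = 140, 0.12  # respondents with complete data

# Respondents who evaluated Product B but have some missing data.
n_missing = n_all - n_complete

# Implied number of buyers in that group, and their Buy rate.
buyers_missing = p_all * n_all - p_complete * n_complete
p_missing = buyers_missing / n_missing

# Recombining the two groups (using the 41% reported from the actual data)
# recovers the 20% shown on the original table, up to rounding.
recombined = (n_missing * 0.41 + n_complete * p_complete) / n_all

print(f"n with missing data: {n_missing}")
print(f"implied Buy rate among them: {p_missing:.0%}")
print(f"recombined Buy rate: {recombined:.0%}")
```

With the rounded inputs the implied Buy rate comes out at roughly 42%, in line with the 41% obtained from the unrounded data.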
Thus, the 20% shown on the original table is a weighted average of two very different groups' data: the group with missing data who have given much higher Buy ratings, and the people who evaluated both products, who gave lower ratings for Product B. By default, Q ignores the data from those with missing data. There are a number of methodological justifications for this:
- The data from respondents that have seen both products may be more reliable than data from respondents that have only seen one. This is, of course, only a conjecture.
- There are no widely-recognized statistical tests that take into account the missing data. This is discussed in more detail below.
- Filtering the data and employing a repeated measures test provides more statistical power than using an independent samples test (independent samples tests are discussed below).
- Where there is missing data, a valid analysis requires assumptions to be made about the causes of the missing data. Some possible assumptions and their relevant implications are:
  - Data that is missing can be considered to be missing completely at random. If this assumption is correct, then it is safe to ignore the respondents with missing data (although statistical power may be reduced), as is done by default by Q.
  - Respondents with missing data are intrinsically different (i.e., the missing data is not ignorable). In surveys, this is often the case (e.g., respondents may have missing data because they are less experienced in a category). If this is the case, a valid analysis would need to involve estimating how the 51 respondents with some missing data would have evaluated each of Product A and Product B. That is, if the missing data is not ignorable, it is invalid to compare the original numbers in the first table (20% versus 16%), because the 16% is a biased estimate due to not taking into account the missing data.
It increases the user's chance of detecting a problem
A criticism of the way that Q performs the testing is that "effectively the numbers show one thing but the significance testing shows another". This is an accurate description of how Q works, but the testing in Q was deliberately designed to achieve this outcome. The initial table in Example 2 shows the results of 20% and 16% because these are the numbers in the data. However, the test that is shown is Q's attempt to produce the best test possible given the data. Were Q to instead provide a test consistent with the numbers shown, the test would, more often than not, be invalid. Further, there would be no way for anybody reading the table to identify the problem, as the numbers would appear to make sense. By contrast, the paradoxical result allows the user to identify that something is wrong and investigate the problem further.
A second cue to the existence of the missing data problem is that Q shows the range of base sizes at the bottom-right of the screen (i.e., base n = from 142 to 191).
Alternative approaches to statistical testing with missing data
Independent samples tests
An alternative approach, which is generally a better approach in the case of large amounts of missing data (e.g., Example 1), is to get Q to ignore the repeated measures nature of the data and directly compare the percentages, assuming they come from independent samples. There are a variety of ways of doing this.
Note that applying independent samples tests involves assuming that the data is missing completely at random (this is discussed in more detail in the next section). It also involves ignoring the dependence in the data, and thus reduces statistical power.
The most straightforward approach to conducting independent samples tests in Q is to:
- Select the table or tables in which you wish to modify the statistical testing assumptions.
- Select Edit > Table Options > Statistical Assumptions.
- Change the Proportions test to Survey Reporter Proportions or, if testing means, Survey Reporter Means.
- Change Overlaps to Independent.
- Press OK.
An alternative approach is to use the Rule called Significance Testing - Independent Samples Column Means and Proportions Tests. This approach has more flexibility, but it is more complex to use, so introduces a greater risk of users making mistakes.
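To illustrate what treating the two columns as independent samples means, the sketch below applies a textbook pooled two-proportion z-test to the figures displayed in Example 2 (16% of 142 versus 20% of 191). This is an assumption made for illustration: it is not necessarily the formula that Q uses for Survey Reporter Proportions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sided z-test for two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Displayed figures from Example 2 (rounded percentages).
z, p_value = two_proportion_z_test(0.16, 142, 0.20, 191)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Because the test uses every respondent who answered each column, its direction agrees with the displayed percentages, but it ignores the fact that most respondents appear in both columns.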
Dependent samples tests
Dependent samples tests are statistical tests explicitly designed for the problem in Example 2. Two notes of caution about dependent samples tests:
- They are not recognized in the statistical literature. That is, while a number of market research software programs provide these tests, there is no body of published work supporting their validity.
- The tests assume that the data is missing completely at random and, as this assumption is often not appropriate in survey research, these tests should generally not be applied without first checking this assumption. A way to check the assumption is to see if the responses of people with missing data are systematically different to those without, as done above (and, in the case of Example 2, applying a dependent test is not appropriate).
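One way to carry out such a check is a simple test of association between having missing data and the response given. The sketch below runs a Pearson chi-square test (without continuity correction) on Product B counts reconstructed from the rounded percentages in Example 2; the counts are therefore approximations, and the function is an illustrative assumption rather than a feature of Q.

```python
from math import erfc, sqrt

# Approximate Product B counts rebuilt from the rounded percentages in the
# text (41% of 51 and 12% of 140); these are assumptions, not the raw data.
observed = {
    "some missing data": {"Buy": 21, "Not buy": 30},
    "complete data": {"Buy": 17, "Not buy": 123},
}

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table."""
    rows = list(table)
    cols = list(table[rows[0]])
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_totals.values())
    statistic = 0.0
    for r in rows:
        for c in cols:
            expected = row_totals[r] * col_totals[c] / total
            statistic += (table[r][c] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.1f}, p = {p_value:.2g}")
```

A small p-value here indicates that respondents with missing data answered differently from those without, which is evidence against the missing completely at random assumption.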
Dependent samples tests are run in Q as follows: