Weights, Effective Sample Size and Design Effects
How weights are taken into account in Q
By default, Q assumes that any weight is a sampling weight designed to correct for representativeness issues in a sample (e.g., to correct for an over- or under-representation of women in the sample).
Q assumes that weights are proportional to the inverse of the probability of selection. For example, if one respondent has a weight of 2 and another has a weight of 1, this means that the person with a weight of 2 had only half the chance of being selected for the survey as the other. With the exception of unweighted sample size, all statistics computed by Q take into account the weight. The unweighted sample size is reported as n, Column n, Row n and Base n, depending upon whether it refers to a count in a cell, a row or column total, or the total sample size. The weighted sample size is referred to as Population, Column Population, Row Population and Base Population, depending upon the context.
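The relationship between selection probabilities, weights and the two kinds of sample size can be sketched as follows. This is an illustrative sketch only: the selection probabilities and variable names are hypothetical, and it is not a description of Q's internal code.

```python
# Hypothetical selection probabilities for four respondents (assumed values).
selection_prob = [0.002, 0.001, 0.002, 0.0005]

# A sampling weight is proportional to the inverse of the selection probability.
weights = [1.0 / p for p in selection_prob]

# Unweighted sample size: a simple count of respondents (what Q calls Base n).
unweighted_n = len(weights)

# Weighted sample size: the sum of the weights (what Q calls Base Population).
population = sum(weights)

# The second respondent had half the selection chance of the first,
# so their weight is twice as large.
weight_ratio = weights[1] / weights[0]
```

Here the four respondents stand in for a population of about 4,000 people, while the unweighted sample size remains 4.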
All statistical tests in Q are modified to take into account the weight in such a way that the average weight is not a determinant of the inference. That is, the same significance testing results are obtained whether the average weight is 1.0 or 1,000,000. For this reason, expansion weights can be used in Q, whereby the weights gross the sample up to the population of interest (which is why the weighted sample size is referred to as Population).
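The scale invariance described above can be illustrated with a weighted mean and a linearization-style standard error. The sketch below uses a common textbook with-replacement approximation for the variance of a ratio estimator; it is an assumption for illustration and not necessarily the exact formula Q applies. The data and the function name are hypothetical.

```python
import math

def weighted_mean_and_se(y, w):
    """Weighted mean and a textbook linearization-style standard error.

    Assumed approximation for illustration; not necessarily Q's formula.
    """
    n = len(y)
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized residuals of the ratio estimator sum(w * y) / sum(w).
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]
    variance = n / (n - 1) * sum(zi ** 2 for zi in z)
    return mean, math.sqrt(variance)

y = [1, 0, 1, 1, 0, 0, 1, 0]                    # hypothetical binary responses
w = [1.0, 2.0, 1.5, 0.5, 1.0, 2.0, 1.0, 1.0]    # small weights
w_expanded = [wi * 1_000_000 for wi in w]       # expansion-style weights

mean_small, se_small = weighted_mean_and_se(y, w)
mean_big, se_big = weighted_mean_and_se(y, w_expanded)
```

Both the estimate and its standard error are the same for `w` and `w_expanded`, because the common scale factor cancels in every term.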
All statistical testing in Q uses either Taylor series linearization or Weight Calibration. The method used for each test is given in the descriptions of the individual tests in Tests Of Statistical Significance and in the various pages on multivariate analyses.
Effective sample size
In most instances, weighting causes a decrease in the statistical significance of results. The effective sample size is a measure of the precision of the survey (e.g., even if you have a sample of 1,000 people, an effective sample size of 100 would indicate that the weighted sample is no more robust than a well-executed un-weighted simple random sample of 100 people).
Each cell shown on a table potentially has a different effective sample size. This is because the impact of weights can differ by statistic. For example, if a study over-recruited buyers of a particular brand, then the effective sample size of the buyers of the brand is likely to be very different to the effective sample size of buyers of other brands (because the weights will differ by brand).
Q uses two methods to compute effective sample size. Where Taylor series linearization (see Lumley, T. (2010). Complex Surveys: A Guide to Analysis Using R, Wiley) is used, the design effect is computed as the ratio of the sampling variance computed by the Taylor series linearization to the sampling variance computed under the assumption of a simple random sample with replacement, and the effective sample size is the sample size divided by this ratio. Where Taylor series linearization is not used, Kish's Effective Sample Size Formula is used.
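As a rough illustration of why effective sample size can differ between groups, the sketch below applies Kish's well-known approximation, (sum of weights)² / (sum of squared weights), to two hypothetical groups of respondents. The weights and group sizes are invented for the example.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical over-recruited brand: 60 buyers, all down-weighted equally.
brand_a = [0.5] * 60

# Hypothetical buyers of other brands: 40 respondents with varied weights.
other_brands = [0.5, 1.0, 1.5, 2.0] * 10

n_eff_a = kish_effective_n(brand_a)          # equal weights: no loss of precision
n_eff_other = kish_effective_n(other_brands) # varied weights: precision is lost
```

With equal weights the effective sample size equals the actual sample size (60), whereas the 40 respondents with varied weights have an effective sample size of about 33.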
Effective sample size is used and reported by Q in a variety of ways:
- Effective Base n can be viewed in the cells on most tables.
- Where a data file has been weighted, most tables in Q report the effective sample size in the bottom-right corner of the screen. The proportion shown in brackets indicates the size of the effective sample relative to the total sample size (excluding missing data). The effective sample size at the bottom of a table is computed as follows:
  - Effective Base n / Base n is computed for every cell in the table.
  - The median of these ratios is then computed.
  - The median is multiplied by the largest Base n on the table and is rounded.
- In some statistical tests the effective sample size is used to modify the weight (see Weight Calibration).
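The three steps used for the table-level effective sample size can be sketched as follows. The cell values and the function name are hypothetical, and Python's built-in `round` is used as a stand-in for whatever rounding rule Q applies.

```python
from statistics import median

def table_effective_n(cells):
    """cells: list of (effective_base_n, base_n) pairs, one per table cell.

    Returns the rounded table-level effective sample size and the median ratio.
    """
    # Step 1: ratio of Effective Base n to Base n for every cell.
    ratios = [eff / base for eff, base in cells if base > 0]
    # Step 2: median of the ratios.
    median_ratio = median(ratios)
    # Step 3: multiply by the largest Base n on the table and round.
    largest_base = max(base for _, base in cells)
    return round(median_ratio * largest_base), median_ratio

# Hypothetical table with four cells.
cells = [(450.0, 500), (180.0, 240), (200.0, 260), (830.0, 1000)]
effective_n, ratio = table_effective_n(cells)
```

For these cells the median ratio is roughly 0.80, so the table would report an effective sample size of 800 against a largest Base n of 1,000.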
Note that the design effect, discussed in the next section, also impacts upon the effective sample size.
Design effects
The design effect is computed as the actual sample size divided by the effective sample size. Thus, where the true sampling variance is twice that computed under the assumption of simple random sampling, the design effect is 2.0 (see SurveyAnalysis.org for a discussion of design effects).
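As a worked example of the definition above, take a hypothetical sample of 1,000 respondents whose true sampling variance is twice the simple random sampling variance. The symbols below are notation introduced for this illustration only.

```latex
\[
\text{deff} = \frac{n}{n_{\text{eff}}} = \frac{V_{\text{design}}}{V_{\text{SRS}}} = 2.0,
\qquad
n_{\text{eff}} = \frac{n}{\text{deff}} = \frac{1000}{2.0} = 500 .
\]
```

That is, the weighted sample of 1,000 is about as precise as a simple random sample of 500.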
Q automatically computes the design effect for weighted data (unless this option is changed). However, from Q4.10 onwards, Q also permits an additional design effect (Extra deff in Statistical Assumptions) to be taken into account in significance testing, to reflect aspects of the sample design that are not captured by the weight. The precise way it is taken into account is determined by the setting of Weights and Significance:
- Automatic. On tests of means and proportions on tables, the computation is as indicated below for Taylor Series Linearization. Otherwise, the Kish approximation is used.
- Taylor Series Linearization. The sampling variance is first estimated using Taylor series linearization, and this estimate is multiplied by the specified value of Extra deff.
- Kish approximation. The effective sample size is used in place of the sample size when computing the variance, where the effective sample size is computed using:
  effective sample size = $\dfrac{\left(\sum_{i=1}^{n} w_i\right)^2}{d \sum_{i=1}^{n} w_i^2}$, where:
  - $w_i$ is the un-calibrated weight of the $i$th of $n$ observations, and
  - $d$ is the supplied extra design effect.
- deff = 1. Statistical inference is conducted under the assumption that the weights are frequency weights (see [1]), where the frequency weights are the supplied weights divided by the supplied extra design effect.
- deff = Sample size / sum of weights. Statistical inference is conducted under the assumption that the weights are frequency weights (see [2]), where the frequency weights are the supplied weights normalized to have an average value of 1 and then divided by the supplied extra design effect.
- Unweighted sample size in tests. The un-weighted sample size divided by the supplied extra design effect is used in all statistical inference. Note that if a respondent has a missing or non-positive weight, they are excluded from the analysis and the sample size. When weight calibration is used, this assumption is equivalent to deff = Sample size / sum of weights.
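The non-Taylor settings above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Q's implementation: the function names are hypothetical, the extra design effect is assumed to divide Kish's effective sample size, and missing weights are represented as `None`.

```python
def kish_with_extra_deff(weights, extra_deff=1.0):
    """Kish effective sample size, assumed here to be divided by the extra deff."""
    total = sum(weights)
    return total * total / (extra_deff * sum(w * w for w in weights))

def frequency_weights(weights, extra_deff=1.0, normalize=False):
    """normalize=False mirrors 'deff = 1';
    normalize=True mirrors 'deff = Sample size / sum of weights'."""
    if normalize:
        mean_w = sum(weights) / len(weights)
        weights = [w / mean_w for w in weights]  # average weight becomes 1
    return [w / extra_deff for w in weights]

def unweighted_test_n(weights, extra_deff=1.0):
    """Unweighted sample size divided by the extra deff.

    Respondents with missing (None) or non-positive weights are excluded.
    """
    valid = [w for w in weights if w is not None and w > 0]
    return len(valid) / extra_deff

weights = [500.0, 1000.0, 500.0, 2000.0]   # hypothetical expansion weights
d = 1.25                                   # hypothetical extra design effect

n_eff = kish_with_extra_deff(weights, d)
fw_raw = frequency_weights(weights, d)                    # sums to 3200
fw_norm = frequency_weights(weights, d, normalize=True)   # sums to n / d = 3.2
n_test = unweighted_test_n([500.0, 1000.0, None, -1.0, 2000.0], d)
```

Note that the normalized frequency weights sum to the unweighted sample size divided by the extra design effect, which matches the sample size used by the Unweighted sample size in tests setting.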