# Effective Sample Size Greater Than 100%

In most situations the *effective sample size* (Effective Base n) is smaller than the actual sample size (Base n). However, this is not always the case. There are two situations where the effective sample size as shown in Q can be greater than 100% of the actual sample size: when comparing groups where one group has been over-recruited, and when variances differ between strata.

## Comparing groups where a group has been over-recruited

Consider a simple example. Let us say a survey was designed to compare the attitudes of indigenous and non-indigenous Australians, who represent, respectively, 5% and 95% of the Australian population. Such a study would generally employ *non-proportional stratification*, over-recruiting the indigenous Australians. For example, the study may be designed so that the indigenous Australians represent 50% of the sample (500 of 1,000 respondents).

The reason for using such a non-proportional sample design is that we are more likely to find a significant difference when comparing a sample of 500 indigenous Australians with a sample of 500 non-indigenous Australians than when comparing a sample of 50 indigenous Australians with a sample of 950 non-indigenous Australians.

In such an analysis, the effective sample size will be greater than 100% of the actual sample size. Because of the non-proportional sampling, the sampling error of the comparison is smaller than it would have been if simple random sampling had been conducted (which would have yielded a sample of about 50 indigenous Australians). Note that this is the intuitively sensible result: it is consistent with the motivation for over-recruiting indigenous Australians in the sample.
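The size of this effect can be illustrated with a minimal sketch. It assumes that both groups have the same within-group variance and that a simple random sample would contain each group exactly in proportion to its population share. The function name `srs_equivalent_n` is hypothetical, and this is not the Taylor Series Linearization computation that Q performs; it only shows why the ratio exceeds 100%.

```python
def srs_equivalent_n(n_groups, pop_shares):
    """Size of the simple random sample that would give the same variance
    for the difference between two group means.

    Assumes equal within-group variances, so the common variance cancels.
    """
    # Variance of the difference under the actual design, up to a constant:
    # 1/n1 + 1/n2
    design_var = sum(1.0 / n for n in n_groups)
    # Under simple random sampling of size n the group sizes are n*p1 and n*p2,
    # so the same variance is (1/p1 + 1/p2) / n.
    srs_var_times_n = sum(1.0 / p for p in pop_shares)
    return srs_var_times_n / design_var


n_groups = (500, 500)        # over-recruited design from the example
pop_shares = (0.05, 0.95)    # population shares from the example

n_eff = srs_equivalent_n(n_groups, pop_shares)
ratio = n_eff / sum(n_groups)
print(f"SRS-equivalent sample size: {n_eff:.0f} ({ratio:.0%} of the actual sample)")
```

Under these assumptions, a simple random sample would need more than 5,000 respondents to compare the two groups as precisely as the 500 versus 500 design does.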

## Where *strata* have different variances

Where different *strata* of a sample have different variances for a statistic that is being estimated, it is *optimal* to over-recruit respondents in the groups with the higher variances (this is referred to as *Neyman allocation* in the statistics literature). Thus, where a sample over-recruits the groups with higher variances, this can lead to an effective sample size of more than 100% of the actual sample size.
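As a rough illustration, the sketch below allocates a sample across two strata in proportion to population share times standard deviation (Neyman allocation) and compares the variance of the estimated mean with that of a simple random sample of the same size. The stratum shares and standard deviations are made-up values, the stratum means are assumed to be equal, and finite population corrections are ignored; the function names are hypothetical.

```python
def neyman_allocation(n, shares, sds):
    """Split n across strata in proportion to share * standard deviation."""
    products = [w * s for w, s in zip(shares, sds)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_mean_variance(shares, sds, n_strata):
    """Variance of the stratified estimate of the mean (no finite population correction)."""
    return sum(w ** 2 * s ** 2 / n_h for w, s, n_h in zip(shares, sds, n_strata))


n = 1000
shares = [0.5, 0.5]   # assumed population shares of the two strata
sds = [1.0, 3.0]      # assumed within-stratum standard deviations

# With equal stratum means, the population variance is the share-weighted
# average of the within-stratum variances.
pop_variance = sum(w * s ** 2 for w, s in zip(shares, sds))

allocation = neyman_allocation(n, shares, sds)
design_var = stratified_mean_variance(shares, sds, allocation)
srs_var = pop_variance / n

# Effective sample size: the SRS size that would give the same variance.
n_eff = n * srs_var / design_var
print(allocation, n_eff)
```

With these values the high-variance stratum receives 750 of the 1,000 interviews and the effective sample size is 1,250, or 125% of the actual sample size.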

## Comparison to other programs

Other software designed for taking sampling designs into account will also produce effective sample sizes that exceed 100% of the actual sample size (e.g., IBM's *SPSS Complex Samples* and the `survey` package for *R*).

Many of the programs used within the market research industry for analyzing surveys, such as IBM's *Survey Reporter*, instead use Weight Calibration with Kish's Effective Sample Size Formula. Because Kish's formula depends only on the weights, it can never produce an effective sample size larger than the actual sample size. This formula was also used in earlier versions of Q and is still used in many analyses in Q (however, all crosstabs involving means and proportions in Q use *Taylor Series Linearization*). You can use Kish's Effective Sample Size Formula in Q by changing the Statistical Assumptions setting of **Weights and significance** to **Kish's approximation**.
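For comparison, Kish's formula is the squared sum of the weights divided by the sum of the squared weights. The sketch below applies it to the indigenous versus non-indigenous example, assuming each respondent is weighted by population share divided by sample share. The weights and the function name `kish_effective_n` are illustrative assumptions, not taken from Q or Survey Reporter.

```python
def kish_effective_n(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Assumed weights: population share / sample share for each group.
indigenous_weight = 0.05 / 0.5       # 5% of population, 50% of sample
non_indigenous_weight = 0.95 / 0.5   # 95% of population, 50% of sample
weights = [indigenous_weight] * 500 + [non_indigenous_weight] * 500

kish_n = kish_effective_n(weights)
print(f"Kish effective sample size: {kish_n:.1f} of {len(weights)}")
```

Under these assumed weights Kish's formula gives roughly 552, or about 55% of the actual sample size, even though the same design is more efficient than simple random sampling for comparing the two groups.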