Base

Definition

The base is defined as the denominator used in computing statistics such as the Average, %, Column %, Row %, Total % and % Share. For example, when computing a Column % in Q, the formula is:

Column % = Population / Column Population

and the base is thus the Column Population. Where the data is unweighted, this is equivalent to Column n.
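The formula above can be illustrated with a minimal Python sketch. The case data, the field names (`row`, `col`, `weight`) and the function `column_percent` are hypothetical and are not part of Q. The sketch assumes that a population is the sum of case weights, and it multiplies the ratio by 100 to express it as a percentage.

```python
# Hypothetical cases: each has a row category, a column category and a weight.
cases = [
    {"row": "Coke", "col": "Male", "weight": 1.5},
    {"row": "Pepsi", "col": "Male", "weight": 0.5},
    {"row": "Coke", "col": "Female", "weight": 1.0},
    {"row": "Coke", "col": "Female", "weight": 1.0},
]


def column_percent(cases, row, col, weighted=True):
    """Column % = Population / Column Population, expressed as a percentage."""

    def case_weight(case):
        # Assumption: an unweighted analysis treats every case as weight 1.
        return case["weight"] if weighted else 1.0

    # Numerator: population of the cell (cases in both the row and the column).
    population = sum(
        case_weight(c) for c in cases if c["row"] == row and c["col"] == col
    )
    # Denominator (the base): population of the whole column.
    column_population = sum(case_weight(c) for c in cases if c["col"] == col)
    return 100.0 * population / column_population


weighted_pct = column_percent(cases, "Coke", "Male")
unweighted_pct = column_percent(cases, "Coke", "Male", weighted=False)
```

With these made-up cases the weighted base for the Male column is 2.0 (giving 75%), while the unweighted base is the Column n of 2 (giving 50%).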

More detail about how the base is defined for various statistics can be found on the Statistics page.

How the base is computed in Q

The base is computed separately for every cell in a table.

Categorical data

When a categorical question is used in computing a statistic, a case is included in the base if:

Numeric data

When a numeric question is used in computing a statistic, a case is included in the base if:

Difference from other programs

Q's concept of the base is essentially the same as that used in all modern statistics programs. However, some traditional crosstabbing programs and algorithms define it differently: as the total number of cases (or sum of weights of the cases) that have data in at least one cell in the table. The two definitions give different answers in the following situations:

  1. Where there is multiple response data (Pick Any and Pick Any - Grid) and some cases have no selections (e.g., people who did not choose any option in a multiple response question), the percentages that Q shows will be systematically lower. This is usually evident from the NET showing a value other than 100%.
  2. Where different cells have different sample sizes (e.g., if randomization was used, so that people saw only a subset of answers), many results will differ, because traditional crosstabbing programs are not designed for such data and so produce incorrect results.
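The two situations in the list can be sketched in Python as follows. Everything here is illustrative rather than Q's actual implementation: the data, the coding (1 = selected, 0 = not selected, `None` = missing because the option was not shown) and both function names are hypothetical. The sketch assumes that a per-cell base counts the cases with non-missing data for that cell, and that "data in at least one cell" means the case selected at least one option.

```python
# Hypothetical Pick Any data for two options, A and B.
responses = [
    {"A": 1, "B": 0},
    {"A": 0, "B": 1},
    {"A": 0, "B": 0},     # saw both options, selected neither
    {"A": 1, "B": None},  # was not shown option B
    {"A": 1, "B": None},  # was not shown option B
]


def per_cell_percent(responses, option):
    """Base computed per cell: cases with non-missing data for this option."""
    valid = [r[option] for r in responses if r[option] is not None]
    return 100.0 * sum(valid) / len(valid)


def table_level_percent(responses, option):
    """Base computed once per table: cases that selected at least one option."""
    base = [r for r in responses if any(v == 1 for v in r.values())]
    selected = sum(1 for r in base if r[option] == 1)
    return 100.0 * selected / len(base)


for option in ("A", "B"):
    print(option, per_cell_percent(responses, option),
          table_level_percent(responses, option))
```

For option A the per-cell base is 5 cases (60%) while the table-level base is 4 cases (75%), because the case with no selections is dropped from the latter. For option B the per-cell base is only the 3 cases that saw it (about 33%), whereas the table-level base is still 4 cases (25%).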

Q can be made to replicate the results of other programs by rebasing tables, but caution is warranted before doing so, because the approach used in Q is in general more valid. The only situations in which the traditional crosstabbing programs' computations are preferable are those where the NaNs and/or Missing Data selections in the data are incorrect, and it is generally safer to correct these problems in either Q or the raw data file than to use approaches based on the assumption that the data file is incorrect.
