Statistics

Related Online Training modules
Question Types and Statistics
Crosstabs
Grids
Generally it is best to access online training from within Q by selecting Help > Online Training

Q has an expert system which automatically selects the most appropriate statistics to show on a table. Most commonly, these are percentages and averages.

Alternative statistics can be shown within each cell, to the right of the table or below the table. These statistics can be selected by right-clicking on the table and selecting from within Statistics – Cells and, on some tables, also from Statistics – Right and Statistics – Below.

Which statistics are available depends upon the type of questions selected. Hold down the Ctrl key while clicking to prevent the menu from closing, in order to toggle more than one statistic.

Statistics can be renamed either for the entire project or for specific tables within a project (see Output Text).

%

The weighted proportion of respondents to give a particular response. This is computed as Population/Base Population.

Note that some other programs use a different computation, whereby the percentage is only computed for respondents that have provided data. Thus in Q it is possible to have a NET which is less than 100% (which indicates some people have not selected anything), while in the other programs the NET, which is often labeled as Total, always shows 100%.
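As a rough illustration of the Population/Base Population computation, the following Python sketch computes a weighted percentage from raw responses. The function name and the list-based data layout are hypothetical and are not part of Q. Respondents who did not select the category stay in the denominator, which is why a NET can fall below 100%.

```python
def weighted_percent(selected, weights):
    """Return 100 * Population / Base Population.

    selected: one bool per respondent (True if they gave the response).
    weights:  one non-negative weight per respondent.
    """
    base_population = sum(weights)  # weighted size of the whole base
    population = sum(w for s, w in zip(selected, weights) if s)
    return 100.0 * population / base_population


# Four respondents; the last has a weight of zero and so has no influence.
print(weighted_percent([True, False, True, False], [1.0, 2.0, 1.0, 0.0]))  # 50.0
```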

% Column Responses

The proportion of the total number of responses (weighted) in the column represented by the cell. Refer to % Responses for more information. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%, and the value of the statistic will not correspond to the value obtained by summing the values of n in each column.

% Column Share

The proportion of the total Sum for the column of the table represented by the cell. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%.

% Excluding NaN

Same as % except that categories with a NaN Value have been excluded from the denominator when computing the percentage.

% Responses

The proportion of the total number of responses (weighted) represented by the cell. Only applies to Pick Any and Pick Any - Compact questions. This is computed as the percentage shown in the cell divided by the sum of the percentages in the table. Note that where a category in a table has been created by merging other categories, these are de-duplicated prior to performing the calculation. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%, and the value of the statistic will not correspond to the value obtained by summing the values of n in the table.
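The computation described above (the percentage in the cell divided by the sum of the percentages in the table) can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name: it assumes the list contains only the individual categories (no NET) and does not attempt the de-duplication of merged categories.

```python
def percent_responses(percentages):
    """Rescale category percentages so they express shares of all responses."""
    total = sum(percentages)
    return [100.0 * p / total for p in percentages]


# Three Pick Any categories chosen by 60%, 30% and 30% of respondents.
print(percent_responses([60.0, 30.0, 30.0]))  # [50.0, 25.0, 25.0]
```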

% Row Responses

The proportion of the total number of responses (weighted) in the row represented by the cell. Refer to % Responses for more information. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%, and the value of the statistic will not correspond to the value obtained by summing the values of n in each row.

% Row Share

The proportion of the total Sum for the row of the table represented by the cell. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%.

% Share

The proportion of the total Sum for the table represented by the Sum cell in a margin of the table (obtained via Statistics – Right or Statistics – Below). If you delete some categories in a table, but leave these categories in the NET, the share shown in the NET may exceed 100%. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%.

% Total Responses

The proportion of the total number of responses (weighted) represented by the cell. Refer to % Responses for more information. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%, and the value of the statistic will not correspond to the value obtained by summing the values of n in the table.

% Total Share

The proportion of the total Sum for the table represented by the cell. Where the data contains missing values and/or hidden categories the percentages may not add up to 100%.

5th Percentile

The value which 5% of non-missing observations are equal to or below. This is referred to as [math]\displaystyle{ \hat{Q}_1 (p) }[/math] in Hyndman and Fan (1996). In the edge-case where exactly 5% of values are less than a specific value ([math]\displaystyle{ x_i }[/math]), the 5th percentile is computed as [math]\displaystyle{ 0.95 \times x_i + 0.05 \times x_{i+1} }[/math], where [math]\displaystyle{ x_{i+1} }[/math] is the next highest value; this method is sometimes known as the HAVERAGE method and is referred to as [math]\displaystyle{ \hat{Q}_6 (p) }[/math] in Hyndman and Fan (1996).

25th Percentile

The value which 25% of non-missing observations are equal to or below. This is referred to as [math]\displaystyle{ \hat{Q}_1 (p) }[/math] in Hyndman and Fan (1996). In the edge-case where exactly 25% of values are less than a specific value ([math]\displaystyle{ x_i }[/math]), the 25th percentile is computed as [math]\displaystyle{ 0.75 \times x_i + 0.25 \times x_{i+1} }[/math], where [math]\displaystyle{ x_{i+1} }[/math] is the next highest value; this method is sometimes known as the HAVERAGE method and is referred to as [math]\displaystyle{ \hat{Q}_6 (p) }[/math] in Hyndman and Fan (1996).

75th Percentile

The value which 75% of non-missing observations are equal to or below. This is referred to as [math]\displaystyle{ \hat{Q}_1 (p) }[/math] in Hyndman and Fan (1996). In the edge-case where exactly 75% of values are less than a specific value ([math]\displaystyle{ x_i }[/math]), the 75th percentile is computed as [math]\displaystyle{ 0.25 \times x_i + 0.75 \times x_{i+1} }[/math], where [math]\displaystyle{ x_{i+1} }[/math] is the next highest value; this method is sometimes known as the HAVERAGE method and is referred to as [math]\displaystyle{ \hat{Q}_6 (p) }[/math] in Hyndman and Fan (1996).

95th Percentile

The value which 95% of non-missing observations are equal to or below. This is referred to as [math]\displaystyle{ \hat{Q}_1 (p) }[/math] in Hyndman and Fan (1996). In the edge-case where exactly 95% of values are less than a specific value ([math]\displaystyle{ x_i }[/math]), the 95th percentile is computed as [math]\displaystyle{ 0.05 \times x_i + 0.95 \times x_{i+1} }[/math], where [math]\displaystyle{ x_{i+1} }[/math] is the next highest value; this method is sometimes known as the HAVERAGE method and is referred to as [math]\displaystyle{ \hat{Q}_6 (p) }[/math] in Hyndman and Fan (1996).
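The percentile rule used for the 5th, 25th, 75th and 95th percentiles (and the Median) can be sketched in Python as below. This is one reading of the description, not Q's actual implementation: it is unweighted, the function name is hypothetical, and the edge case is assumed to arise when the share of observations at or below a value equals the target proportion exactly. Exact fractions are used so that the equality test is not affected by floating-point rounding.

```python
from fractions import Fraction


def percentile(values, p):
    """Percentile for a proportion p (a Fraction strictly between 0 and 1).

    Returns the smallest value with at least p of the non-missing
    observations at or below it. When exactly p of the observations are at
    or below a value x_i, returns (1 - p) * x_i + p * x_{i+1} instead.
    """
    xs = sorted(v for v in values if v is not None)  # drop missing values
    if not xs:
        return None
    n = len(xs)
    distinct = sorted(set(xs))
    for k, x in enumerate(distinct):
        share = Fraction(sum(1 for v in xs if v <= x), n)
        if share > p:
            return x
        if share == p and k + 1 < len(distinct):
            # Edge case: interpolate towards the next highest value.
            return float((1 - p) * x + p * distinct[k + 1])
    return distinct[-1]


# 51% of respondents have 0 and 49% have 1, so the median is 0.
print(percentile([0] * 51 + [1] * 49, Fraction(1, 2)))  # 0
# Exactly half of the values are at or below 2, so 2 and 3 are averaged.
print(percentile([1, 2, 3, 4], Fraction(1, 2)))  # 2.5
```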

Average

The weighted mean, which is defined as Sum / Population.
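The relationship between Sum, Population and Average can be illustrated with a short Python sketch. The function name and data layout are hypothetical, and it assumes that respondents with missing values contribute to neither Sum nor Population.

```python
def sum_population_average(values, weights):
    """Return (Sum, Population, Average) for weighted numeric data."""
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None]
    weighted_sum = sum(w * x for x, w in pairs)  # Sum
    population = sum(w for _, w in pairs)        # Population
    return weighted_sum, population, weighted_sum / population


# The third respondent has missing data and is ignored despite the large weight.
print(sum_population_average([10, 20, None, 40], [1.0, 2.0, 5.0, 1.0]))
# (90.0, 4.0, 22.5)
```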

Base n

The total unweighted sample size that is used to construct the cell in the table. Keep in mind, Base n is not affected by weights, except if the weight for some respondents is zero -- essentially removing them from the analysis.

Base Population

The estimated number of people in the total population from which the data is derived. It is the weighted total sample size (i.e., the weighted equivalent of Base n). Unless different sample sizes apply to different cells in the table, such as with a split cell experimental design (randomized experiment), the Base Population will be the same for all cells in the table. Where a Date question is used, Base Population only relates to the individual time period.

Coefficient

The coefficient in an Experiment or Ranking question.

Column %

The weighted proportion of respondents in a column to give a particular response. Note that some other programs use a different computation, whereby the percentage is only computed for respondents that have provided data. Thus in Q it is possible to have a NET which is less than 100% (which indicates some people have not selected anything), while in the other programs the NET, which is often labeled as Total, always shows 100%.

Column Comparisons

Statistical tests comparing columns (within rows).

When used in Statistics - Below this will return the results of the tests conducted on the Statistics - Below > Average statistic.

Column n

The number of observations containing data in the column. For example, if the column represents males, then the Column n is the number of males in the sample that do not have missing data on the row variables. Keep in mind, Column n is not affected by weights, except if the weight for a particular sub-group is zero -- essentially removing them from the analysis. Also note, this definition means that if you change a Pick Any question to a Number - Multi question, this will cause the Column n to change (and become equivalent to the Base n).

Column Names

The names of columns used in Column Comparisons. See Statistical Assumptions for more information.

Column Population

The weighted sample size of the column.

Columns Compared

The names of the columns that were compared against in Column Comparisons.

Column Standard Error

The standard error of the Column % (expressed as a proportion).

Column Standard Error of Mean

The standard error of a mean.

Corrected p

The p-value after correction for multiple comparisons. It is computed where the Multiple comparison correction setting for Cell comparisons is set to False Discovery Rate. Otherwise, Corrected p = p.

Correlation

A value of 0 indicates no statistical association. A positive number indicates a positive association – as one goes up the other, on average, also goes up, and vice versa for negative values. The further a correlation from 0, the stronger the relationship. Correlations range from -1 to 1. The correlation is available when a Number or Number - Multi question is selected in each of the blue and brown drop-down menus. The specific type of correlation is specified in the Correlations setting in Statistical Assumptions.

Cumulative %

The cumulative percentage, starting with the top of the table.

d.f.

Degrees of freedom. Only computed when using parametric statistical tests that compute a t-statistic.

Effective n

Equivalent to n, except adjusted to reflect the effective sample size. See Effective sample size.

Effective Base n

Equivalent to Base n, except adjusted to reflect the effective sample size. See Effective sample size.
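The text above does not give a formula for the effective sample size. One widely used definition is Kish's approximation, which divides the squared sum of the weights by the sum of the squared weights. The Python sketch below shows that approximation as an assumption about what "effective sample size" means here. It is not necessarily the exact computation Q performs, and the function name is hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


# Equal weights leave the sample size unchanged; unequal weights reduce it.
print(kish_effective_sample_size([1.0, 1.0, 1.0, 1.0]))  # 4.0
print(kish_effective_sample_size([1.0, 1.0, 2.0, 2.0]))  # 3.6
```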

Expected Average

The expected value of the statistic under the assumption of independence. The statistic that Expected refers to is sometimes a percentage, sometimes an expected average and sometimes an expected value of Population. This depends on the table. See Effective sample size.

Expected %

The expected value of the statistic under the assumption of independence. The statistic that Expected refers to is sometimes a percentage, sometimes an expected average and sometimes an expected value of Population. This depends on the table. See also Summaries of Grid Questions (Number - Grid and Pick Any - Grid).

Expected Correlation

The expected value of the statistic under the assumption of independence. The statistic that Expected refers to is sometimes a percentage, sometimes an expected average and sometimes an expected value of Population. This depends on the table.

Expected n

The expected value of the statistic under the assumption of independence. The statistic that Expected refers to is sometimes a percentage, sometimes an expected average and sometimes an expected value of Population. This depends on the table.
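For a simple crosstab of counts, the expected value under independence is conventionally the row total multiplied by the column total, divided by the grand total, and the Residual is the observed value minus this expected value. The Python sketch below illustrates that conventional definition with hypothetical function names. The text does not state that Q uses exactly this computation for every table type.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def residuals(table):
    """Observed minus expected, cell by cell."""
    expected = expected_counts(table)
    return [[o - e for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(table, expected)]


observed = [[30, 10],
            [20, 40]]
print(expected_counts(observed))  # [[20.0, 20.0], [30.0, 30.0]]
print(residuals(observed))        # [[10.0, -10.0], [-10.0, 10.0]]
```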

Index

The ratio of the row percentage (or column percentage) to the total for the column (or row) multiplied by 100. For example, if Pepsi has an Index of 150 for Males, this would mean that males are 50% more likely to purchase Pepsi than the average for the population.
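The Pepsi example can be reproduced with a one-line computation. In this Python sketch the figures (30% of males versus 20% of the whole sample) are invented for illustration and the function name is hypothetical.

```python
def index(segment_percent, total_percent):
    """Index = 100 * (percentage within the segment) / (percentage overall)."""
    return 100.0 * segment_percent / total_percent


# 30% of males purchase Pepsi compared with 20% of the whole sample.
print(index(30.0, 20.0))  # 150.0
```

An Index of 100 means the segment matches the overall figure, and values below 100 mean the segment is less likely than average to give the response.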

Interquartile Range

The difference between the 75th and 25th percentiles.

Labels

The labels assigned to specific values of categorical variables. For example, the label “Male” may be associated with the value “1”. Only available when RAW DATA is selected in the brown drop-down.

Lower Confidence Interval

The lower-bound of the Confidence Interval.

Lower Confidence Interval %

The lower-bound of the Confidence Interval.

Maximum

The highest value given by any respondent.

Median

The value which 50% of non-missing observations are equal to or below. For example, if 51% of respondents have a value of 0 and 49% have a value of 1, then the median will be reported as 0. This definition is different to that employed in some other statistics and market research programs. This is referred to as [math]\displaystyle{ \hat{Q}_1 (p) }[/math] in Hyndman and Fan (1996). In the edge-case where exactly 50% of values are less than a specific value ([math]\displaystyle{ x_i }[/math]), the 50th percentile is computed as [math]\displaystyle{ 0.5 \times x_i + 0.5 \times x_{i+1} }[/math], where [math]\displaystyle{ x_{i+1} }[/math] is the next highest value; this method is sometimes known as the HAVERAGE method and is referred to as [math]\displaystyle{ \hat{Q}_6 (p) }[/math] in Hyndman and Fan (1996).

Minimum

The lowest value given by any respondent.

Missing n

The number of observations with missing data.

Mode

The most commonly occurring non-missing value. Where there is a tie, the lowest value is selected.
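The tie-breaking rule can be sketched in Python as follows. This is an unweighted illustration with a hypothetical function name. Whether Q counts weighted or unweighted occurrences is not stated above.

```python
from collections import Counter


def mode_lowest_on_tie(values):
    """Most frequent non-missing value; ties resolve to the lowest value."""
    counts = Counter(v for v in values if v is not None)
    highest_count = max(counts.values())
    return min(v for v, c in counts.items() if c == highest_count)


# 1 and 3 both occur twice, so the lower of the two is reported.
print(mode_lowest_on_tie([3, 1, 3, 1, 2, None]))  # 1
```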

Multiple Comparison Adjustment

Corrected p / p.

n

The number of people in the data file in the cell in the table. This is commonly referred to as a count. When the table involves numeric data, n is the sample size used to compute the average or correlation. Keep in mind, n is not affected by weights, except if the weight for a particular sub-group is zero -- essentially removing them from the analysis.

n Observations

The number of observations in repeated measures data; used to compute the statistics in the cell. Used in Experiment and Ranking questions.

Not duplicate

A 1 indicates that the cell is not “overlapping” with another cell and a 0 indicates that it overlaps. For example, if a table contains “18 to 24”, “25 to 34” and “18 to 34” then the “18 to 34” cells will have a value of 0. This is used in the computation of multiple comparison adjustments (i.e., if a cell overlaps other cells, it is not used when computing cutoff values).

Observation

Observation was introduced in Q 4.6

The observation number (in terms of the rows in the data file) that gave the response in the cell.

p

The p-value for the cell; it does not take into account any Multiple Comparison Corrections that have been specified. Refer to p-Values for a definition.

Population

The estimated number of people in the population (as determined from the weight) in the cell in the table. When the table is showing categorical data (i.e., Pick One - Multi, Pick Any, Pick One and Pick Any - Grid), the Population is a weighted count (i.e., the weighted n). When the table involves numeric data, Population is the estimate of the number of people in the population to which the average or correlation applies.

Probability %

Ranking questions display a Probability %, which is an estimate of the proportion of the population that would choose an option first (i.e., with the highest ranking). This statistic is not computed by counting up the proportion to give an option the highest rank; rather, it is derived from Coefficient (via a logit transformation) and takes into account all the ranking information across the sample. In general, Probability % has a lower sampling error than a proportion computed by simply counting the respondents who choose a particular option, and can thus be a more reliable statistic for assessing preferences.

See MaxDiff Case Study for an example discussing interpretation of this statistic.

See Allison, P. D. and N. A. Christakis (1994). "Logit Models for Sets of Ranked Items." Sociological Methodology 24: 199-228.

Range

Maximum - Minimum.

Residual

The difference between the observed and expected value.

Residual %

The difference between the observed and expected %.

Row %

The weighted proportion of respondents in a row to give a particular response. Note that some other programs use a different computation, whereby the percentage is only computed for respondents that have provided data. Thus in Q it is possible to have a NET which is less than 100% (which indicates some people have not selected anything), while in the other programs the NET, which is often labeled as Total, always shows 100%.

Row Population

The weighted sample size of the row.

Row n

The number of observations containing data in the row. Respondents that have either not selected any of the options in the columns of the brown drop-down question, or, have missing data in one or more of the cells of the row, are not counted. Keep in mind, Row n is not affected by weights, except if the weight for a particular sub-group is zero -- essentially removing them from the analysis.

Standard Deviation

A measure of dispersion (i.e., how widely values are spread around the mean). The population standard deviation is obtained by having Bessel's correction off for Means in Statistical Assumptions, and the sample standard deviation is obtained by having it on (this is the default). See also the Standard deviation section of Results Are Different to those from Another Program.

Where the sample size is 1, a NaN is returned. See also Standard deviations with weighted data.
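The effect of Bessel's correction on unweighted data can be sketched in Python as below. The function name is hypothetical and weighted data is deliberately not covered. Returning NaN for an empty sample is an assumption; the text above only specifies NaN for a sample size of 1.

```python
import math


def standard_deviation(values, bessel=True):
    """Unweighted standard deviation of the non-missing values.

    bessel=True  divides by n - 1 (sample standard deviation, the default).
    bessel=False divides by n     (population standard deviation).
    """
    xs = [x for x in values if x is not None]
    n = len(xs)
    if n <= 1:
        return math.nan  # NaN for n == 1 as described; n == 0 is an assumption
    mean = sum(xs) / n
    sum_of_squares = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(sum_of_squares / (n - 1 if bessel else n))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data, bessel=False))  # 2.0
print(standard_deviation(data, bessel=True))   # a little larger than 2.0
```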

Standard Error

The estimated standard deviation of the sampling distribution of the primary statistic on the table. Where the data is numeric, this is computed for the Average; where it is an Experiment, it is computed for the Coefficient; where it is a Ranking, it is computed for the Probability %/100; and where it is a proportion, it is computed for whichever exists of %/100 or Total %/100, except with Pick One - Multi questions, in which case the proportion is whichever is applicable of Row % and Column %. Note that percentages in Q are multiplied by 100 (i.e., they do not appear as proportions), whereas the standard errors are of the proportions (i.e., they need to be multiplied by 100 if comparing with the percentages).

Where the data is not weighted, or Weights and significance is set to Kish correction or Set to a specified value, the standard error of the proportion [math]\displaystyle{ p }[/math] is computed as:

[math]\displaystyle{ \sigma_p=\sqrt{\frac{p(1-p)}{\sum^n_{i=1}w_i - b}} }[/math]

where [math]\displaystyle{ b=1 }[/math] if Bessel's correction is selected and 0 otherwise and where [math]\displaystyle{ w_i=1 }[/math] where no weight has been applied and [math]\displaystyle{ w_i }[/math] is the calibrated weight if Weights and significance is set to Kish correction or Set to a specified value (see Weights, Effective Sample Size and Design Effects). Where the Weights and significance has been set to Automatic or Taylor series linearization, then Taylor series linearization is used to compute the standard error, and the Bessel correction is always assumed to be 1 (which will give a different result to those used in most statistical texts, which instead have no Bessel correction).
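The formula for the standard error of a proportion can be transcribed into Python as a minimal sketch. The function name is hypothetical, the weights passed in are assumed to be already calibrated as described above (or all equal to 1 for unweighted data), and the Taylor series linearization case is not covered.

```python
import math


def standard_error_of_proportion(p, weights, bessel=True):
    """sqrt(p * (1 - p) / (sum of weights - b)), with b = 1 if bessel else 0."""
    b = 1 if bessel else 0
    return math.sqrt(p * (1.0 - p) / (sum(weights) - b))


# Unweighted sample of 101 respondents, half of whom gave the response.
se = standard_error_of_proportion(0.5, [1.0] * 101)
print(round(se, 4))        # standard error of the proportion
print(round(100 * se, 2))  # the same value on the percentage scale
```

Multiplying by 100 in the last line reflects the point made earlier in this section: the standard error is of the proportion, so it must be rescaled before being compared with percentages.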

Where the data is not weighted, or Weights and significance is set to Kish correction or Set to a specified value, the standard error of the mean is computed as:

[math]\displaystyle{ \sigma_{\bar x}=\sqrt{\frac{1}{(\sum^n_{i=1}w_i)(\sum^n_{i=1}w_i - b)}\sum^n_{i=1}w_i\left(x_i - \frac{1}{\sum^n_{i=1}w_i}\sum^n_{i=1}w_i x_i\right)^2} }[/math]

where [math]\displaystyle{ b=1 }[/math] if Bessel's correction is selected and 0 otherwise, [math]\displaystyle{ w_i=1 }[/math] where no weight has been applied, and [math]\displaystyle{ w_i }[/math] is the calibrated weight if Weights and significance is set to Kish correction or Set to a specified value (see Weights, Effective Sample Size and Design Effects). Where Weights and significance has been set to Automatic or Taylor series linearization, then Taylor series linearization is used to compute the standard error.
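The standard error of the mean can likewise be transcribed into Python. As before, this is a sketch with a hypothetical function name: the weights are assumed to be already calibrated (or all 1 for unweighted data), missing values are assumed to have been removed, and Taylor series linearization is not covered.

```python
import math


def standard_error_of_mean(values, weights, bessel=True):
    """Weighted standard error of the mean, following the formula above."""
    b = 1 if bessel else 0
    sum_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / sum_w
    weighted_squares = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(weighted_squares / (sum_w * (sum_w - b)))


# Unweighted check: this equals the sample standard deviation divided by sqrt(n).
print(standard_error_of_mean([1, 2, 3, 4, 5], [1.0] * 5))
```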

Refer to Train, Kenneth E. (2009), Discrete Choice Methods with Simulation, Second Edition, Cambridge, for details regarding the computation of standard errors in experiments, and to Allison, P. D. and N. A. Christakis (1994), "Logit Models for Sets of Ranked Items", Sociological Methodology 24: 199-228, for technical details regarding the parameters of Ranking questions.

Sum

The weighted sum of all the responses, defined as Average multiplied by Population.

Text

The raw data from text variables (e.g., verbatim responses from open-ended questions).

Text With No Blanks

The raw data from text variables (e.g., verbatim responses from open-ended questions), with the blank entries provided at the end (i.e., underneath the non-blank values).

Total %

The weighted proportion of respondents to give a particular response. This is computed as Population/Base Population.

Trimmed Average

The average computed after deleting the lowest 5% and highest 5% of respondents with non-missing values.
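A trimmed average of this kind can be sketched in Python as below. The sketch is unweighted, uses a hypothetical function name, and assumes that the number of respondents removed from each end is 5% of the non-missing sample rounded down. The text does not say how Q rounds or how weights enter the calculation.

```python
def trimmed_average(values, percent=5):
    """Mean after dropping the lowest and highest `percent`% of values."""
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    k = (n * percent) // 100  # number removed from each end (rounded down)
    kept = xs[k:n - k]
    return sum(kept) / len(kept)


# Twenty values: one is removed from each end, so the outlier 1000 is dropped.
data = list(range(1, 20)) + [1000]
print(trimmed_average(data))  # 10.5
```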

t-Statistic

Coefficient divided by the Standard Error, except for Ranking questions where the t-statistic is computed using the Probability % and the standard error for this computation is not shown. Only computed when using parametric statistical tests that compute a t-statistic.

Unique Text

Unique text responses (i.e., all duplicates removed).

Upper Confidence Interval

The upper-bound of the Confidence Interval. Only computed when using parametric statistical tests that compute a t-statistic or z-statistic.

Upper Confidence Interval %

The upper-bound of the Confidence Interval. Only computed when using parametric statistical tests that compute a t-statistic or z-statistic.

Values

The values of the variable(s) in a question (e.g., a “1” may represent men). Only available when RAW DATA is selected in the brown drop-down.

z-Statistic

The value from a unit normal distribution which corresponds to the two-tailed value of p. The further this value is from 0, the more statistically significant the difference between the observed data and the Expected data, where 1.96 and -1.96 each equate to p being 0.05. Larger numbers can be shown as infinity.
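The correspondence between a two-tailed p and the z-Statistic can be checked with Python's standard library. In this sketch the function name and the `observed_above_expected` flag are hypothetical. The sign is supplied by the caller because p alone does not indicate the direction of the difference.

```python
import math
from statistics import NormalDist


def z_from_two_tailed_p(p, observed_above_expected=True):
    """Unit-normal value whose two-tailed probability is p."""
    if p <= 0.0:
        magnitude = math.inf  # extremely small p-values are shown as infinity
    else:
        magnitude = NormalDist().inv_cdf(1.0 - p / 2.0)
    return magnitude if observed_above_expected else -magnitude


print(round(z_from_two_tailed_p(0.05), 2))                                 # 1.96
print(round(z_from_two_tailed_p(0.05, observed_above_expected=False), 2))  # -1.96
```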