Regression - Driver (Importance) Analysis - Shapley

This QScript computes Shapley Importance Scores, normalized so that their absolute values add up to 100%.

Technical details

Shapley importance determines what proportion of the R-squared from a linear regression model can be attributed to each independent variable. When there are more than 15 independent variables, Relative Importance Analysis values are returned instead, as the two methods yield highly similar results. In such cases, Relative Importance Analysis runs in a reasonable length of time, whereas Shapley could take anywhere from a few minutes to a few hours. Driver (Importance) Analysis contains a more technical discussion of this and of alternative ways of computing importance.

Dependent and independent variables may be any non-text variables from the same data file in the project. All input variables are treated as numeric under the linear regression model.
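To illustrate what attributing R-squared to each independent variable involves, the following is a minimal sketch for a small, unweighted example. It is not the code used by Q: the function names (solve, rSquared, shapleyImportance) and the choice to work from a correlation matrix are assumptions made for illustration only. Each variable's score is its average contribution to R-squared over all subsets of the other variables, so the number of regressions doubles with every variable added.

```javascript
// Illustrative sketch only; not Q's implementation. All names are hypothetical.

// Solve the linear system A x = b by Gaussian elimination with partial pivoting.
function solve(A, b) {
    const n = b.length;
    const M = A.map((row, i) => row.concat([b[i]])); // augmented copy
    for (let col = 0; col < n; col++) {
        let pivot = col;
        for (let r = col + 1; r < n; r++)
            if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) pivot = r;
        [M[col], M[pivot]] = [M[pivot], M[col]];
        for (let r = col + 1; r < n; r++) {
            const f = M[r][col] / M[col][col];
            for (let c = col; c <= n; c++) M[r][c] -= f * M[col][c];
        }
    }
    const x = new Array(n).fill(0);
    for (let r = n - 1; r >= 0; r--) {
        let s = M[r][n];
        for (let c = r + 1; c < n; c++) s -= M[r][c] * x[c];
        x[r] = s / M[r][r];
    }
    return x;
}

// R-squared from regressing the dependent variable on the predictors in
// `subset` (an array of indices), given the predictor correlation matrix Rxx
// and the predictor-to-dependent correlations rxy.
function rSquared(subset, Rxx, rxy) {
    if (subset.length === 0) return 0;
    const A = subset.map(i => subset.map(j => Rxx[i][j]));
    const b = subset.map(i => rxy[i]);
    const beta = solve(A, b); // standardized coefficients
    return beta.reduce((sum, bk, k) => sum + bk * b[k], 0);
}

// Raw Shapley scores: each predictor's weighted average gain in R-squared
// over every subset of the remaining predictors.
function shapleyImportance(Rxx, rxy) {
    const p = rxy.length;
    const fact = [1];
    for (let k = 1; k <= p; k++) fact.push(fact[k - 1] * k);
    const nSubsets = 1 << p; // 2^p subsets, encoded as bitmasks
    const r2 = new Array(nSubsets);
    const size = new Array(nSubsets);
    for (let mask = 0; mask < nSubsets; mask++) {
        const subset = [];
        for (let j = 0; j < p; j++) if (mask & (1 << j)) subset.push(j);
        size[mask] = subset.length;
        r2[mask] = rSquared(subset, Rxx, rxy);
    }
    const scores = new Array(p).fill(0);
    for (let j = 0; j < p; j++) {
        for (let mask = 0; mask < nSubsets; mask++) {
            if (mask & (1 << j)) continue; // only subsets that exclude j
            const weight = fact[size[mask]] * fact[p - size[mask] - 1] / fact[p];
            scores[j] += weight * (r2[mask | (1 << j)] - r2[mask]);
        }
    }
    return scores;
}

// Hypothetical example: two correlated predictors.
const exampleScores = shapleyImportance([[1, 0.5], [0.5, 1]], [0.6, 0.3]);
```

The raw scores returned by this sketch sum (up to rounding) to the R-squared of the model containing all of the independent variables, which is what allows them to be read as shares of R-squared.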

Sign

In Q 4.9.1 or later, positive and negative signs are applied to driver analysis scores to match the signs of the corresponding linear regression coefficients from the model that includes all of the independent variables. The scores therefore show the direction of each independent variable's influence as well as its magnitude. The magnitudes of the driver analysis scores (except for Elasticity) are normalized to sum to 100%. Statistical tests are conducted on the signed raw scores, so the values of test statistics, and therefore the test results, may differ from those of previous versions.
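The signing and normalization described above can be sketched as follows. This is an illustration under stated assumptions rather than Q's own code: the function name signedNormalizedScores and its inputs (an array of raw scores and an array of regression coefficients in the same order) are hypothetical.

```javascript
// Illustrative sketch only; the function name and inputs are hypothetical.
// rawScores:    raw importance scores, one per independent variable
// coefficients: linear regression coefficients from the model with all
//               independent variables, in the same order as rawScores
function signedNormalizedScores(rawScores, coefficients) {
    // Normalize on magnitudes so that the absolute values sum to 100.
    const total = rawScores.reduce((sum, v) => sum + Math.abs(v), 0);
    return rawScores.map((v, i) => {
        // Take the sign from the matching regression coefficient.
        const sign = coefficients[i] < 0 ? -1 : 1;
        return sign * 100 * Math.abs(v) / total;
    });
}

// Hypothetical example: the second variable has a negative coefficient.
const exampleSigned = signedNormalizedScores([0.3, 0.1], [0.5, -0.2]);
```

In this example the first score is positive and the second negative, while their absolute values still sum to 100.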

R-squared

The R-squared statistic is presented at the bottom of the output table. It is computed using a (weighted) linear regression with all of the independent variables.

Crosstabs

A crosstab with a Pick One, Pick Any or Date question may be created by selecting such a question in the brown drop-down. A separate driver analysis is conducted for each category of the question in the brown drop-down, and each category corresponds to a column in the crosstab. The significance in each crosstab cell is determined according to Testing the Complement of a Cell, i.e., a t-test assesses whether the raw score in the cell is statistically different from the raw score obtained from the complement of the category.

Charts

Certain types of charts (e.g. bar and column charts) may be created from the driver analysis table. As with tables, only certain statistics are valid for charts.

Inaccurate tooltips

The score and confidence bound are written over the top of existing statistics. However, the tooltips will relate to the original statistics, and so should be ignored.

Further reading: Key Driver Analysis Software

Versions

This QScript requires Q 4.8.7. For Q 4.8.8 or later, the dependent question, independent question, crosstab question, filters and weight may come from two data files; e.g., the dependent question, the first filter and the weight may be from data file 1, while the independent question, crosstab question and the second filter may be from data file 2. These two data files must have a one-to-one relationship specified. Note, however, that three or more data files are not supported, and the variables in the independent question must all be from the same data file.

For Q 4.9.1 or later, signs are applied to driver analysis scores to match the signs of the corresponding linear regression coefficients, as described under Sign above. The magnitudes of the scores (except for Elasticity) are still normalized to sum to 100%, but test results may differ from those of previous versions.

As of 1 April 2016, the ability to reuse previous input independent and dependent questions has been removed. In addition, the independent question may no longer be changed in the table. These features were removed following the discovery of a bug in which they led to incorrect data being used in calculations.

When running driver analysis, each independent question is constructed specifically for the dependent question that was selected, so that cases missing from the dependent question are also marked as missing in the independent question. Independent questions should therefore not be reused in an analysis with another dependent question; otherwise, cases missing from the original dependent question will also be missing in the new analysis. This problem is apparent in the table, as the Base n reported below the table will be smaller than the actual sample size (which is the number of respondents who have complete data for both dependent and independent variables).

Driver analysis tables created prior to 1 April 2016 may be affected if they were created with previous driver analysis questions, or if their independent questions have been replaced in the Blue dropdown. Affected tables can be replaced by:

  1. Taking note of the Dependent and Independent variables that have been used.
  2. Right-clicking the table in the Report and selecting Delete.
  3. Running the appropriate driver analysis option from the Online Library, selecting the same Dependent and Independent variables.
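When checking whether a table is affected, the expected sample size can be worked out as the number of respondents with complete data on the dependent and all independent variables. The sketch below shows one way to count this outside of Q. The function names and the convention that missing values are represented as null, undefined or NaN are assumptions made for illustration.

```javascript
// Illustrative sketch only; names and the missing-value convention are hypothetical.
function isMissing(value) {
    return value === null || value === undefined || Number.isNaN(value);
}

// dependent:    array of values, one per respondent
// independents: array of arrays, one per independent variable
function completeCaseCount(dependent, independents) {
    let n = 0;
    for (let i = 0; i < dependent.length; i++) {
        const complete = !isMissing(dependent[i]) &&
            independents.every(column => !isMissing(column[i]));
        if (complete) n++;
    }
    return n;
}

// Hypothetical data: 5 respondents, one dependent and two independent variables.
const exampleDependent = [7, 9, NaN, 4, 8];
const exampleIndependents = [
    [3, NaN, 5, 2, 4],
    [1, 2, 2, 5, 3],
];
const expectedBaseN = completeCaseCount(exampleDependent, exampleIndependents);
```

If the Base n reported below a driver analysis table is smaller than a count obtained this way, the table may be affected by the reuse problem described above.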

Statistics

Statistics - Right and Statistics - Below are not available for driver analysis tables. Only the following Statistics - Cells are valid for driver analysis tables:

  • Shapley Importance (normalized scores that sum to 100%)
  • t-Statistic
  • Expected Coefficient
  • Column Population
  • Upper Confidence Bound 0.975 (upper limit of the 95% confidence interval)
  • Lower Confidence Bound 0.025 (lower limit of the 95% confidence interval)
  • Raw Shapley Importance (raw scores)
  • p
  • Corrected p
  • Multiple Comparison Adjustment
  • z-Statistic
  • Not Duplicate
  • Standard Error

How to apply this QScript

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Click on the QScript when it appears in the QScripts and Rules section of the search results.

OR

  • Select Automate > Browse Online Library.
  • Select this QScript from the list.

Customizing the QScript

This QScript is written in JavaScript and can be customized by copying and modifying the JavaScript.

Customizing QScripts in Q 4.11 and more recent versions

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Hover your mouse over the QScript when it appears in the QScripts and Rules section of the search results.
  • Press Edit a Copy (bottom-left corner of the preview).
  • Modify the JavaScript (see QScripts for more detail on this).
  • Either:
    • Run the QScript, by pressing the blue triangle button.
    • Save the QScript and run it at a later time, using Automate > Run QScript (Macro) from File.

Customizing QScripts in older versions

  • Copy the JavaScript shown on this page.
  • Create a new text file, giving it a file extension of .QScript. See here for more information about how to do this.
  • Modify the JavaScript (see QScripts for more detail on this).
  • Run the file using Automate > Run QScript (Macro) from File.

JavaScript

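// Load the shared 'Driver Analysis Functions' script, which provides driverAnalysis().
includeWeb('Driver Analysis Functions');

if (Q.isOnTheWeb()) {
    // The script is running on the web (Displayr) rather than in Q.
    log('This function is not yet available in Displayr.');
} else if (!driverAnalysis('Shapley Importance')) {
    // driverAnalysis() returned a falsy value, which is reported as a cancellation.
    log('QScript was cancelled.');
}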

Prior to 4 August 2016, this page was known as Multivariate - Driver (Importance) Analysis - Shapley.
