Why Do Shapley and Kruskal Driver Analysis Have Negative Scores?
Shapley and Kruskal driver analyses in Q can produce negative importance scores. This differs from how driver analysis routines in other programs work. It is done deliberately, to avoid serious misinterpretations that can arise with the traditional computations.
How the signs are computed
As discussed in more detail at Driver (Importance) Analysis#Score signs:
- The actual importance scores are computed in the normal way, using the standard Shapley or Kruskal calculation.
- The signs (that is, whether the scores are positive or negative) are taken from a multiple linear regression.
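The two steps above can be sketched in Python. This is an illustration only, not Q's implementation: the function names (`r_squared`, `shapley_importance`, `signed_importance`) and the simulated data are hypothetical, the importance measure is assumed to be the Shapley decomposition of R-squared, and the scores are left unnormalized.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R-squared of an OLS fit of y on the chosen columns of X, plus an intercept."""
    if not cols:
        return 0.0  # an intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def shapley_importance(X, y):
    """Unsigned Shapley scores: each predictor's weighted average gain in R-squared."""
    p = X.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for subsets of this size that exclude predictor j
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                scores[j] += weight * gain
    return scores


def signed_importance(X, y):
    """Attach the sign of each multiple-regression coefficient to its score."""
    unsigned = shapley_importance(X, y)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sign(beta[1:]) * unsigned  # beta[0] is the intercept


# Simulated data (an assumption for illustration): the second predictor
# has a negative effect on the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=200)

signed = signed_importance(X, y)
print(signed)
```

In this sketch, `np.abs(signed)` recovers the traditional unsigned scores, which sum to the R-squared of the full regression.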
Why showing negative signs is "wrong"
Both Shapley and Kruskal were conceived with the goal of measuring how "important" a variable is, and neither framework has a concept of negative scores. As such, the implementation of these methods within Q is inconsistent with the frameworks being implemented. To reproduce the Shapley and Kruskal methods exactly, ignore the negative signs.
Reason for the adjustment
Shapley and Kruskal implicitly assume that the user understands the true sign of the underlying variable. That is, they assume that the user knows which variables have a positive impact on the outcome variable and which have a negative impact. However, our experience in addressing user support queries is that many users do not have this understanding, and end up drawing the wrong conclusions from these methods as a result.
The following table shows an example. It shows that being feminine is perceived by men as having a negative effect on their liking of cola brands, whereas for women there is a positive effect. If this analysis is performed using the traditional formulation of Shapley, the scores are all positive (20.9 for Male and 6.6 for Female respondents), so the result is easily misread as, to quote a user, "the profound insight that men want their colas to be feminine".