Spearman’s Correlation

From Q

The Spearman correlation between two variables, [math]\displaystyle{ x' }[/math] and [math]\displaystyle{ y' }[/math], is:

[math]\displaystyle{ r = \frac{\sum ^n _{i=1}w_i(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^n _{i=1}w_i(x_i - \bar{x})^2} \sqrt{\sum ^n _{i=1}w_i(y_i - \bar{y})^2}} }[/math]

where:

[math]\displaystyle{ x = \operatorname{rank}(x') }[/math] and [math]\displaystyle{ y = \operatorname{rank}(y') }[/math], where tied values are each assigned the mean of the ranks they span,
[math]\displaystyle{ \bar{x} }[/math] and [math]\displaystyle{ \bar{y} }[/math] are their means, respectively,
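[math]\displaystyle{ w_i }[/math] is the Calibrated Weight for the [math]\displaystyle{ i }[/math]th of [math]\displaystyle{ n }[/math] observations, and
the p-value is [math]\displaystyle{ p \approx \Pr(t_{\sum^n_{i=1}w_i-2} \ge r\sqrt{\frac{\sum^n_{i=1}w_i-2}{1 - r^2}}) }[/math], where [math]\displaystyle{ t_k }[/math] denotes a Student's t-distributed random variable with [math]\displaystyle{ k }[/math] degrees of freedom.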
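The definitions above can be sketched in Python as follows. This is a minimal, non-authoritative illustration, not Q's implementation, and it rests on stated assumptions: ranks are computed from the raw values without regard to the weights, [math]\displaystyle{ \bar{x} }[/math] and [math]\displaystyle{ \bar{y} }[/math] are taken to be weighted means, and the t-distribution tail is evaluated through the regularized incomplete beta function (a standard continued-fraction method) because the Python standard library has no t distribution. All function names (`average_ranks`, `weighted_pearson`, `t_upper_tail`, `weighted_spearman`) are hypothetical.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values all receive the mean of the ranks they span.
    Assumption: ranks ignore the weights."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last position of the run tied with position `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted correlation formula above; assumption: x-bar and y-bar are weighted means."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        # Each iteration applies the even and then the odd continued-fraction term.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1) / (a + b + 2):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_upper_tail(t: float, df: float) -> float:
    """P(T >= t) for Student's t with df > 0 degrees of freedom (df need not be an integer)."""
    tail = 0.5 * _reg_inc_beta(df / 2, 0.5, df / (df + t * t))  # this is P(T >= |t|)
    return tail if t >= 0 else 1.0 - tail


def weighted_spearman(x_raw, y_raw, w):
    """Return (r, p) following the formulas above, with df = sum(w) - 2."""
    r = weighted_pearson(average_ranks(x_raw), average_ranks(y_raw), w)
    df = sum(w) - 2
    if r * r >= 1.0:
        t = math.copysign(math.inf, r)  # perfect (anti-)correlation
    else:
        t = r * math.sqrt(df / (1 - r * r))
    return r, t_upper_tail(t, df)


if __name__ == "__main__":
    r, p = weighted_spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7], [1, 1, 1, 1, 1])
    print(r, p)  # r = 8 / sqrt(95), about 0.8208
```

Because the degrees of freedom are [math]\displaystyle{ \sum w_i - 2 }[/math], which need not be an integer when calibrated weights are used, the tail probability is computed for real-valued degrees of freedom. Where SciPy is available, `scipy.stats.t.sf(t, df)` could replace the hand-written `t_upper_tail`.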

Note: versions of Q prior to Q5.0.2 contained a bug in which Spearman ranks were calculated without first omitting cases with a missing value in either of the two input variables. The impact of this bug should be minimal and proportional to the amount of missing data in the data set: we have observed that it typically changed the reported Spearman correlation at the 4th decimal place compared to the correct value. As a result, it is unlikely that this bug would have affected Spearman correlation significance test results or any inferences made from them.
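The difference between ranking before and after omitting incomplete cases can be illustrated with a small, unweighted Python sketch. It assumes one plausible reading of the note: the earlier behaviour ranked each variable over all of its own non-missing values and only afterwards kept the cases where both variables are present. This is not Q's code; the function names are hypothetical and NaN is used to mark a missing value.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Unweighted Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks_ignoring_nan(values: Sequence[float]) -> list[float]:
    """Rank the non-missing entries of one variable; missing entries stay NaN."""
    idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    out = [math.nan] * len(values)
    for i, rank in zip(idx, average_ranks([values[i] for i in idx])):
        out[i] = rank
    return out


def spearman_rank_then_drop(x_raw, y_raw) -> float:
    """Assumed old order: rank each variable separately, then drop incomplete cases."""
    rx, ry = ranks_ignoring_nan(x_raw), ranks_ignoring_nan(y_raw)
    keep = [i for i in range(len(rx)) if not (math.isnan(rx[i]) or math.isnan(ry[i]))]
    return pearson([rx[i] for i in keep], [ry[i] for i in keep])


def spearman_drop_then_rank(x_raw, y_raw) -> float:
    """Corrected order: drop cases missing in either variable, then rank what remains."""
    keep = [i for i in range(len(x_raw))
            if not (math.isnan(x_raw[i]) or math.isnan(y_raw[i]))]
    return pearson(average_ranks([x_raw[i] for i in keep]),
                   average_ranks([y_raw[i] for i in keep]))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [2, 1, 4, 3, math.nan, 5]
    print(spearman_drop_then_rank(x, y))  # 0.8
    print(spearman_rank_then_drop(x, y))  # 10 / sqrt(148), about 0.822
```

In this toy data one case in six is missing, so the gap between the two orders is far larger than the differences the note describes for typical data.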

See also

Correlations - Comparing Two Numeric Variables