Regression Diagnostics

R-squared

The R-squared statistic is the squared correlation between observed and predicted values of the outcome variable.

Where the outcome variable is categorical, it is converted into integers (i.e., 1, 2, 3, ...) prior to computing the R-squared statistic. Where there is no natural ordering of the categories, the R-squared statistic will usually be uninformative.
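As a rough illustration of the two points above, the following Python sketch computes R-squared as a squared Pearson correlation and shows how the result for a categorical outcome depends on the integer coding. The helper names (pearson_correlation, r_squared, encode), the colour categories and the data are hypothetical and chosen only for this example; this is not the code used by the software.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def r_squared(observed, predicted):
    """R-squared as the squared correlation of observed and predicted values."""
    return pearson_correlation(observed, predicted) ** 2


def encode(values, ordering):
    """Convert categories to integers 1, 2, 3, ... using the given ordering."""
    codes = {category: i + 1 for i, category in enumerate(ordering)}
    return [codes[v] for v in values]


# Hypothetical unordered categories; exactly one prediction is wrong.
observed = ["red", "green", "blue", "red", "blue", "green"]
predicted = ["red", "green", "blue", "blue", "blue", "green"]

# Two equally arbitrary integer codings of the same categories.
coding_a = ["red", "green", "blue"]
coding_b = ["red", "blue", "green"]

r2_a = r_squared(encode(observed, coding_a), encode(predicted, coding_a))
r2_b = r_squared(encode(observed, coding_b), encode(predicted, coding_b))

print(round(r2_a, 3), round(r2_b, 3))
```

The predictions are identical in both runs, yet the two codings give different R-squared values, which is why the statistic tells you little when the categories have no natural order.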

The R-squared statistic implicitly assumes that the residuals from a regression have constant variance. This assumption generally holds only for linear regression. Consequently, when using a model other than linear regression (e.g., Quasi-Poisson), it is theoretically possible (but rare in practice) that the R-squared statistic will decrease when a model improves. An alternative to the R-squared statistic, which does not suffer from this limitation, is McFadden's rho-squared.

Correct predictions

The proportion of predicted values that are the same as the observed values. See Regression - Diagnostic - Prediction-Accuracy Table for more information.
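A minimal Python sketch of this calculation, assuming observed and predicted values are supplied as two equal-length lists (the function name proportion_correct and the data are hypothetical):

```python
def proportion_correct(observed, predicted):
    """Share of cases where the predicted value equals the observed value."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Hypothetical data: three of the four predictions match.
observed = ["a", "b", "c", "a"]
predicted = ["a", "b", "a", "a"]
print(proportion_correct(observed, predicted))
```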


AIC

Akaike's information criterion. See the documentation on information criteria for more about how they are computed and interpreted.

McFadden's rho-squared

The interpretation of this statistic is broadly similar to that of the R-squared statistic: 0 indicates no relationship, 1 indicates a perfect model, and values in between indicate different strengths of relationship. Unlike with R-squared, there is no useful way of comparing different values, other than noting their relativities (i.e., a rho-squared of 0.5 cannot be said to explain twice as much variance as one of 0.25). It is primarily used for models with non-numeric outcome variables.
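This page does not state the formula, so the sketch below assumes the standard definition: one minus the ratio of the fitted model's log-likelihood to that of a null (intercept-only) model. It is written for a binary outcome coded 0/1; the function names and the predicted probabilities are hypothetical.

```python
from math import log


def log_likelihood(outcomes, probabilities):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted P(outcome = 1)."""
    return sum(
        log(p) if y == 1 else log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )


def mcfadden_rho_squared(outcomes, probabilities):
    """1 - lnL(model) / lnL(null), assuming the standard McFadden definition."""
    # The null model predicts the overall base rate for every case.
    base_rate = sum(outcomes) / len(outcomes)
    ll_model = log_likelihood(outcomes, probabilities)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    return 1 - ll_model / ll_null


# Hypothetical outcomes and fitted probabilities.
outcomes = [1, 0, 1, 0]
fitted = [0.9, 0.1, 0.8, 0.2]
print(round(mcfadden_rho_squared(outcomes, fitted), 3))
```

A model that predicts no better than the base rate gives a value of 0, and the value approaches 1 as the fitted probabilities approach the observed outcomes.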