# Regression Diagnostics

## R-squared

The *R-squared* statistic is the squared correlation between observed and predicted values of the *outcome variable*.
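The definition above can be illustrated with a minimal sketch in plain Python. The function name `r_squared` and the sample data are hypothetical, chosen only for illustration; this is not the software's own implementation.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    # Squaring the correlation removes its sign
    return cov ** 2 / (ss_obs * ss_pred)


# Made-up observed and predicted values
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.2, 3.8, 5.1]
print(r_squared(observed, predicted))
```

Because the correlation is squared, predictions that are perfectly negatively correlated with the observed values also give a value of 1.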

Where the outcome variable is categorical, it is converted into integers (i.e., 1, 2, 3, ...) prior to computing the *R-squared* statistic. Where the categories have no natural ordering, the *R-squared* will usually be uninformative.
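As a rough sketch of the integer conversion described above, the following maps category labels to the codes 1, 2, 3, ... The helper `to_integer_codes` and the category labels are hypothetical. How the software itself determines the category order is not stated here, so the order is passed in explicitly.

```python
def to_integer_codes(values, ordered_categories):
    """Map each category label to its 1-based position in ordered_categories."""
    codes = {category: i
             for i, category in enumerate(ordered_categories, start=1)}
    return [codes[value] for value in values]


# An ordinal outcome: the codes preserve the natural ordering
satisfaction = ["Low", "High", "Medium", "Low"]
print(to_integer_codes(satisfaction, ["Low", "Medium", "High"]))

# A nominal outcome: either ordering is equally arbitrary, so the
# resulting codes (and any R-squared computed from them) are too
brands = ["Coke", "Pepsi", "Fanta"]
print(to_integer_codes(brands, ["Coke", "Fanta", "Pepsi"]))
print(to_integer_codes(brands, ["Pepsi", "Coke", "Fanta"]))
```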

The R-squared statistic implicitly assumes that the residuals from a regression have constant variance. This assumption does not hold except in linear regression. Consequently, when using a model that does not assume a linear outcome variable (e.g., *Quasi-Poisson*), it is theoretically possible (but rare in practice) for the R-squared statistic to decrease when a model improves. An alternative to the R-squared statistic, which does not suffer from this limitation, is *McFadden's rho-squared*.

## Correct predictions

The proportion of predicted values that are the same as the observed values. See Regression - Diagnostic - Prediction-Accuracy Table for more information.
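A minimal sketch of this proportion, assuming observed and predicted values are supplied as two equal-length lists (the function name `proportion_correct` and the data are illustrative only):

```python
def proportion_correct(observed, predicted):
    """Share of cases where the predicted value equals the observed value."""
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


observed = ["a", "b", "a", "c"]
predicted = ["a", "b", "c", "c"]
print(proportion_correct(observed, predicted))  # 3 of 4 match: 0.75
```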

## AIC

Akaike's information criterion. SurveyAnalysis.org contains more information about how information criteria are computed and interpreted.
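For orientation, the standard textbook definition is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below applies that definition. The function name `aic` and the log-likelihood values are made up for illustration, and how the software obtains the log-likelihood is not covered here.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_parameters - 2 * log_likelihood


# Two hypothetical models fitted to the same data
simple_model = aic(log_likelihood=-100.0, n_parameters=3)
complex_model = aic(log_likelihood=-98.5, n_parameters=5)

# Lower AIC is preferred; here the small gain in fit does not
# offset the penalty for the two extra parameters
print(simple_model, complex_model)
```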

## McFadden's rho-squared

The interpretation of this statistic is broadly similar to that of the R-squared statistic: 0 indicates no relationship, 1 indicates a perfect model, and values in between indicate different strengths of relationship. Unlike with R-squared, there is no useful way of comparing different values, other than noting their relativities (i.e., a rho-squared of 0.5 cannot be said to explain twice as much variance as one of 0.25). It is primarily used for models with non-numeric outcome variables.
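McFadden's rho-squared is commonly defined as 1 - ln(L_model) / ln(L_null), where L_null is the likelihood of a model with no predictors. The sketch below applies that definition to a binary outcome. It assumes outcomes coded 0/1 with both values present, fitted probabilities strictly between 0 and 1, and a null model that predicts the overall base rate. All function names and data are hypothetical.

```python
import math


def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes given predicted P(outcome = 1)."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(outcomes, probabilities))


def mcfadden_rho_squared(outcomes, fitted_probabilities):
    """1 - lnL(model) / lnL(null), with a base-rate-only null model."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = bernoulli_log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = bernoulli_log_likelihood(outcomes, fitted_probabilities)
    return 1 - ll_model / ll_null


outcomes = [1, 0, 1, 0]
# A model that does no better than the base rate gives 0
print(mcfadden_rho_squared(outcomes, [0.5, 0.5, 0.5, 0.5]))
# Sharper, correct predictions move the statistic toward 1
print(mcfadden_rho_squared(outcomes, [0.9, 0.1, 0.9, 0.1]))
```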