Regression - Diagnostic - Plot - Cook's Distance vs Leverage extension

A scatterplot showing Cook's distance vs leverage for each observation from a regression.

Example

The following output shows an example plot of Cook's distance vs leverage for a linear regression model.

Details

Cook's distance and leverage are used to detect highly influential data points, i.e., data points that can have a large effect on the fitted coefficients and the accuracy of the regression. Cook's distance measures how much the fitted values change when a single observation is removed. Leverage measures how far an observation's predictor values are from their averages. For large sample sizes, a rough guideline is to treat Cook's distance values above 1 as highly influential, and leverage values greater than 2 times the number of predictors divided by the sample size as high leverage. High-leverage observations can greatly influence the fitted model because the fit is pulled towards them.
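The guidelines above can be illustrated with a minimal sketch in plain Python. It is not the code used by this extension, which relies on R. It assumes a simple linear regression with one predictor and made-up data, and the function name `influence_measures` is hypothetical. The leverage cutoff here counts the intercept as a coefficient, which is one common convention and an assumption on our part; counting predictors only gives a stricter cutoff.

```python
def influence_measures(x, y):
    """Leverage and Cook's distance for the least-squares fit y = a + b*x."""
    n = len(x)
    n_coef = 2  # intercept and slope (assumption: the intercept is counted)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate on n - n_coef degrees of freedom.
    s2 = sum(e ** 2 for e in residuals) / (n - n_coef)
    # Diagonal of the hat matrix for a single predictor plus intercept.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    cooks = [e ** 2 / (n_coef * s2) * h / (1 - h) ** 2
             for e, h in zip(residuals, leverage)]
    return leverage, cooks


# Made-up data: the last point lies far from the other x values and off the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 7.9, 9.0, 12.0]

leverage, cooks = influence_measures(x, y)
leverage_cutoff = 2 * 2 / len(x)  # 2 * coefficients / sample size
high_leverage = [i for i, h in enumerate(leverage) if h > leverage_cutoff]
high_cooks = [i for i, d in enumerate(cooks) if d > 1]
print("High leverage:", high_leverage)
print("Cook's distance above 1:", high_cooks)
```

In this made-up example the last observation is the only one that exceeds both cutoffs. With real data the two lists often differ, because a point can have high leverage without changing the fit much.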

The contours in the scatterplot are standardized residuals labelled with their magnitudes.
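The contours follow from a standard identity for linear models: D = r^2 * h / (p * (1 - h)), where D is Cook's distance, r is the standardized residual, h is the leverage and p is the number of estimated coefficients. The sketch below is only an illustration of that identity, not the plotting code. The function names are hypothetical, and for generalized linear models the relationship is an approximation.

```python
import math


def cooks_from_residual(std_residual, leverage, n_coef):
    """Cook's distance implied by a standardized residual and a leverage value."""
    return std_residual ** 2 * leverage / (n_coef * (1 - leverage))


def residual_magnitude(cooks, leverage, n_coef):
    """Magnitude of the standardized residual at a point (leverage, Cook's distance)."""
    return math.sqrt(cooks * n_coef * (1 - leverage) / leverage)


# Points on the contour |r| = 2 for a model with 2 coefficients.
contour = [(h, cooks_from_residual(2, h, 2)) for h in (0.1, 0.2, 0.5)]
for h, d in contour:
    print(f"leverage={h:.1f}  Cook's distance={d:.3f}")
```

Reading the plot works the other way around: an observation lying between the contours labelled 1 and 2 has a standardized residual with magnitude between 1 and 2.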

Acknowledgements

Uses plot.lm and/or plot.glm function from the stats R package.

References

Cook, R. Dennis (1977). Detection of Influential Observations in Linear Regression. Technometrics. American Statistical Association. 19 (1): 15–18. DOI: 10.2307/1268249.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics. 36 (2): 181–191. DOI: 10.2307/2347550.

Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression. 2nd Edition. SAGE Publications. ISBN: 9781412975148.

Code

includeWeb("QScript R Output Functions");

const menu_location = "Regression > Diagnostic > Plot > Cook's Distance vs Leverage";
errorIfExtensionsUnavailableInQVersion(menu_location);
createDiagnosticROutputFromSelection(menu_location);