Regression - Diagnostic - Plot - Cook's Distance

A line/rug plot showing Cook's Distance for each observation fitted in the regression model.

Example

Sample output from plotting the Cook's distances for a quasi-Poisson regression model. No data points appear to be overly influential.

Details

Cook's distance is used to detect highly influential data points, i.e. observations whose removal would substantially change the fitted regression. For large sample sizes, a rough guideline is to treat values above 4/(n-p) as indicating highly influential points, where n is the sample size and p is the number of estimated coefficients (the predictors plus the intercept).
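The 4/(n-p) guideline can be sketched in a few lines of JavaScript. This is only an illustration: the function name flagInfluential and the sample distances are hypothetical and are not part of this diagnostic's QScript. In practice the distances would come from the fitted model in R (for example via cooks.distance).

```javascript
// Hypothetical helper (not part of the Q library): flags observations whose
// Cook's distance exceeds the rough cutoff 4 / (n - p).
function flagInfluential(cooksDistances, p) {
    var n = cooksDistances.length;
    if (n <= p)
        throw new Error("Sample size must exceed the number of coefficients.");
    var threshold = 4 / (n - p);
    var flagged = [];
    for (var i = 0; i < n; i++) {
        if (cooksDistances[i] > threshold)
            flagged.push(i + 1); // report 1-based observation numbers
    }
    return { threshold: threshold, flagged: flagged };
}

// Made-up distances for 10 observations and a model with 2 coefficients.
var distances = [0.01, 0.02, 0.9, 0.03, 0.05, 0.01, 0.02, 0.04, 0.03, 0.02];
var result = flagInfluential(distances, 2);
console.log("Threshold: " + result.threshold);
console.log("Flagged observations: " + result.flagged.join(", "));
```

With these made-up values the cutoff is 4/(10-2) = 0.5, so only the third observation is flagged. The cutoff is a rule of thumb, so flagged points warrant inspection rather than automatic removal.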

Acknowledgements

Uses the plot.lm function from the stats R package, which also handles glm objects because they inherit from class lm.

References

Cook, R. Dennis (1977). Detection of Influential Observations in Linear Regression. Technometrics. American Statistical Association. 19 (1): 15–18. DOI: 10.2307/1268249.

Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics. 36: 181–191. DOI: 10.2307/2347550.

Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression. 2nd Edition. SAGE Publications. ISBN: 9781412975148.

Code

includeWeb("QScript R Output Functions");

main();

function main() {

    // The following 2 variables contain information specific to this diagnostic.
    var required_class = "Regression";
    var output_name_suffix = "cooks.distance";
    
    var item = checkSelectedItemClass(required_class);
    if (item == null)
        return false;
    var r_name = stringToRName(item.referenceName);

    // Build the R expression to run; which = 4 selects the Cook's distance plot.
    var expression = "plot(" + r_name + ", which = 4)";

    return createROutput(item, expression, output_name_suffix);
}