Regression - Stepwise


Stepwise Regression is a method of systematically selecting the variables to include in a model. Guided by the Akaike Information Criterion (AIC), it adds explanatory variables to, or removes them from, a specified regression model.

Interpretation

Variable statistics measure the impact and significance of individual variables within a model, while overall statistics apply to the model as a whole. Both are shown in the regression output. The variables omitted from the stepwise regression are listed at the top of the output.

Variable statistics

Estimate: The magnitude of the coefficient indicates the size of the change in the dependent variable as the value of the independent variable changes. A positive number indicates a direct relationship (y increases as x increases), and a negative number indicates an inverse relationship (y decreases as x increases).

The coefficient is colored if the variable is statistically significant at the 5% level.

Standard Error: Measures the precision of an estimate. The smaller the standard error, the more precise the estimate.

t-statistic: The estimate divided by the standard error. The larger its magnitude (whether positive or negative), the more significant the variable. The values are highlighted based on their magnitude.

p-value: Expresses the t-statistic as a probability: the chance of seeing a t-statistic at least this large in magnitude if the variable had no effect. A p-value under 0.05 means that the variable is statistically significant at the 5% level; a p-value under 0.01 means that the variable is statistically significant at the 1% level. P-values under 0.05 are shown in bold.
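The relationship between the estimate, standard error, t-statistic and p-value can be checked by hand. The following is a minimal sketch for an ordinary linear model, using R's built-in mtcars data purely for illustration; the model and the variable names t.stat and p.value are assumptions made for this example and are not part of the Stepwise Regression item.

```r
# Fit a small linear model on a built-in dataset (illustrative only).
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

# t-statistic: the estimate divided by its standard error.
t.stat <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom.
p.value <- 2 * pt(-abs(t.stat), df = df.residual(fit))

# Show the hand-computed values next to the ones reported by summary().
print(cbind(coefs, t.check = t.stat, p.check = p.value))
```

The hand-computed columns should agree with the "t value" and "Pr(>|t|)" columns that summary() reports. Other regression types may use a different reference distribution for the p-value.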

Overall statistics

n: The sample size of the model.

R-squared: Assesses the goodness of fit of the model. A larger number indicates that the model captures more of the variation in the dependent variable.

AIC: The Akaike Information Criterion is a measure of the quality of the model that balances goodness of fit against the number of parameters; lower values are better. It is the criterion used to determine whether a variable is included in a stepwise regression.
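To make the AIC concrete, the sketch below computes it by hand as 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood, and compares two candidate linear models. The helper aic.by.hand and the two models are hypothetical, chosen only for illustration on R's built-in mtcars data.

```r
# Two candidate models: the second adds one explanatory variable.
fit.small <- lm(mpg ~ wt, data = mtcars)
fit.large <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: AIC = 2 * k - 2 * log-likelihood.
aic.by.hand <- function(fit) {
    ll <- logLik(fit)
    k <- attr(ll, "df")  # estimated parameters, including the error variance
    2 * k - 2 * as.numeric(ll)
}

aic.small <- aic.by.hand(fit.small)
aic.large <- aic.by.hand(fit.large)

# The model with the lower AIC is preferred.
print(c(small = aic.small, large = aic.large))
```

For linear models, stepAIC reports AIC values from extractAIC, which differ from AIC() by a constant that depends only on the sample size, so the ranking of models fitted to the same cases is unchanged.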

Example

The following example applies a stepwise regression to a linear model. It uses a forward selection approach (Direction > Forward), which means the regression begins with no variables and tests the addition of each variable to build the model.

The stepwise regression includes fewer variables than the original linear model, omitting variables whose inclusion does not improve the AIC.

[Images: regression output before stepwise | regression output after stepwise]
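Forward selection of the kind described above can also be sketched directly with stepAIC from the MASS package, which this item uses. The data (R's built-in mtcars) and the candidate variables are assumptions chosen for illustration; they are not the data behind the example above, and Displayr's own wrapper may set additional options.

```r
library(MASS)

# Start from an intercept-only model.
null <- lm(mpg ~ 1, data = mtcars)

# Forward selection: at each step, add the candidate variable that lowers
# the AIC the most, and stop when no addition lowers it further.
forward <- stepAIC(null,
                   scope = list(lower = ~ 1, upper = ~ wt + hp + qsec + drat),
                   direction = "forward",
                   trace = FALSE)

# The selected formula and the step-by-step record of added variables.
print(formula(forward))
print(forward$anova)
```

The anova component lists the variable added at each step together with the resulting AIC, similar in spirit to the Detailed output type.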

Create a Stepwise Regression Model in Displayr

1. Go to Insert > Regression > Stepwise Regression
2. Under Inputs > Regression model, select the model you want to apply stepwise regression to
3. [OPTIONAL] Under Inputs > Variables to always include, select any variables that must be included in the model

Object Inspector Options

Regression model: A regression R item, for example one produced by running Regression - Linear Regression. Compatible with all types of regression R items except unweighted Quasi-Poisson models and models estimated using partial data (pairwise correlations) or imputation of missing values.

Output type:

Final: The non-detailed output of the regression model that was chosen as a result of the selection process. This is the default.
Detailed: The detailed text output of the regression model that was chosen as a result of the selection process, as well as the initial and final model formulae, and an overview of which variables were added or removed at each step, with corresponding AIC values.
All: Same as Detailed, plus complete information on each step of the selection process.

Direction:

Forward: Forward selection of variables, starting from an empty model with only the intercept.
Backward: Backward elimination of variables, starting from the original model. This is the default.

Variables to always include: The variables that should always be included in the selected model. These variables need to be in the original model. If a variable is not in the original model, it will be ignored, and a warning message will be displayed.

Maximum steps: The maximum number of steps to be considered.
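The Direction, Variables to always include and Maximum steps options have natural counterparts in stepAIC. The sketch below shows one way to express them, assuming a linear model on R's built-in mtcars data with drat as the variable to keep; whether the Stepwise function passes these options to stepAIC in exactly this way is an assumption.

```r
library(MASS)

# Original model containing all candidate variables.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination: the lower scope names the variables that may never
# be dropped, and steps caps the number of steps considered.
backward <- stepAIC(full,
                    scope = list(lower = ~ drat),
                    direction = "backward",
                    steps = 1000,
                    trace = FALSE)

# drat remains in the selected model whether or not dropping it would lower the AIC.
print(formula(backward))
```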

Missing Data

The way missing data is treated depends on how missing data was treated in the original model. If "Exclude cases with missing data" was chosen, only cases with no missing data in any of the predictor variables will be used in the stepwise process, so that models are compared using the same set of cases. Stepwise regression is not compatible with models using partial data (pairwise correlations) or imputation of missing values.
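The reason for restricting the analysis to complete cases is that AIC values are only comparable between models fitted to the same cases. A minimal sketch of the idea follows, assuming a linear model on R's built-in mtcars data with a few values set to missing for illustration; the data frame names dat and complete are hypothetical.

```r
library(MASS)

# Illustrative data with some missing predictor values injected.
dat <- mtcars[, c("mpg", "wt", "hp", "qsec")]
dat$hp[c(3, 10)] <- NA
dat$qsec[7] <- NA

# Keep only cases with no missing values in any model variable, so that
# every candidate model is fitted to the same set of cases.
complete <- dat[complete.cases(dat), ]

full <- lm(mpg ~ wt + hp + qsec, data = complete)
selected <- stepAIC(full, direction = "backward", trace = FALSE)

# Both models use the same number of cases.
print(c(full = nobs(full), selected = nobs(selected)))
```

Without this filtering, dropping a variable that contains missing values would change the number of cases used, and stepAIC stops with an error in that situation.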

Acknowledgements

Uses the function stepAIC from the R package MASS.

References

Venables, W. N., & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. New York, NY: Springer. ISBN: 0-387-95457-0.

Code

form.setHeading('Stepwise regression');
form.dropBox({label: "Regression model", name: "formModel", types:["RItem:Regression"],
              prompt: "Select the Regression output you would like to perform Stepwise regression on"});
form.comboBox({label: "Output type", alternatives: ["Final", "Detailed", "All"], 
              prompt: "Select how you would like the results of the algorithm displayed.",
              name: "formOutput", default_value: "Final"});
form.comboBox({label: "Direction", alternatives: ["Forward", "Backward"],
              prompt: "The type of algorithm to use.",
              name: "formDirection", default_value: "Backward"});
form.dropBox({label: "Variables to always include (must be in regression model)",
            types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"], 
            name: "formAlwaysInclude",
            required: false, 
            multi:true});
form.numericUpDown({label: "Maximum steps", name:"formMaxSteps", minimum: 0,
                    prompt: "Maximum number of iterations to run the algorithm for.",
                    maximum: Number.MAX_SAFE_INTEGER, default_value: 1000, increment:1});
# R code: runs the stepwise selection on the chosen regression model.
library(flipRegression)
# Weights and filters applied to this item are ignored; those of the original regression are used.
if (!is.null(QCalibratedWeight))
    warning("The weight applied to this item was ignored. The analysis will be automatically weighted if a weight was applied to the original regression.")
if (any(QFilter != 1))
    warning("The filter(s) applied to this item were ignored. The analysis will be automatically filtered if filters were applied to the original regression.")
# Return the "name" attribute of a selected variable.
getName <- function(element)
{
    attr(element, "name")
}
# Names of the variables that must stay in the model.
always.include <- unlist(lapply(formAlwaysInclude, getName))
stepwise <- Stepwise(formModel, formOutput, formDirection, always.include, formMaxSteps)