# Regression - Quasi-Poisson Regression

The Quasi-Poisson Regression is a generalization of the Poisson regression and is used when modeling an overdispersed count variable.

The Poisson model assumes that the variance is equal to the mean, which is not always a fair assumption. When the variance is greater than the mean, a Quasi-Poisson model, which assumes that the variance is a linear function of the mean, is more appropriate.
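
To make the variance assumption concrete, here is a minimal R sketch on simulated data (the data and variable names are invented for illustration and are not part of this article's example). It fits a Poisson model to overdispersed counts and estimates the dispersion as the sum of squared Pearson residuals divided by the residual degrees of freedom. A value well above 1 suggests that the Poisson assumption is too strict and that a Quasi-Poisson model may be more appropriate.

```r
set.seed(123)
n  <- 500
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)

# Negative binomial draws: the variance exceeds the mean (overdispersion).
y <- rnbinom(n, mu = mu, size = 1)

# Poisson fit, which assumes variance == mean.
pois_fit <- glm(y ~ x, family = poisson)

# Pearson-based dispersion estimate; roughly 1 if the Poisson assumption holds.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

# Quasi-Poisson fit: same coefficients, but the variance is modeled as
# dispersion * mean, which widens the standard errors.
quasi_fit        <- glm(y ~ x, family = quasipoisson)
quasi_dispersion <- summary(quasi_fit)$dispersion

print(c(poisson_check = dispersion, quasi_estimate = quasi_dispersion))
```

The two fits return the same coefficient estimates; only the standard errors, and therefore the test statistics, differ.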

## Data format

The Quasi-Poisson model requires a count variable as the dependent variable. In Displayr, the best data format for this type is Numeric. A count variable must only include non-negative integers (zero is a valid count).
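
A quick check of this requirement can be sketched in R. The helper name `is_count` is hypothetical, and it treats zero as a valid count and ignores missing values.

```r
# Hypothetical helper: TRUE when every non-missing value is a whole number >= 0.
is_count <- function(x) {
  x <- x[!is.na(x)]
  is.numeric(x) && all(x >= 0) && all(x == floor(x))
}

is_count(c(0, 3, 7, NA))  # whole numbers, zero allowed
is_count(c(1.5, 2))       # contains a fractional value
is_count(c(-1, 4))        # contains a negative value
```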

The independent variables can be continuous, categorical, or binary, just as with any regression model.

## Interpretation

Variable statistics measure the impact and significance of individual variables within a model, while overall statistics apply to the model as a whole. Both are shown in the output.

### Variable statistics

Estimate The magnitude of the coefficient indicates the size of the change in the dependent variable as the value of the independent variable changes. A positive number indicates a direct relationship (y increases as x increases), and a negative number indicates an inverse relationship (y decreases as x increases).

The coefficient is colored and bolded if the variable is statistically significant at the 5% level.

Standard Error Measures the precision of an estimate. The smaller the standard error, the more precisely the coefficient has been estimated.

t-statistic The estimate divided by the standard error. The larger its magnitude, regardless of sign, the stronger the evidence that the variable is significant. The values are highlighted based on their magnitude.

p-value Expresses the t-statistic as a probability: the chance of observing a t-statistic at least this extreme if the true coefficient were zero. A p-value under 0.05 means that the variable is statistically significant at the 5% level; a p-value under 0.01 means that the variable is statistically significant at the 1% level. P-values under 0.05 are shown in bold.
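
The relationship between these four statistics can be sketched in R with `glm`. The data below are simulated and the variable names (`age`, `female`, `visits`) are invented for illustration. The last step assumes the default log link of R's `quasipoisson` family, under which the exponentiated estimate is the multiplicative change in the expected count for a one-unit increase in the predictor.

```r
set.seed(42)
n      <- 300
age    <- runif(n, 18, 70)
female <- rbinom(n, 1, 0.5)
visits <- rnbinom(n, mu = exp(1 - 0.01 * age + 0.2 * female), size = 2)

fit   <- glm(visits ~ age + female, family = quasipoisson)
coefs <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)

# t-statistic: estimate divided by its standard error.
t_stat <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution on the residual degrees of freedom.
p_val <- 2 * pt(-abs(t_stat), df = df.residual(fit))

# Multiplicative effect on the expected count (log link assumed).
rate_ratio <- exp(coefs[, "Estimate"])

print(cbind(coefs, rate_ratio))
```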

### Overall statistics

n The sample size of the model.

R-squared & McFadden’s rho-squared assess the goodness of fit of the model. A larger number indicates that the model captures more of the variation in the dependent variable.

AIC The Akaike information criterion is a measure of the quality of the model. When comparing similar models, the model with the lower AIC is preferred.

## Example

The example below is a Quasi-Poisson regression that models a survey respondent’s fast-food consumption based on characteristics like age, gender, and work status.

### Create a Quasi-Poisson Regression Model in Displayr

1. Go to Insert > Regression > Quasi-Poisson Regression
2. Under Inputs > Outcome, select your dependent variable
3. Under Inputs > Predictor(s), select your independent variables

## Object Inspector Options

Outcome The variable to be predicted by the predictor variables.

Predictors The variable(s) to predict the outcome.

Algorithm The fitting algorithm. Defaults to Regression but may be changed to other machine learning methods.

Type: You can use this option to toggle between different types of regression models, but note that certain types are not appropriate for certain types of outcome variable. For a count outcome variable, the other types to consider are Poisson and NBD.

Linear See Regression - Linear Regression.
Binary Logit See Regression - Binary Logit.
Ordered Logit See Regression - Ordered Logit.
Multinomial Logit See Regression - Multinomial Logit.
Poisson See Regression - Poisson Regression.
Quasi-Poisson.
NBD See Regression - NBD Regression.

Robust standard errors Computes standard errors that are robust to violations of the assumption of constant variance (i.e., heteroscedasticity). See Robust Standard Errors. This is only available when Type is Linear.

Missing data See Missing Data Options.

Output

Summary The default; as shown in the example above.
Detail Typical R output, with some additional information compared to Summary but without the pretty formatting.
ANOVA Analysis of variance table containing the results of Chi-squared likelihood ratio tests for each predictor.
Relative Importance Analysis The results of a relative importance analysis. See here and the references for more information. This option is not available for Multinomial Logit. Note that categorical predictors are not converted to be numeric, unlike in Driver (Importance) Analysis - Relative Importance Analysis.
Effects Plot Plots the relationship between each of the Predictors and the Outcome. Not available for Multinomial Logit.

Correction The multiple comparisons correction applied when computing the p-values of the post-hoc comparisons.

Variable names Displays Variable Names in the output instead of labels.

Absolute importance scores Whether the absolute value of Relative Importance Analysis scores should be displayed.

Auxiliary variables Variables to be used when imputing missing values (in addition to all the other variables in the model).

Weight. Where a weight has been set for the R Output, it will automatically be applied when the model is estimated. By default, the weight is assumed to be a sampling weight, and the standard errors are estimated using Taylor series linearization (by contrast, in the Legacy Regression, weight calibration is used). See Weights, Effective Sample Size and Design Effects.

Filter The data is automatically filtered using any filters prior to estimating the model.

Crosstab Interaction Optional variable to test for interaction with other variables in the model. See Linear Regression for more details.

Automated outlier removal percentage Optional control to remove possible outliers in the data. See Linear Regression for more details on the general methodology. The specific residual used in the case of Quasi-Poisson regression is a type of deviance residual in an unweighted regression and the Pearson residual in a weighted regression. In the unweighted regression case, it uses the rstudent function, giving a quasi-deviance type residual. The Pearson residual in the weighted case adjusts appropriately for the provided survey weights. More details of residual types are found in Davison and Snell (1991).
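
As a rough illustration of the unweighted case, the sketch below fits a Quasi-Poisson model in R, computes studentized deviance residuals with `rstudent`, drops the observations with the largest absolute residuals, and refits. The data are simulated, the 5% threshold is arbitrary, and the remove-then-refit steps are an assumption about the general methodology, not Displayr's actual implementation.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rnbinom(n, mu = exp(1 + 0.4 * x), size = 2)
y[1:3] <- y[1:3] + 60          # plant a few extreme counts
dat <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = quasipoisson, data = dat)

# Percentage of observations to remove (illustrative value).
removal_pct <- 5
n_remove    <- ceiling(n * removal_pct / 100)

# Rank observations by absolute studentized deviance residual.
res      <- abs(rstudent(fit))
drop_idx <- order(res, decreasing = TRUE)[seq_len(n_remove)]

# Refit on the remaining observations.
refit <- glm(y ~ x, family = quasipoisson, data = dat[-drop_idx, ])

print(c(before = summary(fit)$dispersion, after = summary(refit)$dispersion))
```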

Stack data Whether the input data should be stacked before analysis. Stacking can be desirable when each individual in the data set has multiple cases and an aggregate model is desired. More information is available at Stacked Data. If this option is chosen then the Outcome needs to be a single Variable Set that has a Multi type structure suitable for count regression, such as a Numeric - Multi. Similarly, the Predictor(s) need to be a single Variable Set that has a Grid type structure, such as a Binary - Grid or a Numeric - Grid. In the process of stacking, the Data Reduction is inspected. Any constructed NETs are removed unless comprised of source values that are mutually exclusive to other codes, such as the result of merging two categories.
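
The effect of stacking can be sketched in R on a small invented data set in which each respondent has three outcome counts and three matching predictor values. The column names and the manual reshaping below are illustrative assumptions, not the code Displayr runs.

```r
# Wide data: one row per respondent, one column per item.
wide <- data.frame(
  id      = 1:4,
  count_a = c(0, 2, 1, 5),
  count_b = c(1, 0, 3, 4),
  count_c = c(2, 2, 0, 6),
  price_a = c(1.0, 1.2, 0.9, 1.1),
  price_b = c(2.0, 2.1, 1.8, 2.2),
  price_c = c(3.0, 2.9, 3.1, 3.3)
)

items <- c("a", "b", "c")

# Stacked data: one row per respondent-item combination.
stacked <- do.call(rbind, lapply(items, function(k) {
  data.frame(
    id    = wide$id,
    item  = k,
    count = wide[[paste0("count_", k)]],
    price = wide[[paste0("price_", k)]]
  )
}))

# A single aggregate model is then fitted to the stacked rows.
fit <- glm(count ~ price, family = quasipoisson, data = stacked)
print(stacked)
```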

Random seed Seed used to initialize the (pseudo)random number generator for the model fitting algorithm. Different seeds may lead to slightly different answers, but should normally not make a large difference.

Additional options are available by editing the code.

### DIAGNOSTICS

Cook's distance plot Creates a line/rug plot showing Cook's Distance for each observation.

Cook's distance vs leverage plot Creates a scatterplot showing Cook's distance vs leverage for each observation.

Influence index plot Creates index plots of studentized residuals, hat values, and Cook's distance.

Multicollinearity (VIF) table Creates a table containing variance inflation factors (VIF) to diagnose multicollinearity.

Normal Q-Q plot Creates a normal Quantile-Quantile (QQ) plot to reveal departures of the residuals from normality.

Prediction-accuracy table Creates a table showing the observed and predicted values, as a heatmap.

Residual normality (Shapiro-Wilk) test Conducts a Shapiro-Wilk test of normality on the (deviance) residuals.

Residuals vs fitted plot Creates a scatterplot of residuals versus fitted values.

Residuals vs leverage plot Creates a plot of residuals versus leverage values.

Scale-location plot Creates a plot of the square root of the absolute standardized residuals by fitted values.

Serial correlation (Durbin-Watson) test Conducts a Durbin-Watson test of serial correlation (auto-correlation) on the residuals.
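
Several of the quantities behind these diagnostics can be computed directly in base R. The sketch below uses simulated data and standard functions from the stats package; it only illustrates the underlying statistics, and the way Displayr computes and plots them may differ.

```r
set.seed(7)
n <- 150
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.8 + 0.5 * x), size = 1.5)
fit <- glm(y ~ x, family = quasipoisson)

# One row per observation with the main influence and residual measures.
diag_tbl <- data.frame(
  cooks    = cooks.distance(fit),                 # Cook's distance
  leverage = hatvalues(fit),                      # hat values
  student  = rstudent(fit),                       # studentized residuals
  deviance = residuals(fit, type = "deviance"),   # deviance residuals
  fitted   = fitted(fit)                          # fitted values
)

# Observations with the largest Cook's distance.
print(head(diag_tbl[order(diag_tbl$cooks, decreasing = TRUE), ], 5))

# Shapiro-Wilk test of normality on the deviance residuals.
sw <- shapiro.test(diag_tbl$deviance)
print(sw$p.value)
```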

### SAVE VARIABLE(S)

Save fitted values Creates a new variable containing fitted values for each case in the data.

Save predicted values Creates a new variable containing predicted values for each case in the data.

Save residuals Creates a new variable containing residual values for each case in the data.

When using this feature you can obtain additional information that is stored by the R code which produces the output.

1. To do so, select Create > R Output.
2. In the R CODE, paste: item = YourReferenceName
3. Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.

For a more in-depth discussion on extracting information from objects in R, check out our blog post here.

Properties which may be of interest are:

• Summary outputs from the regression model:
item$summary$coefficients # summary regression outputs
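
Outside Displayr there is no report item to reference, so the sketch below builds a stand-in list called `item` whose `summary` element is the summary of a `glm` fit on simulated data. The structure of this stand-in is an assumption made for illustration; the real object may contain more elements, which `str(item)` would reveal.

```r
set.seed(99)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.5 + 0.3 * x))
fit <- glm(y ~ x, family = quasipoisson)

# Hypothetical stand-in for the Displayr item.
item <- list(summary = summary(fit))

# List the top-level elements that are available.
str(item, max.level = 1)

# Coefficient table as a numeric matrix.
coefs <- item$summary$coefficients

# A single estimate, and the rows significant at the 5% level.
print(coefs["x", "Estimate"])
print(coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE])
```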

## Acknowledgements

Uses the glm function from the stats R package. If weights are supplied, the svyglm function from the survey R package is used. See also Regression - Generalized Linear Model.