Include Imputation Technical Details

From Q
Jump to navigation Jump to search

By default, data is imputed using the default settings from the mice R package, which employs Multivariate Imputation by Chained Equations (predictive mean matching) [1]. Care should be taken to ensure that variables have the correct variable type, as this has a big impact on this algorithm. Where a technical error is experienced using mice, the imputation is performed using hot-decking, via the hot.deck package in R.[2]

When applied with regression, missing values in the outcome variable are excluded from the analysis after the imputation has been performed.[3]

Note that although imputation can reduce the bias of parameter estimates, it can create misleading statistical inference (e.g., as the simulated sample size is assumed to be the actual sample size in calculations).

  1. Stef van Buuren and Karin Groothuis-Oudshoorn (2011), "mice: Multivariate Imputation by Chained Equations in R", Journal of Statistical Software, 45:3, 1-67.
  2. Skyler J. Cranmer and Jeff Gill (2013). We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data. British Journal of Political Science, 43, pp 425-449.
  3. von Hippel, Paul T. 2007. "Regression With Missing Y's: An Improved Strategy for Analyzing Multiply Imputed Data."