Missing Data - Plot of Patterns


Creates a chart showing the patterns of missing data, with shading indicating missing values. As the number of variables increases, the number of possible patterns grows exponentially, making the chart very difficult to read. The most straightforward way to address this is to limit the number of variables that you chart. Font sizes are also reduced automatically when many variables are selected, but a lot of zooming may still be required. The font size can be set manually in the code by modifying the value of cex.numbers, where 1 indicates a normal-sized font and smaller values indicate smaller fonts.
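To make the scale of the problem concrete, the sketch below counts the possible patterns for k variables (each variable is either observed or missing, giving at most 2^k combinations) and applies the default font-size rule used in the code at the end of this page (1 for fewer than 8 variables, otherwise 4 / k). The helper names possible.patterns and default.cex.numbers are illustrative only and are not part of Q, Displayr, or VIM.

```r
# Upper bound on the number of distinct missing-data patterns for k variables:
# each variable is either observed or missing, so there are 2^k combinations.
# (A real data set can never show more patterns than it has observations.)
possible.patterns <- function(k) 2^k

# Default font-size rule taken from the code on this page.
default.cex.numbers <- function(k) if (k < 8) 1 else 4 / k

k <- c(3, 7, 8, 16)
summary.table <- data.frame(variables = k,
                            possible.patterns = possible.patterns(k),
                            cex.numbers = sapply(k, default.cex.numbers))
print(summary.table)
```

Under this rule the font is only half its normal size with 8 variables, which is why restricting the number of variables is usually the better remedy.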

Example

The columns at the top show the relative amount of missing values by variable. In this example, we can see that q3 has substantially more missing values than the other variables.

In the grid, each row represents a combination of missing values, with blue indicating a missing value. The first row shows that 2 observations are missing values on both q3 and q4. Other rows show that 336 observations have no missing values and 362 are missing data only for q3.
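The counts shown in the chart can also be computed directly. The following base R sketch uses a small made-up data frame (the values and the names q1 to q3 are invented for illustration and are not the data behind the example above). It codes each observation's pattern as a string of 0s and 1s, where 1 marks a missing value, and tabulates how often each pattern occurs.

```r
# Toy data, invented for illustration only.
toy <- data.frame(q1 = c(1, 2, NA, 4, 5, 6),
                  q2 = c(1, NA, NA, 4, 5, 6),
                  q3 = c(NA, NA, NA, 4, NA, 6))

# Logical matrix: TRUE where a value is missing.
missing.flags <- is.na(toy)

# One string per observation, e.g. "0:0:1" means only q3 is missing.
patterns <- apply(missing.flags, 1,
                  function(row) paste(as.integer(row), collapse = ":"))

# Number of observations showing each pattern (the numbers beside the grid).
pattern.counts <- table(patterns)

# Number of missing values per variable (the columns at the top of the chart).
missing.by.variable <- colSums(missing.flags)

print(pattern.counts)
print(missing.by.variable)
```

Each entry of pattern.counts corresponds to one row of the grid, and missing.by.variable corresponds to the columns drawn above it.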

Usage

To create this chart in Displayr, go to Insert > More > Missing Data > Plot of Patterns (in Q, go to Automate > Browse Online Library > Missing Data > Plot of Patterns).

In the object inspector, under Inputs > Variables select the variables you want to analyze, change any other settings, and click Calculate to run the function.

Options

Variables The variables to include in the plot.

Variable names Displays Variable Names in the output instead of labels.

Filter The data is automatically filtered using any applied filters before the chart is created.

Acknowledgements

This chart is from the aggr function of the VIM R package (Kowarik and Templ 2016).

Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16.

Code

var heading_text = 'Missing Data Patterns';
if (!!form.setObjectInspectorTitle)
    form.setObjectInspectorTitle(heading_text, heading_text);
else 
    form.setHeading(heading_text);

form.dropBox({label: "Variables",
            types: ["Variable: Numeric, Date, Money, Categorical, OrderedCategorical"],
            name: "formVariables",
            multi: true,
            height: 8,
            prompt: "Select variables to plot"})

form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Display names instead of labels"});

library(flipFormat)
library(flipTransformations)
library(VIM)

WarnIfVariablesSelectedFromMultipleDataSets()

raw.data <- ProcessQVariables(data.frame(if (length(formVariables) == 1) formVariables[[1]] else formVariables))

# VIM drops long axis labels. Override its internal prettyLabels
# function so that every label is kept.
keep.all.labels <- function(labels, at, space.vert,
                         cex.axis = NULL, rotate = NULL, xlim = NULL) {
    return(rep(TRUE, length(labels)))
}
assignInNamespace("prettyLabels", keep.all.labels, "VIM")

names.or.labels <- if (formNames) Names else Labels
names(raw.data) <- sapply(names.or.labels(formVariables), substr, 1, 9)
raw.data <- raw.data[QFilter,, drop = FALSE]

k <- ncol(raw.data)
cex.numbers <- if (k < 8) 1 else 4 / k # Font size of the pattern counts; shrinks once there are 8 or more variables.
missing.data.pattern <- suppressWarnings(aggr(raw.data,
                    col = "skyblue",
                    numbers = TRUE,
                    combined = TRUE,
                    prop = FALSE,
                    sortVars = FALSE,
                    labels = names(raw.data),
                    cex.axis = 1,
                    cex.numbers = cex.numbers,
                    gap = 3,
                    ylab = "Missing data patterns"))