Visualization - Sankey Diagram

Creates a Sankey diagram showing the flows between different values of variables. It is generally advisable to view only a small number of variables. Please see the Sankey articles on our blog for examples of how to set up data for Sankey diagrams.

Example

Object Inspector Options

The following is an explanation of the options available in the Object Inspector for this specific visualization. Refer to Visualization Options for general chart formatting options.

Inputs

DATA SOURCE

There are three options for supplying data to a Sankey diagram:

Input table A table with each row describing a set of linked categories.
Variables Categorical variables from a Data set.
Paste or type data Enter a table with each row describing a set of linked categories.

Max. categories The maximum number of categories to display for each variable.

FILTERS & WEIGHT

Weight A dropdown that takes a numeric variable to control the size of each link. This option is only available if the Variables data source is used. Otherwise, use the Last column contains weights checkbox.
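To illustrate the table layout described above, the following sketch turns rows of linked categories, with weights in the last column, into aggregated source-to-target links. The function name rowsToLinks and the sample rows are hypothetical and only illustrate the idea; this is not the code that Q runs internally.

```javascript
// Hypothetical helper: aggregate rows of linked categories into Sankey links.
// Each row lists one category per variable; the last column is the weight.
function rowsToLinks(rows) {
    var totals = {};
    rows.forEach(function (row) {
        var weight = Number(row[row.length - 1]);
        // Link each category to the category in the next column.
        for (var i = 0; i < row.length - 2; i++) {
            var key = JSON.stringify([i, row[i], row[i + 1]]);
            totals[key] = (totals[key] || 0) + weight;
        }
    });
    return Object.keys(totals).map(function (key) {
        var parts = JSON.parse(key);
        return { column: parts[0], source: parts[1], target: parts[2], value: totals[key] };
    });
}

// Made-up sample data: three variables plus a weight column.
var exampleLinks = rowsToLinks([
    ["Brand A", "Satisfied", "Will rebuy", 12],
    ["Brand A", "Satisfied", "Will not rebuy", 3],
    ["Brand B", "Dissatisfied", "Will not rebuy", 7],
    ["Brand A", "Dissatisfied", "Will not rebuy", 5]
]);
console.log(exampleLinks);
```

In this sample, the two rows that both pass from Brand A to Satisfied are combined into a single link with a weight of 15.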

Chart

APPEARANCE

Links colored by

None: all links are shown in grey.
Source: links are shown in the same color as the source node (left).
Target: links are shown in the same color as the target node (right).
First variable: similar to Source but nodes will also be the same color as nodes they are linked to on the left. If there are multiple such nodes, then the color will be taken from the node which is linked with the largest weight.
Last variable: similar to First variable, but using the color of the Target node, and looking at downstream links.
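The First variable rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the visualization: the function propagateColors is hypothetical, node names are assumed to be unique across variables, and links are assumed to be ordered from left to right.

```javascript
// Hypothetical sketch of "First variable" coloring: colors are fixed for the
// first variable's nodes, then every later node inherits the color of the
// left-hand node it is linked to with the largest weight.
function propagateColors(firstColors, links) {
    var colors = Object.assign({}, firstColors);
    var best = {}; // heaviest incoming link weight seen so far, per target
    // Assumes links are ordered left to right so sources are colored first.
    links.forEach(function (link) {
        if (!(link.source in colors)) return; // source not colored yet
        if (!(link.target in best) || link.value > best[link.target]) {
            best[link.target] = link.value;
            colors[link.target] = colors[link.source];
        }
    });
    return colors;
}

// Made-up sample data.
var exampleColors = propagateColors(
    { "Brand A": "#5C9AD3", "Brand B": "#ED7D31" },
    [
        { source: "Brand A", target: "Satisfied", value: 15 },
        { source: "Brand B", target: "Satisfied", value: 4 },
        { source: "Brand A", target: "Dissatisfied", value: 5 },
        { source: "Brand B", target: "Dissatisfied", value: 7 },
        { source: "Satisfied", target: "Will rebuy", value: 12 },
        { source: "Dissatisfied", target: "Will rebuy", value: 2 }
    ]
);
console.log(exampleColors);
```

Here Satisfied takes the color of Brand A, Dissatisfied takes the color of Brand B, and Will rebuy takes the color of Satisfied. Last variable would apply the same idea in the opposite direction, starting from the right-most nodes.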

Variables share common values Select this if the same colors should be used for each variable in the Sankey diagram. This option is only shown when Links colored by is set to Source or Target.

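Node colors / Node and link colors Customize the colors of the nodes. The control is labeled Node colors when Links colored by is set to None, and Node and link colors otherwise.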

Node width Controls width of the nodes.

Vertical spacing between nodes Controls padding between nodes of the same variable.

LABELS

Font family Font family of node labels.

Font size Font size of node labels.

Include variable in node label Prefix node label with the variable name or label.

Include counts in node label Append node label with the number of observations in each category.

Include percentages in node label Append node label with the percentages of each category.

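Variable names Displays variable names instead of variable labels in the node labels. This option is only available if the Variables data source is used and Include variable in node label is selected.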

Tidy labels Extract common prefixes from the node labels.

Label maximum length Number of characters in the node label before it is truncated. Truncated labels will be indicated with an ellipsis. No truncation is applied to numeric variables.
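The effect of Tidy labels and Label maximum length can be sketched as below. Both helpers are hypothetical stand-ins written for illustration: the visualization itself calls the R function ExtractCommonPrefix, whose exact rules may differ (for example, in how it treats word boundaries), and the three-dot ellipsis is an assumption.

```javascript
// Hypothetical helper: remove the longest prefix shared by every label.
function stripCommonPrefix(labels) {
    if (labels.length < 2) return labels.slice();
    var prefix = labels[0];
    labels.forEach(function (label) {
        // Shorten the candidate prefix until this label starts with it.
        while (label.indexOf(prefix) !== 0) prefix = prefix.slice(0, -1);
    });
    return labels.map(function (label) { return label.slice(prefix.length); });
}

// Hypothetical helper: cut a label at maxLength characters and mark the cut.
function truncateLabel(label, maxLength) {
    if (label.length <= maxLength) return label;
    return label.slice(0, maxLength) + "...";
}

var tidied = stripCommonPrefix(["Preferred brand: Coke", "Preferred brand: Pepsi"]);
console.log(tidied); // ["Coke", "Pepsi"]

var truncated = truncateLabel("Neither satisfied nor dissatisfied", 17);
console.log(truncated); // "Neither satisfied..."
```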

HOVERTEXT

Show percentages instead of counts Show percentages instead of counts in the hovertext (tooltips) for nodes and links.

Acknowledgements

Uses a variant of the networkD3 htmlwidget, created by Kent Russell.

Technical details

  • An error will occur if more than 20 variables are selected. It is generally advisable to show a relatively small number (e.g., 4 or 5).
  • Although the Sankey diagram in this example shows flows between different values of variables, Sankey diagrams can be used to show many other types of flows (e.g., migration patterns, regression trees, and energy flows; see https://christophergandrud.github.io/networkD3/).
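As a rough illustration of the limits noted above, the sketch below checks a variable count before a diagram is created. The function checkVariableCount and its messages are hypothetical; the bounds of 2 and 20 come from the min_inputs and max_inputs settings on the Variables input in the code below, and the threshold of 5 reflects the advice to show only 4 or 5 variables.

```javascript
// Hypothetical pre-check mirroring the limits on the Variables input
// (min_inputs: 2, max_inputs: 20) and the advice to keep the count small.
function checkVariableCount(n) {
    if (n < 2) return "Select at least 2 variables.";
    if (n > 20) return "Select at most 20 variables.";
    if (n > 5) return "The diagram may be hard to read; consider 4 or 5 variables.";
    return null; // no problem found
}

console.log(checkVariableCount(4));
console.log(checkVariableCount(21));
```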

Code

form.setHeading("Sankey Diagram")
var allow_control_groups = Q.fileFormatVersion() > 10.9; // Group controls for Displayr and later versions of Q
var displayr = Q.isOnTheWeb();
var template_prompt = "Create a template to control settings for all visualizations in the document by " + (displayr ? "selecting 'Insert > Utilities > Visualization > Create Template'" : "inserting 'Visualization > Create Template'");
function isEmpty(x) { return (x == undefined || x.getValue() == null && (x.getValues() == null || x.getValues().length == 0)) }
function isBlankSheet(x) { return (x.getValue() == null || x.getValue().length == 0) }
var controls = [];

if (allow_control_groups)
    form.group("DATA SOURCE");
var tableInput = form.dropBox({label: "Input table", types:["table", "RItem"], name: "formTable", multi : false, required: false});
var varInput = form.dropBox({label: "Variables", name: "formVariables", multi: true, min_inputs: 2, max_inputs: 20, required: false, types:["Variable: Numeric, Date, Money, Categorical, OrderedCategorical, Text"], prompt: "Choose variables from the same data set"});
var pastedInput = form.dataEntry({name: "formEnteredData", label: "Paste or type table", prompt: "Opens a spreadsheet into which you can paste data."})

if (!allow_control_groups || !isEmpty(tableInput) || (isBlankSheet(pastedInput) && isEmpty(varInput)))
    controls.push(tableInput);
if (!allow_control_groups || !isEmpty(varInput) || (isEmpty(tableInput) && isBlankSheet(pastedInput)))
    controls.push(varInput);
if (!allow_control_groups || !isBlankSheet(pastedInput) || (isEmpty(tableInput) && isEmpty(varInput)))
    controls.push(pastedInput);

if (!isEmpty(tableInput) || !isBlankSheet(pastedInput))
{
    var qContainsWgt = form.checkBox({label: "Last column contains weights", name: "formContainsWeights", default_value: false, prompt: "Use the last column of the input table as the weights variable"});
    controls.push(qContainsWgt);
}
var maxCat = form.numericUpDown({label: "Maximum number of categories", name: "formMaxCategories", increment: 1, minimum: 2, default_value: 10, maximum: 100, prompt: "Variables with more categories than this will have a number of categories merged. The nodes are merged on the basis of similar linkage patterns"});
controls.push(maxCat);

if (allow_control_groups)
    form.page("Chart");
if (allow_control_groups)
    form.group("APPEARANCE");
var qTemplate = form.dropBox({name: "formTemplate", label: "Use template", types: ["RItem:AppearanceTemplate"], required: false, prompt: template_prompt});
controls.push(qTemplate);
var use_default_fonts = !isEmpty(qTemplate); 

var linkCol = form.comboBox({label: "Links colored by", name: "formLinkColors", alternatives: ['Target', 'Source', 'None', 'First variable', 'Last variable'], default_value: 'Source', prompt: "Choose color scheme for nodes and links"});
controls.push(linkCol);
if (linkCol.getValue() == "Target" || linkCol.getValue() == "Source")
{
    var qShared = form.checkBox({label: "Variables share common values", name: "formSharedValues", default_value: false});
    controls.push(qShared);
}

var colorLabel = "Node and link colors";
if (linkCol.getValue() == "None")
    colorLabel = "Node colors";
palettes = ["Default or template settings", "Legacy colors", "Default colors", "Colorblind safe colors", "Rainbow", "Light pastels", "Strong colors", "Spectral colors (red, yellow, blue)", "Spectral colors (blue, yellow, red)", "Reds, dark to light", "Reds, light to dark", "Greens, dark to light", "Greens, light to dark", "Blues, dark to light", "Blues, light to dark", "Greys, dark to light", "Greys, light to dark", "Heat colors (yellow, red)", "Terrain colors (green, beige, grey)", "Custom color", "Custom gradient", "Custom palette"];
gradual_palettes = ["Blues, light to dark", "Blues, dark to light", "Greys, light to dark", "Greys, dark to light", "Reds, light to dark", "Reds, dark to light", "Greens, light to dark", "Greens, dark to light", "Custom gradient", "Custom palette"];
gradual_palettes_red = ["Reds, light to dark", "Reds, dark to light", "Greys, light to dark", "Greys, dark to light", "Blues, light to dark", "Blues, dark to light", "Greens, light to dark", "Greens, dark to light", "Custom gradient", "Custom palette"];
qColor = form.comboBox({name: "formPalette", label: colorLabel, alternatives: palettes, default_value: palettes[0], required: true});
controls.push(qColor);
if (qColor.getValue() == "Custom color")
{
    var qCustCol = form.colorPicker({name: "formCustomColor", label: "Custom color", default_value: "#5C9AD3"});
    controls.push(qCustCol);
}
if (qColor.getValue() == "Custom gradient")
{
    var qCustGrad1 = form.colorPicker({name: "formCustomGradientStart", label: "Gradient start", default_value: "#5C9AD3"});
    var qCustGrad2 = form.colorPicker({name: "formCustomGradientEnd", label: "Gradient end", default_value: "#ED7D31"});
    controls.push(qCustGrad1);
    controls.push(qCustGrad2);
}
if (qColor.getValue() == "Custom palette")
{
    var qCustPalette = form.textBox({name: "formCustomPalette", label: "Custom palette", default_value: "#5C9AD3, #ED7D31", prompt: "Enter color as a string. Multiple values should be separated by commas."});
    controls.push(qCustPalette);
}
var qNodeWidth = form.numericUpDown({label: "Node width", name: "formNodeWidth", minimum: 0, maximum: 100, default_value: 30});
controls.push(qNodeWidth);
var qNodePad = form.numericUpDown({label: "Vertical spacing between nodes", name: "formNodePad", minimum: 0, maximum: 100, default_value: 10});
controls.push(qNodePad);
var qNodeRight = form.checkBox({name: "formNodeRight", label: "Place right-most nodes at the edge", default_value: false});
controls.push(qNodeRight);

if (allow_control_groups)
    form.group("LABELS");
font_families = ["Arial", "Arial Black", "Comic Sans MS",  "Courier New", "Georgia", "Impact", 
                 "Open Sans", "Tahoma", "Times New Roman", "Trebuchet MS", "Verdana"];

var qFontDefault = form.checkBox({name: "formFontDefault", label: "Use default or template font settings (values axis title)", default_value: use_default_fonts, prompt: template_prompt});
controls.push(qFontDefault);
if (!qFontDefault.getValue())
{    
    var qFontFamily = form.comboBox({label: "Font family", name: "formFontFamily", alternatives: font_families, default_value: "Arial"});
    var qFontSize = form.numericUpDown({label: "Font size", name: "formFontSize", default_value: 9, minimum: 4});
    var qFontUnits = form.comboBox({name: "formFontUnit", label: "Font units", alternatives: ["pt", "px"], default_value: "pt", prompt: "Are font sizes specified in terms of points or pixels?"});
    controls.push(qFontFamily);
    controls.push(qFontSize);
    controls.push(qFontUnits);
}

var qVarShow = form.checkBox({label: "Include variable in node labels", name: "formShowVar", default_value: true, prompt: "Node labels are prefixed with the variable name or label"});
controls.push(qVarShow);
var qCountsShow = form.checkBox({label: "Include counts in node labels", name: "formShowCounts", default_value: false, prompt: "Append node labels with the number of observations in each category"});
controls.push(qCountsShow);
var qPercentagesShow = form.checkBox({label: "Include percentages in node labels", name: "formShowPercentages", default_value: false, prompt: "Append node labels with the percentages of each category"});
controls.push(qPercentagesShow);

if (!isEmpty(varInput) && qVarShow.getValue())
{
    var qVarNames = form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Show variable names instead of variable labels"});
    controls.push(qVarNames);
}
var qTidyLabels = form.checkBox({label: "Tidy labels", name: "formTidyLabels", default_value: true, prompt: "Extract common prefixes to simplify labels"});
controls.push(qTidyLabels);
var qLabelMaxLen = form.numericUpDown({label: "Label maximum length", name: "formLabelMaxLen", default_value:100, minimum:10, maximum: 500, increment: 5, prompt: "Maximum number of characters before labels are truncated. Truncated labels will be indicated with an ellipsis"});
controls.push(qLabelMaxLen);

if (allow_control_groups)
    form.group("HOVERTEXT");
var qHoverPercentages = form.checkBox({label: "Show percentages instead of counts", name: "formHoverPercentages", default_value: false});
controls.push(qHoverPercentages);
form.setInputControls(controls);
library(flipPlots)
library(flipData)
library(flipFormat)
library(flipChartBasics)

weights <- NULL
dat <- NULL
dat <- get0("formTable")
if (is.null(dat))
{
    if (exists("formEnteredData") && sum(dim(formEnteredData)) > 0)
        dat <- flipTransformations::ParseEnteredData(formEnteredData)
}
if (is.null(dat))
{
    dat <- as.data.frame(get0("formVariables"))
    if (is.null(dat) || sum(dim(dat)) == 0)
        stop("No data has been provided.")
    weights <- QPopulationWeight
    names(dat) <- if (!isTRUE(get0("formNames"))) Labels(formVariables) else Names(formVariables)
}

if (isTRUE(get0("formContainsWeights")))
{
    weights <- dat[,ncol(dat)]
    dat <- dat[,-ncol(dat)]
}
if (formTidyLabels)
    names(dat) <- ExtractCommonPrefix(names(dat))$shortened.labels
dat <- TidyRawData(dat, weights = weights, subset = QFilter, missing = "Use partial data", error.if.insufficient.obs = FALSE)

sankey.dat <- SankeyDiagram(dat, max.categories = formMaxCategories, link.color = formLinkColors,
                        subset = TRUE, weights = attr(dat, "weights"),
                        variables.share.values = get0("formSharedValues", ifnotfound=FALSE),
                        hovertext.show.percentages = formHoverPercentages,
                        label.show.counts = formShowCounts, label.show.percentages = formShowPercentages,
                        label.show.varname = formShowVar, label.max.length = formLabelMaxLen,
                        output.data.only = TRUE)
num.colors <- length(unique(sankey.dat$nodes$group))
u.ind <- which(!duplicated(sankey.dat$nodes$group, fromLast = formLinkColors %in% c("Target", "Last variable")))
u.names <- sankey.dat$nodes$name[u.ind]

# Different from 'Named colors' in other charts because nodes can be merged
# and question names/counts/percentages may be included in the node name,
# so we use partial matching.
template <- get0("formTemplate")
if (!is.null(template))
{
    bcol <- rep(NA, num.colors)
    names(bcol) <- u.names
    col.list <- if (!is.null(template$brand.colors)) template$brand.colors else template$colors
    col.names <- names(col.list)
    col.ord <- order(nchar(col.names))
    for (i in col.ord)
    {
        ind <- grep(paste0("\\Q", col.names[i], "\\E"), u.names)
        if (length(ind) > 0)
           bcol[ind] <- col.list[i]
    }
    template$brand.colors <- bcol
}

colors <- NULL
if (formPalette != "Legacy colors")
    colors <- ChartColors(num.colors, given.colors = GetPalette(formPalette, template),
                custom.color = formCustomColor,
                custom.gradient.start = formCustomGradientStart,
                custom.gradient.end = formCustomGradientEnd,
                custom.palette = formCustomPalette, silent = TRUE) 


if (is.null(template))
    template <- list(global.font = list(family = "Arial", color = "#2C2C2C", 
        size = 7.5, units = "pt"), fonts = list(`Values axis title` = list(
        family = "Arial", color = "#2C2C2C", size = 9)))                           


                                 
sankey <- SankeyDiagram(links.and.nodes = sankey.dat, colors = colors, sinks.right = formNodeRight,
                          font.family = get0("formFontFamily", ifnotfound = template$fonts$`Values axis title`$family), 
                          font.size = get0("formFontSize", ifnotfound = template$fonts$`Values axis title`$size),
                          font.unit = get0("formFontUnit", ifnotfound = template$global.font$units), 
                          node.width = formNodeWidth, node.padding = formNodePad)