Segments - Segment Comparison Table

From Q
Jump to: navigation, search


Example

Options

Use one of Variable containing segment membership (default) or K-Means. The second option allows users to construct a K-Means model and profile the predicted clusters in one step.

Variable A nominal or ordinal variable. The categories of the variable will make up the columns of this table.

Profiling variables One or more variable or variable sets. These can be of any type except text.

Show index values Show column percentages and averages as a proportion of the total for the row. This is the same as shown in Q tables.

Shade Provides the option to color each cell based on the standardized value in the cell. The color can be applied to

None No shading is applied
Cell colors Shading is applied to the cell fill color
Font colors Shading is applied to the text in the cell
Arrows Shading is applied to up/down arrows next to the cell value
Fonts and arrows Shading is applied to both the arrow and the cell text
Boxes Shading is applied to a box drawn around the cell text. The width and corner roundness of the box can be adjusted.
Bars Shading is applied to the bars, while the length of the bars reflects the difference between the cell value and the row mean.

Thresholds Values in each cell are standardized and compared against these thresholds to determine whether a cell is shaded by the color for Very small values, Small values, Large values, or Very large values. Numeric values are standardized by dividing by 2 * standard deviation as suggested by Gelman (2007). Column percentages are standardized by dividing by row totals (i.e. they are the index values).

Only shade significant results By default, cell values which are very large or small but not statistically significant will not be shaded. This may occur because there are less observations for a particular category.

Color cell text conditional on significance Color the text in the cells based on whether the average or column percentage is significantly different from the value for observations not in that segment (i.e. column).

Non-significant font color The text in cells which are not significant will be shown in this color instead of the font color.

Use non-parametric test For numeric variables, use the ranks instead of the numeric values to conduct the t-test. This is the same as using a non-parametric test in the statistical assumptions for a Q Table.

False discovery rate correction Adjust p-values to account for the multiple tests conducted in the table.

Confidence level Threshold used in the test to determine significance.

Column labels Optional comma-separated list to override the column labels (or segment names).

Rows to hide Specify rows to hide as a comma separated list. Row names which are not used in the table will be ignored. Use double quotes to escape row names containing commas.

Decimals shown for percentages Number of decimals shown for categoric variables.

Decimals shown for numeric data Number of decimals shown for numeric variables.

Font family The font family for all text in the table

Font size The font size of all text in the table.

Font units The units in which the font size is specified. This can be either "px" or "pt".

Row height The height of a row in the table (with no word wrap). This value will be specified in terms of font units. By default it is 5 + font size.

Column widths A column separated list of values (including units, e.g. "px", "pt", "%") specifying the widths of the column. Each value will be a single column starting from the left. The remaining columns with no specified widths will be equally sized to fill the remaining space.

Column header fill The color of the column header cells.

Row header fill The color of the row header cells containing the names of the profiling variables.

Summary rows fill The color of the cells in the first two rows showing the breakdown of the segmentation variable.

Cell fill The color of the cells (excluding row/column headers) and cells colored by conditional formatting.

Border color The color of the border.

Border width The width of the border in pixels.

Code

var defaultSegType = "Variable containing segment membership";
var segmentType = form.comboBox({name: "formSegmentType", label: "Use", alternatives: ["K-Means", "Variable containing segment membership"], default_value: defaultSegType}).getValue();

if (segmentType == "K-Means")
{
    pageTitle = 'K-Means Cluster Analysis';
    form.dropBox({label: "Variables", types: ["Q: pickone, pickonemulti, number, numbermulti, numbergrid, pickany, pickanycompact, pickanygrid", "Variable: Numeric, Date, Money, Categorical, OrderedCategorical"],  name: "formVariables", multi: true, min_inputs: 1, height: 8, prompt: "Select at least two Variables"}) 
    form.numericUpDown({label: "Number of clusters", increment: 1, minimum: 2, maximum: 1000, default_value: 2, name: "formNumberClusters", prompt: "Specify the number of clusters to identify"})
    form.comboBox({label: "Missing data",  alternatives: ["Error if missing data", "Exclude cases with missing data", "Use partial data", "Imputation (replace missing values with estimates)"],  name: "formMissing", default_value: "Use partial data", prompt: "Options for handling cases with missing data"})
    form.comboBox({label: "Algorithm", alternatives: ["Batch", "Hartigan-Wong", "Forgy", "Lloyd", "MacQueen"], name: "formAlgorithm", default_value: "Batch", prompt: "Specify the k-means clustering algorithm to use"})
    var outputType = form.comboBox({label: "Output", alternatives: ["Means", "Means table", "Segment profiling table"], name: "formOutput", default_value: "Means", prompt: "Select the output type"}).getValue();
    form.textBox({name: "formColLabels", label: "Cluster labels", default_value: "Cluster 1, Cluster 2, Cluster 3", required: false, prompt: "Specify names for clusters as a comma separated list. Leave blank to use default names. If labels are duplicated, then the corresponding clusters will be merged."});
    form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Display names instead of labels"})
    form.checkBox({label: "Categorical as binary", name: "formBinary", default_value: false, prompt: "Code categorical variables as dummy variables"})

}
else
{
    pageTitle = "Segment Comparison Table";
    form.dropBox({label: "Variable",
            types:["Variable: Categorical, OrderedCategorical", "Question: PickOne"],
            prompt: "Categorical grouping variable used to predict the outcome variables",
            name: "formSegmentation"})
}

form.setHeading(pageTitle);
var profVar = form.dropBox({label: "Profiling variables", name: "formProfVar",
            types:["Questions:!Text", "Variables:!Text"], multi: true, 
            required: segmentType != "K-Means" || outputType == "Segment profiling table"});

if (segmentType == "Variable containing segment membership" || 
    outputType == "Segment profiling table")
{
    form.group({label: "Display", expanded: true});
    form.checkBox({name: "formIndexValues", label: "Show index values", default_value: false,
        prompt: "Show column percentages as a ratio to the row total"});
    //var condFill = form.checkBox({name: "formCondFill", label: "Color cell fill conditional on cell values", default_value: true,
    //    prompt: "Values which are higher than the row mean are shown in blue, and values lower than the row mean are shown in red. Numeric values are scaled by 2 * standard deviation."}); 

    var condShade = form.comboBox({name: "formCondShadeType", label: "Shade", alternatives: ["None", "Cell colors", "Font colors", "Boxes", "Arrows", "Fonts and arrows", "Bars"], default_value: "Cell colors", prompt: "Select whether cells or elements inside the cell should be shaded to highlight the magnitude of the cell value"});
    if (condShade.getValue() == "Boxes")
    {
        form.numericUpDown({name: "formCondBoxWidth", label: "Box width", default_value: 2});
        form.numericUpDown({name: "formCondBoxRadius", label: "Box corner roundness", default_value: 0, maximum: 50, prompt: "Increase value to get rounder corner; setting to 50 gives ovals"});
    }

    if (condShade.getValue() != "None")
    {
        form.colorPicker({name: "formCondShade1", label: "Very small values", default_value: "#E99598"});
        form.numericUpDown({name: "formCondThres1", label: "Threshold for very small values", default_value: -0.2, minimum: -10, increment: 0.01, prompt: "Values are considered very small if the standardized value is smaller than the threshold. Numeric variables are standardized by centering by the population mean and scaling by the standard deviation; categorical variables are standardized by taking the ratio to the population mean"});


        form.colorPicker({name: "formCondShade2", label: "Small values", default_value: "#E5C8C4"});
        form.numericUpDown({name: "formCondThres2", label: "Threshold for small values", default_value: -0.1, minimum: -10, increment: 0.01, prompt: "Values are considered very if the standardized value is smaller than the threshold. Numeric variables are standardized by centering by the population mean and scaling by the standard deviation; categorical variables are standardized by taking the ratio to the population mean"});

        form.colorPicker({name: "formCondShade3", label: "Large values", default_value: "#A9C0DA"});
        form.numericUpDown({name: "formCondThres3", label: "Threshold for large values", default_value: 0.1, minimum: -10, increment: 0.01, prompt: "Values are considered large if the standardized value is larger than the threshold. Numeric variables are standardized by centering by the population mean and scaling by the standard deviation; categorical variables are standardized by taking the ratio to the population mean"});
        
        form.colorPicker({name: "formCondShade4", label: "Very large values", default_value: "#82A5CB"});
        form.numericUpDown({name: "formCondThres4", label: "Threshold for very large values", default_value: 0.2, minimum: -10, increment: 0.01, prompt: "Values are considered very large if the standardized value is larger than the threshold. Numeric variables are standardized by centering by the population mean and scaling by the standard deviation; categorical variables are standardized by taking the ratio to the population mean"});
    }

    var condText = form.checkBox({name: "formCondText", label: "Color cell text conditional on significance testing", default_value: true,
        prompt: "Grey out values which are not significantly different from the row mean"});
    if (condText.getValue())
    {
        form.colorPicker({name: "formCondTextColor", label: "Non-significant font color", default_value: "#999999"});
        form.checkBox({name: "formNonparametric", label: "Use non-parametric test", default_value: false, 
                      prompt: "Convert numeric variables to ranks before performing significance tests"});   
        form.checkBox({name: "formFDR", label: "False discovery rate correction", default_value: false});
        form.numericUpDown({name: "formConfidenceLevel", label: "Confidence level", default_value: 0.95, minimum: 0.0, maximum: 1.0, increment: 0.01});
    }
    if (condShade.getValue() != "None" && condText.getValue())
        form.checkBox({name: "formCondShadeSigOnly", label: "Only shade significant results", default_value: true});

    if (segmentType != "K-Means")
        form.textBox({name: "formColLabels", label: "Column labels", default_value: "", required: false, prompt: "Specify columns names as a comma separated list. Leave blank to use default names"});
    form.textBox({name: "formRowsHide", label: "Rows to hide", default_value: "NET, Total, SUM", prompt: "Specify rows to hide as a comma-separated list. Use double-quotes to escape commas", required: false})

    form.page("Format");
    form.group("Number formatting");
    form.numericUpDown({name: "formDecimalsPercentage", label: "Decimals shown for percentages", default_value:0});
    form.numericUpDown({name: "formDecimalsNumeric", label: "Decimals shown for numeric data", default_value: 1});

    form.group("Font");
    var fontFamilies = font_families = ["Arial", "Arial Black", "Century Gothic", "Comic Sans MS",
                     "Courier New", "Georgia", "Impact", "Open Sans", "Tahoma", "Times New Roman", "Trebuchet MS", "Verdana"];
    form.comboBox({name: "formFontFamily", label: "Font family", default_value: "Arial", alternatives: fontFamilies});
    form.colorPicker({name: "formFontColor", label: "Font color", default_value: "#000000"});
    var fontSize = form.numericUpDown({name: "formFontSize", label: "Font size", default_value: 8});
    form.comboBox({name: "formFontUnits", label: "Font units", alternatives: ["pt", "px"], default_value: "pt"});


    form.group("Spacing");
    form.numericUpDown({name: "formRowHeight", label: "Row height", default_value: fontSize.getValue() + 5});
    form.textBox({name: "formColumnWidths", label: "Column widths", required: false, default_value: "100px, 100px",
         prompt: "Comma separated values, e.g. '40px, 25%' or leave blank for equal widths"});

    form.group("Borders and fill");
    form.colorPicker({name: "formColHeadFill", label: "Column header fill", default_value: "#AEB7BA"});
    form.colorPicker({name: "formRowHeadFill", label: "Row header fill", default_value: "#F1F3F4"});
    form.colorPicker({name: "formSummaryFill", label: "Summary rows fill", default_value: "#FFFFFF"});
    form.colorPicker({name: "formCellFill", label: "Cell fill", default_value: "#FFFFFF"});
    form.colorPicker({name: "formBorderColor", label: "Border color", default_value: "#FFFFFF"});
    form.numericUpDown({name: "formBorderWidth", label: "Border width", default_value: 1});
}
library(flipAnalysisOfVariance)
library(flipCluster)

if (formSegmentType == "K-Means")
    kmeans <- KMeans(formVariables, 
        centers = formNumberClusters,
        algorithm = formAlgorithm,
        output = formOutput,
        subset = QFilter,
        weights = QPopulationWeight,
        missing = formMissing,
        show.labels = !formNames,
        binary = formBinary,
        show.index.values = get0("formIndexValues", ifnotfound = FALSE),
        centers.names = get0("formColLabels"),
        cond.shade = get0("formCondShadeType", ifnotfound = "None"),
        cond.box.radius = get0("formCondBoxRadius"),
        cond.box.width = get0("formCondBoxWidth"),
        cond.shade.colors = if (exists("formCondShadeType")) c(formCondShade1, formCondShade2, formCondShade3, formCondShade4, formCondShade4),
        cond.shade.cutoffs = if (exists("formCondShadeType")) c(formCondThres1, formCondThres2, formCondThres3, formCondThres4),
        cond.shade.sig.only = get0("formCondShadeSigOnly", ifnotfound = FALSE),

        format.percentage.decimals = get0("formDecimalsPercentage", ifnotfound = 0),
        format.numeric.decimals = get0("formDecimalsNumeric", ifnotfound = 0),    
        font.color.nonsignificant = get0("formCondTextColor", ifnotfound = ""),
        font.color.confidence = get0("formConfidenceLevel", ifnotfound = 0.95),
        font.color.FDRcorrection = get0("formFDR", ifnotfound = FALSE),
        font.color.nonparametric = get0("formNonparametric", ifnotfound = FALSE),
        row.names.to.remove = get0("formRowsHide", ifnotfound = ""),
        col.widths = get0("formColumnWidths", ifnotfound = ""),
        row.height = if (exists("formRowHeight")) paste0(formRowHeight, formFontUnits) else NULL, # set NULL for autofit
        global.font.family = get0("formFontFamily", ifnotfound = ""),
        font.size = get0("formFontSize", ifnotfound = 0),
        font.color = get0("formFontColor", ifnotfound = ""),
        font.unit = get0("formFontUnits", ifnotfound = ""),
        col.header.fill = get0("formColHeadFill", ifnotfound = ""),
        row.header.fill = get0("formRowHeadFill", ifnotfound = ""),
        summary.cell.fill = get0("formSummaryFill", ifnotfound = ""),
        cell.fill = get0("formCellFill", ifnotfound = ""),
        border.color = get0("formBorderColor"),
        border.width = get0("formBorderWidth", ifnotfound = 0),
        profile.var = formProfVar)

if (formSegmentType != "K-Means")
{
    segment.table <- SegmentComparisonTable(formProfVar, 
                       group = formSegmentation,
                       show.index.values = formIndexValues,
                       cond.shade = get0("formCondShadeType"),
                       cond.box.radius = get0("formCondBoxRadius"),
                       cond.box.width = get0("formCondBoxWidth"),
                       cond.shade.colors = if (exists("formCondShadeType")) c(formCondShade1, formCondShade2, formCondShade3, formCondShade4),
                       cond.shade.cutoffs = if (exists("formCondShadeType")) c(formCondThres1, formCondThres2, formCondThres3, formCondThres4),
                       cond.shade.sig.only = get0("formCondShadeSigOnly", ifnotfound = FALSE),
                       col.header.labels = get0("formColLabels"),
                       format.percentage.decimals = formDecimalsPercentage,
                       format.numeric.decimals = formDecimalsNumeric,    
                       font.color.set.if.nonsignificant = formCondText,
                       font.color.nonsignificant = formCondTextColor,
                       font.color.confidence = formConfidenceLevel,
                       font.color.FDRcorrection = formFDR,
                       font.color.nonparametric = formNonparametric,
                       row.names.to.remove = formRowsHide,
                       col.widths = formColumnWidths,
                       row.height = paste0(formRowHeight, formFontUnits), # set NULL for autofit
                       global.font.family = formFontFamily,
                       font.size = formFontSize,
                       font.color = formFontColor,
                       font.unit = formFontUnits,
                       col.header.fill = formColHeadFill,
                       row.header.fill = formRowHeadFill,
                       summary.cell.fill = formSummaryFill,
                       cell.fill = formCellFill,
                       border.color = formBorderColor,
                       border.width = formBorderWidth,
                       subset = QFilter,
                       weights = QPopulationWeight)
} else
    kmeans