Dimension Reduction - Correspondence Analysis of a Table

Correspondence analysis represents a table as a scatterplot, where the row and column names are shown on the chart.

Our blog contains several posts about correspondence analysis. Included are this introduction, this piece about interpretation of the output, and this more technical description.

How to Create

  1. Add the object by selecting from the menu either Anything > Advanced Analysis > Dimension Reduction > Correspondence Analysis of a Table or Create > Dimension Reduction > Correspondence Analysis of a Table.
  2. In Inputs > Data source, specify a data source type for your analysis.

Example

Example output Scatter: Correspondence analysis relating cola brands to their personalities. Where points are close together, you can move the labels around on the visualization. For guidance on how to interpret a correspondence map, see this post: Interpret Correspondence Analysis Plots


Example output Text: The Text output shows some of the underlying detail from the model. The principal inertias (eigenvalues) are the squared canonical correlations, that is, the squared correlations between the row and column variable sets within each dimension.
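
As a rough illustration of where the principal inertias come from, the following R sketch computes them for a small made-up table using only base R. The table values and variable names are assumptions for illustration. This is not the code that produces the Text output, which relies on the ca package.

```r
# Hypothetical brand-by-attribute counts (illustrative data only)
N <- matrix(c(20, 10,  5,
              10, 25, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Brand A", "Brand B", "Brand C"),
                            c("Fun", "Reliable", "Premium")))

P <- N / sum(N)                         # correspondence matrix (cell proportions)
row.mass <- rowSums(P)
col.mass <- colSums(P)
expected <- outer(row.mass, col.mass)   # proportions expected under independence

# Standardized residuals; their singular values drive the analysis
S <- (P - expected) / sqrt(expected)
n.dim <- min(dim(N)) - 1                # number of non-trivial dimensions
singular.values <- svd(S)$d[seq_len(n.dim)]

# Principal inertias (eigenvalues) are the squared singular values
principal.inertias <- singular.values^2
total.inertia <- sum(principal.inertias)

# Total inertia equals Pearson's chi-square statistic divided by the table total
chi.sq <- unname(chisq.test(N, correct = FALSE)$statistic)
print(round(principal.inertias / total.inertia, 3))  # share of inertia per dimension
```

The check against the chi-square statistic is a convenient way to confirm that the decomposition has been set up correctly.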

Input Example: A crosstab or table with something in the rows and columns.


Options

Input table(s) The name of the table(s) containing data to be analyzed. Each table should contain only a single statistic (e.g., Total %). If a table contains more than one statistic, the statistic that is shown first will be used in the analysis. For example, if a table shows both Total % and Column %, and Column % appears first, then Column % will be used (whereas Total % is the more orthodox choice). If multiple tables are selected, the correspondence analysis of each table will be shown on the same plot. The row and column names of each table selected must be identical.

Note that if the supplied table is a Q Table containing "Correlation" as the first statistic, then 1 is automatically added to each value in the table. This ensures that no value is negative, so that the assumptions for Correspondence Analysis are met.
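
The effect of this shift can be sketched in a few lines of R. The correlation values below are invented for illustration.

```r
# Hypothetical correlation table (illustrative values only)
cors <- matrix(c( 0.6, -0.2,
                 -0.4,  0.3),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Brand A", "Brand B"), c("Fun", "Reliable")))

# Correlations lie in [-1, 1], so adding 1 maps them into [0, 2]
shifted <- cors + 1
print(shifted)
```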

Paste or type table As an alternative to Input table(s), data can be manually entered or pasted. If this option is used, only a single table can be entered.

Trend lines When multiple tables are used as input, there is an option to show trend lines between corresponding points across different tables.

Switch rows and columns Whether or not to transpose the input data source.

Output:

  • Scatterplot
  • Bubble Chart
      Bubble sizes A numeric vector of sizes for the bubbles, with names equal to the row labels.
      Bubble colors A numeric vector of values, with names equal to the row labels. A divergent color scale will be constructed using the range of the values as end points. The center of the color scale can be either the median of the values or zero. Bubbles will be colored according to the corresponding value. The colors at the ends of the color scale can be specified in controls under the Chart tab.
      Bubble legend title Title of the legend showing bubble sizes.
  • Moonplot
  • Text Produces output in standard coordinates.
  • Input Table

Normalization The method used to normalize the coordinates of the correspondence analysis chart. This blog post explains the differences between the normalization options. Options are:

Principal (default option) charts the principal coordinates (i.e., the standard coordinates multiplied by the singular values) for both rows and columns.
Row principal charts rows in principal coordinates and columns in standard coordinates.
Row principal (scaled) is as Row principal except columns are scaled by the first singular value so as to appear on a similar scale to rows.
Column principal charts columns in principal coordinates and rows in standard coordinates.
Column principal (scaled) is as Column principal except rows are scaled by the first singular value so as to appear on a similar scale to columns.
Symmetrical (½) charts the standard coordinates multiplied by the square roots of the singular values for both rows and columns.
None charts the standard coordinates for both rows and columns.
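
The options above can be sketched in base R as follows. This is a minimal illustration built from the descriptions in the list, using a made-up table and hypothetical helper names (by.dim, normalized). It is not the implementation used by the feature itself, and "Symmetrical" here stands for the Symmetrical (½) option.

```r
# Hypothetical brand-by-attribute counts (illustrative data only)
N <- matrix(c(20, 10,  5,
              10, 25, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Brand A", "Brand B", "Brand C"),
                            c("Fun", "Reliable", "Premium")))
P <- N / sum(N)
row.mass <- rowSums(P)
col.mass <- colSums(P)
expected <- outer(row.mass, col.mass)
dec <- svd((P - expected) / sqrt(expected))

k <- 1:2                                   # the two plotted dimensions
sv <- dec$d[k]                             # singular values
row.std <- dec$u[, k] / sqrt(row.mass)     # standard coordinates
col.std <- dec$v[, k] / sqrt(col.mass)
rownames(row.std) <- rownames(N)
rownames(col.std) <- colnames(N)

# Multiply each dimension (column) of coords by the matching element of s
by.dim <- function(coords, s) sweep(coords, 2, s, "*")

normalized <- list(
    "Principal"                 = list(rows = by.dim(row.std, sv), columns = by.dim(col.std, sv)),
    "Row principal"             = list(rows = by.dim(row.std, sv), columns = col.std),
    "Row principal (scaled)"    = list(rows = by.dim(row.std, sv), columns = col.std * sv[1]),
    "Column principal"          = list(rows = row.std,             columns = by.dim(col.std, sv)),
    "Column principal (scaled)" = list(rows = row.std * sv[1],     columns = by.dim(col.std, sv)),
    "Symmetrical"               = list(rows = by.dim(row.std, sqrt(sv)),
                                       columns = by.dim(col.std, sqrt(sv))),
    "None"                      = list(rows = row.std, columns = col.std)
)
print(round(normalized[["Row principal"]]$rows, 3))
```

In standard coordinates each dimension has a mass-weighted sum of squares of 1, whereas in principal coordinates it equals the principal inertia of that dimension.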

Focus The label of a row or column on which to focus the output. The axes will be rotated so that the label lies along the first dimension. This means that the entirety of the variance due to the label is visible in a 2-dimensional plot. This is useful if the analysis is intended to explain the relationship between the focus label and all other labels, rather than the general relationship between all labels. Note that the first dimension will no longer explain the maximum amount of variance. The second dimension explains the maximum amount of remaining variance whilst remaining perpendicular to the first dimension.

Supplementary A comma delimited list of rows and/or columns which are not used to fit the low-dimensional space, but are plotted in the space. This article describes the uses of supplementary points.
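
One standard way to place a supplementary row is to project its profile onto the column standard coordinates, which gives its principal coordinates. The R sketch below illustrates this on a made-up table. The data, the extra row and the helper project.row are all hypothetical, and it is an assumption that the feature places supplementary points in this way.

```r
# Hypothetical brand-by-attribute counts used to fit the space
N <- matrix(c(20, 10,  5,
              10, 25, 10,
               5, 10, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Brand A", "Brand B", "Brand C"),
                            c("Fun", "Reliable", "Premium")))
P <- N / sum(N)
row.mass <- rowSums(P)
col.mass <- colSums(P)
expected <- outer(row.mass, col.mass)
dec <- svd((P - expected) / sqrt(expected))

k <- 1:2
col.std <- dec$v[, k] / sqrt(col.mass)             # column standard coordinates
row.principal <- sweep(dec$u[, k] / sqrt(row.mass), 2, dec$d[k], "*")
rownames(row.principal) <- rownames(N)             # principal coordinates of fitted rows

# Project a row of counts into the fitted space via its profile
project.row <- function(counts, col.std) {
    profile <- counts / sum(counts)
    drop(profile %*% col.std)
}

supplementary <- c(Fun = 8, Reliable = 12, Premium = 4)   # hypothetical extra brand
supp.coords <- project.row(supplementary, col.std)
print(round(supp.coords, 3))
```

Applying the same projection to one of the fitted rows reproduces its principal coordinates, so supplementary points sit on the same scale as the fitted rows without influencing the fit.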

Horizontal dimension, Vertical dimension The dimensions to plot on the horizontal and vertical axes respectively. Since dimensions are output in order of decreasing variance, the first and second dimensions are usually of most interest.

Flip horizontally, Flip vertically Whether to reverse (i.e. invert the sign of) the output coordinates for the specified dimension. This may allow better visualization, especially when comparing maps that are similar apart from reflections.
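
Flipping only changes the sign of one dimension, so distances between points are unaffected. A minimal R sketch, with invented coordinates and a hypothetical flip helper:

```r
# Hypothetical two-dimensional coordinates (illustrative values only)
coords <- matrix(c( 0.5, -0.2,
                   -0.3,  0.4,
                    0.1, -0.6),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("Brand A", "Brand B", "Brand C"),
                                 c("Dim 1", "Dim 2")))

# Invert the sign of the horizontal and/or vertical dimension
flip <- function(coords, horizontal = FALSE, vertical = FALSE) {
    if (horizontal) coords[, 1] <- -coords[, 1]
    if (vertical)   coords[, 2] <- -coords[, 2]
    coords
}

flipped <- flip(coords, horizontal = TRUE)

# Distances between points are unchanged by a reflection
same.distances <- isTRUE(all.equal(as.numeric(dist(coords)),
                                   as.numeric(dist(flipped))))
print(same.distances)
```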

Rows to ignore, Columns to ignore The names of any rows or columns to be removed from the table prior to analysis.
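
The effect of these options can be sketched in R as below. The table and the helpers parse.list and drop.named are hypothetical. The list "NET, Total, SUM" is the default shown in the form code further down this page. Exact, case-sensitive matching of names is an assumption of this sketch.

```r
# Hypothetical table that includes NET totals (illustrative data only)
tab <- matrix(c(20, 10, 30,
                10, 25, 35,
                30, 35, 65),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Brand A", "Brand B", "NET"),
                              c("Fun", "Reliable", "NET")))

# Split a comma-delimited string into trimmed names
parse.list <- function(s) trimws(strsplit(s, ",", fixed = TRUE)[[1]])

# Drop any rows and columns whose names appear in the lists
drop.named <- function(x, rows.to.ignore, cols.to.ignore) {
    keep.rows <- !rownames(x) %in% parse.list(rows.to.ignore)
    keep.cols <- !colnames(x) %in% parse.list(cols.to.ignore)
    x[keep.rows, keep.cols, drop = FALSE]
}

cleaned <- drop.named(tab, "NET, Total, SUM", "NET, Total, SUM")
print(cleaned)
```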

Use logos for rows When this option is selected, the user can replace the labels in the scatterplot with logos. The logos should be supplied as a comma-separated list of URLs.

Maximum row labels to plot, Maximum column labels to plot These options limit the number of labels shown. It is useful when there are many points with overlapping labels. The remaining points will be shown without labels.

Chart title Optional title for the scatterplot or bubble chart.

Custom legend labels Whether to set custom labels for the rows and columns in the legend.

Row legend label, Column legend label Optional labels to be shown in the legend for the row and column projections on a scatter or bubble chart.

Row series color, Column series color Color of the points shown in the labelled scatterplot or bubble chart for a single table.

Color palette Control colors used for labelled scatterplot or bubble chart when multiple tables are used.

Title font size Font size of the chart title.

X-axis title font size Font size of the horizontal axis title.

Y-axis title font size Font size of the vertical axis title.

Labels font size Font size of the labels on the scatterplot.

Axis labels font size Font size of the labels on the x- and y-axis.

Legend font size Font size of the legend.

Show gridlines Whether to display gridlines on the plot.

Additional options are available by editing the code.

DIAGNOSTICS

Quality Table Creates a table containing measures of the quality of a correspondence analysis.

Additional Properties

When using this feature you can obtain additional information that is stored by the R code which produces the output.

  1. To do so, select Create > R Output.
  2. In the R CODE, paste: item = YourReferenceName
  3. Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
  4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.

For a more in-depth discussion on extracting information from objects in R, check out our blog post here.

Properties which may be of interest are:

  • Row coordinates:
item$row.coordinates # plot row coordinates
  • Column coordinates:
item$column.coordinates # plot column coordinates
  • Combine row and column coordinates into a single object:
dimensions = rbind(item$row.coordinates,item$column.coordinates) # combined row/column coordinates
  • Just take the first 2 dimensions (columns) (appropriate for export into a scatterplot):
dimensions[,1:2]
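
The snippets above can be combined into a small table ready for export. In the sketch below, item is a stand-in list with invented values and dimension names so that the code runs on its own. In an R Output you would instead start from item = YourReferenceName as described above.

```r
# Stand-in for the real output; only the two elements used above are mocked,
# and the values and dimension names are invented for illustration
item <- list(
    row.coordinates = matrix(c( 0.5, -0.2, 0.1,
                               -0.3,  0.4, 0.0),
                             nrow = 2, byrow = TRUE,
                             dimnames = list(c("Brand A", "Brand B"),
                                             paste("Dimension", 1:3))),
    column.coordinates = matrix(c( 0.2, 0.1, -0.1,
                                  -0.4, 0.3,  0.2),
                                nrow = 2, byrow = TRUE,
                                dimnames = list(c("Fun", "Reliable"),
                                                paste("Dimension", 1:3)))
)

dimensions <- rbind(item$row.coordinates, item$column.coordinates)

# Keep the first two dimensions and record whether each point is a row or a column
export <- data.frame(label = rownames(dimensions),
                     type = rep(c("Row", "Column"),
                                times = c(nrow(item$row.coordinates),
                                          nrow(item$column.coordinates))),
                     dimensions[, 1:2],
                     row.names = NULL, check.names = FALSE)
print(export)
```

Recording the point type alongside the coordinates makes it easier to color rows and columns differently when the table is used as scatterplot input.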


Acknowledgements

The R package ca is used to compute the correspondence analysis.


Code

var heading_text = 'Correspondence Analysis of a Table';
if (!!form.setObjectInspectorTitle)
    form.setObjectInspectorTitle(heading_text, heading_text);
else 
    form.setHeading(heading_text);

var allow_control_groups = Q.fileFormatVersion() > 10.9; // Group controls for Displayr and later versions of Q
function isEmpty(x) { return (x == undefined || (x.getValue() == null && (x.getValues() == null || x.getValues().length == 0))); }
function isBlankSheet(x) { return (x.getValue() == null || x.getValue().length == 0) }

var controls = [];

var isMult = false;
var tableInput = form.dropBox({label: "Input table(s)", types:["table", "RItem"], name: "formTableToAnalyse",  multi: allow_control_groups, required: false});
var pasteInput = form.dataEntry({name: "formEnteredData", 
                                 prompt: "Opens a spreadsheet into which you can paste data.", 
                                 label: "Paste or type table", required: true,
                                 large_data_error: "The data entered is too large. The best alternative is to add your data as a Data Set, use Table > Raw Data > Variable(s), and connect that table to this analysis."});

if (!allow_control_groups || !isEmpty(tableInput) || isBlankSheet(pasteInput))
{
    controls.push(tableInput);
    if (allow_control_groups)
        isMult = tableInput.getValues().length > 1;
}
if (!allow_control_groups || !isBlankSheet(pasteInput) || isEmpty(tableInput))
    controls.push(pasteInput);

if (isMult)
{
    var trends = form.checkBox({label: "Trend lines", name: "formTrendLine", default_value: false,
                                prompt: "Show trend lines between points from each table"});
    controls.push(trends);
}
var swap = form.checkBox({label: "Switch rows and columns", name: "formTranspose", default_value: false});
controls.push(swap); 
var outOpt = form.comboBox({label: "Output", alternatives:["Scatterplot", "Bubble Chart", "Moonplot", "Text", "Input Table"],
             name: "formOutput",  multi : false, default_value: "Scatterplot", prompt: "Select the output to display"});
controls.push(outOpt);

var use_bubble_color = false; 
if (outOpt.getValue() == "Bubble Chart" && !isMult) 
{
    var bSize = form.dropBox({label: "Bubble sizes", types:["table", "RItem"], name: "formBubbleSizes",  multi : false, required: true,
                              prompt: "Sizes of the bubbles of the row points, labelled with the row labels"});
    controls.push(bSize);
    var bCol = form.dropBox({label: "Bubble colors", types:["table", "RItem"], name: "formBubbleColors",  multi : false, required: false,
                              prompt: "Values used to color bubbles, labelled with the row labels"});
    controls.push(bCol);
    if (bCol.getValue() != null)
    {
        use_bubble_color = true;
        controls.push(form.comboBox({name: "formBubbleColorMidPt", label: "Set midpoint to", alternatives: ["Median of bubble color values", "Zero"], default_value: "Median of bubble color values"}));
    }
    var legTitle = form.textBox({label: "Bubble legend title", type: "text", default_value: "", name: "formLegendTitle", required: false,
                                 prompt: "The title of the legend showing the bubble sizes"});
    controls.push(legTitle);
}

if (outOpt.getValue() != "Text" && outOpt.getValue() != "Input Table")
{ 
    var norm = form.comboBox({label: "Normalization", alternatives:["Principal", "Row principal", "Row principal (scaled)", "Column principal", 
              "Column principal (scaled)", "Symmetrical (½)", "None"], name: "formNormalization",  multi : false, default_value: "Principal",
                              prompt: "Normalization scheme, relating the scaling of the axes"});
    controls.push(norm);
}

if (outOpt.getValue() != "Input Table")
{  
    var focus = form.textBox({label: "Focus", name: "formFocus", required: false,
                              prompt: "Label of one row or column point whose variance is entirely represented in the first dimension"});
    controls.push(focus);
    var supp = form.textBox({label: "Supplementary", name: "formSupplementary", required: false,
                             prompt: "Comma-delimited list of labels that are excluded from fitting but are plotted"});
    controls.push(supp);
}

if (outOpt.getValue() != "Text" && outOpt.getValue() != "Input Table")
{
    var dim1 = form.numericUpDown({label: "Horizontal dimension", name: "formDim1", default_value:1,
                                   prompt: "The dimension to be plotted horizontally"});
    controls.push(dim1);
    var dim2 = form.numericUpDown({label: "Vertical dimension", name: "formDim2", default_value:2,
                                   prompt: "The dimension to be plotted vertically"});
    controls.push(dim2);

    var flipH = form.checkBox({label: "Flip horizontally", name:"formMirrorDim1", default_value: false,
                               prompt: "Reverse the points along the horizontal axis"});
    controls.push(flipH);
    var flipV = form.checkBox({label: "Flip vertically", name:"formMirrorDim2", default_value: false,
                               prompt: "Reverse the points along the vertical axis"});
    controls.push(flipV);
}

var rowIgnore = form.textBox({label: "Rows to ignore", type: "text", default_value: "NET, Total, SUM", name: "formIgnoreRows", required: false,
                              prompt: "Comma-delimited list of rows to be excluded"});
controls.push(rowIgnore);
var colIgnore = form.textBox({label: "Columns to ignore", type: "text", default_value: "NET, Total, SUM", name: "formIgnoreColumns", required: false,
                              prompt: "Comma-delimited list of columns to be excluded"});
controls.push(colIgnore);
 
if (["Scatterplot", "Bubble Chart"].indexOf(outOpt.getValue()) != -1)
{
   if (outOpt.getValue() == "Scatterplot")
   {
        var logoOpt = form.checkBox({label: "Use logos for rows", name:"formUseLogo", default_value: false,
                                     prompt: "Replace the row text labels with logos"});
        controls.push(logoOpt);
        if (logoOpt.getValue())
        {
            var logoUrl = form.textBox({name: "formLogos", label: "Logos", prompt: "Enter URLs as a comma separated list", type: "Text", required: true});
            controls.push(logoUrl);
            var logoSize = form.numericUpDown({name: "formLogoSize", label: "Logo size", default_value: 0.5, increment: 0.1});
            controls.push(logoSize);
        }
    }
    if (allow_control_groups)
        form.page("Chart");
    var maxRLab = form.numericUpDown({label: "Maximum row labels to plot", name: "formRowMaxLab", default_value: 50, maximum: Number.MAX_SAFE_INTEGER,
                                      prompt: "The maximum number of row labels to show"});
    controls.push(maxRLab);
    var maxCLab = form.numericUpDown({label: "Maximum column labels to plot", name: "formColMaxLab", default_value: 50, maximum: Number.MAX_SAFE_INTEGER,
                                      prompt: "The maximum number of column labels to show"});
    controls.push(maxCLab);
    var cTitle = form.textBox({label: "Chart title", type: "text", default_value: "Correspondence analysis", name: "formTitle", required: false});
    controls.push(cTitle);
    if (!isMult)
    {
        var customLabels = form.checkBox({label: "Custom legend labels", name:"formCustomLabels", default_value: false,
                                          prompt: "Set labels of the legend"});
        controls.push(customLabels);
        if (customLabels.getValue())
        {
            var rLabel = form.textBox({label: "Row legend label", type: "text", default_value: "", name: "formRowLabel", required: false,
                                       prompt: "Label to be used for rows in the legend"});
            controls.push(rLabel);
            var cLabel = form.textBox({label: "Column legend label", type: "text", default_value: "", name: "formColumnLabel", required: false,
                                       prompt: "Label to be used for columns in the legend"});
            controls.push(cLabel);
        }
        if (use_bubble_color)
        {
            controls.push(form.colorPicker({label: "Bubble color for minimum value", name: "formBubbleColLow", default_value:"#CC1010"}));
            controls.push(form.colorPicker({label: "Bubble color for midpoint value", name: "formBubbleColMid", default_value:"#444444"}));
            controls.push(form.colorPicker({label: "Bubble color for maximum value", name: "formBubbleColHigh", default_value:"#284FBB"}));

        }
        else
            controls.push(form.colorPicker({label: "Row series color", name: "rowColor", default_value:"#5B9BD5", prompt: "Color of the row points"}));
        controls.push(form.colorPicker({label: "Column series color", name: "colColor", default_value: use_bubble_color ? "#222222":"#ED7D31", prompt: "Color of the column points"}));
   }
}
if (isMult)
{
    var colPalette = form.comboBox({name: "formPalette", label: "Color palette", alternatives: ["Default colors", "Primary colors", "Rainbow", "Light pastels", 
        "Strong colors", "Reds, dark to light", "Reds, light to dark", "Greens, dark to light", "Greens, light to dark", "Blues, dark to light",
        "Blues, light to dark", "Greys, dark to light", "Greys, light to dark", 
        "Heat colors (red, yellow, white)", "Terrain colors (green, beige, grey)"], default_value: "Default colors", required: true,
                                    prompt: "Colors relating points across different tables"});
    controls.push(colPalette);
}


if (["Scatterplot", "Bubble Chart"].indexOf(outOpt.getValue()) != -1)
{
    var titleSz = form.numericUpDown({name:"formTitleFontSize", label:"Title font size", default_value: 20});
    controls.push(titleSz);
    var xtitleSz = form.numericUpDown({name:"formXTitleFontSize", label:"X-axis title font size", default_value: 16});
    controls.push(xtitleSz);
    var ytitleSz = form.numericUpDown({name:"formYTitleFontSize", label:"Y-axis title font size", default_value: 16});
    controls.push(ytitleSz);
    var labSz = form.numericUpDown({name:"formLabelsFontSize", label:"Labels font size", default_value: 14});
    controls.push(labSz);
    var axisSz = form.numericUpDown({name:"formAxisFontSize", label:"Axis labels font size", default_value: 10});
    controls.push(axisSz);
    var legendSz = form.numericUpDown({name:"formLegendFontSize", label:"Legend font size", default_value: 15});
    controls.push(legendSz);
    var gridShow = form.checkBox({label:"Show gridlines", name:"formShowGridlines", default_value: true});
    controls.push(gridShow);
}
form.setInputControls(controls);

# R code: runs the analysis using the values of the controls defined above
library(flipDimensionReduction)
library(flipChartBasics)
x <- get0("formTableToAnalyse")
if (is.null(x))
    x <- flipTransformations::ParseEnteredData(formEnteredData, want.data.frame = TRUE, want.col.names = TRUE, want.row.names = TRUE)
if (is.list(x) && length(x) == 1)
    x <- x[[1]]
transformCorrelationTablesToNonNegative <- function(x){
    if (inherits(x, "CorrelationMatrix")){
        x <- x$cor
        x <- x + 1
    }else if (all(c("questions", "name") %in% names(attributes(x))) &&
        identical(attr(x, "statistic"), "Correlation") ||
        (length(dim(x)) == 3L && identical(dimnames(x)[[3]][1], "Correlation")))
        x <- x + 1
    return(x)
}
x <- transformCorrelationTablesToNonNegative(x)

if (exists("formRowLabel"))
    attr(x, "row.column.names") <- c(formRowLabel, formColumnLabel)

bubble.colors <- NULL
if (!is.null(get0("formBubbleColors")))
{
    x <- flipTables::TidyTabularData(x, row.names.to.remove = formIgnoreRows, col.names.to.remove = formIgnoreColumns, transpose = formTranspose)
    bubble.colors <- MapToColors(formBubbleColors,
       mid.x = if (formBubbleColorMidPt == "Zero") 0 else median(formBubbleColors, na.rm = TRUE),
       min.color = formBubbleColLow, max.color = formBubbleColHigh, mid.color = formBubbleColMid)
    bubble.colors <- MatchTable(bubble.colors, x, x.table.name = "Bubble colors")
}

ca <- CorrespondenceAnalysis(x,
    normalization = if(exists("formNormalization")) formNormalization else "None",
    output = formOutput,
    focus = get0("formFocus"),
    supplementary = get0("formSupplementary"),
    dim1.plot = get0("formDim1", ifnotfound = 1),
    dim2.plot = get0("formDim2", ifnotfound = 2),
    mirror.horizontal = get0("formMirrorDim1", ifnotfound = FALSE),
    mirror.vertical = get0("formMirrorDim2", ifnotfound = FALSE),
    trend.lines = formTrendLine,
    transpose = formTranspose,
    max.row.labels.plot = if (exists("formRowMaxLab")) formRowMaxLab else 0,
    max.col.labels.plot = if (exists("formColMaxLab")) formColMaxLab else 0,
    chart.title = formTitle,
    logos = if (formOutput=="Scatterplot" && formUseLogo) formLogos else NULL,
    logo.size = formLogoSize,
    row.names.to.remove = formIgnoreRows,
    column.names.to.remove = formIgnoreColumns,
    row.color = if (!is.null(bubble.colors)) bubble.colors else rowColor,
    col.color = colColor,
    color.palette = formPalette,
    bubble.size = formBubbleSizes,
    bubble.title = formLegendTitle,
    title.font.size = if (exists("formTitleFontSize")) formTitleFontSize else 0,
    x.title.font.size = if (exists("formXTitleFontSize")) formXTitleFontSize else 0,
    y.title.font.size = if (exists("formYTitleFontSize")) formYTitleFontSize else 0,
    labels.font.size = if (exists("formLabelsFontSize")) formLabelsFontSize else 0,
    axis.font.size = if (exists("formAxisFontSize")) formAxisFontSize else 0,
    legend.font.size = if (exists("formLegendFontSize")) formLegendFontSize else 0,
    show.gridlines = if (exists("formShowGridlines")) formShowGridlines else FALSE,
    use.combined.scatter = TRUE
)
options(width = 200)
correspondence.analysis <- if (formOutput %in% c("Input Table")) print(ca) else ca