Dimension Reduction - Multiple Correspondence Analysis

Multiple Correspondence Analysis analyses categorical variables to detect underlying structure in the data set. See this blog post for an explanation of multiple correspondence analysis and its relationship to correspondence analysis.

How to Create

  1. Add the object:
    1. In Displayr: Insert > Dimension Reduction > Multiple Correspondence Analysis
    2. In Q: Create > Dimension Reduction > Multiple Correspondence Analysis
  2. Select your variables under Inputs > Input variables

Example

The relationship between 5 categorical variables.

Options

Input variables Categorical variables to analyse. Note that when many variables are selected, using weights may cause significant slowdown.

Output How the analysis results should be displayed. The choices are:

  • Scatterplot: A labelled scatterplot showing associations between variables.
  • Text: A text representation of the analysis.

Maximum number of labels to plot Limits the number of labels shown in the scatterplot. The remaining points are shown without labels. This can be useful with large data sets to avoid overlapping labels.

Chart title Title of the scatterplot.

Color palette Controls the colors of the points in the Scatterplot output.

Missing data Method for dealing with missing data. See Missing Data Options.

Variable names Displays Variable Names in the output instead of labels.

Additional Properties

When using this feature you can obtain additional information that is stored by the R code which produces the output.

  1. To do so, select Create > R Output.
  2. In the R CODE, paste: item = YourReferenceName
  3. Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
  4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.
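
The inspection step above can be sketched in R as follows. This is a minimal illustration only: the list assigned to item is a hypothetical stand-in so that the snippet runs outside Q or Displayr, and its component names (example.component, another.component) are invented and do not describe the real structure of the analysis object. In an R Output, the first assignment would instead be item = YourReferenceName.

```r
# Hypothetical stand-in so the sketch runs anywhere.
# In Q or Displayr, use instead:  item = YourReferenceName
item <- list(example.component = c(1.5, 2.5),
             another.component = "text")

# Names of the top-level components stored in the object
component.names <- names(item)
print(component.names)

# Compact overview of each top-level component (type, length, first values)
str(item, max.level = 1)

# Extract a single component by name once you know what is available
first <- item[[component.names[1]]]
print(first)
```

Running names() or str() first is a safe way to discover what the object holds before writing code that depends on a particular component.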

For a more in-depth discussion on extracting information from objects in R, check out our blog post here.


Acknowledgements

The R package ca is used to compute the correspondence analysis.

Code

form.setHeading('Multiple Correspondence Analysis');
form.dropBox({label: "Input variables", 
            types: ["Variable: Categorical, OrderedCategorical"], 
            name: "formInputVariables",
            multi: true,
            prompt: "Categorical variables to analyze"})
var outputOpt = form.comboBox({label: "Output", 
              alternatives: ["Scatterplot", "Text"], 
              name: "formOutput", default_value: "Scatterplot"})
if (outputOpt.getValue() == "Scatterplot")
{
    form.numericUpDown({label: "Maximum number of labels to plot", name: "formMaxLab", default_value:20,
                        prompt: "The maximum number of labels to show"});
    form.textBox({name: "formTitle", label: "Chart title", default_value: "Multiple correspondence analysis"});
    form.comboBox({name: "formPalette", label: "Color palette", alternatives: ["Default colors", "Primary colors", 
              "Rainbow", "Light pastels", "Strong colors", "Reds, light to dark", "Greens, light to dark", 
              "Blues, light to dark", "Greys, light to dark", "Heat colors (red, yellow, white)", "Terrain colors (green, beige, grey)"],
               default_value: "Default colors", required: true, prompt: "Coloring of the points according to variable"});
}
form.comboBox({label: "Missing data", 
              alternatives: ["Error if missing data", "Exclude cases with missing data",
                             "Imputation (replace missing values with estimates)"], 
              name: "formMissing", 
              default_value: "Exclude cases with missing data", prompt: "Treatment of missing data values"})
form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Whether to use variable names instead of labels"});

library(flipDimensionReduction)
mca <- MultipleCorrespondenceAnalysis(QFormula(~formInputVariables),
   output = formOutput,
   chart.title = formTitle,
   max.labels.plot = formMaxLab,
   scatter.palette = ifelse(formOutput=="Scatterplot", formPalette, NA),
   weights = QCalibratedWeight,
   subset = QFilter,
   missing = formMissing,
   show.labels = !formNames)

See Also