Dimension Reduction - Dimension Reduction Scatterplot

From Q

Produces a two-dimensional scatterplot to visualize either high-dimensional numeric data or a distance matrix. Four algorithms are available: t-SNE, Principal Components Analysis (PCA), and metric and non-metric Multidimensional Scaling (MDS). See the pages on t-SNE, Principal Components Analysis and Multidimensional Scaling for more information.

How to Create

  1. Add the object by selecting from the menu Anything > Advanced Analysis > Dimension Reduction > Dimension Reduction Scatterplot (in Displayr) or Create > Dimension Reduction > Dimension Reduction Scatterplot (in Q).
  2. Under Inputs > Algorithm, select one of the algorithms mentioned above.
  3. Under Inputs > Input type, select an input type.

Example

If the input type is Variables, a further variable can be specified in the Group variable field to classify the output cases into groups. The probability that each point has the same group as its nearest neighbor is then calculated.
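The nearest-neighbor statistic can be illustrated with a minimal sketch. It assumes Euclidean distance between the plotted 2D coordinates and breaks ties by taking the first neighbor found. The function name `nearestNeighborAccuracy` is hypothetical, and the product's exact calculation is not documented on this page.

```javascript
// Hypothetical helper: fraction of points whose nearest neighbor shares
// their group. Assumption: Euclidean distance in the 2D output; ties go
// to the first neighbor encountered.
function nearestNeighborAccuracy(points, groups) {
  const n = points.length;
  if (n < 2) throw new Error("Need at least two points");
  let matches = 0;
  for (let i = 0; i < n; i++) {
    let best = -1;
    let bestDist = Infinity;
    for (let j = 0; j < n; j++) {
      if (j === i) continue;
      const dx = points[i][0] - points[j][0];
      const dy = points[i][1] - points[j][1];
      const d = dx * dx + dy * dy; // squared distance keeps the same ordering
      if (d < bestDist) {
        bestDist = d;
        best = j;
      }
    }
    if (groups[best] === groups[i]) matches++;
  }
  return matches / n;
}

// Two well-separated clusters; the last point is labelled "A" but sits in the "B" cluster.
const pts = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]];
const grp = ["A", "A", "A", "B", "B", "A"];
console.log(nearestNeighborAccuracy(pts, grp)); // 5 of 6 points match: 0.8333...
```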

Example output: The projection of 14 variables onto 2 dimensions, with a grouping category.

Settings:


If the input is a distance matrix, output points are labelled. Example output:

Input example: a distance matrix, either pasted in or created in the software. To learn how to create a distance matrix, see How to Create a Distance Matrix.

Dimension Reduction - Plot - Goodness of Fit can be used to assess the accuracy of the fit.

Options

Algorithm Either t-SNE, PCA, MDS - Metric or MDS - Non-metric.
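MDS - Metric takes a distance matrix and places points so that their plotted distances approximate the input distances. A common metric method is classical (Torgerson) scaling. The sketch below assumes it is representative, although this page does not say which variant the product uses, and `classicalMDS` is a hypothetical name. The method double-centers the squared distances and keeps the top two eigenvectors, which the sketch finds by power iteration.

```javascript
// Classical (Torgerson) metric MDS sketch: place n points in k dimensions so
// that their Euclidean distances approximate the symmetric distance matrix D.
// Assumption: illustrative only; the product's MDS implementation may differ.

// Multiply matrix M by vector v.
const matVec = (M, v) => M.map(row => row.reduce((s, m, j) => s + m * v[j], 0));

function classicalMDS(D, k = 2, iterations = 500) {
  const n = D.length;
  // Double centering: B = -1/2 * J D^(2) J with J = I - (1/n) 11^T.
  // Because D is symmetric, column means equal row means.
  const sq = D.map(row => row.map(d => d * d));
  const mean = sq.map(row => row.reduce((a, b) => a + b, 0) / n);
  const grand = mean.reduce((a, b) => a + b, 0) / n;
  const B = sq.map((row, i) =>
    row.map((d2, j) => -0.5 * (d2 - mean[i] - mean[j] + grand)));

  const coords = Array.from({ length: n }, () => new Array(k).fill(0));
  for (let c = 0; c < k; c++) {
    // Power iteration for the dominant remaining eigenpair of B. The start
    // vector is non-constant because B maps the constant vector to zero.
    let v = Array.from({ length: n }, (_, i) => Math.sin(i + 1 + c));
    const n0 = Math.hypot(...v);
    v = v.map(x => x / n0);
    for (let it = 0; it < iterations; it++) {
      const w = matVec(B, v);
      const norm = Math.hypot(...w);
      if (norm === 0) break; // no remaining variance in this direction
      v = w.map(x => x / norm);
    }
    // Rayleigh quotient v^T B v is the eigenvalue for a unit vector v.
    const Bv = matVec(B, v);
    const lambda = v.reduce((s, x, i) => s + x * Bv[i], 0);
    const scale = Math.sqrt(Math.max(lambda, 0));
    for (let i = 0; i < n; i++) coords[i][c] = v[i] * scale;
    // Deflate: remove this component so the next pass finds the next one.
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++) B[i][j] -= lambda * v[i] * v[j];
  }
  return coords;
}

// Usage: distances among four points in a plane.
const corners = [[0, 0], [3, 0], [0, 4], [1, 1]];
const dist = corners.map(a => corners.map(b => Math.hypot(a[0] - b[0], a[1] - b[1])));
const layout = classicalMDS(dist);
console.log(layout);
```

When the distances really come from points in a plane, as in the usage example, the layout reproduces the input distances up to rotation and reflection. Power iteration finds the eigenvalue of largest magnitude, so this sketch suits only distances that are close to Euclidean.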

The input data can be provided via one of three options:

Variables The variables, or a question/variable set containing the variables, that you would like to analyze. Cases with missing data are ignored.
Distance matrix Select an existing distance matrix. This should be a symmetric matrix of distances, such as the output of Correlation - Distances.
Paste or type distance matrix Opens up a blank spreadsheet into which tabular data can be manually entered or pasted.
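A symmetric distance matrix has one row and one column per case, zeros on the diagonal, and the same value in cell (i, j) as in cell (j, i). The sketch below builds such a matrix from numeric rows using Euclidean distance. It also checks a pasted matrix against those properties. Both helper names are hypothetical. The checks are assumptions about what a valid input looks like, not the product's own validation.

```javascript
// Hypothetical helper: Euclidean distance between every pair of rows.
function euclideanDistanceMatrix(rows) {
  return rows.map(a =>
    rows.map(b => Math.sqrt(a.reduce((s, x, k) => s + (x - b[k]) ** 2, 0))));
}

// Hypothetical check: square, zero diagonal, non-negative, symmetric.
function isValidDistanceMatrix(M, tol = 1e-9) {
  const n = M.length;
  for (let i = 0; i < n; i++) {
    if (M[i].length !== n) return false; // must be square
    if (Math.abs(M[i][i]) > tol) return false; // a case is 0 from itself
    for (let j = 0; j < n; j++) {
      if (!(M[i][j] >= 0)) return false; // also rejects NaN
      if (Math.abs(M[i][j] - M[j][i]) > tol) return false; // symmetry
    }
  }
  return true;
}

const D = euclideanDistanceMatrix([[0, 0], [3, 4], [6, 8]]);
console.log(D); // rows [0, 5, 10], [5, 0, 5], [10, 5, 0]
console.log(isValidDistanceMatrix(D)); // true
console.log(isValidDistanceMatrix([[0, 1], [2, 0]])); // false: not symmetric
```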

Group variable A variable to categorize the output. If numeric, the data are shaded from light (lowest values) to dark (highest). If categorical, data points are colored according to their category. This option is only available if Variables are provided.
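For a numeric Group variable, light-to-dark shading can be done by rescaling each value to [0, 1] between the observed minimum and maximum, then interpolating between two colors. In this sketch, the endpoint colors, the linear RGB interpolation and the function name `shadeFor` are all assumptions. The product's palette may differ.

```javascript
// Sketch: shade points from light (lowest value) to dark (highest value).
// Assumptions: linear interpolation in RGB and these two endpoint colors.
const LIGHT = [222, 235, 247]; // hypothetical light blue
const DARK = [8, 48, 107]; // hypothetical dark blue

function shadeFor(value, min, max) {
  // Position of the value within the observed range: 0 = lowest, 1 = highest.
  const t = max > min ? (value - min) / (max - min) : 0;
  const rgb = LIGHT.map((l, k) => Math.round(l + t * (DARK[k] - l)));
  return "#" + rgb.map(c => c.toString(16).padStart(2, "0")).join("");
}

const ages = [18, 40, 65];
const lo = Math.min(...ages);
const hi = Math.max(...ages);
console.log(ages.map(a => shadeFor(a, lo, hi))); // lightest first, darkest last
```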

Create binary variables from unordered categories If selected, unordered categorical Variables with N categories are converted into N-1 binary indicator variables. Otherwise, each such variable is converted to a single numeric variable with integers representing categories (as happens for ordered categories). This option is only available if Variables are provided.
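The two encodings can be sketched as follows. Dropping the first category as the reference level is an assumption for illustration, and the function names are hypothetical.

```javascript
// Sketch of the two encodings for an unordered categorical variable.
// Assumption: the first entry of `levels` is the dropped reference category.

// N categories -> N-1 binary indicator columns.
function toIndicators(values, levels) {
  const kept = levels.slice(1);
  return values.map(v => kept.map(l => (v === l ? 1 : 0)));
}

// Alternative: one numeric column holding the category's position (1..N).
function toIntegerCodes(values, levels) {
  return values.map(v => levels.indexOf(v) + 1);
}

const levels = ["Coke", "Pepsi", "Other"];
const brand = ["Pepsi", "Coke", "Other"];
console.log(JSON.stringify(toIndicators(brand, levels))); // [[1,0],[0,0],[0,1]]
console.log(JSON.stringify(toIntegerCodes(brand, levels))); // [2,1,3]
```

Indicator coding avoids imposing an artificial order and spacing on unordered categories. The integer codes above, by contrast, imply that Other is twice as far from Coke as Pepsi is.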

Normalize variables For Variables input, whether to normalize the data.

For t-SNE and MDS each variable is standardized to the range [0, 1].
For PCA the correlation matrix is used rather than the covariance matrix.
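The following is a minimal sketch of both normalizations, with hypothetical helper names. Min-max scaling maps each column to [0, 1] via (x - min) / (max - min). How the product treats a constant column is not documented, so mapping it to 0 is an assumption. The correlation is the covariance divided by the product of the two standard deviations, which removes each variable's units.

```javascript
// Min-max scaling to [0, 1], as described for t-SNE and MDS.
function minMaxScale(column) {
  const min = Math.min(...column);
  const max = Math.max(...column);
  const range = max - min;
  // Assumption: a constant column maps to 0 (avoids dividing by zero).
  return column.map(x => (range === 0 ? 0 : (x - min) / range));
}

// Sample covariance of two columns (n - 1 denominator).
function covariance(a, b) {
  const n = a.length;
  const ma = a.reduce((s, x) => s + x, 0) / n;
  const mb = b.reduce((s, x) => s + x, 0) / n;
  return a.reduce((s, x, i) => s + (x - ma) * (b[i] - mb), 0) / (n - 1);
}

// Correlation: covariance divided by both standard deviations, so it is
// unit-free (the matrix PCA uses when normalization is selected).
function correlation(a, b) {
  return covariance(a, b) / Math.sqrt(covariance(a, a) * covariance(b, b));
}

const incomeK = [20, 50, 80]; // income in thousands
const rating = [1, 2, 3]; // rating on a 1-3 scale
console.log(JSON.stringify(minMaxScale(incomeK))); // [0,0.5,1]
console.log(covariance(incomeK, rating)); // 30
console.log(correlation(incomeK, rating)); // 1
```

On the covariance matrix, `incomeK` (variance 900) would dominate `rating` (variance 1) in PCA. On the correlation matrix, both variables carry equal weight.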

Perplexity A parameter used by the t-SNE algorithm and related to the number of nearest neighbors considered when placing each data point. The typical useful range is from 5 to 50.

Low values emphasize immediate local structure.
High values increase the influence of more distant neighbors and of global structure.
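The link between perplexity and the number of neighbors comes from the standard t-SNE formulation (van der Maaten and Hinton, 2008). Each point gets a Gaussian bandwidth sigma, chosen so that the perplexity 2^H of its neighbor probabilities equals the setting, where H is their Shannon entropy in bits. A uniform distribution over k neighbors has perplexity exactly k. The sketch below finds sigma by bisection. It follows the published definition rather than the product's code, and the function names are hypothetical.

```javascript
// Sketch of the standard t-SNE perplexity calibration for one point i:
//   p(j|i) = exp(-d_ij^2 / (2 sigma^2)) / sum_k exp(-d_ik^2 / (2 sigma^2))
//   perplexity = 2^H, with H = -sum_j p(j|i) log2 p(j|i)

// sqDists: squared distances from point i to every OTHER point.
function conditionalProbs(sqDists, sigma) {
  const minD = Math.min(...sqDists); // subtracting it avoids underflow
  const w = sqDists.map(d => Math.exp(-(d - minD) / (2 * sigma * sigma)));
  const total = w.reduce((s, x) => s + x, 0);
  return w.map(x => x / total);
}

function perplexity(p) {
  const H = -p.reduce((s, x) => (x > 0 ? s + x * Math.log2(x) : s), 0);
  return Math.pow(2, H);
}

// Bisection on sigma: perplexity grows monotonically with sigma,
// from 1 (all mass on the nearest point) to the number of neighbors.
function sigmaForPerplexity(sqDists, target, tol = 1e-5, maxIter = 200) {
  let lo = 1e-10;
  let hi = 1e10;
  let sigma = 1;
  for (let it = 0; it < maxIter; it++) {
    sigma = Math.sqrt(lo * hi); // geometric midpoint: sigma spans many magnitudes
    const perp = perplexity(conditionalProbs(sqDists, sigma));
    if (Math.abs(perp - target) < tol) break;
    if (perp > target) hi = sigma; // too many effective neighbors: narrow
    else lo = sigma; // too few: widen
  }
  return sigma;
}

// Usage: squared distances from one point to 20 others (1, 4, 9, ..., 400).
const sq = Array.from({ length: 20 }, (_, j) => (j + 1) * (j + 1));
const s5 = sigmaForPerplexity(sq, 5);
const s15 = sigmaForPerplexity(sq, 15);
console.log(s5 < s15); // true: higher perplexity, wider neighborhood
```

A larger perplexity yields a wider bandwidth, so more distant neighbors receive non-negligible probability.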

Additional Properties

When using this feature, you can obtain additional information that is stored by the R code that produces the output.

  1. To do so, select Create > R Output.
  2. In the R CODE, paste: item = YourReferenceName
  3. Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
  4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.

For a more in-depth discussion on extracting information from objects in R, check out our blog post.


Code

// NOTE: the "printType" control for PCA in this output defaults to "2D Scatterplot" instead of "Loadings Table", unlike other outputs (PCA, MDS, t-SNE).
// Please ensure that this default is preserved when updating this script.

var default_algorithm = "t-SNE";

// VERSION 1.14
function isEmpty(x) { return (x == undefined || x.getValue() == null && (x.getValues() == null || x.getValues().length == 0)) }
function isBlankSheet(x) { return (x.getValue() == null || x.getValue().length == 0) }
var allow_control_groups = Q.fileFormatVersion() > 10.9; // Group controls for Displayr and later versions of Q
var controls = [];

var algo_type = form.comboBox({label: "Algorithm", alternatives: ["PCA", "t-SNE", "MDS - Metric", "MDS - Non-metric"],
                               name: "formAlgorithm", default_value: default_algorithm,
                               prompt: "The method for performing the dimensionality reduction"});
let is_pca = algo_type.getValue() === "PCA";
var heading = is_pca ? "Principal Components Analysis (PCA)" : algo_type.getValue();
if (!!form.setObjectInspectorTitle)
    form.setObjectInspectorTitle(heading, heading);
else 
    form.setHeading(heading);

controls.push(algo_type);

var varInput = form.dropBox({name: "formVariables", label: "Variables",
                             types: ["Q: pickone, pickonemulti, number, numbermulti, numbergrid, pickany, pickanycompact, pickanygrid",
                                     "V:numeric, categorical, ordered categorical"], multi: true, required: false,
                             prompt: "Numeric variables, each representing a dimension"});
var tableInput = form.dropBox({label: "Distance matrix", name: "formDistance", types:["RItem"], required: false,
                               prompt: "Symmetric numeric matrix of distances between points"});
var pasteInput = form.dataEntry({label: "Paste or type distance matrix", name: "formDistanceRaw", prompt: "Opens a spreadsheet into which you can paste data.", required: true, large_data_error: "The data entered is too large. The best alternative is to add your data as a Data Set, use Table > Raw Data > Variable(s), and connect that table to this analysis."});



if (is_pca || !allow_control_groups || !isEmpty(varInput) || (isEmpty(tableInput) && isBlankSheet(pasteInput)))
{
    controls.push(varInput);

    if (is_pca || !allow_control_groups || !isEmpty(varInput))
    {
        let norm = form.checkBox({label: is_pca ? "Use correlation matrix" : "Normalize variables", name: "formNormalization", default_value: true,
                                  prompt: is_pca ? "Use correlation matrix (if selected) or the covariance matrix (if not selected)" : "Standardize variables to [0,1]"});
        controls.push(norm);
    }

    if (!allow_control_groups || !isEmpty(varInput))
    {
        var binVar = form.checkBox({name: "formBinary", label: "Create binary variables from categories", default_value: false,
                                    prompt: "Convert categorical variables to dummy binary variables"});
        controls.push(binVar);
    }
}
if (!is_pca)
{
   if (!allow_control_groups || !isEmpty(tableInput) || (isEmpty(varInput) && isBlankSheet(pasteInput)))
       controls.push(tableInput);
   if (!allow_control_groups || !isBlankSheet(pasteInput) || (isEmpty(varInput) && isEmpty(tableInput)))
       controls.push(pasteInput);
}
if (is_pca)
{
    var selectOpt = form.comboBox({name: "selectRule", label: "Rule for selecting components", alternatives: ["Kaiser rule", "Eigenvalues over", "Number of components"],
                                   default_value: "Kaiser rule", prompt: "Determines how many components are retained"});
    controls.push(selectOpt);
    if (selectOpt.getValue() == "Eigenvalues over")
        controls.push(form.numericUpDown({name: "eigenMin", label: "Cutoff", default_value: 1, maximum: Number.MAX_SAFE_INTEGER, increment: 0.1, prompt: "Minimum eigenvalue to retain component"}));
    if (selectOpt.getValue() == "Number of components")
        controls.push(form.numericUpDown({ name: "numberFactors", label: "Number of components", default_value: 2, increment: 1, minimum: 1, maximum: Number.MAX_SAFE_INTEGER,
                             prompt: "Retain a fixed number of components"}));
    var rotation_type = form.comboBox({ name: "rotationType",
                                        label: "Rotation method",
                                        alternatives: ["None",
                                                     "Varimax",
                                                     "Quartimax",
                                                     "Equamax",
                                                     "Promax",
                                                     "Oblimin"],
                                        default_value: "Varimax", prompt: "Varimax, Quartimax and Equamax produce uncorrelated components"});
    controls.push(rotation_type);
    if (rotation_type.getValue() == "Oblimin")
        controls.push(form.numericUpDown({name: "delta", label: "Delta", default_value: 0, increment: 0.1, maximum:0.8, minimum: -100,
                            prompt: "Oblimin control parameter"}));
    if (rotation_type.getValue() == "Promax")
        controls.push(form.numericUpDown({name: "kappa", label: "Kappa", default_value: 4, increment: 1, minimum: 2, maximum: Number.MAX_SAFE_INTEGER,
                            prompt: "Promax control parameter"}));

    controls.push(form.comboBox({name: "missingType",
                   label: "Missing data:",
                   alternatives: ["Error if missing data", "Exclude cases with missing data", "Use partial data (pairwise correlations)", "Imputation (replace missing values with estimates)"],
                   default_value: "Use partial data (pairwise correlations)", prompt: "Handling of cases with missing data" }));
    var print_type = form.comboBox({ name: "printType", label: "Output",
                                     alternatives: ["Loadings Table", "Structure Matrix", "Variance Explained", "Component Plot",
                                                    "Scree Plot", "Detailed Output", "2D Scatterplot"],
                                     default_value: "2D Scatterplot", prompt: "Output to be shown" });
    controls.push(print_type);
    if (["Component Plot", "Scree Plot", "Variance Explained", "2D Scatterplot"].indexOf(print_type.getValue()) == -1)
    {
        controls.push(form.checkBox({ name: "sortCoefficients", label: "Sort coefficients by size", default_value: true }));
        var suppress = form.checkBox({ name: "suppressCoefficients", label: "Suppress small coefficients", default_value: true,
                                       prompt: "Replace small coefficients with blanks"});
        controls.push(suppress);
        if (suppress.getValue())
            controls.push(form.numericUpDown({ name: "minLoading", label: "Absolute value below", default_value: 0.4, increment: 0.1, minimum: 0, maximum: Number.MAX_SAFE_INTEGER,
                                 prompt: "Threshold to replace small coefficients with blanks"}));
    }

    if (print_type.getValue() == "Component Plot")
        controls.push(form.checkBox({ name: "scatterPlotLabels", label: "Include labels in plots", default_value: true,
                        prompt: "Label the points, else use integers"}));
    if (["Component Plot", "Loadings Table", "Structure Matrix", "Detailed Output"].indexOf(print_type.getValue()) != -1)
        controls.push(form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Use names instead of labels"}));

}
if (!allow_control_groups || !isEmpty(varInput))
{
    if (!is_pca || print_type.getValue() == "2D Scatterplot")
    {
        var groups = form.dropBox({name: "formGroups", label: "Group variable", types: ["V:numeric, categorical, ordered categorical"], multi:false, required:false, prompt: "Variable used to color the points"});
        controls.push(groups);
    }
}

if (algo_type.getValue() == "t-SNE")
{
    var perplex = form.numericUpDown({name: "formPerplexity", label: "Perplexity", default_value: 10, increment: 1, maximum: 100, minimum: 2,
                                      prompt: "Low values emphasize local rather than global structure"});
    controls.push(perplex);
}
form.setInputControls(controls);

library(flipDimensionReduction)

WarnIfVariablesSelectedFromMultipleDataSets()

dim.reduce <- DimensionReductionScatterplot(algorithm = formAlgorithm,
    data = get0("formVariables"),
    data.groups = if (exists("formGroups") && length(formVariables) > 0) formGroups else NULL,
    table = if (!is.null(get0("formDistanceRaw"))) formDistanceRaw else get0("formDistance"),
    raw.table = !is.null(get0("formDistanceRaw")),
    binary = get0("formBinary", ifnotfound = FALSE),
    perplexity = get0("formPerplexity", ifnotfound = 0),
    normalization = get0("formNormalization", ifnotfound = FALSE),
    # Parameters for PCA
    weights = QCalibratedWeight,
    missing = get0("missingType"),
    select.n.rule = get0("selectRule"),
    rotation = get0("rotationType"),
    eigen.min = get0("eigenMin"),
    n.factors = get0("numberFactors"),
    sort.coefficients.by.size = get0("sortCoefficients"),
    suppress.small.coefficients = get0("suppressCoefficients"),
    min.display.loading.value = get0("minLoading", ifnotfound = 0),
    print.type = get0("printType"),
    plot.labels = get0("scatterPlotLabels"),
    promax.kappa = get0("kappa"),
    oblimin.delta = get0("delta"),
    show.labels = !isTRUE(get0("formNames")),
    subset = QFilter)

See Also