Correlation - Distances

Computes a distance or similarity matrix comparing variables or cases. If a case contains missing values, it is omitted from the analysis. Weights are not applicable when comparing cases. See What is a Distance Matrix? for more information about distance matrices.

Example

Distance and similarity matrices are displayed as triangular heatmaps.

Options

Compare Whether variables or cases are to be compared.

Case labels If cases are compared, these are the labels to use to refer to each case. Otherwise, the case index is used.

Variables The variables that will be included in this analysis.

Variable names Whether to display Variable Names in the output instead of Variable Labels.

Categorical as binary Represents unordered categorical variables as binary variables. Otherwise, they are represented as sequential integers (i.e., 1 for the first category, 2 for the second, etc.). Numeric - Multi variables are treated according to their numeric values and not converted to binary.
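As a rough illustration of the two codings, the sketch below encodes one unordered categorical variable both ways. The function names asSequentialIntegers and asBinaryIndicators are hypothetical and are not part of Q. The sketch assumes one indicator per category, which the description above does not confirm.

```javascript
// Illustrative sketch only: these helper names are hypothetical, not Q's API.

// Sequential coding: 1 for the first category, 2 for the second, etc.
function asSequentialIntegers(values, categories) {
  return values.map((v) => categories.indexOf(v) + 1);
}

// Binary coding: one 0/1 indicator array per category.
// Assumption: every category gets its own indicator (no reference level dropped).
function asBinaryIndicators(values, categories) {
  return categories.map((c) => values.map((v) => (v === c ? 1 : 0)));
}

const categories = ["Red", "Green", "Blue"];
const colour = ["Red", "Blue", "Green", "Red"];

const sequential = asSequentialIntegers(colour, categories);
const binary = asBinaryIndicators(colour, categories);
console.log(sequential); // [1, 3, 2, 1]
console.log(binary);     // [[1, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
```

With sequential coding, Red (1) ends up further from Blue (3) than from Green (2) purely because of the category order. Binary coding avoids that artifact.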

Measure Whether to measure similarities or dissimilarities.

Similarity measure The type of similarity measure to use:

Correlation Pearson correlation.
Cosine The cosine of the angle between a pair of vectors which represent a pair of variables or cases.
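
Both similarity measures can be written in a few lines. The sketch below is not Q's implementation, and the function names are hypothetical. It relies on the fact that the Pearson correlation equals the cosine of the angle between the mean-centered vectors.

```javascript
// Illustrative sketch only; function names are hypothetical.

// Cosine of the angle between two equal-length numeric vectors.
function cosineSimilarity(x, y) {
  let dot = 0, normX = 0, normY = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    normX += x[i] * x[i];
    normY += y[i] * y[i];
  }
  return dot / Math.sqrt(normX * normY);
}

// Pearson correlation: the cosine similarity of the mean-centered vectors.
function pearsonCorrelation(x, y) {
  const mean = (v) => v.reduce((s, a) => s + a, 0) / v.length;
  const mx = mean(x), my = mean(y);
  return cosineSimilarity(x.map((a) => a - mx), y.map((a) => a - my));
}

const a = [1, 2, 3, 4];
const b = [11, 12, 13, 14]; // a shifted by a constant
console.log(pearsonCorrelation(a, b)); // 1 up to floating-point rounding
console.log(cosineSimilarity(a, b));   // below 1: cosine is sensitive to the shift
```

The example shows the practical difference: adding a constant to a vector leaves the correlation unchanged but alters the cosine.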

Distance measure The type of distance/dissimilarity measure to use. The options are as follows (refer to the documentation of the R function dist for more information):

Euclidean
Squared Euclidean The square of the Euclidean distance.
Maximum
Manhattan
Minkowski

Minkowski power (p) The power parameter for the Minkowski distance measure.
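
The distance measures are closely related: Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2 respectively, and Maximum is its limit as p grows. The sketch below illustrates these definitions on a single pair of vectors. It is not the code that Q runs (the analysis itself is performed in R), and the function names are hypothetical.

```javascript
// Illustrative sketch only; function names are hypothetical.

// Minkowski distance with power p: (sum |x_i - y_i|^p)^(1/p).
function minkowskiDistance(x, y, p) {
  const sum = x.reduce((s, xi, i) => s + Math.pow(Math.abs(xi - y[i]), p), 0);
  return Math.pow(sum, 1 / p);
}

// Manhattan and Euclidean are the special cases p = 1 and p = 2.
const manhattanDistance = (x, y) => minkowskiDistance(x, y, 1);
const euclideanDistance = (x, y) => minkowskiDistance(x, y, 2);

// Squared Euclidean: the sum of squared differences (no square root).
const squaredEuclideanDistance = (x, y) =>
  x.reduce((s, xi, i) => s + (xi - y[i]) ** 2, 0);

// Maximum: the largest absolute difference in any single coordinate.
const maximumDistance = (x, y) =>
  Math.max(...x.map((xi, i) => Math.abs(xi - y[i])));

const u = [0, 0];
const v = [3, 4];
console.log(manhattanDistance(u, v));        // 7
console.log(euclideanDistance(u, v));        // 5
console.log(squaredEuclideanDistance(u, v)); // 25
console.log(maximumDistance(u, v));          // 4
console.log(minkowskiDistance(u, v, 3));     // about 4.498
```

As p increases, the Minkowski distance moves from the Manhattan value (7) towards the Maximum value (4), so larger powers give more weight to the single largest difference.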

Data standardization The standardization method. Choices include:

None No standardization is performed.
z-scores Values are transformed to have mean zero and a standard deviation of one.
Range [-1,1] Values are divided by their range.
Range [0,1] The minimum value is subtracted from each value, and the result is divided by the range.
Mean of 1 Values are divided by their mean. If the mean is zero, the values will be unchanged.
Standard deviation of 1 Values are divided by their standard deviation.

For methods that require variation in the values (z-scores, Range [-1,1], Range [0,1] and Standard deviation of 1), if there is no variation in the values, they will be set to zero instead.
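
The following is a minimal sketch of the six methods applied to a single vector, following the descriptions above literally. The function name standardize is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text above does not say which form is used. Depending on Standardize by, the vector would be either a variable or a case.

```javascript
// Illustrative sketch only; `standardize` is a hypothetical name, not Q's API.
function standardize(values, method) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const min = Math.min(...values);
  const range = Math.max(...values) - min;
  // Assumption: sample standard deviation (n - 1 denominator).
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));

  // Methods that need variation return zeros when all values are equal.
  const needsVariation = ["z-scores", "Range [-1,1]", "Range [0,1]", "Standard deviation of 1"];
  if (needsVariation.includes(method) && range === 0) return values.map(() => 0);

  switch (method) {
    case "None": return values.slice();
    case "z-scores": return values.map((v) => (v - mean) / sd);
    case "Range [-1,1]": return values.map((v) => v / range);
    case "Range [0,1]": return values.map((v) => (v - min) / range);
    // A mean of zero leaves the values unchanged.
    case "Mean of 1": return mean === 0 ? values.slice() : values.map((v) => v / mean);
    case "Standard deviation of 1": return values.map((v) => v / sd);
    default: throw new Error("Unknown method: " + method);
  }
}

console.log(standardize([2, 4, 6], "z-scores"));    // [-1, 0, 1]
console.log(standardize([2, 4, 6], "Range [0,1]")); // [0, 0.5, 1]
console.log(standardize([2, 4, 6], "Mean of 1"));   // [0.5, 1, 1.5]
console.log(standardize([5, 5, 5], "z-scores"));    // [0, 0, 0]
```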

Standardize by Whether to standardize by variables or cases.

Measure transformation Transformation of the measures. Choices include None, Absolute values, Reverse sign and Range [0,1]. For Range [0,1], measures on the diagonal are ignored in the transformation.
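
The transformations can be pictured as element-wise operations on the computed matrix. In the sketch below, the function name transformMeasures is hypothetical. For Range [0,1], it assumes that the minimum and maximum are taken over the off-diagonal cells only and that the diagonal cells are left as they are. The description above says only that the diagonal is ignored, so this is one possible reading.

```javascript
// Illustrative sketch only; `transformMeasures` is a hypothetical name.
function transformMeasures(matrix, method) {
  // Collect off-diagonal cells; the diagonal is ignored for Range [0,1].
  const offDiagonal = [];
  matrix.forEach((row, i) =>
    row.forEach((value, j) => { if (i !== j) offDiagonal.push(value); }));
  const min = Math.min(...offDiagonal);
  const max = Math.max(...offDiagonal);

  return matrix.map((row, i) => row.map((value, j) => {
    switch (method) {
      case "None": return value;
      case "Absolute values": return Math.abs(value);
      case "Reverse sign": return 0 - value;
      case "Range [0,1]":
        // Assumption: diagonal cells (and a constant matrix) are left unchanged.
        if (i === j || max === min) return value;
        return (value - min) / (max - min);
      default: throw new Error("Unknown method: " + method);
    }
  }));
}

const correlations = [
  [1.0, 0.5, -0.5],
  [0.5, 1.0, 0.0],
  [-0.5, 0.0, 1.0],
];
console.log(transformMeasures(correlations, "Range [0,1]"));
// [[1, 1, 0], [1, 1, 0.5], [0, 0.5, 1]]
```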

Show cell values Whether to display cell values. If set to Automatic, this is decided based on an estimate of the available space.

Show row labels Whether to display row labels.

Show column labels Whether to display column labels.

Acknowledgements

Uses the R package weights and the d3heatmap htmlwidget.

Code

// JavaScript: defines the controls shown in the input form.
form.setHeading("Distances");
formCompare = form.comboBox({name: "formCompare",
                             label: "Compare",
                             alternatives: ["Cases", "Variables"],
                             default_value: "Variables",
                             prompt: "Whether to compare cases (rows) or variables (columns)"});
if (formCompare.getValue() == "Cases")
    form.dropBox({name: "formCaseLabels",
                  label: "Case labels",
                  types: ["V:text"],
                  required: false,
                  prompt: "Labels to use to refer to each case"});
form.dropBox({name: "formVariables",
              label: "Variables",
              types: ["V:numeric, categorical, ordered categorical"],
              multi: true,
              prompt: "Select variables to analyse",
              height: 8});
if (formCompare.getValue() == "Variables")
    form.checkBox({label: "Variable names", name: "formNames", default_value: false, prompt: "Display names instead of labels"});
form.checkBox({ name: "binaryCat", label: "Categorical as binary", default_value: false, prompt: "Code categorical variables as dummy variables"});
formMeasure = form.comboBox({name: "formMeasure",
                             label: "Measure",
                             alternatives: ["Dissimilarities", "Similarities"],
                             default_value: "Dissimilarities"});
if (formMeasure.getValue() == "Dissimilarities")
{
    formDistance = form.comboBox({name: "formDistance",
                                  label: "Distance measure",
                                  alternatives: ["Euclidean", "Squared Euclidean", "Maximum", "Manhattan", "Minkowski"],
                                  default_value: "Euclidean"});
    if (formDistance.getValue() == "Minkowski")
        form.numericUpDown({name: "formMinkowski",
                            label: "Minkowski power (p)",
                            default_value: 2,
                            minimum: 1,
                            maximum: 999999});
}
else
    form.comboBox({name: "formSimilarity",
                   label: "Similarity measure",
                   alternatives: ["Correlation", "Cosine"],
                   default_value: "Correlation"});

formStandardize = form.comboBox({name: "formStandardize",
                                 label: "Data standardization",
                                 alternatives: ["None", "z-scores", "Range [-1,1]", "Range [0,1]", "Mean of 1", "Standard deviation of 1"],
                                 default_value: "None",
                                 prompt: "Options to standardize the data"});
if (formStandardize.getValue() != "None")
    form.comboBox({name: "formStandardizeBy",
                   label: "Standardize by",
                   alternatives: ["Variable", "Case"],
                   default_value: "Case",
                   prompt: "Standardize by case (row) or variable (column)"});
form.comboBox({name: "formMeasureTransform",
               label: "Measure transformation",
               alternatives: ["None", "Absolute values", "Reverse sign", "Range [0,1]"],
               default_value: "None",
               prompt: "Options to transform the measure"});
form.comboBox({label: "Show cell values", name: "formCell", default_value: "Automatic",
               alternatives: ["Automatic", "Yes", "No"]});
form.comboBox({label: "Show row labels", name: "formRowLabels", default_value: "Yes",
               alternatives: ["Yes", "No"]});
form.comboBox({label: "Show column labels", name: "formColumnLabels", default_value: "Yes",
               alternatives: ["Yes", "No"]});

# R code: computes the distance matrix from the inputs collected by the form above.
library(flipDimensionReduction)

distance.matrix <- DistanceMatrix(QDataFrame(formVariables),
    compare = formCompare,
    case.labels = formCaseLabels,
    variable.names = formNames,
    binary = binaryCat,
    measure = formMeasure,
    similarity.measure = formSimilarity,
    distance.measure = formDistance,
    minkowski = formMinkowski,
    standardization = formStandardize,
    standardize.by = formStandardizeBy,
    measure.transformation = formMeasureTransform,
    show.cell.values = formCell,
    show.row.labels = formRowLabels,
    show.column.labels = formColumnLabels,
    subset = QFilter,
    weights = QCalibratedWeight)