Segments - Hierarchical Cluster Analysis


This method is only available in Q5.

Creates a dendrogram of the distances between variables using hierarchical cluster analysis.



Variables: The variables that you would like to analyze.

Distance: The formula used to compute the distance between variables, prior to clustering.

Clustering method: The algorithm used to form the clusters. The default is ward.D2, which is usually known as Ward's method.

Number of clusters: The number of clusters to color-code in the dendrogram.

Variable names: Displays variable names, rather than variable labels, in the output.

Categorical as binary: Represents unordered categorical variables as binary variables. Otherwise, they are represented as sequential integers (i.e., 1 for the first category, 2 for the second, etc.).

Label margin: Sets the width of the right-hand margin to accommodate long labels.
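
As an illustration of the options above, the following is a minimal sketch in base R, not the code that Q runs. The toy data frame and the names dat.integer and dat.binary are made up for this example, and model.matrix is used here only as a stand-in for whatever encoding TidyRawData applies when Categorical as binary is checked.

```r
# Toy data: three numeric variables and one unordered categorical variable.
dat <- data.frame(
    q1 = c(1, 2, 3, 4, 5, 6),
    q2 = c(2, 3, 4, 5, 6, 7),
    q3 = c(6, 1, 5, 2, 4, 3),
    region = factor(c("north", "south", "east", "north", "east", "south"))
)

# Categorical as binary unchecked: categories become sequential integers
# (factor levels are sorted alphabetically: east = 1, north = 2, south = 3).
dat.integer <- transform(dat, region = as.integer(region))

# Categorical as binary checked: one 0/1 indicator column per category.
indicators <- model.matrix(~ region - 1, data = dat)
dat.binary <- cbind(dat[, c("q1", "q2", "q3")], indicators)

# Distance: dist() compares rows, so the data is transposed to compare variables.
distance.matrix <- dist(t(data.matrix(dat.integer)), method = "euclidean")

# Clustering method: passed straight to hclust().
hc <- hclust(distance.matrix, method = "ward.D2")

# Number of clusters: cutree() assigns each variable to one of k groups.
groups <- cutree(hc, k = 2)
print(groups)
```

Swapping dat.integer for dat.binary in the dist() call clusters the indicator columns as separate variables, which is the practical difference between the two encodings.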


The R package networkD3 is used to create the dendrogram.
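
For readers who want to try networkD3 outside Q, here is a small sketch under a few assumptions: the networkD3 package is installed, the built-in USArrests dataset stands in for real survey variables, and the width, height and margin values are arbitrary. The names cluster.palette, label.colours and widget are illustrative only.

```r
# Cluster the four variables (columns) of a built-in dataset.
hc <- hclust(dist(t(data.matrix(USArrests))), method = "ward.D2")

# Colour each variable label according to its cluster.
k <- 2
cluster.palette <- c("red", "blue", "green", "brown", "orange")
label.colours <- cluster.palette[cutree(hc, k)]

# Only build the widget if networkD3 is available.
if (requireNamespace("networkD3", quietly = TRUE)) {
    widget <- networkD3::dendroNetwork(hc,
        textColour = label.colours,
        width = 600,
        height = 400,
        margins = list(top = 0, right = 200, bottom = 0, left = 5))
}
```

Printing widget in an interactive R session displays the dendrogram; a larger right margin leaves more room for long labels.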


form.setHeading('Hierarchical Cluster Analysis');
form.dropBox({ name: "formVariables", label: "Variables", types: ["V:numeric, categorical, ordered categorical"], multi:true });
form.numericUpDown({name: "formClusters", label: "Number of clusters", default_value: 1, increment: 1, maximum:100, minimum: 1});
form.comboBox({ name: "formDistanceMethod", label: "Distance", alternatives: ["euclidean", "maximum", "manhattan", "canberra", "binary"], default_value: "euclidean" });
form.comboBox({ name: "formClusteringMethod", label: "Clustering method", alternatives: ["ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"], default_value: "ward.D2"});
form.checkBox({label: "Variable names", name: "formNames", default_value: false});
form.checkBox({ name: "binaryCat", label: "Categorical as binary", default_value: false });
form.numericUpDown({name: "formLabelMargin", label: "Label margin", default_value: 200, increment: 50, minimum: 0, maximum: 10000});

dat <- TidyRawData(QDataFrame(formVariables), subset = QFilter, as.binary = binaryCat,
                    weights = QCalibratedWeight, as.numeric = TRUE,
                    missing = "Exclude cases with missing data",
                    extract.common.lab.prefix = !formNames)
# Use variable labels as column names unless variable names were requested.
if (!formNames)
    colnames(dat) <- sapply(dat, attr, "label")

# Apply the weights, if any, by multiplying each case (row) by its weight.
weights <- attr(dat, "weights")
if (!is.null(weights))
    dat <- sweep(dat, 1, weights, "*")

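# The number of clusters cannot exceed the number of variables being clustered.
number.segments <- formClusters
if (number.segments > ncol(dat))
    stop("You have requested more clusters than there are variables in the analysis.")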

# Computing the distance matrix.
distance.matrix <- dist(t(data.matrix(dat)), method = formDistanceMethod)
# Hierarchical cluster analysis.
hc <- hclust(distance.matrix, formClusteringMethod)
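# Repeat the palette until it covers the requested number of clusters,
# then color each label according to its cluster.
qColors <- c("red", "blue", "green", "brown", "orange")
while (number.segments > length(qColors))
    qColors <- c(qColors, qColors)
colors <- qColors[cutree(hc, number.segments)]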

# Draw the dendrogram; dendroNetwork() is provided by the networkD3 package.
library(networkD3)
hca.2 <- dendroNetwork(hc,
    textColour = colors,
    width = QOutputSizeWidth * 72 - 20,
    height = QOutputSizeHeight * 72 - 20,
    margins = list(top = 0, right = formLabelMargin, bottom = 0, left = 5))
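
Two steps in the code above can be checked in isolation: the weighting with sweep and the palette doubling. The sketch below uses made-up values (a three-case data frame, three weights, and a request for seven clusters) purely for illustration.

```r
# Weighting: sweep() with MARGIN = 1 multiplies every value in a row by that row's weight.
dat <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
weights <- c(0.5, 1, 2)
weighted <- sweep(dat, 1, weights, "*")
# weighted$a is 0.5, 2, 6 and weighted$b is 5, 20, 60

# Palette: doubling the five base colors until there are at least as many as clusters.
number.segments <- 7
qColors <- c("red", "blue", "green", "brown", "orange")
while (number.segments > length(qColors))
    qColors <- c(qColors, qColors)
# length(qColors) is now 10, so clusters 6 and 7 reuse "red" and "blue"
```

Because the palette repeats, distinct clusters can share a color once more than five clusters are requested.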