Latent Class Analysis and Mixture Models

Types of latent class analysis

There are two qualitatively different varieties of latent class analysis in widespread use in survey research:

  1. Latent class regression, where the purpose of the analysis is to identify segments that differ in their model parameters. This model is most commonly used for creating segments with choice modeling data.
  2. Model-based clustering, where a series of numeric, categorical or ranking variables are used to create segments. This variant of latent class analysis is most commonly applied when creating segments from attitudinal data. It is essentially an improvement on cluster analysis, in that it can deal with multiple types of data (rather than just numeric data) and it automatically addresses missing values.

In a formal statistical sense, these are just applications of the same model. Within Q, when Segments is used to conduct latent class analysis, it automatically chooses which of these models to run based on the data that is selected. If the data is an Experiment, such as a choice model, then Q's latent class analysis is latent class regression. If the data consists of numeric ratings, rankings, categorical variables or binary variables, Q's latent class analysis is model-based clustering. If the data consists of both an Experiment and, say, ratings, then Q will estimate a latent class model using both types of data.
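
The selection rule described above can be sketched as a small function. This is only an illustration of the logic in the text, not Q's actual code; the function name, its arguments and the returned labels are hypothetical.

```python
def choose_latent_class_model(has_experiment: bool, has_other_data: bool) -> str:
    """Illustrative dispatch between the varieties of latent class analysis.

    has_experiment: the selected data includes an Experiment (e.g., a choice model).
    has_other_data: the selected data includes ratings, rankings, categorical
                    or binary variables.
    The returned strings are descriptive labels, not names used by Q.
    """
    if has_experiment and has_other_data:
        return "latent class model using both types of data"
    if has_experiment:
        return "latent class regression"
    if has_other_data:
        return "model-based clustering"
    raise ValueError("no data selected")


print(choose_latent_class_model(True, False))
print(choose_latent_class_model(False, True))
```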

Mixture distribution

A latent class model assumes the existence of a latent categorical variable (i.e., it assumes that the population consists of a finite number of types of people). This is the default mixing distribution used in Q (when running Create > Segments). Alternative Distributions can be chosen.
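
To make the idea of a latent categorical variable concrete, the sketch below computes the probability that a respondent belongs to each class, given their answers to a set of binary items. It assumes that items are independent within a class (the usual local independence assumption) and that the class sizes and item probabilities are already known; all names and numbers are hypothetical, and this is not a description of Q's estimation routine.

```python
from math import prod


def posterior_class_probabilities(responses, class_sizes, item_probs):
    """Return P(class | responses) for one respondent via Bayes' rule.

    responses:   list of 0/1 answers, one per item.
    class_sizes: prior share of the population in each class (sums to 1).
    item_probs:  item_probs[s][j] = P(item j = 1 | class s).
    Assumes items are independent within a class.
    """
    joint = []
    for size, probs in zip(class_sizes, item_probs):
        # Likelihood of the observed answers if the respondent is in this class.
        likelihood = prod(p if r == 1 else 1.0 - p for r, p in zip(responses, probs))
        joint.append(size * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


# Two hypothetical classes and three binary items.
posterior = posterior_class_probabilities(
    responses=[1, 1, 0],
    class_sizes=[0.6, 0.4],
    item_probs=[[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]],
)
print(posterior)
```

In this example the respondent's answers are far more likely under the first class, so most of the posterior probability is assigned to it.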

The standard alternative to the latent class model is to assume that a single multivariate normal distribution describes the population. In marketing research, this model is sometimes referred to as 'hierarchical Bayes'. A generalization of this model is to estimate multiple multivariate normal distributions (i.e., one per segment). This model can, in theory, approximate any type of heterogeneity[1] (i.e., it is a substantially more general model than 'hierarchical Bayes'). Q fits this model when the user sets Distribution to Multivariate Normal - Full Covariance; if a single segment is specified, the model is essentially the same as hierarchical Bayes.
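
In symbols, the mixture of multivariate normals can be written as follows. The notation is introduced here for illustration and is not taken from Q's documentation: \beta_i is respondent i's parameter vector, S is the number of segments, \pi_s is the size of segment s, and \mu_s and \Sigma_s are that segment's mean vector and covariance matrix.

```latex
f(\beta_i) = \sum_{s=1}^{S} \pi_s \, \mathcal{N}(\beta_i \mid \mu_s, \Sigma_s),
\qquad \pi_s \ge 0, \quad \sum_{s=1}^{S} \pi_s = 1 .
% With a single segment (S = 1) the mixture reduces to one multivariate normal:
\beta_i \sim \mathcal{N}(\mu, \Sigma) .
```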

Additional mixture models are available by setting Distribution. These constrain the covariance matrix (e.g., by assuming that it is identical across classes, diagonal, block diagonal or spherical).
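
One way to see what these constraints do is to count the free covariance parameters that each one leaves to be estimated. The sketch below is illustrative only: the structure labels are hypothetical (they are not Q's option names), and it assumes that the diagonal, block diagonal and spherical matrices are estimated separately for each class.

```python
def covariance_parameter_count(structure, n_dims, n_classes, block_sizes=None):
    """Count free covariance parameters under a given constraint.

    structure:   one of the hypothetical labels "full", "identical",
                 "diagonal", "block_diagonal" or "spherical".
    n_dims:      number of parameters per respondent (dimension of the normal).
    n_classes:   number of classes (segments).
    block_sizes: sizes of the diagonal blocks, required for "block_diagonal".
    """
    full = n_dims * (n_dims + 1) // 2  # unique entries of a symmetric matrix
    if structure == "full":
        return n_classes * full        # one unconstrained matrix per class
    if structure == "identical":
        return full                    # one matrix shared by all classes
    if structure == "diagonal":
        return n_classes * n_dims      # variances only, no covariances
    if structure == "spherical":
        return n_classes               # a single variance per class
    if structure == "block_diagonal":
        if block_sizes is None or sum(block_sizes) != n_dims:
            raise ValueError("block_sizes must sum to n_dims")
        return n_classes * sum(b * (b + 1) // 2 for b in block_sizes)
    raise ValueError(f"unknown structure: {structure}")


for name in ["full", "identical", "diagonal", "spherical"]:
    print(name, covariance_parameter_count(name, n_dims=4, n_classes=3))
print("block_diagonal", covariance_parameter_count("block_diagonal", 4, 3, [2, 2]))
```

The more constrained structures need far fewer parameters, which is the usual reason for choosing them when the sample is small relative to the number of parameters.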

Response variable type (e.g., linear, categorical)

As in the rest of Q, the types of models that are estimated are determined automatically by the program by looking at the Question Types and Variable Types.

How the data is set up in Q                                                           Statistical model
Question Type = Experiment, Dependent Variable's Variable Type = Numeric              Linear regression (e.g., latent class linear regression)
Question Type = Experiment, Dependent Variable's Variable Type = Categorical          Multinomial Logit (e.g., latent class logit)
Question Type = Experiment, Dependent Variable's Variable Type = Ordered Categorical  Rank-Ordered Logit with Ties
Question Type = Pick Any
Question Type = Pick One - Multi
Question Type = Pick One
Question Type = Ranking                                                               Rank-Ordered Logit Model with Ties
Question Type = Number                                                                Normal
Question Type = Number - Multi                                                        Multivariate Normal

Save Individual-Level Parameter Means and Standard Deviations

When a mixture model is created using the Experiment question type, Q is able to produce estimates of the parameters for each respondent. This is done by right-clicking on the tree-like output and selecting Save Individual-Level Parameter Means and Standard Deviations. See Individual-Level Parameters for more information.
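
As a rough illustration of what an individual-level mean and standard deviation can look like in a latent class model, the sketch below weights each class's parameter by the respondent's posterior probability of belonging to that class. This is one common approach and an assumption made here for illustration; the source does not state how Q computes these values, and the function name and numbers are hypothetical.

```python
from math import sqrt


def individual_level_parameter(posteriors, class_params):
    """Posterior-weighted mean and standard deviation of one parameter.

    posteriors:   the respondent's probability of membership in each class.
    class_params: the estimated value of the parameter in each class.
    """
    mean = sum(p * b for p, b in zip(posteriors, class_params))
    variance = sum(p * (b - mean) ** 2 for p, b in zip(posteriors, class_params))
    return mean, sqrt(variance)


# A respondent who is 75% likely to be in a class where the parameter is 2.0
# and 25% likely to be in a class where it is -2.0.
mean, sd = individual_level_parameter([0.75, 0.25], [2.0, -2.0])
print(mean, sd)
```

A respondent who belongs to one class with near certainty gets that class's parameter and a standard deviation close to zero; uncertainty about class membership shows up as a larger standard deviation.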

See also



Further reading: Latent Class Analysis Software

  1. Kenneth Train (2009), Discrete Choice Methods with Simulation, Second edition, Cambridge University Press.