Choice Modeling - Latent Class Analysis

Analyse a choice-based conjoint experiment with Latent Class Analysis.

The experimental design and respondent choices are required. These may be provided together (as an Experiment question or Sawtooth CHO format file), or separately with the design as an Experimental design R output, Sawtooth dual file or JMP file, plus respondent choices and tasks as variables. Simulated responses may be used in place of respondent choices.

How to Create

  1. Add the object:
    1. In Displayr: Insert > More > Choice Modeling > Latent Class Analysis
    2. In Q: Automate > Browse Online Library > Choice Modeling > Latent Class Analysis
  2. In Inputs > Design source, choose the appropriate source from the list of options.
  3. Fill in any other required fields, which are highlighted in red.

Example

Output Example:
The table below shows the output of the analysis, containing histograms of the estimated parameters of the respondents:


Input Example:
The analysis can take various inputs as described in the Options section. The above output was created using an Experiment question, which looks like the table below. You can learn more about how to set up Experiment questions on our Category:Experiments page.


Options

EXPERIMENTAL DESIGN

Design source The source of the experimental design. Choices include Data set, Experimental design R output, Sawtooth CHO format, Sawtooth dual file format as data set, JMP format as data set and Experiment question.

Version A variable containing the version indices (first column) from the (Sawtooth or JMP) design file, which has been uploaded as a data set.

Task A variable containing the task indices (second column) from the (Sawtooth or JMP) design file, which has been uploaded as a data set.

Attributes Variables containing the attributes from the (Sawtooth or JMP) design file, which has been uploaded as a data set. Alternative-specific designs are supported (attributes that do not apply to an alternative are coded as a 0).
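
As a rough illustration of how the Version, Task and Attributes columns fit together, the sketch below infers the number of alternatives per task from the first point at which the task index increases by one, and then numbers the alternatives within each task. It mirrors the logic of the R code at the end of this page; the function names and the toy task column are assumptions made for the example only.

```javascript
// Hypothetical helper: rows per task, taken from the first increase in the task index.
function alternativesPerTask(task) {
    for (let i = 1; i < task.length; i++)
        if (task[i] - task[i - 1] === 1) return i;
    return null; // no increase found, so the count cannot be inferred
}

// Hypothetical helper: repeat 1..n down the rows of the design.
function alternativeIndex(task) {
    const n = alternativesPerTask(task);
    return task.map((_, row) => (row % n) + 1);
}

// Toy task column: two tasks with three alternatives each.
const taskColumn = [1, 1, 1, 2, 2, 2];
const nAlternatives = alternativesPerTask(taskColumn);
const alternativeColumn = alternativeIndex(taskColumn);
console.log(nAlternatives, alternativeColumn);
```

The sketch assumes every task has the same number of rows and that rows are sorted by version and then task, as in a Sawtooth dual file.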

Experimental design A Choice Model Design output.

CHO file text variable A text variable of lines from the CHO file, which has been uploaded as a data set. Note that the CHO file first needs to be renamed to have a file extension of .txt instead of .cho, so that it can be uploaded to Q as a data set.

Enter attribute levels Attribute levels for the design that are entered into a spreadsheet-style data editor. Each column begins with an attribute name and is followed by its attribute levels.

Code some categorical attributes as numeric Whether to treat some categorical attributes as numeric. If checked, a text box will appear below to allow the attribute and numeric coding to be specified as a comma-separated list, e.g. Weight, 1, 2, 3, 4. When one text box is filled, another text box will appear for another attribute to be specified.
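
The following is a minimal sketch of how a text box entry such as Weight, 1, 2, 3, 4 can be split into an attribute name and its numeric coding. The helper name parseNumericCoding is hypothetical; the actual parsing is done by the R code at the end of this page, which reports a similar error when a level is not numeric.

```javascript
// Hypothetical helper: split "Weight, 1, 2, 3, 4" into a name and numeric codes.
function parseNumericCoding(text) {
    const parts = text.split(",").map(s => s.trim());
    const name = parts[0];
    // Treat empty entries as invalid rather than letting Number("") become 0.
    const codes = parts.slice(1).map(s => (s === "" ? NaN : Number(s)));
    if (name === "" || codes.length === 0 || codes.some(x => Number.isNaN(x)))
        throw new Error("The coding for the levels of " + name + " needs to be numeric.");
    return { name: name, codes: codes };
}

const coding = parseNumericCoding("Weight, 1, 2, 3, 4");
console.log(coding.name, coding.codes);
```

The first entry is taken as the attribute name and the remaining entries as the numeric values of its levels, in order.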

Experiment question A choice-based conjoint Experiment question.

RESPONDENT DATA

Data source The respondent choice data to use. The available options depend on which Design source was chosen. One option is Simulated choices from priors. If this is selected, a button called "Enter priors" will appear immediately below, allowing priors to be entered. The priors must follow the same format as those for Choice Modeling - Experimental Design. If priors are required for numeric attributes, place the numeric attribute name, mean and sd (optional) in the top row (the same as for categorical attributes), and in the second row, repeat the numeric attribute name followed by the values of the mean and sd for the numeric parameter.

Respondent IDs A variable containing respondent IDs corresponding to those in the CHO file.

Prior source Choose between using priors from the choice model design output or manually entering the priors. If the design output contains no priors, prior means and standard deviations of 0 are assumed. Available when Experimental design R output is selected as the Design source and Simulated choices from priors is selected as the Data source.

Simulated sample size The number of simulated respondents to generate.

Choices Variables containing the choices made by respondents.

Tasks Variables containing the sets of tasks that have been presented to respondents.

Version A variable containing the versions of tasks presented to respondents.

Missing data See Missing Data Options.

MODEL

Type The type of model to fit. The options are Multinomial logit, Latent class analysis and Hierarchical Bayes.

Number of classes The number of classes in the latent class analysis over respondents.

Questions left out for cross-validation The number of questions to leave out per respondent to be used for cross-validation.
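
To make the idea concrete, here is a minimal sketch of holding out a fixed number of randomly chosen questions per respondent, with a seed so that the split can be reproduced. The generator (mulberry32) and the helper leftOutQuestions are assumptions for illustration; they are not the random number generator or the sampling code that R uses.

```javascript
// Small seeded generator (mulberry32) so that a seed reproduces the same split.
// This is not the generator R uses; it only illustrates the role of the seed.
function mulberry32(seed) {
    let a = seed >>> 0;
    return function () {
        a = (a + 0x6D2B79F5) | 0;
        let t = Math.imul(a ^ (a >>> 15), a | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Hypothetical helper: choose nLeftOut of nQuestions (0-based) for one respondent.
function leftOutQuestions(nQuestions, nLeftOut, random) {
    const indices = Array.from({ length: nQuestions }, (_, i) => i);
    // Partial Fisher-Yates shuffle: the first nLeftOut entries become the sample.
    for (let i = 0; i < nLeftOut; i++) {
        const j = i + Math.floor(random() * (nQuestions - i));
        [indices[i], indices[j]] = [indices[j], indices[i]];
    }
    return indices.slice(0, nLeftOut).sort((x, y) => x - y);
}

// Three respondents, ten questions each, two questions held out per respondent.
const random = mulberry32(123);
const heldOut = [0, 1, 2].map(() => leftOutQuestions(10, 2, random));
console.log(heldOut);
```

The held-out questions are excluded from fitting and used only to check out-of-sample prediction; the remaining questions are used to estimate the model.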

Alternative-specific constants Whether to include alternative-specific constants in the model.

Seed The random seed used to determine the random initial parameters of the model and also used to determine the random questions to leave out for cross-validation.

Number of starts The number of times to run LCA with different initial parameters, where the final model is the one with the best log-likelihood.
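
The selection rule described above can be sketched as follows. The functions bestOfStarts and toyFit are hypothetical stand-ins: a real fit would estimate the model from random initial parameters, and the way a seed is derived for each start here is an assumption, not a description of the underlying R package.

```javascript
// Hypothetical sketch: keep the fit with the highest log-likelihood across starts.
function bestOfStarts(fitOnce, nStarts, seed) {
    let best = null;
    for (let start = 0; start < nStarts; start++) {
        const fit = fitOnce(seed + start); // assumed: one derived seed per start
        if (best === null || fit.logLikelihood > best.logLikelihood)
            best = fit;
    }
    return best;
}

// Toy stand-in for a model fit: the log-likelihood depends only on the seed.
function toyFit(seed) {
    const table = { 123: -512.4, 124: -498.7, 125: -505.1 };
    return { seed: seed, logLikelihood: table[seed] };
}

const bestFit = bestOfStarts(toyFit, 3, 123);
console.log(bestFit.seed, bestFit.logLikelihood); // 124 -498.7
```

More starts reduce the chance of reporting a poor local optimum, at the cost of a proportionally longer run time.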

DIAGNOSTICS

Class parameters Produces a table of class parameters from a latent class analysis.

Parameter statistics table Parameter statistics from the choice model.

Utilities plot Plots the utility of variables in a Choice Model.

SAVE VARIABLE(S)

Save class membership probabilities Saves variables that contain the probability of each case being in each latent class (if applicable).

Save individual-level coefficients Saves variables that contain the estimated coefficients for each case (e.g., respondent).

Save class membership Saves a variable to the data set containing class membership (i.e., the class with the highest posterior probability of class membership for the case).
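
As an illustration of how the class membership probabilities relate to class membership, the sketch below applies the standard latent class rule: each class's size is multiplied by the likelihood of the respondent's choices under that class, the products are normalized to sum to 1, and the class with the highest posterior probability is assigned. The function names and the toy numbers are assumptions for the example.

```javascript
// Posterior probability of each class for one respondent.
// classSizes[c]: estimated share of class c; likelihoods[c]: likelihood of the
// respondent's choices under the parameters of class c (both assumed inputs).
function posteriorProbabilities(classSizes, likelihoods) {
    const joint = classSizes.map((size, c) => size * likelihoods[c]);
    const total = joint.reduce((a, b) => a + b, 0);
    return joint.map(j => j / total);
}

// Class with the highest posterior probability, as a 1-based label.
function classMembership(probabilities) {
    let best = 0;
    for (let c = 1; c < probabilities.length; c++)
        if (probabilities[c] > probabilities[best]) best = c;
    return best + 1;
}

const posterior = posteriorProbabilities([0.6, 0.4], [0.002, 0.009]);
const membership = classMembership(posterior);
console.log(posterior, membership);
```

In this toy case the smaller class is assigned, because the respondent's choices are much more likely under its parameters.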

Save proportion of correct predictions Saves a variable to the data set containing the proportion of correct predictions for each case (e.g., respondent).

Save RLH Saves a variable to the data set containing the root likelihood for each case (e.g., respondent).
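
The two per-case fit measures above can be sketched as follows. The root likelihood is computed here as the geometric mean of the predicted probabilities of the alternatives that were actually chosen, and a prediction is counted as correct when the chosen alternative has the highest predicted probability. The input layout and function names are assumptions for the example, and the saved variables may be computed differently in detail (for instance, in how ties or left-out questions are treated).

```javascript
// probabilities[t][a]: predicted probability of alternative a in task t (assumed input).
// choices[t]: 0-based index of the alternative chosen in task t.
function rootLikelihood(probabilities, choices) {
    let logSum = 0;
    for (let t = 0; t < choices.length; t++)
        logSum += Math.log(probabilities[t][choices[t]]);
    return Math.exp(logSum / choices.length); // geometric mean
}

function proportionCorrect(probabilities, choices) {
    let correct = 0;
    for (let t = 0; t < choices.length; t++) {
        const p = probabilities[t];
        const predicted = p.indexOf(Math.max(...p));
        if (predicted === choices[t]) correct++;
    }
    return correct / choices.length;
}

// Toy respondent: four tasks with three alternatives each.
const taskProbabilities = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.25, 0.5, 0.25], [0.1, 0.8, 0.1]];
const taskChoices = [0, 2, 0, 1];
const rlh = rootLikelihood(taskProbabilities, taskChoices);
const hitRate = proportionCorrect(taskProbabilities, taskChoices);
console.log(rlh, hitRate);
```

With three alternatives per task, a model that predicts no better than chance would give an RLH of about 1/3, so values well above that indicate a better fit for that case.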

Save utilities (mean = 0) Saves variables that contain utilities scaled to have mean of 0 (within each attribute).

Save utilities (min = 0, mean range = 100) Saves variables that contain utilities scaled to have a minimum of 0 (within attribute) with a mean range of 100 (for each case).

Save utilities (min = 0, max range = 100) Saves variables that contain utilities scaled to have a minimum of 0 (within attribute) with a maximum range of 100 (for each case).

Save utilities (min = 0) Saves variables that contain utilities scaled to have a minimum of 0 (within each attribute).

Save utilities (mean = 0, mean range = 100) Saves variables that contain utilities scaled to have a mean of 0 (within attribute) with a mean range of 100 (for each case).

Save utilities (mean = 0, max range = 100) Saves variables that contain utilities scaled to have a mean of 0 (within attribute) with a maximum range of 100 (for each case).
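
The six scaling options above combine a centering choice (mean = 0 or min = 0 within each attribute) with an optional range choice (mean range or max range of 100 across attributes). The sketch below shows one plausible reading of these options for a single case. The function scaleUtilities, its arguments and the toy utilities are assumptions for illustration, and the saved variables may be computed differently in detail.

```javascript
// utilities: { attributeName: [utility of each level] } for one case (assumed layout).
// center: "mean" or "min"; rangeType: "mean", "max" or null (no range scaling).
function scaleUtilities(utilities, center, rangeType) {
    const names = Object.keys(utilities);
    const ranges = names.map(n => Math.max(...utilities[n]) - Math.min(...utilities[n]));
    let divisor = 1;
    if (rangeType === "mean")
        divisor = ranges.reduce((a, b) => a + b, 0) / ranges.length / 100;
    else if (rangeType === "max")
        divisor = Math.max(...ranges) / 100;
    const scaled = {};
    for (const n of names) {
        const u = utilities[n];
        // Offset is the attribute's minimum or its mean, depending on the option.
        const offset = center === "min"
            ? Math.min(...u)
            : u.reduce((a, b) => a + b, 0) / u.length;
        scaled[n] = u.map(x => (x - offset) / divisor);
    }
    return scaled;
}

const rawUtilities = { Price: [1, 0, -2], Brand: [0.5, -0.5] };
const minMax = scaleUtilities(rawUtilities, "min", "max");    // min = 0, max range = 100
const meanMean = scaleUtilities(rawUtilities, "mean", "mean"); // mean = 0, mean range = 100
console.log(minMax, meanMean);
```

In the first result the attribute with the widest range (Price) spans roughly 0 to 100; in the second, each attribute's utilities sum to approximately 0 and the two attribute ranges average 100.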

Additional Properties

When using this feature you can obtain additional information that is stored by the R code which produces the output.

  1. To do so, select Create > R Output.
  2. In the R CODE, paste: item = YourReferenceName
  3. Replace YourReferenceName with the reference name of your item. Find this in the Report tree or by selecting the item and then going to Properties > General > Name from the object inspector on the right.
  4. Below the first line of code, you can paste in snippets from below or type in str(item) to see a list of available information.

For a more in-depth discussion on extracting information from objects in R, check out our blog post here.


Code

var allow_control_groups = Q.fileFormatVersion() > 10.9; // Group controls for Displayr and later versions of Q

// Collect controls in an array and define "formType" comboBox first so that we can only define "formDRN" if
// formType == "Hierarchical Bayes" (dual-response nones are not implemented for MNL and LCA)
var controls = [];

if (allow_control_groups)
    form.group("Model"); // make sure this cbox appears in right group

var type = form.comboBox({name: "formType", label: "Type", 
                          prompt: "Type of choice model to fit",
                          alternatives: ["Multinomial logit",
                                         "Latent class analysis",
                                         "Hierarchical Bayes"],
                          default_value: "Latent class analysis"});

if (allow_control_groups)
    form.group(null);  // reset group

var is_mnl = type.getValue() == "Multinomial logit";
var is_lc = type.getValue() == "Latent class analysis";
var is_hb = type.getValue() == "Hierarchical Bayes";

function data_source_form()
{
    return(form.comboBox({
        label: "Data source", name: "formDataSource",
        alternatives: ["Choice and task variables in data set",
                       "Choice and version variables in data set",
                       "Simulated choices from priors"],
        default_value: "Choice and task variables in data set"
    }));
}

function choices_form(controls)
{
    var db = form.dropBox({name: "formChoices", label: "Choices", required: true, multi: true,
                  prompt: "Select variables from data sets containing the respondent choices for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function tasks_form(controls)
{
    var db = form.dropBox({name: "formTasks", label: "Tasks", required: true, multi: true,
                  prompt: "Select variables from data sets containing the task numbers shown to the respondents for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function drn_form(controls)
{
    var db = form.dropBox({name: "formDRN", label: "Dual-response 'none' choice", required: false, multi: true,
                  prompt: "Select variables from data sets containing the respondent dual-response 'none' question choices for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function version_form(controls)
{
    var db = form.dropBox({name: "formDataVersion", label: "Version", required: true,
                      prompt: "Select a variable from a data set containing the version numbers shown to the respondents",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function design_form(controls, is_hb)
{
    var db = form.dropBox({label: "Version",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
			   name: "formVersion", multi: false, prompt: "Version variable from design (first column)"});
    controls.push(db);
    db = form.dropBox({label: "Task",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
                       name: "formTask", multi: false, prompt: "Task variable from design (second column)"});
    controls.push(db);
    db = form.dropBox({label: "Attributes",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
                       name: "formAttributes", multi: true, prompt: "Attribute variables from design"});
    controls.push(db);
    var de = form.dataEntry({name: "formEnteredLevels",
                    prompt: "Enter attribute names and levels",
                    label: "Enter attribute levels"});
    controls.push(de);

    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data");
    var ds_control = data_source_form();
    controls.push(ds_control);
    var data_source = ds_control.getValue();
    if (data_source == "Choice and task variables in data set")
    {
        controls = choices_form(controls);
        controls = tasks_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Choice and version variables in data set")
    {
        controls = choices_form(controls);
        controls = version_form(controls);
	if (is_hb)
            controls = drn_form(controls)
    }
    else if (data_source == "Simulated choices from priors")
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
    return(controls);
}

function numeric_form(controls)
{
    var cb = form.checkBox({label: "Code some categorical attributes as numeric",
                            name: "formNumeric", default_value: false});
    var has_numeric = cb.getValue();
    controls.push(cb);
    if (has_numeric)
    {
        var i = 1;
	var tb = form.textBox();
	var attribute = "";
        while (i == 1 || attribute != "") {
            tb = form.textBox({name: "formNumericAttribute" + i,
                                      label: "Attribute " + i,
                                      prompt: "Attribute name followed by numeric coding for each level, delimited by commas",
                               required: i == 1})
	    controls.push(tb);
            attribute = tb.getValue();
            ++i;
        }
    }
    return(controls);
}

function prior_form(controls)
{
    var de = form.dataEntry({name: "formSimulatedPriors",
                    prompt: "Enter priors to use to generate simulated data.",
			     label: "Enter priors"});
    controls.push(de);
    return(controls);
}

function sample_size_form(controls)
{
    var nup = form.numericUpDown({name: "formSampleSize",
                        label: "Simulated sample size",
                        default_value: 300,
                        increment: 100,
                        maximum:1000000,
				  minimum: 0});
    controls.push(nup);
    return(controls);
}

var web_mode = (!!Q.isOnTheWeb && Q.isOnTheWeb());
if (Q.fileFormatVersion() < 12.31 && !web_mode)
{
    var msg = "A newer version of Q (version 5.3) is required to run Hierarchical Bayes. Please contact support@q-researchsoftware.com to upgrade.";
    alert(msg);
    throw msg;
}

if (allow_control_groups)
    form.group("Experimental design")

var dataset = "Data set"
var experiment = web_mode ? "Experiment variable set" : "Experiment question";
var cho = "Sawtooth CHO format";
var dual = "Sawtooth dual file format as data set";
var jmp = "JMP format as data set";
var design = "Experimental design R output";
var dt_cb = form.comboBox({name: "formDataType",
                               prompt: "What input format is your experimental design in?",
                               label: "Design source",
                               alternatives: [dataset, design, cho, dual, jmp, experiment],
                           default_value: dataset});
controls.push(dt_cb);
var data_type = dt_cb.getValue();

if (data_type == dataset || data_type == dual || data_type == jmp){
    controls = design_form(controls, is_hb);
}else if (data_type == experiment)
{
    var db = form.dropBox({label: experiment,
                  types:["Question: Experiment"],
                  prompt: "Select an " + experiment + " from a data set",
                  name: "formExperiment", multi: false});
    controls.push(db);
    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data")

    var sp_cb = form.comboBox({
	      label: "Data source", name: "formDataSource",
	      alternatives: [experiment, "Simulated choices from priors"],
	      default_value: experiment
    });
    controls.push(sp_cb);
    var simulated_prior = sp_cb.getValue() == "Simulated choices from priors";
    if (simulated_prior)
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
}
else if (data_type == design)
{
    var db = form.dropBox({name: "formDesignObject", label: "Experimental design",
                  required: true, multi: false,
                  prompt: "Select an output from Choice Modeling - Experimental Design",
                  types: ["RItem:ChoiceModelDesign"]});
    controls.push(db);
    controls = numeric_form(controls);
    
    if (allow_control_groups)
        form.group("Respondent data")

    var ds_control = data_source_form();
    controls.push(ds_control);
    var data_source = ds_control.getValue();
    if (data_source == "Choice and task variables in data set")
    {
        controls = choices_form(controls);
        controls = tasks_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Choice and version variables in data set")
    {
        controls = choices_form(controls);
        controls = version_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Simulated choices from priors")
    {
	var sp_cb = form.comboBox({
            name: "formPriorSource",
            label: "Prior source",
            alternatives: ["Use priors from design", "Enter priors"],
            default_value: "Use priors from design"
        });
	controls.push(sp_cb);
        var simulated_prior_from_design = sp_cb.getValue() == "Use priors from design";
        if (!simulated_prior_from_design)
            controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
}
else if (data_type == cho)
{
    var db = form.dropBox({name: "formChoVariable",
                  label: "CHO file text variable",
                  types: ["Variable: Text"],
			   prompt: "Text variable from an uploaded CHO file"});
    controls.push(db);
    var de = form.dataEntry({name: "formEnteredLevels",
                    prompt: "Enter attribute names and levels",
                    label: "Enter attribute levels"});
    controls.push(de);
    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data")
    var sp_cb = form.comboBox({
        label: "Data source", name: "formDataSource",
        alternatives: ["CHO file", "Simulated choices from priors"],
        default_value: "CHO file"
    });
    controls.push(sp_cb);
    
    var simulated_prior = sp_cb.getValue() == "Simulated choices from priors";
    if (simulated_prior)
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
    else{
        db = form.dropBox({label: "Respondent IDs",
                      types:["Question: Number"],
			   name: "formRespondentID", multi: false, required: false});
	controls.push(db);
    }
}
    
var cb = form.comboBox({label: "Missing data", name: "formMissing",
               prompt: "How should missing data be handled?",
               alternatives: ["Error if missing data", "Exclude cases with missing data", "Use partial data"],
               default_value: "Use partial data"});
controls.push(cb);

if (allow_control_groups)
    form.group("Model");

controls.push(type);

var nup = [];
if (is_lc)
{
    nup = form.numericUpDown({name: "formClassesLC", label: "Number of classes", 
                        prompt: "Add latent classes to the model",  
                        default_value: 2, increment: 1, maximum:100, minimum: 1});
    controls.push(nup);
}
if (is_hb){ // separate controls for HB and LC so classes are not carried over
    nup = form.numericUpDown({name: "formClassesHB", label: "Number of classes", 
                        prompt: "Add latent classes to the model",   
                        default_value: 1, increment: 1, maximum:100, minimum: 1});
    controls.push(nup);
}
nup = form.numericUpDown({name: "formCV", label: "Questions left out for cross-validation", 
                    prompt: "Number of questions to exclude from fitting and use for out-of-sample prediction",
                    default_value: 0, increment: 1, maximum:100, minimum: 0});
controls.push(nup);

if (data_type != experiment){
    var cb = form.checkBox({label: "Alternative-specific constants",
                   name: "formASC", default_value: true,
                   prompt: "Include alternative-specific constants in the model"});
    controls.push(cb);
}
if (is_hb)
{
    nup = form.numericUpDown({name: "formSeed", label: "Seed", default_value: 123, 
                        prompt: "The random seed", minimum: -999999999, maximum: 999999999,
                              increment: 1});
    controls.push(nup);

    nup = form.numericUpDown({name: "formIterations", label: "Iterations", default_value: 100, 
                        prompt : "Number of samples to draw from each chain",  
				    increment: 10, maximum:1000000, minimum: 1});
    controls.push(nup);
    var iterations = nup.getValue();

    if (allow_control_groups)
        form.group("Advanced")

    db = form.dropBox({name: "formCovariates", label: "Respondent-specific covariates",
                  prompt: "Select Variables from Data Sets containing profiling (respondent-specific) variables",
                  required: false, multi: true,
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    nup = form.numericUpDown({name: "formChains", label: "Chains", default_value: 8, 
                        prompt: "Number of separate chains to sample from in parallel",   
                              increment: 1, maximum:1000, minimum: 1});
    controls.push(nup);
    
    var chains = nup.getValue();
    nup = form.numericUpDown({name: "formMaxTreeDepth", label: "Maximum tree depth", 
                        prompt: "Increase if receiving warnings about reaching maximum tree depth",    
                        default_value: 10, increment: 1, maximum:1000, minimum: 1});
    controls.push(nup);

    nup = form.numericUpDown({name: "formAdaptDelta", label: "Adapt delta", 
                        prompt: "Increase if receiving warnings about low adapt delta",    
                        default_value: 0.8, increment: 0.001, maximum: 0.999, minimum: 0.001});
    controls.push(nup);

    if (allow_control_groups)
        form.group("Simulation")
    nup = form.numericUpDown({name: "formSavedDraws", label: "Iterations saved per individual", default_value: 0, 
                        prompt: "The number of utility draws per individual respondent to be used in simulation",   
                              increment: 1, maximum: chains * iterations / 2, minimum: 0});
    controls.push(nup);
}

if (is_lc || is_mnl)
{
    nup = form.numericUpDown({name: "formSeed", label: "Seed", default_value: 123, 
                        prompt: "The random seed", minimum: -999999999, maximum: 999999999,
                              increment: 1});
    controls.push(nup);
}
if (is_lc)
{
    nup = form.numericUpDown({name: "formNStarts", label: "Number of starts", default_value: 1, 
                              prompt: "Number of times to start LCA", minimum: 1, maximum: 1000000, 
                              increment: 1});
    controls.push(nup);
}

form.setInputControls(controls);

if (is_mnl)
    form.setHeading("Choice Modeling - " + "Multinomial logit");
else if (is_lc)
    form.setHeading("Choice Modeling - " + "Latent Class Analysis");
else if (is_hb)
    form.setHeading("Choice Modeling - " + "Hierarchical Bayes");

library(flipU)
library(flipChoice)
simulated.priors <- if (!exists("formSimulatedPriors")) {
    NULL
} else if (is.null(formSimulatedPriors)) {
    structure(character(0), .Dim = c(0L, 0L))
} else
    formSimulatedPriors

is.hb <- formType == "Hierarchical Bayes"

design <- if (exists("formVersion"))
{
    n.rows <- length(formTask)
    n.alternatives <- which(diff(as.numeric(formTask)) == 1)[1]
    alternative <- rep(1:n.alternatives, n.rows / n.alternatives)
    c(list(Version = formVersion, Task = formTask, Alternative = alternative), formAttributes)
}

if (is.hb && !is.null(formCovariates)){
    frml <- QFormula(~formCovariates)
    dat <- QDataFrame(formCovariates)
    if (get0("formClassesHB") == 1)
    {
        if (length(dat)){
            ## Convert back to original/non-syntactic names; DS-4580
            names(dat) <- attr(stats::terms(frml), "term.labels")
            frml <- flipData::AddFormulaBars(frml, dat)
        }else
            frml <- dat <- NULL
    }
}else
    frml <- dat <- NULL

if (formNumeric) {
    n.attributes <- 0
    while (get0(paste0("formNumericAttribute", n.attributes + 1)) != "")
        n.attributes <- n.attributes + 1
    cat.to.num.attr <- sapply(paste0("formNumericAttribute", seq(n.attributes)), get0)
    cat.to.num.attr <- sapply(cat.to.num.attr, ConvertCommaSeparatedStringToVector, simplify = FALSE)
    names(cat.to.num.attr) <- sapply(cat.to.num.attr, function (x) x[1])
    cat.to.num.attr <- sapply(cat.to.num.attr, function (x){
        result <- as.numeric(x[2:length(x)])
        if (any(is.na(result)))
            stop("The coding for the levels of ", x[1] ," needs to be numeric.")
        result
    }, simplify = FALSE)
} else {
    cat.to.num.attr <- NULL
}

choice.model <- FitChoiceModel(
    design = get0("formDesignObject"),
    experiment.data = get0("formExperiment"),
    cho.lines = get0("formChoVariable"),
    attribute.levels = get0("formEnteredLevels"),
    design.variables = design,
    tasks = get0("formTasks"),
    version = get0("formDataVersion"),
    choices = get0("formChoices"),
    dual.response.none.choices = get0("formDRN"),
    n.classes = get0("formClassesLC", ifnotfound = 1) * get0("formClassesHB", ifnotfound = 1),
    subset = if (all(QFilter)) NULL else QFilter,
    weights = QPopulationWeight,
    missing = formMissing,
    tasks.left.out = formCV,
    algorithm = if (is.hb) "HB-Stan" else "LCA",
    hb.iterations = get0("formIterations"),
    hb.chains = get0("formChains"),
    hb.max.tree.depth = get0("formMaxTreeDepth"),
    respondent.ids =  get0("formRespondentID"),
    cov.formula = frml, cov.data = dat,
    synthetic.priors = simulated.priors,
    synthetic.priors.from.design = get0("formPriorSource", ifnotfound = "") == "Use priors from design",
    synthetic.sample.size = get0("formSampleSize"),
    include.choice.parameters = get0("formASC"),
    hb.beta.draws.to.keep = get0("formSavedDraws"),
    categorical.to.numeric.attributes = cat.to.num.attr,
    hb.sigma.prior.rate = 1,
    hb.sigma.prior.shape = 1,
    hb.adapt.delta = if(exists("formAdaptDelta")) formAdaptDelta else 0,
    seed = formSeed,
    lc.n.starts = if (exists("formNStarts")) formNStarts else 1)