Choice Modeling - Hierarchical Bayes

Analyse a choice-based conjoint experiment with Hierarchical Bayes.

The experimental design and respondent choices are required. These may be provided together (as an Experiment question or Sawtooth CHO format file), or separately with the design as an experimental design R output, Sawtooth dual file or JMP file, plus respondent choices and tasks as variables. Simulated responses may be used in place of respondent choices.

Example

The table below shows the output of the analysis, containing histograms of the estimated parameters of the respondents:

Options

Design source The source of the experimental design. Choices include Data set, Experimental design R output, Sawtooth CHO format, Sawtooth dual file format as data set, JMP format as data set and Experiment question.

Version A variable containing the version indices (first column) from the (Sawtooth or JMP) design file, which has been uploaded as a data set.

Task A variable containing the task indices (second column) from the (Sawtooth or JMP) design file, which has been uploaded as a data set.

Attributes Variables containing the attributes from the (Sawtooth or JMP) design file, which has been uploaded as a data set. Alternative-specific designs are supported (attributes that do not apply to an alternative are coded as a 0).
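
The layout described for Version, Task and Attributes (one row per alternative, with the task index in the second column) can be illustrated with a small sketch. It mirrors in plain JavaScript what the R code on this page appears to do when it derives the alternative numbers from the Task variable: the number of alternatives is taken to be the number of rows before the task index first increases by 1. The function names and the sample task column are hypothetical and are not part of Q or flipChoice.

```javascript
// Hypothetical helper: rows per task, i.e. the position at which the
// task index first increases by 1.
function alternativesPerTask(task) {
    for (var i = 1; i < task.length; i++) {
        if (task[i] - task[i - 1] === 1)
            return i;
    }
    return null; // the task index never increases, so nothing can be inferred
}

// Hypothetical helper: number the rows 1..n within each task.
function alternativeLabels(task) {
    var n = alternativesPerTask(task);
    if (n === null)
        throw new Error("Could not infer the number of alternatives.");
    return task.map(function (_, i) { return (i % n) + 1; });
}

var task = [1, 1, 1, 2, 2, 2, 3, 3, 3]; // made-up Task column, three alternatives per task
var labels = alternativeLabels(task);   // [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

This only works when every task has the same number of rows, which is why attributes that do not apply to an alternative are coded as 0 rather than omitted.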

Experimental design A Choice Model Design output.

CHO file text variable A text variable of lines from the CHO file, which has been uploaded as a data set. Note that the CHO file first needs to be renamed to have a file extension of .txt instead of .cho, so that it can be uploaded to Q as a data set.

Enter attribute levels Attribute levels for the design that are entered into a spreadsheet-style data editor. Each column begins with an attribute name and is followed by its attribute levels.

Code some categorical attributes as numeric Whether to treat some categorical attributes as numeric. If checked, a text box will appear below to allow the attribute and numeric coding to be specified as a comma-separated list, e.g. Weight, 1, 2, 3, 4. When one text box is filled, another text box will appear for another attribute to be specified.
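
As a rough illustration of the comma-separated format, the sketch below splits an entry such as Weight, 1, 2, 3, 4 into an attribute name and its numeric coding, and rejects codings that are not numeric. It is a plain JavaScript approximation of the checks performed by the R code on this page; parseNumericCoding is a hypothetical name, not a Q or flipChoice function.

```javascript
// Hypothetical helper: turn "Weight, 1, 2, 3, 4" into a name and numeric codes.
function parseNumericCoding(text) {
    var parts = text.split(",").map(function (s) { return s.trim(); });
    var name = parts[0];
    // Treat empty entries as invalid rather than letting Number("") become 0.
    var levels = parts.slice(1).map(function (s) {
        return s === "" ? NaN : Number(s);
    });
    if (name === "" || levels.length === 0)
        throw new Error("Expected an attribute name followed by its numeric coding.");
    if (levels.some(function (x) { return isNaN(x); }))
        throw new Error("The coding for the levels of " + name + " needs to be numeric.");
    return {name: name, levels: levels};
}

var coding = parseNumericCoding("Weight, 1, 2, 3, 4");
// coding.name is "Weight" and coding.levels is [1, 2, 3, 4]
```

The first item is always read as the attribute name, so the name itself should not contain a comma.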

Experiment question A choice-based conjoint Experiment question.

Data source The respondent choice data to use. The available options depend on the Design source chosen. One option is to use simulated choices from priors. If this is selected, a button called "Enter priors" will appear immediately below, allowing priors to be entered. The priors must follow the format used in Choice Modeling - Experimental Design.

Respondent IDs A variable containing respondent IDs corresponding to those in the CHO file.

Prior source Whether to use the priors from the choice model design output or to enter the priors manually. If the design output contains no priors, prior means and standard deviations of 0 are assumed. Available when Experimental design R output is selected as the Design source and Simulated choices from priors is selected as the Data source.

Simulated sample size The number of simulated respondents to generate.

Choices Variables containing the choices made by respondents.

Tasks Variables containing the sets of tasks that have been presented to respondents.

Dual-response 'none' choice (Optional) Variables indicating dual-response 'None of these' choices. The number of variables should match the number in Choices and in Tasks. A value of 1 or "Yes" indicates that the respondent would buy the alternative they selected. Only available when Type is Hierarchical Bayes.

Missing data See Missing Data Options.

Type The type of model to fit. The options are Multinomial logit, Latent class analysis and Hierarchical Bayes.

Number of classes The number of latent classes of respondents. Available for Latent class analysis (default 2) and Hierarchical Bayes (default 1).

Questions left out for cross-validation The number of questions to leave out per respondent to be used for cross-validation.

Alternative-specific constants Whether to include alternative-specific constants in the model.

Iterations The number of iterations used in the Hierarchical Bayes analysis.

Chains The number of chains used in the Hierarchical Bayes analysis.

Respondent-specific covariates Variables containing respondent-specific covariates to be included in the model.

Maximum tree depth The maximum tree depth parameter. Only increase this if warnings about "tree depth" are shown.

Iterations saved per individual The number of Hierarchical Bayes utility draws to be saved per individual respondent. Draws are used in simulation. The maximum permitted number is Iterations * Chains / 2.
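
The upper limit can be worked out directly from the two settings. The sketch below uses a hypothetical helper, maxSavedDraws, that applies the Iterations * Chains / 2 rule stated above; the division by 2 is presumably because half of each chain is discarded as warm-up, but that is an assumption rather than something this page states.

```javascript
// Hypothetical helper applying the limit given above: Iterations * Chains / 2.
function maxSavedDraws(iterations, chains) {
    return iterations * chains / 2;
}

// Form defaults on this page: 100 iterations and 8 chains.
var limit = maxSavedDraws(100, 8); // 400
```

With the defaults, at most 400 draws can be saved per respondent; raising either Iterations or Chains raises the limit proportionally.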

Technical Details

An R package called flipChoice is used to run the Hierarchical Bayes analysis. flipChoice uses rstan, an R interface to Stan, to fit the underlying Bayesian statistical model.

See Also

A worked example including video is available in this blog post

For further information on Hierarchical Bayes modeling, please refer to chapter 5 from Bayesian Statistics and Marketing.

Code

var allow_control_groups = Q.fileFormatVersion() > 10.9; // Group controls for Displayr and later versions of Q

// Collect controls in an array and define "formType" comboBox first so that we can only define "formDRN" if
// formType == "Hierarchical Bayes" (dual-response nones are not implemented for MNL and LCA)
var controls = [];

if (allow_control_groups)
    form.group("Model"); // make sure this cbox appears in right group

var type = form.comboBox({name: "formType", label: "Type", 
                          prompt: "Type of choice model to fit",
                          alternatives: ["Multinomial logit",
                                         "Latent class analysis",
                                         "Hierarchical Bayes"],
                          default_value: "Hierarchical Bayes"});

if (allow_control_groups)
    form.group(null);  // reset group

var is_mnl = type.getValue() == "Multinomial logit";
var is_lc = type.getValue() == "Latent class analysis";
var is_hb = type.getValue() == "Hierarchical Bayes";

function data_source_form()
{
    return(form.comboBox({
        label: "Data source", name: "formDataSource",
        alternatives: ["Choice and task variables in data set",
                       "Choice and version variables in data set",
                       "Simulated choices from priors"],
        default_value: "Choice and task variables in data set"
    }));
}

function choices_form(controls)
{
    var db = form.dropBox({name: "formChoices", label: "Choices", required: true, multi: true,
                  prompt: "Select variables from data sets containing the respondent choices for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function tasks_form(controls)
{
    var db = form.dropBox({name: "formTasks", label: "Tasks", required: true, multi: true,
                  prompt: "Select variables from data sets containing the task numbers shown to the respondents for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function drn_form(controls)
{
    var db = form.dropBox({name: "formDRN", label: "Dual-response 'none' choice", required: false, multi: true,
                  prompt: "Select variables from data sets containing the respondent dual-response 'none' question choices for each question",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function version_form(controls)
{
    var db = form.dropBox({name: "formDataVersion", label: "Version", required: true,
                      prompt: "Select a variable from a data set containing the version numbers shown to the respondents",
			   types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    return(controls);
}

function design_form(controls, is_hb)
{
    var db = form.dropBox({label: "Version",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
			   name: "formVersion", multi: false, prompt: "Version variable from design (first column)"});
    controls.push(db);
    db = form.dropBox({label: "Task",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
                       name: "formTask", multi: false, prompt: "Task variable from design (second column)"});
    controls.push(db);
    db = form.dropBox({label: "Attributes",
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"],
                       name: "formAttributes", multi: true, prompt: "Attribute variables from design"});
    controls.push(db);
    var de = form.dataEntry({name: "formEnteredLevels",
                    prompt: "Enter attribute names and levels",
                    label: "Enter attribute levels"});
    controls.push(de);

    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data");
    var ds_control = data_source_form();
    controls.push(ds_control);
    var data_source = ds_control.getValue();
    if (data_source == "Choice and task variables in data set")
    {
        controls = choices_form(controls);
        controls = tasks_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Choice and version variables in data set")
    {
        controls = choices_form(controls);
        controls = version_form(controls);
        if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Simulated choices from priors")
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
    return(controls);
}

function numeric_form(controls)
{
    var cb = form.checkBox({label: "Code some categorical attributes as numeric",
                            name: "formNumeric", default_value: false});
    var has_numeric = cb.getValue();
    controls.push(cb);
    if (has_numeric)
    {
        var i = 1;
	var tb = form.textBox();
	var attribute = "";
        while (i == 1 || attribute != "") {
            tb = form.textBox({name: "formNumericAttribute" + i,
                                      label: "Attribute " + i,
                                      prompt: "Attribute name followed by numeric coding for each level, delimited by commas",
                               required: i == 1});
            controls.push(tb);
            attribute = tb.getValue();
            ++i;
        }
    }
    return(controls);
}

function prior_form(controls)
{
    var de = form.dataEntry({name: "formSimulatedPriors",
                    prompt: "Enter priors to use to generate simulated data.",
			     label: "Enter priors"});
    controls.push(de);
    return(controls);
}

function sample_size_form(controls)
{
    var nup = form.numericUpDown({name: "formSampleSize",
                        label: "Simulated sample size",
                        default_value: 300,
                        increment: 100,
                        maximum:1000000,
				  minimum: 0});
    controls.push(nup);
    return(controls);
}

var web_mode = (!!Q.isOnTheWeb && Q.isOnTheWeb());
if (Q.fileFormatVersion() < 12.31 && !web_mode)
{
    var msg = "A newer version of Q (version 5.3) is required to run Hierarchical Bayes. Please contact support@q-researchsoftware.com to upgrade.";
    alert(msg);
    throw msg;
}

if (allow_control_groups)
    form.group("Experimental design");

var dataset = "Data set";
var experiment = web_mode ? "Experiment variable set" : "Experiment question";
var cho = "Sawtooth CHO format";
var dual = "Sawtooth dual file format as data set";
var jmp = "JMP format as data set";
var design = "Experimental design R output";
var dt_cb = form.comboBox({name: "formDataType",
                               prompt: "What input format is your experimental design in?",
                               label: "Design source",
                               alternatives: [dataset, design, cho, dual, jmp, experiment],
                           default_value: dataset});
controls.push(dt_cb);
var data_type = dt_cb.getValue();

if (data_type == dataset || data_type == dual || data_type == jmp){
    controls = design_form(controls, is_hb);
}else if (data_type == experiment)
{
    var db = form.dropBox({label: experiment,
                  types:["Question: Experiment"],
                  prompt: "Select an " + experiment + " from a data set",
                  name: "formExperiment", multi: false});
    controls.push(db);
    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data")

    var sp_cb = form.comboBox({
	      label: "Data source", name: "formDataSource",
	      alternatives: [experiment, "Simulated choices from priors"],
	      default_value: experiment
    });
    controls.push(sp_cb);
    var simulated_prior = sp_cb.getValue() == "Simulated choices from priors";
    if (simulated_prior)
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
}
else if (data_type == design)
{
    var db = form.dropBox({name: "formDesignObject", label: "Experimental design",
                  required: true, multi: false,
                  prompt: "Select an output from Choice Modeling - Experimental Design",
                  types: ["RItem:ChoiceModelDesign"]});
    controls.push(db);
    controls = numeric_form(controls);
    
    if (allow_control_groups)
        form.group("Respondent data")

    var ds_control = data_source_form();
    controls.push(ds_control);
    var data_source = ds_control.getValue();
    if (data_source == "Choice and task variables in data set")
    {
        controls = choices_form(controls);
        controls = tasks_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Choice and version variables in data set")
    {
        controls = choices_form(controls);
        controls = version_form(controls);
	if (is_hb)
            controls = drn_form(controls);
    }
    else if (data_source == "Simulated choices from priors")
    {
	var sp_cb = form.comboBox({
            name: "formPriorSource",
            label: "Prior source",
            alternatives: ["Use priors from design", "Enter priors"],
            default_value: "Use priors from design"
        });
	controls.push(sp_cb);
        var simulated_prior_from_design = sp_cb.getValue() == "Use priors from design";
        if (!simulated_prior_from_design)
            controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
}
else if (data_type == cho)
{
    var db = form.dropBox({name: "formChoVariable",
                  label: "CHO file text variable",
                  types: ["Variable: Text"],
			   prompt: "Text variable from an uploaded CHO file"});
    controls.push(db);
    var de = form.dataEntry({name: "formEnteredLevels",
                    prompt: "Enter attribute names and levels",
                    label: "Enter attribute levels"});
    controls.push(de);
    controls = numeric_form(controls);

    if (allow_control_groups)
        form.group("Respondent data")
    var sp_cb = form.comboBox({
        label: "Data source", name: "formDataSource",
        alternatives: ["CHO file", "Simulated choices from priors"],
        default_value: "CHO file"
    });
    controls.push(sp_cb);
    
    var simulated_prior = sp_cb.getValue() == "Simulated choices from priors";
    if (simulated_prior)
    {
        controls = prior_form(controls);
        controls = sample_size_form(controls);
    }
    else{
        db = form.dropBox({label: "Respondent IDs",
                      types:["Question: Number"],
			   name: "formRespondentID", multi: false, required: false});
	controls.push(db);
    }
}
    
var cb = form.comboBox({label: "Missing data", name: "formMissing",
               prompt: "How should missing data be handled?",
               alternatives: ["Error if missing data", "Exclude cases with missing data", "Use partial data"],
               default_value: "Use partial data"});
controls.push(cb);

if (allow_control_groups)
    form.group("Model");

controls.push(type);

var nup = [];
if (is_lc)
{
    nup = form.numericUpDown({name: "formClassesLC", label: "Number of classes", 
                        prompt: "Add latent classes to the model",  
                        default_value: 2, increment: 1, maximum:100, minimum: 1});
    controls.push(nup);
}
if (is_hb){ // separate controls for HB and LC so classes are not carried over
    nup = form.numericUpDown({name: "formClassesHB", label: "Number of classes", 
                        prompt: "Add latent classes to the model",   
                        default_value: 1, increment: 1, maximum:100, minimum: 1});
    controls.push(nup);
}
nup = form.numericUpDown({name: "formCV", label: "Questions left out for cross-validation", 
                    prompt: "Number of questions to exclude from fitting and use for out-of-sample prediction",
                    default_value: 0, increment: 1, maximum:100, minimum: 0});
controls.push(nup);

if (data_type != experiment){
    var cb = form.checkBox({label: "Alternative-specific constants",
                   name: "formASC", default_value: true,
                   prompt: "Include alternative-specific constants in the model"});
    controls.push(cb);
}
if (is_hb)
{
    nup = form.numericUpDown({name: "formIterations", label: "Iterations", default_value: 100, 
                        prompt : "Number of samples to draw from each chain",  
				    increment: 10, maximum:1000000, minimum: 1});
    controls.push(nup);
    var iterations = nup.getValue();

    if (allow_control_groups)
        form.group("Advanced")

    db = form.dropBox({name: "formCovariates", label: "Respondent-specific covariates",
                  prompt: "Select Variables from Data Sets containing profiling (respondent-specific) variables",
                  required: false, multi: true,
                  types: ["Variable: Numeric, Categorical, OrderedCategorical"]});
    controls.push(db);
    nup = form.numericUpDown({name: "formChains", label: "Chains", default_value: 8, 
                        prompt: "Number of separate chains to sample from in parallel",   
                              increment: 1, maximum:1000, minimum: 1});
    controls.push(nup);
    
    var chains = nup.getValue();
    nup = form.numericUpDown({name: "formMaxTreeDepth", label: "Maximum tree depth", 
                        prompt: "Increase if receiving warnings about reaching maximum tree depth",    
                        default_value: 10, increment: 1, maximum:1000, minimum: 1});
    controls.push(nup);
    if (allow_control_groups)
        form.group("Simulation")
    nup = form.numericUpDown({name: "formSavedDraws", label: "Iterations saved per individual", default_value: 0, 
                        prompt: "The number of utility draws per individual respondent to be used in simulation",   
                              increment: 1, maximum: chains * iterations / 2, minimum: 0});
    controls.push(nup);
}
form.setInputControls(controls);

if (is_mnl)
    form.setHeading("Choice Modeling - " + "Multinomial logit");
else if (is_lc)
    form.setHeading("Choice Modeling - " + "Latent Class Analysis");
else if (is_hb)
    form.setHeading("Choice Modeling - " + "Hierarchical Bayes");

library(flipU)
library(flipChoice)
simulated.priors <- if (!exists("formSimulatedPriors")) {
    NULL
} else if (is.null(formSimulatedPriors)) {
    structure(character(0), .Dim = c(0L, 0L))
} else
    formSimulatedPriors

is.hb <- formType == "Hierarchical Bayes"

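# Build the design from the Version, Task and Attributes variables, if supplied
# (design is NULL otherwise).
design <- if (exists("formVersion"))
{
    n.rows <- length(formTask)
    # Alternatives per task: the position at which the task index first increases by 1
    n.alternatives <- which(diff(as.numeric(formTask)) == 1)[1]
    alternative <- rep(1:n.alternatives, n.rows / n.alternatives)
    c(list(Version = formVersion, Task = formTask, Alternative = alternative), formAttributes)
}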

if (is.hb && !is.null(formCovariates)){
    frml <- QFormula(~formCovariates)
    dat <- QDataFrame(formCovariates)
    if (get0("formClassesHB") == 1)
    {
        if (length(dat)){
            ## Convert back to original/non-syntactic names; DS-4580
            names(dat) <- attr(stats::terms(frml), "term.labels")
            frml <- flipData::AddFormulaBars(frml, dat)
        }else
            frml <- dat <- NULL
    }
}else
    frml <- dat <- NULL

if (formNumeric) {
    n.attributes <- 0
    while (get0(paste0("formNumericAttribute", n.attributes + 1)) != "")
        n.attributes <- n.attributes + 1
    cat.to.num.attr <- sapply(paste0("formNumericAttribute", seq(n.attributes)), get0)
    cat.to.num.attr <- sapply(cat.to.num.attr, ConvertCommaSeparatedStringToVector, simplify = FALSE)
    names(cat.to.num.attr) <- sapply(cat.to.num.attr, function (x) x[1])
    cat.to.num.attr <- sapply(cat.to.num.attr, function (x){
        result <- as.numeric(x[2:length(x)])
        if (any(is.na(result)))
            stop("The coding for the levels of ", x[1] ," needs to be numeric.")
        result
    }, simplify = FALSE)
} else {
    cat.to.num.attr <- NULL
}

choice.model <- FitChoiceModel(
    design = get0("formDesignObject"),
    experiment.data = get0("formExperiment"),
    cho.lines = get0("formChoVariable"),
    attribute.levels = get0("formEnteredLevels"),
    design.variables = design,
    tasks = get0("formTasks"),
    version = get0("formDataVersion"),
    choices = get0("formChoices"),
    dual.response.none.choices = get0("formDRN"),
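    # At most one of formClassesLC and formClassesHB exists (neither for multinomial logit),
    # so the product is the requested number of classes.
    n.classes = get0("formClassesLC", ifnotfound = 1) * get0("formClassesHB", ifnotfound = 1),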
    subset = if (all(QFilter)) NULL else QFilter,
    weights = QPopulationWeight,
    missing = formMissing,
    tasks.left.out = formCV,
    # Multinomial logit is fitted as a latent class analysis with a single class
    algorithm = if (is.hb) "HB-Stan" else "LCA",
    hb.iterations = get0("formIterations"),
    hb.chains = get0("formChains"),
    hb.max.tree.depth = get0("formMaxTreeDepth"),
    respondent.ids =  get0("formRespondentID"),
    cov.formula = frml, cov.data = dat,
    synthetic.priors = simulated.priors,
    synthetic.priors.from.design = get0("formPriorSource", ifnotfound = "") == "Use priors from design",
    synthetic.sample.size = get0("formSampleSize"),
    include.choice.parameters = get0("formASC"),
    hb.beta.draws.to.keep = get0("formSavedDraws"),
    categorical.to.numeric.attributes = cat.to.num.attr,
    hb.sigma.prior.rate = 1,
    hb.sigma.prior.shape = 1)