Create New Variables - Binary

This QScript (a transformation in Displayr) creates a binary Pick Any or Pick Any - Grid question (Binary - Multi or Binary - Grid in Displayr) from user-selected questions that contain either Number (Numeric in Displayr) or Categorical variables. It does this by reassigning which values are counted. For Number variables, positive values are counted. For Categorical variables, the top half of the labels is counted, and the script searches for positive labels to count in that half. For example, if the categorical labels are "Yes" and "No", the "Yes" label is counted. In larger label structures, the positive statements are counted. For example, in a 5-point scale of "Strongly Disagree", "Disagree", "Neither", "Agree" and "Strongly Agree", the binary transformed question counts "Agree" and "Strongly Agree" and does not count the other labels. More details of the input question types and output types are given in the Technical details section.
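
The label rule described above can be sketched in plain JavaScript. This is only an illustration, not the QScript itself: the helper name labelsToCount is hypothetical, the labels are assumed to be ordered from lowest to highest underlying value with missing and "Don't know" categories already removed, and the negative-word pattern is the one used in the script listed under JavaScript.

```javascript
// Hypothetical helper (not part of the QScript library): decide which labels
// a binary transform would count.
// Assumption: `labels` runs from lowest to highest underlying value and
// already excludes missing and "Don't know" categories.
const NEGATIVE = /disagree|dislike|hate|dont|don't|^no$|^not|unhappy|unsatisfied|dissatisfied/;

function labelsToCount(labels) {
    const half = Math.floor(labels.length / 2);
    // First guess: the half of the scale with the highest values.
    let counted = half > 0 ? labels.slice(-half) : [];
    // If that half contains a negative-sounding label, use the other end instead.
    if (counted.some(label => NEGATIVE.test(label.toLowerCase()))) {
        counted = labels.slice(0, half);
    }
    return counted;
}

console.log(labelsToCount(["Yes", "No"]));
console.log(labelsToCount(["Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree"]));
```

With an odd number of labels the middle label ("Neither" here) is never counted, because only floor(k/2) labels are taken from either end.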

Example

In the example below, the Pick One questions with ordinal scales (Ordinal variables in Displayr) are combined and transformed into a Pick Any (Binary - Multi in Displayr). Note that the labels Strongly agree and Agree are counted in the binary transformed output variable. For example, in the summary output shown below, the variable Like the look of phones has 24% for Strongly agree and 38% for Agree, which combine into a single binary value of 62% in the output variable.
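
The arithmetic in this example can be written out as a short sketch. Only the 24% and 38% figures come from the example; the other three percentages are hypothetical values chosen so that the scale sums to 100.

```javascript
// Category percentages for "Like the look of phones".
// Only "Agree" and "Strongly agree" are taken from the example above;
// the remaining three values are made up for illustration.
const percentages = {
    "Strongly disagree": 8,
    "Disagree": 12,
    "Neither": 18,
    "Agree": 38,
    "Strongly agree": 24
};

// Labels that the binary transform counts for this scale.
const counted = ["Agree", "Strongly agree"];

// The binary percentage is the sum of the counted categories.
const binaryPercent = counted.reduce((sum, label) => sum + percentages[label], 0);
console.log(binaryPercent); // 62
```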

[Figures: Binary Transform Example in Q; Binary Transform Example in Displayr]

Technical details

You are required to select at least one question of the types Number, Number - Multi, Number - Grid, Pick One or Pick One - Multi; the Pick One questions can have either nominal or ordinal scales. In Displayr, the corresponding structures are Numeric, Numeric - Multi, Numeric - Grid, Nominal, Nominal - Multi, Ordinal, Ordinal - Multi or Binary - Grid. However, if multiple questions are selected in the transform, all of their variables must be the same type (either all Numeric or all Categorical). If Categorical, they must have the same label structure. To create a new Categorical Binary output variable from a set of Numeric questions, the following options are possible (with Displayr options shown in parentheses):

  1. Selecting one or more Number (Numeric) questions to produce a single Pick Any (Binary - Multi) question.
  2. Selecting one Number - Multi (Numeric - Multi) question to produce a single Pick Any (Binary - Multi) question.
  3. Selecting two or more Number - Multi (Numeric - Multi) questions to produce a single Pick Any - Grid (Binary - Grid) question.
  4. Selecting one and only one Number - Grid (Numeric - Grid) question to produce a single Pick Any - Grid (Binary - Grid) question.

To create a Categorical Binary output variable from a set of Categorical questions, the following options are possible (again with Displayr options shown in parentheses):

  1. Selecting one or more Pick One questions with nominal or ordinal scales (Nominal or Ordinal) to produce a single Pick Any (Binary - Multi) question.*
  2. Selecting one Pick One - Multi (Nominal - Multi or Ordinal - Multi) question to produce a single Pick Any (Binary - Multi) question.
  3. Selecting two or more Pick One - Multi (Nominal - Multi or Ordinal - Multi) questions to produce a single Pick Any - Grid (Binary - Grid) question.*

* If multiple categorical input questions are selected, they must have the same label structure, with the same number of labels in the same order.
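
The option lists above amount to a small decision table, which the following sketch spells out using Q's question type names. The function binaryOutputType is a hypothetical helper and not part of the QScript. It assumes the multi-variable categorical input is Pick One - Multi, as in the list of permitted types, and it leaves out the additional checks the real script performs, such as matching label structures and matching variable labels, before it builds a grid.

```javascript
// Hypothetical helper mirroring the option lists above (Q type names).
// Returns the output question type, or null when the selection is not
// one of the listed options.
function binaryOutputType(selectedTypes) {
    if (selectedTypes.length === 0) return null;
    const first = selectedTypes[0];
    // This sketch requires every selected question to have the same type.
    if (!selectedTypes.every(t => t === first)) return null;
    const n = selectedTypes.length;
    switch (first) {
        case "Number":
        case "Pick One":
            // One or more single-variable questions become one Pick Any.
            return "Pick Any";
        case "Number - Multi":
        case "Pick One - Multi":
            // One multi question stays flat; two or more become a grid.
            return n === 1 ? "Pick Any" : "Pick Any - Grid";
        case "Number - Grid":
            // Exactly one grid question is allowed.
            return n === 1 ? "Pick Any - Grid" : null;
        default:
            return null;
    }
}

console.log(binaryOutputType(["Number", "Number"]));
console.log(binaryOutputType(["Pick One - Multi", "Pick One - Multi"]));
```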

How to apply this QScript

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Click on the QScript when it appears in the QScripts and Rules section of the search results.

OR

  • Select Automate > Browse Online Library.
  • Select this QScript from the list.

Customizing the QScript

This QScript is written in JavaScript and can be customized by copying and modifying the JavaScript.

Customizing QScripts in Q4.11 and more recent versions

  • Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
  • Hover your mouse over the QScript when it appears in the QScripts and Rules section of the search results.
  • Press Edit a Copy (bottom-left corner of the preview).
  • Modify the JavaScript (see QScripts for more detail on this).
  • Either:
    • Run the QScript, by pressing the blue triangle button.
    • Save the QScript and run it at a later time, using Automate > Run QScript (Macro) from File.

Customizing QScripts in older versions

  • Copy the JavaScript shown on this page.
  • Create a new text file, giving it a file extension of .QScript. See here for more information about how to do this.
  • Modify the JavaScript (see QScripts for more detail on this).
  • Run the file using Automate > Run QScript (Macro) from File.

JavaScript

includeWeb("QScript Utility Functions");
includeWeb("QScript Selection Functions");
includeWeb("QScript Functions to Generate Outputs");
includeWeb("QScript R Output Functions");

function checkDuplicateVariable(variable_name) {
    let all_variables = project.dataFiles.map(d => d.variables).flat();
    let variables = all_variables.filter(v => {
        return v.name === variable_name || v.label === variable_name;
    })
    return variables.length !== 1;
}

function variableToLabels(variable) {
	var question = variable.question;
	var attributes = question.valueAttributes;
	var values = variable.uniqueValues;
	var relevantValues = values.filter(function(x) {
		return !isDontKnow(attributes.getLabel(x)) && !isNaN(attributes.getValue(x)) && !attributes.getIsMissingData(x);
	});
	return getLabelsForValues(question, relevantValues);
}

function isInArray(value, arr) {
  return arr.indexOf(value) > -1;
}

function onlyUnique(value, index, self) {
	return self.indexOf(value) === index;
}

function recodeCountsInArray(question_or_variable, one_array, zero_array) {
	var values = question_or_variable.uniqueValues;
	var num_vals = values.length;
	var attributes = question_or_variable.valueAttributes;
	for(let j = 0; j < num_vals; j++) {
		if(isInArray(attributes.getValue(values[j]), one_array)) {
			attributes.setCountThisValue(values[j], true);
		} else if(isInArray(attributes.getValue(values[j]), zero_array)) {
			attributes.setCountThisValue(values[j], false);
		}
	}
}

function getVariableOrQuestionLabel(variable) {
	if(/- Multi|- Grid/.test(variable.question.variableSetStructure)) {
		return variable.question.name;
	} else {
		return variable.label;
	}
}

function variablesToBinary(data_file, variables, is_displayr, questions) {
	var suitable_for_grid = suitableForGrid(questions);
	// Use logical && rather than bitwise & so that make_grid is a true boolean
	var make_grid = (questions.length > 1 && suitable_for_grid) || (questions.length === 1 && suitable_for_grid && questions[0].questionType === "Number - Grid");
	if(variables[0].variableType === "Numeric") {
		var variable_labels = variables.map(v => v.label);
		var duplicate_variable_labels = variable_labels.some(x => {
			return variable_labels.indexOf(x) !== variable_labels.lastIndexOf(x);
		});
		if(make_grid || duplicate_variable_labels) {
			variable_labels = variables.map((v, v_ind) => v.question.name + " - " + variable_labels[v_ind]);
		}
		var base_question_name = preventDuplicateQuestionName(data_file, variable_labels.filter(onlyUnique).join(" + "));
		if(variables.length === 1) {
			var r_variable_name = "x";
		} else {
			var r_variable_name = "variable.set";
		}
		var last_variable = getLastVariable(variables);
		var new_var_name = base_question_name.replace(/[^a-zA-Z0-9_@\#\$\\]/g, '_').toLowerCase() + "_";
		new_var_name = randomVariableName(16, new_var_name);
		var variable_names = variables.map(v => {
			return checkDuplicateVariable(v.name) ? generateDisambiguatedVariableName(v) : stringToRName(v.name);
		});
		// Simple assignment if single variable, otherwise data.frame
		if(variables.length === 1) {
			var expression = r_variable_name + ' <- ' + variable_names[0] + '\n';
		} else {
			var df_assignments = [];
			for (let i = 0; i < variables.length; i += 1) {
				df_assignments[i] = stringToRName(variable_labels[i]) + " = " +  variable_names[i];
			}
			var def_prefix = r_variable_name + ' <- data.frame(';
			var white_spaces = " ".repeat(def_prefix.length);
			var expression = def_prefix + df_assignments.join(",\n" + white_spaces) + ',\n' + white_spaces + 'check.names = FALSE)\n';
		}
		expression += r_variable_name + " > 0\n" +
				  "# If you wish to change the cut-off for the count (from > 0), modify the code above\n" + 
				  "# E.g. To count values larger than 50, change > 0 to > 50\n" +
				  "# E.g. To count values smaller than or equal to 25, change > 0 to <= 25\n";
		try {
			var question = data_file.newRQuestion(expression, base_question_name, new_var_name, last_variable);
			question.questionType = make_grid ? "Pick Any - Grid" : "Pick Any";
			question.name = preventDuplicateQuestionName(data_file, variables.map(x => getVariableOrQuestionLabel(x)).filter(onlyUnique).join(" + ") + " > 0");
			question.needsCheckValuesToCount = false;
		} catch (e) {
			var structure_name = getVariableNaming(is_displayr);
			log("The binary transform could not be computed for this " + structure_name + ": " + e);
			return false;
		}
	} else {
		var question_name = variables.map(x => getVariableOrQuestionLabel(x)).filter(onlyUnique).join(" + ");
		var new_variables = [];
		if(make_grid) {
			questions.forEach(function (q) {
				try {
					q.variables.forEach(function (v) {
						var new_var = v.duplicate();
						new_var.label =  q.name + " - " + v.label;
						new_var.variableType = "Categorical";
						new_variables.push(new_var);
					 });
				 } catch (e) {
					 q.variables.forEach(function (v) {
						var q_below = data_file.getVariableByName(v.name);
						var new_linked = preventDuplicateVariableName(data_file, v.name);
						var new_var = data_file.newJavaScriptVariable(v.name, false, new_linked, v.name, q_below);
						new_var.label =  q.name + " - " + v.label;
						new_var.variableType = "Categorical";
						new_variables.push(new_var);
					});
				} 
			});
		} else {
			for(let j = 0; j < variables.length; j++){
				new_variables[j] = variables[j].duplicate();
				new_variables[j].variableType = "Categorical";
			}
		}
		var output_type = make_grid ? "Pick Any - Grid" : "Pick Any";
		var new_question_name = preventDuplicateQuestionName(data_file, question_name);
		var question = data_file.setQuestion(new_question_name, output_type, new_variables);
		var values = new_variables[0].uniqueValues;
		var attributes = new_variables[0].valueAttributes;
		var k = values.filter(function (x) {
			return !isDontKnow(attributes.getLabel(x)) && !isNaN(attributes.getValue(x)) && !attributes.getIsMissingData(x);
		}).length;
		var top_k = Math.floor(k/2);
		var one_array = getTopOrBottomKNonMissingValues(question, top_k, false, {excludeDK:true});
		var one_labs = getLabelsForValues(question, one_array);
		if(one_labs.some(x => /disagree|dislike|hate|dont|don't|^no$|^not|unhappy|unsatisfied|dissatisfied/.test(x.toLowerCase()))){
			one_array = getTopOrBottomKNonMissingValues(question, top_k, true, {excludeDK:true});
			one_labs = getLabelsForValues(question, one_array);
		}
		var zero_array = values.filter(function (x) {
			return isDontKnow(attributes.getLabel(x)) || (one_array.indexOf(x) < 0 && !isNaN(attributes.getValue(x)) && !attributes.getIsMissingData(x));  
		});
		var trailing_name = one_labs[0];
		for(var j = 1; j < one_labs.length; j++){
			trailing_name += " + " + one_labs[j];
		}
		recodeCountsInArray(question, one_array, zero_array);
		question.name = preventDuplicateQuestionName(data_file, question_name + " : " + trailing_name);
	}
	if(!is_displayr){
		var new_name = prompt("Enter a name for the new " + question.questionType + " question : ", question.name);
		if(new_name !== question.name) {
			question.name = new_name;
		}
		var top_group_name = "Binary transformed question";
		var new_group = generateGroupOfSummaryTables(top_group_name, [question]);
		// More recent Q versions can point the user to the new items.
		if (fileFormatVersion() > 8.65) {
			project.report.setSelectedRaw([new_group.subItems[0]]);
		} else {
			log(question.questionType + " question named " + question.name + " has been added to the dataset " + data_file.name);
		}
	}
}

printTypes = function(x, conjunction) {
	var comma_separated = x.slice(0, x.length - 1);
	if(typeof(conjunction) === "undefined" || !conjunction) {
		conjunction = " or ";
	}
	return comma_separated.join(", ") + conjunction + x[x.length - 1];
}

// Check all array elements equal
function arraysEqual(array_1, array_2) {
	var are_equal = true;
	array_1.forEach(function (label, ind) {
		if (label != array_2[ind])
			are_equal = false;
	});
	return are_equal;
}
// check the Variable Set 
checkStructureAndLabels = function(questions, structure_name, is_displayr) {
	// Check all same type
	var variable_set_structures = questions.map(x => x.variableSetStructure);
	// Labels unimportant for Numeric but need to be checked for Categorical
	if(!/^Numeric/.test(variable_set_structures[0])) {
		// Check labels
		var all_variables = getVariablesFromQuestions(questions);
		var all_labels = all_variables.map(x => variableToLabels(x));
		
		// Check lengths
		if (!all_labels.every(x => x.length === all_labels[0].length)) {
			userFeedback(all_variables, all_labels, "length", getVariableNaming(is_displayr));
			return false;
		}
		// Check equal elements in same order
		if (!all_labels.every(function (label_array) { return arraysEqual(label_array, all_labels[0]); }) ) {
			userFeedback(all_variables, all_labels, "not all equal", getVariableNaming(is_displayr));
			return false;
		}
	}
	return true;
}

userFeedback = function(all_variables, variable_labels, error_type, structure_name) {
	// Index of the first variable whose labels differ from the first variable's labels
	var idx = 0;
	if(error_type === "length") {
		variable_labels.some((x, x_index) => {
			if(x.length !== variable_labels[0].length){
				idx = x_index;
				return true;
			}
		});
		var pre_message = "The length of the labels should be the same for all selected variables. " + 
			"However, the selected variables don't have the same label lengths. ";
		var post_message = " If a label was miscoded consider excluding it from analysis before running the binary transform again.";
	} else {
		variable_labels.some((x, x_index) => {
			if(!arraysEqual(x, variable_labels[0])){
				idx = x_index;
				return true;
			}
		});
		var pre_message = "The labels from these " + structure_name + " do not match, and so the questions cannot be combined. ";
		var post_message = " Note that the order of the labels needs to match for all selected questions for the transform to occur.";
	}
	log(pre_message + "For example, the variable '" + getVariableOrQuestionLabel(all_variables[0]) + "' has " + variable_labels[0].length + 
			" labels :" + printTypes(variable_labels[0], " and ") + " while the variable '" + getVariableOrQuestionLabel(all_variables[idx]) +
			"' has " + variable_labels[idx].length + " labels :" + printTypes(variable_labels[idx], " and ") + "." +
		post_message);
}

getVariableNaming = function(is_displayr) {
	return is_displayr ? "variable sets" : "questions";
}

differentTypeFeedback = function(variable_feedback, is_displayr, mixed_message) {
    var structure_name = getVariableNaming(is_displayr);
	var transformation_name = is_displayr ? "transformation" : "QScript";
	var first_var = variable_feedback[0];
	var remaining_vars = variable_feedback.filter(x => x !== first_var).filter(onlyUnique);
	log("The selected " + structure_name + " include " + printTypes([first_var].concat(remaining_vars), " and ")  + 
		 ". These cannot be combined into a Binary output " + structure_name.slice(0, -1) + " with this " + transformation_name + mixed_message);
}

suitableForGrid = function(questions) {
	// Check each question has the same number of variables
	var is_suitable = true;
	var question_variable_names = questions.map(q => q.variables.map(v => v.label));
	if (!question_variable_names.every(x => x.length === question_variable_names[0].length)) {
		return false;
	}
	// Check variables have same names and in the same order
	if (!question_variable_names.every(labels => arraysEqual(labels, question_variable_names[0]))) {
		return false;
	}
	return is_suitable;
}

if (!main())
	log("QScript cancelled.");
else
	conditionallyEmptyLog("QScript finished.");

function main() {
	// Check datafile exists
	if (!requireDataFile()) {
		return false;
	}
	var is_displayr = (!!Q.isOnTheWeb && Q.isOnTheWeb());
	// If Q, get the user to select one data file when there is more than one.
	if (!is_displayr){
 		if(fileFormatVersion() < 13.05) {
 			log("This QScript is not supported in this version of Q. Please use release version 5.4.1.0 or later to use this QScript.");
 			return false;
 		}
		var data_file = requestOneDataFileFromProject();
		var allowed_types = ["Pick One", "Pick One - Multi", "Number", "Number - Multi", "Number - Grid"];
		var user_specified_type = selectOne('Select which input variable types you wish to transform to Binary', allowed_types);
		var candidate_questions = getAllQuestionsByTypes([data_file], [allowed_types[user_specified_type]]);

		if (candidate_questions.length === 0) {
			log("No " + allowed_types[user_specified_type] + " questions found in the data file.");
			return false;
		}
		var selected_questions = selectManyQuestions("Select questions to transform to binary:", candidate_questions, true).questions;
	} else {
		var allowed_types = ["Numeric - Grid", "Nominal - Multi", "Numeric - Multi", "Ordinal - Multi", "Nominal", "Numeric", "Ordinal"];
		var selected_questions = project.report.selectedQuestions();
		// Check if user hasn't selected anything
		if (selected_questions.length == 0) {
			log("To compute a Binary Transform, you must select at least one variable set from a dataset with one of the following structures: " +
				printTypes(allowed_types) + ". Note that the selected variables should have the same structure.");
			return false;
		}
		var sorted_selection = splitArrayIntoApplicableAndNotApplicable(selected_questions, function (q) { return allowed_types.indexOf(q.variableSetStructure) != -1 && !q.isBanner; });
		selected_questions = sorted_selection.applicable;
	    var mixed_message = ". The selected variable sets should also have the same structure. E.g. all Numeric variables or all Categorical " +
				"(mixing Ordinal and Nominal categorical variables is permissible for this transform so long as the label structure is the same).";
		if (sorted_selection.notApplicable.length != 0){
			log("The selected variable sets must all be of type " + printTypes(allowed_types) + mixed_message);
			return false;
		}
		var data_file = selected_questions[0].dataFile;
		// Make sure all questions are from the same data set
		if (!selected_questions.map(function (q) { return q.dataFile.name; }).every(function (type) { return type == data_file.name; })) {
			log("Selected questions or variables are from different Data Sets and cannot be combined.\n" + 
				" Please select questions or variables from a single Data Set.");
			return false;
		}
	}
	// Grab all base variables from all selected items
	var all_variables = getVariablesFromQuestions(selected_questions);
	var variable_set_structures = selected_questions.map(x => x.variableSetStructure);
	var variable_feedback = is_displayr ? variable_set_structures : selected_questions.map(x => x.questionType);
	// If only one question selected, do the Binary transform.
	if (selected_questions.length === 1){
		variablesToBinary(data_file, all_variables, is_displayr, selected_questions);
		return true;
	} else if(variable_set_structures.every(x => /Numeric/.test(x)) || variable_set_structures.every(x => /(Nominal|Ordinal)/.test(x))) {
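		// Pass all three arguments so feedback messages use the right naming in Displayr
		if(checkStructureAndLabels(selected_questions, getVariableNaming(is_displayr), is_displayr)){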
			variablesToBinary(data_file, all_variables, is_displayr, selected_questions);
			return true;
		} else {
			return false;
		}
	} else {
		differentTypeFeedback(variable_feedback, is_displayr, mixed_message);
		return false;
	}
}
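
The comments written into the generated R expression note that the cut-off can be changed from > 0 to, for example, > 50 or <= 25. One way to make that easier when customizing a copy of the script is to build the comparison from parameters. The sketch below is an assumption about how this could be done: buildCutoffExpression, operator and cutoff are illustrative names that do not exist in the original script, which hard-codes " > 0".

```javascript
// Hypothetical helper: build the R comparison that ends the generated expression.
// `rName` would be "x" or "variable.set" in the script above; `operator` and
// `cutoff` are illustrative parameters that the original script does not have.
function buildCutoffExpression(rName, operator, cutoff) {
    const allowed = [">", ">=", "<", "<=", "==", "!="];
    if (!allowed.includes(operator)) {
        throw new Error("Unsupported operator: " + operator);
    }
    if (typeof cutoff !== "number" || !Number.isFinite(cutoff)) {
        throw new Error("Cut-off must be a finite number");
    }
    // The result is an R expression, e.g. "variable.set <= 25\n".
    return rName + " " + operator + " " + cutoff + "\n";
}

console.log(buildCutoffExpression("x", ">", 0));
console.log(buildCutoffExpression("variable.set", "<=", 25));
```

In a customized copy, the returned string could take the place of the r_variable_name + " > 0\n" fragment when the expression is assembled.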


See also