Calculation - Standard Deviation - Table(s)

From Q
Jump to navigation Jump to search
This page is currently under construction, or it refers to features which are under development and not yet available for use.
This page is under construction. Its contents are only visible to developers!


Calculate the standard deviation of the elements of a table, or calculate the element-wise standard deviation from several tables that have matching row and column labels.

Example

You can use this tool to compute the standard deviation of the numbers in a single table. For example, this table shows the average number of orders per month that people placed at a selection of restaurants.

The result of applying Standard Deviation is:

Note that by default, the standard deviation calculation will exclude any SUM or NET rows or columns in the table, and you can control which rows or columns to exclude in the options (see below).

If you have many tables that share common row and column labels, then you can use this tool to compute the standard deviation for each element that matches between those tables. For example, these tables show the average orders placed at a selection of restaurants. They have common row labels, although a mistake has led the rows of the last table to be out of order:

Applying Standard Deviation to these tables produces a new table that contains the standard deviation of each matching item among the four tables:

Note that the tool has correctly matched up all of the common rows, despite the rows in the last table being out of order.

Options

The output showing the results of the calculation has the following options available in the Object Inspector.

Input The tables to be used in the calculation.

Calculate for inputs with incomplete data If this option is checked, than any missing values in any of the inputs will be ignored in the calculation. If unchecked, then missing values are not removed before calculation and will propagate as missing values in the output.

Variance formula / Standard Deviation formula This option allows you to choose whether the Population or Sample formula is used to compute the Variance or Standard Deviation (see below).

Automatically match elements Only shown when there are multiple inputs to Input. This controls how and whether matching is done between the labels of the inputs. The default, "Yes - hide unmatched", will look for matching labels in the rows and columns of the inputs before proceeding with the calculation, and any rows/columns that are not contained in all the inputs will not be included in the output. See the Example. For a full description of the matching algorithm, see the Technical Details. "Yes - show unmatched" will also perform matching, but any unmatched rows (columns) will appear in the output as rows (columns) of all missing values. Selecting "No" for this option will cause any labels in the data to be ignored and not perform any matching. Selecting "Custom" will bring up two additional controls that allow for specifying the matching behavior for rows and columns separately.

Match rows Only shown if Automatically match elements is set to "Custom". Specifies the matching behavior when comparing row labels of the inputs. "Yes - show unmatched" and "Yes - hide unmatched" look for exact matches in the row labels in the inputs. "Fuzzy - show unmatched" and "Fuzzy - hide unmatched" perform fuzzy matching so that labels that differ only by a single character are considered to be a match.

Match columns Only shown if Automatically match elements is set to "Custom". The options are the same as Match rows, but control the matching between columns.

Rows to exclude Here you can type in row labels that should be excluded from the calculation.

Columns to exclude As above, but for columns.

Technical Details

The default option is to compute the variance or standard deviation using the sample variance formula rather than the population variance formula. You have the option to choose between these two formulas so that you can apply whichever is relevant to your calculation. If you are a Q user, or are comparing results to those obtained in Q, please note that this default is different to Q's Insert Ready-Made Formulas > Variance and Insert Ready-Made Formulas > Standard Deviation which always use the population formula. In both cases, the standard deviation is the square root of the variance, and the two variance formulas are:

Sample Variance

[math]\displaystyle{ \sigma_{sample}^2=\frac{\sum^n_{i=1}(x_i -\frac{\sum^n_{i=1}x_i }{n})^2}{n-1} }[/math]

Population Variance

[math]\displaystyle{ \sigma_{population}^2=\frac{\sum^n_{i=1}(x_i -\frac{\sum^n_{i=1}x_i }{n})^2}{n} }[/math]

When there are multiple inputs, inputs that contain only a single row (column) may be recycled to a matrix/table with the same number of rows (columns) as the other inputs. For example, if the supplied inputs are a table with three rows and two columns and another table with two rows and a single column, the single column will be expanded by rows into a table with three rows and two columns with each row identical to the original column.

When Automatically match elements is set to Yes - show unmatched or Yes - hide unmatched, both exact matches and fuzzy matches (as described above) are considered, and the order of elements may be permuted so that the names match. It also may transpose an input if, for example, the column names of one input match the row names of another input.

Code

const UNCHECK_NAMES = ["SUM", "NET", "TOTAL"];
const MULTI_QUESTIONTYPES = ["Text - Multi",
                             "Pick One - Multi", "Pick Any - Compact",
                             "Pick Any - Grid", "Number - Grid"];
const ALLOWED_R_CLASSES = ["NULL", "numeric", "integer", "logical", "factor", "matrix", "array", "data.frame", "table"];

function getInputNames(input, dim = 0){
    var input_names;
    var listbox_names = {};
    let input_type = input.type;
    if (input_type === "R Output") {
        try {
            var output_class = input.outputClasses;
            if (output_class.includes("array") || output_class.includes("matrix")) {
                var dimnames = input.data.getAttribute([], "dimnames");
                if (dim < dimnames.length && dimnames[dim] != null)
                    input_names = dimnames[dim];
                else
                    input_names = [];
            } else if (output_class.includes("data.frame")) {
                if (dim === 1)
                    input_names = input.data.getAttribute([], "names");
                else {
                    let row_names = input.data.getAttribute([], "row.names");
                    input_names = typeof(row_names[0]) === "string" ? row_names : [];
                }
            } else {
                input_names = dim === 0 ? input.data.getAttribute([], "names") : [];
            }
        }catch(e) {
            input_names = [];
        }
        listbox_names["names"] = input_names;
        listbox_names["initial"] = filterSingleNames(input_names);
    } else {
        let primary_type = input.primary.variableSetStructure;
        let has_multi_or_grid = primary_type.endsWith("Multi") || primary_type.endsWith("Grid");
        let has_columns = !!input.secondary || has_multi_or_grid || input.cellStatistics.length > 1;
        listbox_names = {names: ["foo"], initial: has_columns ? ["bar"] : []};//getTableDimNames(input, dim);
    }
    // DS-3147: replace newline chars/any whitespace with single space
    if (listbox_names["names"].length > 0) {
        Object.keys(listbox_names).map(key => {
            listbox_names[key] = listbox_names[key].map(str => typeof(str) === "string" ? str.replace(/\s+/g, " ") : str);
        });
    }
    return listbox_names;
}

function getTableDimNames(table, dim)
{
    let has_primary = table.primary != null;
    let table_output_names = {"names": [], "initial": []};
    if (has_primary)
    {
        let table_output = table.calculateOutput();
        let is_crosstab_or_multi_or_raw = table.secondary.type === "Question"
	|| MULTI_QUESTIONTYPES.includes(table.primary.questionType)
	|| table.secondary === "RAW DATA";
        if (table.primary.isBanner && table.secondary === "SUMMARY")
            is_crosstab_or_multi_or_raw = false;
        if (dim === 0)
        {
            let row_names = table_output.rowLabels;
            let row_spans = table_output.rowSpans;
            let row_indices = table_output.rowIndices(include_nets_sums = false);
            if (row_spans.length > 1)
            {
                table_output_names = flattenSpanNames(row_names, row_spans);
            } else
            {
                let initial = !!row_indices ? row_names.filter((name, i) => row_indices.includes(i)) : filterSingleNames(row_names);
                table_output_names = {"names": row_names, "initial": initial};
            }
        }
        if (dim === 1)
        {
            let n_columns = table_output.numberColumns;
            let col_spans = n_columns < 2 ? [] : table_output.columnSpans;
            let col_indices = table_output.columnIndices(include_nets_sums = false);
            let col_names = [];
            if (col_spans.length > 1)
            {
                col_names = table_output.columnLabels;
                table_output_names = flattenSpanNames(col_names, col_spans);
            } else
            {
                col_names = is_crosstab_or_multi_or_raw ? table_output.columnLabels : table_output.statistics;
                let initial = !!col_indices ? col_names.filter((name, i) => col_indices.includes(i)) : filterSingleNames(col_names);
                table_output_names = {"names": col_names, "initial": initial};
            }
        }
    }
    return table_output_names;
}

function filterSingleNames(names)
{
    return names.filter(n => !UNCHECK_NAMES.includes(n));
}

function flattenSpanNames(labels, span_names)
{
    let span_length = span_names.length;
    let span_labels = labels;
    let unselect_labels = span_names.filter(span => UNCHECK_NAMES.includes(span["label"]));
    let unselect_span_indices = [];
    if(unselect_labels.length > 0)
    {
        unselect_span_indices = unselect_labels.map(unselect => unselect["indices"]);
        unselect_span_indices = [].concat.apply([], unselect_span_indices);
        unselect_span_indices = uniq(unselect_span_indices);
    }
    let unselected_base_indices = labels.map((l, i) => UNCHECK_NAMES.includes(l) ? i : "").filter(Number);
    let unselected_indices = [].concat.apply([], [unselect_span_indices, unselected_base_indices]);
    unselected_indices = uniq(unselected_indices)
    labels.forEach((item, i) => {
        for (j = 0; j < span_length; j++)
        {
            let curr_span = span_names[j];
            if (curr_span["indices"].includes(i))
            {
                span_labels[i] = span_names[j]["label"] + " - " + span_labels[i];
            }
        }
    });
    let initial_values = span_labels.filter((label, i) => !unselected_indices.includes(i));
    return {"names": span_labels, "initial": initial_values};
}

function recursiveGetItemByGuid(group_item, guid) {
    var cur_sub_items = group_item.subItems;
    for (var j = 0; j < cur_sub_items.length; j++)
    {
        if (cur_sub_items[j].type == "ReportGroup") {
            var res = recursiveGetItemByGuid(cur_sub_items[j], guid);
            if (res != null)
                return(res)
        }
        else if (cur_sub_items[j].guid == guid)
            return(cur_sub_items[j]);
    }
    return null;
}

function removeErroredSelections(item)
{
    return item.type !== "Table" || item.primary != null;
}

function uniq(a) {
    var seen = {};
    return a.filter(function(item) {
        return seen.hasOwnProperty(item) ? false : (seen[item] = true);
    });
}

function addListBoxAfterProcessingNames(all_listbox_names, dim, guid)
{
    if (all_listbox_names.length === 1)
    {
        addListBox(all_listbox_names[0], dim, guid);
    } else
    {
        let keys = Object.keys(all_listbox_names[0]);
        let final_listbox_names = {};
        keys.forEach(key => {
            let names = all_listbox_names.map(names => names[key]);
            names = [].concat.apply([], names);
            final_listbox_names[key] = uniq(names);
        })
        addListBox(final_listbox_names, dim,guid);
    }
}

let user_inputs = form.dropBox({name: "formInputs",
                                label: "Input",
                                duplicates: true,
                                types: ["Table", "RItem: " + ALLOWED_R_CLASSES.join(", ")],
                                multi:true,
                                prompt: "Input data such as a table or R vector or matrix"});
user_inputs = user_inputs.getValues();
form.comboBox({name: 'formRemoveMissing',
               alternatives: ['Yes (show warning)', 'No', 'Yes'],
               label: 'Calculate for inputs with incomplete data',
               prompt: 'If set to \'Yes\', any missing values are removed from the data before the calculation occurs. ' +
                       'If set to \'No\', inputs with any missing values will be assigned a missing value. ' +
                       'Inputs whose values are entirely missing, will always be assigned a missing value ' +
                       'regardless of this setting.',
               default_value: 'Yes (show warning)'});
form.comboBox({name: "formCalculationFormula",
               label: "Standard Deviation formula",
               alternatives: ["Population", "Sample"],
               prompt: "Divides by n in the population formula or (n - 1) in the sample formula",
               default_value: "Sample"});

let row_names = [];
let col_names = [];

var inputs = [];

if (user_inputs.length !== 0)
{
    inputs = user_inputs.map(input => recursiveGetItemByGuid(project.report, input.guid));
    inputs = inputs.filter(removeErroredSelections);
}

if (inputs.length > 0)
{
    let inputs = user_inputs.map(input => recursiveGetItemByGuid(project.report, input.guid));
    row_names = inputs.map(input => getInputNames(input, 0));
    col_names = inputs.map(input => getInputNames(input, 1));
}

row_names = row_names.filter(item => item["names"].length > 0);
col_names = col_names.filter(item => item["names"].length > 0);
let add_row_listbox = row_names.length > 0;
let add_col_listbox = col_names.length > 0;

let add_matching_control = row_names.length > 1 || col_names.length > 1;

if (add_matching_control)
{
    var automatic_choice = form.comboBox({label: "Automatically match elements",
                                          name: "formMatchElements",
                                          alternatives : ["Yes - hide unmatched",
                                                          "Yes - show unmatched",
                                                          "No",
                                                          "Custom"],
                                          default_value: "Yes - hide unmatched",
                                          prompt: "Automatically determine elements to match based off the input row and column labels"});
    automatic_choice = automatic_choice.getValue();
    if (automatic_choice === "Custom")
    {
        let has_both_row_names = row_names.every(item => item["names"].length > 0);
        let has_both_col_names = col_names.every(item => item["names"].length > 0);
        form.comboBox({name: "formMatchRows",
                       label: "Match rows",
                       alternatives: ["Yes - hide unmatched", "Yes - show unmatched", "Fuzzy - hide unmatched", "Fuzzy - show unmatched", "No"],
                       default_value: has_both_row_names ? "Yes - hide unmatched" : "No"});
        form.comboBox({name: "formMatchColumns",
                       label: "Match columns",
                       alternatives: ["Yes - hide unmatched", "Yes - show unmatched", "Fuzzy - hide unmatched", "Fuzzy - show unmatched", "No"],
                       default_value: has_both_col_names ? "Yes - hide unmatched" : "No"});
    }
}

form.textBox({name: "formIncludeRows", label: "Rows to exclude", prompt: "Select the row labels to be excluded in the output table.", default_value: "NET; SUM", required: false});
form.textBox({name: "formIncludeColumns", label: "Columns to exclude", prompt: "Select the columns labels to be excluded in the output table.", default_value: "NET; SUM", required: false});

form.setHeading("Standard Deviation");
library(verbs)

match.elements <- get0("formMatchElements", ifnotfound = c(rows = "No", columns = "No"))
if (length(match.elements) == 1L && match.elements == "Custom")
    match.elements <- c(rows = formMatchRows, columns = formMatchColumns)

removal.choices <- list(formIncludeRows, formIncludeColumns)
categories.to.remove <- ParseCategoriesToRemove(removal.choices, formInputs)
remove.rows    <- categories.to.remove[[1L]]
remove.columns <- categories.to.remove[[2L]]
remove.missing <- startsWith(formRemoveMissing, "Yes")
warn <- if (endsWith(formRemoveMissing, "(show warning)")) TRUE else "MuffleMissingValueWarning"

sd.calc <- StandardDeviation(QInputs(formInputs),
                             sample = formCalculationFormula == "Sample",
                             remove.missing = remove.missing,
                             remove.rows = remove.rows,
                             remove.columns = remove.columns,
                             match.elements = match.elements,
                             warn = warn)