Calculation - Count Each Column - Table(s)

From Q
Jump to navigation Jump to search
This page is currently under construction, or it refers to features which are under development and not yet available for use.
This page is under construction. Its contents are only visible to developers!

This tool is used to count the number of cells within each column of a table whose values match certain conditions:

  • Are greater than, or less than, a certain number.
  • Are greater than or equal to, or less than or equal to, a certain number.
  • Are within a specified range of numbers.
  • Are missing values (NA).
  • Are identical to a specific category label.
  • Satisfy a combination of conditions or ranges such as those listed above.

Examples

This table shows Awareness results for a number of brands over a number of quarters:

Count Each Column can be used to count the number of brands that acheived a score of more than 80% eache quarter. The result is a new table with one result for each brand:

You can choose which rows and columns are included by modifying the options (see below).

Options

Input The tables to be used in the calculation.

Values to count Enter ranges, single values, or inequalities to include in this calculation. Allowed options include:

  • Single values, like 10 - include values exactly equal to a single value, e.g. values which are equal to 10.
  • Sequences of values, like 1, 2, 3.5 - include values that are exactly equal to any of the entered values.
  • Ranges of values, like [1-10] - include all values that fall within a given range. For example, the range listed to the left will match values between 1 and 10, inclusive.
  • Inequalities, like >5.3 - include all values which satisify the inequality For example, the inequality to the left will match all values greater than 5.3. This works for greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=).
  • Missing values, NA - match all values which are missing.
  • Infinities, Inf, -Inf - include all values which are infinity or negative infinity.

Categorical Labels When one or more of your inputs is a variable with categories, you can type in the labels of the categories that you wish to match.

Calculate for inputs with incomplete data If this option is checked, than any missing values in any of the inputs will be ignored in the calculation. If unchecked, then missing values are not removed before calculation and will propagate as missing values in the output.

Automatically match elements Only shown when there are multiple inputs to Input. This controls how and whether matching is done between the labels of the inputs. The default, "Yes - hide unmatched", will look for matching labels in the rows and columns of the inputs before proceeding with the calculation, and any rows/columns that are not contained in all the inputs will not be included in the output. See the Example. For a full description of the matching algorithm, see the Technical Details. "Yes - show unmatched" will also perform matching, but any unmatched rows (columns) will appear in the output as rows (columns) of all missing values. Selecting "No" for this option will cause any labels in the data to be ignored and not perform any matching. Selecting "Custom" will bring up two additional controls that allow for specifying the matching behavior for rows and columns separately.

Match rows Only shown if Automatically match elements is set to "Custom". Specifies the matching behavior when comparing row labels of the inputs. "Yes - show unmatched" and "Yes - hide unmatched" look for exact matches in the row labels in the inputs. "Fuzzy - show unmatched" and "Fuzzy - hide unmatched" perform fuzzy matching so that labels that differ only by a single character are considered to be a match.

Match columns Only shown if Automatically match elements is set to "Custom". The options are the same as Match rows, but control the matching between columns.

Rows to exclude Here you can type in row labels that should be excluded from the calculation.

Columns to exclude As above, but for columns.

Technical Details

When there are multiple inputs, inputs that contain only a single row (column) may be recycled to a matrix/table with the same number of rows (columns) as the other inputs. For example, if the supplied inputs are a table with three rows and two columns and another table with two rows and a single column, the single column will be expanded by rows into a table with three rows and two columns with each row identical to the original column.

When Automatically match elements is set to Yes - show unmatched or Yes - hide unmatched, both exact matches and fuzzy matches (as described above) are considered, and the order of elements may be permuted so that the names match. It also may transpose an input if, for example, the column names of one input match the row names of another input.

Code

const UNCHECK_NAMES = ["SUM", "NET", "TOTAL"];
const MULTI_QUESTIONTYPES = ["Text - Multi",
                             "Pick One - Multi", "Pick Any - Compact",
                             "Pick Any - Grid", "Number - Grid"];
const ALLOWED_R_CLASSES = ["NULL", "numeric", "integer", "logical", "factor", "matrix", "array", "data.frame", "table"];

function addListBox(listbox_names, dim) {
    let dim_str = dim === 0 ? "Row" : "Column";
    return form.listBox({name: "formInclude" + dim_str + "s",
                         label: dim_str + "s to include",
                         alternatives: listbox_names["names"], names: listbox_names["names"],
                         required: false,
                         prompt: "Select the " + dim_str.toLowerCase() + " labels to be included in the output table.",
                         initialValues: listbox_names["initial"], multiSelection: true,
                         nItemsVisible: 5});
}

function getInputNames(input, dim = 0){
    var input_names;
    var listbox_names = {};
    let input_type = input.type;
    if (input_type === "R Output") {
        try {
            var output_class = input.outputClasses;
            if (output_class.includes("array") || output_class.includes("matrix")) {
                var dimnames = input.data.getAttribute([], "dimnames");
                if (dim < dimnames.length && dimnames[dim] != null)
                    input_names = dimnames[dim];
                else
                    input_names = [];
            } else if (output_class.includes("data.frame")) {
                if (dim === 1)
                    input_names = input.data.getAttribute([], "names");
                else {
                    let row_names = input.data.getAttribute([], "row.names");
                    input_names = typeof(row_names[0]) === "string" ? row_names : [];
                }
            } else {
                input_names = dim === 0 ? input.data.getAttribute([], "names") : [];
            }
        }catch(e) {
            input_names = [];
        }
        listbox_names["names"] = input_names;
        listbox_names["initial"] = filterSingleNames(input_names);
    } else {
        let primary_type = input.primary.variableSetStructure;
        let has_multi_or_grid = primary_type.endsWith("Multi") || primary_type.endsWith("Grid");
        let has_columns = !!input.secondary || has_multi_or_grid || input.cellStatistics.length > 1;
        listbox_names = {names: ["foo"], initial: has_columns ? ["bar"] : []};//getTableDimNames(input, dim);
    }
    // DS-3147: replace newline chars/any whitespace with single space
    if (listbox_names["names"].length > 0) {
        Object.keys(listbox_names).map(key => {
            listbox_names[key] = listbox_names[key].map(str => typeof(str) === "string" ? str.replace(/\s+/g, " ") : str);
        });
    }
    return listbox_names;
}

function getTableDimNames(table, dim)
{
    let has_primary = table.primary != null;
    let table_output_names = {"names": [], "initial": []};
    if (has_primary)
    {
        let table_output = table.calculateOutput();
        let is_crosstab_or_multi_or_raw = table.secondary.type === "Question"
	|| MULTI_QUESTIONTYPES.includes(table.primary.questionType)
	|| table.secondary === "RAW DATA";
        if (table.primary.isBanner && table.secondary === "SUMMARY")
            is_crosstab_or_multi_or_raw = false;
        if (dim === 0)
        {
            let row_names = table_output.rowLabels;
            let row_spans = table_output.rowSpans;
            let row_indices = table_output.rowIndices(include_nets_sums = false);
            if (row_spans.length > 1)
            {
                table_output_names = flattenSpanNames(row_names, row_spans);
            } else
            {
                let initial = !!row_indices ? row_names.filter((name, i) => row_indices.includes(i)) : filterSingleNames(row_names);
                table_output_names = {"names": row_names, "initial": initial};
            }
        }
        if (dim === 1)
        {
            let n_columns = table_output.numberColumns;
            let col_spans = n_columns < 2 ? [] : table_output.columnSpans;
            let col_indices = table_output.columnIndices(include_nets_sums = false);
            let col_names = [];
            if (col_spans.length > 1)
            {
                col_names = table_output.columnLabels;
                table_output_names = flattenSpanNames(col_names, col_spans);
            } else
            {
                col_names = is_crosstab_or_multi_or_raw ? table_output.columnLabels : table_output.statistics;
                let initial = !!col_indices ? col_names.filter((name, i) => col_indices.includes(i)) : filterSingleNames(col_names);
                table_output_names = {"names": col_names, "initial": initial};
            }
        }
    }
    return table_output_names;
}

function filterSingleNames(names)
{
    return names.filter(n => !UNCHECK_NAMES.includes(n));
}

function flattenSpanNames(labels, span_names)
{
    let span_length = span_names.length;
    let span_labels = labels;
    let unselect_labels = span_names.filter(span => UNCHECK_NAMES.includes(span["label"]));
    let unselect_span_indices = [];
    if(unselect_labels.length > 0)
    {
        unselect_span_indices = unselect_labels.map(unselect => unselect["indices"]);
        unselect_span_indices = [].concat.apply([], unselect_span_indices);
        unselect_span_indices = uniq(unselect_span_indices);
    }
    let unselected_base_indices = labels.map((l, i) => UNCHECK_NAMES.includes(l) ? i : "").filter(Number);
    let unselected_indices = [].concat.apply([], [unselect_span_indices, unselected_base_indices]);
    unselected_indices = uniq(unselected_indices)
    labels.forEach((item, i) => {
        for (j = 0; j < span_length; j++)
        {
            let curr_span = span_names[j];
            if (curr_span["indices"].includes(i))
            {
                span_labels[i] = span_names[j]["label"] + " - " + span_labels[i];
            }
        }
    });
    let initial_values = span_labels.filter((label, i) => !unselected_indices.includes(i));
    return {"names": span_labels, "initial": initial_values};
}

function recursiveGetItemByGuid(group_item, guid) {
    var cur_sub_items = group_item.subItems;
    for (var j = 0; j < cur_sub_items.length; j++)
    {
        if (cur_sub_items[j].type == "ReportGroup") {
            var res = recursiveGetItemByGuid(cur_sub_items[j], guid);
            if (res != null)
                return(res)
        }
        else if (cur_sub_items[j].guid == guid)
            return(cur_sub_items[j]);
    }
    return null;
}

function determineRange(numeric_vector)
{
    numeric_vector = numeric_vector.filter(val => !isNaN(val) && val !== -2147483648);
    if (numeric_vector.length === 0)
    {
        return as_string ? "NA" : NaN;
    }
    let min = numeric_vector.reduce((x, y) => Math.min(x, y));
    let max = numeric_vector.reduce((x, y) => Math.max(x, y));
    min = Math.floor(min);
    max = Math.ceil(max);
    return [min, max];
}

function convertToOpenInterval(range)
{
    let infinite_values = range.map(val => !isFinite(val));
    let n_infinite = infinite_values.reduce((x, i) => x + i, 0);
    if (n_infinite === 2)
    {
        let signs = range.map(val => Math.sign(val));
        if (signs[0] === signs[1])
            return signs[0] > 0 ? "Inf" : "-Inf";
        return ">=0";
    }
    if (infinite_values[0])
        return "<=" + range[1];
    return ">=" + range[0];
}

function convertNumToRString(numeric_value)
{
    if (isFinite(numeric_value))
        return numeric_value.toString();
    return numeric_value > 0 ? "Inf" : "-Inf";
}

function extractMinAndMaxTable(table)
{
    return [0, Infinity];
}

function isMissing(value, only_nan = true)
{
    return only_nan ? isNaN(value) : isNaN(value) || value === -2147483648;
}

function extractCategoricalOrNumericValues(selection, variable_name = []) {
    let selection_data;
    if (selection.type === "Table")
        return {"values":extractMinAndMaxTable(selection), "type": "numeric"};
    let r_class;
    switch (selection.type)
    {
        case "R Output":
            selection_data = selection.data;
            r_class = selection.outputClasses;
            variable_name = [];
            break;
        case "QScriptROutputTranslator":
            selection_data = selection;
            if (variable_name.length === 0)
                return null;
            try
            {
                r_class = selection.getAttribute(variable_name, "class");
            } catch(e)
            {
                let data_values = selection_data.get(variable_name).flat();
                if (typeof data_values[0] === "string")
                    return null;
                return {"values": determineRange(data_values), "type": "numeric"};
            }
            break;
    }
    if (r_class.includes("NULL"))
        return null;
    if (r_class.includes("data.frame"))
    {
        if (selection.type !== "R Output")
            return null;
        let all_variable_names = selection_data.getAttribute(variable_name, "names");
        return all_variable_names.map(variable_name => extractCategoricalOrNumericValues(selection_data, variable_name));
    }
    if (r_class.includes("factor"))
    {
        let levels = selection_data.getAttribute(variable_name, "levels");
        if (typeof levels[0] === "number")
            levels = levels.map(toString);
        let values = selection_data.get(variable_name).flat();
        let has_missing = values.some(value => isMissing(value, only_nan = false));
        return {"values": levels, "has_missing": has_missing, "type": "categorical"};
    }
    r_values = selection.data.get([]).flat();
    if (r_class.includes("matrix") || r_class.includes("array"))
    {
        r_values = typeof(r_values[0]) === "string" ? "" : determineRange(r_values);
        return {"values": r_values, "type": "numeric"};
    }
    if (r_class.includes("numeric") || r_class.includes("integer") || r_class.includes("table"))
    {
        r_values = determineRange(r_values);
        return {"values": r_values, "type": "numeric"};
    }
    return null;
}

let user_input = form.dropBox({name: "formInput", label: "Input",
                               types: ["table", "RItem:" + ALLOWED_R_CLASSES.join(", ")],
                               prompt: "Input data with rows to sum, e.g. a Table, R matrix"}).getValue();
let row_names = {"names": [], "initial": []};
let col_names = {"names": [], "initial": []};

if (!!user_input)
{
    var input = recursiveGetItemByGuid(project.report, user_input.guid);
    row_names = getInputNames(input, 0);
    col_names = getInputNames(input, 1);
    var input_defaults = extractCategoricalOrNumericValues(input);
    let numeric_defaults = [];
    let categorical_defaults = [];
    let has_missing_data = false;
    let default_structure = Object.prototype.toString.call(input_defaults);
    if (default_structure === "[object Array]")
    {
        input_defaults.forEach(item => {
            if (item == null)
                return null;
            if (item["type"] === "numeric")
            {
                numeric_defaults.push(item["values"]);
            } else
            {
                if (item["has_missing"])
                {
                    has_missing_data = true;
                }
                categorical_defaults.push(item["values"]);
            }
        });
    } else if (default_structure !== "[object Null]")
    {
        switch (input_defaults["type"])
        {
            case "categorical":
                categorical_defaults.push(input_defaults["values"]);
                if (input_defaults["has_missing"])
                    has_missing_data = true;
                break;
            case "numeric":
                numeric_defaults.push(input_defaults["values"]);
                break;
        }

    }
    var labels_selected;
    if (categorical_defaults.length > 0)
    {
        // form.group("Categories to count (Categorical Inputs)");
        categorical_defaults = categorical_defaults.flat();
        let selections = uniq(categorical_defaults);
        let r_levels = selections.map(item => item);
        let initial = selections.map(item => item);
        if (has_missing_data)
        {
            selections.push("Missing data");
            r_levels.push("Missing data used only by Q/Displayr");
        }
        labels_selected = form.listBox({name: "formCategoricalLabels", label: "Categorical labels to count",
                                        alternatives: selections,
                                        names: r_levels,
                                        required: true,
                                        prompt: "Select the categories to be counted in the inputs with categorical labels when creating the output.",
                                        initialValues: initial,
                                        multiSelection: true,
                                        nItemsVisible: 5}).getValues();
    }

    if (numeric_defaults.length > 0)
    {
        // form.group("Values to count (Numeric Inputs)");
        let default_range = [];
        let range = numeric_defaults.reduce((x, y) => [Math.min(x[0], y[0]), Math.max(x[1], y[1])]);
        let infinity_found = range.some(x => !isFinite(x));
        if (infinity_found)
            default_range = convertToOpenInterval(range);
        else
        {
            if (range[0] === range[1])
                range = [range[0]];
            range = range.map(convertNumToRString);
            default_range = range.join("-");
        }
        values_to_count = form.textBox({name: "formNumericValues",
                                        label: "Values to count",
                                        prompt: "Specify the values to be counted in the inputs with numeric values when creating the output. E.g. NA, Inf, 1-3, 5, 6, <=-1, >15",
                                        default_value: default_range}).getValue();
    }
    if (!!values_to_count)
    {
        values_to_count = values_to_count.split(",");
        values_to_count = values_to_count.map(values => values.trim());
    }
    counting_missing_values = (!!values_to_count && values_to_count.includes("NA")) || (!!labels_selected && labels_selected.includes("Missing data used only by Q/Displayr"));

    if (!counting_missing_values)
    {
        form.checkBox({name:"formIgnoreMissing",
                       label:"Calculate for inputs that have incomplete data",
                       prompt:"Allow calculation to proceed even if there are inputs with missing values",
                       default_value: true});
    }
}

function uniq(a) {
    var seen = {};
    return a.filter(function(item) {
        return seen.hasOwnProperty(item) ? false : (seen[item] = true);
    });
}

form.textBox({name: "formIncludeRows", label: "Rows to exclude", prompt: "Select the row labels to be excluded in the output table.", default_value: "NET; SUM", required: false});
form.textBox({name: "formIncludeColumns", label: "Columns to exclude", prompt: "Select the columns labels to be excluded in the output table.", default_value: "NET; SUM", required: false});

form.setHeading("Count Each Column");
library(verbs)

removal.choices <- list(formIncludeRows, formIncludeColumns)
categories.to.remove <- ParseCategoriesToRemove(removal.choices, list(formInput))
remove.rows    <- categories.to.remove[[1L]]
remove.columns <- categories.to.remove[[2L]]

categorical.selections <- get0("formCategoricalLabels", ifnotfound = NULL)

categorical.labels <- if (is.null(categorical.selections)) NULL else names(which(categorical.selections))
numeric.values <- get0("formNumericValues")
ignore.missing <- get0("formIgnoreMissing", ifnotfound = FALSE)

elements.to.count <- list(numeric = numeric.values,
                          categorical = categorical.labels)

count.each.column <- CountEachColumn(QInputs(formInput),
                                     elements.to.count = elements.to.count,
                                     ignore.missing = ignore.missing,
                                     remove.rows = remove.rows,
                                     remove.columns = remove.columns,
                                     warn = TRUE)