# Filter - Filters for Train-Test Split

Create 2 new filters based on a random 70%/30% split of the selected data

This QScript creates 2 new filters based on a random 70%/30% split of the selected data. These filters can then be applied to predictive models in order to separate a training data set from a test data set. The QScript can be amended to adjust the split ratio.

## Example

The result of running this script is shown below. The first 2 variables are the new filters created.

## Technical details

The value of trainPercentage in the QScript code below controls the split ratio. The script prompts for this value and offers 70 as the default, so by default 70% of the cases (rounded to the nearest whole number) are selected by the Training sample filter and the remaining 30% by the Testing sample filter.
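
As a rough illustration of the rounding described above, the following sketch computes how many cases each filter would select. The helper name splitCounts is hypothetical and not part of the QScript. It uses JavaScript's Math.round, whereas the script does its rounding in R, so the two may differ by one case when the product falls exactly on a .5 boundary.

```javascript
// Hypothetical helper (not part of the QScript): mirrors the R expression
// round(percentage * n / 100) that sizes the training sample.
// Assumption: Math.round stands in for R's round(); the two can differ on exact .5 ties.
function splitCounts(totalN, trainPercentage) {
    const trainN = Math.round(trainPercentage * totalN / 100);
    // Every case not in the training sample falls into the testing sample
    return { train: trainN, test: totalN - trainN };
}

const even = splitCounts(1000, 70);
console.log(even.train, even.test);     // 700 300

const uneven = splitCounts(157, 70);
console.log(uneven.train, uneven.test); // 110 47
```

With 157 cases, 70% is 109.9, so the training filter would select 110 cases and the testing filter the remaining 47.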

To change the percentages in the training and testing filters, either enter a different value at the prompt or edit the script as described in Customizing the QScript below.
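
The script only accepts values from 0 to 100 inclusive. When customizing it, a stricter check that also rejects blank or non-numeric input could look like the sketch below. The function name parseTrainPercentage is an assumption made for illustration, and the sketch runs as plain JavaScript without any QScript functions.

```javascript
// Hypothetical helper (not part of the QScript): converts raw input to a
// number and returns null when it is missing, non-numeric, or out of range.
function parseTrainPercentage(raw) {
    if (raw === null || raw === undefined || String(raw).trim() === "")
        return null;
    const value = Number(raw);
    if (!Number.isFinite(value))
        return null;
    // Same inclusive bounds as the script: 0 and 100 are both allowed
    if (value < 0 || value > 100)
        return null;
    return value;
}

console.log(parseTrainPercentage("70"));  // 70
console.log(parseTrainPercentage("abc")); // null
console.log(parseTrainPercentage(150));   // null
```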

## How to apply this QScript

• Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
• Click on the QScript when it appears in the QScripts and Rules section of the search results.

OR

• Select Automate > Browse Online Library.
• Select this QScript from the list.

## Customizing the QScript

This QScript is written in JavaScript and can be customized by copying and modifying the JavaScript.

### Customizing QScripts in Q4.11 and more recent versions

• Start typing the name of the QScript into the Search features and data box in the top right of the Q window.
• Hover your mouse over the QScript when it appears in the QScripts and Rules section of the search results.
• Press Edit a Copy (bottom-left corner of the preview).
• Modify the JavaScript (see QScripts for more detail on this).
• Either run the QScript by pressing the blue triangle button, or save it and run it later using Automate > Run QScript (Macro) from File.

### Customizing QScripts in older versions

• Copy the JavaScript shown on this page.
• Create a new text file, giving it a file extension of .QScript. See here for more information about how to do this.
• Modify the JavaScript (see QScripts for more detail on this).
• Run the file using Automate > Run QScript (Macro) from File.

## JavaScript

// This script creates 2 new filters based upon a random split of the data.

includeWeb("QScript Selection Functions");
includeWeb("QScript Functions to Generate Outputs");

filtersForTrainTestSplit();

function filtersForTrainTestSplit() {

    const is_displayr = inDisplayr();

    // Set percentage of data used for training set
    let trainPercentage = prompt("What percentage of the data set should be used as the training set?", 70);
    // Reject non-numeric input as well as values outside 0 to 100
    if (isNaN(trainPercentage) || trainPercentage < 0 || trainPercentage > 100) {
        log("Invalid split. Please ensure that trainPercentage is between 0 and 100.");
        return false;
    }

    // Get the data
    let dataFile;
    const user_selections = getAllUserSelections();
    let selected_questions = user_selections.selected_questions;
    if (selected_questions.length > 0)
        dataFile = project.report.selectedQuestions()[0].dataFile;
    else if (project.dataFiles.length == 1)
        dataFile = project.dataFiles[0];
    else if (project.dataFiles.length == 0) {
        // No data sets in the project, so there is nothing to split
        return false;
    } else if (!is_displayr) {
        dataFile = dataFileSelection()[0];
    } else {
        log("Please select data from a single data set.");
        return false;
    }

    // Create a training filter based on a random sample
    let RText = "percentage <- " + trainPercentage + " # Change this number to change the percentage in the training sample\n" +
        "set.seed(123) # This ensures that the randomization is identical each time\n" +
        "n <- " + dataFile.totalN + " # This is the total sample size\n" +
        "indices <- sample.int(n, round(percentage * n / 100))\n" +
        "filter <- rep(0, n)\n" +
        "filter[indices] <- 1\n" +
        "filter";
    let new_q_name = preventDuplicateQuestionName(dataFile, "Training sample");

    let test;
    let train;

    try {
        train = dataFile.newRVariable(RText, preventDuplicateVariableName(dataFile, "training"), new_q_name, null);
    } catch (e) {
        log("Could not create train filter: " + e);
        return false;
    }
    train.needsCheck = false;

    // Create testing filter of the data not selected by the training filter
    // Backticks allow hyphens and other special characters in the data file and variable names
    RText = "as.numeric(!`" + dataFile.name + "`$Variables$`" + train.name + "`)";
    try {
        test = dataFile.newRVariable(RText, preventDuplicateVariableName(dataFile, "testing"), "Testing sample", null);
    } catch (e) {
        log("Could not create test filter: " + e);
        return false;
    }
    test.needsCheck = false;

    // Combine the 2 new variables into a Pick-Any question
    // Declared with const so that trainTest does not leak as an implicit global
    const trainTest = dataFile.setQuestion(preventDuplicateQuestionName(dataFile, "Train test split"), "Pick Any", [train, test]);
    let suffix = trainTest.name.replace(/^Train test split/, "");
    trainTest.variables[0].label = "Training sample" + suffix;
    trainTest.variables[1].label = "Testing sample" + suffix;
    trainTest.needsCheckValuesToCount = false;
    trainTest.isFilter = true;
    insertAtHoverButtonIfShown(trainTest);
    reportNewRQuestion(trainTest, "Filter");
    return true;
}
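
The R code that the script generates draws a fixed-size random sample of case indices, marks those cases with 1 in the training filter, and defines the testing filter as its complement. The sketch below reproduces that logic in plain JavaScript, purely to illustrate the idea. The names mulberry32 and trainTestFilters are assumptions introduced here, and the small seeded generator is a stand-in for R's set.seed(123), so it will not select the same cases as the script does.

```javascript
// Small seeded pseudo-random generator (mulberry32), used so the split is repeatable.
// Assumption: this stands in for R's set.seed(); it does NOT reproduce R's draws.
function mulberry32(seed) {
    let a = seed >>> 0;
    return function () {
        a = (a + 0x6D2B79F5) >>> 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Hypothetical helper: builds complementary 0/1 training and testing filters.
function trainTestFilters(n, trainPercentage, seed) {
    const random = mulberry32(seed);
    const indices = Array.from({ length: n }, (_, i) => i);
    // Fisher-Yates shuffle; the first trainN entries are a sample without replacement
    for (let i = n - 1; i > 0; i--) {
        const j = Math.floor(random() * (i + 1));
        [indices[i], indices[j]] = [indices[j], indices[i]];
    }
    const trainN = Math.round(trainPercentage * n / 100);
    const train = new Array(n).fill(0);
    for (const idx of indices.slice(0, trainN))
        train[idx] = 1;
    // The testing filter selects exactly the cases the training filter does not
    const test = train.map(v => (v ? 0 : 1));
    return { train, test };
}

const split = trainTestFilters(10, 70, 123);
const trainCount = split.train.reduce((a, b) => a + b, 0);
const testCount = split.test.reduce((a, b) => a + b, 0);
console.log(trainCount, testCount); // 7 3
```

Because the generator is seeded, calling trainTestFilters again with the same arguments returns the same split, which is the role set.seed(123) plays in the generated R code.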