Cleaning Text for Analysis

From Q

Text data can contain characters that interfere with automated text analysis. These include characters that merely look like spaces or line breaks (such as tabs and non-breaking spaces), as well as characters that are not English letters or digits. Assuming that your text is in English, the following steps create a new variable in which every run of such characters, punctuation included, is replaced with a single space. This will likely improve the automated analyses.

Process

1. On the Variables and Questions tab, right-click anywhere and select Insert Variable(s) > JavaScript Formula > Text
2. In the Expression field, paste in the code below. On its first line, replace my_text_variable with the Name of your existing text variable.

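// Replace my_text_variable with the Name of your existing text variable
var input = my_text_variable;
// Replace each run of characters other than A-Z, a-z and 0-9 with a single space
input.replace(/[\W_]+/g, " ");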

3. Click OK

Give the new variable a descriptive Label, and then use it instead of the original text variable in your further analysis.
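
To preview what the expression does before applying it to a whole variable, the same pattern can be tried on a sample string in any JavaScript console outside Q. This is only a minimal sketch: the function name cleanText and the sample text are made up for illustration and are not part of Q.

```javascript
// Hypothetical helper (not a Q function): applies the same pattern to any string.
function cleanText(input) {
  // \W matches any character that is not an ASCII letter, digit or underscore;
  // the underscore is listed separately so that it is replaced as well.
  // The + collapses each run of such characters into one space.
  return input.replace(/[\W_]+/g, " ");
}

// Sample text containing a tab, a non-breaking space (\u00A0),
// an underscore, a line break and punctuation.
var sample = "Great\tservice,\u00A0but_slow\ndelivery!";

console.log(cleanText(sample)); // prints "Great service but slow delivery "
```

Note that the final exclamation mark becomes a trailing space; appending .trim() to the result would remove it if that matters for your analysis. Accented letters are also outside the pattern's letter set, so "café" becomes "caf ", which is why these steps assume English text.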