Text Analysis - Advanced - Search

Searches a set of text for a particular term and displays a table showing the cases that contain it. This is useful when trying to understand the context in which a particular word or phrase appears in your text. The input text can be a Text variable, a character vector in an R Output, or the processed text from an item created by Text Analysis - Setup Text Analysis. If you search processed text, the table displays both the original text and the processed text.

This blog post contains an example of searching text from tweets.

Examples

In Displayr, go to Insert > Text Analysis > Advanced > Search.

In Q, go to Create > Text Analysis > Advanced > Search.

  1. Under Inputs > Input, select a Text Analysis - Advanced - Setup Text Analysis object, a Text variable, or a character vector in an R Output.
  2. Enter a Search Term (diet in the example below).
  3. Ensure the Automatic box is checked, or click Calculate.

TASearch inputs.png

In this example, we first used Text Analysis - Setup Text Analysis to process the text from a survey question that asked people what they think is the difference between people who drink different types of cola. Setup Text Analysis is the initial step for some of the other text analysis options, and we used its output as the input to our search. Because the text has been pre-processed, the table shows both the original text and the results of the processing. If you instead select a text variable or character vector under Inputs > Input, the result is a single column containing the text only. The search term used here is the word diet, and it appears in bold wherever it is found in the processed text.
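
The matching and bolding described above can be sketched in base R for the simpler case of a plain character vector. This is only an illustration, not the item's actual code: the sample responses are hypothetical, and the search term is treated as a literal string (an assumption made here for simplicity).

```r
# Hypothetical sample responses; not taken from the survey described above.
responses <- c("Diet cola drinkers watch their weight",
               "No real difference",
               "People on a diet choose diet cola")

# Lowercase both the term and the text so the search is case-insensitive.
search.term <- tolower("Diet")
processed <- tolower(responses)

# fixed = TRUE treats the term as a literal string rather than a regular expression.
matches <- grep(search.term, processed, fixed = TRUE)

# Wrap every occurrence of the term in <b> tags so that it renders in bold.
highlighted <- gsub(search.term, paste0("<b>", search.term, "</b>"),
                    processed[matches], fixed = TRUE)

# Keep the original case numbers as row names, as the output table does.
results <- data.frame(Text = highlighted, stringsAsFactors = FALSE)
rownames(results) <- matches
print(results)
```

Only the first and third responses are kept, and the row names show their positions in the original data.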



Extract Sorted text

To get an output that contains only the processed text (i.e. only responses that contain the search term), take these steps:

1. Insert a new R Output (Displayr: Insert > R Output; Q: Create > R Output).
2. In the R CODE field, use the code below, where text.search is the name of your Text Search output:

text.search$Processed.Text


3. Ensure the Automatic box is checked, or click Calculate.
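
Judging from the code at the end of this page, the search term in the extracted text is still wrapped in <b> tags, and the output is a plain message rather than a table when nothing matches. A slightly more defensive version of the extraction might look like the sketch below. The text.search object created here is a hypothetical stand-in so that the example runs on its own; in Displayr or Q you would refer to your existing Text Search output instead.

```r
# Hypothetical stand-in for a Text Search output, for illustration only.
text.search <- data.frame(
    Original.Text = c("Diet Coke for weight loss", "I prefer the diet version"),
    Processed.Text = c("<b>diet</b> coke for weight loss", "prefer <b>diet</b> version"),
    stringsAsFactors = FALSE)

# When nothing matches, the output is a character message rather than a table,
# so fall back to an empty character vector in that case.
processed <- if (is.data.frame(text.search)) {
    as.character(text.search$Processed.Text)
} else {
    character(0)
}

# Remove the <b> and </b> tags that were added to bold the search term.
clean.text <- gsub("</?b>", "", processed)
clean.text
```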


Options

Input The text that you want to search. This could be a Text variable, an R Output which is a "character" vector, or an item created by Text Analysis - Setup Text Analysis.
Search Term The string that you want to search for.

Code

var heading_text = "Search Text";
if (!!form.setObjectInspectorTitle)
    form.setObjectInspectorTitle(heading_text, heading_text);
else
    form.setHeading(heading_text);
form.dropBox({name: "formInput",
              label: "Input",
              types: ["R:wordBag,character", "V:Text"],
              prompt: "Text variable or output from Text Analysis > Setup"});
form.textBox({name: "formSearch", label: "Search Term", required: true,
              prompt: "Word or phrase"});
library(flipFormat)

print.textSearch <- function(x) 
{
    DataTableWithRItemFormat(x, allow.length.change = FALSE, escape.html = FALSE)    
}

if (is(formInput, "wordBag"))
{
    transformed.text <- if (!is.null(formInput$transformed.text)) formInput$transformed.text else flipTextAnalysis:::createTransformedText(formInput)
    text.to.search <- transformed.text
    data.to.display <- data.frame("Original Text" = formInput$original.text, "Processed Text" = transformed.text)
    search.column <- 2
} else if (is(formInput, "character")) {
    processed.text <- tolower(formInput)
    text.to.search <- processed.text
    data.to.display <- data.frame("Text" = processed.text, stringsAsFactors = FALSE)
    search.column <- 1
} else {
    stop("Unable to interpret the input as text. Please select some raw text or an item created by Text Analysis > Setup Text Analysis.")
}

row.numbers <- seq_along(text.to.search)

if (length(QFilter) > 1) 
{
    text.to.search <- text.to.search[QFilter]
    data.to.display <- data.to.display[QFilter, ,drop = FALSE]
    row.numbers <- row.numbers[QFilter]
}

search.term <- formSearch
search.term <- tolower(search.term)
if (search.term != "")
{
    # Match the term literally, consistent with the fixed = TRUE used for bolding below
    matches <- grep(search.term, text.to.search, fixed = TRUE)
    text.search.results <- data.to.display[matches,, drop = FALSE]
    if (is.character(formInput))
    {
        text.search.results <- data.frame("Text" = text.search.results, stringsAsFactors = FALSE)
    }
    text.search.results[,search.column] <- gsub(search.term, paste0("<b>", search.term, "</b>"), text.search.results[,search.column], fixed = TRUE)
    row.numbers <- row.numbers[matches]
} else {
    text.search.results <- data.to.display    
}

row.names(text.search.results) <- row.numbers
class(text.search.results) <- c("textSearch", class(text.search.results))
if (nrow(text.search.results) == 0)
{
    text.search.results <- "No matches found!"
}
text.search <- text.search.results

See Also