Using R in Q
Analyses can be conducted in Q using the R Language. In every sense, when using R from within Q, you are using "pure" R. All the functions are written in R. Any R code is automatically sent over the internet to a server with a normal version of R installed on it. The results are then sent back, and presented to you in Q. While we have attempted to make it feel like Q and R are one-and-the-same, in reality they are completely different programs which "talk" to each other.
How to use R from within Q
There are a number of ways of using R from within Q:
- Entering R code directly into Q in R Outputs.
- Creating R Variables.
- Creating new Data Sets using R.
- Accessing the R functions using menus and forms. This is how most advanced analyses are conducted in Q (e.g., regression, principal components analysis). This is referred to as Standard R.
- Including R code in QScripts. QScript is Q's automation language.
- Automatic updating. Any R code that is created via R Outputs, Standard R, or QScripts can be set to automatically update when the inputs change (e.g., if the input data changes, if a new data file is created, or if other options are changed).
R Outputs
An R Output is an item in the Report Tree that contains both some R code and the result of the R code.
Creating an R Output
- Select Create > R Output (or right-click on an item in the Report Tree).
- Enter instructions in the R Code box within Properties, on the right-hand side of the screen (in the Object Inspector). These instructions need to be written in the R Language.
- Press the Calculate button. This sends the instructions over a secure internet connection to a computer in the cloud. The result is sent back to Q and shown on your screen (i.e., the result is the R Output). The output will typically be a table, chart, text string, or an error message.
In the example below, a histogram is created of 9 numbers. (If you are not familiar with the R Language, refer to Learning the R language.)
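A histogram of 9 numbers can be sketched as follows. The numbers are made up for illustration, and the chart title and axis label are arbitrary choices rather than anything required by Q.

```r
# Nine made-up numbers (hypothetical data, chosen only for illustration)
x <- c(2, 3, 3, 4, 5, 5, 5, 6, 8)

# Draw a histogram of the numbers using base R graphics
hist(x, main = "Histogram of 9 numbers", xlab = "Value")
```

Entered in the R Code box, code along these lines should produce a chart as the R Output once Calculate is pressed.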
References to variables and questions
References to tables
New tables and other types of outputs (strings, charts, variables, data files) can be created by manipulating tables. The example below creates a new table as the ratio of two existing tables, removes a few rows, and creates a chart. This chart will automatically update whenever the inputs change (e.g., if the data file is updated, or the questions are re-coded).
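The steps described above (take a ratio, drop rows, draw a chart) might look roughly like this. The two tables are stand-ins built by hand so that the sketch runs on its own; their names, row labels, and values are assumptions. In Q, existing tables would instead be referred to by their Reference Names.

```r
# Stand-ins for two existing tables (hypothetical names and values).
# In Q, these would be referenced by their Reference Names instead.
table.1 <- c(Coke = 40, Pepsi = 30, Fanta = 10, Other = 5, NET = 85)
table.2 <- c(Coke = 80, Pepsi = 60, Fanta = 40, Other = 20, NET = 200)

# New table: element-by-element ratio of the two tables
ratio <- table.1 / table.2

# Remove the rows that are not wanted in the chart
ratio <- ratio[!names(ratio) %in% c("Other", "NET")]

# Chart the remaining rows using base R graphics
barplot(ratio, main = "Ratio of table.1 to table.2", ylab = "Ratio")
```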
Every table has a Reference Name, which can be viewed and changed by right-clicking on the table and selecting Reference Name.... Most of the time, the Reference Name is the same as the name shown in the Report Tree. Where a table's reference name is the same as the name of a variable, question, or R Output, you can disambiguate using QTables$reference.name (e.g., QTables$table.2).
References to other R Outputs
Like a table, an R Output has both a Reference Name and a Name. The Reference Name must be unique within a project. The Reference Name is used to refer to other R Outputs in code. For example, if one R Output has a reference name of x, the code x * 2 in another R Output will show the value of x multiplied by 2.
There are a number of ways of changing the Reference Name of an R Output:
- By changing the Reference Name in the Object Inspector (Properties > General).
- By right-clicking on an R Output in the Report Tree and selecting Reference Name....
- By changing the Name, if the Name and the Reference Name are the same and there are no other R Outputs with the same Reference Name.
- By assigning a variable name in the last line of code. For example, the following code creates an R Output with a Reference Name of dog containing the string (or, in R parlance, character) Sherlock:
dog <- "Sherlock"
Avoiding ambiguous reference names
There are situations where two things may have the same Reference Name. For example:
- A table and a variable may both have the Reference Name of Q2.
- An R Output and a table may both have the reference name brand.health.
Where this occurs, any R code that refers to the non-unique name needs to be disambiguated by using a Fully Qualified Name:
|Type|Syntax|Example|
|---|---|---|
|Variables|Colas.sav$variable.name or Colas.sav$Variables$variable.name|Colas.sav$d1 or Colas.sav$Variables$d1|
|Questions|Colas.sav$question.name, Colas.sav$Questions$question.name or Colas.sav$VariableSets$variable.set.name|Colas.sav$Age, Colas.sav$Questions$Age or Colas.sav$VariableSets$Age|
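To show how the longer forms read in code, here is a minimal sketch. The nested list is only a stand-in that mimics the data set, type, then name pattern so the code runs outside Q. Its structure and values are assumptions, not a description of how Q stores data.

```r
# Stand-in for a data set (hypothetical structure and values). In Q, the
# data set itself supplies these names; nothing needs to be built by hand.
Colas.sav <- list(
  Variables = list(d1 = c(1, 2, 2, 3, 1)),
  Questions = list(Age = c(25, 41, 33, 58, 19))
)

# Fully qualified references: data set, then type, then name
mean(Colas.sav$Variables$d1)
mean(Colas.sav$Questions$Age)
```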
R Variables
An R Variable is a variable in a Data Set, created as follows:
- Create > Variables and Questions > Variable(s) > R Variable...
- Enter code written in the R Language in the R Code box. This code should create a vector, table, or data frame with the same number of observations as in the data file.
- Run the code (F5). If you provide variable or column names, these will be the Labels for the variables when they are created.
- Enter the Variable Base Name. Where your code only creates a single variable, this will be the name of that variable. Otherwise, the new variable names will be whatever you enter here, followed by an underscore and a number (e.g., dog_2).
- Enter a name for the Question.
- Press Add R Variable.
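As a rough illustration of the kind of code that could go in the R Code box, the sketch below builds a data frame with one row per observation and named columns. The variable age and its values are hypothetical stand-ins; in Q, the code would refer to a variable already in the data file.

```r
# Stand-in for a variable in the data file (hypothetical name and values)
age <- c(23, 37, 45, 61, 18, 52)

# One row per observation; the column names are what would become the
# Labels of the new variables
new.variables <- data.frame(
  "Under 40"   = as.numeric(age < 40),
  "40 or more" = as.numeric(age >= 40),
  check.names = FALSE
)
new.variables
```

Because this creates two columns, the new variables would be named using the Variable Base Name followed by an underscore and a number, as described above.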
R Data Sets
QScript
QScript is Q's macro language, which is used for automation. Many of the menu items in Q are written in QScript. Users can write their own automations using QScript. The key distinctions between QScript and R are:
- QScript can be used for manipulating the user interface (e.g., creating dialog boxes). R cannot.
- QScript can automatically create and modify charts, tables, variables, and questions. By contrast, if you wish to create an R Variable, R Output, or R Data Set, you need to either create it manually from the menus or create it via QScript. For an example, see Regression - Diagnostic - Prediction-Accuracy Table. Note that within R Outputs you still have all the R functions for creating R data types, such as variables, vectors, and data frames. The distinction being discussed here relates to the ability to control data as shown in the Variables and Questions tab (i.e., a Data Set).
- QScript is generally faster than R (e.g., it is better to create lots of variables in QScript than in R).
Standard R
It is possible to do just about any form of data analysis using R by writing code. Where we think analyses are likely to be used by many of our clients, we have made them available via a graphical user interface (i.e., menus, buttons, and the like, without needing to write code). We refer to the analyses that we have made available via a graphical user interface as Standard R. The R logo is used to mark menu items that use Standard R. See Standard R for more information about how Standard R items work and are created.
Automatic updating
R code is automatically re-run whenever:
- Data or outputs that are inputs into R calculations are changed (e.g., by changing Values or importing a revised data file), unless Object Inspector > Properties > R CODE > Automatic is unchecked.
- The R Code contains instructions for updating data files (see Automatically Updating R DataSets, Variables, and Outputs).
An R Item is a block of code written in the R Language.
When multiple R Outputs are selected, a table displaying the status of the selected R Items will be shown.
R Items that require updating will be greyed out. There are two buttons above the table:
- Update all these items updates all the selected R Items, regardless of whether they require updating.
- Update grey items updates only the selected R Items that are greyed out.