Using R in Q
Analyses can be conducted in Q using the R Language. When you use R from within Q, you are using standard ("pure") R: all the functions are written in R, and any R code is automatically sent over the internet to a server with a normal installation of R. The results are then sent back and presented to you in Q. Although we have attempted to make Q and R feel like one and the same program, in reality they are completely different programs that "talk" to each other.
How to use R from within Q
There are a number of ways of using R from within Q:
- Entering R code directly into R Outputs.
- Creating R Variables.
- Creating new Data Sets using R.
- Accessing the R functions using menus and forms. This is how most advanced analyses are conducted in Q (e.g., regression, principal components analysis). This is referred to as Standard R.
- Including R code in QScripts. QScript is Q's automation language.
- Automatic updating. Any R code that is created via R Outputs, Standard R, or QScripts can be set to automatically update when the inputs change (e.g., if the input data changes, if a new data file is created, or if other options are changed).
An R Output is an item in the Report Tree that contains both some R code and the result of the R code.
Creating an R Output
- Select Create > R Output (or right-click on an item in the Report Tree).
- Enter instructions in the R Code box within Properties, on the right-hand side of the screen (in the Object Inspector). These instructions need to be written in the R Language.
- Press the Calculate button. This sends the instructions over a secure internet connection to a computer in the cloud. The result is sent back to Q and shown on your screen (i.e., the result is the R Output). The output will typically be a table, chart, text string, or error message.
In the example below, a histogram of 9 numbers is created. (If you are not familiar with the R Language, refer to Learning the R language.)
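A minimal sketch of code that produces such a histogram follows. The nine values are illustrative assumptions, not the ones used in the original example.

```r
# Nine illustrative numbers (not the values from the original example)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Draw the histogram; hist() also returns the bins and counts invisibly
h <- hist(x, main = "Histogram of 9 numbers", xlab = "Value")
```

In an R Output, the chart drawn by the code is what appears as the result.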
References to variables and questions
As well as the raw data, variables and questions have attributes providing metadata about the variable or question. See the Accessing Data Properties from R example dashboard for details.
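A referenced variable behaves like an ordinary R vector that carries its metadata as attributes. The sketch below simulates this outside Q so that it runs in plain R; the variable name Q2 and the attribute names label and question are assumptions for illustration only, so check the Accessing Data Properties from R dashboard for the attribute names Q actually supplies.

```r
# Stand-in for a variable that Q would supply; the attribute names are assumed
Q2 <- c(3, 5, 4, 2, 5)
attr(Q2, "label") <- "Satisfaction with service"
attr(Q2, "question") <- "Q2. Satisfaction"

# Read one piece of metadata with attr(), or list all of it with attributes()
label <- attr(Q2, "label")
all.metadata <- attributes(Q2)

# The raw data can still be used as a normal numeric vector
mean.score <- mean(Q2)
```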
References to tables
New tables and other types of outputs (strings, charts, variables, data files) can be created by manipulating tables. The example below creates a new table as the ratio of two existing tables, removes a few rows, and creates a chart. This chart will automatically update whenever the inputs change (e.g., if the data file is updated, or the questions are re-coded).
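A minimal sketch of that workflow follows, assuming two tables with the hypothetical Reference Names table.1 and table.2 that share the same rows and columns. Inside Q these names would refer to existing tables; here they are built as small matrices so that the code runs in plain R.

```r
# Hypothetical stand-ins for two existing Q tables with matching rows and columns
rows <- c("Brand A", "Brand B", "Other")
cols <- c("Male", "Female")
table.1 <- matrix(c(20, 35, 15, 30, 40, 10), nrow = 3, dimnames = list(rows, cols))
table.2 <- matrix(c(40, 50, 30, 60, 80, 20), nrow = 3, dimnames = list(rows, cols))

# New table: element-wise ratio of the two tables
ratio <- table.1 / table.2

# Remove the rows that should not appear in the chart (here, "Other")
ratio <- ratio[rownames(ratio) != "Other", , drop = FALSE]

# Chart the result as grouped bars, one group per column
barplot(ratio, beside = TRUE, legend.text = rownames(ratio))
```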
Every table has a Reference Name, which can be viewed and changed by right-clicking on the table and selecting Reference Name.... Most of the time, the Reference Name is the same as the name shown in the Report Tree. Where a table's Reference Name is the same as the name of a variable, question, or R Output, you can disambiguate using QTables$reference.name (e.g., QTables$table.2).
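The sketch below illustrates the idea outside Q. QTables is simulated with a named list, and a variable happens to share the name table.2; both are hypothetical stand-ins, since inside Q the QTables object is supplied for you.

```r
# Hypothetical clash: a variable and a table are both called table.2
table.2 <- c(1, 0, 1, 1)  # stand-in for a variable named table.2

# Stand-in for QTables: a list of tables indexed by Reference Name
QTables <- list(table.2 = matrix(c(10, 20, 30, 40), nrow = 2,
                                 dimnames = list(c("Yes", "No"), c("2019", "2020"))))

# In this simulation the bare name is the variable; in Q, avoid relying on
# an ambiguous bare name and use the qualified form to select the table
the.variable <- table.2
the.table <- QTables$table.2
```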
As well as the raw data, tables have attributes providing metadata about the table. See the Accessing Data Properties from R example dashboard for details.
References to other R Outputs
Like a table, an R Output has both a Reference Name and a Name. The Reference Name must be unique within a project. The Reference Name is used to refer to other R Outputs in code. For example, if one R Output has a reference name of x, the code x * 2 in another R Output will show the value of x multiplied by 2.
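As a sketch, the two R Outputs described above might contain the code below. It is shown in one block so that it runs in a single R session; in Q each part would be a separate R Output, and the values are illustrative.

```r
# First R Output: the last line assigns to x, so its Reference Name would be x
x <- c(1.5, 2, 10)

# Second R Output: refers to the first one by its Reference Name
doubled <- x * 2
doubled
```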
There are a number of ways of changing the Reference Name of an R Output:
- By changing the Reference Name in the Object Inspector (Properties > General).
- By right-clicking on an R Output in the Report Tree and selecting Reference Name....
- By changing the Name, if the Name and the Reference Name are the same and there are no other R Outputs with the same Reference Name.
- By assigning a variable name in the last line of code. For example, the following code creates an R Output with a Reference Name of dog containing the string (or, in R parlance, character) Sherlock:
dog <- "Sherlock"
Avoiding ambiguous reference names
There are situations where two things may have the same Reference Name. For example:
- A table and a variable may both have the Reference Name of Q2.
- An R Output and a table may both have the Reference Name brand.health.
Where this occurs, any R code that refers to the non-unique name needs to be disambiguated by using a Fully Qualified Name, which begins with the name of the data set (Colas.sav in these examples):
| Type | Fully Qualified Name | Example |
|---|---|---|
| Variable | Colas.sav$variable.name or Colas.sav$Variables$variable.name | Colas.sav$d1 or Colas.sav$Variables$d1 |
| Question | Colas.sav$question.name, Colas.sav$Questions$question.name or Colas.sav$VariableSets$variable.set.name | Colas.sav$Age, Colas.sav$Questions$Age or Colas.sav$VariableSets$Age |
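The nested structure behind these names can be pictured with ordinary R lists. The sketch below is only a stand-in: Colas.sav, d1, and Age follow the examples above, but the list layout is an assumption made so that the code runs outside Q, where Q builds these objects itself.

```r
# Stand-in for the data set object Q exposes; the layout is assumed
d1 <- c(1, 2, 1, 3)
Age <- c(25, 34, 51, 42)
Colas.sav <- list(
  d1 = d1,
  Age = Age,
  Variables = list(d1 = d1),
  Questions = list(Age = Age),
  VariableSets = list(Age = Age)
)

# The short and long qualified forms select the same data
short.and.long.match <- identical(Colas.sav$d1, Colas.sav$Variables$d1) &&
  identical(Colas.sav$Age, Colas.sav$Questions$Age) &&
  identical(Colas.sav$Age, Colas.sav$VariableSets$Age)
short.and.long.match
```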
An R Variable is a variable in a Data Set, created as follows:
- Create > Variables and Questions > Variable(s) > R Variable...
- Enter code written in the R Language in the R Code box. This code should create a vector, table, or data frame with the same number of observations as the data file.
- Run the code (press F5 or the blue Play button). If you provide variable or column names, these will be used as the Labels of the variables when they are created.
- Enter the Variable Base Name. Where your code only creates a single variable, this will be the name of that variable. Otherwise, the new variable names will be whatever you enter here, followed by an underscore and a number (e.g., dog_2).
- Enter a name for the Question.
- Press Add R Variable.
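As an example of the kind of code that could go in the R Code box, the sketch below derives new values from two existing variables. The names weight.kg and height.m are hypothetical stand-ins for variables in your data file; inside Q they would already exist, so the first two assignments would be omitted.

```r
# Hypothetical input variables (in Q, these would come from the data file)
weight.kg <- c(70, 82, 55, NA, 90)
height.m <- c(1.75, 1.80, 1.62, 1.70, 1.85)

# One value per observation, as an R Variable requires
bmi <- weight.kg / height.m^2

# Returning a data frame instead creates several variables,
# with the column names used as their Labels
body.measures <- data.frame("Body mass index" = bmi,
                            "Overweight" = bmi >= 25,
                            check.names = FALSE)
body.measures
```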
R Data Sets
When your R code refers to Q data (e.g., variables), that data is automatically made available as variables in R. The following special variables also exist:
- QFilter: see Filters in R.
- QCalibratedWeight/QCalibrationWeight: see Weights in R.
- QDataFrame/QInputs/QFormula: see Parameter Value Substitution in R GUI Controls.
- QOutputSizeWidth: the width of the R output in inches.
- QOutputSizeHeight: the height of the R output in inches.
- productName: the name of the program running the code (Q or Displayr).
- QAllowLargeResultObject(): a function that, when called with a number of bytes, allows R to return outputs larger than the default limit. The limit exists to protect users from unintentionally large outputs, which can be slow to download.
- QFileFormatVersion: a version number that increases with each major version of Q and whenever the file format changes. It can be helpful for detecting an older client.
- QSettings: general project settings, such as default colors and statistical assumption values.
- QAppearance: settings specific to this R Output, such as decimal places, font, and other formatting options.
- QNObservations: the number of observations (rows) in the data set to which an R Variable is being added.
More details for QSettings, QAppearance, QFileFormatVersion and QNObservations can be found in the Accessing Graphical User Interface Properties from R example dashboard.
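Because these special variables exist only when code runs inside Q (or Displayr), code that should also run in a standalone R session can check for them with exists() and fall back to defaults. In this sketch the fallback values (6 by 4 inches) and the 50 MB limit are arbitrary assumptions.

```r
# Use the output size Q supplies, or assumed defaults when running outside Q
width.in  <- if (exists("QOutputSizeWidth"))  QOutputSizeWidth  else 6
height.in <- if (exists("QOutputSizeHeight")) QOutputSizeHeight else 4

# Record which program is running the code
product <- if (exists("productName")) productName else "standalone R"

# Raise the result size limit (in bytes) only where the function exists
if (exists("QAllowLargeResultObject")) QAllowLargeResultObject(50e6)

sprintf("Running in %s; output is %.1f x %.1f inches", product, width.in, height.in)
```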
You can use R's tempfile and tempdir as normal, but any files you create will be deleted when your code finishes.
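For example, a result can be written to a temporary file and read back within the same piece of code. The file path comes from tempfile(), and the data frame is illustrative.

```r
# Create a path inside the session's temporary directory
path <- tempfile(fileext = ".csv")

# Write a small data frame to the file and read it back
scores <- data.frame(id = 1:3, score = c(7, 9, 6))
write.csv(scores, path, row.names = FALSE)
scores.copy <- read.csv(path)

# Optionally tidy up; Q deletes such files anyway when the code finishes
unlink(path)
```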
QScript is Q's macro language, which is used for automation. Many of the menu items in Q are written in QScript. Users can write their own automations using QScript. The key distinctions between QScript and R are:
- QScript can be used for manipulating the user interface (e.g., creating dialog boxes). R cannot.
- QScript can be used to automatically create and modify charts, tables, variables, and questions. By contrast, to create an R Variable, R Output, or R Data Set, you need to either create it manually from the menus or create it via QScript. For an example, see Regression - Diagnostic - Prediction-Accuracy Table. Note that within R Outputs you still have all the R functions for creating R data types, such as variables, vectors, and data frames. The distinction here relates to the ability to control data as shown in the Variables and Questions tab (i.e., a Data Set).
- QScript is generally faster than R (e.g., it is better to create lots of variables in QScript than in R).
It is possible to do just about any form of data analysis in R by writing code. Where we think an analysis is likely to be used by many of our clients, we have made it available via a graphical user interface (i.e., menus, buttons, and the like, with no need to write code). We refer to the analyses made available this way as Standard R. Menu items that use Standard R are marked with the R logo. See Standard R for more information about how Standard R items work and are created.
R code is automatically re-run whenever:
- Data or outputs that are inputs to R calculations change (e.g., by changing Values or importing a revised data file), unless Object Inspector > Properties > R CODE > Automatic is unchecked.
- The R Code contains instructions for updating data files (see Automatically Updating R DataSets, Variables, and Outputs).
An R Item is a block of code written in the R Language.
When multiple R Outputs are selected, a table displaying the status of the selected R Items will be shown:
R Items that require updating will be greyed out. There are two buttons above the table:
- Update all these items updates all the selected R Items, regardless of whether they require updating.
- Update grey items updates only the selected R Items that are greyed out.
R is an open-source language, and plenty of free and paid training is available online, so we do not provide formal R training through Q or Displayr. We do have various examples of R code on our blog and wiki, including a Using R in Displayr - Video series that walks through various R coding examples within the context of Displayr/Q. While the videos are recorded using Displayr, Q users can download all of the examples and code through links to the QPacks and use them to follow along with the videos.
R code is allowed to create temporary files, though they will be automatically deleted when it finishes. Temporary files cannot exceed 500MB.