Using R in Q

Analyses can be conducted in Q using the R Language. In every sense, when using R from within Q, you are using "pure" R. All the functions are written in R. Any R code is automatically sent over the internet to a server with a normal version of R installed on it. The results are then sent back, and presented to you in Q. While we have attempted to make it feel like Q and R are one-and-the-same, in reality they are completely different programs which "talk" to each other.

How to use R from within Q

There are a number of ways of using R from within Q:

  1. Entering R code directly into Q in R Outputs.
  2. Creating R Variables.
  3. Creating new Data Sets using R.
  4. Accessing the R functions using menus and forms. This is how most advanced analyses are conducted in Q (e.g., regression, principal components analysis). This is referred to as Standard R.
  5. Including R code in QScripts. QScript is Q's automation language.
  6. Automatic updating. Any R code that is created via R Outputs, Standard R, or QScripts can be set to automatically update when the inputs change (e.g., if the input data changes, if a new data file is created, or if other options are changed).

R Outputs

An R Output is an item in the Report Tree that contains both some R code and the result of the R code.

Creating an R Output

  1. Select Create > R Output (or right-click on an item in the Report Tree).
  2. Enter instructions in the R Code box within Properties, on the right-hand side of the screen (in the Object Inspector). These instructions need to be written in the R Language.
  3. Press the Calculate button. This sends the instructions over a secure internet connection to a computer in the cloud. The result is sent back to Q and shown on your screen (i.e., the result is the R Output). The output will typically be a table, chart, text string, or an error message.

In the example below, a histogram is created of 9 numbers. (If you are not familiar with the R Language, refer to Learning the R language.)

RinQExample.png
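
As a minimal sketch of the kind of code that could be typed into the R Code box, the following draws a histogram of nine numbers. The values are made up for illustration and are not necessarily the ones shown in the screenshot.

```r
# Nine illustrative values (an assumption; not taken from the screenshot)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# hist() draws the chart and invisibly returns the binned counts
h <- hist(x, main = "Histogram of 9 numbers", xlab = "Value")
```

Only base R is used here, so the same code should behave the same way in any ordinary R session.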

References to variables and questions

R code can refer to both variables and questions, by typing either the Variable Name or Question Name into the R Code box. See Data Sets in R for more information.

As well as the raw data, variables and questions have attributes providing metadata about the variable or question. See the Accessing Data Properties from R example dashboard for details.

References to tables

New tables and other types of outputs (strings, charts, variables, data files) can be created by manipulating tables. The example below creates a new table as the ratio of two existing tables, removes a few rows, and creates a chart. This chart will automatically update whenever the inputs change (e.g., if the data file is updated, or the questions are re-coded).

ManipulatingTablesinQ.png
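
The steps described above (take the ratio of two tables, drop some rows, draw a chart) might look roughly like the sketch below. The two matrices, their names (table.1 and table.2), and the row and column labels are hypothetical stand-ins so that the code runs in plain R; in Q you would refer to existing tables by their Reference Names instead of building them by hand.

```r
# Hypothetical stand-ins for two existing tables with matching rows and columns
brands <- c("Coke", "Pepsi", "Fanta", "NET")
table.1 <- matrix(c(30, 20, 10, 60, 45, 25, 10, 80), nrow = 4,
                  dimnames = list(brands, c("Male", "Female")))
table.2 <- matrix(c(60, 50, 40, 100, 90, 50, 40, 100), nrow = 4,
                  dimnames = list(brands, c("Male", "Female")))

# New table: cell-by-cell ratio of the two tables
ratio <- table.1 / table.2

# Remove the rows that are not wanted in the result
ratio <- ratio[!rownames(ratio) %in% c("Fanta", "NET"), , drop = FALSE]

# Chart the remaining rows, one group of bars per brand
barplot(t(ratio), beside = TRUE, legend.text = TRUE, ylab = "Ratio")
```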

Every table has a Reference Name, which can be viewed and changed by right-clicking on the table and selecting Reference Name.... Most of the time, the Reference Name is the same as the name shown in the Report Tree. Where a table's reference name is the same as the name of a variable, question, or R Output, you can disambiguate using QTables$reference.name (e.g., QTables$table.2).

ReferenceName.PNG

As well as the raw data, tables have attributes providing metadata about the table. See the Accessing Data Properties from R example dashboard for details.

References to other R Outputs

Like a table, an R Output has both a Reference Name and a Name. The Reference Name must be unique within a project. The Reference Name is used to refer to other R Outputs in code. For example, if one R Output has a reference name of x, the code x * 2 in another R Output will show the value of x multiplied by 2.
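
A small sketch of this idea follows. Here x is defined in the same block so that the code runs in plain R; in Q it would be a separate R Output whose Reference Name is x, and only the last two lines would appear in the second R Output. The values are illustrative.

```r
# Stand-in for an existing R Output with the Reference Name x (assumed values)
x <- c(10, 20, 30)

# Code in a second R Output: refer to the first one by its Reference Name
doubled <- x * 2
doubled
```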

There are a number of ways of changing the Reference Name of an R Output:

  1. By changing the Reference Name in the Object Inspector (Properties > General).
  2. By right-clicking on an R Output in the Report Tree and selecting Reference Name....
  3. By changing the Name, if the Name and the Reference Name are the same and there are no other R Outputs with the same Reference Name.
  4. By assigning a variable name in the last line of code. For example, the following code creates an R Output with a Reference Name of dog containing the string (or, in R parlance, character) Sherlock:
dog <- "Sherlock"

Avoiding ambiguous reference names

There are situations where two things may have the same Reference Name. For example:

  • A table and a variable may both have the Reference Name of Q2.
  • An R Output and a table may both have the reference name brand.health.

Where this occurs, any R code that refers to the non-unique name needs to be disambiguated by using a Fully Qualified Name:

Object type | Syntax | Example
R Outputs | QROutputs$item.name | QROutputs$r.output.3
Tables | QTables$item.name | QTables$age.by.gender.3
Variables | Colas.sav$variable.name or Colas.sav$Variables$variable.name | Colas.sav$d1 or Colas.sav$Variables$d1
Questions | Colas.sav$question.name, Colas.sav$Questions$question.name or Colas.sav$VariableSets$variable.set.name | Colas.sav$Age, Colas.sav$Questions$Age or Colas.sav$VariableSets$Age
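
To illustrate how a Fully Qualified Name resolves a clash, the sketch below mocks QTables and QROutputs as ordinary R lists that each contain an item called brand.health. This mocking is an assumption made so the code runs outside Q; within Q these containers are supplied for you and should not be created by hand.

```r
# Mocked containers (assumption: outside Q these objects do not exist)
QTables   <- list(brand.health = matrix(1:4, nrow = 2))
QROutputs <- list(brand.health = "Summary text")

# The same item name, resolved unambiguously through its container
from.table  <- QTables$brand.health
from.output <- QROutputs$brand.health
```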

R Variables

An R Variable is a variable in a Data Set, created as follows:

  1. Create > Variables and Questions > Variable(s) > R Variable...
  2. Enter code written in the R Language in the R Code box. This code should create a vector, table or data-frame, with the same number of observations as in the data file.
  3. Run the code (F5 or press the blue Play button). If you provide variable or column names, these will be the Labels for the variables when they are created.
  4. Enter the Variable Base Name. Where your code only creates a single variable, this will be the name of that variable. Otherwise, the new variable names will be whatever you enter here, followed by an underscore and a number (e.g., dog_2).
  5. Enter a name for the Question.
  6. Press Add R Variable.

RVariable.png
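
As a hedged sketch of code that could be used for an R Variable, the following builds a data frame with one row per observation and two named columns. The age vector is a hypothetical stand-in for a variable in the data file; in Q you would refer to the real variable by its Variable Name. The column names are chosen so that they could serve as Labels.

```r
# Hypothetical stand-in for a variable in the data file (six observations)
age <- c(23, 35, 41, 29, 52, 18)

# One column per new variable; check.names = FALSE keeps the names as typed
new.variables <- data.frame("Under 30" = age < 30,
                            "30 or more" = age >= 30,
                            check.names = FALSE)
new.variables
```

The result has the same number of rows as the input, which is the requirement stated in step 2.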

R Data Sets

Data Sets can be added to a project using R: File > Data Sets > Add to Project > From R. See R Data Sets for more information.

R Environment

When your R code refers to Q data (e.g., variables), the data is automatically made available as variables in R. Other special variables also exist:

  • QFilter: see Filters in R.
  • QCalibratedWeight/QCalibrationWeight: see Weights in R.
  • QDataFrame/QInputs/QFormula: see R_GUI_Controls#Parameter_Value_Substitution.
  • QOutputSizeWidth: the width of the R output in inches.
  • QOutputSizeHeight: the height of the R output in inches.
  • productName: Q or Displayr.
  • QAllowLargeResultObject(): Call this, supplying a number of bytes, to allow larger than default outputs from R. The default limit exists to save users from unintentionally large outputs, which might be slow to download.
  • QFileFormatVersion: A number that increases whenever there is a major version of Q and whenever the file format changes. It can be helpful in detecting an older client.
  • QSettings contains general project settings such as default colors and statistical assumption values.
  • QAppearance contains settings specific to this R Output, such as decimal places, font and other formatting options.
  • QNObservations is the number of observations (rows) in the dataset which an R variable adds to.

More details for QSettings, QAppearance, QFileFormatVersion and QNObservations can be found in the Accessing Graphical User Interface Properties from R example dashboard.
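
The sketch below shows one way some of these special variables might be used. Because they exist only when code runs inside Q, each use is guarded with exists() and falls back to a hypothetical default, so the code also runs in plain R. The fallback sizes and the 20 MB figure are arbitrary assumptions.

```r
# Output size in inches; the fallbacks (6 and 4) are assumed values for use outside Q
width  <- if (exists("QOutputSizeWidth")) QOutputSizeWidth else 6
height <- if (exists("QOutputSizeHeight")) QOutputSizeHeight else 4
aspect.ratio <- width / height

# Ask for a larger result limit by supplying a number of bytes (20 MB is arbitrary)
if (exists("QAllowLargeResultObject")) QAllowLargeResultObject(20 * 1024^2)
```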

You can use R's tempfile and tempdir as normal, but any files you create will be deleted when your code finishes.
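
For example, a temporary file can be written and read back within a single run, as in this minimal sketch. The data and column names are illustrative.

```r
# Create a uniquely named temporary file inside the session's temp directory
path <- tempfile(fileext = ".csv")

# Write a small data frame to it, then read it back in the same run
write.csv(data.frame(id = 1:3, score = c(0.5, 0.7, 0.9)), path, row.names = FALSE)
round.trip <- read.csv(path)
round.trip
```

Anything that needs to outlive the run should be returned as the result of the code rather than left in the temporary file.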

QScript

QScript is Q's macro language, which is used for automation. Many of the menu items in Q are written in QScript. Users can write their own automations using QScript. The key distinctions between QScript and R are:

  • QScript can be used for manipulating the user interface (e.g., creating dialog boxes). R cannot.
  • QScript can be used to automatically create and modify charts, tables, variables, and questions. By contrast, if you wish to create an R Variable, R Output, or R Data Set, you need to either create it manually from the menus or create it via QScript. For an example, see Regression - Diagnostic - Prediction-Accuracy Table. Note that within R Outputs you still have all the R functions for creating R data types, such as variables, vectors, and data frames. The distinction being discussed here relates to the ability to control data as shown in the Variables and Questions tab (i.e., a Data Set).
  • QScript is generally faster than R (e.g., it is better to create lots of variables in QScript than R).
  • It is much easier and faster for users to write R code than QScript. R is specifically designed for data analysis, whereas JavaScript, the language in which QScripts are written, is designed to be used for many different applications. As a consequence, it can be quite unwieldy for data analysis (i.e., to use JavaScript you need more advanced coding skills and will generally need to write many more lines of code than when trying to achieve the same thing in R).

Standard R

It is possible to do just about any form of data analysis using R by writing code. Where we think analyses are likely to be used by many of our clients, we have made it available via a graphical user interface (i.e., menus and/or buttons and the like, without needing to write code). We refer to the analyses that we have made available via a graphical user interface as Standard R. The R Logo (i.e., RLogo.png) is used to mark menu items that use Standard R. See Standard R for more information about how Standard R items work and are created.

Updating

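R code is automatically re-run whenever its inputs change (e.g., if the input data changes, if a new data file is created, or if other options are changed).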

An R Item is a block of code written in the R Language.

When multiple R Outputs are selected, a table displaying the status of the selected R Items will be shown:

Multiple r items.png

R Items that require updating will be greyed out. There are two buttons above the table:

  • Update all these items updates all the selected R Items, regardless of whether they require updating.
  • Update grey items updates only the selected R Items that are greyed out.

Training

R is an open-source language, and plenty of free and paid training is available online, so we do not provide formal R training through Q or Displayr. We do have various examples of R code on our blog and wiki, including the Using R in Displayr - Video series, which walks through various R coding examples within the context of Displayr/Q. While the videos are recorded using Displayr, Q users can download all of the examples and code through links to the QPacks and use them to follow along with the videos.

Constraints

R code is allowed to create temporary files, though they will be automatically deleted when it finishes. Temporary files cannot exceed 500MB.