Basic Workflow For Checking and Cleaning a Project

Related Online Training modules
Starting Projects
Opening Projects
Editing Data
Deleting Individual Observations
Deleting Multiple Observations
Automate Project Setup with QScripts
Re-basing Tables
Re-basing Pick Any Tables
Question Types and Statistics
Generally it is best to access online training from within Q by selecting Help > Online Training

The basic workflow for checking and cleaning a project involves working through the activities described in the sections on this page. Apart from importing the data, which must come first, the steps can be done in any order.

Importing the data

To start a project, select File > Data Sets > Add to Project > From File, and select a data file you wish to analyze in Q. If you already have a project open, you should first select File > New Project.

You will then be shown the Data Import Window which gives options for how the data file will be imported. By default, the option Automatically detect data file structure, e.g. group variables into "questions" (recommended) will be chosen, and you should use this option in most cases. In the Advanced section, the option Tidy up Variable Labels will be selected. If you keep this option, which is usually appropriate, Q will strip out any repetitive text that appears in labels within a multiple response question (e.g., if the labels were Satisfaction: Citibank and Satisfaction: Bank of America, Q will replace these with Citibank and Bank of America). The Advanced option Strip HTML from Labels will remove HTML tags which are sometimes present in labels from online surveys.
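To make the effect of these two label options concrete, here is a minimal Python sketch of the idea. It is not Q's implementation: the function names `tidy_labels` and `strip_html` are hypothetical, and the rule of cutting at the last ": " in the shared prefix is an assumption chosen to reproduce the bank example above.

```python
import os
import re


def tidy_labels(labels):
    """Remove leading text shared by every label, up to the last ': '."""
    prefix = os.path.commonprefix(labels)
    cut = prefix.rfind(": ")
    if len(labels) < 2 or cut == -1:
        return list(labels)  # nothing repetitive to strip
    return [label[cut + 2:] for label in labels]


def strip_html(label):
    """Drop anything that looks like an HTML tag (a simple assumption)."""
    return re.sub(r"<[^>]+>", "", label).strip()


labels = ["Satisfaction: Citibank", "Satisfaction: Bank of America"]
print(tidy_labels(labels))            # ['Citibank', 'Bank of America']
print(strip_html("<b>Citibank</b>"))  # Citibank
```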

Many of the QScripts that are mentioned in the remaining sections of this article appear as options in the Advanced section. You can select them to run at the time of the file import, or you can run them later on by choosing them from Automate > Browse Online Library > Preliminary Project Setup, or by typing the word setup into the Search features and data box in the top right corner of the Q window and clicking in the QScripts and Rules section of the search results.

Checking the data file

Sometimes data files contain errors which can make data analysis difficult and, sometimes, impossible. Q has a QScript for checking for the most common problems. To run this QScript:

  1. Type setup into the Search features and data box at the top right of the Q window.
  2. Click on the QScripts and Rules section.
  3. Select Preliminary Project Setup - Check for Errors in Data File Construction.

This script scans the data in your project, looking for common errors in data file setup. It presents a set of tables that highlight the errors so that you can address them or, if they are serious, ask your data provider to fix them and send you a new copy of your data.

Errors that the script tries to identify include:

  • When a variable is the wrong Variable Type. For example, numeric data is stored in a Text variable.
  • Incorrect Missing Data settings in binary variables.
  • Missing labels.

A full list is available in the documentation for the script.
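As an illustration of the first kind of error (numeric data stored in a Text variable), the following Python sketch flags text columns whose non-blank values all parse as numbers. It is a rough approximation of the idea only, not the logic of the QScript; the function names and the dictionary-of-columns layout are assumptions made for the example.

```python
def looks_numeric(values):
    """True when every non-blank text value parses as a number."""
    non_blank = [v.strip() for v in values if v is not None and v.strip()]
    if not non_blank:
        return False  # an empty column tells us nothing
    for v in non_blank:
        try:
            float(v)
        except ValueError:
            return False
    return True


def find_suspect_text_variables(columns):
    """Return names of text variables that appear to hold numeric data."""
    return [name for name, values in columns.items() if looks_numeric(values)]


# Hypothetical text variables, keyed by variable name.
columns = {
    "age": ["34", "51", "", "28"],
    "comment": ["good", "12", ""],
}
print(find_suspect_text_variables(columns))  # ['age']
```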

Checking the data

To create tables containing basic summary data:

  1. Type setup into the Search features and data box at the top right of the Q window.
  2. Click on the QScripts and Rules section.
  3. Select Preliminary Project Setup - Summary Tables.
  4. Review the tables and address any problems. In particular:
    • Check that the NET value is sensible. See NET is not 100%.
    • Look at the base n at the bottom of the table, as it will often highlight data integrity issues. If it shows a range of values (e.g., base n = from 120 to 139), this indicates that the cells in the table vary in their sample size. Use Statistics - Cells > Base n, n to explore this in more detail. Where the base n shows a very low number (e.g., base n = 0 to 139), this generally indicates either a problem with the Value Attributes or that the NET or SUM on the table should be hidden (right-click on it and select Hide). See Sample Size Seems Too Small for more information.
  5. Run the QScript called Preliminary Project Setup - Tables for Data Checking, which will focus on creating tables that contain results that are automatically identified as requiring attention (e.g., tables with very small cell counts).

Another way to view sample size information by variable is to go to the Variables and Questions tab and press the Show Simple Statistics button, which will compute the minimum, maximum, mean and sample size for all the variables in the database.
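The meaning of a base n range can be sketched in a few lines of Python. This is an illustration of the concept rather than how Q computes it: respondents are assumed to be dictionaries with None marking missing data, and `base_n` and `describe_base` are hypothetical helper names.

```python
def base_n(rows, variables):
    """Count respondents with non-missing data for each variable."""
    return {
        var: sum(1 for row in rows if row.get(var) is not None)
        for var in variables
    }


def describe_base(counts):
    """Format the counts the way a table footer might summarize them."""
    low, high = min(counts.values()), max(counts.values())
    if low == high:
        return f"base n = {low}"
    return f"base n = from {low} to {high}"


# Three hypothetical respondents; None means the value is missing.
rows = [
    {"q1": 1, "q2": 4},
    {"q1": 2, "q2": None},
    {"q1": None, "q2": None},
]
counts = base_n(rows, ["q1", "q2"])
print(counts)                 # {'q1': 2, 'q2': 1}
print(describe_base(counts))  # base n = from 1 to 2
```

A range appears whenever variables in the same table have different numbers of valid responses, which is why it is a useful prompt to check skips and missing data settings.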

See Using Scripts to Automate Data Checking and Cleaning for more advanced options.

Hiding irrelevant data

If a table is showing information that you think the user will not want to see (e.g., administrative records):

  • Press the blue arrow to the right of the Blue Drop-Down menu to select the variable in the Variables and Questions tab, then hide the question by pressing its Hidden icon.
  • Return to the Outputs tab and delete the table.

Alternatively, you can get Q to automate this process using Preliminary Project Setup - Hide Uninteresting Data.

Checking Question Types

Q automates many analyses using Question Type, so ensuring that the Question Type has been set correctly is an essential step in checking a project. If your data file has been created in a good way (see File Formats Supported by Q), then all or nearly all of the question types will be set up correctly by Q automatically.

The Question Type used by Q can be identified by looking at the Question Type column next to the relevant variables in the Variables and Questions tab (if you are on the Outputs tab, press the blue arrow to the right of the drop-down menu that contains the question you wish to review). The Question Type can be modified as follows:

  • If Q has inadvertently grouped together multiple questions, where they should have been kept separate, select the relevant variables, right-click on the selected variables and select Revert to source. Once this has occurred, create new tables in the Outputs tab for each of the variables.
  • If Q has failed to group together multiple variables as a question, select the variables in the Variables and Questions tab, right-click and select Set Question.
  • If the variables have been correctly grouped, but the Question Type is wrong, change the Question Type.

The online tutorial on Multiple Response Questions and the ensuing tutorials in the Manipulating Data > Questions section of Online Training provide worked examples.

If the Status column in the Variables and Questions tab is showing any yellow cells, this means that you need to review the Value Attributes. In general, you should review the Value Attributes anyway if you are new to Q or if you have obtained data from a new supplier.

Reviewing the Value Attributes

If you right-click on a blue or brown heading on a table and select Values, Q will open the Value Attributes dialog box. Within this dialog box you can:

  • Set data as missing by checking options in the Missing Data column, which will cause percentages and means to be recomputed.
  • Recode data by editing the contents of the Value column. In particular, if a category is not checked in the Missing Data column, the Value shown will be used when computing averages, medians and other non-categorical statistics. Consequently, it is often useful to replace the Value in the data set with some other more useful value. Common things to change are:
  • Modify the categories used in computing percentages on Pick Any and Pick Any – Grid questions using Count This Value.

There are many other ways of doing each of these operations in Q. For example, if a table shows categories that you wish to specify as missing, right-click on the categories and select Remove. Alternatively, right-click on the category, select Values, and modify the Value Attributes.
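The effect of the Missing Data and Value settings on an average can be shown with a small worked example. The Python sketch below is illustrative only and does not reproduce Q's calculations; the function name, the code 99 for "Don't know" and the 0 to 100 recoding are all assumptions made for the example.

```python
def mean_with_value_attributes(codes, values, missing):
    """Average the Value assigned to each code, skipping missing codes.

    codes   -- raw category codes, one per respondent
    values  -- mapping from code to the Value used in calculations
    missing -- set of codes flagged as Missing Data
    """
    used = [values[c] for c in codes if c not in missing]
    return sum(used) / len(used) if used else None


codes = [1, 5, 99, 3]  # 99 is a hypothetical "Don't know" code
as_is = {1: 1, 3: 3, 5: 5, 99: 99}

# Left untreated, the "Don't know" code distorts the average.
print(mean_with_value_attributes(codes, as_is, set()))  # 27.0

# Flagging 99 as missing removes it from the calculation.
print(mean_with_value_attributes(codes, as_is, {99}))   # 3.0

# Recoding the remaining categories changes the scale of the average.
recoded = {1: 0, 3: 50, 5: 100, 99: 99}
print(mean_with_value_attributes(codes, recoded, {99}))  # 50.0
```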

Tidying Names and Labels

If the names of questions appear to be very short then it may be possible to obtain better question names from the labels in the raw data using the QScript: Preliminary Project Setup - Search for Improved Question Names in Data Labels.

Similarly, if labels appear messy, with information about the question included, then it may be possible to tidy them up using the QScript: Preliminary Project Setup - Remove Truncated Text from Variable Labels.

Checking questionnaire skips

There are four different ways of checking questionnaire skips within Q:

  1. Create filters from questions that were used to determine the skips and apply these to tables in the Outputs Tab.
  2. On the Variables and Questions tab, press the Show Simple Statistics button, which shows the sample size for each variable.
  3. On the Data tab, sort any variables that are used as skips (which causes the other variables' rows to be aligned with the variable used in the skips).
  4. Use QScript. In particular, see Checking for Invalid Data.
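Whichever method is used, the underlying check is the same: compare who should have answered a question with who actually did. The Python sketch below shows that logic under stated assumptions. It is not QScript and not part of Q; the function name, the variable names and the use of None for missing data are hypothetical.

```python
def check_skip(rows, filter_var, filter_values, target_var):
    """Find respondents whose answers contradict a questionnaire skip.

    Returns two lists of row indexes: those who answered target_var
    but should have skipped it, and those who should have answered
    it but have no data.
    """
    unexpected, missing = [], []
    for i, row in enumerate(rows):
        should_answer = row.get(filter_var) in filter_values
        answered = row.get(target_var) is not None
        if answered and not should_answer:
            unexpected.append(i)
        elif should_answer and not answered:
            missing.append(i)
    return unexpected, missing


# Hypothetical data: only car owners should be asked for a brand.
rows = [
    {"owns_car": "Yes", "car_brand": "Toyota"},
    {"owns_car": "No", "car_brand": None},
    {"owns_car": "No", "car_brand": "Ford"},  # answered but should have skipped
    {"owns_car": "Yes", "car_brand": None},   # should have answered
]
print(check_skip(rows, "owns_car", {"Yes"}, "car_brand"))  # ([2], [3])
```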

Changing data (i.e., changing a respondent's values)

There are a variety of different tools for changing respondents' data, including:

Back-coding 'other specifies'

See How to Back Code Other Specify Responses

Deleting cases

If, while checking and cleaning the data, you identify cases (respondents) that should be deleted, you can delete them in the Data tab by right-clicking on the rows and selecting Delete Row. Deleted cases are not removed from the data file, but they are excluded from all analyses. You can restore deleted rows by right-clicking and selecting Revert Deleted Rows.
