Advanced Data Tidying
The steps described in Basic Workflow For Checking and Cleaning a Project and Constructing Variables to Make Analysis Easy will be sufficient for most projects when preparing your data. However, where the data is to be used by groups that are relatively inexperienced in data analysis, or by those under extreme time pressure, it can be useful to tidy your data further.
Which questions go where
Where the users are very familiar with the questionnaire and its structure, it is usually best to have the data file reflect the order of the questionnaire. In other situations, the following structures can be better:
- Ordering data according to how often it will be used, with the most regularly used data at the top (e.g., demographics and segments).
- General-to-specific. For instance, category questions could be listed first, then brand questions. In the case of trackers, you may choose to have questions that are current and consistently asked at the top and historic or ad hoc questions at the bottom.
- WHO, WHAT, WHERE, WHEN and WHY.
- SITUATION (when, who with and other aspects of context), BEHAVIOR (i.e., action), and PERSON (personality, values, demographics, etc.).
- INFORMATION SEARCH, AWARENESS, CONSIDERATION, TRIAL, USAGE FREQUENCY and SATISFACTION.
- DEMOGRAPHICS, MEDIA, ATTITUDES, CATEGORY BEHAVIOR, BRAND BEHAVIOR.
Standard analysis variables
Any standard analysis variables should be included in the data file or created. These will typically vary by client and industry. For example:
- In packaged goods and financial services studies, "family lifestage" is usually relevant.
- In media studies, Age-by-Gender is often valuable.
- Medical studies typically create variables on patient attributes.
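As an illustration of what a standard analysis variable such as Age-by-Gender involves, the following is a minimal sketch in plain Python. It is not Q's own scripting interface. The function name, the age bands and the respondent records are all hypothetical and would need to be replaced with whatever the client or industry treats as standard.

```python
# Hypothetical age bands: (lowest age, highest age or None for open-ended, label).
AGE_BANDS = [
    (18, 34, "18-34"),
    (35, 54, "35-54"),
    (55, None, "55+"),
]

def age_by_gender(age, gender, bands=AGE_BANDS):
    """Combine age band and gender into one label, e.g. 'Female 35-54'.

    Returns None when the age falls outside every band, so that such
    respondents show up as missing rather than being silently misclassified.
    """
    for low, high, label in bands:
        if age >= low and (high is None or age <= high):
            return f"{gender} {label}"
    return None

# Made-up respondents, purely for illustration.
respondents = [
    {"age": 22, "gender": "Male"},
    {"age": 41, "gender": "Female"},
    {"age": 67, "gender": "Female"},
    {"age": 15, "gender": "Male"},
]

labels = [age_by_gender(r["age"], r["gender"]) for r in respondents]
```

Keeping the bands in one place makes it easier to reuse the same definition across projects for the same client.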
Hiding uninteresting data
Creating section headings in the data
Although Q does not allow you to create folders of variables, you can get a similar effect by inserting variables as section headings. Sections can be created in Q by inserting a new Binary Variable and giving it a distinctive design to separate blocks of questions in your data. For instance, you may make it indented and in capitals, such as NEEDS AND WANTS, and have it show the proportion of people who completed the section, with the label showing a description of the sample.
Names and labels
Often the names shown in the Question and Label fields of the Variables and Questions tab are messy, containing strange programming characters and truncated question wordings. It is generally a good idea to:
- Tidy them.
- Abbreviate them, so that they are easy to read when they appear in menus and exports.
- If the questionnaire has been ordered by question number, include the question number in the name of the question (e.g., Q1. Age). Note that you can include the full wording of the question in the footer (see below).
The following can be useful ways of quickly tidying up names and labels:
- Ensuring that they are created in a neat and organized way in the original data file (e.g., see SPSS Data File Specifications).
- Modifying labels using Find/Replace, which supports wildcards (see Find Replace).
- Copying and pasting the Label column into Excel, modifying in Excel, and pasting back into Q again (by right-clicking the first variable and selecting Paste Labels).
- Preliminary Project Setup - Search for Improved Question Names in Data Labels
- Preliminary Project Setup - Remove Truncated Text from Variable Labels
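Where labels are being cleaned outside Q (for example, before pasting them back in), the kind of tidying described above can be sketched in Python. This is an assumption-laden illustration rather than a description of what Q's Find/Replace or scripts do: the function name and the particular characters treated as "programming characters" are hypothetical and should be adapted to the data file at hand.

```python
import re

def tidy_label(label):
    """Return a cleaned-up copy of a variable label.

    Assumed clean-up steps (adjust to suit the data file):
      1. Replace HTML-style tags such as <b> or </b> with a space.
      2. Replace stray brackets, braces and pipes with a space.
      3. Collapse repeated whitespace and trim the ends.
    """
    text = re.sub(r"<[^>]+>", " ", label)
    text = re.sub(r"[{}\[\]|]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text

# Made-up labels, purely for illustration.
messy = [
    "<b>Q1.</b>  Age {of respondent}",
    "Q2. Brand awareness | unaided",
]
clean = [tidy_label(label) for label in messy]
```

The cleaned labels can then be reviewed by eye before being pasted back, as automated clean-up can occasionally remove characters that were intended.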
Sorting categories within a question
Sorting can be done manually, by dragging-and-dropping, or automatically. The automatic options can be found by typing the word sorting into the Search features and data box in the top right of the Q window. The options are:
- Select Sorting and Reordering - Sort from Highest to Lowest (Does Not Update When Data Changes) to sort all questions in the project once according to the data in their SUMMARY tables.
- Select Sorting and Reordering - Sort Rows (Automatically Updates when Data Changes) to apply a table Rule to any selected tables which sorts them according to the results currently shown in the table.
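The difference between the two options is whether the order is fixed once or recomputed as the data changes. The following Python sketch illustrates that distinction only; it does not reproduce Q's implementation, and the function name and the counts for the two waves are hypothetical.

```python
from collections import Counter

def sort_highest_to_lowest(counts):
    """Return category names ordered by count, largest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered]

# Made-up counts for two waves of a tracker.
wave_1 = Counter({"Coke": 50, "Pepsi": 30, "Fanta": 20})
wave_2 = Counter({"Coke": 45, "Pepsi": 25, "Fanta": 55})

# One-off sort: the order is computed from the first wave and then kept,
# even after the data has been updated.
frozen_order = sort_highest_to_lowest(wave_1)

# Automatically updating sort: the order is recomputed from the current data.
live_order = sort_highest_to_lowest(wave_2)
```

A one-off sort keeps row order stable between waves, which helps when users compare reports over time. An automatically updating sort always reflects the current results.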
Merging together small categories
Merge together categories with small counts (e.g., collapsing age categories and brands with less than 2% market share). This is often best done by:
- Selecting Automate > Browse Online Library > Preliminary Project Setup > Create Tables for Data Checking. This creates tables containing data with small counts.
- Merging together categories by dragging-and-dropping.
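The logic of merging small categories can be sketched as follows. This is a plain Python illustration, not something Q runs: the function name, the "Other" label and the brand counts are hypothetical, and the 2% threshold is simply the example figure used above.

```python
def merge_small_categories(counts, threshold=0.02, merged_label="Other"):
    """Combine categories whose share of the total is below `threshold`.

    `counts` maps category name to count. Categories at or above the
    threshold are kept as they are; the rest are summed under `merged_label`.
    """
    total = sum(counts.values())
    if total == 0:
        # Nothing to compute shares from, so return the counts unchanged.
        return dict(counts)

    merged = {}
    for category, n in counts.items():
        key = category if n / total >= threshold else merged_label
        merged[key] = merged.get(key, 0) + n
    return merged

# Made-up brand counts, purely for illustration.
brand_counts = {"Brand A": 60, "Brand B": 38, "Brand C": 1, "Brand D": 1}
merged_counts = merge_small_categories(brand_counts)
```

In practice the threshold is a judgment call, and a category with a small count may still need to be kept separate if it is important to the client.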
Removing irrelevant SUM and NET categories
Creating a report "shell"
It is sometimes useful when setting up a project to create the "shell" of a report, which users can then modify to suit their requirements.
The Report tree in Q is a useful way of setting out the most important findings in the data, or for providing an overview against key groups such as segments, countries or targets. Depending on the user, this will either be:
- A set of summary tables or charts that can provide a starting point for the user to use in exploring the project.
- A set of crosstabs with all the tables crossed by a few standard questions.
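To make the second option concrete, the set of crosstabs in such a shell is every question paired with each of the standard questions. The Python sketch below only enumerates those pairs; it does not create tables in Q. The question names and the function name are hypothetical.

```python
from itertools import product

def crosstab_shell(questions, banners):
    """List the (row question, column question) pairs for a report shell.

    A question is not crossed by itself, since that table adds nothing.
    """
    return [(q, b) for q, b in product(questions, banners) if q != b]

# Made-up question names, purely for illustration.
questions = ["Q1. Age", "Q2. Brand awareness", "Q3. Satisfaction"]
banners = ["Segment", "Country"]

shell = crosstab_shell(questions, banners)
```

Because the number of tables is the number of questions multiplied by the number of standard questions, keeping the list of standard questions short keeps the report manageable.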
Although there are lots of tools within Q for quickly creating a number of overview tables, the most straightforward approach may be to use one of the following scripts:
- Select Automate > Browse Online Library.
- Select and run whichever seems most appropriate of:
Where a question has only been asked of a subset of the sample, it can be useful to create a relevant footer and apply it to the appropriate tables.
Footers can be customized. In most cases, this is best done using Table Options. However, if adding footnotes containing question wordings, this is best done using Automate > Browse Online Library > Modifying Footers > Description of Selected Data (e.g., Question name, skips, filtering).
Changing the Appearance of Charts and Tables
Sample size warnings and automated data hiding
- Type the words sample size into the Search features and data box in the top right of the Q window.
- Select the desired option from the QScripts and Rules section of the results. For example:
Statistics can be placed on multiple tables at one time by either:
- Multi-selecting lots of tables and using whichever is appropriate of Statistics - Cells, Statistics - Right and Statistics - Below.
- Rules (e.g., Modifying The Whole Table or Plot - Always Show Sample Size).
With tables involving Pick One - Multi and Pick One questions, it is often a good idea to use Statistics - Right and Statistics - Below when setting up the project, as many users will not discover these on their own.
Customizing the names of statistics
Statistics can be renamed (e.g., changing Average to Mean or Net Promoter Score) in any of the following ways:
- Edit > Project Options > Customize > Output Text, which will rename the statistics for the entire project.
- Edit > Table Options > Output Text, which will rename the statistics for the selected tables.
- Using Rules (e.g., Modifying Headers - Automatically Rename Row Labels).
However, it is often a bad idea to rename statistics. Doing so makes the version that users see differ from the version shown in all of Q's documentation, which can make it hard for them to understand how Q works.
Sharing data and results
- Setting Up Your Data in Q for an overview of how to set up data in Q
- File Formats Supported by Q
- Manipulating data files
- Basic Workflow For Checking and Cleaning a Project
- Constructing Variables to Make Analysis Easy
- Setting Statistical Assumptions When Setting Up Projects
- Exporting, Copying and Printing
- Updating Projects with New or Revised Data
- Converting Other Files Types into SPSS or CSV Data Files