Data File Setup for Tracking Studies
Analyzing data from a tracking study productively requires a cumulative data file. That is, a single data file should contain all waves of the study. If a cumulative data file cannot be obtained, Q cannot be used to test for differences between waves.
Appropriate setup of the project
A single project
In general, it is highly desirable to run a tracking study as a single project in the data collection software. That is, it is usually a very bad idea to set up each wave as a separate project (e.g., it is a bad idea to conduct wave 2 by copying wave 1 and changing it, or by setting up wave 2 in isolation). It is a bad idea to run each wave as a separate project because:
- It will prevent any version control tools from working. That is, better data collection programs will keep track of all changes to the questionnaire.
- Good data collection programs will prevent users from making 'silly' changes to questionnaires on tracking studies (e.g., adding or removing categories, changing between single and multiple response questions).
- When exporting from a single project, the export will be forced to be consistent. However, if exporting from different projects, there will often be inconsistencies. For example, sometimes question numbers are automatic, with the result being that Q2 in one project will be Q3 in another. Or, sometimes numeric variables will be exported as text and vice versa. Such inconsistencies greatly add to the complexity of analyzing the data.
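The kinds of inconsistency described in the last point can be checked for before any merging is attempted. The following is a minimal sketch in plain Python, not a feature of Q or of any data collection program. It assumes each wave has already been read into a list of dictionaries of text values (for example from a CSV export). The function names and the sample data are hypothetical.

```python
def infer_type(values):
    """Return 'numeric' if every non-missing value parses as a number,
    'text' if any does not, and 'empty' if there are no values at all."""
    seen = False
    for v in values:
        if v is None or v == "":
            continue
        seen = True
        try:
            float(v)
        except (TypeError, ValueError):
            return "text"
    return "numeric" if seen else "empty"


def schema(rows):
    """Map each variable name to its inferred type across a list of dict rows."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return {name: infer_type([row.get(name) for row in rows]) for name in names}


def compare_schemas(wave_a, wave_b):
    """List variables missing from one file or stored with different types."""
    a, b = schema(wave_a), schema(wave_b)
    problems = []
    for name in sorted(set(a) | set(b)):
        if name not in a:
            problems.append((name, "missing in first file"))
        elif name not in b:
            problems.append((name, "missing in second file"))
        elif a[name] != b[name] and "empty" not in (a[name], b[name]):
            problems.append((name, f"{a[name]} vs {b[name]}"))
    return problems


# Hypothetical exports from two separately programmed waves.
wave1 = [{"ID": "1", "Q2": "3", "Age": "34"}, {"ID": "2", "Q2": "1", "Age": "51"}]
wave2 = [{"ID": "3", "Q3": "2", "Age": "thirty"}, {"ID": "4", "Q3": "3", "Age": "45"}]

problems = compare_schemas(wave1, wave2)
for name, issue in problems:
    print(name, "-", issue)
```

In this made-up example the check flags the renumbered question (Q2 versus Q3) and the Age variable that is numeric in one file and text in the other. It cannot tell whether Q2 and Q3 are the same question, which still has to be resolved by a person.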
Defensive programming of the questionnaire
The questionnaire for a tracking study commonly evolves over time. Changes to the questionnaire, such as the addition of new brands, can make later analysis difficult. For example, if a question collects awareness data and a new brand is added, then a new variable must be created in the data file for this brand. Some useful tricks:
- If changing the wording of an option (e.g., changing "My main brand" to "My number 1 brand"), hide the original option and add a new option. That is, do not simply over-write the existing label. The reason for having them as separate versions is that it will make it possible to disentangle the effects of the different wordings.
- When changing a question, consider hiding the old question and creating a new question. For example, if a question asks people to choose between A, B and C, and then in the second wave, there is a desire to add option D, it is often a good idea to set this up as a completely different question. When this is not done, it is easy to fail to remember when analyzing the data that the initial respondents did not get the option of choosing D.
- When changing response options in Pick One, Pick One - Multi and Pick Any - Compact questions, create additional variables for exporting that track which respondents saw which versions. This is useful because the information often cannot be deduced from the data otherwise (i.e., there is no way to distinguish between people not choosing an option because it was not there versus not choosing it because it was not applicable).
- A useful trick when programming Pick Any and Pick Any - Grid questions is to put dummy brands into the original data file and hide these. For example, the questionnaire may be set up to offer the following brands: Coke, Pepsi, Fanta, DUMMY1, DUMMY2, and DUMMY3, which ensures that the exported data contains the additional variables.
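The idea of tracking which respondents saw which version can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the way any particular data collection program stores its data. The version numbers, the brand Sprite, and the variable name awareness_version are all hypothetical.

```python
# Hypothetical option lists for each version of an awareness question.
OPTIONS_BY_VERSION = {
    1: ["Coke", "Pepsi", "Fanta"],
    2: ["Coke", "Pepsi", "Fanta", "Sprite"],
}
ALL_OPTIONS = ["Coke", "Pepsi", "Fanta", "Sprite"]


def code_awareness(version, selected):
    """Return one variable per option plus a version-tracking variable.

    1 = selected, 0 = shown but not selected, None = not shown in this version.
    """
    shown = OPTIONS_BY_VERSION[version]
    record = {"awareness_version": version}
    for option in ALL_OPTIONS:
        if option not in shown:
            record[option] = None  # the respondent never saw this option
        else:
            record[option] = 1 if option in selected else 0
    return record


early = code_awareness(1, {"Coke"})  # answered before Sprite was added
late = code_awareness(2, {"Coke"})   # saw Sprite but did not select it

print(early)
print(late)
```

Both respondents selected only Coke, yet the exported records differ: the early respondent has a missing value for Sprite, while the late respondent has a 0. Without the missing value or the version variable the two cases would be indistinguishable.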
Preparation of a cumulative data file
There are four main ways to create cumulative data files. These are ordered according to their desirability (i.e., 1st is best, 4th is worst).
- Export a single data file from the data collection software. The reason that this is generally the best approach is that where there are changes in the questionnaire these will either have to be resolved prior to exporting, or, will be obvious in the exported data file (i.e., with separate versions of the same question).
- Have the data glued together using data collection software that has tools for addressing different versions of a questionnaire.
- Merge the files together in Q using Tools | Merge Data Files | Add New Cases. Note that this is less preferable than having them glued together by specialist data collection software, because the file formats required by specialist data collection software typically have more metadata, which makes the merging more successful.
- Merge the files in another program, such as SPSS. Note that this is generally inferior to using Q to do the merging, as the data merging tools in Q are written with the notion that the data will be used in Q, and thus they tend to produce better merged data files.
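Whichever tool is used, the cumulative file has the same basic shape: the cases from each wave are stacked, each case is labelled with its wave, and variables that did not exist in a wave are left missing. The following minimal sketch shows that shape in plain Python. It is not how Q, SPSS or any data collection program performs the merge, and it ignores the metadata (labels, value codes, question types) that those tools preserve. The function name, the variable names and the data are hypothetical, and it assumes no existing variable is already called wave.

```python
def stack_waves(waves):
    """Stack waves (a dict of wave label -> list of dict rows) into one list.

    Each row gains a 'wave' variable. Variables absent from a wave are set
    to None rather than dropped, so every row has the same set of variables.
    """
    # Collect every variable name seen in any wave, in first-seen order.
    variables = []
    for rows in waves.values():
        for row in rows:
            for name in row:
                if name not in variables:
                    variables.append(name)

    stacked = []
    for label, rows in waves.items():
        for row in rows:
            combined = {"wave": label}
            for name in variables:
                combined[name] = row.get(name)
            stacked.append(combined)
    return stacked


# Hypothetical data: a brand was added to the awareness question in wave 2.
waves = {
    "Wave 1": [{"ID": 1, "Aware_Coke": 1}],
    "Wave 2": [{"ID": 2, "Aware_Coke": 0, "Aware_Sprite": 1}],
}

cumulative = stack_waves(waves)
for row in cumulative:
    print(row)
```

The wave variable is what makes it possible to compare waves within a single file, and the missing value for the wave 1 respondent records that the new brand was not asked about at that time.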
Version control and the avoidance of crystallizing errors
Over the course of a tracker it is typical that many small changes will occur in the questionnaire. It is generally useful to have some way of working out which versions of which questions were seen by which respondents. In general, the best way to do this is using version control tools in the data collection software. Two things to be avoided are:
- Merging different versions of questions when merging the data file.
- Recoding data either in SPSS, or, in Q and then exporting as a data file and merging this data file with other files.
These two things should be avoided because they cause errors to be locked in (i.e., crystallized). Instead, the better process is to merge together the data files, only merging identical questions, and then use tools within Q to merge different versions of the questionnaire (e.g., Merge Questions), as this makes it easy to identify when changes occurred.
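The principle of merging versions only after the files have been combined, while keeping the original variables, can be sketched as follows. This is a plain Python illustration under stated assumptions and is not the implementation of Merge Questions in Q. The function name and the variables MainBrand_v1 and MainBrand_v2 are hypothetical.

```python
def merge_versions(row, version_variables, merged_name):
    """Add a merged variable to a copy of row without discarding the originals.

    version_variables lists the per-version variables in questionnaire order.
    The name of the variable that supplied the value is stored alongside it,
    so the merge can be audited or undone later.
    """
    result = dict(row)
    result[merged_name] = None
    result[merged_name + "_source"] = None
    for name in version_variables:
        value = row.get(name)
        if value is not None:
            result[merged_name] = value
            result[merged_name + "_source"] = name
            break
    return result


# Hypothetical cumulative data: the question wording changed between waves,
# so each wording was kept as a separate variable.
rows = [
    {"ID": 1, "MainBrand_v1": "Coke", "MainBrand_v2": None},
    {"ID": 2, "MainBrand_v1": None, "MainBrand_v2": "Pepsi"},
]

merged = [merge_versions(r, ["MainBrand_v1", "MainBrand_v2"], "MainBrand") for r in rows]
for row in merged:
    print(row)
```

Because the per-version variables are still present, a mistake in the merge can be corrected later. If the versions had instead been combined before or during the file merge, the information needed to detect the mistake would already be gone.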
With very large or complex projects it can often be useful to have two Q projects. The first contains all the different versions and any data cleaning. An SPSS data file is then exported from this first project and analyzed in a second Q project.