How to Automatically Code Text Variables

Q can automatically code a single text variable, or multiple text variables that share a code frame (keeping responses alphabetically ordered across all of them). This is distinct from Q’s semi-automatic and manual Coding features, which require you to create categories and allocate responses to them. With automatic coding, Q applies rules behind the scenes to create the codes and allocate the responses. This is very useful for CSV files, where categorical data is often stored as labels rather than as numeric values (e.g. a question such as “What is your favourite animal?” would have data values of “Ants”, “Dogs”, “Cats”).

How to automatically code a single text variable

  1. Go to the Variables and Questions tab.
  2. Change the Variable Type dropdown from Text to Categorical.

How to automatically code multiple text variables (where the code frame is shared)

  1. Go to the Variables and Questions tab.
  2. Select the text variables that should share a code frame.
  3. Right-click and select Set Question.
  4. Confirm the Question Type is Text – Multi and click OK. The text variables have now been combined into a single question.
  5. Change the Question Type dropdown on the new question from Text – Multi to Pick One – Multi.

How automatic coding works

Key points

  • Converting a Text variable to a Categorical variable via the Variable Type dropdown will automatically code its text responses.
  • Auto coded variables that are part of the same question (e.g. Pick Any, Pick One – Multi, etc.) share the same code frame. This means all text responses from all the source text variables will be coded together and have the same numeric values.
  • When automatically coding multiple text variables that are related, first use Set Question to combine them into a Text – Multi question, and then change the Question Type dropdown to Pick One – Multi (or any other categorical type). This ensures responses from all variables are alphabetically ordered.
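
To illustrate what a shared code frame means, here is a minimal Python sketch. It is not Q's implementation: the function name `shared_code_frame` and the variable names are hypothetical, and treating blank responses as missing (`None`) is an assumption. It pools the responses from several text variables, numbers the distinct responses alphabetically, and codes every variable with the same numbers.

```python
def shared_code_frame(variables):
    """Build one code frame from several text variables and code each with it.

    `variables` maps a (hypothetical) variable name to its list of text responses.
    """
    # Pool every non-blank response, matching on trimmed, case-insensitive text.
    keys = sorted(
        {
            text.strip().casefold()
            for responses in variables.values()
            for text in responses
            if text.strip()
        }
    )
    # One value per distinct response, numbered in alphabetical order.
    frame = {key: value for value, key in enumerate(keys, start=1)}
    # Blank responses have no entry in the frame, so they become None (assumption).
    coded = {
        name: [frame.get(text.strip().casefold()) for text in responses]
        for name, responses in variables.items()
    }
    return frame, coded


frame, coded = shared_code_frame({
    "favourite_1": ["Dogs", "Ants"],
    "favourite_2": ["Cats", "Dogs"],
})
print(frame)
print(coded)
```

In this sketch “Dogs” receives the same value in both variables because the frame is built from the pooled responses rather than from each variable separately.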

How Q automatically codes text responses

  • Like the manual Coding feature, automatic coding ignores spaces at the start and end of responses and is not case sensitive. For example, “ dogs” and “Dogs ” will both be coded as the same category.
  • When making a label for a coded category, Q chooses the form of the response that occurs most often in the text responses. For example, if the responses were “coke”, “COKE”, “Coke” and “Coke”, the auto coded question would use “Coke” as the label for the category.
  • The coded categories are in alphabetical order, both in the Value Attributes dialog and on tables.
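
The rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Q's actual code: `auto_code` is a hypothetical name, blank responses are treated as missing, and when two spellings are equally common the sketch keeps the one seen first (the source does not say how Q breaks ties).

```python
from collections import Counter


def auto_code(responses):
    """Return (values, labels) for a list of text responses.

    values: one category value per response (None for blanks, an assumption).
    labels: dict mapping each category value to its label.
    """
    spellings = {}  # normalized key -> Counter of trimmed spellings
    for text in responses:
        trimmed = text.strip()    # ignore leading and trailing spaces
        key = trimmed.casefold()  # ignore case
        if key:
            spellings.setdefault(key, Counter())[trimmed] += 1

    # Categories are numbered in alphabetical order, starting at 1.
    value_of = {key: i for i, key in enumerate(sorted(spellings), start=1)}
    # The label is the most frequent spelling within each category.
    labels = {
        value_of[key]: counts.most_common(1)[0][0]
        for key, counts in spellings.items()
    }
    values = [value_of.get(text.strip().casefold()) for text in responses]
    return values, labels


values, labels = auto_code(["coke", "COKE", "Coke", "Coke", " pepsi"])
print(values)
print(labels)
```

Here the four spellings of “coke” fall into one category labelled “Coke”, and “ pepsi” is trimmed before being coded into a second category.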

How Q deals with changes in the data file

  • Whenever the source text variables are updated (from either an updated data file, or due to an edit within Q), the responses are automatically re-coded and the code frame is updated.
  • Whenever auto coded variables are combined into a multi-variable question, their code frames change to include unique responses from all other input text variables.
  • Whenever auto coded variables are moved from a multi-variable question to their own single-variable question, their code frames stop including responses from the other text variables, and just include their own. Importantly, their category values stay the same. This means you can rely on the numeric values staying the same if you use Q’s Ready-Made Formulas.
  • Existing text responses always keep the same category value (e.g. if “Ants” was originally the first alphabetical response with an auto coded value of 1, and “Aardvarks” appeared in the new data, “Ants” would remain with a value of 1, and “Aardvarks” would get a new unique value).
  • The category labels may change if a different form of a text response becomes the most frequently occurring one (e.g. if the new responses were “coke”, “COKE”, “Coke”, “Coke”, “coke” and “coke”, the new label would be “coke”).
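
The last two points can be sketched in Python as below. This is a hedged illustration rather than Q's implementation: `update_code_frame` is a hypothetical name, and giving a new response the next unused integer is an assumption (the source only says it gets a new unique value).

```python
from collections import Counter


def update_code_frame(existing_values, responses):
    """Re-code after a data update while keeping existing category values.

    existing_values: dict mapping a normalized response to its current value.
    responses: the updated list of text responses.
    Returns (values, labels).
    """
    values = dict(existing_values)  # existing categories keep their values
    spellings = {}
    for text in responses:
        trimmed = text.strip()
        key = trimmed.casefold()
        if key:
            spellings.setdefault(key, Counter())[trimmed] += 1

    # Assumption: each new response takes the next unused integer.
    next_value = max(values.values(), default=0) + 1
    for key in sorted(spellings):
        if key not in values:
            values[key] = next_value
            next_value += 1

    # Labels are recomputed from the current data, so they can change.
    labels = {
        values[key]: counts.most_common(1)[0][0]
        for key, counts in spellings.items()
    }
    return values, labels


values, labels = update_code_frame(
    {"ants": 1, "coke": 2},
    ["Ants", "Aardvarks", "coke", "COKE", "Coke", "Coke", "coke", "coke"],
)
print(values)
print(labels)
```

In this example “Ants” keeps the value 1 even though “Aardvarks” now sorts before it, and the label for the existing category 2 becomes “coke” because that spelling is now the most frequent.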