The difference between Trees, CHAID, CART and other tree-based models

The specific algorithm used in Q for creating Mixed-Mode Trees is different from CHAID, Classification And Regression Trees (CART) and all other well-known tree-based models (see Statistical Model for Latent Class Analysis for a description of the algorithm). Putting aside technicalities, there are a number of important practical differences.

Multiple dependent variables

CHAID, CART and all the standard algorithms for constructing trees are designed for a single dependent variable. By contrast, the maths of Q's trees permits multiple dependent variables, and these dependent variables can be of different types (e.g., a Ranking question, a Number - Multi question, etc.). Thus, Q's trees can solve many problems that cannot be solved using CHAID, CART or other standard tree-based algorithms.

Polythetic division

CART and many other tree methods work by splitting the sample in two, then splitting each of these sub-samples in two, and so on. By contrast, Q's algorithm, like CHAID, automatically computes the number of splits per variable. For example, if age is an independent variable with 7 categories, Q will evaluate the options of 1, 2, 3, 4, 5, 6 and 7 splits (see Trees for an example). The main benefit of this is that the resulting trees tend to be easier to interpret.
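
To make the contrast with binary splitting concrete, the Python sketch below enumerates the candidate groupings for a single predictor with 7 categories. It is an illustration only and is not Q's algorithm: it assumes the categories are ordered and that only adjacent categories may be merged, it reads "k splits" as dividing the categories into k groups, and the function name and age-band labels are hypothetical.

```python
from itertools import combinations


def contiguous_groupings(categories, n_groups):
    """Yield every way to cut an ordered list into n_groups contiguous groups.

    Assumption for illustration: categories are ordered and only
    neighbouring categories can share a group.
    """
    n = len(categories)
    # Pick n_groups - 1 cut points from the n - 1 gaps between neighbours.
    for cuts in combinations(range(1, n), n_groups - 1):
        bounds = (0, *cuts, n)
        yield [categories[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical age bands standing in for a 7-category predictor.
age_bands = ["18-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]

# Number of candidate groupings for each possible number of groups.
counts = {
    k: sum(1 for _ in contiguous_groupings(age_bands, k))
    for k in range(1, len(age_bands) + 1)
}

for k, n_candidates in counts.items():
    print(f"{k} group(s): {n_candidates} candidate grouping(s)")
```

A strictly binary method would consider only the 6 two-group candidates at each step, whereas a multiway method chooses among the candidates for every number of groups. Under these assumptions the number of candidates stays small (64 in total for 7 categories), although evaluating all of them is still more work than evaluating binary splits alone.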

Designed for segmentation rather than prediction

The specific algorithm that is used in Q attempts to create segments that are maximally different within each 'branch' of the tree. Q's tree algorithm was motivated by segmentation problems. For example, if you have a lot of attitudinal data and want to understand how demographics can be used to explain differences in that data, then this is precisely the type of problem that Q's trees were developed for. By contrast, CART and most other tree-based models are designed for prediction. No formal analysis has ever been undertaken to compare the predictive accuracy of Q's algorithm with that of the other algorithms. The general advice is that if your focus is on prediction, you should consider using Classification And Regression Trees (CART).

Speed

For comparable problems, Q's algorithm is substantially slower than any of the other algorithms. For this reason, Q automatically creates trees with only two levels (this can be increased via Maximum number of tree levels in Segments > Advanced).

Further reading: Market Segmentation Software