How to Allocate Observations to Segments in Excel
The latent class analysis procedure in Q automatically allocates observations (i.e., cases, respondents) into segments. New data data can be imported into Q and this results in the new observations being automatically classified into segments. Nevertheless, in some situations it is useful to be able to allocate observations into segments in Excel. This How To explains how this can be done.
This description assumes that:
- You have not changed any Advanced segmentation settings.
- You have only used Number, Number - Multi, Pick One, Pick One - Multi and Pick Any questions in the segmentation.
The basic process
- Right-click on one of the segments in the tree output in Q and select View Report.
- Copy and paste all the contents of the report into a sheet in Excel. Make sure that separate columns in the report appear as separate columns in Excel. (If they do not, playing around with the settings in Data > Text to Columns will often assist in this.)
- Call this sheet Pars.
- Export each question from Q into your Excel workbook, by:
- Creating new tables in the Outputs Tab for each question, with RAW DATA selected in the Brown drop-down menu.
- Right-clicking on any table involving Pick One or Pick One - Multi questions and change Statistics - Cells from Values to Labels.
- Export the RAW DATA tables to Excel (as data, not as charts).
- Rename the sheets in Excel to be similar to the question names. (This is only for housekeeping purposes; you can have all the data on a single sheet if so required.)
- Compute the posterior probabilities, segment membership probabilities and segments. This last step is the most complicated and is discussed in the next section.
Compute the posterior probabilities, segment membership probabilities and segments
A worked example of the required calculations is in File:Segment allocator.xlsx. This can readily be adapted for different numbers of questions and observations. The only difficult aspect is computing the formula that appears in cell B3, which needs to be copied for all the segments and observations.
In the example workbook, this is the formula (note that in the example below return characters have been added to make it wrap):
=INDEX(Pars!$B$61:$B$64,Calculations!B$2) *IF(ISERROR(MATCH('Pick One'!$B3,Pars!$A$77:$A$86,0)),1,INDEX(Pars!B$77:B$86,MATCH('Pick One'!$B3,Pars!$A$77:$A$86,0))) *IF(Number!$B3="",1,NORM.DIST(Number!$B3,Pars!B$92,Pars!B$97,FALSE)) *IF('Pick Any'!$B3="",1,IF('Pick Any'!$B3=1,Pars!B$142,1-Pars!B$142)) *IF('Pick Any'!$C3="",1,IF('Pick Any'!$C3=1,Pars!B$143,1-Pars!B$143)) *IF('Number - Multi'!$B3="",1,NORM.DIST('Number - Multi'!$B3,Pars!B$149,Pars!B$155,FALSE)) *IF('Number - Multi'!$C3="",1,NORM.DIST('Number - Multi'!$C3,Pars!B$150,Pars!B$156,FALSE)) *IF(ISERROR(MATCH('Pick One Multi'!$B3,Pars!$A$210:$A$213,0)),1,INDEX(Pars!B$210:B$213,MATCH('Pick One Multi'!$B3,Pars!$A$210:$A$213,0))) *IF(ISERROR(MATCH('Pick One Multi'!$C3,Pars!$A$218:$A$220,0)),1,INDEX(Pars!B$218:B$220,MATCH('Pick One Multi'!$C3,Pars!$A$218:$A$220,0)))
- The first line computes the size of each segment and the reference of Pars!$B$61:$B$64 needs to be modified to reflect where the segment sizes are located.
- The second line works out the density for a Pick One question. The first half (i.e., IF(ISERROR(MATCH('Pick One'!$B3,Pars!$A$77:$A$86,0)),1 works out if there is missing data or not and assigns a density of 1 if the data is missing. The second half assigns the proportion for that segment as the density.
- The third line computes the density from a normal distribution (where it is not missing).
- The fourth and fifth lines compute the density for Pick Any questions. The logic is basically the same for each category as for a Pick One question but the formula is different because Pick Any questions are reported more compactly in the outputs.
- The next two lines compute the density for categories of the Number - Multi question and are identical to those for the Number question.
- The final two lines compute the density for the Pick One - Multi question's variables and are essentially the same as for a Pick One question.
You should check that your spreadsheet gets the same answers as computed in Q. Rounding errors can cause the occasional difference, but it should be obvious when this is the case by comparing the segment probabilities.
If you have errors in your formulas, possible causes are:
- Failure to put the $ in the right places. Refer to Excel documentation for more information.
- Errors caused by Excel when the data was pasted into Excel (e.g., spaces being added to the names of categories).
- Excel sometimes gets hung about about whether an apostrophe is or is not appearing before numbers. The easiest way of resolving this is to insert apostrophes into the category descriptions that appear in the pars sheet for Pick One and Pick One - Multi questions.