From a model fitted to a binary response, predicted probabilities of the event level can be obtained for the observations used to estimate the model or for a separate data set used to validate the model. For a given observation, if its predicted probability exceeds some threshold, then that observation is predicted to be an event by the model.
With both the observed and predicted responses for the observations, a 2x2 table can be created, and many useful statistics, such as sensitivity and specificity, are available to summarize the fit of the model at the selected threshold. SAS Note 24170 discusses and illustrates how many of these statistics can be computed. Additionally, if PROC LOGISTIC is used to fit the binary response model, the 2x2 table cell counts and several statistics for each threshold in a range of thresholds can be obtained using the CTABLE option.
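For reference, the most common of these statistics can be written in terms of the four cell counts of the 2x2 table. The symbols TP, FN, FP, and TN (true positives, false negatives, false positives, true negatives) are notation introduced here for illustration and are not taken from the SAS Notes:

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, & \text{Specificity} &= \frac{TN}{TN + FP},\\
\text{PPV} &= \frac{TP}{TP + FP}, & \text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```

Here PPV and NPV are the positive and negative predictive values. All four depend on the chosen threshold, because the threshold determines which observations are predicted to be events.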
However, these statistics are not directly defined for a multinomial (multi-level) response. For such responses, a multinomial model can be fit, either ordinal or nominal depending on the nature of the response. Again, predicted probabilities can be obtained, but now for each possible response level, and each observation is predicted to have the level with the largest predicted probability. If PROC LOGISTIC is used to fit the multinomial model, the PREDPROBS=INDIVIDUAL option in the OUTPUT statement produces these predicted probabilities. Cross-classifying the predicted and observed responses gives a square table: for example, a four-level response results in a 4x4 table of predicted by observed response. This table is sometimes called a confusion matrix. SAS Note 22603 illustrates how the table can be created for a multinomial model.
Given this table for a multinomial response, rows and columns of the table can be collapsed in various ways to produce a 2x2 table, making it possible to then compute any of the 2x2 table statistics mentioned above.
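The collapsing can be stated explicitly. As a sketch in notation that is assumed here rather than taken from the SAS Notes, let n_{ij} be the number of observations with predicted level i and observed level j, and let S be the set of levels treated as the "event" (a single level or a combination of levels). The four cells of the collapsed 2x2 table are then:

```latex
\begin{align*}
TP &= \sum_{i \in S}\sum_{j \in S} n_{ij}, & FP &= \sum_{i \in S}\sum_{j \notin S} n_{ij},\\
FN &= \sum_{i \notin S}\sum_{j \in S} n_{ij}, & TN &= \sum_{i \notin S}\sum_{j \notin S} n_{ij}.
\end{align*}
```

Note that when S contains more than one level, an observation from one level in S that is predicted to be a different level in S still counts as a true positive.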
Consider the crop data shown in the example titled "Scoring Data Sets" in the PROC LOGISTIC documentation and analyzed in SAS Note 22603 mentioned above. The response variable, Crop, has five levels. The note fits the nominal multinomial model to the data and shows how the 5x5 table of observed by predicted response is obtained. The predicted probabilities from the multinomial model are saved in a data set named PREDS, which is used below.
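As a rough sketch of how a data set like PREDS might be produced, the following step assumes that the Crops data set from the documentation example already exists and that its predictors are named x1-x4. The data set name, the predictor names, and the table options are assumptions made for illustration; see SAS Note 22603 for the code actually used there.

```sas
/* Sketch only: assumes data set CROPS with response Crop and predictors x1-x4 */
proc logistic data=crops;
   model crop = x1-x4 / link=glogit;        /* nominal (generalized logit) model */
   output out=preds predprobs=individual;   /* saves per-level probabilities and _INTO_ */
run;

/* 5x5 table of predicted (_INTO_) by observed (Crop) response */
proc freq data=preds;
   table _INTO_*crop / norow nocol nopercent;
run;
```

The PREDPROBS=INDIVIDUAL option adds one predicted probability variable per response level to the output data set, along with the _INTO_ variable that holds the predicted level.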
A couple of overall model statistics can be easily computed for the multinomial case. An overall accuracy (correct classification) rate can be computed from the 5x5 table as shown in SAS Note 22603. Another overall statistic is the area under the ROC curve (AUC). The AUC is commonly used in binary response models as a measure of the predictive ability of the model. An extension of this statistic to the multinomial case is implemented in the MultAUC macro (SAS Note 64029). After running the macro definition in your SAS® session, the AUC can be computed by specifying the following macro call immediately after the PROC LOGISTIC step that uses the PREDPROBS=INDIVIDUAL option.
%multauc()
As shown in SAS Note 22603, the overall accuracy for the multinomial model is 0.5278. The overall AUC for the multinomial model is estimated to be 0.8539. The macro also provides AUC estimates for each pair of levels.
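The overall accuracy is simply the proportion of observations whose predicted level equals the observed level. One possible way to compute it directly from PREDS, offered as a sketch rather than as the method used in SAS Note 22603, is shown below. It assumes that Crop is an unformatted character variable, so that its values match the formatted values stored in _INTO_.

```sas
/* Sketch only: proportion of observations on the diagonal of the 5x5 table */
proc sql;
   select mean(crop = _INTO_) as Accuracy format=6.4   /* comparison yields 1 or 0 */
   from preds;
quit;
```

If a format is assigned to Crop, the comparison would need to use the formatted value instead, for example through the PUT function.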
In addition to the overall statistics above for the multinomial model, per-level statistics can be computed by collapsing rows and columns according to any useful dichotomization of the response levels. Suppose there are two dichotomizations of interest. The first compares Clover to the other crops. The second compares the combination of Soybeans and Sugarbeets to the other crops. A 2x2 table for each can be produced by creating an appropriate format and assigning it to both the Crop variable and the _INTO_ variable, which is created by the PREDPROBS= option and gives the predicted response level. The SENSPEC option is then used in PROC FREQ to obtain four summarizing statistics for each dichotomization. Care must be taken to assign the smaller formatted value to the level or combination of interest and to use the ORDER=FORMATTED option so that the statistics are properly computed.
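/* Formats that collapse the five crop levels into two groups.  */
/* The level (or combination) of interest gets the smaller value. */
proc format;
   value $notcl 'Clover'=1 other=2;
   value $notss 'Soybeans','Sugarbeets'=1 other=2;
run;

/* Clover versus all other crops */
proc freq data=preds order=formatted;
   format _INTO_ crop $notcl.;
   table _INTO_*crop / senspec;
run;

/* Soybeans or Sugarbeets versus all other crops */
proc freq data=preds order=formatted;
   format _INTO_ crop $notss.;
   table _INTO_*crop / senspec;
run;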
The first table below provides four statistics for the Clover dichotomization. The statistics for the Soybeans and Sugarbeets dichotomization appear in the second table. Other 2x2 statistics can be computed as discussed in SAS Note 24170.
Note, however, that these statistics are based on the multinomial model and do not match those that result from fitting separate binary logistic models to the dichotomizations of the response. Instead, they describe how well the multinomial model itself can distinguish, for example, Clover from the other crops.