In the example titled Logistic Modeling with Categorical Predictors (see the Examples section of the LOGISTIC documentation), the model fit at the end of that example contains categorical predictors Sex and Treatment and continuous predictor Age. The response variable, Pain, has levels No and Yes. The predictor, Treatment, has levels P, B, and A, and the predictor, Sex, has levels M and F. The following statements refit the model. The EVENT="No" option tells PROC LOGISTIC to model the probability that Pain=No. The PARAM=REF option specifies reference coding (or parameterization) for the categorical predictors. The effect of coding is discussed below.
proc logistic data=Neuralgia;
   class Treatment Sex / param=ref;
   model Pain(event="No") = Treatment Sex Age;
run;
The following are the Class Level Information and the Analysis of Maximum Likelihood Estimates tables produced by PROC LOGISTIC.
With this information, you can write the logistic model in terms of the log odds of no pain:
log odds = log[Pr(Pain=No)/Pr(Pain=Yes)] = 15.8669 + 3.1790*TA + 3.7264*TB + 1.8235*SF - 0.2650*Age ,
or in terms of the odds of no pain:
Odds = Pr(Pain=No)/Pr(Pain=Yes) = exp(15.8669 + 3.1790*TA + 3.7264*TB + 1.8235*SF - 0.2650*Age) ,
or in terms of the probability of no pain:
Pr(Pain=No) = 1/(1+exp(-log odds)) = 1/(1+exp[-(15.8669 + 3.1790*TA + 3.7264*TB + 1.8235*SF - 0.2650*Age)]) .
Categorical predictors are represented in the model by sets of coded design variables, and the parameters multiply the design variable values. The coding of these design variables depends on the PARAM= option in the CLASS statement and is shown in the Class Level Information table near the beginning of the PROC LOGISTIC results. In the logistic model above, TA and TB are the two design variables created to represent the Treatment predictor. One design variable, SF, is created for the Sex predictor. As shown in the Class Level Information table, Treatment A is represented by TA=1 and TB=0. Similarly, Treatment B is represented by TA=0 and TB=1. For Treatment=P, both design variables equal zero. Likewise, Sex=F is represented by SF=1 and Sex=M by SF=0. This is the "reference" parameterization produced by the PARAM=REF option in the CLASS statement.
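The reference coding described above can be sketched in a DATA step. This is only an illustration of the coding rule, not how PROC LOGISTIC builds its design matrix internally. The data set name RefCoding is hypothetical, and the design variable names TA, TB, and SF follow the notation used in this note.

```sas
/* Hypothetical illustration: one row per Treatment and Sex combination */
data RefCoding;
   length Treatment Sex $1;
   do Treatment = 'A', 'B', 'P';
      do Sex = 'F', 'M';
         /* A comparison returns 1 when true and 0 when false */
         TA = (Treatment = 'A');   /* 1 for Treatment A, otherwise 0 */
         TB = (Treatment = 'B');   /* 1 for Treatment B, otherwise 0 */
         SF = (Sex = 'F');         /* 1 for Sex F, otherwise 0 */
         output;
      end;
   end;
run;

proc print data=RefCoding noobs;
run;
```

The rows for the reference levels (Treatment P and Sex M) have zeros for all of the corresponding design variables, so their contribution to the log odds is absorbed by the intercept.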
A new subject with Treatment P, Sex M, and Age 70 would be scored using the model with TA=0, TB=0, SF=0, and Age=70:
log odds = 15.8669 + 3.1790*0 + 3.7264*0 + 1.8235*0 - 0.2650*70 = 15.8669 - 18.55 = -2.68 .
His odds of no pain are:
Odds = exp(15.8669 + 3.1790*0 + 3.7264*0 + 1.8235*0 - 0.2650*70) = exp(15.8669 - 18.55) = 0.068 ,
and his probability of no pain is:
Pr(Pain=No) = 1/(1+exp[2.6831]) = 0.064 .
For a 70-year-old male receiving Treatment P, the model estimates the probability of no pain as 0.064, so his probability of pain is 1 - 0.064 = 0.936. Under a maximum-probability decision rule, his predicted response is Pain=Yes.
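The hand calculation above can be reproduced with a short DATA step. This is a minimal sketch that uses the parameter estimates as printed (rounded to four decimals), so its results agree with the displayed values only up to rounding. The data set name ScoreByHand and the variable names LogOdds, Odds, P_No, P_Yes, and Predicted are hypothetical.

```sas
data ScoreByHand;
   /* Design variable values for Treatment P, Sex M, Age 70 */
   TA = 0;
   TB = 0;
   SF = 0;
   Age = 70;

   /* Model equation using the printed (rounded) estimates */
   LogOdds = 15.8669 + 3.1790*TA + 3.7264*TB + 1.8235*SF - 0.2650*Age;
   Odds    = exp(LogOdds);
   P_No    = 1 / (1 + exp(-LogOdds));
   P_Yes   = 1 - P_No;

   /* Maximum-probability decision rule */
   length Predicted $3;
   if P_No > P_Yes then Predicted = 'No';
   else Predicted = 'Yes';
run;

proc print data=ScoreByHand noobs;
run;
```

Changing the values of TA, TB, SF, and Age scores a different subject with the same equation.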
To avoid rounding errors in these predicted values, always use the full precision of the parameter estimates, as PROC LOGISTIC does in its calculations. This is discussed and illustrated in section 4 of SAS Note 33307, which also shows ways to easily score new observations using capabilities built into PROC LOGISTIC.
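One built-in capability of this kind is the SCORE statement, which applies the fitted model at full precision to a second data set. The following is a minimal sketch, assuming that the Neuralgia data set is available and that the variables in the new data have the same names and types as in Neuralgia. The data set names NewSubjects and Scored are hypothetical.

```sas
/* Hypothetical new subject: Treatment P, Sex M, Age 70 */
data NewSubjects;
   Treatment = 'P';
   Sex = 'M';
   Age = 70;
run;

proc logistic data=Neuralgia;
   class Treatment Sex / param=ref;
   model Pain(event="No") = Treatment Sex Age;
   /* Score the new data with the fitted model and save the results */
   score data=NewSubjects out=Scored;
run;

proc print data=Scored noobs;
run;
```

The Scored data set contains the predicted probabilities for each response level along with the predicted response, so no design variables or parameter estimates need to be typed by hand.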
Other parameterizations
The first model fit in the Logistic Modeling with Categorical Predictors example uses "effects" parameterization. This is the parameterization used when the PARAM= option is not specified or when you specify the PARAM=EFFECT option. Note the difference in the coding of the design variables shown in the Class Level Information table:
Fitting the same model with effects parameterization yields different parameter estimates. However, the model is equivalent to the one using reference parameterization, as evidenced by the same log likelihood value (-2 Log L in the Model Fit Statistics table) and by identical scores for observations. The written form of the model equation does not change, although the estimates in it do. When you use the model to score new observations, the values of the design variables TA, TB, and SF also differ, as shown in the Class Level Information table. For example, Treatment P is represented by TA=-1 and TB=-1. A model using effects parameterization is shown and used to score observations in SAS Note 32304.
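The effects coding of the design variables can be sketched in a DATA step in the same way. The coding of Treatment P as TA=-1 and TB=-1 comes from the text above. The coding of Sex M as SF=-1 is an assumption that follows the usual definition of effects coding, in which the reference level is coded -1 on every design variable of its predictor. The data set name EffectCoding is hypothetical.

```sas
/* Hypothetical illustration: one row per Treatment and Sex combination */
data EffectCoding;
   length Treatment Sex $1;
   do Treatment = 'A', 'B', 'P';
      do Sex = 'F', 'M';
         /* Reference level P is coded -1 on both Treatment design variables */
         if Treatment = 'P' then do;
            TA = -1;
            TB = -1;
         end;
         else do;
            TA = (Treatment = 'A');
            TB = (Treatment = 'B');
         end;

         /* Assumed by the usual effects-coding rule: reference level M is coded -1 */
         if Sex = 'M' then SF = -1;
         else SF = 1;
         output;
      end;
   end;
run;

proc print data=EffectCoding noobs;
run;
```

These design variable values must be paired with the parameter estimates from the effects-coded fit, not with the reference-coded estimates shown earlier.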