Estimating limited and discrete dependent variable models with random parameters or random effects using PROC QLIM


Beginning in the SAS® 9.4M4 (TS1M4) release of SAS/ETS®, where the feature is experimental, and in the SAS® 9.4M6 (TS1M6) release, where it is production, you can use the SUBJECT= option in the RANDOM statement in the QLIM procedure to specify random parameters or random effects in single-equation limited and discrete dependent variable models. These models include binomial probit, binomial logit, ordinal probit, ordinal logit, linear regression, Tobit, truncated regression, and stochastic frontier models. You can specify only one RANDOM statement, and when you specify a RANDOM statement, you can specify only one MODEL statement.

The random effects model is a special case of the random parameters model in which the only random parameter is the intercept. The random parameters model can be applied to panel data in which parameter heterogeneity occurs across the cross-sectional units. Panel data is also called cross-sectional time series data. An example of panel data is data collected across different firms over ten years. However, panel data is not required in order to specify the random parameters model. The model can also be applied to cross-sectional data if you specify a group or subject variable across which parameter heterogeneity occurs. With cross-sectional data, the records within each group do not form a time series. An example is data collected on patients within hospitals.

The random parameters specified in the RANDOM statement are assumed to be normally distributed across groups defined by the variable specified in the SUBJECT= option. The random parameters can include regressors, the intercept, or both. For example, the following statement specifies that the intercept parameter and the slope parameter on regressor X are normally distributed across groups defined by the variable GROUP.

     random intercept x / subject = group;

If you have panel data and want to specify random effects on cross-sections, specify only INTERCEPT in the RANDOM statement and specify the variable identifying the cross-sections in the SUBJECT= option. See the following example:

     random intercept / subject = id;
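
To show where the RANDOM statement sits in a complete step, here is a minimal sketch of a linear regression with a random intercept and a random slope. The data set name MYDATA and the variable names Y, X, and GROUP are placeholders assumed for illustration; they do not refer to any data set used in this note.

```sas
/* Hypothetical data set MYDATA with response Y, regressor X, and      */
/* grouping variable GROUP; all of these names are placeholders.       */
proc qlim data=mydata;
   model y = x;                          /* single MODEL statement: linear regression */
   random intercept x / subject=group;   /* intercept and slope on X vary across GROUP */
run;
```

Removing X from the RANDOM statement would reduce this sketch to a random effects (random intercept) model.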

Following are two examples that illustrate specifying random effects models by using the RANDOM statement in PROC QLIM.

Example 1: Estimating a censored regression (Tobit) model with random effects

The random effects Tobit model is defined as follows, where i is the index for the groups across which random effects occur, and t is the index for observations within each group. For panel data, i is the cross-section index, and t is the time period index.

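$y_{it}^* = x_{it}'\beta + \varepsilon_{it}$, where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T_i$
$y_{it} = y_{it}^*$ if $y_{it}^* > 0$
$y_{it} = 0$ if $y_{it}^* \le 0$
$\varepsilon_{it} = \mu_i + v_{it}$
$\mu_i \mid x_i \sim N(0, \sigma_\mu^2)$
$v_{it} \mid (x_i, \mu_i) \sim N(0, \sigma^2)$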
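
The error structure above implies a simple variance decomposition, which helps when reading the two variance-related estimates that PROC QLIM reports. The following short derivation uses the independence of the random effect and the idiosyncratic error that the conditional distributions above imply. It also assumes that the idiosyncratic errors are independent across observations within a group, which is a common assumption for this model but is not spelled out in the definition above.

$$
\mathrm{Var}(\varepsilon_{it}) = \sigma_\mu^2 + \sigma^2, \qquad
\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{is}) = \sigma_\mu^2 \quad (t \ne s), \qquad
\mathrm{Corr}(\varepsilon_{it}, \varepsilon_{is}) = \frac{\sigma_\mu^2}{\sigma_\mu^2 + \sigma^2}
$$

The last ratio is the share of the latent error variance that is due to the group-level random effect.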

The following statements simulate panel data from a censored regression model with random effects. The data consist of $N = 25$ cross-sections of $T_i = T = 20$ observations each. The fixed effects portion of the latent model is $1 + 2x_{it}$, and the values of X are drawn from a standard normal distribution. The random effects, $\mu_i$, have a normal distribution with mean zero and standard deviation 0.9. The distribution of $v_{it}$ is normal with mean zero and standard deviation 1.2.

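     data tobit;
        /* True standard deviations: sa for the random effects, su for the errors */
        sa=.9; su=1.2;
        /* True intercept and slope */
        b0 = 1; b1 = 2;
        keep i t y x;
        do i = 1 to 25;                    /* 25 cross-sections */
           a = rannor(623)*sa;             /* random effect, constant within cross-section i */
           do t = 1 to 20;                 /* 20 observations per cross-section */
              x = rannor(875);
              u = rannor(761);
              ys = b0 + b1*x + a + su*u;   /* latent response */
              if ys > 0 then y=ys;         /* observed response is censored below at zero */
              else y=0;
              output;
           end;
        end;
        run;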

The following PROC QLIM step estimates the Tobit model with random effects. The CENSORED(LB=0) option in the MODEL statement specifies that the dependent variable Y is censored with lower bound at zero. The RANDOM statement specifies that only the intercept is random. The SUBJECT= option specifies that i identifies the cross-sectional units across which random effects occur.

     proc qlim data=tobit;
        model y = x / censored(lb=0);
        random intercept / subject=i;
        run; 
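
As a point of comparison that is not part of the original example, a pooled Tobit model can be fit by keeping the same MODEL statement and dropping the RANDOM statement. This sketch only illustrates what omitting the random effects looks like in code; it reuses the simulated data set TOBIT from above.

```sas
/* Pooled Tobit for comparison: no RANDOM statement, so the grouping   */
/* by I is ignored and all 500 observations are treated as independent */
proc qlim data=tobit;
   model y = x / censored(lb=0);
run;
```

Without a separate variance for the random effects, the single _Sigma estimate from this step has to account for both error components, so it would be expected to land nearer to the square root of 0.81 + 1.44, which is 1.5, than to 1.2.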

In the results, the Parameter Estimates table contains the model parameter estimates, their standard errors, t statistics, and p-values for the Intercept parameter, the slope parameter on X, and $\sigma$ (labeled _Sigma), the standard deviation of the error term, $v_{it}$. The estimates of the intercept and slope, 1.11 and 1.99, are approximately equal to the true model parameters, b0=1 and b1=2. The estimate of $\sigma$, 1.22, is approximately equal to the true standard deviation of $v_{it}$, which is 1.2.

The estimate of the variance of the random effects, $\sigma_\mu^2$, and its standard error are printed in the Covariance Estimates of Random Parameters table. The estimate, $\hat{\sigma}_\mu^2 = 0.75$, is approximately equal to the true variance of the $\mu_i$, which is $0.9^2 = 0.81$.

Parameter Estimates

Parameter    DF    Estimate    Standard Error    t Value    Approx Pr > |t|
Intercept     1    1.114325          0.158259       7.04             <.0001
x             1    1.989024          0.074234      26.79             <.0001
_Sigma        1    1.219506          0.049051      24.86             <.0001

Covariance Estimates of Random Parameters

Parameter              DF    Estimate    Standard Error
Intercept Intercept     1    0.754436          0.210403
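
As a rough check on magnitudes, the share of the latent error variance that is attributable to the random effects, $\sigma_\mu^2 / (\sigma_\mu^2 + \sigma^2)$, can be computed from the two tables and compared with the value implied by the simulation settings. This ratio does not appear in the output shown here; it is arithmetic derived from the reported estimates (left) and from the true values 0.9 and 1.2 (right).

$$
\frac{0.754436}{0.754436 + 1.219506^2} \approx \frac{0.7544}{2.2416} \approx 0.34,
\qquad
\frac{0.9^2}{0.9^2 + 1.2^2} = \frac{0.81}{2.25} = 0.36
$$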

Example 2: Estimating a logistic regression model with random effects

The random effects logistic model is defined as above, except for the definition of $y_{it}$ and the distribution of $v_{it}$.

$y_{it} = 1$ if $y_{it}^* > 0$
$y_{it} = 0$ if $y_{it}^* \le 0$
$v_{it} \mid (x_i, \mu_i) \sim \mathrm{Logistic}(0, 1)$

Under this model, the probability of the event, $y_{it} = 1$, is as follows:

$\mathrm{Prob}(y_{it} = 1) = \exp(x_{it}'\beta + \mu_i) / (1 + \exp(x_{it}'\beta + \mu_i))$

The following example estimates a logistic model using cross-sectional data from 15 randomly selected medical centers. One goal of the study is to compare the occurrence of side effects for two medical procedures, A and B. In each center, $n_A$ patients were randomly selected and assigned to procedure A, and $n_B$ patients were randomly assigned to procedure B.

In the following statements that create the data set, GROUP identifies the procedure, A or B. N is the number of patients who received a given procedure in a particular center. SIDEEFFECT is the number of patients who reported side effects. Since the data is read in aggregated form, the two DO-END blocks expand the data so that each observation represents one patient. A response variable, Y, is created, indicating that a side effect occurred for the patient (Y=1) or not (Y=0).

     data multicenter;
        input center group$ n sideeffect;
        do i=1 to   sideeffect; y=1; output; end; 
        do i=1 to n-sideeffect; y=0; output; end; 
        datalines;
     1  A  32  14
     1  B  33  18
     2  A  30   4
     2  B  28   8
     3  A  23  14
     3  B  24   9
     4  A  22   7
     4  B  22  10
     5  A  20   6
     5  B  21  12
     6  A  19   1
     6  B  20   3
     7  A  17   2
     7  B  17   6
     8  A  16   7
     8  B  15   9
     9  A  13   1
     9  B  14   5
     10  A  13   3
     10  B  13   1
     11  A  11   1
     11  B  12   2
     12  A  10   1
     12  B   9   0
     13  A   9   2
     13  B   9   6
     14  A   8   1
     14  B   8   1
     15  A   7   1
     15  B   8   0
     ; 
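
Before fitting the model, it can be worth confirming that the expansion produced one record per patient. The following sketch, which is not part of the original example, tabulates the expanded records by procedure and outcome.

```sas
/* Count the expanded patient-level records by procedure and outcome */
proc freq data=multicenter;
   tables group*y / norow nocol nopercent;
run;
```

Summing the N and SIDEEFFECT columns of the input lines gives 503 patients in total, 155 of whom reported a side effect, so the table should show 503 records with 155 of them having Y=1.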

The following PROC QLIM step estimates the logistic regression model with random effects. The probability of a side effect (Y=1) resulting from either procedure is a function of the fixed group (procedure) effect and a random effect associated with the center. The fixed group effect is specified by including GROUP in the CLASS and MODEL statements. The random effect of the center is specified by the RANDOM statement with SUBJECT=CENTER. The DISCRETE(DISTRIBUTION=LOGISTIC) option in the MODEL statement specifies a logistic regression model.

     proc qlim data=multicenter;
        class group;
        model y = group / discrete(distribution=logistic);
        random intercept / subject=center;
        run;  

The Parameter Estimates table displays the fixed effects estimates in the model. Since B is the reference level of GROUP, the intercept, -0.8768, estimates the fixed effect of GROUP=B. The estimate for GROUP=A, -0.5001, is an estimate of the difference in the effects of the two groups. The test of this difference indicates that the GROUP effects differ significantly (p=0.0152).
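
To put these estimates on more familiar scales, the following worked arithmetic uses only the rounded estimates quoted above. The probabilities are evaluated for a center whose random effect is zero ($\mu_i = 0$); that conditioning is an interpretive choice made here, and the results are not population-averaged probabilities.

$$
\begin{aligned}
\text{odds ratio (A vs. B)} &= \exp(-0.5001) \approx 0.61 \\
\mathrm{Prob}(y = 1 \mid \text{B},\ \mu_i = 0) &= \frac{\exp(-0.8768)}{1 + \exp(-0.8768)} \approx 0.29 \\
\mathrm{Prob}(y = 1 \mid \text{A},\ \mu_i = 0) &= \frac{\exp(-0.8768 - 0.5001)}{1 + \exp(-0.8768 - 0.5001)} \approx 0.20
\end{aligned}
$$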

The Covariance Estimates of Random Parameters table displays the estimate of the variance of the random effects, 0.6181, and its standard error, 0.2866.

Parameter Estimates

Parameter    DF     Estimate    Standard Error    t Value    Approx Pr > |t|
Intercept     1    -0.876827          0.238840      -3.67             0.0002
group A       1    -0.500091          0.205936      -2.43             0.0152
group B       0     0                 .                 .             .

Covariance Estimates of Random Parameters

Parameter              DF    Estimate    Standard Error
Intercept Intercept     1    0.618139          0.286622