You can use the CLUSTER option together with the HCCME= option in the MODEL statement to obtain heteroscedasticity- and cluster-adjusted standard errors in the PANEL procedure. The HCCME= option provides five different forms of a heteroscedasticity-consistent covariance matrix. Four of these, HCCME=0, 1, 2, and 3, can be specified together with the CLUSTER option to obtain a heteroscedasticity- and cluster-adjusted covariance matrix. HCCME=0 specifies White's original heteroscedasticity-consistent covariance matrix. HCCME=1, 2, and 3 specify three modified versions of White's heteroscedasticity-consistent covariance matrix. You can specify the HCCME= option without the CLUSTER option, but if you specify the CLUSTER option, you must also specify the HCCME= option. For more details, including the formulas for these heteroscedasticity-consistent covariance matrices with and without cluster adjustment, see the PANEL procedure chapter in the SAS/ETS® User's Guide.
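For orientation, the generic sandwich forms behind these options are sketched below. This is the standard textbook form for a linear regression with an $n \times k$ regressor matrix $X$, rows $x_i$, and OLS residuals $\hat e_i$; $g = 1, \ldots, G$ indexes clusters, and $X_g$ and $\hat e_g$ stack the rows and residuals of cluster $g$. This notation is introduced here for illustration only. The exact expressions that PROC PANEL uses, including any small-sample scaling applied with CLUSTER, are the ones documented in the SAS/ETS User's Guide and may differ from this generic form.

```latex
\widehat{V}_{\mathrm{HC0}} = (X^\top X)^{-1}\Big(\sum_{i=1}^{n} \hat e_i^{2}\, x_i x_i^\top\Big)(X^\top X)^{-1},
\qquad
\widehat{V}_{\mathrm{HC1}} = \frac{n}{n-k}\,\widehat{V}_{\mathrm{HC0}},

\widehat{V}_{\mathrm{cluster}} = (X^\top X)^{-1}\Big(\sum_{g=1}^{G} X_g^\top \hat e_g\, \hat e_g^\top X_g\Big)(X^\top X)^{-1}.
```

In the textbook versions, HC2 and HC3 replace $\hat e_i^2$ in the HC0 middle term with $\hat e_i^2/(1-h_{ii})$ and $\hat e_i^2/(1-h_{ii})^2$, where $h_{ii}$ is the $i$th diagonal element of $X(X^\top X)^{-1}X^\top$. If every cluster contains a single observation, the cluster middle term reduces to the HC0 middle term.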
The HCCME= and CLUSTER options can be applied to most of the models supported in the PANEL procedure; the only exception is the dynamic panel model. Supported models include fixed effects models, random effects models, hybrid models, pooled OLS regression, between-group regression, the Parks method for autoregressive models, and the Da Silva method for moving average models. In one-way and two-way fixed effects models, the variances of the cross sectional and time dummy variables, and the covariances that involve these dummy variables, are not corrected. In two-way fixed effects models, the variances and covariances for the intercept are not corrected either. This is because, in fixed effects models, the dummy variables are removed by the within transformations, so their variances and covariances cannot be calculated in the same way as those of the other regressors. If you want the correction to apply to the variances of the fixed effects dummy variables (and, in two-way fixed effects models, to the intercept), specify the fixed effects in the CLASS and MODEL statements and specify the POOLED option instead of the FIXONE or FIXTWO option. Example 1 below illustrates the HCCME= and CLUSTER options in fixed effects models with the POOLED option compared to the FIXONE or FIXTWO options.
The cluster adjustment in PROC PANEL is on the cross sectional dimension only, which accounts for correlations within the cross sections. PROC PANEL does not currently provide two-way cluster adjustment. To apply the cluster adjustment to the time dimension instead, which accounts for correlations within the same time period, sort the data by the time period ID variable and then by the cross sectional ID variable. Also specify the time period ID variable before the cross sectional ID variable in the ID statement. This works because the PANEL procedure takes the first ID variable as the cross sectional ID and the second ID variable as the time period ID. Example 2 below illustrates this approach.
To adjust the standard errors for an arbitrary cluster variable that is neither the cross sectional ID variable nor the time period ID variable when fitting a fixed effects model or a pooled OLS regression model, use the following steps. First, sort your data by the cluster variable. Then use a DATA step to create a pseudo time period ID variable that numbers the observations within each level of the cluster variable. This turns the data set into a panel data set with the cluster variable as the cross sectional ID variable and the pseudo time period ID variable as the time period ID variable. Next, in the PROC PANEL step, specify the cluster variable followed by the pseudo time period ID variable in the ID statement. Finally, specify the POOLED, HCCME=, and CLUSTER options in the MODEL statement and specify the fixed effects in the CLASS and MODEL statements.
Similarly, if the data is not panel data and you want to fit an OLS regression with fixed effects and obtain cluster-adjusted standard errors, you can still use the PANEL procedure. To do this, create a pseudo time period ID variable within the clusters using a DATA step, thus creating a pseudo panel data set to be read by the PANEL procedure. Then specify the cluster variable as the cross sectional ID variable, followed by the pseudo time period ID variable in the ID statement, and specify the fixed effects in the CLASS and MODEL statements. Also specify the POOLED, HCCME=, and CLUSTER options in the MODEL statement. Example 3 below illustrates how to create the pseudo time period ID variable according to the cluster variable and obtain cluster-adjusted standard errors when the data is not panel data. The same method also applies in the case of panel data when the cluster variable is neither the cross sectional ID nor the time period ID.
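The steps above can be sketched as follows for panel data with an arbitrary cluster variable. This is a minimal template under stated assumptions, not code from the SAS documentation: the data set Have and the variables Region (the cluster variable), Firm and Year (the original panel IDs), Y, and X are hypothetical placeholders, and the Firm and Year fixed effects stand in for whatever fixed effects your model needs.

```sas
/* Hypothetical input: data set Have with cluster variable Region,      */
/* original panel IDs Firm and Year, response Y, and regressor X.       */
proc sort data=Have out=by_region;
   by region;
run;

/* Number the observations 1, 2, 3, ... within each Region; INDEX_VAR   */
/* serves as the pseudo time period ID variable.                        */
data by_region;
   set by_region;
   by region;
   if first.region then index_var = 0;
   index_var + 1;
run;

/* Region is the cross sectional ID, so CLUSTER adjusts on Region.      */
/* Firm and Year fixed effects enter as CLASS effects in a pooled model. */
proc panel data=by_region;
   id region index_var;
   class firm year;
   model y = x firm year / pooled hccme=1 cluster;
run;
```

For non-panel data, the same three steps apply; only the effects listed in the CLASS and MODEL statements change. The pseudo ID carries no meaning of its own; it only gives the data set the cross section by time period layout that PROC PANEL expects.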
Below are some examples of using the HCCME= and CLUSTER options in the PANEL procedure. The data sets used in the examples are from published sources that have been well studied in the literature. The models fit to these data sets and the heteroscedasticity and cluster adjustment options specified in the examples are for illustration purposes only. The examples do not recommend specific models or specific covariance adjustments for these data sets. Consequently, no comment is made regarding the model fit statistics or the significance of the parameter estimates.
Example 1
Consider a fixed effects model using the cost function data in Greene (1990). Production is the log of output in millions of kilowatt-hours, and Cost is the log of cost in millions of dollars. The one-way, fixed effects model is specified as follows.
$$C_{it} = \beta_0 + \beta_1 P_{it} + v_i + e_{it},$$
where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. $C_{it}$ is the cost for firm $i$ at time $t$, $P_{it}$ is the production for firm $i$ at time $t$, $v_i$ is the fixed effect for firm $i$, and the $e_{it}$ are independent and identically distributed errors with zero mean and variance $\sigma_e^2$.
The following steps create the Electricity data set and sort the data according to the cross section (Firm) and time period (Year).
data Electricity;
input Firm Year Production Cost @@;
datalines;
1 1955 5.36598 1.14867 1 1960 6.03787 1.45185
1 1965 6.37673 1.52257 1 1970 6.93245 1.76627
2 1955 6.54535 1.35041 2 1960 6.69827 1.71109
2 1965 7.40245 2.09519 2 1970 7.82644 2.39480
3 1955 8.07153 2.94628 3 1960 8.47679 3.25967
3 1965 8.66923 3.47952 3 1970 9.13508 3.71795
4 1955 8.64259 3.56187 4 1960 8.93748 3.93400
4 1965 9.23073 4.11161 4 1970 9.52530 4.35523
5 1955 8.69951 3.50116 5 1960 9.01457 3.68998
5 1965 9.04594 3.76410 5 1970 9.21074 4.05573
6 1955 9.37552 4.29114 6 1960 9.65188 4.59356
6 1965 10.21163 4.93361 6 1970 10.34039 5.25520
;
proc sort data=Electricity;
by firm year;
run;
The following PROC PANEL step fits the one-way, fixed effects model with heteroscedasticity and cluster adjustment on the cross sections (Firms). The FIXONE option in the MODEL statement requests the one-way, fixed effects model. The HCCME=1 option requests the White heteroscedasticity-consistent covariance matrix modified by a degree of freedom adjustment. The CLUSTER option specifies cluster adjustment on the cross sections. The PRINTFIXED option displays the fixed effects estimates in the Parameter Estimates table.
proc panel data=Electricity;
id firm year;
model cost = production / fixone printfixed hccme=1 cluster;
run;
Below are the parameter estimates along with their standard errors, t statistics, and p-values. Following this table, a note appears indicating that the standard errors for the fixed effects estimates are not corrected for heteroscedasticity or clustering.
The following step provides heteroscedasticity- and cluster-corrected standard errors for the fixed effects parameters as well. The POOLED option in the MODEL statement requests pooled OLS regression. The CLASS and MODEL statements include the cross sectional ID variable, Firm. The HCCME=1 and CLUSTER options request the degree of freedom adjustment to White's heteroscedasticity-consistent covariance matrix and the cluster adjustment. This model is equivalent to the one-way, fixed effects model specified with the FIXONE option above, but the heteroscedasticity and cluster adjustments are applied to all parameters in the model, including the fixed effects parameters.
proc panel data=Electricity;
id firm year;
class firm;
model cost = production firm / pooled hccme=1 cluster;
run;
The parameter estimates, their standard errors, t statistics, and associated p-values are shown below. Compared to the FIXONE model above, all parameter estimates remain the same. The standard errors, t statistics, and p-values for the intercept and the Production slope parameter also remain the same. However, because the Firm fixed effects now enter the pooled model as ordinary regressors, their standard errors, t statistics, and p-values are adjusted for heteroscedasticity and clustering and therefore differ from the previous results that use the FIXONE option.
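A short derivation shows why the point estimates agree. Averaging the model over $t$ for each firm gives $\bar C_i = \beta_0 + \beta_1 \bar P_i + v_i + \bar e_i$, where a bar denotes the mean over the $T$ periods. Subtracting this from the model removes $\beta_0$ and $v_i$ (the within transformation), and by the Frisch-Waugh-Lovell theorem the OLS slope from the demeaned regression equals the Production slope from the dummy-variable regression that the POOLED specification with CLASS Firm fits. This sketch concerns the point estimates only; it does not describe how PROC PANEL computes any standard error.

```latex
C_{it} - \bar C_i = \beta_1 \,(P_{it} - \bar P_i) + (e_{it} - \bar e_i),
\qquad \bar C_i = \frac{1}{T}\sum_{t=1}^{T} C_{it},

\hat\beta_1 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (P_{it}-\bar P_i)(C_{it}-\bar C_i)}
                   {\sum_{i=1}^{N}\sum_{t=1}^{T} (P_{it}-\bar P_i)^2}.
```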
For the two-way, fixed effects model, replace the FIXONE option with the FIXTWO option in the MODEL statement. The standard errors for the cross section and time period fixed effects and the intercept parameter are not heteroscedasticity- and cluster-adjusted. If you want to adjust the standard errors for these parameters as well, specify the fixed effects in the CLASS and MODEL statements and specify the POOLED, HCCME=, and CLUSTER options in the MODEL statement similar to the FIXONE case.
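Following the paragraph above, the two specifications for the Electricity data might look as follows. This sketch simply applies the same pattern as the one-way steps in this example; its output is not reproduced here.

```sas
/* Two-way fixed effects (Firm and Year) with cluster adjustment on Firm. */
/* The fixed effects and intercept standard errors are NOT adjusted here. */
proc panel data=Electricity;
   id firm year;
   model cost = production / fixtwo printfixed hccme=1 cluster;
run;

/* Equivalent pooled specification: Firm and Year dummies enter as CLASS  */
/* effects, so the HCCME=1 and CLUSTER adjustments apply to all parameters. */
proc panel data=Electricity;
   id firm year;
   class firm year;
   model cost = production firm year / pooled hccme=1 cluster;
run;
```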
Example 2
Using the same data set as in Example 1, suppose you want to fit a one-way, fixed effects model on time. The model is specified as follows:
$$C_{it} = \beta_0 + \beta_1 P_{it} + u_t + e_{it},$$
where $i = 1, 2, \ldots, N$, $t = 1, 2, \ldots, T$, and $u_t$ is the fixed effect of time (Year). Otherwise, the model is as in Example 1.
If you want to obtain cluster-adjusted standard errors on the time dimension, sort the data by time period first and then by cross section, and specify the time ID variable, Year, before the cross sectional ID variable, Firm, in the ID statement. Because PROC PANEL takes the first ID variable as the cross sectional ID variable, and the cluster adjustment is always on the cross sectional dimension, this makes Year the clustering dimension. Then specify the one-way, fixed effects model with the FIXONE option in the MODEL statement. Also specify the HCCME= and CLUSTER options in the MODEL statement to apply the cluster adjustment to Years rather than Firms. This syntax is illustrated in the following steps.
proc sort data=Electricity out=sort_time;
by year firm;
run;
proc panel data=sort_time;
id year firm;
model cost = production / fixone printfixed hccme=1 cluster;
run;
The parameter estimates, standard errors, t statistics, and associated p-values are shown below. Note that the estimates labeled as the CS fixed effects dummy variables, CS1, CS2, and CS3, are actually time fixed effects, because Year is specified before Firm in the ID statement. PROC PANEL proceeds as if Years are the cross sections, which results in the cluster adjustment on the time dimension. The note below the table indicates that the fixed effects parameters are not heteroscedasticity- and cluster-adjusted.
To also obtain heteroscedasticity- and cluster-adjusted standard errors on the fixed effects parameters, specify the POOLED, HCCME=, and CLUSTER options in the MODEL statement and specify Year in the CLASS and MODEL statements as in Example 1.
proc panel data=sort_time;
id year firm;
class year;
model cost = production year / pooled hccme=1 cluster;
run;
The following table shows the parameter estimates, standard errors, t statistics, and p-values. Note that all the parameter estimates remain the same as in the FIXONE model. The standard errors for the intercept and Production slope parameter are also the same. However, the standard errors for the Year dummy variables are different from the FIXONE model because these are heteroscedasticity- and cluster-adjusted in the POOLED model while those in the FIXONE model are not.
Example 3
Because PROC PANEL is designed specifically for panel data, you need a panel data set, with a cross sectional ID variable and a time period ID variable specified in the ID statement, to perform any regression in the procedure. To use the PANEL procedure to obtain cluster-adjusted standard errors in OLS regression with fixed effects using non-panel data, create a pseudo time ID variable within the clusters. Then specify the cluster variable as the cross sectional ID variable and the pseudo time ID variable as the time period ID variable in the ID statement. In addition, specify the POOLED, HCCME=, and CLUSTER options in the MODEL statement to request pooled OLS regression with heteroscedasticity- and cluster-adjusted standard errors. Follow the same steps if you have panel data but the cluster variable on which you want to obtain cluster adjustment is neither the cross sectional ID variable nor the time period ID variable.
In the following data from Milliken and Johnson (1984), six people (Person) operate each of three machines (Machine) either one, two, or three times, and an overall rating (Rating) of the machine is recorded each time the machine is operated. Consider both Machine and Person as fixed effects. The dependent variable is the overall rating. The independent variables are the fixed effects of Machine and Person. The following steps create the data set.
data Machine;
input Machine Person Rating @@;
datalines;
1 1 52.0 1 2 51.8 1 2 52.8 1 3 60.0 1 4 51.1 1 4 52.3 1 5 50.9
1 5 51.8 1 5 51.4 1 6 46.4 1 6 44.8 1 6 49.2 2 1 64.0 2 2 59.7
2 2 60.0 2 2 59.0 2 3 68.6 2 3 65.8 2 4 63.2 2 4 62.8 2 4 62.2
2 5 64.8 2 5 65.0 2 6 43.7 2 6 44.2 2 6 43.0 3 1 67.5 3 1 67.2
3 1 66.9 3 2 61.5 3 2 61.7 3 2 62.3 3 3 70.8 3 3 70.6 3 3 71.0
3 4 64.1 3 4 66.2 3 4 64.0 3 5 72.1 3 5 72.0 3 5 71.1 3 6 62.0
3 6 61.4 3 6 60.5
;
Clustering by Machine accounts for correlations within machines. To obtain standard errors adjusted for the Machine clusters, sort the data by Machine and specify Machine as the cross sectional ID variable.
After sorting, the following DATA step creates a pseudo time period index variable to be specified as the time period ID variable in the ID statement in PROC PANEL. With the BY statement, FIRST.MACHINE identifies the first observation within each Machine, and the INDEX_VAR variable is set to zero for this observation. The INDEX_VAR+1 statement is a sum statement, which retains INDEX_VAR across observations and increments it by 1, so INDEX_VAR numbers the observations 1, 2, 3, ... within each Machine. The PROC PRINT step displays the first 20 observations of the resulting data set.
proc sort data=Machine;
by machine;
run;
data cluster_machine;
set machine;
by machine;
if first.machine then index_var = 0;
index_var + 1;
run;
proc print data=cluster_machine(obs=20);
run;
The following PROC PANEL step fits the pooled OLS regression model with Machine and Person fixed effects, with standard errors adjusted for the Machine clusters for all parameters, including the intercept and both the Machine and Person fixed effects estimates. The ID statement specifies Machine as the cross sectional ID variable and index_var as the time period ID variable. The CLASS and MODEL statements specify Machine and Person fixed effects. The POOLED option requests pooled OLS regression. The HCCME=1 option requests the degree of freedom adjustment to White's heteroscedasticity-consistent covariance matrix. The CLUSTER option requests cluster adjustment of the covariance matrix.
proc panel data=cluster_machine;
id machine index_var;
class machine person;
model rating = machine person / pooled hccme=1 cluster;
run;
The results are shown below.
To adjust the standard errors for Person clusters instead of Machine clusters, accounting for correlations within Person rather than within Machine, proceed as above. First, sort the data by Person and use Person as the cross sectional ID variable. Then create a pseudo time period ID variable within each Person by using the BY statement in the DATA step.
proc sort data=Machine out=cluster_person;
by person;
run;
data cluster_person;
set cluster_person;
by person;
if first.person then index_var = 0;
index_var + 1;
run;
proc print data=cluster_person(obs=13);
run;
The first 13 observations are shown below.
In PROC PANEL, specify Person as the cross sectional ID variable and index_var as the time period ID variable in the ID statement. The CLASS and MODEL statements specify Machine and Person as fixed effects. The POOLED option requests pooled OLS regression. The HCCME=1 and CLUSTER options request the heteroscedasticity-consistent covariance matrix adjusted for Person clusters.
proc panel data=cluster_person;
id person index_var;
class machine person;
model rating = machine person / pooled hccme=1 cluster;
run;
The results are shown below.
References
Greene, W. H. (1990). Econometric Analysis. New York: Macmillan.
Milliken, G. A., and Johnson, D. E. (1984). Designed Experiments. Vol. 1 of Analysis of Messy Data. Belmont, CA: Lifetime Learning Publications.