Estimating the population mean in the SURVEYMEANS procedure when the stratified sample has different sampling rates within strata


In a stratified sampling design, when the sampling fraction is the same in all strata, the simple sample mean ȳ is the same as the stratified estimate of the population mean ȳst. Because PROC SURVEYMEANS assigns a weight of 1 to every observation by default, it computes the mean as ȳ even under a stratified design. To compute ȳst when the sampling fractions differ, you must supply the appropriate weight variable in a WEIGHT statement. If the STRATA statement is specified with unequal sampling rates but without a WEIGHT statement, PROC SURVEYMEANS issues the following message:

NOTE: You are using unequal sampling rates in a stratified design but did not
specify a WEIGHT statement. Unless you also specify a WEIGHT statement,
the analysis will assume equal weights for all observations.
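
For reference, the two estimators can be written out as follows. This is a sketch in standard stratified-sampling notation, which is assumed here rather than taken from the note: N_h and n_h are the population and sample sizes in stratum h, N and n are their totals, and y_hi is the i-th sampled value in stratum h.

```latex
\bar{y} = \frac{1}{n}\sum_{h}\sum_{i=1}^{n_h} y_{hi} = \sum_{h}\frac{n_h}{n}\,\bar{y}_h,
\qquad
\bar{y}_{st} = \sum_{h}\frac{N_h}{N}\,\bar{y}_h,
\qquad
\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}
```

The two expressions agree when n_h/n = N_h/N in every stratum, which is the case of a common sampling fraction n_h/N_h = n/N.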

The following example demonstrates the issue using the example titled "Stratified Cluster Sample Design" in the PROC SURVEYMEANS documentation. The data record ice cream spending for students sampled from three strata: Grade=7, 8, and 9.

data IceCream;
   input Grade Spending @@;
   if (Spending < 10) then Group='less';
     else Group='more';
   datalines;
7 7  7  7  8 12  9 10  7  1  7 10  7  3  8 20  8 19  7 2
7 2  9 15  8 16  7  6  7  6  7  6  9 15  8 17  8 14  9 8
9 8  9  7  7  3  7 12  7  4  9 14  8 18  9  9  7  2  7 1
7 4  7 11  9  8  8 10  8 13  7  2  9  6  9 11  7  2  7 9
;

Based on the total number of students in each stratum, the TOTAL= data set is as follows:

data StudentTotal;
   input Grade _total_;
   datalines;
7 1824
8 1025
9 1151
;

The sampling rate in each stratum is as follows:

Grade  Sampling Rate  
7      20/1824 = 0.011
8       9/1025 = 0.009
9      11/1151 = 0.010
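
The rates in this table can also be computed from the data instead of by hand. The following is one possible sketch with PROC SQL. The output data set name SamplingRate and the column names SampleSize, StratumTotal, and Rate are chosen here for illustration and are not part of the original example.

```sas
/* Count the sampled students in each grade and divide by the stratum total */
proc sql;
   create table SamplingRate as
   select t.Grade,
          c.n             as SampleSize,
          t._total_       as StratumTotal,
          c.n / t._total_ as Rate format=6.4
   from StudentTotal as t
        inner join
        (select Grade, count(*) as n
           from IceCream
          group by Grade) as c
        on t.Grade = c.Grade
   order by t.Grade;
quit;

proc print data=SamplingRate noobs;
run;
```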

In order to compute an unbiased estimate of the population mean, ȳst, each observation must be weighted by the inverse of the sampling rate for the stratum to which it belongs, that is, by the number of students in the stratum divided by the number sampled from it. Multiplying every weight by the same constant, such as the overall sampling rate (40/4000=0.01), rescales the weights but leaves the estimated mean unchanged. The weights are constructed as follows:

data IceCream;
   set IceCream;
   /* Weight = inverse of the sampling rate in the observation's stratum */
   if Grade=7 then Weight=1/(20/1824);
   else if Grade=8 then Weight=1/(9/1025);
   else if Grade=9 then Weight=1/(11/1151);
run;
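
As a quick sanity check on the weights, they should sum within each grade to the stratum totals in StudentTotal (1824, 1025, and 1151), because each of the n sampled students in a stratum carries a weight of total/n. A minimal sketch of that check, assuming the DATA step above has been run:

```sas
/* N is the stratum sample size; Sum of Weight should equal the stratum total */
proc means data=IceCream n sum maxdec=2;
   class Grade;
   var Weight;
run;
```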

The following statements compute the stratified estimate of the population mean, ȳst:

proc surveymeans data=IceCream total=StudentTotal;
   strata Grade / list;
   var Spending Group;
   weight Weight;
run;
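
As a hand check on the procedure's output, the estimate for Spending can be worked out from the data listed earlier, where the spending values sum to 100, 139, and 111 in grades 7, 8, and 9. The symbols for the stratum sample means and the two estimates are assumptions of this sketch.

```latex
\bar{y}_7 = \tfrac{100}{20} = 5, \qquad
\bar{y}_8 = \tfrac{139}{9} \approx 15.444, \qquad
\bar{y}_9 = \tfrac{111}{11} \approx 10.091

\bar{y}_{st}
  = \frac{1824 \cdot 5 + 1025 \cdot \tfrac{139}{9} + 1151 \cdot \tfrac{111}{11}}{4000}
  \approx \frac{9120 + 15830.56 + 11614.64}{4000}
  \approx 9.141

\bar{y} = \frac{100 + 139 + 111}{40} = 8.75
```

The weighted mean that the procedure reports for Spending should agree with the first value, whereas an analysis without the WEIGHT statement returns the second.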

Reference

Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley & Sons.