In a stratified sampling design, when the sampling fraction is the same in all strata, the simple (unweighted) sample mean ȳ equals the stratified estimate ȳ_st of the population mean. Because PROC SURVEYMEANS assigns a weight of 1 to every observation by default, it computes the sample mean as ȳ even under a stratified design. To compute ȳ_st, you must supply the appropriate weight variable in a WEIGHT statement. If the STRATA statement is specified without a WEIGHT statement, PROC SURVEYMEANS issues the following message:
The following example demonstrates the issue using the example titled "Stratified Cluster Sample Design" in the PROC SURVEYMEANS documentation. The data contain ice cream spending for students in three strata: Grade=7, 8, and 9.
Based on the total number of students in each stratum, the TOTAL= data set is as follows:
The sampling rate in each stratum is as follows:
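As a rough illustration of how these rates are obtained, the following Python sketch divides each stratum's sample size by its population size. The stratum totals (1824, 1025, 1151) and sample sizes (20, 9, 11) are assumptions recalled from the ice cream example in the PROC SURVEYMEANS documentation. They are chosen to agree with the 40 sampled students out of 4000 used in this note, and you should check them against your own TOTAL= data set.

```python
# Assumed stratum population sizes (N_h) and sample sizes (n_h).
# These values are assumptions; replace them with the values from your TOTAL= data set.
population = {7: 1824, 8: 1025, 9: 1151}
sample_sizes = {7: 20, 8: 9, 9: 11}

# Sampling rate in each stratum: n_h / N_h
rates = {grade: sample_sizes[grade] / population[grade] for grade in population}

# Overall sampling rate: n / N
overall_rate = sum(sample_sizes.values()) / sum(population.values())

for grade in sorted(rates):
    print(f"Grade {grade}: sampling rate = {rates[grade]:.5f}")
print(f"Overall sampling rate = {overall_rate:.5f}")
```

Under these assumed values the rates differ across strata, which is why the unweighted mean and the stratified mean generally do not agree.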
To compute ȳ_st, an unbiased estimate of the population mean, each observation must be weighted appropriately. In this case, each weight is the ratio of the overall sampling rate (40/4000 = 0.01) to the sampling rate of the stratum to which the observation belongs. The weights would be constructed as follows:
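In symbols, the weighting rule can be sketched as below. The notation is an assumption introduced here for illustration: N_h and n_h are the population and sample sizes in stratum h, N and n are the corresponding totals, y_hi is the i-th observation in stratum h, and ȳ_h is the sample mean in stratum h.

```latex
\[
  w_h \;=\; \frac{n/N}{n_h/N_h} \;=\; \frac{n\,N_h}{N\,n_h}
\]
\[
  \bar{y}_w
  \;=\; \frac{\sum_{h}\sum_{i=1}^{n_h} w_h\, y_{hi}}{\sum_{h}\sum_{i=1}^{n_h} w_h}
  \;=\; \frac{\frac{n}{N}\sum_{h} N_h\,\bar{y}_h}{\frac{n}{N}\sum_{h} N_h}
  \;=\; \sum_{h} \frac{N_h}{N}\,\bar{y}_h
  \;=\; \bar{y}_{st}
\]
```

The weighted mean therefore reproduces the stratified estimate. When n_h/N_h = n/N in every stratum, each w_h equals 1, and the weighted mean reduces to the unweighted sample mean.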
The following statements estimate the stratified sample mean ȳ_st.
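As a cross-check outside SAS, the following Python sketch shows that weighting each observation by the ratio of the overall sampling rate to its stratum's sampling rate reproduces the stratified mean. It is not the PROC SURVEYMEANS code and it does not compute standard errors. The stratum totals and sample sizes are assumed values, and the spending values are synthetic placeholders, not the documentation data.

```python
from statistics import mean

# Assumed stratum population sizes (N_h) and sample sizes (n_h); verify against your data.
population = {7: 1824, 8: 1025, 9: 1151}
sample_sizes = {7: 20, 8: 9, 9: 11}

# Synthetic spending values for illustration only (NOT the documentation data).
spending = {g: [5.0 + g + i for i in range(n_h)] for g, n_h in sample_sizes.items()}

N = sum(population.values())    # total population size
n = sum(sample_sizes.values())  # total sample size

# Weight = overall sampling rate / stratum sampling rate
weights = {g: (n / N) / (sample_sizes[g] / population[g]) for g in population}

# Unweighted sample mean: every observation counts equally
unweighted_mean = mean([y for ys in spending.values() for y in ys])

# Weighted mean using the constructed weights
weight_total = sum(weights[g] * len(ys) for g, ys in spending.items())
weighted_mean = sum(weights[g] * y for g, ys in spending.items() for y in ys) / weight_total

# Stratified mean: stratum means combined in proportion to N_h / N
stratified_mean = sum(population[g] / N * mean(spending[g]) for g in population)

print(f"unweighted mean = {unweighted_mean:.4f}")
print(f"weighted mean   = {weighted_mean:.4f}")
print(f"stratified mean = {stratified_mean:.4f}")
```

With weights built this way, the weights sum to the sample size (40 here), so they rescale the observations without changing the apparent number of observations.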
Reference
Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley & Sons.