How can I eliminate duplicate observations from a large data set without sorting?


Using a CLASS statement in PROC SUMMARY does not require the data set to be sorted in advance. With the NWAY option, the output data set contains one observation for each unique combination of CLASS variable values, so duplicate observations collapse into a single row. List every variable in the CLASS statement if an observation should count as a duplicate only when all of its values match. The _FREQ_ variable in the output data set shows how many input observations had that combination of CLASS variable values.

    /* Example */

    /* Create a data set with duplicate observations */
    data test;
    input x y z;
    cards;
    1 1 1
    1 1 1
    1 2 1
    1 2 2
    2 2 2
    2 2 2
    2 2 2
    2 2 1
    ;
    run;
    
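    /* NWAY keeps only the rows for the full X*Y*Z combination; */
    /* _FREQ_ counts the input observations in each combination */
    proc summary data=test nway;
    class x y z;
    output out=test1(drop=_type_);
    run;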

    /* Print the de-duplicated data set */
    proc print data=test1;
    run;
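
One caveat with this approach: by default, PROC SUMMARY excludes observations that have a missing value for any CLASS variable, so those observations would be dropped rather than de-duplicated. The following is a minimal sketch of one way to handle that, assuming you want to keep such observations and do not need the frequency count. The data set names TEST2 and NODUPS are hypothetical and used only for illustration.

    /* Sketch: keep observations with missing CLASS values */
    /* TEST2 and NODUPS are hypothetical names for this illustration */
    data test2;
    input x y z;
    cards;
    1 1 1
    1 1 1
    . 2 1
    . 2 1
    2 2 2
    ;
    run;

    /* MISSING treats a missing value as a valid CLASS level; */
    /* dropping _FREQ_ as well leaves only the original variables */
    proc summary data=test2 nway missing;
    class x y z;
    output out=nodups(drop=_type_ _freq_);
    run;

    proc print data=nodups;
    run;

With the MISSING option, the two observations where X is missing should collapse into one row in NODUPS. Without it, they would not appear in the output at all.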