What does the following message mean, and how does it affect my results? NOTE: MERGE statement has more than one data set with repeats of BY values.


This message means that, for at least one BY group, more than one of the data sets in your MERGE statement contains multiple observations with the same BY values. This does not necessarily cause a problem, but SAS notifies you because duplicates in more than one data set can produce unexpected results. Here are two data sets that have duplicates of the BY variable:

/* Note that there are two A's and three C's in this sample. */

DATA one;
INPUT id $ fruit $;
CARDS;
a apple
a apple
b banana
c coconut
c coconut
c coconut
;
RUN;

PROC SORT data=one;
BY id;
RUN;

/* Note that there are two B's and two C's in this sample. */

DATA two;
INPUT id $ color $;
CARDS;
a amber
b brown
b black
c cocoa
c cream
;
RUN;

PROC SORT data=two;
BY id;
RUN;

/* The B BY group has two observations because of the duplicates */
/* in data set TWO. Note the different values of COLOR in the C  */
/* BY group. These are the correct results; however, if you did  */
/* not know that you had duplicates of C in both data sets, you  */
/* might not expect these results.                               */


DATA test;
MERGE one two;
BY id;
RUN;

PROC PRINT data=test;
RUN;

RESULTS:

Obs id fruit   color
1   a  apple   amber
2   a  apple   amber
3   b  banana  brown
4   b  banana  black
5   c  coconut cocoa
6   c  coconut cream
7   c  coconut cream

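In this example, SAS notifies you of the duplicates so that you do not expect
COCOA to be assigned to every observation in the C BY group. Within a BY group,
MERGE pairs observations in the order in which they occur; when one data set
runs out of observations for that BY group, its last values (here, CREAM) are
retained for the remaining observations.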
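
If you want to know before merging which BY values repeat, one possible check is to count the observations per BY value in each data set. The following is only a sketch: the data sets ONE and TWO and the variable ID come from the example above, while the table names DUPS_ONE and DUPS_TWO and the column names N_OBS, N_IN_ONE, and N_IN_TWO are hypothetical names chosen for illustration.

```sas
/* Sketch: list the BY values that repeat in each data set.      */
/* DUPS_ONE, DUPS_TWO, and N_OBS are hypothetical names.         */
PROC SQL;
  CREATE TABLE dups_one AS
    SELECT id, COUNT(*) AS n_obs
    FROM one
    GROUP BY id
    HAVING COUNT(*) > 1;

  CREATE TABLE dups_two AS
    SELECT id, COUNT(*) AS n_obs
    FROM two
    GROUP BY id
    HAVING COUNT(*) > 1;

  /* BY values that repeat in BOTH data sets are the ones the    */
  /* NOTE is warning about.                                      */
  SELECT a.id, a.n_obs AS n_in_one, b.n_obs AS n_in_two
    FROM dups_one AS a INNER JOIN dups_two AS b
    ON a.id = b.id;
QUIT;
```

With the sample data above, only C repeats in both data sets (three observations in ONE, two in TWO), so the final query should list that BY value alone. A and B each repeat in only one data set.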